ABSTRACT Title of Dissertation: SIMULATION-BASED OPTIMIZATION OF TRANSPORTATION SYSTEMS: THEORY, SURROGATE MODELS, AND APPLICATIONS Xiang He, Doctor of Philosophy, 2014 Dissertation directed by: Lei Zhang, Associate Professor Department of Civil & Environmental Engineering The construction of new highway infrastructure has not kept pace with the growth of travel, mainly due to the limitation of land and funding availability. To improve the mobility, safety, reliability and sustainability of the transportation system, various transportation planning and traffic operations policies have been developed in the past few decades. On the other hand, simulation is widely used to evaluate the impacts of those policies, due to its advantages in capturing network and behavior details and capability of analyzing various combinations of policies. A simulation-based optimization (SBO) method, which combines the strength of simulation evaluation and mathematical optimization, is imperative for supporting decision making in practice. The objective of this dissertation is to develop SBO methods that can be efficiently applied to transportation planning and operations problems. Surrogate-based methods are selected as the research focus after reviewing various existing SBO methods. A systematic framework for applying the surrogate-based optimization methods in transportation research is then developed. The performance of different forms of surrogate models is compared through a numerical example, and regressing Kriging is identified as the best model in approximating the unknown response surface when no information regarding the simulation noise is available. Accompanied with an expected improvement global infill strategy, regressing Kriging is successfully applied in a real world application of optimizing the dynamic pricing for a toll road in the Inter-County Connector (ICC) regional network in the State of Maryland. To further explore its capability in dealing with problems that are of more interest to planners and operators of the transportation system, this method is then extended to solve constrained and multi- objective optimization problems. Due to the observation of heteroscedasticity in transportation simulation outputs, two surrogate models that can be adapted for heteroscedastic data are developed: a heteroscedastic support vector regression (SVR) model and a Bayesian stochastic Kriging model. These two models deal with the heteroscedasticity in simulation noise in different ways, and their superiority in approximating the response surface of simulations with heteroscedastic noise over regressing Kriging is verified through both numerical studies and real world applications. Furthermore, a distribution-based SVR model which takes into account the statistical distribution of simulation noise is developed. By utilizing the bootstrapping method, a global search scheme can be incorporated into this model. The value of taking into account the statistical distribution of simulation noise in improving the convergence rate for optimization is then verified through numerical examples and a real world application of integrated corridor traffic management. This research is one of the first to introduce simulation-based optimization methods into large-scale transportation network research. Various types of practical problems (with single-objective, with multi-objective or with complex constraints) can be resolved. Meanwhile, the developed optimization methods are general and can be applied to analyze all types of policies using any simulator. Methodological improvements to the surrogate models are made to take into account the statistical characteristics of simulation noise. These improvements are shown to enhance the prediction accuracy of the surrogate models, and further enhance the efficiency of optimization. Generally, compared to traditional surrogate models, fewer simulation evaluations would be needed to find the optimal solution when these improved models are applied. SIMULATION-BASED OPTIMIZATION OF TRANSPORTATION SYSTEMS: THEORY, SURROGATE MODELS, AND APPLICATIONS By Xiang He Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2014 Advisory Committee: Professor Lei Zhang, Chair/Advisor Professor Michael Fu Professor Ali Haghani Professor Elise Miller-Hooks Professor Paul Schonfeld © Copyright by Xiang He 2014 ii Acknowledgements It would not have been possible for me to finish this research without the help and support from so many people. It is a great pleasure to thank all the people that contributed to the preparation and the completion of this thesis. I am deeply indebted to my advisor, Dr. Lei Zhang, for his continuous support, inspiring advice and enthusiastic encouragement throughout the years of my Ph.D. study at the University of Maryland, College Park. I am grateful for the opportunity he gives me to work on this emerging and interesting topic. He is not only an ideal advisor in research, but also a great mentor in my pursuit of career and a good friend that cares about my personal life. I would like to extend my thanks to the rest of my dissertation committee: Professor Michael Fu, Professor Ali Haghani, Professor Elise Miller-Hooks and Professor Paul Schonfeld, for their encouragement, insightful comments and valuable suggestions to my research. My sincere thanks also go to Dr. Xiqun (Michael) Chen, for the knowledge and insight he has shared. A lot of ideas in this research come from the fruitful discussion between us. I thank all my colleagues in the Transportation Systems Research Lab at the University of Maryland. The ideas and suggestions they shared with me during our group meetings have greatly contributed to the completion of this research. I would also like to thank my fellow graduate students from other groups in the transportation iii program. I appreciate the opportunities to take class, cooperate on projects and have discussions with them. Finally, I would like to express my gratitude towards my family and dedicate this dissertation to them. They have been encouraging me, supporting me and loving me all the time. I am especially thankful to my wife, Zheng Zhang, for her understanding and love, and for coming to the US to accompany me during those days I worked on this dissertation. iv Table of Contents Acknowledgements ....................................................................................................... ii  Table of Contents ......................................................................................................... iv  List of Tables ............................................................................................................. viii  List of Figures ............................................................................................................... x  List of Abbreviations ................................................................................................. xiv  List of Mathematical Symbols .................................................................................. xvii  Chapter 1: Introduction ................................................................................................. 1  1.1  Background ................................................................................................... 1  1.2 Objective Evaluation ........................................................................................... 3  1.2.1 Mathematical Methods ................................................................................. 4  1.2.2 Simulation Models ....................................................................................... 8  1.3 Research Objectives and Methodology ............................................................ 10  1.4 Outline of the Dissertation ................................................................................ 12  Chapter 2: Literature Review ...................................................................................... 15  2.1 Simulation-Based Optimization Methods ......................................................... 17  2.1.1 Methods Primarily for Discrete Optimization Problems ........................... 17  2.1.2 Gradient-Based Methods ........................................................................... 20  2.1.3 Mathematical Programming-Based Methods ............................................ 21  2.1.4 Metaheuristics ............................................................................................ 22  2.1.5 Surrogate-Based Methods .......................................................................... 24  2.2 Deterministic vs. Stochastic .............................................................................. 26  2.3 SBO Application in Transportation Research .................................................. 28  2.4 Summary ........................................................................................................... 32  Chapter 3: Surrogate-Based Optimization Procedures and Application ..................... 34  3.1 Framework ........................................................................................................ 34  3.2 Optimization Procedures ................................................................................... 37  v 3.2.1 Design of Experiments ............................................................................... 37  3.2.2 Surrogate Model Construction .................................................................. 38  3.2.3 Infill Strategies ........................................................................................... 44  3.2.4 Summary .................................................................................................... 46  3.3 Numerical Test .................................................................................................. 47  3.4 Test Bed for Real World Application ............................................................... 55  3.4.1 The ICC Road Network in Maryland ......................................................... 56  3.4.2 Open Source DTA Simulator: DynusT ....................................................... 58  3.5 Application: Optimal Dynamic Pricing for Toll Roads .................................... 60  3.5.1 Background ................................................................................................ 60  3.5.2 Problem Setting and Formulation ............................................................. 62  3.5.3 Surrogate Models Evaluation .................................................................... 65  3.5.4 Optimization Results .................................................................................. 69  3.5.5 Sensitivity Analysis ..................................................................................... 73  3.6 Conclusions ....................................................................................................... 75  Chapter 4: Constrained and Multi-Objective Simulation-based Optimization ........... 77  4.1 Constrained Simulation-based Optimization .................................................... 77  4.1.1 Problem Formulation ................................................................................. 79  4.1.2 Infill Strategy ............................................................................................. 80  4.1.3 Optimization Results .................................................................................. 81  4.2 Multi-Objective Simulation-based Optimization .............................................. 87  4.2.1 Problem Formulation ................................................................................. 88  4.2.2 Infill Strategy ............................................................................................. 89  4.2.3 Optimization Results .................................................................................. 91  4.3 Conclusions ....................................................................................................... 95  Chapter 5: An Enhanced SVR Method: Adapting for Heteroscedastic Simulation Noise ........................................................................................................................... 97  5.1 Heteroscedasticity in Transportation Simulation Outputs ................................ 97  5.2 Methodology ..................................................................................................... 99  5.2.1 Traditional Support Vector regression (SVR) ............................................ 99  vi 5.2.2 An Enhanced SVR .................................................................................... 103  5.3 Numerical Tests .............................................................................................. 106  5.4 Application ...................................................................................................... 113  5.5 Conclusions ..................................................................................................... 117  Chapter 6: Bayesian Stochastic Kriging for Simulations with Heteroscedastic Errors ................................................................................................................................... 120  6.1 Bayesian Stochastic Kriging Model ............................................................... 120  6.1.1 Model Formulation .................................................................................. 121  6.1.2 Model Evaluation ..................................................................................... 130  6.2 Numerical Example ........................................................................................ 131  6.2.1 Toy Network ............................................................................................. 132  6.2.2 Numerical Results .................................................................................... 134  6.3 Application of Integrated Corridor Planning and Operational Optimization . 137  6.3.1 Research Problem .................................................................................... 137  6.3.2 Study Area ................................................................................................ 139  6.3.3 Simulation Demand and Supply Calibration ........................................... 142  6.3.4 SBO Results .............................................................................................. 148  6.4 Conclusions ..................................................................................................... 155  Chapter 7: Bootstrapping of an Enhanced SVR Method Considering Distribution of Simulation Noise ....................................................................................................... 157  7.1 Asymmetric Distribution of Simulation Noise ............................................... 157  7.2 Model Development and Evaluation .............................................................. 160  7.2.1 Loss Function ........................................................................................... 161  7.2.2 Distribution-Based Support Vector Regression ....................................... 162  7.2.3 Numerical Test ......................................................................................... 166  7.3 Bootstrapping and Infill .................................................................................. 169  7.3.1 Bootstrapping Method ............................................................................. 170  7.3.2 Expected Improvement-Based Infill ......................................................... 172  7.3.3 Numerical Example .................................................................................. 173  7.4 Application ...................................................................................................... 176  vii 7.5 Conclusions ..................................................................................................... 181  Chapter 8: Conclusions ............................................................................................. 183  8.1 Contributions .................................................................................................. 183  8.2 Recommendations for Future Research .......................................................... 186  Bibliography ............................................................................................................. 189  viii List of Tables   Table 3-1: Characteristics of Surrogate Models. ........................................................ 47 Table 3-2: Input Data and Equilibrium Flow for a Small Road Network. ................. 50 Table 3-3: The Estimation Accuracy Comparison of Surrogate Models, under 40 Times of Objective Function Evaluations. .................................................................. 54 Table 3-4: Selected Design Parameters and Baseline Value. ..................................... 58 Table 3-5: Space-Filling Latin Hypercube of Parameters for DoE. ........................... 63 Table 3-6: Leave-One-Out Cross Validation Results. ................................................ 66 Table 5-1: Performance of the Heteroscedastic SVR Model and Regressing Kriging Model. ....................................................................................................................... 110 Table 6-1: Input Data for a Small Road Network. .................................................... 133 Table 6-2: Dynamic Travel Demand. ....................................................................... 133 Table 6-3: The Goodness-of-Fit of Surrogate Models. ............................................. 135 Table 6-4: Summaries of the 24-Hour Simulation for the Corridor. ........................ 143 Table 6-5: Calibration Errors of Network-Wide Traffic Flow Quantities on General- Purpose Lanes. .......................................................................................................... 146 Table 6-6: Comparison of Measured and Simulated VMT on HOV Lanes. (Miles) 147 Table 6-7: Space-Filling Latin Hypercube Sampling of Parameters for DoE. ......... 149 Table 6-8: Comparison of the Baseline and Optima for PM Peak Simulation Results. ................................................................................................................................... 152 Table 7-1: Chi-Square Goodness-of-Fit Test Results ............................................... 159 ix Table 7-2: Performance of Heteroscedastic SVR, Bayesian Stochastic Kriging and Distribution-Based SVR. .......................................................................................... 168 Table 7-3: Comparison of the Baseline, BSK Optima and D-SVR Optima for PM Peak Simulation Results. .......................................................................................... 179 x List of Figures Figure 1-1: Rational Planning Model Flowchart. (Source: Levinson et al., 2013) ....... 2  Figure 3-1: Transportation Simulation-based Optimization Procedure using the Surrogate Methodology. ............................................................................................. 34  Figure 3-2: An Illustration of 100 LHS DoEs Based on the Spacing Filling Criterion ( )p X . ......................................................................................................................... 38  Figure 3-3: Numerical Network. (link 1 and 2 are the tolled links) ........................... 49  Figure 3-4: The True Response Function of *( , )F z q . ............................................... 50  Figure 3-5: Validation of Surrogate Models. .............................................................. 53  Figure 3-6: An Illustration of the Higher Computational Efficiency of the Surrogate Models compared to GA. ............................................................................................ 55  Figure 3-7: The Inter-County Connector (in thick line) and Regional Network. (Source: http://www.mdta.maryland.gov/ICC/Toll_Rates.html) ............................... 57  Figure 3-8: Prediction Accuracy of the Leave-One-Out Cross Validation: (a) M1; (b) M3; (c) M11; (d) M12................................................................................................. 68  Figure 3-9: Comparisons of Traffic Flow Volumes of Additive Toll Links between the Baseline and Optimized Toll Rates. ...................................................................... 70  Figure 3-10: Comparison of Toll Revenue Collected on Additive Toll Links between the Baseline and Optimized Toll Rates, (a) Toll Revenue; (b) Cumulative Toll Revenue....................................................................................................................... 71  xi Figure 3-11: Comparison of the Vehicle Throughput between the Baseline and Optimized Toll Rates, (a) Vehicle Throughput; (b) Cumulative Vehicle Throughput. ..................................................................................................................................... 72  Figure 3-12: Sensitivity Analysis of the (a) Baseline and (b) Optima. ....................... 74  Figure 4-1: Initial and Infill Points for the Constrained Optimization Problem. ........ 82  Figure 4-2: Traffic Volumes of ICC Segments under Optima and Baseline Toll. ..... 84  Figure 4-3: Comparison of Network Wide Average Travel Time between Optimal and Baseline Toll Plans. .............................................................................................. 85  Figure 4-4: Comparison of Toll Revenue between Optimal and Baseline Toll Plans, (a) Toll Revenue; (b) Cumulative Toll Revenue. ....................................................... 86  Figure 4-5: Comparison of Throughput between Optimal and Baseline Toll Plans, (a) Vehicle Throughput; (b) Cumulative Vehicle Throughput. ....................................... 86  Figure 4-6: Example of a Pareto Set. .......................................................................... 89  Figure 4-7: Pareto front for the multi-objective optimization problem, (a) Pareto Front Based on Initial Samples; (b) Pareto Front Based on Augmented Samples. .............. 92  Figure 5-1: Heteroscedasticity Exhibited in the Simulation Outputs, (a) Standard Deviation vs. Mean; (b) Coefficient of Variation vs. Mean. ...................................... 98  Figure 5-2: Comparison of the Traditional and the Heteroscedastic SVR Model, (a) Traditional SVR Model; (b) Heteroscedastic SVR Model. ...................................... 105  Figure 5-3: Validations RMSE for the 6-Variable Function with a High Variance Level, (a) Small Sample; (b) Large Sample. ............................................................. 112  Figure 5-4: Traffic Volumes of ICC Segments under Optima and Baseline Tolls. .. 115  Figure 5-5: Average Travel Time under Optima and Baseline Tolls. ...................... 116  xii Figure 5-6: Throughput under Optima and Baseline Tolls, (a) Vehicle Throughput; (b) Cumulative Vehicle Throughput. ........................................................................ 117  Figure 6-1: Numerical Illustration Network. ............................................................ 132  Figure 6-2: Comparison of Surrogate Models, (a) Regressing Kriging; (b) Bayesian Stochastic Kriging. .................................................................................................... 136  Figure 6-3: Simulation Network of I-270 Freeway/Arterial Corridor. ..................... 140  Figure 6-4: Supply Calibration Results, (a) Modified Greenshield’s Model Calibration for Detector ID 3392; (b) Comparison of Measured and Simulated Traffic Flows on General-Purpose Lanes. ............................................................................................ 145  Figure 6-5: Comparison of Mean Measured and Simulated Cumulative HOV-Lane VMT. ......................................................................................................................... 148  Figure 6-6: Prediction Accuracy of the Bayesian Stochastic Kriging. ..................... 151  Figure 6-7: Distributions of Simulation Outputs for the Baseline and Optima. ....... 152  Figure 6-8: Comparisons of Traffic Flow Characteristics on the Work Zone HOV/HOT Lane........................................................................................................ 153  Figure 6-9: Comparison of the Baseline and Optima, (a) Average Travel Time; (b) Vehicle Throughput. ................................................................................................. 154  Figure 7-1: Distribution of Simulation Noise at Four Randomly Selected Design Points......................................................................................................................... 158  Figure 7-2: Framework of the Expected Improvement-Based Infill with Bootstrapped Distribution-Based SVR Surrogate Model. .............................................................. 172  Figure 7-3: Comparison of the EI Infill Global Optimization with Distribution-Based SVR and Bayesian Stochastic Kriging. .................................................................... 174  xiii Figure 7-4: Comparison of the EI Infill Global Optimization with Distribution-Based SVR and Bayesian Stochastic Kriging (Averaged over 10 Instances). .................... 176  Figure 7-5: Distributions of the Baseline and Optima Predicted by the D-SVR and BSK Models. ............................................................................................................. 178  Figure 7-6: Network Wide Average Trip Travel Time along Hour of the Day of the Baseline, BSK Optima and D-SVR Optima Cases. .................................................. 180  xiv List of Abbreviations AEOV Absolute Error of Optimal Value AMS Anisotropic Mesoscopic Simulation ANOVA Analysis of Variance BSK Bayesian Stochastic Kriging CCD Central Composite Design CI Confidence Interval CV Cross Validation DMS Dynamic Message Signs DoE Design of Experiments D-SVR Distribution-based Support Vector Regression DTA Dynamic Traffic Assignment DynusT Dynamic Urban Systems in Transportation EGO Efficient Global Optimization EGOp Estimated Global Optimum EI Expected Improvement FDSA Finite Differences Stochastic Approximation FIFO First-In, First-Out GA Genetic Algorithm GFV Gap Function Vehicle HOT High-Occupancy Toll HOV High-Occupancy Vehicle xv H-SVR Heteroscedastic Support Vector Regression IAGO Informational Approach to Global Optimization LHS Latin Hypercube Sampling LR/SF Likelihood Ratio/Score Function MAE Maximum Absolute Error MCP Multiple Comparison Procedures MLE Maximum Likelihood Estimation MoEs Measures of Effectiveness MPEC Mathematical Program with Equilibrium Constraints NMAE Normalized Maximum Absolute Error NRMSE Normalized Root Mean Squared Error OO Ordinal Optimization PA Perturbation Analysis PCC Pearson Correlation Coefficient PDF Probability Density Function PI Probability of Improvement RBF Radial Basis Function RBFNN Radial Basis Function Neural Network RMSE Root Mean Square Deviation R&S Ranking and Selection RS Radom Search SA Stochastic Approximation SBO Simulation-based Optimization xvi SKO Sequential Kriging Optimization SNOBFIT Stable Noisy Optimization by Branch and FIT SPSA Simultaneous Perturbation Stochastic Approximation SUE Stochastic User Equilibrium SVM Support Vector Machine SVR Support Vector Regression TTI Texas Transportation Institute UE User Equilibrium VSL Variable Speed Limit xvii List of Mathematical Symbols b Index of bootstrap sample B Bootstrap sample size B Covariance matrix of the prior distribution of β C Coefficient in SVR governing the tradeoff between model complexity and prediction accuracy ( )f  Objective function ˆ ( )f  Estimated objective function ( )F  Sample response function min max,g g Lower and upper bound for constraints ( )G  Sample constraint function i Index of input sample point I Identity matrix j Index of regression basis functions k Dimension of input variables ( )L  Likelihood function ( )L  Negative log-likelihood function m Dimension of the vector of regression basis functions M Number of points in the Pareto set n Sample size ( )p  Probability density function ( )q  Vector of regression basis functions Q Matrix of regression basis functions xviii r Index of replication of simulation R Number of replications of simulations  The real set 2ˆ ( )s  Estimated predictor’s variance 2 ( )s  Sample variance in Bayesian stochastic Kriging w Vector of weighted coefficients of basis function x Vector of input variables min max,x x Lower and upper bound for input variables X Set of input vectors ˆ,y y Simulation response and its estimation y Set of simulation responses ( )Y  Random variable corresponding to simulation response ( )Z  Random field in Bayesian stochastic Kriging , , ,       Dual variables in SVR Gumbel Scale parameter for Gumbel distribution β Weight parameters of regression basis functions 0β Mean of the prior distribution of β  SVR insensitive margin ( )  Random field in Kriging ( )r  Simulation random noise in Bayesian stochastic Kriging ( )  Probability density function ( )P Gaussian   Product of the probability density functions of two Gaussian distribution xix Φ Gram matrix ( )  Basis function  Euler’s constant ˆ,  Mean of the objective function and its MLE estimate Gumbel Location parameter for Gumbel distribution  Scaling coefficients for Kriging 2 2, ˆ  Variance of the random field in Kriging and its MLE estimate Σ Intrinsic covariance matrix in Bayesian stochastic Kriging  SVR prediction error  Prediction loss for traditional SVR ,   Prediction loss for under-prediction and over-prediction in SVR ( )  Kriging basis function ψ Vector of Kriging basis functions Ψ Correlation matrix for Kriging Ψ Correlation matrix for regressing Kriging 1 Chapter 1: Introduction 1.1 Background Transportation systems are core public infrastructure that is closely related to the everyday life of personal people. It does not only directly serve people’s travel demand, but also supports living needs indirectly through the transport of goods. Efficient transportation can benefit the society in various aspects, including stimulating the economy, improving the environment, reducing energy consumption, etc. Keeping the transportation system in the best possible condition with the available resources is the primary goal for transportation system planners and operators. Transportation planning is defined as the process of developing strategies for operating, managing, maintaining, and financing the area’s transportation system in such a way that it will advance the area’s long-term goals (FHWA and FTA, 2007). The most widely used method for transportation planning is the rational planning model (Friedmann, 1987), which is composed of five major modules: defining goals and objectives, identifying problems, generating alternatives, evaluating alternatives, and developing plans. Figure 1 illustrates the typical steps involved in the rational planning process. It can be noticed that the optimal decision is not derived through searching thoroughly over the entire feasible domain. Instead, several alternatives are generated and evaluated first, and the optimal strategy is selected through pairwise comparison of the system performance of those alternatives. In practice, only a finite 2 number of alternatives can be analyzed for this what-if scenario-based analysis, which could be a major disadvantage of the rational planning model. Figure 1-1: Rational Planning Model Flowchart. (Source: Levinson et al., 2013) For decisions on infrastructure construction, the assumption of limited alternatives may be appropriate because this type of transportation planning strategies faces strict constraints in terms of both budget and land use. There are usually very few choices available for consideration. However, for decisions on travel demand management policies and traffic operations strategies, the limited alternative assumption can hardly hold. Unlike the discrete nature of construction decisions (e.g. mode to investment, number of lanes, alignment, etc.), the optimization of travel demand management policies and traffic operations strategies are usually continuous or approximately continuous problems (e.g. pricing for tolled facilities, ramp metering strategy, traffic signal plan, etc.). Thus unlimited alternatives should be considered. The goal of this type of problem becomes seeking improved settings of 3 policy instruments from all possible options instead of selecting the best settings from a fixed set of alternatives. In this situation, scenario-based analysis is no longer a sufficient method for searching for the optimal solution. A routine that retrieves information from evaluated alternatives and intelligently generates new promising alternatives is needed. There is already a large number of optimization methods developed in the operations research field. However, none of those methods can solve all types of problems effectively. A prerequisite of the success in resolving a specific optimization problem is the judicious choice of appropriate methods. The key factors that influence the choice of optimization methods include the structure of the objective functions, constraints, dimension of the problems, etc. Of all these factors, the structure of the objective functions plays a rather crucial role as it determines what information is available for the optimization module. These include whether the problem is a linear problem, whether gradient information is available, how much time it takes to evaluate one decision variables combination and so on. In the following subsection, the major forms of objective functions for transportation planning and operations decision problems will be introduced. 1.2 Objective Evaluation The performance of the transportation network is the result of complex travelers’ behaviors and their interactions with the network infrastructure. To evaluate the transportation system performance under certain planning and operations 4 strategies, the influence of implemented measures on the network characteristics and the resulting impact on travelers’ behavior need to be modeled. At the macro level, traveler’s behavior is usually modeled by mathematical equations with various assumptions (e.g. Wardrops’s first principle, utility maximization, etc.). These mathematical equations have been applied in the widely-used four-step model as well as the more advanced activity-based travel demand models. On the other hand, simulation models are developed for the meso or micro level analysis of travel behavior. By intelligently modelling the true behavior of individual traveler/group of travelers, simulation can introduce more stochasticity into the system. At the same time, simulation has a better capability in describing the dynamics of travel behavior than typical analytical models. Both mathematical equations and simulation models have been widely utilized to evaluate the performance of transportation systems. In the following two subsections, a brief review of transportation planning and operations decision problems whose objectives are evaluated through these two methods will be presented. Moreover, the advantages and disadvantages of the two evaluation methods are also summarized. 1.2.1 Mathematical Methods The analysis of transportation planning and management problems is usually conducted for a relatively large network. Decision makers are interested in the overall performance of the regional network under a certain planning and management strategy. In this case, the behavior of each traveler in response to the policy stimulus should be modeled in the first stage, and the aggregate performance can then be 5 summarized based on the individual behaviors. Various decision making problems of transportation planning and management have been formulated mathematically as bi- level optimization, with the upper-level formulating the objective function while the lower-level modeling individual travelers’ behavior in response to the policy. The main outputs of the lower-level problem are the travel path for each traveler and the associated attributes of each travel path. Usually, travelers are assumed to make their route choices in a user optimal manner, and hence the lower-level problem can be formulated as a user equilibrium (UE) traffic assignment model. To relax the assumption that travelers have complete knowledge of the network conditions, stochastic user equilibrium (SUE) models can also be applied for the lower level problem. As for the objective defined by the upper-level problem, objectives which have been frequently considered include minimizing total travel cost, minimizing average trip travel time, minimizing total vehicle miles traveled, maximizing total revenue, maximizing consumers’ surplus, etc. Optimization for transportation planning and management problems with objectives evaluated through mathematical equations has been extensively studied in the existing literature. LeBlanc and Boyce (1986) investigated the optimal road network design in achieving the lowest summation of total travel cost and network improvement cost. In this paper, the problem is formulated following the bi-level structure, and both upper and lower problems are approximated with piecewise linear functions for the convenience of solution derivation. The optimal congestion pricing under different network conditions in terms of maximizing economic benefit was studies in the 1990’s. Cases with the lower level optimization specified as a 6 deterministic user equilibrium problem (Yang and Huang, 1998) or a stochastic user equilibrium problem (Yang, 1999) were discussed separately. Chan and Lam (2002) proposed a bi-level programming model to determine the best density for speed detector deployment in a network with travel time information provided to travelers. The lower-level problem was a probit-based traffic assignment model, and the upper- level problem searched for the speed detector density that minimized the measured travel time error variance as well as the social cost of the speed detectors. More recent applications of bi-level optimization structure for transportation planning and management include optimizing shelter location for hurricane events (Li et al., 2011), deriving optimal land use development plan to improve network reliability (Yim et al., 2011), finding the optimal combination of exclusive transit lanes on a network basis (Mesbah, 2011), and others. Decision making on traffic operations measures has also been formulated as optimization problems with objective functions expressed by mathematical equations. For urban road traffic control, isolated intersection control strategies with either fixed-time or traffic-response strategies were investigated, respectively (Improta and Cantarella, 1984; Vincent and Young, 1986). Later, both strategies for coordinated control of signal at multiple intersections were studied (Stamatiadis and Gartner, 1996; Kessaci et al., 1990; Gartner et al., 2001). These research efforts generally formulate the problem as a constrained optimization, and model the traffic dynamics with different types of assumptions either at the link level or at a more detailed level in a way such as cell-transmission representation. Regarding freeway traffic control, the most direct measure ramp metering has been studied. The optimization of ramp 7 metering strategies is formulated as mathematical programs with traffic dynamics described by different traffic flow models, among which both fixed-time ramp metering strategies and reactive ramp metering strategies analyzed (Papageorgiou, 1980; Chen et al., 1997; Zhang et al., 2001). In addition, mathematical methods have been applied to evaluate the performance of other types of traffic control strategies, such as dynamic message signs (DMS) (Messmer and Papageorgiou, 1995; Ben- Akiva et al., 1997), variable speed limit (VSL) (Yang et al., 2013), etc. More advanced integrated optimization of combined traffic operations strategies has also been investigated (Kotsialos et al., 2002; Hegyi et al., 2005; Carlson et al., 2010). The mathematical methods are of interest to researchers mainly due to their tractability. In this case, the computation of the objective values would be very convenient, and traditional optimization approaches can be easily applied. However, this method suffers from the limitation that strong assumptions on travel behavior and network dynamics need to be set up in order to build up the analytical form for the objective functions. For instance, one major behavior dimension needs to be considered for evaluating transportation planning policies is route choice. The travelers’ route choice is usually described by UE or SUE models, in which link travel time is assumed to always decrease along with the increase of link flow. Link flow is supposed to be able to increase without limit. Meanwhile, the influence of traffic control measures (e.g. traffic signal, ramp metering, etc.) on route choice cannot be captured, and some real world phenomenon such as queuing and the First- In, First-Out (FIFO) constraint are not taken into account. All these features limit the ability of the mathematical methods to reflect the traffic realism accurately. For the 8 evaluation of traffic operations strategies, the major concern is the traffic dynamics. Common assumptions associated with mathematical methods include constant link speed, ignorance of route choice behavior in response to traffic control, etc. Moreover, it is more appropriate to apply the mathematical methods for analysis in small regions (e.g. a single intersection or a single corridor). In the case of utilizing mathematical methods evaluating the impact of traffic control measures implemented in large-scale networks (i.e. regional network or even larger), further assumptions such as the store-and-forward concept need to be made to reduce the computational complexity. As the analytical models for the impact of transportation planning policies and traffic operations strategies focus on different aspects of the transportation system, it is very difficult to build a single analytical model capable of evaluating the joint impact of planning policies and operations strategies on the transportation system performance, which is another limitation of the mathematical methods. 1.2.2 Simulation Models Simulation models mentioned in this section mainly refer to microscopic or mesoscopic models, which capture more detailed traffic dynamics than macroscopic models. Compared to analytical models, fewer assumptions at the aggregate level which obviously violate the characteristics of real traffic (e.g. unlimited link volume, ignorance of the FIFO constraint, etc.) are imposed in simulation. Thus transportation management agencies are becoming more favorable to simulation models and are prone to use simulation for informing decisions about specific investment or 9 management and operations policies in the transportation system. Computerized simulation models representing the transportation system were developed since the 1950’s in the United States (Wang and Prevedouros, 1996), to address more and more operational needs in the planning and operations decisions. In simulation models, specific rules are assigned to individual or group of travelers in their behavior, and their response to the network infrastructure. The entire transportation system is then formed with lots of the complex interactions among travelers as well as between the travelers and the network infrastructure. Therefore, the evolution of the transportation system is essentially determined by those behavioral rules. This bottom-up design makes simulation models very good representatives of the real world transportation systems, and thus more reliable evaluation tools for agencies in informing decisions. Various simulation models at different analysis levels for the transportation system have been developed, such as CORridor SIMulation (Halati et al., 1997), TransModeler (Balakrishna et al., 2009), Vissim (Fellendorf and Vortisch, 2010), Aimsun (Barceló and Casas, 2005), DynusT (Chiu et al., 2010a), DynaMIT (Ben- Akiva et al., 2002), DYNASMART (Mahmassani, 2002), Dynameq (Tian et al., 2007), etc. In terms of application, simulation models have been widely used in evaluating the effect of transportation planning and operations policies. Examples include studies on congestion pricing (De Palma et al., 2005), high-occupancy toll (HOT) lane pricing (Murray et al., 2001), traffic signal (Mosseri et al., 2004), dynamic route guidance (Gao et al., 2008), ramp metering (Bellemans et al., 2006), etc. 10 In general, simulation models exhibit strong advantages in capturing network and behavior details, which is of great benefit to transportation planning and management agencies. In addition, as fewer aggregate-level strong assumptions on traffic dynamics are imposed in simulation models, the evaluation through simulation is believed to perform better in representing the real traffic than mathematical methods. Moreover, a simulation model is suitable for evaluating the impact of combinations of transportation planning and operations strategies, since different levels of traffic details are taken into account simultaneously. The major disadvantage associated with simulation models is that the evaluation of objective functions through simulation is usually computationally expensive. It may cost hours to days for a single run of the simulation on a large network (e.g. the entire road network for a metropolitan area). This would make optimization based on simulation evaluation rather time consuming. Meanwhile, as multiple random processes are involved in the simulation, there are usually no closed forms for the objective functions. In this case, it would be difficult to retrieve information such as gradient, which further restricts the usage of traditional optimization methods (e.g. Newton’s method) when simulation evaluation is applied. 1.3 Research Objectives and Methodology Along with the increase of vehicle ownership and population, travel demand in the U.S. continues to grow, but the construction of new highway infrastructure has not kept pace with the growth of travel due to the constraints of land availability and 11 budget limit. According to Highway Statistics (1980, 2010), the vehicle miles traveled in the U.S. increased 94 percent between 1980 and 2010, while the miles of highways only increased 5.4 percent during the same period of time. The resulting traffic congestion problem thus becomes the major issue that transportation planners and operators need to take care of, especially in urban areas where population is densely distributed. The Texas Transportation Institute (TTI) estimated in their 2012 Urban Mobility Report (Schrank et al., 2012) that congestion in 498 metropolitan areas caused urban Americans to travel 5.5 billion hours more and to purchase an extra 2.9 billion gallons of fuel for a congestion cost of $121 billion in 2011. Besides adding highway capacity, various transportation planning and operations strategies have been proposed to improve mobility and reduce congestion, including congestion pricing, traffic signal, access management, travel demand management, traveler information, etc. These policies may be implemented separately or jointly in a particular area. The major challenge for a transportation agency is how to select specific policy variables, with the aim of operating the transportation system in the most efficient way. The evaluation of the impact of specific policies on the transportation system performance is thus a crucial step in the decision making process. Realizing the advantages of simulation models in representing the traffic realism, transportation agencies are more prone to choose simulation models as the evaluation tool than mathematical methods. Therefore, the objective of this dissertation is to evaluate transportation planning and operations strategies with simulation, and develop methods to optimize those strategies based on simulation 12 evaluation. Meanwhile, as the objective functions would be very expensive to evaluate when simulation is applied, the developed optimization method should be very efficient and able to find optimal or near-optimal solutions with a reasonable computation budget. In order to achieve the specified objectives, the dissertation starts with a review of existing simulation-based optimization (SBO) methods implemented in both the transportation field and other disciplines. The appropriate method and implementation framework are then developed according to the characteristics of transportation simulation and the specified requirement of limited computation budget. The optimization methods for problems in different forms (i.e. single- objective optimization, multi-objective optimization and constrained optimization) are investigated. To deal with the heterogeneity of error variance and asymmetric distribution observed in transportation simulation outputs, advanced SBO methods including enhanced support vector regression (SVR) and Bayesian Stochastic Kriging (BSK) models are developed. The associating infill strategies for the iterative application of these methods are also discussed. 1.4 Outline of the Dissertation This dissertation is composed of 8 chapters. Existing SBO methods developed in different fields are first reviewed. The appropriate method suitable for implementation in the domain of transportation research is then selected and applied. Based on the observation of specific characteristics associated with transportation 13 simulation, the selected method is then improved to make it fit better the particular problem. The remaining chapters are arranged as follows. Chapter 2 reviews SBO methods developed in existing literature, including both methods discussed theoretically and those applied practically. The features associated with each method are analyzed and compared. Specifically, a review on the applications of SBO in transportation research is conducted. Chapter 3 introduces the framework as well as the technical details of the surrogate-based optimization method, which is selected as the method to be applied in the current research. Different types of surrogate models are discussed and compared using a numerical example. The promising surrogate models identified through the numerical test are then utilized for a real world application of optimizing the dynamic pricing for a toll road. Chapter 4 further extends the capability of the surrogate models in dealing with problems of more interest to decision makers. Constrained optimization and multi-objective optimization problems are formulated, and the appropriate infill strategies are applied accompanying with the regressing Kriging model to solve these two types of problems. Chapter 5 describes the heterogeneity of error variance observed in the simulation outputs from the dynamic road pricing case study, and then develops an enhanced SVR model named heteroscedastic SVR for the simulation-based optimization, taking into account the heteroscedastic simulation noise. Chapter 6 develops another type of surrogate model named Bayesian stochastic Kriging, which also takes into account the heterogeneity in simulation 14 outputs. Meanwhile, the uncertainty in parameter estimation is considered in this model. A different case study with integrated optimization of HOT toll rate and freeway diversion control is created for the application of this model. Chapter 7 further investigates the distribution of simulation noise, and develops an improved surrogate model based on SVR, which is named distribution- based SVR, with prediction error penalized in a way related to the probability density function of simulation noise. In addition, the method of bootstrapping is used for the estimation of predictor’s variance. The expected improvement infill strategy is then incorporated to the surrogate model for global optimization. Chapter 8 concludes the dissertation. Major findings and contributions of the research is summarized. The dissertation ends with the proposition of future research directions as the extension of the current research. 15 Chapter 2: Literature Review Simulation-based optimization is a technique that integrates optimization routines into simulation analysis. A typical SBO problem consists of decision variables serving as parameters in simulation and objective functions evaluated through simulation. Sometimes, simple box constraints to decision variables or constraints that are associating measurements of experimental simulations would be imposed to the problem. The problem can usually be formulated in the following from, min ( ) [ ( , )]k f E F w x x x (2.1) s.t. min max x x x min max[ ( , )]E G w g x g where kx  is the k-dimensional vector of input variables, ( )f x represents the objective function, w is a sample path (simulation replication), and F is the sample response function (simulation). When the simulation model F is a stochastic simulation, a single run of the model F only provides an estimator of the true response. Thus the objective function of the problem is the expectation of the simulation output. The vector of decision variables x is restricted by the lower bound minx and upper bound maxx . The last inequality represents another constraint that is evaluated through the simulation function ( , )G wx , and the lower and upper bound for this constraint are ming and maxg , respectively. 16 There are quite a few excellent review papers on the subject of simulation- based optimization approaches. Fu (1994) reviewed methods for optimizing discrete- even systems via simulation. Both the discrete parameter and the continuous parameter cases were discussed. Techniques for optimization from a finite set including multiple-comparison procedures and ranking-and-selection procedures were introduced for the discrete parameter case, while gradient-based methods were the focus for the continuous parameter case. Andradóttir (1998) presented a review focusing on gradient-based methods for optimization of problems with continuous decision variables and random search methods for problems with discrete decision variables. Tekin and Sabuncuoglu (2004) summarized the latest development in SBO methods, and classified the existing techniques according to problem characteristics such as the shape of the response surface (global as compared to local optimization), objective functions (single or multiple objectives) and parameter spaces (discrete or continuous parameters). More recently, Hachicha et al. (2010) provided a literature survey with all classification criteria and proposed a global classification scheme of SBO methods. In their review paper, SBO problems were classified regarding their input variables (quantitative variables and qualitative variables), output variables (single-objective problem or multi-objective problem), parameter spaces (discrete or continuous parameters), the shape of the response surface (global as compared to local optimization), or by their optimization procedure. In addition to the reviews of theoretical development of SBO methods, a discussion regarding the progress of SBO methods in application was provided in Fu (2002). The optimization modules in five commercial software —AutoStat, 17 OptQuest, OPTIMIZ, SimRunner and WITNESS Optimizer were introduced. Based on the review, the optimization procedures implemented in simulation software were all based on metaheuristics and predominantly evolutionary algorithms. 2.1 Simulation-Based Optimization Methods On the basis of the review of SBO approaches aforementioned, we group the most popular methods into five categories according to the characteristics of the methods in this dissertation. The five categories are methods primarily for discrete optimization problems, gradient-based methods, mathematical programming-based methods, metaheuristics, and surrogate-based methods. The following subsections briefly introduce each type of the SBO methods. 2.1.1 Methods Primarily for Discrete Optimization Problems The SBO methods suitable for discrete optimization problems are reviewed in Swisher et al. (2004). The methods are differentiated based on the size of the feasible solution set. When the feasible solution set is finite and small, ranking and selection (R&S) and multiple comparison procedures (MCP) would be appropriate. On the other hand, if the set is infinite or very large, techniques such as random search (RS) and ordinal optimization (OO) can be applied. Both R&S and MCP methods need to perform exhaustive evaluations of all feasible solutions. This is only practical when the solution set is small. Being 18 different from traditional optimization methods, R&S and MCP focus more on the comparison aspect than searching. There are two important concepts in the R&S method which need user specification: an indifference zone  and a confidence level P . With multiple replications for each parameter setting, the sample mean and variance of the simulation output can be computed. According to the user-defined indifference zone and confidence level, the number of additional replications for each parameter setting is determined. The optimal solution is then chosen as the parameter setting with the smallest average sample mean. This derived optimal solution can be guaranteed to be within  of the true optima at the confidence level P (Law and Kelton, 1991). The MCP method is different from R&S in that it makes conclusions based on the confidence intervals of the difference between any pair of parameter settings instead of directly comparing the sample means. With the assumption that the simulation for different parameter settings is independent and the simulation noise follows the normal distribution, the confidence interval for the difference between any pair of input points can be computed. When all confidence intervals are formed, one would simply look for the parameter setting that the confidence interval for the difference with all other pairs is strictly negative. This parameter setting is the clear winner in the feasible solution set. If no clear winner is found, one can crudely eliminate some candidates, run more replications for the smaller set and then look for the clear winner. This process can be repeated until the conclusive inference can be made (Hsu and Nelson, 1998; Goldsman et al., 1991). 19 For problems with infinite feasible solutions, it is definitely impractical to apply methods that need exhaustive evaluations of all possible solutions. The appropriate method for this type of problem should employ some mechanism to reduce the size of the effective solution set. The random search method is of interest due to the existence of theoretical convergence proofs (Fu, 2002). The main idea of the method is to move iteratively from a current parameter setting to another one in the neighborhood of the current point. The number of visits to each design point would be counted. Within each iteration, a new design point is selected from the neighborhood of the current point according to some pre-specified probability distribution. This new point is then evaluated through simulation and compared to the performance of the current point. The counter for the point with better performance would be increased by one, and the new current point is updated. However, this algorithm may face implementation difficulty of sampling randomly from the neighborhood with the appropriate distribution (Banks et al. 2000). More details on the random search method in simulation can be found in Andradóttir (2005). Ordinal optimization can reduce the search for optima from a very large sample to a much smaller one, by softening the goal of looking for the best to looking for a solution that is good enough. The development of this method is based on the observation that finding ordering among candidate solutions is much easier than carrying out the estimation for each solution individually in most cases. It is found that estimating the difference in performance for two design points using the sample mean is governed by the Monte Carlo convergence rate of 1 / n , while deciding the 20 ordering of two design points based on the sample mean has an exponential convergence rate (Fu et al., 2005). More details on the ordinal optimization method can refer to Ho et al. (1992), Ho and Deng (1994) and Lee et al. (1999). 2.1.2 Gradient-Based Methods Gradient-based methods applied in simulation-based optimization mainly refer to the stochastic approximation (SA) algorithms. Stochastic approximation is essentially the adaption of gradient search method in deterministic optimization for stochastic settings. This method iteratively updates the current solution by moving toward the gradient direction, 1 ( )k k k kf   x x x (2.2) where kx and 1kx are the current solution and the updated solutions, respectively. ( )f k x represents the estimated gradient of the objective function at the current design point, and k is the step length. As SA searches for the updated solution along a line, it is a local optimization method. Under proper conditions, the SA method can guarantee to converge to a local optimum. The appropriate estimation of the gradient is very crucial to the effectiveness of the method. A review of available gradient estimators that can be implemented in simulation is provided in Fu (2006). Popular techniques for gradient estimation include finite difference estimation, perturbation analysis (PA) (Spall, 1992; Bettonvil, 1989; Glasserman, 1991; Fu and Hu, 1994), likelihood ratio/score function (LR/SF) estimators (Glynn, 1987; Glynn, 1989; Rubinstein and Shapiro, 21 1993), and weak derivatives (Heidergott, 2001). Based on the theory of finite difference estimation, three alternative estimators are generated, which are naïve one- sided finite difference estimation, two-sided symmetric difference estimation and simultaneous perturbations. Generally, only single run of simulation is needed for the PA and LR/SF estimators, but the implementation of these two estimators require knowledge about the structure of the simulation and modifications to the simulation source code. On the other hand, application of the finite difference estimation and weak derivatives method require multiple simulation runs, and the gradient estimation is usually noisier than the PA and LR/SF estimators. 2.1.3 Mathematical Programming-Based Methods The main idea of mathematical programming based methods is to evaluate a relatively large set of samples, so as to approximately turn the stochastic problem into a deterministic problem. In this way, the powerful deterministic optimization methods already developed can be directly applied for simulation-based optimization problems (Robinson, 1996). Sample-path optimization is the most popular method falls into this category implemented for SBO problems. It takes many simulations for each of the design point, and conducts optimization based on the sample means, 1 1( ) ( )    nn i i f fnx x (2.3) where  ( )if x are the independent and identically distributed (i.i.d.) unbiased estimates of the true function f (i.e. the output from a single run of simulation), n is the 22 number of replications, and ( )nf x represents the sample mean at the design point x over n replications. According to the strong law of large numbers, ( ) ( )nf fx x with probability 1. The task of sample-path optimization is then to optimize the deterministic function ( )nf x . If the derivatives are available for this approximated deterministic problem, the effectiveness of this approach would be enhanced significantly, as a lot of nonlinear programming packages require them. However, the major disadvantage of this approach is the requirement of large number of replications of simulation evaluation, which are usually very computationally expensive. 2.1.4 Metaheuristics Metaheuristics have been applied successfully in a lot of real world applications. The main strength of metaheuristics is their flexibility. They can be applied in various types of problems (e.g. discrete optimization problem, continuous optimization problem, deterministic optimization problem, stochastic optimization problem, etc.), and all types of constraints can be taken into account. Moreover, researchers are interested in them because they are global optimization methods. Four most popular metaheuristics that have been applied to simulation-based optimization are genetic algorithms, tabu search, simulated annealing and scatter search. Genetic algorithm is an evolutionary algorithm, which is inspired from the process of biological evolution. This method updates a finite set of solutions iteratively. In each iteration, new solutions are generated through crossover and 23 mutation processes. Only those satisfying certain criteria would be accepted, and used for the generation of their offspring in the next iteration. The algorithm would proceed to generate more favorable solutions along iterations. More detailed introduction of the genetic algorithm can be found in Goldberg (1989) and Schmitt (2001). The most notable feature of tabu search is the memory introduced into the optimization framework. This method is based on local search that iteratively moves the current solution to a neighbor solution. Unlike other local search methods, tabu search allows moves to neighbor solutions that have worse objective values than the current solution. In this way, the procedure would not be trapped at a local optimal solution. Meanwhile, a list of forbidden moves would be memorized to avoid circling or infinite loops. The tabu list would be updated along the iterations. Glover and Laguna (1997) provided a very comprehensive reference for tabu search and its applications. Simulated annealing (Eglese, 1990) mimicked the process of annealing in metallurgy, which involves heating and controlled cooling of a material. Similarly to the tabu search setting, simulated annealing also iteratively searches for a new solution in the neighborhood of the current solution. If a better solution is found, it replaces the current solution with probability one. On the other hand, if a worse solution is found, the current solution would be replaced by it with a probability strictly less than one, and the probability of moving to a worse solution should decrease along the iterations. 24 Similarly to genetic algorithm, scatter search is also an evolutionary algorithm that iteratively updates a finite set of solutions. Basically, generalized forms of linear combinations of current solutions would be generated as the optional solutions for the updated set. These new solutions are then mapped into associated feasible solutions. After the mapping, the new feasible solutions would be evaluated based on some criteria including diversity, objective function value, etc. Only those defined as good points according to the criteria are kept in the new solution set for the next iteration. An advantage of the scatter search is that information not contained separately in the original points can be captured through the combination process. A complete reference on scatter search can be found in Laguna and Marti (2002). 2.1.5 Surrogate-Based Methods Surrogate-based optimization approach is a feasible alternative to solve continuous optimization problems with computationally costly objective functions. Surrogate modeling or metamodel-based simulation optimization aims to regress the response surface that characterizes the relationship between the inputs of decision variables and simulation outputs (Hussain et al., 2002; Queipo et al., 2005; Jakobsson et al., 2010). The surrogate simplifies simulation optimization because of its deterministic rather than stochastic relationship between the input and output (Barton and Meckesheimer, 2006). Using only an initial input dataset and corresponding output values of the objective function, the surrogate model can be developed as an approximation of the expensive-to-evaluate objective. In the surrogate-based optimization approach, an unknown function that formulates the relationship between 25 simulation input and output is approximated with a predefined parametric function whose coefficients can be determined via the experiment design. The most fundamental forms of surrogate models are linear and quadratic polynomial regression (Montgomery, 2008). On the basis of exploration and exploitation of the computationally efficient surrogate, the optimal solution can be obtained. However, the expediently implemented low-order polynomials may be heavily biased when applied in complex functions of high nonlinearity. More advanced radial basis functions (RBF) (Björkman and Holmström, 2000; Gutmann, 2001; Regis and Shoemaker, 2005; Zhou et al., 2013) and Kriging models are capable of providing good predictions for the complex response surface. A radial basis function neural network (RBFNN) learns input-output mapping by covering the input space with basis functions that transform a vector from the input space to the output space (Adeli and Karim, 2000; Adeli and Jiang, 2009). A support vector regression (SVR) surrogate model provides a good compromise between prediction accuracy and robustness of other approximations (Smola and Schölkopf, 2004; Wandekokem et al., 2011; Li et al., 2013). Forrester et al. (2006) pointed out that one smooth continuous approximation function (e.g. Gaussian RBF and ordinary Kriging) is unable to fit the discontinuous simulation output due to random noises around the true average response value. The optimization accuracy relies on how accurate the surrogate models are in capturing the performance variations. 26 2.2 Deterministic vs. Stochastic In a stochastic simulation, the output from a single run of simulation is just a realization of a random number. In this case, the output is not likely to be equal to the true objective value. Directly using the stochastic output as the estimator of the true objective value in the optimization routines may result in searching in the wrong direction or premature stopping. The noise introduced by stochastic simulation should be properly controlled during the optimization process. There are mainly three types of methods implemented in stochastic optimization to take into account the simulation noise. The most common way is to conduct multiple simulation runs at each design point, and use the mean of the output from those replications as the estimator of the true objective value. This is an unbiased estimator which would converge to the true objective value with probability 1 if the number of replications is infinite. The advantage of this method is that it is very flexible and deterministic optimization methods can be directly applied without any revisions. However, the performance of this method is determined by the quality of the estimator, which is heavily dependent on the number of replications. For simulations with expensive-to-evaluate objective functions, raising the number of replications would significantly increase the computational burden for the entire optimization process. Instead of requiring the accurate objective value for each design point, the second type of method acknowledges the noise in simulation outputs, and allows the objective value used in the optimization routine to be different from the simulation 27 output. However, this method would only accept deviations within a specific limit. If the deviation is greater than the predefined limit, the part exceeding the limit would be penalized. A typical method in this category is surrogate-based method with SVR as the metamodel. By defining an  - insensitive loss function, the difference of the surrogate prediction and the actual simulation output at any design point that is larger than  will be penalized when constructing the SVR response surface. The error acceptance band  is usually selected to be the standard deviation of simulation noise. Very accurate estimation of the true objective value is not necessary when applying this type of method. Thus only a few replications at each design point are needed, which can already provide rough estimates of the true objective value as well as the variance of simulation noise. In this way, the computational cost is remarkably reduced compared to the first type of method. A problem with this type of method is that the choice of  is rather arbitrary, and the distribution of the stochastic output is intrinsically assumed to be symmetric. The third type of method realizes the stochastic nature of the simulation outputs, and incorporates the consideration of the distribution of the simulation noise into the optimization process. In addition to utilizing the mean of simulation outputs as the estimator of the true objective value, the method makes use of the information retrieved from the replications more comprehensively, by assuming the sample variance of simulation outputs to be the variance of simulation noise. This estimated distribution is then explicitly introduced into the optimization process. A recent surrogate-based optimization method with stochastic Kriging metamodels (Ankenman et al., 2010) falls into this category. The simulation noise is assumed to 28 be normally distributed with zero mean and variance equal to the sample variance. The parameters of the metamodel are then estimated through maximum likelihood estimation. Similar to the second type of methods, the number of replications needed for this method is much smaller than the first one. And from the theoretical point of view, this type of methods should perform better as it utilizes more information from the evaluation of design points than the second type of methods. A major disadvantage of this method is that the simulation noise is usually assumed to be normally distributed due to computational convenience. This assumption may restrict the power of the method in dealing with stochastic optimization problems when the actual simulation noise is not normally distributed. 2.3 SBO Application in Transportation Research As many real world problems are very complex and mathematically intractable, simulation has become an appropriate tool for performance evaluation. Therefore, SBO has been applied in various research fields to help in decision making. Examples of SBO applications have been found in inventory management in supply chains (Schwartz et al., 2006), logistics management (Kochel et al., 2003: Yoo et al., 2010), production planning and scheduling (Feng and Wu, 2006; Kazemi and Zarandi, 2008), wireless sensor network design (Simon et al., 2003), building design (Tresidder et al., 2012; Gengembre et al., 2012), design of aircraft (Simpson et al., 2001), etc. 29 Specifically for the research in the transportation planning and operations field, applications of SBO are not very common yet. In the literature, transportation engineering optimization problems are characterized by computationally expensive objective functions, high dimensional decision variables, and stochastic simulation experiments. In the early stages, due to the limitation of computers’ computing power, metamodels were developed to approximate expensive simulation models in order to reduce computational burden. Studies developing surrogate models and comparing the performance of surrogate models and simulation models were conducted, which include approximating delays caused by a single queue in waterway (Ramanathan and Schonfeld, 1994), approximating delays through series of waterway queues (Dai and Schonfeld, 1998) and approximating delays caused by a queuing network (Zhu et al., 1999). In terms of optimization, SBO is most frequently applied for the calibration of simulation models, while a few other studies have utilized SBO to suggest transportation planning or operations strategies. The dynamic traffic assignment (DTA) model calibration of O-D flows and all other simulation parameters is formulated as a large-scale iterative simulation-based optimization problem, which can be solved with several alternative approaches, such as Bayesian method, Stable Noisy Optimization by Branch and FIT (SNOBFIT), Box-Complex, Simultaneous Perturbation Stochastic Approximation (SPSA), Finite Differences Stochastic Approximation (FDSA) (Vaze et al., 2009; Sundaram et al., 2011; Flötteröd et al;, 2011; Omrani and Kattan, 2012). The performance of SPSA for the calibration of large-scale traffic simulation models has also been demonstrated. 30 For example, Balakrishna et al. (2007a) adapted the systematic traffic simulation model calibration methodology for the simultaneous calibration of all demand and supply models within a microscopic traffic simulation model using aggregate, time- varying traffic measurements; Balakrishna et al. (2007b) presented a systematic offline DTA calibration methodology that estimated all demand-and-supply inputs and parameters simultaneously. Efforts have also been made to support decision making on various transportation planning and operational policies. Problems investigated include selection of a charging cordon in a general traffic network using Genetic Algorithm (GA) (Sumalee, 2004), developing a decision support tool for mitigating traffic congestion (Melouk et al., 2010), determining the traffic light signal timings for a single intersection with stochastic approximation (Fu and Howell, 2003), optimizing regional signal timing strategies using surrogate modeling (Osorio, 2010), optimizing coordinated, area-wide traffic signal control considering drivers re-routing behaviors using a metaheuristic model (Teklu et al., 2007), jointly optimizing traffic control and transit priority settings with GA (Stevanovic et al., 2008), optimizing waterway transportation investment with SPSA (Ting and Schonfeld, 1998), selection of interdependent transportation projects considering cost uncertainty and budget constraint with GA (Tao and Schonfeld, 2005), selection and scheduling of interdependent transportation projects with island models considering cost uncertainty (Tao and Schonfeld, 2007) or not (Tao and Schonfeld, 2006), scheduling of interrelated waterway projects with single budget constraint (Wang and Schonfeld, 2005), constraints on project multiplicity and precedence (Wang and Schonfeld, 2008) 31 or even more complicated constraints on project multiplicity, precedence and regional budget (Wang and Schonfeld, 2012) using GA, scheduling of waterway maintenance projects with constraints on budget and lock condition using GA (Wang et al., 2009), and to develop a heuristic approach for system wide highway project selection based on total benefit maximization under budget uncertainty (Li et al., 2010). Furthermore, to enhance the efficiency of simulation based optimization with heuristic methods, Yang (2010) developed a hybrid approach combining simulation and analytic methods along with parallel computing techniques. A first stage coarse search was conducted based on the analytic model, and in the second stage, a refined search based on simulation model was then performed inside the promising region provided by the first stage. The most widely used SBO approach is heuristic methods (i.e. GA and simulated annealing), which can be conveniently applied to deal with different problems, while they are not very efficient in terms of searching for the optima. In a few other studies, stochastic approximation and surrogate modelling methods are investigated and applied. Besides being directly utilized for solving optimization problems, surrogate- based methods are becoming popular and have been applied for other types of problems in the transportation domain in recent years. This type of methods is gaining attention mainly due to its ability of approximating the global trend of the objective function and power of significantly reducing the number of simulation runs. In particular, Ciuffo et al. (2011) utilized Kriging metamodeling to verify different micro-simulation calibration methods. Sensitivity analysis of traffic simulation 32 models have also been benefited from the application of surrogate models (Retherford and McDonald, 2011; Ciuffo et al., 2013). 2.4 Summary Simulations have shown their advantages in representing the realism and various detailed aspects of the transportation system, and simulation-based optimization has shown its potential in supporting transportation planning and operations decision makings. Although various SBO methods have been developed for decades, the application of SBO in the transportation domain is still very few. According to the “no free lunch” theorems (Wolpert and MacReady, 1997; Ciuffo and Punzo, 2013) in the optimization community, there is no algorithm that outperforms all the others over the entire domain of problems, which means that the choice of the most appropriate algorithm depends on the features of the specific problem. As transportation simulations take account of the interactions between complex travel demand patterns and network supply, it is always time consuming to evaluate the system performance. It can take up to hours to days for a single run of a regional network simulation. Thus, simulation-based optimization methods that require fewer objective function evaluations are desired. Meanwhile, due to the stochastic nature of transportation simulations, method that can deal with the simulation noise with fewer replications at design points is preferred. Although online simulation, as well as real-time traffic control, has appeared along with the 33 development of technology, most of transportation planning and operations decisions are still made offline due to the constraint of computational power. In this case, optimization methods that can take advantage of distributed computing are favored. Moreover, transportation simulation models are usually very complex due to the consideration of the interactions among travelers and between travelers and the network on all dimensions (e.g. departure time, mode, route, destination, etc.). The underlying models for different simulators are usually different. To reduce the potential cost of transferring the developed methods from one simulator to another, methods that do not require specific knowledge about the underlying structure of the simulation is of interest. Therefore, we favor derivative-free methods for the optimization problems based on transportation simulation. Overall, according to all these features associated with transportation simulations, surrogate-based methods are chosen as the focus of this dissertation. In addition of applying existing surrogate-based methods, improvements will be made to better adapt the methods to specific problems and enhance the performance of the optimization procedure. 34 Chapter 3: Surrogate-Based Optimization Procedures and Application 3.1 Framework A framework of surrogate-based optimization procedures using transportation simulations is illustrated in Figure 3-1. Figure 3-1: Transportation Simulation-based Optimization Procedure using the Surrogate Methodology. 35 The first step is to define the optimization problem. Given the objective functions that can be evaluated by simulation outputs, we generate an initial number of toll charges through a design of experiments (DoE). The DoE method used in the current study is Latin Hypercube Sampling (LHS), which would be introduced in the next subsection. The second step is to select a transportation simulator for the specific optimization problem. Run the initial set of design points using the chosen simulator. Based on the simulation outputs, we evaluate the objective functions. Considering the generally large number of DoE cases (that aim to obtain a more accurate surrogate), the analysis could be computationally expensive. Simulation outputs of a transportation network usually involve random noises, so it is better to run each sampling point for several repetitions and estimate the mean value of objective function evaluations, if the computational burden is affordable. The third step is to construct a response surface using surrogate models. In this study, first we adopt one-stage surrogate models, i.e. quadratic polynomial function, Gaussian RBF, ordinary Kriging and SVR. To find new design points based on the initial samples, we then consider two-stage surrogate models by infilling points to the initial set using criteria such as the probability of improvement and the expectation of improvement across the response surface. Due to simulation random errors, we may also incorporate the surrogate models that are capable of dealing with noises. Once all of the cases are analyzed, proper parameters of the surrogate models can be determined. This step is regarded as the most important component in the whole framework. 36 The fourth step is to assess and validate the assumed surrogate models by comparing an additional test set of objective function data with values estimated by the surrogates at points corresponding to the variables at which the independent objective function values are calculated. On the basis of the error observed with the validation dataset, the accuracy of each surrogate model is checked using certain criteria such as correlation coefficient or coefficient of determination to determine whether the initially assumed surrogate model is appropriate. If it is, the best surrogate model will be employed to explore the optimal solutions. If the evidence shows that a certain surrogate model does not achieve the required predictive performance based on the current test dataset, a proper way is to recall the two-stage surrogate models to generate infill points and run transportation simulations for new points until the accuracy criteria are reached. Finally, we find the optimal solution using the optimized surrogate models. Though the estimated response surface may be too complex to explore its global optima using analytical techniques such as gradient descent method, we can still apply a heuristic approach, e.g. GA, to seek the global optima for the estimated response surface. The computational costs of this tuning process can be neglected compared to the burden of transportation simulations. All key components in the framework are highlighted in shadow boxes in Figure 3-1. The details of three of the main components: DoE, surrogate model construction, and infill strategies, will be further explained in the following subsection. 37 3.2 Optimization Procedures This section explains the technical steps that are necessary to apply the surrogate models to simulation-based optimization problems. 3.2.1 Design of Experiments A space filling DoE is useful when only a few runs of simulation can be afforded within the computational budget. In the current study, LHS is employed to generate initial samples to fit surrogate models. Each design variable is stratified into an equal number of intervals according to the LHS setting. Being different from classic designs such as 2k or 3k fractional factorial designs and central composite designs (CCD) in (Montgomery, 2008), where each dimension of the decision variable is split into a relatively large number of equal-size bins, in which subsamples are uniformly generated. The LHS is advantageous in that the mapping of high dimensional design inputs into each dimension is uniformly distributed without overlap. Thus, such a property makes LHS one of the space-filling types of DoE. In this paper, we recall the maximin design defined by Forrester et al. (2008). The ranking criterion function proposed by Morris and Mitchell (1995) is 1/ * 1 arg min ( ) arg min pm p p j j j J d        X XX X (3.1) where ( ) ( ) ( ) ( ) T ( ) ( )|| || ( ) ( )i j i j i jd     x x x x x x , values of m distances are sorted in an ascending order, i.e. 1 2 ,..., md d d   . Let 1 2, ,..., mJ J J be the number of 1 2, ,..., md d d , respectively. To illustrate the concept of LHS, we generate 100 different 38 2-dimensional plans. Each one contains 20 points. Then we calculate the p values of 100 different plans. Figure 3-2 shows the evolutionary process of the sampling plan space filling values, two of which are zoomed in and compared. The LHS plan with a lower value of p distributes more uniformly in the feasible domain. Figure 3-2: An Illustration of 100 LHS DoEs Based on the Spacing Filling Criterion ( )p X . 3.2.2 Surrogate Model Construction Surrogate models serve as the approximation of the true response surface of expensive-to-evaluate objective functions. As the closed form of the surrogate models is available, great computational cost can be saved when searching for the optimal 39 solution. Common forms of surrogate models in the literature are introduced in this section.  Quadratic Polynomial Function Using the DoE generation approach introduced above, we have an initial set of sampling plan (1) (2) ( ) T[ , , , ]nX x x x and responses (1) (2) ( ) T[ , , , ]ny y yy  . The most commonly used surrogate model is the quadratic polynomial function given by 2 0 1 1 ˆ( ) , k k k i i ij i j ii i i i j j i f x x x x             x x  (3.2) where T1 2[ , ,..., ]kx x xx is a k –dimensional point to be predicted, ˆ ( )f x is the estimate of the real objective function ( )f x , 0 is the intercept, i s are the linear coefficients, ij s are the coefficients of interaction terms, ii s are the quadratic coefficients.  Radial Basis Function (RBF) Compared with lower order polynomial function, RBF surrogate models can obtain better approximations to true objective functions of high nonlinearity. RBF uses the basis function ( )r that only depends on the radial distance r between x and each sample point ( )ix . It assumes that the correlation of arbitrary two sample points depends only on the distance (e.g. Euclid distance) in the decision variable space. We seek a RBF approximation to fˆ in the form T ( ) 1 ˆ ( ) (|| ||) n i i i f w    φx w x x (3.3) where w is the vector of weighted coefficients of RBF vector φ , and ( )|| ||ix x is the Euclidean norm. 40 A Gaussian basis function is used in this paper, i.e.  ( ) ( ) 2(|| ||) exp || || , 0i ic c     x x x x (3.4) where c is the shape parameter that can be determined by tuning the minimization of a cross-validation (CV) error estimate in the optimization step. It is worthy to note that c can be various if we define different radial basis functions (RBF network), while normalization of input variables is not necessary for the Gaussian basis function because there are weight parameters for each function, so a universe c will be used for the Gaussian radial basis function in this paper. The prediction at a new point is given by 1 Tˆ( ) ( )f x Φ y φ (3.5) where Φ denotes the so-called Gram matrix, each element of which is defined as ( ) ( ) , (|| ||)j ii j  Φ x x . The prediction error at any x in the design space is given by (Gibbs, 1997) 2 T 1ˆ ( ) 1s  x φ Φ φ (3.6)  Kriging Method The Kriging method predicts a response by summarizing a linear model and a high frequency variation component that represents fluctuations around the trend. In this study, we will consider the ordinary Kriging model ( ) ( ), E[ ] 0f     x x (3.7) where  is the mean of the objective function, and  is the estimation error with a covariance of ( ) ( ) 2 ( ) ( )Cov[ ( ), ( )] ( , )i j i j   x x x x , 2 is the variance, ( ) ( )( , )i j x x is the Kriging basis function with the following correlation form 41 ( ) ( ) ( ) ( ) 2 1 ( , ) exp ( ) ki j i j l l l l x x        x x (3.8) where T1[ , , ]k θ  is a vector of scaling coefficients that allow different widths of the basis function for each dimension of the k -dimensional x decision variable. The element of correlation matrix based on all the observed data is ( ) ( ), ( )i ji j  Ψ x x . Suppose all observed data are jointly Gaussian distributed, the likelihood function can be formulated as T 1 2 2 /2 1/2 2 1 ( ) ( )( | , ) exp(2 ) | | 2nL            y 1 Ψ y 1y Ψ (3.9) where Ψ is the determinant of Ψ , and y is a 1n vector representing the observation of the n sample points. 1 is a 1n unit column. By taking the derivatives of the log-likelihood function with respect to  and 2 , and set them to be zero, the maximum likelihood estimates (MLEs) of  and 2 are then T 1 T 1ˆ   1 Ψ y1 Ψ 1 (3.10) T 1 2 ˆ ˆ( ) ( )ˆ n     y 1 Ψ y 1 (3.11) Substituting ˆ and 2ˆ into Equation 3.9, the vector of scaling coefficients θ can be tuned through maximizing 2ˆ ˆ( | , )L  y . To make predictions at a new point *x using the Kriging model, suppose *( )f x is the supposed function value at *x . We can add this new point to the observed dataset, and calculate the maximum likelihood estimation (MLE) of the mean value and variance at the new point ܠ∗ based on Gaussian Process (Rasmussen and Williams, 2006). 42 T 1ˆ ˆ ˆ( ) ( )f    x ψ Ψ y 1 (3.12) T 1 2 2 T 1 T 1 1ˆ ˆ( ) 1s           1 Ψ ψx ψ Ψ ψ 1 Ψ 1 (3.13) where ψ is the vector of the correlation between the new point *x and all the sample points, expressed as (1) ( ) T[ ( , ), ( , )]n  x x x xψ  . The aforementioned ordinary Kriging model is noise free. So the predictions at sampled points are exactly the same as observations, which may be biased when simulation noise is taken into account. As a consequence, the surrogate response surface may perform over fitting features because the estimated response surface needs to pass all sampled points in the ordinary Kriging model. A model which has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data. To get rid of this problem, a regularization constant is added into the correlation matrix to filter noise. The regressing Kriging model can be used by adding the error estimation of the observed data to the diagonal of the correlation matrix, so that the new matrix is  Ψ Ψ I , where  is the regulation constant, I is identity matrix.  can also be determined through MLE. Using the regressing Kriging model, the predicted mean value and estimation variance a the new point *x are given by * T 1ˆ ˆ ˆ( ) ( )r rf    x Ψ y 1ψ (3.14) T 1 2 * 2 T 1 T 1 1ˆ ˆ( ) 1r rs             1 Ψx Ψ 1 Ψ 1   ψψ ψ (3.15) And the MLEs of the mean and variance of the Gaussian Process are given by 43 T 1 T 1ˆr   1 Ψ y1 Ψ 1   (3.16) T 1 2 ˆ ˆ( ) ( )ˆ r rr n     y 1 Ψ y 1 (3.17)  Support Vector Regression (SVR) SVR is one of the most important applications of support vector machine (SVM). An overview of the basic ideas underlying SVR for regression and function estimation has been given in (Smola and Schölkopf, 2004). The key attribute of SVR is that it specifies and calculates the so-called  -margin within which the sample data errors are accepted without impacts on the surrogate prediction. The prediction is determined entirely by the support vectors that lie on or outside the  -margin (Forrester et al., 2008). In the  -SVR, the goal is to find a surrogate that has the least  deviation from the observations for all the training dataset, and at the same time is as flat as possible. SVR is a powerful tool for prediction given large, high- dimensional datasets. The parameter tuning time is longer than other surrogate methods due to the presence of the quadratic programming problem during the computation process. However, the additional computation time is marginal compared to the time spent on simulation. SVR is selected as an alternative surrogate- based optimization method in this study and compared with other aforementioned surrogate models. The technical details of the SVR method would be introduced later in Chapter 4. 44 3.2.3 Infill Strategies To enhance the accuracy of surrogate models based on initial samples, further objective function evaluations based on certain infill or update strategies are required. This section incorporates the suboptimal exploration strategy that induces local optimization and the global exploration strategy that is promising to locate the global optimum.  Suboptimal infill strategy The local optima search strategy can be achieved by exploration over the surrogate surface estimated using the aforementioned surrogate models. In this study, we use GA to explore ˆ ( )f x and seek its global optima. The update point is given by min max update ˆarg min ( )f  x x xx x (3.18) Then the simulation output at updatex will be evaluated by an extra simulation run, i.e. updatey . The infill strategy will be terminated when the maximal number of simulation runs is reached or the Euclidian norm of two adjacent update points is smaller than a predefined tolerance.  Global optimal infill strategy Global optimization can be classified into deterministic and stochastic methods. The former one generates a deterministic sequence of points converging to a globally optimal solution. Transportation simulation-based optimization problem may not belong to deterministic category because various sources of uncertainties lead to stochastic simulation outputs, e.g. random seeds in trip generation, probabilistic route choice behaviors of travelers, and DTA. The latter one randomly generates feasible 45 updating points to infill the initial samples using a number of heuristic algorithms for the optimal parameter tuning (Kim and Adeli, 2001; Baraldi et al., 2011; Chabuk et al., 2012; Sarma and Adeli, 2001; Song et al., 2013). To obtain the global optima for expensive-to-evaluate functions, a series of two-stage procedures can be incorporated (the first stage includes DoE and surrogate model construction). The second stage conducts the exploitation process to locate promising regions. The global optimization has been investigated in quite a few existing studies. Jones et al. (1998) proposed the Efficient Global Optimization (EGO) based on Kriging basis functions, and applied the expected improvement of the surrogate to select new points. To handle noisy objective functions, Huang et al. (2006) provided the Sequential Kriging Optimization (SKO) as an extension of the EGO algorithm. Villemonteix et al. (2009) proposed the Informational Approach to Global Optimization (IAGO) that selects the infill point based on the entropy minimization. Jakobsson et al. (2010) proposed an RBF-based surrogate model for global optimization of expensive and noisy black box functions, whereas updating infill points minimize the total model uncertainty weighting. More detailed discussions on the exploration and exploitation process can be found in (Jones, 2001; Forrester and Keane, 2009; Kleijnen, 2009). The two most common methods use estimated standard deviation information to select an infill sample with the maximum probability of improvement (PI) or expected improvement (EI), which are given by min 2 2 ˆ1 ( ( ))PI( ) exp dˆ2 ( )ˆ2 ( ) y u f uss        xx xx (3.19) 46 min 2 min 2 ˆ1 ( ( ))EI( ) ( ) exp dˆ2 ( )ˆ2 ( ) y u fy u uss         xx xx (3.20) where PI( )x and EI( )x are the PI and EI estimations at the point x , miny denotes the smallest value of all outputs in the training dataset. 3.2.4 Summary Table 3-1 shows the twelve models we investigate in the current study. Method 1 (M1) is the quadratic polynomial function based surrogate, which only recalls the 2nd-order polynomial functions with interaction terms. Methods 2, 3 and 4 are both one-stage models that estimate the response surface only using the initial samples. Methods 5 through 8 are suboptimal two-stage approaches. M9 and M10 enhance the Kriging method with the global optimal infill strategy using PI and EI maximization, respectively. To deal with noisy data, M11 is the one stage regressing Kriging model. The main difference between Kriging with and without noisy errors is that the estimated response surface would pass through the known points in M3, while M11 allows some bias to the known points to obtain a much smoother response surface. Finally, M12 is the EI infill re-interpolating Kriging method. The 12 methods will be tested and compared using a small transportation network with additive toll links in the following subsection. The selected methods from the numerical test are then applied to a real transportation network in the State of Maryland. 47 Abbr. Methods One-stage surrogate Infill surrogate Global optimum Simulation noise M1 Quadratic polynomial √ × × × M2 Gaussian RBF √ × × × M3 Kriging √ × × × M4 SVR √ × × × M5 Suboptimal updating quadratic polynomial × √ × × M6 Suboptimal updating Gaussian RBF × √ × × M7 Suboptimal updating Kriging × √ × × M8 Suboptimal updating SVR × √ × × M9 Probability of improvement infill Kriging × √ √ √ M10 Expected improvement infill Kriging × √ √ √ M11 Regressing Kriging √ × × √ M12 Expected improvement infill Re-interpolating Kriging × √ √ √ Table 3-1: Characteristics of Surrogate Models. 3.3 Numerical Test This section tests the surrogate models in Table 3-1 using a second-best social optima additive highway pricing with a fixed demand for a small network. The user equilibrium (UE) assignment is chosen because the true objective function can be exactly known through an analytical derivation, so we can validate the estimated response surfaces with the true response surface. Though UE and simulation are quite different in objective function evaluations, e.g. mean travel time of all travelers, the input-output mapping can be estimated and validated through surrogate models. The test based on the small network with UE assignment can provide insights on the features of different surrogate models and help to identify the most capable method that can be used to model the input-output mapping in a larger network. The link-based pricing scheme is investigated as the second-best toll charging in a small road network, where tolls are charged only on a subset of selected links, 48 which can be categorized as a Mathematical Program with Equilibrium Constraints (MPEC) (Yang and Huang, 2005). The second-best road pricing problem in this paper is to choose a set of optimal toll charges to minimize the total travel time (or the average travel time due to fixed demand). The bi-level mathematical program with equilibrium constraints can be formulated. The upper level model is * * *min ( , ) ( )a a a a A F t q q  z z q (3.21a) * 0s.t. arg min ( , )d aq a a A c q q   qq z (3.21b) min max z z z (3.21c) , rs rs rs a p ap r R s S p P q f a A       (3.21d) , rs rs rs k p kp r R s S p P q f k K       (3.21e) , , rs rs p rs p P f d r R s S     (3.21f) 0, , ,rsp rsf r R s S p P    (3.21g) where F is the total travel time function, T1[ ,..., ]kz zz is the link toll vector, satisfying k K , K A is a subset of tolled links, A is the whole link set, * * T[ , , ]aqq   is the equilibrium link flow vector, *aq is the equilibrium flow of link a , satisfying a A , at is the average travel time of link a , constraints (3.21b) through (3.21g) are the conservation conditions, rspf is the path flow of OD pair ( , )r s , rsap is the 0-1 indicator, R and S are origin and destination sets. 49 The well-known Frank–Wolfe method can be used to solve the lower level programming problem of the traffic equilibrium model (Ramadurai and Ukkusuri, 2011; Szeto et al., 2011; Aziz and Ukkusuri, 2012; Unnikrishnan and Lin, 2012). The solution of the bi-level programming problem can be obtained by using the gap function approach solved by the augmented Lagrangian algorithm (Yang and Huang, 2004; Meng and Wang, 2008). Travelers are assumed to be homogeneous with identical values of time. From the perception of link-based cost, the generalized cost function ac of link a can be expressed as follows ( )( , ) ( ) , k k k a a a a z t q a k Kc q t q a A a K         z (3.22) where kz and kt are the toll charge and average travel time of link k ,  is the value of time. Figure 3-3: Numerical Network. (link 1 and 2 are the tolled links) Consider a network depicted in Figure 3-3, consisting of 6 nodes and 8 links. Link 1 and 2 are road segments subject to toll charge. The BPR link performance function is applied 0( ) 1.0 ,aa a a a qt q t a AC            (3.23) 50 where 0at is the free-flow travel time, aC is the link capacity, parameters are 0.15  and 4  , see Table 3-2. Link 1 2 3 4 5 6 7 8 0 at 20 20 20 20 6 1 1 6 aC 800 800 600 600 500 800 800 500 * aq 686 686 314 314 314 0 0 314 *( )a at q 21.6 21.6 20.2 20.2 6.1 1 1 6.1 Table 3-2: Input Data and Equilibrium Flow for a Small Road Network. Figure 3-4: The True Response Function of *( , )F z q . There is only one O–D pair from node 1 to node 3 with a demand of 1000d  (flow units), and the value of time is 1  . Four paths from node 1 to node 3 using links are: 1–2, 1–6–4–8, 5–3–4–8, 5–3–7–2. One of the optimal toll charges is * T[5.555, 4.045]z . The minimized average travel time is min 46.22F  . Optimal 0 2 4 6 8 10 0 5 1046 48 50 52 54 Link 1 toll z1Link 2 toll z2 Av era ge tra vel tim e 51 path flows are * *1 3 686f f  and * *2 4 0f f  . Figure 3-4 shows the true responses of the upper level objective function *ˆ ( , )F z q corresponding to 1z and 2z . It is an interpolation surface based on a uniform 20 20 grid. The simulation random errors are because the UE link flows may be not integers. To compare the surrogate-based optimization approaches shown in Table 3-1, we generate 10, 20, 30 and 40 initial LHS samples for the five one-stage surrogate models M1, M2, M3, M4 and M11, respectively. Then we generate other initial 8, 15, 25 and 35 LHS points for the two-stage models M5, M6, M7, M8, M9, M10 and M12, then add 2, 5, 5 and 5 infill points to the initial samples, respectively. The surrogate models are then validated by a uniform 20 20 grid sample points. The overall performance of the surrogate models is evaluated using 6 accuracy measures: (1) Root Mean Square Error (RMSE), which provides a global error measure over the entire design domain ( ) ( ) 2 1 1 ˆ( ( ) (RMS ))E n i i i f fn    x x (3.24) (2) Maximum Absolute Error (MAE), which is indicative of local deviations ( ) ( ) 1 ˆMAE max | ( ) ( ) |i i i n f f    x x (3.25) (3) Normalized Root Mean Squared Error (NRMSE) ( ) ( ) 2 1 ( ) 2 1 ˆ[ ( ) ( )] NRMSE ( ( )) n i i i n i i f f f       x x x (3.26) (4) Normalized Maximum Absolute Error (NMAE) 52 ( ) ( ) ( )1 1( ) 2 1 ˆmax | ( ) ( ) | 1 ˆNMAX , ( )1 ( ( ) ) i i n ii n n ii i f f f fnf fn          x x x x (3.27) (5) Estimated Global Optimum (EGOp) *ˆ ˆEGOp ( )f x (3.28) where * * ˆˆarg min ( ), arg min ( )f f x x x x . (6) Pearson correlation coefficient (PCC) 2 2 2 2 2 2 ˆ ˆ ˆ ˆ[ ( ) ][ ( ) ] N ff f fr N f f N f f               (3.29) where N is the number of independent set of objective function data to be compared, f denotes objective function values from the independent test set and fˆ are the corresponding surrogate model estimations. If 2 1r  , the surrogate is exactly predicting the test data, while 2 0r  indicates no correlation between the surrogate and the objective function. 53 Figure 3-5: Validation of Surrogate Models. Figure 3-5 shows the estimation errors by comparing the prediction values with true objective function for the uniform 20 20 grid. The Pearson correlation PCC, RMSE, MAE values are plotted for each surrogate model and different sample size. For all surrogate models except M2 and M6, the estimation accuracy would be higher when the sample size increases. Table 3-3 shows the results of the 12 models in terms of five measures of effectiveness (MoEs) under the largest test sample size, i.e. 40 evaluations of the objective function. The minimum values of each column are highlighted in bold. Results show that the best model with the smallest errors of RMSE, MAX, NRMSE and NMAX is M11, and the second best model is M12. The smallest 0 0.5 1 PC C 0 0.5 1 1.5 RM SE M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M120 5 10 MA E Surrogate models 10 evaluations 20 evaluations 30 evaluations 40 evaluations 54 * *ˆ ˆEGOp ( , )F z q is 44.33 obtained by M6; however, its true response is 48.68 that is larger than the estimated minimum. From other MoEs of M6, we can see its overall prediction accuracy is poor. In the last column, seven models finally converge to the global optima. Method RMSE MAX NRMSE NMAX EGOp *ˆ( )f x M1 0.21 0.95 0.44% 0.68 46.15 46.22 M2 0.76 5.62 1.57% 4.03 46.20 46.24 M3 0.24 1.44 0.50% 1.03 46.12 46.22 M4 0.94 3.67 1.95% 2.63 46.45 46.25 M5 0.19 0.71 0.40% 0.51 46.22 46.23 M6 1.30 7.21 2.69% 5.17 44.33 48.68 M7 0.24 1.20 0.50% 0.86 46.22 46.22 M8 1.13 3.76 2.34% 2.69 47.20 46.23 M9 0.36 1.41 0.75% 1.01 46.22 46.22 M10 0.23 1.09 0.48% 0.78 46.22 46.22 M11 0.10 0.45 0.21% 0.32 46.09 46.22 M12 0.16 0.57 0.33% 0.41 46.21 46.22 Table 3-3: The Estimation Accuracy Comparison of Surrogate Models, under 40 Times of Objective Function Evaluations. The novelty in this work is the computational time savings. To demonstrate how surrogate models can intelligently mimic simulation-based objective function evaluation and reduce computational times, we compare the convergence and efficiency of surrogate models with GA (the population size is 10, generations are 10, other parameters are default value given by MATLAB R2010a GA Toolbox). Figure 3-6 quantifies the computational savings obtained from this method using the number 55 of objective function evaluations. We can see that the best surrogate model (M12) can find the global optima only using 10 evaluations. 0 10 20 30 40 50 60 70 80 90 10046 47 48 49 50 51 Numer of objective function evaluations Av era ge tra vel tim e Best: 47.3164 Mean: 47.9666 GA best fitness GA mean fitness Best surrogate: M12 Worst surrogate: M8 Figure 3-6: An Illustration of the Higher Computational Efficiency of the Surrogate Models compared to GA. In summary, the three best one-state surrogate models are M1, M3 and M11 (M1 performs better mainly because the true response surface is not very complex as shown in Figure 3-4), and the best two-stage surrogate models is M12. In the following of this chapter, we will apply these four models in a real world transportation network. 3.4 Test Bed for Real World Application 56 In terms of application, this chapter would investigate the optimal dynamic pricing of toll facilities in a regional transportation network using the selected surrogate models in the previous section. In order to apply SBO for real world transportation planning and operations decisions, a road network as well as a well- developed simulator is required. This section will introduce these two components of our test bed for the real world application. 3.4.1 The ICC Road Network in Maryland The Inter-County Connector (ICC) is probably the most significant and high- profile highway project in Maryland since the completion of the existing interstate freeway system several decades ago. It links existing and proposed development areas between the I-270/I-370 and I-95/US-1 corridors within central and eastern Montgomery County and northwestern Prince George's County. The simulation model covers the central and eastern Montgomery County and the northwestern Prince George’s County of the State of Maryland. Before the construction of ICC, there was no freeway connecting the areas lying northwest and northeast of the capital beltway. The traffic between these two areas usually travels through I-495, which contributes to the severe congestion on I-495 during peak hours. The ICC was constructed aiming at promoting development of the surrounding areas as well as alleviating congestion on I-495. The ICC is a toll facility with different toll rates for its five segments, and the toll rate for each segment is variable along time. Vehicles with E-ZPass, an electronic toll transponder, are charged directly when they 57 travel through ICC. If a vehicle without E-ZPass uses ICC, a $1 video processing fee is added to the total price and a bill sent to the vehicle registration address. Figure 3-7: The Inter-County Connector (in thick line) and Regional Network. (Source: http://www.mdta.maryland.gov/ICC/Toll_Rates.html) To test the effectiveness of applying a simulation-based optimization method to improve the transportation system performance, a case study on optimizing the toll scheme of the ICC in Maryland has been conducted. A simulation model for the regional network is developed to evaluate the system performance. All freeways and arterial roads within the region in Figure 3-7 are included in the transportation network, which is relatively large with 201 TAZs, 1077 nodes and 2158 links. In our case study, actuated signal timings were coded into DynusT for all intersections that have signals applied in real world in the network. In a previous research on the before-and-after study of the ICC road (Zhang et al., 2013) for the same regional 58 network, the dynamic travel demand has already been calibrated and validated with field data from detectors and floating cars. The simplified current pricing scheme for two-axle vehicles with E-ZPass during different time periods as well as the proposed limit of the toll rates is summarized in Table 3-4. Parameter (unit) Symbol Baseline Lower bound Upper bound Toll charge in peak period, Segment I ($) Between I-370 and MD 97 1z 1.45 0 3 Toll charge in peak period, Segment II ($) Between MD 97 and MD 182 2z 0.60 0 1.5 Toll charge in peak period, Segment III ($) Between MD 182 and MD 650 3z 0.75 0 1.5 Toll charge in peak period, Segment IV ($) Between MD 650 and US 29 4z 0.65 0 1.5 Toll charge in peak period, Segment V ($) Between US 29 and I-95 5z 0.70 0 1.5 Off-peak / Peak toll charge ratio  80% 0 100% Table 3-4: Selected Design Parameters and Baseline Value. 3.4.2 Open Source DTA Simulator: DynusT DTA models fill the gap between static travel forecasting models and microscopic traffic simulation models, and enable modeling traffic dynamics at a relatively large scale within a reasonable amount of time. In the DTA framework, UE condition is only applied to travelers departing at the same time between the same OD pairs. Time-dependent shortest paths for travelers are computed based on time- varying link travel times when they arrive at the various links along a route. 59 DynusT (Dynamic Urban Systems in Transportation) is a simulation-based DTA model, which adopts the dynamic interactions between the network supply and user demand. DynusT performs well regarding its computational efficiency. However, it is essentially a route choice model. Some important aspects of travel demand analysis such as trip generation, mode choice and departure time choice are not enabled in DynusT. Time-varying link travel time needed for DTA in DynusT is retrieved from the Anisotropic Mesoscopic Simulation (AMS) model (Chiu, et al., 2010b), which is a vehicle-based mesoscopic traffic simulation approach that explicitly considers the anisotropic property of traffic flow in the vehicle state update at each simulation step. DynusT applies a gap function vehicle-based (GFV) solution algorithm to solve the DTA problem (Chiu and Bustillos, 2009). For each iteration and each OD-departure time combination, the number of vehicles to be updated with a new path is dependent on the relative gap function value, and vehicles with longer travel time are prioritized for path updating. Compared with the widely used successive average method, GFV can avoid over adjustments of flow and thus lead to more consistent and robust assignment results. Meanwhile, DynusT adopts a method of isochronal vehicle assignment which divides analysis periods into epochs and sequentially performs vehicle assignment in each epoch (Nava and Chiu, 2012). This significantly improves the model scalability regardless of the total analysis period. In the newly released 2012 version, DynusT has been fully parallelized in simulation, time-dependent shortest path and assignment algorithms, and therefore boosts the computational speed dramatically. However, the current simulator doesn’t address capacity drop due to congestion. 60 Although other models of both microscopic and mesoscopic traffic simulation are widely available, (e.g. DynaMIT, DYNASMART, and Dynameq for mesoscopic models; TransModeler, VISSIM and AIMSUN for microscopic models) and some of them may possess some desirable features, DynusT is selected in this study due to its advantage in computation time. 3.5 Application: Optimal Dynamic Pricing for Toll Roads For the real world application, we focus on the decision making of a public highway operator (such as the government or public agency) on the dynamic pricing a toll road. We assume that most commuting demands do not change departure time or cancel trips due to toll charges. Therefore, fixed commuting demands are assumed for the regional road network for this study. DynusT is selected as the DTA and mesoscopic vehicle simulation tool to evaluate network performance given various link-additive highway pricing rates. The computation time needed to obtain a solution from this black-box function evaluated by DynusT can be considerably reduced by surrogate-based optimization models, which make noisy data processing and computation intensive global optimization feasible. 3.5.1 Background The idea of charging the use of road originates from the economic theory that users should be aware of the costs they impose upon one another and pay for the 61 additional congestion they create. The well-known marginal cost pricing principle was first discussed in Beckmann (1967). Yang and Huang (1998) then further examined this classical principle in connection with the general deterministic network equilibrium problem. They found that the optimal tolls for the variable-demand network equilibrium problems took the same form as the marginal cost toll in terms of maximizing net economic benefits. However, the optimal toll would be different in the presence of queues due to limited road capacity. Optimal toll charge under the stochastic network equilibrium was investigated later in Yang (1999). For the problem of charging toll on only a subset of links in the network, the assumptions of fixed demand (Yang and Lam, 1996) and elastic demand (Yang and Bell, 1997) have been studied. More complex optimization problems with equity constraints and multiple user classes were also investigated (Yang and Zhang, 2002; Yang and Huang, 2004). Although the optimization of the highway toll has been studied extensively in previous literatures, the network model used for traffic routing was always macro- level static user equilibrium models. This type of model suffers from several limitations, e.g. the dynamics of travel behavior along time cannot be captured, and the influence of micro or meso level operational improvements such as traffic signals and dynamic message signs can’t be considered. On the other hand, dynamic network supply models can overcome these shortcomings, and provide more detailed information regarding the system performance. All these features make dynamic network supply models a relatively better approach to evaluate the performance of transportation system under different pricing strategies. Yet the drawbacks with 62 utilizing dynamic network supply model in optimizing road tolls should also be noticed. The DTA is usually coupled with simulation, which makes it impossible to retrieve a closed form of the function mapping toll strategies to the output performance of network. Meanwhile, the simulation of a medium or large scale network may take hours to days to converge. The computational burden incurred by simulations would be a big challenge for the optimization. Moreover, the random noise in simulation as well as DTA outputs cannot be neglected during the process of optimization. Formulating the problem of toll road pricing optimization in the simulation- based optimization framework is advantageous over mathematical formulation in that it relaxes various unreasonable assumptions in mathematical functions and can better represent the traffic realism as well as the travelers’ responses to policy stimuli. On the other hand, surrogate-based method clearly serves as a feasible and promising way to tackle all the challenges associated with simulation-based optimization problems. In the following sections, the application of surrogate-based optimization of toll road pricing with different objective functions would be presented. 3.5.2 Problem Setting and Formulation As introduced in section 3.4.1, this real world application problem is formulated with 6 decision variables, which are the independent peak period toll rate for the five segments of the ICC and the uniform off-peak to peak toll charge ratio. To apply the simulation-based optimization, the first step is to generate an initial sample plan. Utilizing the LHS method, we obtain the optimized sample plan, i.e. X , 63 including 64 initial LHS points plus three chosen inputs: the minimal toll plan min x x 0 , the maximal toll plan maxx x and the baseline inputs baselinex x . The initial sample points are listed in Table 3-5. Sample 1z 2z 3z 4z 5z  Sample 1z 2z 3z 4z 5z  $ $ $ $ $ % $ $ $ $ $ % 1 1.24 1.29 0.07 1.31 0.95 0% 33 2.24 0.36 0.86 0.93 0.24 52% 2 2.57 0.76 1.10 0.83 0.81 92% 34 0.95 1.40 0.81 0.14 0.71 56% 3 0.90 0.98 0.95 1.43 0.10 71% 35 1.71 0.57 0.10 0.12 0.29 86% 4 2.52 0.38 1.29 1.05 1.26 70% 36 0.38 1.45 1.02 0.43 1.21 75% 5 2.67 0.95 1.48 0.55 1.43 60% 37 1.14 1.21 0.93 0.74 1.00 46% 6 1.81 1.24 0.24 0.71 0.93 33% 38 0.19 0.17 0.05 1.12 0.98 38% 7 0.43 1.07 0.43 0.76 0.31 29% 39 0.57 0.64 1.50 1.07 0.88 41% 8 1.48 0.00 1.26 0.05 0.90 35% 40 1.52 1.00 0.29 1.14 0.40 21% 9 1.86 0.40 1.31 0.38 0.43 59% 41 2.33 0.12 1.43 0.33 1.38 89% 10 1.76 0.86 0.83 0.50 0.69 30% 42 0.10 0.88 0.17 0.95 1.14 17% 11 2.48 1.17 0.57 0.36 0.07 63% 43 1.38 0.48 0.71 0.21 1.29 90% 12 0.86 0.24 0.19 0.48 0.05 32% 44 2.10 0.93 0.40 0.90 0.74 95% 13 0.52 0.55 0.88 0.45 1.24 54% 45 0.24 0.05 1.33 1.10 0.17 65% 14 0.62 1.38 1.38 0.79 1.12 16% 46 1.33 0.60 0.74 1.45 0.67 62% 15 0.71 1.48 1.21 0.67 0.50 67% 47 2.38 0.14 1.45 1.19 0.45 81% 16 3.00 1.10 0.36 0.31 0.62 24% 48 1.57 0.90 0.76 1.33 1.45 19% 17 0.05 0.02 0.14 0.02 0.38 10% 49 0.00 0.50 0.67 1.40 0.26 57% 18 2.29 0.07 1.14 1.24 0.57 25% 50 0.76 0.45 0.79 0.81 0.55 40% 19 2.71 0.81 0.60 0.62 1.50 87% 51 1.43 0.43 0.38 0.98 1.02 68% 20 1.19 0.71 0.50 0.00 1.19 37% 52 1.95 1.05 1.07 1.00 1.48 84% 21 2.43 1.43 1.12 0.40 0.76 3% 53 1.00 1.12 0.00 0.07 0.21 14% 22 1.29 0.79 0.26 0.29 0.14 49% 54 0.33 0.19 1.24 0.86 0.86 94% 23 0.48 1.33 1.19 1.50 1.36 51% 55 2.05 1.50 0.45 1.38 0.83 44% 24 0.29 0.21 1.05 0.60 0.33 2% 56 2.00 1.26 0.64 0.10 1.31 48% 25 0.81 1.02 0.12 0.57 0.48 100% 57 0.14 1.36 1.36 0.19 0.00 97% 26 2.86 0.52 1.17 0.17 0.79 22% 58 1.05 1.31 0.69 0.26 1.05 8% 27 2.81 0.74 0.52 0.88 0.12 78% 59 0.67 0.83 0.33 1.48 1.07 76% 28 2.62 0.26 0.55 1.17 1.40 43% 60 2.90 0.67 0.48 1.29 0.02 13% 29 2.76 0.33 0.98 1.02 1.10 11% 61 2.19 0.31 0.21 1.21 0.36 79% 30 1.67 0.62 1.00 0.52 1.33 6% 62 1.90 1.14 0.90 0.64 0.52 83% 31 1.10 0.10 0.02 1.26 0.60 5% 63 2.95 0.69 1.40 1.36 0.19 27% 32 1.62 1.19 0.31 0.24 0.64 73% 64 2.14 0.29 0.62 0.69 1.17 98% Lower bound 0.00 0.00 0.00 0.00 0.00 0% Upper bound 3 1.5 1.5 1.5 1.5 1 Baseline 1.45 0.6 0.75 0.65 0.7 80% Table 3-5: Space-Filling Latin Hypercube of Parameters for DoE. 64 After the initial sample is generated, all the sample points should be evaluated through the simulator. The observation of these sample points would then be used for the constructing of surrogate models as well as the search for optimal solution. From a public agency’s perspective, the objective would be to minimize the total social costs of the whole network or maximize total social welfare, while if a road is privately operated, maximizing total toll revenue may be the main objective. In the current study, the objective is set to minimize the average travel time for all network users given fixed OD demands. The objective function can be formulated as peak peak off off TT peak off ( TT ( ) TT ( , )) min E[ ( , )] E ( )k rs rs rs rs r R s S rs rs r R s S d d f d d                  x z z z  (3.30a) 1 min maxs.t. , k   z z z z  (3.30b) 0 1  (3.30c) T T[ , ], k x z x  (3.30d) where TT ( , )f z is the stochastic average travel time function of the network, we minimize its expectation given the same input z and  to eliminate random simulation errors. The decision vector x includes toll rates z of each toll road segment and the ratio  of off-peak-hour toll rates to the peak-hour values, so x is a k -dimensional decision variable vector; the origin and destination sets are r R and s S , ( , )r s is the OD pair; peakTTrs is the average travel time for trips during peak hours corresponding to the OD pair ( , )r s ; offTTrs is the average travel time of the OD pair ( , )r s in off-peak hours; peakrsd and offrsd are the peak and off-peak demands of the 65 OD pair ( , )r s , respectively; the toll rate ratio is 0 1  ; the box constraints are considered in this model, i.e. minz and maxz , which are lower and upper boundaries for segment toll rates, respectively. 3.5.3 Surrogate Models Evaluation All the sample points are evaluated by DynusT to provide observations for surrogate model construction and SBO. To achieve convergence, 10 iterations of DTA and simulations are run for each toll plan, and the relative gaps for the DTA are found to be below 3% for all experiments. However, to save computation effort as much as possible, every toll plan is evaluated by the simulation-based DTA only once but including 10 iterations, despite the existence of noise in the route choice results. For each simulation run, the simulator obtains valid results when the convergence is achieved after several times of assignments and vehicular platoon simulations. To simplify the optimization problem, this case study only cares about the travel time for users departing during the extended morning peak period (5–10 am). This extended morning peak consists of two off-peak periods (5-6 am and 9-10 am) and the morning peak period (6-9 am). To make the simulation results for the research period more reliable, one hour before 5 am is added into the simulation for warming up, and an extra hour after 10 am is also introduced to let the vehicles in the network disseminate. Thus a total of 7 hours is simulated in DynusT. Using a desktop with 3.10 GHz-quad CPU and 4 GB-Ram, one evaluation of each point takes about 5 hours. The average travel time for all network users during the extended morning peak period that finished their trips is computed after each simulation. Besides the 66 single simulation for each sample points, to characterize the baseline, we run 10 repetitions that produce 17.96, 17.94, 17.71, 18.22, 18.30, 17.83, 18.04, 17.78, 19.31, 18.44 minutes, respectively. The average baseline output is 18.15 min, and the standard deviation is 0.47 min, which could be an estimate of the simulation noise standard deviation, i.e. noiseˆ 0.47  min. To estimate the mean and standard deviation of stochastic simulation outputs of the optimal variables, we will also run 10 repetitions, and then compare the estimated mean objective function value and standard deviation with the baseline to see how much improvement can be achieved after optimization. As the simulation of the ICC network costs about 5 hours for each sample shown in Table 3-5, the surrogate models help reduce tremendous computation time compared to a traditional scenario study, which needs to evaluate all possible solutions through the entire feasible domain. Methods RMSE MAX NRMSE NMAX EGO *TT ˆE[ ( )]f x M1 0.64 1.63 3.63% 3.18 15.69 - M3 0.52 1.16 2.91% 2.26 17.14 - M11 0.52 1.41 2.92% 2.74 17.51 - M12 0.53 1.48 2.99% 2.90 17.36 17.70 Table 3-6: Leave-One-Out Cross Validation Results. Using the four models we chose in the numerical test section, Table 3-6 shows the evaluation of response surfaces using the leave-one- out CV based on the first five MoEs introduced previously. We can see that the ordinary Kriging (M3), regressing Kriging (M11) and the expected improvement infill re-interpolating Kriging (M12) 67 approaches produce smaller estimation errors than the quadratic polynomial model (M1), which doesn't regress the response surface very well (larger RMSE, MAE and NMAX) though its EGO is extremely small. However, the simulation outputs of the samples should be treated as random variables with simulation errors instead of an accurate number. In this case, the aforementioned 5 MoEs are not suitable for evaluating the performance of these response surface models any more. Thus a new method analyzing the effectiveness of estimated confidence interval is proposed to evaluate model performance for the cases with significant simulation errors. The best model is then identified using this approach, and is then applied to search for the optimal toll rate in minimizing network average travel time. Since no a priori information of simulation random errors is available, such random deviations from the expected smooth response can be simplified as uniformly distributed across the feasible domain, i.e. 2TT noiseVar[ ( )] , ( , )f   x z , which is independently distributed with the regression error variance 2ˆ ( )s x . Then the estimated response variance 2TTˆ ( )s x at x is given by 2 2 2TT noiseˆ ˆ ˆ( ) ( )s s  x x (3.31) where 2noiseˆ is the estimate of the simulation noise variance. Figure 3-8 applies the leave-one-out CV to predict the response of each sample point using other points. Four surrogate methods (M1, M3, M11 and M12) are compared. Figure 3-8(a) shows the estimated mean of average travel time values TTˆ ( )f x at the initial 67 points as shown in Table 3-5 using M1. The total estimation standard error given noiseˆ 0.47  is formulated by 68 2 2TT noiseˆ ˆ ˆ( ) ( )s s  x x (3.32) The estimated mean values with one standard error upper and lower bounds are given by TT TTˆ ˆ( ) ( )f x s x for the training set X . As a comparison, we plot the random observations y as well. The sample points in X are sorted according to estimated mean values of the average travel time in a descending order. The TTˆ ( )f x of M1 decreases quickly and the estimated response of the last point adds bias to the observations. (a) (b) (c) (d) Figure 3-8: Prediction Accuracy of the Leave-One-Out Cross Validation: (a) M1; (b) M3; (c) M11; (d) M12. Figure 3-8(b-c) show the CV results of M3 and M11 for the 67 initial sample points. The difference between them is very small, which is also shown in Table 3-6. 69 Figure 3-8(d) shows the results of the re-interpolating Kriging model (M12) for 97 points (including the initial samples and 30 infill points using the EI maximization criterion). The estimated standard deviation is smaller in this model, which means the optima after 30 infill points would have a smaller variance. Note that all random observations are within two standard deviations from the mean accounting for about 95% confidence level. The results show the advantage of M12: much narrower estimation bounds is shown for M12 than for M3, M11 or M1, which indicates a higher predicting concentration. Therefore, based on the overall regression performance (indicated by RMSE, MAX, NRMSE, NMAX and EGO) and the prediction error bounds, we find the infill re-interpolating Kriging (M12) gives the best solution to the problem. 3.5.4 Optimization Results At the end of the 30th update, the estimated best solution is * T[$ 2.28, $ 0.15, $1.29, $1.31, $ 0.24, ]ˆ 69%x , the estimated global mean value of the minimized objective function is * *TT TTˆ ˆ ( ˆ ) 17.36f f x min. The ratio of the off- peak to peak pricing is reduced, not suggesting that tolls in the off-peak are increased, because the peak-hour toll charge rates of the optima increase for highway Segments I, III and IV, but the rates decrease for highway Segments II and V. Based on 10 runs of simulation under the same best input, we find the mean value of simulation outputs is 17.70 min that is 2.5% less than the mean value of the baseline average travel time. We compute the F statistic and p-value based on the analysis of variance (ANOVA) for the baseline and optimized toll charge plans. Results show the F statistic is 4.68 70 and the corresponding p-value is 0.04 that is close to zero indicating mean travel times are significantly different. Figure 3-9: Comparisons of Traffic Flow Volumes of Additive Toll Links between the Baseline and Optimized Toll Rates. The simulation generates statistics of the network performance every minute. As link volume fluctuates significantly at the one-minute interval, Figure 3-9 shows 10-minute moving average traffic flow volume of the 10 additive toll links in the network. The left column shows westbound ICC, and the right column shows eastbound ICC. Two subfigures in each row represent westbound and eastbound toll segment, i.e. Segments I, II, III, IV and V from top to bottom. Overall, the time series of toll link volumes is changed under the optimized toll compared to the baseline, 0 2000 4000 Vo lum e (ve h/h ) Link 307 -> 301 0 2000 4000 Vo lum e (ve h/h ) Link 302 -> 306 0 2000 4000 Vo lum e (ve h/h ) Link 310 -> 309 0 2000 4000 Vo lum e (ve h/h ) Link 308 -> 311 0 2000 4000 Vo lum e (ve h/h ) Link 314 -> 312 0 2000 4000 Vo lum e (ve h/h ) Link 313 -> 315 0 2000 4000 Vo lum e (ve h/h ) Link 318 -> 316 0 2000 4000 Vo lum e (ve h/h ) Link 317 -> 319 0 50 100 150 200 250 3000 2000 4000 Simulation time (min) Vo lum e (ve h/h ) Link 322 -> 320 0 50 100 150 200 250 3000 2000 4000 Simulation time (min) Vo lum e (ve h/h ) Link 321 -> 323 Optima Baseline Optima Baseline Optima Baseline Optima Baseline Optima Baseline Optima Baseline Optima Baseline Optima Baseline Optima Baseline Optima Baseline 71 while the total flow passing through each toll link during the simulation period did not change a lot. In 8 out of the 10 additive toll links, traffic volume is significantly higher during the fifth hour, i.e. 9–10 am (240 min to 300 min) under the optimal toll. (a) (b) Figure 3-10: Comparison of Toll Revenue Collected on Additive Toll Links between the Baseline and Optimized Toll Rates, (a) Toll Revenue; (b) Cumulative Toll Revenue. Figure 3-10(a) shows the comparison of toll revenue dynamics between the optimal toll and the baseline. The curve of cumulative toll revenue collected along time in Figure 3-10(b) shows that the optimized toll case generates a toll revenue of around 62 thousand dollars, which is a 20% increase compared to the current toll case. During the first hour of the simulation period, toll revenue collected under both cases is almost the same. During the three peak hours from 6 to 9 am, as traffic flow 0 30 60 90 120 150 180 210 240 270 3000 1000 2000 3000 4000 Simulation time (min) To ll r ev en ue ($ ) Optima Baseline 0 30 60 90 120 150 180 210 240 270 3000 2 4 6 8 x 10 4 Simulation time (min) Cu mu lat ive to ll r ev en ue ($ ) Optima Baseline 72 of all toll links is very close, the increased toll revenue of the optimal toll case mainly comes from the increased peak toll rates. The peak/off-peak ratio of the optimal toll case (69%) is smaller than that of the current toll case (80%), and the average off- peak toll rates of the two cases are about on the same level. The increase of toll revenue during the last hour of the simulation period under the optimal toll mainly comes from the increase of link volumes. Figure 3-11 compares the total flow throughput at the network exit for the optimal and the baseline solutions. The optima increase the throughput capacity in peak hours (especially from 180 to 240 min). (a) (b) Figure 3-11: Comparison of the Vehicle Throughput between the Baseline and Optimized Toll Rates, (a) Vehicle Throughput; (b) Cumulative Vehicle Throughput. 0 30 60 90 120 150 180 210 240 270 3000 1 2 3 4 x 10 4 Simulation time (min) Ve hic le t hro ug hp ut Optima Baseline 0 30 60 90 120 150 180 210 240 270 3000 2 4 6 x 10 5 Simulation time (min) Cu mu lat ive ve hic le t hro ug hp ut Optima Baseline 73 Because we only suggest change the toll rate on one highway, so the influence of the small scale of toll rate change on the whole transportation network should be small. However, the small improvement in the mean travel time of all users of the transportation network (2.5% reduction) cannot be neglected. The average travel time under optimal toll was 17.70 minutes, which is 0.45 minutes shorter than the average travel time under the current tolls. The total vehicle throughput for the simulation period is around 570,000. The value of time is assumed to be $15/hour for the network users. Thus the reduction in average travel time equals a total of $65,000 saving in travel cost for the extended morning peak period (5 hours) for each day. If we consider the whole 24 hour of each day, and even consider a one-year effect, such a small improvement in mean travel time in the extended peak hours would mean a huge saving from an operational and policy standpoint. Overall, the simulation results show that implementing the optimal toll predicted by the Kriging model considering simulation noise can benefit society in multiple ways. Travelers gain from the reduction of travel cost and the government benefits from the increase of toll revenue, while there is hardly any cost associating with the change of toll rates. Thus adjusting the current toll rate to the optimal toll rate should be an encouraging policy option to enhance transportation system performance in the study region. 3.5.5 Sensitivity Analysis Moreover, to explore the sensitivity for the baseline baselinex and the optima *xˆ in the method M12, we provide a joint contour plot of the baseline in Figure 3-12(a). 74 (a) (b) Figure 3-12: Sensitivity Analysis of the (a) Baseline and (b) Optima. Each tile shows a contour of the estimated surrogate function TTˆ ( )f x (the average travel time) versus two of the six variables, with the remaining four variables held at the baseline value. This helps visualize how the surrogate values change around baselinex . The baseline values and the ranges of each dimension can be found in Table 3-5. Take the upper-left contour plot of the average travel time surrogate 75 function as an example, it is a conditional function of TT 1 2 3 4 5ˆ ( , | , , , )f z z z z z  given 3 $ 0.75z  , 4 $ 0.65z  , 5 $ 0.70z  , 80%  . The warmer colors of the joint contour plots indicate the longer average travel time, while the cooler colors show shorter travel time values. Analogically, the joint contour plots in Figure 3-12(b) show the sensitivity analysis around the optima *xˆ , which is denoted by the squares. The main difference from Figure 3-12(a) is that the values around *xˆ are toward much cooler colors (closer to cyan and blue) than the baseline sensitivity (closer to orange and red). It also validates that the optimal solution performs better than the baseline inputs. 3.6 Conclusions In this chapter, a systematic framework of transportation simulation-based optimization is proposed to solve the highway toll optimization with expensive-to- evaluate objective functions and obvious simulation random errors. The main novelty of computational time savings can be achieved through applying the efficient surrogate models which can intelligently mimic the simulation input-output mapping. A family of surrogate-based optimization approaches to model response surface of transportation simulation input-output is introduced. Both one-stage and two-stage surrogate models are tested and compared using a small-scale numerical study. Four out of the twelve surrogate models are identified to be promising 76 approximation of the true response surface of the numerical study, and are then applied to the real world application of dynamic pricing optimization. Taking advantage of the simulation’s power in reflecting the traffic realism, we evaluate the transportations system performance in response to different toll charges in a real transportation network with a simulation-based DTA model DynusT. Among the four selected promising models, the expected improvement infill re- interpolating Kriging performs the best. With only 97 samples, this model can produce highly reliable estimates of simulation outputs over the entire feasible domain, and thus successfully help find the optimal toll rate. The predicted optimal toll rate obtained from the Kriging model is then evaluated through simulation. The predicted output is relatively consistent with simulation outputs. Overall, a 2.5% improvement to the entire network in terms of average trip travel time can be achieved, through adjusting the toll rate of a single freeway from the baseline to the predicted optima. The travel time savings for the extended morning peak period of each day is equivalent to $65,000 assuming a $15 per hour value of time. 77 Chapter 4: Constrained and Multi-Objective Simulation-based Optimization Single-objective optimization without any constraints is the simplest form for an optimization problem, and is usually the first stage in research efforts. Regarding the highway toll optimization problem we investigated in the previous chapter, there are usually multiple performance measures that highway operators would be concerned about. If the toll road is operated by a private company, the major objective is most likely to be maximizing toll revenue, while they would also pay attention to the traffic condition on the toll road, so as to guarantee a minimum level of service to the paying users. If a toll road is owned and operated by public agencies, maximizing social welfare for all users of the network is probably the primary objective. Meanwhile, they also care about the revenue that can be collected from the toll and some other measures like air pollution and safety. Under the situation that decision makers have specific expectation on several of the performance measures, the problem can be formulated as a constrained optimization problem. Otherwise, multi-objective optimization would be the proper form of the problem to be analyzed. 4.1 Constrained Simulation-based Optimization In this section, we extend the previously investigated single-objective problem into a constrained optimization form, and try to deal with the toll road pricing optimization problem from the public agencies’ point of view more realistically. 78 Specifically, the public agencies are assumed to be interested in two performance measures: average travel time for the network users and the toll revenue. The scenario to be investigated is that public agencies aim to minimize network wide average travel time while intending to keep the toll revenue higher or equal to a specific level. Via our observation during the previous study, the simulation noise is very significant. Thus we decide to perform 3 repetitions of the evaluation for each sample point and use the average of outputs as the observations, so as to reduce the effect of the simulation random noises. The repetitions would significantly increase the computation time for the evaluation of initial sample points as well as the infill points. To reduce the overall computational burden, we only evaluate the transportation system performance during the 3-hour peak period (6-9 am). In this case, 5 hours (5-10 am, including one hour (5-6 am) for warming up and one hour (9- 10 am) for dissemination) is simulated in DynusT. Using a desktop with 3.10 GHz- quad CPU and 4 GB-Ram, one evaluation of each point takes about 3 hours. Thus the 3-repetition evaluations for all the 67 initial points cost around 603 computer hours in all. The performance measures we are interested in are the average travel time for vehicles departing during the peak period and the total peak period toll revenue. As the network is relatively congested during the peak hours, there are still a few vehicles departing during the peak period cannot arrive at their destinations even one hour after the peak period. Thus we only compute the average travel time for those vehicles that finished their trips. 79 4.1.1 Problem Formulation The constrained optimization problem can be formulated as follows:    peak peak peak Tm T , ,inE Ek rs rs r R s S TT rs r R s S df d                   x zz   (4.1a)       peakE ,s.t. , E aTR a TA f v a TollLimit            z z z (4.1b) 1 min max , k  z z z z  (4.1c) (4.1d) [ , ],T T k x z x  (4.1e) where  ,TRf z is the function capturing the stochastic toll revenue for the peak period; TA is a subset of links that are tolled in the network;  peak ,av z represents the peak period volume on link a ;  a is a function mapping the ID of link to its corresponding index in the vector z ; TollLimit is the predetermined lower bound of the peak-hour toll revenue. All other variables have the same meanings with those explained in the single-objective problem. Although only the performance in peak period is of concern, the off-peak toll rate would still influence the objective value, because the route choice for vehicles departing at the beginning or end of the peak period would be indirectly affected by the off-peak toll due to its impact on the distribution of vehicles in the network both before and after the peak hours. After the simulation evaluation of a certain toll plan, the travel route and travel time for each vehicle trip can be generated and saved. At the link level, time dependent flow on each link can also be summarized. Both 0 1  a 80 network wide average travel time and toll revenue for the peak period can thus be recorded. Expected improvement infill re-interpolating Kriging is identified as the best model in the single-objective optimization problem introduced in section 3.5.3, we thus will only investigate this model for the constrained optimization as well as the multi-objective optimization to be introduced in the next section. In fact, re- interpolating Kriging is just a slight revision of regressing Kriging (Forrester et al., 2006), to adapt for the deterministic experiments. As the simulator DynusT in the current study performs nondeterministic experiments, regressing Kriging model works perfectly fine with both approximating the response surface and the process of selecting infill points. Therefore, expected improvement infill regressing Kriging is finally applied for the constrained optimization and the multi-objective optimization problems. 4.1.2 Infill Strategy For a constrained optimization problem with both the objective and constraint to be expensive-to-evaluate functions as illustrated in equation 4.1, considering only the expected improvement of the objective value at a point is not sufficient. If the probability that the constraints would be violated at the point *x is very high, the overall expectation of improvement should still be very low even if the EI of objective value is large at that point. The overall expectation of improvement at a certain point is large only if both the EI of objective value and the probability of satisfying the constraints are large. *x 81 Suppose the surrogate for the constraint function is G( )x and the constraint limit is ming . The mean value and estimation variance at a point *x can be computed in the same way as for the objective function, which are denoted by *ˆ ( )gf x and *ˆ ( )gs x . The probability that the constraint is met can be given by min * 2 * min 2 ** ˆ( ( ))1P[G( ) exp dˆ2 ( )ˆ2 (] ) g gg g u fg uss         xx xx (4.2) We define an indicator function to express the constrained improvement as follows. min min( ) if ( )( ) 0 otherwise y Y G gCI    x xx (4.3) As the surrogate for the objective function is dependent from that for the constraint function, the constrained expected improvement at the point *x can thus be defined as * * * minE[ )]=EI( ) P[G(( ])CI gx x x (4.4) The choice of infill points for the constrained optimization problem can be based on maximizing the constrained expected improvement. 4.1.3 Optimization Results We then apply this infill strategy of constrained expected improvement in the ICC pricing optimization study. In the constrained optimization problem setting, the constraint limit of the toll revenue is set to be the revenue collected at the baseline case, which is $7552.3. The target of this problem is to improve the network wide 82 traffic condition during the peak period as much as possible, while keeping collecting toll revenue at least at the same level as the baseline case. Figure 4-1: Initial and Infill Points for the Constrained Optimization Problem. According to the criterion of expected improvement, 15 infill points (each infill point is simulated three times, and the total cost is about 135 computer hours) are added into the sample. Among the 15 infill toll plans, the toll revenue collected under 12 toll plans meets the constraint. All the initial points and infill points are plotted in Figure 4-1. The horizontal line represents the constraint limit, and the circle lies exactly on the line represents the output of the baseline case. The average travel time for the baseline case is about 23.51 minutes. Compared to other toll plans in both 83 the initial sample and infill sample, the baseline case performs very bad regarding average travel time. Based on the initial sample, the best toll plan satisfying the constraint is T[$ 0.67, $ 0.83, $ 0.33, 1.48$ , $1.07 6, ]7 %initialbest x , 19.29 mininitialbestTT s , $9,261.1initialbestTR  . The infill points improved the average travel time a little bit. The best toll plan from the infill points is T[$ 0.66, $ 0.84, $ 0.38, $ 0.6 , $ ,1 1.16 %]39infillbest x , 19.06 mininfillbestTT s , $11,243.4infillbestTR  . The network average travel time in reduced by about 1.2%, and the added benefit is that toll revenue has increased by around $2,000, which is a 20% improvement. Most of the infill points satisfy the constraint, which means the criterion of expected improvement is very effective in searching for feasible solutions. However, as the expected improvement method balances between local exploitation and global exploration, the regions with high average travel time would still be explored. The objective values (peak period average travel time) of those infill points vary a lot. Most of them perform better than the baseline case, while there is only one point found to outperform the best observation in the initial sample. We search for the constrained optima after infilling 15 points, and find that the global optimum coincides with the infill point that produces the best performance. As DynusT provides detailed information regarding the traffic conditions, the transportation performance on ICC links and at the aggregate level are compared 84 between the estimated global optima and the baseline. Figure 4-2 shows the variations of traffic volume on the five tolled segments of the ICC during the peak period. Traffic flow for different directions is depicted separately. Even during the peak period, flow on the ICC is always lower than 3,000 vehicles per hour. As the ICC has three lanes in each direction, the capacity of ICC is not used sufficiently. Traffic volume under the optimal toll plan is significantly higher on three of the segments than the baseline case. This is an expected change since the toll rate for the three segments is lower than the baseline. The optimal toll for segment five is 70% higher than the baseline, and the volume for the east bound of segment five exhibits apparent drop compared to the baseline. Figure 4-2: Traffic Volumes of ICC Segments under Optima and Baseline Toll. 85 Various network wide performance measures are also compared between the two toll plans. The network wide average trip travel time for the two cases is illustrated in Figure 4-3. The optimal toll plan reduces average travel time by almost 10 minutes during the most congested period (7:30-9:00 am). Looking back to Figure 4-2, we can find the most significant increase in the use of the ICC just occurs at that period, which means the optimal toll plan performs very well in diverting vehicles from very congested routes. Figure 4-3: Comparison of Network Wide Average Travel Time between Optimal and Baseline Toll Plans. In the same manner, due to the significant increase of users in the period of 7:30-9:00 am, toll revenue collected during this period is almost two times higher than the baseline case as shown in Figure 4-4. The overall increase of the toll revenue for the optimal toll case during the three-hour peak period is about $3,000. Moreover, the improvement in travel time brings in additional benefits. Figure 4-5 shows that the 86 throughput for the whole network increases significantly for the period 7:30-9:00 am, and the total throughput for the morning peak period increased by about 10% under the optimal toll. The adjustment of toll rates for one freeway in the regional network has successfully led to a more efficient usage of the road network capacity. (a) (b) Figure 4-4: Comparison of Toll Revenue between Optimal and Baseline Toll Plans, (a) Toll Revenue; (b) Cumulative Toll Revenue. (a) (b) Figure 4-5: Comparison of Throughput between Optimal and Baseline Toll Plans, (a) Vehicle Throughput; (b) Cumulative Vehicle Throughput. In summary, the optimal solution can reduce average travel time by around 20% (4.5 minutes in reduction). The overall throughput for the peak period is about 87 300,000, thus the total reduction in travel time for the morning peak hours is 22,250 hours (around $300,000 in travel time saving assuming $15 per hour value of time), which is very remarkable. Moreover, the added benefit is that the total toll revenue is increase by 50%. 4.2 Multi-Objective Simulation-based Optimization As illustrated previously, there are usually multiple performance measures that highway operators would be concerned about. However, the constrained optimization problem can be formulated only when the decision makers are pretty sure about the limit they want to impose on several of the performance measures. If the highway operators have neither specific weights among the multiple goals nor expectations regarding any of the objectives, the dynamic pricing optimization for toll roads can be formulated as a multi-objective optimization problem. In this case, a set of optional toll plans which do not dominate each other can be generated. This set of compromise solutions is called the Pareto set, and it can serve as the foundation for decision making. For the multi-objective optimization problem, the objectives are assumed to be minimizing peak-period average trip travel time and maximizing peak-period toll revenue. The same initial samples and their corresponding observations for the constrained optimization problem in the previous section are utilized to solve this multi-objective problem. However, the infill strategies and the infill points would be different. 88 4.2.1 Problem Formulation The multi-objective optimization problem can be formulated as follows:    peak peak peak mi ,E ,n Ek rs rs r R s S TT rs r R s S d TTf d                   x zz   (4.5a)       peakmaxE , ,Ek aTR a TAf v a           x z z z  (4.5b) 1 min maxs.t. , k  z z z z  (4.5c) (4.5d) [ , ],T T k x z x  (4.5e) Overall, the specification of this problem is very similar with that for the constrained optimization problem introduced in the previous section. The only difference is that one of the constraints in the constrained optimization is re- formulated as an objective function. The expected results from this model is a set of optional toll plans, in which no one can dominate any of the others in terms of both of the two objectives. This set of results can provide decision makers a clear understanding on how good a combination of objectives can be achieved. Further evaluation of these optional toll plans can be done later when the weighting between the two objectives is determined. 0 1  89 4.2.2 Infill Strategy After the evaluation of all initial sample points, a Pareto set of solutions can be identified. All the solutions in the Pareto set are optimal in some sense, while they cannot dominate any other solutions in the Pareto set in terms of all objective values. If a solution in the Pareto set is better than another in one of the objective values, it should be worse in at least one of other objective values than the other. Figure 4-6 shows an example of a Pareto set with five non-dominated points (noted as stars in the figure) for a problem with two objectives. New points found in the shaded area would augment the current Pareto front, while any points in the hatched area will at least dominate one of the points in the current Pareto set. Thus any new points found in the hatched area would lead to the update of the current Pareto front. Figure 4-6: Example of a Pareto Set. 90 Similar to the case of the constrained optimization, the two surrogates for the two objective functions are assumed to be independent. The probability density function (PDF) of a combination of output regarding the two objectives (y , y )P Gaussian f g  can be described as the product of two Gaussian PDFs. The probability that a new point *x is an improvement to the current Pareto front can thus be computed by integrating the joint PDF (y , y )P Gaussian f g  over the region below and to the left of the current Pareto front, which include both the shaded and hatched area shown in Figure 4-6. The probability of improvement at a new point *x is expressed as *(1) *(i 1) *(i) *(i) *( ) *( ) 1* 1 P[I( ) dy dy dy d] (y , y ) (y , y ) ( y dy dyy , y ) M M f f g P Gaussian f g P Gaussian f gf g P Gaussia M g f g f i gn f g ff                      x (4.6) where *(i) *(i)( , )f g are the points in the current Pareto set, 1, 2, , i M  and M is the total number of points in the Pareto set. The probability of improvement represents integration of the PDF over the region below and to the left of the Pareto front. The expected improvement is then the first moment of the integral over the same region. Suppose we know the position of the centroid of the EI( )*x integral (see Figure 4-6), the Euclidean distance between the centroid and each member of the Pareto set can be computed. The expected improvement criterion is subsequently calculated using the Pareto set member closest to the centroid, * *(y ( ), y ( ))f g* *x x , by taking the product of the probability of improvement with the shortest Euclidean distance (Forrester and Keane, 2009). 91 Suppose the position of the centroid is  (y ( ), y ( ))f g* *x x , the expected improvement at the point *x is given by  * *2 2EI( )=P[I( )] (y ( ) y ( ) y) ( ) ( )( )yf gf g * * * * * *x x x x x x (4.7) where  *(1) *(i 1) *(i) *(i) *( ) *( ) 1 1 (y , y ) (yy dy dy y dy dy y dy dy P[ , y ) (y , y ) y ( )]( ) I M M f f g P Gaussian f g P Gaussian f gf g P Gaussian f g M f g f f f g f i f g f f                          * *x x    (4.8) and y ( )g *x can be computed similarly. 4.2.3 Optimization Results Through infill points to the sample, the Pareto front for the multi-objective problem is expected to be augmented or updated. The Pareto front for the initial sample and the augmented sample with infill points are presented in Figure 4-7. 92 (a) (b) Figure 4-7: Pareto front for the multi-objective optimization problem, (a) Pareto Front Based on Initial Samples; (b) Pareto Front Based on Augmented Samples. 93 Similar to the method of evaluating initial sample points, the infill points are all evaluated three times through DynusT and the mean of the output is used as the observation. Among the 67 initial sample points, only three of them are on the Pareto front. Any of the other points would be dominated by at least one of the points from the Pareto set in terms of both average travel time and toll revenue. The toll plan and the corresponding performance regarding the two objectives for the three Pareto set members are T 1 1 1[$ 0, $ 0, $ 0, $ 0, $ 0, %] $00 , 19.14 min ,TT s TR  x T 2 2 1[$ 0.67, $ 0.83, $ 0.33, $ , $ ,1.48 1.07 76 , 19.29 min ,%] $9, 261.1TT s TR  x T 3 3 3[$ 0.76, $ 0.45, $ 0.79, $ , $ ,0.81 0.55 40 , 19.54 min ,%] $9,941.7TT s TR  x Compared to the 0 toll case, more than 8,000 dollars can be collected in the 3- hour peak period through charging ICC users. At the same time, the average travel time for the whole network is just slightly increased. However, the maximal toll revenue can be collected is around 10,000 dollars. Keeping the toll revenue at this level, the network wide average trip travel time may increase dramatically by adjusting the toll plans. Just looking at the Pareto front based on the 67 initial points, we can say that the toll plans 2x and 3x are the two optimal solutions that the operator of the ICC may be interested in. The objective values under these two toll plans are significantly better than any other toll plans that are evaluated. By infilling 20 points (costs about 180 computer hours) based on the expected improvement criterion, the Pareto front is augmented slightly with two points generated during the infill process. The two new Pareto set members are 94 T 4 4 40.33 0 40 ,[$ 0.36, $ 0.54, $ 0.76, $ , $ , % 19.28 min] $6,86, 0.7TT s TR  x T 5 5 5[$ , $ 0.57, $ 0.01, $ , $ , %] 23.51.00 0.41 0.85 50 , min ,0 $9,959.8TT s TR  x However, no points that dominate the initial Pareto set members are found through the infilling. After the 20 infill iterations, the maximal expected improvement over the entire domain is about 1000, which means that there is still a very large possibility that better toll plans that dominate or augment current Pareto set members can be found through further infilling. As the purpose of this study is just to illustrate the effectiveness of the expected improvement-based infill strategy in finding new Pareto set members, further infilling is not performed here. In summary, the expected improvement-based infill strategy is pretty effective in that two new Pareto set members are found by evaluating just 20 points. The surrogate-based optimization framework and this specific infill strategy work very well in providing optimal solutions to decision makers for the multi-objective optimization problems. Comparing the result of multi-objective problem to that of the constrained optimization problem, we can find that the surrogate-based method with infill by the expected improvement criterion performs more efficiently in the constrained optimization problem. With less iterations of infill, a much better optimal solution is found for the constrained problem. A possible reason is that the constrained problem imposed a limit on one of the goals, which restricts the search for expected improvement in a smaller region than the case of multi-objective optimization problem. Further exploration of this issue can be done in the future. One research question we would like to investigate is whether the performance of the method will 95 be improved if restrictions are set for both objectives in the multi-objective optimization problem. 4.3 Conclusions The decision making of transportation planning and operations policies always concerns about multiple system performance measures, e.g. efficiency, safety, pollution, reliability, economy, etc. To make the study more application ready, simulation-based optimization problems with multiple objectives (network average trip travel time and total toll revenue collected) considerations are formulated in this chapter for the ICC pricing optimization study introduced previously. When there are predetermined goals regarding several of the objectives, a constrained optimization problem is formulated. Otherwise, if there is no prior knowledge on which objective is more important and no constraint on any of the objectives, a multi-objective optimization problem can be formulated. Expected improvement based infill strategies designed separately for the constrained optimization and multi-objective optimization structures are utilized to solve the two types of problems. With a constraint that toll revenue should be higher than or equal to that collected at the baseline, an optimal solution is found for the constrained optimization problem after 15 iterations of infill, which improve the network wide average trip travel time for the peak period by 20% while satisfying the constraint. For the multi-objective optimization problem, a Pareto set with non- dominant optimal solutions based on initial sample evaluation is first generated for 96 decision makers’ consideration. After 20 iterations of infill, 2 new points are found to augment the initial Pareto front. The effectiveness of the proposed multi-objective optimization procedures is thus proved. 97 Chapter 5: An Enhanced SVR Method: Adapting for Heteroscedastic Simulation Noise This chapter further investigates the ICC dynamic pricing study mentioned in Chapter 3 and 4. Heteroscedasticity in simulation noise is observed. As this feature violates the assumptions of the surrogate models analyzed previously, a new method that can adapt for the heteroscedastic simulation noise is proposed and explored in this chapter. 5.1 Heteroscedasticity in Transportation Simulation Outputs Using the observations of the 3-repetition evaluation of the 67 initial sample points for the ICC network introduced in Chapter 4, we can analyze the features of the simulation noise. The mean output as well as the standard deviation of the mean output for each sample point can be computed based on the observed value. Figure 5- 1 plots the mean output against its corresponding standard deviation/coefficient of variation for all the 67 initial samples. 98 (a) (b) Figure 5-1: Heteroscedasticity Exhibited in the Simulation Outputs, (a) Standard Deviation vs. Mean; (b) Coefficient of Variation vs. Mean. 99 The standard deviation of the mean output varies significantly among those initial samples as shown in Figure 5-1(a), which means the objective function is heteroscedastic. The coefficient of variation (standard deviation divided by the mean) is a relative measure to reflect the variance level of the noise. Figure 5-1(b) plots the mean output against its corresponding coefficient of variation for all the initial samples. The level of variance differs from 0 to 0.1. Overall, the variance level is relatively low. In general, the magnitude of the simulation noise is highly dependent on the location of the design points, and the assumption of identical distribution of noise over the entire research domain in most surrogate models is not satisfied. 5.2 Methodology SVR is a method that balances between model smoothness and the loss due to prediction error. When the variance of simulation noise is assumed to be homogenous, the error tolerance can be set to be uniform over the research domain. Otherwise, different error tolerance limit can be configured for different locations. 5.2.1 Traditional Support Vector regression (SVR) SVR originates from the theory of support vector machine (SVM). Being different from SVM’s objective of classifying sample data into different groups, SVR predicts the value at the sample points. This model provides very good compromise 100 between prediction accuracy and robustness of the approximations. The basic form of SVR prediction is the weighted sum of basis functions  added to a base term b .   ( ) ( ) 1 ˆ ( ) nT i i i f b b w      w φ xx ,x (5.1) where w is the vector of weights of the basis functions. The key attribute of SVR is that it allows us to specify a margin  , and any errors in the sample data within the  -margin are fully accepted without affecting the surrogate prediction. Under this assumption, the objective of SVR is to minimize the model complexity with the constraint that predictions at all sample points are within the  -margin of the corresponding observation. 21min 2 w (5.2a) ( ) ( )s.t. 1, 2, ,,i T i i ny b      w φ  (5.2b) where ( )iφ is the abbreviation of ( )( ) ix, x . In the objective function, the model complexity is measured by the 2nd norm of the coefficient vector w . ( )iy in the constraint represents the observation at the point i . However, a solution that meets the constraints at all sample points may not exist. The constraint can be relaxed a little bit by allowing the difference between the prediction and observation to be larger than  . As only errors smaller than  are assumed not to affect the surrogate prediction, errors that are larger than  should be penalized in the objective function. A penalizing function called  -insensitive loss function can be given by 0 if otherwise         (5.3) 101 where  is the actual error between prediction and observation. By minimizing the sum of 212 w and  , a tradeoff between the model complexity and the loss due to prediction error is obtained. The updated SVR model can be expressed as ( ) ( ) 1 21min 2 ( ) n i i i C n       w (5.4a) ( ) T ( ) ( )s.t. 1, 2, , ,i i iy b i n     w φ  (5.4b) T ( ) ( ) ( ) 1, 2, ,,i i ib y i n     w φ  (5.4c) ( ) ( ), 0, 1, 2, ,i i i n      (5.4d) The coefficient C ( C >=0) in the objective function governs the tradeoff between the model complexity and prediction accuracy. If C equals 0, then a flat function through  would be generated. As C approaches infinity, the derived function would pass through all observed points. By introducing Lagrange multipliers, the above constrained optimization problem can be solved through the Lagrangian: 2 ( ) ( ) ( ) ( ) ( ) ( ) 1 1 ( ) ( ) ( ) T ( ) 1 ( ) ( ) ( ) T ( ) 1 1 || || ( ) ( )2 ( ) ( ) n ni i i i i i i i n i i i i i n i i i i i CL n y b y b                                              w w φ w φ (5.5) The Lagrangian should be minimized with respect to primal variables w , b , ( )i  and ( )i  . Meanwhile, it should be maximized with respect to dual variables ( )i and ( )i  . Meanwhile, all of the dual variables should be larger than or equal to 102 0. At the optimal solution, the derivatives of L with respect to the primal variables would vanish. ( ) ( ) ( ) 1 ( ) 0 n i i i i L         w w x (5.6) ( ) ( ) 1 ( ) 0 n i i b i L         (5.7) ( ) ( ) ( ) 1,, 2, ,i i iCL i nn          (5.8) ( ) ( ) ( ) 1,, 2, ,i i iCL i nn          (5.9) By substituting Equation 5.6 into Equation 5.1, the SVR prediction can be expressed as T ( ) ( ) ( ) 1 ˆ( ) ( ) ( , ) n i i i i f b b          x w φ x x (5.10) The parameters ( )i  and  are still unknown at this stage, where ( )i  can be estimated through the dual variable optimization problem. ( ) ( ) ( ) ( ) (i) (j) ( ) ( ) , 1 1 (i) ( ) ( ) 1 1 ( )( ) ( , ) ( )ma ( ) x 2 n ni i j j i i i j i n i i i y                                x x (5.11a) ( ) ( ) 1 s.t. ( ) 0 n i i i       (5.11b) ( ) [0,C/ n] 1, 2,, ,i i n    (5.11c) The maximization problem in equation 5.11 can be formulated as a quadratic programing problem in the following form. 103 1 2min T TT T                                    Ψ Ψα α 1 y α Ψ Ψα α 1 y α (5.12a) s.t. 0T        α1 α (5.12b) , [ , / n] +α α 0 C (5.12c) where Ψ is the n n matrix for the basis functions, with elements to be (i) (j)( , ) x x . According to the KKT conditions, the base term b can be computed as ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) for 0, (0, / )ˆ for (0, / ), 0 i T i i i i T i i i y C nb y C n                     w x w x (5.13) In the current study, we use Gaussian function form for the basis function  . The choices of the parameter 2 (the Gaussian kernel variance) and the weight coefficient C are determined by cross validation. 5.2.2 An Enhanced SVR For the traditional SVR model, the margin  is set to be a constant for all sample points, which means that the tolerance of the error is the same throughout the entire domain. However, this is definitely not the case for heteroscedastic samples. Because observations at points with larger variance would have larger confidence intervals, at the same confidence level, larger errors should be tolerated at those corresponding points. If there is no prior knowledge of the variance of noise in the data, the margin  can be calculated by using SVR  (Schölkopf and Smola, 2002). 104 When prior knowledge regarding the noise level is available,  is usually chosen as the standard deviation of the noise (Forrester and Keane, 2009). Thus for heteroscedastic samples, the margin  at each point can be chosen as the corresponding standard deviation of the noise at that point. The SVR model can then be revised as follows to adapt for heteroscedasticity in sample data. This enhanced SVR model will be named as heteroscedastic SVR model (H-SVR) hereafter. In equation 5.4, the constant margin  in the first two constraints will be replaced by the unique margin ( )i corresponding to each sample point i . The parameters ( )i  can then be computed through solving the quadratic program 5.14. 1 2min T T                                 Ψ Ψ ε yα α α Ψ Ψ ε yα α α (5.14a) s.t. 0T        α1 α (5.14b) , [ , / n] +α α 0 C (5.14c) where ε is a 1n vector with the elements to be ( )i , which is the standard deviation of noise at point i . The base term  can be calculated through equation 5.15, which is revised from equation 5.13 accordingly. ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) for 0, (0, / )ˆ for (0, / ), 0 i T i i i i i T i i i i y C nb y C n                     w x w x (5.15) Figure 5-2 shows the difference between the H-SVR model and the traditional SVR model through a one-variable function, and this figure is created mainly for illustrative purpose. The variance of the function noise is assumed to be heteroscedastic. 105 (a) traditional SVR model    (b) heteroscedastic SVR model    Figure 5-2: Comparison of the Traditional and the Heteroscedastic SVR Model, (a) Traditional SVR Model; (b) Heteroscedastic SVR Model. 0 0.2 0.4 0.6 0.8 1-10 -5 0 5 10 15 20 x f(x ) prediction true value + - sample data support vectors 0 0.2 0.4 0.6 0.8 1-10 -5 0 5 10 15 20 x f(x ) prediction true value + - sample data support vectors 106 It clearly shows that the errors that can be tolerated at different locations vary significantly, and the  -tube is no longer parallel to the predicted function. The advantage of the H-SVR model in terms of the overall prediction accuracy will be investigated through numerical test functions in the following section. 5.3 Numerical Tests Different forms of surrogate models including quadratic polynomial function, radial basis function, Kriging model and SVR have been investigated in the numerical study in Chapter 3. The objective function of the numerical study is computed by solving a static user equilibrium model. In that case, the output is in fact deterministic instead of stochastic. It was found that the regressing Kriging model performs best in terms of prediction accuracy. As the focus of this study is to investigate the capability of the H-SVR model in dealing with heteroscedastic sample data, the performance of the H-SVR model is compared to that of the regressing Kriging model in approximating several mathematical functions with heteroscedastic noise. The first test problem is a two-dimension function 1 1 2 1 1 2 2 1 2( , ) sin( ) sin( ), , [ 2 , 2 ]f x x x x x x x x      (5.16) with global optimal solutions of * T(4.9132, 4.9132) x , *1( ) 9.6289f  x . In order to introduce heteroscedastic noise into the function output, the observation at a certain point is set to be 107 1 1 2 1 1 2 1 1 2 1 2( , ) ( , ) 0.1 ( , ) N(0,1), , [ 2 , 2 ]f x x f x x f x x x x       (5.17) 1 1 2 1 1 2 1 1 2 1 2( , ) ( , ) 0.5 ( , ) N(0,1), , [ 2 , 2 ]f x x f x x f x x x x        (5.18) where N(0,1) represents a normal distribution with mean 0 and standard deviation 1. Equation 5.17 creates samples with a relatively low level of noise while Equation 5.18 creates samples with a relatively high level of noise. As the variance of noise is dependent on the magnitude of true function value at each point, the samples created from these two equations exhibit heteroscedasticity. For Equation 5.17, the standard deviation of the output at point 1 2( , )x x is 1 1 20.1 ( , )f x x , while the standard deviation would be 1 1 20.5 ( , )f x x for Equation 5.18. To investigate the impact of sample size on the prediction accuracy, two different samples are generated using LHS separately, with a small sample including 25 points and a large sample including 60 points. Prior knowledge regarding the variance of noise at each point is needed by the H-SVR model. As in real world applications, evaluation through simulation usually costs significant amount of time, we mimic the situation with high computational cost and constrain the number of evaluation for these simple test functions. Each point in the samples by either Equation 5.17 or Equation 5.18 is allowed to be evaluated three times to calculate the mean and variance of the corresponding output. The mean output, the sample variance of the mean output and the location of the corresponding points will then be used to construct the surrogate models. As each point is only evaluated three times, the computed mean and variance may not be reliable estimates of the real distribution attributes. To achieve a robust comparison between the performance of the H-SVR model and the regressing Kriging model, 10 random seeds are used for the three-time 108 evaluation of each sample point. Thus for both the small sample and the large sample, the whole process including creation of 3-replication dataset, surrogate model estimation and validation is repeated 10 times. After the surrogate models are constructed, an independent denser validation data set with 576 points is generated, and the true function value at each of the validation point is evaluated through Equation 5.16. Predictions at the validation points will be obtained using the surrogate models. Four MoEs are chosen to evaluate the performance of the two types of surrogate models, including RMSE, MAE and EGOp that are introduced in section 3.3, and a new MoE named absolute error of optimal value (AEOV). * * * *ˆˆ ˆAEOV ( ) , argmin ( ), min ( )f f f f f   x x x x (5.19) where f is the true function, fˆ is the surrogate function, *xˆ is the predicted global optima, and *f is the true optimal value. After the MoEs for each surrogate model is computed, the average performance for the 10-time model estimation and validation process is then compared between the H-SVR model and the regressing Kriging model. To investigate the impact of problem dimension on the performance of two types of models, we test a second problem, which is a six-dimension function, belonging to the Hartman’s families. 4 6 2 6 2 1 1 ( ) exp ( ) , ,0 1i ij j ij j i j f c a x p x             x x  (5.20) Similarly, the outputs with different level of noise variance are generated by equation 5.21 and 5.22. 109 6 2 2 2( ) ( ) 0.1 ( ) N(0,1), , 0 1,1 6jf f f x j      x x x x   (5.21) 6 2 2 2( ) ( ) 0.5 ( ) N(0,1), , 0 1,1 6jf f f x j      x x x x    (5.22) The parameters for this six-dimension test function are as follows: 10.0 3.0 17.0 3.50 1.7 8.0 0.05 10.0 17.0 0.10 8.0 14.0 3.0 3.50 1.70 10.0 17.0 8.0 17.0 8.0 0.05 10.0 0.1 14.0 ija         (5.23) 1.0 1.2 3.0 3.2 ic         (5.24) 0.1312 0.1696 0.5569 0.0124 0.8283 0.5886 0.2329 0.4135 0.8307 0.3736 0.1004 0.9991 0.2348 0.1451 0.3522 0.2883 0.3047 0.6650 0.4047 0.8828 0.8732 0.5743 0.1091 0.0381 ijp         (5.25) The global optima of this function is * (0.20169, 0.150011, 0.47687, 0.275332, 0.311652, 0.6573)x , *( ) 3.32237f  x . For the second test function, the small sample includes 65 points and the large sample includes 380 points. All the procedures of constructing the surrogate models and evaluating the performance of the surrogate models are the same as the case of the first test function, except that the independent validation sample for this function includes 729 points. Overall, three aspects of the problems are considered for the numerical test, which are problem dimension, sample size and variance level. Thus there are in total 8 scenarios to be investigated. The average performance of the two types of surrogate 110 models under different test functions, different sample sizes and different variance levels is listed in Table 5-1. The better performance regarding the four MoEs under each scenario is highlighted in bold in Table 5-1. In general, the MoE RMSE evaluates the overall prediction accuracy of the two surrogate models over the entire research domain, while MAE, EGOp and AEOV measures their capability of estimating the function value accurately at some specific small regions. For the current numerical study, as the validation sample is selected using the space filling algorithm LHS and no evolutionary process (e.g. global optimal infill) is performed, the MoE RMSE evaluating the performance over the entire domain are of more concern. Test Function Sample Size Variance Level Heteroscedastic SVR Model Regressing Kriging Model RMSE MAE EGOp AEOV RMSE MAE EGOp AEOV 2-variable function small low 2.33 7.10 -7.89 1.81 2.42 7.26 -7.89 0.88 high 2.68 7.62 -8.03 3.38 2.55 7.40 -9.40 2.41 large low 0.80 7.00 -9.09 0.07 0.98 7.10 -7.43 3.99 high 1.52 5.17 -6.22 1.77 1.56 5.82 -7.58 3.31 6-variable function small low 0.24 1.90 -2.01 0.85 0.29 1.94 -1.83 0.62 high 0.49 2.49 -1.25 1.34 0.30 2.08 -1.82 1.87 large low 0.14 1.31 -3.02 0.48 0.18 1.65 -3.14 0.49 high 0.44 2.14 -1.73 1.01 0.26 1.70 -2.38 0.98 Table 5-1: Performance of the Heteroscedastic SVR Model and Regressing Kriging Model. It’s very clear from Table 5-1 that the H-SVR model always outperforms the regressing Kriging model in terms of overall prediction accuracy when the variance level of the data is relatively low. For the four low-variance scenarios, ANOVA is 111 conducted to test the statistical significance of the difference between the RMSE predicted by the H-SVR model and the regressing Kriging model. The p value is computed to be 0.000, 0.015, 0.025 and 0.000 for the 2-dimension small-sample, 2- dimension large-sample, 6-dimension small-sample and 6-dimension large-sample cases, respectively. The differences in RMSE for the two models in all of the four low-variance cases are all significant at the 95% confidence level. In general, the increase of sample size would enhance the performance of both surrogate models under the low variance cases. Worse performance should be expected when the variance level is increased. However, the comparison between the two surrogate models becomes more complicated if the variance level is relatively high. Each of the two surrogate models does not show consistent better performance than the other. An interesting finding is that the performance of the H-SVR model varies significantly among the 10 sets of input-output data for the same scenario, while the regressing Kriging model provides quite consistent performance using the same 10 sets of data. The RMSEs for both of the two surrogates using 10 sets of data under the case of 6-variable function with a high variance level are plotted in Figure 5-3. The H-SVR model is shown to be rather sensitive to the generated random outputs of the samples. Thus it’s not a promising option to deal with problems with a high variance level. 112 (a) (b) Figure 5-3: Validations RMSE for the 6-Variable Function with a High Variance Level, (a) Small Sample; (b) Large Sample. 0 2 4 6 8 100.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Va lid atio n R MS E Sample Regressing Kriging Heteroscedastic SVR 0 2 4 6 8 100.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 Va lid ati on RM SE Sample Regressing Kriging Heteroscedastic SVR 113 5.4 Application The real world application to be investigated for this chapter is still the ICC dynamic pricing optimization problem. And the objective of the problem is to minimize network wide average trip travel time for vehicles departing during the morning peak period (6-9 am). The variance level of the ICC simulation output has been analyzed in section 5.1. Figure 5-1 shows that the coefficient of variation of the outputs for all the initial samples differs from 0 to 0.1. According to the conclusion from the numerical test, the H-SVR model outperforms the regressing Kriging model for heteroscedastic data when the variance level is low. Thus the H-SVR model should be suitable for solving this real world application problem. Based on the 3-time evaluation of the 67 initial samples, both the H-SVR model and the regressing Kriging model are constructed, and their capability of predicting optimal solutions is compared. The optimal solution found by the H-SVR model is * T[$ 0.30, $ 0.37, $ 0.00, $ ,0.00 0.$ ,00 1.5%]SVR x , and the estimated network average travel time under this toll plan is * 19.35 minSVRTT s . This toll plan is then evaluated through 10-time DynusT simulation runs, and the average of the network mean travel time is * 18.37 minSVRTT s , and the standard deviation is * 0.35 minSVR s  The regressing Kriging model is also constructed to approximate the response surface. However, after an extensive search for global optima based on the Kriging model using the genetic algorithm, the global optima is found to lie at the boundary 114 domain, with the toll rate of all the ICC sections to be 0. The estimated network mean travel time is * m20.19 inKrigingTT s . The actual network average travel time under 0 tolls is * 19.16 minKrigingTT s according to the 10-time simulation evaluation at this point, and the standard deviation is * 0.29 minKriging s  . We perform the ANOVA for the 10-time simulation results of the optima predicted by the H-SVR and the regressing Kriging models, respectively. The F statistic is 3.40 and the corresponding p value is 0.08, which means that the network wide average trip travel times in response to the two optimal toll plans predicted by different surrogate models are statistically different at the 90% confidence level. Figure 5-4 shows the variations of traffic volume on the five tolled segments of the ICC during the peak period under the two optimal toll plans predicted by the H- SVR and the regressing Kriging model as well as the baseline toll. Traffic flow for different directions is depicted separately. The traffic volume on the ICC sections is very low under the baseline toll plan. It’s almost always lower than 1,000 vehicles per hour. As the ICC has 3 lanes per direction, the capacity is definitely not used sufficiently. Traffic volume under the two optimal toll plans is much higher on all of the 5 segments than the baseline case. This change is expected since the toll rate on all of the segments is reduced. It’s not surprising that the traffic volume is the highest when the ICC is totally free. 115 Figure 5-4: Traffic Volumes of ICC Segments under Optima and Baseline Tolls. In addition to the comparison of traffic volume on the tolled links, the network level performance under the 3 toll plans is also analyzed. The average travel time for vehicles departing within every 5-minute interval is depicted in Figure 5-5. The average travel time is significantly reduced under both the optimal toll plan predicted by the regressing Kriging model and the H-SVR model, especially during the most congested period (7:30-9:00 am). The reduction of average travel time is around 30% during this period. Although the optimal solution predicted by the regressing Kriging 116 model makes the ICC road free, and introduces a lot of traffic onto the ICC, the network wide traffic condition is not as efficiently improved as the solution given by the H-SVR model. A possible reason is that most of the traffic diverted onto the ICC is short trips, and these trips do not contribute a lot to the network wide performance. Those long trips are not attracted to the ICC and stay jammed on their usual routes. Figure 5-5: Average Travel Time under Optima and Baseline Tolls. As the traffic condition is improved during the most congested period (7:30-9:00 am) under the two optimal toll plans, more vehicles are able to arrive at their destination during the peak period. The throughput for the whole network increases significantly for the period 7:30-9:00 am, and the total throughput for the morning peak period increases by about 11% under the optimal toll predicted by the regressing Kriging model. The increase in throughput is around 14% for the optimal 117 toll predicted by the H-SVR model. The adjustment of toll rates for one freeway in the regional network has successfully led to a more efficient usage of the road network capacity. As the average travel time is reduced and the overall throughput is increased at the same time, the total amount of travel time saving for all the network users during the morning peak hours is very huge. (b) Vehicle Throughput (c) Cumulative Vehicle Throughput Figure 5-6: Throughput under Optima and Baseline Tolls, (a) Vehicle Throughput; (b) Cumulative Vehicle Throughput. 5.5 Conclusions Most existing surrogate models that can approximate expensive-to-evaluate simulation response surfaces implicitly assume identical distribution of simulation noise over the entire feasible domain. However, via the observation of the simulation output of the regional network for our case study, we find that the noise of the 118 transportation simulation exhibits very significant heteroscedasticity. To take into account heteroscedastic noise in the sample data, an enhanced SVR method (H-SVR) is proposed in this chapter, which allows different levels of error tolerance throughout the feasible domain according to the level of variance at the specific locations. To test the effectiveness of using the H-SVR model to approximate heteroscedastic simulation output, a complete set of numerical test is designed, which includes 8 scenarios characterized by problem dimension, variance level and sample size. The performance of the H-SVR model is then compared to that of the regressing Kriging model. The test results show that the H-SVR model outperforms the regressing Kriging model in terms of overall prediction accuracy when the variance level of the noisy data is relatively low. However, when the level of variance in the data is relatively high, both of the two surrogates cannot provide satisfying performance. The H-SVR model is then applied to the application of optimizing the dynamic pricing of the ICC road as introduced in Chapter 4. The objective of this problem is set to be minimizing network wide average trip travel time for vehicles departing during the morning peak period. The optimal toll plan predicted by the H- SVR model is found to outperform that predicted by the regressing Kriging model (best surrogate model identified in the study introduced in Chapter 3). Overall, under the optimal toll plan predicted by the H-SVR model, the network wide average trip travel time for the peak period is reduced by around 20% compared to the baseline case, and the total vehicle throughput is increased by 14%. 119 However, no updating process is introduced when constructing the H-SVR models in both the numerical study and the real world application. The reason is that SVR models can only provide predicted value at unevaluated design points. Information such as the variance of the prediction at a particular point is not available. Therefore, the global optimal infill strategies used in Kriging models cannot be applied. In this case, the capability of the H-SVR surrogate model in finding the true optimal solution (or near-optimal solution) is heavily dependent on the choice of initial samples. This is the main limitation of the proposed H-SVR method. 120 Chapter 6: Bayesian Stochastic Kriging for Simulations with Heteroscedastic Errors To deal with the heteroscedastic noise in transportation simulation outputs, we propose another surrogate model in this chapter, which is named Bayesian stochastic Kriging model. Similar to the ordinary Kriging, this model can provide prediction of the distribution of output at any unevaluated design point. Thus the global optimal infill strategies introduced in section 3.2.3 can be easily applied. Moreover, to further illustrate the power of the simulation-based optimization approach in transportation research, a new problem of integrated optimization of transportation planning and operations decisions is developed as the case study in this chapter, which is very difficult to be formulated as an analytical model and thus is very hard to be solved using pure mathematical methods. 6.1 Bayesian Stochastic Kriging Model To account for both sampling and response-surface uncertainty that is inherent to a stochastic simulation, stochastic Kriging method can be incorporated with heteroscedastic noisy data. Ankenman et al. (2010) provides a mathematical foundation for stochastic Kriging that extends the power of Kriging metamodeling for deterministic computer experiments to modeling responses from heteroscedastic, stochastic simulations. Although the ordinary Kriging treats the unknown response surface as a Gaussian random field that exhibits spatial correlation, the method 121 assumes measurement errors to be independent and identically distributed, while stochastic Kriging accounts for unequal variances and correlation of the random errors across the design space in stochastic simulation (Chen et al., 2012). In this chapter, we enhance the stochastic Kriging model by incorporating parameter uncertainties. The proposed Bayesian stochastic Kriging model is derived in a Bayesian analysis framework which endeavors to estimate parameters of an underlying distribution-based on the observed distribution. 6.1.1 Model Formulation Suppose the unknown true response at point x is stochastic and following a probabilistic distribution ( )g x , and the mathematical expectation of the true response satisfies E[ ( )] ( )g x f x , where f is the true mean response surface. The expected square prediction error at x is 2E[( ( )) ]y g x . The stochastic Kriging method predicts a response by summarizing a linear model and a high frequency variation component that represents fluctuations around the trend. In this study, we consider the following stochastic Kriging model T( ) ( ) ( ) ( ), 1,2r ry Z r   x q x β x x  (6.1) where ( )ry x is the observed response obtained from the r th simulation replication at point x , T1( ) [ ( ), ..., ( )]mq qq x x x is an 1m vector of known regression basis functions (e.g. polynomial functions) of x , and β is an 1m vector of unknown weight parameters of each regression basis function. The stochastic nature of ( )Z x is referred as extrinsic uncertainty (Ankenman et al., 2010) because it is imposed to 122 reconstruct the metamodel. ( )r x is the intrinsic uncertainty that is only associated with simulation random noises. A series of simulation replications 1 2{ ( ), ( ), } x x  at the same point x are assumed to be independent and identically distributed. The Kriging surrogate approach assumes that the joint distribution of (0) 0 ( )f f x and (1) (2) ( ) T[ ( ), ( ), , ( )]nf f ff x x x is a ( 1)n -multivariate Gaussian distribution given by T T 0 20 1 1~ ,n zf                      q ψβf Q ψ Ψ (6.2) where T T (0)0 ( )q q x is an 1 m vector of regressors at (0) kx  . Take the n m matrix of regression basis functions as (1) (2) ( ) T[ ( ), ( ), , ( )]nQ q x q x q x , the ( , )i j th element of which is ( )( )ijq x for 1 i n  , 1 j m  . To reduce the influence of simulation random noises, multiple replications of simulations are necessary for each design point ( )ix , because the response mean and variance are given by ( ) ( ) 1 1( ) ( ), 1, ,i R i i r ri y y i nR  x x  (6.3) 2 ( ) ( ) ( ) 2 1 1( ) ( ( ) ( )) , 1, ,1 iRi i i r ri s y y i nR    x x x  (6.4) where ( )( )iy x is the sample mean at ( )ix , iR is the number of simulation replications at ( )ix , the mean response vector is (1) (2) ( ) T[ ( ), ( ), , ( )]ny y yy x x x , 2 ( )( )is x is the unbiased sample variance of iR simulation replications. So the variance estimation of the mean response is 2 ( ) 2 ( )( ( )) ( ) /i i is y s Rx x . 123 Similar to the n n extrinsic covariance matrix of 2z Ψ , let Σ be the n n intrinsic covariance matrix implied by the simulation noise. To simplify the covariance structure of simulation noises and keep its heteroscedasticity feature, the estimation of Σ can be given by 2 (1) 1 2 (2) 2 2 ( ) ( ) / 0 0 0 ( ) / 0ˆ 0 0 ( ) /n n s R s R s R              x xΣ x        (6.5) Lemma 1 (Conditional distribution of the multivariate Gaussian distribution). Suppose that T0[ , ]f f follow the ( 1)n dimensional multivariate Gaussian distribution 1 n , i.e. TT 0 20 1 2 1~ , /n z z f                      ψq βf ψ Ψ ΣQ (6.6) then the conditional distribution of 0f given f is an univariate Gaussian distribution 1 given by T T 2 1 2 T 2 1 0 1 0| ~ ( / ) ( ), [1 ( / ) ]z z zf           f q β ψ Ψ Σ f Qβ ψ Ψ Σ ψ (6.7) Proof. Consider the ( 1) ( 1)n n   matrix T 2 2 1 /z z        ψΣ ψ Ψ Σ , according to the Sherman-Morrison-Woodbury matrix inverse formula (Woodbury, 1950), we have T 2 1 1 T 2 T 1 1 2 2 1 T 2 1 1 2 T 1 (1 ( / ) ) ( / )1 ( / ) (1 ( / ) ) ( / ) z z z z z z                                  ψ Ψ Σ ψ ψ Ψ Σ ψψΣ Ψ Σ ψ ψ Ψ Σ ψ Ψ Σ ψψ (6.8) 124 To diagonalize the covariance matrix Σ of the multivariate Gaussian distribution, we conduct the LDU decomposition (Eaton, 1983) as follows T 2 1 T 2 2 1 ( / ) 0 0 / z z z           ψ Ψ Σ ψΛ ΣΛ Ψ Σ (6.9) where 2 1 1 0 ( / ) 1z         Λ Ψ Σ ψ . Since that T0[ , ]f f follow the multivariate Gaussian distribution, we premultiply TΛ in Equation 6.6, then T 0T T T0 +1~ ,nf               qΛ Λ β Λ ΣΛf Q (6.10) i.e. T 2 1 T T 2 1 T0 0 +1 ( / ) ( / )~ ,z znf                      ψ Ψ Σ f q β ψ Ψ Σ Qβ Λ ΣΛf Qβ (6.11) The covariance matrix has now been diagonalized. This is useful because zero covariance implies independence for normally distributed random variables and so it follows that T 2 1 T T 2 1 2 T 2 1 0 1 0( / ) ~ ( / ) , [1 ( / ) ]z z z zf               ψ Ψ Σ f q β ψ Ψ Σ Qβ ψ Ψ Σ ψ (6.12) When we move the second term on the left hand to right in Equation 6.12, it follows the Gaussian distribution conditional on f given by Equation 6.7. Lemma 2. For any m m symmetric and positive definite matrix βΣ , and 1m vector v , if the probability density function and an 1m multivariate random variable β satisfies  T 1 T( ) expp   ββ β Σ β v β , then  ~ ,m β ββ Σ v Σ . 125 Proof. Consider an 1m multivariate random variable w that has a multivariate Gaussian probability density function  ,m μ Σ , i.e. T 1 /2 1/2 1 1( ) exp ( ) ( )(2 ) det( ) 2mp        w w μ Σ w μΣ (6.13) Calculate integrals in terms of w on both sides of Equation 6.13, we have /2 1/2 T 1 T 1 T 1 T 1 1(2 ) | | exp ( ) ( ) d2 1 1exp ) d2 2 m m m                     Σ w μ Σ w μ w w Σ w μ Σ w μ Σ μ w   (6.14) so that T 1 T 1 /2 1/2 T 11 1exp ) d (2 ) det( ) exp )2 2m m               w Σ w μ Σ w w Σ μ Σ μ (6.15) Substitute β for w , βΣ for Σ , and Tv for T 1μ Σ , so we have  βμ Σ v . Since  T~ ,mw μ Σ , then  ~ ,m β ββ Σ v Σ . Lemma 3. Suppose that 2z , Ψ , Σ are known, then for an arbitrary β priori, the best linear unbiased predictor, which minimizes the mean squared prediction error between the linear predictor of the response at (0)x , is T T 2 1 0 0ˆ E[ | ] ( / ) ( E[ | ])zf      q β f ψ Ψ Σ y Q β f (6.16) Specifically, when β has a non-informative priori, i.e. ( ) 1p β , the best linear unbiased predictor of the response at ( ) 1p β is T T 2 1 0 0ˆ ˆ ˆˆ( / ) ( )zf      q β ψ Ψ Σ y Qβ (6.17) where T 2 1 1 T 2 1ˆ [ ( / ) ] ( / )z z      β Q Ψ Σ Q Q Ψ Σ y . 126 Proof. First, we prove that Equation 6.16 is an unbiased predictor of 0f . Using Lemma 1, we have 0 0 0 T 2 T 2 1 0 T T 2 1 0 E[ ] E[ | ] E[E[ | , ] | ] E[ ( ) ( ) | ] E[ | ] ( / ) ( E[ | ]) z z z f f f                  f f β f q β ψ Ψ Σ y Qβ f q β f ψ Ψ Σ y Q β f (6.18) So Equation 6.16 is the unbiased predictor of 0f , i.e. 0 0ˆE[ ] 0f f  . Second, fix an arbitrary unbiased predictor *0f , its mean squared prediction error is given by 2 * * 2 0 0 0 * 2 0 0 0 0 * 2 2 * 0 0 0 0 0 0 0 2 * 0 0 0 0 0 2 0 ( ) E[( ) ] ˆ ˆE[( ) ] ˆ ˆ ˆ ˆE[( ) ] ( ) 2E[( )( )] ˆ ˆ ˆ( ) 2E[( )( )] ˆ( ) s f f f f f f f f f s f f f f f s f f f f f s f                  (6.19) So 0ˆf is the best linear unbiased predictor that corresponds to the minimum mean squared prediction error. In particular, when β has a non-informative priori, i.e. ( ) 1p β , its conditional distribution given f is T 2 1 2 T T 2 1 T T 2 1 2 ( | ) ( | ) ( ) / ( ) 1exp ( ) ( / ) ( )2 1exp ( ( / ) 2 ( / ) )2 z z z z z p p p p                           β f y β β f y Qβ Ψ Σ y Qβ β Q Ψ Σ Qβ β Q Ψ Σ y (6.20) So Equation 6.20 satisfies the condition in Lemma 2, we have  ˆ ˆ( | ) ,mp  ββ f β Σ (6.21) 127 where T 2 1 1 T 2 1ˆ [ ( / ) ] ( / )z z      β Q Ψ Σ Q Q Ψ Σ y , T 2 1 1ˆ [ ( ) ]z    βΣ Q Ψ Σ Q . Substitute Equation 6.21 into Equation 6.16, i.e. ˆE[ | ] β f β , we have the best linear unbiased predictor of the response at (0)x is Equation 6.17. Theorem 1. When 2z , Ψ and Σ are known, (I) if ( ) 1p β on m , then the predictive distribution of the response at (0) kx  belongs to a Gaussian distribution, i.e. 20 | |1( | ) ( , )f fp f   f ff  , where T T 2 1 | 0 ˆ ˆ( / ) ( )f z     f q β ψ Ψ Σ y Qβ (6.22)   1T 02 2 T T| 0 21 , /m mf z z                   f q0 Qq ψ ψQ Ψ Σ (6.23) (II) if  0( ) ,mp β β B on m , then the predictive distribution of the response at (0) kx  belongs to Gaussian distribution, i.e. 20 | |1( | ) ( , )f fp f   f ff  , where T T 2 1 | 0 | |( / ) ( )f z     f β f β fq μ ψ Ψ Σ y Qμ (6.24) where T 2 1 1 1 T 2 1 1| 0[ ( ) ] [ ( ) ]z z          β fμ Q Ψ Σ Q B Q Ψ Σ y B β , and   12 1 T 02 2 T T| 0 21 , /zf z z                   f qB Qq ψ ψQ Ψ Σ (6.25) Proof. We first prove the informative β priori, i.e. the Gaussian distribution priori in (II), when | |B , the informative β priori reduces to the non-informative priori shows in (I). Given a known 2z , the conditional density of 0 |f f can be written as 128 0 0( | ) ( | , ) ( | )dmp f p f p f f β β f β (6.26) Consider the following conditional probability density function by applying Lemma 1, we have the first term in the right-side integrand of Equation 6.26  1 20 | , | ,( | , ) ,f fp f   f β f βf β  (6.27) where T T 2 1| , 0 ˆ| ( / ) ( )f z     f β q β ψ Ψ Σ y Qβ , 2 2 T 2 1| , ˆ[1 ( / ) ]f z z     f β ψ Ψ Σ ψ . According to the Bayes' theorem, we have the second term in the right-side integrand of Equation 6.26 as the following     T 2 1 T 1 0 0 1T T 2 1 1 T T 2 1 1 0 ( | ) ( )( | ) ( ) 1 1exp ( ) ( ) ( ) exp ( ) ( )2 2 1exp ( ) ( )2 z z z p pp p                                          y β ββ f f y Qβ Ψ Σ y Qβ β β B β β β Q Ψ Σ Q B β β Q Ψ Σ y B β (6.28) Since Equation 6.28 satisfies the condition in Lemma 2, we have  1 | |( | ) ,p  β f β fβ f μ Σ (6.29) where T 2 1 1 1 T 2 1 1 | 0ˆ[ ( ) ] [ ( ) ]z z          β fμ Q Ψ Σ Q B Q Ψ Σ y B β (6.30) T 2 1 1 1 | [ ( ) ]z      β fΣ Q Ψ Σ Q B (6.31) To simplify the notations in the followings, let 2| / z  Ψ Ψ Σ . Substitute Equation 6.27 and Equation 6.29 into Equation 6.26, and ignore constants of proportionality, we obtain 129 2 0 | , T 1 0 | | |2 | , T T 1 T T T 1 T 0 0 2 | , T 1 2 T 1 T 1 T 0 0 0 2 2 | , | , T 1 | ( ) 1( | ) exp ( ) ( ) d2 2 ( ) ( ) 2 ( ) ( )( )exp exp2 1 (2 m f f f f f fp f f f                                       f β β f β f β f f β f β f β f β β f f β μ Σ β μ β β ψ Ψ Q q ψ Ψ Q q β ψ Ψ y ψ Ψ y ψ Ψ Q q β β Σ β       T 1 T 1 | | | | | 1T 1 2 T T 1 T0 |2 2 | , | , d 2 ) ( ) 1exp exp + d2 2 m m f f f                                             β f β f β f β f β f β f f β f β β μ Σ μ μ Σ β ψ Ψ y hhβ Σ β ν β β   (6.32) where T T 1 T0|  h ψ Ψ Q q , T 1 10 | |2 | , ( ) f f      β f β f f β h ψ Ψ yν Σ μ . By Lemma 2, we can further derive Equation 6.32 as follows     T 1 2 /2 1/2 T0 0 2 | , 2 T 1 0 02 2 | , | , 1T 1 T T 2 1 T 1 0 | , | 02 | , 1T 1 T T 2 1 0 | , | |2 | , ( ) 1( | ) (2 ) det( ) exp 2 2 2 1exp ( ) + ( )2 1 ( ) + m f f f f f f f f fp f f f f f f                               f β f β f β f β β f f β f β β f f β ψ Ψ yf Σ ν Σν ψ Ψ y ψ Ψ y h hh Σ h ψ Ψ y ψ Ψ y h hh Σ                 2 1 , | | 2 12 T0 | , | 1 1T 1 2 T T T 2 1 1 | , | 0 | , | | | 0 2 1 12 T T 1 T 2 T0 | , | | | , | 0 2exp + exp 2 f f f f f f f f f f                                             f β β f β f f β β f f β β f f β β f β f β f f β β f β f f β β f Σ μ h Σ h ψ Ψ y h Σ h h hh Σ Σ μ h Σ h ψ Ψ y h μ h Σ h   (6.33) 130 where we apply the Sherman-Morrison-Woodbury matrix inverse formula (Woodbury, 1950) to derive the inverse matrices in Equation 6.33 as follows    1 1T 2 1 2 2 T 2 T 2| , | | , | | , | | , | | , |+ 1f f f f f           f β β f f β β f f β β f f β β f f β β fhh Σ Σ Σ h h Σ h h Σ (6.34)    1 1T T 2 1 1 2 T T 2 T| , | | | | , | | | , |+ [1 ]f f f       f β β f β f β f f β β f β f f β β fh hh Σ Σ μ h μ h Σ h h Σ h (6.35) We track the "decision variable" 0f in Equation 6.33, again, by Lemma 2, we conclude that 20 | |1( | ) ( , )f fp f   f ff  by substituting 2| ,f f β , |β fΣ and |β fμ , thus     12 T T 1 T 2 T| | , | | | , |f f f      f f β β f β f f β β fh Σ h ψ Ψ y h μ h Σ h (6.36) 2 2 T | | , |f f  f f β β fh Σ h (6.37) Then substitute Equations 6.27, 6.30 and 6.31 into Equations 6.36 and 6.37, we obtain the results of Equations 6.24 and 6.25. So we the case of informative β priori is proven. If |B or 1 m m B 0 , the non-informative β priori leads to the results shown in Equations 6.22 and 6.23. Note that the priori of β can be estimated by the first-round simulations across all design points. The least squares error estimations of  0 ,β B are T 1 T 0 1ˆ ( )β Q Q Q y and T T 1 1 0 1 0ˆ ˆ( ) ( )( )ˆ n m    y Qβ y Qβ Q QB , where (1) (2) ( ) T 1 1 1 1[ , , , ]ny y yy  is the first-round simulation output vector of all n points. 6.1.2 Model Evaluation As Kriging surrogate models provide predictive distributions for the outputs of any design points, we use the Kullback-Leibler divergence (Kullback and Leibler, 131 1951) as the performance measure. It is also called relative entropy, which is a non- symmetric measure of the difference between two probability distributions. Specifically, the Kullback-Leibler divergence of an unknown multivariate Gaussian distribution True from an estimated multivariate Gaussian distribution Est , is a measure of the information lost when using Est to approximate True . The analytical derivation of the Kullback-Leibler divergence of Gaussian distributions is given by 1 T 1 Est KL Est True True Est True Est True True Est True det( )1( || ) tr( ) ( ) ( ) ln2 det( )D n              ΣΣ Σ μ μ Σ μ μ Σ  (6.38) where KLD is the Kullback-Leibler divergence, Trueμ and Estμ are the mean values of True and Est , respectively, TrueΣ is the nonsingular covariance matrix of True , and the estimated covariance matrix of Est is given by 2 Est ˆ ˆˆ= z  Σ Ψ Σ (6.39) 6.2 Numerical Example Is it beneficial to include both parameter uncertainties and heteroscedastic simulation noise into the ordinary Kriging surrogate model? In this section, we work with a simplified transportation network to gain some insights of the Bayesian stochastic Kriging model using the DynusT simulator. 132 To illustrate the methodology developed in this chapter and compare it with existing models, we set up a toy network to simulate how travelers’ value of time (VOT) influence the route choice behaviors. Our purpose in this section is three-fold: To provide some intuition about what the Bayesian stochastic Kriging method does on approximating the black-box function of a simulation-based dynamic traffic assignment problem; to assess the parameter uncertainty; and to evaluate the estimation robustness by comparing the heteroscedastic errors in our model. 6.2.1 Toy Network The toy network is depicted in Figure 6-1. Overall, the network consists of 10 links and 6 nodes. There is only one origin-destination (OD) pair from node 1 to node 3 in this network. Two major parallel corridors serve for the travel demand between the OD pair. The route 1-2-3 is a freeway corridor with two general-purpose lanes and one High-occupancy vehicle (HOV) lane, while the route 1-4-5-6-3 is an arterial with 2 lanes. Vehicles may change their route choices between the freeway and the arterial under different VOT via links 8 and 9. The configuration of the links is illustrated in Table 6-1. Figure 6-1: Numerical Illustration Network. 133 Link 1 2 3 4 5 6 7 8 9 10 Length (mile) 45 45 45 45 30 30 0.2 0.2 0.2 0.2 Lanes 2 2 1 1 2 2 2 2 2 2 Speed limit (mph) 60 60 60 60 40 40 40 40 40 40 Free flow travel time (min) 45 45 45 45 45 45 0.3 0.3 0.3 0.3 Capacity (veh/lane/hour) 1500 1500 1500 1500 1000 1000 1500 1500 1500 1500 Table 6-1: Input Data for a Small Road Network. This network is coded into DynusT, and simulation-based dynamic traffic assignment is applied to capture the route choice behavior. Five-hour dynamic travel demand between the OD pair is set up as in Table 6-2. To allow all vehicles to dissipate, we simulate the network for 7 hours, with no travel demand during the last two hours. Time (hour) SOV a HOV 0 - 1 3000 300 1 - 2 5000 500 2 - 3 5800 850 3 - 4 4700 950 4 - 5 2800 350 a SOV: single occupancy vehicles. Table 6-2: Dynamic Travel Demand. In this numerical example, the average travel time for all vehicles is selected as the dependent variable, and the surface to be approximated is the response of average travel time to VOT. Due to the effect of random seeds and the stochastic 134 route choice behavior, the output of the DynusT model is noisy. To create the real response surface, we first generate a uniformly distributed sample of VOT, and then run 100 replications of simulations for each specific input of VOT. The mean of the output from the 100 replications is assumed to be the true response. This true response would be used for validating the surrogate models. In addition, we also consider the variation of the output. Thus the distribution of the output from the 100 replications of simulations for each VOT value is utilized to validate the estimated distribution of response value from the surrogate models. For the estimation of the Bayesian stochastic Kriging model, results from the first 10 replications of simulation for each corresponding VOT value are utilized. Moreover, to verify the advantageous of incorporating parameter uncertainties and heteroscedastic simulation noise into the surrogate model, we estimate a regressing Kriging model with the same input as that for the Bayesian stochastic Kriging model, and compare their performance on approximating the real response surface. 6.2.2 Numerical Results To compare the developed Bayesian stochastic Kriging model with existing surrogate-based optimization approaches, e.g. quadratic polynomial response surface method, ordinary Kriging for deterministic input-output relationship, and regressing Kriging, the first 10 replications are used to estimate the heteroscedastic simulation errors at different design points and the entire 100 replications are used to approximate the true response distributions, respectively. 135 Goodness-of-fit Quadratic Polynomial Ordinary Kriging Regressing Kriging Bayesian stochastic Kriging KLD N/Aa N/A 65.58 38.33 RMSE 1.90 0.19 0.87 0.20 MAE 5.67 0.59 2.70 0.56 NRMSE 1.89% 0.19% 0.86% 0.20% NMAE 3.10 0.33 1.52 0.31 PCC 0.00 0.99 0.98 0.99 a N/A: not applicable. Table 6-3: The Goodness-of-Fit of Surrogate Models. Table 6-3 compares results of the four models with six MoEs. It should be pointed out that the Kullback-Leibler divergence measures the difference of two probability density functions. It is an important performance measure to evaluate the surrogate function of a stochastic simulation optimization problem. At 30 design points (VOT increases from US$ 1/hour to US$ 30/hour with a step of US$ 1/hour), the ordinary Kriging method predicts zero errors to their observed mean values, while both regressing Kriging and Bayesian stochastic Kriging predict non-zero standard errors. For this stochastic simulation numerical example in the synthetic network, we can see that the Kullback-Leibler divergence of the Bayesian stochastic Kriging is smaller than regressing Kriging, indicating that the proposed model generates better predictions given heteroscedastic data. Table 6-3 also shows estimation errors by comparing the prediction values with true objective function (estimated by 100 replications of simulations) over these 30 design points. The RMSE, MAE, NRMSE, NMAE, PCC values are used for the mean output evaluation. We can see that the 136 Bayesian stochastic Kriging results in similar accuracy as the ordinary Kriging in mean value prediction. (a) (b) Figure 6-2: Comparison of Surrogate Models, (a) Regressing Kriging; (b) Bayesian Stochastic Kriging. 0 5 10 15 20 25 3095 100 105 110 VOT ($) Tra vel tim e (m in) Mean estimated travel time Std. estimated travel time Mean simulated travel time Std. simulated travel time 0 5 10 15 20 25 3095 100 105 110 VOT ($) Tra vel tim e ( mi n) Mean estimated travel time Std. estimated travel time Mean simulated travel time Std. simulated travel time 137 Figure 6-2 shows the surrogates of regressing Kriging and Bayesian stochastic Kriging corresponding to Table 6-3. These results verify the capability of the proposed Bayesian stochastic Kriging surrogate model in estimations of both mean values and standard errors given heteroscedastic noisy simulation input-output relationships. 6.3 Application of Integrated Corridor Planning and Operational Optimization 6.3.1 Research Problem This application aims to jointly optimize a transportation planning policy (i.e. HOT toll) and an operations strategy (i.e. DMS) for a freeway-arterial corridor network. Both of the two strategies have been separately investigated extensively in existing literature. HOT lanes reserve a set of freeway lanes HOV, while allowing low- occupancy vehicles to enter for a toll. A good pricing scheme of HOT facilities would increase the HOT lane usage and relieve congestion on the parallel general-purpose lanes. Benefits of HOT lanes in travel time reduction, freeway efficiency improvement, and bottleneck congestion mitigation have been widely studied in multiple region-specific case studies (Burris et al., 2009; Goel and Burris, 2012). Yin and Lou (2009) delivered a reactive self-learning approach for determining time- 138 varying tolls in response to the detected traffic arrivals. A Logit lane choice model was further adopted by Lou et al. (2011) to specify toll rates that maximized the freeway’s throughput. Gardner et al., (2013) modeled the choice process of individual drivers to pay and take the HOT lane, which could be helpful for the determining the toll setting. DMS is the most common way to provide real time travel information to drivers and encourage diversion before they approach the congested areas. The theory of random utility maximization has been applied in Ben-Akiva and Lerman (1985) and Khattack et al. (1995), while other research adopted if-then rules based on fuzzy logics to model the diversion in response to information (Paz and Peeta, 2009; Xiong and Zhang, 2013). In previous research, the optimization of diversion rate has been analyzed through heuristic methods. Jacob et al. (2006) proposed a reinforcement learning approach to investigate the optimal diversion control for an express/collector corridor affected by work zones. Chen et al. (2005) developed a simulated annealing- based algorithm to search for the optimal alternative route and diversion rate for a two-lane highway resurfacing project. In addition to the theoretical models, the diversion rate in response to DMS was investigated with field data in Horowitz et al. (2003). However, the focus of this study is not to analyze the actual diversion rate in response to DMS. Instead, we assume that the relationship between diversion rate and the information provided through DMS is already clear, and the objective of this study is to search for the optimal diversion rate that creates the best system performance. 139 The major challenge of jointly optimizing these two corridor management strategies is that it is very hard to formulate the objective function in a mathematical equation, as the two strategies concern about the analysis of travel behavior at different level. While on the other hand, simulation can conveniently evaluate the impact of the combined strategies to the system performance. Therefore, we take advantage of simulation, and apply SBO method to solve this problem. 6.3.2 Study Area The study freeway/arterial corridor is along a 15.50-mile freeway segment of Interstate 270 (I-270 is known as the Washington National Pike) in the Montgomery County, State of Maryland. It lies between I-495 (the Capital Beltway) north of Bethesda, and Maryland Route (MD 124) in the city of Gaithersburg, Montgomery County. The I-270 consists of a 12.40-mile mainline as well as a 2.10-mile spur that provides access to and from southbound I-495. The freeway corridor heads northwest from an interchange with I-495 and MD 355 (Rockville Pike) in suburban Bethesda as an up to twelve-lane freeway with a 55 mph speed limit. The left lane on each side is used as a HOV lane in the northbound direction between 15:30 and 18:30 weekdays and in the southbound direction between 6:00 AM and 9:00 AM weekdays. I-270 takes on a local-express lane configuration with the outer two lanes serving as local lanes and the inner three lanes and the HOV lane serving as express lanes. 140 Figure 6-3: Simulation Network of I-270 Freeway/Arterial Corridor. The urban network of arterials and freeways is coded into DynusT. All freeways and major arterials illustrated in Figure 6-3 are included in the network, 141 which has 61 traffic analysis zones, 435 nodes and 766 links. Three modes of dynamic OD matrices, i.e. SOV, HOV and trucks, are estimated based on demand data from the regional planning model (Zhang et al. 2012; Xiong et al., 2014). To investigate the impact of the two traffic management strategies (i.e. HOT toll and DMS) on travel behavior, we develop a work zone scenario in the aforementioned simulation network, which would create recurrent congestion in the corridor area. The effectiveness of jointly utilizing those strategies to divert traffic from congested areas can then be tested. The work zone scenario is simulated for a period of time from 14:00 to 19:00. As shown in Figure 6-3, the work zone is deployed on the north bound of I- 270 mainline, at the segment between the intersections to Montrose Road and Falls Road. Two lanes out of the initial five general-purpose lanes of this segment are closed for the road work. The initial HOV lane during the afternoon peak period (15:30-18:30) is converted to an HOT lane, allowing SOVs travelling through this lane with an additional monetary cost. Meanwhile, two DMSs are deployed at two major off-ramps of I-270 at the upstream of the work zone. These DMS can provide congestion warning to help divert traffic to the parallel arterial MD 355 before they approach to the work zone. Hereafter, we define the diversion rate as the proportion of drivers who make route choices based on their own perception of the congestion warning among all travelers. However, the response of travelers to the information provided by DMS is out of the scope of this study. Please refer to Paz and Peeta (2009) and Xiong and Zhang (2013) for more details. Instead, the objective is to search for the optimal diversion rate, which then serves as the goal for DMS. In 142 summary, the objective of the optimization problem is minimizing the average travel time for all finished trips during the 5-hour period, and the decision variables are HOT toll rate for the afternoon peak period (i.e. 15:30-18:30) and the independent diversion rate for the two DMS. The optimization problem is given by 3 1 2 3min E[ ( )] E[ ( , , )]f f x x x x x (6.40a) min maxs.t.  x x x (6.40b) where ( )f x represents the unknown true average trip travel time for all travelers of the corridor given the input x , 1x is the HOT toll rate, 2x is the diversion rate of the DMS next to the work zone, guiding travelers to Montrose Pkwy, 3x is the diversion rate of the DMS at the off-ramp to MD 187. So x is a three-dimensional decision variable vector. The box constraints are Tmin [0, ]0, 0x and T max 5.00 100[US$ , ,% ]100%x , which are lower and upper boundaries for planning and operational strategies, respectively. The lower bound collects no toll on the HOV lane, which equals to open the HOV lane to SOV in peak hours, while the upper bounds collects US$ 5.00 per SOV use of the HOT lane. 6.3.3 Simulation Demand and Supply Calibration This I-270 corridor network is a sub cut of the ICC regional network we introduced previously. Before using the simulation to evaluate system performance, the demand and supply of the simulated network should be calibrated. 143 Statistics SOV HOV Truck Total vehicles Number of vehicles 955,095 56,671 38,763 1,050,529 Average overall trip time (min) 8.26 8.48 6.94 8.23 Average trip times (min) 8.13 8.33 6.81 8.09 Average entry queuing time (min) 0.14 0.15 0.13 0.14 Average stop time (min) 0.69 0.62 0.60 0.69 Average trip distance (mile) 5.00 5.35 3.92 4.98 Average travel speed (mph) –a – – 36.93 a–: not available. Table 6-4: Summaries of the 24-Hour Simulation for the Corridor. We first simulated the travels of the corridor for the whole 24-hour weekday. A total number of 1,053,052 vehicles are loaded in the network during 24 hours. Field collections of urban street signal timing are also included in the network. From the 24-hour simulation results, we see that the largest vehicle volumes in the corridor are on the freeway I-270 and its parallel major arterial MD 355, as well as several other arterials (e.g. MD 28, MD 124 and MD 187). At the end of 20 iterations of DTA and mesoscopic simulation, the CPU time of operation is 572 min (CPU time of simulation is 348 min, CPU time of assignment is 224 min) with a 2.66 GHz-quad CPU and 4 GB-Ram computer. For the scenario after both demand and supply calibration, at the end of 20 iterations, a percentage of 99.76% vehicles complete their trips, see detailed vehicle statistics in Table 6-4. To calibrate the simulation model, we collect traffic flow data of 35 detector stations from January 1, 2013 to June 30, 2013 along the I-270 corridor, including 11 detectors on I-270 general-purpose lanes, 6 detectors on I-270 HOV lanes, 10 144 detectors on I-495, 6 detectors on MD 187, and 2 detectors on MD 355. We use 6- month (January 1 to June 30, 2013) empirical loop/microwave data of the freeway network, including lane-by-lane speed, occupancy and volume extracted from fixed detectors (CATT Lab, 2013). Each station detects 1 through 5 lanes at a time interval of 15 minutes. Since we simulate traffic flow based on weekday origin-destination demands (Chen et al., 2013), field measurements on Saturdays and Sundays are excluded. Eventually, we obtain 130-weekday measurements of 35 detection stations. Since simultaneous demand–supply calibration is found to be superior to demand- only calibration in accuracy (Balakrishna et al., 2007a, 2007b; Vaze et al., 2009), we will use the lane aggregate traffic data to support the demand and supply calibration. The mesoscopic traffic simulation in DynusT is based on the anisotropic mesoscopic simulation model, which moves vehicles according to the speed-density relationship (Chiu et al., 2010b). The modified two-regime Greensheild's type equation is used to quantify the relationship as follows f c 0 f 0 J c J 0 ( )(1 / ) vv v v v                 (6.41) where v is the space-mean speed,  is the density. Figure 6-4(a) shows the comparison of the default traffic flow model setting and the calibrated speed-density relationship for one of the detectors, i.e. station ID 3392 that locates at I-270 NB 0.23 Mile North of Grosvenor Ln. The least square error estimations of traffic flow parameters using loop/microwave data are density breakpoint c 31.34  veh/mile/lane, speed intercept int 105.73v  mph, minimal speed 0 5v  mph, jam density J 200  veh/mile/lane, shape term 3.61  , and 145 free-flow speed f 59.44v  mph. The detailed comparisons of other 34 detection stations between simulation results and 6-month field measurements are omitted due to the length limit. (a) (b) Figure 6-4: Supply Calibration Results, (a) Modified Greenshield’s Model Calibration for Detector ID 3392; (b) Comparison of Measured and Simulated Traffic Flows on General-Purpose Lanes. 0 50 100 150 2000 10 20 30 40 50 60 70 80 Density (veh/mile/lane) Sp eed (m ph ) Measurement Calibrated curve Default curve 0 20 40 60 80 Sp eed (m ph ) 0 500 1000 1500 Flo w ( veh /h) 0 2 4 6 8 10 12 14 16 18 20 220 50 100 150 Time (hour) De nsi ty (ve h/m ile ) 90% measurement CI Mean measurement Simulation 146 Figure 6-4(b) shows simulation results of the freeway network average speed, density and flow by comparing them with 6-month traffic flow data. The 90% confidence interval (CI) is estimated by the 130-weekday data. We can see the simulation matches well with historical measurements. To reveal the simulation accuracy from the perspective of network level statistics for both SOV and HOV performance, two separate experiments are performed. First, only the demand is calibrated using our previously proposed method (Zhang et al., 2012) while supply (e.g. number of lanes, traffic flow model parameters, and speed limit) is held using default values of DynusT. In the second experiment, both demand and supply are calibrated. Since there are 35 detector stations collecting traffic flow data in the study network, we first estimate the simulation errors of individual detector stations, and then aggregate the error into weighted average measures of goodness-of-fit using measured traffic volumes as weighted coefficients. Simulation errors Demand calibration only Demand/supply calibration Speed Flow Density Speed Flow Density RMSE 11.14 410.48 48.98 7.83 182.05 15.14 MAE 28.28 852.16 164.83 18.18 447.27 42.52 NRMSE 20.65% 67.48% 242.82% 14.25% 27.69% 48.58% NMAE 8.96 2.95 12.38 4.50 1.44 2.48 Table 6-5: Calibration Errors of Network-Wide Traffic Flow Quantities on General- Purpose Lanes. Table 6-5 shows the validation results of the simulation model before and after supply parameters calibration using 6-month field measurements on general- 147 purpose lanes. The demand/supply calibration provides considerable improvements in simulation accuracy and obtains more satisfactory results than the demand-only calibration. Segments Periods Field detection Demand calibration Demand/supply calibration Segment I (I-495 WB to I-270 NB) AM/PM peaks 1,493 512 217 Off-peak 5,134 329 2,127 Segment II (I-270 SB to I-495 EB) AM/PM peaks 1,314 170 4,689 Off-peak 5,050 403 14,323 Segment III (I-495 EB to I-270 NB) AM/PM peaks 1,358 56 76 Off-peak 5,604 65 934 Segment IV (I-270 SB to I-495 WB) AM/PM peaks 853 57 813 Off-peak 3,204 263 2,814 All segments AM/PM peaks 5,018 795 5,795 Off-peak 18,992 1,060 20,198 All day 24,010 1,855 25,994 Relative difference absolute AM/PM peaks N/A a 84.16% 15.50% Off-peak N/A 94.42% 6.35% All day N/A 92.27% 8.26% a N/A: not applicable. Table 6-6: Comparison of Measured and Simulated VMT on HOV Lanes. (Miles) Table 6-6 summarizes the HOV lane performance results of the case study with the real-world network. The performance measurement used for HOV lanes is different from general-purpose lanes in this study. In particular, the HOV lane performance may vary significantly with operation hours, so we estimate the vehicle miles traveled on HOV lanes in AM/PM peaks and off-peak periods, respectively. Loop/microwave detectors are installed on HOV lanes of four segments shown in 148 Table 6-6. The absolutes of relative differences (absolute percentage of simulation and measurement difference to the field measurement) for all HOV segments are reduced to 15.50% and 6.35% for peaks and off-peak, respectively. Figure 6-5 shows the similarity in temporal profiles of simulated and measured HOV-lane VMT. Figure 6-5: Comparison of Mean Measured and Simulated Cumulative HOV-Lane VMT. 6.3.4 SBO Results We simulate the 5-hour PM peak from 14:00 to 19:00 of the corridor. The HOT rate takes effects from 15:30 to 18:30, while DMS provides congestion warning information to the work zone for the whole simulation period. The initial design points to fit the Bayesian stochastic Kriging surrogate model is generated by LHS. The initial sample X includes 34 64 design points plus upper and lower bounds, and the baseline (neither HOT nor DMS implementations). 0 2 4 6 8 10 12 14 16 18 20 220 0.5 1 1.5 2 2.5 3 x 10 4 Time (hour) VM T o n H OV lan e (v eh- mi le) Mean measured HOV-lane VMT Simulated HOV-lane VMT 149 Sample 1x 2x 3x y sˆ Sample 1x 2x 3x y sˆ US$ % % min min US$ % % min min t 3.97 85.71 71.43 12.142 0.10 33 2.22 55.56 69.84 12.111 0.08 2 2.70 69.84 73.02 12.065 0.04 34 0.87 36.51 93.65 12.138 0.08 3 0.71 17.46 15.87 12.062 0.06 35 3.10 82.54 3.17 12.176 0.10 4 1.27 76.19 65.08 12.075 0.12 36 2.78 19.05 95.24 12.037 0.06 5 3.65 46.03 26.98 12.166 0.08 37 1.19 28.57 53.97 12.145 0.21 6 4.29 93.65 96.83 12.188 0.07 38 4.05 63.49 23.81 12.136 0.14 7 2.46 33.33 76.19 12.023 0.13 39 4.21 22.22 28.57 12.196 0.12 8 0.16 77.78 49.21 12.123 0.17 40 1.51 88.89 92.06 12.092 0.12 9 3.57 12.70 0.00 12.171 0.08 41 4.52 23.81 85.71 12.116 0.07 10 4.37 74.60 58.73 12.157 0.08 42 4.13 34.92 50.79 12.167 0.11 11 0.79 6.35 57.14 12.017 0.09 43 0.08 39.68 7.94 12.051 0.11 12 2.30 44.44 17.46 12.044 0.07 44 2.62 71.43 12.70 12.052 0.05 13 4.84 11.11 42.86 12.147 0.16 45 3.89 7.94 80.95 12.121 0.04 14 1.83 96.83 60.32 12.107 0.06 46 1.98 0.00 79.37 12.148 0.10 15 3.49 38.10 98.41 12.145 0.11 47 1.59 41.27 1.59 12.055 0.10 16 3.25 47.62 55.56 12.203 0.19 48 1.90 61.90 4.76 12.086 0.09 17 3.81 73.02 9.52 12.102 0.09 49 0.63 60.32 14.29 12.066 0.10 18 0.48 66.67 34.92 12.059 0.12 50 0.00 92.06 30.16 12.120 0.19 19 2.94 65.08 90.48 12.084 0.10 51 2.86 58.73 25.40 12.083 0.08 20 1.11 68.25 82.54 11.975 0.06 52 3.02 95.24 47.62 12.082 0.05 21 1.43 57.14 100.00 12.156 0.12 53 1.67 15.87 84.13 12.108 0.14 22 2.54 3.17 22.22 12.060 0.09 54 2.38 20.63 38.10 12.136 0.11 23 4.60 80.95 36.51 12.198 0.11 55 3.17 9.52 46.03 12.220 0.03 24 4.68 90.48 61.90 12.184 0.13 56 1.35 98.41 11.11 12.026 0.11 25 3.33 100.00 19.05 12.138 0.04 57 3.73 87.30 39.68 12.212 0.13 26 1.03 25.40 77.78 12.026 0.08 58 0.32 50.79 41.27 12.063 0.10 27 1.75 1.59 6.35 12.127 0.07 59 2.06 79.37 52.38 12.099 0.14 28 0.24 49.21 87.30 12.035 0.13 60 2.14 42.86 44.44 12.101 0.06 29 4.92 52.38 66.67 12.194 0.07 61 5.00 14.29 74.60 12.262 0.14 30 0.56 26.98 63.49 12.009 0.11 62 4.44 4.76 20.63 12.131 0.08 31 3.41 31.75 68.25 12.085 0.05 63 4.76 53.97 88.89 12.110 0.06 32 0.40 30.16 33.33 12.019 0.11 64 0.95 84.13 31.75 12.124 0.08 LBa 0.00 0.00 0.00 12.035 0.10 UBb 5.00 100 100 12.221 0.12 Baseline  0.00 0.00 12.320 0.08 N/Ac N/A N/A N/A N/A N/A a LB: lower bound; b UB: upper bound. Table 6-7: Space-Filling Latin Hypercube Sampling of Parameters for DoE. To reduce the influence of random simulation outputs, we run 5 replications for each design point, and each simulation run includes 10 iterations of DTA to 150 achieve the convergence. DynusT obtains valid results when the convergence is achieved after several times of assignments and vehicular platoon simulations. The average simulation takes around 63 min for each replication, and the relative gaps between two adjacent iterations for the DTA were found to be below 7% for all experiments. So the total computational time spent in Table 6-7 is over 350 hours. Take the baseline as an example, the mean output of the network-wide average travel time is 12.320y  min, and the estimated standard deviation is ˆ 0.08s  min based on 5 replications. We compare the mean objective function and the standard deviation of the optima with the baseline to see how much improvement can be achieved after optimization. Figure 6-6 shows the estimation results of the Bayesian stochastic Kriging surrogate model for each design point. Both estimated mean of the average travel time ˆ( )f x and its standard error ˆ( )s x are presented at these points X shown in Table 6-7. The dashed lines are bounds with one standard error given by ˆ ˆ( ) ( )f x s x . As a comparison, we plot the mean values y and the estimated standard deviations ˆ ( )s x of random observations as well. The design points in X are sorted according to estimated mean values of the average travel time in a descending order. 151 Figure 6-6: Prediction Accuracy of the Bayesian Stochastic Kriging. To further compare the baseline and the optimal case, we run the simulation for 5 replications for the optima. Figure 6-7 shows the distributions of the baseline and the optimal solution, i.e. * Tˆ 0 53.[US$ , ,25% 80. ]27%x , which corresponds to the estimated mean value of the average trip travel time. The predictive distribution of the average trip travel time for the optima is  Opti 2ma 12.020, 0.04 . Results from the 5-replication simulation of the optimal inputs show that the mean value of outputs is 11.970 min, which is close to the predictive mean value. The standard deviation of the outputs is 0.05 min. So the optimization results suggest the average trip travel time for all users of the I-270 corridor reduces 12.320 min to 11.970 min. 152 11.6 11.8 12 12.2 12.4 12.60 1 2 3 4 5 6 7 8 Pro bab ilit y D ens ity (% ) Average Trip Travel Time (min) Optima Baseline  Opti 2ma 11.970, 0.05  Basel 2ine 12.320, 0.08 Figure 6-7: Distributions of Simulation Outputs for the Baseline and Optima. Scope Statistics Baseline Optima Improvement Locally impacted vehicles (40,763 vehicles, 12.42%) Average trip time of impact vehicles (min) 26.11 24.86 4.79% System-wide impacts (328,314 vehicles, 100%) Complete trips 302,475 302,918 0.15% Average overall trip time (min) 12.32 11.97 2.84% Average trip distance (mile) 4.94 4.95 -0.20% a Average travel speed (mph) 24.07 24.82 3.12% a The negative value indicates no improvement. Table 6-8: Comparison of the Baseline and Optima for PM Peak Simulation Results. More network-wide statistics of the 5-hour simulation such as throughput, average overall trip time (including the demand loading time, stop/queueing time, and travelling time) is listed in Table 6-8. Moreover, the vehicles that passed through the work zone links before the work zone was placed in the network are identified as 153 locally impacted vehicles. The average trip travel time for those locally impacted vehicles under the baseline and optima is also listed in Table 6-8. Figure 6-8: Comparisons of Traffic Flow Characteristics on the Work Zone HOV/HOT Lane. The SOV is allowed to use the HOT lane without paying toll, which is to open the HOT lane to all vehicles during the operational time in the optima case, while SOV is restricted to be on the HOV lane in the baseline case. We further zoom in the HOV link parallel to the freeway mainline work zone (see the layout in Figure 6-3). As shown in Figure 8, the traffic volume on the HOT lane (optima) is larger than the HOV lane of the baseline after 15:30 when the managed lane takes effect. More vehicles are allocated to the HOT lane from the freeway mainline, which can release 14:00 15:00 16:00 17:00 18:00 19:000 20 40 60 Time (hour) Av era ge spe ed (m ph) Baseline Optima 14:00 15:00 16:00 17:00 18:00 19:000 1000 2000 3000 Time (hour) Flo w r ate (v eh/ h/l ane ) 14:00 15:00 16:00 17:00 18:00 19:000 100 200 300 Time (hour) De nsi ty (ve h/m ile /la ne) 154 the heavy congestion on the remaining three general-purpose lanes next to the work zone. Then the mean values of statistics of these replications are utilized to demonstrate performance improvements. Figure 6-9(a) illustrates the average trip travel time in every 5 minutes for vehicles that complete their journeys over the entire corridor. The network average travel time is reduced in the optimal case than the baseline. The largest reduction in the average travel time occurs during 17:00 and 18:00, which is a part of the HOV/HOT operation hours. Thus the optimal HOT rate together with DMS implementations successfully help alleviate peak-hour congestion throughout the network. Figure 6-9(b) compares the total corridor throughputs of the optimal solutions and the baseline. (a) (b)  Figure 6-9: Comparison of the Baseline and Optima, (a) Average Travel Time; (b) Vehicle Throughput. 14:00 15:00 16:00 17:00 18:007 8 9 10 11 12 13 14 15 16 Time (hour) Av era ge tra vel tim e (m in) Baseline Optima 14:00 15:00 16:00 17:00 18:000.6 0.8 1 1.2 1.4 1.6 1.8 2 x 10 4 Time (hour) Ve hic le t hou ghp ut Baseline Optima 155 As shown in Table 6-8, the optima increase the cumulative throughput slightly by 0.15% during the study period. The small improvement in the average travel time of all users in corridor (2.84% reduction) corresponds to more than 26 thousand dollars saved when we use the value of time as US$ 15/hour for the 5-hour PM peak simulations. The benefits can be even larger if we consider 24 hours or a long-term effect. The 2.84% reduction in travel time represents the average improvement to all corridor travelers caused by the optimal integrated transportation planning and operations strategies. If we focus on all vehicles that originally travel through the work zone link when the work zone was not set up (noted as locally impacted vehicles in Table 6-8), the percentage improvement is even more remarkable, which is 4.79% as shown in Table 6-8. 6.4 Conclusions On the basis of the regressing Kriging model, a Bayesian stochastic Kriging metamodel is developed in this chapter, which assumes a quadratic form of global trend and different levels of variance for the random simulation error at different locations of the input domain. The heteroscedasticity of the stochastic simulation output is thus taken into account. A synthetic network is built in DynusT and used to test the performance of the proposed Bayesian stochastic Kriging model. Through comparing the goodness-of-fit of the proposed model with three other surrogate models, i.e. quadratic polynomial 156 response surface method, ordinary Kriging and regressing Kriging, we find that the Bayesian stochastic Kriging model outperforms the other three models in both estimating the mean values and standard errors for a heteroscedastic simulation input- output relationship. This proposed method is then applied to jointly optimize a transportation planning strategy (i.e. HOT toll) and a traffic control measure (i.e. DMS) for a freeway-arterial corridor, which is a very difficult problem when simulation is not utilized. The predicted optimal solution is shown to improve the system performance by 2.84%, and the percentage improvement is 4.79% for locally impacted vehicles. However, to achieve computational convenience, simulation noise is assumed to be normally distributed during the developing of the Bayesian stochastic Kriging model. This assumption may not conform to the simulation outputs. The validity of this assumption will be analyzed in the next chapter. 157 Chapter 7: Bootstrapping of an Enhanced SVR Method Considering Distribution of Simulation Noise Observing the asymmetric distribution of simulation noise in the real world applications introduced in the previous chapters, we developed a new enhanced support vector regression model (named distribution-based SVR hereafter) that takes into account the noise distribution in this chapter. The penalty of prediction error is designed to be related to the probability density function of simulation noise. The assumption of normal distribution for simulation error in the Bayesian stochastic Kriging model is thus relaxed. In addition, to improve its capability in supporting global optimization, bootstrapping method is employed to estimate the variance of prediction provided by the distribution-based SVR model. The expected improvement infill strategy can then be applied to search for the new design points. 7.1 Asymmetric Distribution of Simulation Noise In addition to the characteristics of heteroscedasticity in transportation simulation noise introduced in Chapter 5, we further investigated the distribution of simulation noise for the application illustrated in Chapter 6. We randomly selected several design points from the initial sample, and replicated the simulation evaluation of those selected points by 75 times. The empirical simulation noise distributions at those design points are presented in Figure 7-1. The unit of simulation noise is minute. 158 -0.4 -0.2 0 0.2 0.40 5 10 15 20 25 30 35 Fre qu enc y (a) -0.4 -0.2 0 0.2 0.40 5 10 15 20 25 30 35 (b) -0.4 -0.2 0 0.2 0.40 5 10 15 20 25 30 35 Simulation Noise Fre qu enc y (c) -0.4 -0.2 0 0.2 0.40 5 10 15 20 25 30 35 Simulation Noise (d) Simulation Data Gumbel Distribution Gaussian Distribution 0.047Gumbel  0.000Gaussian  0.051Gumbel  0.044Gumbel  0.047Gumbel  0.000Gaussian  0.000Gaussian  0.000Gaussian  Figure 7-1: Distribution of Simulation Noise at Four Randomly Selected Design Points. At any design point, the average simulation output is used as the estimate of the true response. Thus the mean of simulation noise is zero. It is clearly shown in Figure 7-1 that the distribution of simulation noise at any of the design points is not symmetric. In general, the distribution is positively skewed and the mode of the distribution is negative. In order to develop surrogate models that take into account the distribution of simulation noise, the simulation data needs to be fit to appropriate parametric distributions. The candidate parametric distributions should be skewed. We reviewed 159 several commonly used skewed parametric distributions, and fit the observation at the four design points depicted in Figure 7-1 to those distributions. The asymmetric distributions which are investigated are summarized in Table 7-1. Design Points p-value Gaussian Distribution Gumbel Distribution Lognormal Distribution Weibull Distribution Point 1 0.2385 0.8109 0.8069 0.7184 Point 2 0.9380 0.9807 0.9899 0.9617 Point 3 0.4449 0.8358 0.8341 0.6160 Point 4 0.6525 0.9706 0.9669 0.9355 Table 7-1: Chi-Square Goodness-of-Fit Test Results The p-value for Gumbel distribution and lognormal distribution at the four randomly selected design points is very close, and significantly larger than that for the other two distributions. As small values of p cast doubt on the validity of the null hypothesis, the results in Table 7-1 suggest Gumbel distribution and lognormal distribution can fit the noisy simulation output better than the other two distributions. Gumbel distribution provides comparable performance to lognormal distribution in fitting the noisy data. One difference between Gumbel distribution and lognormal distribution is that the support of the former distribution is from negative infinity to positive infinity, while the support of the latter distribution is from 0 to positive infinity. Before fitted to lognormal distribution, the simulation noise data needs to be shifted to make sure all values are positive. When the number of observations is large, the optimal offset along with the parameters of the lognormal distribution can be estimated through maximum likelihood estimation. However, when surrogate based optimization is conducted, the number of observations at each design point is usually very limited 160 (the number of observations is 5 for both the numerical example and real world application introduced in this chapter). In this situation, it is difficult to determine the optimal offset. On the other hand, when we assume the simulation noise follows Gumbel distribution, only two parameters (location parameter and scale parameter) need to be estimated. Based on the estimation of standard deviation from the limited number of observations at the design point, the parameters of the Gumbel distribution can be easily derived. Therefore, Gumbel distribution is selected in this research and the methodology developed in this chapter is based on the assumption that simulation noise is Gumbel-distributed. The simulation data in Figure 7-1 has been fitted to both Gumbel distribution and Gaussian distribution. The modes of the fitted Gumbel distribution and Gaussian distribution are illustrated in the figure. It is obvious that Gumbel distribution fits the data much better than Gaussian distribution. Thus the common assumption of normally distributed simulation noise in most surrogate models does not hold any more. In the following section, an enhanced support vector regression model that approximates response surface of simulations with Gumbel-distributed noise will be developed. 7.2 Model Development and Evaluation The traditional SVR model tolerates prediction error within a pre-specified band [- , ]. When the prediction error lies outside the region defined by the band, 161 the penalty (presented as loss hereafter) is assumed to be linearly related to the prediction error. This loss function may not perform well when the distribution of simulation output is not symmetric. In this section, we investigate the appropriate way to formulate the loss function, and develop the proper SVR model for Gumbel-distributed simulation output data. 7.2.1 Loss Function To estimate the parameters of an unknown model with noisy data, a widely used method is maximum likelihood estimation (MLE). The likelihood of a particular dataset is defined as the joint probability density of the input-output data conditional on the underlying function f , which can be expressed as follows: 1 1 1 ( ) ({ , , },{ , , }| ) ( , | ) ( | , ) ( ) n n n n i i i i i i i L f p y y f p y f p y f p      1x x x x x  (7.1) Because the choice of input variables ix does not depend on the underlying function f , the likelihood can be further simplified as 1 ( ) ( | , ) n i i i L f p y f   x (7.2) For the ease of computation, the maximization of likelihood is usually converted to the minimization of negative log-likelihood in practice, which is 1 1 ( ) ln ( | , ) ln ( ( )) n n i i i i i i L f p y f p y f        x x (7.3) 162 As introduced in Chapter 5, the traditional SVR is a method that balances between model smoothness and loss due to prediction error. According to Equation 5.4a, the objective function of the traditional SVR model includes two terms. The first term 212 w represents model complexity, and the second term ( ) ( ) 1 ( ) n i i i C n       represents the loss due to prediction error. The ideal way to choose the loss function is to make the minimization of negative log-likelihood coincide with the minimization of the total prediction loss at the design points. Thus the loss function can be chosen as follows: ( , , ) ln ( ( )) 1, 2,, ,i i i iloss y f p y f i n   x x  (7.4) More specifically, the loss function for the two cases that the prediction ( )if x is smaller than the observation iy and the prediction ( )if x is larger than the observation iy can be expressed in Equation 7.5. ( ) ( ) ( ) ( ) ( ) ( ) ( ) ln ( ) if ( ) 0 ( ) 1, 2, , ( ) ln ( ) if ( ) 0 ( ) 1, , ,, 2 , , , i i i i i i i i i i i i i i loss p y f y f i n loss p y f f y i n                             x x x x   (7.5) 7.2.2 Distribution-Based Support Vector Regression There are two types of Gumbel distribution, which are called Gumbel Minimum distribution and Gumbel Maximum distribution. The probability density functions of these two distributions are very similar. The only difference between these two distributions is that Gumbel Minimum distribution is negatively skewed while Gumbel Maximum distribution is positively skewed. According to Figure 7-1, the distribution of the I-270 corridor simulation outputs is positively skewed. Thus 163 the distribution-based SVR for data with Gumbel Maximum distribution is developed in this section. Distribution-based SVR for data with negatively skewed distribution can be formulated and solved in a similar manner. The probability density function of a Gumbel Maximum distribution is ( )1) ,( zz e Gumbel p x e x    (7.6) where Gumbel Gumbel xz   . Gumbel is the location parameter, and Gumbel is the scale parameter.  is the real set. Thus the loss function corresponding to the Gumbel Maximum distribution at any location ix , 1, 2, ,i n  is ( )( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ,( ) ln( ) if ( ) 0 ( ) ( ) ln( ) if ( ii Gumbel i Gumbel ii Gumbel i Gumbel i i i i iGumbel Gumbel i i i ii Gumbel i i i iGumbel Gumbel i ii Gumbel loss e y f y f loss e y f                                 x x x ( )0 ),) (i i if y     x (7.7) In this section, the tolerance band of prediction error ( ) is set to be 0. As the distribution of simulation noise is asymmetric, more studies need to be conducted for the proper selection of non-zero  . Similar to the traditional SVR presented in Equation 5.4, the Gumbel Maximum distribution-based SVR can be formulated as ( )2 ( ) 1 ( ( )1mi (2 )n ) n i i i C loss lossn       w (7.8a) ( ) T ( ) ( )s.t. 1, 2, ,,i i iy b i n   w φ  (7.8b) T ( ) ( ) ( ) 1, , 2, ,i i ib i n    w φ  (7.8c) ( ) ( ), 0, 1, 2, ,i i i n      (7.8d) 164 Following the same derivation process as described in Chapter 5, the Gumbel Maximum distribution-based SVR model finally becomes an optimization problem of dual variables. ( ) ( ) ( ) ( ) (i) (j) (i) ( ) ( ) , 1 1 ( ) ( ) ( ) 1 ( ) ( ) ( ) 1 1 ( )( ) ( , ) ( )2 ( ( ) ( ( ))) ( ( ) ) i ( )) n ( m n ni i j j i i i j i n i i i i n i i i i y C loss lossn C loss lossn                                              x x   (7.9a) ( ) ( ) 1 s.t. ( ) 0 n i i i       (7.9b) ( ) ( )( ( )) 1 ,, , 2,i iC loss i nn        (7.9c) ( ) ( ) ( )inf{ | ( ,( ) } 1, 2, ,i i iC loss i nn            (7.9d) ( ) ( ), 0, 1, 2, ,i i i n      (7.9e) When the prediction ( )if x at the design point ix is smaller than the observation iy , according to Equation 7.9d, the prediction error ( )i  at that point can be presented as ( ) ( ) ( ) ( ) ( )ln(1 ) i i i i iGumbel Gumbel Gumbel n C         (7.10) Thus the third term in Equation 7.9a at the specific design point ix can be expressed as 165 ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ( ) ( ( ))) ( )( ln(1 ) ln(1 ) i i i i i i i i i i i i Gumbel Gumbel Gumbel Gumbel Gumbel C loss lossn n n n nC n C C C C                                (7.11) The constant terms which are not dependent on the dual variable ( )i are removed from Equation 7.11, as these terms would not influence the solution of the optimization problem presented in Equation 7.9. One major advantage of the traditional SVR is its computational efficiency by turning the estimation of the model into a quadratic programming problem. While Equation 7.11 includes the logarithm form of the dual variable ( )i , which makes the objective function presented in Equation 7.9 very complex and the solving of the optimization problem very computationally expensive. Therefore, we approximate the terms with logarithm form of variable ( )i by their Taylor series expansion to the second order. Equation 7.11 then approximately equals to ( ) ( ) 2 ( ) ( ) ( ) ( ) ( ) ( )( ( ) ( ( ))) 2 i i i i i i i Gumbel Gumbeln nC loss lossn C C                 (7.12) Through similarly processing the fourth term in Equation 7.9a, the following optimization problem is derived. ( ) ( ) ( ) ( ) (i) (j) (i) ( ) ( ) , 1 1 ( ) 2 ( ) 2 ( ) 2 ( ) ( ) ( ) 2 ( ) ( ) 1 1 1 ( )( ) ( , ) ( )2 ( ) ( )( ( ) ) ( ( ) ) i 2 n 2 m n ni i j j i i i j i i in ni i i i i iGumbel Gumbel Gumbel Gumbel i i y n n C C                                         x x      (7.13a) ( ) ( ) 1 s.t. ( ) 0 n i i i       (7.13b) 166 ( ) ( ) ( )[0, ], [0, ) 1, 2, ,,i ii Gumbel C i nn        (7.13c) The optimization problem presented in Equation 7.13 is still a quadratic programming problem, and it can be conveniently solved using convex optimization methods. When the distribution of simulation output is negatively skewed, the simulation noise can be fitted to Gumbel Minimum distribution. We omit the derivation of Gumbel Minimum distribution-based SVR here because it is very similar to the derivation of Gumbel Maximum distribution-based SVR introduced in this section. 7.2.3 Numerical Test Three surrogate models that can deal with heteroscedastic sample data are developed in this dissertation so far. In addition to the characteristics of heteroscedasticity, the noise of sample output is assumed to be Gumbel-distributed. The purpose of the numerical test is to analyze the capability of the newly proposed distribution-based SVR model in approximating response surfaces with skewed noise distribution. As discussed in section 5.3, the overall prediction accuracy over the entire research domain is of more concern when no evolutionary process is incorporated. This section will only focus on the MoE RMSE, and compare the performance of the three types of surrogate models: heteroscedastic SVR, Bayesian stochastic Kriging and distribution-based SVR. 167 The test functions and scenario settings are the same with the numerical test in Section 5.3. The two-dimension function in equation 5.16 and the 6-dimension function in Equation 5.20 serve as the unknown underlying function. In the low- variance and high-variance scenarios, the standard deviations are set to be 0.1 and 0.5 times the true function value at each sample point, respectively. The only difference in the numerical test design from Section 5.3 is that the distribution of noise is not Gaussian anymore. Instead, the noise is assumed to be Gumbel Minimum-distributed. Take the low-variance case for the two-dimension function as an example. The observation with Gumbel Minimum-distributed noise at a certain point is obtained by 1 1 2 1 1 2 1 2( , ) ( , ) ( , ), , [ 2 , 2 ]Gumbel Gumbelf x x f x x Gumbel x x       (7.14) The variance of a Gumbel Minimum distribution is 2 2 6 Gumbel   . As the standard deviation is set to be 1 1 20.1 ( , )f x x , the scale parameter Gumbel can be calculated as 1 1 20.06 ( , )f x x  . Moreover, the mean of a Gumbel Minimum distribution is Gumbel Gumbel   ( is Euler’s constant, approximately equals to 0.577). As the mean of noise added to the function value is 0, the location parameter Gumbel can be computed as 1 1 20.06 ( , )f x x  . When conducting the numerical test, the true standard deviation of noise is not known. We draw 5 random seeds at each of the sample point, and estimate the standard deviation from the sample at each point. The parameters Gumbel and Gumbel that will be used in the estimation of the distribution-based SVR can then be 168 computed. For each of the scenarios, the three surrogate models are estimated with the same sample dataset, and are then validated with a separate while unified validation dataset. In order to investigate the significance of the differences in prediction accuracy among the three models, the whole process including creation of 5-replication dataset, model estimation and validation is repeated 10 times. The performance of the three models under the 8 scenarios is shown in Table 7-1. Variance Level low High Test Function 2-variable function 6-variable function 2-variable function 6-variable function Sample Size Small (25) Large (60) Small (65) Large (380) Small (25) Large (60) Small (65) Large (380) RMSE Heteroscedastic SVR (H-SVR) 2.922 0.922 0.294 0.172 2.809 1.903 0.256 0.206 Bayesian stochastic Kriging (BSK) 2.606 1.840 0.380 0.201 2.691 2.171 0.349 0.235 Distribution-based SVR (D-SVR) 2.414 0.897 0.239 0.170 2.809 1.914 0.256 0.232 ANOVA between D-SVR and H-SVR (p value) 0.000* 0.013* 0.000* 0.029* 0.997 0.788 0.658 0.000* ANOVA between D-SVR and BSK (p value) 0.000* 0.000* 0.000* 0.000* 0.066 0.000* 0.000* 0.082 Note: * indicates the difference is significant at the 95% confidence level. Table 7-2: Performance of Heteroscedastic SVR, Bayesian Stochastic Kriging and Distribution-Based SVR. RMSE shown in Table 7-2 represents the average of RMSE for the 10-time model estimation and validation process. Under each of the scenarios, the lowest RMSE provided by the surrogate models is highlighted in bold. The most important finding from Table 7-2 is that the distribution-based SVR always outperforms the other two surrogate models when the variance level is relatively low. In addition, the significance of the differences in RMSE among the three models is tested through ANOVA. Under the scenarios with low variance, the p values for ANOVA between distribution-based SVR and heteroscedastic SVR and for ANOVA between 169 distribution-based SVR and Bayesian stochastic Kriging are all very close to 0, which indicates that the performance of distribution-based SVR is significantly better than that of heteroscedastic SVR and Bayesian stochastic Kriging. 7.3 Bootstrapping and Infill The capability of accurately approximating the true response surface is crucial to the success in simulation-based optimization. However, the performance of a surrogate model in searching for the optimal solution is heavily dependent on the layout of the initial sample. If the initial sample covers the area around the true optima, it may be easy to find the optimal solution or a near optima solution. Otherwise, successfully finding the optima is very difficult. In this case, it is very necessary to incorporate a global optimal infill strategy to the surrogate model for the optimization task. The expected improvement-based infill strategy has been recognized as an effective method in searching for the global optima in this dissertation and other existing literature. However, in order to apply the EI infill strategy, both prediction and the variance of the prediction at a new design point should be provided by the surrogate model. Nevertheless, traditional SVR models can only provide the prediction at new points. A supplement method to estimate the variance of prediction is required. In this chapter, we propose to incorporate the bootstrapping method with SVR for the estimation of predictor’s variance, and then apply the EI based infill strategy for surrogate-based optimization. 170 7.3.1 Bootstrapping Method Defined in Efron and Tibshirani (1993), bootstrapping is a method for assigning measures of accuracy to sample estimates. The measures of accuracy include bias, variance, confidence interval, prediction error, etc. In order to conduct surrogate-based optimization, surrogate models need to be estimated with sample input-output data to approximate the true response surface. According to observations of several real world applications introduced in previous chapters, the simulation used to evaluate the performance of transportation systems is stochastic. As a result, the surrogate model estimated with the sample output should be stochastic, and the prediction at any new points by the surrogate model should also be stochastic. In the area of surrogate-based optimization, bootstrapping has been implemented in a few studies. Kleijnen (2014) used bootstrapping to estimate the variance of Kriging predictor. In Kleijnen et al. (2012), the Kriging predictor’s variance estimated through bootstrapping was further used for the expected improvement-based global optimization. In a paper published earlier (Den Hertog, et al., 2006), the authors argued that the classic Kriging variance formula underestimates the predictor’s variance, and used bootstrapping for the estimation of predictor’s variance instead. The argument was then tested and verified through several artificial examples and a real-life case study. With regard to the SVR model, De Brabanter et al. (2011) estimated the confidence and prediction intervals for least squares support vector regression using bootstrapping. 171 Non-parametric bootstrapping is applied in this study. At each of the n initial design points (1) (2) ( ), , , nx x x , R -repetitions of simulations are performed. Thus R observations of output at each design point are retrieved, which can be expressed as ( )i ry , 1, 2, ,i n  , and 1, 2, ,r R  . Bootstrapping is generally a resampling method. To form a bootstrap sample, at each of the design points (1) (2) ( ), , , nx x x , one out of the R observations is randomly selected as the output, expressed as ( )*iy , 1, 2, ,i n  . This bootstrap sample is used to estimate the D-SVR model as illustrated in Equation 7.13. The estimated D-SVR model based on the bootstrap sample is expressed as *ˆ ( )f x . The whole process of resampling as well as the estimation of the D-SVR model is repeated B times. Therefore, at any new point ( 1)nx , B predictions ( 1)ˆ nby  , 1, 2, ,b B  can be given by the B estimated D-SVR model. With these B predictions, the predictor at the new point ( 1)nx can be computed as ( 1) ( 1) 1 ˆ ˆˆ B n b n b B y y B     (7.15) and the predictor’s variance is ( 1) ( 1) 2 ( 1) 2 1 ˆˆ ˆ( ) ( ) 1 B n n b B n b B y y s B         (7.16) 172 7.3.2 Expected Improvement-Based Infill The framework of incorporating the bootstrapping method to the D-SVR model for the surrogate-based optimization with the expected improvement infill strategy is illustrated in Figure 7-2. Figure 7-2: Framework of the Expected Improvement-Based Infill with Bootstrapped Distribution-Based SVR Surrogate Model. 173 The expected improvement-based infill strategy aims to find the new points with the maximal expected improvement compared to the minimal observation at the n old sample points. In the current study, the mean of the observations at each of the old sample points is used as the estimator of the true response, which is ( ) ( ) 1 R i r i r y y R   . Thus the minimal observation at old sample points is ( )min ( )ii y . Meanwhile, the infill strategy assumes that the distribution of prediction at a new point is Gaussian with mean ( 1)ˆˆ nBy  and variance ( 1) 2( )nBs  . The method of computing the EI can then be slightly revised from Equation 3.20 to the following, ( ) 2( 1) 2 ( 1)min ( ) ( ) ( 1)( 1) ˆˆ( )1E miI( ) ( )exp d2(2 ( ) )n ii ny i B B n i nn B yy ss uu u           x (7.17) 7.3.3 Numerical Example The effectiveness of global optimization with the EI infill strategy and bootstrapped D-SVR surrogate model is tested in this section using a numerical example. In addition, as it is verified through numerical examples in Section 7.2.3, when the simulation noise is not symmetrically distributed, the D-SVR model can approximate the true response surface more accurately than other surrogate models that do not consider the distribution of simulation noise. Under the same assumption regarding the simulation noise, it is reasonable to hypothesize that the D-SVR would also be more efficient in searching for the optimal solution when infill is applied than other models. Thus we also compare the performance of the EI infill D-SVR model in 174 searching for the optima with that of the Bayesian stochastic Kriging model. The infill strategy for the BSK model is also maximizing the EI. Instead of applying bootstrapping, the predictor’s variance for the BSK model is estimated through Bayesian analysis as introduced in Chapter 6. Figure 7-3: Comparison of the EI Infill Global Optimization with Distribution-Based SVR and Bayesian Stochastic Kriging. 0 20 40 60 80 100 120-9.65 -9.6 -9.55 -9.5 -9.45 -9.4 -9.35 -9.3 Ob ser ved M ini ma (a) Distribution-Based SVR 0 20 40 60 80 100 120-9.65 -9.6 -9.55 -9.5 -9.45 -9.4 -9.35 -9.3 Ob ser ved M ini ma Number of Infill Points (b) Bayesian Stochastic Kriging 10-instance infill process true minima 175 Among the 8 scenarios introduced in Section 7.2.3, the case of the 2- dimension function with a low variance level for response noise and a large sample size is used for this infill study. The initial sample size is 60. The infill process will terminate after 120 infill points are generated. For each of the design points, 5 replications of simulations are conducted. The bootstrap sample size B for the D- SVR method is set to be 100. In order to make the comparison between the two methods more robust, the whole procedure as illustrated in Figure 7-2 from the generation of initial sample, construction of surrogate models to the iterative infill process is repeated 10 times. The comparison of the performance in global optimization with the two methods is shown in Figure 7-3. Both methods are shown to be effective in finding the optimal solution through the EI global infill strategy. Due to the simulation noise, significant variation exhibits among the 10 instances of the optimization process for both methods. Overall, along with the infill of new points, the optimal solution identified by both methods converges to the true optima. However, the minimal observed value found by the D-SVR model declines faster than the BSK model along the infill process, which indicates that the D-SVR model is more efficient in finding the optimal solution than the BSK model. The infill process with both the D-SVR model and the BSK model is averaged over the 10 instances and compared in Figure 7-4. It shows more clearly that the D- SVR model can find the optimal solution much faster than the BSK model. With less than 20 iterations of infill, the D-SVR model is successful in finding solutions with 176 objective value smaller than -9.6, while the BSK model takes around 60 iterations of infill to find comparable solutions. In other words, with very tight computational budget, the D-SVR model will be able to provide much better solutions than the BSK model. Figure 7-4: Comparison of the EI Infill Global Optimization with Distribution-Based SVR and Bayesian Stochastic Kriging (Averaged over 10 Instances). 7.4 Application The case study investigated in this section is the same as that introduced in Section 6.3. The purpose of the case study is to jointly optimize the HOT toll and the freeway diversion control for a freeway-arterial corridor network, with the objective of minimizing the average trip travel time for all users of the network. 0 20 40 60 80 100 120-9.65 -9.6 -9.55 -9.5 -9.45 -9.4 -9.35 -9.3 Number of Infill Points Av era ge of Ob ser ved M ini ma Distribution-Based SVR Bayesian Stochastic Kriging true minima 177 Similar to the numerical test, both the distribution-based SVR model and the Bayesian stochastic Kriging model are applied for this case study. The same 67 initial sample points are used for the estimation of the surrogate models. In addition, 20 iterations of infill will be conducted to search for the optimal solution. One thing should be noted is that the simulation noise is Gumbel Maximum-distributed as shown in Figure 7-1, thus the D-SVR model used for this study is Gumbel Maximum- distribution-based SVR. After 20 iterations of infill, the optimal solution provided by the BSK model is  * T69.2[$ 0.00, %, ]2 7.36%BSK x . We then run 10 replications of simulations for this optimal input. The average trip travel time retrieved from the output from the 10 replications is * 11.943 11.945 11.977 12.032 11.91(12.005,11.930, , , , , ,6 12.058 11.93, ,2 11.946)BSKy  minutes. The mean average trip travel time is 11.968 min, and the standard deviation of the output is 0.05 min. The optimal solution provided by the Gumbel Maximum-distribution-based SVR model after 20 infill is  * T76.19 65.08%[$1.30, %, ]D SVR x , and the average trip travel time got from the 10-replication simulation is * 11.917 11.918 11.939 12.002 11.(11. 883960,11.893, , , , , ,11.929 11.898, ,12.028)D SVRy   minutes. The mean of the responses is 11.937 min, and the standard is 0.05 min. 178 Figure 7-5: Distributions of the Baseline and Optima Predicted by the D-SVR and BSK Models. As the distribution of the output is not Gaussian, it is not appropriate to test the statistical significance of the difference between the optimal objective values computed from the two models. With the assumption that the noise follows Gumbel Maximum distribution and the mean of noise is 0, we plot the distribution of the optima provided by the two models and compare them visually in Figure 7-5. It shows clearly that both of the optimal solutions predicted by the D-SVR model and the BSK model are significantly better than the baseline. To test the statistical significance of the difference between the mean output provided by the D- SVR optima and the BSK optima, the Mann-Whitney U test is conducted. Results 11.7 11.8 11.9 12 12.1 12.2 12.3 12.4 12.5 12.6 12.70 1 2 3 4 5 6 7 8 9 10 Pro bab ilit y D ens ity (% ) Average Trip Travel Time (min) Distribution Based SVR Optima Bayesian Stochastic Kriging Optima Baseline 179 show that the average trip travel time under the D-SVR optima is significantly lower than that under the BSK optima at the 90% confidence level. The performances of the entire corridor network and the locally impacted vehicles of the baseline and the optima predicted by the two surrogate methods are summarized and compared in Table 7-3. Scope Statistics Baseline BSK Optima D-SVR Optima Performance Improvement Performance Improvement Locally impacted vehicles Average trip time of impact vehicles (min) 26.110 24.393 6.58% 24.110 7.66% System- wide impacts Complete trips 302,475 304,859 0.79% 305,084 0.86% Average overall trip time (min) 12.320 11.968 2.86% 11.937 3.11% Average trip distance (mile) 4.94 4.95 -0.20% a 4.94 0 Average travel speed (mph) 24.07 24.82 3.12% 24.83 3.16% a The negative value indicates no improvement. Table 7-3: Comparison of the Baseline, BSK Optima and D-SVR Optima for PM Peak Simulation Results. The optimal solutions provided by both methods improve the performance of the corridor network compared to the baseline, which refers to the case that no effort is made in HOT lane management or freeway diversion control. The improvement in terms of average trip travel time to all of the corridor users made by the D-SVR optima is a bit larger than that made by the BSK optima. Although the difference looks small in this case study, the total saved travel time will become very large when accounting the large number of vehicles and the duration of the work zone. Moreover, if the traffic condition of the network gets even worse due to more incidents (e.g. 180 more work zones are placed), the total travel time saved by the optimal integrated corridor planning and operational strategy would become even larger. Concerning the locally impacted vehicles, their performance has been improved much more significantly by the joint optimization. Their average trip travel time is reduced by 6.58% when the BSK predicted optimal solution is implemented, and the D-SVR predicted optimal solution creates a 7.66% decrease in average trip travel time for those locally impacted vehicles. Figure 7-6: Network Wide Average Trip Travel Time along Hour of the Day of the Baseline, BSK Optima and D-SVR Optima Cases. To present more details of the effect on system wide performance caused by the optimal solutions provided by the two surrogate models, the changes of average trip travel time along time of the day for the three scenarios are plotted in Figure 7-6. 15:30 16:00 16:30 17:00 17:30 18:00 18:309 10 11 12 13 14 15 16 17 18 Time (hour) Av era ge Tri p T rav el T im e ( mi n) D-SVR Optima BSK Optima Baseline 181 During the most congested period (17:15-18:00), the optima predicted by both methods reduced the average trip travel time significantly by around 25%. After the optimal policy is implemented, the average trip travel time distributes more evenly within the afternoon peak period, and there is no severely congested period any more. During the period 17:45-18:15, the performance of the D-SVR optima is significantly better than that of the BSK optima. 7.5 Conclusions Due to the observation of asymmetrically distributed output from the I-270 freeway-arterial corridor simulation, an enhanced support vector regression model that takes into account the distribution of simulation noise is developed in this chapter. The newly developed surrogate model is named distribution-based support vector regression (D-SVR). This model relaxes the assumption of normality in simulation noise in most surrogate models. The prediction loss is assumed to be dependent on the actual distribution of simulation noise. Through a numerical test that sets the distribution of noise to be Gumbel Minimum, the developed D-SVR model is compared to the heteroscedastic SVR and the Bayesian stochastic Kriging in terms of the performance on prediction accuracy. The results show that when the signal-to-noise ratio is relatively high, the D-SVR model can predict the true response surface more accurately and always outperforms the heteroscedastic SVR and the BSK model. 182 In addition, bootstrapping is introduced and incorporated into the D-SVR surrogate-based optimization framework. Applying bootstrapping to provide prediction and predictor’s variance at new points and adopting the maximization of the expected improvement as the global optimal infill strategy, we tested the performance of the bootstrapped D-SVR in searching for the global optimal solution with a numerical example. It is found that the bootstrapped D-SVR is more efficient in global optimization, and it can find a better solution much faster than the BSK model. This finding suggests that with a fixed computation budget, the D-SVR model can provide a better solution than the BSK model, which is of great value to real world applications. The proposed EI infill bootstrapped D-SVR method is then utilized in the real world application of a joint optimization of HOT toll and freeway diversion control, which is introduced in Chapter 6. With 20 iterations of infill, the D-SVR predicted optima can reduce the network wide average trip travel time by 3.11% than the baseline, and the reduction for locally impacted vehicles is 7.66%. The comparison between the D-SVR method and the BSK method is also conducted. Results suggest the improvement made by the D-SVR optima is significantly higher than that made by the BSK optima. 183 Chapter 8: Conclusions Although simulation has shown its value in informing decision makers about the influence of their proposed policies on network performance, it is often used for scenario evaluation instead of optimization, mainly due to its high computation cost. There is clearly a gap between the promising evaluation tool of simulation and the goal of policy optimization in the transportation research. This dissertation tries to fill this gap by integrating rigorous mathematical optimization techniques with simulation evaluation. Efforts have also been made to reduce the required computation time for the optimization process. The remainder of this chapter summarizes the contributions of this research work, and proposes future research directions to further extend the developed methodologies. 8.1 Contributions This research is one of the first to introduce simulation-based optimization methods into large-scale transportation network research. In addition, based on observations of specific characteristics related to transportation simulation, existing simulation-based optimization methods are improved to enhance their efficiency. Overall, this dissertation has contributed to both methodology developments from the academic perspective and a number of real world applications from the industry perspective. 184 First, based on review of existing simulation-based optimization methods, surrogate-based optimization method is selected as the focus of this study. The surrogate-based optimization method enjoys both the advantages of simulation in policy evaluation and the efficiency of mathematical optimization. It can be used to solve transportation problems of imperative needs for practitioners. Various types of optimization problems have been proposed and solved in this dissertation, which include single-objective optimization, multi-objective optimization and constrained optimization problems. Second, the surrogate-based optimization method is capable of solving problems on the optimization of combinations of different types of policies, which is very difficult to be dealt with when simulation is not utilized. For instance, travel demand management policies (e.g. toll, managed lanes, etc.) mainly concern the macro-level travel behavior such as destination choice, departure time choice and route choice, while traffic operational strategies (e.g. freeway diversion control, traffic signal, etc.) mainly concern the micro-level traffic flow dynamics. Developing mathematical models that can capture the impact of both types of policies is very challenging. Existing analytical models evaluating the impact of one type of policies usually ignore the influence of measures from the other category. On the other hand, simulation models travel behavior at the individual level. The response of each individual to any policy can be captured, and thus the impact of combinations of policies to the system can be conveniently measured. Third, the optimization methods developed in this research are general and can be applied to many other optimization problems. As the simulation mainly serves 185 as a black-box function in the framework, any simulator can be utilized coupled with the developed methods in application. Moreover, as long as the simulation is responsive to the particular policy, any policy can be optimized with the developed methods. There is no transferability issue associated with this method. Thus the proposed method can be used by transportation management agencies at all levels of government. Fourth, existing surrogate-based optimization methods are significantly improved from the methodological point of view. Characteristics of transportation simulation noise have been investigated, and the theoretical foundation of the surrogate models is revised accordingly. Based on the observation of heteroscedasticity in simulation noise, two advanced surrogate models that adapts for the heteroscedastic noise are developed: the heteroscedastic support vector regression (H-SVR) model and the Bayesian stochastic Kriging (BSK) model. Furthermore, the transportation simulation noise is found to be asymmetrically distributed, violating the assumption of normality in noise in most surrogate models. A distribution-based support vector regression (D-SVR) model that takes into account the asymmetric distribution of noise is then developed. Overall, it is found that these enhanced models can improve the accuracy in approximating the true response surface. More importantly, they can also improve the converging speed of optimization, which leads to significant savings of computational cost in practice. Fifth, two real world problems have been investigated using the developed methods. The first problem is to optimize the dynamic pricing of a toll road in the State of Maryland. This problem is formulated into three forms: with single-objective, 186 with multi-objective and with complicated constraints. The solutions of the problem in all of the three forms are investigated in this study. The second problem is to jointly optimize the HOT toll rate and the freeway diversion control for a freeway- arterial corridor in the State of Maryland under a work zone scenario. The solutions provided by the developed methods to both of the two problems are shown to significantly improve the network performance compared to the baseline. The cost of adjusting these policies is minor, while the benefit in terms of travel time savings is significant. 8.2 Recommendations for Future Research This dissertation undertakes the research on integrating simulation evaluation with mathematical optimization, improving surrogate models for transportation simulation and solving real world transportation related problems. To better serve the needs of transportation management agencies, this work can be extended through several directions from the application point of view and the methodological point of view. The methods developed in this dissertation mainly deal with continuous optimization problems. While lots of transportation problems involve discrete decision variables (e.g. discrete/mixed integer network design problems, optimal allocation of traffic detectors, etc.), it is valuable to extend the ability of surrogate- based optimization methods in solving discrete or mixed discrete-continuous problems. 187 The simulator used in all of the case studies is DynusT, which is essentially a dynamic traffic assignment model. Travel adjustment behavior along other dimensions (e.g. departure time, mode, destination, etc.) than the route choice in response to any particular policies is generally ignored. Integrating other modules that model the travel behavioral change due to the policies would definitely enhance the simulation’s capability in predicting the traffic realism. More reliable optimization results can thus be provided when this type of fully integrated activity/agent-based models is used as the simulator for the surrogate-based optimization. In theory, if the number of simulation evaluations becomes infinite, the global optimal solution can always be found. However, the theoretical proof of convergence rate of the developed methods needs further investigation. In practice, the selection of stopping rule for the optimization process is also important. The rule applied in existing studies usually specifies a total number of simulation replications. Improving the method of generating the initial sample for the surrogate model estimation is also an interesting topic. A good initial sample that covers promising regions of the domain can greatly enhance the efficiency of the method in searching for the optimal solution. Furthermore, for the surrogate-based optimization methods with infill, the appropriate allocation of computational budget between the initial samples and infill samples also needs additional research. The efficiency of an optimization method can be significantly improved when parallel computing techniques are applied. Regarding surrogate-based optimization methods, the multiple-replication simulations on both initial and infill sample points can be easily distributed to multiple computer resources. Developing global optimal 188 infill strategies that can take advantage of parallel computing is a topic that deserves further investigation. 189 Bibliography Adeli, H. and Jiang, X. (2009). Intelligent Infrastructure: Neural Networks, Wavelets, and Chaos Theory for Intelligent Transportation Systems and Smart Structures, CRC Press, Taylor & Francis, Boca Raton, FL. Adeli, H. and Karim, A. (2000). Fuzzy-Wavelet RBFNN Model for Freeway Incident Detection. Journal of Transportation Engineering, 126(6): 464–471. Andradóttir, S. (1998). A Review of Simulation Optimization Techniques. In Proceedings of the 30th Winter Simulation Conference: 151-158. Andradóttir, S. (2005). An Overview of Simulation Optimization via Random Search. Handbooks in Operations Research and Management Science, 13: 617-631. Ankenman, B., Nelson, B. and Staum, J. (2010). Stochastic Kriging for Simulation Metamodeling. Operations Research, 58(2): 371–382. Aziz, H. and Ukkusuri, S. (2012). Integration of Environmental Objectives in a System Optimal Dynamic Traffic Assignment Model. Computer-Aided Civil and Infrastructure Engineering, 27(7): 494–511. Balakrishna, R., Antoniou, C., Ben-Akiva, M., Koutsopoulos, H. and Wen, Y. (2007a). Calibration of Microscopic Traffic Simulation Models: Methods and Application. Transportation Research Record, 1999: 198–207. Balakrishna, R., Ben-Akiva, M. and Koutsopoulos, H. (2007b). Offline Calibration of Dynamic Traffic Assignment: Simultaneous Demand-And-Supply Estimation. Transportation Research Record, 2003: 50–58. Balakrishna, R., Morgan, D., Slavin, H. and Yang, Q. (2009). Large-Scale Traffic Simulation Tools for Planning and Operations Management. In Proceedings of the 12th IFAC Symposium on Control in Transportation Systems: 117-122. Banks, J., Carson, J., Nelson, B. and Nicol, D. (2000). Discrete Event Systems Simulation. 3rd Edition. Prentice Hall, Englewood Cliffs, NJ. Baraldi, P., Canesi, R., Zio, E., Seraoui, R. and Chevalier, R. (2011). Genetic Algorithm-Based Wrapper Approach for Grouping Condition Monitoring Signals of Nuclear Power Plant Components. Integrated Computer-Aided Engineering, 18(3): 221–234. Barceló, J. and Casas, J. (2005). Dynamic Network Simulation with AIMSUN. Simulation Approaches in Transportation Analysis: 57-98. Springer US. 190 Barton, R. and Meckesheimer, M. (2006). Metamodel-Based Simulation Optimization. Handbooks in Operations Research and Management Science, Amsterdam, Elsevier B.V. Bellemans, T., De Schutter, B. and De Moor, B. (2006). Model Predictive Control for Ramp Metering of Motorway Traffic: A Case Study. Control Engineering Practice, 14(7): 757-767. Ben-Akiva, M., Bierlaire, M., Bottom, J., Koutsopoulos, H. and Mishalani, R. (1997). Development of a Route Guidance Generation System for Real-time Application. In Proceedings of 8th IFAC/IFIP/IFORS Symposium on Transportation Systems: 433- 439. Ben-Akiva, M., Bierlaire, M., Koutsopoulos, H. and Mishalani, R. (2002). Real Time Simulation of Traffic Demand-Supply Interactions within DynaMIT. Transportation and Network Analysis: Current Trends: 19-36. Springer US. Ben-Akiva, M. and Lerman, S. (1985). Discrete Choice Analysis: Theory and Application to Travel Demand. The MIT Press. Bettonvil, B. (1989). A Formal Description of Discrete Event Dynamic Systems Including Infinitesimal Perturbation Analysis. European Journal of Operational Research, 42(2): 213-222. Björkman, M. and Holmström, K. (2000). Global Optimization of Costly Nonconvex Functions Using Radial Basis Functions. Optimization and Engineering, 1(4): 373– 397. Burris, M., Ungemah, D., Mahlawat, M. and Pannu, M. (2009). Investigating the Impact of Tolls on High-Occupancy-Vehicle Lanes Using Managed Lanes. Transportation Research Record, 2099: 113–122. Carlson, R., Papamichail, I., Papageorgiou, M. and Messmer, A. (2010). Optimal Motorway Traffic Flow Control Involving Variable Speed Limits and Ramp Metering. Transportation Science, 44(2): 238-253. CATT Lab. (2013). The Center for Advanced Transportation Technology Laboratory. http://www.cattlab.umd.edu/ Chabuk, T., Reggia, J., Lohn, J. and Linden, D. (2012). Causally-Guided Evolutionary Optimization and Its Application to Antenna Array Design. Integrated Computer-Aided Engineering, 19(2): 111–124. Chan, K. and Lam, W. (2002). Optimal Speed Detector Density for the Network with Travel Time Information. Transportation Research Part A, 36(3): 203-223. 191 Chen, C., Schonfeld, P. and Paracha, J. (2005). Work Zone Optimization for Two- Lane Highway Resurfacing Projects with an Alternate Route. Transportation Research Record, 1911: 51–66. Chen, O., Hotz, A. and Ben-Akiva, M. (1997). Development and Evaluation of a Dynamic Ramp Metering Control Model. In Proceedings of 8th IFAC/IFIP/IFORS Symposium on Transportation Systems: 1162-1168. Chen, X., Ankenman, B. and Nelson, B. (2012). The Effects of Common Random Numbers on Stochastic Kriging Metamodels. ACM Transactions on Modeling and Computer Simulation, 22 (2): 7. Chen, X., Zhang, L., He, X., Xiong, C.and Li, Z. (2013). Surrogate-Based Optimization of Expensive-to-Evaluate Objective for Optimal Highway Toll Charges in Transportation Network. Computer-Aided Civil and Infrastructure Engineering, DOI: 10.1111/mice.12058. Chiu, Y., Bottom, J., Mahut, M., Paz, A., Balakrishna, R., Walle, T. and Hicks, J. (2010a). A Primer for dynamic Traffic Assignment. 89th Annual Meeting of Transportation Research Board, Washington, D. C. Chiu, Y. and Bustillos, B. (2009). A Gap Function Vehicle-Based Solution Procedure for Consistent and Robust Simulation-Based Dynamic Traffic Assignment. In the 88th Annual Meeting of Transportation Research Board, Washington, DC. Chiu, Y., Zhou, L. and Song, H. (2010b). Development and Calibration of the Anisotropic Mesoscopic Simulation Model for Uninterrupted Flow Facilities. Transportation Research Part B, 44(1): 152–174. Ciuffo, B., Casas, J., Montanino, M., Perarnau, J. and Punzo, V. (2013). From Theory to Practice: Gaussian Process Metamodels for the Sensitivity Analysis of Traffic Simulation Models. A Case Study of the Aimsum Mesoscopic Model. 92nd Annual Meeting of Transportation Research Board, Washington, D. C. Ciuffo, B., Punzo, V. (2013). “No Free Lunch” Theorems Applied to the Calibration of Traffic Simulation Models. IEEE Transactions on Intelligent Transportation Systems. In Press. Ciuffo, B., Punzo, V. and Quaglietta, E. (2011). Kriging Metamodelling to Verify Traffic Micro-Simulation Calibration Methods. 90th Annual Meeting of Transportation Research Board, Washington, D. C. Dai, D. and Schonfeld, P. (1998). Metamodels for estimating Delays through Series of Waterway Queues. Transportation Research Part B, 32(1): 1–19. 192 De Brabanter, L., De Brabanter, J., Suykens, J. and De Moor, B. (2011). Spproximate Confidence and Prediction Intervals for Least Squares Support Vector Regression. IEEE Transactions on Neural Networks, 22(1): 110-120. De Palma, A., Kilani, M. and Lindsey, R. (2005). Congestion Pricing on a Road Network: A Study Using the Dynamic Equilibrium Simulator METROPOLIS. Transportation Research Part A, 39(7): 588-611. Den Hertog, D., Kleijnen, J. and Siem, A. (2006). The Correct Kriging Variance Estimated by Bootstrapping. Journal of the Operational Research Society, 57(4): 400-409. Eaton, M. (1983). Multivariate Statistics: A Vector Space Approach. Eaton, M.L. (Ed.). New York: Wiley. Efron, B. and Tibshirani, R. (1994). An Introduction to the Bootstrap. Vol. 57. CRC Press. Eglese, R. (1990). Simulated Annealing: A Tool for Operational research. European Journal of Operational Research, 46(3): 271-281. Federal Highway Administration and Federal Transit Administration (2007). The Transportation Planning Process: Key Issues. Retrieved February 20, 2014 from http://www.planning.dot.gov/documents/BriefingBook/BBook.htm#1BB Fellendorf, M. and Vortisch, P. (2010). Microscopic Traffic Flow Simulator VISSIM. In Fundamentals of Traffic Simulation: 63-93. Springer New York. Feng, C. and Wu. H. (2006). Integrating fmGA and CYCLONE to Optimize the Schedule of Dispatching RMC Trucks. Automation in Construction, 15: 186-199. Flötteröd, G., Bierlaire, M. and Nagel, K. (2011). Bayesian Demand Calibration for Dynamic Traffic Simulations, Transportation Science, 45(4): 541–561. Forrester, A. and Keane, A. (2009). Recent Advances in Surrogate-Based Optimization. Progress in Aerospace Sciences, 45(1): 50–79. Forrester, A., Keane, A., and Bressloff, N. (2006). Design and Analysis of "Noisy" Computer Experiments, AIAA Journal, 44(10): 2331–2339. Forrester, A., Sobester, A. and Keane, A. (2008). Engineering Design via Surrogate Modelling: A Practical Guide, Wiley. Friedmann, J. (1987). Planning in the Public Domain – From Knowledge to Action. Princeton: Princeton University Press. 193 Fu, M. (1994). Optimization via Simulation: A Review. Annals of Operations Research, 53: 199-247. Fu, M. (2002). Optimization for Simulation: Theory vs. Practice. INFORMS Journal on Computing, 14(3): 192-215. Fu, M. (2006). Gradient Estimation. Chapter 19 in Handbooks in Operations Research and Management Science: Simulation. Henderson, S. and Nelson, B. edition, Elsevier. Fu, M. and Howell, W. (2003). Application of Perturbation Analysis to Traffic Light Signal Timing. In Proceedings of the 42nd IEEE Conference on Decision and Control: 4837-4840. Fu, M., Glover, F. and April, J. (2005). Simulation Optimization: A Review, New Developments, and Applications. In Proceedings of the 37th Winter Simulation Conference: 83-95. Fu, M. and Hu, J. (1994). Smoothed Perturbation Analysis Derivative Estimation for Markov Chains. Operations Research Letters, 15(5): 241-251. Gao, L., Liu, M., Sun, Z. and Mao, B. (2008). Simulation on Impact of Information Guidance on Regional Traffic Flow. Journal of Transportation Systems Engineering and Information Technology, 8(4): 63-69. Gardner, L., Bar-Gera, H. and Boyles, S. (2013). Development and Comparison of Choice Models and Tolling Schemes for High-Occupancy/Toll (HOT) Facilities. Transportation Research Part B, 55: 142–153. Gartner, N., Pooran, F. and Andrews, C. (2001). Implementation of the OPAC Adaptive Control Strategy in a Traffic Signal Network. Intelligent Transportation Systems, Proceedings of 2001 IEEE: 195-200. Gengembre, E., Ladevie, B., Fudym, O., Thuillier, A. (2012). A Kriging Constrained Efficient Global Optimization Approach Applied to Low-Energy Building Design Problems. Inverse Problems in Science and Engineering, 20(7): 1101-1114. Gibbs, M. (1997). Bayesian Gaussian Processes for Regression and Classification, Doctoral dissertation, University of Cambridge. Glasserman, P. (1991). Gradient Estimation via Perturbation Analysis. Kluwer Academic Publishers, Boston, Massachusetts. Glover, F. and Laguna, M. (1997). Tabu Search. Boston: Kluwer Academic. 194 Glynn, P. (1987). Likelihood Ratio Gradient Estimation: An Overview. In Proceedings of the 19th Winter Simulation Conference: 366-375. Glynn, P. (1989). Likelihood Ratio Derivative Estimators for Stochastic Systems. In Proceedings of the 21st Winter Simulation Conference: 374-380. Goel, R. and Burris, M. (2012). HOT Lane Policies and Their Implications. Transportation, 39(6): 1019–1033. Goldberg, D. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning, 412. Reading Menlo Park: Addison-wesley. Goldsman, D., Nelson, B. and Schmeiser, B. (1991). Methods for Selecting the Best System. In Proceedings of the 23rd Winter Simulation: 177-186. Gutmann, H. (2001). A Radial Basis Function Method for Global Optimization. Journal of Global Optimization, 19(3): 201–227. Hachicha, W., Ammeri, A., Masmoudi, F. and Chachoub, H. (2010). A Comprehensive Literature Classification of Simulation Optimisation Methods. Retrieve March 2, 2014 from http://mpra.ub.uni-muenchen.de/27652/ Halati, A., Lieu, H. and Walker, S. (1997). CORSIM-Corridor Traffic Simulation Model. Traffic Congestion and Traffic Safety in the 21st Century: Challenges, Innovations, and Opportunities. Hegyi, A., De Schutter, B. and Hellendoorn, H. (2005). Model Predictive Control for Optimal Coordination of Ramp Metering and Variable Speed Limits. Transportation Research Part C, 13(3): 185-209. Heidergott, B. (2001). Option Pricing via Monte Carlo Simulation — A Weak Derivative Approach. Probability in the Engineering and Informational Sciences, 15(3): 335-349. Highway Statistics. (1980). Office of Highway Policy Information. Federal Highway Administration. Highway Statistics. (2008). Office of Highway Policy Information. Federal Highway Administration. Ho, Y. and Deng, M. (1994). The Problem of Large Search Space in Stochastic Optimization. In Proceedings of the IEEE Conference on Decision and Control: 1470-1475. Ho, Y., Sreenivas, R. and Vakili, P. (1992). Ordinal Optimization of DEDS. Discrete Event Dynamic Systems: Theory and Applications, 2(1): 61-88. 195 Horowitz, A., Weisser, I. and Notbohm, T. (2003). Diversion from a Rural Work Zone with Traffic-Responsive Variable Message Signage System. Transportation Research Record, 1824: 23–28. Hsu, J. and Nelson, B. (1988). Optimization over a Finite Number of System Designs with One-Stage Sampling and Multiple Comparisons with the Best. In Proceedings of the 20th Winter Simulation Conference: 451-457. Huang, D., Allen, T., Notz, W. and Zeng, N. (2006). Global Optimization of Stochastic Bblack-Box Systems via Sequential Kriging Meta-models. Journal of Global Optimization, 34(3): 441–466. Hussain, M., Barton, R. & Joshi, S. (2002). Metamodeling: Radial Basis Functions, versus Polynomials. European Journal of Operational Research, 138(1): 142–154. Improta, G. and Cantarella, G. (1984). Control Systems Design for an Individual Signalized Junction. Transportation Research Part B, 18: 147-167. Jacob, C., Abdulhai, B., Hadayeghi, A. and Malone, B. (2006). Highway Work Zone Dynamic Traffic Control Using Machine Learning. IEEE Intelligent Transportation Systems Conference, ITSC 06, 262-272. Jakobsson, S., Patriksson, M., Rudholm, J. and Wojciechowski, A. (2010). A Method for Simulation-based Optimization Using Radial Basis Functions. Optimization and Engineering, 11(4): 501-532. Jones, D. (2001). A Taxonomy of Global Optimization Methods Based on Response Surfaces. Journal of Global Optimization, 21(4): 345–383. Jones, D., Schonlau, M. and Welch, W. (1998). Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization, 13(4): 455–492. Kazemi, A. and Zarandi, M. (2008). An Agent-Based Framework for Building Decision Support System in Supply Chain Management. Journal of Applied Sciences, 8: 1125-1137. Kessaci, A., Farges, J. and Henry, J. (1990). Upper Level for Real Time Urban Traffic Control Systems. In Preprints 11th IFAC World Congress, 10: 226-229. Khattak, A., Schofer, J. and Koppelman, F. (1995). Effect of Traffic Information on Commuters' Propensity to Change Route and Departure Time. Journal of Advanced Transportation. 29(2): 193-212. Kim, H. and Adeli, H. (2001). Discrete Cost Optimization of Composite Floors Using a Floating Point Genetic Algorithm. Engineering Optimization, 33(4): 485–501. 196 Kleijnen, J. (2009). Kriging Metamodeling in Simulation: A Review. European Journal of Operational Research, 192(3): 707–716. Kleijnen, J. (2014). Simulation-Optimization via Kriging and Bootstrapping: a Survey. Journal of Simulation, in press, DOI: 10.1057/jos.2014.4.. Kleijnen, J., Van Beers, W. and Van Nieuwenhuyse, I. (2012). Expected Improvement in Efficient Global Optimization through Bootstrapped Kriging. Journal of Global Optimization, 54(1): 59-73. Kochel, P., Kunze, S. and Nielander, U. (2003). Optimal Control of a Distributed Service System with Moving Resources: Application to the Fleet Sizing and Allocation Problem. International Journal of Production Economics, 81-82: 443-459. Kotsialos, A., Papateorgiou, M., Mangeas, M. and Haj-Salem, H. (2002). Coordinated and Integrated Control of Motorway Networks via Non-linear Optimal Control. Transportation Research Part C, 10(1): 65-84. Kullback, S. and Leibler, R. (1951). On Information and Sufficiency. The Annals of Mathematical Statistics, 22 (1): 79–86. Laguna, M. and Marti, R. (2002). Scatter Search. Boston: Kluwer Academic Publishers. Law, A. and Kelton, W. (1991). Simulation Modeling and Analysis, 2nd edtion. McGraw-Hill, New York. LeBlanc, L. and Boyce, D. (1986). A Bi-Level Programming for Exact Solution of the Network Design Problem with User-Optimal Flow. Transportation Research Part B, 20(3): 259-265. Lee, L., Lau, T. and Ho, Y. (1999). Explanation of Goal Softening in Ordinal Optimization. IEEE Transactions on Automatic Control, 44(1): 94-99. Levinson, D., Liu, H., Garrison, W., Hickman, M., Danczyk, A. and Corbett, M. (2013). Fundamental of Transportation. WIKI Book. Li, D., Xu, L., Goodman, E., Xu, Y. and Wu, Y. (2013). Integrating a Statistical Background-Foreground Extraction Algorithm and SVM Classifier for Pedestrian Detection and Tracking. Integrated Computer-Aided Engineering, 20(3): 201–216. Li, A., Xu, N., Nozick, L. and Davidson, R. (2011). Bilevel Optimization for Integrated Shelter Location Analysis and Transportation Planning for Hurricane Events. Journal of Infrastructure Systems, 17(4): 184-192. 197 Li, Z., Madanu, S., Zhou, B., Wang, Y. and Abbas, M. (2010). A Heuristic Approach for Selecting Highway Investment Alternatives. Computer-Aided Civil and Infrastructure Engineering, 25(6): 427–439. Lou, Y., Yin, Y. and Laval, J. (2011). Optimal Dynamic Pricing Strategies for High- Occupancy/Toll Lanes. Transportation Research Part C, 19(1): 64–74. Mahmassani, H. (2002). Dynamic Network Traffic Assignment and Simulation Methodology for Advanced System Management Applications. 81st Annual Meeting of Transportation Research Board, Washington, D. C. Melouk, S., Keskin, B., Armbrester, C. and Anderson, M. (2010). A Simulation Optimization-Based Decision Support Tool for Mitigating Traffic Congestion, Journal of the Operational Research Society, 62(11): 1971–1982. Meng, Q. and Wang, X. (2008). Sensitivity Analysis of Logit-Based Stochastic User Equilibrium Network Flows with Entry–Exit Toll Schemes. Computer-Aided Civil and Infrastructure Engineering, 23(2): 138–156. Mesbah, M., Sarvi, M., Ouveysi, I. and Currie, G. (2011). Optimization of Transit Priority in the Transportation Network Using a Decomposition Methodology. Transportation Research Part C, 19(2): 363-373. Messmer, A. and Papageorgiou, M. (1995). Route Diversion Control in Motorway networks via Nonlinear Optimization. IEEE Transactions on Control Systems Technology, 3(1): 144-154. Montgomery, D. (2008). Design and Analysis of Experiments, Wiley. Morris, M. and Mitchell, T. (1995). Exploratory designs for computational experiments. Journal of Statistical Planning and Inference, 43(3): 381–402. Mosseri, G., Hall, M. and Meyers, J. (2004). VISSIM Micro-simulation Modeling of Complex Geometry and Traffic Control: A Case Study of Ocean Parkway, NY. In Institute of Transportation Engineers Annual Meeting. Murray, P., Mahmassani, H. and Abdelghany, K. (2001). Methodology for Assessing High-Occupancy Toll-lane Usage and Network performance. Transportation Research Record, 1765(1): 8-15. Nava, E. and Chiu, Y. (2012). A Temporal Domain Decomposition Algorithmic Scheme for Large-Scale Dynamic Traffic Assignment, International Journal of Transportation Science and Technology, 1(1): 1–24. Omrani, R. and Kattan, L. (2012). Demand and Supply Calibration of Dynamic Traffic Assignment Models. Transportation Research Record, 2283: 100–112. 198 Osorio, C. (2010). Mitigating Network Congestion: Analytical Models, Optimization Methods and Their Applications, Lausanne, Ecole Polytechnique Fédérale de Lausanne, PhD Dissertation. Papageorgiou, M. (1980). A New Approach to Time-of-Day Control Based on a Dynamic Freeway Traffic Model. Transportation Research Part B, 14(4): 349-360. Paz, A. and Peeta, S. (2009). On-line Calibration of Behavior Parameters for Behavior-Consistent Route Guidance. Transportation Research Part B, 43(4): 403– 421. Queipo, N., Haftka, R., Shyy, W., Goel, T., Vaidyanathan, R. and Tucker, P. (2005). Surrogate-Based Analysis and Optimization. Progress in Aerospace Sciences, 41(1): 1–28. Ramadurai, G. and Ukkusuri, S. (2011). B–Dynamic: An Efficient Algorithm for Dynamic User Equilibrium Assignment in Activity‐Travel Networks. Computer- Aided Civil and Infrastructure Engineering, 26(4): 254–269. Ramanathan, V. and Schonfeld, P. (1994). Approximate Delays Caused by Lock Service Interruptions. Transportation Research Record, 1430: 41-49. Rasmussen, C. and Williams, C. (2006). Gaussian Processes for Machine Learning, MIT Press. Regis, R. and Shoemaker, C. (2005). Constrained Global Optimization of Expensive Black Box Functions Using Radial Basis Functions. Journal of Global Optimization, 31(1): 153–171. Retherford, J. and McDonald, M. (2011). Estimation and Validation of Gaussian Process Surrogate Models for MEPDG-Based Sensitivity Analysis and Design Optimization. 90th Annual Meeting of Transportation Research Board, Washington, D. C. Robinson, S. (1996). Analysis of Sample-Path Optimization. Mathematics of Operations Research, 21(3): 513-528. Rubinstein, R. and Shapiro, A. (1993). Discrete Event Systems: Sensitivity Analysis and Stochastic Optimization by the Score Function Method. 346, New York, Wiley. Sarma, K. and Adeli, H. (2001). Bi-Level Parallel Genetic Algorithms for Optimization of Large Steel Structures. Computer-Aided Civil and Infrastructure Engineering, 16(5): 295–304. 199 Schmitt, L. (2001). Theory of Genetic Algorithms. Theoretical Computer Science, 259(1): 1-61. Schölkopf, B. and Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. Cambridge, MIT Press. Schrank, D., Eisele, B. and Lomax, T. (2012). TTI’s 2012 Urban Mobility Report. Texas A&M Transportation Institute. Schwartz, J., Wang, W. and Rivera, D. (2006). Simulation-Based Optimization of Process Control Policies for Inventory Management in Supply Chains. Automatica, 42(8): 1311-1320. Simon, G., Volgyesi, P., Maróti, M. and Lédeczi, Á. (2003). Simulation-Based Optimization of Communication Protocols for Large-Scale Wireless Sensor Networks. In proceedings of the IEEE Aerospace Conference. Simpson, T., Mauery, T., Korte, J. and Mistree, F. (2001). Kriging Models for Global Approximation in Simulation-Based Multidisciplinary Design Optimization. AIAA Journal, 39(12): 2233-2241. Smola, A. and Schölkopf, B. (2004). A Tutorial on Support Vector Regression. Statistics and Computing, 14(3): 199–222. Song, M., Yin, M., Chen, X., Zhang, L. and Li, M. (2013), A Simulation-Based Approach for Sustainable Transportation Systems Evaluation and Optimization: Theory, Systematic Framework and Applications, In Proceedings of the 13th COTA International Conference of Transportation Professionals, Shenzhen, China. Spall, J. (1992). Multivariate stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation. IEEE Transactions on Automatic Control, 37(3): 332-341. Stamatiadis, C. and Gartner, N. (1996). MULTIBAND-96: A Program for Variable Bandwidth Progression Optimization of Multiarterial Traffic Networks. Transportation Research Record, 1554: 9-17. Stevanovic, J., Stevanovic, A., Martin, P. and Bauer, T. (2008). Stochastic Optimization of Traffic Control and Transit Priority Settings in VISSIM. Transportation Research Part C, 16(3): 332-349. Sumalee, A. (2004). Optimal Road User Charging Cordon Design: A Heuristic Optimization Approach, Computer-Aided Civil and Infrastructure Engineering, 19(5): 377–392. 200 Sundaram, S., Koutsopoulos, H., Ben-Akiva, M., Antoniou, C. and Balakrishna, R. (2011). Simulation-Based Dynamic Traffic Assignment for Short-Term Planning Applications. Simulation Modelling Practice and Theory, 19(1): 450–462. Swisher, J., Hyden, P., Jacobson, S. and Schruben, L. (2004). A Survey of Recent Advances in Discrete Input Parameter Discrete-Event Simulation Optimization. IIE Transactions, 36(6): 591-600. Szeto, W., Solayappan, M. and Jiang, Y. (2011). Reliability–Based Transit Assignment for Congested Stochastic tTransit Networks. Computer-Aided Civil and Infrastructure Engineering, 26(4): 311–326. Tao, X. and Schonfeld, P. (2005). Lagrangian Relaxation heuristic for Selecting Interdependent Transportation Projects under Cost Uncertainty. Transportation research record, 1931: 74-80. Tao, X. and Schonfeld, P. (2006). Selection and Scheduling of Interdependent Transportation Projects with Island Models. Transportation research record, 1981: 133-141. Tao, X. and Schonfeld, P. (2007). Island Models for a Stochastic Problem of Transportation Project Selection and Scheduling. Transportation research record, 2039: 16-23. Tekin, E. and Sabuncuoglu, I. (2004). Simulation Optimization: A Comprehensive Review on Theory and Applications. IIE Transactions, 36(11): 1067-1081. Teklu, F., Sumalee, A. and Watling, D. (2007). A Genetic Algorithm Approach for Optimizing Traffic Control Signals Considering Routing, Computer-Aided Civil and Infrastructure Engineering, 22(1): 31–43. Tian, X., Mahut, M., Jha, M. and Florian, M. (2007). Dynameq Application to Evaluating the Impact of Freeway Reconstruction. 86th Annual Meeting of Transportation Research Board, Washington, D. C. Ting, C. and Schonfeld, P. (1998). Optimization through Simulation of Waterway Transportation Investments. Transportation Research Record, 1620: 11–16. Torday, A. and Dumont, A. (2003). Floating Car Data Acquisition and Dynamic Route Guidance Strategies Evaluation with AIMSUN Tresidder, T., Zhang, Y. and Forrester, A. (2012). Acceleration of Building Design Optimization through the Use of Kriging Surrogate Models. In Proceedings of the Building Simulation and Optimization Conference: 1-8. 201 Unnikrishnan, A. and Lin, D.(2012). User Equilibrium with Recourse: Continuous Network Design Problem. Computer-Aided Civil and Infrastructure Engineering, 27(7): 512–524. Vaze, V., Antoniou, C., Wen, Y. and Ben-Akiva, M. (2009). Calibration of Dynamic Traffic Assignment Models With Point-To-Point Traffic Surveillance. Transportation Research Record, 2090: 1–9. Villemonteix, J., Vazquez, E. and Walter, E. (2009). An Informational Approach to the Global Optimization of Expensive-to-Evaluate Functions. Journal of Global Optimization, 44(4): 509–534. Vincent, R. and Young, C. (1986). Self-Optimizing Traffic Signal Control Using Microprocessor: The TRRL “MOVA” Strategy for Isolated Intersections. Traffic Engineering and Control, 27: 385-387. Wandekokem, E., Mendel, E., Fabris, F., Valentim, M., Batista, R., Varejao, F. and Rauber, T. (2011). Diagnosing Multiple Faults in Oil Rig Motor Pumps Using Support Vector Machine Classifier Ensembles. Integrated Computer-Aided Engineering, 18(1): 61–74. Wang, S. and Schonfeld, P. (2005). Scheduling Interdependent Waterway Projects through Simulation and Genetic Optimization. Journal of Waterway, Port, Coastal, and Ocean Engineering, 131(3): 89-97. Wang, S. and Schonfeld, P. (2008). Scheduling of Waterway Projects with Complex Interrelations. Transportation research record, 2062: 59-65. Wang, S. and Schonfeld, P. (2012). Simulation-based Scheduling of Mutually Exclusive Projects with Precedence and Regional Budget Constraints. Transportation research record, 2273: 1-9. Wang, S., Yang, N. and Schonfeld, P. (2009). Simulation-based Network Maintenance Planning and Scheduling. Transportation research record, 2100: 94-102. Wang, Y. and Prevedouros, P. (1996). Synopsis of Traffic Simulation Models. The 75th Annual Meeting of Transportation Research Board, Washington, D.C. Wolpert, D. and MacReady, W. (1997). No Free Lunch Theorems for Opimization. IEEE Transaction on Evolutionary Computation, 1: 67-82. Woodbury, M. (1950). Inverting Modified Matrices, Memorandum Rept. 42, Statistical Research Group, Princeton University, Princeton, NJ. 202 Xiong, C. and Zhang, L. (2013). A Descriptive Bayesian Approach to Modeling and Calibrating Drivers’ En Route Diversion Behavior. IEEE Transactions on Intelligent Transportation Systems, 14(4): 1817–1824. Xiong, C., Zhu, Z., He, X., Chen, X., Zhu, S. and Zhang, L. (2014). A 24-hour Large- Scale Microscopic Simulation Case Study of Inter-County Connector in Maryland. The 93rd Annual Meeting of Transportation Research Board, Washington, D.C. Yang, H. (1999). System Optimum, Stochastic User Equilibrium and Optimal Link Tolls. Transportation Science, 33(4): 354-360. Yang, H. and Huang, H. (1998). Principle of Marginal-Cost Pricing: How Does It Work in a General Network? Transportation Research Part A, 32: 45-54. Yang, H. and Huang, H. (2004). The Multiclass, Multicriteria Traffic Network Equilibrium and System Optimum Problem. Transportation Research Part B, 38(1):1–15. Yang, H. and Huang, H. (2005). Mathematical and Economic Theory of Road Pricing. Elsevier, Oxford. Yang, N. (2010). Optimization of Highway Work Zone Decisions Considering Short- term and Long-term Impacts. Doctoral dissertation, University of Maryland. Yang, Y., Lu, H., Yin, Y. and Yang, H. (2013). Optimization of Variable Speed Limits for Efficient, Safe, and Sustainable Mobility. Transportation Research Record, 2333(1): 37-45. Yim, K., Wong, S., Chen, A., Wong, C. and Lam, W. (2011). A Reliability-based Land Use and Transportation Optimization Model. Transportation Research Part C, 19(2): 351-362. Yin, Y. and Lou, Y. (2009). Dynamic Tolling Strategies for Managed Lanes. Journal of Transportation Engineering, 135(2): 45–52. Yoo, T., Cho, H. and Yücesan, E. (2010). Hybrid Algorithm for Discrete Event Simulation-based Supply Chain Optimization. Expert Systems with Applications, 37: 2354-2361. Zhang, L., Chang, G.L., Zhu, S., Xiong, C., Du, L., Mollanejad, M. Hopper, N., and Mahapatra, S. (2013). Integrating an Agent-Based Travel Behavior Model with Large-Scale Microscopic Traffic Simulation for Corridor-Level and Subarea Transportation Operations and Planning Applications. Journal of Urban Planning and Development, 139(2): 94-103. 203 Zhang, H., Jayakrishnam, R. and Recker, W. (2001). An Optimization Algorithm for Freeway Traffic Control. In Proceedings of 4th IEEE Conference on Intelligent Transportation Systems: 102-107. Zhou, L., Yan, G. and Ou, J. (2013). Response Surface Method Based on Radial Basis Functions for Modeling Large-Scale Structures in Model Updating. Computer- Aided Civil and Infrastructure Engineering, 28(3): 210–226. Zhu, L, Schonfeld, P., Kim, Y., Flood, I. and Ting, C. (1999). Queuing Network Analysis for Waterways with Artificial Neural Networks. Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 13(05): 365-375.