ABSTRACT

Title of dissertation: DUAL-BASED LOCAL SEARCH FOR DETERMINISTIC, STOCHASTIC, AND ROBUST VARIANTS OF THE CONNECTED FACILITY LOCATION PROBLEM

María Gisela Bardossy, Doctor of Philosophy, 2011

Dissertation directed by: Professor Subramanian Raghavan, Robert H. Smith School of Business

In this dissertation, we propose the study of a family of network design problems that arise in a wide range of practical settings ranging from telecommunications to data management. We investigate the use of heuristic search procedures coupled with lower bounding mechanisms to obtain high quality solutions for deterministic, stochastic, and robust variants of these problems. We extend the use of well-known methods such as the sample average approximation for stochastic optimization and the Bertsimas and Sim approach for robust optimization with heuristics and lower bounding mechanisms. This is particularly important for NP-complete problems, where even deterministic and small instances are difficult to solve to optimality. Our extensions provide a novel way of applying these techniques while using heuristics, which from a practical perspective increases their usefulness.

We study the connected facility location (ConFL) problem, which arises in a number of applications that relate to the design of telecommunication networks as well as data distribution and management problems on networks. It combines features of the uncapacitated facility location problem with the Steiner tree problem and is known to be NP-complete. In this setting, we wish to install a set of facilities on a communication network and assign customers to the installed facilities. In addition, the set of selected facilities needs to be connected by a Steiner tree. We propose a dual-based local search (DLS) heuristic that combines dual-ascent and local search, which together yield strong lower and upper bounds on the optimal solution. Our procedure is applied to a slightly more general version of the ConFL problem that embraces a family of four different problems (the Steiner tree-star problem, the general Steiner tree-star problem, the ConFL problem, and the rent-or-buy problem) that combine facility location decisions with connectivity requirements. Consequently, our solution methodology can be successfully applied to all of them.

DUAL-BASED LOCAL SEARCH FOR DETERMINISTIC, STOCHASTIC, AND ROBUST VARIANTS OF THE CONNECTED FACILITY LOCATION PROBLEM

by María Gisela Bardossy

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2011

Advisory Committee:
Professor Subramanian Raghavan, Chair/Advisor
Professor Michael Ball
Professor Zhi-Long Chen
Professor Bruce Golden
Professor Richard La

© Copyright by María Gisela Bardossy, 2011

Dedication

To my nona, Selma Catalina Gariglio de Filippa.

Acknowledgments

First and foremost I'd like to thank my advisor, Professor Raghu Raghavan, for giving me the invaluable opportunity to work with him over the past five years. He has always made himself available for help and advice beyond the doctoral program, and there has never been an occasion when I've knocked on his door and he hasn't given me time. Thanks are due to Professor Bruce Golden, Professor Zhi-Long Chen, Professor Michael Ball, and Professor Richard La for agreeing to serve on my dissertation committee and for sparing their invaluable time reviewing the manuscript.
I have to thank Professor Kazim Ruhi for his great teaching advice and job search tips. I am also greatly indebted to my graduate fellows, Inbal Yahav, Dina Ribbink, Dilek Gunnec, Andrew Hall, and Ming Chen, for their care and love. I was lucky to meet them and share this journey with them. I also want to thank my dearest friends, Anita and Javier Bianchi, Francisca and Giorgo Sertsios, and Andrea Cavassa, who became my College Park family in these last five years. I will forever cherish the great times that we spent together and I will miss them a lot as we continue our separate paths and dreams.

I owe my deepest thanks to my husband, José, who has always stood by me so I could accomplish my dreams. Words cannot express the gratitude I owe him. I need to thank my parents-in-law for letting me take their son away from them. I would also like to thank my family, who in their very special ways always supported me. Thank you mom for your support and constant encouragement. Thank you brothers for taking more family responsibility than you have to and always protecting me. I also want to thank my grandparents for being my role models in family values, love, generosity and joy, and whose memory is a constant inspiration to be a better Christian. Lastly, thank God for all your blessings!

Table of Contents

List of Tables
List of Figures
List of Abbreviations

1 Dissertation Overview

2 The Connected Facility Location Problem and Related Problems
2.1 Introduction
2.1.1 Steiner tree-star (STS) problem
2.1.2 General Steiner tree-star (GSTS) problem
2.1.3 Connected Facility Location (ConFL) problem
2.1.4 Rent-or-Buy (ROB) problem
2.2 Literature Review
2.2.1 Steiner tree-star (STS) and General STS problem
2.2.2 Connected Facility Location (ConFL) problem
2.2.3 Rent-or-Buy (ROB) problem
2.3 General Connected Facility Location Problem
2.3.1 Transforming the GSTS problem to the GConFL problem
2.3.2 Transforming the ConFL problem into the GConFL problem
2.3.3 Transforming the ROB problem into the GConFL problem
2.4 Conclusions

3 Heuristics for the General Connected Facility Location Problem
3.1 Introduction
3.2 Modeling the GConFL Problem as a Directed Steiner Tree Problem with a Unit Degree Constraint on the Root Node
3.3 Dual-Based Local Search Heuristic
3.3.1 Dual-Ascent Phase
3.3.2 Local Search Phase
3.4 Computational Experiments
3.4.1 Problem Generation and Characteristics
3.4.2 UFL Heuristic
3.4.3 Results on Complete Graphs
3.4.3.1 STS Problem
3.4.3.2 GSTS Problem
3.4.3.3 ROB Problem
3.4.3.4 ConFL Problem
3.4.4 Results on Sparse Instances
3.4.5 Large-Scale Instances and Comparison to Ljubić's VNS heuristic
3.5 Conclusions

4 The Stochastic Connected Facility Location Problem
4.1 Introduction
4.2 Literature Review
4.3 A Note on Two-Stage Linear Programs with Fixed Recourse
4.4 Problem Formulation
4.4.1 SConFL with Uncertain Demands
4.4.2 SConFL with Uncertain Locations
4.5 Sample Average Approximation Method
4.5.1 Quality of the Solution
4.6 Proposed Heuristic
4.7 Computational Experiments
4.7.1 Expected Value Solutions
4.7.2 Problem Generation and Characteristics
4.7.3 Polynomial-Scenario Model Results
4.7.4 SAA Results (Independent-Activation Model)
4.7.5 Sample sizes N and N' and number of replications R
4.8 Conclusions

5 Robust Optimization for the Connected Facility Location Problem
5.1 Introduction
5.2 Robust Optimization and the ConFL Problem
5.2.1 Bertsimas and Sim's Robust Optimization Model
5.2.2 Extending Bertsimas and Sim's Robust Approximation Algorithm
5.2.3 The ConFL problem robust counterpart
5.3 An Example of the RConFL problem using RCOH
5.4 Special Case
5.5 Computational Experiments
5.5.1 Problem Generation and Characteristics
5.5.2 Results on the RConFL Problem
5.5.2.1 Results for Set 1 - Disk Uncertainty Area
5.5.2.2 Results for Set 2 - Square Uncertainty Area
5.5.2.3 Results for Set 3 - Rectangular Uncertainty Area
5.6 Conclusions

List of Tables

3.1 Comparison of heuristics for the STS problem on Set 1. M = 3, and facility opening costs are varied
3.2 Comparison of heuristics for the STS problem on Set 2. $f_i = 30$, and the M factor is varied
3.3 Comparison of heuristics for the GSTS problem on Set 1. M = 3, and facility opening costs are varied
3.4 Comparison of heuristics for the GSTS problem on Set 2. $f_i = 30$, and the M factor is varied
3.5 Comparison of heuristics for the ROB problem on Set 3. The M factor is varied
3.6 Comparison of heuristics for the ConFL problem on Set 1. M = 3, and facility opening costs are varied
3.7 Comparison of heuristics for the ConFL problem on Set 2. $f_i = 30$, and the M factor is varied
3.8 Comparison of heuristics for the ConFL problem on Set 4
3.9 Performance of the DLS heuristic computed using the best lower bound on Set 4
3.10 Comparison of heuristics for the ConFL problem on Set 5
3.11 Performance of the DLS heuristic computed using the best lower bound on Set 5
3.12 Comparison of the DLS heuristic with the VNS procedure on large-scale instances
4.1 Comparison of heuristics for the Stochastic ConFL. $f_i = 30$, M = 7, and the v factor is varied
4.2 Comparison of heuristics for the Stochastic ConFL. $f_i = 30$, M = 7, and the v factor is varied
4.3 Average 98% confidence gaps for the SConFL. $f_i = 30$, M = 3, and the v factor is varied
4.4 Minimum 98% confidence gaps for the SConFL. $f_i = 30$, M = 3, and the v factor is varied
4.5 Maximum 98% confidence gaps for the SConFL. $f_i = 30$, M = 3, and the v factor is varied
4.6 Sample variance of $E[z_N^{LB}]$ and computational time for various N and R values
5.1 Coordinates for nodes in the example
5.2 Assignment and tree edge costs in the example
5.3 Assignment costs for each nominal problem in the example
5.4 Preliminary results for the example
5.5 Example results for $0 \le \Gamma \le 3$
5.6 Facility node location and closest possible demand node location
5.7 Facility node location and farthest possible demand node location
5.8 Assignment costs for the example shown in Figures 5.6 and 5.7
5.9 Average Gaps for Set 1 (M = 3)
5.10 Average Gaps for Set 1 (M = 7)
5.11 Average Computational Times in Seconds for Set 1 (M = 3)
5.12 Average Computational Times in Seconds for Set 1 (M = 7)
5.13 Average best-case solution loss for Set 1 (M = 3)
5.14 Average best-case solution loss for Set 1 (M = 7)
5.15 Average Gaps for Set 2 (M = 3)
5.16 Average Gaps for Set 2 (M = 7)
5.17 Average Computational Times in Seconds for Set 2 (M = 3)
5.18 Average Computational Times in Seconds for Set 2 (M = 7)
5.19 Average best-case solution loss for Set 2 (M = 3)
5.20 Average best-case solution loss for Set 2 (M = 7)
5.21 Average Gaps for Set 3 (M = 3)
5.22 Average Gaps for Set 3 (M = 7)
5.23 Computational Times for Set 3 (M = 3)
5.24 Computational Times for Set 3 (M = 7)
5.25 Average best-case solution loss for Set 3 (M = 3)
5.26 Average best-case solution loss for Set 3 (M = 7)

List of Figures

2.1 Steiner tree-star example
2.2 General Steiner tree-star example
2.3 Connected facility location example
2.4 GSTS problem transformation to GConFL problem
2.5 Transformations that allow us to assume D, F, and S form a partition in the ConFL problem
2.6 ConFL problem transformation to GConFL problem
3.1 Transforming a GConFL instance, derived from the ROB problem, into a directed Steiner tree problem
3.2 Dual-Ascent Example
3.3 Pseudocode for local improvements
4.1 Sample variance of $q_{N'}(x)$ for an increasing number of scenarios $N'$
5.1 Bertsimas and Sim's Algorithm B
5.2 Robust Combinatorial Optimization Heuristic (RCOH)
5.3 Example for Lemma 5.2.4
5.4 RCOH for the Robust ConFL problem
5.5 Example of robust ConFL problem
5.6 Example of rectangular uncertainty region
5.7 Example of rectangular uncertainty region
5.8 Solutions for different values of $\Gamma$ for Set 1 (M = 3)
5.9 Solutions for different values of $\Gamma$ for Set 1 (M = 7)
5.10 Solutions for different values of $\Gamma$ for Set 2 (M = 3)
5.11 Solutions for different values of $\Gamma$ for Set 2 (M = 7)
5.12 Solutions for different values of $\Gamma$ for Set 3 (M = 3)
5.13 Solutions for different values of $\Gamma$ for Set 3 (M = 7)

List of Abbreviations

B&C Branch-and-Cut
ConFL Connected Facility Location
DA Dual Ascent
DLS Dual-ascent Local Search
DST Directed Steiner Tree
EVS Expected Value Solution
GRASP Greedy Randomized Adaptive Search Procedure
GSTS General Steiner Tree-Star
MVP Mean Value Problem
RCOH Robust Combinatorial Optimization Heuristic
RConFL Robust Connected Facility Location
ROB Rent-or-Buy
SAA Sample Average Approximation
SConFL Stochastic Connected Facility Location
SSBND Single-Sink Buy-at-Bulk Network Design
STS Steiner Tree-Star
UFL Uncapacitated Facility Location
VNS Variable Neighborhood Search

Chapter 1

Dissertation Overview

In this dissertation, we propose the study of a family of network design problems that arise in a wide range of practical settings ranging from telecommunications to data management. We investigate the use of heuristic search procedures coupled with lower bounding mechanisms to obtain high quality solutions for deterministic, stochastic, and robust variants of these problems. We extend the use of well-known methods such as the sample average approximation for stochastic optimization (Shapiro and Philpott 2007) and the Bertsimas and Sim approach for robust optimization (Bertsimas and Sim 2003) with heuristics and lower bounding mechanisms.
This is particularly important for NP-complete problems, where even deterministic and small instances are difficult to solve to optimality. Our extensions provide a novel way of applying these techniques while using heuristics, which from a practical perspective increases their usefulness.

We study the connected facility location (ConFL) problem, which arises in a number of applications that relate to the design of telecommunication networks as well as data distribution and management problems on networks. It combines features of the uncapacitated facility location problem with the Steiner tree problem and is known to be NP-complete. In this setting, we wish to install a set of facilities on a communication network and assign customers to the installed facilities. In addition, the set of selected facilities needs to be connected by a Steiner tree. We propose a dual-based local search (DLS) heuristic that combines dual-ascent and local search, which together yield strong lower and upper bounds on the optimal solution. Our procedure is applied to a slightly more general version of the ConFL problem that embraces a family of four different problems (the Steiner tree-star problem, the general Steiner tree-star problem, the ConFL problem, and the rent-or-buy problem) that combine facility location decisions with connectivity requirements. Consequently, our solution methodology can be successfully applied to all of them. We discuss a wide range of computational experiments, which indicate that our heuristic is a very effective procedure that finds high quality solutions very rapidly for the deterministic variant of the problem.

In the next part of the dissertation, we focus our attention mainly on the ConFL problem because, as we show in Chapter 2, the remaining problems can be easily transformed into ConFL instances, and our computational experiments in Chapter 3 reveal that the ConFL problem is the hardest to solve for the DLS heuristic. Regardless, all of the subsequent analyses and solution methodologies for the stochastic and robust variants presented throughout this dissertation can be successfully applied to all of the problems within the broader family.

We consider the ConFL problem in the case where there is uncertainty in the assignment costs, due either to random demand quantities, unknown customer locations, or varying edge lengths. We show that the optimal solution of the stochastic problem with random demands can be obtained by replacing the unknown values with their averages. However, when customer locations or edge lengths are random this strategy is not optimal. Instead, we define a deterministic equivalent problem and provide a set of graph transformations that allow us to apply our DLS procedure to this equivalent problem. Our computational results show that this strategy is very effective for instances with a polynomial number of scenarios. However, as the number of scenarios increases, it becomes impractical to generate a complete deterministic equivalent problem. Under such circumstances, we propose a more effective framework that relies on the sample average approximation (SAA) approach (Shapiro and Philpott 2007) yet utilizes a heuristic to solve the sample average problems and a lower bounding procedure to construct a confidence interval on the true optimal value. To our knowledge, this is the first discussion of the SAA approach in stochastic programming with the use of heuristics to solve the underlying optimization problems.
Finally, we present a robust optimization model for the ConFL problem under customer uncertainty and we show how a DLS heuristic can be effectively used to obtain high quality solutions to this robust optimization problem. We extend Bertsimas and Sim's robust optimization solution approach (Bertsimas and Sim 2003) to situations where one has a heuristic upper bound and a lower bound on the optimal solution objective value for each nominal problem. An advantage of the DLS heuristic is that it provides high quality lower bounds in addition to the heuristic solution. We present computational results that demonstrate that the DLS heuristic rapidly obtains high-quality solutions for a large set of test instances for the robust variant of the ConFL problem.

We conclude this dissertation with a summary of contributions and directions for future research.

Chapter 2

The Connected Facility Location Problem and Related Problems

2.1 Introduction

The recent growth of telecommunication networks coupled with digital data management has motivated a range of network design problems that combine facility location with connectivity requirements. These network design problems combine features of the uncapacitated facility location (UFL) problem with the Steiner tree problem. The connected facility location (ConFL) problem belongs to this class of network design problems and is known to be NP-complete. In this chapter we introduce a slightly more general version of the ConFL problem that subsumes a family of four closely related problems (that arise in virtual private network design and data distribution problems on networks) and propose a dual-based local search (DLS) heuristic that combines dual-ascent and local search to obtain high quality solutions rapidly. We first describe the family of four closely related problems that are special cases of the general version of the ConFL problem that we introduce in this essay.

2.1.1 Steiner tree-star (STS) problem

The Steiner tree-star (STS) problem (Lee et al. 1993) is a virtual private network design problem that arises in the design of telecommunication networks for digital data services and can be stated as follows. Given a graph $G = (V, E)$ and a disjoint partition of the nodes in $V$ into two sets, $D \subseteq V$, the set of demand nodes (also referred to as target nodes in the STS literature), and $F \subseteq V$, the set of potential facility nodes (also referred to as hubs in the STS literature), we seek to find a minimum cost tree such that every demand node is connected (or assigned) to a facility node, and the facilities serving demand nodes are connected through a node-weighted Steiner tree $T$ constructed solely on the $F$ nodes (i.e., on $G(F) = (F, E(F))$). Each facility has an opening (or activating) cost, $f_i \ge 0$, that is incurred if the facility is included in the final network design (regardless of whether the facility serves a demand node). The cost to connect a demand node, $j \in D$, to a facility node, $i \in F$, is given by an assignment cost $a_{ij}$, while the cost to connect two facilities, $i, j \in F$, comes at a significantly higher (in terms of cost per unit distance) connection cost $b_{ij}$. The network design cost is thus given by $\sum_{j \in D} a_{i(j)j} + \sum_{i \in T} f_i + \sum_{\{i,j\} \in T} b_{ij}$, where $i(j)$ is the facility serving demand node $j$, and $T$ is the Steiner tree connecting the facilities serving demand nodes.
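To make this cost structure concrete, the following small Python sketch evaluates the STS design cost for a candidate solution. It is purely illustrative: the function name, data structures, and the tiny instance at the end are made up for this example and are not taken from the dissertation.

# Illustrative sketch (hypothetical data structures, not from the dissertation):
# evaluates the STS design cost  sum_j a_{i(j)j} + sum_{i in T} f_i + sum_{{i,j} in T} b_{ij}.

def sts_cost(assignment, tree_nodes, tree_edges, a, b, f):
    """assignment: demand node -> facility serving it; tree_nodes: facility nodes used in T;
    tree_edges: iterable of frozensets {i, j} forming T; a, b: edge cost dicts; f: opening costs."""
    assign_cost = sum(a[(i, j)] for j, i in assignment.items())
    open_cost = sum(f[i] for i in tree_nodes)        # in STS every facility used in T pays f_i
    tree_cost = sum(b[tuple(sorted(e))] for e in tree_edges)
    return assign_cost + open_cost + tree_cost

# Tiny made-up instance: demands d1, d2 assigned to facility 1; T spans facilities 1 and 2.
a = {(1, "d1"): 4, (1, "d2"): 6}
b = {(1, 2): 9}
f = {1: 30, 2: 30}
print(sts_cost({"d1": 1, "d2": 1}, {1, 2}, [frozenset({1, 2})], a, b, f))   # 4 + 6 + 60 + 9 = 79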
Figure 2.1: Steiner tree-star example

Figure 2.1 illustrates an example of the STS problem, where (a) shows the graph $G$, and (b) represents a feasible solution to the STS problem.

2.1.2 General Steiner tree-star (GSTS) problem

Khuller and Zhu (2002) extended the STS problem to a more general setting where the facility and demand node sets are not disjoint and called it the general STS (GSTS) problem. In this case some demand nodes may host a facility in a solution to the problem.

Figure 2.2: General Steiner tree-star example

Figure 2.2(a) shows a small example of the GSTS problem, where one node has a dual role and can be both a facility and a demand node. Figure 2.2(b) illustrates a solution where the dual-role node takes the function of a demand node, while Figure 2.2(c) shows a solution where it takes the role of a facility node. Note that when the dual-role node takes the role of a facility node it also satisfies its own demand at no additional cost.

2.1.3 Connected Facility Location (ConFL) problem

Karger and Minkoff (2000), Krick et al. (2003), and Havet and Wennink (2004) independently introduced data distribution and management problems in a network setting that arise in information/content distribution networks (such networks are widely prevalent at search providers, for example). In the applications discussed in these three papers, there are facilities (or servers) to be located on a network that will contain (or cache) information. Demand nodes make requests for the information. When a demand node $j$ requests a piece of information, it is served from the closest facility $i(j)$ and incurs a cost $\ell(i(j), j)$ (if a demand node $j$ makes multiple requests, say $d_j$ requests, the cost is simply $d_j\,\ell(i(j), j)$). Further, updates to the information on the servers are made over time. If a piece of information is updated, then it must be updated at every facility (or server) on the network. This incurs a cost $\ell(i, j)$ for every edge $\{i, j\}$ in the network on which this information is sent. Consequently, the cheapest way to update information over facilities (once a choice of which facilities to open has been made) is via a Steiner tree $T$ on the facilities with a cost of $\sum_{\{i,j\} \in T} \ell(i, j)$ (if $\mu$ information update requests are made, then the cost is $\mu \sum_{\{i,j\} \in T} \ell(i, j)$). The goal is to determine (i) what facilities to locate (or open), (ii) which facility serves each demand node, and (iii) how to connect the open facilities, in order to minimize the total cost. In the applications discussed in Karger and Minkoff (2000) and Havet and Wennink (2004) the facility opening costs are zero, while in Krick et al. (2003) there are costs associated with opening facilities.

The problems introduced by Karger and Minkoff (2000), Krick et al. (2003), and Havet and Wennink (2004) can be modeled as ConFL problems (Gupta et al. (2001) introduced the terminology ConFL), which can be stated as follows. We are given a graph $G = (V, E)$ and three sets: $D \subseteq V$, the set of demand nodes (or customers); $F \subseteq V$, the set of potential facility nodes; and $S \subseteq V$, the set of potential Steiner nodes, with $D \cup F \cup S = V$ and $F \cap S = \emptyset$. We seek to find a minimum cost network where every demand node is assigned to an open facility and open facilities are connected through a Steiner tree $T$ constructed on the subgraph of $G$ on the nodes $F \cup S$ (i.e., $G(F \cup S) = (F \cup S, E(F \cup S))$).
There are facility opening costs, $f_i \ge 0$, incurred for each facility that serves a customer; assignment costs, $a_{ij} \ge 0$, for assigning a customer $j \in D$ to a facility $i \in F$; and edge costs, $b_{ij} \ge 0$, for an edge $\{i, j\} \in E(F \cup S)$ if it is used on the Steiner tree $T$. The nodes in $S$ may be viewed as pure Steiner nodes and can only be used in the tree $T$ as Steiner nodes, while the nodes in $F$ may be used as Steiner nodes on the tree $T$ without incurring a facility opening cost when no customers are assigned to them. The final network cost is given by $\sum_{j \in D} a_{i(j)j} + \sum_{i \in \bar F} f_i + \sum_{\{i,j\} \in T} b_{ij}$, where $i(j)$ is the facility serving demand node $j$, $\bar F$ is the set of open facilities (i.e., facilities serving customers), and $T$ is a Steiner tree connecting the open facilities. The data distribution problems introduced by Karger and Minkoff (2000), Krick et al. (2003), and Havet and Wennink (2004) may be modeled as ConFL problems by setting $a_{ij} = d_j\,\ell(i, j)$, $b_{ij} = \mu\,\ell(i, j)$, and $f_i$ as the facility opening cost. Figure 2.3 illustrates an example of the ConFL problem and a feasible solution.

Figure 2.3: Connected facility location example

In the definition of the ConFL problem above, it is possible that $D \cap F \neq \emptyset$ or $D \cap S \neq \emptyset$. In both these situations a demand node $j$ can be used on the Steiner tree $T$. In this case, in addition to the edges adjacent to node $j$ in the Steiner tree $T$, we have a connection $\{i(j), j\}$ between node $j$ and the facility it is assigned to. Consequently, the minimum cost network is not necessarily a tree. On the other hand, if $D \cap F = \emptyset$ and $D \cap S = \emptyset$ (meaning that $D$, $F$, and $S$ form a partition), a demand node cannot be used on the Steiner tree $T$ and the minimum cost network is a tree. It can be viewed as consisting of a core tree $T$ (where the leaf nodes must be open facility nodes), with the assignment edges dangling from open facility nodes on the core tree. Typically, the papers in the computer science literature allow for $D \cap F \neq \emptyset$ or $D \cap S \neq \emptyset$, while the papers in the operations research literature assume that $D$, $F$, and $S$ form a partition. It is easy to transform ConFL instances where $D \cap F \neq \emptyset$ or $D \cap S \neq \emptyset$ into ones where the sets $D$, $F$, and $S$ form a partition. We will discuss this transformation in Section 2.3.2.

2.1.4 Rent-or-Buy (ROB) problem

The rent-or-buy (ROB) problem, often viewed as a special case of the ConFL problem, has the feature that facilities can be opened at any node of the graph (i.e., $F = V$) at zero cost. (The term rent-or-buy comes from a related problem called the single-sink buy-at-bulk network design (SSBND) problem (Salman et al. 1997), to which ROB is equivalent when the SSBND problem has two cable types. Here the idea is that edges on the network can either be rented, in which case the cost function $a_{ij}$ applies, or can be purchased, in which case the cost function $b_{ij}$ applies.) In our opinion, the ROB problem can also be viewed as a special case of the GSTS problem with zero facility opening costs and $D \subseteq F = V$. In the ROB problem any demand node can act as a facility node and hence serve other customers (i.e., demand nodes). Consequently, the cost of an edge depends on the role of its adjacent nodes. If one of the end points of the edge plays the role of a demand node, meaning that the edge is connecting a demand node to an open facility, we say the edge is an assignment edge and its cost is $a_{ij}$. Otherwise, the edge belongs to the Steiner tree $T$ (and we call it a tree edge) and its cost is $b_{ij}$.
(In all four problems (STS, GSTS, ConFL, and ROB) we will use the term customer interchangeably with demand node. We will also refer to edges connecting demand nodes to facilities as assignment edges, and to potential edges on the Steiner tree connecting open facilities as tree edges.) In this problem, the final network cost is given by $\sum_{j \in D} a_{i(j)j} + \sum_{\{i,j\} \in T} b_{ij}$, where $i(j)$ is the facility which serves demand node $j$ (note that $a_{ii} = 0$ when demand node $i$ is used as a facility), and $T$ is the Steiner tree connecting all the open facility nodes.

The data distribution and management applications introduced by Karger and Minkoff (2000) and Havet and Wennink (2004) are actually instances of the ROB problem. In a different setting, Nuggehalli et al. (2003) considered the problem of energy-conscious cache placement in wireless ad hoc networks. The objective is to find an effective strategy to cache the server information at some nodes distributed across the network while optimally considering the trade-off between energy consumption and access latency. Interestingly, this problem is also an instance of the ROB problem (Ljubić 2007), indicating the widespread application of the ConFL and ROB problems.

All four problems (STS, GSTS, ConFL, and ROB) are of significant interest from a practical perspective, both in the telecommunications/virtual private network design context as well as in the data distribution and management context. There has been a considerable amount of research devoted to these four problems from an approximation algorithms perspective, but somewhat limited study of these problems from a mathematical programming perspective.

Clearly, all four problems combine a facility location problem with a Steiner tree problem. In this chapter, we exploit the similarity between the four problems and define a slightly more general version of the ConFL problem that we call the general ConFL (or GConFL) problem. In the next chapter, we devise a high-quality dual-based local search heuristic for the GConFL problem that provides both tight lower and upper bounds. Our heuristic solution strategy consists of first formulating the GConFL problem as a directed Steiner tree problem with a unit degree constraint on the root node. We then implement a dual-ascent procedure to obtain a lower bound and an upper bound (feasible solution) on the optimal solution. We then apply a set of local improvement steps to the feasible solution obtained by the dual-ascent procedure to significantly improve the quality of the solution. We conducted an extensive set of computational experiments (reported in the next chapter) that demonstrated the efficacy and efficiency of our DLS heuristic. These results included instances on complete graphs as well as non-complete graphs. Across the four problems over the set of test instances our DLS heuristic consistently found solutions of very high quality. Over the 2748 problems tested the DLS heuristic found solutions that were on average at most 1.07% from optimality.

2.2 Literature Review

2.2.1 Steiner tree-star (STS) and General STS problem

The STS problem was introduced by Lee et al. (1993). Later, Lee et al. (1996) described valid inequalities and facets for the STS polytope (the polytope of integer feasible STS solutions) and developed a branch-and-cut procedure for the STS problem. Their procedure was able to solve Euclidean test problems with fewer than 200 nodes in up to 3 hours of computational time. Xu et al. (1996a) and Xu et al.
(1996b) proposed two tabu search heuristics for the STS problem, which they tested on a small sample of random and grid problems with up to 600 nodes. Subsequently, Chu et al. (2000) proposed a genetic algorithm for the STS problem. Their computational experiments indicated that their genetic algorithm is of similar quality to Xu et al. (1996a). For the same set of problems, they find no difference in performance between the two heuristics; however, their genetic algorithm required less computational time. Note, however, that since neither heuristic computes a lower bound on the optimal solution, no optimality gaps were reported in either of these papers. Khuller and Zhu (2002) introduced the GSTS problem and gave two approximation algorithms with approximation ratios of 5.16 and 5, respectively.

2.2.2 Connected Facility Location (ConFL) problem

Gupta et al. (2001) arrived at the ConFL problem when considering a virtual private network design with demand uncertainty. Here, a set of demand nodes are to be connected using a virtual private network. One is given the maximum incoming and outgoing traffic for each demand node, but one does not know the actual traffic matrix between the nodes. One wishes to construct a minimum cost tree network and provision sufficient capacity on its edges so that the tree network can support any traffic matrix where the aggregate incoming and outgoing demands respect the maximum limits for each node. Gupta et al. (2001) reduced this virtual private network design problem to the ConFL problem. They then gave a 10.66-approximation algorithm for the ConFL problem by adapting a rounding technique of Shmoys et al. (1997) on an integer programming formulation of the problem with an exponential number of constraints.

Swamy and Kumar (2004) described a primal-dual approximation algorithm for the ConFL problem. Their algorithm works in two phases and has an approximation ratio of 8.55. The first phase is a facility location phase where they decide which facilities to open, connect demands to facilities, and cluster the demands at each facility. In this phase, demands are clustered so that each open facility serves at least a certain minimum number of demand points. (In Swamy and Kumar (2004) the ratio between the costs $b_{ij}$ and $a_{ij}$ is constant, and $b_{ij}/a_{ij}$ is treated as the minimum number of demand points to cluster at a facility node.) The second phase is a Steiner phase where the open facilities are connected by a Steiner tree. Jung et al. (2008) improved upon Swamy and Kumar's algorithm (by making changes in phase 1) and devised a 6.55-approximation algorithm for the ConFL problem.

Recently, Eisenbrand et al. (2008) presented a randomized algorithm that improves the approximation ratio for the ConFL problem to 4. The idea in their algorithm is to (i) run an approximation algorithm for the UFL problem, (ii) randomly sample demand nodes, and open the facilities serving sampled customers in the approximate solution (all the demands are now assigned to these open facilities), and (iii) compute an approximate Steiner tree on the opened facilities. The authors showed that the approximation ratio degrades slightly to 4.23 when the algorithm is derandomized.

The theoretical computer science community has focused on developing approximation algorithms for the ConFL problem. In all of these papers no computational results are presented for any of the proposed approximation algorithms, and thus their effectiveness in practice is unknown.
The ConFL problem has only recently gained attention in the operations research community. Ljubić (2007) introduced a variable neighborhood search (VNS) heuristic that is combined with reactive tabu search. She also proposed a branch-and-cut (B&C) approach for solving the ConFL problem to optimality. She constructed ConFL test problems by combining Steiner tree problem instances from OR-Library (http://people.brunel.ac.uk/~mastjjb/jeb/orlib/steininfo.html) and UFL problem instances from UflLib (http://www.mpi-inf.mpg.de/departments/d1/projects/benchmarks/UflLib/). She reported that the VNS heuristic found solutions that were up to 10% from the lower bound provided by the branch-and-cut algorithm. It is our understanding (Ljubić 2009) that due to a computational error the values of the lower bounds described in Ljubić (2007) are incorrect and in some instances may be lower than the ones reported in the paper.

We should note that the approaches used by Swamy and Kumar (2004), Ljubić (2007), and Jung et al. (2008) assumed that one of the facilities in the optimal solution is known a priori. We find that interpretation of the ConFL problem somewhat restrictive. If one does not know a priori one of the facilities in the optimal solution, then their solution procedures need to be applied $|F|$ times (once for each facility node) and the best solution selected. In this chapter, we will assume that we do not have any a priori knowledge of an open facility in the optimal solution.

Tomazic and Ljubić (2008) proposed a greedy randomized adaptive search procedure (GRASP) algorithm for the ConFL problem. (Here the authors do not assume a priori knowledge of an open facility in the optimal solution.) The proposed heuristic is a multi-start iterative approach, where for each start a greedy construction defines a starting solution. Then, local improvements consisting of moves where a single facility is opened or closed are applied. Finally, a shortest path Steiner tree heuristic is applied to find a Steiner tree on the open facilities. The procedure was tested on three sets of randomly generated graphs with varying topologies and cost structures. On those instances, the GRASP procedure provided results whose average gaps were as large as 10% from the optimal solution.

2.2.3 Rent-or-Buy (ROB) problem

Since the ROB problem may be viewed as a special case of the ConFL problem, some of the heuristics proposed for the ConFL problem can be applied to the ROB problem with no deterioration in the approximation ratio, and in many cases with an improvement in the approximation ratio. Gupta et al. (2001) found that their rounding heuristic is a 9.002-approximation algorithm for the ROB problem, while Swamy and Kumar (2004) showed that their primal-dual approach is a 4.55-approximation algorithm. Nuggehalli et al. (2003) provided a 6-approximation algorithm for the ROB problem. Gupta et al. (2003) proposed a randomized 3.55-approximation algorithm for the ROB problem. The best known approximation algorithm for the ROB problem is due to Eisenbrand et al. (2008), who proposed a randomized approximation algorithm with a performance bound of 2.92 that, when derandomized, has an approximation ratio of 3.28.
2.3 General Connected Facility Location Problem

One of the key differences between the STS and the ConFL problems is that in the STS problem a facility node in the network design that does not serve a customer incurs a facility opening cost, $f_i$, while in the ConFL problem a facility incurs a facility opening cost only if it serves a customer. To create a generalization that encompasses all four problems, we slightly alter the original definition of the ConFL problem (with some additional changes) and require that a facility node in $T$ incurs a facility opening cost regardless of whether it serves a customer or not.

Specifically, we define the GConFL problem as follows. Given a graph $G = (V, E)$ and a disjoint partition of $V$ into three subsets of nodes, namely $D$, the set of demand nodes; $F$, the set of potential facilities; and $S$, the set of potential Steiner nodes; we seek a minimum cost tree solution such that (i) a set of facility nodes is opened, (ii) every demand node is assigned to an open facility, and (iii) open facilities are connected through a Steiner tree $T$ constructed on $G(F \cup S) = (F \cup S, E(F \cup S))$. Each facility $i$ used in $T$ incurs a facility opening cost, $f_i \ge 0$; the cost of assigning demand node $j \in D$ to facility $i \in F$ is given by $a_{ij}$ (as before we refer to these edges as assignment edges and the remaining edges as tree edges); and the cost of tree edges is given by $b_{ij}$. The network design cost is defined by $\sum_{j \in D} a_{i(j)j} + \sum_{i \in V(T) \cap F} f_i + \sum_{\{i,j\} \in T} b_{ij}$ (where as before $i(j)$ denotes the facility serving demand node $j$).

Observe that the graph for the GConFL problem has some special characteristics. Since each node takes a unique role, the only edges from demand nodes ($D$) are to potential facility nodes ($F$); that is, there are no edges between any pair of demand nodes, and there are no edges in the graph between any demand node and any Steiner node ($S$). Further, use of a facility node necessarily incurs a facility opening cost, while there are no opening costs associated with using a Steiner node.

Clearly, the STS problem is a special case of the GConFL problem with $S = \emptyset$ and sets $F$ and $D$ defined identically to the STS problem. Here the cost for assignment edges and tree edges remains unchanged. On the other hand, to see that the GSTS, ConFL, and ROB problems are special cases of the GConFL problem, we must apply a set of graph transformations. In each of these three problems, a node can have multiple roles. For example, in the GSTS problem a node can be a demand node as well as a facility node. Similarly, in the ConFL problem a node can be a facility node (serving demand nodes and incurring a facility installation cost) or a Steiner node (not serving any demand nodes and not incurring a facility installation cost). The transformations are based on a node splitting strategy where each node has a unique role (either demand node, facility node, or Steiner node). For ease of exposition (and brevity) we illustrate each of these transformations graphically.

2.3.1 Transforming the GSTS problem to the GConFL problem

In an instance of the GSTS problem, the node sets $F$ and $D$ are not disjoint ($F \cap D \neq \emptyset$) and there are no potential Steiner nodes ($S = \emptyset$). To address the fact that the node sets are not disjoint, we duplicate every node $i \in (F \cap D)$, creating an additional copy $i'$.
One copy, $i$, is treated as a facility node in the GConFL problem and the other copy, $i'$, is treated as a demand node in the GConFL problem. Further, nodes $i$ and $i'$ are connected by a zero cost edge, and the edge $\{i, j\}$ in GSTS that has a dual-role node adjacent to it (i.e., if either $i, j \in (F \cap D)$) is replaced as indicated in Figure 2.4. There are three cases: (i) one of the end points is in $F \cap D$ and the other in $F$, which is shown in Figure 2.4(a); (ii) one of the end points is in $F \cap D$ and the other is in $D$, which is shown in Figure 2.4(b); or (iii) both of the end points are in $F \cap D$, which is shown in Figure 2.4(c).

Figure 2.4: GSTS problem transformation to GConFL problem

Observe that while in the original GSTS problem instance some edges have different costs depending on the role of the edge (i.e., whether it is an assignment edge or a tree edge), in the transformed GConFL problem each node has a unique role and thus each edge has a unique cost.

2.3.2 Transforming the ConFL problem into the GConFL problem

We first illustrate how an instance of the ConFL problem where $D \cap F \neq \emptyset$ or $D \cap S \neq \emptyset$ can be transformed into a ConFL instance where $D$, $F$, and $S$ form a partition. If a demand node can also be used as a Steiner node, we simply create two copies of the node, with one copy representing the node as a demand node and the other copy representing the node as a Steiner node. Similarly, if a demand node can also be used as a facility node, we create two copies of the node, with one copy representing the node as a demand node and the other copy representing the node as a facility node. With this duplication, the edges between the duplicated nodes and the remaining nodes in the graph are updated as shown in Figure 2.5 (there are 8 possible cases illustrated in the figure). Consequently, without loss of generality, we can assume that in the ConFL problem $D$, $F$, and $S$ form a partition.

Figure 2.5: Transformations that allow us to assume D, F, and S form a partition in the ConFL problem

We now show how to transform an instance of the ConFL problem into an instance of the GConFL problem. In the definition of a ConFL problem instance, the facility opening cost is only incurred when a demand node is assigned to the facility. To address this situation we duplicate every facility node in the ConFL problem, creating an additional copy $i'$ for every node $i \in F$. The copy $i$ is treated as a facility node in the GConFL problem and the other copy $i'$ is treated as a Steiner node in the GConFL problem. Further, nodes $i$ and $i'$ are connected by a zero cost edge. Edges between a facility $i$ and a node $j \in V$ in the ConFL problem are replaced as shown in Figure 2.6 to transform it into a GConFL problem. There are three cases: (i) $j \in D$, which is shown in Figure 2.6(a); (ii) $j \in S$, which is shown in Figure 2.6(b); or (iii) $j \in F$, which is shown in Figure 2.6(c).

Figure 2.6: ConFL problem transformation to GConFL problem
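The heart of all three transformations is a simple node-splitting step. The Python sketch below is a loose illustration of the facility-duplication idea just described for the ConFL problem; it is not the dissertation's implementation, the data structures are hypothetical, and it reproduces only the tree-edge part of Figure 2.6 (assignment edges are left unchanged, as in Figure 2.6(a)).

# Loose illustration (hypothetical data structures, not the dissertation's implementation)
# of the facility-duplication step: each facility i gets a Steiner copy i' joined to i by a
# zero-cost tree edge, and tree edges are replicated across the copies.

def duplicate_facilities(F, S, tree_edge_cost):
    """F: facility nodes; S: Steiner nodes; tree_edge_cost: {frozenset({u, v}): b_uv} on F ∪ S."""
    steiner_copy = {i: f"{i}'" for i in F}
    new_S = set(S) | set(steiner_copy.values())
    new_cost = dict(tree_edge_cost)
    for i in F:
        new_cost[frozenset({i, steiner_copy[i]})] = 0       # zero-cost edge between the copies
    for edge, cost in tree_edge_cost.items():
        u, v = tuple(edge)
        for x in (u, steiner_copy.get(u, u)):               # original u or its Steiner copy
            for y in (v, steiner_copy.get(v, v)):           # original v or its Steiner copy
                new_cost[frozenset({x, y})] = cost
    return new_S, new_cost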
However, in terms of our 21 Ope np fafciltyod FdDptoltyod mcdftd(ltyod O e i i'i i i i' j j j'j j ) ) ) aij aij , ij , ij Obe i i'i j j ) , ij , ij , ij , ij , ij , ij Figure 2.6: ConFL problem transformation to GConFL problem local search heuristic (described in the next chapter) we found it convenient to use the transformations described in Figure 2.6, since they have the property that if the graph on G(F [ S) is complete for the ConFL problem, then the graph induced on G(F [ S) after transformation to the GConFL problem is also complete. 2.3.3 Transforming the ROB problem into the GConFL problem The ROB problem is a special case of GSTS with fi = 0; 8i 2 F . Hence, we apply the transformation described for the GSTS problem in x2.3.1 to convert the ROB problem into a GConFL problem. 2.4 Conclusions Though the four problems arise in very di erent settings and so far they have received individual treatment, we have shown that they are special cases of a more general problem and can in fact be treated as one problem. The distinctive and unifying characteristics of these problems are (i) that all of them combine facility location decisions with connectivity requirements and (ii) that demand nodes are 22 connected through assignment edges dangling from the core tree. Being able to de ne a more general problem that encompasses all of them allows us to develop heuristics with broader applicability. Furthermore, it allows us to observe that many of the approximation algorithms and heuristics proposed for some of these problems may in fact be transferable and applicable to the other problems within the family with minimum or no adjustments. In the following chapters, we use this observation to focus our attention on the GConFL problem with the computational focus on the ConFL problem. How- ever, the heuristics and methodologies developed throughout this dissertation are applicable to the whole family of problems discussed in this chapter. 23 Chapter 3 Heuristics for the General Connected Facility Connected Problem 3.1 Introduction In this chapter we propose a powerful heuristic that combines traditional math- ematical programming with local search procedures to yield high-quality solutions rapidly for the GConFL problem and consequently for the four related problems. Our heuristic has two signi cant advantages over the approximation algorithms and heuristics discussed in the literature and reviewed in Chapter 2. The main feature is that it yields a solution within seconds for instances with up to 200 nodes, and within minutes for larger instances with up to 500 nodes, as opposed to within hours for either case as the best-known exact methods. The second advantage of our heuristic is that with each solution it provides a quality measurement in per- cent, , that speci es how far the solution is from optimality. In other words, given a solution one knows that the solution is at most % from optimality. This is a distinctive characteristic that sets it apart from any of the other heuristics in the literature. While approximation algorithms do have a constant worst-case ratio from optimality, it is often very loose (up to 4.23 times the optimal solution value with the best-known approximation algorithm for the ConFL problem) and it does not provide any concrete information to the decision-maker about the particular solution at hand. Furthermore, our computational results show that the solutions 24 obtained by our heuristic are consistently within less than 5% from optimality. 
Our proposed heuristic, dual-based local search (DLS) heuristic, works in a sequence of stages. In the rst stage we transform the problem into a pure directed network design problem with all its costs at the arc level. We speci cally model the GConFL problem as a directed Steiner tree problem with a unit degree constraint on the root node. We exploit this transformation to formulate the problem as a multicommodity network ow problem and incorporate the unit degree constraint into the objective function with a su ciently large Lagrangian multiplier. Next, we use a dual-ascent procedure, which has been shown to be successful for the directed Steiner tree problem, to obtain an initial solution to the problem and a lower bound to the optimal solution. Finally, we improve upon dual-ascent?s initial solution by local search movements; closing open facilities and reconstructing the tree on the remaining open facilities, and repeat this process until we cannot nd any further improvements in the solution total cost. As an alternative we present another heuristic that combines mathematical programming and local search procedure. Some of the approximation algorithms found in the literature for the ConFL problem rely on decomposing the problem into two subproblems: uncapacitated facility location problem and Steiner tree problem. Consequently, we devise a heuristic that rst solves the facility location problem to optimality disregarding connectivity requirements; secondly, constructs a Steiner tree on the open facilities using dual-ascent; and lastly, improves upon the solu- tion by local search movements. We called this heuristic the uncapacitated facility location (UFL) heuristic. Furthermore, the UFL heuristic allows us to determine 25 whether the high-quality solutions yielded by the DLS heuristic could be largely at- tributed to the local search phase. In our computational experiments, we nd that even though this seems a reasonable strategy, the solutions are consistently and sig- ni cantly worse than the solutions yield by the DLS heuristic. We conclude that it is the combination of dual-ascent (on the entirety of the problem) with local-search that produces high-quality solutions. 3.2 Modeling the GConFL Problem as a Directed Steiner Tree Prob- lem with a Unit Degree Constraint on the Root Node We now discuss how to model the GConFL problem as a directed Steiner tree (DST) problem with a unit degree constraint on the root node. We will use this transformation to apply a dual-ascent strategy to obtain lower and upper bounds for the GConFL problem. We rst construct a directed graph H = (V;A) from the graph G = (V;E) of the GConFL problem as follows. 1. Replace every edge fi; jg 2 E(F [S) by two directed arcs (i; j) and (j; i) with cost cij = bij, if j 2 S (and cji = bij, if i 2 S), or cost cij = bij + fj, if j 2 F (and cji = bij + fi, if i 2 F ). 2. Replace every assignment edge fi; jg between facility i and demand node j by an arc (i; j) with cost cij = aij. 3. Create an arti cial root node s, and create an arc from s to every node i 2 F with cost csi = fi. 26 Open nfacilty oydpitcilty FDmFpm ()fnbnenp c)llfcilty Fem o p )l ycilty Figure 3.1: Transforming a GConFL instance, derived from the ROB problem, into a directed Steiner tree problem Figure 3.1 illustrates the transformation for a GConFL instance, derived from a ROB problem. Figure 3.1(b) shows the transformation of the ROB problem into a GConFL problem, and Figure 3.1(c) the transformation to a directed graph. 
We can now view the GConFL problem as a DST problem on H. On H we would like to construct a minimum cost DST rooted at node s and connected to all demand nodes D (i.e., s has a directed path to every node in D) with the condition that the outdegree of node s is equal to one. (We note that we do not actually need to explicitly introduce an arti cial root node s. Instead, we can use any of the demand nodes D as the root node. We will refer to this problem as the unit degree directed Steiner tree (UDDST) problem.) The unit outdegree of node s ensures that the graph obtained after deleting (the arti cial root node) s is connected. Notice that on H no costs are associated with any of the nodes, as the facility costs are now included in the arc costs on H. It is easy to see that every feasible solution to GConFL on G can be converted into a feasible UDDST on H with the same cost (simply connect the root node s to one of the facility nodes in the solution, and direct the tree solution to the GConFL problem away from s). Likewise, every feasible solution to the UDDST problem on H corresponds to a feasible solution to the GConFL 27 problem on G with identical cost (simply delete the root node s, the resulting tree in an undirected sense provides a feasible solution to the GConFL problem). We now provide two formulations for the UDDST problem. Both formulations follow from well-known directed formulations for the Steiner tree problem, with an additional constraint for the unit degree constraint on the root node. The rst model is a directed cut model, while the second model is a multicommodity ow model. Both models assume that arc costs are non-negative. z = Minimize X (i;j)2A cijyij (3.1) subject to: X (i;j)2 (R) yij 1 for all R V , s 62 R, R \D 6= ; (3.2) X j2F ysj 1 (3.3) yij 0 and integer for all (i; j) 2 A. (3.4) In the above model yij is an integer variable denoting the number of copies of arc (i; j) in the solution. (We note that since arc costs are non-negative it is su cient to de ne yij as integer instead of binary.) Constraints (3.2) are the standard directed cut constraints, and constraint (3.3) is the degree constraint on the root node. Since there must be at least one arc out of the root node, it is su cient to de ne the constraint as an inequality instead of a strict equality. Alternatively, for each node j 2 D, we create a commodity with the origin as the root node s and node j as the destination node. Let K denote the set of commodities, and D(k) denote the 28 destination node of commodity k. z = Minimize X (i;j)2A cijyij (3.5) subject to: X (j;i)2A fkji X (i;l)2A fkil = 8 >>>>>>< >>>>>>: 1 1 0 if i = s; if i = D(k); otherwise; for all i 2 V & k 2 K (3.6) fkij yij for all (i; j) 2 A & k 2 K (3.7) X j2F ysj 1 (3.8) fkij 0 for all (i; j) 2 A & k 2 K (3.9) yij 0 & integer for all (i; j) 2 A.(3.10) In the above directed ow formulation for the UDDST problem, constraints (3.6) are the standard ow balance constraints. Constraints (3.7) are forcing constraints that require that an arc be in the design, if there is ow on it. Finally, constraint (3.8) is the unit degree constraint on the root node. 3.3 Dual-Based Local Search Heuristic Our heuristic can be viewed as a two-phase procedure. The rst phase is a dual-ascent procedure applied to the UDDST problem that yields both a feasible solution and a lower bound on the optimal solution value. 
At the conclusion of this phase, we have a feasible solution to the GConFL problem consisting of a set of open facilities F′ ⊆ F, a set of Steiner nodes S′ ⊆ S, and a tree solution on F′, S′, and D. The second phase is a local search phase that tries to improve the solution obtained by the dual-ascent procedure for the GConFL problem. Our local search heuristic limits itself to improvements that only include nodes in F′ and S′. In other words, it tries to obtain improvements by finding a better tree on the existing set of nodes, and by closing open facilities and reassigning demand nodes to facilities as needed.

3.3.1 Dual-Ascent Phase

If we relax the degree constraint on the root node in the UDDST problem, we obtain the DST problem. Dual-ascent has been a successful solution strategy for the Steiner tree problem. Our first stage is motivated by the desire to utilize this solution strategy to obtain a good lower bound for the GConFL problem as well as a high-quality initial solution for our local search phase. Suppose we relax constraint (3.8) by dualizing it with multiplier λ ≥ 0. Then the resulting Lagrangian problem LR(λ) is

    z_LR(λ) = min  Σ_{(i,j)∈A} c_{ij} y_{ij} + λ ( Σ_{j∈F} y_{sj} − 1 )

    subject to: (3.6), (3.7), (3.9), (3.10).

Ideally, we would like to obtain the best possible lower bound on z by solving the Lagrangian dual problem z_LD = max_{λ≥0} z_LR(λ). It is well known that z_LR(λ) is piecewise linear and concave. Further, since (Σ_{j∈F} y_{sj} − 1) ≥ 0 for every feasible solution that satisfies constraints (3.6), (3.7), (3.9), and (3.10), z_LR(λ) is a non-decreasing function of λ. Intuitively, z_LR(λ) increases with λ until it reaches a plateau (i.e., it is flat) for sufficiently large λ. In other words, we can solve the Lagrangian dual problem by solving LR(λ) for a sufficiently large value of λ. Further, when z_LR(λ) is at a plateau, the solutions to LR(λ) have unit degree (otherwise z_LR(λ) would not have a slope of 0 at that point), implying that z_LD = z.

Rather than working with the primal problem, we will obtain a lower bound on z by working with the dual of the linear programming relaxation of the directed flow formulation for the UDDST problem. This dual may be stated as follows.

    t(λ) = max  Σ_{k∈K} v^k_{D(k)} − λ                                        (3.11)

    subject to:

    v^k_j − v^k_i ≤ w^k_{ij}          for all (i,j) ∈ A and k ∈ K             (3.12)

    Σ_{k∈K} w^k_{ij} ≤ c_{ij}         for all (i,j) ∈ A, i ≠ s                (3.13)

    Σ_{k∈K} w^k_{sj} − λ ≤ c_{sj}     for all (s,j) ∈ A                       (3.14)

    w^k_{ij} ≥ 0                      for all (i,j) ∈ A and k ∈ K             (3.15)

    λ ≥ 0.                                                                    (3.16)

Observe that for a given value of λ, t(λ) ≤ z_LR(λ). (In fact, if we were to move λ to the right-hand side of constraint (3.14), the above dual may be viewed as the dual to the linear relaxation of LR(λ).) Solving for t(λ) for any choice of λ ≥ 0 provides a lower bound on z_LR(λ), and in particular solving for t(λ) for a sufficiently large value of λ provides a lower bound on z. Thus, our strategy to obtain a lower bound is to choose a sufficiently large value of λ and apply the dual-ascent procedure for the Steiner tree problem. Notice that the Lagrange multiplier λ may also be viewed as an artificial cost that is added to the cost of the arcs out of the root node to ensure that the solution has outdegree 1 at the root node.

We apply the dual-ascent procedure for the Steiner tree problem as described in Wong (1984) and Balakrishnan et al. (1989), and refined in Raghavan (1995). If we are given values for the w variables and λ, the dual problem separates by commodity. The subproblem corresponding to commodity k is the dual to the directed shortest path problem between the root node and node D(k), with arc lengths w^k_{ij}.
Dual-ascent iteratively increases the value of the w variables to improve the dual objective until no further increase is possible. This is achieved in the following fashion. Every arc for which either constraint (3.13) or (3.14) is satisfied at equality is called a tight arc. Any commodity k that has a directed s-D(k) cut with no tight arcs across it is a candidate for improving the dual objective. (We denote the nodes on the destination side of this cut as U_k and refer to them as an ascent set, and to the arcs across this cut as δ⁻(U_k).) The dual objective can be increased by increasing all the w^k_{ij} variables in δ⁻(U_k) until one of the arcs across the cut becomes tight. This is called a basic dual-ascent step. Our dual-ascent implementation uses the commodity cycling rule of Balakrishnan et al. (1989) to choose the commodity and directed cut on which a dual-ascent iteration is performed. This iteratively grows the ascent sets U_k for each commodity, starting from U_k = {D(k)}. It maintains the property that every node in U_k has a directed path to D(k) consisting solely of tight arcs. Raghavan (1995) shows this rule to be equivalent to considering minimal ascent sets (a set is a minimal ascent set if it does not contain an ascent set as a strict subset) in the dual-ascent iterations. At the conclusion of the dual-ascent procedure, we have an approximate dual solution, i.e., a lower bound, and a network of tight arcs (referred to as the auxiliary network) on which there is a directed path from the root node to every demand node. A feasible solution (upper bound for the DST problem) consisting solely of tight arcs is obtained by iteratively deleting arcs in the reverse order in which they became tight, if their deletion from the auxiliary network does not result in a network where there is no directed path from s to a demand node. (A feasible solution consisting solely of tight arcs satisfies the primal complementary slackness conditions.)

In our implementation, we add a sufficiently large cost λ to all of the arcs out of the root node. We then apply the dual-ascent procedure for the DST problem to obtain a lower bound and a Steiner tree. If this Steiner tree has outdegree 1 at the root node, it is also a feasible upper bound for the UDDST problem. For a sufficiently large λ, we now argue this is precisely the case. As the value of λ increases, the cost of the arcs out of the root node effectively increases. Thus for a sufficiently large λ, each of the nodes in F will belong to the ascent sets U_k, for every commodity k, before any of the arcs out of the root node s becomes tight in the dual-ascent procedure. In the next set of dual-ascent steps, one or more arcs out of the root node will become tight, and the dual-ascent procedure will terminate (as there will now be a path from s to all of the nodes in D in the auxiliary network). Observe now that in the reverse delete procedure to obtain an upper bound, we first delete arcs out of the root node, as they became tight last. Consequently, all of the tight arcs out of the root node except for one will be deleted in the reverse delete step, and the upper bound produced by dual-ascent will have outdegree 1.

We make a few final notes before proceeding to the local search phase. Setting λ ≥ |F ∪ S| · max_{(i,j)∈A} c_{ij} ensures that the upper bound is feasible to the UDDST problem (notice that the longest acyclic path from s to a node in D has at most |F ∪ S| intermediate nodes, from which the above value is computed).
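The reverse delete (LIFO drop) step that closes the dual-ascent phase can be sketched as follows. This is an illustrative Python fragment, not the chapter's C++ implementation; the graph representation and function names are assumptions made for the example.

```python
from collections import defaultdict, deque

def reachable(arcs, root):
    """Return the set of nodes reachable from root using the given arcs."""
    out = defaultdict(list)
    for (i, j) in arcs:
        out[i].append(j)
    seen, queue = {root}, deque([root])
    while queue:
        i = queue.popleft()
        for j in out[i]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def reverse_delete(tight_arcs, root, demand_nodes):
    """Drop tight arcs in LIFO order while every demand node stays reachable.

    tight_arcs: list of arcs (i, j) in the order in which they became tight.
    """
    kept = list(tight_arcs)
    for arc in reversed(tight_arcs):          # last arc to become tight is examined first
        trial = [a for a in kept if a != arc]
        if all(d in reachable(trial, root) for d in demand_nodes):
            kept = trial                       # arc is redundant; delete it
    return kept
```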
Also, since z_LR(λ) and t(λ) have λ in the objective function, to obtain upper and lower bounds for the UDDST problem we should subtract λ from the upper and lower bounds obtained by applying dual-ascent for the DST problem (once the arcs out of the root node have their cost increased by λ). Next, we illustrate the dual-ascent procedure with an example.

Dual-Ascent Example

We consider the example shown in Figure 3.2. Figure 3.2(a) shows the arc costs; node s is the source node, and D = {1, 4}. The cost of the arcs out of the artificial source node s has already been increased by λ = 20. Initially, Z = ∅, (v, w) = (0, 0), and L = 0. The sets {1} and {4} are minimal ascent sets. The algorithm first performs a basic dual-ascent step on {1}. As a result arc (2, 1) becomes tight and is added to Z. The dual objective function increases by 2 units to L = 2. The algorithm then performs a basic dual-ascent step on {4}. As a result arc (3, 4) becomes tight and is added to Z. The dual objective function increases by 1 unit to L = 3. Figure 3.2(b) shows the digraph at the conclusion of the first two iterations.

At this point, sets {1, 2} and {3, 4} are minimal ascent sets. The algorithm first performs a basic dual-ascent step on {1, 2}. As a result arcs (5, 1) and (3, 1) become tight and are added to Z. The dual objective function increases by 1 unit to L = 4. The algorithm then performs a basic dual-ascent step on {3, 4}. As a result arc (2, 4) becomes tight and is added to Z. The dual objective function increases by 2 units to L = 6. Figure 3.2(c) shows the digraph at the conclusion of these two iterations.

At this stage, sets {1, 2, 3, 5} and {2, 3, 4} are minimal ascent sets. The algorithm performs a basic dual-ascent step on {1, 2, 3, 5}. As a result arc (s, 3) becomes tight and is added to Z. The dual objective function increases by 18 units to L = 24. The algorithm then tries to perform a basic dual-ascent step on {2, 3, 4}. However, the arc (s, 3) is already tight. Figure 3.2(d) shows the digraph at the conclusion of these two iterations. At this point, the network has no ascent set.

Therefore, we apply the LIFO drop heuristic. The arcs of Z, in the order they were added, are (2, 1), (3, 4), (5, 1), (3, 1), (2, 4), and (s, 3). If we delete arc (s, 3), the network defined by the arcs in Z does not contain a path from the source node to the demand nodes 1 and 4. Therefore, we retain arc (s, 3). Next we delete arc (2, 4) and observe that the network defined by the remaining arcs in Z contains a path from the source node to the demand nodes 1 and 4. We continue to proceed in this fashion, deleting arcs in LIFO order from Z and checking that the network defined by the remaining arcs contains a path from the source node to the demand nodes 1 and 4. We find the solution shown in Figure 3.2(e). The total cost of this solution is 24 units. Since the lower bound is 24 units, we have obtained the optimal solution to the DST problem. Now, we verify that the outdegree of the source node is 1, and consequently, we drop the artificial source node from the solution. We have found the optimal solution to the ConFL problem with total cost equal to 4 (that is, 24 − λ).

3.3.2 Local Search Phase

In the second phase, we implement a basic version of local search to improve upon the solution yielded by dual-ascent. We search the neighborhood of the dual-ascent solution through a set of improvement steps for a solution of lower cost.
If such a solution is found after the improvement steps have been completed, it replaces the current solution, and the search continues. In the search for a lower cost neighboring solution, we implement a set of steps that reconstructs the tree on the set of Steiner nodes S′, open facilities F′, and demand nodes; closes open facilities; and reassigns demand nodes as needed. Recall that at the conclusion of the dual-ascent phase, we have a feasible solution to the GConFL problem consisting of a set of open facilities F′ ⊆ F, a set of Steiner nodes S′ ⊆ S, and a tree solution on F′, S′, and D.

Our local search phase works on the undirected graph associated with the GConFL problem. It tries to improve the solution provided by the dual-ascent procedure by using two types of improvement steps: (1) sequential improvements that try to delete Steiner nodes in S′; and (2) local improvements that at each iteration strategically close a facility in F′. In the sequential improvements, we construct a minimum spanning tree T on the set of open facility nodes F′ and Steiner nodes S′. Next, we iteratively remove any Steiner node from S′ that has degree 2 or less and reconstruct the minimum spanning tree T. When the graph on (F′ ∪ S′) is complete and edge costs satisfy the triangle inequality, this cannot deteriorate the objective.

Figure 3.2: Dual-Ascent Example

When the graph on (F′ ∪ S′) is not complete or the edge costs do not satisfy the triangle inequality, we verify that the cost of the minimum spanning tree T does not increase before removing the Steiner node. (An alternate procedure is to compute all pairs of shortest paths on G(F ∪ S), taking into account facility costs, and to (i) complete the graph G(F ∪ S), and (ii) set the cost of an edge to the cost of the shortest path between its two end points. This results in the situation where the graph on (F ∪ S) is complete and edge costs satisfy the triangle inequality.) At the end of the sequential improvements no Steiner node from S′ can be deleted without increasing the cost of T.

Subsequently, in the local improvements we list the open facility nodes first in order of node degree in the tree T and next by the number of demand nodes they serve (obviously, each demand node is connected to the closest open facility on the tree T). Then we move through the list, at each iteration removing the next facility node from the solution, reassigning its demand nodes, recomputing the minimum spanning tree on the remaining open facilities and Steiner nodes, and computing the change in the solution cost, which we denote by Δ. If we observe a saving in the solution cost, the facility node is permanently closed and removed from F′; otherwise, the facility node is restored to the solution. We iteratively repeat this process for each facility in the list.

In the local improvements, the order in which open facilities are considered for removal is critical. Ordering the nodes in increasing order of node degree in T and number of demand node assignments seeks to maximize the savings with each removal, based on the two roles that a facility node plays in the final solution.
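A minimal sketch of the local improvement step just described is given below, written in Python with the networkx library rather than the chapter's C++ implementation; the helper names, cost containers, and the rule of keeping at least one open facility are our own assumptions for the illustration.

```python
import networkx as nx

def solution_cost(core_graph, open_fac, steiner, demand_nodes, assign_cost, open_cost):
    """Cost of a solution: core MST + facility opening costs + cheapest assignments."""
    tree = nx.minimum_spanning_tree(core_graph.subgraph(open_fac | steiner))
    assign = sum(min(assign_cost[d][f] for f in open_fac) for d in demand_nodes)
    return tree, tree.size(weight="weight") + sum(open_cost[f] for f in open_fac) + assign

def local_improvements(core_graph, open_fac, steiner, demand_nodes, assign_cost, open_cost):
    open_fac, steiner = set(open_fac), set(steiner)
    tree, cost = solution_cost(core_graph, open_fac, steiner,
                               demand_nodes, assign_cost, open_cost)

    # Order candidates by degree in the core tree, then by number of demand nodes served.
    served = {f: sum(1 for d in demand_nodes
                     if f == min(open_fac, key=lambda g: assign_cost[d][g]))
              for f in open_fac}
    for f in sorted(open_fac, key=lambda v: (tree.degree(v), served[v])):
        if len(open_fac) == 1:
            break                                    # keep at least one facility open
        if not nx.is_connected(core_graph.subgraph((open_fac - {f}) | steiner)):
            continue                                 # closing f would disconnect the core
        new_tree, new_cost = solution_cost(core_graph, open_fac - {f}, steiner,
                                           demand_nodes, assign_cost, open_cost)
        if new_cost - cost <= 0:                     # delta <= 0: accept the closure
            open_fac.discard(f)
            tree, cost = new_tree, new_cost
    return open_fac, tree
```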
The rationale is to aggressively remove facilities that serve the least number of demand nodes and that are farther out in the tree T. Consequently, we first remove facilities that are leaf nodes in the core tree, in increasing order of their number of assigned demand nodes; and then we move gradually into the tree, attempting to close facilities with higher node degree in the tree T. Our local improvements are described in pseudocode in Figure 3.3.

Figure 3.3: Pseudocode for local improvements

For the local improvements, we tested three somewhat different implementations (and thus definitions of neighboring solutions) that yield solutions with successively greater improvements. In the first implementation, the set of Steiner nodes S′ is not updated until the completion of the local improvements. Hence, the local search phase cycles back and forth through sequential improvements followed by local improvements until there are no more improvements. In the second implementation, when Δ is less than or equal to zero, at the time of updating the set of open facility nodes F′ and the tree T we also eliminate any Steiner node that has degree 2 or less, as specified in the sequential improvements. Hence, in this case the set of Steiner nodes S′ is also updated at this step, and the actual improvement is greater than or equal to Δ. The third implementation tries to look ahead and computes Δ by incorporating the cost reductions obtained from removing any Steiner node with degree 2 or less from the minimum spanning tree T that results from removing the facility node under consideration.

Each of these implementations provides successively better solutions (although the improvements are very slight) on average. In terms of running time, we expected each of the implementations to take successively longer, though we did not observe any time differences on smaller graphs. Thus, we used the third implementation in our local search phase.

3.4 Computational Experiments

We now report on an extensive set of computational experiments with our DLS heuristic on the STS, GSTS, ConFL, and ROB problems. We coded our heuristics in Visual Studio 2005 (C++). We conducted all runs on an AMD Athlon 64 X2 Dual Core, 2.61 GHz machine with 3GB of RAM.

3.4.1 Problem Generation and Characteristics

For the four problems, we generated a large set of Euclidean test problems with varying characteristics. We created five sets of graphs in Euclidean space with different characteristics in terms of the number of D, F, and S nodes, facility opening costs, edge costs, and network density. We generated problems by first selecting nodes randomly located on a 100 x 100 square grid. The Euclidean distances rounded up to the next integer (to preserve the triangle inequality) were used as the basis for the edge lengths. This problem generation method leads to the most difficult problem instances from a computational standpoint in the context of the STS problem (see Lee et al. 1996). The assignment edge costs are equal to the edge lengths between demand nodes and facility nodes, while tree edge costs are equal to the edge lengths multiplied by an M factor. The M factor reflects the significantly higher (in terms of cost per unit distance) connection cost of edges in the tree T. The number of demand nodes and facility nodes varies between 10 and 90 in steps of 10, with the total number of demand and facility nodes equal to 100. In an instance of the problem, the facility opening costs are the same for all the facility nodes.
For the STS, GSTS and ConFL problems, we generated two sets of "complete" instances with common variations. In Set 1 the facility opening cost varies between 0 and 30 in steps of 10, while the M factor is fixed at 3. In Set 2 the M factor takes the values 1, 3, 5 and 7, while the facility opening cost is kept fixed at 30. In the GSTS problem instances, any demand node may host a facility. In the ConFL problem instances, an additional 20 pure potential Steiner nodes were created. Edges were created between the pure Steiner nodes and all of the facility nodes. Since demand nodes cannot be assigned to pure Steiner nodes, no edges were created between demand nodes and pure Steiner nodes. For the ROB problem we used the test instances from Set 2 for the GSTS problem and set the facility opening cost to zero. We refer to this set of instances as Set 3. Notice that in these three sets of instances any demand node may be assigned to any facility node and the graph G(F ∪ S) is complete.

Since our DLS heuristic performed exceedingly well on these complete instances, to test the effect of sparsity on the performance of our heuristic we generated an additional set of test instances, Set 4, for the ConFL problem. We focused on the ConFL problem since we found, from our experiments on Sets 1-3, that these instances were the hardest to solve amongst the four problems. In this test set the assignment edges and the tree edges were created with a given probability. In addition, to ensure that the instances were connected, we randomly constructed a cycle of tree edges through all the facility and potential Steiner nodes; and finally, we verified that there was at least one assignment edge for each demand node. We varied the sparsity of the test instances by using edge creation probabilities of 0.25, 0.50 and 0.75. As in the previous sets of ConFL instances, the number of facility nodes varies between 10 and 90 in steps of 10, the number of demand nodes is 100 minus the number of facility nodes, and the number of pure Steiner nodes is 20. The facility opening cost was 30 and the M factor was set to 7.

As we noted earlier, by recomputing edge costs and completing the graph on G(F ∪ S), we can in general assume that the graph G(F ∪ S) is complete. Thus we hypothesized that, in terms of sparsity, sparsity between demand and facility nodes is more problematic than sparsity between facility nodes. To test this assertion we created a new ConFL test set, Set 5, where the bipartite graph between demand nodes and facility nodes is complete while the graph between facility nodes and potential Steiner nodes is sparse. We varied the sparsity by using edge creation probabilities of 0.25, 0.50, and 0.75 for the tree edges. The problem instances in Set 5 are always feasible; however, to ensure that they were somewhat equivalent in characteristics to Set 4 (i.e., that all facility and Steiner nodes are potential candidates for the final solution) we randomly create a cycle of tree edges through all the facility and pure Steiner nodes. The remaining characteristics are identical to Set 4.
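The Euclidean instance generation described above can be sketched as follows; the seed handling, container layout, and function name are our own illustrative choices and not the generator actually used for the experiments.

```python
import math
import random

def generate_instance(num_demand, num_facility, num_steiner, facility_cost, m_factor, seed=0):
    """Generate a Euclidean GConFL-style instance on a 100 x 100 grid."""
    rng = random.Random(seed)

    def points(n):
        return [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n)]

    D, F, S = points(num_demand), points(num_facility), points(num_steiner)

    def dist(p, q):  # Euclidean distance rounded up to preserve the triangle inequality
        return math.ceil(math.hypot(p[0] - q[0], p[1] - q[1]))

    # Assignment edges: demand node -> facility, cost equal to the edge length.
    assign_cost = {(d, f): dist(D[d], F[f])
                   for d in range(num_demand) for f in range(num_facility)}
    # Tree edges on facility and Steiner nodes: edge length scaled by the M factor.
    core = F + S
    tree_cost = {(i, j): m_factor * dist(core[i], core[j])
                 for i in range(len(core)) for j in range(i + 1, len(core))}
    open_cost = {f: facility_cost for f in range(num_facility)}
    return assign_cost, tree_cost, open_cost
```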
In other words, we wanted to see whether the dual-ascent phase provided a high-quality initial solution through its selection of facilities to open and the Steiner tree T connecting them, or whether any other reasonable choice of a starting solution followed by the local search phase would produce solutions of similar quality to the DLS heuristic. Consequently, we proposed the following UFL heuristic to compare against the DLS heuristic. We first ignore the requirement that the open facilities must be connected to each other by a Steiner tree, and solve the UFL problem (between demand nodes and facility nodes) optimally to obtain a set of facilities to open and the demand nodes assigned to them. We then find a Steiner tree T connecting the open facility nodes by applying dual-ascent for the DST problem to obtain a Steiner tree on G(F ∪ S) connecting the set of open facilities. We then use this solution as the starting solution to our local search phase. To solve the uncapacitated facility location problem to optimality we used CPLEX 10. To obtain the Steiner tree connecting the open facilities, we transformed the graph G = (V, E) to a directed graph H = (V, A) as described in Section 3.2, except that (i) we use one of the open facilities as the root node and do not create an artificial node (thus no degree constraint is necessary), (ii) all D nodes were deleted, and (iii) the remaining open facilities are the required nodes in our directed Steiner tree.

3.4.3 Results on Complete Graphs

We are now ready to discuss our computational experiments on the different problems. We compare the upper bounds provided by only applying dual-ascent (DA), the DLS heuristic, and the UFL heuristic. We compare the quality of these results by reporting the gaps between the upper bounds provided by the three heuristics and the lower bound obtained by the dual-ascent phase. Each entry in the tables represents the average over 10 instances.

3.4.3.1 STS Problem

Tables 3.1 and 3.2 present our computational results on Set 1 and Set 2, respectively. DA yields relatively good solutions, with average gaps below 8.01% for each combination of parameters. However, in some instances the gap can be quite large, and in one instance it reaches 39.72%. Our local search phase is extremely effective and reduces this gap considerably. Our DLS heuristic yields solutions with average gaps below 1.51%, and for this specific instance it lowered the gap to 2.80%.

From Table 3.1 it appears that the average percentage gap decreases as the facility opening cost increases. Table 3.2 indicates that the average percentage gap decreases as the M factor increases. The average percentage gap increases at first as the proportion of demand nodes increases, before it decreases again. The location of this peak seems to increase as the M factor is increased. Overall, the performance of our heuristic is stable over a wide range of problem parameters. The average gaps are below 1.51%. The highest gap out of the 630 instances is 3.68%, and the average gap computed over all 630 instances is 0.39%. The DLS heuristic is extremely fast and took less than 2 seconds in all instances.

In contrast to the DLS heuristic, the UFL heuristic performed quite poorly. The average gaps of the UFL heuristic are significantly larger than the average gaps from the DLS heuristic. This indicates that the dual-ascent phase of the DLS heuristic does a significantly better job of identifying which facilities to open than the UFL heuristic does.
                f_i = 0                 f_i = 10                f_i = 20                f_i = 30
|D|  |F|    DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL
10   90    3.36%   0.03%   4.82%   0.00%   0.00%   3.10%   0.00%   0.00%   7.49%   0.00%   0.00%   5.69%
20   80    7.27%   0.91%   5.38%   0.69%   0.02%   5.41%   2.87%   0.43%   4.62%   0.98%   0.04%   4.12%
30   70    5.39%   1.51%   6.00%   6.66%   1.14%   5.39%   8.01%   0.93%   5.36%   4.02%   0.72%   4.26%
40   60    3.66%   0.95%   4.62%   3.51%   0.78%   4.35%   2.79%   0.51%   5.27%   3.77%   0.79%   5.44%
50   50    2.17%   0.82%   3.72%   2.23%   0.61%   4.04%   4.19%   0.48%   4.27%   5.29%   0.74%   4.12%
60   40    1.91%   0.69%   3.24%   1.91%   0.46%   3.62%   2.30%   0.47%   2.97%   2.24%   0.48%   3.56%
70   30    0.56%   0.26%   1.50%   1.22%   0.41%   1.66%   2.12%   0.50%   2.76%   1.30%   0.40%   2.52%
80   20    0.28%   0.17%   0.35%   0.69%   0.40%   0.79%   0.56%   0.31%   0.83%   1.04%   0.23%   1.30%
90   10    0.05%   0.02%   0.06%   0.25%   0.05%   0.11%   0.07%   0.03%   0.05%   0.10%   0.02%   0.20%

Table 3.1: Comparison of heuristics for the STS problem on Set 1. M = 3, and facility opening costs are varied

                M = 1                   M = 3                   M = 5                   M = 7
|D|  |F|    DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL
10   90    1.45%   0.37%   5.87%   0.00%   0.00%   5.69%   0.00%   0.00%  15.34%   0.00%   0.00%  15.34%
20   80    5.78%   0.75%   2.76%   0.98%   0.04%   4.12%   0.00%   0.00%   4.69%   0.00%   0.00%   5.69%
30   70    4.43%   0.98%   3.22%   4.02%   0.72%   4.26%   1.59%   0.01%   3.26%   0.00%   0.00%   2.82%
40   60    3.43%   0.43%   1.36%   3.77%   0.79%   5.44%   4.11%   0.40%   5.25%   3.68%   0.13%   3.36%
50   50    2.97%   0.47%   1.08%   5.29%   0.74%   4.12%   5.26%   0.55%   3.71%   6.66%   0.37%   5.00%
60   40    2.97%   0.44%   1.13%   2.24%   0.48%   3.56%   3.56%   0.55%   4.59%   4.10%   0.59%   3.15%
70   30    1.17%   0.25%   0.50%   1.30%   0.40%   2.52%   2.81%   0.43%   3.10%   2.72%   0.24%   4.07%
80   20    0.16%   0.09%   0.19%   1.04%   0.23%   1.30%   1.56%   0.19%   2.59%   3.13%   0.28%   2.38%
90   10    0.00%   0.00%   0.03%   0.10%   0.02%   0.20%   0.17%   0.17%   0.23%   1.39%   0.45%   0.23%

Table 3.2: Comparison of heuristics for the STS problem on Set 2. f_i = 30, and M factor is varied

3.4.3.2 GSTS Problem

Our results for the GSTS problem are as promising as the results for the STS problem. Tables 3.3 and 3.4 present our computational results on Set 1 and Set 2. Recall that in these instances any demand node may host a facility. Compared to the STS problem, the gaps for all three upper bound heuristics increase slightly on the GSTS problem. For DA, the highest average gap is 12.82% and the worst gap over 630 instances is 31.93%. Once again the local search phase achieves a significant improvement and the average gaps fall below 1.47%. The DLS heuristic continues to show a very consistent performance. Out of 630 instances, the worst gap for the DLS heuristic is 4.00% and the average gap over all instances is 0.74%. As the proportion of demand nodes and facility nodes is varied, it appears that instances with a low proportion of demand nodes (10%) are easier, while the remaining instances show similar gaps. Again the DLS heuristic is extremely fast, taking at most 3 seconds to obtain a solution for an instance.

                f_i = 0                 f_i = 10                f_i = 20                f_i = 30
|D|  |F|    DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL
10   90    0.00%   0.00%   5.50%   0.00%   0.00%   5.27%   0.00%   0.00%   5.08%   0.00%   0.00%  13.01%
20   80    2.82%   0.72%   3.23%   5.77%   0.29%   3.80%   3.05%   0.16%   4.77%   2.55%   0.20%   3.84%
30   70    3.20%   0.81%   2.92%   9.33%   1.47%   3.22%   8.77%   1.01%   6.09%   7.32%   0.70%   4.78%
40   60    3.40%   0.89%   2.93%   8.48%   1.16%   5.17%   8.01%   0.78%   4.85%   8.30%   1.05%   5.62%
50   50    4.13%   0.93%   2.79%   9.28%   1.28%   4.42%  12.82%   1.46%   4.31%  10.62%   0.99%   4.72%
60   40    2.07%   0.59%   3.02%   8.38%   0.98%   4.53%   9.35%   1.12%   4.22%   8.55%   1.25%   3.63%
70   30    2.85%   0.99%   2.83%   7.59%   1.34%   5.33%   9.93%   1.06%   4.20%   9.01%   1.22%   4.15%
80   20    2.39%   0.77%   3.50%   9.61%   1.33%   4.85%  10.67%   1.22%   5.00%  10.32%   1.31%   5.00%
90   10    2.22%   0.83%   3.45%   7.82%   1.06%   4.33%  10.88%   1.14%   4.15%   8.77%   0.74%   5.14%

Table 3.3: Comparison of heuristics for the GSTS problem on Set 1. M = 3, and facility opening costs are varied

                M = 1                   M = 3                   M = 5                   M = 7
|D|  |F|    DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL
10   90    2.06%   0.42%   5.10%   0.00%   0.00%  13.01%   0.00%   0.00%  20.66%   0.00%   0.00%  20.66%
20   80   12.63%   1.42%   3.77%   2.55%   0.20%   3.84%   0.00%   0.00%   3.95%   0.00%   0.00%   6.00%
30   70    5.81%   0.63%   2.88%   7.32%   0.70%   4.78%   2.57%   0.11%   4.90%   0.00%   0.00%   4.03%
40   60    7.91%   0.60%   2.11%   8.30%   1.05%   5.62%   6.10%   0.69%   3.68%   2.26%   0.16%   2.22%
50   50    6.97%   0.59%   2.27%  10.62%   0.99%   4.72%   4.22%   0.55%   5.28%   4.85%   0.58%   4.71%
60   40    7.22%   0.42%   1.48%   8.55%   1.25%   3.63%   9.78%   1.21%   4.68%   6.12%   1.18%   5.53%
70   30    5.18%   0.48%   1.79%   9.01%   1.22%   4.15%  11.09%   1.11%   5.01%   3.90%   0.44%   5.57%
80   20    5.17%   0.40%   1.86%  10.32%   1.31%   5.00%   5.62%   0.89%   5.81%   7.65%   0.94%   5.20%
90   10    5.86%   0.58%   1.55%   8.77%   0.74%   5.14%   8.82%   0.96%   5.57%  12.23%   1.13%   5.01%

Table 3.4: Comparison of heuristics for the GSTS problem on Set 2. f_i = 30, and M factor is varied

The solutions obtained by the UFL heuristic on the GSTS problem are quite poor. Again, we can state that the dual-ascent phase does an excellent job of finding a subset of facility nodes to open (and Steiner nodes to use). The difference between the DLS and UFL heuristics is more marked when the number of facility nodes in an instance is large. Here the problem of identifying the set of facility nodes to open is combinatorially more challenging, and in these instances the UFL heuristic does very badly.

3.4.3.3 ROB Problem

We now focus our attention on the ROB problem. Table 3.5 summarizes our results. The ROB problem with M = 1 is essentially a Steiner tree problem. Consequently, for M = 1 DA yields high-quality solutions for all combinations of demand nodes and facility nodes, which our local search phase improves even further. As the M factor increases, the average gaps for DA and the DLS heuristic increase. The average gap for DA is less than 8% for all combinations of parameters. With the addition of the local search phase the results are even better. Our DLS heuristic has average gaps below 0.5% for M = 1, and below 2.02% for all other combinations of parameters. Over the entire set of 360 instances, the average gap is below 0.70% for the DLS heuristic. At the instance level, the highest gap for the DLS heuristic remains below 4%, while the worst gap for DA over the 360 instances is 22.29%. Once again, the consistency of the results obtained by the DLS heuristic is quite compelling. Again the DLS heuristic is extremely fast, taking at most 3 seconds to obtain a solution for an instance. Recall that in the case of the ROB problem facilities can be opened at any node in the graph, including demand nodes, and the facility opening cost is zero.
Consequently, for the UFL heuristic, the optimal solution to the uncapacitated facility location problem is to open a facility at each demand node. Thus, the starting solution for the UFL heuristic is simply the set of facilities at the demand nodes, which is later enhanced by dual-ascent to include Steiner nodes and by the local search to improve the total solution cost. Again, the average gaps obtained by UFL are consistently worse than the ones achieved by the DLS heuristic. Specifically, at the instance level, in 351 out of 360 instances the solution from the DLS heuristic was superior to that of the UFL heuristic.

                M = 1                   M = 3                   M = 5                   M = 7
|D|  |F|    DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL
10   90    2.30%   0.35%   1.93%   0.00%   0.00%   5.50%   0.00%   0.00%   6.17%   0.00%   0.00%   6.17%
20   80    0.90%   0.15%   0.66%   2.82%   0.72%   3.23%   2.51%   0.48%   3.08%   0.00%   0.00%   3.95%
30   70    0.49%   0.00%   0.58%   3.20%   0.81%   3.04%   3.94%   1.02%   3.65%   1.26%   0.07%   3.46%
40   60    0.38%   0.04%   0.43%   3.40%   0.89%   2.93%   2.69%   1.39%   4.21%   7.94%   1.14%   4.79%
50   50    0.33%   0.08%   0.22%   4.13%   0.93%   2.79%   5.52%   1.19%   4.21%   5.13%   1.19%   3.73%
60   40    0.09%   0.02%   0.16%   2.07%   0.59%   3.02%   4.35%   1.37%   5.79%   5.74%   1.62%   4.93%
70   30    0.10%   0.02%   0.18%   2.85%   0.99%   2.83%   4.55%   1.33%   4.19%   3.55%   1.37%   4.51%
80   20    0.02%   0.00%   0.11%   2.39%   0.77%   3.50%   4.40%   1.21%   5.04%   5.95%   1.74%   5.40%
90   10    0.00%   0.00%   0.06%   2.22%   0.83%   3.45%   2.95%   1.04%   4.62%   5.43%   2.02%   6.00%

Table 3.5: Comparison of heuristics for the ROB problem on Set 3. M factor is varied

3.4.3.4 ConFL Problem

Tables 3.6 and 3.7 summarize our computational results for Set 1 and Set 2, respectively. The gaps for DA exhibit a concave behavior as the fraction of demand nodes increases. For a low proportion of demand nodes the gaps are low. They rapidly jump as the proportion of demand nodes increases, before decreasing again as the proportion of demand nodes gets higher. For DA the average percentage gaps get as high as 16%. On the other hand, the average percentage gaps for the DLS heuristic are always below 4.27%. In all cases the local search phase significantly improved the upper bounds obtained from the dual-ascent phase. Over 630 instances the worst gap for the DLS heuristic was 7.72%, in contrast to a worst gap of 29.67% for DA. Averaged over the 630 instances, the average gap for the DLS heuristic was 1.74%. We observed that as the facility opening costs increase, the average gaps for the DLS heuristic increase. In contrast, as the M factor increases the average gap for the DLS heuristic first increases but later decreases, reaching its maximum at M = 3. In summary, our results continue to indicate that the DLS heuristic's performance is stable over a wide range of parameters. We note that the DLS heuristic took at most 4.74 seconds to solve an instance of the ConFL problem. The performance of the UFL heuristic on the ConFL instances was significantly worse than that of the DLS heuristic. As before, it appears the dual-ascent phase is quite consistent in identifying a good starting solution for the local search phase.
                f_i = 0                 f_i = 10                f_i = 20                f_i = 30
|D|  |F|    DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL
10   90    0.54%   0.08%   1.76%   2.30%   0.48%   4.49%   2.18%   0.30%   8.43%   0.00%   0.00%  10.59%
20   80    6.79%   0.91%   5.52%  12.41%   3.16%   4.94%  11.38%   3.05%   5.17%  15.62%   3.95%   7.90%
30   70    4.86%   1.57%   5.32%   9.95%   2.80%   5.33%  14.24%   4.27%   5.71%  14.13%   3.70%   5.35%
40   60    5.31%   1.80%   4.83%   9.80%   2.63%   6.19%   8.97%   3.02%   6.69%  14.13%   3.97%   6.17%
50   50    3.83%   1.43%   3.37%   7.90%   2.45%   3.94%   8.73%   2.94%   5.55%  10.08%   3.27%   4.85%
60   40    3.50%   0.96%   2.63%   5.51%   1.80%   3.77%   6.86%   2.29%   4.31%   7.54%   3.04%   4.93%
70   30    1.28%   0.81%   1.69%   2.70%   1.13%   1.93%   4.88%   1.57%   3.46%   5.44%   1.89%   3.23%
80   20    0.53%   0.28%  14.98%   1.02%   0.57%   1.18%   1.84%   0.79%   1.24%   3.24%   1.08%   1.28%
90   10    0.23%   0.17%   0.17%   0.23%   0.02%   0.13%   0.43%   0.05%   0.30%   0.09%   0.04%   0.28%

Table 3.6: Comparison of heuristics for the ConFL problem on Set 1. M = 3, and facility opening costs are varied

                M = 1                   M = 3                   M = 5                   M = 7
|D|  |F|    DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL     DA      DLS     UFL
10   90   10.30%   3.25%   5.83%   0.00%   0.00%  10.59%   0.00%   0.00%  10.59%   0.00%   0.00%  10.59%
20   80   12.21%   3.15%   4.22%  15.62%   3.95%   7.90%   8.29%   1.11%   5.76%   0.94%   0.01%   6.08%
30   70    7.90%   2.47%   2.93%  14.13%   3.70%   5.35%   7.74%   2.97%   5.72%   3.17%   1.36%   6.67%
40   60    7.18%   1.92%   3.24%  14.13%   3.97%   6.17%  12.23%   3.76%   5.37%   8.22%   2.73%   6.43%
50   50    5.32%   1.63%   1.53%  10.08%   3.27%   4.85%   7.83%   3.03%   5.93%   7.69%   3.01%   7.56%
60   40    4.78%   1.40%   2.12%   7.54%   3.04%   4.93%   9.05%   3.54%   5.56%   8.39%   2.41%   6.57%
70   30    2.59%   0.84%   0.75%   5.44%   1.89%   3.23%   4.83%   2.08%   4.66%   6.17%   2.24%   6.14%
80   20    1.56%   0.33%   0.32%   3.24%   1.08%   1.28%   3.06%   1.49%   2.27%   4.43%   1.64%   3.19%
90   10    0.00%   0.00%   0.19%   0.09%   0.04%   0.28%   0.30%   0.28%   0.47%   1.14%   0.56%   0.51%

Table 3.7: Comparison of heuristics for the ConFL problem on Set 2. f_i = 30, and M factor is varied

3.4.4 Results on Sparse Instances

Table 3.8 summarizes our computational results on Set 4. Recall that Set 4 was created to understand the performance of the DLS heuristic on sparse instances, and that we focused our attention on the ConFL problem as it was the hardest (in terms of gaps) of the four problems to solve on complete graphs. Table 3.8 indicates that as the graph gets sparser the average gaps increase considerably for all three upper bound heuristics. Average gaps for DA go up to 18.12% and for the DLS heuristic increase to 14.36%. For individual instances, the worst case gap for DA is 37.16% while the worst case gap for the DLS heuristic is 22.38%. Notice, however, that both heuristics yield significantly better solutions than the UFL heuristic, which is disastrous: its worst case gap for individual instances is 151.04% and its average gap gets as high as 76.74%.

In interpreting the results in Table 3.8 we wanted to understand whether the large gaps for the DLS heuristic on sparse instances were due to the quality of the upper bounds or the lower bounds produced by the dual-ascent phase. To address this issue we were very fortunate to have access to Dr. Ljubić's state-of-the-art B&C code (Ljubić 2007) as an alternate method of generating lower bounds.
We limited the running time of the B&C code to 1 hour for each instance, and use the better lower bound from the B&C code and the dual-ascent procedure to recalculate the average gaps for our DLS heuristic. These recomputed average gaps are shown in Table 3.9. The column DLS indicates the average gaps obtained using the dual-ascent phase's lower bound, while the column DLS* indicates the recomputed average gaps. The column Imp indicates the number of times, over the 10 instances, that B&C was able to improve the lower bound.

                        Probability of Edge Creation
               0.25                      0.50                      0.75
|D|  |F|    DA       DLS      UFL     DA       DLS      UFL     DA       DLS      UFL
10   90   10.63%    9.72%   74.71%   5.48%    5.17%   76.74%   3.71%    2.57%   43.29%
20   80   18.12%   14.36%   50.45%   7.15%    5.44%   38.55%   4.51%    2.90%   34.48%
30   70   12.36%   10.44%   38.68%  12.64%    7.92%   25.71%  10.50%    4.99%   25.04%
40   60   16.15%   12.28%   32.09%  11.18%    7.31%   24.40%  10.52%    4.54%   15.66%
50   50   12.87%    8.86%   24.41%   8.75%    5.33%   16.87%  10.85%    4.86%   12.03%
60   40    9.28%    7.06%   20.31%   9.47%    5.45%   14.78%   9.53%    4.77%   10.59%
70   30    7.69%    4.81%   14.56%   7.25%    4.26%   11.34%   8.00%    3.49%    8.42%
80   20    3.98%    3.06%    5.06%   4.50%    2.73%    5.57%   4.69%    2.20%    4.88%
90   10    0.30%    0.11%    0.16%   0.29%    0.22%    1.46%   1.34%    0.56%    1.09%

Table 3.8: Comparison of heuristics for the ConFL problem on Set 4

                        Probability of Edge Creation
               0.25                      0.50                      0.75
|D|  |F|    DLS*     DLS      Imp     DLS*     DLS      Imp     DLS*     DLS      Imp
10   90    1.81%    9.72%      9     1.26%    5.17%      9     1.42%    2.57%      4
20   80    4.78%   14.36%     10     0.38%    5.44%     10     0.20%    2.90%      9
30   70    3.03%   10.44%     10     3.09%    7.92%     10     2.48%    4.99%      7
40   60    4.50%   12.28%     10     2.92%    7.31%      9     2.82%    4.54%      5
50   50    1.48%    8.86%     10     0.74%    5.33%     10     2.59%    4.86%      7
60   40    1.52%    7.06%     10     1.26%    5.45%     10     2.01%    4.77%      8
70   30    0.29%    4.81%     10     1.04%    4.26%     10     0.66%    3.49%     10
80   20    0.50%    3.06%     10     0.45%    2.73%     10     0.31%    2.20%     10
90   10    0.11%    0.11%      0     0.05%    0.22%      4     0.14%    0.56%      8

Table 3.9: Performance of the DLS heuristic computed using the best lower bound on Set 4

In most cases the average gaps decrease to one third of their previous value. B&C is able to obtain tighter lower bounds in practically all of the very sparse (probability 0.25) instances except for those with 90 demand nodes. However, as the network becomes denser it becomes a little harder for B&C to improve the lower bound, and even when the lower bound is improved the relative improvement is much smaller. We note that the running times of the B&C code increase as the density of the graph increases.

The analysis in Table 3.9 indicates that the DLS heuristic performs quite well and is consistent. As the density of the graph increases its performance improves. The average gaps are always lower than 4.78%. Over the 270 instances, the average gap of the DLS heuristic is 1.55%. The worst gap over the 270 instances for the DLS heuristic drops to 16.43%. Incidentally, the instance with the 16.43% gap is one where the lower bound from B&C is worse than the dual-ascent lower bound (meaning that even after 1 hour of running time it is unable to improve the dual-ascent lower bound). In summary, the solutions obtained by the DLS heuristic are consistently of high quality, even for sparse problems.

We now report on our computational experience with Set 5. Recall that instances in Set 5 are sparse problems where the bipartite graph between demand nodes and facility nodes is complete. Table 3.10 summarizes the results of the three heuristics with the gaps computed using the dual-ascent lower bound. These results are quite similar to those for the three heuristics on Set 1 and Set 2.
In other words, the quality of the solutions provided by the DLS heuristic is quite high. In particular, the average gap of the DLS heuristic is below 2.46% (or 2.05% when computed using the best lower bound). Over the 270 instances the worst gap for the DLS heuristic is 5.09% (or 4.72% with the best lower bound), in contrast to the worst gap for DA, which is 34.58%. The average gaps for the DLS heuristic increase as the instances become denser. Averaged over the 270 instances the DLS heuristic has a gap of 1.12% (or 0.46% with the best lower bound). Table 3.11 shows the gaps for the DLS heuristic recomputed using the best of the lower bounds from the B&C code and dual-ascent.

                        Probability of Edge Creation
               0.25                      0.50                      0.75
|D|  |F|    DA       DLS      UFL     DA       DLS      UFL     DA       DLS      UFL
10   90    0.00%    0.00%   14.39%   0.00%    0.00%   11.91%   0.00%    0.00%   10.59%
20   80    0.00%    0.00%   12.48%   0.00%    0.00%    9.15%   0.00%    0.00%    5.59%
30   70    2.54%    0.80%    5.99%   3.43%    0.72%    5.50%   3.38%    1.18%    4.43%
40   60    7.23%    0.96%    8.74%   5.59%    1.47%    7.36%   9.91%    2.46%    7.30%
50   50    4.31%    1.47%    7.02%   7.52%    1.94%    6.10%   6.98%    1.78%    6.18%
60   40    2.56%    1.30%   11.57%   6.01%    2.19%    8.98%   8.62%    2.43%    6.34%
70   30    4.07%    1.42%    6.41%   4.94%    1.74%    6.10%   5.27%    2.32%    5.35%
80   20    2.71%    1.24%    4.57%   6.50%    1.55%    4.34%   3.72%    1.90%    3.77%
90   10    1.00%    0.48%    3.08%   0.64%    0.29%    2.21%   1.28%    0.72%    1.02%

Table 3.10: Comparison of heuristics for the ConFL problem on Set 5

                        Probability of Edge Creation
               0.25                      0.50                      0.75
|D|  |F|    DLS*     DLS      Imp     DLS*     DLS      Imp     DLS*     DLS      Imp
10   90    0.00%    0.00%      0     0.00%    0.00%      0     0.00%    0.00%      0
20   80    0.00%    0.00%      0     0.00%    0.00%      0     0.00%    0.00%      0
30   70    0.29%    0.80%      7     0.24%    0.72%      5     0.71%    1.18%      5
40   60    0.34%    0.96%     10     0.64%    1.47%      6     2.05%    2.46%      4
50   50    0.44%    1.47%     10     0.86%    1.94%     10     0.96%    1.78%      4
60   40    0.20%    1.30%     10     1.04%    2.19%     10     1.08%    2.43%      8
70   30    0.35%    1.42%     10     0.60%    1.74%     10     0.69%    2.32%      9
80   20    0.43%    1.24%      9     0.35%    1.55%     10     0.75%    1.90%      9
90   10    0.14%    0.48%      7     0.10%    0.29%      7     0.23%    0.72%     10

Table 3.11: Performance of the DLS heuristic computed using the best lower bound on Set 5

The results in the table indicate that B&C yielded a slightly smaller number of improvements, and at the same time the improvements in the lower bound were less pronounced. In summary, we can conclude that sparsity between demand nodes and facility nodes causes a weaker dual-ascent lower bound and hence a wider gap when it is used to compute the average gaps. In other words, ConFL problems with sparsity between demand nodes and facility nodes are the hardest for our DLS heuristic.

3.4.5 Large-Scale Instances and Comparison to Ljubić's VNS Heuristic

We tested the DLS heuristic on a set of large-scale instances introduced by Ljubić (2007) for the ConFL problem. She constructed ConFL test problems by combining Steiner tree problem instances from the OR-Library and UFL problem instances from UflLib (see Ljubić (2007) for more details about these test instances). These instances are representative of each type of problem. However, their combination departs from the typical convention that we have seen in the literature on the four problems, in that tree edges are typically more expensive per unit length than assignment edges. In the instances constructed in Ljubić (2007) the cost of the tree edges is given by the Steiner tree instances while the cost of the assignment edges is determined by the UFL instances, with no attempt to scale these costs (and thus the per unit length cost of tree edges is cheaper than the per unit length cost of assignment edges).
However, testing the DLS heuristic on these instances gives us the opportunity to evaluate its performance on larger instances (up to 1300 nodes) that are on non-Euclidean graphs, and to compare its performance against the variable neighborhood search (VNS) heuristic that was used in Ljubić (2007). In these instances knowledge of an open facility in the solution was assumed a priori. Consequently, we modified our DLS heuristic to incorporate this knowledge. Hence, in the dual-ascent phase we simply use the open facility as the root node, and do not create an artificial source node s or impose a unit degree constraint. In addition, in the local search phase this open facility is never removed, even if its removal would result in a lower cost solution.

Table 3.12 shows the results obtained by DA, the DLS heuristic, and Ljubić's VNS procedure. (To compute the gaps we use dual-ascent's lower bounds, since it is our understanding that there is an error (Ljubić 2009) in the lower bounds reported in Ljubić (2007).) Over the 48 test instances reported in Table 3.12, the worst gap of DA is 17.23%, VNS's worst gap is 21.16%, while the DLS heuristic's worst gap is 8.63%. Averaged over the 48 instances, the average gap of the DLS heuristic is 3.83%, while VNS's average gap is 5.75%. The DLS heuristic finds a better solution than the VNS procedure in 36 out of the 48 instances. The average computational time required by VNS (as reported in Ljubić 2007) and by our DLS heuristic appear to be quite similar, at around 500 seconds. The running times reported in Ljubić (2007) are the average over ten runs of the VNS procedure, and the solution reported is the best over the ten runs. So a more accurate assessment of the running time of the VNS procedure would be a tenfold increase in the running times reported. However, our heuristic shows a lot more variability, with running times as high as 5422.2 seconds. On the other hand, the gaps of the DLS heuristic are lower, have much less variability, and are quite stable over a wide range of parameters. We should note that the excessive running times for the DLS heuristic seem to occur in the last group of 16 instances. If we use the second implementation of the local improvements in the local search phase instead, the running time goes down considerably and the performance does not actually deteriorate. For example, the instance that takes 5422.2 seconds takes 1640.5 seconds with the second implementation, and its gap increases from 0.45% to 0.46%. Consequently, when computational time is important and the instance has a large number of nodes, we could either impose a time limit on the DLS heuristic, or use an alternate implementation (i.e., the second implementation).

3.5 Conclusions

In this chapter we considered a family of four important network design problems that combine facility location with connectivity requirements. We provided a common framework and methodology to address these four problems. In particular, we devised a DLS heuristic that works in two phases. In the first phase it applies dual-ascent to obtain both a lower bound and an initial solution to the problem. In the second phase it applies local search, limiting its attention to the set of facilities and Steiner nodes selected in the first phase. From a broad algorithmic perspective our work is closely related to the primal-dual algorithm by Swamy and Kumar (2004), with approximation ratio 8.55.
However, while Swamy and Kumar's focus is to develop approximation algorithms with provable worst case bounds, our goal is to find tight formulations that, combined with local search, perform effectively in practice. In that sense our approach is more comprehensive. Although the first phase of Swamy and Kumar's heuristic attempts to incorporate some of the connectivity requirement costs in the selection of open facilities (by insisting that each facility serve a minimum number of demand points), the final cost of the core tree network is only considered in the second phase, when facilities have already been selected. Our approach selects the open facilities and constructs a Steiner tree among the open facilities simultaneously, ensuring a global treatment of the problem. Our extensive computational experiments show that across the family of problems, our DLS heuristic obtains high quality solutions rapidly. Further, the results are quite consistent in the sense that the variance of the performance gap is quite low, and smaller than that of the other heuristics considered for the problem. Among the four problems, the ConFL problem seems to be the hardest for the DLS heuristic to solve. In particular, ConFL instances with higher sparsity between demand nodes and facility nodes are harder for our DLS heuristic.

Our heuristic can be viewed as one that successfully partners mathematical programming approaches (i.e., dual-ascent) with local search. As our experiments show, both phases of the heuristic strategy contribute to its success. When we replaced the DA solution by an initial solution from the UFL heuristic, the local search phase found solutions that were significantly worse. The local search phase also significantly improves the solution obtained from the dual-ascent phase. We should note that on the large-scale ConFL problem instances our DLS procedure significantly outperformed a state-of-the-art VNS procedure. In contrast to other heuristic methodologies, another significant advantage of our DLS heuristic is that it provides a high-quality lower bound along with each solution, thus providing a guarantee on the quality of the solution.
Instances              Gap                              Time (sec)
UFLP      STP     DA       DLS      VNS_best     DA        DLS       VNS_average
mp1       c5      5.88%    4.29%    13.14%       68.03     69.34      98.60
mp2       c5      9.84%    5.34%     9.95%       87.44     89.61     363.10
mp1       c10     5.76%    4.59%     9.58%       61.55     63.00     389.00
mp2       c10     5.41%    5.13%     5.85%       78.34     79.05     187.20
mp1       c15     5.45%    4.94%     7.75%       77.91     79.58     406.40
mp2       c15     9.34%    6.59%     9.18%       84.41     85.00     186.10
mp1       c20     5.35%    4.66%     6.05%      128.30    128.95     300.30
mp2       c20     9.28%    4.21%     5.72%       95.38     95.66     305.20
mq1       c5     11.08%    6.78%    21.16%      355.73    357.52     177.70
mq2       c5     12.74%    5.32%    16.20%      413.98    416.00     353.00
mq1       c10    10.79%    6.62%    10.67%      332.91    334.61     365.00
mq2       c10    17.23%    4.79%    10.29%      387.47    392.13     340.10
mq1       c15    10.63%    7.06%    12.21%      346.64    347.89     528.90
mq2       c15    11.96%    4.50%     6.01%      367.23    367.81     250.50
mq1       c20    10.48%    7.15%    10.83%      376.11    376.75     401.20
mq2       c20    16.65%    4.73%     7.40%      667.44    668.59     375.50
mp1       d5      6.00%    4.06%     5.52%       62.36     65.88     402.90
mp2       d5     15.70%    5.64%     5.32%       84.08     87.61     482.30
mp1       d10     5.73%    4.25%     6.19%       78.80     82.39     366.80
mp2       d10     9.70%    5.08%     3.01%      107.34    111.03     365.00
mp1       d15     5.44%    4.97%     6.62%       65.52     68.25     328.50
mp2       d15    14.28%    6.37%     5.02%       86.92     90.89     379.00
mp1       d20     5.31%    4.22%     4.83%      118.86    120.86     453.40
mp2       d20     9.27%    6.66%     2.86%      359.92    361.61     321.90
mq1       d5     10.88%    7.11%     9.12%      339.34    346.48     508.10
mq2       d5      8.81%    5.24%    11.52%      391.75    395.56     460.70
mq1       d10    10.87%    5.88%     8.78%      360.88    364.67     511.10
mq2       d10    12.30%    4.31%     6.43%      422.13    430.11     593.80
mq1       d15    10.68%    6.42%     6.77%      354.23    358.02     652.80
mq2       d15    12.15%    4.64%     7.45%      482.49    488.17     627.00
mq1       d20    10.59%    7.32%    10.47%      474.31    477.78     490.40
mq2       d20    16.57%    8.62%     6.47%      535.13    536.89     495.50
gs250a 1  c5      0.73%    0.52%     0.36%       10.41    292.81     523.50
gs250a 2  c5      0.59%    0.37%     0.30%        5.58    364.83     458.30
gs250a 1  c10     0.57%    0.37%     0.39%        6.09    272.66     668.50
gs250a 2  c10     0.69%    0.54%     0.38%        6.61    406.28     341.70
gs250a 1  c15     0.66%    0.45%     0.34%       15.13    218.06     548.60
gs250a 2  c15     0.55%    0.38%     0.27%       16.75    103.16     598.30
gs250a 1  c20     0.67%    0.41%     0.36%      140.13    160.53     598.00
gs250a 2  c20     0.67%    0.41%     0.28%      121.28    144.88     697.40
gs500a 1  c5      0.58%    0.41%     0.68%       36.42   2312.39     838.10
gs500a 2  c5      0.66%    0.45%     0.72%       40.73   5422.21     845.90
gs500a 1  c10     0.51%    0.41%     0.63%       27.55   1595.16     881.10
gs500a 2  c10     0.52%    0.37%     0.66%       29.28   1373.59     939.50
gs500a 1  c15     0.60%    0.39%     0.61%       84.63   1081.00     928.00
gs500a 2  c15     0.51%    0.36%     0.60%      138.77    837.94     871.20
gs500a 1  c20     0.58%    0.38%     0.55%      568.05    717.47     943.80
gs500a 2  c20     0.57%    0.36%     0.51%      111.27    216.88     906.00

Table 3.12: Comparison of the DLS heuristic with the VNS procedure on large-scale instances

Chapter 4

The Stochastic Connected Facility Location Problem

4.1 Introduction

Most of the applications presented in Chapter 2 have associated uncertainty, which is often simplified away in the abstraction of the problem. For example, in the caching problem described by Krick et al. (2003), the exact number of read and write requests is unknown at the time that the network is designed. Similarly, in the problem described by Nuggehalli et al. (2003) the assignment cost might not be revealed until the last moment, when the assignment edges are rented. In both problems the literature assumes average values as an approximation; however, under some circumstances we might wish to obtain the solution of the stochastic problem, perhaps to assess the validity of the approximation. In this chapter we seek to explore the value of explicitly modeling uncertainty in the ConFL problem. Here we introduce a variant of the Connected Facility Location (ConFL) problem that addresses these uncertainties, which we call the Stochastic ConFL (SConFL) problem.
When uncertainties exist, the objective is to minimize the expected cost of the network. In an instance of the SConFL problem, there are certain and uncertain costs. The facility opening costs and the connection costs between facilities are assumed to be known beforehand, while the assignment costs are unknown and depend on a random scenario that causes them to be uncertain. This type of problem is known as a two-stage stochastic problem with fixed recourse, as originated by Dantzig (1955) and described by Birge and Louveaux (1997). The two stages of the problem are as follows. In the first stage, a set of facilities must be opened and a Steiner tree that connects them constructed. In the second stage, the uncertainty in the assignment costs is unveiled (i.e., one scenario is realized) and customers must be assigned to open facilities. The objective is to minimize the network design cost, including the core network (the opening of facilities and their connection) and the expected assignment cost. This sequence of events clearly describes the problem introduced by Krick et al. (2003) in the data management setting, where for operational reasons the construction of the core network must be performed ahead of time while the actual requests to read and/or write data from the servers come later in time.

In the SConFL problem we can identify two types of uncertainty: (i) each customer's demand quantity and (ii) each customer's location (or travel time to potential facilities). Both types of uncertainty affect the assignment costs; however, they have a very different impact that requires independent treatment. When demand quantities are unknown, we show that the SConFL problem can be optimally solved by replacing all random variables by their expected values. The two-stage stochastic problem can be nicely reduced to a one-stage problem without recourse. The reason is that demand quantities affect all assignment costs in the same way. Hence, once facilities have been opened, demand nodes are assigned to the closest (per unit of demand) facility, regardless of the actual demand quantity. Consequently, there is no recourse in the second stage and the problem can be solved in one stage. The value of the stochastic solution is null (i.e., using average demand values as in Krick et al. (2003) actually solves the problem).

Another source of uncertainty in the assignment costs is the customer locations. In this category we also include other sources of uncertainty in assignment costs that have a similar effect and do not affect all assignment costs in the same direction (i.e., some costs can increase and some can decrease). For example, when a network provider offers discounts or special rates on certain links in the network to balance out the overall flow, the customer location may be known beforehand but the closest or cheapest facility (per unit of demand) may change over time, tied to the traffic flow on the network. For the ConFL problem one can specify three different types of probability distributions on the scenarios: (i) the polynomial-scenario model, where one assumes that there is only a polynomial number of scenarios that occur with positive probability, and these are explicitly enumerated, (ii) the independent-activation model, where each assignment edge has an independent probability distribution, and (iii) the black-box model, where nothing is assumed about the probability distribution. In this chapter our analysis focuses on the polynomial-scenario model and the independent-activation model.
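Under the polynomial-scenario model, the expected second-stage (recourse) cost of a candidate first-stage design can be evaluated by direct enumeration of the scenarios. The sketch below illustrates this evaluation; the scenario encoding (a probability paired with a per-scenario assignment cost table) is our own assumption for the example and not the notation used in this chapter.

```python
def expected_assignment_cost(open_facilities, demand_nodes, scenarios):
    """Expected second-stage cost for a fixed set of open facilities.

    scenarios: list of (probability, assign_cost) pairs, where
    assign_cost[d][f] is the assignment cost of demand node d to facility f
    in that scenario.  Probabilities are assumed to sum to one.
    """
    total = 0.0
    for prob, assign_cost in scenarios:
        # In each realized scenario every demand node is assigned to its
        # cheapest open facility (the second-stage recourse decision).
        recourse = sum(min(assign_cost[d][f] for f in open_facilities)
                       for d in demand_nodes)
        total += prob * recourse
    return total
```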
While in theory one can use the same methodol- ogy proposed here for the polynomial-scenario model for the independent-activation model; in practice our solution approach devised for the polynomial-scenario model becomes impracticable because the number of scenarios grows rapidly when assign- ment edges vary independently. The method proposed for the polynomial-scenario model is based on construct- 65 ing a deterministic equivalent of the stochastic problem. We propose a set of trans- formations to obtain a deterministic ConFL problem that coupled with our DLS heuristic yields high-quality solutions. Furthermore, we use this strategy to obtain high-quality solutions for the independent-activation model within a Monte-Carlo simulation framework such as Sample Average Approximation (SAA) (see Kleywegt et al. (2002)). We report computational results on a comprehensive set of randomly generated instances for both the polynomial-scenario model and the independent- activation model using the SAA framework. The novelty of our implementation of the SAA framework is that we use a heuristic to solve the sample average problems. The SAA framework relies on the fact that sample average problems are solved to optimality. However, here we show how to implement SAA using a lower bounding procedure jointly with a heuristic and yet obtain tight con dence bounds on the optimal solution. 4.2 Literature Review The introduction of uncertainty in linear programming dates back to 1955 when Dantzig (1955) introduced a computation procedure for two-stage linear pro- gramming models and a set of convexity theorems on the objective function of multiple stage models. Real world applications are ooded with uncertainty, which stochastic optimization allows us to model into the decision process as probabil- ity distributions to better represent the problem under consideration. As a result the stochastic models provide signi cant value to the decision process because they 66 better capture the nature of problem. In many situations it has been shown that just replacing uncertain input values for their expected values is not necessarily a good strategy. For a good introduction to the eld of stochastic optimization see Ruszczynski and Shapiro (2003), Birge and Louveaux (1997) and Kall and Wallace (1994). There have been several papers in the literature that deal with facility location or network design with uncertain demands or link lengths in various contexts. How- ever, to our knowledge there is no prior work on the stochastic connected facility location problem. Mirchandani (1975) and Mirchandani and Odoni (1979) extend the concept of p-median location to networks whose edge costs are random vari- ables. Their main motivation is the deployment of a service vehicle in a city when the travel times vary randomly and throughout the day due to tra c congestion. The objective of the problem is to minimize the expected travel time to any destina- tion node in the network. Weaver and Church (1983) address the same problem and develop a computational procedure. There is no recourse in this problem. Berman (1978) and Berman and Odoni (1982) add the option of relocating the service vehicle once travel times are revealed. Berman (1978)?s heuristic is generalized to multiple facilities by Berman and Odoni (1982). In another set of facility location problems, the uncertainty element relies in- stead on the customer demands. Snyder (2006) provides a comprehensive review on stochastic and robust facility location models. Laporte et al. 
(1994) analyze the capacitated facility location problem with uncertain demand. They state the problem as a two-stage program with recourse where the rst stage decisions de ne 67 the location of the facilities and their capacity and the second-stage decisions de- termine the quantities delivered to each demand node. The paper by Louveaux and Peeters (1992), which deals with a more general version of the problem that models uncertain demands and edge costs, is very relevant to our study in this chapter in terms of methodology. They propose a dual-based heuristic for a two-stage stochas- tic program with recourse when there is uncertainty on demands, selling prices, and production and transportation costs. In the rst stage, decisions regarding location and capacities of the plants are taken. And in the second stage, after demands, prices and costs are revealed, the allocation of demands is determined. The optimal capacity of the facilities arises from the trade-o between the cost of increasing the capacity and the net pro t at the various random demand levels. They extend the dual-based procedure of Erlenkotter (1978) for the uncapacitated facility location problem and once again prove the e ectiveness of dual-ascent schemes for facility lo- cation problems. Although the problem setting and algorithmic strategy is di erent; we also exploit the known virtues of a dual-ascent scheme to generate a dual-solution (and lower bound) and a primal solution that later improves with local search. The SConFL problem is di erent from these two types for facility location problems in that we assume no capacity limits on the edges. Capacity limits call for a trade-o between the rst stage and second stage decisions as some demand might be lost due to capacity decisions taken at an earlier stage. In the SConFL problem demands are always met; however, the facility node that serves the demand may change according to the realized edge costs. The trade-o lies between the cost of the core network and the realized assignment cost. Installing more facilities and 68 locating them closer to the demand nodes would increase the cost of the core network while potentially decreasing the assignment cost of the second stage decisions. Another vein of research that relates to our problem are network design prob- lem in telecommunications. Sen et al. (1994) study the problem of private-line services with random demands. They de ne a two-stage problem where the rst- stage decision variables correspond to the installation of capacity on the edges of a network and the second stage decision variables deal with routing demand between origins and destinations. The objective in that problem is to minimize unmet de- mand. Riis and Andersen (2003) discuss the same problem and develop a procedure based on an L-Shaped algorithm (Van Slyke and Wets 1969). A di erent path of research has been the development of approximation heuris- tics for these problems. Gupta et al. (2004) nd a constant factor approximation heuristic for the stochastic Steiner tree problem and single sink network design prob- lem. Ravi and Sinha (2006) consider two-stage nite scenario stochastic versions of various combinatorial optimization problems among them the facility location prob- lem. In the rst-stage facilities are opened while in the second-stage open facilities can be modi ed at a higher cost. They nd an 8-approximation heuristic for this problem, which Swamy (2004) improve with a 4.127 + -approximation algorithm. 
In this chapter, we do not seek to devise approximation algorithms. Instead, our objective is to develop a heuristic that performs efficiently in practice. All of the previously discussed problems share some aspect of the SConFL problem, yet none of the methods proposed for them extends easily to the SConFL problem. There has been considerable research on two-stage stochastic problems with recourse (see Birge and Louveaux (1997)); however, most of these methods assume linearity (i.e., continuous decision variables) in both the first and the second stage. Integer (and binary) decisions in both stages make the stochastic problem even harder to solve. We present in this chapter a heuristic that exploits the characteristics of the SConFL problem and obtains high-quality solutions to a two-stage stochastic integer programming problem.

4.3 A Note on Two-Stage Linear Programs with Fixed Recourse

Stochastic linear programs are linear programs in which some of the problem data may be considered uncertain. Recourse programs are those in which some decisions, or recourse actions, can be taken after the uncertainty is revealed. That means that some of the problem parameters can be represented as random variables. We represent by \omega the random event, such as market conditions, that determines some of our problem parameters, such as demand quantities d_i(\omega) or assignment costs a_{ij}(\omega), and we denote by \xi the set of problem parameters that are tied to the random event \omega. We represent by \xi = \xi(\omega) the relationship between the uncertain event and the parameters of our problem. Although this relationship is not a functional relation between the random event and the problem inputs, we assume that each scenario \omega \in \Omega fully determines the problem parameters in \xi. The realization of the random event divides the set of decisions into two groups. Decisions that have to be taken before the realization of the random event are called first-stage decisions and take place during the first stage. Decisions that take place after the uncertainty is unveiled are second-stage decisions, or recourse decisions; these decisions take place during the second stage.

Within the stochastic optimization literature, it is customary to represent first-stage decisions by x and second-stage decisions by y(x, \omega). Then, the classical two-stage stochastic linear program with fixed recourse (introduced by Dantzig (1955) and Beale (1955)) is the problem of finding

Minimize  z = c^T x + E_\omega[\min q(\omega)^T y(x,\omega)]    (4.1a)
subject to  Ax = b    (4.1b)
            T(\omega)x + W(\omega)y(x,\omega) = h(\omega)    (4.1c)
            x \ge 0,  y(x,\omega) \ge 0.    (4.1d)

Each component q, T, W, and h is a possible random variable determined by a realization of \omega. Then \xi is the set of these random components, \xi(\omega) = \{q(\omega)^T, h(\omega)^T, T(\omega), W(\omega)\}. We assume that \Xi is the support of \xi. In addition, for the SConFL we assume that \Xi is finite.

The objective function (4.1a) contains a deterministic term, c^T x, and the expectation of the second-stage objective, q(\omega)^T y(x,\omega). In many stochastic programming problems this second-stage term is hard to compute because, for each \omega, the value of y(x,\omega) is the solution of an optimization problem. In other words, for each \omega and first-stage decision x, we must solve the following optimization program:

Q(x, \xi(\omega)) = Minimize  q(\omega)^T y    (4.2a)
subject to  W(\omega)y = h(\omega) - T(\omega)x    (4.2b)
            y \ge 0.    (4.2c)

Q(x, \xi(\omega)) is referred to as the second-stage value function.
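Because \Xi is assumed finite, the expectation of the second-stage value function can be evaluated scenario by scenario. The sketch below is only an illustration of that idea, not code from the dissertation: the type Scenario and the callback secondStageValue are names we introduce for the example, and the second-stage solver itself is supplied by the caller.

#include <functional>
#include <vector>

// One realization xi(omega): its probability and the data needed by the recourse problem.
struct Scenario {
    double probability;              // p_omega
    std::vector<double> parameters;  // realized components of xi(omega)
};

// Expected second-stage value E_omega[Q(x, xi(omega))] for a finite scenario set:
// solve the recourse problem once per scenario and weight it by the scenario probability.
double expectedRecourse(const std::vector<double>& x,
                        const std::vector<Scenario>& scenarios,
                        const std::function<double(const std::vector<double>&,
                                                   const Scenario&)>& secondStageValue) {
    double expected = 0.0;
    for (const Scenario& s : scenarios)
        expected += s.probability * secondStageValue(x, s);  // p_omega * Q(x, xi(omega))
    return expected;
}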
Suppose we denote the expected second-stage value function by \mathcal{Q}(x) = E_\omega[Q(x, \xi(\omega))]; then the deterministic equivalent program (DEP) of the stochastic programming problem is

Minimize  z = c^T x + \mathcal{Q}(x)    (4.3a)
subject to  Ax = b    (4.3b)
            x \ge 0.    (4.3c)

This representation of a stochastic program shows that the main difference from a deterministic formulation lies in the expected second-stage value function. If we can find a closed-form representation of the second-stage value function, which could be nonlinear, the stochastic program becomes an ordinary nonlinear program. For the SConFL problem we find this second-stage value function when \Omega has polynomial size and, furthermore, show that it is linear. This allows us to apply our DLS heuristic to the deterministic equivalent program to find high-quality solutions.

In the SConFL the first-stage decisions are the set of open facilities, z, and the Steiner tree that connects them, y, and the second-stage decisions involve the allocation of customers to open facilities, x. We represent the second-stage decisions by x(z, \omega) to emphasize their dependence on the first-stage decisions, z, and the realized scenario, \omega.

4.4 Problem Formulation

In this section we explore the formulation of the Stochastic ConFL and show its transformation into a deterministic ConFL problem (i.e., its deterministic equivalent program) such that our dual-ascent local search heuristic, introduced in Chapter 3, can be used to obtain high-quality solutions together with accompanying lower bounds. We first define a cutset formulation for the deterministic ConFL, i.e., a ConFL problem with known demand quantities and assignment costs. The objective function (4.4a) has three terms: the facility opening cost, the core tree cost, and the assignment cost. Constraints (4.4b) and (4.4c) impose the condition that the open facilities are connected by a Steiner tree, while constraints (4.4d) and (4.4e) ensure that each demand node is assigned to an open facility.

Cutset formulation for the deterministic ConFL problem:

Minimize  \sum_{i \in F} f_i z_i + \sum_{i,j \in S \cup F} c_{ij} y_{ij} + \sum_{i \in D, j \in F} a_{ij} x_{ij}    (4.4a)
subject to  \sum_{i,j \in S \cup F} y_{ij} = \sum_{l \in S \cup F} z_l - 1    (4.4b)
            \sum_{i,j \in R} y_{ij} \le \sum_{l \in R} z_l - 1,  \forall R \subset (S \cup F)    (4.4c)
            \sum_{j \in F} x_{ij} \ge 1,  \forall i \in D    (4.4d)
            x_{ij} \le z_j,  \forall i \in D, \forall j \in F    (4.4e)
            x_{ij}, y_{ij}, z_i \in \{0, 1\}    (4.4f)

In the stochastic version of the ConFL problem, the assignment costs a_{ij} are uncertain and depend on the realization of a random variable, or scenario, \omega. In general terms, \xi pieces together the stochastic components of the problem. We assume that the random variable \xi has discrete and finite support \Xi. In other words, there is a finite set \Omega of known possible scenarios and P(\Omega) = 1. Then, a_{ij}(\omega) represents the assignment cost under scenario \omega and p_\omega the probability of occurrence of scenario \omega. The following formulation shows the analogous cutset formulation of this problem as a two-stage stochastic program with fixed recourse.

Cutset formulation for the Stochastic ConFL problem:

Minimize  \sum_{i \in F} f_i z_i + \sum_{i,j \in S \cup F} c_{ij} y_{ij} + E_\omega[Q(z, \xi(\omega))]    (4.5a)
subject to  \sum_{i,j \in S \cup F} y_{ij} = \sum_{l \in S \cup F} z_l - 1    (4.5b)
            \sum_{i,j \in R} y_{ij} \le \sum_{l \in R} z_l - 1,  \forall R \subset (S \cup F)    (4.5c)
            y_{ij}, z_i \in \{0, 1\}    (4.5d)

In the stochastic version of the ConFL problem, the assignment cost is unknown in the first stage and hence we replace the third term of the objective function (4.4a) by its expected value (i.e., the expected value of the second-stage decision problem), yielding the objective function (4.5a).
In other words, the assignment decision for each demand node is delayed until the second stage, when the recourse minimization problem (4.6) is solved:

Q(z, \xi(\omega)) = Minimize  q(z, \xi(\omega))(x) = \sum_{i \in D, j \in F} a_{ij}(\omega) x_{ij}    (4.6a)
subject to  \sum_{j \in F} x_{ij} \ge 1,  \forall i \in D    (4.6b)
            x_{ij} \le z_j,  \forall i \in D, \forall j \in F    (4.6c)
            x_{ij} \in \{0, 1\}    (4.6d)

Clearly, once the open facilities are defined in the first stage, the recourse problem reduces to an assignment problem where demand nodes are assigned to the closest open facility. To determine the closest open facility for each demand node we must wait until the assignment costs are realized; consequently, the solution to the assignment problem may vary from scenario to scenario.

4.4.1 SConFL with Uncertain Demands

In the case where the uncertainty in the assignment costs is due to unknown demand quantities, we assume that the per-unit assignment cost (denoted by b_{ij}) is fixed and known beforehand. Here, when scenario \omega is unveiled, we mean that the demand quantity d_i(\omega) is discovered for each demand node i, and hence the assignment cost a_{ij}(\omega) = d_i(\omega) b_{ij} is revealed. In this setting, the assignment cost matrix, A, is the only random input parameter in the problem; then, \xi(\omega) = (A(\omega)). For this specific realization of events, we show that the value of the stochastic solution is null. In other words, the optimal solution of this stochastic ConFL is the optimal solution of a deterministic ConFL problem in which average demands are assumed.

Theorem 4.4.1. The optimal solution of the stochastic ConFL with uncertain demands is equal to the optimal solution of the deterministic ConFL obtained by replacing all random variables by their expected values.

Before we can prove Theorem 4.4.1, we need to show the following two lemmas.

Lemma 4.4.2. Given a first-stage decision, z, the optimal allocation solution, x*, to the recourse problem, Q(z, \xi(\omega)), for the SConFL with uncertain demands is invariant to demand realizations.

Proof of Lemma 4.4.2. We must show that if x* is an optimal solution for Q(z, \xi(\tilde{\omega})) for some \tilde{\omega} \in \Omega, then x* is an optimal solution for Q(z, \xi(\omega)) for all \omega \in \Omega. Let X_z be the feasible region defined by z, and let x* \in X_z be an optimal solution for Q(z, \xi(\tilde{\omega})) for some \tilde{\omega} \in \Omega. Note that the feasible region X_z does not depend on \omega. Then,

q(z, \xi(\tilde{\omega}))(x*) = \sum_{i \in D} \Big( \sum_{j \in F} a_{ij}(\tilde{\omega}) x*_{ij} \Big) = \sum_{i \in D} \Big( \sum_{j \in F} d_i(\tilde{\omega}) b_{ij} x*_{ij} \Big) = \sum_{i \in D} d_i(\tilde{\omega}) \Big( \sum_{j \in F} b_{ij} x*_{ij} \Big) \le \sum_{i \in D} d_i(\tilde{\omega}) \Big( \sum_{j \in F} b_{ij} x_{ij} \Big),  \forall x \in X_z.

In vector notation,

\sum_{i \in D} d_i(\tilde{\omega}) (b_i^T x*_i) \le \sum_{i \in D} d_i(\tilde{\omega}) (b_i^T x_i),  \forall x \in X_z.    (4.7)

From inequality (4.7) we can show(1) that the inequality holds not only for the summation, but also for each individual term in it. That is,

d_i(\tilde{\omega}) (b_i^T x*_i) \le d_i(\tilde{\omega}) (b_i^T x_i),  \forall i \in D, \forall x \in X_z,    (4.8)

which implies

b_i^T x*_i \le b_i^T x_i,  \forall i \in D, \forall x \in X_z.    (4.9)

We know that x* \in X_z, and hence x* is a feasible solution for Q(z, \xi(\omega)) for all \omega \in \Omega. Now, assume that x* is not an optimal solution to Q(z, \xi(\omega)) for some \omega \in \Omega. Then there exists an x' \ne x*, x' \in X_z, such that q(z, \xi(\omega))(x') < q(z, \xi(\omega))(x*). Then

q(z, \xi(\omega))(x') = \sum_{i \in D} d_i(\omega) (b_i^T x'_i) < \sum_{i \in D} d_i(\omega) (b_i^T x*_i)
  \Rightarrow  d_i(\omega) (b_i^T x'_i) < d_i(\omega) (b_i^T x*_i)  for some i \in D
  \Rightarrow  b_i^T x'_i < b_i^T x*_i  for some i \in D,

a contradiction. This contradicts inequality (4.9) and proves Lemma 4.4.2 by contradiction.

(1) If there were a demand node i \in D such that d_i(\tilde{\omega})(b_i^T x*_i) > d_i(\tilde{\omega})(b_i^T x'_i) for some x'_i, then we could replace i's assignment in x* and obtain a lower objective function value. This would contradict our assumption that x* is an optimal solution.
Lemma 4.4.3. Given a first-stage decision, z, the expected value of the recourse program, Q(z, \xi(\omega)), for the SConFL with uncertain demands equals the objective function value of the recourse program with expected demands.

Proof of Lemma 4.4.3. We must show that E_\omega(Q(z, \xi(\omega))) = \min_{x \in X_z} \sum_{i \in D} \sum_{j \in F} E_\omega(d_i(\omega)) b_{ij} x_{ij}. We have

E_\omega(Q(z, \xi(\omega))) = \sum_{\omega \in \Omega} p_\omega Q(z, \xi(\omega))    (4.10)
                           = \sum_{\omega \in \Omega} p_\omega \min_{x \in X_z} \sum_{i \in D, j \in F} d_i(\omega) b_{ij} x_{ij}.    (4.11)

By Lemma 4.4.2, we can take the minimization outside the first summation. Moreover, we can rearrange the order of the summations. Then,

E_\omega(Q(z, \xi(\omega))) = \min_{x \in X_z} \sum_{\omega \in \Omega} p_\omega \sum_{i \in D, j \in F} d_i(\omega) b_{ij} x_{ij}    (4.12)
                           = \min_{x \in X_z} \sum_{i \in D, j \in F} \Big( \sum_{\omega \in \Omega} p_\omega d_i(\omega) \Big) b_{ij} x_{ij}    (4.13)
                           = \min_{x \in X_z} \sum_{i \in D, j \in F} E_\omega(d_i) b_{ij} x_{ij}.    (4.14)

This proves Lemma 4.4.3.

Proof of Theorem 4.4.1. By Lemma 4.4.3, the mean value problem of the recourse problem solves the recourse problem, and consequently Theorem 4.4.1 follows directly from Lemma 4.4.3.

4.4.2 SConFL with Uncertain Locations

When variability in the assignment costs is due to uncertainty in the customers' locations, or to other factors that do not affect the assignment costs proportionally, simplifying the problem by replacing the random variables by their expected values does not necessarily lead to good solutions. This is because the location of the closest open facility depends on the realized scenario. In this case we show that, even though we cannot replace the random variables by their expected values, we can transform the SConFL problem into a deterministic ConFL problem with multiple copies of the demand nodes, which our dual-ascent local search heuristic can successfully solve. We assume the polynomial-scenario model, so that there exists a limited number of scenarios \omega \in \Omega that determine the whole set of assignment costs a_{ij}(\omega), for all i \in D and j \in F.

Theorem 4.4.4. The SConFL problem is equivalent to a deterministic ConFL problem with |\Omega| copies of each demand node, one copy for each scenario, with assignment cost equal to p_\omega a_{ij}(\omega).

Proof of Theorem 4.4.4. Recall the second-stage recourse problem,

Q(z, \omega) = \min_{x \in X_z} \sum_{i \in D, j \in F} a_{ij}(\omega) x_{ij}(\omega).    (4.15a)

Hence, the expected value of the recourse problem is the weighted sum over the scenarios given the decisions of the first-stage problem, z. Here we have to make explicit that the allocation solution x_{ij} is a function of \omega:

E_\omega(Q(z, \xi(\omega))) = \sum_{\omega \in \Omega} p_\omega \min_{x \in X_z} \sum_{i \in D, j \in F} a_{ij}(\omega) x_{ij}(\omega)    (4.16a)
                           = \min_{x \in X_z} \sum_{\omega \in \Omega} p_\omega \sum_{i \in D, j \in F} a_{ij}(\omega) x_{ij}(\omega)    (4.16b)
                           = \min_{x \in X_z} \sum_{\omega \in \Omega} \sum_{i \in D, j \in F} p_\omega a_{ij}(\omega) x_{ij}(\omega).    (4.16c)

In other words, given equation (4.16c) we can explicitly introduce the expected value into our two-stage linear program with recourse. The resulting formulation (4.17) is hence equivalent to our original cutset formulation for the ConFL problem when assignment costs are known (see formulation (4.4)). There are as many copies of the demand nodes as there are scenarios, and the assignment costs are given by the product of the probability of the scenario and the original assignment cost. This finalizes our proof of Theorem 4.4.4.

Cutset formulation for the Stochastic ConFL problem:

Minimize  Z = \sum_{i \in F} f_i z_i + \sum_{i,j \in S \cup F} c_{ij} y_{ij} + \sum_{\omega \in \Omega} \sum_{i \in D, j \in F} p_\omega a_{ij}(\omega) x_{ij}(\omega)    (4.17a)
subject to  \sum_{j \in F} x_{ij}(\omega) \ge 1,  \forall i \in D, \forall \omega \in \Omega    (4.17b)
            x_{ij}(\omega) \le z_j,  \forall i \in D, \forall j \in F, \forall \omega \in \Omega    (4.17c)
            \sum_{i,j \in S \cup F} y_{ij} = \sum_{l \in S \cup F} z_l - 1    (4.17d)
            \sum_{i,j \in R} y_{ij} \le \sum_{l \in R} z_l - 1,  \forall R \subset (S \cup F)    (4.17e)
            x_{ij}(\omega), y_{ij}, z_i \in \{0, 1\}    (4.17f)
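The transformation of Theorem 4.4.4 only touches the assignment data: every demand node is replicated once per scenario and its assignment costs are scaled by the scenario probability. The following sketch illustrates that expansion under data layouts of our own choosing (the struct ScenarioCosts and the function expandDemandNodes are illustrative names, not the dissertation's code); facility opening costs and core-edge costs are unchanged and therefore not shown.

#include <cstddef>
#include <vector>

// Assignment costs a_ij(omega) for one scenario, stored as cost[i][j] for i in D, j in F.
struct ScenarioCosts {
    double probability;                     // p_omega
    std::vector<std::vector<double>> cost;  // |D| x |F| matrix
};

// Build the assignment-cost matrix of the deterministic equivalent ConFL instance:
// one copy of each demand node per scenario, with cost p_omega * a_ij(omega).
std::vector<std::vector<double>>
expandDemandNodes(const std::vector<ScenarioCosts>& scenarios) {
    std::vector<std::vector<double>> expanded;  // (|Omega| * |D|) rows, |F| columns
    for (const ScenarioCosts& s : scenarios)
        for (const std::vector<double>& row : s.cost) {
            std::vector<double> scaled(row.size());
            for (std::size_t j = 0; j < row.size(); ++j)
                scaled[j] = s.probability * row[j];  // p_omega * a_ij(omega)
            expanded.push_back(scaled);
        }
    return expanded;
}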
While our transformation is also applicable to the independent-activation model, it suffers from what is often referred to as the curse of dimensionality. For example, a problem with only two facilities, |F| = 2, and three demand nodes, |D| = 3, with two assignment cost levels each, |L| = 2, where any demand node can be assigned to any facility node, would have a total of |L|^{|D| \cdot |F|} = 64 scenarios to consider. When the locations of the demand nodes are independent of each other, the number of scenarios grows rapidly, and simply solving the deterministic equivalent problem with multiple demand-node copies becomes impractical and computationally infeasible. However, taking advantage of the deterministic equivalent problem, we can obtain high-quality solutions for the independent-activation model using Monte Carlo simulation. Furthermore, the lower bounding procedure allows us to assess the quality of the heuristic solution and to construct tight confidence bounds on the optimal solution value.

4.5 Sample Average Approximation Method

The sample average approximation (SAA) method is an approach for solving stochastic optimization problems by Monte Carlo simulation. Kleywegt et al. (2002), Verweij et al. (2003), and Shapiro and Philpott (2007) provide good introductions to this approach. In this technique the expected objective function of the stochastic problem is approximated by a sample average estimate derived from a random sample. The resulting sample average approximation problem, a deterministic variant of the original problem, is then solved by optimization techniques. The process is repeated with different samples to obtain candidate solutions along with statistical estimates of their optimality gaps.

Unlike the approach followed in the SAA literature, here we solve the sample problems with a heuristic coupled with a lower bounding method. We show that it is possible to construct tight confidence intervals on the optimal value function even if the sample problems are not solved to optimality. This result is particularly important for problems, such as the ConFL problem, that are costly to solve to optimality (see Ljubić (2007)). Certainly, the quality of the solution yielded will depend on the quality of the solutions obtained by the heuristic for the sample problems, and the width of the confidence interval will also depend on the sample problem size, the quality of the lower bounds, and the variability of the solution values.

In the SAA method the expected value function E[Q(x, \xi(\omega))] is approximated by the sample average function \frac{1}{N} \sum_{n=1}^{N} Q(x, \xi(\omega^n)), where a sample \{\omega^1, \omega^2, \ldots, \omega^N\} of N scenarios is generated from \Omega according to the probability distribution P. The SAA problem

\hat{z}_N = \min_{x \in X} c^T x + \frac{1}{N} \sum_{n=1}^{N} Q(x, \xi(\omega^n)),    (4.18)

corresponding to the original two-stage stochastic problem, is then solved using a deterministic optimization algorithm. The optimal value \hat{z}_N and an optimal solution \hat{x} of the SAA problem provide estimates of their true counterparts in the stochastic program. By generating R independent samples, each of size N, and solving the associated SAA problems, objective values \hat{z}_N^1, \hat{z}_N^2, \ldots, \hat{z}_N^R and candidate solutions \hat{x}^1, \hat{x}^2, \ldots, \hat{x}^R are obtained. Let

\bar{z}_N = \frac{1}{R} \sum_{m=1}^{R} \hat{z}_N^m    (4.19)

denote the average of the R optimal values of the SAA problems. This procedure produces up to R different candidate solutions. Out of these R candidate solutions, we have to select one as the approximation to the optimal solution of the original stochastic program. One generally accepted strategy is to generate a sample problem with a significantly larger number of scenarios, N' >> N.
Then it is natural to take \hat{x}^* as the one among the optimal solutions \hat{x}^1, \hat{x}^2, \ldots, \hat{x}^R of the R SAA problems that has the smallest estimated objective value; that is,

\hat{x}^* \in \arg\min \{ \hat{z}_{N'}(\hat{x}) : \hat{x} \in \{\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^R\} \},    (4.20)

where \{\omega^1, \omega^2, \ldots, \omega^{N'}\} is the sample of scenarios chosen to evaluate the candidate solutions. Using the DLS heuristic, we generate R heuristic candidate solutions x_H^1, x_H^2, \ldots, x_H^R with their corresponding objective values z_H^{N,1}, z_H^{N,2}, \ldots, z_H^{N,R} and lower bounds z_{LB}^{N,1}, z_{LB}^{N,2}, \ldots, z_{LB}^{N,R}. Similarly, we take as the heuristic solution to the stochastic program the heuristic solution x_H^i that has the smallest estimated objective value in the sample problem with N' scenarios.

4.5.1 Quality of the Solution

Kleywegt et al. (2002) provide performance bounds on the quality of the solution yielded by the SAA method for the stochastic program. Following and extending their argument, in this section we provide performance bounds on the quality of the solution yielded by the SAA method using the DLS heuristic.

Given a feasible solution x \in X, we have to evaluate the quality of this point viewed as a candidate for solving the true problem. Since the point x is feasible, we clearly have that g(x) \ge v^*, where v^* = \min_{x \in X} g(x) is the optimal value of the stochastic problem and g is the true stochastic objective function. The quality of x can be measured by the optimality gap

gap(x) := g(x) - v^*.    (4.21)

The true value of g(x) can be estimated by Monte Carlo sampling. That is, an i.i.d. random sample \omega^j, j = 1, \ldots, N', of \omega is generated and g(x) is estimated by the corresponding sample average \bar{g}_{N'}(x) = c^T x + \bar{q}_{N'}(x). At the same time the sample variance

\sigma^2_{N'}(x) := \frac{1}{N'(N'-1)} \sum_{j=1}^{N'} [Q(x, \omega^j) - \bar{q}_{N'}(x)]^2    (4.22)

of \bar{q}_{N'}(x) is calculated. Then we can calculate an approximate 100(1-\alpha)% confidence upper bound for g(x) by

U_{N'}(x) := \bar{g}_{N'}(x) + z_\alpha \sigma_{N'}(x).    (4.23)

This bound is justified by the Central Limit Theorem, with the critical value z_\alpha = \Phi^{-1}(1-\alpha), where \Phi(z) is the cumulative distribution function (cdf) of the standard normal distribution.

In order to calculate a lower bound for v^* we proceed as follows. Denote by z_{LB}^N the lower bound yielded by the DLS heuristic for the SAA problem based on a sample of size N. Note that z_{LB}^N is a function of the (random) sample and hence is itself random. To obtain a lower bound for v^*, observe that E[\bar{g}_N(x)] = g(x), i.e., the sample average \bar{g}_N is an unbiased estimator of the expectation g(x). We also have that for any x \in X the inequality \bar{g}_N(x) \ge \inf_{x' \in X} \bar{g}_N(x') \ge z_{LB}^N holds, so for any x \in X we have

g(x) = E[\bar{g}_N(x)] \ge E[\inf_{x' \in X} \bar{g}_N(x')] \ge E[z_{LB}^N].    (4.24)

By taking the minimum over x \in X of the left-hand side of the above inequality we obtain v^* \ge E[z_{LB}^N].

We can estimate E[z_{LB}^N] by solving the SAA problems several times and averaging the lower bounds calculated by dual ascent. That is, SAA problems based on independently generated samples, each of size N, are solved to obtain a dual-ascent bound R times. Let z_{LB}^{N,1}, z_{LB}^{N,2}, \ldots, z_{LB}^{N,R} be the computed lower bound values for these SAA problems. Then

\bar{z}_{LB}^{N,R} := \frac{1}{R} \sum_{j=1}^{R} z_{LB}^{N,j}    (4.25)

is an unbiased estimator of E[z_{LB}^N], the dual-ascent lower bound yielded for SAA problems with N scenarios. Since the samples, and hence z_{LB}^{N,1}, z_{LB}^{N,2}, \ldots, z_{LB}^{N,R}, are independent, we can estimate the variance of \bar{z}_{LB}^{N,R} by

\sigma^2_{N,R} := \frac{1}{R(R-1)} \sum_{j=1}^{R} (z_{LB}^{N,j} - \bar{z}_{LB}^{N,R})^2.    (4.26)

A 100(1-\alpha)% confidence lower bound for E[z_{LB}^N] is then given by

L_{N,R} := \bar{z}_{LB}^{N,R} - t_{\alpha,\nu} \sigma_{N,R},    (4.27)

where \nu = R - 1 and t_{\alpha,\nu} is the \alpha-critical value of the t-distribution with \nu degrees of freedom. Since v^* \ge E[z_{LB}^N], L_{N,R} gives a valid statistical lower bound for v^* as well. Consequently,

\widehat{gap}(x) := U_{N'}(x) - L_{N,R}    (4.28)

gives a statistically valid (with confidence at least 1 - 2\alpha) bound on the true gap(x). Alternatively, we can express this gap as a percentage by

\widehat{gap}(x)[%] := \frac{U_{N'}(x) - L_{N,R}}{L_{N,R}} \times 100%,    (4.29)

with the following interpretation: the heuristic solution is within the stated percentage of the true optimal solution with confidence at least 1 - 2\alpha.
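As a purely illustrative companion to (4.22)-(4.29), the sketch below assembles the statistical gap from the quantities produced by the heuristic runs; the function name confidenceGapPercent is ours, and the critical values are passed in by the caller because they depend on the chosen \alpha (for example, z_{0.01} is approximately 2.326 and t_{0.01,9} approximately 2.821; we quote these standard values only as an illustration). It assumes at least two evaluation scenarios and two replications.

#include <cmath>
#include <vector>

// gHat:     sample average objective of the candidate solution on the N' evaluation scenarios
// qValues:  the N' second-stage values Q(xHat, omega^j) behind gHat (used for the variance, eq. 4.22)
// lbValues: the R dual-ascent lower bounds z_LB^{N,j}, one per replication
// zCrit, tCrit: one-sided critical values for the chosen confidence level
double confidenceGapPercent(double gHat,
                            const std::vector<double>& qValues,
                            const std::vector<double>& lbValues,
                            double zCrit, double tCrit) {
    // Variance of the sample mean over the N' evaluation scenarios (eq. 4.22).
    double n = static_cast<double>(qValues.size());
    double qMean = 0.0;
    for (double q : qValues) qMean += q;
    qMean /= n;
    double varU = 0.0;
    for (double q : qValues) varU += (q - qMean) * (q - qMean);
    varU /= n * (n - 1.0);
    double upper = gHat + zCrit * std::sqrt(varU);   // upper bound, eq. (4.23)

    // Average and standard error of the replication lower bounds (eqs. 4.25-4.27).
    double r = static_cast<double>(lbValues.size());
    double lbMean = 0.0;
    for (double lb : lbValues) lbMean += lb;
    lbMean /= r;
    double varL = 0.0;
    for (double lb : lbValues) varL += (lb - lbMean) * (lb - lbMean);
    varL /= r * (r - 1.0);
    double lower = lbMean - tCrit * std::sqrt(varL);  // lower bound, eq. (4.27)

    return (upper - lower) / lower * 100.0;           // percentage gap, eq. (4.29)
}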
It can be noted that the lower bound L_{N,R} is somewhat conservative and depends on the quality of the lower bounding mechanism.

4.6 Proposed Heuristic

Our proposed heuristic relies on the fact that the SConFL problem with a polynomial number of scenarios can be formulated as a deterministic ConFL problem with multiple copies of the demand nodes and assignment costs equal to the original assignment cost multiplied by its probability of occurrence. Once the problem has been transformed into a deterministic ConFL problem, we apply our DLS heuristic to obtain a high-quality solution and a lower bound, as described in Chapter 3.

4.7 Computational Experiments

In this section we solve a set of SConFL problems and explore the benefits of solving the stochastic problem as an integer linear two-stage recourse problem. We report on a set of computational experiments with our DLS heuristic on the SConFL problem, presenting results for both (i) the polynomial-scenario model and (ii) the independent-activation model. We use the insights from our polynomial-scenario computational experiments to generate the sample average problems for the independent-activation computational experiments. We note that the sample average problems are simply SConFL problems with a preset number of scenarios. For the polynomial-scenario problem, we also solve the associated mean value problem and report the duality gap of the expected value solution. We coded our heuristics in Visual Studio 2005 (C++). We conducted all runs on an AMD Athlon 64 X2 Dual Core, 2.61 GHz machine with 3 GB of RAM.

4.7.1 Expected Value Solutions

In order to calculate the Expected Value Solution (EVS), we first need to find the solution of the associated mean value problem (MVP), that is, the deterministic ConFL problem whose random assignment costs, a_{ij}(\omega), have been replaced by their expected values, \bar{a}_{ij}. We denote this solution by x(\bar{\omega}), and we use it to calculate the EVS as

EVS = \sum_{\omega \in \Omega} p_\omega Z(x(\bar{\omega}), \omega),    (4.30)

where Z(x(\bar{\omega}), \omega) is the objective function of the SConFL model evaluated at the decision variables x(\bar{\omega}) under a realization \omega of the random variable. Therefore, the EVS is the expected value obtained by implementing the solution that was computed assuming expected values for the random inputs. In many settings this solution is not even guaranteed to be feasible; however, for the ConFL problem the solution obtained by solving the average problem is always feasible. The Value of the Stochastic Solution (VSS) is then given by

VSS = EVS - Z^*,    (4.31)

where Z^* is the objective value of the stochastic solution.
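The EVS in (4.30) is simply a probability-weighted re-evaluation of the mean-value design under every scenario. The sketch below shows that computation under assumptions of our own: evaluate is a hypothetical helper that returns the full SConFL objective Z(x(\bar{\omega}), \omega) of the fixed design under one scenario, and is not part of the dissertation's code.

#include <cstddef>
#include <functional>
#include <vector>

// Expected Value Solution (eq. 4.30): cost of implementing the mean-value-problem design
// under every scenario, weighted by the scenario probabilities.
double expectedValueSolution(const std::vector<int>& mvpSolution,
                             const std::vector<double>& probabilities,
                             const std::function<double(const std::vector<int>&, int)>& evaluate) {
    double evs = 0.0;
    for (std::size_t w = 0; w < probabilities.size(); ++w)
        evs += probabilities[w] * evaluate(mvpSolution, static_cast<int>(w));
    return evs;
}
// The value of the stochastic solution (eq. 4.31) is then VSS = EVS - Z*, where Z* is the
// objective value of the stochastic solution.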
In our computational experiments we use a heuristic to solve the MVP and the stochastic problem; consequently, we cannot precisely compute the value of the stochastic solution. However, we can compare the quality of both solutions and compute the lower bound gap of the MVP solution using the lower bound yielded by DA for the SConFL problem. 90 4.7.2 Problem Generation and Characteristics In this section we describe how we generated instances for both the polynomial- scenario model and sample average approximation problems. We generated in- stances by rst selecting nodes randomly located on a 100 x 100 square grid. The location, i.e. x- and y-coordinates, of each facility, Steiner node and demand node is randomly generated on the grid. Furthermore, to represent the uncertainty in the assignment costs we assume that the exact location of each demand node is uncertain and generate as many copies as scenarios of each demand node varying its location. As a rst step we generated a base location for each demand node; secondly, we disturbed that location by an error term, e, in the x- and y-coordinates drawn from a discrete uniform distribution according to a given variability, v. In our rst set of instances, v ranges from 5 to 30 in steps of 5; i.e. if v = 5 then e s U [ 5; 5]. The Euclidean distances rounded up to the next integer (to preserve triangle inequality) were used as a basis for the edge lengths. The assignment edge costs are equal to the edge lengths between demand nodes and facility nodes, while tree edge costs are equal to the edge lengths multiplied by an M factor. The M factor illustrates the signi cantly higher (in terms of cost per unit distance) connection cost of edges in the tree T . We set M = 7 for the polynomial-scenario instances and M = 3 for the sample average problems. The number of demand nodes and facility nodes vary between 10 and 90 in steps of 10, with the total number of demand and facility nodes equal to 100. The 91 number of Steiner nodes is 20 for all the instances. In an instance of the problem, the facility opening costs are equal to 30 and the same for all the facility nodes. Finally, the number of scenarios considered for the polynomial-scenario instances is 5. For the polynomial-scenario instances to compare the performance of our DLS heuristic on the SConFL with respect to its performance on the average represen- tation of the problem, we generate an \average instance" where the assignment costs for each demand node equal its average assignment cost. In these instances assignment costs are no longer integer. For the sample approximation method, we set the number of scenarios N for the sample average problems to 20, and the number of replications R to 10. Lastly, we set the number of scenarios N 0 to 2000 to compute the sample variance of the solution yielded by the heuristic. 4.7.3 Polynomial-Scenario Model Results Tables 4.1 and 4.2 present our computational results of the dual-ascent heuris- tic and the DLS heuristic on the MVP and the stochastic formulation. The MVP results were calculated using the original assignment costs; that is, after the DLS heuristic yielded a solution using the average assignment costs, we use that solution (set of open facilities) to calculate the real assignment costs with recourse. DA yields relatively good solutions with average gaps below 11.01% for every range of variability on the location of the demand nodes and proportion of demand 92 and facility nodes. 
However, in some instances the gap can be quite large, and in one instance this reaches 23.74%. The local search is quite e ective to reduce the gaps on either formulations and yields in all cases solutions with gaps below 9.25% on the MVP and 7.70% on the stochastic formulation. The DLS heuristic yields solutions with average gaps below 5.22% and 4.07% on the MVP and stochastic formulation, respectively, for all the combinations of parameters. There are no considerable di erences between the solutions yielded by DLS on the MVP and on the stochastic formulation for low values of variability. This result was expected. When there is low variability on the assignment costs, to obtain a solution using the average assignment costs seems quite reasonable given that the DLS takes a fraction of the time on this formulation with respect to the time it requires on the complete formulation of the problem. However, as the variability on the assignment costs increases, DLS nds better solutions when run on the stochastic formulation of the problem. Nevertheless, we must note that neither formulation yields the best solution for all combinations of parameters or variability. Even when the variability on the assignment costs is high, in a few instances the DLS heuristic found a better solution using the MVP formulation. 4.7.4 SAA Results (Independent-Activation Model) Tables 4.3, 4.4 and 4.5 show our computational results for the SConFL problem using the SAA method and the DLS heuristic. Each entry in Table 4.3 shows the average gap over ten instances, while Tables 4.4 and 4.5 show the minimum 93 jDj jF j v = 5 v = 10 v = 15 DA MVP DLS DA MVP DLS DA MVP DLS 10 90 0.04% 0.04% 0.04% 0.04% 0.04% 0.04% 0.04% 0.04% 0.04% 20 80 0.02% 0.02% 0.02% 0.02% 0.02% 0.02% 0.02% 0.02% 0.02% 30 70 3.05% 1.37% 1.39% 4.88% 1.21% 1.11% 4.27% 1.64% 1.25% 40 60 7.63% 2.38% 2.64% 8.59% 3.02% 3.03% 8.47% 3.38% 2.69% 50 50 7.20% 3.50% 3.70% 8.40% 3.72% 3.52% 10.28% 4.49% 3.68% 60 40 8.55% 3.27% 3.53% 9.58% 3.34% 3.84% 9.84% 4.30% 4.07% 70 30 6.22% 2.71% 2.77% 6.79% 3.06% 2.89% 7.12% 3.38% 2.66% 80 20 2.87% 1.30% 1.53% 3.16% 1.68% 1.47% 3.59% 1.87% 1.46% 90 10 1.66% 0.38% 0.38% 1.38% 0.47% 0.48% 1.53% 0.61% 0.57% Table 4.1: Comparison of heuristics for the Stochastic ConFL. fi = 30, M = 7, and v factor is varied jDj jF j v = 20 v = 25 v = 30 DA MVP DLS DA MVP DLS DA MVP DLS 10 90 0.04% 0.04% 0.04% 0.04% 0.04% 0.04% 0.04% 0.04% 0.04% 20 80 0.02% 0.02% 0.02% 0.02% 0.02% 0.02% 0.02% 0.02% 0.02% 30 70 6.15% 1.28% 1.47% 4.56% 1.91% 1.73% 5.98% 1.67% 1.63% 40 60 7.01% 3.24% 2.93% 8.91% 3.16% 2.55% 8.40% 4.08% 2.79% 50 50 7.67% 3.59% 3.32% 9.67% 4.11% 3.74% 11.01% 4.85% 3.67% 60 40 10.61% 3.33% 3.98% 9.11% 5.22% 3.82% 8.43% 4.29% 3.21% 70 30 6.77% 3.57% 3.02% 8.74% 3.83% 2.81% 9.35% 4.54% 3.42% 80 20 4.10% 2.53% 1.83% 5.40% 2.31% 1.62% 5.32% 2.98% 2.33% 90 10 1.15% 0.66% 0.41% 1.82% 1.23% 0.32% 1.72% 1.50% 0.51% Table 4.2: Comparison of heuristics for the Stochastic ConFL. fi = 30, M = 7, and v factor is varied and maximum gap within those ten instances, respectively. The values reported are 98% con dence gaps. That is, with 98% con dence the optimal value of the true stochastic problem is x% from the lower bound. Overall, these gaps follow the behaviour observed for the deterministic instances and the polynomial-scenario instances. Lower gaps are observed for either high proportions of demand nodes or facility nodes. On the contrary, higher gaps are observed for balanced instances with similar numbers of demand nodes and facility nodes. 
Furthermore, these con dence 94 jDj jF j v 5 10 15 20 25 30 10 90 0.59% 1.03% 1.96% 2.47% 2.64% 2.67% 20 80 2.58% 2.93% 3.55% 3.60% 4.25% 4.75% 30 70 2.69% 2.95% 3.51% 4.59% 5.01% 5.17% 40 60 2.59% 3.08% 4.14% 4.48% 5.00% 5.42% 50 50 3.08% 3.39% 3.76% 4.51% 4.75% 5.25% 60 40 2.91% 3.31% 3.42% 3.90% 4.17% 4.78% 70 30 1.76% 2.12% 2.53% 2.83% 3.23% 3.35% 80 20 0.82% 1.20% 1.75% 1.91% 2.26% 2.52% 90 10 0.36% 0.80% 0.95% 1.09% 1.36% 1.19% Table 4.3: Average 98% con dence gaps for the SConFL, fi = 30, M = 3, and v factor is varied jDj jF j v 5 10 15 20 25 30 10 90 0.33% 0.43% 1.08% 1.48% 1.40% 0.74% 20 80 1.27% 1.54% 2.17% 2.06% 1.71% 3.04% 30 70 1.01% 1.69% 1.57% 2.66% 3.29% 3.18% 40 60 0.83% 1.22% 3.13% 3.09% 3.55% 4.17% 50 50 1.75% 2.23% 2.64% 3.81% 3.39% 4.25% 60 40 1.40% 1.76% 2.71% 3.08% 3.14% 2.82% 70 30 0.80% 1.19% 1.63% 1.77% 2.01% 1.89% 80 20 0.30% 0.59% 1.28% 1.39% 1.65% 1.53% 90 10 0.20% 0.32% 0.48% 0.69% 1.05% 0.44% Table 4.4: Minimum 98% con dence gaps for the SConFL, fi = 30, M = 3, and v factor is varied gaps increase for higher levels of uncertainty. We can outline two explanation for this behaviour. First, as the uncertainty level increases, the duality gaps obtained by DLS increase. We observe this behaviour for the polynomial-scenario instances. Secondly, as the uncertainty level increases, the sample variance increases as well, and the width of the con dence interval increases. Tables 4.4 and 4.5 show that there are no large disparities in the gaps for a particular set of parameters. The smallest gap corresponds to an instance with 90 95 jDj jF j v 5 10 15 20 25 30 10 90 1.14% 1.60% 3.34% 3.85% 3.99% 4.53% 20 80 3.77% 6.20% 6.11% 5.48% 6.81% 7.82% 30 70 4.49% 4.21% 5.15% 7.85% 8.28% 8.84% 40 60 4.12% 4.19% 5.38% 5.15% 6.53% 6.45% 50 50 4.91% 6.03% 5.63% 5.76% 5.80% 6.44% 60 40 4.29% 4.08% 4.40% 4.65% 5.19% 6.02% 70 30 3.66% 3.61% 4.42% 3.94% 4.42% 4.25% 80 20 1.55% 1.87% 2.40% 2.48% 3.65% 3.59% 90 10 0.80% 1.45% 1.57% 1.58% 1.73% 2.00% Table 4.5: Maximum 98% con dence gaps for the SConFL, fi = 30, M = 3, and v factor is varied customer nodes, 10 facility nodes and the lowest variability (v = 5). On the other hand, the highest gap is observed for an instance with 30 demand nodes, 70 facility nodes and the highest variability (v = 30). In terms of computational time, to solve one sample average problem with 20 scenarios takes approximately 90 seconds. Consequently, to obtain a con dence interval on the optimal function value of the true stochastic problem with 10 replica- tions takes approximately 900 seconds plus approximately 100 seconds for post pro- cessing. One could attempt to nd better solutions for the true stochastic problem and tighter con dence intervals increasing the number of scenarios in each sample average problem, the number of replications for each sample average problem or the number of scenarios to evaluate each solution. However, any of these alternatives would indisputably require a longer computational time. There is a natural trade- o between the performance of the SAA and the computational e ort to produce tighter performance bounds. 96 4.7.5 Sample sizes N and N 0 and number of replications R In this section we explore the trade-o between sample size N (i.e., number of scenarios per sample average problem) and the number of replications R (i.e., number of sample average problems). 
In addition, we describe how we determined the number of scenarios, N', selected to test each of the solutions generated by the sample average problems in our computational experiments.

Earlier we mentioned that we can use a large number of scenarios to assess the quality of a solution, x, and calculate an approximate 100(1-\alpha)% confidence upper bound for g(x) by equation (4.23). Clearly, the sample variance is one of the key factors that determines the width of that bound. Figure 4.1 shows how the sample variance of \bar{q}_{N'}(x) changes as the number of scenarios, N', increases. This figure corresponds to one instance with 50 demand nodes, 50 facility nodes, M = 3, facility opening cost equal to 30, and variability of up to 10 in the demand node coordinates. The sample variance decreases abruptly at the beginning but later levels off, reaching a plateau at around 2000 scenarios. This behavior was representative of the whole set of problems. Based on this observation, we determined that N' = 2000 was an appropriate number of scenarios for our computational experiments. Evaluating more scenarios would increase our computational time with very little gain in terms of the quality of the bound.

[Figure 4.1: Sample variance of \bar{q}_{N'}(x) for increasing number of scenarios, N'. The sample variance is plotted against the number of scenarios (200 to 2000), with one curve for each of the 10 candidate solutions.]

Figure 4.1 gives further insight into the solutions yielded by each replication. In theory, each distinct sample average problem (or replication) could yield a distinct solution. However, in this case we can observe that the 10 replications provide only 7 distinct solutions. This is an important observation when deciding the number of replications: increasing the number of replications does not necessarily produce more candidate solutions.

Another reason to increase the number of replications, R, is to produce a tighter lower bound. In order to improve the lower bound defined by equation (4.27), one must decrease the sample variance of the estimate of E[z_{LB}^N], either by increasing the number of scenarios, N, of each sample average problem or by increasing the number of replications, R. The results reported in the previous section were calculated using N = 20 and R = 10. If we had twice the computational time available, we could increase either N from 20 to 30, or R from 10 to 20. Table 4.6 shows the sample variance and total computational time for either 20 or 30 scenarios and 10 or 20 replications.

                      N = 20                              N = 30
              R = 10              R = 20              R = 10
Instance   variance  time (sec)  variance  time (sec)  variance  time (sec)
1          3.4922    805.76      1.8242    1589.31     2.7715    1512.27
2          2.7314    837.45      1.2617    1656.42     2.7534    1586.78
3          8.8446    801.28      3.7902    1582.86     5.7064    1510.22
4          6.1019    769.58      3.5685    1521.37     6.2817    1464.41
5          2.7945    804.11      2.6196    1587.08     3.0258    1501.25
6          6.1907    826.36      2.3549    1639.23     3.1859    1581.80
7          2.7922    817.08      2.4721    1614.63     3.3145    1561.22
8          3.4815    792.14      2.3656    1568.42     5.9125    1523.25
9          3.1519    869.80      1.7980    1721.59     0.7566    1670.31
10         1.7649    811.98      1.3088    1600.97     2.5983    1519.61

Table 4.6: Sample variance of E[z_{LB}^N] and computational time for various N and R values.

Each row represents the results for an instance with 50 demand nodes, 50 facility nodes, M = 3, facility opening cost equal to 30, and variability of up to 10 in the demand node coordinates.
Interestingly, while we observed earlier that increasing the number of replications does not improve the quality of the upper bound, it does decrease the lower bound sample variance and consequently improve the lower bound. Such improvement is greater than the one obtained by increasing the number of scenarios per sample average problem. These results indicate that increasing the number of replications might be the preferred strategy to produce tighter con dence interval around the true optimal cost and improve the quality of the lower bound. Once again these results were representative for the whole set of instances. 99 4.8 Conclusions In this chapter we have extended the ConFL problem to address two types of uncertainties: demand quantities and assignment edge costs. Furthermore, we have considered two types of models: polynomial-scenario and independent-activation. In either of these models, for demand quantities we show that the problem can be solved optimally by the mean expected value problem. That is, the probability distribution of demands can be ignored as long as the risk of the solution is not considered. As the linear program proposed is optimizing expected values, we do not make any assumptions on the variability of the total cost of the solution. How- ever, we could introduce a constraint that limits the maximum cost of the recourse problem to control for very unfavorable scenarios which would invariably change the problem at hand. In the more general case, where variability occurs on the assignment edges, we show how to transform the problem into a deterministic equivalent formula- tion on a larger graph and apply our dual-ascent local search heuristic. We tested successfully this approach on a set of randomly generated instances with varying network structures and uncertainty levels. We nd this strategy especially valuable for the polynomial-scenario model and for the independent-activation model within the sample average approximation method. We nd that the stochastic solution has signi cant value. The value of the stochastic solution obtained by the DLS heuris- tic, as expected, depends on the variability level on the assignment costs. As the uncertainty is higher, the value of the stochastic solution is higher. 100 Lastly, we show how to use sample average approximation with a heuristic and a lower bounding procedure. Furthermore, we nd that the DLS heuristic within the SAA method yields tight con dence intervals for the SConFL problem under a wide range of parameters. We explore the trade-o s between the number of scenarios and number of replications; and conclude that in general increasing the number of replications yields tighter performance con dence bounds for the true stochastic optimal value. 101 Chapter 5 Robust Optimization for the Connected Facility Connected Problem 5.1 Introduction In this chapter we attempt to address the uncertainty in the ConFL problem by robust optimization, and search for solutions that perform well under varying customer realizations. One example that motivates this approach is the ber-to- the-nodes (FTTN) technology, which the ConFL problem models (Gollowitzer and Ljubic 2011). There is growing interest to ensure broadband accessibility to every household (see Economist 2009a, Frenzel 2010); and consequently, to upgrade and extend ber optic networks. However, there is limited information regarding which customers will subscribe to the service and their demand. On the other hand, demand ranges and coverage areas are easily identi able. 
From a broad perspective the FTTN layout is as follows. Serving o ces dis- tribute the signal to neighborhoods or homes through ber optic connections. Fiber optic cables run from the serving o ces to a cabinet serving a neighborhood, where end users connect using their existing copper (or even ber optic, Economist 2009b) connections. Switching devices stored in these cabinets and ber optic cables are ex- pensive. Consequently, the problem is to determine how many cabinets (and hence switches) to deploy, where to place them, which customers to connect to them, and how to reconnect cabinets among each other and to the backbone. 102 A considerable amount of time elapses between the moment that cabinets (or facilities) are installed and interconnected and customers are assigned to each facil- ity. Consequently, decision makers have to determine the \optimal" location of these cabinets with partial or limited information based on best estimates and forecasts of the location of future customers and their demand. Furthermore, to explicitly incorporate information on the location of each individual customer might not be possible or even practical. Evidence of the di culty to estimate the location of customers is the fact that while AT&T can provide broadband access to 22 million of household, today only 2.5 millions are current subscribers of the U-verse service. Under this premise we propose a robust optimization approach that would explicitly incorporate the uncertainty of customer locations and provide a solution that mit- igates the adverse e ects of speci c realizations. As we highlighted in the previous chapter, the ConFL problem models other similar problems that combine facility location decisions with connectivity requirements that face similar uncertainty. We follow the optimization approach introduced by Bertsimas and Sim (2003) to search for robust solutions to the original problem. Under Bertsimas and Sim?s approach, a robust solution is an optimal solution that satis es an uncertainty bud- get constraint determined by the decision-maker. For robust optimization problems with uncertainty limited to the objective function coe cients, the budget constraint determines the number of such coe cients that can take their highest value at any solution and the objective function becomes a minimax function. Bertsimas and Sim proposed an algorithmic procedure to solve this minimax optimization problem that consists of solving a family of deterministic, called nominal, optimization prob- 103 lems. These nominal problems are deterministic versions of the original problem yet their cost matrices do not necessarily have the same structure and properties as the original problem. The main algorithm focuses on nding the optimal solution to the nominal problems and therefore an optimal solution to the robust optimization problem. For the case where the nominal problems are not solved to optimality, Bertsimas and Sim (2003) also show that an -approximation algorithm for the deterministic problem can be used to obtain an -approximation algorithm for the robust problem. We extend their approach to heuristics that do not necessarily have a known worst case approximation ratio but where it is possible to calculate lower bounds on the optimal solutions of the nominal problem. For the ConFL problem, we look for network designs that are insensitive to di erent customer realizations yet exhibit low cost under di erent environments. We de ne customer location within an area. 
The nal location of a customer may fall anywhere within this area. This strategy is specially appropriate for the broadband network problem, where neighborhoods can be described by geographic regions. Furthermore, this approach can be easily extended to assign range demand values for each customer or group of customers in a given region. We examine how such uncertainty a ects the optimal network design and propose an e cient strategy to obtain high-quality solutions to the robust ConFL problem. In summary, the contributions of this chapter are threefold. First, we in- troduce the ConFL problem under customer uncertainty and formalize its robust counterpart. Secondly, we extend Bertsimas and Sim?s robust optimization solution approach to situations where one has a heuristic upper bound and a lower bound on 104 the optimal solution objective value for each nominal problem. In other words as long as we have a heuristic and a lower bounding mechanism for the deterministic problem, we can use them to nd a heuristic solution for the robust problem to- gether with a lower bound on the optimal objective value of the robust optimization problem. And nally, we propose an algorithm based on a DLS heuristic that yields high-quality solutions to the ConFL problem?s robust counterpart. In the next section we explore the black-box model for uncertainty and ana- lyze how the formulation in (4.4) is a ected by incertitude in the location and/or demand quantity of customers. The robust ConFL problem belongs to the family of robust problems whose uncertainty takes place only in the objective function (as opposed to the feasibility set). Consequently, we focus on a model that exploits this characteristic and speci cally applies to problems with box uncertainty in the objective function. 5.2 Robust Optimization and The ConFL Problem When the location of demand nodes or their demand is unknown and there is limited information regarding the probability distribution of this uncertainty, it is not possible to implement an expected value approach that minimizes the expected value of the objective function. Furthermore, optimizing over the expected value might lead to solutions that are very expensive under certain realizations, although they have the minimum expected value. Since this may not be desirable for the decision maker, the robust optimization approach provides a more appropriate 105 solution strategy in these situations. Under the robust optimization framework one optimizes against the worst in- stances that might arise by using a min-max objective. In the ConFL problem, uncertain customer locations and/or demand quantities translate to uncertain as- signment costs. Hence, one robust approach is to optimize against the worst-case scenario (i.e., worst-case analysis) and assume the highest assignment cost for each pair of demand node and facility node. However, indisputably this strategy yields a very conservative and most likely a very expensive solution. Alternatively, Bertsi- mas and Sim (2003) propose a robust optimization method that allows the decision maker to adjust the conservatism level of the solution sought. For the case where uncertainty exists only on the coe cients of the objective function, as it is the case in the ConFL problem under customer uncertainty1, each solution is penalized by a weighted deviation term that incorporates the uncertainty cost of the solution. 
In the ConFL problem under customer uncertainty, we consider two sources of uncertainty: customer location and demand quantities. To model customer location uncertainty, we assume that the demand node locations are unknown points on a plane and consider two types of uncertainty regions: circular and rectangular. For the circular uncertainty region, the location of each demand node is described by a circular disk defined by two parameters: the center coordinates and the radius (or diameter). For the rectangular uncertainty region, the location is defined by a set of parameters: the center coordinates and distinct deviation ranges for each axis direction. Then, the assignment costs range between the closest Euclidean distance between the facility node and the demand node's uncertainty region, a_{ij}, and the longest distance between them, a_{ij} + d_{ij}; that is, the uncertain assignment cost satisfies \tilde{a}_{ij} \in [a_{ij}, a_{ij} + d_{ij}]. When the facility node falls within the uncertainty region, a_{ij} = 0.

(Footnote 1: We use the expression customer uncertainty to refer jointly to location and/or demand quantity uncertainty.)

Our analysis here easily extends to other shapes of the uncertainty region. However, within this chapter we focus our attention on only these two shapes, circular and rectangular, as they model the uncertainty generally seen in practice. It turns out that the circle is simple enough to derive theoretical results, and the rectangle is sufficiently irregular to convey general characteristics pertaining to other shapes.

On the other hand, when demand quantities are uncertain, we assume that they also vary within a predetermined range. Let \tilde{q}_i \in [q_i, q_i + \delta_i] be the random demand quantity for customer i, and let b_{ij} be the per-unit-demand assignment cost from customer i to facility j. Then the random assignment cost is given by \tilde{a}_{ij} \in [b_{ij} q_i, b_{ij} q_i + b_{ij} \delta_i] = [a_{ij}, a_{ij} + d_{ij}], where a_{ij} = b_{ij} q_i and d_{ij} = b_{ij} \delta_i.

In both cases the uncertainty translates into interval uncertainty, as the unknown assignment costs vary between a minimum and a maximum value without reference to a probability distribution. Note that if there is uncertainty in both location and demand, it is easy to see that the result is again uncertainty in the assignment costs.

5.2.1 Bertsimas and Sim's Robust Optimization Model

In general terms, given the following nominal combinatorial optimization problem

Minimize  \tilde{c}^T v    (5.1)
subject to  v \in X,

where X \subseteq \{0,1\}^n, Bertsimas and Sim (2003) define the robust counterpart, where each entry \tilde{c}_j, j \in N = \{1, 2, \ldots, n\}, takes values in [c_j, c_j + d_j], d_j \ge 0, and X is a discrete set, as follows:

Z^* = Minimize  c^T v + \max_{\{R : R \subseteq N, |R| \le \Gamma\}} \sum_{j \in R} d_j v_j    (5.2)
subject to  v \in X.

The interpretation of this formulation is that at most \Gamma of the uncertain values in any solution will take their highest value. Consequently, the decision maker wants to minimize the maximum cost of a solution with at most \Gamma coefficients at their highest (or worst-case) values. In other words, \Gamma represents a budget constraint that allows the decision maker to adjust the conservatism level of the solution sought. The deviation term (or penalty term), \max_{\{R : R \subseteq N, |R| \le \Gamma\}} \sum_{j \in R} d_j v_j, represents the sum of the maximum deviations of a specified number, \Gamma, of uncertain coefficients in the solution. The decision maker can select the conservatism parameter \Gamma between zero and the maximum number of uncertain coefficients in the problem, n. A parameter of \Gamma = 0 corresponds to the optimistic solution, which completely disregards the deviation terms in the cost coefficients and assumes the best-case scenario with the minimum coefficient cost for each decision variable. A value of \Gamma = n yields the worst-case scenario solution, in which each cost coefficient includes its deviation term.
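To make the role of the budget \Gamma concrete, the following sketch (an illustration only; the function name robustCost and the data layout are ours, not part of the dissertation's code) evaluates the robust objective (5.2) for a fixed 0/1 solution v: it adds to the nominal cost the \Gamma largest deviations d_j among the selected items.

#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Robust cost of a fixed 0/1 solution v under model (5.2):
// nominal cost plus the Gamma largest deviations among the chosen items.
double robustCost(const std::vector<double>& c,  // nominal coefficients c_j
                  const std::vector<double>& d,  // deviations d_j >= 0
                  const std::vector<int>& v,     // 0/1 solution
                  int gamma) {
    double nominal = 0.0;
    std::vector<double> chosenDeviations;
    for (std::size_t j = 0; j < v.size(); ++j) {
        if (v[j] == 1) {
            nominal += c[j];
            chosenDeviations.push_back(d[j]);
        }
    }
    // Sort the deviations of the selected items in decreasing order and add the Gamma largest.
    std::sort(chosenDeviations.begin(), chosenDeviations.end(), std::greater<double>());
    double penalty = 0.0;
    for (int k = 0; k < gamma && k < static_cast<int>(chosenDeviations.size()); ++k)
        penalty += chosenDeviations[k];
    return nominal + penalty;
}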
A parameter of Γ = 0 corresponds to the optimistic-case solution, which completely disregards the deviation terms in the cost coefficients and assumes the best-case scenario with the minimum cost coefficient for each decision variable. A value of Γ = n yields the worst-case scenario solution, in which each cost coefficient includes its deviation term.

Bertsimas and Sim (2003) propose an algorithm to find a solution to problem (5.2). They show that one can find the optimal solution to problem (5.2) by solving at most n + 1 nominal (deterministic) problems. To apply this method one must first identify the values of the deviation coefficients, d_j, and label them in decreasing order such that d_1 ≥ d_2 ≥ ... ≥ d_n ≥ d_{n+1} = 0. Then for each deviation coefficient, d_l, one defines a nominal problem, G^l, given by

    G^l = Γ d_l + Minimize c^T v + Σ_{j=1}^{l} (d_j − d_l) v_j    (5.3)
    subject to v ∈ X.

Note that for equal d_l values, one must solve only one G^l nominal problem. Consequently, the number of nominal problems is at most n + 1. The problem G^l − Γ d_l is a deterministic instance of the original problem (5.1) with cost coefficients equal to c_j + max(d_j − d_l, 0). Then, G^1 − Γ d_1 represents the best-case scenario problem and G^{n+1} − Γ d_{n+1} yields the worst-case scenario problem.

Theorem 5.2.1. [Bertsimas and Sim (2003)] The optimal objective value, Z*, of problem (5.2) is given by Z* = min_{l=1,...,n+1} G^l, and the optimal solution is v* = arg min_{l=1,...,n+1} G^l.

Below we restate a sketch of the proof of Theorem 5.2.1 in Bertsimas and Sim (2003). This will be helpful in our subsequent analysis of using upper and lower bounds instead of solving the nominal problems to optimality.

Proof of Theorem 5.2.1. For a given value of v, we can alternatively express the inner maximization problem in (5.2) as the following:

    β(v) = Maximize Σ_{j∈N} (d_j v_j) u_j    (5.4)
    subject to u_j ≤ 1, ∀j ∈ N
               Σ_{j∈N} u_j ≤ Γ
               u_j ≥ 0, ∀j ∈ N.

Note that β(v) has an integral solution for u_j (i.e., the polyhedron has integer extreme points). The values u_j represent whether a certain value d_j in the solution (i.e., with v_j = 1) belongs to the set of the Γ highest deviations in the solution v. The dual problem to β(v) is given by the following minimization problem:

    ω(v) = Minimize Γθ + Σ_{j∈N} y_j    (5.5)
    subject to y_j + θ ≥ d_j v_j, ∀j ∈ N
               θ ≥ 0; y_j ≥ 0, ∀j ∈ N,

where the y_j are the dual variables of the first set of constraints and θ is the dual variable of the conservatism budget constraint. By strong duality β(v) = ω(v), and consequently we can replace β(v) by ω(v) in problem (5.2), which leads to the following minimization problem:

    Z* = Minimize Γθ + Σ_{j∈N} c_j v_j + Σ_{j∈N} y_j    (5.6)
    subject to v ∈ X
               y_j + θ ≥ d_j v_j, ∀j ∈ N
               θ ≥ 0; y_j ≥ 0, ∀j ∈ N.

We can observe that the constraint y_j + θ ≥ d_j v_j is binding at the optimal solution. If for a given j this constraint were non-binding in the solution, we could improve the total cost by decreasing the value of y_j. Consequently, y_j satisfies y_j = max(d_j v_j − θ, 0) = max(d_j − θ, 0) v_j. Note that we can take v_j outside the maximization only because v_j ∈ {0, 1}. Then we can substitute for y_j in the formulation, rearrange the terms, and rewrite the problem as

    Z* = Minimize Γθ + Σ_{j∈N} (c_j + max(d_j − θ, 0)) v_j    (5.7)
    subject to v ∈ X, θ ≥ 0.

We define Z(θ) as problem (5.7) for a given value of θ. Then, to find the optimal solution to the robust counterpart, we have to find the value θ* ∈ R+ that minimizes Z(θ). In other words, Z* = min_{θ ∈ R+} Z(θ). Z(θ) is neither a convex nor a concave function; however, Z(θ) is linear over each interval [d_{l+1}, d_l]. The optimal value of Z(θ) for θ ∈ [d_{l+1}, d_l] must therefore be attained at one of the end points of the interval; in other words, min_{θ ∈ [d_{l+1}, d_l]} Z(θ) = min(Z(d_l), Z(d_{l+1})). In a similar way, if we extend the interval to θ ∈ [d_{l+2}, d_l], then min_{θ ∈ [d_{l+2}, d_l]} Z(θ) = min(Z(d_l), Z(d_{l+1}), Z(d_{l+2})). Lastly, for θ ∈ R+, Z* = min_{l=1,2,...,n+1} Z(d_l) (and θ* = arg min_{l=1,2,...,n+1} Z(d_l)). Furthermore, Z(d_l) = G^l. This concludes the proof of Theorem 5.2.1.
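In code, the reduction of Theorem 5.2.1 amounts to a loop over the distinct deviation values. The sketch below is our illustration only, under the assumption that a black-box routine solve_nominal(costs) returns an optimal binary vector in X for the deterministic problem with the supplied cost vector; it is not the dissertation's implementation.

    # Sketch of the reduction in Theorem 5.2.1, assuming a black-box routine
    # solve_nominal(costs) that returns an optimal binary vector v in X for the
    # deterministic problem with the supplied cost vector.
    def solve_robust_exact(c, d, gamma, solve_nominal):
        dev_levels = sorted(set(d), reverse=True) + [0.0]   # d_1 >= ... >= d_{n+1} = 0
        best_value, best_v = float("inf"), None
        for d_l in dev_levels:
            # Nominal problem G^l - gamma*d_l has cost coefficients c_j + max(d_j - d_l, 0).
            costs = [cj + max(dj - d_l, 0.0) for cj, dj in zip(c, d)]
            v = solve_nominal(costs)
            g_l = gamma * d_l + sum(cost * vj for cost, vj in zip(costs, v))
            if g_l < best_value:
                best_value, best_v = g_l, v
        return best_value, best_v        # Z* = min_l G^l and an optimal robust solution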
Input: Problem instance and Γ.
Output: Solution, v^B, and solution value, Z^B.
foreach d_l do
    Find an α-approximate solution v^H_l using Algorithm H for the nominal problem
        G^l − Γ d_l = min_{v∈X} c^T v + Σ_{j=1}^{l} (d_j − d_l) v_j.
    Let Z^H_l = c^T v^H_l + max_{R ⊆ N, |R| ≤ Γ} Σ_{j∈R} d_j (v^H_l)_j.
end
Let l* = arg min_{l=1,...,n+1} Z^H_l. Z^B = Z^H_{l*}; v^B = v^H_{l*}.
Figure 5.1: Bertsimas and Sim's Algorithm B

5.2.2 Extending Bertsimas and Sim's Robust Approximation Algorithm

We first describe Bertsimas and Sim's robust approximation algorithm. Figure 5.1 depicts the algorithm, which is referred to as Algorithm B in their paper. First, one finds an α-approximate solution, v^H_l, to the nominal problem G^l − Γ d_l for each deviation term, d_l. Following that, one computes the total cost of the α-approximate solution, v^H_l, in the robust counterpart problem (5.2). Lastly, the solution that yields the lowest total cost is the algorithmic solution to the robust counterpart problem.

Bertsimas and Sim show that, given a polynomial time α-approximation algorithm capable of solving the nominal problems, Algorithm B yields a polynomial time α-approximate solution to the robust counterpart. The approach is predicated on having an approximation algorithm available for the combinatorial optimization problem. We show that this approach can be extended to any heuristic procedure if we are able to generate a lower bound for each nominal problem. In cases where the nominal problem is hard to solve, the approach we outline can be used to obtain high-quality heuristic solutions to the robust combinatorial optimization problem. We point out that for certain problems, even though there might exist a high-quality approximation algorithm for the deterministic version of the problem, the same algorithm might not be suitable for the nominal problems. That is the case with the 3/2-approximation algorithm for the traveling salesman problem (Christofides 1976), which requires that the cost matrix satisfy the triangle inequality.

We modify Algorithm B to use any heuristic (as opposed to an approximation algorithm) together with a lower bounding procedure for the nominal problems, G^l − Γ d_l. Figure 5.2 depicts this modified Algorithm B, which we call the Robust Combinatorial Optimization Heuristic (RCOH); it finds a lower bound for each nominal problem in conjunction with a heuristic solution. For each deviation coefficient, d_l, we first find a heuristic solution, v^H_l, and a lower bound, Φ^LB_l, using Heuristic H and a lower bounding procedure for the nominal problem Φ_l = G^l − Γ d_l = min_{v∈X} c^T v + Σ_{j=1}^{l} (d_j − d_l) v_j. Let Φ^H_l be the objective value of the heuristic solution, v^H_l, and let ε_l be the lower bound gap of solution v^H_l; i.e., ε_l = (Φ^H_l − Φ^LB_l) / Φ^LB_l.

Input: Problem instance and Γ.
Output: Solution, v^B, solution value, Z^B, and a solution quality assessment, ε.
foreach d_l do
    Find a heuristic solution v^H_l and lower bound Φ^LB_l using Heuristic H and a lower
        bounding procedure for the nominal problem G^l − Γ d_l = min_{v∈X} c^T v + Σ_{j=1}^{l} (d_j − d_l) v_j.
    Let Z^H_l = c^T v^H_l + max_{R ⊆ N, |R| ≤ Γ} Σ_{j∈R} d_j (v^H_l)_j.
end
Let l* = arg min_{l=1,...,n+1} Z^H_l. Let Z^LB = min_{l=1,...,n+1} (Γ d_l + Φ^LB_l).
Z^B = Z^H_{l*}; v^B = v^H_{l*}. ε = (Z^B − Z^LB) / Z^LB.
Figure 5.2: Robust Combinatorial Optimization Heuristic (RCOH)
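For comparison with Algorithm B, the following minimal sketch of RCOH is ours and assumes two user-supplied routines for the nominal problem: heuristic(costs), which returns a feasible binary solution, and lower_bound(costs), which returns a valid lower bound on the nominal optimum (both names are our own placeholders).

    # Sketch of RCOH (Figure 5.2), assuming heuristic(costs) returns a feasible binary
    # solution and lower_bound(costs) a valid lower bound for the nominal problem
    # defined by the supplied cost vector.
    def rcoh(c, d, gamma, heuristic, lower_bound):
        dev_levels = sorted(set(d), reverse=True) + [0.0]
        z_b, v_b, z_lb = float("inf"), None, float("inf")
        for d_l in dev_levels:
            costs = [cj + max(dj - d_l, 0.0) for cj, dj in zip(c, d)]
            v = heuristic(costs)
            # Cost of v in the robust objective (5.2): nominal cost plus the
            # gamma largest deviations among the items selected by v.
            devs = sorted((dj for dj, vj in zip(d, v) if vj == 1), reverse=True)
            z_h = sum(cj * vj for cj, vj in zip(c, v)) + sum(devs[:gamma])
            if z_h < z_b:
                z_b, v_b = z_h, v
            z_lb = min(z_lb, gamma * d_l + lower_bound(costs))
        eps = (z_b - z_lb) / z_lb          # lower bound gap of the robust solution
        return v_b, z_b, z_lb, eps

The only changes relative to the exact reduction are that the nominal problems are solved heuristically and that a lower bound Z^LB = min_l (Γ d_l + Φ^LB_l) is accumulated alongside the incumbent.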
Let ε_max = max_{l=1,...,n+1} ε_l. Then we show that the solution yielded by RCOH using Heuristic H for the robust counterpart problem, with value Z^B, has a lower bound gap ε less than or equal to ε_max. In other words, Z^LB ≤ Z* ≤ Z^B ≤ (1 + ε_max) Z^LB and ε ≤ ε_max.

Theorem 5.2.2. If ε_max = max_{l=1,...,n+1} ε_l, where ε_l is the lower bound gap for nominal problem l, then the solution yielded by RCOH for the robust counterpart has a lower bound gap ε, which is less than or equal to ε_max.

In order to prove Theorem 5.2.2 we need the following four lemmas.

Lemma 5.2.3. Let G^LB_l = Γ d_l + Φ^LB_l be a lower bound to nominal problem G^l; then Z^LB = min_{l=1,...,n+1} G^LB_l is a lower bound to Z*.

Proof. Let Z* = G^{l*}, where l* indexes the nominal problem that solves (5.2) to optimality; then G^LB_{l*} is a lower bound to Z*. Furthermore, Z^LB = min_{l=1,...,n+1} G^LB_l ≤ G^LB_{l*}, and consequently Z^LB is a lower bound to Z*.

Lemma 5.2.4. For all v ∈ X and θ ∈ R+,

    max_{R ⊆ N, |R| ≤ Γ} Σ_{j∈R} d_j v_j ≤ Σ_{j∈N} max(d_j − θ, 0) v_j + Γθ.    (5.8)

Before we present a formal proof of Lemma 5.2.4, we discuss a simple example to illustrate the meaning of this lemma. Figure 5.3(a) shows ten deviation values in decreasing order. Each bar represents one d_l value, with unit width and height equal to d_l. In this example, θ is equal to 52 and Γ equals 6. Then, the light blue area in Figure 5.3(b) highlights the region that corresponds to the left-hand side of inequality (5.8), while the light blue area in Figure 5.3(c) depicts the area that corresponds to the right-hand side of inequality (5.8). By comparing both light blue areas, we can conclude that inequality (5.8) holds strictly. Furthermore, we can conclude that inequality (5.8) holds strictly for θ > 43.18 and θ < 40.62, and holds at equality for θ ∈ [40.62, 43.18].

Figure 5.3: Example for Lemma 5.2.4 (three panels (a)-(c) plotting the ten deviation values 89.51, 83.81, 65.00, 52.73, 44.54, 43.18, 40.62, 29.06, 25.70, and 7.62 as unit-width bars; panel (b) shades the area corresponding to the left-hand side of (5.8) and panel (c) the area corresponding to the right-hand side).
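The example can also be checked numerically. The short script below is ours; it uses the deviation values read off Figure 5.3 with all v_j = 1 and Γ = 6, and confirms that the slack of inequality (5.8) is zero exactly for θ ∈ [40.62, 43.18] and positive outside that interval.

    # Check of inequality (5.8) on the Figure 5.3 data (all v_j = 1, gamma = 6).
    d = [89.51, 83.81, 65.00, 52.73, 44.54, 43.18, 40.62, 29.06, 25.70, 7.62]
    gamma = 6
    lhs = sum(sorted(d, reverse=True)[:gamma])   # left-hand side: the 6 largest deviations
    for theta in (52.0, 43.18, 40.62, 30.0):
        rhs = sum(max(dj - theta, 0.0) for dj in d) + gamma * theta
        print(f"theta={theta}: lhs={lhs:.2f}, rhs={rhs:.2f}, slack={rhs - lhs:.2f}")
    # slack = 0.00 for theta = 43.18 and theta = 40.62; slack > 0 for theta = 52.0 and 30.0.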
Proof of Lemma 5.2.4. Given a solution v ∈ X, we denote by D_v the subset of deviations, d_j, whose corresponding element v_j is non-zero in the solution. Furthermore, we label the elements in D_v in decreasing order such that d_1 ≥ d_2 ≥ ... ≥ d_{|D_v|}. Then, we can calculate the left-hand side maximization term by adding up the Γ highest deviations in D_v. That is,

    max_{R ⊆ {1,...,|D_v|}, |R| ≤ Γ} Σ_{j∈R} d_j = Σ_{j=1}^{Γ} d_j.    (5.9)

Without loss of generality, we assume that Γ ≤ |D_v|. Otherwise, the summation on the right-hand side of (5.9) goes up to min{Γ, |D_v|}. We have to consider three situations depending on the value of θ:

Case (A): θ > d_1. In this case Σ_{j=1}^{|D_v|} max(d_j − θ, 0) = 0, and clearly Σ_{j=1}^{Γ} d_j ≤ Γθ.

Case (B): θ ≤ d_{|D_v|}. In this case,

    Σ_{j=1}^{|D_v|} max(d_j − θ, 0) = Σ_{j=1}^{|D_v|} (d_j − θ) ≥ Σ_{j=1}^{Γ} (d_j − θ) = Σ_{j=1}^{Γ} d_j − Γθ.

In other words, by reordering the terms, we observe that Σ_{j=1}^{Γ} d_j ≤ Σ_{j=1}^{|D_v|} max(d_j − θ, 0) + Γθ.

Case (C): θ ∈ (d_{|D_v|}, d_1]. Without loss of generality, assume that θ ∈ (d_{l+1}, d_l]. Then, we have to consider two cases: (i) Γ ≥ l (and d_Γ ≤ d_l) and (ii) Γ < l (and d_Γ ≥ d_l). Note that the deviations d_j need not all be distinct. Given θ ∈ (d_{|D_v|}, d_1], we can always find an l such that the interval (d_{l+1}, d_l] contains θ.

Case (i): Γ ≥ l. Then

    max_{R ⊆ {1,...,|D_v|}, |R| ≤ Γ} Σ_{j∈R} d_j = Σ_{j=1}^{l} d_j + Σ_{j=l+1}^{Γ} d_j + lθ − lθ
        = Σ_{j=1}^{l} (d_j − θ) + Σ_{j=l+1}^{Γ} d_j + lθ.

By assumption, θ > d_j for all j ≥ l + 1; thus the right-hand side is

    ≤ Σ_{j=1}^{l} (d_j − θ) + Σ_{j=l+1}^{Γ} θ + lθ = Σ_{j=1}^{l} (d_j − θ) + (Γ − l)θ + lθ = Σ_{j=1}^{l} (d_j − θ) + Γθ.

Note that max(d_j − θ, 0) = d_j − θ for j ≤ l and max(d_j − θ, 0) = 0 for j ≥ l + 1; thus

    max_{R ⊆ {1,...,|D_v|}, |R| ≤ Γ} Σ_{j∈R} d_j ≤ Σ_{j=1}^{|D_v|} max(d_j − θ, 0) + Γθ.    (5.10)

Case (ii): Γ < l. Then

    max_{R ⊆ {1,...,|D_v|}, |R| ≤ Γ} Σ_{j∈R} d_j = Σ_{j=1}^{l} d_j − Σ_{j=Γ+1}^{l} d_j + lθ − lθ
        = Σ_{j=1}^{l} (d_j − θ) + lθ − Σ_{j=Γ+1}^{l} d_j
        = Σ_{j=1}^{l} (d_j − θ) + Γθ + (l − Γ)θ − Σ_{j=Γ+1}^{l} d_j
        = Σ_{j=1}^{l} (d_j − θ) + Γθ + Σ_{j=Γ+1}^{l} θ − Σ_{j=Γ+1}^{l} d_j
        = Σ_{j=1}^{l} (d_j − θ) + Γθ + Σ_{j=Γ+1}^{l} (θ − d_j).

By assumption, θ ≤ d_j for all j ≤ l; then Σ_{j=Γ+1}^{l} (θ − d_j) ≤ 0, and

    max_{R ⊆ {1,...,|D_v|}, |R| ≤ Γ} Σ_{j∈R} d_j ≤ Σ_{j=1}^{l} (d_j − θ) + Γθ ≤ Σ_{j=1}^{|D_v|} max(d_j − θ, 0) + Γθ.

Now we can reintroduce the decision variables v_j with their zero terms and conclude that

    max_{R ⊆ N, |R| ≤ Γ} Σ_{j∈R} d_j v_j ≤ Σ_{j∈N} max(d_j − θ, 0) v_j + Γθ for all v ∈ X.    (5.11)

This finalizes our proof of Lemma 5.2.4.

We now use Lemma 5.2.4 to show our next lemma. Lemma 5.2.5 states that the total cost of any solution in the robust optimization problem (5.2) is at most its total cost in the nominal problem defined by any value θ ∈ R+, plus Γθ, for any Γ ∈ Z+.

Lemma 5.2.5. Given v ∈ X, Z(v) ≤ Γθ + Φ(v, θ) for all θ ∈ R+ and Γ ∈ Z+.

Proof. By definition,

    Z(v) = c^T v + max_{R ⊆ N, |R| ≤ Γ} Σ_{j∈R} d_j (v)_j.

By Lemma 5.2.4,

    Z(v) ≤ c^T v + Σ_{j∈N} max(d_j − θ, 0)(v)_j + Γθ.

By the definition Φ(v, θ) = c^T v + Σ_{j∈N} max(d_j − θ, 0)(v)_j, the right-hand side equals Φ(v, θ) + Γθ. Consequently, Z(v) ≤ Γθ + Φ(v, θ).

Lastly, Lemma 5.2.6 states that the heuristic solution obtained for the robust problem (5.2) by RCOH has a total cost no greater than Γ d_l plus the total cost of the heuristic solution to the nominal problem G^l − Γ d_l, for all l ∈ N.

Lemma 5.2.6. Z^B ≤ Γ d_l + Φ^H_l for all l ∈ N.

Proof. Since Z^B ≤ Z^H_l for all l ∈ N, by Lemma 5.2.5, Z^B ≤ Γ d_l + Φ^H_l.

Finally, we can prove Theorem 5.2.2, which says that the lower bound gap of the robust solution yielded by RCOH is no larger than the worst lower bound gap over the individual nominal problems, G^l − Γ d_l.

Proof of Theorem 5.2.2. Let ε be the lower bound gap of the robust solution yielded by RCOH. That is,

    ε = (Z^B − Z^LB) / Z^LB.    (5.12)

Let Z^LB = Φ^LB_j + Γ d_j for some j ∈ N. Then, by Lemma 5.2.6,

    ε ≤ [Φ^H_j + Γ d_j − (Φ^LB_j + Γ d_j)] / (Φ^LB_j + Γ d_j)
      = (Φ^H_j − Φ^LB_j) / (Φ^LB_j + Γ d_j)
      = ε_j [Φ^LB_j / (Φ^LB_j + Γ d_j)]
      ≤ ε_j ≤ ε_max,    (5.13)

since d_j ≥ 0 implies 0 ≤ Φ^LB_j / (Γ d_j + Φ^LB_j) ≤ 1.

Based on this result one would be tempted to conclude that ε decreases for higher values of Γ. While that is the behavior we observe in the majority of our computational experiments, it is not necessarily true in all cases. The value of d_j (the deviation coefficient attaining Z^LB) varies for different Γ values. Consequently, ε decreases for higher values of Γ only as long as the decrease in d_j does not outweigh the change in Γ, that is, as long as Γ d_j increases.

5.2.3 The ConFL Problem Robust Counterpart

The ConFL problem is NP-complete and costly to solve to optimality (Ljubić 2007). Consequently, a series of approximation algorithms and heuristics have been proposed to address the ConFL problem (see Swamy and Kumar 2004, Tomazic and Ljubić 2008, Eisenbrand et al. 2008, Gollowitzer and Ljubić 2011, Jung et al. 2008, Ljubić 2007). The question remains whether a heuristic (as opposed to an approximation algorithm with a performance guarantee) can be implemented to solve the nominal problems within the framework of Theorem 5.2.1 and yet find high-quality solutions to the robust counterpart.
In Bardossy and Raghavan (2010) we propose a high-quality heuristic based on dual ascent and local search (DLS), and here we show that the same DLS heuristic can be implemented within a variant of Algorithm B to find high-quality solutions for the robust counterpart, together with lower bound gaps for those solutions.

The robust counterpart of the ConFL problem concerns the problem where each assignment cost ~a_ij, {i,j} ∈ E(D), takes values in [a_ij, a_ij + d_ij], d_ij ≥ 0, {i,j} ∈ E(D), but the set of feasible solutions (x, y, z) ∈ X that satisfy constraints (4.4b)-(4.4f) does not change across realizations. Using Bertsimas and Sim's (2003) definition of the robust counterpart, we would like to find a solution (x, y, z) ∈ X that minimizes the maximum cost Σ_{i∈F} f_i z_i + Σ_{{i,j}∈E(S∪F)} b_ij y_ij + Σ_{i∈F, j∈D} ~a_ij x_ij when at most Γ of the coefficients ~a_ij are allowed to change. In other words, we want to minimize the deterministic ConFL problem objective function, W(x, y, z) = Σ_{i∈F} f_i z_i + Σ_{{i,j}∈E(S∪F)} b_ij y_ij + Σ_{i∈F, j∈D} a_ij x_ij, plus the maximum deviation over the assignment edges. The robust ConFL problem is given by the following formulation:

    Z* = Minimize W(x, y, z) + max_{R ⊆ E(D), |R| ≤ Γ} Σ_{{i,j}∈R} d_ij x_ij    (5.14)
    subject to (x, y, z) ∈ X.

In any solution to a ConFL instance there is a deviation term for each demand node; consequently, the penalty term incorporates the uncertainty in the location or demand quantity of each demand node. In other words, the optimal solution of the robust problem is determined by the uncertainty level of the Γ most uncertain demand nodes. The added term in the objective function weighs the uncertainty of the Γ highest possible deviations in the assignment costs for each feasible solution. Alternatively, we can interpret this term as a buffer or protection term in the total solution cost, Z*.

Let (x*, y*, z*) be the optimal solution to problem (5.14). The realized total cost will always be higher than or equal to W(x*, y*, z*) and, with some probability, below Z*. The probability that the realized cost is less than Z* is directly proportional to Γ. Once we have selected solution (x*, y*, z*), that is, opened facilities z*, connected them using tree y*, and assigned customers to open facilities according to x*, we may observe, once the assignment costs are known, a total cost greater than Z* with some probability. However, depending on the value of Γ, the probability that the realized total cost will be less than or greater than Z* will vary. A higher Γ yields a higher Z* and also ensures that the probability of experiencing a cost higher than Z* decreases. Higher Γ values put higher weight on costly realizations and cap the total cost under more unfavorable scenarios.

Input: ConFL instance G = (V, E), deviation matrix D, and Γ.
Output: Total cost Z^B, solution (x^B, y^B, z^B) to the robust ConFL problem, lower bound Z^LB, and lower bound gap ε.
Initialize Z^B = +∞ and Z^LB = +∞;
foreach d_l ∈ D do
    Use the DLS heuristic to obtain a solution (x_l, y_l, z_l) and lower bound Φ^LB_l for the nominal problem Φ_l;
    Calculate Z^H_l;
    if Z^H_l < Z^B then update the robust solution: Z^B = Z^H_l and (x^B, y^B, z^B) = (x_l, y_l, z_l);
    if Γ d_l + Φ^LB_l < Z^LB then update the robust lower bound: Z^LB = Γ d_l + Φ^LB_l;
end
Calculate the solution lower bound gap: ε = (Z^B − Z^LB) / Z^LB;
Figure 5.4: RCOH for the robust ConFL problem

Figure 5.4 depicts how we implement RCOH using our DLS heuristic for the robust ConFL problem.
This algorithm uses the DLS heuristic to find a solution to each nominal problem, computes the total cost of each solution in the objective function of problem (5.14), and stores the minimum cost solution. Lastly, the algorithm reports the lower bound gap, ε, of the robust solution.

5.3 An Example of the RConFL Problem Using RCOH

In this section we provide an example to show how we apply RCOH to a small RConFL instance. Figure 5.5 depicts an instance with two facility nodes and three customer nodes, where each customer has an uncertain location within a disk.

Figure 5.5: Example of robust ConFL problem (facility nodes 4 and 5 and customer nodes 1, 2, and 3, each customer surrounded by its circular uncertainty disk).

For this example we set the facility opening cost equal to 1 and the M factor equal to 2. Table 5.1 shows the coordinates of each node's center and the radius of each demand node's uncertainty disk.

Customer Nodes
Node  x-coord  y-coord  radius
1     2        6        3
2     7        8        3
3     6        3        4

Facility Nodes
Node  x-coord  y-coord
4     4        8
5     3        5

Table 5.1: Coordinates for nodes in the example

Next we calculate the assignment costs, that is, the minimum assignment cost and the maximum possible deviation for each pair of facility and customer nodes, and the tree edge costs. Table 5.2 shows these costs. The minimum assignment cost is given by a_ij = max{√((i1 − j1)² + (i2 − j2)²) − r_i, 0} and the maximum deviation by d_ij = min{√((i1 − j1)² + (i2 − j2)²) + r_i, 2 r_i}, where (i1, i2) are the coordinates of node i.

Assignment Costs
Customer  Facility  Minimum Cost  Deviation
1         4         0.00          5.83
1         5         0.00          4.41
2         4         0.00          6.00
2         5         2.00          6.00
3         4         1.39          8.00
3         5         0.00          7.61

Tree Edge Cost
Facility  Facility  Cost
4         5         6.32

Table 5.2: Assignment and tree edge costs in the example

We observe that there are five distinct deviation values, d_l; consequently, following the algorithm of Bertsimas and Sim (2003), we need to solve six distinct nominal problems, one for each deviation value plus an additional problem for d_l = 0: D = {8.00, 7.61, 6.00, 5.83, 4.41, 0}.
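The entries of Table 5.2 and the deviation set D can be reproduced from the coordinates in Table 5.1. The following short script is ours and is intended only as an illustration of the two formulas above for the disk uncertainty region (the tree edge cost multiplies the Euclidean length of edge 4-5 by M = 2).

    import math

    # Recompute Table 5.2 from the Table 5.1 coordinates for disk uncertainty:
    # a_ij = max(dist - r_i, 0), d_ij = min(dist + r_i, 2*r_i); tree edges cost M*dist.
    customers = {1: ((2, 6), 3), 2: ((7, 8), 3), 3: ((6, 3), 4)}   # node: (center, radius)
    facilities = {4: (4, 8), 5: (3, 5)}
    M = 2

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    deviations = {0.0}
    for i, (center, r) in customers.items():
        for j, loc in facilities.items():
            e = dist(center, loc)
            a_ij = max(e - r, 0.0)        # minimum assignment cost
            d_ij = min(e + r, 2 * r)      # maximum deviation
            deviations.add(round(d_ij, 2))
            print(f"customer {i}, facility {j}: a = {a_ij:.2f}, d = {d_ij:.2f}")

    print("tree edge 4-5 cost:", round(M * dist(facilities[4], facilities[5]), 2))  # 6.32
    print("deviation values D:", sorted(deviations, reverse=True))  # [8.0, 7.61, 6.0, 5.83, 4.41, 0.0]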
Table 5.3 depicts the assignment costs for each nominal problem l, G^l − Γ d_l.

Assignment costs under each deviation value d_l
Customer  Facility  8.00  7.61  6.00  5.83  4.41  0.00
1         4         0.00  0.00  0.00  0.00  1.41  5.83
1         5         0.00  0.00  0.00  0.00  0.00  4.41
2         4         0.00  0.00  0.00  0.17  1.59  6.00
2         5         2.00  2.00  2.00  2.17  3.59  8.00
3         4         1.39  1.78  3.39  3.56  4.97  9.39
3         5         0.00  0.00  1.61  1.78  3.19  7.61

Table 5.3: Assignment costs for each nominal problem in the example

We use our dual-ascent local search (DLS) heuristic to obtain a solution, (x, y, z)_l, for each nominal problem. Table 5.4 shows the lower bound obtained for each nominal problem using dual-ascent (DA), the best solution value yielded by DLS (note that for this small example the solutions by DA and DLS are equal), and lastly the total cost yielded by the solution (x, y, z)_l in the robust formulation for Γ = 2.

Γ = 2
d_l    Φ^LB_l  G^LB_l  G^H_l   Z^H_l   Solution
4.41   7.777   16.605  16.605  16.605  Open Facility 5
5.83   4.729   16.386  16.386  16.385  Open Facility 4
6.00   4.385   16.385  16.385  16.385  Open Facility 4
7.61   2.780   17.991  17.991  16.385  Open Facility 4
8.00   2.385   18.385  18.385  16.385  Open Facility 4
0.00   21.020  21.020  21.020  16.605  Open Facility 5

Table 5.4: Preliminary results for the example

Based on these results, we observe that Z^LB = 16.385 and Z^B = 16.385, and we can conclude that the best, and in fact optimal, solution is to open facility 4. In this case, DA's lower bound proves that the solution obtained by the heuristic is the optimal solution.

We initially set Γ equal to 2. However, we can vary Γ between 0 and 3 and observe how the optimal solution changes. A nice feature of this robust optimization approach is that one can easily vary Γ and observe whether the optimal solution changes. Once we obtain a lower bound and a feasible solution for each nominal problem using DLS, we can quickly recalculate the lower bound, Z^LB, for a different Γ value by adding the term Γ d_l to each Φ^LB_l, and reassess each solution's cost in problem (5.2). When Γ = 0 we obtain the best-case scenario solution, while when Γ = 3 we obtain the worst-case scenario solution. (Any Γ > 3 yields the same solution and total cost as Γ = 3.) We observe that the six nominal problems, G^l − Γ d_l, produced only two distinct solutions; that is, open either facility 4 or facility 5. Table 5.5 shows the lower bound for each Γ and the objective value of each solution in the robust optimization formulation.

                 Γ = 0  Γ = 1   Γ = 2   Γ = 3
Open Facility 4  2.385  10.385  16.385  22.214
Open Facility 5  3.000  10.606  16.606  21.020
Z^LB             2.385  10.385  16.385  21.020

Table 5.5: Example results for 0 ≤ Γ ≤ 3

Opening facility 4 is optimal for every Γ value except Γ = 3. In other words, opening facility 4 is a robust solution under a wide range of realizations, except when the worst-case scenario takes place.

5.4 Special Case

The ConFL problem is related to the Weber problem under location uncertainty introduced in Cooper (1978), where the location of demand nodes is uncertain and lies within an uncertainty disk. The centered problem refers to the Weber problem that assumes that demands are located at the center of the uncertainty disk. Cooper (1978) found (and later Juel (1981) correctly noted) that if the optimal solution to the centered problem under the Euclidean norm lies outside all of the circular uncertainty disks, then the optimal solution to the worst-case problem is the solution to the centered problem, and its optimal objective value is equal to the optimal objective value of the centered problem plus a constant term. We can use a similar argument to note that if none of the potential facilities falls within an uncertainty disk, one can find the optimal solution for the whole family of robust problems by solving only the best-case scenario problem. Furthermore, the optimal solution is the same for all Γ parameters. This result does not hold when the optimal solution to the best-case scenario includes at least one facility node within a customer uncertainty disk.
That was the case in our earlier example, where facility node 4 falls within the uncertainty disk of demand node 1.

The only difference between the robust ConFL formulation and the ConFL formulation lies in the objective function, specifically in the penalty term, that is, the maximization term in the objective function. When this term is constant and equal to max_{D' ⊆ D, |D'| ≤ Γ} Σ_{i∈D'} d_i, it does not depend on the assignment decisions x_ij, and the optimal solution to the best-case scenario problem is an optimal solution to the robust problem. We can show that this result holds for disk uncertainty.

Theorem 5.4.1. Let (x*, y*, z*)^BC be an optimal solution to the best-case problem, and let F' be the set of open facilities in that solution, that is, F' = {j | z*_j = 1}. If every facility node j ∈ F' falls outside the uncertainty disk of every demand node i ∈ D, then (x*, y*, z*)^BC is also an optimal solution to the robust problem (5.14) for any value of Γ.

Proof. By assumption, j ∈ F' lies outside the uncertainty disk with diameter d_i for all i ∈ D, and the assignment cost satisfies ~a_ji ∈ [a_ji, a_ji + d_ji]. Since any j ∈ F' lies outside i's uncertainty disk, d_ji = d_i for all j ∈ F'. Hence, we can rearrange the order of summation in equation (4.4a) as follows: Σ_{i∈D} Σ_{j∈F'} d_ij x_ij = Σ_{i∈D} d_i Σ_{j∈F'} x_ij. However, at optimality Σ_{j∈F'} x_ij = 1. (While constraint (4.4d) only enforces that this sum is greater than or equal to one, the non-negative coefficients, a_ij, ensure that the constraint is satisfied with equality in the optimal solution.) Then, Σ_{i∈D} Σ_{j∈F'} d_ij x_ij = Σ_{i∈D} d_i. Clearly, the penalty term max_{D' ⊆ D, |D'| ≤ Γ} Σ_{i∈D'} d_i is a constant that depends on Γ but is unaffected by the problem solution. Thus, the optimal solution to problem (5.14) is obtained by solving the best-case problem.

Corollary 5.4.2. If every j ∈ F falls outside the uncertainty disk of every demand node i ∈ D, then the optimal solution to the best-case problem is the optimal solution to the robust problem (5.14).

Proof. Clearly, F' ⊆ F, and the rest follows from Theorem 5.4.1.

5.5 Computational Experiments

We now report on a set of computational experiments with our DLS heuristic on the robust ConFL problem. The purpose of these experiments is to assess the effectiveness of DLS, in terms of solution quality and computational time, in solving the RConFL problem under various uncertainty levels and uncertainty regions. Furthermore, these experiments allow us to observe whether the proposed solution changes under different conservatism parameters Γ. We coded our heuristic in Visual Studio 2005 (C++). We conducted all runs on an AMD Athlon 64 X2 dual-core, 2.61 GHz machine with 3GB of RAM.

5.5.1 Problem Generation and Characteristics

We generated problems by first selecting nodes randomly located on a 100 x 100 square grid. The Euclidean distances were used as a basis for the edge lengths. The assignment edge costs are equal to the edge lengths between demand nodes and facility nodes, while tree edge costs are equal to the edge lengths multiplied by an M factor. The M factor reflects the significantly higher (in terms of cost per unit distance) connection cost of edges in the tree T. For our test instances, we set the M factor equal to 3 and 7, respectively. Bardossy and Raghavan (2010) found in their extensive computational experiments that as the M factor increases, the average gap for the DLS heuristic first increases but later decreases, reaching its maximum at M = 3.
In addition, we expect that the uncertainty in the assignment costs to have a higher impact on the overall cost of the solution for smaller values of M , and consequently, we anticipate to observe smaller optimality gaps for the instances with M = 7 than for M = 3. Each of the instances has 50 demand nodes, 50 facility nodes and 20 pure potential Steiner nodes. Bardossy and Raghavan (2010) found that instances with equal number of demand and facility nodes are usually the most di cult one. Lastly, the facility opening cost was set to 30, which leads to the most di cult instances. In summary, these problem characteristics lead to a set of hard ConFL instances in terms of the di culty for the DLS heuristic of Bardossy and Raghavan (2010). In order to evaluate how the shape of the uncertainty region a ects the DLS heuristic performance, in particular the computational times, we generated three sets of instances with various uncertainty regions: circular, square and rectangu- lar. In Set 1 the location of demand nodes is represented by an uncertainty disk whose radius is randomly generated. To evaluate the e ect of uncertainty on the heuristic and the solutions obtained, we considered various ranges for the disk ra- dius and generated them between 0 and 2, 5, 10 or 20 (i.e., radii were randomly generated in the ranges: [0,2], [0,5], [0,10] and [0,20]) on each subset of instances. Consequently, for each demand node, i, we de ned a center location (i1; i2) and an uncertainty radius ri. Then the minimum assignment cost for demand node i from facility j, aij, is the maximum of (1) the Euclidean distance between the two nodes minus the radius and (2) zero (since the minimum assignment cost is bounded by zero when they are collocated). On the other hand, the maximum de- 132 Open nfacilty oydpitcilty FieyDfpnifacDymnli 512 34 3S 4 S6 000 000 909 90 9000900 08 1 Figure 5.6: Example of rectangular uncertainty region viation in the assignment cost, dij, is the minimum of (1) the Euclidean distance between the two nodes plus the radius, and (2) the disk diameter (i.e., twice the radius). In mathematical notation, aij = maxf p (i1 j1)2 + (i2 j2)2 ri; 0g and dij = minf p (i1 j1)2 + (i2 j2)2 + ri; 2rig. In Set 2 the location of demand nodes is represented by a square uncertainty region. Also to evaluate the e ect of uncertainty, we created instances with various i1 and i2 deviations for each coordinate axis, which were uniformly generated between 0 and 2, 5, 10 and 20. For each demand node we de ned a center location (i1; i2) and an uncertainty deviation i (i.e., for square uncertainty region i1 = i2.) Similarly to Set 1, we then calculate the minimum assignment cost and maximum deviation for each pair of demand nodes and facility node. To de ne the minimum assignment cost, we had to determine the location of each facility node with respect to the nine regions around the demand node location as shown in Figure 5.6. Table 5.6 provides the coordinates of the closest possible location of demand 133 Region Closest I (a1 1; a2 + 2) II (j1; a2 + 2) III (a1 + 1; a2 + 2) IV (a1 1; j2) V (j1; j2) VI (a1 + 1; j2) VII (a1 1; a2 2) VIII (j1; a2 2) IX (a1 + 1; a2 2) Table 5.6: Facility node location and closest possible demand node location Region Farthest A (a1 + 1; a2 2) B (a1 1; a2 2) C (a1 + 1; a2 + 2) D (a1 + 1; a2 2) Table 5.7: Facility node location and farthest possible demand node location node a, when a facility node j falls within a given region with respect to demand node a as highlight in Figure 5.6. 
Then, to determine the maximum deviation in assignment cost, dij, we calcu- late the maximum possible assignment cost and take the di erence from the mini- mum. To calculate the maximum assignment cost we nd the farthest point within the uncertainty region from the facility node. Figure 5.7 shows the four quadrants used to determine the farthest location for the demand node and Table 5.7 the coordinates of the farthest demand location given the location of the facility node. In Set 3 the location of demand nodes is represented by a rectangular uncer- tainty region with di erent variations on each coordinate axis, (i.e., i1 6= i2.) For the example shown in Figure 5.6 and 5.7 with three facility nodes, the 134 Open nfacilty oydpitcilty FieyDfpnifacDymnli 512 34 3S 4 S6 0 9 8 ( 1 Figure 5.7: Example of rectangular uncertainty region Facility aij dij Node 1 p (11 (a1 1))2 + (12 (a2 + 2))2 p (11 (a1 + 1))2 + (12 (a2 2))2 a1a 2 p (21 (a1 + 1))2 p (21 (a1 1))2 + (22 (a2 + 2))2 a2a 3 0 p (21 (a1 1))2 + (22 (a2 + 2))2 Table 5.8: Assignment costs for the example shown in Figure 5.6 and 5.7 assignment costs are as shown in Table 5.8. We used various levels of conservatism or to assess its e ect on the solution and the performance of the heuristic. As we noted earlier, one advantage of this heuristic is that does not enter into the solution procedure until the last step. Once the nominal problems have been solved, considering various levels of re- quires simple computations that do not involve resolving the nominal problems. We considered values between 0 and 50 in steps of 10. We generated 10 instances for each combination of problem characteristics. Note that the nal location of a demand node may fall outside the 100 x 100 135 Radius 0 10 20 30 40 50 2 2.04% 1.98% 1.94% 1.92% 1.90% 1.90% 5 2.47% 2.29% 2.18% 2.10% 2.04% 2.03% 10 2.73% 2.35% 2.02% 1.88% 1.79% 1.77% 20 3.34% 2.61% 1.75% 1.63% 1.48% 1.43% Table 5.9: Average Gaps for Set 1 (M = 3) grid. While the demand node center location will always fall within the 100 x 100 grid by construction, the uncertainty area may extend outside the prede ned grid, and consequently, the nal location of a demand node may fall outside the grid. 5.5.2 Results on the RConFL Problem Here we report on our computational experiments on the three sets of prob- lems. Each entry in the tables represents the average over ten instances. The percentage gaps represent the percentage di erence of the RCOH solution over the lower bound obtained by DA. That is, [%] = UB LBLB : For each instance we must solve a signi cant number of nominal problems (up to jF jjDj = (50)(50) = 2500). DA provides a lower bound for each nominal problem and the minimum of the nominal problems? lower bounds is the lower bound for an instance. 5.5.2.1 Results for Set 1 - Disk Uncertainty Area The DLS heuristic yields high-quality solutions rapidly for the problems in Set 1 (M = 3 and M = 7). We report average lower bound gaps for di erent levels and uncertainty radii in Table 5.9 and 5.10. 
The average gaps are under 136 Radius 0 10 20 30 40 50 2 2.83% 2.77% 2.72% 2.69% 2.67% 2.67% 5 1.79% 1.70% 1.63% 1.59% 1.57% 1.56% 10 2.64% 2.39% 2.12% 2.00% 1.94% 1.92% 20 3.23% 2.58% 2.23% 2.03% 1.91% 1.88% Table 5.10: Average Gaps for Set 1 (M = 7) Radius Nominal Problems DA Local Search Post-Processing Total 2 52 19.599 0.529 0.009 20.137 5 59 22.253 0.709 0.009 22.971 10 77 29.700 0.855 0.013 30.568 20 141 55.884 1.551 0.022 57.457 Table 5.11: Average Computational Times in Seconds for Set 1 (M = 3) 3.34% and the highest gap for all instances is below 8.76%. Furthermore, we observe that gaps and average gaps decrease for higher values of (as hinted by Theorem 5.2.1.) On the other hand, for higher levels of uncertainty the average gaps remain stable. Actually, contrary to our earlier conjecture that gaps would increase for higher uncertainty levels and lower M factors, the highest gap is observed for an instance with uncertainty radius equal to 2 and M factor equal to 7. There are no signi cant di erences across M factors for the average gaps. Radius Nominal Problems DA Local Search Post-Processing Total 2 52 28.606 0.184 0.009 28.799 5 57 31.425 0.171 0.009 31.606 10 74 39.558 0.274 0.013 39.845 20 128 69.241 0.423 0.022 69.686 Table 5.12: Average Computational Times in Seconds for Set 1 (M = 7) 137 Tables 5.11 and 5.12 show the average number of nominal problems solved for each instance, dual-ascent (DA)?s computational time, local search?s computational time and post-processing time. The post-processing time accounts for the time required to calculate the cost of the solution in problem (5.14) and select the best solution out of all the solutions obtained for the nominal problems. The post- processing time observed is in nitesimally di erent (less than 0.001 second) for each level. (In theory the post-processing time should be equal for all values.) The average number of nominal problems increases with uncertainty because as the radius of the uncertainty disk increases, facility nodes are more likely to fall within uncertainty disks. Hence, the average heuristic time increases as the number of nominal problems increases. Regardless, the average time is below 1 minute when the radii vary up to 10 or less and slightly above 1 minute when the radii go up to 20. Moreover, the maximum time required to solve one instance was below 2 minutes. Dual-ascent takes the bulk of the heuristic time. Another advantage of this method is that we can easily observe how the so- lution changes as the decision-maker increases her conservatism level (i.e., value) and calculate the cost di erence between solutions. Figures 5.8 and 5.9 show the heuristic solution for each instance for di erent values of . Each plot corresponds to ten instances with the same uncertainty level. The x-axis represents the value and the y-axis the solution that yielded the lowest total cost for the instance. In other words, each line represents the solution path for an instance as varies. The plots in these gures show limited changes as a result of changes of . There are more changes for the lower M factor and for higher levels of uncertainty. 
In the rst 138 0 10 20 30 40 50 0 2 4 Deviation 2 Gamma level Solutio n 0 10 20 30 40 50 0 2 4 6 Deviation 5 Gamma level Solutio n 0 10 20 30 40 50 0 5 10 Deviation 10 Gamma level Solutio n 0 10 20 30 40 50 0 5 10 15 Deviation 20 Gamma level Solutio n Figure 5.8: Solutions for di erent values for Set 1 (M = 3) plot for deviation up to 2, for both Figures 5.8 and 5.9, we observe that the solution does not change as the level increases for any of the instances. Theorem 5.4.1 implies that under certain circumstances one might be able to solve the robust problem by solving the best-case problem. We test the e ectiveness of this strategy even when those special conditions do not hold. We use the DLS heuristic to obtain a solution, (x; y; z)BC , to the best-case problem; we then calculate its cost for the di erent levels in problem (5.14); and lastly, we calculate the percentage cost di erence over the robust solution yielded by RCOH. Tables 5.13 and 5.13 show the average percentage di erence in cost between the best-case scenario solution by DLS and the robust solution by RCOH. For every level of uncertainty the average loss di erence is below 1:66%. Out of 480 instances, the highest loss was 4:37%. In addition, we can note that even though a = 0 corresponds to the best-case problem in some instances the DLS solution is slightly worse than the 139 0 10 20 30 40 50 0 1 2 3 Deviation 2 Gamma level Solutio n 0 10 20 30 40 50 0 2 4 Deviation 5 Gamma level Solutio n 0 10 20 30 40 50 0 1 2 3 Deviation 10 Gamma level Solutio n 0 10 20 30 40 50 0 2 4 Deviation 20 Gamma level Solutio n Figure 5.9: Solutions for di erent values for Set 1 (M = 7) Radius 0 10 20 30 40 50 2 0.44% 0.43% 0.42% 0.41% 0.41% 0.41% 5 0.03% 0.02% 0.02% 0.02% 0.02% 0.02% 10 0.47% 0.45% 0.47% 0.44% 0.43% 0.40% 20 0.92% 0.88% 1.30% 1.61% 1.66% 1.64% Table 5.13: Average best-case solution loss for Set 1 (M = 3) solution yielded by the RCOH for the same problem (see rst column in Table 5.13 and 5.14.) The best-case problem, jointly with many other nominal problems, is solved as part of the RCOH; consequently, the RCOH solution is at least as good or better than the solution obtained by DLS. These results show that as the uncertainty level increases there is a higher value in implementing RCOH to solve the best-case problem, as RCOH might yield a better solution at a slightly higher computational cost. 140 Radius 0 10 20 30 40 50 2 0.33% 0.32% 0.32% 0.32% 0.31% 0.31% 5 0.32% 0.29% 0.28% 0.28% 0.27% 0.27% 10 0.30% 0.32% 0.33% 0.30% 0.29% 0.29% 20 0.28% 0.29% 0.32% 0.32% 0.32% 0.33% Table 5.14: Average best-case solution loss for Set 1 (M = 7) Deviation 0 10 20 30 40 50 2 1.79% 1.71% 1.63% 1.59% 1.58% 1.57% 5 1.98% 1.73% 1.56% 1.51% 1.48% 1.46% 10 2.45% 1.85% 1.69% 1.71% 1.61% 1.58% 20 3.40% 2.03% 1.32% 1.27% 1.18% 1.17% Table 5.15: Average Gaps for Set 2 (M = 3) 5.5.2.2 Results for Set 2 - Square Uncertainty Area The RCOH also provides high-quality solutions for Set 2. The average gaps are under 3.40% and the maximum gap is below 6.55%. The average gaps also decrease as increases and there is no signi cant di erence in the average gaps for higher levels of uncertainty. 
For low values of , the average gap is slightly higher Deviation 0 10 20 30 40 50 2 1.95% 1.90% 1.87% 1.87% 1.84% 1.84% 5 2.25% 2.10% 1.94% 1.88% 1.81% 1.80% 10 2.27% 1.82% 1.73% 1.64% 1.57% 1.56% 20 3.38% 2.44% 1.92% 1.71% 1.74% 1.72% Table 5.16: Average Gaps for Set 2 (M = 7) 141 Deviation Nominal Problems DA Local Search Processing Total 2 2485 958.792 30.116 0.437 989.345 5 2484 960.863 28.764 0.436 990.063 10 2478 968.769 29.190 0.531 998.491 20 2480 1005.351 27.770 0.539 1033.660 Table 5.17: Average Computational Times in Seconds for Set 2 (M = 3) Deviation Nominal Problems DA Local Search Processing Total 2 2474 1328.071 9.275 0.437 1337.783 5 2473 1288.235 8.263 0.436 1296.934 10 2477 1287.162 9.185 0.531 1296.879 20 2473 1343.423 8.657 0.539 1352.619 Table 5.18: Average Computational Times in Seconds for Set 2 (M = 7) for higher uncertainty levels. However, this tendency fades away as increases. There are no signi cant di erences in the average gaps for M = 3 and M = 7. In regards to computational times, the instances in Set 2 take a signi cantly higher amount of time than the instances in Set 1. The main reason is that to solve the robust problem, we have to solve a signi cant number of nominal problems. The average number of nominal problems for each instance in Set 2 is above 2400. Recall that the maximum number of possible nominal problems in these instances is 2500. Moreover, the number of nominal problems is independent of the uncertainty level. Consequently, the total computational times are much higher and they aver- age around 22 minutes. Table 5.17 and 5.18 show the average number of nominal problems and computational times for instances in Set 2 with M = 3 and M = 7, re- spectively. Even though the computational times are higher, they are substantially shorter than the ones required by other methods. Based on the results reported in 142 0 10 20 30 40 50 0 2 4 6 Deviation 2 Gamma level Solutio n 0 10 20 30 40 50 0 5 10 Deviation 5 Gamma level Solutio n 0 10 20 30 40 50 0 5 10 Deviation 10 Gamma level Solutio n 0 10 20 30 40 50 0 10 20 30 40 Deviation 20 Gamma level Solutio n Figure 5.10: Solutions for di erent values for Set 2 (M = 3) Ljubi c (2007), one could expect the average computational time for each nominal problem to be around one hour that would translate in an average of 2400 hours for each instance! That means that RCOH using DLS takes only 0:015% of the needed time by the exact method (i.e., Algorithm A of Bertsimas and Sim (2003)). Figures 5.10 and 5.11 show how the heuristic solution changes for each instance for each uncertainty level. We observe that there are more changes in the solution for M = 3 than for M = 7 for all level of uncertainty. One explanation might be that assignment costs have a higher impact on the overall total cost of the solution when M is lower; and consequently, the value that determines the number of assignment cost at their highest in the solution have also a higher e ect on determining the best solution. Similar to what we observed in Set 1, changes in have higher impact on the solution for higher levels of uncertainty. 
Lastly, Tables 5.19 and 5.20 show the average di erence between the cost of the 143 0 10 20 30 40 50 0 1 2 3 Deviation 2 Gamma level Solutio n 0 10 20 30 40 50 0 2 4 6 Deviation 5 Gamma level Solutio n 0 10 20 30 40 50 0 5 10 Deviation 10 Gamma level Solutio n 0 10 20 30 40 50 0 5 10 15 Deviation 20 Gamma level Solutio n Figure 5.11: Solutions for di erent values for Set 2 (M = 7) Radius 0 10 20 30 40 50 2 0.54% 0.53% 0.53% 0.54% 0.53% 0.52% 5 0.35% 0.32% 0.42% 0.42% 0.40% 0.39% 10 0.37% 0.75% 0.71% 0.70% 0.75% 0.73% 20 1.17% 1.75% 2.74% 2.83% 2.82% 2.74% Table 5.19: Average best-case solution loss for Set 2 (M = 3) Radius 0 10 20 30 40 50 2 0.20% 0.20% 0.22% 0.22% 0.22% 0.22% 5 0.58% 0.59% 0.69% 0.67% 0.68% 0.68% 10 0.40% 0.49% 0.46% 0.45% 0.42% 0.42% 20 1.35% 1.23% 1.42% 1.39% 1.37% 1.34% Table 5.20: Average best-case solution loss for Set 2 (M = 7) 144 Deviation 0 10 20 30 40 50 2 1.72% 1.68% 1.64% 1.61% 1.60% 1.59% 5 1.83% 1.63% 1.62% 1.49% 1.37% 1.42% 10 2.27% 1.87% 1.74% 1.62% 1.64% 1.65% 20 2.62% 1.66% 1.29% 1.30% 1.48% 1.77% Table 5.21: Average Gaps for Set 3 (M = 3) Deviation 0 10 20 30 40 50 2 2.22% 2.17% 2.13% 2.10% 2.05% 2.03% 5 1.91% 1.75% 1.69% 1.63% 1.61% 1.62% 10 2.48% 2.18% 1.99% 2.08% 2.07% 1.91% 20 1.39% 1.26% 1.47% 1.56% 1.73% 1.78% Table 5.22: Average Gaps for Set 3 (M = 7) best-case scenario solution and the robust solution. The robust solution is slightly better than the best-case scenario solution for low levels of uncertainty. However, for the highest level of uncertainty in our computational results, the best-case scenario solution is up to 5.39 % more costly than the robust solution. The average loss reaches up to 2.89%. 5.5.2.3 Results for Set 3 - Rectangular Uncertainty Area Tables 5.21 and 5.22 show the average gaps for Set 3. RCOH also provides high-quality solutions for Set 3. The average gaps are under 2.62%. Similarly, the highest gap in Set 3 is below 5.54%. The average gaps show an overall tendency to decline as increases with one exception. For high uncertainty levels, they 145 Deviation Nominal Problems DA Local Search Processing Total 2 2494 962.571 28.969 1.638 993.177 5 2494 962.457 29.430 1.569 993.456 10 2489 971.110 28.427 1.039 1000.577 20 2491 1011.519 26.699 1.021 1039.238 Table 5.23: Computational Times for Set 3 (M = 3) Deviation Nominal Problems DA Local Search Processing Total 2 2483 1278.859 7.727 1.638 1288.223 5 2489 1272.502 9.599 1.569 1283.670 10 2483 1345.038 9.830 1.039 1355.907 20 2485 1357.619 7.875 1.021 1366.514 Table 5.24: Computational Times for Set 3 (M = 7) rst slightly decrease and later increase when increases. There are no signi cant di erences in average gaps across M factors. The average number of nominal problems is slightly higher in Set 3; and conse- quently, the computational times are proportionally higher. The average total time per instance is approximately 22 minutes but less than 25 minutes for any instance. Figures 5.12 and 5.13 show how the heuristic solution for each instance changes for di erent values of . Again there is slightly more activity for M = 3 than for M = 7 and for higher values of uncertainty. Tables 5.25 and 5.26 show the average cost increase of the best-case scenario solution over the robust solution. The average loss reaches up to 3:21% and the highest loss for one instance goes up to 5:08%. Once again in the rst column we observe that even for the best-case problem RCOH yields better solutions than DLS. In one instance that di erence reaches 3:34%. 
Interestingly, average losses are 146 0 10 20 30 40 50 0 2 4 6 Deviation 2 Gamma level Solutio n 0 10 20 30 40 50 0 10 20 Deviation 5 Gamma level Solutio n 0 10 20 30 40 50 0 5 10 15 20 Deviation 10 Gamma level Solutio n 0 10 20 30 40 50 0 20 40 Deviation 20 Gamma level Solutio n Figure 5.12: Solutions for di erent values for Set 3 (M = 3) 0 10 20 30 40 50 0 2 4 6 Deviation 2 Gamma level Solutio n 0 10 20 30 40 50 0 5 10 Deviation 5 Gamma level Solutio n 0 10 20 30 40 50 0 5 10 Deviation 10 Gamma level Solutio n 0 10 20 30 40 50 0 5 10 15 20 Deviation 20 Gamma level Solutio n Figure 5.13: Solutions for di erent values for Set 3 (M = 7) 147 Radius 0 10 20 30 40 50 2 0.53% 0.52% 0.53% 0.54% 0.53% 0.52% 5 0.54% 0.52% 0.56% 0.62% 0.65% 0.65% 10 0.64% 0.60% 0.75% 0.97% 1.02% 0.94% 20 1.00% 1.83% 2.63% 2.97% 3.21% 3.17% Table 5.25: Average best-case solution loss for Set 3 (M = 3) Radius 0 10 20 30 40 50 2 0.27% 0.28% 0.26% 0.25% 0.25% 0.24% 5 0.31% 0.33% 0.31% 0.38% 0.40% 0.40% 10 0.88% 0.82% 0.84% 0.95% 1.09% 1.25% 20 0.81% 0.99% 1.16% 1.52% 1.56% 1.58% Table 5.26: Average best-case solution loss for Set 3 (M = 7) slightly higher for Set 2 than Set 3. 5.6 Conclusions In this chapter we address the uncertainty in the ConFL problem by robust optimization. Speci cally, we extend Bertsimas and Sim?s approach to situations where one uses a lower bounding mechanism jointly with a heuristic to obtain a solution to the nominal problems (as opposed to solving the nominal problems to optimality). We tested this strategy on the ConFL problem using the DLS heuristic and found high-quality solutions for various levels of uncertainty and budget of conservatism, which highlights the potential and applicability of the RCOH to other robust combinatorial optimization problems. 148 In our computational experiments we observe that the computational time of RCOH is directly proportional to the the number of deviation terms. There is one nominal problem for each deviation term. Consequently, we leave as future research to explore and devise strategies to cut down the number of nominal problems neces- sary to solve to nd a high-quality heuristic solution for the robust problem. Notice that if the set X in problem (5.1) is a convex set then Z( ) is a convex function of . Suppose we consider the linear relaxation in problem (5.7) and denote the objective as ZLP ( ), since this function is piecewise linear and convex to determine the mini- mum value of ZLP ( ) for 0 we need only to consider the nominal problems. Our dual ascent lower bound for a nominal problem is ZLB( ) ZLP ( ). We can use this fact together with the convexity of ZLB( ) to prioritize which nominal problem to solve rst or at all. Speci cally, if the lower bound in a region is higher than the best solution (upper bound) in another region, we can disregard any value beyond the value that produced the lower bound. This could signi cantly reduce the total computational time of the RCOH. The following observations relate to the Bertsimas and Sim?s robust optimiza- tion approach itself and concern a deeper understanding of the method and the implications derived by its solutions for the ConFL problem. The following are some interesting questions that might be worth exploring in the future: 1. What does the di erence in the total solution cost for di erent values indi- cate? 2. What does that di erence in cost indicate if the solution changes for adjacent 149 values? 3. What does that di erence in cost mean if the solution does not change? 
Recall that the value is subjective and determined by the decision-maker. In real world applications the decision-maker does not have control over how many of the deviations will in fact take their highest value. 4. How should the decision-maker make a choice for the value for the problem at hand? Should the uncertainty in the demand nodes be considered to determine ? Should there be a relationship between the number of demand nodes with the highest uncertainty level and the considered? Another note worth discussing regarding robust optimization for the ConFL problem is that in order to calculate the assignment costs and the maximum devi- ation in assignment costs, we consider one facility node and demand node at the time. However, one might think of situations where the actual assignment of the de- mand nodes does not take place until the location of the demand node is revealed. Consequently, the assignment may change depending on the nal location of the demand node as in a two-stage problem. In such situations, the earlier calculated maximum deviations in costs might never occur because the demand node may get routed to a di erent facility node before reaching the maximum deviation. In other words, the maximum assignment cost depends on the nal set of opened facilities. These questions and observations may reveal some weaknesses in the method, specially for the ConFL problem. However, we think that they are actually oppor- tunities worth understanding and exploring in the future and we look forward to 150 devising stronger robust optimization formulations that would better address the ConFL problem. 151 Bibliography Balakrishnan, A., T. L. Magnanti, R. T. Wong. 1989. A dual-ascent procedure for large- scale uncapacitated network design. Operations Research 37(5) 716{740. Bardossy, M.G., S. Raghavan. 2010. Dual-based local search for the connected facility location and related problems. INFORMS Journal on Computing 22(4) 584{602. Beale, E.M.L. 1955. On Minimizing A Convex Function Subject to Linear Inequalities. Journal of the Royal Statistical Society. Series B (Methodological) 173{184. Berman, O. 1978. Dynamic positioning of mobile servers on networks. Ph.D. thesis, Massachusetts Institute of Technology. Berman, O., A.R. Odoni. 1982. Locating Mobile Servers on a Network With Markovian Properties. Networks 12(1) 73{86. Bertsimas, D., M. Sim. 2003. Robust discrete optimization and network ows. Mathemat- ical Programming 98(1) 49{71. Birge, J.R., F. Louveaux. 1997. Introduction to stochastic programming . Springer Berlin. Christo des, N. 1976. Worst case analysis of a new heuristic for the traveling salesman problem. Tech. Rep. 388, Graduate School of Industrial Administration, Carnegie- Mellon University, Pittsburgh, PA. Chu, C. H., G. Premkumar, H. Chou. 2000. Digital data networks design using genetic algorithms. European Journal of Operational Research 127(1) 140{158. Cooper, L. 1978. Bounds on the Weber problem solution under conditions of uncertainty. Journal of Regional Science 18(1) 87{92. Dantzig, G.B. 1955. Linear programming under uncertainty. Management Science 1(3) 197{206. Economist. 2009a. And access for all. The Economist 392(8636) 60{01. Economist. 2009b. Who pays for the pipes? The Economist 393(8661) 6{8. Eisenbrand, F., F. Grandoni, T. Rothvo , G. Sch afer. 2008. Approximating connected fa- cility location problems via random facility sampling and core detouring. Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms 1174{1183. Erlenkotter, D. 1978. 
A dual-based procedure for uncapacitated facility location. Opera- tions Research 26(6) 992{1009. Frenzel, L. 2010. Broadband for everyone. Tech. Rep. 7, Electronic Design. Gollowitzer, S., I. Ljubic. 2011. MIP models for connected facility location: A theoretical and computational study. Computers & Operations Research 38(2) 435{449. Gupta, A., J. Kleinberg, A. Kumar, R. Rastogi, B. Yener. 2001. Provisioning a virtual private network: a network design problem for multicommodity ow. Proceedings of the 33rd Annual ACM Symposium on Theory of Computing 389{398. Gupta, A., A. Kumar, T. Roughgarden. 2003. Simpler and better approximation al- gorithms for network design. Proceedings of the 35th Annual ACM Symposium on Theory of Computing 365{372. Gupta, A., R. Ravi, A. Sinha. 2004. An edge in time saves nine: LP rounding approxima- tion algorithms for stochastic network design. Annual Symposium on Foundations of Computer Science, vol. 45. IEEE Computer Society Press, 218{227. Havet, F., M. Wennink. 2004. The push tree problem. Networks 44(4) 281{291. 152 Juel, H. 1981. Bounds in the generalized Weber problem under locational uncertainty. Operations Research 29(6) 1219{1227. Jung, H., M. K. Hasan, K. Y. Chwa. 2008. Improved primal-dual approximation algorithm for the connected facility location problem. Lecture Notes in Computer Science 5165 265{277. Kall, P., S.W. Wallace. 1994. Stochastic programming . John Wiley & Sons Inc. Karger, D. R., M. Minko . 2000. Building Steiner trees with incomplete global knowl- edge. Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science 613{623. Khuller, S., A. Zhu. 2002. The general Steiner tree-star problem. Information Processing Letters 84(4) 215{220. Kleywegt, A.J., A. Shapiro, T. Homem-de Mello. 2002. The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization 12(2) 479{502. Krick, C., H. R acke, M. Westermann. 2003. Approximation algorithms for data manage- ment in networks. Theory of Computing Systems 36(5) 497{519. Laporte, G., FV Louveaux, L. Van Hamme. 1994. Exact solution to a location problem with stochastic demands. Transportation Science 28(2) 95{103. Lee, Y., S. Y. Chiu, J. Ryan. 1996. A branch and cut algorithm for a Steiner tree-star problem. INFORMS Journal on Computing 8(3) 194{201. Lee, Y., L. Lu, Y. Qiu, F. Glover. 1993. Strong formulations and cutting planes for designing digital data service networks. Telecommunication Systems 2(1) 261{274. Ljubi c, I. 2007. A hybrid VNS for connected facility location. Lecture Notes in Computer Science 4771 157{169. Ljubi c, I. 2009. Private communication. Louveaux, F.V., D. Peeters. 1992. A dual-based procedure for stochastic facility location. Operations Research 40(3) 564. Mirchandani, P. B., A. R. Odoni. 1979. Locations of medians on stochastic networks. Transportation Science 13 85{97. Mirchandani, P.B. 1975. Analysis of stochastic networks in emergency service systems. Ph.D. thesis, Massachusetts Institute of Technology. Nuggehalli, P., V. Srinivasan, C. F. Chiasserini. 2003. Energy-e cient caching strategies in ad hoc wireless networks. Proceedings of the 4th ACM International Symposium on Mobile ad hoc Networking & Computing 25{34. Raghavan, S. 1995. Formulations and algorithms for network design problems with con- nectivity requirements. Ph.D. thesis, Massachusetts Institute of Technology. Ravi, R., A. Sinha. 2006. Hedging uncertainty: Approximation algorithms for stochastic optimization problems. 
Mathematical Programming 108(1) 97{114. Riis, M., K.A. Andersen. 2003. Capacitated network design with uncertain demand. INFORMS Journal on Computing 14(3) 247{260. Ruszczynski, A., A. Shapiro. 2003. Stochastic Programming. Handbook in Operations Research and Management Science, vol. 40. Elsevier. Salman, FS, J. Cheriyan, R. Ravi, S. Subramanian. 1997. Buy-at-bulk network design: Ap- proximating the single-sink edge installation problem. Proceedings of the 8th Annual 153 ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA, 619{628. Sen, S., R.D. Doverspike, S. Cosares. 1994. Network planning with random demand. Telecommunication Systems 3(1) 11{30. Shapiro, A., A. Philpott. 2007. A tutorial on stochastic programming. Shmoys, D. B., E. Tardos, K. Aardal. 1997. Approximation algorithms for facility location problems. Proceedings of the 29th Annual ACM Symposium on Theory of Computing 265{274. Snyder, L.V. 2006. Facility location under uncertainty: A review. IIE Transactions 38(7) 547{564. Swamy, C. 2004. Approximation algorithms for clustering problems. Ph.D. thesis, Cornell University. Swamy, C., A. Kumar. 2004. Primal{dual algorithms for connected facility location prob- lems. Algorithmica 40(4) 245{269. Tomazic, A., I. Ljubi c. 2008. A GRASP algorithm for the connected facility location problem. 2008 International Symposium on Applications and the Internet 257{260. Van Slyke, R.M., R. Wets. 1969. L-shaped linear programs with applications to optimal control and stochastic programming. SIAM Journal on Applied Mathematics 17(4) 638{663. Verweij, Bram, Shabbir Ahmed, Anton J. Kleywegt, George Nemhauser, Alexander Shapiro. 2003. The sample average approximation method applied to stochastic rout- ing problems: A computational study. Computational Optimization and Applications 24 289{333. Weaver, JR, RL Church. 1983. Computational procedure for location problems on stochas- tic networks. Transportation Science 17(2) 168{180. Wong, R. T. 1984. A dual ascent approach for Steiner tree problems on a directed graph. Mathematical Programming 28(3) 271{287. Xu, J., S. Chiu, F. Glover. 1996a. Using tabu search to solve Steiner tree-star problem in telecommunications. Telecommunications Systems 117{125. Xu, J., S.Y. Chiu, F. Glover. 1996b. Probabilistic tabu search for telecommunications network design. Journal of Combinatorial Optimization 1 69{94. 154