Approximation Algorithms for Connected Dominating Sets  Sudipto Guha y Dept. of Computer Science University of Maryland,College Park, MD 20742 Samir Khuller z Dept. of Computer Science and UMIACS University of Maryland,College Park, MD 20742 Abstract The dominatingset problem in graphs asks for a minimumsize subset of vertices with the following property: each vertex is required to either be in the dominating set, or adjacent to some node in the dominating set. We focus on the question of nding a connected dominating set of minimum size, where the graph induced by vertices in the dominating set is required to be connected as well. This problem arises in network testing, as well as in wireless communication. Two polynomial time algorithms that achieve approximation factors of O(H()) are presented, where  is the maximum degree, and H is the harmonic function. This ques- tion also arises in relation to the traveling tourist problem, where one is looking for the shortest tour such that each vertex is either visited, or has at least one of its neighbors visited. We study a generalization of the problem when the vertices have weights, and give an algorithm which achieves a performance ratio of 3lnn. We also consider the more general problem of nding a connected dominating set of a speci ed subset of vertices and provide an O(H()) approximation factor. To prove the bound we also develop an optimal approximation algorithm for the unit node weighted Steiner tree problem. 1 Introduction The connected dominating set problem is de ned as follows. Find a minimum size subset S of vertices, such that the subgraph induced by S is connected and S forms a dominating set. This problem is known to be NP-hard [7]. Recall that a dominating set is one in which each vertex is either in the dominating set, or adjacent to some vertex in the dominating set.  A preliminary version of this paper will appear in the Proceedings of the Fourth Annual European Symposium on Algorithms (ESA 1996). y Research supported by NSF Research Initiation Award CCR-9307462. Email addr: sudipto@cs.umd.edu z Research supported by NSF Research Initiation Award CCR-9307462, and NSF CAREER Award CCR- 9501355. Email addr: samir@cs.umd.edu 1 A related problem is the traveling tourist problem. Given a graph G = (V;E) nd the shortest walk visiting a subset of vertices, such that each vertex is either visited, or has at least one of its neighbors visited. (The vertices of the graph correspond to monuments the tourist would like to see, and an edge between two vertices denotes visibility of one monument from another.) The shortest such walk would guarantee that the tourist sees all monuments of interest. We show that a approximation for the connected dominating set problem yields a 2 approximation for the traveling tourist problem. Consider a spanning tree of the connected dominating set S and perform a tree traversal. This yields a walk in which exactly 2(jSjBnZr 1) edges are traversed. Any set of vertices visited by the tourist, form a connected dominating set. Thus S  OPT  OPT TT , where OPT TT denotes an optimal traveling tourist tour, and the result follows. We also study the connected dominating set problem when the vertices have weights, and we wish to minimize the total weighted sum of the vertices that form the connected dominating set. This also yields an approximation algorithm for the weighted traveling tourist problem, where the weights could potentially denote the tourist's cost of buying a ticket to visit the monument. We also consider Steiner generalizations, where only a speci ed subset of vertices have to be dominated by a connected dominating set. 1.1 Our Results We present two approximation algorithms for this problem. The rst algorithm develops a greedy algorithm for solving the problem. A naive greedy algorithm is shown to do badly. Surprisingly, with a simple modi cation we are able to show an approximation factor of 2(1+ H()) (in practice, this algorithm appears to do very well). We also provide a very ecient implementation of this algorithm. The second algorithm is an improvement of the rst algorithm. The algorithm nds a dominating set in the rst phase, and in the second phase connects the dominating set. In an earlier version of this paper [8] we established a bound of H() + H(H()). Using Slav  ik's greedy set-cover bound [17], we were able to show that the approximation factor is lnn+O(1). Recently, Berman suggested a modi cation to the algorithm, which improves the approximation factor to H() + 2. We describe this algorithm and give a simple proof for a performance guarantee of ln+3. We also show an approximation preserving reduction from the set-cover problem to the con- nected dominating set problem, showing that it is hard to improve the approximation guarantee unless NP  DTIME[n O(loglogn) ] [13, 6]. We give a 3lnn approximation for the version when the vertices have weights. We also show that the upper bound of 2lnk for approximating node weighted Steiner trees [10], can be improved to lnk, when all Steiner vertices have unit weight. We then use this result to give a 3lnk approximation for nding a connected dominating set for a speci ed subset of vertices. We also outline a second algorithm that gives an approximation factor of (1 + c)H(min(;k))+ O(1), where c is the best approximation ratio for the Steiner 2 tree problem (currently c = 1:644 [12]). Even though this algorithm has a better approximation guarantee, it is not practical due to the high running time, albeit polynomial. 1.2 Preliminaries The Steiner tree problem is de ned as follows: given a subset of required vertices in an edge weighted graph, nd a minimum weight tree spanning the required subset of vertices. (Note that the tree may include other vertices that are not required vertices.) The node weighted Steiner tree problem is essentially the same problem, except that the vertices of the graph have weights associated with them and the weight of the tree is the sum of the weights of its vertices. The Unit Node Weighted Steiner tree is the special case when all vertices that are not required, have the same weight. The required vertices all have zero weight. The set cover problem is the following: given a set of elements U, and a set of subsets S, of U, we wish to nd the smallest collection of sets S 0  S such that [ S2S 0 S = U. The set TSP problem is de ned as follows: given an edge weighted graph G = (V;E) and a partition of V = (V 1 [V 2 [:::[V k ), nd the shortest tour that contains at least one vertex from each V i . Given a graph G = (V;E), we use  to denote the maximum degree of a vertex in the graph. We use n and m to denote the number of vertices and edges in G. We use N(v) to denote the set of neighbors of a vertex v. 1.3 Applications The paper by Paul and Miller [15] discusses applications related to testing nodes in a computer networkusing a short \traveling touristtour". They also consider the related question of nding a tour that visits each edge of the graph (connected vertex cover). This is needed when one requires testing the links as well as the nodes. Approximation algorithms for the latter problem were given by Arkin, Halldorsson and Hassin [1]. We observe that there is a simple algorithm for the unweighted connected vertex cover problem that gives a factor 2 approximation (the one given in [1] is more complicated). Do a Depth First Search, and take all the non-leaf vertices as the nodes in the vertex cover. This clearly induces a connected graph, and the approximation ratio is 2, as shown by Savage [16]. In practice, however this method will probably give large connected vertex covers. Other applications for the connected dominating set problem are in doing broadcasts for wireless computers in digital battle elds. The broadcast is done to the vertices in the con- nected dominating set. The nodes in the connected dominating set are responsible for relaying messages. Each node not in the dominating set, is not responsible for relaying any messages [9]. Other relevant issues are regarding the maintenance of the connected dominating set as the network topology changes. 2 Algorithm I We introduce an algorithm that nds a connected dominating set, by \growing" a tree. 3 The idea behind the algorithm is the following: grow a tree T, starting from the vertex of maximum degree. At each step we will pick a vertex v in T and \scan it". Scanning a vertex, adds edges to T from v to all its neighbors not in T. In the end we will nd a spanning tree T, and will pick the non-leaf nodes as the connected dominating set. Initially all vertices are unmarked (white). When we scan a vertex (color it black), we mark all its neighbors that are not in T and add them to T (color them gray). Thus marked nodes that have not been scanned are leaves in T (gray nodes). The algorithm continues scanning marked nodes, until all the vertices are marked (grayor black). The set of scanned nodes (black nodes) will form the CDS in the end. The main question is the following: what rule should we use for picking a vertex to be scanned? A natural choice is to pick the vertex that has the maximum number of unmarked (white) neighbors. We call this the \yield" of the scan step. Unfortunately, as the following example shows this may not work well (see Fig. 1). u v N(u) N(v) Figure 1: Example to show that the scanning rule fails. Let u and v be vertices of degree d. There is a solution of size four, by picking a path from u to v as the CDS. The algorithm begins by marking and scanning u. This adds all of u's neighbors to T. We pick a vertex from N(u) and scan it, adding its only unmarked neighbor (from N(v)) to T. At this point, each vertex has exactly one unmarked neighbor. We could pick a vertex from N(u) again, and scan it, adding its only unmarked neighbor to T. This continues until all the vertices from N(u) have been scanned. Finally we scan a vertex from N(v) and mark v. At this point, the algorithm has picked d+ 2 vertices. Implementation Issues: The above algorithm can be implemented in O(m) time (and was implemented). To achieve this running time, we use a data structure DS that maintains all gray vertices in T with a key value equal to the number of white neighbors that they have. Rather than using a heap, we maintain an array of linked lists, where DS[i] is a pointer to a (doubly linked) list containing all the gray vertices that have exactly i white neighbors. We also maintain an integer maxd that records the maximum i, such that DS[i] 6= nil. This makes it easy to locate a gray vertex with the highest \yield". The main work is in updating the value of maxd when DS[maxd] becomes nil. The work that is done is at most O(maxd), and since at this step, maxd vertices are colored gray, we can \charge" the work done to the vertices that are colored gray at this step. (Equivalently, we could develop a potential function to prove 4 this.) The other operations are easy to perform (for example, when a vertex is colored gray, we need to update the entries for its neighbors that are already in DS and create an entry in DS for this vertex). The entire algorithm runs in O(m) steps. This implementation is useful because it leads to a heuristic for the maximum leaf spanning tree problem as well [11]. Modi ed Greedy Algorithm: We now modify the scanning rule to prove a good approxi- mation ratio for this class of algorithms (that grow a connected dominating set). We de ne a new operation of scanning a pair of adjacent vertices u and v. Let u be gray and v be white. Scanning the pair means, rst making u black (this makes v along with some other nodes, gray) and then coloring v black (makes more nodes gray). The total number of nodes that are colored gray is called the \yield" of the scan step. At each step, we will either scan a single vertex, or a pair of vertices, whichever gives the higher yield. (In some sense we are doing a \look-ahead" by one extra vertex, and are willing to scan a pair, if this has a higher yield.) It is clear that this algorithm nds the optimal solution in the example shown in Fig. 1. What is perhaps a little surprising, is that this simple modi cation lets us prove the following theorem. Theorem 2.1 Using the scanning rule described above yields a connected dominating set of size at most 2(1+ H())jOPT DS j. Proof: Let OPT DS be the set of vertices in an optimal dominating set. The sets of vertices of G dominated by vertex i 2 OPT DS is called S i (we assume that i also belongs to S i . If a vertex is dominated by more than one vertex, we arbitrarily put it in one of the sets). The proof will be based on a charging scheme. Each time we scan a vertex, we add a new vertex to our connected dominating set. We will \charge" each new vertex marked (colored gray) in this step. Since each vertex in the graph gets marked exactly once, it is charged exactly once (the rst time it is marked). We will then prove that the total charge on the vertices belonging to a set S i (for any i) is at most 2(1+H()). Since there are jOPT DS j sets in the optimal solution, the theorem follows. Assume that when we pick a vertex to scan, we mark x new vertices. We will charge each such newly marked vertex 1 x . In some steps we scan two vertices, and charge each newly marked vertex 2 x . The main advantage of the \look-ahead" is the following. The instant we mark some nodes in set S i , even if vertex i has not been marked, since it is adjacent to a marked vertex, it becomes eligible to be scanned as part of a pair. Without the look-ahead, only marked vertices were candidates to be scanned . We now prove the upper bound on the total charges to vertices belonging to a single set S i . At each step, some vertices may get marked. The number of unmarked vertices is initially u 0 , and nally drops to 0. Let u j denote the number of unmarked vertices after step j. For simplicity, let us assume that at each step some vertices of S i are marked, so the number of unmarked vertices decreases at each step. The number of marked vertices after the rst step is u 0 BnZru 1 . Each vertex gets a charge of at most 2 u 0 BnZru 1 (the actual charge may be a lot smaller, if only one vertex was scanned at this step, or if we marked many other vertices as well). Once some vertex in S i is marked, vertex i becomes an \eligible" vertex to be scanned as a part of a pair, since it is adjacent to a marked 5 vertex. In the j th step, the number of vertices of set S i that get marked is u j BnZru j+1 , and the charge to each vertex is at most 2 u j as vertex i was an eligible vertex to be scanned. Let u k = 0. Adding up all the charges we get 2 u 0 BnZru 1 (u 0 BnZru 1 )+ kBnZr1 X j=1 2 u j (u j BnZru j+1 )  2+2 kBnZr1 X j=1 (u j BnZru j+1 ) u j : (With some algebraic manipulation (see [5, page 977]), one can show that this is at most 2(1+ H()). 2 Remark: We could modify the algorithm and at each step scan either one or two vertices, whichever results in a smaller charge to each vertex. In practice, this should give better solu- tions. Implementation Issues: A naive implementation appears to give a worst case running time of O(mn 2 ). In each iteration we choose either one vertex, or a pair of vertices, and color them black. It is clear that we may have (n) iterations, since the optimal solution may have (n) vertices. In each iteration, we wish to identify a pair of nodes with the highest yield. For each gray vertex u, we scan its adjacency list and consider all its white neighbors. For each white neighbor v of u, we wish to determine the number of vertices that would get marked if we scanned the pair (u;v). Since u and v have common white neighbors, we cannot simply add up the number of white neighbors of each vertex to obtain the \yield" of this pair. We need to identify the number of white neighbors of v that are not adjacent to u (since those will not be colored gray by u). The number of steps in a single iteration can be computed as follows. Let G be the gray nodes in T. Let W be the white vertices that are adjacent to gray vertices. We can upper bound the total work done in a single iteration as follows: S = X u2G X v2N(u)^v2W d(v) : In the double summation each vertex in W is counted as many times as the number of its gray neighbors, we obtain the following. S  X v2W d(v) 2  X v2W nd(v)  O(mn) : This yields a bound of O(mn 2 ). We now show that the total number of steps over all iterations is O(mn) by a more careful analysis. For each vertex we can maintain two adjacency lists, one of its gray neighbors and one of its white neighbors. We use d W (u) to denote the number of white neighbors of u and d G (u) to denote the number of gray neighbors of u. The work done in a single iteration is as follows: S = X u2G X v2W^v2N(u) d W (v) 6 = X v2W d W (v)d G (v) : (In the double summation, each vertex v is counted as many times as the number of its gray neighbors.) Observe that at this step, we will make a subset of white vertices gray. Lemma 2.2 The number of white vertices that are made gray in this iteration is at least 1 2 max v2W d W (v) : Proof: We pick the pair of vertices that give the highest \yield"; we certainly consider all such vertices v, and color their white neighbors gray. We might pick a single vertex with a smaller yield, but only if its yield is at least half the yield of a pair of vertices. 2 At this step, we can \charge" the vertices whose color changed from white to gray. The charge to each such vertex is at most P v2W d W (v)d G (v) 1 2 max v2W d W (v)  2 X v2W d G (v) = 4m : Since each vertex changes color from white to gray exactly once over the entire algorithm, and there are n such vertices the total number of steps is O(mn). The only remaining issue is maintaining the required adjacency lists. This can be done each time we change the color of a vertex from white to gray by scanning its adjacency list, and updating the structures for its neighbors. 3 Algorithm II An alternate approach to growing one connected tree is to grow separate components that form a dominating set and to then connect them together. One straightforward approach is to nd a dominating set using a greedy heuristic, and to use a Steiner tree approximation to connect it. Since members of the optimum connected dominating set along with the members of the dominating set we found, form a spanning tree, we can prove a performance guarantee of c(1+ H()), where c is the best approximation ratio for the unweighted Steiner tree problem (currently c = 1:644 [12]). For the special case when the required vertices form a dominating set in a graph and all edges have unit weight, Berman and Furer [3] have announced a new algorithm with c = 4 3 . Thus we can improve the performance ratio to 4 3 (1 + H()). By applying a simple greedy strategy to connect the vertices in the dominating set, we proved a bound of H()+H(H()) [8]. Here we present a modi cation of the above algorithm, as suggested by Berman [2], and are able to prove a performance guarantee of ln + 3. (Berman has an alternate proof for a performance ratio of H()+2.) 7 The algorithm runs in two phases. At the start of the rst phase all nodes are colored white. Each time we include a vertex in the dominating set, we color it black. Nodes that are dominated are colored gray (once they are adjacent to a black node). In the rst phase the algorithm picks a node at each step and colors it black, coloring all adjacent white nodes gray. A piece is de ned as a white node or a black connected component. At each step we pick a node to color black that gives the maximum (non-zero) reduction in the number of pieces. We show that at the end of this phase if no vertex gives a non-zero reduction to the number of pieces, then there are no white nodes left. In the second phase, we have a collection of black connected components that we need to connect. Recursively connect pairs of black components by choosing a chain of two vertices, until there is one black connected component. Our nal solution is the set of black vertices that form the connected component. Lemma 3.1 At the end of the rst phase there are no white vertices left. Proof: Suppose there is a white node v at the end of the phase. We will show that there is a vertex that strictly reduces the number of pieces. If v has a white neighbor then coloring v black, reduces the number of white nodes by two, and increases the number of black components by one, thus picking v would reduce the number of pieces. Otherwise, v has a gray neighbor u. Coloring u black would reduce the number of white nodes, and not increase the number of black components since u is adjacent to a black node. Thus picking u reduces the number of pieces. 2 We show that at the end of the rst phase there is always a pair of black components that can be connected by choosing a chain of two vertices. For each such component i, we consider the shortest path to component j. The path goes through vertices u 1 ;u 2 ;u 3 ;:::;u k not in components i or j. u 1 is dominated by a vertex in component i. Observe that u 2 is gray, otherwise picking u 1 would give a strict reduction in the number of pieces. Thus u 2 is adjacent to a black component ` (` 6= i since we selected the shortest path from i to j). Components i and ` can be connected by choosing a chain of two vertices. Theorem 3.2 The connected dominating set found by the algorithm is of size at most (ln+ 3)jOPTj. Proof: De ne a i as the number of pieces left after i th iteration, and a 0 = n. Since a node can connect up to  pieces, jOPTj  a 0  . Consider i + 1 st iteration. The optimal solution can connect a i pieces. Hence the greedy procedure is guaranteed to pick a node which connects at least l a i jOPTj m pieces. This gives us the recurrence relation, a i+1  a i BnZr  a i jOPTj  + 1  a i (1BnZr 1 jOPTj )+ 1 : Its solution is, a i+1  a 0 (1BnZr 1 jOPTj ) i + iBnZr1 X j=i (1BnZr 1 jOPTj ) j : 8 Notice after jOPTj ln a 0 jOPTj iterations, the number of pieces left is less than 2  jOPTj. For each node we choose, we will decrease the number of pieces by at least one. This will continue until the number of black components is at most jOPTj, thus at most jOPTj more vertices are picked. Assume from this point onwards, we stop after choosing a f more nodes. The number of pieces left to connect is at most jOPTj BnZr a f . We connect the remaining pieces choosing chains of two vertices in the second phase. The total number of nodes chosen is at most jOPTjln a 0 jOPTj +jOPTj+a f +2(jOPTjBnZra f ), and since jOPTj  a 0  , the solution found has at most jOPTj(ln+ 3) nodes. 2 Remark: Berman, [2], has an alternate proof of H()+2 of the same algorithm. However, since ln  H()BnZr 0:7, the di erence is very small. 4 Generalizations 4.1 Vertex Weighted Graphs An approximation factor of 3lnn is possible when the vertices have weights. The algorithm rst nds a dominating set, and then connects the nodes in the dominating set. Step 1. Use a weighted set cover approximation algorithm to nd a dominating set DS. (A set cover instance is created by making each vertex an element, and each vertex corresponds to a set that contains the vertex itself, together with its neighbors. The greedy algorithm picks sets based on the ratio of their weight to the number of new elements they cover.) Step 2. To connect the vertices in DS we use a node-weighted Steiner tree approximation algorithm due to Klein and Ravi [10] to nd a Steiner tree that includes all the vertices in DS, after making the weights of all vertices in DS equal to zero. This yields a connected dominating set CDS. Theorem 4.1 The weight of vertices in CDS is at most 3lnn  jOPTj where OPT is the minimum weight connected dominating set in G. Proof: The weight of the vertices in DS is at most lnjOPTj. We now run the algorithm by Klein and Ravi [10] for the node-weighted Steiner tree case. The approximation factor of the algorithm is 2lnk, where k is the number of Steiner vertices. Consider the vertices in OPT; these together with the vertices in DS induce a connected subgraph. Hence there exists a node weighted Steiner tree of weight OPT. The total weight of the vertices in the connected dominating set is the weight of DS together with the weights of optional vertices chosen from G in the Steiner instance. Adding the weight of the two sets gives the required bound. 2 Before looking at other generalizations, we rst consider a problem closely related to our discussion. 9 4.2 Unit Node Weighted Steiner Trees The best known algorithm for node weighted Steiner trees, has a performance ratio of 2lnk, where k is the number of required vertices [10]. However, if the nodes have unit weight, there is a simpler algorithm, which gives a better performance ratio. We have k required vertices in a graph G = (V;E), which we want to connect using the least number of non-required vertices. We assume that the non-required vertices have weight 1, and the required vertices have weight 0. Our algorithm runs in two phases. In the rst phase, the algorithm greedily picks high degree stars (a star is a vertex that has at least two required vertices belonging to distinct components as neighbors) and merges them, until very few components are left. In the second phase, the algorithm runs a Steiner tree approximation algorithm with each edge having unit weight. In a preprocessing phase we merge all adjacent required vertices into their connected com- ponents. We pick  = 2c+1 where c is the best approximation ratio for the unweighted Steiner tree problem. Algorithm A Step 1. In each iteration choose a vertex that merges the largest number of required vertices until we reach a stage that the number of components left to merge is less than iteration count lnkBnZr +e  or no merging is possible. Step 2. Apply an (edge weighted) Steiner tree approximation algorithm, with each edge having unit weight. Theorem 4.2 The above algorithm nds a solution to the unit node weighted Steiner tree (UNST) problem with an approximation factor of lnk (which is best possible), when the optimal solution is greater than ce  . Proof: Assume that the set of components remaining after the rst phase is A 0 . We claim that there is a Steiner tree with jA 0 j+jOPTj edges. Thus when we apply an (edge weighted) Steiner tree approximation, we get a tree with at most c(jA 0 j+ jOPTj) edges. If the number of iterations in the rst phase is r, the nal solution has a cost r+c(jA 0 j+ jOPTj). We now proceed to give a bound on r. Let a i components be left after i th iteration. Since jOPTj nodes are capable of merging these components, for each i, in the i th iteration, there must be a node that merges l a iBnZr1 jOPTj m components. This gives a bound on a i , a i  a iBnZr1 BnZr  a iBnZr1 jOPTj  + 1  a iBnZr1 (1BnZr 1 jOPTj )+ 1: We can easily verify that a i  a 0  (1 BnZr 1 jOPTj ) i + P iBnZr1 j=0 (1 BnZr 1 jOPTj ) j . The second term is a geometric series that sums to at most jOPTj. Thus when i = (lnkBnZr)jOPTj the rst term 10 is at most e  , and the number of components a i  jOPTj+ e   i lnkBnZr + e  . This guarantees that the number of iterations, r  (lnkBnZr)jOPTj. If we stop because no merging by stars is possible, then the components have disjoint neighborhoods, and OPT has to pick at least one vertex from each neighborhood. Thus jA 0 j  jOPTj. If we stop because the number of components is small, then jA 0 j  jOPTj + e  . In any case, jA 0 j  jOPTj + e  and this yields a solution of cost at most lnk jOPTj + ce  + (2cBnZr )jOPTj. Putting  = 2c + 1 gives at most lnk  jOPTj vertices in our solution (when jOPTj  ce 2c+1 . 2 The optimality of this approximation ratio was established by Berman (see [10]). We can modify the above algorithm, to run until no further merging is possible. Algorithm B Step 1. In each iteration choose a vertex that merges the largest number of required vertices (at least two). Step 2. Apply an (edge weighted) Steiner tree approximation algorithm, with each edge having unit weight. Theorem 4.3 The above algorithm nds a solution to the unit node weighted Steiner tree (UNST) problem with an approximation factor of ln+ 2c+1 . Proof: As before, let a i denote the number of vertices left after the i th iteration and a 0 = n. Then after jOPTjln a 0 jOPTj ; there are at most 2jOPTj components to connect. Hence we will continue to merge by stars for jOPTj more iterations then the number of components will be de nitely less than jOPTj. Since each Steiner vertex can be adjacent to at most  required vertices, jOPTj  a 0  . If at this stage we use a f more iterations before invoking the edge weighted Steiner tree algorithm, there is a tree with jOPTj BnZr a f + jOPTj edges. So we nd a solution of cost at most c  (jOPTj BnZr a f + jOPTj). The nal solution has at most jOPTj  ln a 0 jOPTj + jOPTj + a f + c (jOPTjBnZra f + jOPTj) nodes. Since jOPTj  a 0  , we get a performance guarantee of ln+ 2c+ 1 for the algorithm. 2 4.3 Dominating a Subset of Vertices We now address the connected dominating set problem when we are required to dominate only a speci ed subset S of the vertices. The cost of the solution is the size of the smallest connected dominating set that dominates the vertices in S. (Notice that the objective function is slightly di erent from the unit node weighted Steiner tree problem, where required vertices have zero cost. In the Steiner CDS problem, we are charged for all vertices in the nal solution that are not leaf nodes in the tree that connects S.) 11 Unweighted Graphs Let jSj = k, and OPT denote the optimal solution. we present two algorithm that solve this problem. A straightforward strategy is to rst nd a small dominating set A, of the vertices in S, and to then connect these nodes. Algorithm A Step 1. Greedily choose a dominating set of the vertices in S. We can transform this to a set cover problem in which corresponding to each vertex v we have a set that includes all its neighbors and itself. The greedy algorithm for set cover yields a dominating set A. Step 2. For each element in A choose a representative element in S that is adjacent to it. Call this set R(A). Run the unit node weighted Steiner tree approximation algorithm to nd a Steiner tree with required set R(A). The nal solution is the union of A, R(A), and the Steiner tree vertices. Theorem 4.4 The connected dominating set for the subset S of size k, is at most 3lnk times the optimal. Proof: Since we chose the cover greedily, we have that jAj  jOPTj lnk, since OPT forms a dominating set for S. Notice that jAj  k. We cannot claim that there is a Steiner tree of size jOPTj connecting the set A. But there is a Steiner tree of size jOPTj connecting the elements of set R(A), since the connected dominating set also forms a Steiner tree on the members of S, and R(A)  S. Notice that jR(A)j jAj  k. Apply Theorem 4.2, and obtain a Steiner tree of R(A), of size at most jOPTjlnjR(A)j. So the nal solution is of cost less than jAj+jR(A)j+jOPTjlnjR(A)j  3lnkjOPTj. 2 Algorithm B Step 1. We modify the greedy set cover algorithm on the set S, to run until no vertex covers more than one uncovered vertex of S. We call the set of vertices chosen as B. Step 2. We now choose the uncovered vertices of S , calling this set B 0 . Step 3. For each member of B, choose a representative element of S that it dominates. Let this set be R(B). We apply an (edge weighted) Steiner tree approximation, with the set of required nodes as R(B)[B 0 . The nal solution is the nodes of this tree and the nodes of B. Theorem 4.5 The connected dominating set for the subset S, is at most (c+ 1)H()+ cBnZr 1 times the optimal (where c is the Steiner ratio). We de ne  as the size of the largest subset of S, adjacent to a node in the graph (  min(;k)). Proof: By a slight modi cation to the proof given in [5, page 977] we can prove, jBj (H()BnZr 1) jOPTj. (Since the rst step reduces to nding a set cover with the size of the largest set being ). Since OPT cannot dominate any two vertices of B 0 by one vertex, jB 0 j  jOPTj. Notice B [B 0 dominates the set S. 12 Consider the set R(B); there is a Steiner tree with jR(B)j+jB 0 j+jOPTj edges that connects the nodes of R(B)[B 0 . Apply an (edge weighted) Steiner tree approximation, with all edges having unit weight, and nd a tree of size c(jR(B)j+jB 0 j+jOPTj), where c is the Steiner ratio [12]. Since this tree is edge weighted, it has essentially the same number of nodes, including those of R(B)[B 0 . Since we havetoadd the vertices ofB aswell, wegetan upper bound ofc(jR(B)j+jB 0 j+jOPTj)+jBj. Notice that jR(B)j  jBj  (H()BnZr 1)jOPTj, and jB 0 j  jOPTj. This gives us a solution of cost at most ((c+1)H()+ cBnZr 1)jOPTj. 2 This de nitely is a better algorithm in terms of the worst case approximation guarantee. However the rst algorithms is simpler and faster. Most of the approximation algorithms that reduce the Steiner ratio below 2, have a high running time [4, 12]. 5 Lower Bounds 5.1 Hardness result for Connected Dominating Set We can prove that the set-cover problem can be reduced to the connected dominating set problem by an approximation preserving reduction, thus showing that the approximation factor H() will be hard to improve. This is based on the hardness results for set cover proven by Lund and Yannakakis [13] and Feige [6]. Given a set cover instance we reduce it to a connected dominating set problem as follows: Let the set cover instance be to cover the set U, with minimum number of sets from the collection S = fS 1 ;S 2 ;:::;S m g. Construct a graph G, that has vertex set U S fu;v;v 1 ;v 2 ;:::;v m g. An element e 2 U, and v i has an edge joining them i e 2 S i . Each v i has an edge to v. u has an edge only to v. (see Fig. 2) u v v m v 1 e 1 e 3 e 2 e n Figure 2: Reduction of set cover to connected dominating sets. Let us look at a minimum connected dominating set of G. Vertex v belongs to any con- nected dominating set, and hence u does not belong to any minimal connected dominating set. 13 No vertex e j is chosen in a minimal connected dominating set, since any node that it might potentially dominate, is already dominated by v, which also provides the connectivity. Hence we will only have v and some v i 's. These v i 's will correspond to the minimum cover for the given instance of set cover. The size of the connected dominating set is one more than the minimum set cover. Thus ap- proximating the connected dominating set with a factorof (1BnZr)H() would mean approximat- ing minimum setcoverwithin thesamefactor. This wouldimply thatNP  DTIME[n O(loglogn) ] [6]. 5.2 Hardness results for Generalizations We show two simple reductions, that demonstrate that other generalizations of the CDS prob- lem may be as hard to approximate as the \set TSP" problem for which no approximation algorithms are known. (For the Euclidean case, Mata and Mitchell [14] have given approxima- tion algorithms for this problem.) c j V j Figure 3: Reduction of set TSP problem to edge weighted CDS Theorem 5.1 A polynomial approximation algorithm for the edge weighted connected domi- nating set problem with factor f(n) would imply a polynomial approximation algorithm for the set TSP problem with factor 2f(n). Proof: We show how to reduce the set TSP problem to the edge weighted connected dominating set problem. Consider a set TSP instance G = (V;E) where V = (V 1 [V 2 [:::[V k ). For each subset V j , introduce a special vertex c j , and add edges from c j to all v 2 V j , with very high cost edges. For u;v 2 V j , if (u;v) 62 E, add the edge (u;v) with very high cost. Call this new graph G 0 . 14 Any set TSP tour in G chooses at least one vertex of V j to visit. Thus all nodes of V j [fc j g will be dominated by the corresponding node in the tour. Since every node of G occurs in some V j , this yields a dominating set. Since these are nodes on a tour, they also form a connected set. Hence OPT CDS  OPT TOUR . If we have a connected dominating set of G 0 , then it must have a vertex of V j to dominate c j . Hence the dominating set must have at least one vertex from each set V j . If the cost of this connected dominating set is small ( f(n)OPT CDS ), since we are not using the high cost edges in G 0 , we are using only the edges of the graph G. By traversing this tree twice, we can produce a tour in G, with cost at most 2f(n)OPT CDS  2f(n)OPT TOUR . Thus, if we can approximate the connected dominating set with edge weights to a factor f(n), we can approximate set TSP within a factor 2f(n). 2 Theorem 5.2 A polynomial approximation algorithm for the node weighted Steiner connected dominating set problem with factor f(n) would imply a polynomial approximation algorithm for the set TSP problem with factor 2f(n). Proof: The proof is similar to the proof of the previous theorem. Given a set TSP instance G = (V;E) where V = (V 1 [ V 2 [ :::[ V k ) we construct a graph G 0 . First convert the edge weights of the set TSP problem into node weights. For every edge e = (v p ;v q ) 2 E, create an extra node v pq of the same cost, connected to v p and v q . All other nodes are given 0 cost. For every subset V j , introduce a special vertex c j (of very high cost), and connect it to all v 2 V j . We show that the problem reduces to nding a node weighted connected dominating set of the subset U = fc j jj = 1:::kg of nodes of G 0 . Any set TSP tour in G chooses at least one vertex of V j to visit. Thus each c j will be dominated. The weight of the edges e = (v p ;v q ) translates to the weight of the corresponding vertices v pq . Since the nodes form a tour, they also form a connected set in G 0 , together with the new nodes that subdivide edges. Thus OPT CDS  OPT TOUR . Consider a connected dominating set that dominates U. To dominate c j , it must pick a vertex from V j . (W.l.o.g, the connected dominating set does not contain c j .) If the cost of this connected dominating set is small ( f(n)OPT CDS ), (since we are not using the high cost nodes in G 0 ), we are using only the nodes of the graph G along with nodes that correspond to the subdivided edges. Thus the dominating set chooses vertices that are also in G, and the corresponding vertices for each edge of G that it includes. This yields a tree that connects at least one element from each V j using edges of G. By traversing this tree twice, we can produce a tour in G, of cost at most 2f(n)OPT CDS  2f(n)OPT TOUR . Thus, if we able to approximate the connected dominating set of a subset with node weights to a factor f(n), we can approximate set TSP within a factor 2f(n). 2 Acknowledgements: We thank Ray Miller and Azriel Rosenfeld for rst mentioning the traveling tourists problem. We thank Vaduvur Bharghavan for re-igniting our interest in the connected dominating sets problem. We would like to thank Estie Arkin, and Randeep Bhatia for useful discussions. We thank Balaji Raghavachari and Serge Plotkin for raising the questions about the vertex weighted case, and the Steiner case respectively. We thank Piotr Berman for allowing us to include his improvement to Algorithm II. 15 References [1] E. Arkin, M. Halldorsson and R. Hassin, \Approximating the tree and tour covers of a graph", Information Processing Letters, 47: 275{282, (1993). [2] P. Berman, personal communication, May (1996). [3] P. Berman and M. Furer, personal communication, May (1996). [4] P. Berman and V. Ramaiyer, \Improved approximation algorithms for the Steiner tree problem", J. Algorithms, 17:381{408, (1994). [5] T.Cormen, C. Leiserson, and R. Rivest, Introduction to Algorithms, The MIT Press, 1989. [6] U. Feige, \A threshold of lnn for approximating set-cover", 28th ACM Symposium on Theory of Computing, pages 314{318, (1996). [7] M. R. Garey and D. S. Johnson, \Computers and Intractability: A guide to the theory of NP-completeness", Freeman, San Francisco (1978). [8] S. Guha and S. Khuller, \Approximation algorithms for connected dominating sets", Proc. of 4th Annual European Symposium on Algorithms, (1996). [9] A. Kothari and V. Bharghavan, \Algorithms for unicast and multicast routing in ad-hoc networks", manuscript. [10] P. N. Klein and R. Ravi, \A nearly best-possible approximation algorithm for node- weighted Steiner trees", J. Algorithms, 19(1):104{114, (1995). [11] D. Kleitman and D. West, \Spanning trees with many leaves", SIAM Journal on Discrete Mathematics, 4(1):99{106, (1991). [12] M.Karpinsky and A.Zelikovsky, \New approximationalgorithmsfortheSteiner treeprob- lem", Technical Report, Electronic Colloquium on Computational Complexity (ECCC): TR95-030, (1995). [13] C. Lund and M. Yannakakis, \On the hardness of approximating minimization problems", Journal of the ACM, 41(5): 960{981, (1994). [14] C. S. Mata and J. S. B. Mitchell \Approximation algorithms for geometric tour and network design problems", Proc. of the 11th Annual Symp. on Computational Geometry, pages 360{369, (1995). [15] S. Paul and R. Miller, \Locating faults in a systematic manner in a large heterogeneous network", IEEE INFOCOM, pages 522{529, (1995). [16] C. Savage, \Depth-First search and the vertex cover problem", Information Processing Letters, 14(5): 233{235, (1982). [17] P. Slav  ik \A tight analysis of the greedy algorithm for set cover" 28th ACM Symposium on Theory of Computing, pages 435{441, (1996). 16