A polynomial-time interior-point method for conic optimization, with inexact barrier evaluations*

Simon P. Schurr†    Dianne P. O'Leary‡    André L. Tits§

University of Maryland Technical Report CS-TR-4912 & UMIACS-TR-2008-10

April 2008

*This work was supported by NSF grant DMI-0422931 and DoE grant DE-FG02-04ER25655. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation or those of the US Department of Energy.
†Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1; email: spschurr@uwaterloo.ca.
‡Computer Science Department and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742; email: oleary@cs.umd.edu.
§Department of Electrical and Computer Engineering and Institute for Systems Research, University of Maryland, College Park, MD 20742; email: andre@umd.edu. Correspondence should be addressed to this author.

Abstract

We consider a primal-dual short-step interior-point method for conic convex optimization problems for which exact evaluation of the gradient and Hessian of the primal and dual barrier functions is either impossible or prohibitively expensive. As our main contribution, we show that if approximate gradients and Hessians of the primal barrier function can be computed, and the relative errors in such quantities are not too large, then the method has polynomial worst-case iteration complexity. (In particular, polynomial iteration complexity ensues when the gradient and Hessian are evaluated exactly.) In addition, the algorithm requires no evaluation, or even approximate evaluation, of quantities related to the barrier function for the dual cone, even for problems in which the underlying cone is not self-dual.

Keywords. Convex cone, conic optimization, self-concordant barrier function, interior-point method, inexact algorithm, polynomial complexity.

1 Introduction

Interior-point methods (IPMs) are the algorithms of choice for solving many convex optimization problems, including semidefinite programs (SDPs) and second-order cone programs (SOCPs) (e.g., [NN94, BTN01, BV04]). The bulk of the work at each iteration of an IPM consists of the evaluation of the first and second derivatives of a suitably selected barrier function and the solution of the linear system of equations formed by Newton's method. The most efficient algorithms require barrier functions for both the primal and dual problems.

The success of IPMs on SDPs and SOCPs has not been matched for more general classes of conic convex optimization problems. The primary hindrance is the difficulty of explicitly evaluating the derivatives of the associated barrier functions, particularly those for the dual problem. The construction of the universal barrier function [NN94] provides a definition for these derivatives but does not necessarily prescribe a polynomial-time evaluation procedure.

Even for well-studied problem classes, evaluation of the derivatives can be quite expensive. For example, in SDP, where the variables are symmetric positive semidefinite matrices, derivatives of the barrier function are usually evaluated by computing inverses or Cholesky decompositions of matrices.
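To make this cost concrete, the following is a minimal sketch (ours, for illustration only; the formulas $F'(X) = -X^{-1}$ and $F''(X)[H] = X^{-1}HX^{-1}$ for the standard SDP barrier $F(X) = -\log\det X$ are classical) of what exact derivative evaluation entails in the SDP case. The $O(n^3)$ dense factorization is precisely the step one might be tempted to replace by a cheaper, approximate one.

```python
import numpy as np

def sdp_barrier_derivatives(X):
    """Exact derivatives of F(X) = -log det X: the gradient is -inv(X)
    and the Hessian acts on a direction H as inv(X) @ H @ inv(X).
    The dense Cholesky factorization costs O(n^3)."""
    L = np.linalg.cholesky(X)        # raises LinAlgError unless X is PD
    L_inv = np.linalg.inv(L)
    X_inv = L_inv.T @ L_inv          # X^{-1} = L^{-T} L^{-1}
    return -X_inv, (lambda H: X_inv @ H @ X_inv)
```

Replacing the factorization by an approximate or stale one yields inexact values of the gradient and Hessian, which is the situation analyzed in this paper.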
Since these linear algebra tasks can be expensive when the matrices are large or dense, it might be preferable to instead compute approximate inverses or approximate Cholesky decompositions, and hence an approximate gradient and Hessian of the barrier function, if doing so does not greatly increase the number of IPM iterations.

Therefore, in this work, we consider applying a simple primal-dual short-step IPM to conic convex optimization problems for which exact evaluation of the gradient and Hessian is either impossible or prohibitively expensive. Note that this algorithm does not require evaluation of the gradient and Hessian of the dual barrier function. (This property is shared by an algorithm recently proposed by Nesterov [Nes06].) As our main contribution, we show that if estimates of the gradient and Hessian of the primal barrier function are known, and their relative errors are not too large, then the algorithm has the desirable property of polynomial worst-case iteration complexity.

There is an extensive body of literature on inexact IPMs for linear optimization problems, and a smaller body of work on inexact methods for nonlinear convex optimization problems. We note the paper [Tun01], in which it was shown that a simple primal-dual potential reduction method for linear optimization problems can in principle be applied to conic optimization problems. It is shown there how the assumptions made on the availability of a self-concordant barrier function and its derivatives are crucial to the convergence and complexity properties of the method. The inverse Hessian of an appropriate self-concordant barrier function is approximated by a matrix that may be updated according to a quasi-Newton scheme. We take a different approach to such approximations.

In the IPM literature, the term "inexact algorithm" typically refers to algorithms in which approximate right-hand sides of the linear system are computed. In some schemes approximate coefficient matrices are also used. The inexact algorithm we consider here can be interpreted in this way, although our primary concern is to estimate the gradient and Hessian of a suitable self-concordant barrier function, rather than the coefficient matrix and right-hand side of the linear system per se. In, e.g., [MJ99, FJM99, Kor00], infeasible-point methods for linear optimization are presented in which the linear system is solved approximately. In [LMO06] an inexact primal-dual path-following method is proposed for convex quadratic optimization problems. In [BPR99] an inexact algorithm is presented for monotone horizontal linear complementarity problems. (This class of complementarity problems includes as special cases linear optimization problems and convex quadratic optimization problems.) Inexact methods for other classes of nonlinear convex problems have also been studied. In [ZT04] an inexact primal-dual method is presented for SDP. We also mention the paper [TTT06], in which an inexact primal-dual path-following algorithm is proposed for a class of convex quadratic SDPs; only the equations resulting from the linearization of the complementarity condition are perturbed, and a polynomial worst-case complexity result is proven. Another source of inexactness is the use of an iterative method such as the preconditioned conjugate gradient method to solve the linear system; this has been proposed (e.g., in [NFK98, KR91, Meh92, CS93, GMS+86, WO00]) and analyzed (e.g., in [BCL99, BGN00, GM88]) for certain classes of convex optimization problems.
This paper is organized as follows. We first present some relevant properties of barrier functions in Section 2. In Section 3 we consider a primal-dual short-step algorithm that uses inexact values of the primal barrier gradient and Hessian. We show in Section 4 that if the gradient and Hessian estimates are accurate enough, then the algorithm converges in a polynomial number of iterations. Finally, Section 5 is devoted to a few concluding remarks. Much of the work reported here is derived from the dissertation [Sch06]. Additional details, including results obtained by assuming the errors in the gradient and Hessian approximations are dependent, which can occur in practice, may be found there.

2 Notation and Preliminaries

A cone $K$ is a nonempty set such that $\alpha x \in K$ for all scalars $\alpha \ge 0$ and all $x \in K$. Hence $K$ always includes the origin. A cone that is closed, convex, solid (i.e., has nonempty interior), and pointed (i.e., contains no lines) is said to be full. (Some authors call such a cone proper or regular.) Given an inner product $\langle \cdot,\cdot\rangle$ on $\mathbb{R}^n$, the dual of a set $S$ is given by $S^* = \{y : \langle x,y\rangle \ge 0\ \forall x \in S\}$. It can be easily verified that the dual of any set is a closed convex cone, so we may refer to $S^*$ as the dual cone of $S$. Denote by $\|\cdot\|$ the vector norm induced by $\langle\cdot,\cdot\rangle$, and given a linear mapping $A$, let $\|A\|$ be the operator norm induced by this vector norm: $\|A\| = \sup_{x \ne 0} \|Ax\|/\|x\|$. Note that for any inner product, $\|A\|$ is the largest singular value of $A$, i.e., $\|A\|$ is the spectral norm. The adjoint of $A$ is denoted by $A^*$, and its inverse by $A^{-*}$. It will be clear from the context whether $*$ denotes a dual cone or an adjoint map.

Given a set $S \subseteq \mathbb{R}^n$ and a smooth function $F : S \to \mathbb{R}$, let $F^{(k)}(x)$ denote the $k$th derivative of $F$ at $x \in \mathrm{int}(S)$. Following standard practice in the optimization literature, given directions $h_1, \dots, h_j$, let $F^{(k)}(x)[h_1, \dots, h_j]$ denote the result of applying $F^{(k)}(x)$ successively to $h_1, \dots, h_j$. For example,

  $F''(x)[h_1,h_2] = (F''(x)h_1)h_2 = \langle F''(x)h_1, h_2\rangle \in \mathbb{R}$,   (2.1)

and

  $F'''(x)[h_1,h_2] = (F'''(x)h_1)h_2 \in L(\mathbb{R}^n)$,   (2.2)

where $L(\mathbb{R}^n)$ is the space of linear functionals over $\mathbb{R}^n$.

In the remainder of this section, we establish properties of barrier functions needed in our analysis.

2.1 Self-concordant barrier functions

Let $S \subseteq \mathbb{R}^n$ be a closed convex set with nonempty interior. The function $F : \mathrm{int}(S) \to \mathbb{R}$ is said to be a $\vartheta$-self-concordant barrier function (or simply a self-concordant barrier) for $S$ if it is closed (i.e., its epigraph is a closed set), is three times continuously differentiable, and satisfies

  $|F'''(x)[h,h,h]| \le 2\,(F''(x)[h,h])^{3/2} \qquad \forall x \in \mathrm{int}(S),\ h \in \mathbb{R}^n$,   (2.3)

and

  $\vartheta = \sup_{x \in \mathrm{int}(S)} \lambda(F,x)^2 < \infty$, where $\lambda(F,x) = \inf_t \{t : |F'(x)[h]| \le t\,(F''(x)[h,h])^{1/2}\ \forall h \in \mathbb{R}^n\}$.

(The quantity $\lambda(F,x)$ is often referred to as the Newton decrement.) If furthermore $F''(x)$ is positive definite for all $x \in \mathrm{int}(S)$, then $F$ is said to be nondegenerate. Note that property (2.3) implies the convexity of $F$, and that convexity and closedness of $F$ imply that $F$ satisfies the barrier property, i.e., for every sequence $\{x_i\} \subset \mathrm{int}(S)$ converging to a boundary point of $S$, $F(x_i) \to \infty$. (See [Nes04, Theorem 4.1.4] for details.) It is known that a self-concordant barrier for a full cone is necessarily nondegenerate [Nes04, Theorem 4.1.3], but for the sake of clarity we will sometimes refer to such functions as nondegenerate self-concordant barriers. The quantity $\vartheta$ is known as the complexity parameter of $F$. If $S \subseteq \mathbb{R}^n$ is a closed convex set with nonempty interior, and $F$ is a $\vartheta$-self-concordant barrier for $S$, then $\vartheta \ge 1$ [NN94, Corollary 2.3.3].
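As a concrete illustration (a standard one-dimensional example, not specific to this paper), take $S = \mathbb{R}_+$ and $F(x) = -\log x$. Then $F'(x) = -1/x$, $F''(x) = 1/x^2$, and $F'''(x) = -2/x^3$, so

  $|F'''(x)[h,h,h]| = \frac{2|h|^3}{x^3} = 2\,(F''(x)[h,h])^{3/2}$,

and (2.3) holds with equality. Moreover $|F'(x)[h]| = |h|/x = (F''(x)[h,h])^{1/2}$, so $\lambda(F,x) = 1$ for every $x > 0$, and $F$ is a nondegenerate self-concordant barrier for $\mathbb{R}_+$ with $\vartheta = 1$.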
Let $F$ be a nondegenerate self-concordant barrier for $S \subseteq \mathbb{R}^n$, and let $x \in \mathrm{int}(S)$. Then $F''(x)$ and its inverse induce the following "local" norms on $\mathbb{R}^n$:

  $\|h\|_{x,F} = \langle h, F''(x)h\rangle^{1/2}, \qquad \|h\|^*_{x,F} = \langle h, F''(x)^{-1}h\rangle^{1/2}, \qquad h \in \mathbb{R}^n.$   (2.4)

These norms are dual to each other. The following result, which uses and builds upon [NN94, Theorem 2.1.1], characterizes the local behavior of the Hessian of a self-concordant barrier.

Lemma 2.1. Let $F$ be a nondegenerate self-concordant barrier for the closed convex set $S$ having nonempty interior.
(a) If $x, y \in \mathrm{int}(S)$ are such that $r := \|x - y\|_{x,F} < 1$, then for all $h$,

  $(1-r)\|h\|_{x,F} \le \|h\|_{y,F} \le \frac{1}{1-r}\|h\|_{x,F},$   (2.5)

  $(1-r)\|h\|^*_{x,F} \le \|h\|^*_{y,F} \le \frac{1}{1-r}\|h\|^*_{x,F}.$   (2.6)

(b) For every $x \in \mathrm{int}(S)$, $\|x - y\|_{x,F} < 1$ implies that $y \in \mathrm{int}(S)$.

Proof. The inequalities in (2.5) and the result in (b) are from [NN94, Theorem 2.1.1]. The inequalities in (2.6) follow from those in (2.5); see, e.g., [Sch06, Lemma 3.2.9].

Although Lemma 2.1 holds for self-concordant barrier functions on general convex sets, we will be most interested in barriers on full cones, so subsequent results will be phrased with this in mind. Given a full cone $K$ and a function $F : \mathrm{int}(K) \to \mathbb{R}$, the conjugate function $F_* : \mathrm{int}(K^*) \to \mathbb{R}$ of $F$ is given by

  $F_*(s) := \sup_{x \in \mathrm{int}(K)} \{-\langle x,s\rangle - F(x)\}, \qquad s \in \mathrm{int}(K^*).$   (2.7)

(Strictly speaking, in the classical definition of a conjugate function, the domain of $F_*$ is not restricted to $\mathrm{int}(K^*)$. We include such a restriction here so that $F_*$ is finite, which is the only case of interest to us. The definition used here is found in, e.g., [NT97, NT98, Ren01]; it is slightly different from that found elsewhere in the literature, including [NN94], the difference being the minus sign in front of the $\langle x,s\rangle$ term, which turns the domain of $F_*$ from $\mathrm{int}(-K^*)$ into $\mathrm{int}(K^*)$.)

Since $F_*(s)$ is the pointwise supremum of linear functions of $s$, $F_*$ is convex on $\mathrm{int}(K^*)$. It was shown in, e.g., [Ren01, Theorem 3.3.1], that if $F$ is a nondegenerate self-concordant barrier for $K$, then $F_*$ is a nondegenerate self-concordant barrier for $K^*$. In light of this, the next result follows directly from Lemma 2.1.

Lemma 2.2. Let $F$ be a nondegenerate self-concordant barrier for the full cone $K$.
(a) If $t, s \in \mathrm{int}(K^*)$ are such that $r := \|t - s\|_{t,F_*} < 1$, then for all $h$,

  $(1-r)\|h\|_{t,F_*} \le \|h\|_{s,F_*} \le \frac{1}{1-r}\|h\|_{t,F_*}.$

(b) For every $s \in \mathrm{int}(K^*)$, $\|s - t\|_{s,F_*} < 1$ implies that $t \in \mathrm{int}(K^*)$.

The following technical results will be used in the analysis of the interior-point method to be presented later.

Lemma 2.3. Let $F$ be a nondegenerate self-concordant barrier for the full cone $K$, and let $x \in \mathrm{int}(K)$. Then for all $h_1$ and $h_2$,

  $\|F'''(x)[h_1,h_2]\|^*_{x,F} \le 2\|h_1\|_{x,F}\|h_2\|_{x,F}.$

Proof. Since $F$ is a nondegenerate self-concordant barrier function, it follows from [NN94, Proposition 9.1.1] that

  $|F'''(x)[h_1,h_2,h_3]| \le 2\|h_1\|_{x,F}\|h_2\|_{x,F}\|h_3\|_{x,F}$

for all $x \in \mathrm{int}(K)$ and all $h_1, h_2, h_3$. (In fact, only property (2.3) in the definition of a self-concordant barrier function is used to prove this result.) Now by definition of a dual norm,

  $\|F'''(x)[h_1,h_2]\|^*_{x,F} = \max_{y:\|y\|_{x,F}=1} \langle y, F'''(x)[h_1,h_2]\rangle = \max_{y:\|y\|_{x,F}=1} F'''(x)[h_1,h_2,y] \le \max_{y:\|y\|_{x,F}=1} 2\|h_1\|_{x,F}\|h_2\|_{x,F}\|y\|_{x,F} = 2\|h_1\|_{x,F}\|h_2\|_{x,F}.$

Corollary 2.4. Let $F$ be a nondegenerate self-concordant barrier for the full cone $K$, let $x \in \mathrm{int}(K)$ and $h$ be such that $\beta := \|h\|_{x,F} < 1$, and let $\alpha \in [0,1]$. Then

  $\|F'''(x+\alpha h)[h,h]\|^*_{x+h,F} \le \frac{2\beta^2}{(1-\beta)(1-\alpha\beta)}.$
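To see these objects in the simplest case (again the standard one-dimensional example), for $F(x) = -\log x$ on $\mathbb{R}_+$ we have $\|h\|_{x,F} = |h|/x$ and $\|h\|^*_{x,F} = x|h|$. Lemma 2.1(b) then says that $|y-x|/x < 1$, i.e., $y \in (0,2x)$, implies $y \in \mathrm{int}(\mathbb{R}_+)$: the unit "Dikin ball" around $x$ is here the interval $(0,2x)$. Likewise, if $r = |y-x|/x < 1$, then $\|h\|_{y,F}/\|h\|_{x,F} = x/y \in [\frac{1}{1+r}, \frac{1}{1-r}]$, consistent with (2.5) since $1-r \le \frac{1}{1+r}$.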
Proof. Since $\alpha\beta < 1$, it follows from Lemma 2.1(b) that $x + \alpha h \in \mathrm{int}(K)$, and then from (2.5) that

  $\|h\|_{x+\alpha h,F} \le \frac{\beta}{1-\alpha\beta}.$   (2.8)

Moreover, $\|(1-\alpha)h\|_{x+\alpha h,F} \le (1-\alpha)\beta/(1-\alpha\beta) < 1$, so (2.6) followed by Lemma 2.3 and then (2.8) yields

  $\|F'''(x+\alpha h)[h,h]\|^*_{x+h,F} \le \frac{1}{1-\|(1-\alpha)h\|_{x+\alpha h,F}}\,\|F'''(x+\alpha h)[h,h]\|^*_{x+\alpha h,F} \le \frac{1}{1-(1-\alpha)\|h\|_{x+\alpha h,F}}\,2\|h\|^2_{x+\alpha h,F} \le \frac{1}{1-\frac{(1-\alpha)\beta}{1-\alpha\beta}}\,2\left(\frac{\beta}{1-\alpha\beta}\right)^2 = \frac{2\beta^2}{(1-\beta)(1-\alpha\beta)}.$

2.2 Logarithmically homogeneous barrier functions

An important class of barrier functions for convex cones is that of logarithmically homogeneous barriers. (The terminology reflects the fact that the exponential of such a function is homogeneous.) These barrier functions were first defined in [NN94, Definition 2.3.2] and occur widely in practice. Given a full cone $K$, a convex twice continuously differentiable function $F : \mathrm{int}(K) \to \mathbb{R}$ is said to be a $\vartheta$-logarithmically homogeneous barrier for $K$ if $\vartheta \ge 1$, $F$ satisfies the barrier property for $K$, and the logarithmic-homogeneity relation

  $F(tx) = F(x) - \vartheta\log(t) \qquad \forall x \in \mathrm{int}(K),\ t > 0$   (2.9)

holds. The following well-known properties of logarithmically homogeneous barriers are the same as or similar to those in [NN94, Proposition 2.3.4], so we omit proofs.

Lemma 2.5. Let $K$ be a full cone, and $F : \mathrm{int}(K) \to \mathbb{R}$ be a $\vartheta$-logarithmically homogeneous barrier for $K$. For all $x \in \mathrm{int}(K)$ and $t > 0$:
(a) $F'(tx) = \frac{1}{t}F'(x)$ and $F''(tx) = \frac{1}{t^2}F''(x)$;
(b) $\langle F'(x), x\rangle = -\vartheta$;
(c) $F''(x)x = -F'(x)$;
(d) $\langle F''(x)x, x\rangle = \vartheta$;
(e) If in addition $F''(x)$ is positive definite for all $x \in \mathrm{int}(K)$, then $\|F'(x)\|^*_{x,F} = \vartheta^{1/2}$.

It was noted in [NN94, Corollary 2.3.2] that if $F$ is a $\vartheta$-logarithmically homogeneous barrier for the full cone $K$, and $F$ is three times differentiable and satisfies (2.3), then $F$ is a $\vartheta$-self-concordant barrier for $K$. Such an $F$ is called a $\vartheta$-logarithmically homogeneous (nondegenerate) self-concordant barrier for $K$. The following result relates the norm induced by the Hessian of $F$ to that induced by the Hessian of $F_*$. Here and throughout, we write $F''_*(\cdot)$ as shorthand for $(F_*(\cdot))''$.

Lemma 2.6. For some $\vartheta$, let $F$ be a $\vartheta$-logarithmically homogeneous self-concordant barrier for the full cone $K$, and let $x \in \mathrm{int}(K)$. Then for all $\tau > 0$ and all $h$,

  $\|h\|_{-\tau F'(x),F_*} = \frac{1}{\tau}\|h\|^*_{x,F}.$

Proof. It follows from, e.g., [Ren01, Theorem 3.3.1], that $F_*$ is a $\vartheta$-logarithmically homogeneous self-concordant barrier for $K^*$. Using Lemma 2.5(a), we obtain that for all $x \in \mathrm{int}(K)$ and $\tau > 0$,

  $F''_*(-\tau F'(x)) = \frac{1}{\tau^2}F''_*(-F'(x)) = \frac{1}{\tau^2}F''(x)^{-1},$

where the last equality follows from the relation $F''_*(-F'(x)) = F''(x)^{-1}$; see, e.g., [Ren01, Theorem 3.3.4]. (This relation holds for all $x \in \mathrm{int}(K)$, and uses the fact that $F'(x) \in -\mathrm{int}(K^*)$.) So for all $h$,

  $\|h\|_{-\tau F'(x),F_*} = \langle h, \tfrac{1}{\tau^2}F''(x)^{-1}h\rangle^{1/2} = \frac{1}{\tau}\langle h, F''(x)^{-1}h\rangle^{1/2} = \frac{1}{\tau}\|h\|^*_{x,F}.$
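As an illustration (standard, and used again below), the barrier $F(x) = -\sum_{i=1}^n \log x_i$ for $K = \mathbb{R}^n_+$ satisfies $F(tx) = F(x) - n\log t$, so it is $n$-logarithmically homogeneous. Writing $X = \mathrm{diag}(x)$ and $e = (1,\dots,1)^T$, we have $F'(x) = -X^{-1}e$ and $F''(x) = X^{-2}$, and the properties of Lemma 2.5 are immediate: $\langle F'(x),x\rangle = -n$, $F''(x)x = X^{-1}e = -F'(x)$, and $\langle F''(x)x,x\rangle = n$. The conjugate (2.7) can be computed in closed form: the supremum of $-\langle x,s\rangle + \sum_i \log x_i$ is attained at $x_i = 1/s_i$, giving $F_*(s) = -\sum_i \log s_i - n$, again an $n$-logarithmically homogeneous barrier, this time for $K^* = \mathbb{R}^n_+$.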
To our knowledge, the following key relation between the norms induced by the Hessians of $F$ and $F_*$ is new. It shows that in a certain region (which we will later identify as a neighborhood of the central path), the dual norm induced by the Hessian of $F$ is comparable to the norm induced by the Hessian of $F_*$.

Lemma 2.7. For some $\vartheta$, let $F$ be a $\vartheta$-logarithmically homogeneous self-concordant barrier for the full cone $K$. Let $\theta \in (0,1)$, and suppose that $x \in \mathrm{int}(K)$ and $s \in \mathrm{int}(K^*)$ satisfy

  $\|s + \mu F'(x)\|^*_{x,F} \le \theta\mu,$   (2.10)

where $\mu = \langle x,s\rangle/\vartheta$. Then for any $h$,

  $\frac{1-\theta}{\mu}\|h\|^*_{x,F} \le \|h\|_{s,F_*} \le \frac{1}{(1-\theta)\mu}\|h\|^*_{x,F}.$

Proof. As noted previously, $F'(x) \in -\mathrm{int}(K^*)$ for all $x \in \mathrm{int}(K)$. Moreover, $\mu > 0$. So $t := -\mu F'(x) \in \mathrm{int}(K^*)$. Since $x \in \mathrm{int}(K)$ and $\mu > 0$, we may invoke Lemma 2.6. Following this, we use (2.10) to obtain

  $r := \|t - s\|_{t,F_*} = \|\mu F'(x) + s\|_{-\mu F'(x),F_*} = \frac{1}{\mu}\|\mu F'(x) + s\|^*_{x,F} \le \theta < 1.$

Now applying Lemma 2.2(a) and using $r \le \theta$, we obtain for any $h$,

  $(1-\theta)\|h\|_{-\mu F'(x),F_*} \le \|h\|_{s,F_*} \le \frac{1}{1-\theta}\|h\|_{-\mu F'(x),F_*}.$   (2.11)

Applying Lemma 2.6 again gives the required result.

3 An inexact primal-dual interior-point method for conic optimization

In this section we define our problem, give an overview of the short-step primal-dual IPM, and then present an algorithm that does not require exact evaluation of the primal or dual barrier functions or their derivatives.

3.1 Conic optimization problem

Any primal-dual pair of convex optimization problems can be written in the following conic form:

  $v_P = \inf_x\ \{\langle c,x\rangle : Ax = b,\ x \in K\},$   (3.1)
  $v_D = \sup_{w,s}\ \{\langle b,w\rangle : A^*w + s = c,\ s \in K^*\},$   (3.2)

where $A : \mathbb{R}^n \to \mathbb{R}^m$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$, and $K \subseteq \mathbb{R}^n$ is a closed convex cone. A point $x$ is said to be strongly feasible for (3.1) if $Ax = b$ and $x \in \mathrm{int}(K)$, and a pair $(w,s)$ is said to be strongly feasible for (3.2) if $A^*w + s = c$ and $s \in \mathrm{int}(K^*)$. In the case that $K$ is the nonnegative orthant, (3.1) and (3.2) are called linear optimization problems. If $K$ is the second-order cone $\{x : x_n \ge \|x_{1:n-1}\|_2\}$ (also known as the Lorentz cone or ice-cream cone), then (3.1) and (3.2) are called second-order cone optimization problems. If $K$ is the full cone of positive semidefinite matrices, considered as a subset of the space of symmetric matrices (the space of symmetric matrices of order $n$ is, of course, isomorphic to $\mathbb{R}^{n(n+1)/2}$), then (3.1) and (3.2) are called semidefinite programs.

Assumption 3.1.
(a) $m > 0$ and $A$ is onto;
(b) $K$ is a full cone;
(c) The optimization problems (3.1) and (3.2) each satisfy the generalized Slater constraint qualification; i.e., there exists a strongly feasible $x$ for (3.1) and a strongly feasible pair $(w,s)$ for (3.2).

It follows from Assumption 3.1(b) that $K^*$ is also a full cone. Assumption 3.1(c) can be made without loss of generality, since if it fails to hold, one can embed (3.1)–(3.2) in a higher-dimensional conic problem for which the assumption does hold; see, e.g., [NTY99]. Under Assumption 3.1, the optimal primal and dual solution sets are nonempty and bounded (see [Nes97, Theorem 1]), and the duality gap is zero, i.e., $v_P = v_D$; see, e.g., [Roc97, Corollary 30.5.2].

Consider the barrier problem associated with (3.1), viz.,

  $v_P(\mu) = \inf_x\ \{\langle c,x\rangle + \mu F(x) : Ax = b\},$   (3.3)

where $\mu > 0$ is called the barrier parameter, and the function $F : \mathrm{int}(K) \to \mathbb{R}$ satisfies the following assumption.

Assumption 3.2. The function $F$ is a $\vartheta$-logarithmically homogeneous self-concordant barrier for $K$.

Such an $F$ is known to exist for every full cone $K$ [NN94, Theorem 2.5.1]. As mentioned previously, the conjugate function $F_*$ of such an $F$ is a $\vartheta$-logarithmically homogeneous self-concordant barrier for $K^*$, and hence is a suitable barrier for (3.2). The resulting dual barrier problem is

  $v_D(\mu) = \sup_{w,s}\ \{\langle b,w\rangle - \mu F_*(s) : A^*w + s = c\}.$   (3.4)

It was shown in [Nes97, Lemma 1] that under the generalized Slater constraint qualification, (3.3) and (3.4) have a unique minimizer for each $\mu > 0$.
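For orientation, consider the linear-optimization case $K = \mathbb{R}^n_+$ with $F(x) = -\sum_i \log x_i$ (a standard computation, not specific to this paper). The barrier problem (3.3) reads $\inf_x\{\langle c,x\rangle - \mu\sum_i \log x_i : Ax = b\}$, and its optimality condition is $c - \mu X^{-1}e = A^*w$ for some $w$. Setting $s := c - A^*w$ gives $s = \mu X^{-1}e$, i.e., $x_is_i = \mu$ for all $i$: the classical LP central path. This is exactly the condition $\mu F'(x) + s = 0$ appearing in the system (3.5) of the next subsection.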
3.2 Primal-dual interior-point methods for conic problems

Of special importance in primal-dual interior-point algorithms is the set of triples $(x,w,s)$ such that for some $\mu > 0$, $x$ is a minimizer of the primal barrier problem and $(w,s)$ is a minimizer of the dual barrier problem. This set is called the primal-dual central path. To exploit the fact that the primal-dual central path is a curve culminating at a point in the primal-dual optimal solution set, we use an iterative algorithm whose iterates stay close to the primal-dual central path while also converging to the primal-dual optimal solution set. It is well known that under Assumption 3.1 the primal-dual central path consists of the triples satisfying the following system:

  $Ax = b$
  $A^*w + s = c$
  $\mu F'(x) + s = 0$
  $x \in \mathrm{int}(K),\quad s \in \mathrm{int}(K^*).$   (3.5)

For any $(x,w,s)$ on the primal-dual central path, the pointwise duality gap is

  $\langle c,x\rangle - \langle b,w\rangle = \langle x,s\rangle = \langle x, -\mu F'(x)\rangle = -\mu\langle x, F'(x)\rangle = \mu\vartheta,$   (3.6)

where the last equality follows from Lemma 2.5(b). Therefore, to follow the primal-dual central path to the optimal primal-dual solution set, we decrease the positive duality measure

  $\mu = \frac{\langle x,s\rangle}{\vartheta}$   (3.7)

of the triple $(x,w,s)$ to zero, at which point, under Assumption 3.1, $x$ solves (3.1) and $(w,s)$ solves (3.2). Applying Newton's method to the system of equations in (3.5), we obtain the linear system

  $\begin{pmatrix} A & 0 & 0 \\ 0 & A^* & I \\ \mu F''(x) & 0 & I \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta w \\ \Delta s \end{pmatrix} = \begin{pmatrix} b - Ax \\ c - A^*w - s \\ -\sigma\mu F'(x) - s \end{pmatrix},$   (3.8)

with $\sigma = 1$. (Strictly speaking, $A$, $A^*$, and $F''(x)$, and hence the block matrix on the left-hand side of (3.8), are linear mappings rather than matrices, but the meaning of (3.8) is clear.)

It is well known that when $F$ is a logarithmically homogeneous barrier for $K$, the primal-dual central path is unchanged when the condition $\mu F'(x) + s = 0$ in (3.5) is replaced by $\mu F'_*(s) + x = 0$. Such a replacement gives rise to a "dual linearization" analogous to the "primal linearization" in (3.8). In general these two approaches yield different Newton directions. The advantage of using (3.8) is that evaluation of $F_*$ and its derivatives is not required. On the other hand, the direction obtained from (3.8) is biased in favor of solving the primal barrier problem.

Since we intend for the algorithm stated below to produce iterates staying close to the primal-dual central path while making progress towards the optimal primal-dual solution set, it is necessary to quantify what it means for a point to lie close to the central path. Given $\theta \in (0,1)$, define the $N(\theta)$ neighborhood of the primal-dual central path by

  $N(\theta) := \left\{ (x,w,s) : (x,w,s) \text{ is strongly feasible for (3.1)–(3.2), and } \|s + \mu F'(x)\|^*_{x,F} \le \theta\mu, \text{ where } \mu = \frac{\langle x,s\rangle}{\vartheta} \right\}.$   (3.9)

The neighborhood $N(\theta)$ defined in (3.9) was used in [NT98, Section 6] for optimization over the class of self-scaled cones. In the case that $K$ is the nonnegative orthant and $F(x) = -\sum_i \log(x_i)$ is the standard logarithmic barrier function, we have $\|s + \mu F'(x)\|^*_{x,F} = \|Xs - \mu e\|_2$, so $N(\theta)$ is the familiar $N_2(\theta)$ neighborhood used in linear optimization; see, e.g., [Wri97, p. 9]. Note that points in the set $N(\theta)$ satisfy all conditions in (3.5) with the possible exception of the system of equations $s + \mu F'(x) = 0$, whose residual is small (as measured in the $\|\cdot\|^*_{x,F}$ norm).
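As an illustration of (3.9) in the linear-optimization case, the following sketch (ours; it relies only on the reduction to $\|Xs - \mu e\|_2 \le \theta\mu$ just noted) tests the proximity condition:

```python
import numpy as np

def in_neighborhood(x, s, theta):
    """Proximity part of the N(theta) test (3.9) for K = R^n_+,
    F(x) = -sum(log x): it reduces to ||X s - mu e||_2 <= theta * mu.
    Strong feasibility (Ax = b, A*w + s = c) is assumed checked
    separately."""
    if np.any(x <= 0) or np.any(s <= 0):
        return False
    mu = (x @ s) / x.size            # duality measure (3.7); vartheta = n
    return np.linalg.norm(x * s - mu) <= theta * mu
```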
We will use a primal-dual short-step algorithm to solve (3.1) and (3.2). Short-step algorithms date back to an important paper of Renegar [Ren88], in which a polynomial-time primal algorithm for linear optimization was given. The name "short step" arises from the fact that this class of algorithms generates at each iteration Newton steps that are "short" enough to be feasible without the need for a line search. This is an advantage in conic optimization, since line searches may be expensive and difficult for many classes of cones $K$. The major downside is that such Newton steps are usually too conservative; in practice larger steps are often possible, leading to a faster reduction in the duality measure, and hence a smaller number of iterations.

A standard short-step primal-dual feasible-point algorithm uses (3.8) to compute a sequence of steps for a sequence of $\mu$ values converging to zero. The algorithm uses two parameters. One is $\theta \in (0,1)$, which stipulates the width of the neighborhood $N(\theta)$ inside which the iterates are constrained to lie; the other is a centering parameter $\sigma \in (0,1)$ (see, e.g., [Wri97, p. 8]). We terminate the iteration upon finding an $\epsilon$-optimal solution of (3.1)–(3.2), which is a feasible primal-dual point whose duality measure is no greater than $\epsilon$. We show in Section 4 that by choosing $\theta$ and $\sigma$ appropriately, it can be ensured that all iterates of the algorithm indeed stay in the neighborhood $N(\theta)$, even if $F'(x)$ and $F''(x)$ in (3.8) and (3.9) are not computed exactly, provided close enough approximations are used. Moreover, the duality measure decreases geometrically at each iteration, ensuring convergence to an optimal solution in a reasonable number of iterations.

3.3 An inexact interior-point method

The interior-point method we propose uses inexact barrier function evaluations; i.e., for a given $x \in \mathrm{int}(K)$, the quantities $F'(x)$ and $F''(x)$ in (3.8) are computed inexactly. Denote these estimates by $F_1(x)$ and $F_2(x)$ respectively. Our analysis is independent of the estimation algorithm and depends only on the magnitude of the approximation errors. The core of the algorithm is the following update procedure (cf. (3.8)).

Iteration Short step
Input: $\theta, \sigma \in (0,1)$ and a triple $(x,w,s) \in N(\theta)$.
(1) Let $\mu = \langle x,s\rangle/\vartheta$ and solve the linear system

  $\begin{pmatrix} A & 0 & 0 \\ 0 & A^* & I \\ \mu F_2(x) & 0 & I \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta w \\ \Delta s \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ -\sigma\mu F_1(x) - s \end{pmatrix},$   (3.10)

where $F_2(x)$ is self-adjoint and positive definite.
(2) Set $x^+ = x + \Delta x$, $w^+ = w + \Delta w$, and $s^+ = s + \Delta s$.
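The following is a dense-matrix sketch of Iteration Short step for the linear-optimization case $K = \mathbb{R}^n_+$ (the function name, the dense assembly, and the oracle interface are ours; an efficient implementation would instead eliminate $\Delta x$ and $\Delta s$ and solve a smaller system):

```python
import numpy as np

def short_step_lp(A, x, w, s, sigma, F1, F2):
    """One pass of Iteration Short step for K = R^n_+ (illustrative).
    F1(x) approximates F'(x) = -1/x and F2(x) approximates
    F''(x) = diag(1/x**2); (x, w, s) is assumed strongly feasible."""
    m, n = A.shape
    mu = (x @ s) / n                     # vartheta = n for -sum(log x)
    # Block system (3.10), variables ordered (dx, dw, ds).
    K = np.zeros((m + 2 * n, m + 2 * n))
    K[:m, :n] = A                        # A dx = 0
    K[m:m + n, n:n + m] = A.T            # A* dw + ds = 0
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = mu * F2(x)           # mu F2(x) dx + ds = rhs
    K[m + n:, n + m:] = np.eye(n)
    rhs = np.concatenate([np.zeros(m + n), -sigma * mu * F1(x) - s])
    d = np.linalg.solve(K, rhs)
    return x + d[:n], w + d[n:n + m], s + d[n + m:]

# Exact oracles would be F1 = lambda x: -1.0 / x and
# F2 = lambda x: np.diag(1.0 / x**2); inexact ones may deviate within
# the error bounds (4.1)-(4.2) introduced in Section 4.
```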
Remark 3.3. Iteration Short step requires the availability of an initial triple $(x,w,s)$ in $N(\theta)$. This requirement is inherited from other short-step methods. One method of finding such a point is to repeatedly perform Iteration Short step with $\sigma = 1$, but with $\mu$ kept constant at, say, its initial value. In other words, we seek to move directly towards a point on the central path. However, to our knowledge, even given a strictly feasible initial triple and using exact computation of the gradient and Hessian of the barrier function, no algorithm has been proposed that reaches an appropriate neighborhood of the central path within a polynomial iteration bound independent of the nearness of the initial point to the boundary of the cone. Accordingly, this question requires further study beyond the scope of this paper.

A noteworthy feature of Iteration Short step is that its implementation does not require evaluating, or even estimating, numerical values of the dual barrier function $F_*$ or its derivatives. For suitable values of $\theta$ and $\sigma$, and with $F_1(x)$ and $F_2(x)$ taken to be $F'(x)$ and $F''(x)$ respectively, it is known (at least in the SDP case) to produce an algorithm with polynomial iteration complexity; see, e.g., [BTN01, Section 6.5]. As we show below, such a property is preserved when one uses inexact evaluations of the derivatives of $F$, which can yield a substantial computational advantage.

We now study the effect of errors in our estimates of $F'(x)$ and $F''(x)$ on the convergence of the interior-point method.

4 Convergence analysis of the inexact algorithm

In this section we prove that the inexact algorithm produces a strongly feasible set of iterates and converges in a polynomial number of iterations.

4.1 Feasibility and convergence of the iterates

Denote the errors in the known estimates of the gradient and Hessian of $F(x)$ by

  $E_1(x) = F_1(x) - F'(x), \qquad E_2(x) = F_2(x) - F''(x).$

By definition $E_2(x)$ is a self-adjoint linear mapping. Throughout this section, we will assume that the errors $E_1(x)$ and $E_2(x)$ are "small enough". Specifically, for some $\epsilon_1$ and $\epsilon_2$,

  $\frac{\|E_1(x)\|^*_{x,F}}{\|F'(x)\|^*_{x,F}} = \frac{\|E_1(x)\|^*_{x,F}}{\vartheta^{1/2}} \le \epsilon_1 < 1 \qquad \forall x \in \mathrm{int}(K),$   (4.1)

  $\|F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}\| \le \epsilon_2 < 1 \qquad \forall x \in \mathrm{int}(K).$   (4.2)

The quantity $\|E_1(x)\|^*_{x,F}/\|F'(x)\|^*_{x,F}$ is the relative error in $F_1(x)$, measured in the $\|\cdot\|^*_{x,F}$ norm, and $\|F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}\|$ is the absolute error in $F_2(x)$ relative to the Hessian of $F$. It is also an upper bound on the relative error in $F_2(x)$, measured in the norm $\|\cdot\|$:

  $\frac{\|E_2(x)\|}{\|F''(x)\|} = \frac{\|F''(x)^{1/2}\,F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}\,F''(x)^{1/2}\|}{\|F''(x)\|} \le \frac{\|F''(x)^{1/2}\|\,\|F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}\|\,\|F''(x)^{1/2}\|}{\|F''(x)\|} = \frac{\|F''(x)^{1/2}\|^2}{\|F''(x)\|}\,\|F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}\| = \|F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}\|.$

The last equality follows from the relation $\|B^2\| = \lambda_{\max}(B^2) = \lambda_{\max}(B)^2 = \|B\|^2$ for a positive semidefinite self-adjoint operator $B$, where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue.

We now show that under the assumption in (4.2), the eigenvalues of $F_2(x)$ are close to those of $F''(x)$. We will use $\preceq$ to denote the partial ordering with respect to the cone of positive semidefinite matrices; viz., for symmetric matrices $M_1$ and $M_2$, $M_1 \preceq M_2$ if and only if $M_2 - M_1$ is positive semidefinite.

Lemma 4.1. Suppose that $F''$ and its estimate $F_2$ satisfy (4.2). Then

  $(1-\epsilon_2)F''(x) \preceq F_2(x) \preceq (1+\epsilon_2)F''(x) \qquad \forall x \in \mathrm{int}(K).$

Moreover, $F_2(x)$ is positive definite for all $x \in \mathrm{int}(K)$.

Proof. The assumption in (4.2) and the definition of a norm induced by an inner product imply that for all $x \in \mathrm{int}(K)$, each eigenvalue of the self-adjoint operator $F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}$ is bounded in magnitude by $\epsilon_2$, so

  $-\epsilon_2 I \preceq F''(x)^{-1/2}E_2(x)F''(x)^{-1/2} \preceq \epsilon_2 I.$

Multiplying this matrix inequality on the left and right by the positive definite matrix $F''(x)^{1/2}$ preserves the partial ordering $\preceq$:

  $-\epsilon_2 F''(x) \preceq E_2(x) \preceq \epsilon_2 F''(x).$

Adding $F''(x)$ to each quantity in this matrix inequality gives the required result. Since $F''(x)$ is positive definite and $\epsilon_2 < 1$, it follows that $F_2(x)$ is positive definite as well.

Since $F_2(x)$ is positive definite for all $x \in \mathrm{int}(K)$, it follows that $F_2(x)^{-1}$ is well defined. Due to this and $A$ being onto (Assumption 3.1), (3.10) has a unique solution.
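In the LP case these error measures take a simple computable form, since $\|v\|^*_{x,F} = \|Xv\|_2$, $F''(x)^{-1/2} = X = \mathrm{diag}(x)$, and $\vartheta = n$; the following sketch (ours) evaluates the left-hand sides of (4.1) and (4.2):

```python
import numpy as np

def gradient_rel_error(x, F1_val):
    """Left-hand side of (4.1): ||X E1||_2 / sqrt(n), with
    E1 = F1(x) - F'(x) and F'(x) = -1/x."""
    E1 = F1_val - (-1.0 / x)
    return np.linalg.norm(x * E1) / np.sqrt(x.size)

def hessian_rel_error(x, F2_val):
    """Left-hand side of (4.2): the spectral norm of X E2 X, with
    E2 = F2(x) - F''(x) and F''(x) = diag(1/x**2)."""
    E2 = F2_val - np.diag(1.0 / x**2)
    X = np.diag(x)
    return np.linalg.norm(X @ E2 @ X, 2)
```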
The outline for the remainder of this section is as follows. After giving some technical results, we prove that for a primal-dual iterate $(x,w,s)$ lying inside a neighborhood $N(\theta)$ of the central path, the Newton steps $\Delta x$ and $\Delta s$ in (3.10) are bounded by a constant and a constant times the duality measure $\mu$, respectively, in the appropriate norms (Lemma 4.3 and Corollary 4.4). We then show that if these constants are appropriately bounded, then a full Newton step produces a strongly feasible primal-dual point (Lemma 4.5). Next we derive lower and upper bounds on the rate of decrease of the duality measure at each iteration (Lemma 4.6). The lower bound is used to show that all iterates stay in the $N(\theta)$ neighborhood of the central path (Lemmas 4.7, 4.8, and 4.9). Thus Iteration Short step can be repeated ad infinitum, and the resulting algorithm belongs to the class of "path-following" methods. It also belongs to the class of feasible-point algorithms: all iterates are (strongly) feasible. Finally, using the upper bound on the rate of decrease of the duality measure, we show that the algorithm has polynomial iteration complexity.

Since $F''(x)$ and $F_2(x)$ are positive definite on $\mathrm{int}(K)$, the positive definite square roots of these two matrices are well defined. Define

  $D(x) = F''(x)^{1/2}F_2(x)^{-1/2} = F''(x)^{1/2}\left(F''(x) + E_2(x)\right)^{-1/2}$   (4.3)

for all $x \in \mathrm{int}(K)$. In this section we will use the following implication of the Cauchy–Schwarz inequality: given $x \in \mathrm{int}(K)$ and $y, z \in \mathbb{R}^n$, $|\langle y,z\rangle| \le \|y\|^*_{x,F}\,\|z\|_{x,F}$.

The following technical results will be used in our analysis.

Lemma 4.2. Suppose that $F_1$ and $F_2$ satisfy (4.1) and (4.2), and let $x \in \mathrm{int}(K)$ and $z \in \mathbb{R}^n$. Let $D(x)$ be defined as in (4.3). Then

  $\|D(x)\|^2 \le \frac{1}{1-\epsilon_2},$   (4.4a)
  $\|D(x)^{-1}\|^2 \le 1+\epsilon_2,$   (4.4b)
  $|\langle F'(x), z\rangle| \le \vartheta^{1/2}\|z\|_{x,F},$   (4.4c)
  $|\langle E_1(x), z\rangle| \le \epsilon_1\vartheta^{1/2}\|z\|_{x,F},$   (4.4d)
  $|\langle E_1(x), x\rangle| \le \epsilon_1\vartheta,$   (4.4e)
  $|\langle x, E_2(x)z\rangle| \le \epsilon_2\vartheta^{1/2}\|z\|_{x,F}.$   (4.4f)

Proof. First, since $\|D(x)\|$ is the square root of the largest eigenvalue of $D(x)D(x)^*$, we have

  $\|D(x)\|^2 = \|D(x)D(x)^*\| = \|F''(x)^{1/2}(F''(x) + E_2(x))^{-1}F''(x)^{1/2}\| = \|(I + F''(x)^{-1/2}E_2(x)F''(x)^{-1/2})^{-1}\| \le \frac{1}{1-\|F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}\|} \le \frac{1}{1-\epsilon_2},$

where the inequalities follow from (4.2). This proves (4.4a). Next, the invertible mapping $D$ satisfies

  $\|D(x)^{-1}\|^2 = \|D(x)^{-*}D(x)^{-1}\| = \|F''(x)^{-1/2}(F''(x) + E_2(x))F''(x)^{-1/2}\| = \|I + F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}\| \le 1 + \|F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}\| \le 1 + \epsilon_2,$

where the inequalities again follow from (4.2), proving (4.4b). Further,

  $|\langle F'(x), z\rangle| \le \|F'(x)\|^*_{x,F}\,\|z\|_{x,F} = \vartheta^{1/2}\|z\|_{x,F},$

where we have used Lemma 2.5(e), proving (4.4c). Now, using (4.1), we have

  $|\langle E_1(x), z\rangle| \le \|E_1(x)\|^*_{x,F}\,\|z\|_{x,F} \le \epsilon_1\vartheta^{1/2}\|z\|_{x,F},$

proving (4.4d). The inequality in (4.4e) follows from (4.4d) with $z = x$, where Lemma 2.5(d) has also been used (note $\|x\|_{x,F} = \langle F''(x)x, x\rangle^{1/2} = \vartheta^{1/2}$). Finally, using Lemma 2.5(d) and (4.2), we have

  $|\langle x, E_2(x)z\rangle| \le \|E_2(x)x\|^*_{x,F}\,\|z\|_{x,F} = \|F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}\,F''(x)^{1/2}x\|\,\|z\|_{x,F} \le \|F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}\|\,\|F''(x)^{1/2}x\|\,\|z\|_{x,F} \le \epsilon_2\vartheta^{1/2}\|z\|_{x,F},$

proving (4.4f).

Define the following three constants, which depend on the IPM parameters $\theta$ and $\sigma$, the maximum errors $\epsilon_1$ and $\epsilon_2$, and the complexity parameter $\vartheta \ge 1$:
  $\gamma_0 := \frac{\theta + (1-\sigma+\sigma\epsilon_1)\vartheta^{1/2}}{(1-\epsilon_2)^{1/2}}, \qquad \gamma_1 := \frac{\gamma_0}{(1-\epsilon_2)^{1/2}}, \qquad \gamma_2 := \max\left\{\gamma_1,\ \frac{\gamma_0(1+\epsilon_2)^{1/2}}{1-\theta}\right\}.$   (4.5)

Lemma 4.3. Suppose that $F_1$ and $F_2$ satisfy (4.1) and (4.2). Let $\theta, \sigma \in (0,1)$, and suppose that $(x,w,s) \in N(\theta)$ for some $w$, and that $x, s, \Delta x$, and $\Delta s$ satisfy (3.10) for some $\Delta w$. Then

  $\mu^2\|F_2(x)^{1/2}\Delta x\|^2 + \|F_2(x)^{-1/2}\Delta s\|^2 \le \mu^2\gamma_0^2.$

Proof. Premultiplying the third block equation in (3.10) by $(\mu F_2(x))^{-1/2}$ yields

  $(\mu F_2(x))^{1/2}\Delta x + (\mu F_2(x))^{-1/2}\Delta s = -(\mu F_2(x))^{-1/2}(\sigma\mu F_1(x) + s).$   (4.6)

It is seen from (3.10) that $\Delta x$ lies in the nullspace of $A$, while $\Delta s$ lies in the range of $A^*$. Therefore $\Delta x$ is orthogonal to $\Delta s$. Taking the square of the norm of each side of (4.6) and using this orthogonality, we obtain

  $\|(\mu F_2(x))^{1/2}\Delta x\|^2 + \|(\mu F_2(x))^{-1/2}\Delta s\|^2 = \|(\mu F_2(x))^{-1/2}(\sigma\mu F_1(x) + s)\|^2.$

Multiplying this equation by $\mu$ yields

  $\mu^2\|F_2(x)^{1/2}\Delta x\|^2 + \|F_2(x)^{-1/2}\Delta s\|^2 = \|F_2(x)^{-1/2}(\sigma\mu F_1(x) + s)\|^2.$   (4.7)

Let us now bound the right-hand side of (4.7). In the following, (4.8a) follows from (4.3), and (4.8b) follows from (4.4a). The inequality in (4.8c) follows from $(x,w,s) \in N(\theta)$, (4.1), and Lemma 2.5(e):

  $\|F_2(x)^{-1/2}(\sigma\mu F_1(x) + s)\| = \|D(x)^*F''(x)^{-1/2}(\sigma\mu F_1(x) + s)\|$   (4.8a)
  $\le \frac{1}{(1-\epsilon_2)^{1/2}}\,\|\sigma\mu F_1(x) + s\|^*_{x,F}$   (4.8b)
  $= \frac{1}{(1-\epsilon_2)^{1/2}}\,\|(s + \mu F'(x)) + \sigma\mu E_1(x) - \mu(1-\sigma)F'(x)\|^*_{x,F}$
  $\le \frac{1}{(1-\epsilon_2)^{1/2}}\left(\|s + \mu F'(x)\|^*_{x,F} + \sigma\mu\|E_1(x)\|^*_{x,F} + \mu(1-\sigma)\|F'(x)\|^*_{x,F}\right)$
  $\le \frac{1}{(1-\epsilon_2)^{1/2}}\left(\theta\mu + \sigma\mu\epsilon_1\vartheta^{1/2} + \mu(1-\sigma)\vartheta^{1/2}\right)$   (4.8c)
  $= \mu\gamma_0.$

Combining this with (4.7) yields the required result.

Corollary 4.4. Suppose that $F_1$ and $F_2$ satisfy (4.1) and (4.2). Let $\theta, \sigma \in (0,1)$, and suppose that $(x,w,s) \in N(\theta)$ for some $w$, and that $x, s, \Delta x$, and $\Delta s$ satisfy (3.10) for some $\Delta w$. Then

  $\|F_2(x)^{1/2}\Delta x\| \le \gamma_0, \qquad \|F_2(x)^{-1/2}\Delta s\| \le \mu\gamma_0, \qquad \|\Delta x\|_{x,F} \le \gamma_1, \qquad \|\Delta s\|^*_{x,F} \le (1+\epsilon_2)^{1/2}\mu\gamma_0.$

Proof. The first two bounds follow immediately from Lemma 4.3. Further, using (4.3) and (4.4a), we have

  $\|\Delta x\|_{x,F} = \|D(x)F_2(x)^{1/2}\Delta x\| \le \|D(x)\|\,\|F_2(x)^{1/2}\Delta x\| \le (1-\epsilon_2)^{-1/2}\gamma_0 = \gamma_1.$

This proves the third inequality. To prove the last, similarly use (4.3) and (4.4b) to obtain

  $\|\Delta s\|^*_{x,F} = \|D(x)^{-*}F_2(x)^{-1/2}\Delta s\| \le (1+\epsilon_2)^{1/2}\|F_2(x)^{-1/2}\Delta s\| \le (1+\epsilon_2)^{1/2}\mu\gamma_0.$

Iteration Short step takes a full primal-dual Newton step from a triple $(x,w,s) \in N(\theta)$. In the next series of lemmas we show, under a condition on the parameters $\theta, \sigma, \epsilon_1$, and $\epsilon_2$, not only that such a step is strongly feasible, justifying step (2) in Iteration Short step, but further, that the new iterate remains in the $N(\theta)$ neighborhood of the central path, so Iteration Short step can be repeated ad infinitum. We also characterize how the duality measure changes after taking a full Newton step. The following condition, in which the dependence of $\gamma_2$ on $\theta, \sigma, \epsilon_1, \epsilon_2$, and $\vartheta$ is emphasized, will play a key role:

  $\gamma_2(\theta,\sigma,\epsilon_1,\epsilon_2,\vartheta) < 1.$   (4.9)

Lemma 4.5. Suppose that $F_1$ and $F_2$ satisfy (4.1) and (4.2). Let $\theta, \sigma, \epsilon_1$, and $\epsilon_2$ be such that (4.9) holds, and let $(x,w,s) \in N(\theta)$. Then the point $(x^+, w^+, s^+)$ generated by Iteration Short step is strongly feasible.

Proof.
The equality constraints in (3.1) and (3.2) are satisfied by $(x^+, w^+, s^+)$, since they are satisfied by $(x,w,s)$ (by virtue of this triple lying in $N(\theta)$), and any step from $(x,w,s)$ in the direction $(\Delta x, \Delta w, \Delta s)$ will satisfy these constraints due to the first two block equations in (3.10). We now show that $x^+ \in \mathrm{int}(K)$ and $s^+ \in \mathrm{int}(K^*)$. Since $F$ is a nondegenerate self-concordant barrier (Assumption 3.2), Lemma 2.1(b) is applicable: if $\|\Delta x\|_{x,F} < 1$, then $x + \Delta x \in \mathrm{int}(K)$. By Corollary 4.4, a sufficient condition for $\|\Delta x\|_{x,F} < 1$ is $\gamma_1 < 1$, and this holds since $\gamma_2 < 1$. Similarly, in light of Lemma 2.2(b), if $\|\Delta s\|_{s,F_*} < 1$, then $s + \Delta s \in \mathrm{int}(K^*)$. It remains to show that $\|\Delta s\|_{s,F_*} < 1$ under the assumption (4.9). Since $(x,w,s) \in N(\theta)$, we may apply Lemma 2.7 with $h = \Delta s$ to obtain

  $\|\Delta s\|_{s,F_*} \le \frac{\|\Delta s\|^*_{x,F}}{(1-\theta)\mu} \le \frac{(1+\epsilon_2)^{1/2}\gamma_0}{1-\theta},$

where the last inequality follows from Corollary 4.4. By the definition of $\gamma_2$ in (4.5), we have $\|\Delta s\|_{s,F_*} \le \gamma_2 < 1$.

It follows from Lemma 4.5 that a full step in the Newton direction for (3.10) is strongly feasible. We now study the change in the duality measure after taking a full primal-dual Newton step.

Lemma 4.6. Suppose that $F_1$ and $F_2$ satisfy (4.1) and (4.2). Let $\theta, \sigma, \epsilon_1$, and $\epsilon_2$ be such that (4.9) holds, and let $(x,w,s) \in N(\theta)$. Then the duality measure $\mu^+$ of $(x^+, w^+, s^+)$ (see (3.7)) in Iteration Short step satisfies

  $\lambda\mu \le \mu^+ \le \bar\lambda\mu,$   (4.10)

where

  $\lambda = \sigma - \sigma\epsilon_1 - \frac{1-\sigma+\sigma\epsilon_1+\epsilon_2}{\vartheta^{1/2}}\gamma_1 - \frac{1-\epsilon_2}{\vartheta}\gamma_1^2,$   (4.11)

  $\bar\lambda = \sigma + \sigma\epsilon_1 + \frac{1-\sigma+\sigma\epsilon_1+\epsilon_2}{\vartheta^{1/2}}\tau - \frac{1-\epsilon_2}{\vartheta}\tau^2,$   (4.12)

with

  $\tau = \min\left\{\gamma_1,\ \frac{(1-\sigma+\sigma\epsilon_1+\epsilon_2)\vartheta^{1/2}}{2(1-\epsilon_2)}\right\}.$   (4.13)

Proof. Recalling that for the direction obtained from (3.10), $\Delta x$ is orthogonal to $\Delta s$, we have

  $\vartheta\mu^+ = \langle x^+, s^+\rangle = \langle x + \Delta x, s + \Delta s\rangle = \langle x, s + \Delta s\rangle + \langle \Delta x, s\rangle.$   (4.14)

From the third block equation in (3.10), we obtain

  $s + \Delta s = -\sigma\mu F_1(x) - \mu F_2(x)\Delta x,$   (4.15)

giving

  $\langle x, s + \Delta s\rangle = -\langle x, \sigma\mu F_1(x) + \mu F_2(x)\Delta x\rangle = -\sigma\mu\langle x, F'(x) + E_1(x)\rangle - \mu\langle x, (F''(x) + E_2(x))\Delta x\rangle = \sigma\mu\vartheta - \sigma\mu\langle x, E_1(x)\rangle + \mu\langle F'(x), \Delta x\rangle - \mu\langle x, E_2(x)\Delta x\rangle,$   (4.16)

where we have used Lemma 2.5(b),(c). Since $\langle \Delta x, \Delta s\rangle = 0$, it also follows from (4.15) that

  $\langle \Delta x, s\rangle = \langle \Delta x, -\mu F_2(x)\Delta x - \sigma\mu F_1(x)\rangle = -\mu\langle \Delta x, F_2(x)\Delta x\rangle - \sigma\mu\langle F'(x) + E_1(x), \Delta x\rangle.$   (4.17)

Combining (4.14), (4.16), and (4.17), we have

  $\vartheta\mu^+ = \sigma\mu\vartheta - \sigma\mu\langle x, E_1(x)\rangle + (1-\sigma)\mu\langle F'(x), \Delta x\rangle - \sigma\mu\langle E_1(x), \Delta x\rangle - \mu\langle x, E_2(x)\Delta x\rangle - \mu\langle \Delta x, F_2(x)\Delta x\rangle,$

i.e.,

  $\frac{\mu^+}{\mu} = \sigma - \frac{\sigma}{\vartheta}\langle x, E_1(x)\rangle + \frac{1-\sigma}{\vartheta}\langle F'(x), \Delta x\rangle - \frac{\sigma}{\vartheta}\langle E_1(x), \Delta x\rangle - \frac{1}{\vartheta}\langle x, E_2(x)\Delta x\rangle - \frac{1}{\vartheta}\langle \Delta x, F_2(x)\Delta x\rangle.$   (4.18)

To reduce clutter, let $t := \|\Delta x\|_{x,F}$. In view of Lemma 4.1, we have $\langle \Delta x, F_2(x)\Delta x\rangle \ge (1-\epsilon_2)t^2$. Using this inequality and appealing to Lemma 4.2, we obtain the following upper bound on $\mu^+/\mu$:

  $\frac{\mu^+}{\mu} \le \sigma + \frac{\sigma}{\vartheta}|\langle x, E_1(x)\rangle| + \frac{1-\sigma}{\vartheta}|\langle F'(x), \Delta x\rangle| + \frac{\sigma}{\vartheta}|\langle E_1(x), \Delta x\rangle| + \frac{1}{\vartheta}|\langle x, E_2(x)\Delta x\rangle| - \frac{1-\epsilon_2}{\vartheta}t^2 \le \sigma + \sigma\epsilon_1 + \frac{1-\sigma}{\vartheta^{1/2}}t + \frac{\sigma\epsilon_1}{\vartheta^{1/2}}t + \frac{\epsilon_2}{\vartheta^{1/2}}t - \frac{1-\epsilon_2}{\vartheta}t^2 =: u(t).$

It follows from Corollary 4.4 that $t \le \gamma_1$, so a uniform upper bound on $\mu^+/\mu$ is $\max\{u(t) : 0 \le t \le \gamma_1\}$. The unconstrained maximizer of $u$ is

  $t^* = \frac{(1-\sigma+\sigma\epsilon_1+\epsilon_2)\vartheta^{1/2}}{2(1-\epsilon_2)}.$

If this nonnegative point satisfies $t^* \le \gamma_1$, then the constrained maximum of $u(t)$ is $u(t^*)$; otherwise the maximum is $u(\gamma_1)$. Hence $\mu^+/\mu \le u(\min\{\gamma_1, t^*\})$, and the latter bound is $\bar\lambda$ as given in (4.12) and (4.13). This proves the second inequality in (4.10).

In view of Corollary 4.4, $\langle \Delta x, F_2(x)\Delta x\rangle \le \gamma_0^2 = (1-\epsilon_2)\gamma_1^2$, so from (4.18) we can also obtain a lower bound on $\mu^+/\mu$:
  $\frac{\mu^+}{\mu} \ge \sigma - \sigma\epsilon_1 - \frac{1-\sigma}{\vartheta^{1/2}}t - \frac{\sigma\epsilon_1}{\vartheta^{1/2}}t - \frac{\epsilon_2}{\vartheta^{1/2}}t - \frac{1-\epsilon_2}{\vartheta}\gamma_1^2 = \sigma - \sigma\epsilon_1 - \frac{1-\epsilon_2}{\vartheta}\gamma_1^2 - \left(\frac{1-\sigma+\sigma\epsilon_1+\epsilon_2}{\vartheta^{1/2}}\right)t \ge \sigma - \sigma\epsilon_1 - \frac{1-\epsilon_2}{\vartheta}\gamma_1^2 - \left(\frac{1-\sigma+\sigma\epsilon_1+\epsilon_2}{\vartheta^{1/2}}\right)\gamma_1 = \lambda,$

proving the first inequality in (4.10).

Lemma 4.7. The quantities $\lambda$ and $\bar\lambda$ defined in Lemma 4.6 satisfy $\lambda < \sigma < \bar\lambda$.

Proof. It is clear from (4.11) that $\lambda < \sigma$. The inequality $\sigma < \bar\lambda$ is easily seen after rewriting $\bar\lambda$ as

  $\bar\lambda = \sigma + \sigma\epsilon_1 + \frac{\tau(1-\epsilon_2)}{\vartheta}\left[\frac{(1-\sigma+\sigma\epsilon_1+\epsilon_2)\vartheta^{1/2}}{1-\epsilon_2} - \tau\right],$

and noting that the term in the square brackets must be positive, due to (4.13).

We will now show that condition (4.9), together with the following two conditions, where dependence on $\theta$, $\sigma$, $\epsilon_1$, $\epsilon_2$, and $\vartheta$ is again emphasized, ensures that the next iterate lies in $N(\theta)$:

  $\lambda(\theta,\sigma,\epsilon_1,\epsilon_2,\vartheta) > 0,$   (4.19)

  $\theta^+(\theta,\sigma,\epsilon_1,\epsilon_2,\vartheta)$ is well defined, and $\theta^+(\theta,\sigma,\epsilon_1,\epsilon_2,\vartheta) \le \theta,$   (4.20)

where

  $\theta^+ := \frac{\sigma\epsilon_1\vartheta^{1/2} + \epsilon_2\gamma_1}{\lambda(1-\gamma_1)} + \left(\frac{\sigma}{\lambda} - 1\right)\vartheta^{1/2} + \frac{(1-\sigma)\gamma_1}{\lambda(1-\gamma_1)} + \frac{2\sigma}{\lambda}\left(\log(1-\gamma_1) + \frac{\gamma_1}{1-\gamma_1}\right).$   (4.21)

The next lemma will prove useful in the sequel.

Lemma 4.8. If $\theta, \sigma, \epsilon_1, \epsilon_2$, and $\vartheta$ satisfy conditions (4.19) and (4.20), then $\gamma_1, \gamma_2 < 1$.

Proof. If $\theta^+$ is well defined, then clearly $\gamma_1 < 1$. Moreover, for all $\gamma_1 \in (0,1)$, $\log(1-\gamma_1) + \frac{\gamma_1}{1-\gamma_1} > 0$. It follows from Lemma 4.7 and (4.19) that $\sigma/\lambda > 1$. Hence

  $2\left(\log(1-\gamma_1) + \frac{\gamma_1}{1-\gamma_1}\right) < \frac{2\sigma}{\lambda}\left(\log(1-\gamma_1) + \frac{\gamma_1}{1-\gamma_1}\right) < \theta^+ \le \theta < \gamma_1,$   (4.22)

where the final inequality holds since, by (4.5), $\gamma_1 \ge \gamma_0 > \theta$. The inequality $2\log(1-\gamma_1) + \frac{2\gamma_1}{1-\gamma_1} < \gamma_1$ implies $\gamma_1 < 0.46$. Now from (4.5) we have

  $\gamma_2 = \max\left\{\gamma_1,\ \frac{\gamma_0(1+\epsilon_2)^{1/2}}{1-\theta}\right\} = \gamma_1\max\left\{1,\ \frac{(1-\epsilon_2^2)^{1/2}}{1-\theta}\right\} \le \frac{\gamma_1}{1-\theta} < \frac{\gamma_1}{1-\gamma_1} < \frac{0.46}{1-0.46} < 1.$

Lemma 4.9. Let $\theta, \sigma, \epsilon_1, \epsilon_2$, and $\vartheta$ satisfy conditions (4.19) and (4.20). If $(x,w,s) \in N(\theta)$, then Iteration Short step produces a triple $(x^+, w^+, s^+) \in N(\theta)$.

Proof. By Lemma 4.8, $\gamma_1, \gamma_2 < 1$, so (4.9) holds. Applying Lemma 4.5, we conclude that $(x^+, w^+, s^+)$ is strongly feasible. It remains to show that

  $\|s + \mu F'(x)\|^*_{x,F} \le \theta\mu \implies \|s^+ + \mu^+F'(x^+)\|^*_{x^+,F} \le \theta\mu^+.$   (4.23)

From the third block equation in the linear system (3.10), we have

  $s^+ + \mu^+F'(x^+) = -\sigma\mu F_1(x) - \mu F_2(x)\Delta x + \mu^+F'(x^+) = D_1 + D_2 + D_3,$   (4.24)

where

  $D_1 = -\sigma\mu E_1(x) - \mu E_2(x)\Delta x,$
  $D_2 = (\mu^+ - \sigma\mu)F'(x^+) + \mu(\sigma-1)F''(x)\Delta x,$
  $D_3 = \sigma\mu\left(F'(x^+) - F'(x) - F''(x)\Delta x\right).$

We now appropriately bound the norms of $D_1$, $D_2$, and $D_3$. First we bound $\|D_1\|^*_{x^+,F}$. Since $\|x^+ - x\|_{x,F} = \|\Delta x\|_{x,F} \le \gamma_1$ (Corollary 4.4) and $\gamma_1 < 1$, we can use (2.6) to bound $\|D_1\|^*_{x^+,F}$ in terms of $\|D_1\|^*_{x,F}$; the result is (4.25a). The inequality (4.25b) follows from the definition of $D_1$, (4.25c) follows from (4.1), and (4.25d) follows from (4.2) and Corollary 4.4:

  $\|D_1\|^*_{x^+,F} \le \frac{1}{1-\gamma_1}\|D_1\|^*_{x,F}$   (4.25a)
  $\le \frac{1}{1-\gamma_1}\left(\sigma\mu\|E_1(x)\|^*_{x,F} + \mu\|E_2(x)\Delta x\|^*_{x,F}\right)$   (4.25b)
  $\le \frac{1}{1-\gamma_1}\left(\sigma\mu\epsilon_1\vartheta^{1/2} + \mu\|F''(x)^{-1/2}E_2(x)F''(x)^{-1/2}\|\,\|F''(x)^{1/2}\Delta x\|\right)$   (4.25c)
  $\le \frac{1}{1-\gamma_1}\left(\sigma\mu\epsilon_1\vartheta^{1/2} + \mu\epsilon_2\gamma_1\right).$   (4.25d)

Now in Lemma 4.6 we established that $\lambda\mu \le \mu^+$, where $\lambda$ is given in (4.11). Moreover, by assumption, $\lambda > 0$. It follows that

  $\|D_1\|^*_{x^+,F} \le \frac{\sigma\epsilon_1\vartheta^{1/2} + \epsilon_2\gamma_1}{\lambda(1-\gamma_1)}\,\mu^+ =: d_1\mu^+.$   (4.26)
Next we bound $\|D_2\|^*_{x^+,F}$. First note that for $\lambda$ and $\bar\lambda$ given in (4.11) and (4.12),

  $\lambda + \bar\lambda = 2\sigma + (\tau - \gamma_1)\frac{1-\sigma+\sigma\epsilon_1+\epsilon_2}{\vartheta^{1/2}} - \frac{1-\epsilon_2}{\vartheta}(\tau^2 + \gamma_1^2) < 2\sigma,$

where the inequality follows from $\tau \le \gamma_1$; see (4.13). It follows that $\lambda\bar\lambda \le \left(\frac{\lambda+\bar\lambda}{2}\right)^2 < \sigma^2$. Using this, the fact that $\lambda > 0$, and the result $\sigma < \bar\lambda$ from Lemma 4.7, we have

  $\frac{\sigma}{\lambda} - 1 > \frac{\bar\lambda}{\sigma} - 1 = \frac{\bar\lambda - \sigma}{\sigma} > \frac{\bar\lambda - \sigma}{\bar\lambda} = 1 - \frac{\sigma}{\bar\lambda} > 0.$

Hence

  $\max_{\lambda\mu \le \mu^+ \le \bar\lambda\mu}\left|1 - \frac{\sigma\mu}{\mu^+}\right| = \max\left\{\frac{\sigma}{\lambda} - 1,\ 1 - \frac{\sigma}{\bar\lambda}\right\} = \frac{\sigma}{\lambda} - 1.$   (4.27)

In the following, (4.28a) follows from the definition of $D_2$, and (4.28b) follows from Lemma 2.5(e) and (2.6). Furthermore, (4.28c) follows from (4.27) and Corollary 4.4, noting that $\|F''(x)\Delta x\|^*_{x,F} = \|\Delta x\|_{x,F} \le \gamma_1$:

  $\|D_2\|^*_{x^+,F} \le |\mu^+ - \sigma\mu|\,\|F'(x^+)\|^*_{x^+,F} + \mu(1-\sigma)\|F''(x)\Delta x\|^*_{x^+,F}$   (4.28a)
  $\le \mu^+\left|1 - \frac{\sigma\mu}{\mu^+}\right|\vartheta^{1/2} + \mu(1-\sigma)\frac{\|F''(x)\Delta x\|^*_{x,F}}{1 - \|\Delta x\|_{x,F}}$   (4.28b)
  $\le \mu^+\left(\frac{\sigma}{\lambda} - 1\right)\vartheta^{1/2} + \frac{\mu^+}{\lambda}(1-\sigma)\frac{\gamma_1}{1-\gamma_1}$   (4.28c)
  $=: d_2\mu^+.$   (4.28d)

Finally, we bound $\|D_3\|^*_{x^+,F}$. In what follows, we will be working with integrals of vectors and matrices; all such integrals are meant componentwise. Applying the Fundamental Theorem of Calculus to $F'$, then to $F''$, yields

  $D_3 = \sigma\mu\left(\int_0^1 F''(x + t\Delta x)\,dt - F''(x)\right)\Delta x = \sigma\mu\int_0^1\!\!\int_0^t F'''(x + u\Delta x)[\Delta x, \Delta x]\,du\,dt,$

giving

  $\|D_3\|^*_{x^+,F} \le \sigma\mu\int_0^1\!\!\int_0^t \|F'''(x + u\Delta x)[\Delta x, \Delta x]\|^*_{x^+,F}\,du\,dt.$

Since $\gamma_1 < 1$, Corollary 2.4 can be applied, yielding

  $\|F'''(x + u\Delta x)[\Delta x, \Delta x]\|^*_{x^+,F} \le \frac{2\gamma_1^2}{(1-\gamma_1)(1-u\gamma_1)},$

and

  $\|D_3\|^*_{x^+,F} \le \sigma\mu\int_0^1\!\!\int_0^t \frac{2\gamma_1^2}{(1-\gamma_1)(1-u\gamma_1)}\,du\,dt = \sigma\mu\left(2\log(1-\gamma_1) + \frac{2\gamma_1}{1-\gamma_1}\right)$   (4.29)
  $\le \frac{2\sigma}{\lambda}\left(\log(1-\gamma_1) + \frac{\gamma_1}{1-\gamma_1}\right)\mu^+ =: d_3\mu^+.$   (4.30)

(Our bound of $2\log(1-\gamma_1) + \frac{2\gamma_1}{1-\gamma_1} = \gamma_1^2 + \frac{4}{3}\gamma_1^3 + \cdots$ on $\frac{1}{\sigma\mu}\|D_3\|^*_{x^+,F}$ in (4.29) can be improved in the case that $K$ is a self-scaled cone: in [NT97, Theorem 4.3] a bound of no greater than $\gamma_1^2$ is derived using special properties of self-scaled barriers and self-scaled cones.)

Combining the bounds in (4.26), (4.28d), and (4.30) with (4.24), we obtain

  $\|s^+ + \mu^+F'(x^+)\|^*_{x^+,F} \le \|D_1\|^*_{x^+,F} + \|D_2\|^*_{x^+,F} + \|D_3\|^*_{x^+,F} \le \mu^+\left(d_1(\vartheta) + d_2(\vartheta) + d_3(\vartheta)\right),$

where

  $d_1(\vartheta) := \frac{\sigma\epsilon_1\vartheta^{1/2} + \epsilon_2\gamma_1}{\lambda(1-\gamma_1)},$   (4.31a)
  $d_2(\vartheta) := \left(\frac{\sigma}{\lambda} - 1\right)\vartheta^{1/2} + \frac{(1-\sigma)\gamma_1}{\lambda(1-\gamma_1)},$   (4.31b)
  $d_3(\vartheta) := \frac{2\sigma}{\lambda}\left(\log(1-\gamma_1) + \frac{\gamma_1}{1-\gamma_1}\right).$   (4.31c)

From (4.21), we see that $\theta^+ = d_1(\vartheta) + d_2(\vartheta) + d_3(\vartheta)$. Since $\theta^+ \le \theta$ by (4.20), the proof is complete.

The significance of $\theta^+$ is that it specifies a neighborhood $N(\theta^+)$ of the central path such that $(x^+, w^+, s^+) \in N(\theta^+) \subseteq N(\theta)$. An important consequence of this is that Iteration Short step may be repeated. In what follows we will refer to "Algorithm Short step", which is simply an iterative algorithm in which at iteration $k$, Iteration Short step is invoked with $\mu_k$ as the duality measure of the current iterate $(x,w,s) = (x_k, w_k, s_k)$. The algorithm terminates when an $\epsilon$-optimal solution is obtained, i.e., a feasible solution with $\mu_k \le \epsilon$, for some prescribed $\epsilon$.
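In code, Algorithm Short step is just the iteration of the sketch from Section 3.3 (again for the LP case, reusing the hypothetical short_step_lp above):

```python
def algorithm_short_step(A, x, w, s, sigma, F1, F2, eps):
    """Sketch of Algorithm Short step: repeat Iteration Short step
    until the duality measure is at most eps.  The analysis above
    guarantees that each iterate stays in N(theta) and that mu shrinks
    geometrically, provided conditions (4.19)-(4.20) hold."""
    mu = (x @ s) / x.size
    while mu > eps:
        x, w, s = short_step_lp(A, x, w, s, sigma, F1, F2)
        mu = (x @ s) / x.size
    return x, w, s
```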
4.2 Polynomial complexity of an inexact IPM

For convenience, let us gather the conditions previously imposed on the algorithm parameters. The conditions (4.32a) and (4.32b) below are copied from (4.19) and (4.20). (Due to Lemma 4.8 it is unnecessary to include the condition (4.9).) We also impose a new condition (4.32c) that will ensure the duality measure decreases at a fast enough rate. Since we are interested in a complexity result that holds regardless of the complexity parameter $\vartheta$, we ask that these conditions be satisfied for all $\vartheta \ge 1$.

Algorithm parameters $\theta, \sigma \in (0,1)$ and $\epsilon_1, \epsilon_2 \in [0,1)$ are selected as functions of $\vartheta$ with the property that there exists a constant $\delta \in (0,1)$ independent of $\vartheta$ such that for every $\vartheta \ge 1$:

  $\lambda(\theta,\sigma,\epsilon_1,\epsilon_2,\vartheta) > 0,$   (4.32a)
  $\theta^+(\theta,\sigma,\epsilon_1,\epsilon_2,\vartheta)$ is well defined, and $\theta^+(\theta,\sigma,\epsilon_1,\epsilon_2,\vartheta) \le \theta,$   (4.32b)
  $\bar\lambda(\theta,\sigma,\epsilon_1,\epsilon_2,\vartheta) \le 1 - \frac{\delta}{\vartheta^{1/2}}.$   (4.32c)

We now give the main result regarding the worst-case complexity of Algorithm Short step.

Theorem 4.10. Let $F$ be a $\vartheta$-self-concordant barrier for the cone $K$ in (3.1). Let $\theta, \sigma \in (0,1)$ and $\epsilon_1, \epsilon_2 \in [0,1)$ vary with $\vartheta$ in such a way that for some constant $\delta > 0$ they satisfy conditions (4.32a)–(4.32c) for all $\vartheta \ge 1$. Suppose that the initial primal-dual iterate lies in $N(\theta)$, and that for all primal iterates $x_k$, $F_1(x_k)$ and $F_2(x_k)$ satisfy the inequalities in (4.1) and (4.2). Then for any $\epsilon > 0$, an $\epsilon$-optimal solution to (3.1)–(3.2) is obtained in a polynomial number (in $\vartheta$ and $\log(\mu_0/\epsilon)$) of iterations. Specifically, given $\epsilon > 0$, there exists a number $k_\epsilon = O(\vartheta^{1/2}\log(\mu_0/\epsilon))$ such that $k \ge k_\epsilon$ implies $\mu_k \le \epsilon$.

Proof. Since (4.32a) and (4.32b) hold, and the starting iterate lies in $N(\theta)$, it follows from Lemma 4.9 that all iterates generated by Iteration Short step lie in $N(\theta)$. In fact, for each $k$ they remain in $N(\theta)$ restricted to the set $S(\mu_k) := \{(x,w,s) : 0 \le \mu \le \mu_k\}$. Now from Lemma 4.6, the ratio of successive duality measures $\mu_{k+1}/\mu_k$ is bounded above by the constant $\bar\lambda$, which by (4.32c) is bounded away from 1. Hence $\mu_k \to 0^+$ and all limit points of the sequence $\{(x_k,w_k,s_k)\}$ lie in $N(\theta)$ restricted to the set $S(\lim_k \mu_k) = S(0)$, which is the primal-dual optimal solution set. Now the particular bound on $\bar\lambda$ in (4.32c) implies that $\mu_{k+1} \le (1 - \delta/\vartheta^{1/2})\mu_k$ for $k = 0, 1, \dots$. Our polynomial iteration bound is readily obtained from this inequality. Specifically, if $k \ge \lceil\delta^{-1}\vartheta^{1/2}\log(\mu_0/\epsilon)\rceil$, then $\mu_k \le \epsilon$.

Remark 4.11. A tradeoff exists between the distance from a given quadruple $(\theta,\sigma,\epsilon_1,\epsilon_2)$ to the boundary of the set of quadruples satisfying (4.32a)–(4.32b), and the largest possible $\delta$ in (4.32c). Given $\theta$ and $\sigma$, it is natural to want to choose $\epsilon_1$ and $\epsilon_2$ as large as possible. This reduces $\delta$, which, according to the proof of Theorem 4.10, increases the order constant in the complexity of the algorithm.

It remains to verify that Theorem 4.10 is not vacuous, i.e., that there exist some $\delta \in (0,1)$ and (possibly $\vartheta$-dependent) values $\theta, \sigma \in (0,1)$ and $\epsilon_1, \epsilon_2 \in [0,1)$ such that inequalities (4.32a)–(4.32c) hold for all $\vartheta \ge 1$. To that effect, let $\delta = 10^{-4}$, $\theta = 0.1$, $\sigma = 1 - 0.02\vartheta^{-1/2}$, $\epsilon_1 = 0.015\vartheta^{-1/2}$, and $\epsilon_2 = 0.035$. The slack in the three inequalities (4.32a)–(4.32c) is plotted in Figure 1, suggesting that the conditions do indeed hold. (That they hold for arbitrarily large $\vartheta$ can be formally verified with little effort.)

[Figure 1: Plot of the slacks in conditions (4.32a), (4.32b), and (4.32c), as functions of $\vartheta$, for $\delta = 10^{-4}$, $\theta = 0.1$, $\sigma = 1 - 0.02\vartheta^{-1/2}$, $\epsilon_1 = 0.015\vartheta^{-1/2}$, and $\epsilon_2 = 0.035$.]

Remark 4.12. The worst-case complexity bound obtained in Theorem 4.10, $O(\vartheta^{1/2}\log(\mu_0/\epsilon))$, matches the best worst-case bound currently available for convex optimization algorithms, even for linear optimization algorithms that use exact evaluations of the barrier gradient and Hessian.
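The verification just described is easy to reproduce; the following sketch (ours) evaluates the constants of (4.5), (4.11)–(4.13), and (4.21) and returns the slacks in (4.32a)–(4.32c):

```python
import numpy as np

def condition_slacks(theta, sigma, e1, e2, vt, delta):
    """Slacks in (4.32a)-(4.32c); all three should be positive."""
    g0 = (theta + (1 - sigma + sigma * e1) * np.sqrt(vt)) / np.sqrt(1 - e2)
    g1 = g0 / np.sqrt(1 - e2)                      # gamma_1 of (4.5)
    c = 1 - sigma + sigma * e1 + e2                # recurring coefficient
    lam = sigma - sigma * e1 - c * g1 / np.sqrt(vt) - (1 - e2) * g1**2 / vt
    tau = min(g1, c * np.sqrt(vt) / (2 * (1 - e2)))
    lam_bar = sigma + sigma * e1 + c * tau / np.sqrt(vt) - (1 - e2) * tau**2 / vt
    d1 = (sigma * e1 * np.sqrt(vt) + e2 * g1) / (lam * (1 - g1))
    d2 = (sigma / lam - 1) * np.sqrt(vt) + (1 - sigma) * g1 / (lam * (1 - g1))
    d3 = (2 * sigma / lam) * (np.log(1 - g1) + g1 / (1 - g1))
    return lam, theta - (d1 + d2 + d3), 1 - delta / np.sqrt(vt) - lam_bar

# For vartheta = 1: condition_slacks(0.1, 0.98, 0.015, 0.035, 1.0, 1e-4)
# returns three positive slacks, consistent with Figure 1.
```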
Corollary 4.13. Given a family $\{K^n\}$ of cones, where $K^n$ has dimension $n$, and associated barrier functions $F^n : \mathrm{int}(K^n) \to \mathbb{R}$, if the complexity parameter $\vartheta(n)$ of $F^n$ is polynomial in $n$, and if the gradient and Hessian estimates $F_1^n(\cdot)$ and $F_2^n(\cdot)$ can each be computed in a polynomial (in $n$) number of arithmetic operations, then Algorithm Short step, with (possibly $\vartheta$-dependent) values of $\theta$, $\sigma$, $\epsilon_1$, and $\epsilon_2$ chosen to satisfy inequalities (4.32a)–(4.32c), generates an $\epsilon$-optimal solution in a polynomial (in $n$) number of arithmetic operations.

As a special case, Theorem 4.10 provides sufficient conditions on the parameters $\theta$ and $\sigma$ for polynomiality of Algorithm Short step under exact evaluation of the gradient and Hessian of $F$. Letting $\delta = 10^{-4}$ again, it is readily verified that, when $\theta = 0.1$ is again selected, $\sigma$ can now be as small as $1 - 0.069\vartheta^{-1/2}$, and when the larger value $\sigma = 1 - 0.02\vartheta^{-1/2}$ is maintained, $\theta$ can be as large as $0.28$; i.e., the size of the neighborhood of the central path can be significantly increased.

5 Conclusion

A simple interior-point method for general conic optimization was proposed. It allows for inexact evaluation of the gradient and Hessian of the primal barrier function, and does not involve any evaluation of the dual barrier function or its derivatives. It was shown that as long as the relative evaluation errors remain within certain fixed, prescribed bounds, under standard assumptions, an $\epsilon$-optimal solution is obtained in a polynomial number of iterations.

References

[BCL99] Mary Ann Branch, Thomas F. Coleman, and Yuying Li, A subspace, interior, and conjugate gradient method for large-scale bound-constrained minimization problems, SIAM Journal on Scientific Computing 21 (1999), no. 1, 1–23.

[BGN00] R. H. Byrd, J. C. Gilbert, and J. Nocedal, A trust region method based on interior point techniques for nonlinear programming, Mathematical Programming 89 (2000), no. 1, 149–185.

[BPR99] J. F. Bonnans, C. Pola, and R. Rebaï, Perturbed path following predictor-corrector interior point algorithms, Optim. Methods Softw. 11/12 (1999), no. 1-4, 183–210. MR1777457 (2001d:90124)

[BTN01] A. Ben-Tal and A. Nemirovski, Lectures on modern convex optimization: analysis, algorithms, and engineering applications, MPS/SIAM Series on Optimization, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2001.

[BV04] S. Boyd and L. Vandenberghe, Convex optimization, Cambridge University Press, 2004.

[CS93] Tamra J. Carpenter and David F. Shanno, An interior point method for quadratic programs based on conjugate projected gradients, Computational Optimization and Applications 2 (1993), 5–28.

[FJM99] R. W. Freund, F. Jarre, and S. Mizuno, Convergence of a class of inexact interior-point algorithms for linear programs, Math. Oper. Res. 24 (1999), no. 1, 50–71. MR1679054 (2000a:90033)

[GM88] D. Goldfarb and S. Mehrotra, A relaxed version of Karmarkar's method, Mathematical Programming 40 (1988), no. 3, 289–315.

[GMS+86] Philip E. Gill, Walter Murray, Michael A. Saunders, J. A. Tomlin, and Margaret H. Wright, On projected Newton barrier methods for linear programming and an equivalence to Karmarkar's projective method, Mathematical Programming 36 (1986), 183–209.
[Kor00] J. Korzak, Convergence analysis of inexact infeasible-interior-point algorithms for solving linear programming problems, SIAM J. Optim. 11 (2000), no. 1, 133–148. MR1785643 (2002f:90161)

[KR91] N. K. Karmarkar and K. G. Ramakrishnan, Computational results of an interior point algorithm for large scale linear programming, Mathematical Programming 52 (1991), 555–586.

[LMO06] Z. Lu, R. D. C. Monteiro, and J. W. O'Neal, An iterative solver-based infeasible primal-dual path-following algorithm for convex quadratic programming, SIAM J. on Optimization 17 (2006), no. 1, 287–310.

[Meh92] Sanjay Mehrotra, Implementation of affine scaling methods: Approximate solutions of systems of linear equations using preconditioned conjugate gradient methods, ORSA Journal on Computing 4 (1992), no. 2, 103–118.

[MJ99] S. Mizuno and F. Jarre, Global and polynomial-time convergence of an infeasible-interior-point algorithm using inexact computation, Math. Program. 84 (1999), no. 1, Ser. A, 105–122. MR1687272 (2000b:90026)

[Nes97] Yu. Nesterov, Long-step strategies in interior-point primal-dual methods, Math. Programming 76 (1997), no. 1, Ser. B, 47–94.

[Nes04] Yu. Nesterov, Introductory lectures on convex optimization: a basic course, Kluwer Academic Publishers, Boston/Dordrecht/London, 2004.

[Nes06] Yu. Nesterov, Towards nonsymmetric conic optimization, Tech. Report: CORE Discussion Paper 2006/28, Center for Operations Research and Econometrics, Catholic University of Louvain, March 2006.

[NFK98] K. Nakata, K. Fujisawa, and M. Kojima, Using the conjugate gradient method in interior-point methods for semidefinite programs, Proceedings of the Institute of Statistical Mathematics 46 (1998), 297–316.

[NN94] Yu. Nesterov and A. Nemirovskii, Interior-point polynomial algorithms in convex programming, SIAM Studies in Applied Mathematics, vol. 13, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1994. MR 94m:90005

[NT97] Yu. E. Nesterov and M. J. Todd, Self-scaled barriers and interior-point methods for convex programming, Math. Oper. Res. 22 (1997), no. 1, 1–42. MR 98c:90093

[NT98] Yu. E. Nesterov and M. J. Todd, Primal-dual interior-point methods for self-scaled cones, SIAM J. Optim. 8 (1998), no. 2, 324–364. MR 99f:90056

[NTY99] Yu. Nesterov, M. J. Todd, and Y. Ye, Infeasible-start primal-dual methods and infeasibility detectors for nonlinear programming problems, Math. Program. 84 (1999), no. 2, Ser. A, 227–267. MR 2000d:90055

[Ren88] J. Renegar, A polynomial-time algorithm, based on Newton's method, for linear programming, Mathematical Programming 40 (1988), 59–93.

[Ren01] J. Renegar, A mathematical view of interior-point methods in convex optimization, MPS/SIAM Series on Optimization, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2001. MR 2002g:90002

[Roc97] R. T. Rockafellar, Convex analysis, Princeton Landmarks in Mathematics, Princeton University Press, Princeton, NJ, 1997. Reprint of the 1970 original. MR 97m:49001

[Sch06] S. P. Schurr, An inexact interior-point algorithm for conic convex optimization problems, Ph.D. thesis, University of Maryland, College Park, MD 20742, USA, 2006.

[TTT06] K.-C. Toh, R. H. Tütüncü, and M. J. Todd, Inexact primal-dual path-following algorithms for a special class of convex quadratic SDP and related problems, Tech. report, Department of Mathematics, National University of Singapore, May 2006. Submitted to the Pacific Journal of Optimization.
[Tun01] L. Tunçel, Generalization of primal-dual interior-point methods to convex optimization problems in conic form, Foundations of Computational Mathematics 1 (2001), no. 3, 229–254.

[WO00] W. Wang and D. P. O'Leary, Adaptive use of iterative methods in predictor-corrector interior point methods for linear programming, Numer. Algorithms 25 (2000), no. 1-4, 387–406. MR1827167 (2002a:90099)

[Wri97] S. J. Wright, Primal-dual interior-point methods, SIAM Publications, SIAM, Philadelphia, PA, USA, 1997.

[ZT04] G. Zhou and K.-C. Toh, Polynomiality of an inexact infeasible interior point algorithm for semidefinite programming, Math. Program. 99 (2004), no. 2, Ser. A, 261–282. MR2039040 (2005b:90090)