University of Maryland, College Park
Institute for Advanced Computer Studies TR-2001-91
Department of Computer Science TR-4317

On the Powers of a Matrix with Perturbations*

G. W. Stewart†

December 2001

ABSTRACT

Let A be a matrix of order n. The properties of the powers A^k of A have been extensively studied in the literature. This paper concerns the perturbed powers

  P_k = (A + E_k)(A + E_{k−1}) ··· (A + E_1),

where the E_k are perturbation matrices. We will treat three problems concerning the asymptotic behavior of the perturbed powers. First, determine conditions under which P_k → 0. Second, determine the limiting structure of P_k. Third, investigate the convergence of the power method with error: that is, given u_1, determine the behavior of u_k = ν_k P_k u_1, where ν_k is a suitable scaling factor.

*This report is available by anonymous ftp from thales.cs.umd.edu in the directory pub/reports or on the web at http://www.cs.umd.edu/stewart/.
†Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742 (stewart@cs.umd.edu).

1. Introduction

Let A be a matrix of order n with eigenvalues λ_1, ..., λ_n ordered so that

  |λ_1| ≥ |λ_2| ≥ ··· ≥ |λ_n|,

and let ρ(A) = |λ_1| denote the spectral radius of A.
We will be concerned with extending the following three results about the behavior of the powers A^k of A. (These results are easily proved by exploiting the relations between norms and spectral radii; see [9, Sections I.2, II.1]. For earlier work on powers of a matrix see [2, 3, 5, 7].)

• The first result is classic. If ρ(A) < 1, then lim_{k→∞} A^k = 0. Moreover, in any norm ‖A^k‖^{1/k} → ρ(A), or equivalently the convergence of A^k to zero is faster than that of [ρ(A) + ε]^k for any ε > 0. We say that the root convergence index of A^k is ρ(A).

• The second result concerns the asymptotic form of A^k. Let |λ_1| > |λ_2|, and let the right and left eigenvectors corresponding to λ_1 be x and y, normalized so that y^H x = 1. Then

  λ_1^{−k} A^k → x y^H.

Moreover, the root convergence index is |λ_2/λ_1|.

• The third result concerns the convergence of the power method. Specifically, let u_1 be given and define

  u_{k+1} = ν_k A u_k,  k = 1, 2, ...,

where ν_k is a normalizing factor (e.g., ‖A u_k‖^{−1}). In the above notation, if |λ_1| > |λ_2| and y^H u_1 ≠ 0, then u_k converges to a multiple of x. The root convergence index is |λ_2/λ_1|.

Now let E_1, E_2, ... be a sequence of perturbation matrices and let

  P_k = (A + E_k)(A + E_{k−1}) ··· (A + E_1).

The purpose of this paper is to extend the three results above to the perturbed powers P_k.¹ Regarding the first result, we will show that if ρ(A) < 1 then for sufficiently small E_k the P_k approach zero. Moreover, by making the E_k small enough we can bring the convergence ratio arbitrarily near ρ(A). Regarding the second result, we will show that if

  Σ_k ‖E_k‖ < ∞  (1.1)

in any norm, then the P_k converge to x z^H for some z (which may be zero), and we will investigate the convergence rate. Finally, a finite-precision implementation of the power method results in perturbations E_k that are of the order of the rounding unit and therefore do not satisfy (1.1).
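The second result is easy to check numerically. The following sketch is an illustration added here, not part of the original analysis; the matrix, its spectrum, and all sizes are invented for the demonstration. It builds A with a simple dominant eigenvalue λ_1 = 2 and verifies that λ_1^{−k} A^k approaches x y^H:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
S = rng.standard_normal((n, n))               # eigenvector matrix (invertible with probability 1)
D = np.diag([2.0, 0.9, 0.5, -0.3, 0.1])       # lambda_1 = 2 is simple and dominant
A = S @ D @ np.linalg.inv(S)

x = S[:, 0]                                   # right eigenvector for lambda_1
y = np.linalg.inv(S)[0, :]                    # left eigenvector, scaled so y^H x = 1

k = 40
Pk = np.linalg.matrix_power(A, k) / 2.0 ** k  # lambda_1^{-k} A^k
err = np.linalg.norm(Pk - np.outer(x, y))     # decays roughly like |lambda_2/lambda_1|^k = 0.45^k
print(err)
```

If instead each factor is perturbed by a fixed, rounding-sized E_k, the terms ‖E_k‖ do not decay, and the sum in (1.1) diverges.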
Thus, we cannot use the second result to analyze the convergence (actually nonconvergence) of the power method in the presence of rounding error. However, using other techniques we can show that in the presence of rounding error the power method will converge up to a point and then stagnate.

This paper is organized as follows. In establishing our extensions it will prove convenient to transform our matrices by certain similarity transformations, and any conditions placed on the transformed perturbations must be translated back to the original problem. Since the transformations can be ill conditioned, it is important to understand the source of the ill-conditioning. Accordingly, the next section is devoted to describing the two transformations we will use. In Section 3 we will establish our extension of the first result, and in Section 4 the extension of the second. In Section 5 we will give an analysis of the power method. It is worth noting that the last two sections end with a little hook: each provides a new result about the problem treated in the preceding section.

¹The phrase "perturbed powers" is, strictly speaking, a misnomer, since it is the factors, not the powers, that are perturbed.

Throughout this paper the jth column of the identity matrix will be denoted by e_j. In addition, ‖·‖ will denote a consistent family of norms such that ‖A‖ bounds the norm of any submatrix of A and ‖diag(d_1, ..., d_n)‖ = max_i |d_i|. This class includes the 1-, 2-, and ∞-norms but excludes the Frobenius norm [4, 8]. We will use the root convergence index to measure speed of convergence. Specifically, if a_k is a sequence converging to zero and γ = limsup_k |a_k|^{1/k} < 1, we say that a_k converges with root index γ.

2. Two transformations

In deriving our results we will have to transform A into Â = X^{−1} A X, for some X appropriate to the problem at hand. In this case, we must also transform the perturbation matrices: Ê_k = X^{−1} E_k X.
Now ‖Ê_k‖ ≤ κ(X)‖E_k‖, where κ(X) = ‖X‖‖X^{−1}‖ is the condition number of X. Hence, in order to ensure that a bound like ‖Ê_k‖ ≤ ε̂ holds, we have to require that ‖E_k‖ ≤ ε̂/κ(X). Thus it is appropriate to examine the conditions under which κ(X) is large; that is, under which the transformations are ill conditioned. There are two classes of transformations.

The first transformation is described in the following classic theorem (see, e.g., [9, Theorem I.2.8]).

Theorem 2.1. For any ε > 0 there is a matrix X such that

  ‖X^{−1} A X‖ ≤ ρ(A) + ε.  (2.1)

The theorem is proved by first transforming A to Schur form; i.e.,

  U^H A U = T,

where U is unitary and T is upper triangular. If we then set D_δ = diag(1, δ, ..., δ^{n−1}), the superdiagonal elements of D_δ^{−1} T D_δ are δ^{j−i} τ_{ij}. If we define X_δ = U D_δ, then as δ → 0 we have

  ‖X_δ^{−1} A X_δ‖ → ‖diag(τ_{11}, ..., τ_{nn})‖ = ρ(A).

Consequently, we may set X = X_δ, where δ is chosen so that (2.1) is satisfied.

As δ decreases, X_δ^{−1} becomes large. If the off-diagonal elements of T are nonzero, δ becomes small along with ε, the more so in proportion as the off-diagonal elements of T are large. Large off-diagonal elements in T are associated with Henrici's measure of nonnormality [5]. Hence, nonnormality in A may weaken our theorems. However, it should be stressed that the above construction of X is designed to accommodate the worst possible case, and in particular situations there may be a better way to construct X. For example, if A has a complete, well-conditioned system of eigenvectors, then the matrix X of eigenvectors reduces A to diagonal form, so that (2.1) is satisfied for ε = 0.

The following useful theorem describes the second of our transformations.

Theorem 2.2. Let λ_1 be a simple eigenvalue of A with right and left eigenvectors x_1 and y_1 normalized so that ‖x_1‖_2 = ‖y_1‖_2 = 1 and γ = y_1^H x_1 is positive. Let σ = √(1 − γ²). Then there is a matrix U with

  κ(U) = (1 + σ)/γ  (2.2)

in the 2-norm such that

  U^{−1} A U = ( λ_1  0 )
               ( 0    B ).  (2.3)

Proof.
By appealing to the CS decomposition [4, 8], we can find orthonormal matrices (x_1 x_2 X_3) and (y_1 y_2 Y_3) such that

  (x_1 x_2 X_3)^T (y_1 y_2 Y_3) = ( γ    σ  0 )
                                  ( −σ   γ  0 )
                                  ( 0    0  I ).

Let U = (γ^{−1/2}x_1  γ^{−1/2}y_2  X_3) and V = (γ^{−1/2}y_1  γ^{−1/2}x_2  Y_3). Then it is easily verified that V^H = U^{−1}. Moreover,

  U^T U = ( 1/γ  σ/γ  0 )
          ( σ/γ  1/γ  0 )
          ( 0    0    I ).

Thus ‖U‖_2² = ‖U^T U‖_2 = (1 + σ)/γ. Similarly ‖V‖_2² = (1 + σ)/γ, which establishes (2.2). The fact that (2.3) is satisfied follows immediately from the fact that the first column of U is proportional to the right eigenvector x_1 and the first column of V is proportional to the left eigenvector y_1.

The matrix U will be ill conditioned when γ is small. But γ^{−1}, the secant of the angle between x_1 and y_1, is a condition number for the eigenvalue λ_1 [4, Section 7.2.2], [9, Section I.3.2]. Thus U will be ill conditioned precisely when λ_1 is.

3. Convergence to zero

We are now in a position to state and prove our extension of the first result. In fact, thanks to the results of the last section, it is trivial to establish the following theorem.

Theorem 3.1. Let ρ(A) < 1 and consider the perturbed products

  P_k = (A + E_k)(A + E_{k−1}) ··· (A + E_1).  (3.1)

For every ε > 0 there is an η > 0 such that if ‖E_k‖ ≤ η then

  limsup_k ‖P_k‖^{1/k} ≤ ρ(A) + ε.  (3.2)

Hence if ρ(A) + ε < 1, then P_k converges to zero with root convergence index at greatest ρ(A) + ε.

Proof. By Theorem 2.1 there is a nonsingular matrix X such that if Â = X^{−1}AX then ‖Â‖ ≤ ρ(A) + ε/2. Let Ê_k = X^{−1}E_kX, P̂_k = X^{−1}P_kX, and ε̂ = ε/2. Then if ‖Ê_k‖ ≤ ε̂, we have ‖Â + Ê_k‖ ≤ ρ(A) + ε and ‖P̂_k‖ ≤ [ρ(A) + ε]^k. Transforming back to the original problem, we see that if ‖E_k‖ ≤ η ≡ ε̂/κ(X), then ‖P_k‖ ≤ κ(X)[ρ(A) + ε]^k. The inequality (3.2) now follows on taking kth roots.

There is little to add to this theorem. The price we pay for the perturbations is that to make the root convergence index approach ρ(A) we must increasingly restrict the size of the perturbations.
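Theorem 3.1 is easy to see in action. The sketch below is an added illustration with invented sizes (ρ(A) = 0.8 and perturbation bound η = 10^{−3}); the observed root ‖P_k‖^{1/k} settles near ρ(A):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
A *= 0.8 / max(abs(np.linalg.eigvals(A)))   # rescale so rho(A) = 0.8

eta = 1e-3                                  # bound on each ||E_k||
P = np.eye(n)
steps = 300
for k in range(steps):
    E = rng.standard_normal((n, n))
    E *= eta / np.linalg.norm(E, 2)         # make ||E_k||_2 = eta exactly
    P = (A + E) @ P                         # P = (A+E_k) ... (A+E_1)

root = np.linalg.norm(P, 2) ** (1.0 / steps)
print(root)                                 # close to rho(A) = 0.8
```

Shrinking η drives the observed root closer to ρ(A), at the price of allowing only smaller perturbations.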
This is unavoidable. For if we fix the size of the error at ε, we can always find E such that the largest eigenvalue of A + E has magnitude ρ(A) + ε. If we set E_k ≡ E, then P_k = (A + E)^k, and the best root convergence index we can hope for is ρ(A) + ε.

4. Convergence with a simple dominant eigenvalue

In this section we will treat the behavior of the perturbed powers when A has a single, simple dominant eigenvalue; i.e., when |λ_1| > |λ_2|. By dividing A by λ_1, we may assume that λ_1 = 1. The basic result is given in the following theorem.

Theorem 4.1. Let 1 = λ_1 > |λ_2|, and let the right eigenvector corresponding to λ_1 be x. Let P_k be defined as in (3.1). If

  Σ_{k=1}^∞ ‖E_k‖ < ∞,  (4.1)

then for some (possibly zero) vector z we have

  lim_{k→∞} P_k = x z^H.

The root convergence index is not greater than max{β, ε}, where β is the largest of the magnitudes of the subdominant eigenvalues of A and

  ε = limsup_k ‖E_k‖^{1/k}.  (4.2)

Proof. By Theorem 2.2, we may transform A so that it has the form diag(1, B), where ρ(B) < 1. By Theorem 2.1, we may further assume that ‖B‖ ≤ b < 1. Note that the right eigenvector of the transformed matrix is e_1.

The theorem is best established in a vectorized form. First write

  P_{k+1} = (A + E_k)P_k = AP_k + E_kP_k.

Let u ≠ 0 be given and let p_k = P_k u. Then

  p_{k+1} = Ap_k + E_kp_k.  (4.3)

We will use this recurrence to show that the p_k approach a multiple of e_1. Our first job is to find a condition that ensures that the p_k remain bounded. To this end, partition (4.3) in the form

  ( p_1^{(k+1)} )   ( 1 + e_{11}^{(k)}   e_{12}^{(k)H}    ) ( p_1^{(k)} )
  ( p_2^{(k+1)} ) = ( e_{21}^{(k)}       B + E_{22}^{(k)} ) ( p_2^{(k)} ).  (4.4)

Now let ε_k = ‖E_k‖, and let φ_1^{(k)} and φ_2^{(k)} be upper bounds on |p_1^{(k)}| and ‖p_2^{(k)}‖. Then by taking norms in (4.4) we see that the components of

  ( φ_1^{(k+1)} )   ( 1 + ε_k   ε_k     ) ( φ_1^{(k)} )
  ( φ_2^{(k+1)} ) = ( ε_k       b + ε_k ) ( φ_2^{(k)} )  (4.5)

are upper bounds on |p_1^{(k+1)}| and ‖p_2^{(k+1)}‖.
Thus if we write

  ( 1 + ε_k   ε_k     )                 ( ε_k  ε_k )
  ( ε_k       b + ε_k ) = diag(1, b) +  ( ε_k  ε_k ) ≡ diag(1, b) + H_k,

then the p_k will remain bounded provided the product

  Π_{k=1}^∞ ‖diag(1, b) + H_k‖

is finite. Now

  Π_{k=1}^∞ ‖diag(1, b) + H_k‖ ≤ Π_{k=1}^∞ (‖diag(1, b)‖ + ‖H_k‖) ≤ Π_{k=1}^∞ (1 + 4ε_k).

It is well known that the product on the right is finite if and only if the series Σ_k ε_k converges [1, Section 5.2.2]. Hence a sufficient condition for the p_k to remain bounded is for (4.1) to be satisfied.

The next step is to show that p_2^{(k)} converges to zero. Let π be a uniform upper bound on the |p_1^{(k)}| and ‖p_2^{(k)}‖. From (4.5) we have

  φ_2^{(k+1)} ≤ 2πε_k + bφ_2^{(k)}.

Hence if we set ψ̂_1 = π and define ψ̂_k by the recurrence

  ψ̂_{k+1} = bψ̂_k + 2πε_k,

we have ‖p_2^{(k)}‖ ≤ ψ̂_k. But it is easy to see that if we define ε_0 = 1/2, then

  ψ̂_{k+1} = 2π(b^kε_0 + b^{k−1}ε_1 + ··· + bε_{k−1} + ε_k).  (4.6)

It follows that

  ψ̂_1 + ψ̂_2 + ··· = 2π(1 + b + b² + ···)(ε_0 + ε_1 + ε_2 + ···).  (4.7)

But the geometric series on the right is absolutely convergent, and by (4.1) the series in the ε_k is also. Thus the series on the left is absolutely convergent, and its terms ψ̂_k must converge to zero.

We must next show that p_1^{(k)} converges. From the first row of (4.4) we have

  p_1^{(k+1)} − p_1^{(k)} = e_{11}^{(k)}p_1^{(k)} + e_{12}^{(k)H}p_2^{(k)},  (4.8)

whence

  |p_1^{(k+1)} − p_1^{(k)}| ≤ 2πε_k.  (4.9)

Since Σ_k ε_k converges, if we set p_1^{(0)} = 0, the telescoping series Σ_{j=0}^k (p_1^{(j+1)} − p_1^{(j)}) = p_1^{(k+1)} converges absolutely, and hence p_1^{(k)} converges.

By taking u = e_i, we find that the ith column of P_k converges to w_ie_1 for some w_i. Consequently, if we set w^H = (w_1 ··· w_n), then P_k converges to e_1w^H. Finally, in assuming that A = diag(1, B) we transformed the original matrix A by a similarity transformation X whose first column was the dominant eigenvector of A. It follows that the original P_k converge to

  Xe_1w^HX^{−1} = xz^H,

where z^H = w^HX^{−1}.

We now turn to the rates of convergence.
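The limit P_k → e_1 w^H in the transformed coordinates can be observed numerically. The sketch below is an added illustration; the matrix A = diag(1, B) and the summable decay ‖E_k‖ = 10^{−2}/k² are invented for the demonstration. After many factors, P_k is numerically rank one:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = np.diag([1.0, 0.5, 0.3, 0.1])                # diag(1, B) with ||B||_2 = 0.5

P = np.eye(n)
for k in range(1, 2001):
    E = rng.standard_normal((n, n))
    E *= 1e-2 / (k * k * np.linalg.norm(E, 2))   # ||E_k||_2 = 1e-2 / k^2, a summable series
    P = (A + E) @ P

s = np.linalg.svd(P, compute_uv=False)           # singular values, descending
print(s[0], s[1] / s[0])                         # s[1]/s[0] tiny: P_k is nearly e_1 w^H
```

Because the accumulated perturbation is small here, the limit is nonzero, and the first row of P carries the vector w^H.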
The inequality (4.9) shows that p_1^{(k)} converges as fast as ‖E_k‖ approaches zero; i.e., its root convergence index is not greater than the ε defined by (4.2).

To analyze the convergence of p_2^{(k)}, we make the observation that the reciprocal of the radius of convergence of any function f(ζ) that is analytic at the origin is limsup_k |a_k|^{1/k}, where a_k is the kth coefficient in the power series of f [1, Section II.2.4]. We also note that in the expression (4.6) for ψ̂_{k+1} we can replace the powers b^j by ‖B^j‖ and still have an upper bound on ‖p_2^{(k+1)}‖. Now let r(ζ) = Σ_k ‖B^k‖ζ^k and s(ζ) = Σ_k ‖E_k‖ζ^k. Since ‖B^k‖^{1/k} → ρ(B) = β, we know that the radius of convergence of r is β^{−1}. By definition the radius of convergence of s is ε^{−1}. But by (4.6), so modified, ψ̂_{k+1}/(2π) is bounded by the kth coefficient of the power series of p(ζ) = r(ζ)s(ζ), so limsup_k ψ̂_k^{1/k} is bounded by the reciprocal of the radius of convergence of p. Since the radius of convergence of p is at least as great as the smaller of β^{−1} and ε^{−1}, the root convergence index of p_2^{(k)} is not greater than max{β, ε}.

There are four comments to be made about this theorem.

• By the equivalence of norms, if the condition (4.1) on the E_k holds for one norm, it holds for any norm. Thus the condition on the errors does not depend on the similarity transformation we used to bring A into the form diag(1, B). But this happy state of affairs obtains only because (4.1) is an asymptotic statement. In practice, the sizes of the initial errors, which do depend on the transformation, may be important.

• Since P_k converges to xz^H, if z ≠ 0, then at least one column of P_k contains an increasingly accurate approximation to x. In the error-free case, z is equal to the left eigenvector of A, which is by definition nonzero. In general, however, we cannot guarantee that z ≠ 0, and indeed it is easy to contrive examples for which z is zero.
However, it follows from (4.8) that

  |p_1^{(k+1)}| ≥ |p_1^{(k)}| − 2πε_k ≥ |p_1^{(1)}| − 2π(ε_k + ··· + ε_1).

Hence if 2πΣ_k ε_k < |p_1^{(1)}|, then lim_k p_1^{(k)} ≠ 0, and hence lim_k P_k ≠ 0.

• The proof can be extended to the case where A has more than one dominant eigenvalue, provided they are all simple. The key is to use a generalization of Theorem 2.2 that uses bases for the left and right dominant eigenspaces of A to reduce A to the form diag(D, B), where |D| = I. The quantities p_1^{(k)} and p_1^{(k+1)} in (4.4) are no longer scalars, but the recursion (4.5) for upper bounds remains the same, as does the subsequent analysis.

• We have been interested in the case where A has a simple dominant eigenvalue of one. However, the proof of the theorem can easily be adapted to the case where ρ(A) < 1 with no hypothesis of simplicity (it is essentially the analysis of p_2^{(k)} without the contributions from p_1^{(k)}). The result is the following corollary.

Corollary 4.2. Let ρ(A) < 1 and let the E_k satisfy (4.1). Then P_k → 0, and the root convergence index is not greater than max{β, ε}, where now β = ρ(A).

5. The power method

The power method starts with a vector u_1 and generates a sequence of vectors according to the formula

  u_{k+1} = ν_kAu_k,

where ν_k is a normalizing factor. If A has a simple dominant eigenvalue (which we may assume to be one), then under mild restrictions on u_1 the u_k converge to the dominant eigenvector of A.

A backward rounding-error analysis shows that in the presence of rounding error we actually compute

  u_{k+1} = ν_k(A + E_k)u_k = (ν_k ··· ν_1)P_ku_1,

where ‖E_k‖/‖A‖ is of the order of the rounding unit [6, 8].

Theorem 4.1 is not well suited to analyzing this method, for two reasons. First, the E_k will all be roughly the same size, so that the condition (4.1) is not satisfied. But even if it were, it is possible for the P_k to approach zero while at the same time the normalized vectors u_k converge to a nonzero limit, in which case Theorem 4.1 says nothing useful.
Accordingly, in this section we give a different convergence analysis of the power method. As in the last section, we will assume that A = diag(1, B), where now ‖B‖ = β. Let ε_k = ‖E_k‖. We will normalize the u_k so that the first component is one and write

  u_k = ( 1   )
        ( h_k ).

It is important to have some appreciation of the magnitudes of the quantities involved. If the computations are being done in IEEE double precision, ε will be around √n·10^{−16}; e.g., 10^{−14} if n = 10,000. If u_1 is a random vector, we can expect ‖h_1‖ to be of order √n; e.g., 100 if n = 10,000. Finally, since the ratio of convergence of the power method is approximately β, β must not be too near one; e.g., β = 0.99 gives unacceptably slow convergence. Thus we may assume that ‖h_1‖ and ε/(1 − β) are small.

Let η_k be an upper bound for ‖h_k‖. We will derive an upper bound η_{k+1} for ‖h_{k+1}‖ in the form of the quotient of a lower bound on the first component of (A + E_k)u_k and an upper bound on the norm of the rest of the vector. We have

  (A + E_k)u_k = ( 1 + e_{11}^{(k)}   e_{12}^{(k)H}    ) ( 1   )
                 ( e_{21}^{(k)}       B + E_{22}^{(k)} ) ( h_k ).

A lower bound on the first component of this vector is

  1 − (1 + η_k)ε_k,

and an upper bound on the norm of the lower part is

  βη_k + ε_k(1 + η_k).

Hence

  ‖h_{k+1}‖ ≤ η_{k+1} ≡ (βη_k + ε_k(1 + η_k)) / (1 − (1 + η_k)ε_k).

Let

  φ_ε(η) = (βη + ε(1 + η)) / (1 − (1 + η)ε),

so that η_{k+1} = φ_{ε_k}(η_k). It is easily verified that φ_ε has a fixed point

  η_* = 2ε / (c + √(c² − 4ε²)),  where c = 1 − β − 2ε.

Moreover,

  φ_ε′(η) = (β + ε(1 − β)) / [1 − (1 + η)ε]².  (5.1)

Given our assumptions on the magnitudes of the quantities involved, φ_ε(η) is approximately a straight line with slope β and fixed point η_* ≈ ε/(1 − β). Thus we see that the gap η_k − η_* must decrease by a factor of about β with each iteration. Since η_* is of order ε, this means that the η_k initially appear to converge toward zero as β^k; but this convergence stagnates as η_k approaches η_*.
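The recurrence η_{k+1} = φ_{ε_k}(η_k) is cheap to iterate. The sketch below is an added illustration with invented sizes (β = 0.5, a constant rounding-level ε_k ≡ 10^{−14}, and η_1 = 100); it shows the initial geometric decrease followed by stagnation near η_* ≈ ε/(1 − β):

```python
beta = 0.5

def phi(eta, eps):
    # the bound iteration derived above
    return (beta * eta + eps * (1 + eta)) / (1 - (1 + eta) * eps)

eps = 1e-14                      # constant, rounding-sized error
eta = 100.0                      # eta_1 ~ sqrt(n) for a random starting vector
hist = [eta]
for k in range(200):
    eta = phi(eta, eps)
    hist.append(eta)

eta_star = eps / (1 - beta)      # approximate fixed point
print(hist[10], hist[200], eta_star)
```

After ten steps the bound has fallen by about β^10; after two hundred it has flatlined at the fixed point, a modest multiple of the rounding unit.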
To the extent that the bounds reflect reality, the power method converges with ratio at greatest β until the error is reduced to a multiple of the rounding unit divided by 1 − β. Thus the power method can be expected to give good accuracy in the presence of rounding error.

We can use this analysis to show that if ε_k = ‖E_k‖ converges monotonically to zero and ε_1 is suitably small, then the power method converges. Specifically, we have the following theorem.

Theorem 5.1. In the above notation, let 0 < β < 1. For any η_1 there is an ε_1 such that, if the sequence ε_1, ε_2, ... approaches zero monotonically, then the sequence defined by

  η_{k+1} = φ_{ε_k}(η_k),  k = 1, 2, ...,

converges monotonically to zero.

Proof. From (5.1) it is clear that if ε_1 is sufficiently small, then φ_ε′(η) ≤ β̃ < 1 for any ε ≤ ε_1 and η ≤ η_1. It then follows from the theory of fixed point iterations that the sequence η_1, η_2, ... is monotonic decreasing. Let its limit be η̂. We must show that η̂ = 0.

Let δ > 0 be given. Now lim_{ε→0} φ_ε(η) = βη uniformly on [0, η_1]. Hence there is an integer K > 0 such that

  k ≥ K ⟹ |φ_{ε_k}(η_k) − βη_k| < δ/2.

We may also assume that K is so large that

  k ≥ K ⟹ |βη_k − βη̂| < δ/2.

Then for k ≥ K,

  |η_{k+1} − βη̂| = |φ_{ε_k}(η_k) − βη̂| ≤ |φ_{ε_k}(η_k) − βη_k| + |βη_k − βη̂| < δ.

It follows that η_k → βη̂. But since η_k → η̂ and β ≠ 1, we must have η̂ = 0.

This theorem has an important implication for the behavior of the perturbed powers P_k, which were treated in the previous section. The jth column of P_k, suitably scaled, is just the result of applying the unscaled power method with error to e_j. Now suppose that y^He_j ≠ 0, where y is the dominant left eigenvector. Then if ε_1 ≥ ε_2 ≥ ··· and ε_1 is sufficiently small, the jth column of P_k, suitably scaled, approximates the dominant eigenvector of A, even if P_k converges to zero. Thus if we are interested only in the behavior of the columns of P_k, we can relax the condition that Σ_k ε_k < ∞.
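Theorem 5.1 can be illustrated with the same bound recurrence. In the sketch below (an added illustration; β = 0.5 and the monotone sequence ε_k = 10^{−4}/k are invented), the error sizes are not summable, so (4.1) fails, yet the bounds η_k still go to zero:

```python
beta = 0.5

def phi(eta, eps):
    # the bound iteration eta_{k+1} = phi_{eps_k}(eta_k)
    return (beta * eta + eps * (1 + eta)) / (1 - (1 + eta) * eps)

eta = 1.0                        # eta_1
for k in range(1, 501):
    eps_k = 1e-4 / k             # monotonically decreasing, but sum over k diverges
    eta = phi(eta, eps_k)
print(eta)                       # small: tracks roughly eps_k / (1 - beta)
```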
However, the price we pay is a less clean estimate of the asymptotic convergence rate.

Acknowledgements

I would like to thank Donald Estep and Sean Eastman for their comments on this paper, and especially Sean Eastman for the elegant proof of Theorem 5.1. I am indebted to the Mathematical and Computational Sciences Division of the National Institute of Standards and Technology for the use of their research facilities.

References

[1] L. V. Ahlfors. Complex Analysis. McGraw-Hill, New York, 1966.
[2] W. Gautschi. The asymptotic behavior of powers of a matrix. Duke Mathematical Journal, 20:127–140, 1953.
[3] W. Gautschi. The asymptotic behavior of powers of a matrix. II. Duke Mathematical Journal, 20:275–279, 1953.
[4] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, third edition, 1996.
[5] P. Henrici. Bounds for iterates, inverses, spectral variation and fields of values of nonnormal matrices. Numerische Mathematik, 4:24–39, 1962.
[6] N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 1996.
[7] A. M. Ostrowski. Solution of Equations and Systems of Equations. Academic Press, New York, 1960.
[8] G. W. Stewart. Matrix Algorithms I: Basic Decompositions. SIAM, Philadelphia, 1998.
[9] G. W. Stewart. Matrix Algorithms II: Eigensystems. SIAM, Philadelphia, 2001.