University of Maryland, College Park
Institute for Advanced Computer Studies TR-92-43
Department of Computer Science TR-2878

Scaling for Orthogonality

Alan Edelman† and G. W. Stewart‡

April 1992

ABSTRACT

In updating algorithms where orthogonal transformations are accumulated, it is important to preserve the orthogonality of the product in the presence of rounding error. Moonen, Van Dooren, and Vandewalle have pointed out that simply normalizing the columns of the product tends to preserve orthogonality, though not, as DeGroat points out, to working precision. In this note we give an analysis of the phenomenon.

This report is available by anonymous ftp from thales.cs.umd.edu in the directory pub/reports.

† Department of Mathematics and Lawrence Berkeley Laboratory, University of California, Berkeley, CA 94720, edelman@math.berkeley.edu. Supported by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy under Contract DE-AC03-76SF00098.

‡ Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742. stewart@cs.umd.edu. This work was performed while the author was visiting the Institute for Mathematics and Its Applications, University of Minnesota, Minneapolis, MN 55455.

In many updating algorithms it is required to accumulate a product of the form
\[
  X_k = Q_1 \cdots Q_{k-1} Q_k,
\]
where the matrices $Q_i$ are orthogonal. Although mathematically speaking $X_k$ must be orthogonal, in practice rounding error will cause it to drift from orthogonality with increasing $k$. If we take the deviation of $X_k^T X_k$ from the identity as a measure of the loss of orthogonality, then typically
\[
  \|I - X_k^T X_k\|_F \approx k \gamma_n \epsilon_M,
\]
where $\|\cdot\|_F$ is the Frobenius norm, $\epsilon_M$ is the rounding unit for the arithmetic in question, and $\gamma_n$ is a slowly growing function of the size $n$ of $X_k$ (e.g., $n^{1.5}$).

As a cure for this problem DeGroat and Roberts [1] have proposed that each $X_k$ be subjected to a partial reorthogonalization in which the second column is orthogonalized against the first, the third against the second, and so on, with all the columns being renormalized after orthogonalization. In a subsequent note on their paper, Moonen, Van Dooren, and Vandewalle [2] pointed out that the normalization alone is sufficient to maintain orthogonality and supported their claim with a heuristic argument. In a reply, DeGroat pointed out that normalization "does not yield working precision orthogonality." However, the error remains quite small.

The purpose of this note is to give a more complete analysis of the method, one that explains the phenomena mentioned in the last paragraph. In particular, we show that this method succeeds when the $Q_i$ manage to transfer off-diagonal error in the matrices $I - X_i^T X_i$ to the diagonal. We also show that normalizing is the best possible scaling up to first order. However, it can actually decrease orthogonality in certain unlikely circumstances.

For notational convenience we will drop subscripts and write
\[
  \hat X = X Q,
\]
where $X$ is scaled so that its column norms are one and $Q$ is orthogonal (for the moment we ignore rounding error). Since $X$ is normalized, we can write
\[
  A \equiv X^T X = I + E,
\]
where the diagonals of $E$ are zero. Write
\[
  \hat A \equiv \hat X^T \hat X = I + \hat D + \hat E,
\]
where
\[
  \hat D + \hat E = Q^T E Q \tag{1}
\]
is a decomposition of $Q^T E Q$ into its diagonal and off-diagonal parts.
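Before turning to the analysis, a small numerical illustration may help fix ideas. The sketch below is our own addition, not part of the original note: it assumes Python with numpy, generates the $Q_i$ at random (the note makes no such assumption; in practice they come from the application), and compares a plainly accumulated product with one whose columns are rescaled to unit norm after every update, which is precisely the scaling analyzed below.

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps = 20, 20000

def random_orthogonal(n):
    # QR of a Gaussian matrix, with a sign fix, gives a random orthogonal Q.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

# Single precision makes the drift visible in a short run.
X_plain = np.eye(n, dtype=np.float32)    # product accumulated with no scaling
X_scaled = np.eye(n, dtype=np.float32)   # product with column normalization

for _ in range(steps):
    Q = random_orthogonal(n).astype(np.float32)
    X_plain = X_plain @ Q
    X_scaled = X_scaled @ Q
    # Rescale each column to unit norm: the scaling by S^(1/2) defined below.
    X_scaled /= np.linalg.norm(X_scaled, axis=0)

for name, X in [("no scaling", X_plain), ("column scaling", X_scaled)]:
    print(name, ":", np.linalg.norm(np.eye(n) - X.T @ X))
```

In runs of this kind the unscaled deviation grows steadily with the number of factors, while the normalized product settles near a small fixed level, the behavior this note sets out to explain.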
In this notation, the scaling of $\hat X$ amounts to setting
\[
  \hat S = (I + \hat D)^{-1} \tag{2}
\]
and
\[
  \tilde X = \hat X \hat S^{1/2}.
\]
The deviation from orthogonality of $\tilde X$ is the Frobenius norm of
\[
  \tilde E = \hat S^{1/2} \hat E \hat S^{1/2}. \tag{3}
\]
The above equations define a recurrence for $E$, $\tilde E$, etc., which we are going to analyze. But first we will motivate the scaling by comparing it with the optimal scaling, which is characterized in the following theorem.

Theorem 1. For any diagonal matrix $D$ let $\operatorname{diag}(D)$ denote the vector consisting of the diagonal elements of $D$. Then for all sufficiently small $E$, the optimal scaling matrix $S$ satisfies
\[
  (\hat A \circ \hat A)\operatorname{diag}(S) = \operatorname{diag}(I + \hat D), \tag{4}
\]
where $\hat A \circ \hat A$ is the component-wise product (also known as the Schur or Hadamard product) of $\hat A$ with itself.

Proof. Regarded as a function of the elements of $S$, the function $\|I - S^{1/2} \hat A S^{1/2}\|_F^2$ is a quadratic function that is bounded below by zero. Differentiating this function and setting the results to zero, we obtain (4). It follows that if (4) has a positive solution, then that solution will provide the optimal scaling. Now $\lim_{\hat E \to 0} \hat A \circ \hat A = (I + \hat D)^2$. Consequently,
\[
  \lim_{\hat E \to 0} \operatorname{diag}(S)
  = \lim_{\hat E \to 0} (\hat A \circ \hat A)^{-1}\operatorname{diag}(I + \hat D)
  = \operatorname{diag}[(I + \hat D)^{-1}]
  = \operatorname{diag}(\hat S) > 0. \tag{5}
\]
Hence for all sufficiently small $E$, the solution of (4) is positive.

Equation (5) provides a heuristic justification for the method, since it says that to first order in $E$ our scaling approximates the optimal scaling. However, the matrix
\[
  \hat A = \begin{pmatrix} 1 - 2\epsilon^2 & \epsilon \\ \epsilon & 1 - 2\epsilon^2 \end{pmatrix}
\]
shows that the method is not guaranteed to increase orthogonality for all small $E$. Nevertheless, this situation is quite unlikely, as we will now demonstrate by an analysis of the recurrence (3).

First note that from (1) and the unitary invariance of the Frobenius norm we have
\[
  \|E\|_F^2 = \|\hat D\|_F^2 + \|\hat E\|_F^2. \tag{6}
\]
Now the square of the $(i,j)$ element of $\tilde E$ is
\[
  \frac{\hat e_{ij}^2}{(1 + \hat d_i)(1 + \hat d_j)} \le \frac{\hat e_{ij}^2}{(1 - \|\hat D\|_F)^2}.
\]
Here $\hat d_i$ is the $i$th diagonal element of $\hat D$, and we assume that $\|\hat D\|_F < 1$. Hence
\[
  \|\tilde E\|_F^2 \le \frac{\|\hat E\|_F^2}{(1 - \|\hat D\|_F)^2}. \tag{7}
\]
Setting
\[
  \epsilon = \|E\|_F \quad\text{and}\quad \hat\epsilon = \|\hat D\|_F,
\]
we have from (6) and (7)
\[
  \|\tilde E\|_F^2 \le \tilde\epsilon^2 \equiv \frac{\epsilon^2 - \hat\epsilon^2}{(1 - \hat\epsilon)^2}. \tag{8}
\]
A little extra notation will help us decide when the scaling results in an increase of orthogonality. Since from (6) we have $\hat\epsilon \le \epsilon$, we can write
\[
  \hat\epsilon = \gamma\epsilon, \qquad 0 \le \gamma \le 1.
\]
In this notation the equality in (8) becomes
\[
  \tilde\epsilon^2 = \epsilon^2 \left[\frac{1 - \gamma^2}{(1 - \gamma\epsilon)^2}\right] \equiv \epsilon^2 \varphi(\gamma). \tag{9}
\]
Thus the problem is to ascertain when $\varphi(\gamma)$ is less than one. The following facts are easily verified.

1. $\varphi(\gamma) \ge 1$ in the interval $[0,\, 2\epsilon/(1 + \epsilon^2)]$. At $\gamma = \epsilon$ it assumes a maximum of $(1 - \epsilon^2)^{-1}$.

2. $\varphi(\gamma)$ decreases monotonically from one to zero on the interval $[2\epsilon/(1 + \epsilon^2),\, 1]$.

In terms of our iteration, if $\hat\epsilon$ is too small (roughly, less than $2\epsilon^2$), then the scaling has the potential to reduce orthogonality, though not by very much if $\epsilon$ is at all small. For larger $\hat\epsilon$ the scaling is guaranteed to increase orthogonality. Otherwise put, multiplication by the matrix $Q$ moves part of $E$ to the diagonal, where it is eliminated by the scaling. The more of $E$ that is moved to the diagonal the better. The amount of $E$ that is moved will depend on $Q$, which in turn depends on the application in question.
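The following sketch puts numbers to this analysis. It is ours, not part of the original note, and it assumes Python with numpy: it applies a random orthogonal $Q$ to a normalized $X$ with a small zero-diagonal error $E$, then compares the deviation from orthogonality before scaling, after the simple scaling (2), and after the optimal scaling obtained by solving the linear system (4); it also reports the bound (8).

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 8, 1e-3

# Zero-diagonal symmetric error E with ||E||_F = eps, so A = I + E.
E = rng.standard_normal((n, n))
E = E + E.T
np.fill_diagonal(E, 0.0)
E *= eps / np.linalg.norm(E)

# Apply an orthogonal Q: A_hat = Q^T (I + E) Q = I + D_hat + E_hat, eq. (1).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A_hat = Q.T @ (np.eye(n) + E) @ Q
d_hat = np.diag(A_hat) - 1.0

def deviation(s):
    """Frobenius norm of I - S^(1/2) A_hat S^(1/2) for diag(S) = s."""
    rs = np.sqrt(s)
    return np.linalg.norm(np.eye(n) - rs[:, None] * A_hat * rs[None, :])

s_simple = 1.0 / (1.0 + d_hat)                       # scaling (2)
s_opt = np.linalg.solve(A_hat * A_hat, 1.0 + d_hat)  # Theorem 1, system (4)

eps_hat = np.linalg.norm(d_hat)                      # eps_hat = ||D_hat||_F
bound = np.sqrt(eps**2 - eps_hat**2) / (1.0 - eps_hat)   # right side of (8)

print("before scaling :", np.linalg.norm(np.eye(n) - A_hat))  # equals eps
print("simple scaling :", deviation(s_simple))       # = ||E_tilde||_F <= bound
print("optimal scaling:", deviation(s_opt))          # minimal over diagonal S
print("bound (8)      :", bound)
```

In experiments of this kind the simple scaling should come very close to the optimal one, as equation (5) leads one to expect.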
It is interesting, however, to note what happens when $Q$ is chosen at random, uniformly from the group of orthogonal matrices. To do so we prove the following theorem.

Theorem 2. Let $Q = (q_1, \ldots, q_n)$ be a random orthogonal matrix, uniformly distributed over the group of orthogonal matrices. Then for any symmetric matrix $E$,
\[
  \mathbf{E}\left(\sum_{i=1}^n (q_i^T E q_i)^2\right)
  = \frac{1}{n+2}\left[\operatorname{trace}(E)^2 + 2\|E\|_F^2\right],
\]
where $\mathbf{E}$ denotes the expectation operator (not to be confused with the matrix $E$).

Proof. Let $u$ denote a random vector of $n$ independent standard normal variables. Let $r$ denote $\|u\|$ and $v = u/r$ (n.b., $v$ is distributed as a typical column of $Q$). It is well known that $v$ is distributed uniformly over the unit sphere, while $r^2$ is independent of $v$ with a $\chi^2_n$ distribution. Thus, using standard results on the moments of the normal and $\chi^2$ distributions (in particular $\mathbf{E} r^4 = n(n+2)$), we have
\[
  \mathbf{E} v_i^4 = \frac{\mathbf{E} u_i^4}{\mathbf{E} r^4} = \frac{3}{n(n+2)}
\]
and
\[
  \mathbf{E}(v_i^2 v_j^2) = \frac{\mathbf{E}(u_i^2 u_j^2)}{\mathbf{E} r^4} = \frac{1}{n(n+2)}, \qquad i \ne j.
\]
It is clearly sufficient to prove the theorem for diagonal matrices, say
\[
  E = \operatorname{diag}(\lambda_1, \ldots, \lambda_n).
\]
For this case the result follows easily on expanding $\sum_{i=1}^n (q_i^T E q_i)^2$ and using the above formulas to take expectations (recall that $\operatorname{trace}(E)^2 = \|E\|_F^2 + \sum_{i \ne j} \lambda_i \lambda_j$).

In our application the trace of $E$ is zero, and we have on the average
\[
  \hat\epsilon^2 = \frac{2}{n+2}\,\epsilon^2;
\]
i.e., $\gamma^2 = 2/(n+2)$. Thus $\hat\epsilon$ is of the same order as $\epsilon$, and by the second observation following (9) we can expect to observe an increase of orthogonality. However, this increase diminishes as $n$ grows. For if $\epsilon$ is small enough that the denominator in $\varphi(\gamma)$ can be ignored, an iteration will reduce $\epsilon^2$ on the average by a factor of only $n/(n+2)$.

Finally, returning to the role of rounding error, its effect is to add errors to $\tilde E$. The Frobenius norm of this error will be proportional to the rounding unit $\epsilon_M$, say $\gamma_n \epsilon_M$. Thus the recurrence (9) must be rewritten in the form
\[
  \tilde\epsilon = \varphi(\gamma)^{1/2}\,\epsilon + \gamma_n \epsilon_M.
\]
If we assume that $\gamma$ is constant, then this recurrence has the fixed point
\[
  \epsilon = \frac{\gamma_n \epsilon_M}{1 - \varphi(\gamma)^{1/2}} \approx \frac{2\gamma_n \epsilon_M}{\gamma^2},
\]
the last approximation holding for small $\gamma$. For example, with random $Q$ we should not expect to reduce the measure of orthogonality much below $(n+2)\gamma_n\epsilon_M$. These considerations perhaps explain the lack of orthogonality to working precision noticed by DeGroat.

References

[1] R. D. DeGroat and R. A. Roberts. Efficient numerically stabilized rank-one eigenstructure updating. IEEE Transactions on Acoustics, Speech, and Signal Processing, 38:301–316, 1990. Cited in [2].

[2] M. Moonen, P. Van Dooren, and J. Vandewalle. A note on "Efficient numerically stabilized rank-one eigenstructure updating." IEEE Transactions on Signal Processing, 39:1911–1913, 1991. Reply by DeGroat, pp. 1913–1914.
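Appendix: A Numerical Check of Theorem 2

The short Monte Carlo sketch below is our own addition, not part of the original note. It assumes Python with numpy and samples Haar-distributed orthogonal matrices via the QR factorization of a Gaussian matrix (with a sign fix on the diagonal of $R$) to check the expectation formula of Theorem 2.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 10, 20000

# Any fixed symmetric E will do; Theorem 2 holds whatever its trace.
E = rng.standard_normal((n, n))
E = E + E.T

total = 0.0
for _ in range(trials):
    # Haar-distributed orthogonal Q: QR of a Gaussian matrix plus a sign fix.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    Q = q * np.sign(np.diag(r))
    # diag(Q^T E Q)_i = q_i^T E q_i, the diagonal that the scaling removes.
    total += np.sum(np.diag(Q.T @ E @ Q) ** 2)

predicted = (np.trace(E) ** 2 + 2 * np.linalg.norm(E) ** 2) / (n + 2)
print("Monte Carlo estimate:", total / trials)
print("Theorem 2 predicts  :", predicted)
```

The two printed values should agree up to the usual Monte Carlo sampling error, which shrinks like the inverse square root of the number of trials.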