University of Maryland, College Park
Institute for Advanced Computer Studies TR-92-43
Department of Computer Science TR-2878

Scaling for Orthogonality

Alan Edelman† and G. W. Stewart‡

April 1992

ABSTRACT

In updating algorithms where orthogonal transformations are accumulated, it is important to preserve the orthogonality of the product in the presence of rounding error. Moonen, Van Dooren, and Vandewalle have pointed out that simply normalizing the columns of the product tends to preserve orthogonality, though not, as DeGroat points out, to working precision. In this note we give an analysis of the phenomenon.

This report is available by anonymous ftp from thales.cs.umd.edu in the directory pub/reports.

† Department of Mathematics and Lawrence Berkeley Laboratory, University of California, Berkeley, CA 94720, edelman@math.berkeley.edu. Supported by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy under Contract DE-AC03-76SF00098.

‡ Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742. stewart@cs.umd.edu. This work was performed while the author was visiting the Institute for Mathematics and Its Applications, University of Minnesota, Minneapolis, MN 55455.

In many updating algorithms it is required to accumulate a product of the form
\[
  X_k = Q_1 \cdots Q_{k-1} Q_k,
\]
where the matrices $Q_i$ are orthogonal. Although mathematically speaking $X_k$ must be orthogonal, in practice rounding error will cause it to drift from orthogonality with increasing $k$. If we take the deviation of $X_k^T X_k$ from the identity as a measure of the loss of orthogonality, then typically
\[
  \|I - X_k^T X_k\|_F \approx k \gamma_n \epsilon_M,
\]
where $\|\cdot\|_F$ is the Frobenius norm, $\epsilon_M$ is the rounding unit for the arithmetic in question, and $\gamma_n$ is a slowly growing function of the size $n$ of $X_k$ (e.g., $n^{1.5}$).

As a cure for this problem DeGroat and Roberts [1] have proposed that each $X_k$ be subjected to a partial reorthogonalization in which the second column is orthogonalized against the first, the third against the second, and so on, with all the columns being renormalized after orthogonalization. In a subsequent note on their paper, Moonen, Van Dooren, and Vandewalle [2] pointed out that the normalization alone is sufficient to maintain orthogonality and supported their claim with a heuristic argument. In a reply, DeGroat pointed out that normalization "does not yield working precision orthogonality." However, the error remains quite small.

The purpose of this note is to give a more complete analysis of the method, one that explains the phenomena mentioned in the last paragraph. In particular, we show that this method succeeds when the $Q_i$ manage to transfer off-diagonal error in the matrices $I - X_i^T X_i$ to the diagonal. We also show that normalizing is the best possible scaling up to first order. However, it can actually decrease orthogonality in certain unlikely circumstances.

For notational convenience we will drop subscripts and write
\[
  \hat X = X Q,
\]
where $X$ is scaled so that its column norms are one and $Q$ is orthogonal (for the moment we ignore rounding error). Since $X$ is normalized, we can write
\[
  A \equiv X^T X = I + E,
\]
where the diagonals of $E$ are zero. Write
\[
  \hat A \equiv \hat X^T \hat X = I + \hat D + \hat E,
\]
where
\[
  \hat D + \hat E = Q^T E Q \tag{1}
\]
is a decomposition of $Q^T E Q$ into its diagonal and off-diagonal parts.
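Before turning to the analysis, a small numerical illustration may help fix ideas. The sketch below is our own addition, not part of the original note: it assumes Python with numpy, generates the $Q_i$ at random (the note makes no such assumption; in practice they come from the application), and compares a plainly accumulated product with one whose columns are rescaled to unit norm after every update, which is precisely the scaling analyzed below.

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps = 20, 20000

def random_orthogonal(n):
    # QR of a Gaussian matrix, with a sign fix, gives a random orthogonal Q.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

# Single precision makes the drift visible in a short run.
X_plain = np.eye(n, dtype=np.float32)    # product accumulated with no scaling
X_scaled = np.eye(n, dtype=np.float32)   # product with column normalization

for _ in range(steps):
    Q = random_orthogonal(n).astype(np.float32)
    X_plain = X_plain @ Q
    X_scaled = X_scaled @ Q
    # Rescale each column to unit norm: the scaling by S^(1/2) defined below.
    X_scaled /= np.linalg.norm(X_scaled, axis=0)

for name, X in [("no scaling", X_plain), ("column scaling", X_scaled)]:
    print(name, ":", np.linalg.norm(np.eye(n) - X.T @ X))
```

In runs of this kind the unscaled deviation grows steadily with the number of factors, while the normalized product settles near a small fixed level, the behavior this note sets out to explain.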
In this notation, the scaling of $\hat X$ amounts to setting
\[
  \hat S = (I + \hat D)^{-1} \tag{2}
\]
and
\[
  \tilde X = \hat X \hat S^{1/2}.
\]
The deviation from orthogonality of $\tilde X$ is the Frobenius norm of
\[
  \tilde E = \hat S^{1/2} \hat E \hat S^{1/2}. \tag{3}
\]
The above equations define a recurrence for $E$, $\tilde E$, etc., which we are going to analyze. But first we will motivate the scaling by comparing it with the optimal scaling, which is characterized in the following theorem.

Theorem 1. For any diagonal matrix $D$ let $\operatorname{diag}(D)$ denote the vector consisting of the diagonal elements of $D$. Then for all sufficiently small $E$, the optimal scaling matrix $S$ satisfies
\[
  (\hat A \circ \hat A)\operatorname{diag}(S) = \operatorname{diag}(I + \hat D), \tag{4}
\]
where $\hat A \circ \hat A$ is the component-wise product (also known as the Schur or Hadamard product) of $\hat A$ with itself.

Proof. Regarded as a function of the elements of $S$, the function $\|I - S^{1/2} \hat A S^{1/2}\|_F^2$ is a quadratic function that is bounded below by zero. Differentiating this function and setting the results to zero, we obtain (4). It follows that if (4) has a positive solution, then that solution will provide the optimal scaling. Now $\lim_{\hat E \to 0} \hat A \circ \hat A = (I + \hat D)^2$. Consequently,
\[
  \lim_{\hat E \to 0} \operatorname{diag}(S)
  = \lim_{\hat E \to 0} (\hat A \circ \hat A)^{-1}\operatorname{diag}(I + \hat D)
  = \operatorname{diag}[(I + \hat D)^{-1}]
  = \operatorname{diag}(\hat S) > 0. \tag{5}
\]
Hence for all sufficiently small $E$, the solution of (4) is positive.

Equation (5) provides a heuristic justification for the method, since it says that to first order in $E$ our scaling approximates the optimal scaling. However, the matrix
\[
  \hat A = \begin{pmatrix} 1 - 2\epsilon^2 & \epsilon \\ \epsilon & 1 - 2\epsilon^2 \end{pmatrix}
\]
shows that the method is not guaranteed to increase orthogonality for all small $E$. Nevertheless, this situation is quite unlikely, as we will now demonstrate by an analysis of the recurrence (3).

First note that from (1) and the unitary invariance of the Frobenius norm we have
\[
  \|E\|_F^2 = \|\hat D\|_F^2 + \|\hat E\|_F^2. \tag{6}
\]
Now the square of the $(i,j)$ element of $\tilde E$ is
\[
  \frac{\hat e_{ij}^2}{(1 + \hat d_i)(1 + \hat d_j)} \le \frac{\hat e_{ij}^2}{(1 - \|\hat D\|_F)^2}.
\]
Here $\hat d_i$ is the $i$th diagonal element of $\hat D$, and we assume that $\|\hat D\|_F < 1$. Hence
\[
  \|\tilde E\|_F^2 \le \frac{\|\hat E\|_F^2}{(1 - \|\hat D\|_F)^2}. \tag{7}
\]
Setting
\[
  \epsilon = \|E\|_F \quad\text{and}\quad \hat\epsilon = \|\hat D\|_F,
\]
we have from (6) and (7)
\[
  \|\tilde E\|_F^2 \le \tilde\epsilon^2 \equiv \frac{\epsilon^2 - \hat\epsilon^2}{(1 - \hat\epsilon)^2}. \tag{8}
\]
A little extra notation will help us decide when the scaling results in an increase of orthogonality. Since from (6) we have $\hat\epsilon \le \epsilon$, we can write
\[
  \hat\epsilon = \gamma\epsilon, \qquad 0 \le \gamma \le 1.
\]
In this notation the equality in (8) becomes
\[
  \tilde\epsilon^2 = \epsilon^2 \left[\frac{1 - \gamma^2}{(1 - \gamma\epsilon)^2}\right] \equiv \epsilon^2 \varphi(\gamma). \tag{9}
\]
Thus the problem is to ascertain when $\varphi(\gamma)$ is less than one. The following facts are easily verified.

1. $\varphi(\gamma) \ge 1$ in the interval $[0,\, 2\epsilon/(1 + \epsilon^2)]$. At $\gamma = \epsilon$ it assumes a maximum of $(1 - \epsilon^2)^{-1}$.

2. $\varphi(\gamma)$ decreases monotonically from one to zero on the interval $[2\epsilon/(1 + \epsilon^2),\, 1]$.

In terms of our iteration, if $\hat\epsilon$ is too small (roughly, less than $2\epsilon^2$), then the scaling has the potential to reduce orthogonality, though not by very much if $\epsilon$ is at all small. For larger $\hat\epsilon$ the scaling is guaranteed to increase orthogonality. Otherwise put, multiplication by the matrix $Q$ moves part of $E$ to the diagonal, where it is eliminated by the scaling. The more of $E$ that is moved to the diagonal the better. The amount of $E$ that is moved will depend on $Q$, which in turn depends on the application in question.
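The following sketch puts numbers to this analysis. It is ours, not part of the original note, and it assumes Python with numpy: it applies a random orthogonal $Q$ to a normalized $X$ with a small zero-diagonal error $E$, then compares the deviation from orthogonality before scaling, after the simple scaling (2), and after the optimal scaling obtained by solving the linear system (4); it also reports the bound (8).

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 8, 1e-3

# Zero-diagonal symmetric error E with ||E||_F = eps, so A = I + E.
E = rng.standard_normal((n, n))
E = E + E.T
np.fill_diagonal(E, 0.0)
E *= eps / np.linalg.norm(E)

# Apply an orthogonal Q: A_hat = Q^T (I + E) Q = I + D_hat + E_hat, eq. (1).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A_hat = Q.T @ (np.eye(n) + E) @ Q
d_hat = np.diag(A_hat) - 1.0

def deviation(s):
    """Frobenius norm of I - S^(1/2) A_hat S^(1/2) for diag(S) = s."""
    rs = np.sqrt(s)
    return np.linalg.norm(np.eye(n) - rs[:, None] * A_hat * rs[None, :])

s_simple = 1.0 / (1.0 + d_hat)                       # scaling (2)
s_opt = np.linalg.solve(A_hat * A_hat, 1.0 + d_hat)  # Theorem 1, system (4)

eps_hat = np.linalg.norm(d_hat)                      # eps_hat = ||D_hat||_F
bound = np.sqrt(eps**2 - eps_hat**2) / (1.0 - eps_hat)   # right side of (8)

print("before scaling :", np.linalg.norm(np.eye(n) - A_hat))  # equals eps
print("simple scaling :", deviation(s_simple))       # = ||E_tilde||_F <= bound
print("optimal scaling:", deviation(s_opt))          # minimal over diagonal S
print("bound (8)      :", bound)
```

In experiments of this kind the simple scaling should come very close to the optimal one, as equation (5) leads one to expect.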
It is interesting, however, to note what happens when $Q$ is chosen at random, uniformly from the group of orthogonal matrices. To do so we prove the following theorem.

Theorem 2. Let $Q = (q_1, \ldots, q_n)$ be a random orthogonal matrix, uniformly distributed over the group of orthogonal matrices. Then for any symmetric matrix $E$,
\[
  \mathbf{E}\left(\sum_{i=1}^n (q_i^T E q_i)^2\right)
  = \frac{1}{n+2}\left[\operatorname{trace}(E)^2 + 2\|E\|_F^2\right],
\]
where $\mathbf{E}$ denotes the expectation operator (not to be confused with the matrix $E$).

Proof. Let $u$ denote a random vector of $n$ independent standard normal variables. Let $r$ denote $\|u\|$ and $v = u/r$ (n.b., $v$ is distributed as a typical column of $Q$). It is well known that $v$ is distributed uniformly over the unit sphere, while $r^2$ is independent of $v$ with a $\chi^2_n$ distribution. Thus, using standard results on the moments of the normal and $\chi^2$ distributions (in particular $\mathbf{E} r^4 = n(n+2)$), we have
\[
  \mathbf{E} v_i^4 = \frac{\mathbf{E} u_i^4}{\mathbf{E} r^4} = \frac{3}{n(n+2)}
\]
and
\[
  \mathbf{E}(v_i^2 v_j^2) = \frac{\mathbf{E}(u_i^2 u_j^2)}{\mathbf{E} r^4} = \frac{1}{n(n+2)}, \qquad i \ne j.
\]
It is clearly sufficient to prove the theorem for diagonal matrices, say
\[
  E = \operatorname{diag}(\lambda_1, \ldots, \lambda_n).
\]
For this case the result follows easily on expanding $\sum_{i=1}^n (q_i^T E q_i)^2$ and using the above formulas to take expectations (recall that $\operatorname{trace}(E)^2 = \|E\|_F^2 + \sum_{i \ne j} \lambda_i \lambda_j$).

In our application the trace of $E$ is zero, and we have on the average
\[
  \hat\epsilon^2 = \frac{2}{n+2}\,\epsilon^2;
\]
i.e., $\gamma^2 = 2/(n+2)$. Thus $\hat\epsilon$ is of the same order as $\epsilon$, and by the second observation following (9) we can expect to observe an increase of orthogonality. However, this increase diminishes as $n$ grows. For if $\epsilon$ is small enough that the denominator in $\varphi(\gamma)$ can be ignored, an iteration will reduce $\epsilon^2$ on the average by a factor of only $n/(n+2)$.

Finally, returning to the role of rounding error, its effect is to add errors to $\tilde E$. The Frobenius norm of this error will be proportional to the rounding unit $\epsilon_M$, say $\gamma_n \epsilon_M$. Thus the recurrence (9) must be rewritten in the form
\[
  \tilde\epsilon = \varphi(\gamma)^{1/2}\,\epsilon + \gamma_n \epsilon_M.
\]
If we assume that $\gamma$ is constant, then this recurrence has the fixed point
\[
  \epsilon = \frac{\gamma_n \epsilon_M}{1 - \varphi(\gamma)^{1/2}} \approx \frac{2\gamma_n \epsilon_M}{\gamma^2},
\]
the last approximation holding for small $\gamma$. For example, with random $Q$ we should not expect to reduce the measure of orthogonality much below $(n+2)\gamma_n\epsilon_M$. These considerations perhaps explain the lack of orthogonality to working precision noticed by DeGroat.

References

[1] R. D. DeGroat and R. A. Roberts. Efficient numerically stabilized rank-one eigenstructure updating. IEEE Transactions on Acoustics, Speech, and Signal Processing, 38:301–316, 1990. Cited in [2].

[2] M. Moonen, P. Van Dooren, and J. Vandewalle. A note on "Efficient numerically stabilized rank-one eigenstructure updating." IEEE Transactions on Signal Processing, 39:1911–1913, 1991. Reply by DeGroat, pp. 1913–1914.
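Appendix: A Numerical Check of Theorem 2

The short Monte Carlo sketch below is our own addition, not part of the original note. It assumes Python with numpy and samples Haar-distributed orthogonal matrices via the QR factorization of a Gaussian matrix (with a sign fix on the diagonal of $R$) to check the expectation formula of Theorem 2.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 10, 20000

# Any fixed symmetric E will do; Theorem 2 holds whatever its trace.
E = rng.standard_normal((n, n))
E = E + E.T

total = 0.0
for _ in range(trials):
    # Haar-distributed orthogonal Q: QR of a Gaussian matrix plus a sign fix.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    Q = q * np.sign(np.diag(r))
    # diag(Q^T E Q)_i = q_i^T E q_i, the diagonal that the scaling removes.
    total += np.sum(np.diag(Q.T @ E @ Q) ** 2)

predicted = (np.trace(E) ** 2 + 2 * np.linalg.norm(E) ** 2) / (n + 2)
print("Monte Carlo estimate:", total / trials)
print("Theorem 2 predicts  :", predicted)
```

The two printed values should agree up to the usual Monte Carlo sampling error, which shrinks like the inverse square root of the number of trials.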