CHOOSING REGULARIZATION PARAMETERS IN ITERATIVE METHODS FOR ILL-POSED PROBLEMS*

MISHA E. KILMER† AND DIANNE P. O'LEARY‡

* This work was supported by the National Science Foundation under Grants CCR 95-03126 and CCR-97-32022 and by the Army Research Office, MURI Grant DAAG55-97-1-0013.
† Dept. of Computer and Electrical Engineering, Northeastern University, Boston, MA 02115 (mkilmer@ece.neu.edu).
‡ Dept. of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742 (oleary@cs.umd.edu).

Abstract. Numerical solution of ill-posed problems is often accomplished by discretization (projection onto a finite-dimensional subspace) followed by regularization. If the discrete problem has high dimension, though, typically we compute an approximate solution by projection onto an even smaller dimensional space, via iterative methods based on Krylov subspaces. In this work we present efficient algorithms that regularize after this second projection rather than before it. We prove some results on the approximate equivalence of this approach to other forms of regularization and we present numerical examples.

Key words. ill-posed problems, regularization, discrepancy principle, iterative methods, L-curve, Tikhonov, TSVD, projection, Krylov subspace

AMS subject classifications. 65R30, 65F20

Running title: Choosing Regularization Parameters

1. Introduction. Linear, discrete ill-posed problems of the form

$Ax = b$   (1)

or

$\min_x \|Ax - b\|_2$, or equivalently, $A^*Ax = A^*b$,   (2)

arise, for example, from the discretization of first-kind Fredholm integral equations and occur in a variety of applications. We shall assume that the full-rank matrix A is $m \times n$, with $m \ge n$ in (2) and $m = n$ in (1). In discrete ill-posed problems, A is ill-conditioned and there is no gap in the singular value spectrum. Typically, the right-hand side b contains noise due to measurement and/or approximation error. This noise, in combination with the ill-conditioning of A, means that the exact solution of (1) or (2) has little relationship to the noise-free solution and is worthless. Instead, we use a regularization method to determine a solution that approximates the noise-free solution. Regularization methods replace the original operator by a better-conditioned but related one in order to diminish the effects of noise in the data and produce a regularized solution to the original problem. Sometimes this regularized problem is too large to solve exactly. In that case, we typically compute an approximate solution by projection onto an even smaller dimensional space, perhaps via iterative methods based on Krylov subspaces.

The conditioning of the new problem is controlled by one or more regularization parameters specific to the method. A large regularization parameter yields a new well-conditioned problem, but its solution may be far from the noise-free solution since the new operator is a poor approximation to A. A small regularization parameter generally yields a solution very close to the noise-contaminated exact solution of (1) or (2), and hence its distance from the noise-free solution also can be large. Thus, a key issue in regularization methods is to choose a regularization parameter that balances the error due to noise with the error due to regularization. A wise choice of regularization parameter is obviously crucial to obtaining useful approximate solutions to ill-posed problems.
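To make this trade-off concrete before surveying the methods, here is a minimal NumPy sketch (ours, not taken from the paper; the Hilbert test matrix, the smooth solution, and the parameter grid are illustrative assumptions). It compares the exact solution of a noisy, severely ill-conditioned system with Tikhonov-regularized solutions for several values of the regularization parameter.

    # Illustrative sketch (not the paper's experiments): noise amplification in an
    # ill-conditioned system, and its damping by Tikhonov regularization.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 12
    A = 1.0 / (np.add.outer(np.arange(n), np.arange(n)) + 1.0)   # Hilbert matrix (very ill-conditioned)
    x_true = np.sin(np.linspace(0.0, np.pi, n))                  # smooth noise-free solution
    b_true = A @ x_true
    e = rng.standard_normal(n)
    e *= 1e-3 * np.linalg.norm(b_true) / np.linalg.norm(e)       # noise level 1e-3
    b = b_true + e

    x_naive = np.linalg.solve(A, b)                              # exact solution of the noisy system
    print("naive   relative error:",
          np.linalg.norm(x_naive - x_true) / np.linalg.norm(x_true))

    for lam in [1e-8, 1e-5, 1e-2, 1e+1]:                         # Tikhonov: (A^T A + lam^2 I) x = A^T b
        x_lam = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)
        print(f"lambda={lam:7.0e} relative error:",
              np.linalg.norm(x_lam - x_true) / np.linalg.norm(x_true))

Typically the naive solution is dominated by amplified noise, intermediate parameter values give a much smaller error, and very large values oversmooth, which is the balance discussed above.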
For problems small enough that a rank-revealing factorization or singular value decomposition of A can be computed, there are well-studied techniques for computing a good regularization parameter. These techniques include the Discrepancy Principle [8], generalized cross-validation (GCV) [9], and the L-curve [15]. For larger problems treated by iterative methods, though, the parameter choice is much less understood. If regularization is applied to the projected problem that is generated by the iterative method, then there are essentially two regularization parameters: one for the standard regularization algorithm, such as Tikhonov or truncated SVD, and one controlling the number of iterations taken. One subtle issue is that the standard regularization parameter that is correct for the discretized problem may not be the optimal one for the lower-dimensional problem actually solved by the iteration, and this observation leads to the research discussed in this paper.

At first glance, there can appear to be a lot of work associated with the selection of a good regularization parameter, and many algorithms proposed in the literature are needlessly complicated. But by regularizing after projection by the iterative method, so that we are regularizing the lower-dimensional problem that is actually being solved, much of this difficulty vanishes.

The purpose of this paper is to present parameter selection techniques designed to reduce the regularization work for iterative methods such as Krylov subspace techniques. Our paper is organized as follows. In §2, we give an overview of the regularization methods we will be considering, and we follow up in §3 by surveying some methods for choosing the corresponding regularization parameters. In §4, we show how parameter selection techniques for the original problem can be applied instead to a projected problem obtained from an iterative method, greatly reducing the cost without much degradation in the solution. We give experimental results in §5 and conclusions and future work in §6.

2. Regularization background. In the following we shall assume that $b = b_{\rm true} + e$, where $b_{\rm true}$ denotes the unperturbed data vector and $e$ denotes zero-mean white noise. We will also assume that $b_{\rm true}$ satisfies the discrete Picard condition; that is, the spectral coefficients of $b_{\rm true}$ decay faster, on average, than the singular values.

Under these assumptions, it is easy to see why the exact solution to (1) or (2) is hopelessly contaminated by noise. Let $\hat{U}\Sigma\hat{V}^*$ denote the singular value decomposition of A, where the columns of $\hat{U}$ and $\hat{V}$ are the singular vectors, and the singular values are ordered as $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$. Then the solution to (1) or (2) is given by

$x = \sum_{i=1}^{n} \frac{\hat{u}_i^* b}{\sigma_i}\,\hat{v}_i = \sum_{i=1}^{n} \left( \frac{\hat{u}_i^* b_{\rm true}}{\sigma_i} + \frac{\hat{u}_i^* e}{\sigma_i} \right) \hat{v}_i.$   (3)

As a consequence of the white noise assumption, $|\hat{u}_i^* e|$ is roughly constant for all $i$, while the discrete Picard condition guarantees that $|\hat{u}_i^* b_{\rm true}|$ decreases with $i$ faster than $\sigma_i$ does. The matrix A is ill-conditioned, so small singular values magnify the corresponding coefficients $\hat{u}_i^* e$ in the second sum, and it is this large contribution of noise from the approximate null space of A that renders the exact solution $x$ defined in (3) worthless. The following regularization methods try in different ways to lessen the contribution of noise to the solution. For further information on these methods, see, for example, [17].

2.1. Tikhonov regularization. One of the most common methods of regularization is Tikhonov regularization [34].
In this method, the problem (1) or (2) is replaced with the problem of solving

$\min_x \|Ax - b\|_2^2 + \lambda^2 \|Lx\|_2^2,$   (4)

where L denotes a matrix, often chosen to be the identity matrix I or a discrete derivative operator, and $\lambda$ is a positive scalar regularization parameter. For ease in notation, we will assume that L = I. Solving (4) is equivalent to solving

$(A^*A + \lambda^2 I)x_\lambda = A^*b.$   (5)

In analogy with (3) we have

$x_\lambda = \sum_{i=1}^{n} \left( \frac{\sigma_i\,\hat{u}_i^* b_{\rm true}}{\sigma_i^2 + \lambda^2} + \frac{\sigma_i\,\hat{u}_i^* e}{\sigma_i^2 + \lambda^2} \right) \hat{v}_i.$   (6)

In this solution, the contributions from noise components $\hat{u}_i^* e$ for values of $\sigma_i < \lambda$ are much smaller than they are in (3), and thus $x_\lambda$ can be closer to the noise-free solution than $x$ is. If $\lambda$ is too large, however, $A^*A + \lambda^2 I$ is very far from the original operator $A^*A$, and $x_\lambda$ is very far from $x_{\rm true}$, the solution to (2) when $e = 0$. Conversely, if $\lambda$ is too small, the singular values of the new operator $A^*A + \lambda^2 I$ are close to those of $A^*A$; thus $x_\lambda \approx x$, so small singular values again greatly magnify noise components.

2.2. Truncated SVD. In the truncated SVD (TSVD) method of regularization, the regularized solution is chosen simply by truncating the expansion in (3) as

$x_\ell = \sum_{i=1}^{n-\ell} \frac{\hat{u}_i^* b}{\sigma_i}\,\hat{v}_i.$   (7)

Here the regularization parameter is $\ell$, the number of terms to be dropped from the sum. Observe that if $\ell$ is small, very few terms are dropped from the sum, so $x_\ell$ resembles $x$ in that the effects of noise are large. If $\ell$ is too large, however, important information could be lost; such is the case if $|\hat{u}_i^* b_{\rm true}| \gg |\hat{u}_i^* e|$ for some $i > n - \ell$.

An alternative, yet related, approach to TSVD is an approach introduced by Rust [31], where the truncation strategy is based on the value of each spectral coefficient $\hat{u}_i^* b$ itself. The strategy is to include in the sum (3) only those terms corresponding to a spectral coefficient $\hat{u}_i^* b$ whose magnitude is greater than or equal to some tolerance $\eta$, which can be regarded as the regularization parameter.

2.3. Projection and iterative methods. Solving (5) or (7) can be impractical if n is large, but fortunately, regularization can be achieved through projection onto a subspace; see, for example, [7]. The truncated SVD is an example of one such projection: the solution is constrained to lie in the subspace spanned by the singular vectors corresponding to the largest $n - \ell$ singular values. Other projections can be more economical. In general, we constrain our regularized solution to lie in some k-dimensional subspace of $\mathbb{C}^n$, spanned by the columns of an $n \times k$ matrix $Q^{(k)}$. For example, we choose $x^{(k)}_{\rm reg} = Q^{(k)} y^{(k)}$ where $y^{(k)}$ solves

$\min_{y \in \mathbb{C}^k} \|AQ^{(k)}y - b\|_2^2$   (8)

or equivalently

$(Q^{(k)})^* A^* A Q^{(k)} y = (Q^{(k)})^* A^* b.$   (9)

The idea is that with an appropriately chosen subspace, the operator $(Q^{(k)})^* A^* A Q^{(k)}$ will be better conditioned than the original operator and hence that $x^{(k)}_{\rm reg}$ will approximate $x_{\rm true}$ well on that subspace.

This projection is often achieved through the use of iterative methods such as conjugate gradients, GMRES, QMR, and other Krylov subspace methods. The matrix $Q^{(k)}$ then contains orthonormal columns generated via a Lanczos tridiagonalization or bidiagonalization process [27, 1]. In this case, $Q^{(k)}$ is a basis for some k-dimensional Krylov subspace (i.e., the subspace $\mathcal{K}_k(c, K)$ spanned by the vectors $c, Kc, \ldots, K^{k-1}c$ for some matrix K and vector c). The regularized solutions $x^{(k)}_{\rm reg}$ are generated iteratively as the subspaces are built.
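As a concrete companion to the formulas above, the following sketch (ours, not the authors' code) evaluates the Tikhonov solution (6), the TSVD solution (7), and the generic projected solution of (8) with NumPy. The names lam, ell, and Q are illustrative assumptions; Rust's variant is obtained by replacing the truncation test with a threshold on the coefficient magnitudes $|\hat{u}_i^* b|$.

    # Sketch of the direct and projected regularized solutions of Sections 2.1-2.3.
    import numpy as np

    def tikhonov_svd(A, b, lam):
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        c = U.T @ b                                   # spectral coefficients u_i^* b
        return Vt.T @ (s * c / (s**2 + lam**2))       # filter factors s_i^2 / (s_i^2 + lam^2), as in (6)

    def tsvd(A, b, ell):
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        c = U.T @ b
        keep = np.arange(s.size) < s.size - ell       # drop the ell smallest singular values, as in (7)
        return Vt.T @ np.where(keep, c / s, 0.0)

    def projected_solution(A, b, Q):
        # Solve min_y ||A Q y - b||_2 for an orthonormal basis Q and back-project, as in (8).
        y, *_ = np.linalg.lstsq(A @ Q, b, rcond=None)
        return Q @ y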
Krylov subspace algorithms such as CG, CGLS, GMRES, and LSQR tend to produce, at early iterations, solutions that resemble $x_{\rm true}$ in the subspace spanned by (right) singular vectors of A corresponding to the largest singular values. At later iterations, however, these methods start to reconstruct increasing amounts of noise into the solution. This is due to the fact that for large k, the operator $(Q^{(k)})^* A^* A Q^{(k)}$ approaches the ill-conditioned operator $A^*A$. Therefore, the choice of the regularization parameter k, the stopping point for the iteration and the dimension of the subspace, is very important. (Usually, small values of the regularization parameter correspond to a solution closer to that of the noisy equation; despite this, we will call k, rather than 1/k, the regularization parameter.)

2.4. Hybrid methods: projection plus regularization. Another important family of regularization methods, often referred to as hybrid methods [17], was introduced by O'Leary and Simmons [27]. These methods combine a projection method with a direct regularization method such as TSVD or Tikhonov regularization. The problem is projected onto a particular subspace of dimension k, but typically the restricted operator in (9) is still ill-conditioned. Therefore, a regularization method is applied to the projected problem. Since the dimension k is usually small relative to n, regularization of the restricted problem is much less expensive. Yet, with an appropriately chosen subspace, the end results can be very similar to those achieved by applying the same direct regularization technique to the original problem. We will become more precise about how "similar" the solutions are in §4.5. Because the projected problems are usually generated iteratively by a Lanczos method, this approach is useful when A is sparse or structured in such a way that matrix-vector products can be handled efficiently with minimal storage.

3. Existing parameter selection methods. In this section, we discuss a sampling of the parameter selection techniques that have been proposed in the literature. They differ in the amount of a priori information required as well as in the decision criteria.

3.1. The Discrepancy Principle. If some extra information is available (for example, an estimate of the variance of the noise vector e), then the regularization parameter can be chosen rather easily. Morozov's Discrepancy Principle [25] says that if $\delta$ is the expected value of $\|e\|_2$, then the regularization parameter should be chosen so that the norm of the residual corresponding to the regularized solution $x_{\rm reg}$ is $\tau\delta$; that is,

$\|Ax_{\rm reg} - b\|_2 = \tau\delta,$   (10)

where $\tau > 1$ is some predetermined real number. Note that as $\delta \to 0$, $x_{\rm reg} \to x_{\rm true}$. Other methods based on knowledge of the variance are given, for example, in [12, 5].

Fig. 1. Example of a typical L-curve. This particular L-curve corresponds to applying Tikhonov regularization to the problem in Example 2.

3.2. Generalized Cross-Validation. The Generalized Cross-Validation (GCV) parameter selection method does not depend on a priori knowledge about the noise variance. This idea of Golub, Heath, and Wahba [9] is to find the parameter $\lambda$ that minimizes the GCV functional

$G(\lambda) = \frac{\|(I - AA^{\sharp}_{\lambda})b\|_2^2}{\left(\mathrm{trace}(I - AA^{\sharp}_{\lambda})\right)^2},$   (11)

where $A^{\sharp}_{\lambda}$ denotes the matrix that maps the right-hand side b onto the regularized solution $x_\lambda$.
In Tikhonov regularization, for example, $A^{\sharp}_{\lambda}$ is $(A^*A + \lambda^2 I)^{-1}A^*$. GCV chooses a regularization parameter that is not too dependent on any one data measurement [11, §12.1.3].

3.3. The L-Curve. One way to visualize the tradeoff between regularization error and error due to noise is to plot the norm of the regularized solution versus the corresponding residual norm for each of a set of regularization parameter values. The result is the L-curve, introduced by Lawson and popularized by Hansen [15]. See Figure 1 for a typical example. As the regularization parameter increases, noise is damped, so that the norm of the solution decreases while the residual increases. Intuitively, the best regularization parameter should lie on the corner of the L-curve, since for values higher than this, the residual increases without reducing the norm of the solution much, while for values smaller than this, the norm of the solution increases rapidly without much decrease in residual. In practice, only a few points on the L-curve are computed and the corner is located by approximate methods, estimating the point of maximum curvature [19].

Like GCV, this method of determining a regularization parameter does not depend on specific knowledge about the noise vector.

3.4. Disadvantages of these parameter choice algorithms. The appropriate choice of regularization parameter, especially for projection algorithms, is a difficult problem, and each method has severe flaws.

Table 1. Summary of additional flops needed to compute the regularization parameter for each of four regularization methods with various parameter selection techniques. Notation: q is the cost of multiplication of a vector by A; p is the number of discrete parameters that must be tried; k is the dimension of the projection; m and n are the problem dimensions.

                 Basic cost    Added cost:
                               Disc.         GCV           L-curve
  Tikhonov       O(mn^2)       O(p(m+n))     O(p(m+n))     O(p(m+n))
  TSVD           O(mn^2)       O(m)          O(m)          O(m+n)
  Rust's TSVD    O(mn^2)       O(m log m)    O(m log m)    O(m log m)
  Projection     O(qk)         0             O(q)          O(q)

The Discrepancy Principle is convergent as the noise goes to zero, but it relies on knowing information that is often unavailable or incorrectly estimated. Even with a correct estimate of the variance, the solutions tend to be oversmoothed [20, p. 96] (see also the discussion in §6.1 of [15]).

One noted difficulty with GCV is that G can have a very flat minimum, making it difficult to determine the optimal $\lambda$ numerically [35]. The L-curve is usually more tractable numerically, but its limiting properties are nonideal. The solution estimates fail to converge to the true solution as $n \to \infty$ [36] or as the error norm goes to zero [6]. All methods that assume no knowledge of the error norm, including GCV, have this latter property [6]. For further discussion and references about parameter choice methods, see [5, 17]. The cost of these methods is tabulated in Table 1.

3.5. Previous work on parameter choice for hybrid methods. At first glance, it appears that for Tikhonov regularization, multiple systems of the form (5) must be solved in order to evaluate candidate values of $\lambda$ for the Discrepancy Principle or the L-curve. Techniques have been suggested in the literature for solving these systems using projection methods.

Chan and Ng [4], for example, note that the systems involve the closely related matrices $C(\lambda) = A^*A + \lambda I$, and they suggest solving the systems simultaneously using a Galerkin projection method on a sequence of "seed" systems.
Although this is economical in storage, it can be unnecessarily expensive in time because they do not exploit the fact that for each fixed k, the Krylov subspace $\mathcal{K}_k(A^*b, C(\lambda))$ is the same for all values of $\lambda$.

Frommer and Maass [8] propose two algorithms for approximating the $\lambda$ that satisfies the Discrepancy Principle (10). The first is a "truncated cg" approach in which they use conjugate gradients to solve k systems of the form (5), truncating the iterative process early for large $\lambda$ and using previous solutions as starting guesses for later problems. Like Chan and Ng, this algorithm does not exploit any of the redundancy in generating the Krylov subspaces for each $\lambda_i$. The second method they propose, however, does exploit the redundancy, so that the CG iterates for all k systems can be updated simultaneously with no extra matrix-vector products. They stop their "shifted cg" algorithm when $\|Ax_\lambda - b\|_2 \le \tau\delta$ for one of their $\lambda$ values. Thus the number of matrix-vector products required is twice the number of iterations for this particular system to converge. We note that while the algorithms we propose in §4 for finding a good value of $\lambda$ are based on the same key observation regarding the Krylov subspace, our methods will usually require less work than the shifted cg algorithm.

Calvetti, Golub, and Reichel [3] compute upper and lower bounds on the L-curve generated by the matrices $C(\lambda)$ using a Lanczos bidiagonalization process. From this, they approximate the best parameter for Tikhonov regularization without projection. In §4, we choose instead to approximate the best parameter for Tikhonov regularization on the projected problem, since this is the approximation to the continuous problem that is actually being used.

Kaufman and Neumaier [21] suggest an envelope-guided conjugate gradient approach for the Tikhonov L-curve problem. Their method is more complicated than the methods we propose because they maintain nonnegativity constraints on the variables.

Substantial work has also been done on TSVD regularization of the projected problems. Björck, Grimme, and van Dooren [2] use GCV to determine the truncation point for the projected SVD. Their emphasis is on stable ways to maintain an accurate factorization when many iterations are needed, and they use full reorthogonalization and implicit restart strategies. O'Leary and Simmons [27] take a somewhat different viewpoint, that the problem should be preconditioned appropriately so that a massive number of iterations is unnecessary. That viewpoint is echoed in this current work, so we implicitly assume that the problem has been left-preconditioned or "filtered" [27]. For example, in place of (4), we solve

$\min_x \|M^{-1}Ax - M^{-1}b\|_2^2 + \lambda^2 \|x\|_2^2$

for a square preconditioner M. See [14, 26, 24, 23] for preconditioners appropriate for certain types of ill-posed problems. Note that we could alternately have considered right preconditioning, which amounts to solving, in the Tikhonov case,

$\min_y \left\| \begin{bmatrix} A \\ \lambda I \end{bmatrix} M^{-1} y - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2$

for $y_\lambda$ and then setting $x_\lambda = M^{-1}y_\lambda$. Note that either left or right preconditioning effectively changes the balance between the two terms in the minimization.

4. Regularizing the projected problem. In this section we develop nine approaches to regularization using Krylov methods. Many Krylov methods have been proposed; for ease of exposition we focus on just two of these: the LSQR algorithm of Paige and Saunders [29] and the GMRES algorithm of Saad and Schultz [33].
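Before describing these algorithms, here is a small numerical check (ours, with an arbitrary illustrative test matrix) of the key observation mentioned in §3.5 and exploited throughout this section: for fixed k, the Krylov subspace $\mathcal{K}_k(A^*b, A^*A + \lambda^2 I)$ does not depend on the shift $\lambda$, so a single basis can serve every trial regularization parameter.

    # Numerical check: shifted operators generate the same Krylov subspace.
    import numpy as np

    def krylov_basis(M, c, k):
        K = np.empty((c.size, k))
        v = c.copy()
        for j in range(k):
            K[:, j] = v
            v = M @ v
            v = v / np.linalg.norm(v)      # column scaling does not change the span
        Q, _ = np.linalg.qr(K)             # orthonormal basis for K_k(c, M)
        return Q

    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 20))
    b = rng.standard_normal(30)
    c = A.T @ b
    k = 5
    Q0 = krylov_basis(A.T @ A, c, k)
    for lam in [1e-3, 1e-1, 1.0]:
        Q = krylov_basis(A.T @ A + lam**2 * np.eye(20), c, k)
        # The subspaces coincide iff their orthogonal projectors agree.
        print(lam, np.linalg.norm(Q0 @ Q0.T - Q @ Q.T))

The printed values should be near machine precision for modest k, which is why the methods below build the projection once and vary the regularization parameter afterward at negligible cost.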
The LSQR algorithm of Paige and Saunders [29] iteratively computes the bidiagonalization introduced by Golub and Kahan [10]. Given a vector b, the algorithm is as follows [29, Alg. Bidiag 1]:

  Compute a scalar $\beta_1$ and a vector $u_1$ of length one so that $\beta_1 u_1 = b$. Similarly, determine $\alpha_1$ and $v_1$ so that $\alpha_1 v_1 = A^T u_1$.
  For $i = 1, 2, \ldots$
    Let $\beta_{i+1} u_{i+1} = A v_i - \alpha_i u_i$ and $\alpha_{i+1} v_{i+1} = A^T u_{i+1} - \beta_{i+1} v_i$, where the nonnegative scalars $\alpha_{i+1}$ and $\beta_{i+1}$ are chosen so that $u_{i+1}$ and $v_{i+1}$ have length one.
  End for

The vectors $u_i, v_i$ are called the left and right Lanczos vectors, respectively. The algorithm can be rewritten in matrix form by first defining the matrices

$U_k \equiv [u_1, \ldots, u_k], \qquad V_k \equiv [v_1, \ldots, v_k],$

$B_k \equiv \begin{bmatrix} \alpha_1 & & & \\ \beta_2 & \alpha_2 & & \\ & \beta_3 & \ddots & \\ & & \ddots & \alpha_k \\ & & & \beta_{k+1} \end{bmatrix}.$

With $e_i$ denoting the ith unit vector, the following relations can be established:

$b = \beta_1 u_1 = \beta_1 U_{k+1} e_1,$   (12)
$A V_k = U_{k+1} B_k,$   (13)
$A^T U_{k+1} = V_k B_k^T + \alpha_{k+1} v_{k+1} e_{k+1}^T,$   (14)
$V_k^* V_k = I_k, \qquad U_{k+1}^* U_{k+1} = I_{k+1},$   (15)

where the subscript on I denotes the dimension of the identity.

Now suppose we want to solve

$\min_{x \in S} \|b - Ax\|_2,$   (16)

where S denotes the k-dimensional subspace spanned by the first k Lanczos vectors $v_i$. The solution we seek is of the form $x^{(k)} = V_k y^{(k)}$ for some vector $y^{(k)}$ of length k. Define $r^{(k)} = b - Ax^{(k)}$ to be the corresponding residual. From the relations above, observe that in exact arithmetic

$r^{(k)} = \beta_1 u_1 - A V_k y^{(k)} = U_{k+1}(\beta_1 e_1 - B_k y^{(k)}).$

Since $U_{k+1}$ has, in exact arithmetic, orthonormal columns, we have

$\|r^{(k)}\|_2 = \|\beta_1 e_1 - B_k y^{(k)}\|_2.$   (17)

Therefore, the projected problem we wish to solve is

$\min_{y^{(k)}} \|\beta_1 e_1 - B_k y^{(k)}\|_2.$   (18)

Solving this minimization problem is equivalent to solving the normal equations involving the bidiagonal matrix:

$B_k^* B_k y^{(k)} = \beta_1 B_k^* e_1.$   (19)

Typically k is small, so reorthogonalization to combat the effects of inexact arithmetic might or might not be necessary. The matrix $B_k$ may be ill-conditioned because some of its singular values approximate some of the small singular values of A. Therefore solving the projected problem might not yield a good solution $y^{(k)}$. However, we can use any of the methods of Section 3 to regularize this projected problem; we discuss options in detail below. The idea is to generate $y^{(k)}_{\rm reg}$, the regularized solution to (18), and then to compute a regularized solution to (16) as $x^{(k)}_{\rm reg} = V_k y^{(k)}_{\rm reg}$.

If we used the algorithm GMRES instead of LSQR, we would derive similar relations. Here, though, the U and V matrices are identical and the B matrix is upper Hessenberg rather than bidiagonal. Conjugate gradients would yield similar relationships. For cost comparisons for these methods, see Tables 1 and 2. Storage comparisons are given in Tables 3 and 4.

4.1. Regularization by projection. As mentioned earlier, if we terminate the iteration after k steps, we have projected the solution onto a k-dimensional subspace, and this has a regularizing effect that is sometimes sufficient. Determining the best value of k can be accomplished, for instance, by one of our three methods of parameter choice:

1. Discrepancy Principle. In this case, we stop the iteration at the smallest value of k for which $\|r^{(k)}\|_2 \le \tau\delta$. Both LSQR and GMRES have recurrence relations for determining $\|r^{(k)}\|_2$ using scalar computations, without computing either $r^{(k)}$ or $x^{(k)}$ [29, 32].

2. GCV.
For the projected problems defined by either LSQR or GMRES, the operator $AA^{\sharp}$ is given by $U_{k+1} B_k B_k^{\dagger} U_{k+1}^*$, where $B_k^{\dagger}$ is the pseudo-inverse of the matrix $B_k$. Thus from (11), the GCV functional is [17]

$G(k) = \frac{\|r^{(k)}\|_2^2}{(m - k)^2}.$

We note that there are in fact two distinct definitions for $B_k^{\dagger}$ and hence two definitions for the denominator in G(k); for small enough k, the two are comparable, and the definition we use here is less expensive to calculate [18, §7.4].

3. L-Curve. To determine the L-curve associated with LSQR or GMRES, estimates of $\|r^{(k)}\|_2$ and $\|x^{(k)}\|_2$ are needed for several values of k. Using either algorithm, we can compute $\|r^{(k)}\|_2$ with only a few scalar calculations. Paige and Saunders give a similar method for computing $\|x^{(k)}\|_2$ [29], but, with GMRES, the cost for computing $\|x^{(k)}\|_2$ is $O(k^2)$. In using this method or GCV, one must go a few iterations beyond the optimal k in order to verify the optimum [19].

4.2. Regularization by projection plus TSVD. If projection alone does not regularize, then we can compute the TSVD regularized solution to the projected problem (19). We need the SVD of the $(k+1) \times k$ matrix $B_k$. This requires $O(k^3)$ operations, but it can also be computed from the SVD of $B_{k-1}$ in $O(k^2)$ operations [13]. Clearly, we still need to use some type of parameter selection technique to find a good value of $\ell(k)$.

First, notice that it is easy to compute the norms of the residual and the solution resulting from neglecting the $\ell$ smallest singular values. If $\gamma_{jk}$ is the component of $e_1$ in the direction of the j-th left singular vector of $B_k$, and if $\theta_j$ is the j-th singular value (ordered largest to smallest), then the residual and solution 2-norms are

$\beta_1 \left( \sum_{j=k-\ell(k)+1}^{k+1} \gamma_{jk}^2 \right)^{1/2} \quad \text{and} \quad \beta_1 \left( \sum_{j=1}^{k-\ell(k)} \left( \frac{\gamma_{jk}}{\theta_j} \right)^2 \right)^{1/2}.$   (20)

Using this fact, we can use any of our three sample methods:

1. Discrepancy Principle. Let $r^{(k)}_{\ell}$ denote the quantity $b - Ax^{(k)}_{\ell}$ and note that by (13) and orthonormality, $\|r^{(k)}_{\ell}\|_2$ is equal to the first quantity in (20). Therefore, we choose $\ell(k)$ to be the largest value for which $\|r^{(k)}_{\ell}\|_2 \le \tau\delta$, if such a value exists.

2. GCV. Another alternative for choosing $\ell(k)$ is to use GCV to compute $\ell(k)$ for the projected problem. The GCV functional for the kth projected problem is obtained by substituting $B_k$ for A and $B_k^{\sharp}$ for $A^{\sharp}$, and substituting the expression for the residual in (20) for the numerator in (11):

$G_k(\ell) = \frac{\beta_1^2 \sum_{j=k-\ell+1}^{k+1} \gamma_{jk}^2}{(\ell + 1)^2}.$

3. L-Curve. We now have many L-curves, one for each value of k. The coordinate values in (20) form the discrete L-curve for a given k, from which the desired value of $\ell(k)$ can be chosen without forming the approximate solutions or residuals.

As k increases, the value $\ell(k)$ chosen by the Discrepancy Principle will be monotonically nondecreasing.

4.3. Regularization by projection plus Rust's TSVD. As in standard TSVD, to use Rust's version of TSVD for regularization of the projected problem requires that we compute the SVD of the $(k+1) \times k$ matrix $B_k$. Using the previous notation, Rust's strategy is to set

$y^{(k)}_{\eta} = \sum_{i \in I^{(k)}_{\eta}} \frac{\beta_1 \gamma_{ik}}{\theta_i}\, q^{(k)}_i,$

where the $q^{(k)}_i$ are the right singular vectors of $B_k$ and $I^{(k)}_{\eta} = \{\, i < k+1 : |\gamma_{ik}| > \eta \,\}$. We focus on three ways to determine $\eta$:

1. Discrepancy Principle. Using the notation from the previous section, the norm of the residual corresponding to the regularized solution is given by

$\beta_1 \left( \sum_{i \notin I^{(k)}_{\eta}} \gamma_{ik}^2 \right)^{1/2} = \|r^{(k)}_{\eta}\|_2.$

According to the discrepancy principle, we must choose $\eta$ so that the residual norm is less than $\tau\delta$.
In practice, this would require that the residual be evaluated by sorting the values $|\gamma_{ik}|$ and adding terms in that order until the residual norm is less than $\tau\delta$.

2. GCV. Let us denote by $\mathrm{card}(I^{(k)}_{\eta})$ the cardinality of the set $I^{(k)}_{\eta}$. From (11), it is easy to show that the GCV functional corresponding to the projected problem for this regularization technique is given by

$G_k(\eta) = \frac{\beta_1^2 \sum_{i \notin I^{(k)}_{\eta}} \gamma_{ik}^2}{\left(k + 1 - \mathrm{card}(I^{(k)}_{\eta})\right)^2}.$

In practice, for each k we first sort the values $|\gamma_{ik}|, i = 1, \ldots, k$, from smallest to largest. Then we define k discrete values $\eta_j$ to be equal to these sorted values, with $\eta_1$ being the smallest, and we set $\eta_0 = 0$. Note that because the values $\eta_j, j = 1, \ldots, k$, are the sorted magnitudes of the SVD expansion coefficients, we have

$G_k(\eta_j) = \frac{\beta_1^2 \left( |\gamma_{(k+1),k}|^2 + \sum_{i=1}^{j} \tilde{\gamma}_i^2 \right)}{(j+1)^2}, \qquad j = 0, \ldots, k,$

where $\tilde{\gamma}_1, \ldots, \tilde{\gamma}_k$ denote the sorted values $|\gamma_{ik}|$. Finally, we take the regularization parameter to be the $\eta_j$ for which $G_k(\eta_j)$ is a minimum.

3. L-Curve. As with standard TSVD, we now have one L-curve for each value of k. For fixed k, if we define the $\eta_j, j = 0, \ldots, k$, as we did for GCV above and we reorder the $\theta_i$ in the same way that the $|\gamma_{ik}|$ were reordered when sorted, then we have

$\|x^{(k)}_{\eta_j}\|_2^2 = \beta_1^2 \sum_{i=j+1}^{k} \left( \frac{\tilde{\gamma}_i}{\tilde{\theta}_i} \right)^2, \qquad \|r^{(k)}_{\eta_j}\|_2^2 = \beta_1^2 \left( |\gamma_{(k+1),k}|^2 + \sum_{i=1}^{j} \tilde{\gamma}_i^2 \right), \qquad j = 0, \ldots, k,$

where the $\tilde{\theta}_i$ denote the reordered singular values. When these solution and residual norms are plotted against each other as functions of $\eta$, the value of $\eta_j$ corresponding to the corner is selected as the regularization parameter.

4.4. Regularization by projection plus Tikhonov. Finally, let us consider using Tikhonov regularization to regularize the projected problem (18) for some integer k. Thus, for a given regularization parameter $\lambda$, we would like to solve

$\min_y \|\beta_1 e_1 - B_k y\|_2^2 + \lambda^2 \|y\|_2^2,$   (21)

or, equivalently,

$\min_y \left\| \begin{bmatrix} \beta_1 e_1 \\ 0 \end{bmatrix} - \begin{bmatrix} B_k \\ \lambda I \end{bmatrix} y \right\|_2.$   (22)

The solution $y^{(k)}_{\lambda}$ to either formulation satisfies

$(B_k^* B_k + \lambda^2 I)\, y^{(k)}_{\lambda} = \beta_1 B_k^* e_1.$   (23)

Using (13) and (15), we see that $y^{(k)}_{\lambda}$ also satisfies

$(V_k^* A^* A V_k + \lambda^2 I)\, y^{(k)}_{\lambda} = V_k^* A^* b.$   (24)

Therefore,

$y^{(k)}_{\lambda} = \arg\min_y \left\| \begin{bmatrix} A \\ \lambda I \end{bmatrix} V_k y - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2.$

Using $x^{(k)}_{\lambda} = V_k y^{(k)}_{\lambda}$, we have

$x^{(k)}_{\lambda} = \arg\min_{x \in S} \|Ax - b\|_2^2 + \lambda^2 \|x\|_2^2.$

Thus as $k \to n$, the back-projected regularized solution $x^{(k)}_{\lambda}$ approaches the solution to (4). We need to address how to choose a suitable value of $\lambda$.

1. Discrepancy Principle. Note that in exact arithmetic, we have

$r^{(k)}_{\lambda} = b - A x^{(k)}_{\lambda} = U_{k+1}(\beta_1 e_1 - B_k y^{(k)}_{\lambda}).$   (25)

Hence $\|B_k y^{(k)}_{\lambda} - \beta_1 e_1\|_2 = \|r^{(k)}_{\lambda}\|_2$. Therefore, to use the Discrepancy Principle we choose $\lambda$ so that $\|r^{(k)}_{\lambda}\|_2 \approx \tau\delta$, using p discrete trial values $\lambda_j$. For a given k, we take $\lambda$ to be the largest value $\lambda_j$ for which $\|r^{(k)}_{\lambda_j}\|_2 < \tau\delta$, if it exists; if not, we increase k and test again.

2. GCV. Let us define $(B_k)^{\dagger}_{\lambda}$ to be the operator mapping the right-hand side of the projected problem onto the regularized solution of the projected problem: $(B_k)^{\dagger}_{\lambda} = (B_k^* B_k + \lambda^2 I)^{-1} B_k^*$. Given the SVD of $B_k$ as above, the denominator in the GCV functional defined for the projected problem (refer to (11)) is

$\left( k + 1 - \sum_{j=1}^{k} \frac{\theta_j^2}{\theta_j^2 + \lambda^2} \right)^2.$

The numerator is simply $\|r^{(k)}_{\lambda}\|_2^2$. For values of $k \ll n$, it is feasible to compute the singular values of $B_k$.

3. L-Curve. The L-curve is comprised of the points $(\|B_k y^{(k)}_{\lambda} - \beta_1 e_1\|_2,\ \|y^{(k)}_{\lambda}\|_2)$. But using (25) and the orthonormality of the columns of $V_k$, we see these points are precisely $(\|r^{(k)}_{\lambda}\|_2,\ \|x^{(k)}_{\lambda}\|_2)$.
For p discrete values of $\lambda$, $\lambda_i, 1 \le i \le p$, the quantities $\|r^{(k)}_{\lambda_i}\|_2$ and $\|x^{(k)}_{\lambda_i}\|_2$ can be obtained by updating their respective estimates from the $(k-1)$st iteration. (The technical details of the approach are found in [28, pp. 197-198], from which we obtain $\|\bar{r}^{(k)}_{\lambda}\|_2 = \sqrt{\|r^{(k)}_{\lambda}\|_2^2 + \lambda^2 \|x^{(k)}_{\lambda}\|_2^2}$ for the residual $\bar{r}^{(k)}_{\lambda}$ of the damped least squares formulation (22). The implementation details for estimating $\|x^{(k)}_{\lambda}\|_2$ and $\|r^{(k)}_{\lambda}\|_2$ were taken from the Paige and Saunders algorithm at http://www.netlib.org/linalg/lsqr.)

4.5. Correspondence between Direct Regularization and Projection plus Regularization. In this section, we argue why the projection plus regularization approaches can be expected to yield regularized solutions nearly equivalent to their direct regularization counterparts. The following theorem establishes the desired result for the case of Tikhonov vs. projection plus Tikhonov.

Theorem 4.1. Fix $\lambda > 0$ and define $x^{(k)}_{\lambda}$ to be the kth iterate of conjugate gradients applied to the Tikhonov problem $(A^*A + \lambda^2 I)x = A^*b$. Let $y^{(k)}_{\lambda}$ be the exact solution to the regularized projected problem $(B_k^* B_k + \lambda^2 I)y = B_k^*(\beta_1 e_1)$, where $B_k, V_k$ are derived from the original problem $A^*Ax = A^*b$, and set $z^{(k)}_{\lambda} = V_k y^{(k)}_{\lambda}$. Then $z^{(k)}_{\lambda} = x^{(k)}_{\lambda}$.

Proof: By the discussion at the beginning of §4.4 and equations (23) and (24), it follows that $y^{(k)}_{\lambda}$ solves $V_k^*(A^*A + \lambda^2 I)V_k y = V_k^* A^* b$. Now the columns of $V_k$ are the Lanczos vectors with respect to the matrix $A^*A$ and right-hand side $A^*b$. But these are the same as the Lanczos vectors generated with respect to the matrix $A^*A + \lambda^2 I$ and right-hand side $A^*b$. Therefore $V_k y^{(k)}_{\lambda}$ is precisely the kth iterate of conjugate gradients applied to $(A^*A + \lambda^2 I)x = A^*b$ [11, p. 495]. Hence $z^{(k)}_{\lambda} = x^{(k)}_{\lambda}$. □

Table 2. Summary of flops for projection plus inner regularization with various parameter selection techniques, in addition to the O(qk) flops required for the projection itself. Here k is the number of iterations (i.e., the size of the projection) taken and p is the number of discrete parameters that must be tried.

  Projection plus:   Disc.     GCV       L-curve
  Tikhonov           O(pk)     O(k^3)    O(pk)
  TSVD               O(k^3)    O(k^3)    O(k^3)
  Rust's             O(k^3)    O(k^3)    O(k^3)

Let us turn to the case of TSVD regularization applied to the original problem vs. the projection plus TSVD approach. Direct computation convinces us that the two methods compute the same regularized solution if $k = n$ and arithmetic is exact. An approximate result holds in exact arithmetic when we take k iterations, with $n - \ell = j < k < n$. Let the singular value decomposition of $B_k$ be denoted by $B_k = Z_k \Theta_k Q_k^T$ and define the $s \times j$ matrix $W_{s,j}$ as

$W_{s,j} = \begin{bmatrix} I \\ 0 \end{bmatrix}.$

Then the regularized solution obtained from the TSVD regularization of the projected problem is

$x^{(k)}_{\rm reg} = V_k \left( Q_k W_{k,j}\, \Theta_{k,1}^{-1}\, W_{k+1,j}^T Z_k^T U_{k+1}^T b \right),$

where $\Theta_{k,1}$ denotes the leading $j \times j$ principal submatrix of $\Theta_k$. If k is taken to be enough larger than j so that $V_k Q_k W_{k,j} \approx \hat{V} W_{n,j}$, $W_{k+1,j}^T Z_k^T U_{k+1}^T \approx W_{n,j}^T \hat{U}^T$, and $\Theta_{k,1} \approx \Sigma_1$, with $\Sigma_1$ the leading principal submatrix of $\Sigma$, then we expect $x^{(k)}_{\rm reg}$ to be a good approximation to $x_\ell$. This is made more precise in the following theorem.

Theorem 4.2. Let $k > j$ be such that

$(V_k Q_k W_{k,j}) = \hat{V}_1 + E_1$ with $\|E_1\| \le \epsilon_1 \ll 1$,
$(U_{k+1} Z_k W_{k+1,j}) = \hat{U}_1 + E_2$ with $\|E_2\| \le \epsilon_2 \ll 1$,

where $\hat{V}_1$ and $\hat{U}_1$ contain the first j columns of $\hat{V}$ and $\hat{U}$, respectively.
Let D = diag(d 1 ;:::;d j ) satisfy BnZr k;1 =  1 + D with jd i j  3  1: Then kx (k) reg BnZrx ` k max 1ij 1  i + d i   3  j + 3max( 1 ; 2 )  kbk: Proof: Using the representations x ` = ^ V 1  BnZr1 1 ^ U T 1 b and x (k) reg = ( ^ V 1 +E 1 )BnZr BnZr1 k;1 ( ^ U T 1 + E T 2 )b, we obtain kx (k) reg BnZrx ` k (kBnZr BnZr1 k;1 BnZr BnZr1 1 k+kBnZr BnZr1 k;1 kkE 2 k+kE 1 kkBnZr BnZr1 k;1 k+kE 1 kkBnZr BnZr1 k;1 kkE 2 k)kbk; and the conclusion follows from bounding each term. 2 Note that typically  j   n so that 1= j is not too large. For some results relating to the value of k necessary for the hypothesis of the theorem to hold, the interested reader is referred to theory of the Kaniel-Paige and Saad [30, x12.4]. 13 Basic cost Added Cost Disc. GCV L-curve Tikhonov O(^q) O(1) O(p) O(p) TSVD O(^q) O(1) O(m) O(m) Rust's TSVD O(^q) O(m) O(m) O(m) Projection O(kn) O(1) O(k) O(k) Table 3 Summary of additional storage for each of four regularization methods under each of three parameter selection techniques. The original matrix is m  n with q nonzeros, p is the number of discrete parameters that must be tried, k iterations are used in projection, and the factorizations are assumed to take ^q storage. Projection plus { Disc. GCV L-curve Tikhonov O(1) O(p) O(p) TSVD O(1) O(k) O(k) Rust's TSVD O(k) O(k + p) O(k +p) Table 4 Summary of storage, not including storage for the matrix, for projection plus inner regular- ization approach, various parameter selection techniques. Here p denotes the number of discrete parameters tried. Each of these regularization methods also requires us to save the basis V or else regenerate it in order to reconstruct x. 5. Numerical results. In this section, we present two numerical examples. All experiments were carried out using Matlab and Hansen's Regularization Tools [16], with IEEE double precision oating point arithmetic. Since the exact, noise-free solutions were known in both examples, we evaluated the methods using the two- norm di erence between the regularized solutions and the exact solutions. In both examples when we applied Rust's method to the original problem, the  i were taken to be the magnitudes of the spectral coecients of b sorted in increasing order. 5.1. Example1. The 200200matrixA andtrue solutionx true forthis example were generated using the function baart in Hansen's Regularization Toolbox. We generated b true = Ax true and then computed the noisy vector b as b+e, where e was generated using the Matlab randn function and was scaled so that the noise level, kek kb true k , was 10 BnZr3 . The condition number of A was on the order of 10 19 . Many values of  were tested: log 10  = BnZr6;BnZr5:9;:::;BnZr2. Table 5 displays the values of the regularization parameters chosen when the three parameter selection techniques were applied together with one of the four regularization methods on the original problem. Since kek 2 = 5:3761EBnZr4, we set  that de nes the discrepancy principle as the very close approximation 5:5EBnZr4. The last column in the table gives the value of the parameter that yielded a regularized solution with the minimumrelative error when compared against the true solution. The relative error values for regularized solutions corresponding to the parameters in Table 5 are given in Table 6. Note that using GCV to determine a regularization parameter for Rust's TSVD resulted in an extremely noisy solution with huge error. 
The corners of the L-curves for the Tikhonov, projection, and TSVD methods were determined using Hansen's lcorner function, with the modification that points corresponding to solution norms greater than $10^6$ for the TSVD methods were not considered (otherwise, a false corner resulted).

Table 5. Example 1: parameter values selected for each method.

                    Disc.       GCV        L-curve     optimal
  Tikhonov λ        1.259E-3    1.995E-4   2.512E-4    5.012E-5
  TSVD ℓ            197         197        196         196
  Rust's TSVD η     1.223E-4    9.645E-7   1.223E-4    1.259E-4 or 1.223E-4
  Projection k      4           4          6           6

Table 6. Example 1: comparison of $\|x_{\rm true} - x_{\rm reg}\|_2 / \|x_{\rm true}\|_2$ for each of the four regularization methods on the original problem, where the regularization parameter was chosen using the methods indicated.

               Disc.    GCV      L-curve   optimal
  Tikhonov     .1330    .1110    .1084     .0648
  TSVD         .1663    .1213    .1663     .1213
  Rust's TSVD  .1213    7E+14    .1213     .1213
  Projection   .1134    .1207    .1134     .1134

Next, we projected using LSQR and then regularized the projected problem with one of the three regularization methods considered. For each of the three methods, we computed regularization parameters for the projected problem using Discrepancy, GCV, and L-curve, then computed the corresponding regularized solutions; the parameters that were selected in each case at iterations 10 and 40 are given in Tables 7 and 9, respectively. As before, the lcorner routine was used to determine the corners of the respective L-curves.

Comparing Tables 6 and 8, we observe that computing the regularized solution via projection plus Tikhonov for a projection size of 10, using either the Discrepancy Principle or the L-curve to find the regularization parameter, gives results as good as if those techniques had been used with Tikhonov on the original problem to determine a regularized solution. Similar statements can be made for projection plus TSVD and projection plus Rust's TSVD. We should also note that for Tikhonov, with and without projection, none of the errors in the tables is optimal; that is, no parameter selection technique ever gave the parameter for which the error was minimal.

Table 7. Example 1, iteration 10: regularization parameters selected for projection plus Tikhonov, TSVD, and Rust's TSVD.

                    Disc.       GCV        L-curve     optimal
  Tikhonov λ(k)     1.259E-3    1.995E-3   1.995E-4    5.012E-5
  TSVD ℓ(k)         2           3          2           2
  Rust's TSVD η(k)  1.679E-4    1.773E-4   1.679E-5    1.679E-5

Table 8. Example 1, iteration 10: comparison of $\|x_{\rm true} - x_{\rm reg}\|_2 / \|x_{\rm true}\|_2$ for projection plus Tikhonov, TSVD, and Rust's TSVD.

               Disc.    GCV      L-curve   optimal
  Tikhonov     .1330    .1486    .1084     .0648
  TSVD         .1663    .3451    .1663     .1213
  Rust's TSVD  .1213    .1663    .1213     .1213

Table 9. Example 1, iteration 40: regularization parameters selected for projection plus Tikhonov, TSVD, and Rust's TSVD.

                    Disc.       GCV        L-curve     optimal
  Tikhonov λ(k)     1.259E-3    1.995E-3   1.995E-4    5.012E-5
  TSVD ℓ(k)         10          13         8           9
  Rust's TSVD η(k)  9.201E-5    1.225E-4   9.201E-5    9.201E-5

Table 10. Example 1, iteration 40: comparison of $\|x_{\rm true} - x_{\rm reg}\|_2 / \|x_{\rm true}\|_2$ for projection plus Tikhonov, TSVD, and Rust's TSVD.

               Disc.    GCV      L-curve   optimal
  Tikhonov     .1330    .1486    .1084     .0648
  TSVD         .1679    .1986    .1206     .1165
  Rust's TSVD  .1162    .1162    .1162     .1162

5.2. Example 2. The 255 × 255 matrix A for this example was a symmetric Toeplitz matrix with bandwidth 16 and exponential decay across the band. (It was generated using the Matlab command A = (1/(2*pi*sigma))*toeplitz(t), where sigma = 5 and t = [exp(-([0:band-1].^2)/(2*sigma^2)), zeros(1,N-band)] with band = 16.) The true solution vector $x_{\rm true}$ is displayed as the top picture in Figure 2. We generated $b_{\rm true} = Ax_{\rm true}$ and then computed the noisy vector b as $b_{\rm true} + e$, where e was generated using the Matlab randn function and was scaled so that the noise level, $\|e\| / \|b_{\rm true}\|$, was $10^{-3}$. The vector b is shown in the bottom of Figure 2. The condition number of A was $1.65 \times 10^7$. We generated our discrete $\lambda_i$ using $\log_{10}\lambda = -5, -4.9, \ldots, -1$. The norm of the noise vector was 7.16E-2, so we took the value of the threshold $\tau\delta$ that defines the discrepancy principle to be 8.00E-2.

Fig. 2. Example 2: Top: exact solution. Bottom: noisy right-hand side b.

In this example, it took 61 iterations for LSQR to reach a minimum relative error of 9.48E-2, and several more iterations were needed for the L-curve method to estimate a stopping parameter. Likewise, the dimension k of the projected problem had to be around 60 to obtain good results with the projection-plus-regularization approaches, and much larger than 60 for the L-curve applied to the projected, Tikhonov-regularized problem to give a good estimate of the corner with respect to the Tikhonov-regularized original problem. Therefore, for the projection-based techniques, we chose to work with a left preconditioned system (refer to the discussion at the end of §3.5). Our preconditioner was chosen as in [22], where the parameter defining the preconditioner was taken to be m = 50. The values of the regularization parameters chosen when the three parameter selection techniques were applied together with one of the four regularization methods on the original problem are given in Table 11. The last column in the table gives the value of the parameter that gave a regularized solution with the minimum relative error over the range of discrete values tested, with respect to the true solution. The relative errors that resulted from computing solutions according to the parameters in Table 11 are in Table 12. We note that GCV with TSVD and Rust's TSVD was ineffective.

Table 11. Example 2: parameter values selected for each method. The projection was performed on a left preconditioned system.

                    Disc.       GCV        L-curve     optimal
  Tikhonov λ        1.259E-2    1.259E-2   1.995E-3    3.9811E-3
  TSVD ℓ            216         254        201         201
  Rust's TSVD η     2.183E-2    2.586E-6   1.477E-2    1.527E-2
  Projection k      2           18         5           5

Table 12. Example 2: comparison of $\|x_{\rm true} - x_{\rm reg}\|_2 / \|x_{\rm true}\|_2$ for each of the four regularization methods on the original problem.

               Disc.       GCV        L-curve     optimal
  Tikhonov     9.909E-2    9.909E-2   1.050E-2    9.394E-2
  TSVD         1.102E-1    8.121E-1   9.074E-2    9.0744E-2
  Rust's TSVD  1.025E-1    22.67      1.011E-1    1.011E-1
  Projection   1.030E-1    9.85E-2    1.15E-1     9.479E-2

The corners of the L-curves for the Tikhonov, projection, and TSVD methods were determined using Hansen's lcorner function, with the modification that points corresponding to the largest solution norms for the TSVD methods were not considered (otherwise, a false corner was detected by the lcorner routine).

Next, we projected using LSQR (note that since the matrix and preconditioner were symmetric, we could have used MINRES as in [22]) and then regularized the projected problem with one of the three methods considered.
For each of the three methods, we computed regularization parameters for the projected problem using Discrepancy, GCV, and L-curve, then computed the corresponding regularized solutions; the parameters that were selected in each case at iterations 15 and 25 are given in Tables 13 and 15, respectively. The relative errors of the regularized solutions generated accordingly are given in Tables 14 and 16. Again, we used the lcorner routine to determine the corners of the respective L-curves, except in the case of Rust's TSVD method. In the latter case, there was always a very sharp corner that could be picked out visually.

Table 13. Example 2, iteration 15: regularization parameters selected for projection plus Tikhonov, TSVD, and Rust's TSVD.

                    Disc.       GCV        L-curve     optimal
  Tikhonov λ(k)     2.512E-2    1.585E-2   1.9953E-3   3.981E-3
  TSVD ℓ(k)         5           5          4           1
  Rust's TSVD η(k)  3.558E-2    3.558E-2   3.558E-2    3.558E-2

Table 14. Example 2, iteration 15: comparison of $\|x_{\rm true} - x_{\rm reg}\|_2 / \|x_{\rm true}\|_2$ for projection plus Tikhonov, TSVD, and Rust's TSVD.

               Disc.       GCV        L-curve     optimal
  Tikhonov     1.0001E-1   9.9511E-2  1.061E-1    9.530E-2
  TSVD         9.595E-1    9.595E-1   1.004E-1    9.357E-2
  Rust's TSVD  1.004E-1    1.004E-1   1.004E-1    1.004E-1

Table 15. Example 2, iteration 25: regularization parameters selected for projection plus Tikhonov, TSVD, and Rust's TSVD.

                    Disc.       GCV        L-curve     optimal
  Tikhonov λ(k)     2.512E-2    1.259E-2   1.995E-3    3.982E-3
  TSVD ℓ(k)         9           9          8           3
  Rust's TSVD η(k)  4.828E-2    7.806E-3   4.828E-2    4.828E-2

Table 16. Example 2, iteration 25: comparison of $\|x_{\rm true} - x_{\rm reg}\|_2 / \|x_{\rm true}\|_2$ for projection plus Tikhonov, TSVD, and Rust's TSVD.

               Disc.       GCV        L-curve     optimal
  Tikhonov     1.000E-1    9.909E-2   1.061E-1    9.530E-2
  TSVD         9.164E-1    9.595E-1   1.004E-1    9.145E-2
  Rust's TSVD  1.004E-1    2.728      1.004E-1    1.004E-1

Comparing Table 11 with Tables 13 and 15, we see that the parameter chosen by applying the L-curve method to the projection-plus-Tikhonov problem was the same parameter chosen by applying the L-curve to the original problem. Moreover, a comparison of Table 12 with Tables 14 and 16 shows that the relative errors of the regularized solutions computed accordingly are comparable to those obtained by applying Tikhonov to the original problem with that same parameter. Similar results hold for the other cases, with the exception that the discrepancy principle did not work well for the projection-plus-TSVD problems, and GCV was not effective for the projected problems when k = 25.

6. Conclusions. In this work we have given methods for determining the regularization parameter and regularized solution to the original problem based on regularizing a projected problem. The proposed approach of applying regularization and parameter selection techniques to a projected problem is economical in time and storage. We presented results showing that the regularized solution obtained by back-projecting the TSVD or Tikhonov solution to the projected problem is almost equivalent to that obtained by applying TSVD or Tikhonov to the original problem, where "almost" depends on the size of k. The examples indicate the practicality of the method and illustrate that our regularized solutions are usually as good as those computed using the original system and can be computed in a fraction of the time, using a fraction of the storage. We note that similar approaches are valid using other Krylov subspace methods for computing the projected problem.

In this work, we did not address potential problems from loss of orthogonality as the iterations progress. We did, however, assume that either k was naturally very small compared to n or that preconditioning had been applied to enforce this condition. Possibly for this reason, we found that for modest k, round-off did not appear to degrade either the LSQR estimates of the residual and solution norms or the computed regularized solution, in the following sense: the regularization parameters chosen via the projection-regularization approach and the corresponding regularized solutions were comparable to those chosen and generated for the original discretized problem.
For the Tikhonov approach in this paper, we have assumed that the regularization operator L was the identity or was related to the preconditioning operator; this allowed us to compute $\|r^{(k)}_{\lambda}\|$ and $\|x^{(k)}_{\lambda}\|$ efficiently for multiple values of $\lambda$ for each k. If L is not the identity but is invertible, we can first implicitly transform the problem to "standard form" [17]. With $\bar{A} = AL^{-1}$ and $\bar{x} = Lx$, we can solve the equivalent system

$\min_{\bar{x}} \|\bar{A}\bar{x} - b\|_2^2 + \lambda^2 \|\bar{x}\|_2^2.$

Then the projection plus regularization schemes may be applied to this transformed problem. Clearly the projection-based schemes will be useful as long as solving systems involving L can be done efficiently.

REFERENCES

[1] Å. Björck, A bidiagonalization algorithm for solving large and sparse ill-posed systems of linear equations, BIT, 28 (1988), pp. 659-670.
[2] Å. Björck, E. Grimme, and P. Van Dooren, An implicit shift bidiagonalization algorithm for ill-posed systems, BIT, 34 (1994), pp. 510-534.
[3] D. Calvetti, G. Golub, and L. Reichel, Estimation of the L-curve via Lanczos bidiagonalization, Tech. Report SCCM-97-12, Scientific Computing and Computational Mathematics Program, Computer Science Dept., Stanford University, 1997.
[4] T. Chan and M. Ng, Galerkin projection method for solving multiple linear systems, Tech. Report 96-31, Computational and Applied Mathematics Dept., UCLA, Los Angeles, CA, 1996.
[5] L. Desbat and D. Girard, The "minimum reconstruction error" choice of regularization parameters: Some more efficient methods and their application to deconvolution problems, SIAM J. Sci. Comput., 16 (1995), pp. 1387-1403.
[6] H. W. Engl and W. Grever, Using the L-curve for determining optimal regularization parameters, Numer. Math., 69 (1994), pp. 25-31.
[7] H. E. Fleming, Equivalence of regularization and truncated iteration in the solution of ill-posed image reconstruction problems, Lin. Alg. and Applics., 130 (1990), pp. 133-150.
[8] A. Frommer and P. Maass, Fast CG-based methods for Tikhonov-Phillips regularization, SIAM J. Sci. Comput., to appear.
[9] G. Golub, M. Heath, and G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics, 21 (1979), pp. 215-223.
[10] G. Golub and W. Kahan, Calculating the singular values and pseudo-inverse of a matrix, SIAM J. Numer. Anal., 2 (Series B) (1965), pp. 205-224.
[11] G. Golub and C. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, 1989.
[12] W. Groetsch, Theory of Tikhonov Regularization for Fredholm Equations of the First Kind, Pitman Publishing Limited, 1984.
[13] M. Gu and S. Eisenstat, A stable and fast algorithm for updating the singular value decomposition, Tech. Report RR-939, Yale University, Department of Computer Science, New Haven, CT, 1993.
[14] M. Hanke, J. Nagy, and R. Plemmons, Preconditioned iterative regularization for ill-posed problems, in Numerical Linear Algebra and Sci. Computing, L. Reichel, A. Ruttan, and R. S. Varga, eds., 1993, pp. 141-163.
[15] P. C. Hansen, Analysis of discrete ill-posed problems by means of the L-curve, SIAM Review, 34 (1992), pp. 561-580.
[16] P. C. Hansen, Regularization Tools: A Matlab package for analysis and solution of discrete ill-posed problems, Numer. Algo., 6 (1994), pp. 1-35.
[17] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems, SIAM Press, Philadelphia, 1998.
[18] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems, PhD thesis, Technical University of Denmark, July 1995. UNIC Report UNIC-95-07.
[19] P. C. Hansen and D. P. O'Leary, The use of the L-curve in the regularization of discrete ill-posed problems, SIAM J. Sci. Comput., 14 (1993), pp. 1487-1503.
[20] B. Hofmann, Regularization for Applied Inverse and Ill-Posed Problems, Teubner-Texte Math., Teubner, Leipzig, 1986.
[21] L. Kaufman and A. Neumaier, Regularization of ill-posed problems by envelope guided conjugate gradients, Journal of Computational and Graphical Statistics, (1997).
[22] M. Kilmer, Symmetric Cauchy-like preconditioners for the regularized solution of 1-d ill-posed problems, Tech. Report CS-TR-3851, University of Maryland, College Park, 1997.
[23] M. Kilmer, Cauchy-like preconditioners for 2-dimensional ill-posed problems, SIAM J. Matrix Anal., to appear (1998).
[24] M. Kilmer and D. P. O'Leary, Pivoted Cauchy-like preconditioners for regularized solution of ill-posed problems, SIAM J. Sci. Stat. Comput., to appear (1998).
[25] V. A. Morozov, On the solution of functional equations by the method of regularization, Soviet Math. Dokl., 7 (1966), pp. 414-417. Cited in [17].
[26] J. Nagy, R. Plemmons, and T. Torgersen, Iterative image restoration using approximate inverse preconditioning, IEEE Trans. Image Proc., 5 (1996), pp. 1151-1163.
[27] D. P. O'Leary and J. A. Simmons, A bidiagonalization-regularization procedure for large scale discretization of ill-posed problems, SIAM J. Sci. Stat. Comput., 2 (1981), pp. 474-489.
[28] C. C. Paige and M. A. Saunders, Algorithm 583, LSQR: Sparse linear equations and least squares problems, ACM Transactions on Mathematical Software, 8 (1982), pp. 195-209.
[29] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Transactions on Mathematical Software, 8 (1982), pp. 43-71.
[30] B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood Cliffs, NJ, 1980.
[31] B. W. Rust, Truncating the singular value decomposition for ill-posed problems, Tech. Report NISTIR 6131, Mathematical and Computational Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD, 1998.
[32] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, 1996.
[33] Y. Saad and M. H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 856-869.
[34] A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-Posed Problems, Wiley, New York, 1977.
[35] J. M. Varah, Pitfalls in the numerical solution of linear ill-posed problems, SIAM J. Sci. Stat. Comput., 4 (1983), pp. 164-176.
[36] C. R. Vogel, Non-convergence of the L-curve regularization parameter selection method, Inverse Problems, 12 (1996), pp. 535-547.