CHOOSING REGULARIZATION PARAMETERS IN ITERATIVE METHODS FOR ILL-POSED PROBLEMS*

MISHA E. KILMER† AND DIANNE P. O'LEARY‡

* This work was supported by the National Science Foundation under Grants CCR 95-03126 and CCR-97-32022 and by the Army Research Office, MURI Grant DAAG55-97-1-0013.
† Dept. of Computer and Electrical Engineering, Northeastern University, Boston, MA 02115 (mkilmer@ece.neu.edu).
‡ Dept. of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742 (oleary@cs.umd.edu).

Abstract. Numerical solution of ill-posed problems is often accomplished by discretization (projection onto a finite-dimensional subspace) followed by regularization. If the discrete problem has high dimension, though, typically we compute an approximate solution by projection onto an even smaller dimensional space, via iterative methods based on Krylov subspaces. In this work we present efficient algorithms that regularize after this second projection rather than before it. We prove some results on the approximate equivalence of this approach to other forms of regularization and we present numerical examples.

Key words. ill-posed problems, regularization, discrepancy principle, iterative methods, L-curve, Tikhonov, TSVD, projection, Krylov subspace

AMS subject classifications. 65R30, 65F20

Running title: Choosing Regularization Parameters

1. Introduction. Linear, discrete ill-posed problems of the form

$Ax = b$   (1)

or

$\min_x \|Ax - b\|_2$, or equivalently, $A^*Ax = A^*b$,   (2)

arise, for example, from the discretization of first-kind Fredholm integral equations and occur in a variety of applications. We shall assume that the full-rank matrix A is $m \times n$, with $m \ge n$ in (2) and $m = n$ in (1). In discrete ill-posed problems, A is ill-conditioned and there is no gap in the singular value spectrum. Typically, the right-hand side b contains noise due to measurement and/or approximation error. This noise, in combination with the ill-conditioning of A, means that the exact solution of (1) or (2) has little relationship to the noise-free solution and is worthless. Instead, we use a regularization method to determine a solution that approximates the noise-free solution. Regularization methods replace the original operator by a better-conditioned but related one in order to diminish the effects of noise in the data and produce a regularized solution to the original problem. Sometimes this regularized problem is too large to solve exactly. In that case, we typically compute an approximate solution by projection onto an even smaller dimensional space, perhaps via iterative methods based on Krylov subspaces.

The conditioning of the new problem is controlled by one or more regularization parameters specific to the method. A large regularization parameter yields a new well-conditioned problem, but its solution may be far from the noise-free solution since the new operator is a poor approximation to A. A small regularization parameter generally yields a solution very close to the noise-contaminated exact solution of (1) or (2), and hence its distance from the noise-free solution also can be large. Thus, a key issue in regularization methods is to choose a regularization parameter that balances the error due to noise with the error due to regularization. A wise choice of regularization parameter is obviously crucial to obtaining useful approximate solutions to ill-posed problems.
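To make this trade-off concrete before surveying the methods, here is a minimal NumPy sketch (ours, not taken from the paper; the Hilbert test matrix, the smooth solution, and the parameter grid are illustrative assumptions). It compares the exact solution of a noisy, severely ill-conditioned system with Tikhonov-regularized solutions for several values of the regularization parameter.

    # Illustrative sketch (not the paper's experiments): noise amplification in an
    # ill-conditioned system, and its damping by Tikhonov regularization.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 12
    A = 1.0 / (np.add.outer(np.arange(n), np.arange(n)) + 1.0)   # Hilbert matrix (very ill-conditioned)
    x_true = np.sin(np.linspace(0.0, np.pi, n))                  # smooth noise-free solution
    b_true = A @ x_true
    e = rng.standard_normal(n)
    e *= 1e-3 * np.linalg.norm(b_true) / np.linalg.norm(e)       # noise level 1e-3
    b = b_true + e

    x_naive = np.linalg.solve(A, b)                              # exact solution of the noisy system
    print("naive   relative error:",
          np.linalg.norm(x_naive - x_true) / np.linalg.norm(x_true))

    for lam in [1e-8, 1e-5, 1e-2, 1e+1]:                         # Tikhonov: (A^T A + lam^2 I) x = A^T b
        x_lam = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)
        print(f"lambda={lam:7.0e} relative error:",
              np.linalg.norm(x_lam - x_true) / np.linalg.norm(x_true))

Typically the naive solution is dominated by amplified noise, intermediate parameter values give a much smaller error, and very large values oversmooth, which is the balance discussed above.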
For problems small enough that a rank-revealing factorization or singular value decomposition of A can be computed, there are well-studied techniques for computing a good regularization parameter. These techniques include the Discrepancy Principle [8], generalized cross-validation (GCV) [9], and the L-curve [15]. For larger problems treated by iterative methods, though, the parameter choice is much less understood. If regularization is applied to the projected problem that is generated by the iterative method, then there are essentially two regularization parameters: one for the standard regularization algorithm, such as Tikhonov or truncated SVD, and one controlling the number of iterations taken. One subtle issue is that the standard regularization parameter that is correct for the discretized problem may not be the optimal one for the lower-dimensional problem actually solved by the iteration, and this observation leads to the research discussed in this paper.

At first glance, there can appear to be a lot of work associated with the selection of a good regularization parameter, and many algorithms proposed in the literature are needlessly complicated. But by regularizing after projection by the iterative method, so that we are regularizing the lower-dimensional problem that is actually being solved, much of this difficulty vanishes.

The purpose of this paper is to present parameter selection techniques designed to reduce the regularization work for iterative methods such as Krylov subspace techniques. Our paper is organized as follows. In §2, we give an overview of the regularization methods we will be considering, and we follow up in §3 by surveying some methods for choosing the corresponding regularization parameters. In §4, we show how parameter selection techniques for the original problem can be applied instead to a projected problem obtained from an iterative method, greatly reducing the cost without much degradation in the solution. We give experimental results in §5 and conclusions and future work in §6.

2. Regularization background. In the following we shall assume that $b = b_{\rm true} + e$, where $b_{\rm true}$ denotes the unperturbed data vector and $e$ denotes zero-mean white noise. We will also assume that $b_{\rm true}$ satisfies the discrete Picard condition; that is, the spectral coefficients of $b_{\rm true}$ decay faster, on average, than the singular values.

Under these assumptions, it is easy to see why the exact solution to (1) or (2) is hopelessly contaminated by noise. Let $\hat{U}\Sigma\hat{V}^*$ denote the singular value decomposition of A, where the columns of $\hat{U}$ and $\hat{V}$ are the singular vectors, and the singular values are ordered as $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$. Then the solution to (1) or (2) is given by

$x = \sum_{i=1}^{n} \frac{\hat{u}_i^* b}{\sigma_i}\,\hat{v}_i = \sum_{i=1}^{n} \left( \frac{\hat{u}_i^* b_{\rm true}}{\sigma_i} + \frac{\hat{u}_i^* e}{\sigma_i} \right) \hat{v}_i.$   (3)

As a consequence of the white noise assumption, $|\hat{u}_i^* e|$ is roughly constant for all $i$, while the discrete Picard condition guarantees that $|\hat{u}_i^* b_{\rm true}|$ decreases with $i$ faster than $\sigma_i$ does. The matrix A is ill-conditioned, so small singular values magnify the corresponding coefficients $\hat{u}_i^* e$ in the second sum, and it is this large contribution of noise from the approximate null space of A that renders the exact solution $x$ defined in (3) worthless. The following regularization methods try in different ways to lessen the contribution of noise to the solution. For further information on these methods, see, for example, [17].

2.1. Tikhonov regularization. One of the most common methods of regularization is Tikhonov regularization [34].
In this method, the problem (1) or (2) is replaced with the problem of solving

$\min_x \|Ax - b\|_2^2 + \lambda^2 \|Lx\|_2^2,$   (4)

where L denotes a matrix, often chosen to be the identity matrix I or a discrete derivative operator, and $\lambda$ is a positive scalar regularization parameter. For ease in notation, we will assume that L = I. Solving (4) is equivalent to solving

$(A^*A + \lambda^2 I)x_\lambda = A^*b.$   (5)

In analogy with (3) we have

$x_\lambda = \sum_{i=1}^{n} \left( \frac{\sigma_i\,\hat{u}_i^* b_{\rm true}}{\sigma_i^2 + \lambda^2} + \frac{\sigma_i\,\hat{u}_i^* e}{\sigma_i^2 + \lambda^2} \right) \hat{v}_i.$   (6)

In this solution, the contributions from noise components $\hat{u}_i^* e$ for values of $\sigma_i < \lambda$ are much smaller than they are in (3), and thus $x_\lambda$ can be closer to the noise-free solution than $x$ is. If $\lambda$ is too large, however, $A^*A + \lambda^2 I$ is very far from the original operator $A^*A$, and $x_\lambda$ is very far from $x_{\rm true}$, the solution to (2) when $e = 0$. Conversely, if $\lambda$ is too small, the singular values of the new operator $A^*A + \lambda^2 I$ are close to those of $A^*A$; thus $x_\lambda \approx x$, so small singular values again greatly magnify noise components.

2.2. Truncated SVD. In the truncated SVD (TSVD) method of regularization, the regularized solution is chosen simply by truncating the expansion in (3) as

$x_\ell = \sum_{i=1}^{n-\ell} \frac{\hat{u}_i^* b}{\sigma_i}\,\hat{v}_i.$   (7)

Here the regularization parameter is $\ell$, the number of terms to be dropped from the sum. Observe that if $\ell$ is small, very few terms are dropped from the sum, so $x_\ell$ resembles $x$ in that the effects of noise are large. If $\ell$ is too large, however, important information could be lost; such is the case if $|\hat{u}_i^* b_{\rm true}| \gg |\hat{u}_i^* e|$ for some $i > n - \ell$.

An alternative, yet related, approach to TSVD is an approach introduced by Rust [31], where the truncation strategy is based on the value of each spectral coefficient $\hat{u}_i^* b$ itself. The strategy is to include in the sum (3) only those terms corresponding to a spectral coefficient $\hat{u}_i^* b$ whose magnitude is greater than or equal to some tolerance $\eta$, which can be regarded as the regularization parameter.

2.3. Projection and iterative methods. Solving (5) or (7) can be impractical if n is large, but fortunately, regularization can be achieved through projection onto a subspace; see, for example, [7]. The truncated SVD is an example of one such projection: the solution is constrained to lie in the subspace spanned by the singular vectors corresponding to the largest $n - \ell$ singular values. Other projections can be more economical. In general, we constrain our regularized solution to lie in some k-dimensional subspace of $\mathbb{C}^n$, spanned by the columns of an $n \times k$ matrix $Q^{(k)}$. For example, we choose $x^{(k)}_{\rm reg} = Q^{(k)} y^{(k)}$ where $y^{(k)}$ solves

$\min_{y \in \mathbb{C}^k} \|AQ^{(k)}y - b\|_2^2$   (8)

or equivalently

$(Q^{(k)})^* A^* A Q^{(k)} y = (Q^{(k)})^* A^* b.$   (9)

The idea is that with an appropriately chosen subspace, the operator $(Q^{(k)})^* A^* A Q^{(k)}$ will be better conditioned than the original operator and hence that $x^{(k)}_{\rm reg}$ will approximate $x_{\rm true}$ well on that subspace.

This projection is often achieved through the use of iterative methods such as conjugate gradients, GMRES, QMR, and other Krylov subspace methods. The matrix $Q^{(k)}$ then contains orthonormal columns generated via a Lanczos tridiagonalization or bidiagonalization process [27, 1]. In this case, $Q^{(k)}$ is a basis for some k-dimensional Krylov subspace (i.e., the subspace $\mathcal{K}_k(c, K)$ spanned by the vectors $c, Kc, \ldots, K^{k-1}c$ for some matrix K and vector c). The regularized solutions $x^{(k)}_{\rm reg}$ are generated iteratively as the subspaces are built.
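As a concrete companion to the formulas above, the following sketch (ours, not the authors' code) evaluates the Tikhonov solution (6), the TSVD solution (7), and the generic projected solution of (8) with NumPy. The names lam, ell, and Q are illustrative assumptions; Rust's variant is obtained by replacing the truncation test with a threshold on the coefficient magnitudes $|\hat{u}_i^* b|$.

    # Sketch of the direct and projected regularized solutions of Sections 2.1-2.3.
    import numpy as np

    def tikhonov_svd(A, b, lam):
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        c = U.T @ b                                   # spectral coefficients u_i^* b
        return Vt.T @ (s * c / (s**2 + lam**2))       # filter factors s_i^2 / (s_i^2 + lam^2), as in (6)

    def tsvd(A, b, ell):
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        c = U.T @ b
        keep = np.arange(s.size) < s.size - ell       # drop the ell smallest singular values, as in (7)
        return Vt.T @ np.where(keep, c / s, 0.0)

    def projected_solution(A, b, Q):
        # Solve min_y ||A Q y - b||_2 for an orthonormal basis Q and back-project, as in (8).
        y, *_ = np.linalg.lstsq(A @ Q, b, rcond=None)
        return Q @ y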
Krylov subspace algorithms such as CG, CGLS, GMRES, and LSQR tend to produce, at early iterations, solutions that resemble $x_{\rm true}$ in the subspace spanned by (right) singular vectors of A corresponding to the largest singular values. At later iterations, however, these methods start to reconstruct increasing amounts of noise into the solution. This is due to the fact that for large k, the operator $(Q^{(k)})^* A^* A Q^{(k)}$ approaches the ill-conditioned operator $A^*A$. Therefore, the choice of the regularization parameter k, the stopping point for the iteration and the dimension of the subspace, is very important. (Usually, small values of the regularization parameter correspond to a solution closer to that of the noisy equation; despite this, we will call k, rather than 1/k, the regularization parameter.)

2.4. Hybrid methods: projection plus regularization. Another important family of regularization methods, often referred to as hybrid methods [17], was introduced by O'Leary and Simmons [27]. These methods combine a projection method with a direct regularization method such as TSVD or Tikhonov regularization. The problem is projected onto a particular subspace of dimension k, but typically the restricted operator in (9) is still ill-conditioned. Therefore, a regularization method is applied to the projected problem. Since the dimension k is usually small relative to n, regularization of the restricted problem is much less expensive. Yet, with an appropriately chosen subspace, the end results can be very similar to those achieved by applying the same direct regularization technique to the original problem. We will become more precise about how "similar" the solutions are in §4.5. Because the projected problems are usually generated iteratively by a Lanczos method, this approach is useful when A is sparse or structured in such a way that matrix-vector products can be handled efficiently with minimal storage.

3. Existing parameter selection methods. In this section, we discuss a sampling of the parameter selection techniques that have been proposed in the literature. They differ in the amount of a priori information required as well as in the decision criteria.

3.1. The Discrepancy Principle. If some extra information is available (for example, an estimate of the variance of the noise vector e), then the regularization parameter can be chosen rather easily. Morozov's Discrepancy Principle [25] says that if $\delta$ is the expected value of $\|e\|_2$, then the regularization parameter should be chosen so that the norm of the residual corresponding to the regularized solution $x_{\rm reg}$ is $\tau\delta$; that is,

$\|Ax_{\rm reg} - b\|_2 = \tau\delta,$   (10)

where $\tau > 1$ is some predetermined real number. Note that as $\delta \to 0$, $x_{\rm reg} \to x_{\rm true}$. Other methods based on knowledge of the variance are given, for example, in [12, 5].

Fig. 1. Example of a typical L-curve. This particular L-curve corresponds to applying Tikhonov regularization to the problem in Example 2.

3.2. Generalized Cross-Validation. The Generalized Cross-Validation (GCV) parameter selection method does not depend on a priori knowledge about the noise variance. This idea of Golub, Heath, and Wahba [9] is to find the parameter $\lambda$ that minimizes the GCV functional

$G(\lambda) = \frac{\|(I - AA^{\sharp}_{\lambda})b\|_2^2}{\left(\mathrm{trace}(I - AA^{\sharp}_{\lambda})\right)^2},$   (11)

where $A^{\sharp}_{\lambda}$ denotes the matrix that maps the right-hand side b onto the regularized solution $x_\lambda$.
In Tikhonov regularization, for example, $A^{\sharp}_{\lambda}$ is $(A^*A + \lambda^2 I)^{-1}A^*$. GCV chooses a regularization parameter that is not too dependent on any one data measurement [11, §12.1.3].

3.3. The L-Curve. One way to visualize the tradeoff between regularization error and error due to noise is to plot the norm of the regularized solution versus the corresponding residual norm for each of a set of regularization parameter values. The result is the L-curve, introduced by Lawson and popularized by Hansen [15]. See Figure 1 for a typical example. As the regularization parameter increases, noise is damped, so that the norm of the solution decreases while the residual increases. Intuitively, the best regularization parameter should lie on the corner of the L-curve, since for values higher than this, the residual increases without reducing the norm of the solution much, while for values smaller than this, the norm of the solution increases rapidly without much decrease in residual. In practice, only a few points on the L-curve are computed and the corner is located by approximate methods, estimating the point of maximum curvature [19].

Like GCV, this method of determining a regularization parameter does not depend on specific knowledge about the noise vector.

3.4. Disadvantages of these parameter choice algorithms. The appropriate choice of regularization parameter, especially for projection algorithms, is a difficult problem, and each method has severe flaws.

Table 1. Summary of additional flops needed to compute the regularization parameter for each of four regularization methods with various parameter selection techniques. Notation: q is the cost of multiplication of a vector by A; p is the number of discrete parameters that must be tried; k is the dimension of the projection; m and n are the problem dimensions.

                 Basic cost    Added cost:
                               Disc.         GCV           L-curve
  Tikhonov       O(mn^2)       O(p(m+n))     O(p(m+n))     O(p(m+n))
  TSVD           O(mn^2)       O(m)          O(m)          O(m+n)
  Rust's TSVD    O(mn^2)       O(m log m)    O(m log m)    O(m log m)
  Projection     O(qk)         0             O(q)          O(q)

The Discrepancy Principle is convergent as the noise goes to zero, but it relies on knowing information that is often unavailable or incorrectly estimated. Even with a correct estimate of the variance, the solutions tend to be oversmoothed [20, p. 96] (see also the discussion in §6.1 of [15]).

One noted difficulty with GCV is that G can have a very flat minimum, making it difficult to determine the optimal $\lambda$ numerically [35]. The L-curve is usually more tractable numerically, but its limiting properties are nonideal. The solution estimates fail to converge to the true solution as $n \to \infty$ [36] or as the error norm goes to zero [6]. All methods that assume no knowledge of the error norm, including GCV, have this latter property [6]. For further discussion and references about parameter choice methods, see [5, 17]. The cost of these methods is tabulated in Table 1.

3.5. Previous work on parameter choice for hybrid methods. At first glance, it appears that for Tikhonov regularization, multiple systems of the form (5) must be solved in order to evaluate candidate values of $\lambda$ for the Discrepancy Principle or the L-curve. Techniques have been suggested in the literature for solving these systems using projection methods.

Chan and Ng [4], for example, note that the systems involve the closely related matrices $C(\lambda) = A^*A + \lambda I$, and they suggest solving the systems simultaneously using a Galerkin projection method on a sequence of "seed" systems.
Although this is economical in storage, it can be unnecessarily expensive in time because they do not exploit the fact that for each fixed k, the Krylov subspace $\mathcal{K}_k(A^*b, C(\lambda))$ is the same for all values of $\lambda$.

Frommer and Maass [8] propose two algorithms for approximating the $\lambda$ that satisfies the Discrepancy Principle (10). The first is a "truncated cg" approach in which they use conjugate gradients to solve k systems of the form (5), truncating the iterative process early for large $\lambda$ and using previous solutions as starting guesses for later problems. Like Chan and Ng, this algorithm does not exploit any of the redundancy in generating the Krylov subspaces for each $\lambda_i$. The second method they propose, however, does exploit the redundancy, so that the CG iterates for all k systems can be updated simultaneously with no extra matrix-vector products. They stop their "shifted cg" algorithm when $\|Ax_\lambda - b\|_2 \le \tau\delta$ for one of their $\lambda$ values. Thus the number of matrix-vector products required is twice the number of iterations for this particular system to converge. We note that while the algorithms we propose in §4 for finding a good value of $\lambda$ are based on the same key observation regarding the Krylov subspace, our methods will usually require less work than the shifted cg algorithm.

Calvetti, Golub, and Reichel [3] compute upper and lower bounds on the L-curve generated by the matrices $C(\lambda)$ using a Lanczos bidiagonalization process. From this, they approximate the best parameter for Tikhonov regularization without projection. In §4, we choose instead to approximate the best parameter for Tikhonov regularization on the projected problem, since this is the approximation to the continuous problem that is actually being used.

Kaufman and Neumaier [21] suggest an envelope-guided conjugate gradient approach for the Tikhonov L-curve problem. Their method is more complicated than the methods we propose because they maintain nonnegativity constraints on the variables.

Substantial work has also been done on TSVD regularization of the projected problems. Björck, Grimme, and van Dooren [2] use GCV to determine the truncation point for the projected SVD. Their emphasis is on stable ways to maintain an accurate factorization when many iterations are needed, and they use full reorthogonalization and implicit restart strategies. O'Leary and Simmons [27] take a somewhat different viewpoint, that the problem should be preconditioned appropriately so that a massive number of iterations is unnecessary. That viewpoint is echoed in this current work, so we implicitly assume that the problem has been left-preconditioned or "filtered" [27]. For example, in place of (4), we solve

$\min_x \|M^{-1}Ax - M^{-1}b\|_2^2 + \lambda^2 \|x\|_2^2$

for a square preconditioner M. See [14, 26, 24, 23] for preconditioners appropriate for certain types of ill-posed problems. Note that we could alternately have considered right preconditioning, which amounts to solving, in the Tikhonov case,

$\min_y \left\| \begin{bmatrix} A \\ \lambda I \end{bmatrix} M^{-1} y - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2$

for $y_\lambda$ and then setting $x_\lambda = M^{-1}y_\lambda$. Note that either left or right preconditioning effectively changes the balance between the two terms in the minimization.

4. Regularizing the projected problem. In this section we develop nine approaches to regularization using Krylov methods. Many Krylov methods have been proposed; for ease of exposition we focus on just two of these: the LSQR algorithm of Paige and Saunders [29] and the GMRES algorithm of Saad and Schultz [33].
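Before describing these algorithms, here is a small numerical check (ours, with an arbitrary illustrative test matrix) of the key observation mentioned in §3.5 and exploited throughout this section: for fixed k, the Krylov subspace $\mathcal{K}_k(A^*b, A^*A + \lambda^2 I)$ does not depend on the shift $\lambda$, so a single basis can serve every trial regularization parameter.

    # Numerical check: shifted operators generate the same Krylov subspace.
    import numpy as np

    def krylov_basis(M, c, k):
        K = np.empty((c.size, k))
        v = c.copy()
        for j in range(k):
            K[:, j] = v
            v = M @ v
            v = v / np.linalg.norm(v)      # column scaling does not change the span
        Q, _ = np.linalg.qr(K)             # orthonormal basis for K_k(c, M)
        return Q

    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 20))
    b = rng.standard_normal(30)
    c = A.T @ b
    k = 5
    Q0 = krylov_basis(A.T @ A, c, k)
    for lam in [1e-3, 1e-1, 1.0]:
        Q = krylov_basis(A.T @ A + lam**2 * np.eye(20), c, k)
        # The subspaces coincide iff their orthogonal projectors agree.
        print(lam, np.linalg.norm(Q0 @ Q0.T - Q @ Q.T))

The printed values should be near machine precision for modest k, which is why the methods below build the projection once and vary the regularization parameter afterward at negligible cost.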
The LSQR algorithm of Paige and Saunders [29] iteratively computes the bidiagonalization introduced by Golub and Kahan [10]. Given a vector b, the algorithm is as follows [29, Alg. Bidiag 1]:

  Compute a scalar $\beta_1$ and a vector $u_1$ of length one so that $\beta_1 u_1 = b$. Similarly, determine $\alpha_1$ and $v_1$ so that $\alpha_1 v_1 = A^T u_1$.
  For $i = 1, 2, \ldots$
    Let $\beta_{i+1} u_{i+1} = A v_i - \alpha_i u_i$ and $\alpha_{i+1} v_{i+1} = A^T u_{i+1} - \beta_{i+1} v_i$, where the nonnegative scalars $\alpha_{i+1}$ and $\beta_{i+1}$ are chosen so that $u_{i+1}$ and $v_{i+1}$ have length one.
  End for

The vectors $u_i, v_i$ are called the left and right Lanczos vectors, respectively. The algorithm can be rewritten in matrix form by first defining the matrices

$U_k \equiv [u_1, \ldots, u_k], \qquad V_k \equiv [v_1, \ldots, v_k],$

$B_k \equiv \begin{bmatrix} \alpha_1 & & & \\ \beta_2 & \alpha_2 & & \\ & \beta_3 & \ddots & \\ & & \ddots & \alpha_k \\ & & & \beta_{k+1} \end{bmatrix}.$

With $e_i$ denoting the ith unit vector, the following relations can be established:

$b = \beta_1 u_1 = \beta_1 U_{k+1} e_1,$   (12)
$A V_k = U_{k+1} B_k,$   (13)
$A^T U_{k+1} = V_k B_k^T + \alpha_{k+1} v_{k+1} e_{k+1}^T,$   (14)
$V_k^* V_k = I_k, \qquad U_{k+1}^* U_{k+1} = I_{k+1},$   (15)

where the subscript on I denotes the dimension of the identity.

Now suppose we want to solve

$\min_{x \in S} \|b - Ax\|_2,$   (16)

where S denotes the k-dimensional subspace spanned by the first k Lanczos vectors $v_i$. The solution we seek is of the form $x^{(k)} = V_k y^{(k)}$ for some vector $y^{(k)}$ of length k. Define $r^{(k)} = b - Ax^{(k)}$ to be the corresponding residual. From the relations above, observe that in exact arithmetic

$r^{(k)} = \beta_1 u_1 - A V_k y^{(k)} = U_{k+1}(\beta_1 e_1 - B_k y^{(k)}).$

Since $U_{k+1}$ has, in exact arithmetic, orthonormal columns, we have

$\|r^{(k)}\|_2 = \|\beta_1 e_1 - B_k y^{(k)}\|_2.$   (17)

Therefore, the projected problem we wish to solve is

$\min_{y^{(k)}} \|\beta_1 e_1 - B_k y^{(k)}\|_2.$   (18)

Solving this minimization problem is equivalent to solving the normal equations involving the bidiagonal matrix:

$B_k^* B_k y^{(k)} = \beta_1 B_k^* e_1.$   (19)

Typically k is small, so reorthogonalization to combat the effects of inexact arithmetic might or might not be necessary. The matrix $B_k$ may be ill-conditioned because some of its singular values approximate some of the small singular values of A. Therefore solving the projected problem might not yield a good solution $y^{(k)}$. However, we can use any of the methods of Section 3 to regularize this projected problem; we discuss options in detail below. The idea is to generate $y^{(k)}_{\rm reg}$, the regularized solution to (18), and then to compute a regularized solution to (16) as $x^{(k)}_{\rm reg} = V_k y^{(k)}_{\rm reg}$.

If we used the algorithm GMRES instead of LSQR, we would derive similar relations. Here, though, the U and V matrices are identical and the B matrix is upper Hessenberg rather than bidiagonal. Conjugate gradients would yield similar relationships. For cost comparisons for these methods, see Tables 1 and 2. Storage comparisons are given in Tables 3 and 4.

4.1. Regularization by projection. As mentioned earlier, if we terminate the iteration after k steps, we have projected the solution onto a k-dimensional subspace, and this has a regularizing effect that is sometimes sufficient. Determining the best value of k can be accomplished, for instance, by one of our three methods of parameter choice:

1. Discrepancy Principle. In this case, we stop the iteration at the smallest value of k for which $\|r^{(k)}\|_2 \le \tau\delta$. Both LSQR and GMRES have recurrence relations for determining $\|r^{(k)}\|_2$ using scalar computations, without computing either $r^{(k)}$ or $x^{(k)}$ [29, 32].

2. GCV.
For the projected problems defined by either LSQR or GMRES, the operator $AA^{\sharp}$ is given by $U_{k+1} B_k B_k^{\dagger} U_{k+1}^*$, where $B_k^{\dagger}$ is the pseudo-inverse of the matrix $B_k$. Thus from (11), the GCV functional is [17]

$G(k) = \frac{\|r^{(k)}\|_2^2}{(m - k)^2}.$

We note that there are in fact two distinct definitions for $B_k^{\dagger}$ and hence two definitions for the denominator in G(k); for small enough k, the two are comparable, and the definition we use here is less expensive to calculate [18, §7.4].

3. L-Curve. To determine the L-curve associated with LSQR or GMRES, estimates of $\|r^{(k)}\|_2$ and $\|x^{(k)}\|_2$ are needed for several values of k. Using either algorithm, we can compute $\|r^{(k)}\|_2$ with only a few scalar calculations. Paige and Saunders give a similar method for computing $\|x^{(k)}\|_2$ [29], but, with GMRES, the cost for computing $\|x^{(k)}\|_2$ is $O(k^2)$. In using this method or GCV, one must go a few iterations beyond the optimal k in order to verify the optimum [19].

4.2. Regularization by projection plus TSVD. If projection alone does not regularize, then we can compute the TSVD regularized solution to the projected problem (19). We need the SVD of the $(k+1) \times k$ matrix $B_k$. This requires $O(k^3)$ operations, but it can also be computed from the SVD of $B_{k-1}$ in $O(k^2)$ operations [13]. Clearly, we still need to use some type of parameter selection technique to find a good value of $\ell(k)$.

First, notice that it is easy to compute the norms of the residual and the solution resulting from neglecting the $\ell$ smallest singular values. If $\gamma_{jk}$ is the component of $e_1$ in the direction of the j-th left singular vector of $B_k$, and if $\theta_j$ is the j-th singular value (ordered largest to smallest), then the residual and solution 2-norms are

$\beta_1 \left( \sum_{j=k-\ell(k)+1}^{k+1} \gamma_{jk}^2 \right)^{1/2} \quad \text{and} \quad \beta_1 \left( \sum_{j=1}^{k-\ell(k)} \left( \frac{\gamma_{jk}}{\theta_j} \right)^2 \right)^{1/2}.$   (20)

Using this fact, we can use any of our three sample methods:

1. Discrepancy Principle. Let $r^{(k)}_{\ell}$ denote the quantity $b - Ax^{(k)}_{\ell}$ and note that by (13) and orthonormality, $\|r^{(k)}_{\ell}\|_2$ is equal to the first quantity in (20). Therefore, we choose $\ell(k)$ to be the largest value for which $\|r^{(k)}_{\ell}\|_2 \le \tau\delta$, if such a value exists.

2. GCV. Another alternative for choosing $\ell(k)$ is to use GCV to compute $\ell(k)$ for the projected problem. The GCV functional for the kth projected problem is obtained by substituting $B_k$ for A and $B_k^{\sharp}$ for $A^{\sharp}$, and substituting the expression for the residual in (20) for the numerator in (11):

$G_k(\ell) = \frac{\beta_1^2 \sum_{j=k-\ell+1}^{k+1} \gamma_{jk}^2}{(\ell + 1)^2}.$

3. L-Curve. We now have many L-curves, one for each value of k. The coordinate values in (20) form the discrete L-curve for a given k, from which the desired value of $\ell(k)$ can be chosen without forming the approximate solutions or residuals.

As k increases, the value $\ell(k)$ chosen by the Discrepancy Principle will be monotonically nondecreasing.

4.3. Regularization by projection plus Rust's TSVD. As in standard TSVD, to use Rust's version of TSVD for regularization of the projected problem requires that we compute the SVD of the $(k+1) \times k$ matrix $B_k$. Using the previous notation, Rust's strategy is to set

$y^{(k)}_{\eta} = \sum_{i \in I^{(k)}_{\eta}} \frac{\beta_1 \gamma_{ik}}{\theta_i}\, q^{(k)}_i,$

where the $q^{(k)}_i$ are the right singular vectors of $B_k$ and $I^{(k)}_{\eta} = \{\, i < k+1 : |\gamma_{ik}| > \eta \,\}$. We focus on three ways to determine $\eta$:

1. Discrepancy Principle. Using the notation from the previous section, the norm of the residual corresponding to the regularized solution is given by

$\beta_1 \left( \sum_{i \notin I^{(k)}_{\eta}} \gamma_{ik}^2 \right)^{1/2} = \|r^{(k)}_{\eta}\|_2.$

According to the discrepancy principle, we must choose $\eta$ so that the residual norm is less than $\tau\delta$.
In practice, this would require that the residual be evaluated by sorting the values $|\gamma_{ik}|$ and adding terms in that order until the residual norm is less than $\tau\delta$.

2. GCV. Let us denote by $\mathrm{card}(I^{(k)}_{\eta})$ the cardinality of the set $I^{(k)}_{\eta}$. From (11), it is easy to show that the GCV functional corresponding to the projected problem for this regularization technique is given by

$G_k(\eta) = \frac{\beta_1^2 \sum_{i \notin I^{(k)}_{\eta}} \gamma_{ik}^2}{\left(k + 1 - \mathrm{card}(I^{(k)}_{\eta})\right)^2}.$

In practice, for each k we first sort the values $|\gamma_{ik}|, i = 1, \ldots, k$, from smallest to largest. Then we define k discrete values $\eta_j$ to be equal to these sorted values, with $\eta_1$ being the smallest, and we set $\eta_0 = 0$. Note that because the values $\eta_j, j = 1, \ldots, k$, are the sorted magnitudes of the SVD expansion coefficients, we have

$G_k(\eta_j) = \frac{\beta_1^2 \left( |\gamma_{(k+1),k}|^2 + \sum_{i=1}^{j} \tilde{\gamma}_i^2 \right)}{(j+1)^2}, \qquad j = 0, \ldots, k,$

where $\tilde{\gamma}_1, \ldots, \tilde{\gamma}_k$ denote the sorted values $|\gamma_{ik}|$. Finally, we take the regularization parameter to be the $\eta_j$ for which $G_k(\eta_j)$ is a minimum.

3. L-Curve. As with standard TSVD, we now have one L-curve for each value of k. For fixed k, if we define the $\eta_j, j = 0, \ldots, k$, as we did for GCV above and we reorder the $\theta_i$ in the same way that the $|\gamma_{ik}|$ were reordered when sorted, then we have

$\|x^{(k)}_{\eta_j}\|_2^2 = \beta_1^2 \sum_{i=j+1}^{k} \left( \frac{\tilde{\gamma}_i}{\tilde{\theta}_i} \right)^2, \qquad \|r^{(k)}_{\eta_j}\|_2^2 = \beta_1^2 \left( |\gamma_{(k+1),k}|^2 + \sum_{i=1}^{j} \tilde{\gamma}_i^2 \right), \qquad j = 0, \ldots, k,$

where the $\tilde{\theta}_i$ denote the reordered singular values. When these solution and residual norms are plotted against each other as functions of $\eta$, the value of $\eta_j$ corresponding to the corner is selected as the regularization parameter.

4.4. Regularization by projection plus Tikhonov. Finally, let us consider using Tikhonov regularization to regularize the projected problem (18) for some integer k. Thus, for a given regularization parameter $\lambda$, we would like to solve

$\min_y \|\beta_1 e_1 - B_k y\|_2^2 + \lambda^2 \|y\|_2^2,$   (21)

or, equivalently,

$\min_y \left\| \begin{bmatrix} \beta_1 e_1 \\ 0 \end{bmatrix} - \begin{bmatrix} B_k \\ \lambda I \end{bmatrix} y \right\|_2.$   (22)

The solution $y^{(k)}_{\lambda}$ to either formulation satisfies

$(B_k^* B_k + \lambda^2 I)\, y^{(k)}_{\lambda} = \beta_1 B_k^* e_1.$   (23)

Using (13) and (15), we see that $y^{(k)}_{\lambda}$ also satisfies

$(V_k^* A^* A V_k + \lambda^2 I)\, y^{(k)}_{\lambda} = V_k^* A^* b.$   (24)

Therefore,

$y^{(k)}_{\lambda} = \arg\min_y \left\| \begin{bmatrix} A \\ \lambda I \end{bmatrix} V_k y - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2.$

Using $x^{(k)}_{\lambda} = V_k y^{(k)}_{\lambda}$, we have

$x^{(k)}_{\lambda} = \arg\min_{x \in S} \|Ax - b\|_2^2 + \lambda^2 \|x\|_2^2.$

Thus as $k \to n$, the back-projected regularized solution $x^{(k)}_{\lambda}$ approaches the solution to (4). We need to address how to choose a suitable value of $\lambda$.

1. Discrepancy Principle. Note that in exact arithmetic, we have

$r^{(k)}_{\lambda} = b - A x^{(k)}_{\lambda} = U_{k+1}(\beta_1 e_1 - B_k y^{(k)}_{\lambda}).$   (25)

Hence $\|B_k y^{(k)}_{\lambda} - \beta_1 e_1\|_2 = \|r^{(k)}_{\lambda}\|_2$. Therefore, to use the Discrepancy Principle we choose $\lambda$ so that $\|r^{(k)}_{\lambda}\|_2 \approx \tau\delta$, using p discrete trial values $\lambda_j$. For a given k, we take $\lambda$ to be the largest value $\lambda_j$ for which $\|r^{(k)}_{\lambda_j}\|_2 < \tau\delta$, if it exists; if not, we increase k and test again.

2. GCV. Let us define $(B_k)^{\dagger}_{\lambda}$ to be the operator mapping the right-hand side of the projected problem onto the regularized solution of the projected problem: $(B_k)^{\dagger}_{\lambda} = (B_k^* B_k + \lambda^2 I)^{-1} B_k^*$. Given the SVD of $B_k$ as above, the denominator in the GCV functional defined for the projected problem (refer to (11)) is

$\left( k + 1 - \sum_{j=1}^{k} \frac{\theta_j^2}{\theta_j^2 + \lambda^2} \right)^2.$

The numerator is simply $\|r^{(k)}_{\lambda}\|_2^2$. For values of $k \ll n$, it is feasible to compute the singular values of $B_k$.

3. L-Curve. The L-curve is comprised of the points $(\|B_k y^{(k)}_{\lambda} - \beta_1 e_1\|_2,\ \|y^{(k)}_{\lambda}\|_2)$. But using (25) and the orthonormality of the columns of $V_k$, we see these points are precisely $(\|r^{(k)}_{\lambda}\|_2,\ \|x^{(k)}_{\lambda}\|_2)$.
For p discrete values of $\lambda$, $\lambda_i, 1 \le i \le p$, the quantities $\|r^{(k)}_{\lambda_i}\|_2$ and $\|x^{(k)}_{\lambda_i}\|_2$ can be obtained by updating their respective estimates from the $(k-1)$st iteration. (The technical details of the approach are found in [28, pp. 197-198], from which we obtain $\|\bar{r}^{(k)}_{\lambda}\|_2 = \sqrt{\|r^{(k)}_{\lambda}\|_2^2 + \lambda^2 \|x^{(k)}_{\lambda}\|_2^2}$ for the residual $\bar{r}^{(k)}_{\lambda}$ of the damped least squares formulation (22). The implementation details for estimating $\|x^{(k)}_{\lambda}\|_2$ and $\|r^{(k)}_{\lambda}\|_2$ were taken from the Paige and Saunders algorithm at http://www.netlib.org/linalg/lsqr.)

4.5. Correspondence between Direct Regularization and Projection plus Regularization. In this section, we argue why the projection plus regularization approaches can be expected to yield regularized solutions nearly equivalent to their direct regularization counterparts. The following theorem establishes the desired result for the case of Tikhonov vs. projection plus Tikhonov.

Theorem 4.1. Fix $\lambda > 0$ and define $x^{(k)}_{\lambda}$ to be the kth iterate of conjugate gradients applied to the Tikhonov problem $(A^*A + \lambda^2 I)x = A^*b$. Let $y^{(k)}_{\lambda}$ be the exact solution to the regularized projected problem $(B_k^* B_k + \lambda^2 I)y = B_k^*(\beta_1 e_1)$, where $B_k, V_k$ are derived from the original problem $A^*Ax = A^*b$, and set $z^{(k)}_{\lambda} = V_k y^{(k)}_{\lambda}$. Then $z^{(k)}_{\lambda} = x^{(k)}_{\lambda}$.

Proof: By the discussion at the beginning of §4.4 and equations (23) and (24), it follows that $y^{(k)}_{\lambda}$ solves $V_k^*(A^*A + \lambda^2 I)V_k y = V_k^* A^* b$. Now the columns of $V_k$ are the Lanczos vectors with respect to the matrix $A^*A$ and right-hand side $A^*b$. But these are the same as the Lanczos vectors generated with respect to the matrix $A^*A + \lambda^2 I$ and right-hand side $A^*b$. Therefore $V_k y^{(k)}_{\lambda}$ is precisely the kth iterate of conjugate gradients applied to $(A^*A + \lambda^2 I)x = A^*b$ [11, p. 495]. Hence $z^{(k)}_{\lambda} = x^{(k)}_{\lambda}$. □

Table 2. Summary of flops for projection plus inner regularization with various parameter selection techniques, in addition to the O(qk) flops required for the projection itself. Here k is the number of iterations (i.e., the size of the projection) taken and p is the number of discrete parameters that must be tried.

  Projection plus:   Disc.     GCV       L-curve
  Tikhonov           O(pk)     O(k^3)    O(pk)
  TSVD               O(k^3)    O(k^3)    O(k^3)
  Rust's             O(k^3)    O(k^3)    O(k^3)

Let us turn to the case of TSVD regularization applied to the original problem vs. the projection plus TSVD approach. Direct computation convinces us that the two methods compute the same regularized solution if $k = n$ and arithmetic is exact. An approximate result holds in exact arithmetic when we take k iterations, with $n - \ell = j < k < n$. Let the singular value decomposition of $B_k$ be denoted by $B_k = Z_k \Theta_k Q_k^T$ and define the $s \times j$ matrix $W_{s,j}$ as

$W_{s,j} = \begin{bmatrix} I \\ 0 \end{bmatrix}.$

Then the regularized solution obtained from the TSVD regularization of the projected problem is

$x^{(k)}_{\rm reg} = V_k \left( Q_k W_{k,j}\, \Theta_{k,1}^{-1}\, W_{k+1,j}^T Z_k^T U_{k+1}^T b \right),$

where $\Theta_{k,1}$ denotes the leading $j \times j$ principal submatrix of $\Theta_k$. If k is taken to be enough larger than j so that $V_k Q_k W_{k,j} \approx \hat{V} W_{n,j}$, $W_{k+1,j}^T Z_k^T U_{k+1}^T \approx W_{n,j}^T \hat{U}^T$, and $\Theta_{k,1} \approx \Sigma_1$, with $\Sigma_1$ the leading principal submatrix of $\Sigma$, then we expect $x^{(k)}_{\rm reg}$ to be a good approximation to $x_\ell$. This is made more precise in the following theorem.

Theorem 4.2. Let $k > j$ be such that

$(V_k Q_k W_{k,j}) = \hat{V}_1 + E_1$ with $\|E_1\| \le \epsilon_1 \ll 1$,
$(U_{k+1} Z_k W_{k+1,j}) = \hat{U}_1 + E_2$ with $\|E_2\| \le \epsilon_2 \ll 1$,

where $\hat{V}_1$ and $\hat{U}_1$ contain the first j columns of $\hat{V}$ and $\hat{U}$, respectively.
Let D = diag(d 1 ;:::;d j ) satisfy BnZr k;1 =  1 + D with jd i j  3  1: Then kx (k) reg BnZrx ` k max 1ij 1  i + d i   3  j + 3max( 1 ; 2 )  kbk: Proof: Using the representations x ` = ^ V 1  BnZr1 1 ^ U T 1 b and x (k) reg = ( ^ V 1 +E 1 )BnZr BnZr1 k;1 ( ^ U T 1 + E T 2 )b, we obtain kx (k) reg BnZrx ` k (kBnZr BnZr1 k;1 BnZr BnZr1 1 k+kBnZr BnZr1 k;1 kkE 2 k+kE 1 kkBnZr BnZr1 k;1 k+kE 1 kkBnZr BnZr1 k;1 kkE 2 k)kbk; and the conclusion follows from bounding each term. 2 Note that typically  j   n so that 1= j is not too large. For some results relating to the value of k necessary for the hypothesis of the theorem to hold, the interested reader is referred to theory of the Kaniel-Paige and Saad [30, x12.4]. 13 Basic cost Added Cost Disc. GCV L-curve Tikhonov O(^q) O(1) O(p) O(p) TSVD O(^q) O(1) O(m) O(m) Rust's TSVD O(^q) O(m) O(m) O(m) Projection O(kn) O(1) O(k) O(k) Table 3 Summary of additional storage for each of four regularization methods under each of three parameter selection techniques. The original matrix is m  n with q nonzeros, p is the number of discrete parameters that must be tried, k iterations are used in projection, and the factorizations are assumed to take ^q storage. Projection plus { Disc. GCV L-curve Tikhonov O(1) O(p) O(p) TSVD O(1) O(k) O(k) Rust's TSVD O(k) O(k + p) O(k +p) Table 4 Summary of storage, not including storage for the matrix, for projection plus inner regular- ization approach, various parameter selection techniques. Here p denotes the number of discrete parameters tried. Each of these regularization methods also requires us to save the basis V or else regenerate it in order to reconstruct x. 5. Numerical results. In this section, we present two numerical examples. All experiments were carried out using Matlab and Hansen's Regularization Tools [16], with IEEE double precision oating point arithmetic. Since the exact, noise-free solutions were known in both examples, we evaluated the methods using the two- norm di erence between the regularized solutions and the exact solutions. In both examples when we applied Rust's method to the original problem, the  i were taken to be the magnitudes of the spectral coecients of b sorted in increasing order. 5.1. Example1. The 200200matrixA andtrue solutionx true forthis example were generated using the function baart in Hansen's Regularization Toolbox. We generated b true = Ax true and then computed the noisy vector b as b+e, where e was generated using the Matlab randn function and was scaled so that the noise level, kek kb true k , was 10 BnZr3 . The condition number of A was on the order of 10 19 . Many values of  were tested: log 10  = BnZr6;BnZr5:9;:::;BnZr2. Table 5 displays the values of the regularization parameters chosen when the three parameter selection techniques were applied together with one of the four regularization methods on the original problem. Since kek 2 = 5:3761EBnZr4, we set  that de nes the discrepancy principle as the very close approximation 5:5EBnZr4. The last column in the table gives the value of the parameter that yielded a regularized solution with the minimumrelative error when compared against the true solution. The relative error values for regularized solutions corresponding to the parameters in Table 5 are given in Table 6. Note that using GCV to determine a regularization parameter for Rust's TSVD resulted in an extremely noisy solution with huge error. 
The corners of the L-curves for the Tikhonov, projection, and TSVD methods were determined using Hansen's lcorner function, with the modification that points corresponding to solution norms greater than $10^6$ for the TSVD methods were not considered (otherwise, a false corner resulted).

Table 5. Example 1: parameter values selected for each method.

                    Disc.       GCV        L-curve     optimal
  Tikhonov λ        1.259E-3    1.995E-4   2.512E-4    5.012E-5
  TSVD ℓ            197         197        196         196
  Rust's TSVD η     1.223E-4    9.645E-7   1.223E-4    1.259E-4 or 1.223E-4
  Projection k      4           4          6           6

Table 6. Example 1: comparison of $\|x_{\rm true} - x_{\rm reg}\|_2 / \|x_{\rm true}\|_2$ for each of the four regularization methods on the original problem, where the regularization parameter was chosen using the methods indicated.

               Disc.    GCV      L-curve   optimal
  Tikhonov     .1330    .1110    .1084     .0648
  TSVD         .1663    .1213    .1663     .1213
  Rust's TSVD  .1213    7E+14    .1213     .1213
  Projection   .1134    .1207    .1134     .1134

Next, we projected using LSQR and then regularized the projected problem with one of the three regularization methods considered. For each of the three methods, we computed regularization parameters for the projected problem using Discrepancy, GCV, and L-curve, then computed the corresponding regularized solutions; the parameters that were selected in each case at iterations 10 and 40 are given in Tables 7 and 9, respectively. As before, the lcorner routine was used to determine the corners of the respective L-curves.

Comparing Tables 6 and 8, we observe that computing the regularized solution via projection plus Tikhonov for a projection size of 10, using either the Discrepancy Principle or the L-curve to find the regularization parameter, gives results as good as if those techniques had been used with Tikhonov on the original problem to determine a regularized solution. Similar statements can be made for projection plus TSVD and projection plus Rust's TSVD. We should also note that for Tikhonov, with and without projection, none of the errors in the tables is optimal; that is, no parameter selection technique ever gave the parameter for which the error was minimal.

Table 7. Example 1, iteration 10: regularization parameters selected for projection plus Tikhonov, TSVD, and Rust's TSVD.

                    Disc.       GCV        L-curve     optimal
  Tikhonov λ(k)     1.259E-3    1.995E-3   1.995E-4    5.012E-5
  TSVD ℓ(k)         2           3          2           2
  Rust's TSVD η(k)  1.679E-4    1.773E-4   1.679E-5    1.679E-5

Table 8. Example 1, iteration 10: comparison of $\|x_{\rm true} - x_{\rm reg}\|_2 / \|x_{\rm true}\|_2$ for projection plus Tikhonov, TSVD, and Rust's TSVD.

               Disc.    GCV      L-curve   optimal
  Tikhonov     .1330    .1486    .1084     .0648
  TSVD         .1663    .3451    .1663     .1213
  Rust's TSVD  .1213    .1663    .1213     .1213

Table 9. Example 1, iteration 40: regularization parameters selected for projection plus Tikhonov, TSVD, and Rust's TSVD.

                    Disc.       GCV        L-curve     optimal
  Tikhonov λ(k)     1.259E-3    1.995E-3   1.995E-4    5.012E-5
  TSVD ℓ(k)         10          13         8           9
  Rust's TSVD η(k)  9.201E-5    1.225E-4   9.201E-5    9.201E-5

Table 10. Example 1, iteration 40: comparison of $\|x_{\rm true} - x_{\rm reg}\|_2 / \|x_{\rm true}\|_2$ for projection plus Tikhonov, TSVD, and Rust's TSVD.

               Disc.    GCV      L-curve   optimal
  Tikhonov     .1330    .1486    .1084     .0648
  TSVD         .1679    .1986    .1206     .1165
  Rust's TSVD  .1162    .1162    .1162     .1162

5.2. Example 2. The 255 × 255 matrix A for this example was a symmetric Toeplitz matrix with bandwidth 16 and exponential decay across the band. (It was generated using the Matlab command A = (1/(2*pi*sigma))*toeplitz(t), where sigma = 5 and t = [exp(-([0:band-1].^2)/(2*sigma^2)), zeros(1,N-band)] with band = 16.) The true solution vector $x_{\rm true}$ is displayed as the top picture in Figure 2. We generated $b_{\rm true} = Ax_{\rm true}$ and then computed the noisy vector b as $b_{\rm true} + e$, where e was generated using the Matlab randn function and was scaled so that the noise level, $\|e\| / \|b_{\rm true}\|$, was $10^{-3}$. The vector b is shown in the bottom of Figure 2. The condition number of A was $1.65 \times 10^7$. We generated our discrete $\lambda_i$ using $\log_{10}\lambda = -5, -4.9, \ldots, -1$. The norm of the noise vector was 7.16E-2, so we took the value of the threshold $\tau\delta$ that defines the discrepancy principle to be 8.00E-2.

Fig. 2. Example 2: Top: exact solution. Bottom: noisy right-hand side b.

In this example, it took 61 iterations for LSQR to reach a minimum relative error of 9.48E-2, and several more iterations were needed for the L-curve method to estimate a stopping parameter. Likewise, the dimension k of the projected problem had to be around 60 to obtain good results with the projection-plus-regularization approaches, and much larger than 60 for the L-curve applied to the projected, Tikhonov-regularized problem to give a good estimate of the corner with respect to the Tikhonov-regularized original problem. Therefore, for the projection-based techniques, we chose to work with a left preconditioned system (refer to the discussion at the end of §3.5). Our preconditioner was chosen as in [22], where the parameter defining the preconditioner was taken to be m = 50. The values of the regularization parameters chosen when the three parameter selection techniques were applied together with one of the four regularization methods on the original problem are given in Table 11. The last column in the table gives the value of the parameter that gave a regularized solution with the minimum relative error over the range of discrete values tested, with respect to the true solution. The relative errors that resulted from computing solutions according to the parameters in Table 11 are in Table 12. We note that GCV with TSVD and Rust's TSVD was ineffective.

Table 11. Example 2: parameter values selected for each method. The projection was performed on a left preconditioned system.

                    Disc.       GCV        L-curve     optimal
  Tikhonov λ        1.259E-2    1.259E-2   1.995E-3    3.9811E-3
  TSVD ℓ            216         254        201         201
  Rust's TSVD η     2.183E-2    2.586E-6   1.477E-2    1.527E-2
  Projection k      2           18         5           5

Table 12. Example 2: comparison of $\|x_{\rm true} - x_{\rm reg}\|_2 / \|x_{\rm true}\|_2$ for each of the four regularization methods on the original problem.

               Disc.       GCV        L-curve     optimal
  Tikhonov     9.909E-2    9.909E-2   1.050E-2    9.394E-2
  TSVD         1.102E-1    8.121E-1   9.074E-2    9.0744E-2
  Rust's TSVD  1.025E-1    22.67      1.011E-1    1.011E-1
  Projection   1.030E-1    9.85E-2    1.15E-1     9.479E-2

The corners of the L-curves for the Tikhonov, projection, and TSVD methods were determined using Hansen's lcorner function, with the modification that points corresponding to the largest solution norms for the TSVD methods were not considered (otherwise, a false corner was detected by the lcorner routine).

Next, we projected using LSQR (note that since the matrix and preconditioner were symmetric, we could have used MINRES as in [22]) and then regularized the projected problem with one of the three methods considered.
For each of the three methods, we computed regularization parameters for the projected problem using Discrepancy, GCV, and L-curve, then computed the corresponding regularized solutions; the parameters that were selected in each case at iterations 15 and 25 are given in Tables 13 and 15, respectively. The relative errors of the regularized solutions generated accordingly are given in Tables 14 and 16. Again, we used the lcorner routine to determine the corners of the respective L-curves, except in the case of Rust's TSVD method. In the latter case, there was always a very sharp corner that could be picked out visually.

Table 13. Example 2, iteration 15: regularization parameters selected for projection plus Tikhonov, TSVD, and Rust's TSVD.

                    Disc.       GCV        L-curve     optimal
  Tikhonov λ(k)     2.512E-2    1.585E-2   1.9953E-3   3.981E-3
  TSVD ℓ(k)         5           5          4           1
  Rust's TSVD η(k)  3.558E-2    3.558E-2   3.558E-2    3.558E-2

Table 14. Example 2, iteration 15: comparison of $\|x_{\rm true} - x_{\rm reg}\|_2 / \|x_{\rm true}\|_2$ for projection plus Tikhonov, TSVD, and Rust's TSVD.

               Disc.       GCV        L-curve     optimal
  Tikhonov     1.0001E-1   9.9511E-2  1.061E-1    9.530E-2
  TSVD         9.595E-1    9.595E-1   1.004E-1    9.357E-2
  Rust's TSVD  1.004E-1    1.004E-1   1.004E-1    1.004E-1

Table 15. Example 2, iteration 25: regularization parameters selected for projection plus Tikhonov, TSVD, and Rust's TSVD.

                    Disc.       GCV        L-curve     optimal
  Tikhonov λ(k)     2.512E-2    1.259E-2   1.995E-3    3.982E-3
  TSVD ℓ(k)         9           9          8           3
  Rust's TSVD η(k)  4.828E-2    7.806E-3   4.828E-2    4.828E-2

Table 16. Example 2, iteration 25: comparison of $\|x_{\rm true} - x_{\rm reg}\|_2 / \|x_{\rm true}\|_2$ for projection plus Tikhonov, TSVD, and Rust's TSVD.

               Disc.       GCV        L-curve     optimal
  Tikhonov     1.000E-1    9.909E-2   1.061E-1    9.530E-2
  TSVD         9.164E-1    9.595E-1   1.004E-1    9.145E-2
  Rust's TSVD  1.004E-1    2.728      1.004E-1    1.004E-1

Comparing Table 11 with Tables 13 and 15, we see that the parameter chosen by applying the L-curve method to the projection-plus-Tikhonov problem was the same parameter chosen by applying the L-curve to the original problem. Moreover, a comparison of Table 12 with Tables 14 and 16 shows that the relative errors of the regularized solutions computed accordingly are comparable to those obtained by applying Tikhonov to the original problem with that same parameter. Similar results hold for the other cases, with the exception that the discrepancy principle did not work well for the projection-plus-TSVD problems, and GCV was not effective for the projected problems when k = 25.

6. Conclusions. In this work we have given methods for determining the regularization parameter and regularized solution to the original problem based on regularizing a projected problem. The proposed approach of applying regularization and parameter selection techniques to a projected problem is economical in time and storage. We presented results showing that the regularized solution obtained by back-projecting the TSVD or Tikhonov solution to the projected problem is almost equivalent to that obtained by applying TSVD or Tikhonov to the original problem, where "almost" depends on the size of k. The examples indicate the practicality of the method and illustrate that our regularized solutions are usually as good as those computed using the original system and can be computed in a fraction of the time, using a fraction of the storage. We note that similar approaches are valid using other Krylov subspace methods for computing the projected problem.

In this work, we did not address potential problems from loss of orthogonality as the iterations progress. We did, however, assume that either k was naturally very small compared to n or that preconditioning had been applied to enforce this condition. Possibly for this reason, we found that for modest k, round-off did not appear to degrade either the LSQR estimates of the residual and solution norms or the computed regularized solution, in the following sense: the regularization parameters chosen via the projection-regularization approach and the corresponding regularized solutions were comparable to those chosen and generated for the original discretized problem.
For the Tikhonov approach in this paper, we have assumed that the regularization operator L was the identity or was related to the preconditioning operator; this allowed us to compute $\|r^{(k)}_{\lambda}\|$ and $\|x^{(k)}_{\lambda}\|$ efficiently for multiple values of $\lambda$ for each k. If L is not the identity but is invertible, we can first implicitly transform the problem to "standard form" [17]. With $\bar{A} = AL^{-1}$ and $\bar{x} = Lx$, we can solve the equivalent system

$\min_{\bar{x}} \|\bar{A}\bar{x} - b\|_2^2 + \lambda^2 \|\bar{x}\|_2^2.$

Then the projection plus regularization schemes may be applied to this transformed problem. Clearly the projection-based schemes will be useful as long as solving systems involving L can be done efficiently.

REFERENCES

[1] Å. Björck, A bidiagonalization algorithm for solving large and sparse ill-posed systems of linear equations, BIT, 28 (1988), pp. 659-670.
[2] Å. Björck, E. Grimme, and P. Van Dooren, An implicit shift bidiagonalization algorithm for ill-posed systems, BIT, 34 (1994), pp. 510-534.
[3] D. Calvetti, G. Golub, and L. Reichel, Estimation of the L-curve via Lanczos bidiagonalization, Tech. Report SCCM-97-12, Scientific Computing and Computational Mathematics Program, Computer Science Dept., Stanford University, 1997.
[4] T. Chan and M. Ng, Galerkin projection method for solving multiple linear systems, Tech. Report 96-31, Computational and Applied Mathematics Dept., UCLA, Los Angeles, CA, 1996.
[5] L. Desbat and D. Girard, The "minimum reconstruction error" choice of regularization parameters: Some more efficient methods and their application to deconvolution problems, SIAM J. Sci. Comput., 16 (1995), pp. 1387-1403.
[6] H. W. Engl and W. Grever, Using the L-curve for determining optimal regularization parameters, Numer. Math., 69 (1994), pp. 25-31.
[7] H. E. Fleming, Equivalence of regularization and truncated iteration in the solution of ill-posed image reconstruction problems, Lin. Alg. and Applics., 130 (1990), pp. 133-150.
[8] A. Frommer and P. Maass, Fast CG-based methods for Tikhonov-Phillips regularization, SIAM J. Sci. Comput., to appear.
[9] G. Golub, M. Heath, and G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics, 21 (1979), pp. 215-223.
[10] G. Golub and W. Kahan, Calculating the singular values and pseudo-inverse of a matrix, SIAM J. Numer. Anal., 2 (Series B) (1965), pp. 205-224.
[11] G. Golub and C. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, 1989.
[12] W. Groetsch, Theory of Tikhonov Regularization for Fredholm Equations of the First Kind, Pitman Publishing Limited, 1984.
[13] M. Gu and S. Eisenstat, A stable and fast algorithm for updating the singular value decomposition, Tech. Report RR-939, Yale University, Department of Computer Science, New Haven, CT, 1993.
[14] M. Hanke, J. Nagy, and R. Plemmons, Preconditioned iterative regularization for ill-posed problems, in Numerical Linear Algebra and Sci. Computing, L. Reichel, A. Ruttan, and R. S. Varga, eds., 1993, pp. 141-163.
[15] P. C. Hansen, Analysis of discrete ill-posed problems by means of the L-curve, SIAM Review, 34 (1992), pp. 561-580.
[16] P. C. Hansen, Regularization Tools: A Matlab package for analysis and solution of discrete ill-posed problems, Numer. Algo., 6 (1994), pp. 1-35.
[17] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems, SIAM Press, Philadelphia, 1998.
[18] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems, PhD thesis, Technical University of Denmark, July 1995. UNIC Report UNIC-95-07.
[19] P. C. Hansen and D. P. O'Leary, The use of the L-curve in the regularization of discrete ill-posed problems, SIAM J. Sci. Comput., 14 (1993), pp. 1487-1503.
[20] B. Hofmann, Regularization for Applied Inverse and Ill-Posed Problems, Teubner-Texte Math., Teubner, Leipzig, 1986.
[21] L. Kaufman and A. Neumaier, Regularization of ill-posed problems by envelope guided conjugate gradients, Journal of Computational and Graphical Statistics, (1997).
[22] M. Kilmer, Symmetric Cauchy-like preconditioners for the regularized solution of 1-d ill-posed problems, Tech. Report CS-TR-3851, University of Maryland, College Park, 1997.
[23] M. Kilmer, Cauchy-like preconditioners for 2-dimensional ill-posed problems, SIAM J. Matrix Anal., to appear (1998).
[24] M. Kilmer and D. P. O'Leary, Pivoted Cauchy-like preconditioners for regularized solution of ill-posed problems, SIAM J. Sci. Stat. Comput., to appear (1998).
[25] V. A. Morozov, On the solution of functional equations by the method of regularization, Soviet Math. Dokl., 7 (1966), pp. 414-417. Cited in [17].
[26] J. Nagy, R. Plemmons, and T. Torgersen, Iterative image restoration using approximate inverse preconditioning, IEEE Trans. Image Proc., 5 (1996), pp. 1151-1163.
[27] D. P. O'Leary and J. A. Simmons, A bidiagonalization-regularization procedure for large scale discretization of ill-posed problems, SIAM J. Sci. Stat. Comput., 2 (1981), pp. 474-489.
[28] C. C. Paige and M. A. Saunders, Algorithm 583, LSQR: Sparse linear equations and least squares problems, ACM Transactions on Mathematical Software, 8 (1982), pp. 195-209.
[29] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Transactions on Mathematical Software, 8 (1982), pp. 43-71.
[30] B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood Cliffs, NJ, 1980.
[31] B. W. Rust, Truncating the singular value decomposition for ill-posed problems, Tech. Report NISTIR 6131, Mathematical and Computational Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD, 1998.
[32] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, 1996.
[33] Y. Saad and M. H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 856-869.
[34] A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-Posed Problems, Wiley, New York, 1977.
[35] J. M. Varah, Pitfalls in the numerical solution of linear ill-posed problems, SIAM J. Sci. Stat. Comput., 4 (1983), pp. 164-176.
[36] C. R. Vogel, Non-convergence of the L-curve regularization parameter selection method, Inverse Problems, 12 (1996), pp. 535-547.