ABSTRACT Title of Dissertation: Uniqueness for continuous superresolution by means of Choquet theory and geometric measure theory Ryan M. Cinoman Doctor of Philosophy, 2021 Dissertation Directed by: Professor John Benedetto Department of Mathematics The problem of superresolution is to recover an element of a vector space from data much smaller than the dimension of the space, using a prior assumption of sparsity. A famous example is compressive sensing, where the elements are images with a large finite resolution. On the other hand, we focus on a continuous form of superresolution. Given a measure ? on a continuous domain such as the two dimensional torus, can we recover ? from knowledge of only a finite number of its Fourier coefficients using a total variation minimization method? We will see that the answer depends on certain properties of ?. Namely, a necessary condition is that ? be discrete. We use methods from geometric analysis to investigate the continuous super- resolution problem. Tools from measure theory relate properties of the support of a measure, such as Hausdorff dimension, to properties of its Fourier transform. We also use measure theory to investigate the possibility of alternatives to total variation that may allow us to recover surface measures defined on space curves. There is a theorem of Choquet concerning representations of points in convex sets as sums of their extreme points. As it turns out, we can apply this to the superresolution problem because the extreme points of the set of measures with total variation 1 are precisely the set of delta measures. We consider superresolution as a convex optimization problem, where the goal is to find representations of the initial data as sums of delta measures. Choquet theory provides tools to investigate the previously unresolved problem of uniqueness. We use this to give a novel sufficient condition for a measure to be uniquely superresolved, given data on a known finite set of frequencies. Uniqueness for continuous superresolution by means of Choquet theory and geometric measure theory by Ryan M. Cinoman Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2021 Advisory Committee: Professor John J. Benedetto, Chair/Advisor Professor Radu V. Balan Professor Wojciech Czaja Professor Matei Machedon Dean?s Representative: Professor Alexander Barg ? Copyright by Ryan M. Cinoman 2021 Acknowledgments I would like to extend my sincere thanks to my advisor, Dr. John Benedetto. I deeply appreciate the opportunity he has given me to work on this exciting topic, as well as the invaluable advice and guidance he has given me over the years. I?d also like to gratefully acknowledge ARO Grant W911NF-17-1-0014, which has supported me during my research. Finally, I?d like to deeply thank my family and friends who have supported me throughout my research. I?d like to extend a special thanks to my parents, my boyfriend Lawrence Cho, and Jill Cassone. Without the unwavering support of each of them, this thesis would not have been possible. ii Table of Contents Acknowledgements ii Table of Contents iii Chapter 1: Introduction 1 1.1 Compressive Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Prony?s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3 Continuous Superresolution . . . . . . . . . . . . . . . . . . . . . . . 6 1.4 Resolution of Point Sources and Minimum Separation . . . . . . . . . 9 1.5 Attempts at Superresolving Non-discrete Measures . . . . . . . . . . 10 1.6 Positive Definite Functions . . . . . . . . . . . . . . . . . . . . . . . . 11 1.7 Bochner?s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 1.7.1 G = Z . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 1.7.2 G = R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 1.7.3 Additional Examples and Applications of Bochner?s Theorem 22 1.8 Positive Definite Extensions . . . . . . . . . . . . . . . . . . . . . . . 24 1.8.1 Extensions in Higher Dimensions . . . . . . . . . . . . . . . . 28 1.9 My Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 Chapter 2: Measure Theory and Surface Measures 36 2.1 Definitions and Notation . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.2 Energy Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.3 Cantor Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 2.4 Fourier Analysis Methods . . . . . . . . . . . . . . . . . . . . . . . . 43 2.5 Fourier Dimension and Salem Sets . . . . . . . . . . . . . . . . . . . . 47 2.6 Fourier Decay of Surface Measures . . . . . . . . . . . . . . . . . . . 51 2.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 Chapter 3: Choquet Theory 57 3.1 Choquet?s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.2 Application to Completely Monotone Functions . . . . . . . . . . . . 61 3.3 Uniqueness and the Choquet Simplex . . . . . . . . . . . . . . . . . . 67 3.4 Technical Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 3.4.1 Choquet?s Theorem for Metrizable Spaces . . . . . . . . . . . 73 3.4.2 The Choquet-Meyer Theorem . . . . . . . . . . . . . . . . . . 75 iii Chapter 4: My Results 86 4.1 Uniqueness of Positive Definite Extensions . . . . . . . . . . . . . . . 86 4.2 Faces of Choquet Simplices . . . . . . . . . . . . . . . . . . . . . . . . 92 4.3 Uniqueness for Signed and Complex Measures . . . . . . . . . . . . . 97 Bibliography 108 iv Chapter 1: Introduction 1.1 Compressive Sensing A major motivation for studying superresolution is the success of the so-called discrete case, known under another name as compressive sensing. The field of com- pressed sensing is based on a famous set of papers published starting in 2004 written by Cande?s, Romberg and Tao [14, 15, 16, 17]. The motivating problem is to find sparse solutions to underdetermined systems of equations. The main results of the original papers were that: yes, it is possible to recover a k-sparse vector in RN from only m measurements when m & k ln(eN/k); the algorithm to do so is practically effecient; and random sensing matrices are satisfactory for this algorithm with high probability ? 1 ? 2 exp(?Cm) provided the number of measurements is sufficient as before. In more formal terms, what we want is a sensing matrix A which is ?short and fat,? i.e. m ? N where m  N. We denote our sensed data y = Ax, where x is some k-sparse (or approximately k-sparse) vector in RN , and we want an algorithm which recovers x# = ?(y), so that the distance from x# to x is small in some sense. The original work from Cande?s, Romberg and Tao showed that if x is indeed sparse, 1 then we can get not only approximate, but exact recovery from the algorithm: x# = argmin?z?1 such that Az = y. (1.1) This can be formulated as a linear programming problem and hence has polynomial complexity, a vast improvement over the similar ?ideal? optimization problem? sometimes referred to as `0 optimization, x# = argmin|supp z| such that Az = y, (1.2) which is essentially a combinatorial search. This method is very powerful, and has implications for many applications, as nearly sparse signals and images are very common in natural data [27]. However there are a few obstacles which come up, which have been the topic of much study over the past decades. In their original papers, Cande?s, Romberg and Tao defined a number of prop- erties of sensing matrices which are relevant to the compressive sensing problem, including the Null Space Property (NSP), Uniform Uncertainty Principle (UUP), Exact Reconstruction Principle (ERP), and Restricted Isometry Property (RIP) [17]. Some of these have faded into mostly historical significance, but NSP and RIP, in particular the latter, continue to get attention today. Definition 1. (Null Space Property) An m?N matrix A is said to have the NSP of order k if for any ? ? kerA \ {0}, S ? {1, . . . , N} with |S| ? k, ??S?1 0 such that A satisfies RIP of order k. RIP is more restrictive than NSP, but in return we get much more robustness. For this reason much research has been done on the RIP and various recovery algorithms that depend on it in the past 15 years. Both NSP and RIP represent sufficient conditions for accurate and unique recovery in (1.70). The goal of this dissertation is to extend the results of compressive sensing to a continuous domain, the two dimensional torus T2. In section 1.9 I outline the space and algorithm of interest, and my main results, which are discussed 3 in more detail in Chapter 4. As I will show, in continuous space the question of uniqueness of the reconstruction is much more of an issue than in the finite dimensional case. However I will prove both a necessary and a sufficient condition for unique reconstruction. 1.2 Prony?s Method There have been some limited attempts to extend compressive sensing into the continuous domain with good results, but limited generality. An example of a widely studied problem is the recovery of point masses, where we assume the signal to be recovered is assumed to be of the form ?M f(x) = ?x (x). (1.3)k k=1 Perhaps the most archetypal form of this problem dates back long before compressive sensing. Prony?s method was known in the 18th century, but its ideas still see some use today [26, 54]. Prony?s method is an algorithm for resolving discrete signals as in (1.3) on R. However it is typically formulated on the frequency spectrum as a sampling problem. The goal is to recover a signal f? ? L?(R) which is a finite sum of sinusoids, ?M f?(t) = B ei?ktk (1.4) k=1 4 from N = 2M evenly spaced samples, ?M f?(?n) = B ei?k?nk , (1.5) k=1 where ? > 0 is the fixed distance between samples. The algorithm comes from recognizing two key relationships. First, since f? is the sum of M exponentials, it is the solution to a particular linear difference equation of M variables. That is to say we have ?M f?(?n) = Pkf?(?(n? k)). (1.6) k=1 Then the frequencies ?k are related to this equation by the roots of the characteristic polynomial, ?M zM ? P zM?11 ? ? ? ? ? PM = (z ? ei?k). (1.7) k=1 The process of Prony?s method consists of first calculating the coefficients Pk using (1.6), ?? ?? ? ? ???? FM?1 F1 ? ? ? F0 ???????? P1 ???? ?? ? FM ?? ??? FM FM?1 F1 ?????? P2 ?? ?? ????? ? ? FM+1 ?? ? = ? , (1.8)? .. . . .. ?. . . ??????? .. ???? ?. ??? .. ?? ?? ? ? . ???? FN?1 ? ? ? FN?M?1 Pk FN where Fk = f?(?k). Second, factor the polynomial in (1.7) to find the frequencies 5 ?k. And finally the magnitudes Bk are found by another linear equation, ?? ?? ? ? ? ???? 1 1 ? ? ? 1 ???????? B1 ???? ?? F1 ???? e?1? e?2? e?M?? ???????? ?? ?? B2 ???? ?? ? F2 ?? ? = ? ?? .. . . . . ? . ? . (1.9) ? . . .. ???????? . . ???? ??? . ? . ???? e?1?M ? ? ? e?M?M Bk FM Prony?s method works on two conditions: first, that the number of samples is at least twice the number of sinusoids; and second, that the sampling rate is at least the Nyquist frequency. The second point is slightly more subtle, but it is because during the polynomial factorization step we only solve for e?k , not the frequencies themselves, hence they are non-unique up to a factor of 2??. Prony?s method is seen in a broad variety of applications. Identifying and demixing sinusoidal signals is useful in many signal processing applications, includ- ing electromagnetics and antenna engineering [54]. It is frequently seen in the re- lated, slightly more modern algorithms called matrix pencils, for finding generalized eigenvalues of matrix operators, in a manner similar to finding the frequencies of the sinusoids above [54]. 1.3 Continuous Superresolution Use of the total variation norm for basis pursuit in the continuous domain is very recent, inspired by the massive success of compressive sensing in the discrete case. The first work was from de Castro and Gamboa [25], who used Beurling?s 6 theory of minimal extrapolations to generalize some of the concepts of the NSP and random sensing matrices from compressive sensing. Notably they showed that for a sensing apparatus defined from a family of functions F = (u )Kk k=1, there must be a solution to the optimization problem ?# = argmin???TV such that ?uk, ?? = yk, 1 ? k ? K, (1.10) ??M[0,1] that?is supported in a finite set, which are the roots to a ?generalized polynomial? 1? k ckuk. Under certain conditions on the family F , these solutions may also be unique. The use of Beurling?s minimal extrapolations were expounded in a different way by Benedetto and Li [6], who studied the case of F a family of complex ex- ponentials. This is equivalent to having prior knowledge of the Fourier transform. In this case the zero sets of ?generalized polynomials? are roots of trigonometric polynomials. The contribution of Benedetto and Li was giving quite sharp results for one particular situation [6]. Let ? ? M(Tn) and our prior knowledge be f?(m) for m ? ? ? Zn, not necessarily a rectangular box. We consider the set of minimal extrapolations as follows. If  = (f) = inf{???TV | ? ?M(T 2), ?m ? ? ??(m) = f?(m)}, (1.11) then the set of minimal extrapolations is the set of extensions of f? which achieve 7 the upper bound: E = {? ?M(Tn) | ?m ? ? ??(m) = f?(m) and ???TV is minimal}. (1.12) They divide into three possible situations, based on the observation that for all m ? ?, |f?(m)| ? . Define ? = {m ? ? | |f?(m)| = ???}TV . The cases are whether |?| is 0, 1 or greater than 1. In other words, how frequently does |f? | achieve its maximal value and does ?f??? =???TV ? At first glance these distinctions may be arbitrary, but we will see in section 1.6, particularly in the work of Carathe?odory on positive definite functions [19], that in the case of positive definite extensions, sequences which achieve their maxima represent points on the boundary of the convex set of positive definite functions. Similarly here, functions f : ?? C will fall on the boundary of the set of functions f with (f) = . This is not insignificant; we can relate this property back to the NSP from compressive sensing?in the context of compressive sensing, a signal is sparse if it is in the boundary of a face of the `1 ball, which is key to why compressive sensing works. The result is if |?| =6 1, then we have results as Castro and Gamboa, that solutions must fall within the roots of a trigonometric polynomial. If |?| ? 2, we have even better, that this level set must take the form of a finite lattice, making the calculation of the results very straightforward, and unique for sufficiently large ?. 8 1.4 Resolution of Point Sources and Minimum Separation Other work on point superresolution was done by Fernandez-Granda, et al., who have developed a nice investigation on the relationship of the recovery problem and the uncertainty principle [12, 13, 30]. One disadvantage of moving to continuous space is that there is no longer a straightforward relationship between the sparsity and the fineness of our signal. In other words, if we only have prior knowledge up to a finite band limit, then we can not be expected to resolve signals which have details which are arbitrarily fine. It is easy to construct examples which are Fernandez- Granda approaches this by adding a constraint on the minimum separation of the signal. If ? ?M(R) and supp? = S, the minimum separation of ? is ? ? ?(?) = inf ?t? t?? . (1.13) t=6 t??S In [13], Cande?s and Fernandez-Granda show that minimum separation re- quirements are enough to find uniqueness. Remarkably, this doesn?t require any modification of the search algorithm. Standard basis pursuit is enough to find these discrete signals, even though it has no ?knowledge? of the minimum separation. Their main theorem is as follows. Theorem 1. Let ? ?M(R) satisfy (1.13) with ?(?) ? D. If the values of ??(k) are known for |k| ? fc, then if fc ? 128 and 2 ?(?) ? , (1.14) fc 9 then ? is the unique solution to the basis pursuit algorithm, ?# = argmin???TV such that ??(k) = yk |k| ? fc. (1.15) ??M[0,1] The same result holds with a less restrictive constant for higher dimensions Rn as well, and it also holds up to stability in the face of noisy measurements. Later work from Fernandez-Granda has improved upon these constants, going as low as 1.26 in the 1-dimensional case [30]. 1.5 Attempts at Superresolving Non-discrete Measures There are only a few results that attempt to perform signal recovery on signals that are not discrete. They tend to lean strongly on the works prior described. A generalization from Unser, et al. [59] lets us perform superresolution on classes of splines, functions which behave like discrete functions when applied to a differential operator. He shows that splines, rather than deltas, are the archetypal solution for a slightly altered problem, x# = argmin?Lx?1 such that A(x) = y, (1.16) x?X where L is a differential operator, X = {x ? BV (Rn) | Lx ?M(Rn)}, and A : X ? Rm is a linear measurement function. The theory is very similar to the above work, but it demonstrates the flexibility of the theory nicely. Another creative approach was by Ongie [48, 49], who looked to generalize 10 Prony?s method to higher dimensions. Rather than starting with a mix of sinusoids, they recognize a key step of Prony?s method is to factor the trigonometric polynomial in (1.7). Let ?N ?(x) = ei?k?x, (1.17) k=1 be a trigonometric polynomial in on R2. Let f ? L?(R2) be of the form ?M f = ?k?U , (1.18)k k=1 where Uk are disjoint open subsets of R2 and ? 2kUk = {x ? R | ?(x) =6 0}. Then by recognizing that ?Df ? 0, it is possible to perform a similar calculation to the 1-dimensional case and recover the support of f by factoring a trigonometric polynomial. Bezout?s theorem puts a bound on the number of distinct Uk, so a system of equations finds the weights ?k, and we have theoretically guaranteed recovery. 1.6 Positive Definite Functions Carathe?odory began studying positive definite functions in the early 20th cen- tury, at around the same time he was doing work on convex geometry. The context in which such functions were discovered was in complex function theory. The con- nection is not immediately apparent, but as we will see there are deep similarities. 11 There we say that a positive definite function on the unit disc is one of the form ?? f(z) = 1 + (ak + ibk)z k, (1.19) k=1 which is analytic and has positive real part. Remarkably, Carathe?odory gave a characterization of these functions that as we will see is useful for the problem we are interested in today [19]. He said that such a function is positive definite if and only if for each n ? N the coordinates (a1, b1, a2, ? ? ? , an, bn) fall in the convex hull generated by the curve (cos(?), sin(?), cos(2?) ? ? ? , cos(n?), sin(n?)), for 0 ? ? ? 2?. Call this set Qn, and say that a point in Qn represents a function f if f has those coefficients for 0 ? k ? n. The main line of his proof contains many of the same beats as the other results in this paper. He recognizes the convexity of the set of positive definite functions P . He identifies that the interior of P is nonempty, or in other words there are functions f for which an analytic neighborhood of f falls within P . Therefore Qn cannot fall in a lower-dimensional hyperplane of R2n. And then he proves using complex analytic methods that the boundary of P are all rational functions, which are generated by those of the form 1 f(z) = |?| = 1, (1.20) 1? ?z which are represented by points on the curve (cos(?), sin(?), cos(2?), ? ? ? , cos(n?), sin(n?)), 0 ? ? ? 2?. 12 Toeplitz [58] later showed that an equivalent characterization of sequences (a1, b1, ? ? ? ) is that ?n ck?lzkzl ? 0 n ? N, zk ? C, (1.21) k,l=1 where c0 = 2, ck = ak ? ibk, d?k = dk. This definition is the one that was used commonly later in the century through today, and as we will see it has inspired a host of problems of this form in more generality and under various conditions [56]. Indeed, as we will see it is quite relevant to the total variation optimization problem in different settings as well. Definition 3. Given a group G and a symmetric subset V ? G, with 0 ? V, we say f : V ? C is positive definite if ?n f(x x?1k l )cicj ? 0 (1.22) k,l=1 for all sequences ? = (x )nj j=1 such that ?? ? ? V , and sequences (c )nj j=1 ? C. For G = Z we get the definition above studied by Carathe?odory and Toeplitz, and for G = R we get a situation that has been studied extensively, and which has particular interest in harmonic analysis because of Bochner?s theorem, which relates them to the Fourier transform of positive measures. Theorem 2. (Bochner) For any locally compact abelian group G with dual group G?, a function f : G ? C is positive definite if and only if there exists a unique 13 nonnegative measure ? ?M(G?) such that ? f(x) = e2?ix? d?(?). (1.23) G? The result is named after Salomon Bochner, who proved it first in 1932 for the case G = R, and later generalized his result to higher dimensions [8, 9, 56]. Although, Bochner?s result was actually inspired by Herglotz, who proved the theo- rem for the case G = Z in 1911. I will give details on both Herglotz and Bochner?s Theorems in the following chapter, but first I will give a few other properties of positive definite functions. Proposition 1. Let G be a locally compact abelian group, let V = ?? ? ? G and let f : V ? C be a positive definite function. 1. f(0) ? 0 2. f(?x) = f(x) ? ? 3. ?f(x)? ? f(0) These are all easily verified from the definitions above, and are standard in the literature [56]. The following is a standard but slightly less trivial result which was first shown by Artjomenko [2]. Proposition 2. If f is positive definite on R and continuous at 0 then it is uniformly continuous. Proof. Let |x? y| < ?. Choosing ? = {x, y, 0}, it follows from the definition of 14 positive definiteness that for any choice of {cx, cy, c0} ? C, ? ? [ ] (|c0|2 2 +|c |2 +?c ?x y )f(0) + 2 Re cxc0f(x) + cyc0f(y) + cxcyf(x? y) ? 0. (1.24) ? ? With the cho?ice c0 = 1, cx?= ?cy = ?f(x)? f(y)? /(f(x)? f(y)), and using the fact that 2f(0) ? ?f(x)? f(y)?, we get ? ? 3f(0)? 2 Re f(x? y) ? 2?f(x)? f(y)?? ? (1.25)3 2(f(0)? Re f(x? y)) ? ?f(x)? f(y)? . (1.26) 2 Hence we have a uniform bound on the continuity of f . Other regularity results follow from positive definiteness as well. In a similar manner to the previous result, it is possible to bound the derivatives of a positive definite function. Theorem 3. If for some k ? N a positive definite function f defined on R is 2k-times differentiable at 0, then f is 2k-times differentiable on R as well. The following on analytic positive definite functions was proven by each of Le?vy and Raikov independently [42, 51]. Theorem 4. If f is analytic and positive definite in a symmetric interval (?a, a), then there exist positive constants ?, ? ? (0,?] such that f can be extended to a function holomorphic on the horizontal strip {z ? C | ?? < Im z < ?}. (1.27) 15 And finally, more recently Sasva?ri has proven a similar result for measurability [55]. Theorem 5. If f is positive definite, and it is measurable on a symmetric interval (?A,A), then it is measurable on all of R. 1.7 Bochner?s Theorem I will not prove the theorem in full generality but I will provide details for the two most relevant cases, G = Z (this case is often referred to as Herglotz?s theorem) and G = R. The proofs given are not performed as in Bochner?s and Herglotz?s original papers but are updated using distribution theory. The proofs as given are based on a review from Maruyama [46]. Before I proceed to the proofs, I will make a short statement on distribution theory that is necessary for them. We will use distribution theory somewhat loosely in the upcoming section for the purpose of being as general as possible while keeping the use of notation reasonable. We will refer to S(G) as a set of test functions appropriate for G. For example if G = Rn then the nat?ural choice is the Schwartz functions. For G = T n we take the periodic summations n ?(x+ nT ) of Schwartz functions. The particular choice for other groups G is not important, as the properties that are necessary will be clear in the proofs of the following results, and we will focus our results in this paper on the two cases above. Given our space of test functions S(G), S ?(G) is the set of distributions. We can extend the Fourier transform to S ?(G) in a standard 16 way by Parseval?s formula. For T ? S ?(G), FT is defined on F(S(G)) by FT (F?) = T (?) ?? ? S(G). (1.28) Lemma 6. Let T ? S ?(G) be a distribution. Assume for all ? ? S(G) with ? ? 0, T (?) ? 0. Then T is a positive measure on G. Proof. Assume that T is a positive distribution, as given above. T is a measure if and only if it is a positive linear functional on C0(G). Because S(G) is dense in C0(G), and T is positive, all that is necessary is to show that it is continuous with respect to the supremum norm. Assume to the contrary, that there is a sequence (?k) n k=0 of positive functions in S(G) that is uniformly bounded by a constant M , but for which T (?k) ? ?. Assume without loss of generality as well that each ?k has compact support. We lose no generality from doing this because compactly supported functions are dense in both S(G) and C0(G). Define a sequence of smooth compactly supported bump functions ?k ? S(G) such that for all k, 0 ? ?k ? 1, ?k ? 1 on the support of ?k, and all derivatives of ?k are uniformly bounded (Note that this construction is valid for Schwartz functions). Now see that T (?k?k) = T (?k)??. But since 0 ? ?k ?M, ?k(M ? ?k) ? 0 ?k ? N. (1.29) Because the derivatives of ?k are uniformly bounded we have that 17 ? sup? ?T (M?)? 1, there exist positive definite functions defined on {(k1, ? ? ? , kn) ? Zn | |ki| ? M} such that they cannot be extended to positive definite functions on Zn. And another decade later it was Rudin who expounded and extended their result to Rn [52]. Theorem 11. (Rudin) For n > 1, there exist continuous positive definite functions f : In ? C, where I = (?a, a) is a symmetric interval, such that f cannot be extended to a positive definite function on Rn. The proof is based on a relationship between positive definite functions and sums of squares of polynomials, a problem that had been studied by Hilbert in the 19th century. Hilbert discovered that there exist nonnegative polynomials of three 28 variables that are not sums of squares, and later examples for n = 2 were found as well [33]. Theorem 12. (Hilbert) For a given n and d ? N, every nonnegative polynomial of degree d in n variables can be expressed as a finite sum of squared polynomials if and only if one of the following are true: ? n = 1 ? d = 2 ? n = 2 and d = 4. Given a finite set ? ? Zn, define a subset of the trigonometric polynomi- als which are supported on V = ? ? ?. X? will be the set of all trigonometric polynomials of the form ? f(?) = c(x)e?2?i??x. (1.57) x?V We will call these ?-polynomials. It is obvious that the coefficcients c(x) can be retrieved from the Fourier transform of f . Define P? as the subset of X? of nonneg- ative trigonometric polynomials. Then let Q? be the set of sums of squares?that is functions of the form ?J ?? ?2f(?) = gj(?)? , gj ? X?. (1.58) j=1 Immediately we have that Q? ? P? ? X?. The goal is to establish a correspondence between Q?, P? and the positive def- 29 initeness of c(?). Then Hilbert?s result will tell us when positive definite extensions are possible. With that in mind refer to PD(V ) and PD(Zn) as the set of positive definite functions on V and Zn respectively. Given a function ? defined on V such that ?(?x) = ?(x), we can define a real functional on X? by ? L?(f) = ?(x)f?(x). (1.59) x?V In fact, since X? is finite dimensional, it is easy to see that every linear functional on X? is of this form for some ?. The following two results establish the connection to positive definite sequences. Proposition 3. A function ? on V is in PD(?) if and only if L?(f) ? 0 for every f ? Q?. Proof. Let ? g(?) = c(x)e?2?i??x. (1.60) x?? Then we have ?? ?? ?2g(?) = c(x)c(y)e?2?i??(x?y) x?,y?? L (|g|2? ) = c(x)c(y)?(x? y). x,y?? So L (|g|2? ) is positive regardless of the choice of g if and only if ? is positive definite. By the linearity of L?, the same is true for any f ? Q?. Proposition 4. A function ? defined on V can be extended to a member of PD(Zn) 30 if and only if L?(f) ? for all f ? P?. Proof. If ? ? PD(Zn) then by Bochner?s theorem there is a nonnegative measure ? ?M(Tn) such that ? ?(x) = e2?i??x d?(?). (1.61) Tn If f ? X? then by Parseval?s formula we have ? ? L?(f) = f?(x)?(x) = f d?. (1.62) Tnx?V Then L?(f) ? 0 if f ? P?. For the other implication, suppose L?(f) ? 0 for all f ? P?. Without loss of generality assume |f | ? 1. Because L?(1) = ?(0) we have? that L? ?(f) = L?(1? f)? ?(0) ? ??(0) and L?(f) = ?(0)? L?(1? f) ? ?(0), so ?L?(f)? ? ?(0). If ?(0) = 0 then L? = 0 on X? and ? = 0 on V , and the result is true. Otherwise without loss of ?gene?rality assume that ?(0) = 1. Then L? is a bounded linear functional with ?L ?? = 1, and by the Hahn-Banach theorem it extends to a linear functional of norm 1 on C(Tn), which is identified with a measure ? ?M(Tn). We have that for f ? X?, ? L?(f) = f(??) d?(?). (1.63) Tn Because 1 = L?(1) = ?(Tn) ? ???TV = 1, we can conclude that ? ? 0. We have 31 that ? ?(x) = L (e2?i??x? ) = e 2?i??x d?(?), (1.64) Tn so ?? is an extension of ? to Zn. By Bochner?s theorem it is positive definite, so the result is proved. Then we complete this process by coming to the following result. Theorem 13. (Rudin) Given a finite set ? ? Zn, P? = Q? if and only if every ? ? PD(?) can be extended to a function in PD(Zn). Proof. The proof will be in two parts. First we must show that Q? is closed in P?, and then we go on to prove the result. ? Say that dimX = |V | = d. Let r > d, and f = r? i=1|g | 2 i , where gi ? X?. There is a nontrivial set of (?)ri=1 ? R such that ?r ? |g |2i i = 0. (1.65) i=1 Assume without loss of generality that ?i ? ?j whenever i ? j. By solving (1.65) for |g |2r , we get that ?r?1 ( ) f = 1? ?i |gi|2 . (1.66) ? i=1 r So f is a sum of r?1 squares. By repeating this process iteratively, we can conclude that each f ? Q? is a sum of at most d squares. Now find a sequence of (fn) ? n=1 ? Q? such that fn ? f ? X? uniformly. For 32 each fn there are d ?-polynomials (gjn) d j=1 such that ?d ?? ??2fn = gjn . (1.67) j=1 Each fn is uniformly bounded, hence gjn are also. We can conclude that there is a subsequence (ni) ? i=1 such that for each 1 ? j ? d and for all x ? ?, there exist cj(x) such that lim g?jn (x) = cj(x). (1.68) ?? ii Then it is clear that for ? g (?) = c (x)e?2?i??xj j , (1.69) ? ? ? x??2 f = ? ?j gj . We conclude that f ? Q?, thus Q? is a closed subset of P?. Now we will proceed to the second part of the proof. Assume that P? = Q?. Let ? ? PD(?). We know from proposition 4 that L?(f) ? for all f ? Q?. Since Q? = P?, we have also L?(f) ? 0 for f ? P?, so by proposition 4, ? can be extended to a function in PD(Zn). For the other direction, assume that P? 6= Q?. Then there exists some f0 ? P? such that f0 ?/ Q?. Because Q? is a closed convex set, the Hahn-Banach theorem guarantees the existence of a hyperplane in X? that separates Q? from f0. In other words there is a bounded linear functional L such that L(Q?) ? 0 and L(f0) < 0. Because X? is finite dimensional, there exists a function ? on V such that L = L?. By propositions 3 and 4, we can conclude that ? ? PD(?) but can?t be extended to a function in PD(Zn). The proof is complete. 33 Rudin contributed once more in 1970 to extend the one-dimensional theorem to one for radially symmetric functions [53]. Theorem 14. (Rudin) If f is positive definite on B(0, r) ? Rn and radially sym- metric, that is f(x) = ?(|x|), then f can be extended to a positive definite radially symmetric function on Rn. 1.9 My Contributions The remainder of this dissertation will be focused on the following algorithm: ?# = argmin???TV such that ??(m,n) = ymn, (1.70) ??M(T2) ?? is the Fourier transform and ymn is the a priori known values of the Fourier transform, for ?N ? m,n ? N. In Chapters 2 and 3, I will introduce concepts in measure theory and Choquet theory respectively to address the motivating question: which measures can be recovered uniquely through (1.70)? We have seen some results that give unique recovery for certain kinds of dis- crete measures, so in Chapter 2 we create tools to try and recover measures which are not discrete, focusing on measures defined on smooth manifolds. There are dif- ficult limitations to this task, though we generate ideas that may be fruitful with future research. But the difficulties provide motivation not only to focus in on the relationship between delta measures and (1.70), but if possible to try and represent all solutions in terms of discrete measures. 34 Chapter 3 discusses a field of convex geometry called Choquet theory, which is concerned precisely with representation problems of this type. We will see that delta measures form the so-called extreme points of the total variation norm, and as such are well suited to be solutions to the algorithm in (1.70). In Chapter 4 we use results from Choquet theory to state our main results. Our Theorem 30 says that if we restrict to positive solutions, any initial data y admits at least one solution ?# which is discrete, and thus our only hope for unique recovery can come from discrete measures. Theorem 31 generalizes this result to signed and complex measures. Then our Theorem 33 gives a novel sufficient condition on the initial data y and integers N such that (1.70) yields a unique result. 35 Chapter 2: Measure Theory and Surface Measures 2.1 Definitions and Notation One thing we are interested in is whether it is possible to find application to a broader category of measures than discrete ones. For example, is there a circum- stance in which we can use total variation methods to recover a surface measure? Or perhaps a singular measure like the Cantor measure? As it turns out, there are some fundamental difficulties inherent in the geometry of algorithm (1.70) which make these problems difficult. In order to explore these questions more fully will require a deeper understanding of measure theory. First some preliminaries on geometric measure theory. The field was developed in the mid 20th century primarily to work on problems related to energy-minimizing surfaces. Early pioneers include Wendell Fleming and Herbert Federer, the latter of whose textbook remains a fundamental source on the field today [29]. As the theory developed it found application to many more problems in both analysis and geometry and has become a staple of the mathematical landscape. Fundamental concepts include rectifiable sets, which are generalizations of sets witch admit tangent spaces, and integral currents, which generalize the ideas of manifolds [47]. Fourier analysis has played an ever increasing role in the research of modern 36 geometric analysts, such as Kenig and Toro, Hofmann, Mitrea and Taylor, Hofmann, Martell and Uriarte-Tuero [47]. Two excellent sources on the development of Fourier analysis in geometric measure theory are by Kahane and Salem [35], and a more recent book from Mattila [47]. While the specifics of these works aren?t of interest to us for the most part, the machinery involved is, particularly results which might shed insight on the behavior of the Fourier transform. We hope to use those results in conjunction with our work on positive definite measures. As going forward I will begin to look at more complicated measures I will make some definitions in order to be precise with terminology. For the n-dimensional Lebesgue measure in Rn (or Tn, when applicable), I will write Ln. I refer sometimes to a surface measure, which I will denote ?K , where K is the image of a rectifiable curve if n = 2, or some (n? 1)-dimensional surface for general n. Frequent choices are K = Sn?1 for the surface measure on the unit sphere, or K = {(x1, ? ? ? , xn) | xn = 0}. The surface measure may be defined in a few equivalent ways. Here I will say that it may be defined by an isometry from a rectifiable curve to (R,L1). Definition 4. Let ? : [0, 1]? R2 be a unit speed nonintersecting curve with image K. ? (A) = L1(??1K (A)). This construction is also called a push-forward of L1 under ?. It turns out that this definition is equivalent to the 1-dimensional Hausdorff measure restricted to K also, which I will define. In addition, in the cases where K = Sn?1 or K = {(x1, ? ? ? , xn) | xn = 0}, K is a topological group under rotations and horizontal translations respectively, and 37 therefore has a unique Haar measure. Because the Lebesgue measure?and therefore the Hausdorff measure as well?is invariant under each of those transformations, it can be seen easily that the unique Haar measure is equivalent to the Hausdorff measure definition up to a constant. Denote by M(X) the set of Radon measures. We say a measure ? on X is Radon if it is Borel-regular and locally finite. ? is Borel regular if for each A ? X, there is a Borel set B ? A such that ?(B) = ?(A). If in addition |?(X)| < ?, then we say that ? is bounded and we denote the set of bounded Radon measures Mb(X). In general for this chapter I refer to positive real-valued measures, but for later purposes I use this notation for complex-valued measures as well as specified. In addition, in this chapter it becomes necessary at times to distinguish when a measure is bounded or unbounded. In all other chapters we will only be concerned with bounded measures, so I will use the notationM(X) to mean the set of bounded measures. Definition 5. The Hausdorff measure Hs in Rn is defined by Hs(S) = limHs ? ? (S), (2.1) ? 0 where for ? > 0, ?? ? Hs?(S) = inf?? ? ??(s)2?s ?d(Ej)s ? S ? ?jEj, d(Ej) < ?? . (2.2) j d(A) is the diameter of A, and ?(s) is a positive constant, which may be scaled so 38 that when s is an integer it agrees with the volume of the s-dimensional unit ball. If they agree, then in R2, Ln = Hn. Likewise, the Hausdorff dimension of a set S ? Rn is dimS = inf{s | Hs(S) = 0} = sup{s | Hs =?}. (2.3) 2.2 Energy Integrals Understanding Hausdorff measures is important, but what we are really inter- ested in are more general measures, defined on lower-dimensional sets. We would like a natural way to relate a measure supported on a set to the dimension of its support. As it turns out, one of the first central results in geometric measure the- ory does just that. The goal of the following lemma is to aid in computing lower bounds for Hausdorff dimensions. From the definition of Hs, it seems that comput- ing lower bounds might be much more difficult that upper bounds, because we must prove bounds for arbitrary coverings. Frostman?s lemma gives us a way to pseudo- approximate from below, by showing the existence of measures that ?behave well? on balls that approximate the set from below. Theorem 15. (Frostman?s Lemma) For a Borel set S ? Rn, Hs(S) > 0 if and only if there is a measure ? ?M(Rn) with supp(?) ? S such that ?(B(x, r)) ? rs ?x ? Rn, r > 0. (2.4) 39 I will not prove the theorem here, but I will point out that one direction is immediate. If we have such a ? satisfying the condition, then given any covering of S with balls (Bj), ? ? d(Bj) s ? ?(Bj) ? ?(S) > 0. (2.5) j j An immediate consequence of Frostman?s lemma is that dimS = sup{s | ?? ?M(S) such that (2.4) holds}, (2.6) which is a demonstration of the lemma?s usefulness in computing Hausdorff dimen- sions. We now have a way to relate geometry of sets and measures. A natural question is whether we can go in the reverse direction, and find the ?dimension? of a measure in some sense? The answer is that it is much more difficult, but there are some things we can do. Begin by defining the s-energy of a measure, which will quantify the criterion given in (2.4). ?? Is(?) = |x? y|?s d?(x) d?(y). (2.7) We can relate this to equation (2.4) through a change of variables. ? ? ? | ? ?(B(x, r))x y|?s d?(y) = s dr (2.8) 0 r s+1 If ? satisfies (2.4), then we have that if 0 < t < s, and if ? is compactly supported 40 with diameter ?M , ?? M ??(B(x, r)) M I (?) = t dr d?(x) ? rs?t?1t dr 0 and construct A = {x | |x ? y|?s d?(y) < M} such that ?(A) > 0. Then for all x, r > 0, ? (B(x, r)) ? 2sMrsA . So ?A satisfies (2.4). We can conclude with a new full characterization of the Hausdorff dimension in terms of the energy of supported measures. dimS = sup{s | ?? ?M(S) such that Is(?) 0, then Is(LnA) 0. In fact, because of how ? was defined, an immediate consequence of this fact is that HsC is equivalent to ? up to a constant. Lastly, we can combine this result with equation (2.9) to conclude that It(?) 0. (2.27) |x|?? Proof. Assume for contradiction that such a measure ? ? M(C) exists, such that lim ??(x) = 0. Without loss of generality we may actually assume less; that the Fourier series coefficients ??(k), k ? Z t?end to 0. Let ? ? S(R) be a Schwartz function such that supp(?) ? (1/3, 2/3) and ? = 1. For j ? N, define ?j(x) = ?([3jx]), where [x] represents the fractional part of x. First, I claim that for all x ? C, j ? N, [3jx] ? C. This is clear by recognizing that x ? [0, 1] is in the Cantor set if and only if its base-3 representation contains no 1?s. Multiplication by 3 in base-3 is represented by shifting each digit to the left, so for any j ? N, the base-3 representation of [3jx] also contains no 1?s, and is still in C. Using this fact, we know that for all j ? N, supp?j?C = ?. We can represent ?? 2?ix3j?j(x) = ??e k. (2.28) k=?? Therefore ??j(3 jk) = ??(k), and all other Fourier coefficients are 0. By Parseval?s 48 theorem, we have ? ?? ?? 0 = ?j d? = ??j(k)??(k) = ??(k)??(3 jk) k?=?? k=?? = ??(0)??(0) + ??(k)??(3jk) |k|>0 Because the sum ? ??(k)??(3jk) (2.29) |k|>0 converges absolutely, and limj sup|k|>0 ??(3 jk) = 0, the sum tends to 0 as j ? ?. But recognize that ??(0)??(0) = ?(C), and hence we have shown that ?(C) = 0, which is a contradiction. I have demonstrated the proof only for the standard 1/3-Cantor set, but the proof is easily generalizable to any number of other Cantor-type sets, including those formed by removing intervals of ratio other than 1/3. While we are on the subject I will take the opportunity to do one more cal- culation, for the Fourier transform ??. For this, return to the representation of C as the numbers which contain no 1?s in their base-3 representation. With that in mind, we can construct C as a closed union C = ??k=1Ek, where {?k ? ? ?E = c 3 l ? }k l ? cl = 0 or 2, ?l . (2.30) l=1 Then define the measure ? ? = 2?kk ?e. (2.31) e?Ek 49 It is clear that ?k converges weakly, so by the same symmetry arguments we used in defining ?, we can show that ?k ? ? weakly. Then on the frequency spectrum, we can write ? ? ? ?? (n) = 2?k e?2?ien k ?l = 2?k e?2?in l=1 el3k , (2.32) e?Ek e?Ek where e is the lthl decimal point of e in base-3. A basic calculation shows then that ?k 1 + e?4?i3?jn ?k ?j ??k(n) = = e ?2?i3 ncos(2?3?jn). (2.33) 2 j=1 j=1 ? Noting that k ?j ?kj=1 3 = (1? 3 )/2, ?? ?? (n) = e??i(1?3 ?k)n k cos(2?3 ?jn). (2.34) j=1 Let k ??, and we finally have ?? ??(n) = e??in cos(2?3?jn). (2.35) j=1 ? It is easy to verify that ?? does not tend to 0 as n ? ?. For k ? N, ??(3 k) = ? ? k?jj=1 cos(2?3 ) is constant with respect to k and nonzero. Given the above examples, it seems perhaps difficult to construct Salem sets. Indeed there are few explicit constructions, especially with Hausdorff dimension not an integer. However probabilistic methods are more fruitful. Particularly there are good results for sets that are produced from Brownian motion [34]. 50 Theorem 18. If ? : [0,?)? Rn is a random Brownian motion, and S ? [0,?) is a Borel set, then the following is true almost surely. 1. If n ? 2 or dimS ? n/2, then dim?(S) = 2 dimS. 2. If n = 1 and dimS > 1/2, then L1(?(S)) > 0. 3. ?(A) is a Salem set. The only known explicit examples of Salem sets of non-integer dimension are of a form discovered by Kaufman, which takes advantage of famous results on Dio- phantine approximation [36]. Let ? > 0 and let E be the set of x ? R such that for infinitely many rationals p/q, |x? p/q| ? q?2??. (2.36) E is a Salem set with dimension 2/(2 + ?). Recent results have generalized this to Rn, so constructive Salem sets are now known to exist in every dimension [32]. 2.6 Fourier Decay of Surface Measures Here I go through an important derivation of the Fourier transform of the surface measure ? ? M(Rn), defined by ? = Hn?1Sn?1 . First consider a generic radial function. It is a basic fact from calculus that if f ? L1(Rn), then ? ? ? ? f(x) dx = f(rx)rn?1 dr d?(x). (2.37) Rn Sn?1 0 51 We will need another related formula for rewriting the surface integral. Let x0 ? Sn?1 be fixed and define S? = {x ? Sn?1 | x ? x0 = cos(?)}. S? is an (n ? 2)- dimensional sphere lying in Sn?1, with radius sin(?). Call the surface measure on S by Hn?2 = ?n?2? S ? , and note that ? n?2 ? (S?) = b(n) sin n?2 ?, where b(n) is the ? surface area of the unit (n? 2)-ball. Then we can write ? ? ? ? f d? = f(x) d?n?2? (x) d?. (2.38) Sn?1 0 S? Then for a radial function f(x) = ?(|x|), we can compute for the Fourier transform using these two identities: ? ? ? ? f?(rx0) = f(y)e ?2?iy?rx0 dy = ?(s)sn?1 e?2?iy?rx0 d?(y) ds. Rn 0 Sn?1 We can take advantage of the fact that for y ? S , e?2?iy?rx0 = e?2?ir cos ?? is constant with respect to y. ? ? ? ? f?(rx0) = ?(s)s n?1 e?2?ir cos ?b(n) sinn?2 ? d? ds. (2.39) 0 0 With the change of variable t = ? cos ?, this becomes ? ? ? 1 ?(s)sn?1 e2?isrt(1? t2)(n?3)/2 dt ds, (2.40) 0 ?1 which is in the form of a Bessel function. Using the definition of Bessel functions 52 Jm : [0,?)? R, ? (u/2)m 1 J (u) = eiut(1? t2)m?1/2m dt, (2.41) ?(m+ 1/2)?(1/2) ?1 we can substitute and obtain the formula for f? , ? ? f?(rx) = ?(s)sn?1?c(n)(rs) ?(n?2)/2J(n?2)/2(2?rs) ds 0 ? = c(n)r?(n?2)/2 ?(s)J n/2(n?2)/2(2?rs)s ds. 0 We can now use this formula to derive some decay estimates. Using the following two classical properties of Bessel functions, Jm((t) ? C(m) )t ?1/2 t > 0, (2.42) d tmJm(t) = t mJm?1 (t) , (2.43)dt we can quickly calculate the Fourier transform of the unit ball. ? c(n) ?? ??B(0,1)(x) = |x|(2?n)/2Jn/2(2?|x|s)sn/2? 2?|x| ? s=1 = C(n)|x|?n/2Jn/2(2?|x|). Using (2.42), this satisfies the decay ??B(0,1)(x) ? C(n)|x|?(n+1)/2. Then finding a bound for the surface measure ? on Sn?1 is as simple as recognizing that we can 53 define it as a weak limit of the functions ??1?B(0,1+?)\B(0,1), ? ? 0. (2.44) Then we get the formula ? 1+? ??(x) = lim c(n)|x|?(n?2)/2??1 J n/2(n?2)/2(2?rs)s ds ??0 1 = c(n)|x|?(n?2)/2J(n?2)/2(2?|x|). This has a decay rate of ??(x) . |x|(1?n)/2. 2.7 Discussion Our goal is to find options for recovering non-discrete measures. We know that certain surface measures cannot be recovered by means of total variation min- imization [6]. I consider two main candidates for accomplishing this: we may either minimize for something other than total variation or limit the optimization to a subspace of M(T2). If we wish to recover surface measures a natural question is: what is a possible subspace S ?M(T2) which contains some or all 1-dimensional surface measures but no measures of dimension 0? We can rule out S being the set of all surface measures of rectifiable curves, because the weak-* closure of this set is M(T2)?for example, a sequential limit of surface measures on concentric circles with decreasing radius will converge to a point mass. Therefore nothing is gained by limiting optimization 54 to S; we must choose a subset which is closed. Limiting the decay rate is also tempting. Take S = {? ? M(T2 | ??(x) ? C|x|?1/2}, for some aptly chosen C > 0. We know from our derivation in the prior section that S will contain at least some surface measures, and cannot include any measures with support smaller than 1 dimension. The problem is with the choice of C. Any solution to (1.70) will necessarily fall on the boundary of S, so the choice of C has an effect on the outcome of the algorithm. This may be a good option. If, for example, we know that the desired solution is a circle with prescribed radius, we may find the correct value of C by direct calculation by means of the formula in the previous section. However this method requires a large amount of prior knowledge which may not be realistic. One more option to explore is using the energy integrals Is as an alternative or addendum to the total variation norm. Consider an algorithm, ?# = argmin???TV + ?Is(?) such that F (?) = y, (2.45) ??M(T2) for some appropriately chosen s and ? > 0. This algorithm would exclude any discrete measures as solutions, but beside that fact the results are somewhat unclear. One problem is that for ? a surface measure, I1(?) =?, so we must choose s < 1, which will also allow potentially for solutions with dimension between 0 and 1. Otherwise this territory is fairly unexplored to the author?s knowledge, and I would recommend it as a topic of future research. The relationship between total variation, Hausdorff dimension and the Fourier 55 transform is rich and complex. There are relationships which may be exploited, but dealing with them takes caution. The moral of this chapter is that it is difficult to divorce the total variation norm on a continuous space from the discrete measures. Particularly to note is the fact stated previously that a discrete measure may be expressed as a limit of surface measures or continuous measures (approximate iden- tity), but the converse is not true. In the next chapter I will explore the question of why the discrete measures are unique in this regard, and I get a result that confirms our expectations that unique recovery of non-discrete measures is impossible with total variation minimization. 56 Chapter 3: Choquet Theory 3.1 Choquet?s Theorem Choquet theory is concerned with convex sets and their extreme points. Ex- treme points can provide a characterization of convex sets, in particular if they are compact. The content of this section is focused on results that provide representation theorems of convex sets, and points within convex sets, centered on their extreme points. For a motivating example I begin with a classical theorem of Carathe?odory [19]. Theorem 19. Let a point x fall in the convex hull of a set E ? Rn. x can be written as a convex sum of at most n+ 1 points in E. An easy corollary is that if E consists of n+1 affinely independent points (the points xk ? x1 are linearly independent for k > 1), then the convex sum is unique. In this case the convex hull of E is a simplex, and E are its vertices. From this it is easy to see that the bound n + 1 is the best that can be done. If E is affinely independent and x lies in the interior of E, then x is affinely independent from any strict subset of E, therefore it can only be written as a convex combination of all n+ 1 elements of E. 57 Definition 6. A point x in a convex set S in a locally convex topological vector space is an extreme point if for any a, b ? S, a 6= b, and for any t ? (0, 1), x =6 ta+(1? t)b. Denote the set of extreme points of S by ex(S). Definition 7. Let E be a subset of a locally convex topological vector space. The convex hull of E is the smallest convex set containing E. The closed convex hull of E is the smallest closed set containing E. Extreme points are in a sense generators of compact convex sets. Because they cannot be written as convex combinations of other elements, they carry essential in- formation about the set. The Krein-Milman theorem tells us that they are sufficient to characterize their underlying sets as well [39]. Theorem 20. (Krein-Milman) If X is a locally convex topological vector space and S is a compact convex subset of X, then S is the closed convex hull of its extreme points. Results of this sort in finite dimensions date back to the early 20th century, first attributed to Minkowski and Carathe?odory [50]. Carathe?odory?s theorem 19 suggests the idea of simplices as a basic archetype of a convex set. As we will see in section 3.3, Choquet theory develops tools to generalize the idea of the simplex to infinite dimensional space. The insight of Choquet to convex geometry is generalizing the idea of a convex combination to infinite-dimensional domains, by considering them as integrals rather than sums. Now instead of being represented by the nonnegative weights of a finite 58 sum, elements of a convex set can be represented as an integral over a nonnegative measure supported on a generating set. Definition 8. Let X be a locally convex topological vector space and S be a nonempty compact subset. If ? is a probability measure on S, we say that a point x ? X is represented by ? if ? f(x) = f(t) d?(t) for all affine functions f on X. (3.1) S We say that f : X ? R is affine if for all x, y ? X, ? ? R, f(?x+ (1? ?)y) = ?f(x) + (1 ? ?)f(y). Note that any point x is trivially represented by the delta measure ?x. For example, in the finite dime?nsional case we might take x?? R N , and stan- dard basis E = (e Nn)n=1. Let x = n ?nen, where ?n > 0 and n ?n = 1. This mean?s that x is a convex combination of basis vectors. Alternatively, we can let ? = n ?n?en be a measure on RN . For all y? ? (RN)?, ? ?N y?z d?(z) = ?ny ?en = ?y,x? . (3.2) E n=1 Hence x is represented by the probability measure ? in a way that is analogous to writing it as a convex combination. Note also that the?restriction that ? be positive and norm one is equivalent in this case to ?n > 0 and n ?n = 1. For an example that is less trivial, I want an infinite dimensional space. Let X =M (T2b ), the set of bounded complex measures on the Torus, which is relevant 59 since I will return to it when I discuss superresolution. Let S = {?x | x ? T2}, and ? be the 1-dimensional surface measure on the set {(x1, x2) | x2 = 0}. It is easy to see that ? cannot be written as a finite convex combination of delta measures, but I wish to show that it can be represented by a probability measure. Proposition 5. Let ? ?Mb(S) be defined by ?(A) = ?({x ? T2 | ?x ? A}). Then ? represents ?. Proof. First observe that ? is well defined, and that it is a positive measure, with ??? = 1. Let?f ? X ? be a continuous linear functional onM (T2b ). We wish to show that f(?) = f d?. Note that because of how ? is defined, (S,?) and (T2, ?) are S isomorphic as measure spaces under the isomorphism ?x ?7 x. We can conclude that ? ? f(?) d?(?) = f(?x) d?(x). (3.3) S T2 Now consider the sequence of measures 1 ?N ?N = ?( n ,0). (3.4)N N n=1 ?N converges to ? weakly. We can conclude that ? ? f(?x) d?(x) = lim f(?x)(d?N(x)T2 N?? T2?N )1 = lim f ?( nN?? N ,0N ) n=1 (3.5) = lim f(?N) N?? = f(?). 60 Therefore ? represents ?. One thing to note is that the final inequality requires that f is continuous with respect to the weak topology on M 2b(T ). This is significant, as it is a requirement for Choquet?s representation theorem for S to be compact, which it is in this case only under the weak topology. The ultimate goal of this machinery is to say that we can actually represent all points in a convex set by measures on its extreme points, which was achieved by Choquet in 1956 [22]. Theorem 21. (Choquet-Bishop-De Leeuw) Let S be a compact convex subset of a locally convex space X. Let x ? S. Then there exists a probability measure ? ? Mb(X) such that ? vanishes on every Baire subset of S that doesn?t contain an extreme point of S, and that for any bounded linear functional f on X, ? f(x) = f d? (3.6) S Choquet initially proved the theorem under the slightly stricter case where X is metrizable. Bishop and De Leeuw found the above strengthening three years later. 3.2 Application to Completely Monotone Functions Choquet?s theorem has found application to a variety of subjects, providing elegant proofs of some well-known representation theorems [3, 24, 28, 50]. I will go 61 through an example here to provide context for the theory and demonstrate its wide application, and to illustrate the recurring themes of convex geometry that I wish to exploit. A function f : (0,?) ? R is said to be completely monotonic if it is smooth and satisfies (?1)nf (n) ? 0 ?n ? N. (3.7) For example, e??x and x?? are completely monotonic. There is a theorem of Bern- stein that characterizes completely monotone functions in terms of exponentials of the form e??x [7]. This result is particularly interesting in the context of the Laplace transform. In a certain sense it can be viewed as a version of Bochner?s Theorem for the Laplace transform. For more details see Widder?s comprehensive exposition of the Laplace transform [60]. Phelps gives a proof of his theorem for bounded completely monotonic functions, using methods from Choquet theory [50]. Theorem 22. (Bernstein) Let f be a completely monotone function. Then there exists a unique nonnegative measure ? ?M[0,?] such that ?([0,?]) = f(0+) and ? ? f(x) = e??x d?(?). (3.8) 0 The rest of this section will be dedicated to a proof of Bernstein?s theorem. The idea is clear from the representation formula. We wish to find an adequate topology on the set of completely monotone functions and a subset which is compact and convex, so that the extreme points are precisely the functions e??x, ? ? [0,?]. 62 Let X be the convex cone of bounded completely monotonic functions. Note that because f ? X is necessarily strictly nonnegative and nonincreasing, f is bounded if and only if f(0+) < ?. Define S ? X as the set of all such functions for which f(0+) ? 1. Choose the topology such that fk ? f if and only if for all n ? N (n), ?fk ? f?? ? 0. By definition, this is a locally convex topology. We wish to show that in this case, S is compact. We can construct the chosen topology of uniform convergence with a countable family of seminorms, p (f) = sup{|f (k)m,n (x)| | 1/m ? x ? m, 0 ? k ? n}. (3.9) By a standard construction, X is metrizable, with the metric ?? pm,n(f ? g) d(f, g) = . (3.10) 1 + p (f ? g) m,n=1 m,n Under this definition, a set is bounded if and only if it is bounded in each semi- norm. Therefore, given a bounded sequence of functions, the sequence is uniformly equicontinuous on each derivative and each compact set. An application of the Arze?la-Ascoli theorem, along with a diagonalization argument shows that the se- quence has a convergent subsequence. Therefore every closed and bounded set is compact. S is clearly closed, so it remains to show that S is bounded. Lemma 23. Let S = {(?1)nf (n)n | f ? K}. Let a > 0 and n ? 0. Then Sn is 63 uniformly bounded above on [a,?) by a?n2(n+1)(n/2). Proof. The proof is by induction. For n=0, the lemma holds because for any f ? K, f(0+) ? 1 and f is strictly nonincreasing. Assume for arbitrary k ? N that the lemma holds for Sk. Let a > 0. By the mean value theorem we have the existence of a point c ? [a/2, a] such that a f (k+1)(c) = f (k)(a)? f (k)(a/2). (3.11) 2 Then using monotonicity of functions in Sk and the induction hypothesis, we have (a/2)?k2(k+1)(k/2) ? (?1)kf (k)(a/2) (3.12) ? (?1)k+1af (k+1)(c) (3.13) 2 ? (?1)k+1af (k+1)(a). (3.14) 2 The inequality follows, and because of the monotonicity of f ? Sk, it will hold for x ? a as well. We have now shown that we have an adequate setting for Choquet?s theorem to apply, and all that remains is to characterize the extreme points of S to obtain the desired representation theorem. Lemma 24. The extreme points of S are precisely the functions of the form f(x) = e??x, where 0 ? ? ? ?. Proof. A straightforward computation shows that any such function falls in S. First 64 I will show that any extreme point must take this form. Let f be an extreme point of S. Let x0 > 0 and define u(x) = f(x + x0) ? f(x)f(x0). I claim that f ? u ? S. Note first that (f + u)(0+) = f(0+) + f(x0)? f(0+)f(x0) = f(0+)(1? f(x0)) + f(x0) ? 1, (f ? u)(0+) = f(0+)? f(x0) + f(x0)f(0+) = f(x0)f(0 +) + (1? f(x0)) ? 1. And for k ? N, |u(k)(x)| = |f (k)(x+ x0)? f(x0)f (k)(x)| ? max{|f (k)(x+ x0)|, |f(x0)f (k)(x)|} ? (?1)kf (k)(x). Therefore (?1)k(f ? u)(k)(x) ? 0, and f ? u ? S. Because we made the assumption that f is an extreme point of S, u ? 0. Hence for any x, x0 > 0, f(x + x0) = f(x)f(x0). Because f is continuous, it must be of the form e??x. Because of the restriction that f ? ? 0, ? ? 0. Note that I am allowing ? =? for the case f ? 1. 65 Finally, what remains is to show that every function e??x is an extreme point. We can solve this by taking note of the mapping f(x) 7? f(rx) for r > 0. This mapping is a bijection of S onto itself and preserves convex combinations, so it takes extreme points to extreme points. Since the Krein-Milman theorem implies that there is at least one nonconstant exponential function e??x, which is an ex- treme point of S, we can generate any such exponential function e??rx by the above mapping to show that it is an extreme point. The lemma is proved. It follows from Choquet?s theorem th?at for any f ? S there exists a probability measure ? ?M(ex(S)) such that f(x) = d?. Now, because the correspondence ex(S) T : [0, 1] ? ex(S) given by ? 7? e??x is actually an homeomorphism, we have a natural correspondence fromM[0,?] toM(ex(S)). We can identify ? ?M(ex(S)) with T? (?) ?M[0,?] by T? (?)(A) = ?({e??x | ? ? A}) and it turns out that ? ? f(x) = e??x dT? (?)(?), (3.15) 0 as intended. To show?uniqueness, one ?needs only to use the Stone-Weierstrass theorem.? ? Suppose that e??x d?(?) = e??x d?(?) for all x > 0. If A is the subspace of 0 0 C[0,?] generated by functions of the form ? 7? e??x, then ? = ? as functionals on A. Since A separates points of [0,?], it is dense, and therefore ? ? ?. 66 3.3 Uniqueness and the Choquet Simplex While Choquet?s theorem is a great tool for proving existence, it doesn?t say anything about uniqueness. In fact we don?t need to go further than the finite dimensional case to see examples of the failure of convex representations to be unique. For example, the unit square in the plane has extreme points (?1,?1), and the origin is a convex combination of either of the two diagonal vertices. In fact any point in the interior has at least two convex representations. The unit ball by contrast is in a sense ?maximally nonunique,? in that any point in the unit ball is either an extreme point or has uncountably many representations as convex combinations of extreme points. The prototypical example of a set where Choquet-type representations are unique is a simplex. Proposition 6. Each point in a finite-dimensional simplex has a unique represen- tation as a convex sum of its vertices. Proof. Proceed by induction. For k = 0 is trivial, as the 0-dimensional simplex is a single point. Assume the hypothesis is true for k. Refer to the k-dimensional simplex as ?k, with vertices {x , ? ? ? ,x }. ?k+11 k is formed by adding a new vertex xk+1, linearly independent from ? k, and taking the convex hull of the result. Let x ? ?k+1, without loss of generality x 6= xk+?1. Since x is in the convex hull of the vertices {x1, ? ? ? ,xk+1}, we can write x = k+1m=1 cmxm. Consider the line ` = {tx + (1 ? t)xm | t ? R}. ` passes through both the points xk+1 and x, therefore 67 it passes through the face of ?k+1 opposite x kk+1, which is identified with ? . The point x0 ? ` ??k has for some t ? R, ??? ?k+1 ?kx0 = t cmx ?m + (1? t)xk+1 = (1 + ck+1) cmxm, (3.16) m=1 m=1 which is in ?k. By the induction hypthesis, (c1, ? ? ? , ck) are unique, therefore xk+1 is also unique. As an obvious consequence of this, the vertices of a simplex are also shown to be its extreme points. Carathe?odory?s theorem can be seen as an immediate con- sequence of this result as well. Any n-dimensional polyhedron may be decomposed into simplices. If the polyhedron has more than n + 1 vertices, then the decompo- sition is not unique. For any point in the interior of the polyhedron, it has at least one different Choquet representation for each choice of simplicial decomposition. The generalization of this concept is the Choquet simplex, a construction that generalizes the notion of simplices to infinite-dimensional vector spaces in a way that guarantees that points in Choquet simpleces have unique representations as measures on the extreme points. Here I will define the Choquet simplex and prove Choquet?s uniqueness theorem for metric spaces. The definition will require a few more constructions. Say that S is a compact convex set in a real locally convex space X. I will be working with the cone of S, so I will assume that S falls in an affine hyperplane in X that does not intersect the origin. Denote the cone C = {?x ? X | x ? S, ? ? 0}. If S falls in a hyperplane as described, then for all x ? C, there is a unique ? ? 0 such that ?x ? S. We say 68 that S is a base of the cone C. The cone C defines a partial ordering on the space X. Let x  y if y? x ? C. Since S falls in a hyperplane that misses the origin, C ? ?C = {0}, so x  y and y  x implies y = x. On the other hand if x  y  z, then y ? x = ?u, z ? y = ?v, where ?, ? ? 0 and u, v ? S. Then ( ) ? ? ?u+ ?v = (? + ?) u+ v , (3.17) ? + ? ? + ? which is in C, so x  z. Note as well that for x ? X, the set Cx = {y ? X | x  y} = C + x, so the partial order is also translation invariant. Finally, we say that for two elements x, y ? X have an upper bound z if x  z and y  z. The upper bound is a least upper bound if for every upper bound z0, z  z0. Finally, we define a convex set S that is the base of a cone C to be a Choquet simplex if and only if the space C ?C has the property that for each x, y ? C ?C, there is a unique least upper bound, which we denote x ? y. We say that C ? C is a vector lattice. Definition 9. A partially ordered set is called a join- (meet-) semilattice if each pair of elements has a unique least upper (lower) bound. If a vector space is a semilattice which satisfies 1. x  y implies x+ z  y + z for all z 2. x  y implies ?x  ?y for all ? ? 0, then it is a vector lattice. 69 Proposition 7. C ? C is a vector lattice if and only if C is a meet-semilattice. That is each pair of elements in C has a greatest lower bound. Proof. For each x, y ? C, call their unique greatest lower bound x?y. Let x = x1?x2 and y = y1 ? y2 be elements of C ? C. Let z = (x1 + y1) ? (x1 + y2) ? (y1 + x2). Because z?x = (x2 +y1)? (x1 +y2)? (y1 +x2), we have x  z, and similarly y  z. Now let w = w1 ? w2 be an upper bound for x and y. It is clear from the definitions that w2 + x1 + y1  w1 + x2 + y1 and w2 + x1 + y1  w1 + x1 + y2. Therefore, w ? z = (w1 + x1 + y2) ? (w1 + y1 + x2)? (w2 + x1 + y + 1)  0. (3.18) For the reverse direction, given C ? C is a vector lattice, one can define a greatest lower bound by x ? y = ?(?x ? ?y). By restricting to C ? {0} = C, it is demonstrated that C is a meet-semilattice. Geometrically, this can be interpreted as saying that the intersection of two identical cones will be another congruent cone. For basic examples, consider the finite dimensional examples I discussed earlier. The intersection of two circular cones C0 = {(x, y, z) ? R3 | y2 + z2 ? x2} and Cw = C0 + w situated in the same direction will be a hyperbola-shaped crescent. Therefore 0 and w can?t have a least upper bound because for any v ? C0 ? Cw, Cv = C0 + v ? C0 ? Cw is a strict subset; it does not contain all upper bounds of 0 and w. Therefore the circle is not a Choquet simplex. 70 By contrast, the nonempty intersection of two triangular pyramids, oriented in the same direction, is another identical prism. Thus, the triangle, a simplex, indeed meets the criterion to be a Choquet simplex as well. 3.4 Technical Proofs Now I will provide some more technical proofs in Choquet theory. To start with there are some tools we will need in the upcoming proofs. First, I d?efine an ?equivalence relation on measures. Say that ? ? ? if for all h ? A(X), h d? = h d?. Definition 10. Let X be a convex subset of a topological vector space and f : X ? R. f is called convex if for all x, y ? X, and for all 0 < ? < 1, f(?x+ (1? ?)y) ? ?f(x) + (1? ?)f(y). (3.19) If the inequality above is strict, we say f is strictly convex. If it is an equality, then f is said to be affine. ?f is concave. Define the set of all affine functions on X by A(X). Notice that A(X) contains functions of the form f(x) + r, where r ? R and f is linear, so it is rich enough to separate points. Definition 11. Let f : X ? R be bounded. Call the upper envelope of f the 71 function given by f(x) = inf{h(x) | h ? A(X) and h ? f}. (3.20) These are few basic properties of f, which will be useful: Proposition 8. 1. f is concave, bounded and upper semicontinuous. 2. f ? f for all f . f = f if and only if f is concave and upper semicontinuous. 3. If f, g are bounded functions, f + g ? f + g, and |f ? g| ??f ? g?C . f + g =0 f + g if g ? A(X). 4. rf = rf , for r > 0. The following proposition shows how the upper envelope function interacts with the extreme points of its underlying set. Proposition 9. Let X be a convex subset of a topological vector space. If f ? C(X), then for each x ? X, {? } f(x) = sup f d? | ? ? ?x . (3.21) f(x) = f(x) if x is an extreme point of X. ? Proof. Let g = sup{ f d? | ? ? ?x}, and we wish to show that g = f . For any ?, ? with ? representing x and ? representing y, and 0 < ? < 1, ? = ??+ (1? ?)? 72 ? represents ?x + (1? ?y). Then we have the inequality f d? ? g(?x + (1? ?y)). Taking the supremum over all ? and ?, we have that g(?x+ (1? ?y) ? ?g(x) + (1? ?)g(y), (3.22) Therefore g is concave. To show that g is upper semicontinuous, let {x?} be a net converging to a point x ? X, with g(x?) ? r for each ?. Fix  > 0, and then for each ? choose a measure ?? ? ?x? such that ??(f) > r ? . By the weak-* compactness of the set of probability measures, there exists a convergent subnet {??} that converges to a measure ?. For each h ? A(X), h(x?) = ??(h) ? ?(h), so ?(h) = h(x), and ? represents x. Also because ??(f) > r ?  for each ?, we have ?(f) ? r ? , and therefore g(x) ? r ? . Because  was arbitrary, we can let it go to 0 and obtain g(x) ? r, which shows that g is upper semicontinuous. Now since g is concave and upper semicontinuous, it follows that g ? f . On the other han?d, if h ? ?A(X) and h ? f , for any x ? X and ? representing x, we have h(x) = h d? ? f d?. From this we get h(x) ? g(x), and then taking the infimum over all such h, f(x) ? g(x). 3.4.1 Choquet?s Theorem for Metrizable Spaces Now we wish to move on to our first major proof. Recall Choquet?s Theorem. Theorem 25. (Choquet) Let S be a compact convex subset of a metric space X. Let x ? S. Then there exists a probability measure ? ? Mb(X) such that ? is 73 supported on ex(S), and that for any bounded linear functional f on X, ? f(x) = f(t) d?(t) (3.23) S This was the first representation theorem proved by Choquet [22]. It was, of course, generalized not long after to the Choquet-Bishop-De Leeuw theorem, but the proof is much more technical and the added machinery for working in non-metrizable topological spaces is not necessary for our purposes. The proof is courtesy of Phelps [50]. Proof. (Choquet?s theorem) Suppose X is a metrizable compact convex subset of a locally convex topological vector space E. Let x0 ? X be arbitrary. What we get from the metrizability of X is that C(X), and therefore A(X), is separable. Let (h ?k)k=1 be a sequence of functions in A(X) such that ?hk?C = 1 and {hk | k ? N}0 is dense in the unit?sphere of A. In other words, (hk) separates points of X. Then let f = 2?kh2k k. Note two things: (hk) is uniformly bounded, so the sum converges uniformly; and h2k is strictly convex for all k, so f is also strictly convex. Then we can define a subspace B = A(X) + rf , r ? R. For h ? A(X) and r ? 0, we have by Proposition 8 that h + rf = h+ rf . If r < 0, then h + rf is concave, and h+ rf ? h+ rf = h+ rf . In either case we have h+ rf ? h+ rf, (3.24) and the functional on B defined by h + rf 7? h(x0) + rf(x0) is dominated by the 74 functional g 7? g(x0), which is bounded on C(X). Therefore we can apply the Hahn- Banach theorem to extend to a functional m on C(X) which satisfies m(g) ? g(x0) and m(h+ rf) = h(x0) + rf(x0). Note that if g is negative then m(g) ? g(x0) ? 0, so m is positive, and therefore is identified with a measure on X. m(1) = 1, so m is a probability measure. Because for any h ? A(X), m(h) = h(x0), m represents x0. Now for any h ? A(X) such that f ? h, and therefore h ? f. Hence m(f) ? m(h) = h(x0). By the definition of f, m(f) ? m(f), and since f ? f , m(f) = m(f). We can conclude that m is supported on the set {x ? X | f(x) = f(x)}. Let x, y, z be distinct points in X. If x = y/2+z/2, then by the strict convexity of f , 1 1 1 1 f(x) < f(y) + f(z) ? f(y) + f(z) ? f(x). (3.25) 2 2 2 2 Therefore x is not in the support of m. We can conclude that every m is supported on a subset of the extreme points of X. As it turns out, one additional fact that follows from this proof is that the constructed function f has the stronger property that f(x) = f(x) if and only if x is an extreme point of X, which gives us another way of characterizing extreme points. 3.4.2 The Choquet-Meyer Theorem The Choquet-Meyer theorem says that Choquet simplices are precisely the convex sets that support uniqueness of Choquet representations [23]. Theorem 26. (Choquet-Meyer) Let X be a compact convex subset of a locally 75 convex, metrizable topological vector space E. The following are equivalent. 1. X is a Choquet simplex. 2. For all f ? C(ex(X)), if f is convex, then f ? A(X). 3. If ? ? M(ex(X)) represents x ? X, and f ? C(X) is convex, then f(x) = ?(f). 4. For any convex f, g ? C(X), f + g = f + g. 5. For all x ? X, there is a unique measure ? ?M(ex(X)) that represents x. The process outlined in this section is again from Phelps, as was much of the material in this chapter [50]. We will need a few more materials before attacking the proof in its entirety. Defi?ne a part?ial ordering on M(X) given by ? ? ? if for all convex functions h on X, h d? ? h d?. This definition is meaningful. For example, we can show that the surface measure (normalized) on the unit sphere is greater than the Dirac delta at the origin. Let f be convex and continuous on the unit ball. Because for all x ? Sn?1, f(0) < (f(x) + f(?x))/2, we have ? ? ? f(x) d?(x) = f(0) = ?f(0) d? (3.26)Sn?1 f(x) + f(?x) < d?(x) = f(x) d?(x). (3.27) Sn?1 2 With that established, it follows from an application of Zorn?s lemma that there are maximal measures in this ordering. We wish to show the following equivalence: 76 Proposition 10. A measure ? ? M(X) is maximal in the ordering from ? if and only if it is supported on the extreme points of X. Proof. Assume ? as above is maximal. Let f ? C(X). Define a linear functional L on the subspace span{f} given by rf 7? r?(f). This functional is dominated by the bounded sublinear (but not linear) functional p : C(X) ? R p(g) = ?(g). By the Hahn-Banach theorem we can extend L to a linear functional on C(X) which is bounded by p. If g ? 0, then g ? 0, so L(g) ? p(g) = ?(g) ? 0, so L is a positive functional, and can be represented by a measure ? ? M(X). Now, for any convex function g, ?g = ?g, so ?(?g) ? p(?g) = ?(?g), (3.28) and ? ? ?. Since ? is maximal, we know that ? = ?, and ?(f) = ?(f) = L(f) = ?(f). (3.29) Now assume that for any convex continuous function f , ?(f) = ?(f). Let choose a maximal measure such that ? ? ?. We know ? exists again from Zorn?s lemma. Then for any concave g, ?(g) ? ?(g). We can write for convex f , ?(f) = inf{?(g) | g is concave and g ? f} ? inf{?(g) | g is concave and g ? f} = ?(f). Hence for any convex f , ?(f) = ?(f). Because {f ? g | f, g are convex} is dense in C(X), we can conclude that ? = ? and ? is maximal. 77 Finally, recall that in the proof of Choquet?s theorem, we constructed a convex function f ? C(X) that has the property that ex(X) = {x ? X | f(x) = f(x)}. Then we can conclude that a measure ? is supported on the extreme points of X if and only if ?(f) = ?(f). The proof is complete. Furthermore, the set of nonnegative measures on a vector space X forms a lattice under this ordering. Proposition 11. Given two positive measures ? and ? in M(X), their greatest lower bound exists and is of the form ( ) ? ? ? = min d? , d? (?+ ?), (3.30) d?+? d?+? where d?? denotes the Radon-Nikodym derivative.d Proof. Let ? ? ?, ?. It?s obvious that ? is absolutely continuous with respect to ? + ?, since it is absolutely continuous with respect to ? and ? individually. If f ? C(X) is convex, then ? ? ? ? d? d? f d?+ ? = f d? ? f d? = f d?+ ?, (3.31) d?+ ? d?+ ? ( ) and likewise for ?. Hence, d? ? min d? , d? = d??? , and ? ? ? ? ?. d?+? d?+? d?+? d?+? The last thing we will need is a decomposition lemma for vector lattices. Lemma 27. Let V be a vector?lattice. If w?e have two subsets of positive elements of V , (xi) I i=1 and (y ) J , and if Ij j=1 i=1 xi = J j=1, then there exists zij ? 0, 1 ? i ? I, 78 ? ? 1 ? j ? J , such that x = J Ii j=1 zij and yj = i=1. Proof. Consider the case where I = J = 2. We have x1 + x2 = y1 + y2 and let z11 = x1 ? y1 z12 = x1 ? z11 z21 = y1 ? z11 z22 = x2 ? z21 = y2 ? z12 The desired identities follow immediately, so what we must show is that the final equality holds, and that each zij are nonnegative. By the translation invariance of the vector lattice, we have z12 ? z21 = (x1 ? z11) ? (y1 ? z11) = x1 ? y1 ? z11 = 0, (3.32) so z11, z12, z21 are nonnegative. Furthermore, z12 + x2 = x1 + x2 + z11 = y1 + y2 + z11 = z21 + y2. (3.33) It follows that y2 ? z12 = x2 ? z21, so z22 is well defined. z21 ? z21 + y2, so z21 = z21 ? (z12 + x2) ? z21 ? z12 + z21 ? x2 = z21 ? x2. Hence z21 ? x2, and z22 is nonnegative. Finally, we can generalize to higher I, J by induction. If we are given x1 +x2 = y1 + y2 + y3, we can set y?2 = y2 + y3 and then we have x1 + x2 = y1 + y?2, and we 79 can generate z?11, ? ? ? , z?22 as before. Then we have?z?12 + z?22 = y2 +?y3. We can apply the above again to get z12, z13, z 3 2 22, z23 such that j=2 zij = z?i2, i=1 zij = yj. Set z11 = z?11 and z21 = z?21 and the desired result follows. Proceed inductively in this manner for all I, J. Now we move on to the proof of the main theorem. Proof. (Choquet-Meyer) (1? 2) Let f ? C(X) and X be a simplex. Let x1, x2 ? X, ?1, ?2 > 0 such that ?1 + ?2 = 1. Call z = ?1x1?2x2. We?d like to show that f is affine, so we need that f(z) = ?1f(x1) + ?2f(x2). By Proposition 9, f(z) = sup{?(f) | ? ? ?z}. Suppose ? is a discrete measure ?and ? ? ?z. Then t?here exist a finite sequence of ?j ? 0 and yj ? X such that ?j = 1 and ? = ?j?y . By using Lemma 27 on vector la?ttices, we can thenj get a sequence of z?ij so that ?jyj = z ? + z?1j 2j and ?ixi?= z ? j ij. If we write z?ij = ?ijzij, ?ij ? 0 and zij ? X, then we get that xi = ??1j i ?ijzij, which is a convex combination of elements of X. It follows that it represents a discrete measure ?J ? = ??1i i ?ij?z i = 1, 2, (3.34)ij j=1 ? with ? ? ?x . We can conclude that f(xi) ? ?i(f) = ?1i j ?i ?ijf(zij). Finally, by the convexity of f , we have f(y ) ? ??1? f(z ) + ??1j j 1j 1j j ?2jf(z2j), 80 so we can conclude that ? ?(f) = ?jf(yj) ?j ? ?ijf(zij) = ?1?1(f) + ?2?2(f) i,j ? ?1f(x1) + ?2f(x2). If we take the supremum over all possible ?, we get f(z) ? ?1f(x1)+?2f(x2). Note that we made the assumption that ? may be a discrete measure without justification. This is acceptable because we can approximate any ? ? ?z with a discrete measure from below; this can be made formal using a partitioning argument relying on the compactness of X. (2? 3) If ? ? ?x is supported on ex(X), then ?(f) = ?(f). We want to show that ?(f) = f(x). f is affine and upper semicontinuous. Because for h ? A, ?(h) = h(x), it is enough to show that f can be approximated from above by continuous affine functions. Let h1 and h2 be in A such that hi > f . We want to show that there is another h ? A, h > f such that h ? h1, h2. Define J = {(x, r) ? X ? (0,?) | r ? f(x)} and Ji = {(x, r) ? X ? (0,?) | r = h(x)}. Notice that by the semicontinuity of f, J is closed, and J1?J2 is compact. Since the two sets are disjoint, by the Hahn- Banach separation theorem we can separate them with a hyperplane L(x, r) = ?. The function given by h(x) = L(x, h(x)) satisfies our requirements. Then the set H = {h ? A | h ? f} is directed downwards. In addition, the closure of H is 81 bounded below, so it has a unique minimal element f ?. Because for all z ? X, there exists hn ? H such that hn(z)? f(z) ? f ?(z), we can conclude that f ? = f , and ?(f) = inf{?(h) | h ? H} = inf{h(x) | h ? H} = f(x). (3.35) (3? 4) Let f, g ? C(X) be convex. Choose ? ? ?x supported on ex(X). (f + g)(x) = ?(f + g) = ?(f) + ?(g) = f(x) + g(x). (3.36) (4 ? 5) Let x ? X. Consider the set S ? C(X) of convex functions, and define a linear functional on S?S by m(f?g) = f(x)?g(x). Because for any h1, h2 ? C(X), h1(x)+h2(x) = h1 + h2(x), m is linear on S?S, and we have as a property of upper envelopes that |m(f?g)| ??f ? g?, we can conclude that m is uniformly continuous on S ? S, and therefore extends to a continuous function on C(X) with norm at most 1. m(1) = 1, so m is a probability measure, and therefore is identified with a measure on X. By Proposition 9, we have that for any f ? C(X), m(f) = f(x) = sup{?(f) | ? ? ?x}, so ? . m for any ? ? ?x. Therefore m is the unique maximal measure which represents x. Equivalently, it is the unique measure supported on ex(X) which represents x. 82 (5 ? 1) Consider the cone P of nonnegative measures on X. As we have shown in Proposition 11, this cone forms a lattice. We would like to consider the subcone Q ? P of measures supported on ex(X), and find that this is a lattice as well. First, for any two ?, ? ? Q, (?+ ?)(f) = (?+ ?)(f), so ?+ ? ? Q. For r ? 0, r? ? Q also. Let x, y ? Q, and let x ? y be their greatest lower bound in P . Recall the formula for the greatest lower bound of two measures in (3.30). It is clear from this formula that x ? y is supported on the union of the supports of x and y, which are contained in ex(X).Therefore x ? y is in Q. To show that x ? y is still a greatest lower bound in the natural ordering on Q, take w ? Q such that x ? w and y ? w are in Q. Because x?y is a greatest lower bound on P , we have that 0 ? P ? x?y. It follows that x ? y ? w ? P and x ? y ? w ? z. Because x ? y ? Q, x ? y ? w is absolutely continuous with respect to x ? y, and therefore it is also supported on ex(X) and therefore x ? y ? w ? Q, and w is less than x ? y in the natural order on Q. x ? y is a true greatest lower bound, and Q is a lattice. Moreover, the set Q1 = {? ? Q | ?(X) = 1} is a base for Q, and Q1 is a simplex. The final step is to show that X is a simplex. Define the resultant map r : Q1 ? X by r(?) is the unique point in X represented by ?. It is easy to see that the resultant is an affine function. Our assumption was that to each x ? X, there is a unique measure in Q1 such that ? represents x. This implies that the resultant map is one-to-one. By Choquet?s theorem it is also onto, and therefore it follows that X is a simplex isomorphic to Q1. 83 Notice that throughout this section we used the assumption that X is a metric space. This assumption can be dropped, but not without a cost. The Choquet- Bishop-De Leeuw theorem shows that we still get representation, but we cannot say that measures are supported on the extreme points anymore. One problem is that in a metrizable space, the extreme points form a G? set, but in the absence of a metric they may not even be Borel. Bishop and De Leeuw gave examples of such convex sets [50]. A more subtle use of the fact that X is a metric space was in the proof of Proposition 10. Recall that in the proof of Choquet?s theorem we constructed a convex continuous function f such that ex(X) = {x ? X | f(x) = f(x)}. This construction relied on the metrizability of X, and clearly is not possible without it, as it would imply that ex(X) is a G? set. Without this tool it isn?t possible to prove Proposition 10 in its full generality. Rather we get the weaker fact that ? ?M(X) is maximal if and only if ?(f) = ?(f) for all f ? C(X). Thus the generalized version of Choquet-Meyer is Theorem 28. (Choquet-Meyer, non-metrizable) Let X be a compact convex subset of a locally convex, metrizable topological vector space E. X is a Choquet simplex if and only if for each x ? X there is a unique maximal measure ?x such that ?x ? ?x. This is not exactly comparable to the Choquet-Bishop-De Leeuw theorem for non-metrizable spaces. Recall that the Choquet-Bishop-De Leeuw theorem says that there exists a representing measure which vanishes on Baire sets which contain no extreme points. However, the Choquet-Meyer Theorem does not show uniqueness of 84 measures which satisfy this condition. Mokobodzki found an example of a simplex X, for which ex(X) is Borel, but not Baire, and there exists a point x ? X with two distinct representing measures ? and ?, with ?(X\ex(X)) = 0 and ?(X\ex(X)) = 1, but both vanish on every Baire subset of X which is disjoint from its extreme points [50]. We aren?t too concerned with these anomalous spaces, but they are the subject of some continuing study today in functional analysis. 85 Chapter 4: My Results 4.1 Uniqueness of Positive Definite Extensions Now I proceed to my contributions to the topic, including the proofs of my main results. I wish to apply Choquet theory to both the problems of positive definite extensions and continuous superresolution. Recall that the positive definite extension problem is a special case of the superresolution problem, when we restrict ourselves to only caring about probability measures. The first thing we must do is identify the space we are working with and its extreme points. Proposition 12. Let P be the set of probability measures on the torus, P = {? ?M(T2) | ?f ? 0?(f) ? 0, ???TV = 1}. (4.1) ? ? P is an extreme point if and only if ? = ?x for some x ? T2. Proof. The extreme points of a closed convex set are those which are not a convex combination of two distinct elements. Note that for two probability measures ? and ?, the support of ?? + (1? ?)? is the union of the supports of ? and ?. Let ? be a probability measure. If ? is supported on a single point, then it cannot be written as a convex combination of any distinct ? and ?, because if they are distinct, their 86 supports must contain points outside of the support of ?. If, on the contrary, ? is supported on at least two points a and b, then we can divide T2 into two sets, A and B, such that a ? A and b ? B, such that ?(A) > 0 and ?(B) > 0. Then we can write ? = ?A + ?B. (4.2) ? ? Because ?A/ d?A and ?B/ d?B are distinct probability measures, ? cannot be an extreme point. The problem I?ll focus on is to find whether the following algorithm produces a unique result: ?# = argmin???TV such that ??(m,n) = ymn ?N ? m,n ? N, (4.3) ??M(T2) where (y ) ? C(2N+1)?(2N+1)mn are the observed data. Recall from Proposition 11 that the cone of nonnegative measures is a meet- semilattice. Because P is a base for that cone, we know that P is a Choquet simplex and has the property that each element has a unique representation as a probability measure supported on ex(P ). But that clearly isn?t sufficient for unique reconstruction from a finite number of frequency samples. We have many examples 87 of non-unique reconstruction [6]. A simple example is the two probability measures 1 ( ) ?1 = (? + ?1/2 , ) (4.4)21 ?2 = ? + ?1/4 + ?1/2 + ?3/4 . (4.5) 4 ??1(n) = ??2(n) when |n| < 2, but not when |n| ? 2. Consider the map F : P ? C(2N+1)?(2N+1) given by F (?)(m,n) = ??(m,n) ?N ? m,n ? N. (4.6) F (P ) is a compact, convex set in C(2N+1)?(2N+1). F will preserve convex combina- tions in the sense that if ? is a probability measure that represents x ? M(T2), then the pushforward F ? represents F (x) on C(2N+1)?(2N+1)? . Recall that ? is a linear bounded functional on the space M(T2) with the property that for all affine functions p : M(T2)? C, ? p(x) = p(t) d?(t). (4.7) M(T2) Given an affine function q on C(2N+1)?(2N+1), we calculate ? ? q(F (x)) = q(F (t)) d?(t) = q(z) dF??(z), (4.8) M(T2) C(2N+1)?(2N+1) and thus it is shown that F (x) has a representation by F??, which is also a proba- bility measure. 88 Notice that there is a characterization of the uniqueness problem in the fol- lowing way: the measure ?# in (1.70) is unique if the point (y ) ? C(2N+1)?(2N+1)mn has a unique representation F??, where ? is a measure supported on ex(M(T2)). We can conclude that, despite the fact we are attempting to reconstruct measures in an infinite dimensional space, the reconstruction is actually taking place in the finite dimensional space C(2N+1)?(2N+1). We wish to know whether we can formu- late (1.70) in full without appealing to the infinite dimensional space M(T2). The following lemma helps with that goal. Lemma 29. Let P ?M(T2) be the set of probability measures as before, and F as above. Then, ex(F (P )) = F (ex(P )). (4.9) Moreover, for each z ? ex(F (P )), there is a unique element x ? T2 such that z = F (?x). F ex(P ) is a homeomorphism. Proof. First, P is a convex, compact (in the weak-* topology) set, so by the Krein- Milman theorem 20 it is the closed convex hull of ex(P ). F is linear and continuous with respect to the weak-* topology, so F (P ) is not only compact and convex itself, but it is also the closed convex hull of F (ex(P )). Therefore, by applying the Krein- Milman theorem to F (P ), we can conclude that ex(F (P )) ? F (ex(P )). We must show that each point of F (ex(P )) cannot be a convex combination of two other points in F (P ). Notice first that each point z ? F (ex(P )) is of the form z = e2?i?m,n??xmn , (4.10) 89 for some x ? T2. Hence each vector z ? F (ex(P )) has ?z?2 = 2N + 1. Therefore F (P ) ? B(0, 2N + 1). Likewise, for any two distinct points z,w ? F (P ), t ? (0, 1) ?? ?tz + (1? t)w? < t?z?2 + (1? t)?w?2 ? 2N + 1. (4.11)2 Hence, tz + (1? t)w is not in F (ex(P )), and every point in F (ex(P )) is an extreme point of F (P ). Finally, if z ? ex(F (P )), then it must be in F (ex(P )), so there is an element ? ? ex(P ) such that z = F (? ). Note that for x,y ? T2x x , if for m,n ? {0, 1}, e2?i?m,n??x = e2?i?m,n??y, then x = y. Therefore, given N ? 1, x is unique. Finally, we have shown that F ex(P ) is a continuous bijection. Because F is an open map, so is its restriction. Therefore F ex(P ) is a homeomorphism. An immediate consequence of this lemma is that the pushforward map F? : M(ex(P ))? M(ex(F (P ))) is in fact an isomorphism of vector spaces. Therefore we conclude that the following two statements are equivalent: z ?M(T2? ) has a unique representation by a measure ? ?M(ex(P )). ? F (z) has a unique representation by a measure ? ?M(ex(F (P ))). Of course, F (P ) is not a simplex, as it has infinitely many extreme points, so as we already knew from the examples in [6] and (4.4), unique reconstruction is not possible in general. But we are aware that in certain cases we can get unique construction. For example, the measure 1/2(? + ?(1/2,0)) is the unique solution to 90 (1.70) when N ? 2. This is because the line tF (?) + (1? t)F (?(1/2,0)) t ? [0, 1] (4.12) is a simplex which lies in the boundary of F (P ). We call this a face of F (P ), analogous to a face of a polyhedron. The idea of a face of a convex set came from Alfsen, who defined the face of a Choquet simplex [1]. This idea will be important for uniqueness. Now I will prove my first main result on positive definite extensions. Theorem 30. If a finite sequence y ? C(2N+1)?(2N+1) has an extension to an infi- nite positive definite sequence (y ?mn)m,n=??, then at least one such positive definite extension must be a finite sum of the form ?K y ?2?i?m,n??xk 2mn = ?ke ?k > 0,xk ? T , (4.13) k=1 where K ? 4N2 + 4N + 2. Proof. Let y ? C(2N+1)?(2N+1) be arbitrary, such that the given assumptions hold. Without loss of generality let y0,0 = 1. By Bochner?s theorem, there exists a ? ? P such that y = F (?). Because C(2N+1)?(2N+1) is finite dimensional, Carathe?odory?s theorem 19 implies that y has a representation as a convex combination of at most (2N + 1)2 + 1 = 4N2 + 4N + 2 extreme points of F (P ). Because each extreme point 91 is of the form F (?x) for some unique x ? T2, we have ? ?? ?K K ?K y = ?kF (? ) = F ? ? ? ?xk k x ?k k > 0, ?k = 1. (4.14) k=1 k=1 k=1 ? The measure ? = ? 2k?x is a probability measure on T with F (?) = y, so ?? is ak positive definite extension of y. The proof is complete. 4.2 Faces of Choquet Simplices The first obstacle to generalizing Theorem 30 to signed measures is that there is no longer an obvious setting to apply Choquet theory. The set of complex measures {? ? M(T2) | ???TV = 1} is no longer even convex, let alone a Choquet simplex. We propose to solve this problem by introducing the idea of faces. Consider why compressive sensing works in the finite dimensional case: in high dimensions, the shape of the `1 ball is such that not only is it a polyhedron, but most of its tangent affine planes of sufficiently low dimension will hit one of the low-dimensional faces. The faces of the `1 ball are in fact simplices. I will refer to the `1 ball as a cross- polytope in n dimensions. The solid (hollow) cross-polytope is the set {x ? Rn | ?x?1 ?(=) 1}. Notably, because its faces are simplices, a hollow cross-polytope has the prop- erty that each point is a unique convex combination of vertices. We would like to find an infinite-dimensional analogy for M(T2). We have by Proposition 7 that the probability measures form a Choquet simplex. It is easy to see that each set of the form {?? | ? ? M(T2), ? ? L?(T2), ? ? 0, |?| ? 1} is also an isomorphic 92 simplex via the affine transformation ? 7? ??. The central question of this section is whether it is possible to define a generalized polytope, and in what generality can it be defined? A similar topic was studied by Alfsen [1]. As it turns out we may define a face of a simplex in the following way. Definition 12. Let X be a compact convex set in a topological vector space V . An affine space H of codimension 1 is said to be a supporting hyperplane for X if H ?X 6= ? and X \H is convex. (4.15) If H is a supporting manifold, then H ?X is a face of X. An immediate implication is that a face can consist of a single point {x} if and only if x is an extreme point. Thus the extreme points are analogous to the vertices of a simplex. We can in fact strengthen that statement to the following. Proposition 13. If x in a face F = H ? X has a representation ? ? ?x, then ? must be supported in H. Proof. Assume x, F,H, ? as above, and that ? is not supported in H. It is a standard fact, and a corollary of the Hahn-Banach theorem, that there is a linear functional f : V ? R such that f = 0 on H and f > 0 on X \H. Because ? is not supported in H, there must be some compact S ? X \ H such that ?(S) > 0 and f(S) is 93 bounded below by  > 0. We can conclude that ? ? f(t) d?(t) ? ?(S) > 0 = f(t) d?x(t), (4.16) X X which is a contradiction. An immediate consequence is that extreme points of faces correspond well with extreme points of their underlying sets. Corollary 1. Let C be a face of a compact convex subset X. Then C is also compact and convex, and ex(C) ? ex(X). Proof. C is the intersection of a closed convex hyperplane H and a compact convex set X, so it is immediately compact and convex. Let x ? ex(C). By the Krein-Milman theorem we have that x ? C, so Cho- quet?s theorem implies that there exists a measure ? supported on ex(X) such that ? represents x. But by Proposition 13, ? is supported in H ? X = C, and since x is an extreme point of C we can conclude that ? = ?x. But because {x} = supp(?) ? ex(X), x is an extreme point of X. We now know that the face structure of a closed convex set has some useful properties with respect to its extreme points. One more important implication is a property of Choquet simplices which will be helpful in generalizing Theorem 30. Proposition 14 explores this, but first we need to return to some materials on cones and lattices. 94 Definition 13. A subcone X? of a cone X is said to be hereditary if 0 ? x ? y and y ? X? implies x ? X?. Proposition 14. Each face of a Choquet simplex is itself a simplex. The extreme points of a face are each extreme points of X as well. Proof. The proof will take place in two parts. Recall that associated to any Choquet simplex X is a cone X?. A compact convex set X is a Choquet simplex if and only if the associated cone X? is a lattice under the associated order: x ? y if y? x ? X?. In part 1 of the proof, we wish to show that each face F of X forms a hereditary subcone of X??that is given y ? F? and x ? X?, if x ? y then x ? F? . In part 2 we will show that every hereditary subcone of a lattice is a lattice itself. (Part 1) If F is a face of X, then there is a hyperplane H ? V such that F = H?X, and X falls on one side of H. Equivalently we can say that there exists a linear functional f such that H = {x ? V | f(x) = 1}, and f(x) ? 1 for all x ? X. We can extend f to the cone X? sitting in the space V ? R in a natural way by f(?x) = ?f(x). Then we can characterize F? = {?x ? X? | f(?x) = ?}. Let y ? F and x ? X such that for some positive numbers ?, ? we have ?y ? ?x. We know immediately that f(?x) ? ?, so f(?y??x) ? f(?y)?? = ???. On the other hand, because ?y ? ?x ? X?, we have that ( ( )) ? ? f(?y ? ?x) = f (?? ?) y ? x ? (?? ?). (4.17) ?? ? ?? ? Therefore f(?y ? ?x) = ?? ? and ?x ? F? (Part 2) Now I wish to show that every hereditary subcone of a lattice is a lattice 95 itself. To that end, let ?X be the ordering induced by X and ?F be the one induced by F . Let x, y ? F? and we wish to find a greatest lower bound for x and y in F? . Let z be the greatest lower bound in X?, and let w ? F? such that w ?F x and w ?F y. We know already that w ? F? . Since F ? X, we have that 0 ?X w ?X z. Then because z?w ?X z ? F? , so z?w ? F? and w ?F z. Therefore z is a greatest lower bound and F is a lattice. We can conclude that each face of a Choquet simplex X generates a hereditary subcone of X?, and is therefore a Choquet simplex itself. To end this section, I will point out that the definition of a face of a simplex is easily generalizable. We will find the notion useful in exploring the shape of cross- polytopes (which are not Choquet simplices but share some useful properties) in M(T2). Definition 14. Let C be a compact convex set in a topological vector space V . An affine space H of codimension 1 is said to be a supporting hyperplane for C if H ? C =6 ? and C \H is convex. (4.18) If H is a supporting manifold, then H ? C is a face of C. 96 4.3 Uniqueness for Signed and Complex Measures Let B = {? ? M(T2) | ???TV ? 1}. I observe that by the definition from Alfsen, P is actually a face of B. Take the hyperplane in M(T2) defined by { ? } H = ? ?M(T2) | Re d? = 1 . (4.19) T2 ? Then P = H ? B and for all ? ? B, Re d? ? ???TV ? 1. Therefore H is a supporting hyperplane which carves out the face P . It is easy to imagine that one is able to prove similar results for other faces of B, which begs the question: what is a characterization of the faces of the cross-polytope B? A hyperplane H in a vector space V can always be generated by a linear functional f : V ? R, such that H = f?1(1). We can say that H is a supporting hyperplane for a compact convex set C if and only if f can be chosen such that for all x ? C, f(x) ? 1. Otherwise the hyperplane would split C into two disjoint convex sets f?1((1,?)) ? C and f?1((??, 1)) ? C. Let ? : T2 ? C be a continuous function with ???? = 1.?? is identified with a complex valued functional on M(T2) given by ? 7? ?(?) = ?(t) d?(t). In fact, recall that we?ve given M(T2) the weak-* topology, therefore by definition every element of the topological dual M(T2)? is of this form for some ? [5]. Define the affine hyperplane generated by ? as H? = ? ?1(1). (4.20) 97 I make the following observations about H?. Proposition 15. H? is a supporting hyperplane for B if and only if ???? = 1. Proof. If ???? = 1 then there exi?sts some x ? T 2 suc?h that for all y ? T 2, |?(y)| ? |?(x)| = 1. Then for all ? ? B, | ?(t) d?| ? 1 and ?(t) d(?(x)?x(t)) = 1. Thus H? ?B is nonempty and B \H? is convex. Given H? is a supporting hyperplane of B, then for all x ? T2, |?(x)| = |?(?x)| ? 1, so ???? ? 1. On the other hand, there must exist some ? ? B such that ? 1 = ?(t) d?(t) ? ??? |?|(T2? ) = ???? . (4.21) T2 Proposition 16. Let ? ? C(T2) with ???? = 1. The face H? ? B can be charac- terized as following. A measure ? ? B is in H? ? B if and only if ?? is a positive measure supported on the set {x ? T2 | |?(x)| = 1}. ? Proof. Given ? ? B, ? is in H? if ?(t) d?(t) = 1. Assume for contradiction that ?? is not a positive measure. Then there exists a set A ? T2 such that Re(??)(A) < |??|(A). Then |??|(T2) = |??|(A) + |??|(Ac) (4.22) > Re(??)(A) + Re(??)(Ac) (4.23) = Re(??)(T2) = 1. (4.24) But because ? ? B, ????M ? ???????TV ? 1, which is a contradiction. 98 Similarly, if ? is not supported on the set {x ? T2 | |?(x)| = 1}, then there exists a constant  < 1 such that A = {x ? T2 | |?(x)| < } has non-zero |?|- measure. Then ? ? 1 = ?(?) = ?(t) d?(t) + ?(t) d?(t) (4.25) A Ac  ? |?|(A ) + |?|(Ac ) (4.26) < |?|(A c) + |?|(A) ? 1, (4.27) which is a contradiction. The proof is complete. We can conclude that faces ofM(T2) are characterized by continuous functions with ???? = 1. Now consider the set F (B). We have the following result from Alfsen which links the face structure of B and F (B) [1]. Proposition 17. Let ? : V ? W be a linear map between vector spaces and let X ? V be a convex set. H ? ?(X) is a face of ?(X) if and only if ??1(H) is a face of X. So because F is a linear map, we conclude that faces of F (B) correspond to faces of B, but not every face of B necessarily corresponds to a face of F (B). For example, the face of B defined by the function ? = 1 contains all measures of the form ?x, x ? T2, but the set {F (? ) | x ? T2x } does not fall in a hyperplane of C(2N+1)?(2N+1), so it cannot map to a face of F (B). In fact, in order for ? to represent a face of F (B), it must factor through F , in the sense that there exists a function ?? : C(2N+1)?(2N+1) ? R such that ?(x) = ??(F (x)). 99 Now I will prove the main theorems of this section. Theorem 31. For each sequence y ? C(2N+1)?(2N+1), there exists an finite sum of point measures ?K ? = ?k? 2 x ?k ? C, xk ? T , (4.28)k k=1 such that for ?N ? m,n ? N, ??(m,n) = y(m,n). Furthermore, for all ? ?M(T2), if ??(m,n) = y(m,n) for ?N ? m,n ? N, then ???TV ????TV . Proof. Let 0 =6 y ? C(2N+1)?(2N+1). Because F (B) is closed, bounded and contains a neighborhood of the origin, we can guarantee that there exists an ? > 0 such that y ? ?F (B), and that ? is the minimal such constant. Without loss of generality, assume that ? = 1. Then for all ? ? M(T2), if F (?) = y, then ???TV ? 1. It is sufficient to construct ? satisfying the desired property, such that ???TV = 1. Because ? is minimal and F (B) is compact, y must fall in the boundary of F (B). It is a fact that the boundary of a compact convex set is the union of its faces, therefore there exists a hyperplane H and a corresponding face of F (B) such that y ? H ? F (B). Recall, as in Propositions 15 and 16, that associated to H is a functional ? ? (C(2N+1)?(2N+1))?, such that for all x ? F (B), ??,x? ? 1 and H = ??1(1). Likewise, F?1(H) is a supporting hyperplane of B, which is generated by the functional ? ? F . By Corollary 1, each extreme point of the face H ? F (B) is an extreme point of F (B), which corresponds with a unique extreme point of B. Because G contains all its extreme points, we can then characterize it as a closed 100 convex hull of delta measures. ex(G) = {??(x)?x ?M(T2) | |??(x)| = 1}. (4.29) Because G is isomorphic to a face of the set of positive definite measures, it is a Choquet simplex. We then proceed in an identical manner to the proof of Theorem 30. First I claim that for each z ? ex(F (G)), there is a unique x ? M(T2) such that z = ??(x)F (?x). It follows from the Krein-Milman theorem that ex(F (G)) ? F (ex(G)), so we must show that each point in F (ex(G)) is an extreme point of F (G). Let x ? T2 such that z = ??(x)F (?x) is an extreme point of G. Then zmn = ??(x)e ?2?i?m,n??x, (4.30) and it is straightforward to see that ?z?2 = 2N + 1. If we let z 6= w ? F (G), t ? (0, 1), then we can compute ?? ?tz + (1? t)w? < t?z?2 + (1? t)?w?2 ? 2N + 1. (4.31)2 Therefore for any non-extreme point of F (G), its norm must be strictly less than 2N + 1, thus every point in F (ex(G)) is extreme. By Carathe?odory?s theorem, there exists a representation of y as a finite convex sum of extreme points of F (G). As we have shown, to each extreme point z ? ex(F (G)), we can associate a unique x ? T2 such that z = ??(x)F (?x), so we can 101 write y as a convex sum with weights ?k > 0: ? ?K ?? ?Ky = ?k??(xk)F (?x ) = F ?k??(x )? ?k x . (4.32)k k k=1 k=1 Finally, because each xk is distinct, we can compute that ??????K ?? ? ?K ? ?k??(xk)?x ??? = ?k|??(xk)| = 1. (4.33)kk=1 k=1 TV We have successfully constructed ? as desired. Now we have shown that we cannot hope to uniquely recover non-discrete measures via the total variation minimization in (1.70). My final result is a charac- terization of exactly which measures allow for unique reconstruction. As we already know, any such measure must be supported on a finite set, but that is still not suffi- cient. Recall the work of Cande?s and Fernandez-Granda [12, 13, 30], which showed that a minimum separation requirement was sufficient. My characterization is based on a description of the faces of F (B). Recall that faces of B are generated by con- tinuous functions ? on T2, and that it corresponds to a face of F (B) if the function ? factors through F . Since unique reconstruction depends on the face structure of F (B), then we can characterize them by describing the admissible functions ? which generate faces of F (B) which are also simplices. For the proof of this theorem I first need the following lemma. Lemma 32. Each face of B is a Choquet simplex. Proof. Let G = B?H? be a face of B. By Proposition 16, ? ? C(T2) with ???? = 1. 102 Let P be the set of probability measures onM(T2), and consider the linear operator ?? : G? P given by ??(?) = ??. (4.34) Note that ?? is invertible, because for any ? ? G, ??? = ?. On the other hand for all ? ? P , ? ? ??(G) if and only if |?|? = ?. Therefore we can conclude that ?? is a linear isomorphism between G and P ?H|?|. Because P ?H|?| is a face of a Choquet simplex, by Proposition 14 it is also a simplex, and likewise G is as well. Theorem 33. Let ? ? M(T2). ? is the unique solution to the algorithm in (1.70) if there exists a trigonometric polynomial on T2, ?N ? = c e?2?i?m,n??xmn cmn ? C, (4.35) m,n=?N with the following properties: 1. ???? = 1 2. ?? = |?| 3. The set S = {?(x)?x ? T2 | |?(x)| = 1} is finite, and F (S) is affinely indepen- dent. Proof. Assume without loss of generality that ???TV = 1. The proof will be in four parts. (Part 1) First, it follows immediately from Proposition 16 that properties 1 and 2 are equivalent to the statement that ? is in a face G of B generated by ?. 103 (Part 2) Next I claim that any ? satisfying properties 1 and 2 is of the form (4.35) if and only if it factors through F , which likewise is true if and only if F (G) is a face of F (B). Let ? ? C(T2), with ???? = 1. ? factors through F if and only if there exists a covector ?? ? (C(2N+1)?(2N+1))? such that for any ? ? M(T2), ?(?) = ??(F (?)). This expands to ? ?N ?(t) d?(t) = ??(m,n)??(m,n). (4.36) T2 m,n=?N It is clear from this equality that ?? must be the Fourier transform of ? and by Parseval?s theorem equality holds for all ? ? M(T2) if and only if the Fourier transform of ? is supported on C(2N+1)?(2N+1), in which case it is a trigonemetric polynomial of the form in (4.35). H?? is clearly a supporting hyperplane of F (B), and F (G) = F (B)?H??. The reverse direction is true trivially from Proposition 17. Thus the claim is proved. (Part 3) y is a unique convex combination of extreme points of the face F (G) if F (S) is affinely independent. ?(x)?x ? ex(G) if and only if |?(x)| = 1, so S = ex(G). Therefore if F (S) is affinely independent, F (G) is a simplex. Therefore by Carathe?odory?s theorem, we can write y as a unique convex combination of extreme points of G. Because every representation of y must be supported on G, this combination must be unique over all representations on ex(F (B)). (Part 4) My next claim is that ? must be the unique solution to (1.70) if y = F (?) 104 can be written as a unique sum of extreme points of F (G). In the case that y is not in the boundary of F (B), there exists some 0 < ? < 1 such that y ? ?F (B). Then there exists some ? ? ?B such that F (?) = y. ???TV ? ? < ???TV , so ? is not a solution to (1.70). Likewise the pushforwards F?? and F?? are distinct measures supported on ex(F (B)), which each represent y. Hence the claim holds. In the case that y is in the boundary of F (B), there exists some supporting hyperplane H ? C(2N+1)?(2N+1)?? such that y is in the face F (G) = F (B) ? H??. By Proposition 17, G ? B is also a face of B, which is generated by a supporting hyperplane H?. Let ? ? G?B. By Proposition 32, G?B is a Choquet simplex, so by the Choquet-Meyer theorem there is a unique probability measure ?? ? (M(T2))?, supported on ex(G), which represents ?. By Lemma 29, F G is a homeomorphism, so F induces a vector space isomorphism F? betweenM(ex(G)) andM(ex(F (G))). For any functional A on C(2N+1)?(2N+1), see that ? ? A(t) dF???(t) = A(F (s)) d??(s) = A(F (?)). (4.37) C(2N+1)?(2N+1) M(T2) Because A was arbitrary, F (?) = y if and only if F??? represents y. Because F? is an isomorphism of vector spaces, we can conclude that ? is the unique measure in B such that F (?) = y if and only if F??? is the unique measure in M(ex(G)) which represents y. In addition, by Theorem 31, if ? is unique then the measure F??? must be a finite sum. The proof is complete. 105 A few notes on the preceding theorem. It may be possible to loosen the last requirement on ? slightly. Assume that ex(G) is finite. Using tools from algebraic geometry, we may put a bound on the number of extreme points of G. Let ? = (1???). The zero set of ? is identical to the set ex(F?1(G)). Under the assumption that this set is finite, Be?zout?s theorem gives us a bound. Because ? has degree at most 4N , Be?zout?s theorem says that it can have at most 4N zeros. For a more in depth derivation of this result, see appendix A.3 of [49]. Now F (G) is a simplex if and only if the set of extreme points of F (G) is linearly independent in C(2N+1)?(2N+1). This may not seem like a simplification, but for any given polynomial ? we can guarantee this to be true for sufficiently large N . Consider that for ?x ? ex(G), as N ? ?, F?1(F (?x)) approaches the delta function ?x, and it is easy to see that any finite set of Dirichlet kernels, F(F (?x ) fork xk ? T2, will evenually be linearly independent for sufficiently large N . How large N must be relative to the degree of ? is unknown to the author?s knowledge. Finally, we conclude with one more useful result that is an immediate appli- cation of the preceding theorem. Corollary 2. Any pair of positive delta measures can be uniquely recovered from (1.70), given N ? 2. Proof. Let x = (x1,x2),y = (y1,y2) ? T2. Define the polynomial ?(s, t) = (1 ? cos(s) ? cos(t))/2. It is easy to see that 0 ? ? ? 1, and ? = 0 if and only if 106 s = t = 0. Define ?(s, t) = 1? ?(s? x1, t? x2)?(s? y1, t? y2). (4.38) 0 ? ? ? 1, and |?(z)| = 1 only when z = x or y. Therefore for any positive measure supported on {x,y}, ?? = |?|. Finally, note that as x and y are distinct, {F (x), F (y)} is trivially affinely independent. Therefore ? satisfies the properties for Theorem 33, and we can conclude that any positive measure supported on {x,y} can be uniquely recovered by (1.70). 107 Bibliography [1] Erik M. Alfsen, On the geometry of Choquet simplexes, Math. Scand. 15 (1965), no. 1, 97?110. [2] A. P. Artemenko, Hermitian positive functions and positive funcctionals, Ph.D. thesis, Odessa State University, 1983. [3] Joseph A. Ball and Moise?s D. Guerra Huama?n, Convexity analysis and the matrix-valued Schur class over finitely connected planar domains, J. Operator Theory 70 (2013), no. 2, 531?571. [4] John J. Benedetto, Harmonic Analysis and Applications, CRC Press, 1996. [5] John J. Benedetto and Wojciech Czaja, Integration and Modern Analysis, Birkha?user Advanced Texts, Birkha?user, Basel, 2009. [6] John J. Benedetto and Weilin Li, Super-resolution by means of Beurling mini- mal extrapolation, 2016. [7] Sergei N. Bernstein, Sur les fonctions absolument monotones, Acta Math. 52 (1928), 1?66. [8] Salomon Bochner, Vorlesungen u?ber Fouriersche Integrale, first ed., Akademis- che Verlagsgesellschaft, Leipzig, 1932. [9] , Monotone Funktionen, Stieltjessche Integrale und harmonische Anal- yse, Math. Ann. 108 (1933), 378?410. [10] Stephen Boyd and Lieven Vandenberghe, Convex optimization, first ed., pp. 215?232, Cambridge University Press, Cambridge, 2004. [11] Alberto P. Caldero?n and Raymond Pepinsky, On the phases of Fourier coeffi- cients for positive real periodic functions, Computing Methods and the Phase Problem in X-Ray Crystal Analysis (Raymond Pepinsky, ed.), The X-Ray Crys- tal Analysis Laboratory, Department of Physics, The Pennsylvania State Col- lege, 1952, pp. 339?348. 108 [12] Emmanuel Cande?s and Carlos Fernandez-Granda, Super-resolution from noisy data, J. Fourier Anal. Appl. 19 (2013), 1229?1254. [13] , Towards a mathematical theory of super-resolution, Comm. Pure Appl. Math. 67 (2013), no. 6, 906?956. [14] Emmanuel Cande?s, Justin Romberg, and Terence Tao, Robust uncertainty prin- ciples: exact signal reconstruction from highly incomplete frequency informa- tion, IEEE Trans. Inform. Theory 52 (2006), no. 2, 489?509. [15] , Stable signal recovery from incomplete and inaccurate measurements, Comm. Pure Appl. Math. 59 (2006), 1207?1223. [16] Emmanuel Cande?s and Terence Tao, Decoding by linear programming, IEEE Trans. Inform. Theory 51 (2005), no. 12, 4203?4215. [17] , Near optimal signal recovery from random projections: universal en- coding strategies?, IEEE Trans. Inform. Theory 52 (2006), no. 12, 5406?5425. [18] Emmanuel J. Cande?s, Justin Romberg, and Terence Tao, Signal recovery from incomplete and inaccurate measurements, Comm. Pure Appl. Math 59 (2005), no. 8, 1207?1223. [19] Constantin Carathe?odory, U?ber den Variabilita?tsbereich der Koeffizienten von Potenzreihen, die gegebene Werte nicht annehmen, Math. Ann. 64 (1907), 95? 115. [20] , U?ber den Variabilita?tsbereich der Fourierschen Konstanten von posi- tiven harmonischen Funktionen, Rend. Circ. Mat. Palermo 32 (1911), 193?217. [21] Gustave Choquet, Theory of capacities, Ann. Inst. Fourier 5 (1955), 131?295. [22] , Existence et unicite? des representations inte?grales au moyen des points extre?maux dans co?nes convexes, Se?minaire Bourbaki : anne?es 1956/57 - 1957/58, expose?s 137-168, Se?minaire Bourbaki, no. 4, Socie?te? mathe?matique de France, 1958. [23] Gustave Choquet and Paul-Andre? Meyer, Existence et unicite? des representa- tions inte?grals dans les convexes compacts quelconques, Ann. Inst. Fourier 13 (1963), 139?154. [24] Krzysztof J. Ciosmak, Optimal transport and Choquet theory, 2020. [25] Yohann de Castro and Fabrice Gamboa, Exact reconstruction using Beurling minimal extrapolation, J. Math. Anal. Appl. 395 (2012), no. 1, 336?354. [26] Gaspard de Prony, Essai e?xperimantal et analytique: sur les lois de la dilata- bilite? de fluides e?lastique et sur celles de la force expansive de la vapeur de l?alkool, a? diffe?rentes tempe?ratures, J. E?. polytech. Math. 1 (1795), no. 22, 24?76. 109 [27] Ronald A. DeVore, Nonlinear Approximation, vol. 7, Cambridge University Press, 1998. [28] David A. Edwards, Some applications of Choquet theory to harmonic analysis, J. Lond. Math. Soc. 59 (1999), no. 2, 657?668. [29] Herbert Federer, Geometric Measure Theory, vol. 153, Springer, Berlin, 1969. [30] Carlos Fernandez-Granda, Super-resolution of point sources via convex pro- gramming, Information and Inference 5 (2016), no. 3, 251?303. [31] Robert Fraser and Kyle Hambrook, Explicit Salem sets in Rn, 2019. [32] , Explicit salem sets in Rn, 2020. [33] David Hilbert, U?ber die Darstellung definiter Formen als Summe von Formen- quadraten, Mathem. Annalen 32 (1888), 342?350. [34] Jean-Pierre Kahane, Some Random Series of Functions, Cambridge Studies in Advanced Mathematics, vol. 5, Cambridge University Press, Cambridge, 1993. [35] Jean-Pierre Kahane and Raphael Salem, Ensembles Parfaits et Se?ries Trigonome?triques, vol. 7, Cambridge University Press, 1964. [36] R. Kaufman, On the theorem of jarn??k and besicovitch, Acta Arith. 39 (1981), no. 3, 265?267. [37] Aleksandr Khinchin, Korrelationstheorie der stationa?ren stochastischen Prozesse, Math. Ann. 109 (1934), 604?615. [38] Mark Krein, Sur le proble?me du prolongement des fonctions hermitiennes pos- itives et continues, Dokl. Akad. Nauk SSSR 26 (1940), 17?22. [39] Mark Krein and David Milman, On extreme points of regular convex sets, Studia Math. 9 (1940), no. 1, 133?138. [40] H. Landau, Extrapolating a band-limited function from its samples taken in a finite interval, IEEE Trans. Inf. Theory 32 (1986), 464?470. [41] H. J. Landau, Maximum entropy and the moment problem, Bull. Amer. Mth. Soc. (N.S.) 16 (1987), no. 1, 47 ? 77. [42] Paul Le?vy, The?orie de l?Addition des Variables Ale?atoires, first ed., Gauthier- Villars, Paris, 1937. [43] Xiaodong Li, Shuyang Ling, Thomas Strohmer, and Ke Wei, Rapid, robust, and reliable blind deconvolution via nonconvex optimization, Appl. Comput. Harmon. Anal. 47 (2018), no. 3, 893?934. [44] Ste?phane Mallat and Guoshen Yu, Super-resolution with sparse mixing estima- tors, IEEE Trans. Image Process. 19 (2010), no. 11, 2889?2900. 110 [45] Hassan Mansour, Dehong Liu, Ulugbek S. Kamilov, and Petros T. Boufounos, Sparse blind deconvolution for distributed radar autofocus imaging, 2018. [46] Toru Maruyama, Herglotz-Bochner representation theorem via theory of distri- butions, J. Oper. Res. Soc. Japan 60 (2017), no. 2, 122?135. [47] Pertti Mattila, Fourier Analysis and Hausdorff Dimension, first ed., Cambridge Studies in Advanced Mathematics, vol. 150, Cambridge University Press, Cam- bridge, 2015. [48] Greg Ongie, Sampurna Biswas, and Mathews Jacob, Convex recovery of con- tinuous domain piecewise constant images from nonuniform Fourier samples, IEEETrans. Sitnal Process. 66 (2018), no. 1, 236?250. [49] Greg Ongie and Mathews Jacob, Off-the-grid recovery of piecewise constant images from few Fourier samples, SIAM Journal on Imaging Sciences 9 (2016), 1004?1041. [50] Robert R. Phelps, Lectures on Choquet?s Theorem, second ed., Lecture Notes in Mathematics, Springer, Berlin, 2001. [51] Dmitrii A. Raikov, On the decomposition of Gauss and Poisson laws, Izv. Akad. Nauk. SSSR 2 (1937), 91?124 (russian). [52] Walter Rudin, The extension problem for positive-definite functions, Illinois J. Math. 7 (1963), no. 3, 532?539. [53] , An extension theorem for positive-definite functions, Duke Math. J. 37 (1970), no. 1, 49?53. [54] Tapan K. Sarkar and Odilon Pereira, Using the matrix pencil method to es- timate the parameters of a sum of complex exponentials, IEEE Antennas and Propagation Magazine 37 (1995), no. 1, 48?55. [55] Zolta?n Sasva?ri, The extension problem for measurable positive definite func- tions, Math. Zeitschrift 191 (1986), 475?478. [56] , The extension problem for positive definite functions. A short historical survey, Operator theory and indefinite inner product spaces, Oper. Theory Adv. Appl., vol. 163, Birkha?user, Basel, 2006, pp. 365?379. [57] Thomas Strohmer and Ke Wei, Painless breakups?efficient demixing of low rank matrices, J. Fourier Anal. Appl. 25 (2019), 1?31. [58] Otto Toeplitz, U?ber die Fourier?sche Entwickelung positiver Funktionen, Rend. Circ. Mat. Palermo 32 (1911), 191?192. [59] Michael Unser, Julien Fageot, and John Paul Ward, Splines are universal so- lutions of linear inverse problems with generalized TV regularization, SIAM Review 59 (2017), no. 4, 769?793. 111 [60] David V. Widder, The Laplace Transform, Princeton Mathematical Series, vol. 6, Princeton University Press, Princeton, 1946. 112