ABSTRACT
Title of Dissertation: Uniqueness for continuous superresolution by
means of Choquet theory and geometric
measure theory
Ryan M. Cinoman
Doctor of Philosophy, 2021
Dissertation Directed by: Professor John Benedetto
Department of Mathematics
The problem of superresolution is to recover an element of a vector space from
data much smaller than the dimension of the space, using a prior assumption of
sparsity. A famous example is compressive sensing, where the elements are images
with a large finite resolution. On the other hand, we focus on a continuous form
of superresolution. Given a measure ? on a continuous domain such as the two
dimensional torus, can we recover ? from knowledge of only a finite number of its
Fourier coefficients using a total variation minimization method? We will see that
the answer depends on certain properties of ?. Namely, a necessary condition is
that ? be discrete.
We use methods from geometric analysis to investigate the continuous super-
resolution problem. Tools from measure theory relate properties of the support
of a measure, such as Hausdorff dimension, to properties of its Fourier transform.
We also use measure theory to investigate the possibility of alternatives to total
variation that may allow us to recover surface measures defined on space curves.
There is a theorem of Choquet concerning representations of points in convex
sets as sums of their extreme points. As it turns out, we can apply this to the
superresolution problem because the extreme points of the set of measures with total
variation 1 are precisely the set of delta measures. We consider superresolution as a
convex optimization problem, where the goal is to find representations of the initial
data as sums of delta measures. Choquet theory provides tools to investigate the
previously unresolved problem of uniqueness. We use this to give a novel sufficient
condition for a measure to be uniquely superresolved, given data on a known finite
set of frequencies.
Uniqueness for continuous superresolution by means of Choquet
theory and geometric measure theory
by
Ryan M. Cinoman
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2021
Advisory Committee:
Professor John J. Benedetto, Chair/Advisor
Professor Radu V. Balan
Professor Wojciech Czaja
Professor Matei Machedon
Dean?s Representative:
Professor Alexander Barg
? Copyright by
Ryan M. Cinoman
2021
Acknowledgments
I would like to extend my sincere thanks to my advisor, Dr. John Benedetto.
I deeply appreciate the opportunity he has given me to work on this exciting topic,
as well as the invaluable advice and guidance he has given me over the years.
I?d also like to gratefully acknowledge ARO Grant W911NF-17-1-0014, which
has supported me during my research.
Finally, I?d like to deeply thank my family and friends who have supported
me throughout my research. I?d like to extend a special thanks to my parents, my
boyfriend Lawrence Cho, and Jill Cassone. Without the unwavering support of each
of them, this thesis would not have been possible.
ii
Table of Contents
Acknowledgements ii
Table of Contents iii
Chapter 1: Introduction 1
1.1 Compressive Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Prony?s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Continuous Superresolution . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Resolution of Point Sources and Minimum Separation . . . . . . . . . 9
1.5 Attempts at Superresolving Non-discrete Measures . . . . . . . . . . 10
1.6 Positive Definite Functions . . . . . . . . . . . . . . . . . . . . . . . . 11
1.7 Bochner?s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.7.1 G = Z . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.7.2 G = R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.7.3 Additional Examples and Applications of Bochner?s Theorem 22
1.8 Positive Definite Extensions . . . . . . . . . . . . . . . . . . . . . . . 24
1.8.1 Extensions in Higher Dimensions . . . . . . . . . . . . . . . . 28
1.9 My Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Chapter 2: Measure Theory and Surface Measures 36
2.1 Definitions and Notation . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2 Energy Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3 Cantor Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4 Fourier Analysis Methods . . . . . . . . . . . . . . . . . . . . . . . . 43
2.5 Fourier Dimension and Salem Sets . . . . . . . . . . . . . . . . . . . . 47
2.6 Fourier Decay of Surface Measures . . . . . . . . . . . . . . . . . . . 51
2.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Chapter 3: Choquet Theory 57
3.1 Choquet?s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.2 Application to Completely Monotone Functions . . . . . . . . . . . . 61
3.3 Uniqueness and the Choquet Simplex . . . . . . . . . . . . . . . . . . 67
3.4 Technical Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.4.1 Choquet?s Theorem for Metrizable Spaces . . . . . . . . . . . 73
3.4.2 The Choquet-Meyer Theorem . . . . . . . . . . . . . . . . . . 75
iii
Chapter 4: My Results 86
4.1 Uniqueness of Positive Definite Extensions . . . . . . . . . . . . . . . 86
4.2 Faces of Choquet Simplices . . . . . . . . . . . . . . . . . . . . . . . . 92
4.3 Uniqueness for Signed and Complex Measures . . . . . . . . . . . . . 97
Bibliography 108
iv
Chapter 1: Introduction
1.1 Compressive Sensing
A major motivation for studying superresolution is the success of the so-called
discrete case, known under another name as compressive sensing. The field of com-
pressed sensing is based on a famous set of papers published starting in 2004 written
by Cande?s, Romberg and Tao [14, 15, 16, 17]. The motivating problem is to find
sparse solutions to underdetermined systems of equations. The main results of the
original papers were that: yes, it is possible to recover a k-sparse vector in RN from
only m measurements when m & k ln(eN/k); the algorithm to do so is practically
effecient; and random sensing matrices are satisfactory for this algorithm with high
probability ? 1 ? 2 exp(?Cm) provided the number of measurements is sufficient
as before.
In more formal terms, what we want is a sensing matrix A which is ?short and
fat,? i.e. m ? N where m  N. We denote our sensed data y = Ax, where x is
some k-sparse (or approximately k-sparse) vector in RN , and we want an algorithm
which recovers x# = ?(y), so that the distance from x# to x is small in some sense.
The original work from Cande?s, Romberg and Tao showed that if x is indeed sparse,
1
then we can get not only approximate, but exact recovery from the algorithm:
x# = argmin?z?1 such that Az = y. (1.1)
This can be formulated as a linear programming problem and hence has polynomial
complexity, a vast improvement over the similar ?ideal? optimization problem?
sometimes referred to as `0 optimization,
x# = argmin|supp z| such that Az = y, (1.2)
which is essentially a combinatorial search.
This method is very powerful, and has implications for many applications, as
nearly sparse signals and images are very common in natural data [27]. However
there are a few obstacles which come up, which have been the topic of much study
over the past decades.
In their original papers, Cande?s, Romberg and Tao defined a number of prop-
erties of sensing matrices which are relevant to the compressive sensing problem,
including the Null Space Property (NSP), Uniform Uncertainty Principle (UUP),
Exact Reconstruction Principle (ERP), and Restricted Isometry Property (RIP)
[17]. Some of these have faded into mostly historical significance, but NSP and RIP,
in particular the latter, continue to get attention today.
Definition 1. (Null Space Property) An m?N matrix A is said to have the NSP
of order k if for any ? ? kerA \ {0}, S ? {1, . . . , N} with |S| ? k, ??S?1 <??Sc?1.
2
An equivalent charactarization is to say that the null space of A, considered as
a hyperplane in RN , when placed on a k-sparse face of the `1 unit sphere, is tangent
to it. Hence it is clear that whenever Ax = y has a k-sparse solution, the algorithm
in (1.1) will recover it.
While the NSP provides a good geometric intuition, in practice it is difficult
to work with. Also, although the NSP guarantees exact reconstruction if the signal
?is known? to be sparse, the introduction of even a small amount of error can make?x? x#? very large, which is troublesome for applications. The RIP is introduced
2
to deal with both of these problems.
Definition 2. (Restricted Isometry Property) An m?N matrix is said to have the
RIP of order k with constant ? ? (0, 1) if for any k-sparse x ? RN ,
(1? ?)?x?2 22 <?Ax?2 < (1 + ?)?x?
2
2 .
We say that ?k is the restricted isometry constant of A if ?k is the smallest ? > 0
such that A satisfies RIP of order k.
RIP is more restrictive than NSP, but in return we get much more robustness.
For this reason much research has been done on the RIP and various recovery
algorithms that depend on it in the past 15 years.
Both NSP and RIP represent sufficient conditions for accurate and unique
recovery in (1.70). The goal of this dissertation is to extend the results of compressive
sensing to a continuous domain, the two dimensional torus T2. In section 1.9 I
outline the space and algorithm of interest, and my main results, which are discussed
3
in more detail in Chapter 4. As I will show, in continuous space the question
of uniqueness of the reconstruction is much more of an issue than in the finite
dimensional case. However I will prove both a necessary and a sufficient condition
for unique reconstruction.
1.2 Prony?s Method
There have been some limited attempts to extend compressive sensing into
the continuous domain with good results, but limited generality. An example of a
widely studied problem is the recovery of point masses, where we assume the signal
to be recovered is assumed to be of the form
?M
f(x) = ?x (x). (1.3)k
k=1
Perhaps the most archetypal form of this problem dates back long before compressive
sensing. Prony?s method was known in the 18th century, but its ideas still see some
use today [26, 54].
Prony?s method is an algorithm for resolving discrete signals as in (1.3) on R.
However it is typically formulated on the frequency spectrum as a sampling problem.
The goal is to recover a signal f? ? L?(R) which is a finite sum of sinusoids,
?M
f?(t) = B ei?ktk (1.4)
k=1
4
from N = 2M evenly spaced samples,
?M
f?(?n) = B ei?k?nk , (1.5)
k=1
where ? > 0 is the fixed distance between samples.
The algorithm comes from recognizing two key relationships. First, since f?
is the sum of M exponentials, it is the solution to a particular linear difference
equation of M variables. That is to say we have
?M
f?(?n) = Pkf?(?(n? k)). (1.6)
k=1
Then the frequencies ?k are related to this equation by the roots of the characteristic
polynomial, ?M
zM ? P zM?11 ? ? ? ? ? PM = (z ? ei?k). (1.7)
k=1
The process of Prony?s method consists of first calculating the coefficients Pk
using (1.6),
?? ?? ? ? ???? FM?1 F1 ? ? ? F0 ???????? P1 ???? ?? ?
FM ??
??? FM FM?1 F1 ?????? P2 ?? ??
????? ?
?
FM+1 ??
? = ? , (1.8)? .. . . .. ?. . . ??????? .. ???? ?. ??? .. ?? ?? ? ? . ????
FN?1 ? ? ? FN?M?1 Pk FN
where Fk = f?(?k). Second, factor the polynomial in (1.7) to find the frequencies
5
?k. And finally the magnitudes Bk are found by another linear equation,
?? ?? ? ? ?
????
1 1 ? ? ? 1 ???????? B1 ???? ?? F1 ???? e?1? e?2? e?M?? ????????
?? ??
B2 ???? ?? ?
F2 ??
? = ? ?? .. . . . . ? . ?
. (1.9)
? . .
.. ???????? .
. ???? ???
.
? . ????
e?1?M ? ? ? e?M?M Bk FM
Prony?s method works on two conditions: first, that the number of samples is
at least twice the number of sinusoids; and second, that the sampling rate is at least
the Nyquist frequency. The second point is slightly more subtle, but it is because
during the polynomial factorization step we only solve for e?k , not the frequencies
themselves, hence they are non-unique up to a factor of 2??.
Prony?s method is seen in a broad variety of applications. Identifying and
demixing sinusoidal signals is useful in many signal processing applications, includ-
ing electromagnetics and antenna engineering [54]. It is frequently seen in the re-
lated, slightly more modern algorithms called matrix pencils, for finding generalized
eigenvalues of matrix operators, in a manner similar to finding the frequencies of
the sinusoids above [54].
1.3 Continuous Superresolution
Use of the total variation norm for basis pursuit in the continuous domain is
very recent, inspired by the massive success of compressive sensing in the discrete
case. The first work was from de Castro and Gamboa [25], who used Beurling?s
6
theory of minimal extrapolations to generalize some of the concepts of the NSP and
random sensing matrices from compressive sensing. Notably they showed that for a
sensing apparatus defined from a family of functions F = (u )Kk k=1, there must be a
solution to the optimization problem
?# = argmin???TV such that ?uk, ?? = yk, 1 ? k ? K, (1.10)
??M[0,1]
that?is supported in a finite set, which are the roots to a ?generalized polynomial?
1? k ckuk. Under certain conditions on the family F , these solutions may also be
unique.
The use of Beurling?s minimal extrapolations were expounded in a different
way by Benedetto and Li [6], who studied the case of F a family of complex ex-
ponentials. This is equivalent to having prior knowledge of the Fourier transform.
In this case the zero sets of ?generalized polynomials? are roots of trigonometric
polynomials.
The contribution of Benedetto and Li was giving quite sharp results for one
particular situation [6]. Let ? ? M(Tn) and our prior knowledge be f?(m) for
m ? ? ? Zn, not necessarily a rectangular box. We consider the set of minimal
extrapolations as follows. If
 = (f) = inf{???TV | ? ?M(T
2), ?m ? ? ??(m) = f?(m)}, (1.11)
then the set of minimal extrapolations is the set of extensions of f? which achieve
7
the upper bound:
E = {? ?M(Tn) | ?m ? ? ??(m) = f?(m) and ???TV is minimal}. (1.12)
They divide into three possible situations, based on the observation that for all
m ? ?, |f?(m)| ? . Define ? = {m ? ? | |f?(m)| = ???}TV . The cases are whether
|?| is 0, 1 or greater than 1. In other words, how frequently does |f? | achieve its
maximal value and does ?f??? =???TV ?
At first glance these distinctions may be arbitrary, but we will see in section
1.6, particularly in the work of Carathe?odory on positive definite functions [19], that
in the case of positive definite extensions, sequences which achieve their maxima
represent points on the boundary of the convex set of positive definite functions.
Similarly here, functions f : ?? C will fall on the boundary of the set of functions
f with (f) = . This is not insignificant; we can relate this property back to the
NSP from compressive sensing?in the context of compressive sensing, a signal is
sparse if it is in the boundary of a face of the `1 ball, which is key to why compressive
sensing works.
The result is if |?| =6 1, then we have results as Castro and Gamboa, that
solutions must fall within the roots of a trigonometric polynomial. If |?| ? 2, we
have even better, that this level set must take the form of a finite lattice, making
the calculation of the results very straightforward, and unique for sufficiently large
?.
8
1.4 Resolution of Point Sources and Minimum Separation
Other work on point superresolution was done by Fernandez-Granda, et al.,
who have developed a nice investigation on the relationship of the recovery problem
and the uncertainty principle [12, 13, 30]. One disadvantage of moving to continuous
space is that there is no longer a straightforward relationship between the sparsity
and the fineness of our signal. In other words, if we only have prior knowledge up to
a finite band limit, then we can not be expected to resolve signals which have details
which are arbitrarily fine. It is easy to construct examples which are Fernandez-
Granda approaches this by adding a constraint on the minimum separation of the
signal. If ? ?M(R) and supp? = S, the minimum separation of ? is
? ?
?(?) = inf ?t? t?? . (1.13)
t=6 t??S
In [13], Cande?s and Fernandez-Granda show that minimum separation re-
quirements are enough to find uniqueness. Remarkably, this doesn?t require any
modification of the search algorithm. Standard basis pursuit is enough to find these
discrete signals, even though it has no ?knowledge? of the minimum separation.
Their main theorem is as follows.
Theorem 1. Let ? ?M(R) satisfy (1.13) with ?(?) ? D. If the values of ??(k) are
known for |k| ? fc, then if fc ? 128 and
2
?(?) ? , (1.14)
fc
9
then ? is the unique solution to the basis pursuit algorithm,
?# = argmin???TV such that ??(k) = yk |k| ? fc. (1.15)
??M[0,1]
The same result holds with a less restrictive constant for higher dimensions
Rn as well, and it also holds up to stability in the face of noisy measurements. Later
work from Fernandez-Granda has improved upon these constants, going as low as
1.26 in the 1-dimensional case [30].
1.5 Attempts at Superresolving Non-discrete Measures
There are only a few results that attempt to perform signal recovery on signals
that are not discrete. They tend to lean strongly on the works prior described. A
generalization from Unser, et al. [59] lets us perform superresolution on classes of
splines, functions which behave like discrete functions when applied to a differential
operator. He shows that splines, rather than deltas, are the archetypal solution for
a slightly altered problem,
x# = argmin?Lx?1 such that A(x) = y, (1.16)
x?X
where L is a differential operator, X = {x ? BV (Rn) | Lx ?M(Rn)}, and A : X ?
Rm is a linear measurement function. The theory is very similar to the above work,
but it demonstrates the flexibility of the theory nicely.
Another creative approach was by Ongie [48, 49], who looked to generalize
10
Prony?s method to higher dimensions. Rather than starting with a mix of sinusoids,
they recognize a key step of Prony?s method is to factor the trigonometric polynomial
in (1.7). Let ?N
?(x) = ei?k?x, (1.17)
k=1
be a trigonometric polynomial in on R2. Let f ? L?(R2) be of the form
?M
f = ?k?U , (1.18)k
k=1
where Uk are disjoint open subsets of R2 and ? 2kUk = {x ? R | ?(x) =6 0}. Then
by recognizing that ?Df ? 0, it is possible to perform a similar calculation to
the 1-dimensional case and recover the support of f by factoring a trigonometric
polynomial. Bezout?s theorem puts a bound on the number of distinct Uk, so a
system of equations finds the weights ?k, and we have theoretically guaranteed
recovery.
1.6 Positive Definite Functions
Carathe?odory began studying positive definite functions in the early 20th cen-
tury, at around the same time he was doing work on convex geometry. The context
in which such functions were discovered was in complex function theory. The con-
nection is not immediately apparent, but as we will see there are deep similarities.
11
There we say that a positive definite function on the unit disc is one of the form
??
f(z) = 1 + (ak + ibk)z
k, (1.19)
k=1
which is analytic and has positive real part. Remarkably, Carathe?odory gave a
characterization of these functions that as we will see is useful for the problem we are
interested in today [19]. He said that such a function is positive definite if and only if
for each n ? N the coordinates (a1, b1, a2, ? ? ? , an, bn) fall in the convex hull generated
by the curve (cos(?), sin(?), cos(2?) ? ? ? , cos(n?), sin(n?)), for 0 ? ? ? 2?. Call
this set Qn, and say that a point in Qn represents a function f if f has those
coefficients for 0 ? k ? n.
The main line of his proof contains many of the same beats as the other results
in this paper. He recognizes the convexity of the set of positive definite functions
P . He identifies that the interior of P is nonempty, or in other words there are
functions f for which an analytic neighborhood of f falls within P . Therefore Qn
cannot fall in a lower-dimensional hyperplane of R2n. And then he proves using
complex analytic methods that the boundary of P are all rational functions, which
are generated by those of the form
1
f(z) = |?| = 1, (1.20)
1? ?z
which are represented by points on the curve (cos(?), sin(?), cos(2?), ? ? ? ,
cos(n?), sin(n?)), 0 ? ? ? 2?.
12
Toeplitz [58] later showed that an equivalent characterization of sequences
(a1, b1, ? ? ? ) is that ?n
ck?lzkzl ? 0 n ? N, zk ? C, (1.21)
k,l=1
where c0 = 2, ck = ak ? ibk, d?k = dk. This definition is the one that was used
commonly later in the century through today, and as we will see it has inspired a
host of problems of this form in more generality and under various conditions [56].
Indeed, as we will see it is quite relevant to the total variation optimization problem
in different settings as well.
Definition 3. Given a group G and a symmetric subset V ? G, with 0 ? V, we say
f : V ? C is positive definite if
?n
f(x x?1k l )cicj ? 0 (1.22)
k,l=1
for all sequences ? = (x )nj j=1 such that ?? ? ? V , and sequences (c )nj j=1 ? C.
For G = Z we get the definition above studied by Carathe?odory and Toeplitz,
and for G = R we get a situation that has been studied extensively, and which has
particular interest in harmonic analysis because of Bochner?s theorem, which relates
them to the Fourier transform of positive measures.
Theorem 2. (Bochner) For any locally compact abelian group G with dual group
G?, a function f : G ? C is positive definite if and only if there exists a unique
13
nonnegative measure ? ?M(G?) such that
?
f(x) = e2?ix? d?(?). (1.23)
G?
The result is named after Salomon Bochner, who proved it first in 1932 for
the case G = R, and later generalized his result to higher dimensions [8, 9, 56].
Although, Bochner?s result was actually inspired by Herglotz, who proved the theo-
rem for the case G = Z in 1911. I will give details on both Herglotz and Bochner?s
Theorems in the following chapter, but first I will give a few other properties of
positive definite functions.
Proposition 1. Let G be a locally compact abelian group, let V = ?? ? ? G and
let f : V ? C be a positive definite function.
1. f(0) ? 0
2. f(?x) = f(x)
? ?
3. ?f(x)? ? f(0)
These are all easily verified from the definitions above, and are standard in
the literature [56]. The following is a standard but slightly less trivial result which
was first shown by Artjomenko [2].
Proposition 2. If f is positive definite on R and continuous at 0 then it is uniformly
continuous.
Proof. Let |x? y| < ?. Choosing ? = {x, y, 0}, it follows from the definition of
14
positive definiteness that for any choice of {cx, cy, c0} ? C,
? ? [ ]
(|c0|2
2
+|c |2 +?c ?x y )f(0) + 2 Re cxc0f(x) + cyc0f(y) + cxcyf(x? y) ? 0. (1.24)
? ?
With the cho?ice c0 = 1, cx?= ?cy = ?f(x)? f(y)? /(f(x)? f(y)), and using the fact
that 2f(0) ? ?f(x)? f(y)?, we get
? ?
3f(0)? 2 Re f(x? y) ? 2?f(x)? f(y)?? ? (1.25)3
2(f(0)? Re f(x? y)) ? ?f(x)? f(y)? . (1.26)
2
Hence we have a uniform bound on the continuity of f .
Other regularity results follow from positive definiteness as well. In a similar
manner to the previous result, it is possible to bound the derivatives of a positive
definite function.
Theorem 3. If for some k ? N a positive definite function f defined on R is 2k-times
differentiable at 0, then f is 2k-times differentiable on R as well.
The following on analytic positive definite functions was proven by each of
Le?vy and Raikov independently [42, 51].
Theorem 4. If f is analytic and positive definite in a symmetric interval (?a, a),
then there exist positive constants ?, ? ? (0,?] such that f can be extended to a
function holomorphic on the horizontal strip
{z ? C | ?? < Im z < ?}. (1.27)
15
And finally, more recently Sasva?ri has proven a similar result for measurability
[55].
Theorem 5. If f is positive definite, and it is measurable on a symmetric interval
(?A,A), then it is measurable on all of R.
1.7 Bochner?s Theorem
I will not prove the theorem in full generality but I will provide details for the
two most relevant cases, G = Z (this case is often referred to as Herglotz?s theorem)
and G = R. The proofs given are not performed as in Bochner?s and Herglotz?s
original papers but are updated using distribution theory. The proofs as given are
based on a review from Maruyama [46]. Before I proceed to the proofs, I will make
a short statement on distribution theory that is necessary for them.
We will use distribution theory somewhat loosely in the upcoming section
for the purpose of being as general as possible while keeping the use of notation
reasonable. We will refer to S(G) as a set of test functions appropriate for G. For
example if G = Rn then the nat?ural choice is the Schwartz functions. For G = T
n we
take the periodic summations n ?(x+ nT ) of Schwartz functions. The particular
choice for other groups G is not important, as the properties that are necessary will
be clear in the proofs of the following results, and we will focus our results in this
paper on the two cases above. Given our space of test functions S(G), S ?(G) is the
set of distributions. We can extend the Fourier transform to S ?(G) in a standard
16
way by Parseval?s formula. For T ? S ?(G), FT is defined on F(S(G)) by
FT (F?) = T (?) ?? ? S(G). (1.28)
Lemma 6. Let T ? S ?(G) be a distribution. Assume for all ? ? S(G) with ? ? 0,
T (?) ? 0. Then T is a positive measure on G.
Proof. Assume that T is a positive distribution, as given above. T is a measure if
and only if it is a positive linear functional on C0(G). Because S(G) is dense in
C0(G), and T is positive, all that is necessary is to show that it is continuous with
respect to the supremum norm. Assume to the contrary, that there is a sequence
(?k)
n
k=0 of positive functions in S(G) that is uniformly bounded by a constant M ,
but for which T (?k) ? ?. Assume without loss of generality as well that each
?k has compact support. We lose no generality from doing this because compactly
supported functions are dense in both S(G) and C0(G). Define a sequence of smooth
compactly supported bump functions ?k ? S(G) such that for all k, 0 ? ?k ? 1,
?k ? 1 on the support of ?k, and all derivatives of ?k are uniformly bounded (Note
that this construction is valid for Schwartz functions).
Now see that T (?k?k) = T (?k)??. But since 0 ? ?k ?M,
?k(M ? ?k) ? 0 ?k ? N. (1.29)
Because the derivatives of ?k are uniformly bounded we have that
17
?
sup? ?T (M?)? <?, so we can conclude that
lim T (?k(M ? ?k)) = lim T (M?k)? T (?k?k) = ??. (1.30)
k?? k??
But this result conflicts with (1.29). Hence T is a bounded linear functional on
C0(G), and therefore is identified with a measure ? ?M(G) by the Riesz represen-
tation theorem.
The following proofs are from [46].
1.7.1 G = Z
Proof. (Herglotz) First, let ? ?M(T) be positive and
? 1
f(n) = e2?in? d?(?) n ? Z. (1.31)
0
Let (ck)
n
k=1 ? C and let 1 ? k < l ? n. For all x ? T we can write
?
e2?i(a?b)xc c = e2?i(k?l)xc c + e2?i(l?k)xa b k l ckcl +|ck|2 +|cl|2 (1.32)
a,b={l,k}
= 2 Re(e2?i(k?l)xckcl) +|ck|2 +|c 2l| (1.33)
?|ck|2 +|c 2l| ? 2|ck||ck| (1.34)
? 0. (1.35)
18
We can rewrite (1.22) by exchanging the order of integration:
?n ? 1 ? 1?n ?n ?
c c e2?i(k?l)?k l d?(?) = e
2?i(a?b)?cacb d?(?), (1.36)
k,l=1 0 0 k=1 l=k a,b={l,k}
which as shown in (1.32) is nonnegative.
We have shown that the Fourier transform of a positive measure is positive
definite. Now we must show the conve?rse. L?et f : Z ? C be positive definite. By
one of the properties in proposition 1, ?f(n)? ? f(0) for all n ? Z. This is enough
to demonstrate that f = FT for some distribution T ? S ?(G). Note that because
e?2?inx ? S(T), we can quickly calculate that f(n) = T (e2?inx). We want to show
that T is positive.
Let q be a trigonometric polynomial of order N ,
?N
q(x) = c e2?inxk . (1.37)
n=?N
We can write ?? ?q(x)? ?N2 = c 2?i(k?l)xkcle , (1.38)
k,l=?N
and then we get the formula
(?? ?? ) ( ( )) ?N2T q(x) = FT F |q(x)|2 = f(l ? k)ckcl ? 0. (1.39)
k,l=?N
As the trigonometric polynomials are dense in S(T), we can say then that for
generic q ? S(T), T (|q(x)|2) ? 0. So for arbitrary p ? S(T) such that p ? 0, choose
19
?
q(x) = p(x) and it immediately follows that T (p) ? 0.
By lemma 6, we can conclude that T is a positive measure on T, and the proof
is complete.
1.7.2 G = R
Theorem 7. A function f : R? ? C is the Fourier transform of a finite positive
measure ? ?M(R?) if and only if it is positive definite and continuous at 0.
Proof. The proof begins similar to that of Herglotz? theorem. For ? ?M(T) positive
and ?
f(x) = e2?ix? d?(?), (1.40)
R
we must show that (1.22) holds for any choice of (ck)
n
k=1 ? C, and any (xk)nk=1 ? R.
Write
?n ? ? ?n ?n ?
c c e?2?i(xk?xl)? d?(?) = e2?i(a?b)?k l cacb d?(?). (1.41)
k,l=1 R R k=1 l=k a,b={l,k}
The rightmost sum is positive as shown before, so the function ?? is positive definite.
Note that we may justify changing the order of integration because ? is finite. By
the same reasoning we may differentiate ?? under the integral sign to show that it is
differentiable and therefore continuous.
? ?Now assume that f : R?? C is continuous and positive definite. We have that?f(x)? ? f(0) <? because it is positive definite. So f ? S ?(R?) and we can identify
a distribution T such that f? = T ? S ?(R). We must show that T is positive.
20
Let ? ? S(R?) be a test function. Then it follows from (1.22) that
??
f(x? y)?(x)?(y) dx dy ? 0. (1.42)
R?2
If this is not clear, then consider approximating the integral by a Riemann sum,
which will be of the form in (1.22) irrespective of the choice of partition or ?.
Through a change of variable we can write
??
f(y)?(x)?(x? y) dx dy ? 0. (1.43)
R?2
?
Let ?(x) = ?(x)?(x? y) dy, and note that ?f,?? ? 0. Calculate the following for
??:
?
??(x) = 2?ix??R???(?)e d?
= ?? ?(?)?(? ? ?) d?e
2?ix? d?
R?2
= ? ?(?)?(?)e
2?ix?e?2?ix?? d? d?R?2
= ?(?)e2?ix?? ? d? ?(?)e
?2?ix? d?
? R?? ?
R?
= ??(x)?2 .
W?e can conclude that if p(x) ? S(R) with p(x) ? 0, then by choosing ??(x) =
p(x) ? S(R), T (p) = T (|??|2) = ?f,?? ? 0. Hence T is a positive functional, and
21
by lemma 6 there exists a unique finite positive measure ? ?M(R) such that
?
T (?) = ? d? ?? ? S(R). (1.44)
R
1.7.3 Additional Examples and Applications of Bochner?s Theorem
Notably, there are not many examples of functions which verify Bochner?s
theorem through an easy computation. One of the few is f(x) = e2?i?x = F??, for
? ? G? For (c )nk k=1 ? C and (xk)nk=1 ? G,
?n ?n ?n ?
c c e2?i?(xk?xl) 2?i?(a?b)k l = cacbe ? 0. (1.45)
k,l=1 k=1 l=k a,b={l,k}
This is suggestive of Carathe?odory?s work on holomorphic functions on the
disc. Recall that his characterization of positive definite functions identified that
a certain set of rational functions form the boundary for the convex set of positive
definite functions [19]. Then Toeplitz made the connection between holomorphic
functions ??
f(z) = c kkz , (1.46)
k=0
and positive definite sequences (ck)
?
k=0 ??C. Formally we can write that on the
boundary of the unit circle, f(e?2?i?) = c e?2?ik?k = F(ck)(?). And see that if
22
c = e2?i?kk as in (1.45), then f(z) is a rational function of the form
1
f(z) = , (1.47)
1? ?z
which as we?ve seen generate the set of positive definite functions as their convex
hull.
The theoretical interest in Bochner?s theorem is obvious, as it gives a char-
acterization of an important class of Fourier transforms. But it also has yielded
more applications through its life. A classical example due to Khintchin comes from
probability theory of continuous random processes [37].
Let (?, ?,P) be a probability space, and Xt(?) : R ? ? ? C be a stochastic
process. That is to say X is measurable in ?? L and for all t ? R, Xt is a random
variable on ?. X is said to be second order stationary if second moments exist,
and both the expectation E(Xt) and covariance Cov(Xt, Xt+? ) do not depend on t.
Define the covariance function
C(?) = Cov(Xt, Xt+? ). (1.48)
As it turns out, C(?) is a positive definite function. In fact, since Cov(?, ?) forms an
23
inner product, we can calculate for any pair (s, t), and complex numbers (cs, ct),
?
cacb Cov(Xa, Xb)
a,b={s,t}
= |c 2s| Cov(X 2a, Xa) + 2 Re csct Cov(Xa, Xb) +|ct| Cov(Xb, Xb)
? 0.
So for any collection (xk)
n
k=1 ? R, (c )nk k=1 ? C, we have
?n ?n ?n ?
ckclC(xk ? xl) = cacbC(xa ? xb)
k,l=1 ?k=1?l=k a,bn n ?={k,l}
= cacb Cov(Xx , Xxa)b
k=1 l=k a,b={k,l}
? 0.
By application of Bochner?s theorem, we have that for any such X, there exists a
measure ? ?M(R) such that
?
C(?) = e?2?it? d?(t). (1.49)
R
1.8 Positive Definite Extensions
What we call the positive definite extension problem was first posed explicitly
by Krein in 1940 [38]. For a locally compact abelian group G and symmetric subset
V , as in definition 3, denote the set of positive definite functions on V by P (V ).
24
The question is, can we extend every function f ? P (V ) to a function in P (G)?
The case G = Z has the characterization from Carathe?odory [19], which as
we?ve seen predates Krein?s question by thirty years. For Vn = {k ? Z | |k| ? n}, he
gave a characterization of those sequences in P (Vn) which have extenstions to P (Z).
Recall that the point (a?1, b1, ? ? ? , an, bn) is said to be the geometric representative of
a function f(z) = 1 + k akcos(2?k) + bksin(2?k) if and only if it falls within the
convex hull Qn of the curve defined by
(cos(2??), sin(2??), ? ? ? , cos(2n??), sin(2n??)), 0 ? ? < 2?. (1.50)
Krein?s problem is related by a simple change of variables. In fact in a later paper
Carathe?odory showed that the two sets coincide [20].
Theorem 8. (Carathe?odory) Every positive definite function on Vn can be extended
to a positive definite function on Z.
Proof. A function f defined on Vn is positive definite if and only if the matrix
A = (a nij)i,j=0, given by aij = f(i ? j), is positive definite, i.e. for all c ? Cn+1,
c?Ac ? 0. Consider for some z ? C the matrix formed by extending A by one row
25
and column in the following way:
?? ????? f(0) f(1) ? ? ? f(n) z? ?????? f(1) f(0) ? ? ? f(n? 1) f(n) ???
A = ??? .. . . ?z ? .
. . .. ???? (1.51)
???? f(n) f(n? 1) ? ? ? f(0) f(1) ????
z f(n) ? ? ? f(1) f(0)
Az is a Hermitian matrix so we can say a few things. First, that it has a uni-
tary decomposition, and therefore n+ 1 distinct real eigenvalues (? )n+1i i=0 (including
multiplicity) and orthogonal eigenvectors (vi)n+1i=0 .
We know by assumption that for c = (c )n+1 n+2 ?i i=0 ? C , if cn+?1 = 0, then c Azc ?
0. Consider the n + 1-dimensional subspace K = {c ? Cn+2 | i civin+1 = 0}. For
all c ? K, we have
(? )T (n+1 ? )n+1 ?n+1
c iiv Az c
i
iv = |c 2i| ?i ? 0. (1.52)
i=0 i=0 i=0
Because this is true for c in an (n+ 1)-dimensional space, we can conclude that at
least n+ 1 of the eigenvalues ?i are nonnegative. Since Az is positive definite if and
only if each of its eigenvalues are nonnegative, it follows t?hat since at most one ?i
is nonpositive, Az is positive definite if and only if |Az| = ?i ? 0.
The problem then reduces to solving a polynomial inequality |Az| ? 0, which
26
can be computed to have a solution of the form
|z ? z0| ? r (1.53)
for some z0 ? C, r ? 0.
We can continue this process for all n to inductively show that an extension
exists to P (R).
In Krein?s paper, he proved the logical next step for the problem by extending
such a result to G = R [38].
Theorem 9. (Krein) Any continuous positive definite function f on an interval
(?A,A) can be extended to a positive definite function on R.
I?ll give a sketch of the proof courtesy of Sasva?ri [56]. Let LA be the set of
functions with the form ?n
?(t) = c e2?ixktk , (1.54)
k=1
where ck ? C and ?A < xk < A for all k. Then define the functional ?: LA ? C
by ?n
?(?) = ckf(xk). (1.55)
k=1
It follows that ? is a nonnegative functional, and can therefore be extended contin-
uously to a nonnegative functional on L?(R). Then by setting F (x) = ?(e2?ixt), it
27
can be shown that f(x) = F (x) on (?A,A), and that
? ??n ???? ?n ?? ?2?
ckclF (xk ? xl) = ????? c 2?ixkt?ke ?? ? ? 0, (1.56)k,l=1 k=1
for any collections ck ? C, xk ? R.
1.8.1 Extensions in Higher Dimensions
When we go to higher dimensions the situation quickly becomes much more
complicated. Consider the simple situation where G = Zn, n ? 2. Ten years later
it was proven by Caldero?n and Pepinsky that a similar extension is not possible.
Theorem 10. For n > 1, there exist positive definite functions defined on
{(k1, ? ? ? , kn) ? Zn | |ki| ? M} such that they cannot be extended to positive
definite functions on Zn.
And another decade later it was Rudin who expounded and extended their
result to Rn [52].
Theorem 11. (Rudin) For n > 1, there exist continuous positive definite functions
f : In ? C, where I = (?a, a) is a symmetric interval, such that f cannot be
extended to a positive definite function on Rn.
The proof is based on a relationship between positive definite functions and
sums of squares of polynomials, a problem that had been studied by Hilbert in the
19th century. Hilbert discovered that there exist nonnegative polynomials of three
28
variables that are not sums of squares, and later examples for n = 2 were found as
well [33].
Theorem 12. (Hilbert) For a given n and d ? N, every nonnegative polynomial of
degree d in n variables can be expressed as a finite sum of squared polynomials if
and only if one of the following are true:
? n = 1
? d = 2
? n = 2 and d = 4.
Given a finite set ? ? Zn, define a subset of the trigonometric polynomi-
als which are supported on V = ? ? ?. X? will be the set of all trigonometric
polynomials of the form ?
f(?) = c(x)e?2?i??x. (1.57)
x?V
We will call these ?-polynomials. It is obvious that the coefficcients c(x) can be
retrieved from the Fourier transform of f . Define P? as the subset of X? of nonneg-
ative trigonometric polynomials. Then let Q? be the set of sums of squares?that
is functions of the form
?J ?? ?2f(?) = gj(?)? , gj ? X?. (1.58)
j=1
Immediately we have that Q? ? P? ? X?.
The goal is to establish a correspondence between Q?, P? and the positive def-
29
initeness of c(?). Then Hilbert?s result will tell us when positive definite extensions
are possible. With that in mind refer to PD(V ) and PD(Zn) as the set of positive
definite functions on V and Zn respectively. Given a function ? defined on V such
that ?(?x) = ?(x), we can define a real functional on X? by
?
L?(f) = ?(x)f?(x). (1.59)
x?V
In fact, since X? is finite dimensional, it is easy to see that every linear functional
on X? is of this form for some ?. The following two results establish the connection
to positive definite sequences.
Proposition 3. A function ? on V is in PD(?) if and only if L?(f) ? 0 for every
f ? Q?.
Proof. Let ?
g(?) = c(x)e?2?i??x. (1.60)
x??
Then we have
?? ?? ?2g(?) = c(x)c(y)e?2?i??(x?y)
x?,y??
L (|g|2? ) = c(x)c(y)?(x? y).
x,y??
So L (|g|2? ) is positive regardless of the choice of g if and only if ? is positive definite.
By the linearity of L?, the same is true for any f ? Q?.
Proposition 4. A function ? defined on V can be extended to a member of PD(Zn)
30
if and only if L?(f) ? for all f ? P?.
Proof. If ? ? PD(Zn) then by Bochner?s theorem there is a nonnegative measure
? ?M(Tn) such that ?
?(x) = e2?i??x d?(?). (1.61)
Tn
If f ? X? then by Parseval?s formula we have
? ?
L?(f) = f?(x)?(x) = f d?. (1.62)
Tnx?V
Then L?(f) ? 0 if f ? P?.
For the other implication, suppose L?(f) ? 0 for all f ? P?. Without loss of
generality assume |f | ? 1. Because L?(1) = ?(0) we have? that L? ?(f) = L?(1? f)?
?(0) ? ??(0) and L?(f) = ?(0)? L?(1? f) ? ?(0), so ?L?(f)? ? ?(0). If ?(0) = 0
then L? = 0 on X? and ? = 0 on V , and the result is true.
Otherwise without loss of ?gene?rality assume that ?(0) = 1. Then L? is a
bounded linear functional with ?L ?? = 1, and by the Hahn-Banach theorem it
extends to a linear functional of norm 1 on C(Tn), which is identified with a measure
? ?M(Tn). We have that for f ? X?,
?
L?(f) = f(??) d?(?). (1.63)
Tn
Because 1 = L?(1) = ?(Tn) ? ???TV = 1, we can conclude that ? ? 0. We have
31
that ?
?(x) = L (e2?i??x? ) = e
2?i??x d?(?), (1.64)
Tn
so ?? is an extension of ? to Zn. By Bochner?s theorem it is positive definite, so the
result is proved.
Then we complete this process by coming to the following result.
Theorem 13. (Rudin) Given a finite set ? ? Zn, P? = Q? if and only if every
? ? PD(?) can be extended to a function in PD(Zn).
Proof. The proof will be in two parts. First we must show that Q? is closed in P?,
and then we go on to prove the result. ?
Say that dimX = |V | = d. Let r > d, and f = r? i=1|g |
2
i , where gi ? X?.
There is a nontrivial set of (?)ri=1 ? R such that
?r
? |g |2i i = 0. (1.65)
i=1
Assume without loss of generality that ?i ? ?j whenever i ? j. By solving (1.65)
for |g |2r , we get that ?r?1 ( )
f = 1? ?i |gi|2 . (1.66)
?
i=1 r
So f is a sum of r?1 squares. By repeating this process iteratively, we can conclude
that each f ? Q? is a sum of at most d squares.
Now find a sequence of (fn)
?
n=1 ? Q? such that fn ? f ? X? uniformly. For
32
each fn there are d ?-polynomials (gjn)
d
j=1 such that
?d ?? ??2fn = gjn . (1.67)
j=1
Each fn is uniformly bounded, hence gjn are also. We can conclude that there is a
subsequence (ni)
?
i=1 such that for each 1 ? j ? d and for all x ? ?, there exist cj(x)
such that
lim g?jn (x) = cj(x). (1.68)
?? ii
Then it is clear that for ?
g (?) = c (x)e?2?i??xj j , (1.69)
? ? ? x??2
f = ? ?j gj . We conclude that f ? Q?, thus Q? is a closed subset of P?.
Now we will proceed to the second part of the proof. Assume that P? = Q?.
Let ? ? PD(?). We know from proposition 4 that L?(f) ? for all f ? Q?. Since
Q? = P?, we have also L?(f) ? 0 for f ? P?, so by proposition 4, ? can be extended
to a function in PD(Zn).
For the other direction, assume that P? 6= Q?. Then there exists some f0 ? P?
such that f0 ?/ Q?. Because Q? is a closed convex set, the Hahn-Banach theorem
guarantees the existence of a hyperplane in X? that separates Q? from f0. In other
words there is a bounded linear functional L such that L(Q?) ? 0 and L(f0) < 0.
Because X? is finite dimensional, there exists a function ? on V such that L = L?.
By propositions 3 and 4, we can conclude that ? ? PD(?) but can?t be extended to
a function in PD(Zn). The proof is complete.
33
Rudin contributed once more in 1970 to extend the one-dimensional theorem
to one for radially symmetric functions [53].
Theorem 14. (Rudin) If f is positive definite on B(0, r) ? Rn and radially sym-
metric, that is f(x) = ?(|x|), then f can be extended to a positive definite radially
symmetric function on Rn.
1.9 My Contributions
The remainder of this dissertation will be focused on the following algorithm:
?# = argmin???TV such that ??(m,n) = ymn, (1.70)
??M(T2)
?? is the Fourier transform and ymn is the a priori known values of the Fourier
transform, for ?N ? m,n ? N. In Chapters 2 and 3, I will introduce concepts in
measure theory and Choquet theory respectively to address the motivating question:
which measures can be recovered uniquely through (1.70)?
We have seen some results that give unique recovery for certain kinds of dis-
crete measures, so in Chapter 2 we create tools to try and recover measures which
are not discrete, focusing on measures defined on smooth manifolds. There are dif-
ficult limitations to this task, though we generate ideas that may be fruitful with
future research. But the difficulties provide motivation not only to focus in on the
relationship between delta measures and (1.70), but if possible to try and represent
all solutions in terms of discrete measures.
34
Chapter 3 discusses a field of convex geometry called Choquet theory, which
is concerned precisely with representation problems of this type. We will see that
delta measures form the so-called extreme points of the total variation norm, and
as such are well suited to be solutions to the algorithm in (1.70).
In Chapter 4 we use results from Choquet theory to state our main results. Our
Theorem 30 says that if we restrict to positive solutions, any initial data y admits at
least one solution ?# which is discrete, and thus our only hope for unique recovery
can come from discrete measures. Theorem 31 generalizes this result to signed and
complex measures. Then our Theorem 33 gives a novel sufficient condition on the
initial data y and integers N such that (1.70) yields a unique result.
35
Chapter 2: Measure Theory and Surface Measures
2.1 Definitions and Notation
One thing we are interested in is whether it is possible to find application to
a broader category of measures than discrete ones. For example, is there a circum-
stance in which we can use total variation methods to recover a surface measure?
Or perhaps a singular measure like the Cantor measure? As it turns out, there are
some fundamental difficulties inherent in the geometry of algorithm (1.70) which
make these problems difficult. In order to explore these questions more fully will
require a deeper understanding of measure theory.
First some preliminaries on geometric measure theory. The field was developed
in the mid 20th century primarily to work on problems related to energy-minimizing
surfaces. Early pioneers include Wendell Fleming and Herbert Federer, the latter of
whose textbook remains a fundamental source on the field today [29]. As the theory
developed it found application to many more problems in both analysis and geometry
and has become a staple of the mathematical landscape. Fundamental concepts
include rectifiable sets, which are generalizations of sets witch admit tangent spaces,
and integral currents, which generalize the ideas of manifolds [47].
Fourier analysis has played an ever increasing role in the research of modern
36
geometric analysts, such as Kenig and Toro, Hofmann, Mitrea and Taylor, Hofmann,
Martell and Uriarte-Tuero [47]. Two excellent sources on the development of Fourier
analysis in geometric measure theory are by Kahane and Salem [35], and a more
recent book from Mattila [47]. While the specifics of these works aren?t of interest
to us for the most part, the machinery involved is, particularly results which might
shed insight on the behavior of the Fourier transform. We hope to use those results
in conjunction with our work on positive definite measures.
As going forward I will begin to look at more complicated measures I will make
some definitions in order to be precise with terminology. For the n-dimensional
Lebesgue measure in Rn (or Tn, when applicable), I will write Ln. I refer sometimes
to a surface measure, which I will denote ?K , where K is the image of a rectifiable
curve if n = 2, or some (n? 1)-dimensional surface for general n. Frequent choices
are K = Sn?1 for the surface measure on the unit sphere, or K = {(x1, ? ? ? , xn) |
xn = 0}. The surface measure may be defined in a few equivalent ways. Here I will
say that it may be defined by an isometry from a rectifiable curve to (R,L1).
Definition 4. Let ? : [0, 1]? R2 be a unit speed nonintersecting curve with image
K. ? (A) = L1(??1K (A)).
This construction is also called a push-forward of L1 under ?. It turns out that
this definition is equivalent to the 1-dimensional Hausdorff measure restricted to K
also, which I will define.
In addition, in the cases where K = Sn?1 or K = {(x1, ? ? ? , xn) | xn = 0}, K
is a topological group under rotations and horizontal translations respectively, and
37
therefore has a unique Haar measure. Because the Lebesgue measure?and therefore
the Hausdorff measure as well?is invariant under each of those transformations, it
can be seen easily that the unique Haar measure is equivalent to the Hausdorff
measure definition up to a constant.
Denote by M(X) the set of Radon measures. We say a measure ? on X is
Radon if it is Borel-regular and locally finite. ? is Borel regular if for each A ? X,
there is a Borel set B ? A such that ?(B) = ?(A). If in addition |?(X)| < ?,
then we say that ? is bounded and we denote the set of bounded Radon measures
Mb(X). In general for this chapter I refer to positive real-valued measures, but for
later purposes I use this notation for complex-valued measures as well as specified.
In addition, in this chapter it becomes necessary at times to distinguish when a
measure is bounded or unbounded. In all other chapters we will only be concerned
with bounded measures, so I will use the notationM(X) to mean the set of bounded
measures.
Definition 5. The Hausdorff measure Hs in Rn is defined by
Hs(S) = limHs
? ?
(S), (2.1)
? 0
where for ? > 0,
?? ?
Hs?(S) = inf?? ? ??(s)2?s ?d(Ej)s ? S ? ?jEj, d(Ej) < ?? . (2.2)
j
d(A) is the diameter of A, and ?(s) is a positive constant, which may be scaled so
38
that when s is an integer it agrees with the volume of the s-dimensional unit ball.
If they agree, then in R2, Ln = Hn.
Likewise, the Hausdorff dimension of a set S ? Rn is
dimS = inf{s | Hs(S) = 0} = sup{s | Hs =?}. (2.3)
2.2 Energy Integrals
Understanding Hausdorff measures is important, but what we are really inter-
ested in are more general measures, defined on lower-dimensional sets. We would
like a natural way to relate a measure supported on a set to the dimension of its
support. As it turns out, one of the first central results in geometric measure the-
ory does just that. The goal of the following lemma is to aid in computing lower
bounds for Hausdorff dimensions. From the definition of Hs, it seems that comput-
ing lower bounds might be much more difficult that upper bounds, because we must
prove bounds for arbitrary coverings. Frostman?s lemma gives us a way to pseudo-
approximate from below, by showing the existence of measures that ?behave well?
on balls that approximate the set from below.
Theorem 15. (Frostman?s Lemma) For a Borel set S ? Rn, Hs(S) > 0 if and only
if there is a measure ? ?M(Rn) with supp(?) ? S such that
?(B(x, r)) ? rs ?x ? Rn, r > 0. (2.4)
39
I will not prove the theorem here, but I will point out that one direction is
immediate. If we have such a ? satisfying the condition, then given any covering of
S with balls (Bj), ? ?
d(Bj)
s ? ?(Bj) ? ?(S) > 0. (2.5)
j j
An immediate consequence of Frostman?s lemma is that
dimS = sup{s | ?? ?M(S) such that (2.4) holds}, (2.6)
which is a demonstration of the lemma?s usefulness in computing Hausdorff dimen-
sions.
We now have a way to relate geometry of sets and measures. A natural question
is whether we can go in the reverse direction, and find the ?dimension? of a measure
in some sense? The answer is that it is much more difficult, but there are some
things we can do. Begin by defining the s-energy of a measure, which will quantify
the criterion given in (2.4).
??
Is(?) = |x? y|?s d?(x) d?(y). (2.7)
We can relate this to equation (2.4) through a change of variables.
? ? ?
| ? ?(B(x, r))x y|?s d?(y) = s dr (2.8)
0 r
s+1
If ? satisfies (2.4), then we have that if 0 < t < s, and if ? is compactly supported
40
with diameter ?M ,
?? M ??(B(x, r)) M
I (?) = t dr d?(x) ? rs?t?1t dr <?. (2.9)
rt+10 0
In add?ition, if Is(?) < ?, then we can construct a measure that satisfies (2.4).
Since |x? ?s? y| d?(x) < ? for almost all y, we can choose M > 0 and construct
A = {x | |x ? y|?s d?(y) < M} such that ?(A) > 0. Then for all x, r > 0,
? (B(x, r)) ? 2sMrsA . So ?A satisfies (2.4).
We can conclude with a new full characterization of the Hausdorff dimension
in terms of the energy of supported measures.
dimS = sup{s | ?? ?M(S) such that Is(?) <?}. (2.10)
?? 1
For example, if ?=L1[0,1], Is(?) = |x ? y|?s dx dy < ? if and only if0
s < 1. We also get a similar result if instead of [0, 1] we take L1K where K ? Rn
is the image of a rectifiable curve.
Analogously in higher dimensions if A ? Rn has Ln(A) > 0, then
Is(LnA) <? if and only if s < n.
2.3 Cantor Measures
As an aside I?d also like to examine a less trivial example, which has non-
integer dimension. For this section I?m going to define and prove a few results on
Cantor sets and measures. I will define the Cantor set in the standard way.
41
Let I ? R be a closed interval of length one. We form the Cantor set iteratively
by deleting the middle third from each continuous interval at each step of the process.
Let I1,1 = I, then I2,1 and I2,2 are the two closed intervals that are produced by
removing the middle third of I. Likewise I2,1 is split into I3,1 and I3,2, and so on in
this fashion. The Cantor set itself is then defined
?? ?2k
C = Ik,i. (2.11)
k=1 i=1
The Cantor measure is typically defined using methods from real analysis. One
can define a Cantor function which is constant on each interval in the complement
of C, and then define the Cantor measure as the distributional derivative of this
function. The method of construction is not important, but what is important is
to note the following property of the Cantor measure, which I will heretofore refer
to as ?: for any I ?kk,i, ?(Ik,i) = 2 ?(C). In fact this property alone, along with the
criterion that 0 < ?(C) <?, is enough to characterize ? up to a constant.
Let s = log(2)/ log(3). I will take note of two identities.
( )s
1
2 = (1, ) (2.12)3 sk
1
d(Ik,i)
s = = 2?k = ?(Ik,i). (2.13)
3
Then I will show that 0 < Hs(C) ? 1. For any k, we have
?2k ?2k
d(I s kk,i) = ?(Ik,i) = 2 ?(Ik,i) = 1. (2.14)
i=1 i=1
42
Since ?2ki=1Ik,i is a cover of C, Hs(C) ? 1. For the other inequality, we wish to
apply Frostman?s lemma to ?. Let J be an interval which intersects C but does not
contain C. Let Il,j be (one of) the largest intervals completely contained in J . Then
J ? C is contained in the union of at most four intervals Il,j1 , ? ? ? , Il,j4 . Hence
?(J) ? 4?(Il,j) = 4d(Il,j)s ? 4d(J)s. (2.15)
We conclude from Frostman?s lemma that Hs(C) > 0.
In fact, because of how ? was defined, an immediate consequence of this fact
is that HsC is equivalent to ? up to a constant. Lastly, we can combine this result
with equation (2.9) to conclude that It(?) <? if and only if t < s.
2.4 Fourier Analysis Methods
Next I will move into some tools for Fourier analysis by returning to the energy
integral (2.7). I can write
?
Is(?) = ks ? ?(x) d?(x), (2.16)
where k ?ss is the Riesz kernel |x| . I can use the fact that for 0 < s < n, k?s =
?(n, s)kn?s and apply Parseval?s theorem to get a new formula for the s-energy:
?
Is(?) = ?(n, s) |x|s?n|??(x)|2 dx. (2.17)
43
Now the flavor for the Fourier analysis of geometric measures becomes clear:
the dimension of the support of a measure must be related to the rate of decay of
its Fourier transform. At the very least we have a quick bound. If
??(x) ? ?s/2 (2.18)
then Is(?) <? and therefore dim{supp?} ? s.
However the reverse picture is not so clear. Not only do there exist measures
that don?t obey this rule, in fact there are sets which don?t admit any measures
which achieve (2.6) for s equal to their Hausdorff dimension. To express this gap
we have two new definitions. Define the Fourier dimension of a set S by
dimF S = sup{s ? n | ?? ?M(S) such that ??(x) ? |x|?s/2}. (2.19)
As we showed above we have the bound dimF S ? dimS. But equality is not
achieved in general. Those sets that do achieve equality are called Salem sets.
Formally we can calculate the form of k?s from the two facts that it is radial
and it has the property that ks(rx) = r
?s(x). That implies that k?s must also be
translation invariant and it has the property that
? ?
k? (r?) = e?2?irx??s ks(x) dx = r
?n e?2?ix??k s?ns(x/r) dx = r k?s(?). (2.20)
Rn Rn
It follows that k?s = ?(n, s)kn?s for some constant ? depending only on s and n.
It is important to note that ks ?/ Lp for any 1 ? p ? ?, but can be written
44
as a sum of Lp functions, and therefore is a tempered distribution. The above may
be formalized in a more careful manner using this fact. I would next like to give a
derivation of the Fourier form of the engergy integral Is in (2.17).
Theorem 16. If ? ?M(Rn) is a measure and 0 < s < n, then
? ?
Is(?) = ks ? ?(x) d?(x) = ?(n, s) |??(?)|2|?|s?n d?. (2.21)
Rn Rn
Proof. If we assume that Parseval?s theorem holds, then the computation is straight-
forward.
? ? ? ?
k ? ?(x) d?(x) = (k?? ?)?? = k? s?n 2s s s???? = ?(n, s) |x| |??| dx. (2.22)
Rn Rn Rn
However, that is not a trivial assumption in this case, because neither ks nor ? may
be integrable functions, nor can they be assumed to have compact support, so we
will have to make the calculation more directly.
Let S(Rn) be the space of Schwartz functions, and S ?(Rn) the tempered dis-
tributions. It is straightforward to show that for any 0 < s < n, k ? S ?(Rns ), so the
convolution is at least well-behaved for ? = ? ? S(Rn).
?
I (?) = |??(x)|2|x|s?ns dx ?? ? S(Rn). (2.23)
Rn
What we wish to do then is approximate ? with test functions and show that the
equality holds true in the limit.
45
? Let ? be an approximate identity. That is ? ? S(R
n), with 0 ? ? ? 1,
? = 1. Define ?(x) = 
?n?(x/). Then ? = ? ? ? ? n ??? ? in S (R ). By applying
?0
Parseval?s formula to ? = ?, we get
?
Is(?) = ks ? ?(? ? ?)(x)(? ? ?)(x) dx (2.24)
= ?(n, s) |x|s?n|??(x)|2|??(x)|2 dx. (2.25)
?
The term in (2.25) converges monotonely to ?(n, s) |x|s?n|??(x)|2, so we only
need to examine the term in (2.24). Write
?? (? ) (? )
Is(?) = ???? ks(x? y)?(y ? z) d?(z) dy ?(x?w) d?(w) dx
= |x? y|?s?(y ? z)?(x?w) dx dy d?(z) d?(w)
To show that we may exchange the order of integration, consider the integral
??
|x? y|?s?? ?(y ? z)?(x?w) dx dy
= |(u? v) + z?w|?s?? ?(u)?(v) du dv
? |z?w|?s +  |u? v|?s?(u)?(v) du dv.
For appropriately small , we can bound the final term by C(s)|z ? w|, and if
I(?) <?, we may conclude that it is absolutely integral with respect to ?(z)??(w).
46
Therefore we may swap the order of integration and pass through the limit to see
????
I(?) = ?? |(u? v) + z?w|
?s?(u)?(v) du dv d?(z) d?(w)
??? |z?w|?s d?(z) d?(w) = I(?).
?0
Alternatively, if I(?) = ?, then the result follows from Fatou?s lemma, since ? =
I(?) = I(lim?) ? lim I(?).
2.5 Fourier Dimension and Salem Sets
Recall that the Fourier dimension dimF S of a set is given by the fastest decay
rate of ??, for all ? supported in S, in a way that follows naturally from (2.17), and
that a Salem set is a set whose Fourier dimension matches its Hausdorff dimension.
Unfortunately we do not have to look far to find sets which are not Salem. For
example, say S is supported in a lower-dimensional hyperplane H. Let ? ? Rn be
orthogonal to that hyperplane, and let ? ? M(S). Then we have for the Fourier
transform of ?, ? ?
??(?) = e?2?ix?? d?(x) = d?. (2.26)
S H
?? is O(1) as |?| ? ?, so the Fourier dimension of S is 0, although the Hausdorff
dimension of S may be anywhere from 0 to n? 1.
As we will derive in section 2.6, the surface measure ? on the unit sphere does
achieve the optimal decay rate ??(x) . |x|1?n, so Sn?1 is a Salem set. As it turns
out, the key factor here is the curvature of Sn?1, which plays a role in the analysis
47
of oscillatory integrals. In a certain sense Hn?1Sn?1 has the optimal decay rate of
any similarly defined set precisely because its curvature is uniform.
Another example is the Cantor set, which we will see is not Salem.
Theorem 17. Let C be the Cantor set. For any ? ?M(Cd),
lim sup |??(x)| > 0. (2.27)
|x|??
Proof. Assume for contradiction that such a measure ? ? M(C) exists, such that
lim ??(x) = 0. Without loss of generality we may actually assume less; that the
Fourier series coefficients ??(k), k ? Z t?end to 0. Let ? ? S(R) be a Schwartz function
such that supp(?) ? (1/3, 2/3) and ? = 1. For j ? N, define ?j(x) = ?([3jx]),
where [x] represents the fractional part of x.
First, I claim that for all x ? C, j ? N, [3jx] ? C. This is clear by recognizing
that x ? [0, 1] is in the Cantor set if and only if its base-3 representation contains
no 1?s. Multiplication by 3 in base-3 is represented by shifting each digit to the left,
so for any j ? N, the base-3 representation of [3jx] also contains no 1?s, and is still
in C.
Using this fact, we know that for all j ? N, supp?j?C = ?. We can represent
??
2?ix3j?j(x) = ??e
k. (2.28)
k=??
Therefore ??j(3
jk) = ??(k), and all other Fourier coefficients are 0. By Parseval?s
48
theorem, we have
? ?? ??
0 = ?j d? = ??j(k)??(k) = ??(k)??(3
jk)
k?=?? k=??
= ??(0)??(0) + ??(k)??(3jk)
|k|>0
Because the sum ?
??(k)??(3jk) (2.29)
|k|>0
converges absolutely, and limj sup|k|>0 ??(3
jk) = 0, the sum tends to 0 as j ? ?.
But recognize that ??(0)??(0) = ?(C), and hence we have shown that ?(C) = 0,
which is a contradiction.
I have demonstrated the proof only for the standard 1/3-Cantor set, but the
proof is easily generalizable to any number of other Cantor-type sets, including those
formed by removing intervals of ratio other than 1/3.
While we are on the subject I will take the opportunity to do one more cal-
culation, for the Fourier transform ??. For this, return to the representation of C
as the numbers which contain no 1?s in their base-3 representation. With that in
mind, we can construct C as a closed union C = ??k=1Ek, where
{?k ?
? ?E = c 3 l ? }k l ? cl = 0 or 2, ?l . (2.30)
l=1
Then define the measure ?
? = 2?kk ?e. (2.31)
e?Ek
49
It is clear that ?k converges weakly, so by the same symmetry arguments we used
in defining ?, we can show that ?k ? ? weakly.
Then on the frequency spectrum, we can write
? ? ?
?? (n) = 2?k e?2?ien
k ?l
= 2?k e?2?in l=1 el3k , (2.32)
e?Ek e?Ek
where e is the lthl decimal point of e in base-3. A basic calculation shows then that
?k 1 + e?4?i3?jn ?k ?j
??k(n) = = e
?2?i3 ncos(2?3?jn). (2.33)
2
j=1 j=1
?
Noting that k ?j ?kj=1 3 = (1? 3 )/2,
??
?? (n) = e??i(1?3
?k)n
k cos(2?3
?jn). (2.34)
j=1
Let k ??, and we finally have
??
??(n) = e??in cos(2?3?jn). (2.35)
j=1
? It is easy to verify that ?? does not tend to 0 as n ? ?. For k ? N, ??(3
k) =
? ? k?jj=1 cos(2?3 ) is constant with respect to k and nonzero.
Given the above examples, it seems perhaps difficult to construct Salem sets.
Indeed there are few explicit constructions, especially with Hausdorff dimension not
an integer. However probabilistic methods are more fruitful. Particularly there are
good results for sets that are produced from Brownian motion [34].
50
Theorem 18. If ? : [0,?)? Rn is a random Brownian motion, and S ? [0,?) is
a Borel set, then the following is true almost surely.
1. If n ? 2 or dimS ? n/2, then dim?(S) = 2 dimS.
2. If n = 1 and dimS > 1/2, then L1(?(S)) > 0.
3. ?(A) is a Salem set.
The only known explicit examples of Salem sets of non-integer dimension are
of a form discovered by Kaufman, which takes advantage of famous results on Dio-
phantine approximation [36]. Let ? > 0 and let E be the set of x ? R such that for
infinitely many rationals p/q,
|x? p/q| ? q?2??. (2.36)
E is a Salem set with dimension 2/(2 + ?). Recent results have generalized this to
Rn, so constructive Salem sets are now known to exist in every dimension [32].
2.6 Fourier Decay of Surface Measures
Here I go through an important derivation of the Fourier transform of the
surface measure ? ? M(Rn), defined by ? = Hn?1Sn?1 . First consider a generic
radial function. It is a basic fact from calculus that if f ? L1(Rn), then
? ? ? ?
f(x) dx = f(rx)rn?1 dr d?(x). (2.37)
Rn Sn?1 0
51
We will need another related formula for rewriting the surface integral. Let x0 ?
Sn?1 be fixed and define S? = {x ? Sn?1 | x ? x0 = cos(?)}. S? is an (n ? 2)-
dimensional sphere lying in Sn?1, with radius sin(?). Call the surface measure on
S by Hn?2 = ?n?2? S ? , and note that ?
n?2
? (S?) = b(n) sin
n?2 ?, where b(n) is the
?
surface area of the unit (n? 2)-ball. Then we can write
? ? ? ?
f d? = f(x) d?n?2? (x) d?. (2.38)
Sn?1 0 S?
Then for a radial function f(x) = ?(|x|), we can compute for the Fourier
transform using these two identities:
? ? ? ?
f?(rx0) = f(y)e
?2?iy?rx0 dy = ?(s)sn?1 e?2?iy?rx0 d?(y) ds.
Rn 0 Sn?1
We can take advantage of the fact that for y ? S , e?2?iy?rx0 = e?2?ir cos ?? is constant
with respect to y.
? ? ? ?
f?(rx0) = ?(s)s
n?1 e?2?ir cos ?b(n) sinn?2 ? d? ds. (2.39)
0 0
With the change of variable t = ? cos ?, this becomes
? ? ? 1
?(s)sn?1 e2?isrt(1? t2)(n?3)/2 dt ds, (2.40)
0 ?1
which is in the form of a Bessel function. Using the definition of Bessel functions
52
Jm : [0,?)? R,
?
(u/2)m 1
J (u) = eiut(1? t2)m?1/2m dt, (2.41)
?(m+ 1/2)?(1/2) ?1
we can substitute and obtain the formula for f? ,
? ?
f?(rx) = ?(s)sn?1?c(n)(rs)
?(n?2)/2J(n?2)/2(2?rs) ds
0
?
= c(n)r?(n?2)/2 ?(s)J n/2(n?2)/2(2?rs)s ds.
0
We can now use this formula to derive some decay estimates. Using the following
two classical properties of Bessel functions,
Jm((t) ? C(m) )t
?1/2 t > 0, (2.42)
d tmJm(t) = t
mJm?1 (t) , (2.43)dt
we can quickly calculate the Fourier transform of the unit ball.
?
c(n) ??
??B(0,1)(x) = |x|(2?n)/2Jn/2(2?|x|s)sn/2?
2?|x| ?
s=1
= C(n)|x|?n/2Jn/2(2?|x|).
Using (2.42), this satisfies the decay ??B(0,1)(x) ? C(n)|x|?(n+1)/2. Then finding a
bound for the surface measure ? on Sn?1 is as simple as recognizing that we can
53
define it as a weak limit of the functions
??1?B(0,1+?)\B(0,1), ? ? 0. (2.44)
Then we get the formula
? 1+?
??(x) = lim c(n)|x|?(n?2)/2??1 J n/2(n?2)/2(2?rs)s ds
??0 1
= c(n)|x|?(n?2)/2J(n?2)/2(2?|x|).
This has a decay rate of ??(x) . |x|(1?n)/2.
2.7 Discussion
Our goal is to find options for recovering non-discrete measures. We know
that certain surface measures cannot be recovered by means of total variation min-
imization [6]. I consider two main candidates for accomplishing this: we may either
minimize for something other than total variation or limit the optimization to a
subspace of M(T2).
If we wish to recover surface measures a natural question is: what is a possible
subspace S ?M(T2) which contains some or all 1-dimensional surface measures but
no measures of dimension 0? We can rule out S being the set of all surface measures
of rectifiable curves, because the weak-* closure of this set is M(T2)?for example,
a sequential limit of surface measures on concentric circles with decreasing radius
will converge to a point mass. Therefore nothing is gained by limiting optimization
54
to S; we must choose a subset which is closed.
Limiting the decay rate is also tempting. Take S = {? ? M(T2 | ??(x) ?
C|x|?1/2}, for some aptly chosen C > 0. We know from our derivation in the prior
section that S will contain at least some surface measures, and cannot include any
measures with support smaller than 1 dimension. The problem is with the choice of
C. Any solution to (1.70) will necessarily fall on the boundary of S, so the choice
of C has an effect on the outcome of the algorithm. This may be a good option. If,
for example, we know that the desired solution is a circle with prescribed radius, we
may find the correct value of C by direct calculation by means of the formula in the
previous section. However this method requires a large amount of prior knowledge
which may not be realistic.
One more option to explore is using the energy integrals Is as an alternative
or addendum to the total variation norm. Consider an algorithm,
?# = argmin???TV + ?Is(?) such that F (?) = y, (2.45)
??M(T2)
for some appropriately chosen s and ? > 0. This algorithm would exclude any
discrete measures as solutions, but beside that fact the results are somewhat unclear.
One problem is that for ? a surface measure, I1(?) =?, so we must choose s < 1,
which will also allow potentially for solutions with dimension between 0 and 1.
Otherwise this territory is fairly unexplored to the author?s knowledge, and I would
recommend it as a topic of future research.
The relationship between total variation, Hausdorff dimension and the Fourier
55
transform is rich and complex. There are relationships which may be exploited, but
dealing with them takes caution. The moral of this chapter is that it is difficult to
divorce the total variation norm on a continuous space from the discrete measures.
Particularly to note is the fact stated previously that a discrete measure may be
expressed as a limit of surface measures or continuous measures (approximate iden-
tity), but the converse is not true. In the next chapter I will explore the question of
why the discrete measures are unique in this regard, and I get a result that confirms
our expectations that unique recovery of non-discrete measures is impossible with
total variation minimization.
56
Chapter 3: Choquet Theory
3.1 Choquet?s Theorem
Choquet theory is concerned with convex sets and their extreme points. Ex-
treme points can provide a characterization of convex sets, in particular if they are
compact. The content of this section is focused on results that provide representation
theorems of convex sets, and points within convex sets, centered on their extreme
points. For a motivating example I begin with a classical theorem of Carathe?odory
[19].
Theorem 19. Let a point x fall in the convex hull of a set E ? Rn. x can be
written as a convex sum of at most n+ 1 points in E.
An easy corollary is that if E consists of n+1 affinely independent points (the
points xk ? x1 are linearly independent for k > 1), then the convex sum is unique.
In this case the convex hull of E is a simplex, and E are its vertices. From this it
is easy to see that the bound n + 1 is the best that can be done. If E is affinely
independent and x lies in the interior of E, then x is affinely independent from any
strict subset of E, therefore it can only be written as a convex combination of all
n+ 1 elements of E.
57
Definition 6. A point x in a convex set S in a locally convex topological vector space
is an extreme point if for any a, b ? S, a 6= b, and for any t ? (0, 1), x =6 ta+(1? t)b.
Denote the set of extreme points of S by ex(S).
Definition 7. Let E be a subset of a locally convex topological vector space. The
convex hull of E is the smallest convex set containing E. The closed convex hull of
E is the smallest closed set containing E.
Extreme points are in a sense generators of compact convex sets. Because they
cannot be written as convex combinations of other elements, they carry essential in-
formation about the set. The Krein-Milman theorem tells us that they are sufficient
to characterize their underlying sets as well [39].
Theorem 20. (Krein-Milman) If X is a locally convex topological vector space and
S is a compact convex subset of X, then S is the closed convex hull of its extreme
points.
Results of this sort in finite dimensions date back to the early 20th century,
first attributed to Minkowski and Carathe?odory [50]. Carathe?odory?s theorem 19
suggests the idea of simplices as a basic archetype of a convex set. As we will see in
section 3.3, Choquet theory develops tools to generalize the idea of the simplex to
infinite dimensional space.
The insight of Choquet to convex geometry is generalizing the idea of a convex
combination to infinite-dimensional domains, by considering them as integrals rather
than sums. Now instead of being represented by the nonnegative weights of a finite
58
sum, elements of a convex set can be represented as an integral over a nonnegative
measure supported on a generating set.
Definition 8. Let X be a locally convex topological vector space and S be a
nonempty compact subset. If ? is a probability measure on S, we say that a point
x ? X is represented by ? if
?
f(x) = f(t) d?(t) for all affine functions f on X. (3.1)
S
We say that f : X ? R is affine if for all x, y ? X, ? ? R, f(?x+ (1? ?)y) =
?f(x) + (1 ? ?)f(y). Note that any point x is trivially represented by the delta
measure ?x.
For example, in the finite dime?nsional case we might take x?? R
N , and stan-
dard basis E = (e Nn)n=1. Let x = n ?nen, where ?n > 0 and n ?n = 1. This
mean?s that x is a convex combination of basis vectors. Alternatively, we can let
? = n ?n?en be a measure on RN . For all y? ? (RN)?,
? ?N
y?z d?(z) = ?ny
?en = ?y,x? . (3.2)
E n=1
Hence x is represented by the probability measure ? in a way that is analogous to
writing it as a convex combination. Note also that the?restriction that ? be positive
and norm one is equivalent in this case to ?n > 0 and n ?n = 1.
For an example that is less trivial, I want an infinite dimensional space. Let
X =M (T2b ), the set of bounded complex measures on the Torus, which is relevant
59
since I will return to it when I discuss superresolution. Let S = {?x | x ? T2}, and
? be the 1-dimensional surface measure on the set {(x1, x2) | x2 = 0}. It is easy to
see that ? cannot be written as a finite convex combination of delta measures, but
I wish to show that it can be represented by a probability measure.
Proposition 5. Let ? ?Mb(S) be defined by ?(A) = ?({x ? T2 | ?x ? A}). Then
? represents ?.
Proof. First observe that ? is well defined, and that it is a positive measure, with
??? = 1. Let?f ? X
? be a continuous linear functional onM (T2b ). We wish to show
that f(?) = f d?. Note that because of how ? is defined, (S,?) and (T2, ?) are
S
isomorphic as measure spaces under the isomorphism ?x ?7 x. We can conclude
that ? ?
f(?) d?(?) = f(?x) d?(x). (3.3)
S T2
Now consider the sequence of measures
1 ?N
?N = ?( n ,0). (3.4)N N
n=1
?N converges to ? weakly. We can conclude that
? ?
f(?x) d?(x) = lim f(?x)(d?N(x)T2 N?? T2?N )1
= lim f ?( nN?? N ,0N )
n=1 (3.5)
= lim f(?N)
N??
= f(?).
60
Therefore ? represents ?.
One thing to note is that the final inequality requires that f is continuous with
respect to the weak topology on M 2b(T ). This is significant, as it is a requirement
for Choquet?s representation theorem for S to be compact, which it is in this case
only under the weak topology.
The ultimate goal of this machinery is to say that we can actually represent
all points in a convex set by measures on its extreme points, which was achieved by
Choquet in 1956 [22].
Theorem 21. (Choquet-Bishop-De Leeuw) Let S be a compact convex subset of
a locally convex space X. Let x ? S. Then there exists a probability measure
? ? Mb(X) such that ? vanishes on every Baire subset of S that doesn?t contain
an extreme point of S, and that for any bounded linear functional f on X,
?
f(x) = f d? (3.6)
S
Choquet initially proved the theorem under the slightly stricter case where
X is metrizable. Bishop and De Leeuw found the above strengthening three years
later.
3.2 Application to Completely Monotone Functions
Choquet?s theorem has found application to a variety of subjects, providing
elegant proofs of some well-known representation theorems [3, 24, 28, 50]. I will go
61
through an example here to provide context for the theory and demonstrate its wide
application, and to illustrate the recurring themes of convex geometry that I wish
to exploit.
A function f : (0,?) ? R is said to be completely monotonic if it is smooth
and satisfies
(?1)nf (n) ? 0 ?n ? N. (3.7)
For example, e??x and x?? are completely monotonic. There is a theorem of Bern-
stein that characterizes completely monotone functions in terms of exponentials of
the form e??x [7]. This result is particularly interesting in the context of the Laplace
transform. In a certain sense it can be viewed as a version of Bochner?s Theorem
for the Laplace transform. For more details see Widder?s comprehensive exposition
of the Laplace transform [60]. Phelps gives a proof of his theorem for bounded
completely monotonic functions, using methods from Choquet theory [50].
Theorem 22. (Bernstein) Let f be a completely monotone function. Then there
exists a unique nonnegative measure ? ?M[0,?] such that ?([0,?]) = f(0+) and
? ?
f(x) = e??x d?(?). (3.8)
0
The rest of this section will be dedicated to a proof of Bernstein?s theorem. The
idea is clear from the representation formula. We wish to find an adequate topology
on the set of completely monotone functions and a subset which is compact and
convex, so that the extreme points are precisely the functions e??x, ? ? [0,?].
62
Let X be the convex cone of bounded completely monotonic functions. Note
that because f ? X is necessarily strictly nonnegative and nonincreasing, f is
bounded if and only if f(0+) < ?. Define S ? X as the set of all such functions
for which f(0+) ? 1. Choose the topology such that fk ? f if and only if for all
n ? N (n), ?fk ? f?? ? 0. By definition, this is a locally convex topology. We wish
to show that in this case, S is compact.
We can construct the chosen topology of uniform convergence with a countable
family of seminorms,
p (f) = sup{|f (k)m,n (x)| | 1/m ? x ? m, 0 ? k ? n}. (3.9)
By a standard construction, X is metrizable, with the metric
?? pm,n(f ? g)
d(f, g) = . (3.10)
1 + p (f ? g)
m,n=1 m,n
Under this definition, a set is bounded if and only if it is bounded in each semi-
norm. Therefore, given a bounded sequence of functions, the sequence is uniformly
equicontinuous on each derivative and each compact set. An application of the
Arze?la-Ascoli theorem, along with a diagonalization argument shows that the se-
quence has a convergent subsequence. Therefore every closed and bounded set is
compact.
S is clearly closed, so it remains to show that S is bounded.
Lemma 23. Let S = {(?1)nf (n)n | f ? K}. Let a > 0 and n ? 0. Then Sn is
63
uniformly bounded above on [a,?) by a?n2(n+1)(n/2).
Proof. The proof is by induction. For n=0, the lemma holds because for any f ? K,
f(0+) ? 1 and f is strictly nonincreasing.
Assume for arbitrary k ? N that the lemma holds for Sk. Let a > 0. By the
mean value theorem we have the existence of a point c ? [a/2, a] such that
a
f (k+1)(c) = f (k)(a)? f (k)(a/2). (3.11)
2
Then using monotonicity of functions in Sk and the induction hypothesis, we have
(a/2)?k2(k+1)(k/2) ? (?1)kf (k)(a/2) (3.12)
? (?1)k+1af (k+1)(c) (3.13)
2
? (?1)k+1af (k+1)(a). (3.14)
2
The inequality follows, and because of the monotonicity of f ? Sk, it will hold for
x ? a as well.
We have now shown that we have an adequate setting for Choquet?s theorem
to apply, and all that remains is to characterize the extreme points of S to obtain
the desired representation theorem.
Lemma 24. The extreme points of S are precisely the functions of the form f(x) =
e??x, where 0 ? ? ? ?.
Proof. A straightforward computation shows that any such function falls in S. First
64
I will show that any extreme point must take this form.
Let f be an extreme point of S. Let x0 > 0 and define u(x) = f(x + x0) ?
f(x)f(x0). I claim that f ? u ? S. Note first that
(f + u)(0+) = f(0+) + f(x0)? f(0+)f(x0)
= f(0+)(1? f(x0)) + f(x0)
? 1,
(f ? u)(0+) = f(0+)? f(x0) + f(x0)f(0+)
= f(x0)f(0
+) + (1? f(x0))
? 1.
And for k ? N,
|u(k)(x)| = |f (k)(x+ x0)? f(x0)f (k)(x)|
? max{|f (k)(x+ x0)|, |f(x0)f (k)(x)|}
? (?1)kf (k)(x).
Therefore (?1)k(f ? u)(k)(x) ? 0, and f ? u ? S.
Because we made the assumption that f is an extreme point of S, u ? 0.
Hence for any x, x0 > 0, f(x + x0) = f(x)f(x0). Because f is continuous, it must
be of the form e??x. Because of the restriction that f ? ? 0, ? ? 0. Note that I am
allowing ? =? for the case f ? 1.
65
Finally, what remains is to show that every function e??x is an extreme point.
We can solve this by taking note of the mapping f(x) 7? f(rx) for r > 0. This
mapping is a bijection of S onto itself and preserves convex combinations, so it
takes extreme points to extreme points. Since the Krein-Milman theorem implies
that there is at least one nonconstant exponential function e??x, which is an ex-
treme point of S, we can generate any such exponential function e??rx by the above
mapping to show that it is an extreme point. The lemma is proved.
It follows from Choquet?s theorem th?at for any f ? S there exists a probability
measure ? ?M(ex(S)) such that f(x) = d?. Now, because the correspondence
ex(S)
T : [0, 1] ? ex(S) given by ? 7? e??x is actually an homeomorphism, we have a
natural correspondence fromM[0,?] toM(ex(S)). We can identify ? ?M(ex(S))
with T? (?) ?M[0,?] by T? (?)(A) = ?({e??x | ? ? A}) and it turns out that
? ?
f(x) = e??x dT? (?)(?), (3.15)
0
as intended.
To show?uniqueness, one ?needs only to use the Stone-Weierstrass theorem.? ?
Suppose that e??x d?(?) = e??x d?(?) for all x > 0. If A is the subspace of
0 0
C[0,?] generated by functions of the form ? 7? e??x, then ? = ? as functionals on
A. Since A separates points of [0,?], it is dense, and therefore ? ? ?.
66
3.3 Uniqueness and the Choquet Simplex
While Choquet?s theorem is a great tool for proving existence, it doesn?t say
anything about uniqueness. In fact we don?t need to go further than the finite
dimensional case to see examples of the failure of convex representations to be
unique. For example, the unit square in the plane has extreme points (?1,?1),
and the origin is a convex combination of either of the two diagonal vertices. In
fact any point in the interior has at least two convex representations. The unit
ball by contrast is in a sense ?maximally nonunique,? in that any point in the unit
ball is either an extreme point or has uncountably many representations as convex
combinations of extreme points.
The prototypical example of a set where Choquet-type representations are
unique is a simplex.
Proposition 6. Each point in a finite-dimensional simplex has a unique represen-
tation as a convex sum of its vertices.
Proof. Proceed by induction. For k = 0 is trivial, as the 0-dimensional simplex is
a single point. Assume the hypothesis is true for k. Refer to the k-dimensional
simplex as ?k, with vertices {x , ? ? ? ,x }. ?k+11 k is formed by adding a new vertex
xk+1, linearly independent from ?
k, and taking the convex hull of the result. Let
x ? ?k+1, without loss of generality x 6= xk+?1. Since x is in the convex hull of
the vertices {x1, ? ? ? ,xk+1}, we can write x = k+1m=1 cmxm. Consider the line ` =
{tx + (1 ? t)xm | t ? R}. ` passes through both the points xk+1 and x, therefore
67
it passes through the face of ?k+1 opposite x kk+1, which is identified with ? . The
point x0 ? ` ??k has for some t ? R,
??? ?k+1 ?kx0 = t cmx ?m + (1? t)xk+1 = (1 + ck+1) cmxm, (3.16)
m=1 m=1
which is in ?k. By the induction hypthesis, (c1, ? ? ? , ck) are unique, therefore xk+1
is also unique.
As an obvious consequence of this, the vertices of a simplex are also shown to
be its extreme points. Carathe?odory?s theorem can be seen as an immediate con-
sequence of this result as well. Any n-dimensional polyhedron may be decomposed
into simplices. If the polyhedron has more than n + 1 vertices, then the decompo-
sition is not unique. For any point in the interior of the polyhedron, it has at least
one different Choquet representation for each choice of simplicial decomposition.
The generalization of this concept is the Choquet simplex, a construction that
generalizes the notion of simplices to infinite-dimensional vector spaces in a way
that guarantees that points in Choquet simpleces have unique representations as
measures on the extreme points. Here I will define the Choquet simplex and prove
Choquet?s uniqueness theorem for metric spaces.
The definition will require a few more constructions. Say that S is a compact
convex set in a real locally convex space X. I will be working with the cone of S, so
I will assume that S falls in an affine hyperplane in X that does not intersect the
origin. Denote the cone C = {?x ? X | x ? S, ? ? 0}. If S falls in a hyperplane
as described, then for all x ? C, there is a unique ? ? 0 such that ?x ? S. We say
68
that S is a base of the cone C.
The cone C defines a partial ordering on the space X. Let x  y if y? x ? C.
Since S falls in a hyperplane that misses the origin, C ? ?C = {0}, so x  y and
y  x implies y = x. On the other hand if x  y  z, then y ? x = ?u, z ? y = ?v,
where ?, ? ? 0 and u, v ? S. Then
( )
? ?
?u+ ?v = (? + ?) u+ v , (3.17)
? + ? ? + ?
which is in C, so x  z. Note as well that for x ? X, the set Cx = {y ? X | x 
y} = C + x, so the partial order is also translation invariant. Finally, we say that
for two elements x, y ? X have an upper bound z if x  z and y  z. The upper
bound is a least upper bound if for every upper bound z0, z  z0.
Finally, we define a convex set S that is the base of a cone C to be a Choquet
simplex if and only if the space C ?C has the property that for each x, y ? C ?C,
there is a unique least upper bound, which we denote x ? y. We say that C ? C is
a vector lattice.
Definition 9. A partially ordered set is called a join- (meet-) semilattice if each
pair of elements has a unique least upper (lower) bound. If a vector space is a
semilattice which satisfies
1. x  y implies x+ z  y + z for all z
2. x  y implies ?x  ?y for all ? ? 0,
then it is a vector lattice.
69
Proposition 7. C ? C is a vector lattice if and only if C is a meet-semilattice.
That is each pair of elements in C has a greatest lower bound.
Proof. For each x, y ? C, call their unique greatest lower bound x?y. Let x = x1?x2
and y = y1 ? y2 be elements of C ? C. Let z = (x1 + y1) ? (x1 + y2) ? (y1 + x2).
Because z?x = (x2 +y1)? (x1 +y2)? (y1 +x2), we have x  z, and similarly y  z.
Now let w = w1 ? w2 be an upper bound for x and y. It is clear from the
definitions that w2 + x1 + y1  w1 + x2 + y1 and w2 + x1 + y1  w1 + x1 + y2.
Therefore,
w ? z = (w1 + x1 + y2) ? (w1 + y1 + x2)? (w2 + x1 + y + 1)  0. (3.18)
For the reverse direction, given C ? C is a vector lattice, one can define a
greatest lower bound by x ? y = ?(?x ? ?y). By restricting to C ? {0} = C, it is
demonstrated that C is a meet-semilattice.
Geometrically, this can be interpreted as saying that the intersection of two
identical cones will be another congruent cone. For basic examples, consider the
finite dimensional examples I discussed earlier. The intersection of two circular
cones C0 = {(x, y, z) ? R3 | y2 + z2 ? x2} and Cw = C0 + w situated in the same
direction will be a hyperbola-shaped crescent. Therefore 0 and w can?t have a least
upper bound because for any v ? C0 ? Cw, Cv = C0 + v ? C0 ? Cw is a strict
subset; it does not contain all upper bounds of 0 and w. Therefore the circle is not
a Choquet simplex.
70
By contrast, the nonempty intersection of two triangular pyramids, oriented in
the same direction, is another identical prism. Thus, the triangle, a simplex, indeed
meets the criterion to be a Choquet simplex as well.
3.4 Technical Proofs
Now I will provide some more technical proofs in Choquet theory. To start
with there are some tools we will need in the upcoming proofs. First, I d?efine an
?equivalence relation on measures. Say that ? ? ? if for all h ? A(X), h d? =
h d?.
Definition 10. Let X be a convex subset of a topological vector space and f : X ?
R. f is called convex if for all x, y ? X, and for all 0 < ? < 1,
f(?x+ (1? ?)y) ? ?f(x) + (1? ?)f(y). (3.19)
If the inequality above is strict, we say f is strictly convex. If it is an equality, then
f is said to be affine. ?f is concave.
Define the set of all affine functions on X by A(X). Notice that A(X) contains
functions of the form f(x) + r, where r ? R and f is linear, so it is rich enough to
separate points.
Definition 11. Let f : X ? R be bounded. Call the upper envelope of f the
71
function given by
f(x) = inf{h(x) | h ? A(X) and h ? f}. (3.20)
These are few basic properties of f, which will be useful:
Proposition 8.
1. f is concave, bounded and upper semicontinuous.
2. f ? f for all f . f = f if and only if f is concave and upper semicontinuous.
3. If f, g are bounded functions, f + g ? f + g, and |f ? g| ??f ? g?C . f + g =0
f + g if g ? A(X).
4. rf = rf , for r > 0.
The following proposition shows how the upper envelope function interacts
with the extreme points of its underlying set.
Proposition 9. Let X be a convex subset of a topological vector space. If f ?
C(X), then for each x ? X,
{? }
f(x) = sup f d? | ? ? ?x . (3.21)
f(x) = f(x) if x is an extreme point of X.
?
Proof. Let g = sup{ f d? | ? ? ?x}, and we wish to show that g = f . For any
?, ? with ? representing x and ? representing y, and 0 < ? < 1, ? = ??+ (1? ?)?
72
?
represents ?x + (1? ?y). Then we have the inequality f d? ? g(?x + (1? ?y)).
Taking the supremum over all ? and ?, we have that
g(?x+ (1? ?y) ? ?g(x) + (1? ?)g(y), (3.22)
Therefore g is concave.
To show that g is upper semicontinuous, let {x?} be a net converging to a
point x ? X, with g(x?) ? r for each ?. Fix  > 0, and then for each ? choose
a measure ?? ? ?x? such that ??(f) > r ? . By the weak-* compactness of the
set of probability measures, there exists a convergent subnet {??} that converges
to a measure ?. For each h ? A(X), h(x?) = ??(h) ? ?(h), so ?(h) = h(x), and
? represents x. Also because ??(f) > r ?  for each ?, we have ?(f) ? r ? , and
therefore g(x) ? r ? . Because  was arbitrary, we can let it go to 0 and obtain
g(x) ? r, which shows that g is upper semicontinuous.
Now since g is concave and upper semicontinuous, it follows that g ? f . On
the other han?d, if h ? ?A(X) and h ? f , for any x ? X and ? representing x, we
have h(x) = h d? ? f d?. From this we get h(x) ? g(x), and then taking the
infimum over all such h, f(x) ? g(x).
3.4.1 Choquet?s Theorem for Metrizable Spaces
Now we wish to move on to our first major proof. Recall Choquet?s Theorem.
Theorem 25. (Choquet) Let S be a compact convex subset of a metric space X.
Let x ? S. Then there exists a probability measure ? ? Mb(X) such that ? is
73
supported on ex(S), and that for any bounded linear functional f on X,
?
f(x) = f(t) d?(t) (3.23)
S
This was the first representation theorem proved by Choquet [22]. It was, of
course, generalized not long after to the Choquet-Bishop-De Leeuw theorem, but the
proof is much more technical and the added machinery for working in non-metrizable
topological spaces is not necessary for our purposes. The proof is courtesy of Phelps
[50].
Proof. (Choquet?s theorem) Suppose X is a metrizable compact convex subset of a
locally convex topological vector space E. Let x0 ? X be arbitrary. What we get
from the metrizability of X is that C(X), and therefore A(X), is separable. Let
(h ?k)k=1 be a sequence of functions in A(X) such that ?hk?C = 1 and {hk | k ? N}0
is dense in the unit?sphere of A. In other words, (hk) separates points of X.
Then let f = 2?kh2k k. Note two things: (hk) is uniformly bounded, so the
sum converges uniformly; and h2k is strictly convex for all k, so f is also strictly
convex. Then we can define a subspace B = A(X) + rf , r ? R. For h ? A(X) and
r ? 0, we have by Proposition 8 that h + rf = h+ rf . If r < 0, then h + rf is
concave, and h+ rf ? h+ rf = h+ rf . In either case we have
h+ rf ? h+ rf, (3.24)
and the functional on B defined by h + rf 7? h(x0) + rf(x0) is dominated by the
74
functional g 7? g(x0), which is bounded on C(X). Therefore we can apply the Hahn-
Banach theorem to extend to a functional m on C(X) which satisfies m(g) ? g(x0)
and m(h+ rf) = h(x0) + rf(x0). Note that if g is negative then m(g) ? g(x0) ? 0,
so m is positive, and therefore is identified with a measure on X. m(1) = 1, so m is
a probability measure. Because for any h ? A(X), m(h) = h(x0), m represents x0.
Now for any h ? A(X) such that f ? h, and therefore h ? f. Hence m(f) ?
m(h) = h(x0). By the definition of f, m(f) ? m(f), and since f ? f , m(f) = m(f).
We can conclude that m is supported on the set {x ? X | f(x) = f(x)}.
Let x, y, z be distinct points in X. If x = y/2+z/2, then by the strict convexity
of f ,
1 1 1 1
f(x) < f(y) + f(z) ? f(y) + f(z) ? f(x). (3.25)
2 2 2 2
Therefore x is not in the support of m. We can conclude that every m is supported
on a subset of the extreme points of X.
As it turns out, one additional fact that follows from this proof is that the
constructed function f has the stronger property that f(x) = f(x) if and only if
x is an extreme point of X, which gives us another way of characterizing extreme
points.
3.4.2 The Choquet-Meyer Theorem
The Choquet-Meyer theorem says that Choquet simplices are precisely the
convex sets that support uniqueness of Choquet representations [23].
Theorem 26. (Choquet-Meyer) Let X be a compact convex subset of a locally
75
convex, metrizable topological vector space E. The following are equivalent.
1. X is a Choquet simplex.
2. For all f ? C(ex(X)), if f is convex, then f ? A(X).
3. If ? ? M(ex(X)) represents x ? X, and f ? C(X) is convex, then f(x) =
?(f).
4. For any convex f, g ? C(X), f + g = f + g.
5. For all x ? X, there is a unique measure ? ?M(ex(X)) that represents x.
The process outlined in this section is again from Phelps, as was much of the
material in this chapter [50]. We will need a few more materials before attacking
the proof in its entirety.
Defi?ne a part?ial ordering on M(X) given by ? ? ? if for all convex functions
h on X, h d? ? h d?. This definition is meaningful. For example, we can show
that the surface measure (normalized) on the unit sphere is greater than the Dirac
delta at the origin. Let f be convex and continuous on the unit ball. Because for
all x ? Sn?1, f(0) < (f(x) + f(?x))/2, we have
? ?
? f(x) d?(x) = f(0) = ?f(0) d? (3.26)Sn?1
f(x) + f(?x)
< d?(x) = f(x) d?(x). (3.27)
Sn?1 2
With that established, it follows from an application of Zorn?s lemma that there are
maximal measures in this ordering. We wish to show the following equivalence:
76
Proposition 10. A measure ? ? M(X) is maximal in the ordering from ? if and
only if it is supported on the extreme points of X.
Proof. Assume ? as above is maximal. Let f ? C(X). Define a linear functional
L on the subspace span{f} given by rf 7? r?(f). This functional is dominated by
the bounded sublinear (but not linear) functional p : C(X) ? R p(g) = ?(g). By
the Hahn-Banach theorem we can extend L to a linear functional on C(X) which is
bounded by p. If g ? 0, then g ? 0, so L(g) ? p(g) = ?(g) ? 0, so L is a positive
functional, and can be represented by a measure ? ? M(X). Now, for any convex
function g, ?g = ?g, so
?(?g) ? p(?g) = ?(?g), (3.28)
and ? ? ?. Since ? is maximal, we know that ? = ?, and
?(f) = ?(f) = L(f) = ?(f). (3.29)
Now assume that for any convex continuous function f , ?(f) = ?(f). Let
choose a maximal measure such that ? ? ?. We know ? exists again from Zorn?s
lemma. Then for any concave g, ?(g) ? ?(g). We can write for convex f ,
?(f) = inf{?(g) | g is concave and g ? f}
? inf{?(g) | g is concave and g ? f} = ?(f).
Hence for any convex f , ?(f) = ?(f). Because {f ? g | f, g are convex} is dense in
C(X), we can conclude that ? = ? and ? is maximal.
77
Finally, recall that in the proof of Choquet?s theorem, we constructed a convex
function f ? C(X) that has the property that ex(X) = {x ? X | f(x) = f(x)}.
Then we can conclude that a measure ? is supported on the extreme points of X if
and only if ?(f) = ?(f). The proof is complete.
Furthermore, the set of nonnegative measures on a vector space X forms a
lattice under this ordering.
Proposition 11. Given two positive measures ? and ? in M(X), their greatest
lower bound exists and is of the form
( )
? ? ? = min d? , d? (?+ ?), (3.30)
d?+? d?+?
where d?? denotes the Radon-Nikodym derivative.d
Proof. Let ? ? ?, ?. It?s obvious that ? is absolutely continuous with respect to
? + ?, since it is absolutely continuous with respect to ? and ? individually. If
f ? C(X) is convex, then
? ? ? ?
d? d?
f d?+ ? = f d? ? f d? = f d?+ ?, (3.31)
d?+ ? d?+ ?
( )
and likewise for ?. Hence, d? ? min d? , d? = d??? , and ? ? ? ? ?.
d?+? d?+? d?+? d?+?
The last thing we will need is a decomposition lemma for vector lattices.
Lemma 27. Let V be a vector?lattice. If w?e have two subsets of positive elements of
V , (xi)
I
i=1 and (y )
J , and if Ij j=1 i=1 xi =
J
j=1, then there exists zij ? 0, 1 ? i ? I,
78
? ?
1 ? j ? J , such that x = J Ii j=1 zij and yj = i=1.
Proof. Consider the case where I = J = 2. We have x1 + x2 = y1 + y2 and let
z11 = x1 ? y1
z12 = x1 ? z11
z21 = y1 ? z11
z22 = x2 ? z21 = y2 ? z12
The desired identities follow immediately, so what we must show is that the final
equality holds, and that each zij are nonnegative. By the translation invariance of
the vector lattice, we have
z12 ? z21 = (x1 ? z11) ? (y1 ? z11) = x1 ? y1 ? z11 = 0, (3.32)
so z11, z12, z21 are nonnegative. Furthermore,
z12 + x2 = x1 + x2 + z11 = y1 + y2 + z11 = z21 + y2. (3.33)
It follows that y2 ? z12 = x2 ? z21, so z22 is well defined. z21 ? z21 + y2, so
z21 = z21 ? (z12 + x2) ? z21 ? z12 + z21 ? x2 = z21 ? x2. Hence z21 ? x2, and z22 is
nonnegative.
Finally, we can generalize to higher I, J by induction. If we are given x1 +x2 =
y1 + y2 + y3, we can set y?2 = y2 + y3 and then we have x1 + x2 = y1 + y?2, and we
79
can generate z?11, ? ? ? , z?22 as before. Then we have?z?12 + z?22 = y2 +?y3. We can apply
the above again to get z12, z13, z
3 2
22, z23 such that j=2 zij = z?i2, i=1 zij = yj. Set
z11 = z?11 and z21 = z?21 and the desired result follows. Proceed inductively in this
manner for all I, J.
Now we move on to the proof of the main theorem.
Proof. (Choquet-Meyer) (1? 2) Let f ? C(X) and X be a simplex. Let x1, x2 ? X,
?1, ?2 > 0 such that ?1 + ?2 = 1. Call z = ?1x1?2x2. We?d like to show that f is
affine, so we need that f(z) = ?1f(x1) + ?2f(x2).
By Proposition 9, f(z) = sup{?(f) | ? ? ?z}. Suppose ? is a discrete measure
?and ? ? ?z. Then t?here exist a finite sequence of ?j ? 0 and yj ? X such that
?j = 1 and ? = ?j?y . By using Lemma 27 on vector la?ttices, we can thenj
get a sequence of z?ij so that ?jyj = z
? + z?1j 2j and ?ixi?= z
?
j ij. If we write
z?ij = ?ijzij, ?ij ? 0 and zij ? X, then we get that xi = ??1j i ?ijzij, which is a
convex combination of elements of X. It follows that it represents a discrete measure
?J
? = ??1i i ?ij?z i = 1, 2, (3.34)ij
j=1
?
with ? ? ?x . We can conclude that f(xi) ? ?i(f) = ?1i j ?i ?ijf(zij).
Finally, by the convexity of f , we have f(y ) ? ??1? f(z ) + ??1j j 1j 1j j ?2jf(z2j),
80
so we can conclude that
?
?(f) = ?jf(yj)
?j
? ?ijf(zij) = ?1?1(f) + ?2?2(f)
i,j
? ?1f(x1) + ?2f(x2).
If we take the supremum over all possible ?, we get f(z) ? ?1f(x1)+?2f(x2). Note
that we made the assumption that ? may be a discrete measure without justification.
This is acceptable because we can approximate any ? ? ?z with a discrete measure
from below; this can be made formal using a partitioning argument relying on the
compactness of X.
(2? 3) If ? ? ?x is supported on ex(X), then ?(f) = ?(f). We want to show that
?(f) = f(x). f is affine and upper semicontinuous. Because for h ? A, ?(h) = h(x),
it is enough to show that f can be approximated from above by continuous affine
functions. Let h1 and h2 be in A such that hi > f . We want to show that there is
another h ? A, h > f such that h ? h1, h2. Define J = {(x, r) ? X ? (0,?) | r ?
f(x)} and Ji = {(x, r) ? X ? (0,?) | r = h(x)}. Notice that by the semicontinuity
of f, J is closed, and J1?J2 is compact. Since the two sets are disjoint, by the Hahn-
Banach separation theorem we can separate them with a hyperplane L(x, r) = ?.
The function given by h(x) = L(x, h(x)) satisfies our requirements. Then the set
H = {h ? A | h ? f} is directed downwards. In addition, the closure of H is
81
bounded below, so it has a unique minimal element f ?. Because for all z ? X, there
exists hn ? H such that hn(z)? f(z) ? f ?(z), we can conclude that f ? = f , and
?(f) = inf{?(h) | h ? H} = inf{h(x) | h ? H} = f(x). (3.35)
(3? 4) Let f, g ? C(X) be convex. Choose ? ? ?x supported on ex(X).
(f + g)(x) = ?(f + g) = ?(f) + ?(g) = f(x) + g(x). (3.36)
(4 ? 5) Let x ? X. Consider the set S ? C(X) of convex functions, and define a
linear functional on S?S by m(f?g) = f(x)?g(x). Because for any h1, h2 ? C(X),
h1(x)+h2(x) = h1 + h2(x), m is linear on S?S, and we have as a property of upper
envelopes that |m(f?g)| ??f ? g?, we can conclude that m is uniformly continuous
on S ? S, and therefore extends to a continuous function on C(X) with norm at
most 1. m(1) = 1, so m is a probability measure, and therefore is identified with a
measure on X.
By Proposition 9, we have that for any f ? C(X), m(f) = f(x) = sup{?(f) |
? ? ?x}, so ? . m for any ? ? ?x. Therefore m is the unique maximal measure
which represents x. Equivalently, it is the unique measure supported on ex(X)
which represents x.
82
(5 ? 1) Consider the cone P of nonnegative measures on X. As we have shown
in Proposition 11, this cone forms a lattice. We would like to consider the subcone
Q ? P of measures supported on ex(X), and find that this is a lattice as well.
First, for any two ?, ? ? Q, (?+ ?)(f) = (?+ ?)(f), so ?+ ? ? Q. For r ? 0,
r? ? Q also.
Let x, y ? Q, and let x ? y be their greatest lower bound in P . Recall the
formula for the greatest lower bound of two measures in (3.30). It is clear from this
formula that x ? y is supported on the union of the supports of x and y, which are
contained in ex(X).Therefore x ? y is in Q. To show that x ? y is still a greatest
lower bound in the natural ordering on Q, take w ? Q such that x ? w and y ? w
are in Q. Because x?y is a greatest lower bound on P , we have that 0 ? P ? x?y.
It follows that x ? y ? w ? P and x ? y ? w ? z. Because x ? y ? Q, x ? y ? w
is absolutely continuous with respect to x ? y, and therefore it is also supported on
ex(X) and therefore x ? y ? w ? Q, and w is less than x ? y in the natural order
on Q. x ? y is a true greatest lower bound, and Q is a lattice. Moreover, the set
Q1 = {? ? Q | ?(X) = 1} is a base for Q, and Q1 is a simplex.
The final step is to show that X is a simplex. Define the resultant map
r : Q1 ? X by r(?) is the unique point in X represented by ?. It is easy to see that
the resultant is an affine function. Our assumption was that to each x ? X, there
is a unique measure in Q1 such that ? represents x. This implies that the resultant
map is one-to-one. By Choquet?s theorem it is also onto, and therefore it follows
that X is a simplex isomorphic to Q1.
83
Notice that throughout this section we used the assumption that X is a metric
space. This assumption can be dropped, but not without a cost. The Choquet-
Bishop-De Leeuw theorem shows that we still get representation, but we cannot say
that measures are supported on the extreme points anymore. One problem is that
in a metrizable space, the extreme points form a G? set, but in the absence of a
metric they may not even be Borel. Bishop and De Leeuw gave examples of such
convex sets [50].
A more subtle use of the fact that X is a metric space was in the proof of
Proposition 10. Recall that in the proof of Choquet?s theorem we constructed a
convex continuous function f such that ex(X) = {x ? X | f(x) = f(x)}. This
construction relied on the metrizability of X, and clearly is not possible without it,
as it would imply that ex(X) is a G? set. Without this tool it isn?t possible to prove
Proposition 10 in its full generality. Rather we get the weaker fact that ? ?M(X)
is maximal if and only if ?(f) = ?(f) for all f ? C(X). Thus the generalized version
of Choquet-Meyer is
Theorem 28. (Choquet-Meyer, non-metrizable) Let X be a compact convex subset
of a locally convex, metrizable topological vector space E. X is a Choquet simplex if
and only if for each x ? X there is a unique maximal measure ?x such that ?x ? ?x.
This is not exactly comparable to the Choquet-Bishop-De Leeuw theorem for
non-metrizable spaces. Recall that the Choquet-Bishop-De Leeuw theorem says that
there exists a representing measure which vanishes on Baire sets which contain no
extreme points. However, the Choquet-Meyer Theorem does not show uniqueness of
84
measures which satisfy this condition. Mokobodzki found an example of a simplex
X, for which ex(X) is Borel, but not Baire, and there exists a point x ? X with two
distinct representing measures ? and ?, with ?(X\ex(X)) = 0 and ?(X\ex(X)) = 1,
but both vanish on every Baire subset of X which is disjoint from its extreme points
[50].
We aren?t too concerned with these anomalous spaces, but they are the subject
of some continuing study today in functional analysis.
85
Chapter 4: My Results
4.1 Uniqueness of Positive Definite Extensions
Now I proceed to my contributions to the topic, including the proofs of my
main results. I wish to apply Choquet theory to both the problems of positive
definite extensions and continuous superresolution. Recall that the positive definite
extension problem is a special case of the superresolution problem, when we restrict
ourselves to only caring about probability measures. The first thing we must do is
identify the space we are working with and its extreme points.
Proposition 12. Let P be the set of probability measures on the torus,
P = {? ?M(T2) | ?f ? 0?(f) ? 0, ???TV = 1}. (4.1)
? ? P is an extreme point if and only if ? = ?x for some x ? T2.
Proof. The extreme points of a closed convex set are those which are not a convex
combination of two distinct elements. Note that for two probability measures ? and
?, the support of ?? + (1? ?)? is the union of the supports of ? and ?. Let ? be a
probability measure. If ? is supported on a single point, then it cannot be written
as a convex combination of any distinct ? and ?, because if they are distinct, their
86
supports must contain points outside of the support of ?.
If, on the contrary, ? is supported on at least two points a and b, then we can divide
T2 into two sets, A and B, such that a ? A and b ? B, such that ?(A) > 0 and
?(B) > 0. Then we can write
? = ?A + ?B. (4.2)
? ?
Because ?A/ d?A and ?B/ d?B are distinct probability measures, ? cannot
be an extreme point.
The problem I?ll focus on is to find whether the following algorithm produces
a unique result:
?# = argmin???TV such that ??(m,n) = ymn ?N ? m,n ? N, (4.3)
??M(T2)
where (y ) ? C(2N+1)?(2N+1)mn are the observed data.
Recall from Proposition 11 that the cone of nonnegative measures is a meet-
semilattice. Because P is a base for that cone, we know that P is a Choquet
simplex and has the property that each element has a unique representation as a
probability measure supported on ex(P ). But that clearly isn?t sufficient for unique
reconstruction from a finite number of frequency samples. We have many examples
87
of non-unique reconstruction [6]. A simple example is the two probability measures
1 ( )
?1 = (? + ?1/2 , ) (4.4)21
?2 = ? + ?1/4 + ?1/2 + ?3/4 . (4.5)
4
??1(n) = ??2(n) when |n| < 2, but not when |n| ? 2.
Consider the map F : P ? C(2N+1)?(2N+1) given by
F (?)(m,n) = ??(m,n) ?N ? m,n ? N. (4.6)
F (P ) is a compact, convex set in C(2N+1)?(2N+1). F will preserve convex combina-
tions in the sense that if ? is a probability measure that represents x ? M(T2),
then the pushforward F ? represents F (x) on C(2N+1)?(2N+1)? . Recall that ? is a
linear bounded functional on the space M(T2) with the property that for all affine
functions p : M(T2)? C,
?
p(x) = p(t) d?(t). (4.7)
M(T2)
Given an affine function q on C(2N+1)?(2N+1), we calculate
? ?
q(F (x)) = q(F (t)) d?(t) = q(z) dF??(z), (4.8)
M(T2) C(2N+1)?(2N+1)
and thus it is shown that F (x) has a representation by F??, which is also a proba-
bility measure.
88
Notice that there is a characterization of the uniqueness problem in the fol-
lowing way: the measure ?# in (1.70) is unique if the point (y ) ? C(2N+1)?(2N+1)mn
has a unique representation F??, where ? is a measure supported on ex(M(T2)).
We can conclude that, despite the fact we are attempting to reconstruct measures
in an infinite dimensional space, the reconstruction is actually taking place in the
finite dimensional space C(2N+1)?(2N+1). We wish to know whether we can formu-
late (1.70) in full without appealing to the infinite dimensional space M(T2). The
following lemma helps with that goal.
Lemma 29. Let P ?M(T2) be the set of probability measures as before, and F as
above. Then,
ex(F (P )) = F (ex(P )). (4.9)
Moreover, for each z ? ex(F (P )), there is a unique element x ? T2 such that
z = F (?x). F ex(P ) is a homeomorphism.
Proof. First, P is a convex, compact (in the weak-* topology) set, so by the Krein-
Milman theorem 20 it is the closed convex hull of ex(P ). F is linear and continuous
with respect to the weak-* topology, so F (P ) is not only compact and convex itself,
but it is also the closed convex hull of F (ex(P )). Therefore, by applying the Krein-
Milman theorem to F (P ), we can conclude that ex(F (P )) ? F (ex(P )).
We must show that each point of F (ex(P )) cannot be a convex combination of
two other points in F (P ). Notice first that each point z ? F (ex(P )) is of the form
z = e2?i?m,n??xmn , (4.10)
89
for some x ? T2. Hence each vector z ? F (ex(P )) has ?z?2 = 2N + 1. Therefore
F (P ) ? B(0, 2N + 1). Likewise, for any two distinct points z,w ? F (P ), t ? (0, 1)
?? ?tz + (1? t)w? < t?z?2 + (1? t)?w?2 ? 2N + 1. (4.11)2
Hence, tz + (1? t)w is not in F (ex(P )), and every point in F (ex(P )) is an extreme
point of F (P ).
Finally, if z ? ex(F (P )), then it must be in F (ex(P )), so there is an element
? ? ex(P ) such that z = F (? ). Note that for x,y ? T2x x , if for m,n ? {0, 1},
e2?i?m,n??x = e2?i?m,n??y, then x = y. Therefore, given N ? 1, x is unique.
Finally, we have shown that F ex(P ) is a continuous bijection. Because F is an
open map, so is its restriction. Therefore F ex(P ) is a homeomorphism.
An immediate consequence of this lemma is that the pushforward map F? : M(ex(P ))?
M(ex(F (P ))) is in fact an isomorphism of vector spaces. Therefore we conclude that
the following two statements are equivalent:
z ?M(T2? ) has a unique representation by a measure ? ?M(ex(P )).
? F (z) has a unique representation by a measure ? ?M(ex(F (P ))).
Of course, F (P ) is not a simplex, as it has infinitely many extreme points, so as
we already knew from the examples in [6] and (4.4), unique reconstruction is not
possible in general. But we are aware that in certain cases we can get unique
construction. For example, the measure 1/2(? + ?(1/2,0)) is the unique solution to
90
(1.70) when N ? 2. This is because the line
tF (?) + (1? t)F (?(1/2,0)) t ? [0, 1] (4.12)
is a simplex which lies in the boundary of F (P ). We call this a face of F (P ),
analogous to a face of a polyhedron. The idea of a face of a convex set came from
Alfsen, who defined the face of a Choquet simplex [1]. This idea will be important
for uniqueness.
Now I will prove my first main result on positive definite extensions.
Theorem 30. If a finite sequence y ? C(2N+1)?(2N+1) has an extension to an infi-
nite positive definite sequence (y ?mn)m,n=??, then at least one such positive definite
extension must be a finite sum of the form
?K
y ?2?i?m,n??xk 2mn = ?ke ?k > 0,xk ? T , (4.13)
k=1
where K ? 4N2 + 4N + 2.
Proof. Let y ? C(2N+1)?(2N+1) be arbitrary, such that the given assumptions hold.
Without loss of generality let y0,0 = 1. By Bochner?s theorem, there exists a ? ? P
such that y = F (?). Because C(2N+1)?(2N+1) is finite dimensional, Carathe?odory?s
theorem 19 implies that y has a representation as a convex combination of at most
(2N + 1)2 + 1 = 4N2 + 4N + 2 extreme points of F (P ). Because each extreme point
91
is of the form F (?x) for some unique x ? T2, we have
? ?? ?K K ?K
y = ?kF (? ) = F ? ? ? ?xk k x ?k k > 0, ?k = 1. (4.14)
k=1 k=1 k=1
?
The measure ? = ? 2k?x is a probability measure on T with F (?) = y, so ?? is ak
positive definite extension of y. The proof is complete.
4.2 Faces of Choquet Simplices
The first obstacle to generalizing Theorem 30 to signed measures is that there is
no longer an obvious setting to apply Choquet theory. The set of complex measures
{? ? M(T2) | ???TV = 1} is no longer even convex, let alone a Choquet simplex.
We propose to solve this problem by introducing the idea of faces. Consider why
compressive sensing works in the finite dimensional case: in high dimensions, the
shape of the `1 ball is such that not only is it a polyhedron, but most of its tangent
affine planes of sufficiently low dimension will hit one of the low-dimensional faces.
The faces of the `1 ball are in fact simplices. I will refer to the `1 ball as a cross-
polytope in n dimensions. The solid (hollow) cross-polytope is the set {x ? Rn |
?x?1 ?(=) 1}.
Notably, because its faces are simplices, a hollow cross-polytope has the prop-
erty that each point is a unique convex combination of vertices. We would like to
find an infinite-dimensional analogy for M(T2). We have by Proposition 7 that the
probability measures form a Choquet simplex. It is easy to see that each set of
the form {?? | ? ? M(T2), ? ? L?(T2), ? ? 0, |?| ? 1} is also an isomorphic
92
simplex via the affine transformation ? 7? ??. The central question of this section
is whether it is possible to define a generalized polytope, and in what generality can
it be defined?
A similar topic was studied by Alfsen [1]. As it turns out we may define a face
of a simplex in the following way.
Definition 12. Let X be a compact convex set in a topological vector space V . An
affine space H of codimension 1 is said to be a supporting hyperplane for X if
H ?X 6= ? and X \H is convex. (4.15)
If H is a supporting manifold, then H ?X is a face of X.
An immediate implication is that a face can consist of a single point {x} if
and only if x is an extreme point. Thus the extreme points are analogous to the
vertices of a simplex. We can in fact strengthen that statement to the following.
Proposition 13. If x in a face F = H ? X has a representation ? ? ?x, then ?
must be supported in H.
Proof. Assume x, F,H, ? as above, and that ? is not supported in H. It is a standard
fact, and a corollary of the Hahn-Banach theorem, that there is a linear functional
f : V ? R such that f = 0 on H and f > 0 on X \H. Because ? is not supported
in H, there must be some compact S ? X \ H such that ?(S) > 0 and f(S) is
93
bounded below by  > 0. We can conclude that
? ?
f(t) d?(t) ? ?(S) > 0 = f(t) d?x(t), (4.16)
X X
which is a contradiction.
An immediate consequence is that extreme points of faces correspond well with
extreme points of their underlying sets.
Corollary 1. Let C be a face of a compact convex subset X. Then C is also compact
and convex, and ex(C) ? ex(X).
Proof. C is the intersection of a closed convex hyperplane H and a compact convex
set X, so it is immediately compact and convex.
Let x ? ex(C). By the Krein-Milman theorem we have that x ? C, so Cho-
quet?s theorem implies that there exists a measure ? supported on ex(X) such
that ? represents x. But by Proposition 13, ? is supported in H ? X = C, and
since x is an extreme point of C we can conclude that ? = ?x. But because
{x} = supp(?) ? ex(X), x is an extreme point of X.
We now know that the face structure of a closed convex set has some useful
properties with respect to its extreme points. One more important implication is
a property of Choquet simplices which will be helpful in generalizing Theorem 30.
Proposition 14 explores this, but first we need to return to some materials on cones
and lattices.
94
Definition 13. A subcone X? of a cone X is said to be hereditary if 0 ? x ? y and
y ? X? implies x ? X?.
Proposition 14. Each face of a Choquet simplex is itself a simplex. The extreme
points of a face are each extreme points of X as well.
Proof. The proof will take place in two parts. Recall that associated to any Choquet
simplex X is a cone X?. A compact convex set X is a Choquet simplex if and only
if the associated cone X? is a lattice under the associated order: x ? y if y? x ? X?.
In part 1 of the proof, we wish to show that each face F of X forms a hereditary
subcone of X??that is given y ? F? and x ? X?, if x ? y then x ? F? . In part 2 we
will show that every hereditary subcone of a lattice is a lattice itself.
(Part 1) If F is a face of X, then there is a hyperplane H ? V such that F = H?X,
and X falls on one side of H. Equivalently we can say that there exists a linear
functional f such that H = {x ? V | f(x) = 1}, and f(x) ? 1 for all x ? X.
We can extend f to the cone X? sitting in the space V ? R in a natural way by
f(?x) = ?f(x). Then we can characterize F? = {?x ? X? | f(?x) = ?}.
Let y ? F and x ? X such that for some positive numbers ?, ? we have
?y ? ?x. We know immediately that f(?x) ? ?, so f(?y??x) ? f(?y)?? = ???.
On the other hand, because ?y ? ?x ? X?, we have that
( ( ))
? ?
f(?y ? ?x) = f (?? ?) y ? x ? (?? ?). (4.17)
?? ? ?? ?
Therefore f(?y ? ?x) = ?? ? and ?x ? F?
(Part 2) Now I wish to show that every hereditary subcone of a lattice is a lattice
95
itself. To that end, let ?X be the ordering induced by X and ?F be the one induced
by F . Let x, y ? F? and we wish to find a greatest lower bound for x and y in F? . Let
z be the greatest lower bound in X?, and let w ? F? such that w ?F x and w ?F y.
We know already that w ? F? . Since F ? X, we have that 0 ?X w ?X z. Then
because z?w ?X z ? F? , so z?w ? F? and w ?F z. Therefore z is a greatest lower
bound and F is a lattice.
We can conclude that each face of a Choquet simplex X generates a hereditary
subcone of X?, and is therefore a Choquet simplex itself.
To end this section, I will point out that the definition of a face of a simplex is
easily generalizable. We will find the notion useful in exploring the shape of cross-
polytopes (which are not Choquet simplices but share some useful properties) in
M(T2).
Definition 14. Let C be a compact convex set in a topological vector space V . An
affine space H of codimension 1 is said to be a supporting hyperplane for C if
H ? C =6 ? and C \H is convex. (4.18)
If H is a supporting manifold, then H ? C is a face of C.
96
4.3 Uniqueness for Signed and Complex Measures
Let B = {? ? M(T2) | ???TV ? 1}. I observe that by the definition from
Alfsen, P is actually a face of B. Take the hyperplane in M(T2) defined by
{ ? }
H = ? ?M(T2) | Re d? = 1 . (4.19)
T2
?
Then P = H ? B and for all ? ? B, Re d? ? ???TV ? 1. Therefore H is a
supporting hyperplane which carves out the face P . It is easy to imagine that one
is able to prove similar results for other faces of B, which begs the question: what
is a characterization of the faces of the cross-polytope B?
A hyperplane H in a vector space V can always be generated by a linear
functional f : V ? R, such that H = f?1(1). We can say that H is a supporting
hyperplane for a compact convex set C if and only if f can be chosen such that
for all x ? C, f(x) ? 1. Otherwise the hyperplane would split C into two disjoint
convex sets f?1((1,?)) ? C and f?1((??, 1)) ? C.
Let ? : T2 ? C be a continuous function with ???? = 1.?? is identified with
a complex valued functional on M(T2) given by ? 7? ?(?) = ?(t) d?(t). In fact,
recall that we?ve given M(T2) the weak-* topology, therefore by definition every
element of the topological dual M(T2)? is of this form for some ? [5]. Define the
affine hyperplane generated by ? as
H? = ?
?1(1). (4.20)
97
I make the following observations about H?.
Proposition 15. H? is a supporting hyperplane for B if and only if ???? = 1.
Proof. If ???? = 1 then there exi?sts some x ? T
2 suc?h that for all y ? T
2, |?(y)| ?
|?(x)| = 1. Then for all ? ? B, | ?(t) d?| ? 1 and ?(t) d(?(x)?x(t)) = 1. Thus
H? ?B is nonempty and B \H? is convex.
Given H? is a supporting hyperplane of B, then for all x ? T2, |?(x)| =
|?(?x)| ? 1, so ???? ? 1. On the other hand, there must exist some ? ? B such
that ?
1 = ?(t) d?(t) ? ??? |?|(T2? ) = ???? . (4.21)
T2
Proposition 16. Let ? ? C(T2) with ???? = 1. The face H? ? B can be charac-
terized as following. A measure ? ? B is in H? ? B if and only if ?? is a positive
measure supported on the set {x ? T2 | |?(x)| = 1}.
?
Proof. Given ? ? B, ? is in H? if ?(t) d?(t) = 1. Assume for contradiction
that ?? is not a positive measure. Then there exists a set A ? T2 such that
Re(??)(A) < |??|(A). Then
|??|(T2) = |??|(A) + |??|(Ac) (4.22)
> Re(??)(A) + Re(??)(Ac) (4.23)
= Re(??)(T2) = 1. (4.24)
But because ? ? B, ????M ? ???????TV ? 1, which is a contradiction.
98
Similarly, if ? is not supported on the set {x ? T2 | |?(x)| = 1}, then there
exists a constant  < 1 such that A = {x ? T2 | |?(x)| < } has non-zero |?|-
measure. Then
? ?
1 = ?(?) = ?(t) d?(t) + ?(t) d?(t) (4.25)
A Ac 
? |?|(A ) + |?|(Ac ) (4.26)
< |?|(A c) + |?|(A) ? 1, (4.27)
which is a contradiction. The proof is complete.
We can conclude that faces ofM(T2) are characterized by continuous functions
with ???? = 1. Now consider the set F (B). We have the following result from Alfsen
which links the face structure of B and F (B) [1].
Proposition 17. Let ? : V ? W be a linear map between vector spaces and let
X ? V be a convex set. H ? ?(X) is a face of ?(X) if and only if ??1(H) is a face
of X.
So because F is a linear map, we conclude that faces of F (B) correspond
to faces of B, but not every face of B necessarily corresponds to a face of F (B).
For example, the face of B defined by the function ? = 1 contains all measures of
the form ?x, x ? T2, but the set {F (? ) | x ? T2x } does not fall in a hyperplane
of C(2N+1)?(2N+1), so it cannot map to a face of F (B). In fact, in order for ? to
represent a face of F (B), it must factor through F , in the sense that there exists a
function ?? : C(2N+1)?(2N+1) ? R such that ?(x) = ??(F (x)).
99
Now I will prove the main theorems of this section.
Theorem 31. For each sequence y ? C(2N+1)?(2N+1), there exists an finite sum of
point measures ?K
? = ?k?
2
x ?k ? C, xk ? T , (4.28)k
k=1
such that for ?N ? m,n ? N, ??(m,n) = y(m,n). Furthermore, for all ? ?M(T2),
if ??(m,n) = y(m,n) for ?N ? m,n ? N, then ???TV ????TV .
Proof. Let 0 =6 y ? C(2N+1)?(2N+1). Because F (B) is closed, bounded and contains
a neighborhood of the origin, we can guarantee that there exists an ? > 0 such that
y ? ?F (B), and that ? is the minimal such constant. Without loss of generality,
assume that ? = 1. Then for all ? ? M(T2), if F (?) = y, then ???TV ? 1. It is
sufficient to construct ? satisfying the desired property, such that ???TV = 1.
Because ? is minimal and F (B) is compact, y must fall in the boundary of
F (B). It is a fact that the boundary of a compact convex set is the union of its
faces, therefore there exists a hyperplane H and a corresponding face of F (B) such
that y ? H ? F (B). Recall, as in Propositions 15 and 16, that associated to H
is a functional ? ? (C(2N+1)?(2N+1))?, such that for all x ? F (B), ??,x? ? 1 and
H = ??1(1). Likewise, F?1(H) is a supporting hyperplane of B, which is generated
by the functional ? ? F . By Corollary 1, each extreme point of the face H ? F (B)
is an extreme point of F (B), which corresponds with a unique extreme point of B.
Because G contains all its extreme points, we can then characterize it as a closed
100
convex hull of delta measures.
ex(G) = {??(x)?x ?M(T2) | |??(x)| = 1}. (4.29)
Because G is isomorphic to a face of the set of positive definite measures, it is a
Choquet simplex. We then proceed in an identical manner to the proof of Theorem
30. First I claim that for each z ? ex(F (G)), there is a unique x ? M(T2) such
that z = ??(x)F (?x). It follows from the Krein-Milman theorem that ex(F (G)) ?
F (ex(G)), so we must show that each point in F (ex(G)) is an extreme point of
F (G). Let x ? T2 such that z = ??(x)F (?x) is an extreme point of G. Then
zmn = ??(x)e
?2?i?m,n??x, (4.30)
and it is straightforward to see that ?z?2 = 2N + 1. If we let z 6= w ? F (G),
t ? (0, 1), then we can compute
?? ?tz + (1? t)w? < t?z?2 + (1? t)?w?2 ? 2N + 1. (4.31)2
Therefore for any non-extreme point of F (G), its norm must be strictly less than
2N + 1, thus every point in F (ex(G)) is extreme.
By Carathe?odory?s theorem, there exists a representation of y as a finite convex
sum of extreme points of F (G). As we have shown, to each extreme point z ?
ex(F (G)), we can associate a unique x ? T2 such that z = ??(x)F (?x), so we can
101
write y as a convex sum with weights ?k > 0:
? ?K ?? ?Ky = ?k??(xk)F (?x ) = F ?k??(x )? ?k x . (4.32)k k
k=1 k=1
Finally, because each xk is distinct, we can compute that
??????K ??
? ?K
? ?k??(xk)?x ??? = ?k|??(xk)| = 1. (4.33)kk=1 k=1
TV
We have successfully constructed ? as desired.
Now we have shown that we cannot hope to uniquely recover non-discrete
measures via the total variation minimization in (1.70). My final result is a charac-
terization of exactly which measures allow for unique reconstruction. As we already
know, any such measure must be supported on a finite set, but that is still not suffi-
cient. Recall the work of Cande?s and Fernandez-Granda [12, 13, 30], which showed
that a minimum separation requirement was sufficient. My characterization is based
on a description of the faces of F (B). Recall that faces of B are generated by con-
tinuous functions ? on T2, and that it corresponds to a face of F (B) if the function
? factors through F . Since unique reconstruction depends on the face structure of
F (B), then we can characterize them by describing the admissible functions ? which
generate faces of F (B) which are also simplices.
For the proof of this theorem I first need the following lemma.
Lemma 32. Each face of B is a Choquet simplex.
Proof. Let G = B?H? be a face of B. By Proposition 16, ? ? C(T2) with ???? = 1.
102
Let P be the set of probability measures onM(T2), and consider the linear operator
?? : G? P given by
??(?) = ??. (4.34)
Note that ?? is invertible, because for any ? ? G, ??? = ?. On the other hand for
all ? ? P , ? ? ??(G) if and only if |?|? = ?. Therefore we can conclude that ?? is a
linear isomorphism between G and P ?H|?|. Because P ?H|?| is a face of a Choquet
simplex, by Proposition 14 it is also a simplex, and likewise G is as well.
Theorem 33. Let ? ? M(T2). ? is the unique solution to the algorithm in (1.70)
if there exists a trigonometric polynomial on T2,
?N
? = c e?2?i?m,n??xmn cmn ? C, (4.35)
m,n=?N
with the following properties:
1. ???? = 1
2. ?? = |?|
3. The set S = {?(x)?x ? T2 | |?(x)| = 1} is finite, and F (S) is affinely indepen-
dent.
Proof. Assume without loss of generality that ???TV = 1. The proof will be in four
parts.
(Part 1) First, it follows immediately from Proposition 16 that properties 1 and 2
are equivalent to the statement that ? is in a face G of B generated by ?.
103
(Part 2) Next I claim that any ? satisfying properties 1 and 2 is of the form (4.35)
if and only if it factors through F , which likewise is true if and only if F (G) is a
face of F (B).
Let ? ? C(T2), with ???? = 1. ? factors through F if and only if there exists
a covector ?? ? (C(2N+1)?(2N+1))? such that for any ? ? M(T2), ?(?) = ??(F (?)).
This expands to ? ?N
?(t) d?(t) = ??(m,n)??(m,n). (4.36)
T2 m,n=?N
It is clear from this equality that ?? must be the Fourier transform of ? and by
Parseval?s theorem equality holds for all ? ? M(T2) if and only if the Fourier
transform of ? is supported on C(2N+1)?(2N+1), in which case it is a trigonemetric
polynomial of the form in (4.35). H?? is clearly a supporting hyperplane of F (B),
and F (G) = F (B)?H??. The reverse direction is true trivially from Proposition 17.
Thus the claim is proved.
(Part 3) y is a unique convex combination of extreme points of the face F (G)
if F (S) is affinely independent.
?(x)?x ? ex(G) if and only if |?(x)| = 1, so S = ex(G). Therefore if F (S) is
affinely independent, F (G) is a simplex. Therefore by Carathe?odory?s theorem, we
can write y as a unique convex combination of extreme points of G. Because every
representation of y must be supported on G, this combination must be unique over
all representations on ex(F (B)).
(Part 4) My next claim is that ? must be the unique solution to (1.70) if y = F (?)
104
can be written as a unique sum of extreme points of F (G).
In the case that y is not in the boundary of F (B), there exists some 0 < ? < 1
such that y ? ?F (B). Then there exists some ? ? ?B such that F (?) = y.
???TV ? ? < ???TV , so ? is not a solution to (1.70). Likewise the pushforwards
F?? and F?? are distinct measures supported on ex(F (B)), which each represent y.
Hence the claim holds.
In the case that y is in the boundary of F (B), there exists some supporting
hyperplane H ? C(2N+1)?(2N+1)?? such that y is in the face F (G) = F (B) ? H??.
By Proposition 17, G ? B is also a face of B, which is generated by a supporting
hyperplane H?. Let ? ? G?B. By Proposition 32, G?B is a Choquet simplex, so by
the Choquet-Meyer theorem there is a unique probability measure ?? ? (M(T2))?,
supported on ex(G), which represents ?. By Lemma 29, F G is a homeomorphism,
so F induces a vector space isomorphism F? betweenM(ex(G)) andM(ex(F (G))).
For any functional A on C(2N+1)?(2N+1), see that
? ?
A(t) dF???(t) = A(F (s)) d??(s) = A(F (?)). (4.37)
C(2N+1)?(2N+1) M(T2)
Because A was arbitrary, F (?) = y if and only if F??? represents y. Because F? is
an isomorphism of vector spaces, we can conclude that ? is the unique measure in
B such that F (?) = y if and only if F??? is the unique measure in M(ex(G)) which
represents y. In addition, by Theorem 31, if ? is unique then the measure F??? must
be a finite sum.
The proof is complete.
105
A few notes on the preceding theorem. It may be possible to loosen the last
requirement on ? slightly. Assume that ex(G) is finite. Using tools from algebraic
geometry, we may put a bound on the number of extreme points of G. Let ? =
(1???). The zero set of ? is identical to the set ex(F?1(G)). Under the assumption
that this set is finite, Be?zout?s theorem gives us a bound. Because ? has degree at
most 4N , Be?zout?s theorem says that it can have at most 4N zeros. For a more in
depth derivation of this result, see appendix A.3 of [49].
Now F (G) is a simplex if and only if the set of extreme points of F (G) is
linearly independent in C(2N+1)?(2N+1). This may not seem like a simplification, but
for any given polynomial ? we can guarantee this to be true for sufficiently large
N . Consider that for ?x ? ex(G), as N ? ?, F?1(F (?x)) approaches the delta
function ?x, and it is easy to see that any finite set of Dirichlet kernels, F(F (?x ) fork
xk ? T2, will evenually be linearly independent for sufficiently large N . How large
N must be relative to the degree of ? is unknown to the author?s knowledge.
Finally, we conclude with one more useful result that is an immediate appli-
cation of the preceding theorem.
Corollary 2. Any pair of positive delta measures can be uniquely recovered from
(1.70), given N ? 2.
Proof. Let x = (x1,x2),y = (y1,y2) ? T2. Define the polynomial ?(s, t) = (1 ?
cos(s) ? cos(t))/2. It is easy to see that 0 ? ? ? 1, and ? = 0 if and only if
106
s = t = 0. Define
?(s, t) = 1? ?(s? x1, t? x2)?(s? y1, t? y2). (4.38)
0 ? ? ? 1, and |?(z)| = 1 only when z = x or y. Therefore for any positive
measure supported on {x,y}, ?? = |?|. Finally, note that as x and y are distinct,
{F (x), F (y)} is trivially affinely independent. Therefore ? satisfies the properties
for Theorem 33, and we can conclude that any positive measure supported on {x,y}
can be uniquely recovered by (1.70).
107
Bibliography
[1] Erik M. Alfsen, On the geometry of Choquet simplexes, Math. Scand. 15 (1965),
no. 1, 97?110.
[2] A. P. Artemenko, Hermitian positive functions and positive funcctionals, Ph.D.
thesis, Odessa State University, 1983.
[3] Joseph A. Ball and Moise?s D. Guerra Huama?n, Convexity analysis and the
matrix-valued Schur class over finitely connected planar domains, J. Operator
Theory 70 (2013), no. 2, 531?571.
[4] John J. Benedetto, Harmonic Analysis and Applications, CRC Press, 1996.
[5] John J. Benedetto and Wojciech Czaja, Integration and Modern Analysis,
Birkha?user Advanced Texts, Birkha?user, Basel, 2009.
[6] John J. Benedetto and Weilin Li, Super-resolution by means of Beurling mini-
mal extrapolation, 2016.
[7] Sergei N. Bernstein, Sur les fonctions absolument monotones, Acta Math. 52
(1928), 1?66.
[8] Salomon Bochner, Vorlesungen u?ber Fouriersche Integrale, first ed., Akademis-
che Verlagsgesellschaft, Leipzig, 1932.
[9] , Monotone Funktionen, Stieltjessche Integrale und harmonische Anal-
yse, Math. Ann. 108 (1933), 378?410.
[10] Stephen Boyd and Lieven Vandenberghe, Convex optimization, first ed.,
pp. 215?232, Cambridge University Press, Cambridge, 2004.
[11] Alberto P. Caldero?n and Raymond Pepinsky, On the phases of Fourier coeffi-
cients for positive real periodic functions, Computing Methods and the Phase
Problem in X-Ray Crystal Analysis (Raymond Pepinsky, ed.), The X-Ray Crys-
tal Analysis Laboratory, Department of Physics, The Pennsylvania State Col-
lege, 1952, pp. 339?348.
108
[12] Emmanuel Cande?s and Carlos Fernandez-Granda, Super-resolution from noisy
data, J. Fourier Anal. Appl. 19 (2013), 1229?1254.
[13] , Towards a mathematical theory of super-resolution, Comm. Pure Appl.
Math. 67 (2013), no. 6, 906?956.
[14] Emmanuel Cande?s, Justin Romberg, and Terence Tao, Robust uncertainty prin-
ciples: exact signal reconstruction from highly incomplete frequency informa-
tion, IEEE Trans. Inform. Theory 52 (2006), no. 2, 489?509.
[15] , Stable signal recovery from incomplete and inaccurate measurements,
Comm. Pure Appl. Math. 59 (2006), 1207?1223.
[16] Emmanuel Cande?s and Terence Tao, Decoding by linear programming, IEEE
Trans. Inform. Theory 51 (2005), no. 12, 4203?4215.
[17] , Near optimal signal recovery from random projections: universal en-
coding strategies?, IEEE Trans. Inform. Theory 52 (2006), no. 12, 5406?5425.
[18] Emmanuel J. Cande?s, Justin Romberg, and Terence Tao, Signal recovery from
incomplete and inaccurate measurements, Comm. Pure Appl. Math 59 (2005),
no. 8, 1207?1223.
[19] Constantin Carathe?odory, U?ber den Variabilita?tsbereich der Koeffizienten von
Potenzreihen, die gegebene Werte nicht annehmen, Math. Ann. 64 (1907), 95?
115.
[20] , U?ber den Variabilita?tsbereich der Fourierschen Konstanten von posi-
tiven harmonischen Funktionen, Rend. Circ. Mat. Palermo 32 (1911), 193?217.
[21] Gustave Choquet, Theory of capacities, Ann. Inst. Fourier 5 (1955), 131?295.
[22] , Existence et unicite? des representations inte?grales au moyen des
points extre?maux dans co?nes convexes, Se?minaire Bourbaki : anne?es 1956/57
- 1957/58, expose?s 137-168, Se?minaire Bourbaki, no. 4, Socie?te? mathe?matique
de France, 1958.
[23] Gustave Choquet and Paul-Andre? Meyer, Existence et unicite? des representa-
tions inte?grals dans les convexes compacts quelconques, Ann. Inst. Fourier 13
(1963), 139?154.
[24] Krzysztof J. Ciosmak, Optimal transport and Choquet theory, 2020.
[25] Yohann de Castro and Fabrice Gamboa, Exact reconstruction using Beurling
minimal extrapolation, J. Math. Anal. Appl. 395 (2012), no. 1, 336?354.
[26] Gaspard de Prony, Essai e?xperimantal et analytique: sur les lois de la dilata-
bilite? de fluides e?lastique et sur celles de la force expansive de la vapeur de
l?alkool, a? diffe?rentes tempe?ratures, J. E?. polytech. Math. 1 (1795), no. 22,
24?76.
109
[27] Ronald A. DeVore, Nonlinear Approximation, vol. 7, Cambridge University
Press, 1998.
[28] David A. Edwards, Some applications of Choquet theory to harmonic analysis,
J. Lond. Math. Soc. 59 (1999), no. 2, 657?668.
[29] Herbert Federer, Geometric Measure Theory, vol. 153, Springer, Berlin, 1969.
[30] Carlos Fernandez-Granda, Super-resolution of point sources via convex pro-
gramming, Information and Inference 5 (2016), no. 3, 251?303.
[31] Robert Fraser and Kyle Hambrook, Explicit Salem sets in Rn, 2019.
[32] , Explicit salem sets in Rn, 2020.
[33] David Hilbert, U?ber die Darstellung definiter Formen als Summe von Formen-
quadraten, Mathem. Annalen 32 (1888), 342?350.
[34] Jean-Pierre Kahane, Some Random Series of Functions, Cambridge Studies in
Advanced Mathematics, vol. 5, Cambridge University Press, Cambridge, 1993.
[35] Jean-Pierre Kahane and Raphael Salem, Ensembles Parfaits et Se?ries
Trigonome?triques, vol. 7, Cambridge University Press, 1964.
[36] R. Kaufman, On the theorem of jarn??k and besicovitch, Acta Arith. 39 (1981),
no. 3, 265?267.
[37] Aleksandr Khinchin, Korrelationstheorie der stationa?ren stochastischen
Prozesse, Math. Ann. 109 (1934), 604?615.
[38] Mark Krein, Sur le proble?me du prolongement des fonctions hermitiennes pos-
itives et continues, Dokl. Akad. Nauk SSSR 26 (1940), 17?22.
[39] Mark Krein and David Milman, On extreme points of regular convex sets, Studia
Math. 9 (1940), no. 1, 133?138.
[40] H. Landau, Extrapolating a band-limited function from its samples taken in a
finite interval, IEEE Trans. Inf. Theory 32 (1986), 464?470.
[41] H. J. Landau, Maximum entropy and the moment problem, Bull. Amer. Mth.
Soc. (N.S.) 16 (1987), no. 1, 47 ? 77.
[42] Paul Le?vy, The?orie de l?Addition des Variables Ale?atoires, first ed., Gauthier-
Villars, Paris, 1937.
[43] Xiaodong Li, Shuyang Ling, Thomas Strohmer, and Ke Wei, Rapid, robust,
and reliable blind deconvolution via nonconvex optimization, Appl. Comput.
Harmon. Anal. 47 (2018), no. 3, 893?934.
[44] Ste?phane Mallat and Guoshen Yu, Super-resolution with sparse mixing estima-
tors, IEEE Trans. Image Process. 19 (2010), no. 11, 2889?2900.
110
[45] Hassan Mansour, Dehong Liu, Ulugbek S. Kamilov, and Petros T. Boufounos,
Sparse blind deconvolution for distributed radar autofocus imaging, 2018.
[46] Toru Maruyama, Herglotz-Bochner representation theorem via theory of distri-
butions, J. Oper. Res. Soc. Japan 60 (2017), no. 2, 122?135.
[47] Pertti Mattila, Fourier Analysis and Hausdorff Dimension, first ed., Cambridge
Studies in Advanced Mathematics, vol. 150, Cambridge University Press, Cam-
bridge, 2015.
[48] Greg Ongie, Sampurna Biswas, and Mathews Jacob, Convex recovery of con-
tinuous domain piecewise constant images from nonuniform Fourier samples,
IEEETrans. Sitnal Process. 66 (2018), no. 1, 236?250.
[49] Greg Ongie and Mathews Jacob, Off-the-grid recovery of piecewise constant
images from few Fourier samples, SIAM Journal on Imaging Sciences 9 (2016),
1004?1041.
[50] Robert R. Phelps, Lectures on Choquet?s Theorem, second ed., Lecture Notes
in Mathematics, Springer, Berlin, 2001.
[51] Dmitrii A. Raikov, On the decomposition of Gauss and Poisson laws, Izv. Akad.
Nauk. SSSR 2 (1937), 91?124 (russian).
[52] Walter Rudin, The extension problem for positive-definite functions, Illinois J.
Math. 7 (1963), no. 3, 532?539.
[53] , An extension theorem for positive-definite functions, Duke Math. J. 37
(1970), no. 1, 49?53.
[54] Tapan K. Sarkar and Odilon Pereira, Using the matrix pencil method to es-
timate the parameters of a sum of complex exponentials, IEEE Antennas and
Propagation Magazine 37 (1995), no. 1, 48?55.
[55] Zolta?n Sasva?ri, The extension problem for measurable positive definite func-
tions, Math. Zeitschrift 191 (1986), 475?478.
[56] , The extension problem for positive definite functions. A short historical
survey, Operator theory and indefinite inner product spaces, Oper. Theory Adv.
Appl., vol. 163, Birkha?user, Basel, 2006, pp. 365?379.
[57] Thomas Strohmer and Ke Wei, Painless breakups?efficient demixing of low rank
matrices, J. Fourier Anal. Appl. 25 (2019), 1?31.
[58] Otto Toeplitz, U?ber die Fourier?sche Entwickelung positiver Funktionen, Rend.
Circ. Mat. Palermo 32 (1911), 191?192.
[59] Michael Unser, Julien Fageot, and John Paul Ward, Splines are universal so-
lutions of linear inverse problems with generalized TV regularization, SIAM
Review 59 (2017), no. 4, 769?793.
111
[60] David V. Widder, The Laplace Transform, Princeton Mathematical Series,
vol. 6, Princeton University Press, Princeton, 1946.
112