ABSTRACT

Title of dissertation: HIGHER ORDER ASYMPTOTICS FOR THE CENTRAL LIMIT THEOREM AND LARGE DEVIATION PRINCIPLES

Buddhima Kasun Fernando Akurugodage, Doctor of Philosophy, 2018

Dissertation directed by: Professor Dmitry Dolgopyat, Department of Mathematics

First, we present results that extend the classical theory of Edgeworth expansions to independent identically distributed non-lattice discrete random variables. We consider sums of independent identically distributed random variables whose distributions have d + 1 atoms and show that such distributions never admit an Edgeworth expansion of order d, but that for almost all parameters the Edgeworth expansion of order d − 1 is valid, and the error of the order d − 1 Edgeworth expansion is typically O(n^{−d/2}), although the O(n^{−d/2}) terms have wild oscillations.

Next, going a step further, we introduce a general theory of Edgeworth expansions for weakly dependent random variables. This gives us higher order asymptotics for the Central Limit Theorem for strongly ergodic Markov chains and for piecewise expanding maps. In addition, alternative versions of asymptotic expansions are introduced in order to estimate errors when the classical expansions fail to hold. As applications, we obtain Local Limit Theorems and a Moderate Deviation Principle.

Finally, we introduce asymptotic expansions for large deviations. For sufficiently regular weakly dependent random variables, we obtain higher order asymptotics (similar to Edgeworth expansions) for Large Deviation Principles. In particular, we obtain asymptotic expansions for Cramér's classical Large Deviation Principle for independent identically distributed random variables, and for the Large Deviation Principle for strongly ergodic Markov chains.

HIGHER ORDER ASYMPTOTICS FOR THE CENTRAL LIMIT THEOREM AND LARGE DEVIATION PRINCIPLES

by Buddhima Kasun Fernando Akurugodage

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2018

Advisory Committee:
Prof. Dmitry Dolgopyat, Chair/Adviser
Prof. Leonid Koralov
Prof. Mark Freidlin
Prof. Rodrigo Treviño
Prof. Edward Ott, Dean's Representative

© Copyright by Buddhima Kasun Fernando Akurugodage, 2018

Dedication

To the memory of my uncle, Ivan Fernando, who urged me to seek truth.

Acknowledgments

I am forever grateful to Dima, for taking me under his wing and showing me how to find my way in the mathematical landscape, for the insightful conversations, and for his patience and thoughtfulness. His mathematical brilliance and humility are things that I will always look up to. I am also thankful to my collaborators, Carlangelo and Pratima, for keeping me on the right track, and to Lashi, Sam, Alex and Peter for being positive role models. I am grateful to all my friends at UMD: Hamid, Danul, Micah, Steven, Pratima, Jerry, Jenny, Hsin-Yi, Shujie, David, Corry, Jing, Phil, Minsung, Patrick, JP, Ryan, Jacky and Nick, for being great companions. I extend my gratitude to Larry, without whom I would not even be in the program, to Vadim for initiating me into Dynamics and to Leonid for initiating me into Probability. I also appreciate the work of the awesome administrative staff at the UMD Mathematics Department, including Haydee, Celeste, Liliana, Sharon, Bill and Cristina. Thank you for turning paperwork into a minor problem and for all the reminders about deadlines.
Finally, I am grateful to my parents and Kasunka, for everything they have done. I would not have been able to get through my PhD if not for their encour- agement and support. iii Table of Contents Dedication ii Acknowledgements iii List of Abbreviations and Symbols vi 1 Introduction 1 2 Central Limit Theorem: Discrete Random Variables. 11 2.1 Overview and main results. . . . . . . . . . . . . . . . . . . . . . . . 11 2.2 Edgeworth Expansion under Diophantine conditions. . . . . . . . . . 19 2.3 Change of variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.4 Cut off. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.4.1 Density. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.4.2 Fourier transform. . . . . . . . . . . . . . . . . . . . . . . . . 23 2.5 Simplifying the error. . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.6 Expectation of characteristic function. . . . . . . . . . . . . . . . . . 34 2.7 Relation to homogeneous flows. . . . . . . . . . . . . . . . . . . . . . 38 2.8 Finite intervals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3 Central Limit Theorem: Weakly Dependent Random Variables. 44 3.1 Overview and main results. . . . . . . . . . . . . . . . . . . . . . . . 44 3.2 Proofs of the main results. . . . . . . . . . . . . . . . . . . . . . . . . 50 3.3 Computing coefficients. . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.4 Applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 3.4.1 Local Limit Theorem. . . . . . . . . . . . . . . . . . . . . . . 79 3.4.2 Moderate Deviations. . . . . . . . . . . . . . . . . . . . . . . . 83 3.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 3.5.1 Independent variables. . . . . . . . . . . . . . . . . . . . . . . 86 3.5.2 Finite state Markov chains. . . . . . . . . . . . . . . . . . . . 87 3.5.3 More general Markov chains. . . . . . . . . . . . . . . . . . . . 90 3.5.3.1 Chains with smooth transition density. . . . . . . . . 90 3.5.3.2 Chains without densities. . . . . . . . . . . . . . . . 95 3.5.4 One dimensional piecewise expanding maps. . . . . . . . . . . 97 3.5.5 Multidimensional expanding maps. . . . . . . . . . . . . . . . 102 iv 4 Large Deviation Principles. 104 4.1 Asymptotics for Cramér’s Theorem. . . . . . . . . . . . . . . . . . . . 104 4.1.1 Weak asymptotic expansions. . . . . . . . . . . . . . . . . . . 104 4.1.2 Strong asymptotic expansions. . . . . . . . . . . . . . . . . . . 109 4.2 Higher order asymptotics in the non–i.i.d. case. . . . . . . . . . . . . 113 4.3 An application to Markov Chains. . . . . . . . . . . . . . . . . . . . . 122 A Appendix 128 A.1 Convergence of X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 A.2 Hierarchy of Expansions. . . . . . . . . . . . . . . . . . . . . . . . . . 131 A.3 Construction of {fk}. . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Bibliography 137 v List of Abbreviations CLT Central Limit Theorem i.i.d. independent and identically distributed LCLT Local Central Limit Theorem LDP Large Deviation Principle LDCT Lebesgue Dominated Convergence Theorem LHS Left hand side LLT Local Limit Theorem PDE Partial differential equation RHS Right hand side WLOG Without loss of generality vi vii Chapter 1: Introduction The Central Limit Theorem (CLT) is one of the most fundamental concepts in probability which was introduced by the work of Laplace and Bernoulli. 
It describes the long term behaviour of random trials repeated under uniform conditions. Let S_N = \sum_{n=1}^{N} X_n be a sum of random variables. We say that S_N satisfies the CLT if there are real constants A and σ > 0 such that

\lim_{N\to\infty} P\left(\frac{S_N - NA}{\sqrt{N}} \le z\right) = N(z)    (1.1)

where N(z) = \int_{-\infty}^{z} n(y)\,dy and n(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-y^2/(2\sigma^2)}.

The usefulness of the CLT and related limit theorems depends on the rapid convergence of the distributions of normalized partial sums to the limiting distribution. This is because limit theorems are primarily used for approximating distributions of sums of a large but finite number of random variables. Therefore, an important problem is to estimate the rate of convergence in (1.1). In this regard, an asymptotic expansion as a series of increasing powers of n^{-1/2} (now commonly referred to as the Edgeworth expansion) was formally derived by Chebyshev in [8]. Kolmogorov and Gnedenko emphasize the importance of these expansions in their monograph [23] by stating that the Edgeworth expansion is "the most powerful and general method of finding such corrections."

Definition 1. S_N admits an Edgeworth expansion of order r if there are polynomials P_1(z), ..., P_r(z) such that

P\left(\frac{S_N - NA}{\sqrt{N}} \le z\right) = \underbrace{N(z) + \sum_{p=1}^{r} \frac{P_p(z)}{N^{p/2}}\, n(z)}_{E_{r,N}(z)} + o\left(N^{-r/2}\right)    (1.2)

uniformly for z ∈ R.

Remark 1.1. It is an easy observation that the Edgeworth expansion of S_N, if it exists, is unique. Suppose {P_p(z)}_p and {\tilde P_p(z)}_p, 1 ≤ p ≤ r, are polynomials corresponding to two Edgeworth expansions. Then

\sum_{p=1}^{r} \frac{P_p(z)}{N^{p/2}}\, n(z) = \sum_{p=1}^{r} \frac{\tilde P_p(z)}{N^{p/2}}\, n(z) + o\left(N^{-r/2}\right).

Multiplying by \sqrt{N} and taking the limit N → ∞ we obtain P_1(z) = \tilde P_1(z). Therefore,

\sum_{p=2}^{r} \frac{P_p(z)}{N^{p/2}}\, n(z) = \sum_{p=2}^{r} \frac{\tilde P_p(z)}{N^{p/2}}\, n(z) + o\left(N^{-r/2}\right).

Then, multiplying by N and taking N → ∞, P_2(z) = \tilde P_2(z). Continuing this r times we can conclude that P_p(z) = \tilde P_p(z) for 1 ≤ p ≤ r.

Here and in what follows, A is the asymptotic mean, i.e. A = \lim_{N\to\infty} E\left(\frac{S_N}{N}\right).

The work of Lyapunov, Edgeworth and Cramér focused on the problem of finding higher order asymptotics in the CLT. Their main focus was on independent and identically distributed (i.i.d.) sequences of random variables. In 1928, Cramér introduced a theory of Edgeworth expansions for a broad class of random variables. For the first rigorous derivation of this expansion see [10]. The monograph [11] by Cramér also gives a detailed account of his theory of Edgeworth expansions.

Theorem 1.1 (Cramér). Let X be a centred random variable with E(X^2) = σ^2 > 0 and r + 2 absolute moments. Let X_1, ..., X_N, ... be a sequence of i.i.d. copies of X. Assume further that

\limsup_{|t|\to\infty} |E(e^{itX})| < 1.    (1.3)

Then S_N satisfies (1.2).

Many refinements of this result appear in the later literature. A good introduction to this theory and later developments can be found in [3, 11, 20, 23]. In the i.i.d. case, the P_p's are polynomials such that the characteristic function φ(t) = E(e^{itX}) and the Fourier transform \hat E_{r,N} of E_{r,N} satisfy

φ^N\left(\frac{t}{σ\sqrt{N}}\right) - \hat E_{r,N}(t) = o\left(N^{-r/2}\right).

For example,

E_{1,n}(z) = N(z) + n(z)\,\frac{E(X^3)}{6σ^3\sqrt{n}}\,(1 - z^2)

and

E_{2,n}(z) = N(z) + n(z)\left[\frac{E(X^3)}{6\sqrt{n}\,σ^3}(1 - z^2) + \frac{E(X^4) - 3σ^4}{24\,n\,σ^4}(3z - z^3) - \frac{E(X^3)^2}{72\,n\,σ^6}(15z - 10z^3 + z^5)\right].

Since all distributions with an absolutely continuous component satisfy (1.3), this theorem covers a large class of random variables. However, (1.3) indicates that the common distribution of the X_n's is far from being discrete. In fact, (1.3) fails when the random variables are purely discrete. Surprisingly, not much had been explored in the case of discrete random variables, except in the lattice case.
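To make the role of these correction terms concrete, here is a minimal Monte Carlo sketch (not part of the original argument; the centred exponential distribution, the sample sizes and all variable names are illustrative choices, and numpy/scipy are assumed available). It compares an empirical estimate of the distribution function of the standardized sum with the plain normal approximation and with the order 1 correction corresponding to E_{1,n} above, written here for the sum standardized by σ so that the limiting law is the standard Gaussian.

```python
# A minimal Monte Carlo sketch (illustrative parameters only): for centred exponential
# summands, compare the empirical CDF of S_n/(sigma*sqrt(n)) with the normal
# approximation and with the order-1 Edgeworth correction
#   Phi(z) + phi(z) * E[X^3] * (1 - z^2) / (6 * sigma^3 * sqrt(n)),
# i.e. E_{1,n} written for the sum standardized by sigma.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, samples = 20, 200_000

# X = Exp(1) - 1: mean 0, sigma^2 = 1, E[X^3] = 2; X is non-lattice, so (1.3) holds.
sigma, m3 = 1.0, 2.0
X = rng.exponential(1.0, size=(samples, n)) - 1.0
W = X.sum(axis=1) / (sigma * np.sqrt(n))          # standardized partial sums

for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    empirical = np.mean(W <= z)                   # Monte Carlo estimate of P(W <= z)
    clt = norm.cdf(z)                             # plain CLT approximation
    edge1 = clt + norm.pdf(z) * m3 * (1 - z**2) / (6 * sigma**3 * np.sqrt(n))
    print(f"z={z:+.1f}  empirical={empirical:.4f}  CLT={clt:.4f}  order-1={edge1:.4f}")
```

For moderate n the order 1 term typically accounts for most of the gap between the empirical values and the plain normal approximation. For purely discrete summands, however, (1.3) fails and this picture breaks down; that is the situation studied in Chapter 2.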
The purpose of my first project [16], joint with Dmitry Dolgopyat, was to address this issue. A detailed discussion can be found in Chapter 2.

When the X_n's are i.i.d., it is known that the order 1 Edgeworth expansion exists if and only if the distribution is non-lattice (see [19]). Therefore, the following asymptotic expansion for the Local Central Limit Theorem (LCLT) for lattice random variables is also useful.

Definition 2. Suppose that the X_n's are integer valued. We say that S_N admits a lattice Edgeworth expansion of order r if there are polynomials P_{0,d}, ..., P_{r,d} and a number A such that

\sqrt{N}\, P(S_N = k) = \sum_{p=0}^{r} \frac{P_{p,d}\big((k - NA)/\sqrt{N}\big)}{N^{p/2}}\; n\!\left(\frac{k - NA}{\sqrt{N}}\right) + o\left(N^{-r/2}\right)

uniformly for k ∈ Z.

Remark 1.2. Here, the subscript d in P_{p,d} refers to the fact that the expansion is for discrete lattice-valued random variables. A priori, there is no reason for the polynomials P_p in Definition 1 to be related to the P_{p,d}. In Section 3.3, we show how these two families of polynomials are related. As in Remark 1.1, we can prove the uniqueness of this expansion. Because the P_{p,d}'s have finite degree, say at most q, choose N large enough so that S_N takes more than q values. Then the argument of Remark 1.1 applies.

During the 20th century, the work of Lyapunov, Edgeworth, Cramér, Kolmogorov, Esséen, Petrov, Bhattacharya and many others led to the development of the theory of these two asymptotic expansions. See [26, 31] and references therein for more details.

It is immediate that S_N admits an order r Edgeworth expansion if

\lim_{N\to\infty} N^{r/2}\left[ P\left(\frac{S_N - NA}{\sqrt{N}} \le z\right) - E_{r,N}(z) \right] = 0    (1.4)

uniformly in z. [3, 4] discuss weak Edgeworth expansions where the LHS of (1.4) is convolved with smooth compactly supported functions. These expansions yield the asymptotics of E(f(S_N)).

To introduce these expansions, suppose (F, ‖·‖) is a function space.

Definition 3. S_N admits a weak global Edgeworth expansion of order r for f ∈ F if there are polynomials P_{0,g}(z), ..., P_{r,g}(z) and A (which are independent of f) such that

E\big(f(S_N - NA)\big) = \sum_{p=0}^{r} \frac{1}{N^{p/2}} \int P_{p,g}(z)\, n(z)\, f\big(z\sqrt{N}\big)\, dz + \|f\| \cdot o\left(N^{-(r+1)/2}\right).

Definition 4. S_N admits a weak local Edgeworth expansion of order r for f ∈ F if there are polynomials P_{0,l}(z), ..., P_{r,l}(z) and A (which are independent of f) such that

\sqrt{N}\, E\big(f(S_N - NA)\big) = \frac{1}{2\pi} \sum_{p=0}^{\lfloor r/2 \rfloor} \frac{1}{N^{p}} \int P_{p,l}(z)\, f(z)\, dz + \|f\| \cdot o\left(N^{-r/2}\right).

We also introduce the following asymptotic expansion, which yields an averaged form of the error of approximation.

Definition 5. S_N admits an averaged Edgeworth expansion of order r if there are polynomials P_{1,a}(z), ..., P_{r,a}(z) and numbers k, m such that for f ∈ F we have

\int \left[ P\left(\frac{S_N - NA}{\sqrt{N}} \le z + \frac{y}{\sqrt{N}}\right) - N\left(z + \frac{y}{\sqrt{N}}\right) \right] f(y)\, dy = \sum_{p=1}^{r} \frac{1}{N^{p/2}} \int P_{p,a}\left(z + \frac{y}{\sqrt{N}}\right) n\left(z + \frac{y}{\sqrt{N}}\right) f(y)\, dy + \|f\| \cdot o\left(N^{-r/2}\right).

Remark 1.3. Here, the subscripts g, l, a refer to global, local and averaged respectively, and are used to distinguish the polynomials appearing in each definition. In Section 3.3, we show how these polynomials are related.

All of these weak forms of expansions are unique provided that F is dense in C_c^∞. If there are two different weak global expansions with polynomials {P_{p,g}} and {\tilde P_{p,g}}, the argument of Remark 1.1 yields

\int P_{p,g}(z)\, n(z)\, f\big(z\sqrt{N}\big)\, dz = \int \tilde P_{p,g}(z)\, n(z)\, f\big(z\sqrt{N}\big)\, dz

for all f ∈ C_c^∞, which gives us the equality P_{p,g}(z) = \tilde P_{p,g}(z). The same idea works for the other two expansions.

We have seen that these asymptotic expansions are unique. They also form a hierarchy. We discuss this in Appendix A.2.
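Before moving on, here is a minimal numerical illustration of Definition 2 in its simplest form (order 0, keeping only the Gaussian term). It is not part of the original text; the Bernoulli summands, the value of N and the chosen lattice points are illustrative, and the exact values of P(S_N = k) come from the binomial distribution.

```python
# A minimal sketch of the leading (order-0) term of Definition 2 for Bernoulli(p)
# summands, where P(S_N = k) is the exact binomial probability; p, N and the chosen
# lattice points k are illustrative.  Here A = p, sigma^2 = p(1-p), and n(.) is the
# N(0, sigma^2) density defined after (1.1).
import numpy as np
from scipy.stats import binom

p, N = 0.3, 400
A, sigma2 = p, p * (1 - p)

def n_density(y):
    # density of the normal law N(0, sigma^2)
    return np.exp(-y**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

for k in (100, 110, 120, 130, 140):
    z = (k - N * A) / np.sqrt(N)
    lhs = np.sqrt(N) * binom.pmf(k, N, p)   # left-hand side of Definition 2
    rhs = n_density(z)                      # leading Gaussian term of the expansion
    print(f"k={k}  sqrt(N)*P(S_N=k)={lhs:.6f}  n((k-NA)/sqrt(N))={rhs:.6f}  "
          f"diff={lhs - rhs:+.6f}")
```

The remaining discrepancy is roughly of relative size N^{-1/2}, which is what the polynomials P_{1,d}, P_{2,d}, ... in Definition 2 are designed to remove.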
Due to this hierarchy, in the absence of one, others can be useful in extracting information about the rate of convergence in (1.1). Previous results on existence of Edgeworth expansions, for example in [11, 20, 23], assume independence of random variables Xn. For many applications the independence assumption of random variables is too restrictive. Because of this reason, there have been attempts to develop a theory of Edgeworth expansions for weakly dependent random variables where weak dependence often refers to asymp- totic decorrelation. See [9, 22, 29, 40, 41] for such examples. Their focus is on the classical expansions introduced in Definition 1 and Definition 2. Except in [9], the sequences of random variables considered are uniformly er- godic Markov processes with strong recurrent properties or processes approximated by such Markov processes. In [9], the authors consider aperiodic subshifts of finite type endowed with a stationary equilibrium state and give explicit construction of the order 1 Edgeworth expansion. They also prove the existence of higher order classical Edgeworth expansions under a rapid decay assumption on the tail of the characteristic function. 6 The goal of [21], a joint work with Carlangelo Liverani, is to generalize these results and to provide sufficient conditions that guarantee the existence of Edgeworth expansions for weakly dependent random variables including observations arising from sufficiently chaotic dynamical systems, and strongly ergodic Markov chains. In fact, we introduce a widely applicable theory for both classical and weak forms of Edgeworth expansions and significantly improve existing results. This work is discussed in detail in Chapter 3. The CLT and related asymptotic expansions provide accurate descriptions only of typical events. For example, if Xn’s are centered i.i.d. random variables then for SN all a > 0, lim P(SN ≥ aN) = 0, due to the Law of Large Numbers i.e. → 0 N→∞ N in probability. Large Deviation Principles (LDPs) give better descriptions of these non–typical events by specifying the exponential rate at which their probabilities decay. Before we present results related to LDPs, we recall the following definitions, and facts whose proofs can be found in [17,30]. Given a function f : R→ (−∞,∞] with f 6= ∞, define its effective domain to be Df = {x ∈ R|f(x) < ∞} and its Legendre transform by f ∗(x) = sup [tx − f(t)]. Then, f ∗ is convex and lower t∈R semi-continuous. Therefore, Df∗ is an interval and f ∗ is continuous on Df∗ . In addition, suppose f is convex, lower semi-continuous with D̊f = (a, b) and f ∈ C2(a, b) with f ′′ > 0 on (a, b) (possibly a = −∞ or b = +∞). Then, D̊f∗ = (A,B) where A = lim f ′(t) and B = lim f ′(t), f ∗ is continuously differentiable t→a+ t→b− on (A,B) and (f ∗)′ = (f ′)−1. For any f satisfying the above properties, for any x ∈ D̊f∗ the supremum in the definition of f ∗(x) is achieved at the unique point 7 t ∈ D̊f which solves f ′(t) = x and hence, f ∗(x) = sup [tx− f(t)]. Also, f is called t∈D̊f steep if lim |f ′(t)| = lim |f ′(t)| =∞. t→a t→b The following classical result, due to Cramér, is one of the fundamental results in the theory of Large Deviations. Theorem 1.2 (Cramér). Let X be a real valued random variable with mean A and variance σ2 > 0. Suppose that the logarithmic moment generating function of X, logE(etX), is finite in a neighbourhood of 0. Let Xn be a sequence of i.i.d. copies of X. 
Then,

\lim_{N\to\infty} \frac{1}{N}\log P(S_N \ge Nz) = -I(z), \quad \text{if } z > A,

and

\lim_{N\to\infty} \frac{1}{N}\log P(S_N \le Nz) = -I(z), \quad \text{if } z < A,

where I is given by I(z) = \sup_{\lambda\in\mathbb{R}} \big[\lambda z - \log E(e^{\lambda X})\big] (the Legendre transform of the logarithmic moment generating function of X).

From the hypothesis it is immediate that I is convex and lower semi-continuous. Also, I'' > 0 on D̊_I = (inf(supp X), sup(supp X)); therefore I is strictly convex on D̊_I, I(z) = 0 ⟺ z = µ, and there is a unique λ* such that I(z) = λ* z − \log E(e^{λ* X}).

Cramér's LDP has an extension to the non-i.i.d. case. We refer the reader to [6, Chapter V.6] for a proof of the following result.

Theorem 1.3 (Local Gärtner–Ellis). Let X_n be a sequence of random variables, not necessarily i.i.d. Suppose there exists δ > 0 such that for λ ∈ (−δ, δ),

\lim_{N\to\infty} \frac{1}{N}\log E(e^{\lambda S_N}) = \Omega(\lambda)    (1.5)

where Ω is a strictly convex, continuously differentiable function with Ω'(0) = 0. Then, for all z ∈ (0, Ω(δ)/δ),

\lim_{N\to\infty} \frac{1}{N}\log P(S_N \ge Nz) = -I(z)    (1.6)

where I(z) = \sup_{\lambda\in(-\delta,\delta)} [z\lambda - \Omega(\lambda)].

Remark 1.4.

1. If the limit (1.5) exists for all λ ∈ R, then B = \lim_{\delta\to\infty} \Omega(\delta)/\delta exists and (1.6) holds for all z ∈ (0, B).

2. The function I appearing in the theorem is called the rate function because it gives the exponential rate of decay of the tail probabilities.

In an on-going joint work with Pratima Hebbar, we develop a theory of higher order asymptotics for LDPs, using the weak forms of Edgeworth expansions and extensions of the results in [27, Chapter VIII]. As in the CLT case, the higher order asymptotics are given as expansions.

Definition 6. Suppose S_N satisfies an LDP with rate function I. Then S_N admits a strong asymptotic expansion of order r for large deviations in the range (0, L) if there are functions C_p : (0, L) → R for 0 ≤ p < r/2 and A > 0 such that for each a ∈ (0, L),

P(S_N - AN \ge aN)\, e^{I(a)N} = \sum_{p=0}^{\lfloor r/2 \rfloor} \frac{C_p(a)}{N^{p+1/2}} + C_{r,a} \cdot o\left(\frac{1}{N^{(r+1)/2}}\right).

These expansions are in the spirit of the higher order expansions found in [1] for i.i.d. sequences of random variables. In [7], the authors refer to these expansions as strong large deviation results. [7, 32] establish the order 1 expansions under certain assumptions on the behaviour of the moment generating functions. These strengthen the results of [1], but only in the order 1 case. Here, we give an alternative way to establish the so-called strong large deviation results of all orders. We also manage to recover the results of [1] in the non-lattice setting. For applications of these results to statistics, see the examples listed in [1, 7, 32] and references therein.

We also introduce the following weak form of the expansion for LDPs. As in the CLT case, we define these expansions over a function space (F, ‖·‖).

Definition 7. Suppose S_N satisfies an LDP with rate function I. Then S_N admits a weak asymptotic expansion of order r for large deviations in the range (0, L) for f ∈ F if there are functions D_p : (0, L) → R for 0 ≤ p < r/2 and A > 0 such that for each a ∈ (0, L),

E\big(f(S_N - aN)\big)\, e^{I(a)N} = \sum_{p=0}^{\lfloor r/2 \rfloor} \frac{D_p(a)}{N^{p+1/2}} + C_{r,a} \cdot o\left(\frac{1}{N^{(r+1)/2}}\right).

In fact, our results show that for a sequence X_n of i.i.d. l-Diophantine random variables with all exponential moments, for every r, S_N admits weak asymptotic expansions of order r for large deviations on (0, ∞) for sufficiently regular f. This is a refinement of Cramér's LDP for a broad class of random variables. We also obtain similar results for certain classes of non-i.i.d. random variables. As an application, we obtain asymptotic expansions for the LDP in the case of Markov chains with smooth densities.
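Before describing the Markov chain application in more detail, here is a minimal sketch of the exponential rate in Theorem 1.2 in the simplest i.i.d. setting. It is illustrative only: Bernoulli summands are lattice, so the sketch checks nothing beyond the logarithmic rate, and the parameters and variable names are arbitrary choices.

```python
# A minimal sketch of the exponential rate in Cramer's theorem (Theorem 1.2) for
# Bernoulli(p) summands; p, z and the values of N are illustrative.  The rate function
# I(z) = sup_lambda [lambda*z - log E(exp(lambda*X))] is computed on a grid of lambda,
# and the tail probability P(S_N >= N*z) is evaluated exactly from the binomial tail.
import numpy as np
from scipy.stats import binom

p, z = 0.3, 0.45                 # z > A = p, so the upper-tail statement applies

def log_mgf(lam):
    # logarithmic moment generating function of X ~ Bernoulli(p)
    return np.log(1 - p + p * np.exp(lam))

lams = np.linspace(-20.0, 20.0, 40001)
I_z = np.max(lams * z - log_mgf(lams))         # numerical Legendre transform

for N in (50, 200, 800, 3200):
    tail = binom.sf(np.ceil(N * z) - 1, N, p)  # exact P(S_N >= N*z)
    print(f"N={N:5d}  -(1/N) log P(S_N >= Nz) = {-np.log(tail) / N:.5f}")

print(f"I(z) = {I_z:.5f}")
# For Bernoulli(p) the supremum is attained at lambda* = log(z(1-p)/((1-z)p)) and
# I(z) is the Kullback-Leibler divergence z*log(z/p) + (1-z)*log((1-z)/(1-p)).
```

The slow convergence of −(1/N) log P(S_N ≥ Nz) to I(z) visible here is one motivation for the sharper, non-logarithmic asymptotics of Definitions 6 and 7.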
In particular, let xn be a time homogeneous Markov chain on a compact connected manifoldM with a smooth transition density and h : M×M → R be smooth with non-degenerate critical points. Then Xn = h(xn, xn−1) admits asymptotic expansions for large deviations of all orders. These results are presented in Chapter 4. 10 Chapter 2: Central Limit Theorem: Discrete Random Variables. 2.1 Overview and main results. L∑et X be a random variable with zero mean and positive variance σ 2. Let n Sn = Xj where Xj are independent identically distributed and have the same n=1 distribution as X. Then, it is well-known that Sn satisfies the CLT with A = 0 and σ as in (1.1). In this chapter, we consider a case which is opposite to X having a density, namely we suppose that X has a discrete distribution with d+1 atoms where d ≥ 2. d = 2 is the simplest non-trivial case since distributions with two atoms are lattice and as a result they do not admit even the first order Edgeworth expansion. Thus, we suppose thatX takes values a1, . . . , ad+1 with probabilities p1, . . . , pd+1 respectively. Since X should have zero mean we suppose that our 2(d + 1)−tuple (a,p) belongs to the set Ω = {pi > 0, p1 + · · ·+ pd+1 = 1, p1a1 + · · ·+ pd+1ad+1 = 0}. It is easy to see that Sn never admits the order d Edgeworth expansion. Indeed ∑ n! P m(S ≤ z) = pm1 . . . p d+1a,p n . (2.1) ∑ ∑ m1! . . .m ! 1 d+1 mi≥ d+1 0, mi=n miai≤z 11 Applying the Local Central Limit Theorem to the time homogeneous Zd-random walk which jumps to ei from the origin 0 with probability pi for i = 1, . . . , d and stays at 0 with probability pd+1 we conclude that if ∑ ∑ √ miai = n aipi +O( n) then d/2 n! m1 mn p . . . p d+1 m 1 d+11! . . .md+1! is uniformly bounded from below. Accordingly Pa,p(Sn ≤ z) has jumps of order n−d/2. On the other hand Ed(z) is a smooth function of z. So, it cannot approximate both Pa,p(Sn ≤ z − 0) and Pa,p(Sn ≤ z + 0) at the points of jumps. Here we show that for typical (a,p) the order d Edgeworth expansion just barely fails. We present two results in this direction. For the first result let bj = aj − a1, for j = 2 . . . d+ 1. Set d(s) = max dist(bjs, 2πZ). j∈{2,...d+1} We say that a is β-Diophantine if there is a constant K such that for |s| > 1, K d(s) ≥ . |s|β It is easy to see that almost all a is β-Diophantine provided that β > (d− 1)−1 (see [36,47]). Theorem 2.1.1. If a is β-Diopha(ntine an)d − 12 R β < 1 (2.2) 2 12 then [ ( ) ] Sn lim nR P √ ≤ z − E →∞ a,p d−1 (z) = 0. n σ n Thus for almost every a the order (d − 1) Edgeworth expansion approximates the S√ndistribution of with error O(nε−d/2) for any ε. σ n Note that Theorem 2.1.1 applies for all βs, in particular for βs which are much larger than (d− 1)−1. However if β is large, then the statement of the theorem can be simplified. Namely, let r be the integer such that r < 2R ≤ r + 1. (Note that 1 (2.2) can be rewritten as 2R < + 1 so provided that 2R is suffciently close to β 1 + 1 we can take r = bβ−1( )c+ 1. Then,β ( ) S P √na,p ≤ z = E 1 d−1(z) + σ n (o n)R E 1= r(z) + o +O (Ed−1(z)− Er(z)) . nR r + 1 Since > R the first error term dominates the second and we obtain the 2 following result. Corollary 2.1.1. [ ( ) ] lim nR S P √na,p ≤ z − Er(z) = 0 n→∞ σ n 1 provided that a is β-Diophantine, r = 1 + bβ−1c, and r < 2R < + 1. β Theorem 2.1.1 shows that for almost every a and for r ∈ {1, . . . , d − 1}, the order r Edgeworth expansion is v(alid. 
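To see this dichotomy numerically before the precise statements, the following sketch (illustrative only and not part of the original text; the atoms, probabilities and evaluation point are arbitrary choices, with atom differences 1 and √2 so that a is Diophantine) evaluates F_n(z) = P_{a,p}(S_n/(σ√n) ≤ z) exactly from (2.1) and compares it with the order 1 and order 2 expansions for the standardized sum, i.e. the analogues of E_{1,n} and E_{2,n} from Chapter 1 after rescaling by σ. In line with Theorem 2.1.1, √n(F_n − E_1) remains small, while n(F_n − E_2) stays of order one and keeps fluctuating, which is the behaviour quantified by the results that follow.

```python
# A minimal exact computation for d = 2 (three atoms): the rescaled error of the
# order-1 expansion stays small, while the rescaled error of the order-2 (= order-d)
# expansion keeps fluctuating.  The atoms (differences 1 and sqrt(2)), the
# probabilities and the evaluation point z are illustrative; F_n is computed exactly
# from the multinomial formula (2.1).
import itertools
import math
import numpy as np
from scipy.stats import norm

b = np.array([0.0, 1.0, math.sqrt(2.0)])
p = np.array([0.3, 0.4, 0.3])
a = b - np.dot(p, b)                                  # centre the atoms: mean zero
sigma = np.sqrt(np.dot(p, a**2))
lam3 = np.dot(p, a**3) / sigma**3                     # standardized third cumulant
lam4 = (np.dot(p, a**4) - 3 * sigma**4) / sigma**4    # standardized fourth cumulant

def F_n(n, z):
    """Exact P(S_n <= z*sigma*sqrt(n)), enumerating multinomial outcomes as in (2.1)."""
    cutoff = z * sigma * np.sqrt(n)
    log_fact = [math.lgamma(k + 1) for k in range(n + 1)]
    total = 0.0
    for m1, m2 in itertools.product(range(n + 1), repeat=2):
        m3 = n - m1 - m2
        if m3 < 0:
            continue
        if m1 * a[0] + m2 * a[1] + m3 * a[2] <= cutoff:
            logw = (log_fact[n] - log_fact[m1] - log_fact[m2] - log_fact[m3]
                    + m1 * math.log(p[0]) + m2 * math.log(p[1]) + m3 * math.log(p[2]))
            total += math.exp(logw)
    return total

def edgeworth(n, z, order):
    """Order 1 or 2 expansion for the sum standardized by sigma*sqrt(n)."""
    e = norm.cdf(z) - norm.pdf(z) * lam3 * (z**2 - 1) / (6 * math.sqrt(n))
    if order == 2:
        e -= norm.pdf(z) * (lam4 * (z**3 - 3 * z) / (24 * n)
                            + lam3**2 * (z**5 - 10 * z**3 + 15 * z) / (72 * n))
    return e

z = 0.5
for n in range(60, 121, 10):
    fn = F_n(n, z)
    err1 = math.sqrt(n) * (fn - edgeworth(n, z, 1))   # small, as in Theorem 2.1.1
    err2 = n * (fn - edgeworth(n, z, 2))              # order one and fluctuating
    print(f"n={n:4d}  sqrt(n)*(F_n - E_1)={err1:+.4f}   n*(F_n - E_2)={err2:+.4f}")
```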
Result)s that follow show that, S P na,p √ ≤ z − Ed(z) (2.3) σ n is typically of order O(n−d/2) but the O(n−d/2) term has wild oscillations. To for- mulate this result precisely we suppose that our 2(d+ 1)-tuple is chosen at random 13 according to an absolutely continuous distribution P on Ω. Thus (2.3) becomes a random variable. Theorem 2.1.2. There exists a smooth function Λ(a,p) such that for each z the random variable [ ( )] d/2 z2/2 n E − Se d(z) Pa,p √n ≤ z Λ(a,p) σ n converges in law to a non-trivial random variable X . More precisely we have, |ad+1 − a1| Λ(a,p) = d+ 1 √ (2.4) 2dπ 2 det(Da,p) σ(a,p) where Da,p is a (d − 1) × (d − 1) matrix defined by equations (2.37)–(2.38) of Section 2.5, σ(a,p) denotes the standard deviation of the distribution of the random variable taking value aj with probability pj and X is defined as follows. Let M be the space of pairs (L, χ) where L is a unimodular lattice in Rd and χ is a homeomorphism χ : L → T. In the formulas below, we identify T with segment [0, 1) equipped with addition modulo one. Given a vector w ∈ Rd we denote by y(w) its first coordinate and by x(w) its last d− 1 coordinates. Lemma 2.1.2. For almost every pair (L, χ) ∈M with respect to the Haar measure the following limit exists ∑ X L sin(2πχ(w)) −||x(w)||2( , χ) = lim e . (2.5) R→∞ y(w) w∈L\{0}, ||w||≤R In order to simplify the notation we will abbreviate expressions such as (2.5) by ∑ X L sin(2πχ(w)) 2( , χ) = e−||x(w)|| . (2.6) y(w) w∈L\{0} 14 The Haar measure on M can be defined in two equivalent ways. First, note that χ is of the form χ(w) = eiχ̃(w) for some linear functional χ̃ ∈ (Rd)∗. SLd(R) acts on Rd ⊕ (Rd)∗ by the formula, A(w, χ̃) = (Aw, χ̃A−1). Observe that if A(w, χ̃) = (ŵ, χ̂) then, χ̃(w) = ŵ(χ̂). (2.7) The above action of SLd(R) induces the following action of SL (R) n (Rd)∗d on M given by, (A, χ̃)(L, χ) = (AL, e2πitχ̃ · (χ ◦ A−1)). This action is transitive because SLd(R) acts transitively on unimodular lattices and (Rd)∗ acts transitively on characters. This allows us to identify M with (SLd(R) n Rd)/(SLd(Z) n Zd) and so M inherits the Haar measure from SL (R) nRdd . The second way to define the Haar measure is to note that the space M of unimodular lattices is naturally identified with SLd(R)/SLd(Z) and so it inherits the Haar measure from SLd(R). Next for a fixed L the set of homeomorphisms χ : L → T is a d dimensional torus so it comes with its own Haar measure. Now, if we want to compute the average of a function Φ(L, χ) with respect to the Haar measure then we can first compute its average Φ̄(L) in each fiber and then integrate the result with respect to the Haar measure on the space of lattices. In the proof of Lemma 2.1.2 given in Section A.1 the averaging inside a fiber will be denoted by Eχ and the averaging with respect to the Haar measure on the space of lattices will be denoted by EL. 15 If we assume that the pair (L, χ) is distributed according to the Haar measure on M then X , defined in Lemma 2.1.2, becomes a random variable. This is the variable mentioned in Theorem 2.1.2. Note that the distribution of X depends neither on P nor on z. Using the second representation of the Haar measure we can also describe X as follows. Let w1, . . . ,wd be the shortest spanning set of L. That is w1 is the shortest non zero vector in L and, for j > 1, wj is the shortest vector which is linearly independent of w1, . . . ,wj−1. Given m = (m1, . . . ,md) ∈ Zd let (y,x)(m), y ∈ R and x ∈ Rd−1, denote the point m1w1 + · · ·+mdwd ∈ L. (2.8) Let θj = χ(wj). 
Then θj are uniformly distributed on T and independent of each other. Set θ(m) = m1θ1 + · · · + mdθd. Now X (see definition in Lemma 2.1.2) can be rewritten as ∑ X sin(2πθ(m))= e−||x(m)||2 (2.9) y(m) m∈Zd\{0} where L is uniformly distributed on the space of lattices, (y,x)(m) is defined by (2.8), and (θ1, . . . θd) is uniformly distributed on Td and independent of L. Theorems 2.1.1 and 2.1.2 have analogues when we consider probabilities that Sn belongs to finite intervals. In particular, our results have applications to the Local Limit Theorem. Theorem 2.1.3. Let z1(n) and z2(n) be two uniformly bounded sequences such that |z (n)− z (n)|nd/21 (2 [ →∞. Th(en, the ran)d]om vec[tor, ( )]) nd/2 z2 Sne 1/2 E (z )− P √ ≤ z , ez2 Sn2/2d 1 a,p 1 Ed(z2)− Pa,p √ ≤ z2 (2.10) Λ(a,p) σ n σ n 16 converges in law to a random vector (X (L, χ1),X (L, χ2)) where X (L, χ) is defined by (2.6) and the triple (L, χ1, χ2) is uniformly distributed on (SLd(R)/SLd(Z)) × Td × Td. Here and below the uniform distribution of (L, χ1, χ2) means that L is uni- formly distributed on the space of lattices and for a given lattice, χ1 and χ2 are chosen independently and uniformly from the space of characters. Theorem 2.1.4. Let z1(n) < z2(n) be two uniformly bounded sequences such that ln = z2(n)− z1(n)→ 0. (a) If l ≥ Cnε−d/2n for some ε > 0 then Pa,p(z < S√n1 < zσ n 2) → 1 almost surely. lnn(z1) (b) If l nd/2n →∞ then P Sna,p(z1 < √ < zσ n 2) ⇒ 1 lnn(z1) (here and below “⇒” denotes the convergence in law). c|ad+1 − a1| (c) If ln = then σ(a,p)nd/2√ [ ]P (z < S√na,p 1 < z2) 2d− 3 d σ n 2π det(Da,p) − 1 ⇒ Y lnn(z1) where ∑ Y L sin(2π[χ(w)− cy(w)])− sin(2πχ(w))( , χ, c) = e−||x(w)||2 y(w) w∈L\{0} and L, χ are as in Theorem 2.1.2 and Da,p given by equations (2.37)–(2.38). The intuition behind this result is the following. Call yn δ-plausible if P(Sn = yn) ≥ δn−d/2. The discussion following (2.1) shows that for each δ there are about 17 C(δ)nd/2 δ-plausible values. Therefore, if ln  n−d/2 then the interval [z1(n), z2(n)] would typically contain no plausible values. Hence, we should not expect the LLT to hold on that scale. Theorem 2.1.4 shows that as soon as interval [z1(n), z2(n)] contains many plausible values then the LLT typically holds for this interval. Recall that, ∑ n! P ma,p(Sn ∈ [z1, z2]) = pm1∑ 1 . . . p d+1 d+1 . ∑ m1! . . .m≥ d+1!mi 0, mi=n z1≤ miai≤z2 ∑Thus, in Theorem 2.1.4 we just count the number of visits of a random linear form miai to a finite interval with weights given by multinomial coefficients. It is also interesting to consider counting with equal weight. In this case the analogue of Theorem 2.1.4(c) is obtained in [38] while for longer intervals only partial results are available, for example see [15,34]. The chapter is organized as follows. Theorem 2.1.1 is proven in Section 2.2. The proof is a minor modification of the arguments of [20, Chapter XVI]. The bulk of the chapter is devoted to the proof of Theorem 2.1.2. In Section 2.3 we provide an equivalent formula for X . This formula looks more complicated than (2.6) but it is easier to identify with the limit of the error term. Section 2.4 contains preliminary reductions. We show that the density ρ on Ω could be assumed smooth and the integration in the Fourier inversion formula could be restricted to a finite domain. In Section 2.5, we show that main contribution to the error term comes from resonances where characteristic function of Sn is close to 1 in absolute value. 
The proof relies on several technical estimates which are established in Section 2.6. In Section 2.7, we use dynamics on homogenuous spaces in order to show that the contribution of 18 resonances converges to (2.6) completing the proof of Theorem 2.1.2. The proofs of Theorems 2.1.3 and 2.1.4 are similar to the proof of Theorem 2.1.2. The necessary modifications are explained in Section 2.8. We postpone the proof of Lemma 2.1.2 till Appendix A.1. 2.2 Edgeworth Expansion under Diophantine conditions. Theorem 2.1.1 is a consequence of Theorem 2.2.1 below and the fact that in our case there is a positive constant c such that |φ(s)| ≤ 1− cd(s)2. (2.11) (2.11) follows from inequality (2.35) proven in Section 2.5. Theorem 2.2.1. If the distribution of X has d+ 2 moments and its characteristic function satisfies | Kφ(s)| ≤ 1− (2.12) |s|γ d and R < is such that 2 ( ) R− 1 γ < 1 (2.13) 2 then [ ( ) ] R SP √nlim n ≤ z − E →∞ d−1 (z) = 0. n σ n Theorem 2.2.1 follows easily from the estimates in [20, ChapterXVI] but we provide the proof here for completeness. Proof. Denoting ( ) Sn ∆̄n(a,p) = Pa,p √ ≤ z − Ed−1(z) σ n 19 we get by [20, Chapter XVI] that for∣each T∫ ∣ 1 √ T ∣ √ σ n | ∣∆̄n(a,p)| ≤ ∣ π − √T ∣φ n(s)− Êd−1(sσ n) ∣∣∣∣ Cds+ . (2.14)s T σ n C C ε Choose T = BnR with B = . Then, = . Take a small δ and split the ε T nR integral ∣in the RHS of (2.14) ∣into two parts.∫ ∣∣∣ √ ∣ √ ∣1 δ ∣φn ∫ (s)− Êd−1(sσ n) ∣∣∣∣ 1 ∣∣φn(s)− Ê ∣d−1(sσ n) ∣ds + ∣∣ ∣∣ ds.π −δ s π δ<|s|δ s |s| < BnR−1/2∣ /σ}∣. Thus, we only need to approximate,∫ ∣∣∣ n ∣∣∣ ∫ ∫φ (s) ≤ 1 | n | ≤ C ( )1ds φ (s) ds exp −c̄ n1−(R− γ2) ds (2.16) J s δ J δ J where the last inequality is due to (2.12). By (2.13) the integral decay faster than d any power of n. Because R < the contribution of |s| ≤ δ is also under control. 2 2.3 Change of variables. Here we deduce Theorem 2.1.2 from: Theorem 2.1.2*. For each z[the random v(ariable )] nd/2 Ed(z)− S P na,p √ ≤ z σ n converges in law to X̂ where ∑ X̂ L −z2/2 |ad+1 −√a1| sin 2πχ(w)(a, p, , χ) = e e−4π2x(w)TDa,px(w) (2.17) 2σ(a, p) π3 y(w) w∈L\{0} 20 a = (a1, . . . , ad+1), p = (p1, . . . , pd+1) and (a, p) ∈ Ω are distributed according to P and Da,p and σ(a, p) are defined immediately after (2.4). In order to deduce Theorem 2.1.2 from Theorem 2.1.2* we need to show that z2/2 X̂e has the same distribution as X . To this end we rewrite the sum in (2.17) Λ(a, p) as 1 √ ∑ sin(2πχ(w))√ √e−4π2||( Da,px(w))||2 . (2.18) (2π)d−1 det( Da,p) d−1w∈L\{ } y(w)/((2π) det( D0 a,p)) Let A be the linear map suc(h that √ )y √A(y,x) = , 2π Da,p x . (2π)d−1 det(Da,p) Put (L̄, χ̄) = A(L, χ). Then, using (2.7), (2.18) can be rewritten as: 1 √ ∑ sin(2πχ̄(w̄))e−||x(w̄))||2 . (2π)d−1 det( Da,p) y(w̄)w̄∈L\{0} Since det(A) = 1, the pair (L̄, χ̄) is distributed according to the Haar measure on M proving our formula for X . Sections 2.4–2.7 are devoted to the proof of Theorem 2.1.2*. Note that simi- larly to (2.9) we have X̂ −z2/2 |ad+1 −√a1| ∑ sin 2πθ(m) = e e−4π 2x(m)TDa,px(m). 2σ(a, p) π3 y(m) m∈Zd\{0} The statements of Theorems 2.1.2 and 2.1.2* look similar, however, there is an important distinction. Namely the proof of Theorem 2.1.2* is constructive. In the course of the proof given n, a and z we construct a lattice L(a, n) and a character χ(a,p, n, z) such that the expression n−d/2X̂ (a,p,L(a, n), χ(a,p, n, z)) well-approximates the error in the Edgeworth expansion. 
We believe that such a 21 construction could be made for more general distributions where the Edgeworth ex- pansion fails, and this will be a subject of a future investigation. So the difference between Theorems 2.1.2 and 2.1.2* is that in the first case we have only an approx- imation in law while in the second case we are able to obtain an approximation in probability. 2.4 Cut off. 2.4.1 Density. Here we show that it is enough to prove Theorem 2.1.2* under the assumption that P has smooth density supported on a subset Ωκ = {(a,p) ∈ Ω : ∀i pi ≥ κ and ∀i 6= j |ai − aj| ≥ κ} for some κ > 0. Indeed suppose that the theorem is true for such densities. Let p(a,p) the original density of P. Let φ be a bounded continuous test function. Given ε we can find a smooth density p̃(a,p) supported on some Ωκ such that ||p− p̃||L1∫≤ ε. In Section 2.7 we∫p∫rove that φ(nd/2∆n)(p̃ da dp→) φ(X̂ (a,p,L,θ))p̃ da dp dµ(L,θ) (2.19) S where ∆n = Ed(z)−P √n ≤ z and µ is the Haar measure on (SLd(R)/SLd(Z))× σ n Td. Let pm(a,p) be the smooth density supported on Ωκ corresponding to ε = m−1. Passing to subsequence, pm → p almost surely. Because |pmφ| ≤ ‖φ‖|pm| ∈ L1 and |pφ| ≤ ‖φ‖|p| ∈ L1 and ‖φ‖|pm| → ‖φ‖|p| almost surely, Lebesgue Dominated Convergence Theorem gives 22 ∫∫ φ(X̂ (a,p,L,θ))pm da dp dµ(L,θ∫) ∫ → φ(X̂ (a,p,L,θ))p da dp dµ(L,θ). (2.20) Combining∫(2.19) and (2.20) we h∫ave that, φ(nd/2∆∫n∫)p da dp = φ(n d/2∆n)pm da dp +O(m−1‖φ‖) (2.21) −n−→−∞→ ∫∫ φ(X̂ (a,p,L,θ))pm da dp dµ(L,θ) +O(m −1‖φ‖) −m−→−∞→ φ(X̂ (a,p,L,θ))p da dp dµ(L,θ). 2.4.2 Fourier transform. As in the previous section let ( ) Sn ∆n = Ed(z)− Fn(z) where Fn(z) = Pa,p √( ≤ z .σ n ) 1 · 1− cosTxDenote by vT (x) = and let V(s, T ) = 1− |s| 1 2 |s|≤T be itsπ Tx T Fourier transform. Using the approach of [20, Section XVI.3] we let T2 = n 2d+6 and decompose ∆n = [Ed − Fn] ? vT2(z)− [Fn − Fn ? vT2 ] (z) + [Ed − Ed ? vT2 ] (z). (2.22) To estimate the last term we split ∫ [Ed − Ed ? vT2 ] (z) =∫ [Ed(z)− Ed(z − x)] vT2(x)dx (2.23)|x|≤1 + [Ed(z)− Ed(z − x)] vT2(x)dx. |x|≥1 Since vT is even∫the first integral in (2.2∫3) equals to E ′′E ′ (y(z, x))d(z)xvT2(x)dx− d x2vT2(x)dx |x|≤1 |x|≤1 2 23 ∫ E ′′ ( ) d (y(z, x)) 1− cosT2x= dx = O 1 . |x|≤1 2 πT2 T2 Since both Ed and cosine are bounded the second integral in (2.23) is bounded by ∫ dx C C = . ( 2|x|≥1)T2x T2 Thus the last term in (2.22) is O T−12 . To estima√te the second term in√(2.22) we split the integral in Fn ?√vT2 into regions {|x| ≥ 1/ T2} and {|x| ≤ 1/ T2}. The contribution of {|x| ≥ 1/ T2} is bounded by ∫ ∞ dx C √ = √ C . 1/ T T x 2 T 2 2 2 On the other hand ∫ √ [Fn(z)− Fn(z − x)]VT2(x)dx = 0 |x|≤1/ T2 [ √ √ ] unless there is a point of increase of Fn inside z − 1/ T2, z + 1/ T2 . The prob- ability that such a point exists is bounded by ∑ ( [ √ √ ]) P m1a1 + · · ·+md+1ad+1 ∈ z − 1/ T2, z + 1/ T2 . (2.24) m1+···+md+1=n Note that for each fixed (m1, . . . ,md+1) the random variable m1a1 + · · ·+md+1ad+1 has a bou(n√ded density with r)espect to the uniform distribution on the segment of length O m21 + · · ·+m2d+1 and so ( ) |J | P(m.a ∈ J) = O ‖m‖ 24 ( ) for(any in)terval J . Hence each term in( 1(2.2)4) is O √ and so the(sum isn T2d ) O √n 1 −1/2. Thus with probability 1−O we have that ∆n = ∆n,2+O T n T2 n4 2 where ∫ [ ( ) ]T φn1 2 √t − Ên d(t) ∆n,2 = ∫ V(t, T )e −itz 2 dt 2π −T it2 T√2 √1 σ n √ φn−iszσ n (s)− Êd(sσ n)= e V(s, n, T2)ds , ∣ 2π ∣ − T√2 is∣ σ n√ ∣ V sσ n(s, n, T ) = 1− ∣∣ ∣∣ and φ(s) is the characteristic function of X given byT φ(s) = p1e isa1 + · · ·+ p eisad+1d+1 . 
Let T1 = K1n d/2 and define ∫ T√1 √1 σ n √ n−iszσ n φ (s)− Êd(sσ n)∆n,1 = e V(s, n, T2) ds. 2π − T√1 is σ n Let Γn = ∆n,2 −∆n,1. Put ∫ 1 √ n Γ̃ = e−iszσ n φ (s) n √ √ V(s, n, T2) ds.2π |s|∈[T1/((σ n),T2)/(σ n)] is 2 Then, we have Γn = Γ̃n +O e−εT1 due to the exponential decay of Êd. The main result of Subsection 2.4.2 is the following. Proposition 2.4.1. ∥∥∥ ∥∥Γ̃n∥ ≤ √C . (2.25) L2 T nd1 Proof. ∫∫ ( √ ) E(Γ̃2 ) = E e−i(s1+s2)zσ n n n V V ds1 ds2n φ (s1)φ (s2) (s1, n, T2) (s2, n, T2) .s1 s2 We split this integral into two parts. 25 (1) In the region where |s1 + s2| ≤ 1 we use Corollary 2.5.2 proven in Section 2.5 to estimate the(in∫tegral by ) O 1√ √ √ E (|φ n(s1)|) ds1 . (2.26)2 |s|∈[T1/(σ n),T2/(σ n)] ns1 The next result will be proven in Section 2.6. Lemma 2.4.2. C E (|φn(s1)|) ≤ . nd/2 Plugging the estimate of Lemma 2.4.2 into ((2.26) an)d integrating we see that 1 the contribution of the first region to E(Γ̃2n) is O .T d/21n (2) Consider now the region where |s1 + s2| ≥ 1. Denote bd+1 = ad+1 − a1, . . . , b2 = a2 − a1. Then φ(s) = eisa1ψ(s) where ψ(s) = p + p eisb21 2 + · · ·+ p eisbd+1d+1 . Denote ν = (p1, . . . , pd, b2, . . . , bd). Then there exists a compactly supported density ρ = ρ(a1,ν) such that the contribution of the second region is ∫∫ (∫ ) √ −i(s +s )zσ n in(s +s )a n n V V ds1 2 1 ds2e e 1 2 1ψ (s1)ψ (s2) (s1) (s2)ρ da1 dν . |s s s1+s2|≥1 1 2 We are able to use a 2d-dimensional coordinate system because on Ω p1 + · · ·+ pd+1 = 1, and p1a1 + · · ·+ pd+1ad+1 = 0. (2.27) To estimate this integral we integ[rate by p]arts with respect to a1. We use thatk eisna 1 d 1da1 = de isna1 isn da1 26 for some large k (for(exam)ple we(can take k = 2d + 1)). The integration by partskd √ amoun{ts to applying to e iszσ n }{ ρ[ψ(s1})ψ{(s n 2)] which leads to the terms( )k [ da1 ( ) ( ) }d 1 √ ] kd 2 ki(s +s )zσ n d 3e 1 2 [ρ] [[ψ(s1)ψ(s n2)] ] da1 da1 da1 where k1 + k2 + k3 = k. (Note that both σ and ψ depend on a1 implicitly due to the second equation in (2.27)). Thus, the contribution of the above term to the integral is bounded ∫by∫ (s + s k1 (k1/2)+k31 2) n ds1 ds2 C √ √ E (|φn(s1)|) . |s1|,|s2|∈[T1/σ n,T2/σ n] (s + s )k nk s1 s2 |s1+s 1 2 2|≥1 Using Lemma 2.4.2 again we can estimate the above integral by C if k ≥ k − 2nk/2 1  C − − otherwise.T k+d/2 k1/2 k31n Thus the main contribution comes from k1 = k2 = 0, k3 = k proving Proposition 2.4.1. Proposition 2.4.1 shows that the contribution from Γ̃n to the L 2-limit of nd/2∆n √ can be made arbitrarily small by choosing K1 large. Also, on |s| ≤ T1/σ n we have ( √ ) V sσ n sσ(s, n, T2) = 1− 1 √ T |s|j,j=6 1 ∑d + 2 pjp1 cos(bjξk + ηj,k). (2.35) j=2 Therefore ∑ r2k = 1− plpj[(bl − bj)ξk + η − η ]2l,k j,k − pd+1p1b2 2d+1ξk l>j,j=6 1 30 ∑ ( ∑ )d d − pjp1(bjξk + ηj,k)2 +O ξ3k + η3l,k . j=2 l=1 Taking η1,k = b1 = 0 we can write the above as, ∑ ∑ r2 = −ξ2 p p (b − b )2k k l j l j − 2ξk plpj(bl − bj)(ηl,k − ηj,k) l>j l>j ∑ (l,j)6=(d,1) ( ∑ )d + 1− plpj(bl − bj)(η 2 3 3l,k − ηj,k) +O ξk + ηl,k . l>j l=1 (l,j)6=(d,1) Since we have r2k approximated by a quadratic polynomial of ξk (the unknown) we can approximate ξk by∑determining the maximizer of r2k(ξk), obtaining l>j∑ plpj(bl − bj)(ηl,k − ηj,k) ( )ξk = − (l,j)=6 (d,1) +O ‖η 2k| . (2.36)2 l>j plpj(bl − bj) Substituting back we find rk in terms of ηj,k only. 
Ignoring higher order terms we compute the maximum to be: ∑ r2k = 1− plpj(bl − bj)(ηl,k − η 2j,k) l>j (l,j)=6 (d,1) [∑ ]2 p p (b − b )(η − η ) ( )l>j l j l j l,k j,k ∑d (l,j)6=(d,1) + ∑ +O η3 l>j plpj(bl − b )2 l,k [∑ ] j l=1 −1 Put R = p 2lpj(bl − bj) . Then, l>j ∑ r2 2k = 1 + plpj(bl − bj) [plpj(bl − bj)R− 1] (ηl,k − ηj,k) l>j ∑ (l,j)6=(d,1) (∑ ) + plpjpmpn(bl− bj)(b 3m− bn)R(ηl,k − ηj,k)(ηm,k − ηn,k) +O ηl,j l>j,m>n l>j l 6=m,j 6=n (l,j),(m,n)6=(d,1) 31 ∑ (d ∑ ) := 1− 2 Dl,j(a,p)ηl,kηj,k +O η3l,j . (2.37) l,j=2 l>j Thus, ∑ ( )d ∑ rk = 1− Dl,j(a,p)ηl,kηj,k +O η3 = 1− ηT 3l,j kDa,pηk +O(‖ηk‖ ) l,j=2 l>j where Da,p is a (d− 1)× (d− 1) matrix with [Da,p]i,j = Di,j(a,p) (2.38) and ηTk = (η2,k, . . . , ηd,k). From this we have, −z2e /2I √ (1− η T kDa,pηk +O(‖η ‖3))n √= k einφk−iskzσ nk (1 + o(1)). i πnσ sk Let B T1(a,p) be the contribution of the boundary terms ± √ ∈ Ik. σ n Lemma 2.5.3. E(|B| ≤ C) . n(2d−1)/2 Lemma 2.5.4. Let Ik,l = Ik1|k|αn1/4‖η ‖∈[2l,2l+1k ]. with α = [2(d− 1)]−1. Then there is a constant c̃ such that E ∑ ∑ ( )|Ik,l| O 1= 2K exp(−c̃22K) . nd/2 0<|k|K Lemmas 2.5.3 and 2.5.4 will be proven in Section 2.6. Next we prove a lemma that would allow us to further simplify ∆̂n. Lemma 2.5.5. (a) sk = sk + ω Tηk +O(‖η‖2k) where ω = ω(a,p) is a 1× (d− 1) vector. ( ) ‖ ‖ O l√nn(b) If η = then nφk = nska1 + np2η2,k + · · ·+ npdηd,k + o(1). n 32 Proof. Since sk − sk = ζk part (a) follows by (2.36). Next, by (2.34) ( ) O 3 ln 3/2 n φk = arg φ(sk) + δ + n3/2 Note that, φ(s ) = eiska1(p + p eiη2,k + · · ·+ p eiηd,kk 1 2 d + pd+1). Thus, ( ) −1 p2 sin η2,k + · · ·+ pd sin ηd,karg(φ(sk)) = ska1 + tan ∑ p1 + p2 cos η2,k + · · ·+ pd cos ηd,k + pd+1d = ska1 + plηl,k +O(‖η 3k‖ ) l=2 since the denominator in the first line is 1+O(||η||2). Now part (b) follows easily. Now, we continue the analysis of the leading term in ∆̂n. Pick a small δ and define A1 = {(a,p)| Ik,l = 0 ∀k, l s.t. |k| < δn(d−1)/2 and l < K}. Then Ac = {(a,p)| ∃|k| < δn(d−1)/2, |k|αn1/41 ‖ηk‖ ≤ 2K}. Thus, ∑ c C2 K √ P(A K1) = = O( δ2 )|k|(d−1)αn(d−1)/4 |k|<δn(d−1)/2 1 √ if α = . Hence, for a very large K and δ such that δ2K is very small, we 2(d− 1) |k| can approximate ∆n by the sum of Ik’s with δ ≤ ≤ K and |k|αn1/4‖η ‖ ≤ n(d−1)/2 k 2K . 33 √ k We define the random vector Xk = nηk and Yk = − . Then, combiningn(d 1)/2 terms corresponding to k and −k, we obtain the following approximation to the distribution of ∆n for large n √ | | −z2b e /2 ∑d+1 √ sin(nφk − skzσ n)e−XTk Da,pXk nd/2σ π3 Y∈ kk S(n,δ,K) where S(n, δ,K) = {k > 0|δ < Yk < K, |Yk|α‖Xk‖ < 2K}. Define q = (p2, . . . , pd). Then, Lemma 2.5.5 shows that √ √ √ nφk − s T Tkzσ n = sk(na1 − zσ n) + nq ηk − zσ nω ηk + o(1) 2πnd/2 √ √ = ( na − zσ)Y + ( nq− zσω)T1 k Xk + o(1).|bd+1| √ Therefore, for large n and K and δ such that δ2K is very small, the distribution of ∆n is well approximated by ∑ (2πnd/2 √ √ )|b |e−z2/2 sin | | ( na1 − zσ)Yk + ( nq− zσω)TXkd+1 bd+1 T ∆̃n(δ,K) = √ e−Xk Da,pXk . nd/2σ π3 Yk k∈S(n,δ,K) 2.6 Expectation of characteristic function. Proof of Lemma 2.4.2. Recall that d(s) = max d(bjs, 0) where the distance is 2≤j≤d+1 computed on the torus R/(2πZ). Formula (2.35) shows that there are positive con- stants C, c such that 1 ≤ |φ n(s)| C e−cnd(s)2 < C. (2.39) 34 ( 2) √ To prove the lemma we decompose E e−cnd(s) into the pieces where d(s) n is of order 2l for some l ≤ (log2 n)/2. 
and use the fact that ∂ has a bounded density.( ) (lo∑g2 n)/2 ( √ ) E (φn(s)) ≤ 1CP d(s) < √ + C P d(s) n ∈ [2l, 2l+1] e−c4l n l=0 (lo∑g2 n)/2 ≤ C 4 l l C + C e−c4 ≤ nd/2 nd/2 nd/2 l=0 completing the proof. T√1Proof of Lemma 2.5.3. Let k be such that ∈ Ik. Then ∫ σ n√T1/σ n √ n I = e−iszσ nφ (s)k ds. [π(2k−1)/|b sd+1| ] Because T = K nd/2 ∈ π(2k − 1) Tand s , √1 we have s ≈ n(d−1)/21 1 . Thus|bd+1(| ∫ σ n√ )T1/σ n E(|I Ck|) ≤ E |φn(s)| ds . n(d−1)/2 π(2k−1)/|bd+1| We claim that for all fixed bd, ∫∫ e−cnd(s) 2 C ds db2 . . . dbd−1 ≤ . (2.40) nd/2 If this is true then using that ρ is a smooth compactly supported density of bd we have that, (∫ √ ) ∫∫ ∫ √T1/σ n T1/σ n E |φn(s)| ds = |φn∫∫ (s)| ds dbd dbd−1 . . . db2π(2k−1)/|bd+1| π∫(2k−1)/|bd+1|√T1/σ n ≤ C e−cnd(s)2∫ ∫∫ ρ(x) ds dx dbd−1 . . . db2π(2k−1)/|x| ≤ C e−cnd(s)2∫ ds(dbd−1 .). . db2 ρ(x) dx ≤ C ρ(x) dx = O 1 . nd/2 nd/2 35 Thus C E(|Ik|) ≤ . (2.41) n(2d−1)/2 − √T ∈ I |B| ≤ CSimilarly, if k, then(2.41) holds. Hence, E( ) − as required.σ n n(2d 1)/2 √ To prove (2.40) we decompose it into pieces where d(s) n is of order 2l. Taking µ to be the product measure ds dbd−1 . . . db2 from (2.39) we have ∫∫ 2 √ e−cnd(s) ds dbd−1 . . . db2 ≤ Cµ{(s, b2, . . . , bd−1)|d(s) < 1/ n} (lo∑g2 n)/2 √ + C µ{(s, b2, . . . , bd−1)|d(s) n ∈ [2l, 2l+1]}e−c4 l l=0 (lo∑g2 n)/2C 4l≤ + C e−c4l ≤ C nd/2 nd/2 nd/2 l=0 as required. Proof of Lemma 2.5.4. Because r = 1− ηTD η +O(‖η ‖3) and |k|αn1/4k k a,p k k ‖ηk‖ ∈ [2l, 2l+1] we can write 4l r = 1− c √ +O(n−3/4k ).|k|2α n Accordingly c22l √ − n rn ≤ Ce |k|2αk . Also l P(|k|αn1/4‖η‖ ∈ [2l C2, 2l+1]) ≤ √ . |k|n(d−1)/4 Hence, c22l √ 2l√ − n − c2 n C |k|2α l l |k|2α E(Ik,l) ≤ √ e √ 2 C2 e= . n|k| |k|n(d−1)/4 |k|3/2n(d+1)/4 36 Thus ∑ c22K√K − nC2 e |k|2α E(Ik,l) ≤ .|k|3/2n(d+1)/4 l>K Therefore we need to estimate ∑ c22K√− nC2Ke |k|2α = |k|3/2n(d+1)/4 0<|k| 0. On the set Qk = {φ(ν) > ε} we can write 1 ( ) g(a ,ν) da = d exp ia n(d+1)/21 1 φ(ν) . iφ(ν)n(d+1)/2 1 Integrating by parts on Qk (note that h has compact support) and using trivial bounds on Qck, w∣∣e∫can co(nclude that ) ∣ | ∣Jn,k| ≤ ∣∣ exp ia n(d+1)/21 φ(ν) ∣∣h′(a1,ν) da1∣+ CP({φ(ν) ≤ ε})iφ∫(ν)n(d+1)/2 ∣ ≤ 1 |h′(a1,ν)| da1 + CP({φ(ν) ≤ ε}) εn(d+1)/2 √ for small enough ε. But h′(a1,ν) = O(nd/2), hence the first term is O(1/ n). Therefore, first taking n → ∞ and then taking ε → 0 we have the required result. Proposition 2.7.1 implies that as n → ∞ the distribution of nd/2∆̃n(δ,K) converges to the distribution of ∑ −z2/2 |ad+1 −√a1| sin 2πθ(m)e e−4π2xTDa,px1 y(m) {δ<|y(m)| 0 such that t 7→ Lt is s times continuously differentiable for |t| ≤ δ. (A2) 1 is an isolated and simple eigenvalue of L0, all other eigenvalues of L0 have absolute value less than 1 and its essential spectrum is contained strictly inside the disk of radius 1 (spectral gap). (A3) For all t =6 0, sp(Lt) ⊂ {|z| < 1}. ∥ ∥ 1 (A4) There are positive real numbers K, r1, r2 and N0 such that ∥LN∥t ≤ forN r2 all t satisfying K ≤ |t| ≤ N r1 and N > N0. 44 Remark 3.1.1. 1. In practice we would check (A3) by showing that when t =6 0, the spectral radius of Lt is at most 1 and no eigenvalue of Lt is on the unit circle. Because the spectrum of a linear operator is a closed set this would imply that sp(Lt) is contained in a closed disk strictly inside the unit disk. (r −)/r 2. Suppose (A4) holds. Let N1 > N 1 1 0 be such that N1 > N0. Then, for all N > N1, dN(r1−)/r‖LN‖ ≤ ‖ L 1e /r N 1‖ ≤ ‖ LdN (r1−)/r1e /r1 1 t ( t ) ( t )‖N1 ≤ 1 for K ≤ |t| ≤ N r1− /r dN (r1−)/r1er2N 11 ≤ 1 N r2KN1 r1 −  where K = N /r1N1 . 
Therefore fixing N1 large enough we can maker1 r2KN1 as large as we want. Hence, given (A4), by slightly reducing r1, we may assume r2 is sufficiently large. 3. Suppose (A1), (A2) and (A3) are satisfied with s ≥ 3. Then, [24, Theorem 2.4] implies that there exists A ∈ R and σ2 ≥ 0 such that SN√−NA →−d N (0, σ2). (3.2) N Our interest is in SN that satisfies the CLT i.e. the case σ 2 > 0. Since in ap- plications we specify conditions which guarantee this, in the following theorems we always assume that σ2 > 0. This is essentially an extension of Nagaev-Guivarc'h method. Some of the spectral assumptions in the theorem can be found in the proofs of decay of corre- 45 lations and the CLT using transfer operators. For example, see [24, 29, 37]. The key novelty here is the condition (A4) which guarantees a sufficient control over the characteristic function for intermediate values of t. This is analogous to the condi- tion (1.3) in Theorem 1.1. In addition, parallels can be drawn between the moment condition in Theorem 1.1 with the condition s = r + 2. The proof of the result is based on classical perturbation theory in [33], applicable due to (A1), (A2) and (A3), which provides the actual expansion and control of the error near 0, the Berry- Esseen inequality (see (3.4) below) which reduces that error to a Fourier inversion integral over an interval of size O(nr/2) and the condition (A4). Now we are in a position to state our first result on the existence of the classical Edgeworth expansion for random variables satisfying (A1) through (A4) which we refer to as weakly dependent random variables. Theorem 3.1.1. Let r ∈ N with r ≥ 2. Suppose (A1) through (A4) hold with r − 1 s = r + 2 and r1 > . Then SN admits Edgeworth expansion of order r. 2 Next, we examine the error of the order 1 Edgeworth expansion in more detail. We first show that the order 1 expansion exists if (A1) through (A3) hold with s = 3. Then, we show that the error of approximation can be improved if (A4) holds. Theorem 3.1.2. Suppose (A1) through (A3) hold with s ≥ 3. Then, the order 1 Edgeworth expansion exists. Theorem 3.1.3. Suppose (A1) through (A4) hold with s ≥ 4. Then, ( ) ( ) SN√−NA ≤ P1(z) 1P z = N(z) + n(z) +O N N1/2 N q{ 1 } where q = min 1, + r1 . 2 46 1 As one would expect, more precise asymptotics than the usual o(N− 2 ) are available when the characteristic function has better decay. The proof shows that the error depends mostly on the expansion of the characteristic function at 0. This is an indication that the error in Theorem 3.1.2 cannot be improved more than by √1a factor of even when r1 is large. N In [9], analogous results are obtained for subshifts of finite type in the sta- tionary case and an explicit description of the first order Edgeworth expansion is given. Here, we consider a wider class of (not necessarily stationary) sequences and give explicit descriptions of higher order Edgeworth polynomials by relating the coefficients to asymptotic moments. Also, we improve the condition ( )n | itS | ≤ − c α(r − 1)Hr : E(e N ) K 1 , < 1, |t| > K|t|α 2 found in [9] by replacing it with (A4). In addition, this allows us to obtain better asymptotics for the first order expansion. We also extend the results in [4] on the existence of weak Edgeworth expan- sions for i.i.d. random variables. In section 3.5.1, we compare their results with the ours. Before we mention our results, we define the space Fmk of functions. Put Cm(f) = max ‖f (j)‖L1 and Ck(f) = max ‖xjf‖≤ L1 .0 j≤m 0≤j≤k Define Cmk (f) = C m(f) + Ck(f). 
We say f ∈ Fmk if f is m times continuously differentiable and Cmk (f) <∞. 47 Theorem 3.1.4. Suppose (A1) through (A4) hold with s = r+2. Choose q ∈ N such r + 1 that q > . Then, for f ∈ F q+2, SN admits weak local Edgeworth expansion of 2r r+11 order r. Theorem 3.1.5. Suppose (A1) through (A4) hold with s = r+2. Choose q ∈ N such r + 1 that q > . Then, for f ∈ F q+20 , SN admits weak global Edgeworth expansion of2r1 order r. In Theorem 3.1.4 and Theorem 3.1.5, f is required to have at least three derivatives in order to guarantee the integrability of Fourier transforms of f and its derivatives. In addition to (A1) through (A4), if we have, C (A5) There exists C, α > 0 and N1 such that ‖LNt ‖ ≤ for |t| > N r1 for N > Nα 1.t then we can improve this assumption to f having only one continuous deriva- tive. r + 1 Theorem 3.1.4*. Suppose (A1) through (A5) hold with s = r + 2 and α > 2r1 for sufficiently large N . Then, for f ∈ F 1r+1, SN admits weak local Edgeworth expansion of order r. r + 1 Theorem 3.1.5*. Suppose (A1) through (A5) hold with s = r+2 and α > for 2r1 sufficiently large N . Then, for f ∈ F 10 , SN admits weak global Edgeworth expansion of order r. The proofs of these theorems are minor modifications of the proofs of the previ- ous two theorems. This is described in remark 3.2.2 appearing after the proofs. The next theorem gives sufficient conditions for the existence of the averaged Edgeworth expansion. Theorem 3.1.6. Suppose (A1) through (A4) hold with s = r + 2. Choose q ∈ N 48 r such that q > . Then, SN admits averaged Edgeworth expansion of order r for 2r1 f ∈ F q0 . We note that for integer valued random variable assumptions (A3) and (A4) cannot hold since the characteristic function of SN is 2π-periodic. Therefore we replace (A3) by, (̃A3) When t 6∈ 2πZ, sp(Lt) ⊂ {|z| < 1} and when t ∈ 2πZ, sp(Lt) ⊂ {|z| < 1}∪{1}. Also, because of periodicity of the characteristic function, an assumption similar to (A4) is not required. The following theorem provides conditions for the existence of asymptotic expansions for the LCLT for weakly dependent integer valued random variables. A similar result for Xn’s that are Zd-valued, is obtained in [42]. Compare with Proposition 4.2 and 4.4 therein. Theorem 3.1.7. Suppose Xn are integer valued, (A1), (A2) and (̃A3) are satisfied with s = r + 2. Then SN admits order r lattice Edgeworth expansion. The layout of the rest of the chapter is as follows. In section 3.2 we prove the results mentioned earlier by constructing the Edgeworth polynomials using char- acteristic functions and concluding that they satisfy the required asymptotics. In section 3.3 we relate the coefficients of these polynomials to moments of SN and provide an algorithm to compute coefficients. A few applications of the Edgeworth expansions such as the Local Central Limit Theorem and Moderate Deviations, are discussed in section 3.4. In the last section we give examples of sequences of random variables for which our theory can be applied. First, we revisit the i.i.d. case and 49 recover previous results. Then, we focus on non-trivial examples like observations arising from piece-wise expanding maps of an interval, Markov chains with finitely many states and markov processes which are strongly ergodic. 3.2 Proofs of the main results. Here we prove the results mentioned earlier. From now on we work in the setting described in section 3.1. Proof of Theorem 3.1.1. We seek polynomials Pp(x) with real coefficients such that( ) Sn√− nA ∑r ≤ − Pp(x) ( ) P x N(x) = n(x) + o n−r/2 . 
(3.3) n np/2 p=1 Once we have found suitable candidates for Pp(x) we can apply the Berry-Esseen inequality, ∫ ∣∣ ∣∣T | − E | ≤ 1 ∣∣∣ F̂n(t)− Êr,n(t) ∣∣∣ C0Fn(x) r,N(x) dt+ , (3.4)π −T t T where ( ) − rSn√ nA ∑ Pp(x) Fn(x) = P ≤ x , Er,n(x) = N(x) + n(x), n np/2 p=1 and C0 is independent of T . We refer the reader to [20, Chapter XVI.3] for a proof of (3.4). What follows is a formal derivation of Pp(x). Later, we will use (3.4) along with other estimates to prove (3.3). It follows from (A1), (A2) and classical perturbation theory (see [33, IV.3.6 and VII.1.8]) that there exist δ > 0 such that for |t| ≤ δ, Lt has a top eigenvalue µ(t) which is simple and the remainder of the spectrum is contained in a strictly 50 smaller disk. One can express Lt as Lt = µ(t)Πt + Λt (3.5) where Πt is the eigenprojection to the top eigenspace of Lt and Λt = (I − Πt)Lt. Because ΛtΠt = ΠtΛt = 0, iterating (3.5), we obtain Ln n nt = µ (t)Πt + Λt . Using (A3) and compactness, there exist C (which does not depend on n and t) and 0 < r < 1 such that ‖Λnt ‖ ≤ Crn for all |t| ≤ δ. By (3.1), √ ( )n ( ) ( ) E(eitSn/ n t ) = µ √ ` Π √ n √t/ nv + ` Λt/ nv . (3.6)n Now, we focus on the first term of (3.6). Put Z(t) = `(Πtv). (3.7) Then, substituting t = 0 in (3.6) yields 1 = Z(0) + `(Λn0v). Also, we know that lim ‖Λn0v‖ = 0. This gives lim `(Λn0v) = 0. Therefore, Z(0) = 1 and Z(t) 6= 0 n→∞ n→∞ when |t| < δ. Also, this shows that `(Λn0v) = 0 for all n. Next, note that t →7 µ(t) and t 7→ Πt are r+ 2 times continuously differentiable on |t| < δ (see [33, IV.3.6 and VII.1.8]). Therefore, Z(t) is r + 2 times continuously differentiable on |t| < δ. Now we are in a position to compute Pp(x). To this end we make use of ideas in [20, Chapter XVI] (where the Edgeworth expansions for i.i.d. random variables are constructed) and [24] (where the CLT is proved using Nagaev-Guivarc’h method). 51 Consider the function ψ such that, ( ) 2 2 ( ) ( ) ( ( )) inAt σ2t2 log µ √t i√At= − σ t t+ ψ √ ⇐⇒ µn √t √ − √t= e n 2 exp nψ . n n( 2n n nS ) ([S ] n 2) n n − nA where A = lim E is the asymptotic mean and σ2 = lim E √ is n→∞ n n→∞ n the asymptotic variance. (For details see section 3.3.) By (3.6) we have, ( ( )) ( ) ( ) Sn E it √ −nA σ2t2− t t − i√nAt(e n ) = e 2 exp nψ √ Z √ + e n ` Λn√t v (3.8)n n n Notice that ψ(0) = ψ′(0) = 0 and ψ(t) is r+2 times continuously differentiable. Now, denote by t2ψr(t) the order (r + 2) Taylor approximation of ψ. Then, ψr is the unique polynomial such that ψ(t) = t2ψr(t) + o(|t|r+2). Also, ψr(0) = 0 and ψr is a polynomial of degree r. In fact, we can write ψ(t) = t2ψ r+2r(t) + t ψ̃r(t) where ψ̃r is continuous and ψ̃r(0) = 0. Thus, ( ( ))t ( ( t ) 1 ( t )) exp nψ √ = exp t2ψ √ + tr+2r ψ̃r √ . n n nr/2 n Denote by Zr(t) the order−r Taylor expansion of Z(t) − 1. Then, Zr(0) = 0 and Z(t) = 1 + Zr(t) + t rZ̃r(t) with twice continuously differentiable Z̃r(t) such that Z̃r(0) = 0. Then, to make the order n −j/2 terms explicit, we compute: 2 ( ) ( )σ t2 t t e µn2 √ Z √ n ( n2 2 t ) ( )σ t t = e n2 µ √( ( e)xp logZ √ n n t 1 ( ) = exp t2ψr √ t + tr+2ψ̃ √ n nr/2 r∑nr − (−1) k+1 [ ( )]k ( )) Zr √ t − 1 tr tZ r/2 r √ k n n n k=1 52 ∑r r1 [ (2 √t ) ∑ k+1[ ( )]k]m= 1 + t ψr − (−1) Zr √t m! n k n m=1 k=1 1 ( ) (r+2 √t 1 r √t ) (r+1 − r+1 )+ t ψ̃ − t Z + t O n 2 ∑ nr/2 r r/2 rn n nr Ak(t) tr ( t ) ( r+1 ) = + ϕ √ + tr+1O n− 2 (3.9) nk/2 nr/2 n k=0 where A0 ≡ 1, ϕ(t) = t2ψ̃r(t) − Zr(t) is continuous and ϕ(0) = 0. Here Zr is the remainder of logZ(t) when approximated by powers of Zr. Next write, ∑r Ak(t) Qn(t) = . 
(3.10) nk/2 k=1 Notice that Ak and k have the same parity. (3.11) This can be seen directly from the construction, because we collect terms with the same power of n−1/2 t , ψr and Zr are a polynomial in √ with no constant term and n we take powers of t2ψr(t) and Zr(t), the resulting Ak will contain terms of the form c t2s+ks . We claim tha(t,∫ ∣∣∣ ) ( ) 2 2 2n t t − t σ t σ2∣µ √ Z √ ∣e− − e−2 2 Qn(t)n n ∣∣√ |t|<δ n ∫ ∣ t [ ( ) (∣ dt (3.12) 2 2 ∣∣∣ )] exp nψ √tt σ + logZ √ t − 1− ∣Qn(t)n n ∣ = e− 2 ∣√ ∣ dt |(t|<δ n ) t = o n−r/2 . We note[ tha(t fr)om the ch(oice)]of Qn, exp nψ √t + logZ √t − 1−Qn(t) ( ( ) )n n 1 ( ) = tr−1ϕ √t r+1 + trO n− 2 t nr/2 n 53 where ϕ(t) = o(1) as t→ 0. As a result, for all ε > 0 the integrand of (3.12) can be ε 2 2 made smaller than (tr−1 t σ + tr)e− 2 by choosing δ small enough. This proves nr/2 the claim. √ Even though the following derivation is only valid for |t| < δ n, once the polynomial function Qn(t) is obtained as above, we can consider it to be defined for all t ∈ R. Suppose |t| ≤ δ. From classical perturbation theory (see [33, Chapter IV] and [29, Section 7]) we have ∫ n 1Λt = z n(z − Lt)−1 dz (3.13) 2πi Γ where Γ is the positively oriented circle centered at z = 0 with radius ε0. Here ε0 is uniform in t and 0 < ε0 < 1. ∫Now, Λn − n 1 nt Λ0 = ∫ z [(z − L −1 t) − (z − Lt)−1] dz 2πi Γ 1 = zn[(z − L −10) (Lt − L0)(z − L )−1t ] dz. 2πi Γ Λn − Λn Because Lt − L0 = O(|t|) we have that t 0 = O(εn0 ). ` ∈ B′ and `(Λn0v) = 0|t| implies th∫at ∣∣∣ − i√nAt ∣ ∣ − i√nAt∣e n `(Λn √ v) ∣∣∣ ∫ ∣∣∣e n `(Λn √ v − Λnv)t/ n t/ n 0√ dt =t ∫ √ ∣ ∣ t ∣ ∣∣∣ dt |t|<δ n |t|<δ n ∣∣∣Λn − Λn ∣≤ C t 0 ∣∣ dt = O(εn). |t|<δ t 0 This decays exponentially fast to 0 as n→∞. This allows us to control the second term in the R∫HS of ∣(3.6). Combining this with (3.12) w∣e can conclude that,∣∣∣ Sn√−nA t2σ2 t2σ2√ |t|<δ n ∣E it (e n )− e− −2 − e 2 Qn(t) ∣∣∣∣ dt = o(n−r/2). (3.14)t 54 Observe that, σ2 2 ̂k − t 1 d − t2 k k d̂ ∫ (it) e 2 = √ e 2σ2 = n(t)2πσ2 dtk dtk where f̂(x) = e−itxf(t) dt is the Fourier transform of f . Therefore, ( )[ 2 ]t Rj(t)n(t) = √ 1 − dAj i e− 2σ2 . (3.15) 2πσ2 dt Then, the required Pp(x) for p ≥ 1, can be found using the relation, d [ ] n(x)Rp(x) = n(x)Pp(x) . (3.16) dx For more details, we refer the reader to [20, Chapter XVI.3,4]. C0 Given ε > 0, choose B > where C0 is as in (3.4). Let r ∈ N. Then we ε choose polynomials Pp(x) as descr∣ibed above. Then, from (3.4) it f∣ollows that,∫ Bnr/2 ∣∣∣ Sn−nA 2 2| − E | ≤ 1 ∣E it √ − − t σ(e n ) e 2 (1 +Qn(t))∣∣ C0 Fn(x) r,n(x) ∣ dt+ π −Bnr/2 t ∣ Bnr/2 ≤ εI1 + I2 + I3 + nr/2 where ∫ ∣∣∣∣ itSn√−nA t2σ2 ∣1 ∣E(e n )− e− 2 (1 +Qn(t)) ∣∣I1 = π √ ∣ dt|t|<δ n∫ t∣∣∣∣ √ ∣ ∣ 1 E(eitSn/ n) ∣ I ∣2 = dt π ∫ √ ∣δ n<|t|δ n From (3.12) we have that I1 is o(n −r/2). Because our choice of ε > 0 is arbitrary the proof is complete, if I2 and I3 are also o(n −r/2). These follow from (3.18), (3.19) and (3.17) below. 55 It is easy to see that, ∫ ∣ ∣ 2 2 − t σ ∣∣∣1 +Qn(t) ∣2 ∣√ e ∣ dt = O(e−cn) (3.17) |t|>δ n t for some c > 0. Thus, we only need to control, ∫ ∣∣∣∣ √ ∣ E(eitSn/ n)∣ I2 = ∣∫ √ ∣ dtδ n<|t| max{δ,K} with K as in (A4). By (A3) the spectral radius of Lt has modulus strictly less than 1. Because t 7→ Lt is continuous, for all p < q, there exists γ < 1 and C > 0, such that ‖Lmt ‖ ≤ Cγm for all p ≤ |t| ≤ q for sufficiently large m. Then using (3.1) for sufficiently large n we have, ∫ ∣∣∣ √∣ ∣E(eitSn/ n) ∣∣∣ ∫≤ √1 Cγn√ √ dt √ √ ‖Ln √t/ n‖ dt ≤ √ . 
(3.18) δ n<|t|<δ n t δ n δ n<|t|<δ n n √ This shows that the integral converges to 0 faster than any inverse power of n. Next for sufficiently large n, ∫ √∣∣∣∣ ∣itS / n ∣∣∣ ∫E(e n ) ≤ 1√ dt √ √ |`(Ln √ v)| dt (3.19) δ n<|t| (we can assume r2 > for large n due to Remark 3.1.1) and 2 2 √|t| r−1 r−1K ≤ δ < < Bn r2 ≤ n 1 for n ∈ N with nr1− 2 ≥ B. n 56 The proof of Theorem 3.1.2 follows the same idea. We include its proof for completion. Proof of Theorem 3.1.2. Because (A1) through (A3) hold with s ≥ 3, we have (3.9) C0 where ϕ is continuous, ϕ(0)∫= 0 a∣nd r = 1. Given ε > 0, choose B > . Then,√ ∣∣∣ ε Sn√−nA 2 2 ∣ 1 B n ∣E it(e n )− e− t σ ∣ 2 | (1 +Qn(t))∣Fn(x)− E1,n(x)| ≤ ∣ π √−B n t ∣ C√0dt+ B n ≤ εI1 + I2 + I3 + √ . B n Because, ϕ(t)[= o((1) )as t→ 0(and)] exp nψ √t + logZ √t − 1−Q1(t) ( ) ( )n n = √1 t 1ϕ √ + tO t n n n we have that, ∫ ∣∣∣∣ itSn√−nA t 2σ2 t2σ2 ∣E(e n )− e− − e−2 2 Q1(t) ∣ I ∣1 = √ ∣ dt = o(n−1/2). |t|<δ n t Also, I = O(e−cn∫ 3 ). Finally, because of (A3) there is γ < 1 such that,∣∣ √∣∣ ∣ ∣ ∣E(eitSn/ n ∫) ∣∣∣ ∣E(eitSn) ∣√ √ dt = ∣∣ ∣∣ dt ≤ C sup ‖Lnt ‖ ≤ Cγn δ n<|t| max{δ,K}. ‖Lnt ‖ ≤ 1 where K ≤ δ < |t| < nr1 . ∫ nr2 ∣∣∣∣ √ ∣∣∣∣ ∫E(eitSn/ n) ∣∣∣∣ ∣E(eitSn) ∣∣ 1√ dt = ∣ dt ≤ Cnr1−r2+ 2 δ n<|t| 0. (A particular δ is chosen later). Notice that for all δ ≤ |t| ≤ K (where K as in (A4)), there exists c0 ∈ (0, 1) such that ‖Lnt ‖ ≤ cn0 . Thus, ∣∣∣∣ ∫ ∣∣ ∫ ∣∣f̂(t)E(eit(Sn−nA)) dt∣∣ ≤ ∣f̂(t)`(Lnt v)∣∣∣ dt ≤ C‖f‖1cn0 . δ<|t| r1 + (r + 1)/2. Therefore, ∣∣∣∣ ∫ ∣∣ ∫it(S −nA) ∣∣ ≤ ‖ ‖ ‖ ‖‖ ‖ ‖Ln‖ ≤ C‖f‖1f̂(t)E(e n ) dt f 1 ` v t dt r −r K<|t| implies, ∣ 2r∣∣∣ ∫ ∣∣ ∫ ∫ ∣ 1 ∣ f̂ (q)it(S −nA) (t) ∣∣f̂(t)E(e n ) dt∣∣ ≤ |f̂(t)| dt ≤ ∣ ∣ dt (3.21)q |t|>nr1 |t|>nr1 |t|>nr1 t 59 ‖f̂ (q)≤ ‖1 = ‖f̂ (q)‖ o(n−(r+1)/21 ). nr1q Therefore, ∣∣∣∣ ∫ ∣∣f̂(t)E(eit(Sn−nA)) dt∣∣ = o(n−(r+1)/2). (3.22) |t|>δ √ From (3.8), for |t| ≤ δ n, we have, itSn√−nA σ 2t2 E 2(e n ) = e− et O(δ)2 (1 +O(δ)) +O(n0 ). √ Thus, choosing small δ, for large n when |t| < δ n there exist c, C > 0 such that ∣∣ ( Sn√−nA )∣E ite n ∣ ≤ 2Ce−ct . Then, √ √ ∣ ∣Sn−nA D log n < |t| ∣ it √ ∣< δ n =⇒ ∣E(e n )∣ ≤ Ce−cD logn C= ncD an∣d∣∣∣ ∫ ∣ ∣ ∫ ( ) ∣√ f̂(t)E(eit(Sn− ∣∣∣ ∣∣∣ t itSn√−nA ∣nA) dt) dt = f̂ √ E(e n )√ ∣D logn √ √ ∣<|t|<δ D∫logn<|t|<δ n n nn ≤ C | | 2δC‖f‖1 ncD √ f̂(t) dt = . D logn cD<|t|<δ n n Combining this wi∣t∫h (3.22) and choosing D such that, cD > (r+ 1)/2 we have that,∣∣∣ ∣∣√ f̂(t)E(eit(Sn−nA)) dt∣∣ = o(n−(r+1)/2). (3.23) |t|> D√lognn | | D log nNext, suppose t < . Then, ∑nr f̂ (j)(0) tr+1 f̂(t) = tj + f̂ (r+1)((t)) j! (r + 1)! j=0 where 0 ≤ |(t)| ≤ |t|. N∣ ∫ote that,∣ ∣∣ ∫ |f̂ (r+1)((t))| = ∣∣ xr+1e−i(t)xf(x) dx∣∣ ≤ |xr+1f(x)| dx ≤ Cr+1(f). 60 Therefore, ∫ ( t ) itSn√−nA √ f̂ √ E(e n ) dt |t|< D logn ∑nr f̂ (j) ∫(0) itSn√−nA = tjE(e n√ ) dtj!nj/2 j=0 |t|< D logn ∫ 1 1 ( ( ))itSn√−nA t + E(e n )tr+1f̂ (r+1)  √ dt n(r+1)/2 (r + 1)! √|t|< D logn n where ∣∣∣∣ ∫ ∣ ∫itSn√− ( ( ))nA ∣E(e n )tr+1f̂ (r+1) t 2√  √ dt∣∣ ≤ C (f) |t|r+1e−ctr+1 dt |t|< D logn n for large n. Hence, ∫ ( t ) itSn√−nA n √ f̂ √ E(e ∫ ) dt|t|< D logn ∑ nr f̂ (j)(0) itSn√−nA = j n −(r+1)/2 j!nj/2 √ t E(e ) dt+ Cr+1(f)O(n ). (3.24) j=0 |t|< D logn Because s = r + 2, from (3.9), σ2t2 ( ( )) ( ) itSn√−nA t t i 2 2E √ √ − √ nAt+σ t ( ) e 2 (e n ) = exp nψ Z + e n 2 ` Λn √ ∑ n n t/ n v r A (t) tr ( t ) ( log(r+1)/2 )k = + ϕ √ (n)+O . 
(3.25) nk/2 nr/2 n n(r+1)/2 k=0 Substituting this in (3.24), ∫ ( ) √t E it Sn√−nA √ f̂ (e n ) dt (3.26) |t|∑< D logn ∫ nr f̂ (j) r(0) ∑ (2 2 Ak(t) log(r+1)/2(n)) = √ t je−σ t /2 dt+O j!nj/2 ∫ nk/2 n(r+1)/2∑j=0 ∑ |t|< D logn k=0r r f̂ (j)(0) = tjA (t)e−σ 2t2/2 dt+ o(n−r/2). j!n(k+j)/2 √ k k=0 j=0 |t|< D logn 61 Recall from (3.11) that Ak and k have the same parity. Therefore, if k + j is odd then ∫ j 2 2 √ t A (t)e −σ t /2 k dt = 0. |t|< D logn So only integral powers of n−1 will remain in the expansion. Also there is C that depends only on r such that, ∫ ∫ j −σ2t2/2 ≤ 4r −σ2t2/2 C C√ t Ak(t)e dt C √ t e dt ≤ = .eσ2D log(n)/4 nσ2D/4|t|≥ D logn |t|≥ D logn Choosing D such that 2σ2D > (r + 1)/2, ∫ ∫ tj 2 2 2 2 A (t)e−σ t /2 dt = tjA (t)e−σ t /2 dt+ o(n−r/2k √ k ). R |t|≤ D logn Therefore, fixing D large, we can assume the integrals to be over the whole real line. Now, define ∫ a = tj 2 2 k,j Ak(t)e −σ t /2 dt R and substitute ∫ f̂ (j)(0) = (−it)jf(t) dt R in (3.26) to obtain, ∫ ( t ) ∑r ∑r ∫√ E itSn√−nA 1√ f̂ (e n ) dt = ak,j (−it)jf(t) dt+ o(n−r/2) |t|< D logn n j!n (k+j)/2 k=0 j=0 R ∑ ∫ ∑ (3.27)r 1 ak,j = f(t) (−it)j dt+ o(n−r/2) np j! p∑=0 R k+j=2pbr/2c ∫1 = f(t)P (t) dt+ o(n−r/2p,l ) np p=0 R 62 where ∑ ak,j Pp,l(t) = (−it)j. (3.28) j! k+j=2p The final simplification was done by absorbing the terms corresponding to higher powers of n−1 into the error term. Note that Pp,l is a polynomial of degree at most 2p and that once we know A0, . . . , A2p we can compute Pp,l. Finally combining (3.27) and (3.23) substituting in (3.20) we obtain the re- quired result as shown below. √ ∫ − 1 ( t ) itSn√−nA nE(f(Sn nA)) = n 2π √ f̂ √ E(e ) dt |t|< D logn n √ ∫ n + √ f̂(t)E(eit(Sn−nA)) dt D logn b 1 ∑ 2π |t|> n r/2c ∫ 1 √ = f(t)P −r/2p,l(t) dt+ o(n ) + n o(n −(r+1)/2) 2π np ∑p=0br/2c ∫R1 1 = f(t)P (t) dt+ o(n−r/2p,l ). 2π np p=0 R The proof of Theorem 3.1.5 uses the relation (3.25) derived in the previous proof. But we do not use the Taylor expansion of f̂ , so differentiability of f̂ is not required. So the assumption on the decay of f at infinity can be relaxed. Proof of Theorem 3.1.5. Multiplying (3.25) by f̂ and integrating we obtain, ∫ ( t ) itSn√−nA √ f̂ √ E(e n ) dt |t|< D logn n ∑r ∫1 ( t ) σ2t2 = √ f̂ √ Ak(t)e − 2 dt+ ‖f‖1o(n−r/2). nk/2 n k=0 |t|< D logn 63 As in the proof of Theorem 3.1.4 the integrals above can be replaced by inte- grals over R with∫out altering the order of the error because( t ) 2 2 √ f̂ √ σ t Ak(t)e − 2 dt ≤ ‖f‖1 o(n−r/2) |t|≥ D logn n f∫or D such that 2σ 2D > (r + 1)/2. Therefore∫,( ) rt ( )Sn ∑√−nA 1 t σ2t2it n √ f̂ √ E(e ) dt = f̂ √ Ak(t)e − 2 dt+ ‖f‖ o(n−r/2). k/2 1 |t|< D logn n n nk=0 R We pick Rp as in (3.15) and claim Pp,g = Rp. √ √ √ Note th∫at nf(t n)←→ f̂(t/ n). So ∫by the Plancherel theorem,√ ( √ ) 1 ( t ) σ2t2 nf t n Rk(t)n(t) dt = f̂ √ Ak(t)e− 2 dt. R 2π R n Thus, ∫ 1 ( t ) itSn√−nA√ n 2π n √ f̂ √ E(e |t|< D logn n (∑r ∫ ) dt 1 1 √ ( √ ) ) = √ nf t n Rp(t)n(t) dt+ ‖f‖ −r/2p/2 1o(n ) ∑n np=0 Rr ∫1 ( √ ) = f t n R (t)n(t) dt+ ‖f‖ o(n−(r+1)/2). (3.29) np/2 p 1 p=0 R Note that (3.23) holds because f ∈ F q+20 . Now, combining (3.29) with the estimate (3.23) completes the proof. Remark 3.2.2. Proofs of both the Theorem 3.1.4* and Theorem 3.1.5* are almost identical except the estimate (3.21). 
In order to obtain the same asymptotics, the assumption on the integrability of f̂ (q) can be replaced by (A5) and the fact that |f̂(t)| ∼ 1 for a∣s t→ ±∞.t ∣∣∣ ∫ ∣f̂(t)E(eit(Sn−nA)) dt∣∣∣ ∫≤ C |f̂(t)|‖Lnt ‖ dt |t|>nr1 |t|>nr1 64 ∫ ≤ ‖ ‖ 1C f 1 dt1+α |∫t|>nr1 t ≤ C‖f‖1 1 dt nr1(α−) t1+ r + 1 Since, r1α > choosing  small enough we can make the expression ‖f‖ o(n−(r+1)/21 ) 2 as required. Proof of Theorem 3.1.6. Select A as in (3.2). Define Pp by (3.15) and (3.16) and √ y f̃n(x) = f(− nx). Then the change of variables −√ → y yields, ∫ n[ (S − nA y ) ( y ) ( y )] √ P n√ ≤ x+ √ −N x+ √ − Er,n x+ √ f(y)dy = n∆n ∗ f̃n(x). n n n n ∑r where E 1r,n(x) = Pp(x)n(x). np/2 p=1 itSn√−nA Notice that E(e n )f̂̃n ∈ L1. Therefore, ∫ ′ 1 − itSn√−nA(Fn ∗ f̃n) (x) = e itxE(e )f̂̃n n(t) dt. 2π Also, [ (∑r ∫1 )] 1 σ2t2 ( ) n + Rpn ∗ f̃n(x) = e−itxe− 2 1 +Qn(t) f̂̃n(t) dt np/2 2π p=1 where Rp’s are polynomials given by (3.15) and Qn(t) is given by (3.10). From these we conclude that, ∫ ′ 1 − ( itSn√−nAitx σ2t2 ( )(∆n ∗ f̃n) (x) = e E(e n )− e− 2 1 +Qn(t) f̂̃n(t) dt. (3.30) 2π We claim that, ∫ itSn√−nA σ2t2 ( ) 1 n− E(e )− e − 2 itx 1 +Qn(t)(∆ ∗ f̃ )(x) = e f̂̃n n n(t) dt. (3.31) 2π −it 65 Indeed, if the right side of (3.31) converges absolutely, then Riemann-Lebesgue Lemma gives us that it converges 0 as |x| → ∞. Differentiating (3.31) we obtain (3.30). Thus the two sides in (3.31) can differ only by a constant. Since both are 0 at ±∞, this constant is 0 and (3.31) holds. Now, we are left with the task of showing that the right side of (3 ̂̃ 1 ( .31) con t )- verges absolutely. From the definition of f̃n it follows that, fn(t) = √ f̂ − √ . n n Combining this with (3.14), we have that, ∣∣∣ ∫ Sn√−nA σ2t2it ( )∣ ∣n−itxE(e )− e− 2 1 +Qn(t) ∣√ e ̂̃(fn(t) dt ∣ |t|<δ n ∫ ∣∣ −it ∣ ∣ Sn−nA 2 2 )∣ ∣E it √ − −σ t(e n ) e 2 1 +Q (t)≤ n ̂̃ ∣∣√∫ ∣ ( fn(t)∣ dt |t|<δ n t itSn√−nA σ2t2 )∣ ≤ ‖√f‖ n 1 ∣∣E(e )− e− 2 1 +Qn(t) ∣∣ dt n √ ∣ ∣|t|<δ n t = ‖f‖1o(n−(r+1)/2). Note that, ∣∣∣ ∫ itSn√−nA 2 2∣ E(e n )− e−σ t ( ) − 2e itx 1 +Qn(t) f̂̃ ∣∣√ (n(t) dt∣ ∣ |t|>δ n ∫ ∣ −it∣∣ itSn√−nA 2 2∣ −σ t ) ( )∣E(e n )− e 2 1 +Qn(t)≤ t ∣∣√ ( f̂ − √ )∣ dt|t|>δ∫ n t n 1 ∣∣∣ 2 2 2∣ √ ∣E(e−it(Sn−nA) n σ t)− e− 2 1 +Q (− nt) ∣≤ √ n∫ f̂(t) ∣ dt n ∣|t|>δ ∣ ∣ t ≤ √1 ∣∣∣E(e−it(Sn−nA)) ∣∣∣ 2f̂(t) dt+O(e−cn ).n |t|>δ t Put, ∫ 1 ∣∣∣∣E(e−it(Sn− ∣nA)) ∣Jn = √ f̂(t)n t ∣∣ dt.|t|>δ We claim J = o(n−(r+1)/2n ). This proves that (3.31) converges absolutely as required. 66 To conclude the asymptotics of Jn, choose δ > max{δ,K} where K as in (A4). From (A3) there exists γ < 1 such that ‖Lnt ‖ ≤ γn for all δ ≤ |t| ≤ δ for sufficiently large n. Then, usin∣g (3.1) for sufficien∣tly large n we have,∫ ∫ √1 ∣∣∣E(e−it(Sn−nA)) ∣∣∣ ≤ C‖√f‖1f̂(t) dt ‖Ln‖ dt = O(γn).n δ<|t|<δ t δ n tδ<|t|<δ 1 Next, for K ≤ δ ≤ |t| ≤ nr1 , ‖Lnt ‖ ≤ . Hence, for n sufficiently large so thatnr2 r r2 > , 2 ∫ √1 ∣∣∣∣ ∣E(e−it(Sn−nA)) ∣ ∫f̂(t)∣∣ dt ≤ √C ‖Lnt ‖|f̂(t)| dtn δ<|t| , we have that, ∫ 2r1 ∣∣ ∣ ∫ √1 ∣∣E(e−it(Sn−nA)) ∣ ‖f (q)‖1 1 C‖f (q)‖1f̂(t)∣ dt ≤ √ dt ≤n ∣ q+1 qr1+1/2|t|>nr1 t n |t|>nr1 |t| n = o(n−(r+1)/2). Combining the above estimates, J = Cq(f)o(n−(r+1)/2n ). This completes the proof that (∆ ∗ f̃ )(x) = o(n−(r+1)/2n n ). Hence,∫ [ (Sn√− nA ) ( ))]P ≤ yx+ √ −N x+ √y f(y)dy n ∫ (n ) n √ = Er,n x∫+ √ y ( f(y) dy)+ n∆n ∗ f̃ (x)∑ nnr 1 y = P x+ √ n(x)f(y) dy + Cq(f)o(n−r/2p ) np/2 n p=1 as required. In the lattice case, periodicity allows us to simplify the proof significantly although the idea behind the proof is similar to the previous proofs. 67 Proof of Theorem 3.1.7. 
Under assumptions (A1) and (A2) we have the CLT for Sn. Put A as in (3.2). We obs∫erve that,π ∫ π 2πP(Sn = k) = e−itkE(eitSn) dt = e−itk`(Lnt v) dt. −π −π After changing variabl∫es and using (3.6), (3.7) we have,√ √√ π n ( ∫−√itk t )n ( t ) π n −√itk ( ) 2π nP (Sn = k) = √ e nµ √ Z √ dt+ n n √ n n √ e ` Λt/ nv dt. −π n −π n (3.32) By (̃A3) there exists C > 0 and r ∈ (0, 1) (both independent of t) such that |` (Λnt v) | ≤ Crn for all t ∈ [−π, π]. Therefore the second term of (3.32) decays exponentially fast to 0 as n→∞. Now, we focus on the first term. Using the same strategy as in the proof of Theorem 3.1.1 we have, ( √t )n ( t ) inAt σ2√ − t2 [ ] µ Z √ = e n 2 1 +Qn(t) + o(n−r/2) (3.33) n n where Qn(t) is as in (3.10{). Define Rj as(in (3.15).√ r √ )}1 (k−nA)2 ∑ (R (k − nA)/ n) 2π nP p(Sn = k)− 2π √ e− 2σ2n 1 + ∫ ( 2π j/2 ) ( ) nj=1√π n −√itk n = e n√ ∫ µ √ t √tZ dt −π n n n ∞ ∫ − it(k√−nA) ∞ 2 2 − e n e− 2 2 −√ itk σ t /2 σ tdt− e n e− 2 Qn(t) dt+ o(n−r/2). −∞ −∞ We estimate th∫e RHS by estimating the three integrals given below,√δ n ( ) ( ) −√itk t n t − it(k√−nA) σ2− t2I n n1 = 2∫ √ e µ √ Z √ − e e [1 +Qn(t)] dt−δ n n ( )n ( ) −√itk t n t I2 = √ √ e nµ √ Z √ dt δ n<|t|<π n n n 68 ∫ − it(k√−nA) σ2t2 I n −3 = e e 2√ [1 +Qn(t)] dt. |t|>δ n Clearly, |I3| decays to 0 exponentially fast as n → ∞. Also, |µ(2π)| = 1 and |µ(t)| ∈ (0, 1) for 0 < |t| < 2π. Therefore, there exists  > 0 such that |µ(t)| <  on δ ≤ |t| ≤ π. Put M = max |Z(t)|. Then, δ≤|t|≤π √ ∫ √ |I n2| ≤M n |µ(t)| dt ≤ 2M(π − δ) nn. <|t|<π Hence, |I2| decays to 0 exponentially fast as n→∞. From (3.33), we have that [ ( ) ( ) ] −√itk t n t i√nAt σ2t2 σ2t2 e n µ √ Z √ − e n e− 2 [1 +Qn(t)] = e− −r/22 o(n ). n n This implies |I | = o(n−r/21 ). Combining these estimates we have the required result. 3.3 Computing coefficients. ∫ Since E(eitSn) dt decays sufficiently fast, the Edgeworth expansion, and |t|>δ hence its coefficients, depend only on the Taylor expansion of E(eitSn) about 0. Here we relate the coefficients of Edgeworth polynomials to the asymptotics of moments of Sn by relating them to derivatives of µ(t) and Z(t) at 0. Suppose (A1) through (A4) are satisfied with s = r + 2. Recall (3.6): E(eitSn) = µ (t)n ` (Πtv) + ` (Λnt v) . (3.34) Put Z(t) = ` (Πtv) as before. Also write Un(t) = ` (Λ n t v). We already know that µ(t), Z(t) and U(t) are r+ 2 times continuously differentiable. Using (3.13) one can 69 show further that the derivatives of Un(t) satisfy: sup ‖U (k)n ‖ ≤ Cεn0 |t|≤δ for all n and for all 1 ≤ k ≤ r + 2. Taking the first derivative of (3.34) at t = 0 we have: (S )n iE(S ′ ′ ′n) = nµ (0) + Z (0) + Un(0) =⇒ lim iE = µ′(0). n→∞ n In fact, using the Taylor expansion of log µ(t) and above limit one can conclude that the number A we used in the statement of the CLT in (3.2), is given by (S )n A = lim E . n→∞ n Therefore one can rewrite (3.6) as E(eit(Sn−nA)) = e−ntµ′(0)µ (t)n Z(t) + Un(t) (3.35) ′ (k) where U (t) = e−ntµ (0)n Un(t). Also note that its derivatives satisfy ‖Un ‖∞ = O(εn0 ) for all 1 ≤ k ≤ r + 2. From (3.35), it follows that moments of Sn−nA can be expanded in powers of n with coefficients depending on derivatives of µ and Z at 0. However, only powers of n upto order k/2 will appear. We prove this fact below. Lemma 3.3.1. Let 1 ≤ k ≤ r + 2. Then for large n, ( k ) b∑k/2cE [Sn − nA] = a j nk,jn +O(0 ). (3.36) j=0 Proof. 
We first note that taking the ∣kth derivative of (3.35) at t = 0,( ) dk ∣ [ ]′ (k) ikE [S − nA]k = ∣∣ e−ntµ (0)n µ (t)n Z(t) + U (0)dtk t=0 70 ∣ dk ∣∣∣ [ ]= e−ntµ′(0)µ (t)n Z(t) +O(n).dtk 0t=0 Observe that all the derivatives of e−ntµ ′(0)µ (t)n Z(t) will∣onl[y have positive inte]graldk ′ p∑owers of n (possibly) up to order k. Therefore, ∣ e−ntµ (0)µ (t)n Z(t) = dtk t=0 k ak,jn j. We claim that for j > k/2, ak,j = 0. This claim proves the result. j=0 We notice that the first derivative of e−tµ ′(0)µ (t) at t = 0∣ is 0. Thus we provedk the more general claim that if g(0) = 1 and g′(0) = 0 then ∣ [g(t)nZ(t)] has no dtk t=0 terms with powers of n greater than k/2. From the Leibniz rule, ∣∣∣∣ ∑k ( ) ∣dk n k dl ∣[g(t) Z(t)] = Z(k−l)(0) ∣ [g(t)n].dtk l ∣t=0 l dtl=0 t=0 dl ∣ Therefore it is enough to prove that ∣ [g(t)n] has no powers of n greater than dtl t=0 l/2. To this end we use the order l Taylor expansion of g(t) about t = 0. Since g′(0) = 0 and g is r + 2 times continuously differentiable for l ≤ r + 2 there exists φ(t) continuous such that, g(t) = 1 + a t2 + · · ·+ a tl + tl+12∑ l φ(t)n! =⇒ g(t)n = (a t22 )k2 . . . t(l+1)kl+1φ(t)kl+1 ∑ k !k··· 0 2! . . . kl+1!k0+k2+ +kl+1=n Ck0k2...k n!= l+1 t2k2+···+(l+1)kl+1φ(t)kl+1 . k0!k2! . . . kl+1! k0+k2+···+kl+1=n After combining and rearranging terms according to powers of t, we can obtain the order l Taylor expansion of g(t)n. Notice that if kl+1 ≥ 1 then 2k2 + · · · + (l + 1)kl+1 ≥ l + 1. Terms with kl+1 ≥ 1 are part of the error term of the order l Taylor expansion of g(t)n. Since our focus is on the derivative at t = 0, the 71 only terms that matter are terms with kl+1 = 0 and 2k2 + · · · + lkl = l. This l implies that k2 + · · · + kl ≤ . Because ki’s are non-negative integers, this means 2 k2 + · · · l + kl ≤ b c l . Hence, k0 ≥ n− b c. 2 2 dl This analysis shows that the largest contribution to ∣∣ [g(t)n] comes from dtl t=0 the term, C(n−b(l c),1,...,1,0),...,0 n!2 tl n− b l c ! 2 whose kth derivative at 0 is, C ( ⌊ ⌋ )(n−b (l c),1,...,1,0,...,0 l! n! l l2 ) = C b cl n− b l c ! (n−b c 2 ),1,...,1,0,...,0 l! n . . . n− + 1 = O(n ). 2 2 2 Therefore, dl ∣∣∣ l[g(t)n] = O(nb c2 ). dtl t=0 It is immediate from the proof that the coefficients ak,j are determined by the derivatives of µ(t) and Z(t) near 0. For example, the constant term ak,0 = (−i)kZ(k)(0). This follows from these three facts. The expansion (3.36) is the kth ′ derivative of the product of the three functions e−ntµ (0), µ (t)n and Z(t) at t = 0. ′ All derivatives of µ (t)n and e−ntµ (0) at t = 0 contain powers of n and thus, ak,0 corresponds to the term Z(t) being differentiated k times in the Leibneiz rule. Both e−ntµ ′(0) and µ (t)n are 1 at t = 0. We will see later that the other coefficients ak,j are combinations of µ′(0) = iA, higher order derivatives of µ at 0 upto order k and derivatives of Z at 0 upto order k − 1. As a corollary to Lemma 3.3.1, we conclude that asymptotic moments of orders upto r + 2 exist. These provide us an alternative way to describe ak,j. 72 m Corollary 3.3.2. For all 1 ≤ m ≤ r + 2 and 0 ≤ j ≤ , 2 E m([Sn − nA]m)− nj+1am,j+1 − · · · − nb c2 am,bm c a 2m,j = lim . n→∞ nj Proof. When m = 1, E([Sn − nA]) = a1,0 + O(n0 ) and it is immediate that a1,0 = lim E([Sn − nA]). 
For arbitrary k we have, n→∞ ( ) E [Sn − nA]k = a bk/2ck,bk/2cn + a nbk/2c−1k,bk/2c−1 + · · ·+ ak,0 +O(n0 ) and dividing by n we obtain, ( ) E [Sn − nA]k ( 1) b c = ak,bk/2c +O .n k/2 n Now, it is immediate that, ( ) E [Sn − nA]k ak,bk/2c = lim b c .n→∞ n k/2 k Having computed ak,j, for r ≤ j ≤ b c, we can write, 2 ( ) E [Sn − nA]k − a bk/2ck,bk/2cn − · · · − ak,rnr = a r−1k,r−1n + · · ·+ ak,0 +O(n0 ). Dividing by nr−1, we obtain, ( ) E [Sn − nA]k − nrak,r − · · · − nbk/2ca ( )k,bk/2c 1 = a +O . nr−1 k,r−1 n Now, we can compute am+1(,r−1 , k )E [Sn − nA] − nra bk/2ck,r − · · · − n ak,bk/2c ak,r−1 = lim − .n→∞ nr 1 This proves the Corollary for arbitrary k ∈ {1, . . . , r + 2}. 73 Because the coefficients of polynomials Ap(t) (see (3.10)) are combinations of derivatives of µ(t) and Z(t) at t = 0, we can write them explicitly in terms of ak,j, and hence, by applying Corollary 3.3.2, the coefficients of Edgeworth polynomials can be expressed in terms of moments of Sn. Next, we will introduce a recursive algorithm to do this and illustrate the process by computing the first and second Edgeworth polynomials. Taking the first derivative of (3.35) at t = 0, iE ′([Sn − nA]) = Z ′(0) + Un(0). Then, a ′1,0 = lim E([Sn − nA]) = −iZ (0). n→∞ Next, taking the second derivative of (3.35) at t = 0 we have, ′′ i2E([S − nA]2n ) = n[µ′′(0)− µ′(0)2] + Z ′′(0) + Un(0). Therefore, dividing by n and takin(g the limit we)have,[ ]2 2 Sn − nAa ′ 2 ′′2,1 = σ = lim E √ = µ (0) − µ (0). (3.37) n→∞ n Once we have found a2,1 we can find ( ) a2,0 = lim E([Sn − nA]2)− nσ2 = −Z ′′(0). n→∞ We can repeat this procedure iteratively. For example, after we compute the 3rd derivative of (3.35) at t = 0: i3E([S − nA]3) = Z(3)n (0) + nµ′(0)[2µ′(0)2 − 3µ′′(0)] + nµ(3)(0) 74 + 3nZ ′(0)[µ′(0)2 − ′′ (3)µ (0)] + Un (0) we get that, 1 ( ) a3,1 = lim E [Sn − nA]3 = −A(3σ2 + A2) + iµ(3)(0)− 3iσ2Z ′(0) n→∞ n = −A(3σ2 + A2) + iµ(3)(0) + 3σ2a1,0. This gives us µ(3)(0) and Z(3)(0) in terms of asymptotics of moments of Sn: iµ(3)(0) = a3,1 + A(3σ 2 + A2)− 3σ2a1,0 ( ) iZ(3)(0) = lim E([Sn − nA]3)− na→∞ 3,1 .n Given that we have all the coefficients ak,j, 1 ≤ k ≤ m computed and µ(k)(0), Z(k)(0) for 1 ≤ k ≤ m expressed in terms of the former, we can compute a and express µ(m+1)m+1,j (0), Z (m+1)(0) in terms of ak,j, 1 ≤ k ≤ m+ 1. To see this note that µ(m+1)(0) appears only as a result of µn(t) being differ- entiated m+ 1 times. So, µ(m+1)(0) only appears in derivatives of order m+ 1 and higher. It is also easy to see that it appears in the form nµ(m+1)(0) in the (m+ 1)th derivative of (3.35). Thus, it is a part of am+1,1 and all the other terms in am+1,1 are products of µ(k)(0), Z(k)(0) for 1 ≤ k ≤ m whose orders add upto m + 1 and hence they are products of ak,j, 1 ≤ k ≤ m. Also, Zm+1(0) appears only in am+1,0. This is because Z m+1(0) appears only as a result of Z(t) being differentiated m + 1 times. Thus, it appears only in derivatives of (3.35) of order m+ 1 or higher. In the (m+ 1)th derivative of (3.35), ′ there is only one term containing Z(m+1)(t) and it is e−ntµ (0)µ (t)n Zm+1(t). So am+1,0 = (−i)m+1Zm+1(0). 75 Using Corollary 3.3.2, we have, ( ) E [Sn − nA]m+1 am+1,bm+1 c = lim→∞ bm+1 . 2 n n c2 m+ 1 Having computed am+1,j, for r ≤ j ≤ b c, we compute a( ) m+1,r−1 : 2 E [Sn − nA]m+1 − nram+1,r − · · · − b m+1 n c2 am+1,bm+1 c a 2m+1,r−1 = lim . n→∞ nr−1 This gives us Z(m+1)(0) = im+1a m+1m+1,0 and µ (0) in terms of am+1,1 and ak,j, 1 ≤ k ≤ m i.e. explicitly in terms of moments of Sn. 
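The following Python sketch is an informal numerical illustration only (the centred exponential example, the sample sizes and all names are our choices, not part of the text). It checks Corollary 3.3.2 in the simplest setting of Section 3.5.1, i.i.d. summands with A = 0: the Monte Carlo estimates of E([S_n − nA]^k)/n^{⌊k/2⌋} stabilise as n grows, and their limits recover a_{2,1} = σ² and a_{3,1}, which for centred i.i.d. summands equal Var(X_1) and E(X_1³).

import numpy as np

# Minimal Monte Carlo sketch of Corollary 3.3.2 for the hypothetical i.i.d.
# example X_i = Exp(1) - 1, so A = 0, sigma^2 = 1 and E[X_1^3] = 2.
rng = np.random.default_rng(0)

def centred_moment(k, n, samples=400_000):
    # For centred Exp(1) summands, S_n - nA = Gamma(n, 1) - n exactly.
    S = rng.gamma(n, 1.0, size=samples) - n
    return np.mean(S ** k)

# a_{2,1} = lim E[(S_n - nA)^2] / n   (the asymptotic variance sigma^2)
# a_{3,1} = lim E[(S_n - nA)^3] / n   (equals E[X_1^3] for centred i.i.d. sums)
for n in (25, 100, 400):
    print(n, centred_moment(2, n) / n, centred_moment(3, n) / n)
# Both ratios should settle near 1 and 2 respectively; the k = 3 column is
# noisier since the variance of S_n^3 grows like n^3.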
Proceeding inductively we can compute all the derivatives upto order r of µ(t) and Z(t) at t = 0 in this manner by taking derivatives up to order r of (3.35) at t = 0. This is possible because our assumptions guarantee the existence of the first r + 2 derivatives of (3.35) near t = 0. Remark 3.3.1. This representation of µ(k)(0) and Z(k)(0) in terms of ak,j is not unique. However, it is convenient to choose the ak,j’s with the lowest possible indices. The inductive procedure explained above yields exactly this representation. We will illustrate how the first and the second order Edgeworth expansion can be computed explicitly once we have µ(4)(0), µ(3)(0), Z ′′(0) and Z ′(0) in terms of asymptotic moments of Sn. Because A0(t) = 1 we have R0(t) = 1. From the derivation of (3.9) we have, t3 3 A (t) = (log µ)(3)(0) − Z ′(0)t = (µ(3) t1 ( (0)− 3µ ′′(0)µ′(0) +)2µ ′(0)3) − Z ′(0)t 6 6 3 = µ(3) t (0) + iA(3σ2 + A2) − Z ′(0)t 6 (it)3 = (a 23,1 − 3σ a1,0) − a1,0(it). 6 76 After taking the inverse Fourier transform as shown in (3.15) we have, (a3,1 − 3σ2a1,0) a1,0 R1(x) = x(3σ 2 − x2) + x. 6σ6 σ2 Using (3.16) we obtain the firs(t Edgeworth p)olynomial, a3,1 − 3σ2a1,0 2 − 2 − a1,0P1(x) = (σ x ) . 6σ4 σ Similar calculations give us, (it)6 [ A2(t) = (a3,1 + 3σ 2a )21,0 + A 2(6σ2 + A4]) + 4a3,1(A− 2a1,0)72 4 2 − (it) (it)3σ2(2a 2 22,0 − 4Aa1,0 + σ ) + a4,1 + (2a 24 1,0 − a2,0) . 2 From (3.15) and (3.16) we have, x62 2 − 15σ2x4 + 45σ4x2 − 15σ6R2(t) =(a3,1[+ 3σ a1,0) 72σ12 ] + A2(6σ2 + A4) + 4a3,1(A− 2a 2 21,0)− 3σ (2a2,0 − 4Aa1,0 + σ ) + a4,1 × (x 4 − 6σ2x2 + 3σ2) 2 − (x 2 − σ2) + (2a1,0 a2,0) ,24σ8 2σ4 x(15σ2 − 10σ2x2 + x6) P2(t) =(a3,1[+ 3σ 2a 21,0) 72σ10 ] + A2(6σ2 + A4) + 4a (A− 2a )− 3σ23,1 1,0 (2a 22,0 − 4Aa1,0 + σ ) + a4,1 × x(3σ 2 − x2) 2 − x+ (2a1,0 a2,0) .24σ6 2σ2 Remark 3.3.2. Once we have Rp for p ∈ N0 and Pp for p ∈ N, the polynomials Pp,g, Pp,d and Pp,a are given by Pp,g = Pp,d = Rp and Pp,a = Pp. These relations were obtained in the proofs in section 3.2. Also, one can compute Pp,l using (3.28 ∑ ∫ ): (−ix)j σ2t2 Pp,l(x) = t jA (t)e−l 2 dt. j! l+j=2p 77 For example, ∫ √ σ2t2 2π P −0,l(x) = A0(t)e 2 dt = . σ2 ∫ ∫ ∫ σ2t2 σ2 2 x2t σ2t2 P −1,l(x) = A2(t)e 2 dt − ix tA1(t)e− 2 dt− t2A −0(t)e 2 dt 2 P√1,l(x) =(a 2 2 53,1 + 3σ a1,0) 2π [ 24σ7 ] + A2(6σ2 + A4 1 ) + 4a3,1(A− 2a )(− 3σ2 21,0 (2a2,0 − 4Aa1,0 + σ ))+ a4,1 8σ5 2 − (2a21,0 − 1 a2,0) − (a3,1 − 2 1 2a1,0 x x 3σ a 6 1,0 ) + − 2σ σ5 σ3 2 2σ3 Higher order Edgeworth polynomials can be computed similarly. We can compare our results with the centered i.i.d. case. Then, we have that 1 A = 0, a1,0 = 0 because the sequence is stationary. Also, a3,1 = lim E([S − n→∞ nn nA]3) = E((X1−A)3), a2,0 = 0 and a4,1 = E(X41 ). So, the above polynomials reduce to, E(X3) E(X3) E(X3) A1(t) = 1 (it)3, R1(x) = 1 x(3σ2 − x2), P1(x) = 1 (σ2 − x2) 6 6σ6 6σ4 (it)6 4 A2(t) = E(X3 2 41 ) + (E(X1 )− 4 (it) 3σ ) 72 ( 24P 3 2 4 ) 3√0,l(x) 1 P√1,l(x) E(X1 ) 5 E(X1 ) − 3 1 − E(X1 ) x − 1 x2= , = + 2π σ 2π σ7 24 σ5 σ 8 σ5 2 σ3 2 These agree with the polynomials found in [20, Chapter XVI] (to see this one has to replace x by x/σ to make up for not normalizing by σ here) and [4]. The polynomials 1 Qk found in the latter are related to Pk,l by Qk(x) = Pk,l(x). 2π It is also easy to see that these agree with previous work on non-i.i.d. examples. In both [9, 29] only the first order Edgeworth polynomial is given explicitly. In [9], because the sequence is stationary and centered, we can take A = 0 and a1,0 = 0. 78 Also, the pressure P (t) given there, corresponds to log µ(t) here. 
So we recover 3 A (t) = P ′′′ (it) 1 (0) in [9, Theorem 3]. In [29], sequence is centered but not assumed 6 to be stationary. So A = 0 and a1,0 6= 0 and the asymptotic bias appears in the (it)3 expansion and A (t) = iµ(3)1 (0) − a1,0(it) which agrees with [29, Theorem 8.1]. 6 This dependence on initial distribution corresponds to presence of ` in (3.1). 3.4 Applications. 3.4.1 Local Limit Theorem. Existence of the Edgeworth expansion allows us to derive Local Limit Theo- rems (LLTs). For example see [16, Theorem 4]. Also, as direct consequences of weak global Edgeworth expansions, an LCLT comparable to the one given in [27, Chapter II], holds. In fact, a stronger version of LCLT holds true in special cases. To make the nota(tion)simpler, we assume that the asymptotic mean of SN isSN 0. That is A = lim E = 0. N→∞ N Proposition 3.4.1. Suppose that SN satisfies the weak global Edgeworth expansion of order 0 for an integrable function f ∈ (F , ‖·‖) where ‖·‖ is translation invariant. Further, assume that |xf(x)| is integrable. Then, √ ∫u2 NE(f(S − u)) = √ 1 e− 2Nσ2N f(x) dx+ o(1) (3.38) 2πσ2 uniformly for u ∈ R. √ Proof. After the change of variables z N → z in the RHS of the weak global 79 Edgeworth expansion, √ NE(f(∫SN −( u)) √z ) = ∫ [n ( )f(z − u)dz +(‖f‖o(N )1]) = √u ′ √zu( n )∫ + (z − u)n ∫ f(z − u()dz +)‖f‖o(1)N N = n √u − C zuf(z u) dz + (z − u)n √ f(z − u)dz + ‖f‖o(1) N N N Here zu is between u and z and depends continuously on u. Notice that, ∣∣∣ ∫ ( z ) ∣∣ ∫u(z − u)n √ f(z − u)dz∣ ≤ |(z − u)f(z − u)|dz ≤ ‖xf‖1 N Therefore, after a change of variables z − u→ z in the RHS, √ ( u )∫ NE(f(SN − u)) = n √ f(z)dz + max{‖xf‖1, ‖f‖} o(1) N as required. In particular, the result holds for F = F 10 . If the order 0 weak global Edge- worth expansion holds for all f ∈ F 10 , then we have the following corollary. We note that this is indeed the case for faster decaying |E(eitSN )| as in Markov chains and piecewise expanding maps described in sections 3.5.3.1, 3.5.3.2 and 3.5.4. Corollary 3.4.2. Suppose that SN admits the weak global Edgeworth expansion of order 0 for all f ∈ F 10 . Then, for all a < b, √ N ( ) u2 P SN ∈ 1 (u+ a, u+ b) = √ e− 2Nσ2 + o(1) (b− a) 2πσ2 uniformly in u ∈ R. 80 Proof. Fix a < b. It is elementary to see that there exists a sequence fk ∈ F 10 with compact support such that fk → 1(u+a,u+b) point-wise and fk’s are uniformly bounded in F 11 . This bound can be chosen uniformly in u, call it C. Therefore, from the proof of Proposition 3.4.1, we have, √ ( )∫ NE(fk(SN − u u)) = n √ fk(z)dz + C11(fk) o(1) N Because 0 ≤ C11(fk) ≤ C, taking the limit as k →∞ we conclude, √ ( ) ( u )∫ u+b NP SN ∈ (u+ a, u+ b) = n √ 1 dz + C o(1) N u+a and the result follows. In fact, u in the previous theorem need not be fixed. For example, for a uN sequence uN with √ → u, we have the following: N Corollary 3.4.3. Suppose that SN admits the weak global Edgeworth expansion of uN order 0 for all f ∈ F 10 . Let uN be a sequence such that lim √ = u. Then, for all N→∞ N a < b, √ N ( ) 1 u2 lim P SN ∈ (uN + a, uN + b) = √ e− 2σ2 . N→∞ (b− a) 2πσ2 Now, we state the stronger version of LCLT in which we allow intervals to shrink. Definition 8. Given a sequence N in R+ with N → 0 as N →∞, we say that SN admits an LCLT for N if we have, √ N ( ) 1 u2 P SN ∈ (u− N , u+  −N) = √ e 2Nσ2 + o(1) 2N 2πσ2 uniformly in u ∈ R. 81 The next proposition gives a existence of weak global Edgeworth expansions as a sufficient condition for SN to admit a LCLT for a sequence N . 
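Before stating it, we give a rough Monte Carlo illustration of Definition 8 in Python (a sketch only; the centred exponential summands, the window ε_N = N^{−1/4} and all names below are our hypothetical choices, not part of the text). The scaled probability of the shrinking window around u should approach the Gaussian density on the right-hand side of the definition as N grows.

import numpy as np

# Monte Carlo check of Definition 8 for the hypothetical i.i.d. example
# X_i = Exp(1) - 1 (so sigma^2 = 1), with shrinking windows eps_N = N^{-1/4}.
rng = np.random.default_rng(1)

def lclt_ratio(N, u, samples=2_000_000):
    eps = N ** -0.25
    S = rng.gamma(N, 1.0, size=samples) - N    # S_N for centred Exp(1) - 1 summands
    prob = np.mean(np.abs(S - u) < eps)        # P(S_N in (u - eps_N, u + eps_N))
    lhs = np.sqrt(N) * prob / (2 * eps)
    rhs = np.exp(-u ** 2 / (2 * N)) / np.sqrt(2 * np.pi)
    return lhs, rhs

for N in (100, 400, 1600):
    print(N, lclt_ratio(N, u=0.5 * np.sqrt(N)))
# The two numbers in each pair should approach one another as N grows.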
Notice that existence of higher order expansions allow N to decay faster. In case expansions of all orders exist, N can decay at any subexponential rate. Proposition 3.4.4. Suppose that SN satisfies the weak global Edgeworth expansion of order r (≥ 1) for all f ∈ F 10 . Let N be a sequence of positive real numbers such that N → 0 and NN r/2 →∞ as N → ∞. Then, SN admits an LCLT for N . Proof. WLOG assume N < 1 for all N . As in the previous proof, there exists a sequence fk ∈ F 10 with compact support such that fk → 1(u− ,u+ ) point-wise andN N fk’s are uniformly bounded in F 1 0 . This bound can be chosen uniformly in N and u, call it C. Let N ∈ N. Note that for all k, ∑r ∫1 ( √ ) ( ) E(fk(SN)) = p Pp,g(z)n(z)f 1 −(r+1)/2k z N dz + C0(fk) o N . N 2 p=0 By taking the limit as k →∞ and using the fact 0 ≤ C10(fk) ≤ C, we conclude, ( ) ∑r ∫ u√+N1 N ( ) P SN ∈ (u− N , u+ N) = p Pp,g(z)n(z) dz + C o N−(r+1)/2 . N 2 u−N p=0 √N z After a change of variables z → √ in the p = 0 term and divide the whole N equation by 2N to get, √ N ( ) P S∫N ∈ (u− N , u+ N)2N ( ) ∑r √ ∫ u1 √+N ( )N = 1J (z − z N 1 u)n √ dz + P N p p,g(z)n(z) dz + C o2N N 2 u− r/2p=1 NN 2 √ N NNN where JN = (−N , N). 82 Note that for p ≥ 1, there exists Cp such that |Pp,g(z)n(z)| < Cp. Therefore,∣∣∣ √ √∣ ∫ u√+N ∣∣ ∫ u√+NN N Cp N N Cpp Pp,g(z)n(z) dz∣∣ ≤ p 1 dz ≤ = o(1)2 u− u− p/2NN 2 √ N 2NN 2 √ N N N N Also, as in the proof of Proposition 3.4.1, ∫ ∫ 1 ( ) (− √z 1 √u ) u+N1J (z u)n dz = n 1 dz 2 NN N 2N N u−N ∫ C u+N ( ) + (z − zuu)n √ dz 2NN u−N N Note that, ∣∣∣∣ ∫ u+ ( ) ∣∣∣∣ ∫C N − √z C u+Nu(z u)n dz ≤ | − | CNz u dz =2NN u− N 2NN u− 2NN N Therefore, ∫ 1 (− z ) ( u )1J (z u)n √ dz = n √ + o(1). 2 NN N N Combining these estimates with  N r/2N →∞ we have that, √ N ( ) ( u ) P SN ∈ (u− N , u+ N) = n √ + o(1) 2N N and it is straightforward from the proof that this is uniform. Remark 3.4.1. We note that this result implies [16, Theorem 4] because existence of classical Edgeworth expansions imply the existence of the weak global Edgeworth expansion and this result is uniform in u. 3.4.2 Moderate Deviations. While the CLT describes the typical behaviour or ordinary deviations from the mean provided by the law of large numbers, it is not sufficient to understand prop- 83 erties of distribution of Xn completely. Therefore, the study of excessive deviations is important. For example, deviations of order n are called large deviations. An exponential moment condition is required for a large deviation√principle to hold, even for the i.i.d. case. However, when deviations are of order n log n (moderate deviations) this is not the case. We show here that a moderate deviation principle holds for SN under a weaker assumption than the exponential moment assumption. It is also worth noting that moderate deviations have numerous applications in areas like statistical physics and risk analysis. For example, moderate deviations are greatly involved in the computation of Bayes risk efficiency. See [44] for details. Proposition 3.4.5. Suppose SN admits the order r Edgeworth expansion. Then √ for all c ∈ (0, r), when 1 ≤ x ≤ cσ2(lnN, ) 1− P SN√−AN ≤ x N lim = 1. (3.39) N→∞ 1−N(x) Proof. Note that, [ ( − − − S )] ( ) P N1 N(x) 1 √− AN ≤ SN − ANx = P √ ≤ x −N(x) N ∑ Nr Pp(x) ( ) = n(x) + o N−r/2 Np/2 p=1 √ uniformly in x. So it is enough to show that for 1 ≤ x ≤ cσ2 lnN , Pp(x)n(x) N −r/2 lim = 0 and = o(1) N→∞ Np/2(1−N(x)) 1−N(x) Note that for x ≥ 1, 2 − σ n(x) ( O n(x) ) 1 N(x) = + . 
x x3 84 Thus, N−r/2 N−r/2 (√ −r/2 )≤ √ = O NlnN c 1−N(x) 1−N( cσ2 lnN) ( e−) lnN2 O lnN= N (r−c)/2 Say Pp(x) is of degree q. Then for some C and K, ∣∣∣ Pp(x)n(x) ∣∣∣ ≤ (xq +K)n(x) (xq +K) ( ( 1 ))C = C x 1 +O Np/2(1−N(x)) Np/2(1−N(x)) Np/2 x2 ≤ (lnN) q+1 C → 0 as N →∞. Np/2 This completes the proof of (3.39). Proposition 3.4.5 is a generalization of the results on moderate deviations found in [43] to the non-i.i.d. case along with improvements on the moment condi- tion. It should be noted that [4] contains an improvement of the moment condition for the i.i.d. case. But the proof we present here is different from the proof presented in [4]. As an immediate corollary to the above theorem, we can state the following first order asymptotic for probability of moderate deviations. Corollary 3.4.6. Assume SN admits the order r Edgeworth expansion. Then for all c ∈ (0, r), √ P(SN ≥ 1 1 AN + cσ2N lnN) ∼ √ √ . 2πc N c lnN 3.5 Examples Here we give several examples of systems satisfying assumptions (A1)–(A4). 85 3.5.1 Independent variables. Let Xn be i.i.d. with r + 2 moments. In this case we can take B = R, and define L v = E(eitX1t v) = φ(t)v where φ is the characteristic function of X1. Here we have taken ` = 1. Put v = 1. Then, the independence of the random variables gives us, Lnt 1 = E(eitSn) = φ(t)n. Also, the moment condition implies t → φ(t) is Cr+2. This means (A1) is satisfied. (A2) is clear. Suppose X1 is l−Diophantine. That is there exists C > 0 and t0 > 0 such that C − C for all |t| > t0, |φ(t)| < 1− . Then |φ(t)| ≤ e |t|l . So |φ(t)| < 1 for all t =6 0. So|t|l we have (A3). Also, this implies that X1 is non-lattice. An easy computation shows 1 that when r1 < , there exists r2 such that t < |t| < nr10 =⇒ |φ(t)|n ≤ n−r2 . In l fact, |φ(t)|n ≤ e−cnα 1where α = 1− r1l > 0. So, (A4) is satisfied with r1 < . l r − 1 When l = 0 we see that (A4) is satisfied with r1 > . Hence, by Theo- 2 rem 3.1.1 order r Edgeworth expansion for Sn exists. This is exactly the classical result of Cramér because the condition: lim sup |φ(t)| < 1 corresponds to l = 0. |t|→∞ r + 1 (r + 1)l Choose q > > . Then, by Theorem 3.1.4 and Theorem 3.1.5 we 2r1 2 have that Sn admits weak global expansion for f ∈ F q+20 and weak local expansion for f ∈ F q+2r+1 . These are similar to the results appearing in [4] but slightly weaker (r + 1)l because we require one more derivative: q + 2 > 2 + as opposed to 1 + 2 (r + 1)l . This is because we do not use the optimal conditions for the integrability 2 of the Fourier transform. If we required f ∈ F q+1 and f (q+1)r to be α−Hölder for small α, then the proof would still hold true and we could recover the results in [4]. 86 3.5.2 Finite state Markov chains. Here we present a non-trivial example for which the weak Edgeworth expan- sions exist but the strong expansion does not exist. Consider the Markov chain xn with states S = {1, . . . , d} whose transition probability matrix P = (pjk)d×d is positive. Then, by the Perron-Forbenius theorem, 1 is a simple eigenvalue of P and all other eigenvalues are strictly contained inside the unit disk. Suppose h = (hjk)d×d ∈ M(d,R) and that there does not exist constants c, r and a d−vector H such that rhjk = c+H(k)−H(j) mod 2π for all j, k. Put Xn = hxnxn+1 . For the family of operators L : Cd → Cd, ∑td (L f) = eithjkt j pjkfk, j = 1, . . . , d (3.40) k=1 v = 1 and ` = µ0, the initial distribution, we have (3.1). Define br,j,k = hrj + hjk for all j, r = 1, . . . , d and k = 2, . . . , d. Put d(s) = max {(br,j,k − br,1,k)s} where { . 
} denotes the fractional part. We further assume that h is β−Diophantine, that is, there exists K ∈ R such that for all |s| > 1, K d(s) ≥ . (3.41) |s|β 1 If β > then almost all h are β−Diophantine. d2(d− 1)− 1 2 Because Sn can take at most O(nd −1) distinct values, Sn has a maximal jump 2 of order at least n−(d −1). Therefore, the process Xhn = hxnxn−1 does not admit the order 2(d2 − 1) Edgeworth expansion. 87 The Perron-Forbenius theorem implies that the operator L0 satisfies (A2). Because (3.40) is a finite sum, it is clear that t 7→ Lt is analytic on R. So we also have (A1). Also the spectral radius of Lt is at most 1. Assume Lt has an eigenvalue on the unit circle, say eiλ, with eigenvector f , then, ∑d eiλf ithjkj = (Ltf)j = e pjkfk k=1 Assuming max |fj| = |fr|, j ∣∣∑d |fr| = |eiλf | = ∣∣ eithjkr pjkfk∣∣∣∣ ∑d ∑d≤ pjk|fk| =⇒ pjk(|fk| − |fr|) ≥ 0 k=1 k=1 k=1 Because |fk|− |fr| ≤ 0 for all k and pjk ≥ 0 for all j and k we have |fk| = |fr| for all k. Therefore, there exist a d−vector H such that f = ReiH(k)k for all k. Then, ∑d eiλReiH(j) = eithjkpjkRe iH(k) ∑k=1d 0 = p (ei(thjk+H(k)−H(j)−λ)jk − 1) k=1 =⇒ thjk = λ+H(j)−H(k) mod 2π But this is a contradiction. Therefore, (A3) holds. Next we notice that, |(L2f) | = ∣∣∣∣∑d ∑d ∣∣∣∣ ∣eit(hrj+hjk)t r prjpjkfk = ∣∣∣∑d (∑d ) ∣∣eit(hrj+hjk)prjp f ∣jk k∣ j=1 k=1 k=1(∑j=1d ∣∣∣∣∑d ∣∣)≤ ‖f‖ eitbr,j,kp ∣rjpjk∣ (3.42) k=1 j=1 Now we estimate |br,k(t)| where ∑d ∑d br,k(t) = e itbr,j,kp p = eitbr,1,k eit(br,j,k−br,1,k)rj jk prjpjk j=1 j=1 88 Then we have, ∑d ∑d |b (t)|2 = p2 2r,k rjpjk + 2 prjpjkprlplk cos((br,j,k − br,l,k)t) (j=∑1 ) j>ld 2 ∑d = prjpjk − 2 prjpjkprlplk[1− cos((br,j,k − br,l,k)t)] (∑j=1 j>ld )2 = p 2rjpjk − 2Cd(t) +O(d(t)3), C > 0 ∑j=1d |br,k(t)| = prjpjk − C̃d(t)2 +O(d(t)3), C̃ > 0 j=1 Therefore, ∑d ∣∣∣∣∑d ∣∣ ∑d (∑d )eitbr,j,kp p ∣rj jk∣ = prjpjk − Cd(t)2 +O(d(t)3) k=1 j=1 k=1 j=1 = 1− Cd(t)2 +O(d(t)3), C > 0 From the Diophantine condition (3.41), we can conclude that there exists θ > 0 such that for all |t| > 1, ( ) ‖L2t‖ ≤ 1− θd(t)2 =⇒ ‖LNt ‖ ≤ − 2 dN/2e 1 θd(t) ≤ e−θd(t)2N/2 ≤ −θt−2βe N/2 1−  1−  When 1 < |t| < N 2β , we have, ‖LN‖ ≤ e−θN /2t which gives us (A4) with r1 = 2β r + 1 where  > 0 can be made as small as required. Because for small , d e = 2(1− ) dr + 1e r + 1, choosing q > β, we conclude that for f ∈ F q+20 weak global and for2 2 f ∈ F q+2r+1 weak local Edgeworth expansions of order r for the process Xhn exist. Also, SN admits averaged Edgeworth expansions of order r for f ∈ F 20 . In the special 1 case of β > , these hold for a full measure set of h even though the d2(d− 1)− 1 order r strong expansion does not exist for r + 1 ≥ d2. 89 3.5.3 More general Markov chains. 3.5.3.1 Chains with smooth transition density. First we consider the case where xn is a time homogeneous Markov process on a compact connected manifold M with smooth transition density p(x, y) which is bounded away from 0, and Xn = h(xn−1, xn) for a piece-wise smooth function h :M×M→ R. We assume that h(x, y) can not be written in the form h(x, y) = H(y)−H(x) + c(x, y) (3.43) where c(x, y) is piece-wise constant. In particular, there is no constant c and a function H such that h(x, y) = H(y)−H(x)+c. Also, the transition probability P (x, dy) of Xn has a non-degenrate absolute continuous component. Then, by [25], the CLT holds with σ2 > 0. To check the assumption 3.43 we need the following: Lemma 3.5.1. (3.43) holds iff there exists o ∈ M such that the function x 7→ h(o, x) + h(x, y) is piece-wise constant for each y. Proof. 
If (3.43) holds then for each o ∈M h(o, x) + h(x, y) = c(o, x) + c(x, y) +H(y)−H(o) where c(o, x) + c(x, y) is piece-wise constant in x for each y. Conversely, suppose for some o ∈ M, x 7→ h(o, x) + h(x, y) is piece-wise constant for each y. Fix y. Let c = h(o, o) and H(x) = h(o, x) − h(o, o). Then, h(o, o) + h(o, y) and h(o, x) + h(x, y) differ by a piece-wise constant function. Then 90 (3.43) holds because h(o, x)+h(x, y)−(h(o, o)+h(o, y)) = h(x, y)+H(x)−H(y)−c is piecewise constant. Let B = L∞(M) and consider∫the family of integral operators, (L u)(x) = p(x, y)eith(x,y)t u(y) dy. Let µ be the initial distribution of the Markov chain and {Fn} be the filtration adapted to the processes. Then, using the Markov property, E [eitSnµ ] = Eµ[eitSn−1Lt1]. By induction we can conclude ∫ Eµ(eitSn) = Lnt 1 dµ Because h is bounded, expanding eith(x,y) as a power series in t, we see that t 7→ Lt is analytic for all t. This shows that (A1) is statisfied. From the Weierstrass theorem there exist fun∑ctions qk, rk on M such thatn p(x, y) is a uniform limit of functions of the form qk(x)rk(y). Therefore, Lt k=1 is a uniform limit of finite rank operators and is compact. Compact operators have a point spectrum hence the essential spectral radius of Lt vanishes. It is also immediate that ‖Lt‖ ≤ 1 for all t. Hence the spectrum is contained in the closed unit disk. In addition, L : L∞(M)→ L∞0 (M∫ ) given by (L0u)(x) = p(x, y)u(y) dy is a positive operator. Note that (L01)(x) = 1 for all x. Thus, 1 is an eigenvalue of L0 with eigenfunction 1. Also, eigenvalue 1 is simple and all other eigenvalues 91 β are such that |β| < 1. This follows from a direct application of Birkhoff Theory (see [2]). Thus, we have (A2). Next we show that if β ∈ sp(Lt), t =6 0 then |β| < 1. If not, then there exists λ and u ∈ L∞(M) such that ∫ p(x, y)eith(x,y)u(y) dy = eiλu(x) Suppose sup |u(x)| = R then for each  > 0 there exists x such that x ∣∣∫ ∣∣ ∫ R−  ≤ |u(x )| = |eiλu(x )| = ∣∣ p(x, y)eith(x,y)u(y) dy∣  ∣ ≤ p(x, y)|u(y)| dy Therefore, ∫ p(x, y)[|u(y)| −R] dy ≥ −, But |u(y)| − R ≤ 0. Hence, |u(y)| = R a.e. Therefore, u(y) = Reiθ(y) a.e. for some function θ and we may assume θ ∈ [0, 2π). ∫ p(x, y)eith(x,y)Reiθ(y) dy = Reiλeiθ(x)∫ =⇒ p(x, y)[ei(th(x,y)−λ+θ(y)−θ(x)) − 1] dy = 0 =⇒ th(x, y)− λ+ θ(y)− θ(x) ≡ 0 mod 2π (3.44) Fix y and t. Then, for all z, x 7→ h(y, x) + h(x, z) does not depend on x modulo 2π i.e. it is piece-wise constant for all t =6 0. By Lemma 3.5.1, h(x, y) satisfies (3.43). This contradiction proves (A3). Recall that if K is integral operator ∫ (Ku)(x) = k(x, y)u(y)dy 92 then ∫ ‖K‖ = sup |k(x, y)|dy. x In our case L2t has the kernel, ∫ lt(x, y) = e it[h(x,z)+h(z,y)]p(x, z)p(z, y)dz. By Lemma 3.5.1 for each x and y the function z →7 (h(x, z)+h(z, y)) is not piecewise constant. So its derivative (whenever it exists) is not identically 0. Thus there is an open set Vx,y and a vector field e such that ∂e[h(x, z) + h(z, y)] 6= 0 on Vx,y. Integrating by parts in the direction of e we conclude that ∫ lim eit[h(x,z)+h(z,y)]p(x, z)p(z, y)dz = 0. t→∞ Vx,y By compactness there are constants r0, ε0 such that for |t| ≥ r0 and all x and y in M, |lt(x, y)| ≤ l0(x, y)− ε0. It follows that ∫ ∫ ‖L2t‖ = sup |lt(x, y)|dy ≤ l0(x, y)dy − ε0. (3.45) x M M The first term here equals ∫∫ p(x, z)p(z, y)dzdy = 1. M×M Hence for |t| ≥ r , ‖L20 t‖ ≤ 1 − ε0 and so ||LN || ≤ (1 − ε )dN/2et 0 . This proves (A4) with no restriction on r1. Therefore, SN admits Edgeworth expansions of all orders. 
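Before turning to the remaining cases, the following Python sketch gives an informal numerical check of (A2) and (A3) for this class of chains (an illustration only; the circle M = [0, 1), the density p, the observable h and all names are our hypothetical choices). Since the chosen h has a non-vanishing mixed partial derivative, it is not of the form (3.43), and the discretised operator L_t should have spectral radius 1 at t = 0 and spectral radius strictly below 1 for t ≠ 0.

import numpy as np

# Discretisation of (L_t u)(x) = int p(x,y) e^{i t h(x,y)} u(y) dy on the circle.
m = 400
x = np.arange(m) / m                             # equispaced grid on [0, 1)
X, Y = np.meshgrid(x, x, indexing="ij")
p = 1.0 + 0.5 * np.cos(2 * np.pi * (X - Y))      # smooth density, bounded below by 1/2
h = np.cos(2 * np.pi * (X + Y))                  # observable h(x, y), not a coboundary

def L(t):
    # Matrix acting on grid values of u; the factor 1/m is the quadrature weight.
    return p * np.exp(1j * t * h) / m

for t in (0.0, 0.5, 2.0, 10.0):
    rho = np.max(np.abs(np.linalg.eigvals(L(t))))
    print(f"t = {t:5.1f}   spectral radius ~ {rho:.4f}")
# Expected pattern: rho = 1 at t = 0 (simple top eigenvalue with eigenfunction 1)
# and rho < 1 once t != 0, consistent with (A2) and (A3).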
Next we look at the case when (3.43) fails but the constants are not lattice valued. Then the arguments for (A1), (A2) and (A3) hold. In particular, (3.44) cannot hold since it implies that

h(x, y) + (θ(y) − θ(x))/t ∈ λ/t + (2π/t)Z.

However, we have to impose a Diophantine condition on the values that h(x, y) can take in order to obtain sufficient control over ‖L_t^N‖ and obtain (A4). For fixed x, y let the range of z ↦ h(x, z) + h(z, y) be S = {c_1, . . . , c_d}. Note that these c_i's may depend on x and y. However, there can be at most finitely many values that h(x, z) + h(z, y) can take as x and y vary on M because h is piece-wise smooth. So we might as well assume that S is this complete set of values. Also, take U_k to be the open set on which z ↦ h(x, z) + h(z, y) takes the value c_k. Take b_k = c_k − c_1 and define d(s) = max_k {b_k s}, where {·} denotes the fractional part as before. Assume further that there exists K > 0 such that for all |s| > 1,

d(s) ≥ K/|s|^β.

If β > (d − 1)^{−1}, then the above holds for almost all d-tuples c = (c_1, . . . , c_d). Note that

|L_t² u(x)| = |∫ [∫ e^{it[h(x,z)+h(z,y)]} p(x, z) p(z, y) dz] u(y) dy|
           ≤ ‖u‖ ∫ |∑_{k=1}^{d} e^{itc_k} ∫_{U_k} p(x, z) p(z, y) dz| dy
           = ‖u‖ ∫ |∑_{k=1}^{d} p_k e^{itb_k}| dy

where p_k = ∫_{U_k} p(x, z) p(z, y) dz. Therefore p_1 + · · · + p_d = p(x, y). Now the situation is similar to that of (3.42) and a similar calculation yields

|∑_{k=1}^{d} p_k e^{itb_k}| = p(x, y) − C d(t)² + O(d(t)³),  C > 0.

Therefore,

‖L_t²‖ ≤ ∫ [p(x, y) − C d(t)² + O(d(t)³)] dy = 1 − C̃ d(t)².

From this we can repeat the analysis done in the finite state Markov chain example following (3.42). In particular, when 1 < |t| < N^{(1−ε)/(2β)}, there exists θ > 0 such that ‖L_t^N‖ ≤ e^{−θN^ε}, which gives us (A4).

Finally, when (3.43) fails and h takes integer values with span 1, X_n is a lattice random variable and we can discuss the existence of the lattice Edgeworth expansion. In this case S_N admits the lattice expansion of all orders. To this end, only the condition (Ã3) needs to be checked. First note that L_0 = L_{2πk} for all k ∈ Z. Also, assuming L_t has an eigenvalue on the unit circle, we conclude (3.44):

t h(x, y) − λ + θ(y) − θ(x) ≡ 0 mod 2π.

This implies t(h(x, y) + h(y, x)) ∈ 2πZ + 2λ. Note that the LHS belongs to a lattice with span t while the RHS is a lattice with span 2π. Because t is not a multiple of 2π this equality cannot happen. Therefore, when t ∉ 2πZ, sp(L_t) ⊂ {|z| < 1} and we have the claim.

3.5.3.2 Chains without densities.

We consider a more general case where the transition probabilities may not have a density. We claim we can recover (A1)–(A4) if the transition operator takes the form L_0 = aJ_0 + (1 − a)K_0 where a ∈ (0, 1) and J_0 and K_0 are Markov operators on L^∞(M) (i.e. J_0 f ≥ 0 if f ≥ 0 and J_0 1 = 1, and similarly for K_0),

J_0 f(x) = ∫ p(x, y) f(y) dµ(y)

and

K_0 f(x) = ∫ f(y) Q(x, dy)

where p is a smooth transition density and Q is a transition probability measure. Let h(x, y) be piece-wise smooth and put

J_t(f) = J_0(e^{ith} f)  and  K_t(f) = K_0(e^{ith} f).

Defining L_t = aJ_t + (1 − a)K_t, we can conclude that t ↦ L_t is analytic and that

E_µ(e^{itS_n}) = ∫ L_t^n 1 dµ.

Now we show that conditions (A2), (A3) and (A4) are satisfied. Because ‖J_t‖ ≤ 1 and ‖K_t‖ ≤ 1 we have ‖L_t‖ ≤ 1. Thus the spectral radius of L_t is at most 1. Because aJ_t is compact, L_t and (1 − a)K_t have the same essential spectrum; see [33, Theorem IV.5.35]. However, the spectral radius of the latter is at most 1 − a. Hence, the essential spectral radius of L_t is at most 1 − a.
Because both J0 and K0 are Markov operators we can conclude that 1 is an eigenvalue of L0 with constant function 1 as the corresponding eigenfunction. From the previous paragraph the essential spectral radius of L0 is at most (1−a). Because Ln is norm bounded it cannot have Jordan blocks. So 1 is semisimple. Suppose, Ltu = eiθu. Without loss of generality we may assume ‖u‖∞ = 1. Assuming there exists a positive measure set Ω with |u(x)| < 1− δ we can conclude that, for all x, |u(x)| = |Ltu(x)| = |a∫Jtu(x) + (1− a)Ktu(x)| ∫ ≤ a |u(y)|p(x, y)dµ(y) + a |u(y)|p(x, y)dµ(y) + (1− a) Ω Ωc 96 ≤ 1− aδµ(Ω). This is a contradiction. Therefore, |u(x)| = 1. Put u(x) = eiγ(x). Then, ∫ 1 = a ei(th(x,y)+γ(y)−γ(x)−θ)p(x, y)dµ(y) + (1− a)e−i(θ+γ(x))Ktu ∫ Hence, ei(th(x,y)+γ(y)−γ(x)−θ)p(x, y)dµ(y) = 1 =⇒ Jtu = eiθu. From section 3.5.3.1, this can only be true when t = 0 and in this case θ = 0 and u ≡ 1. This concludes that Lt, t 6= 0 has no eigenvalues on the unit disk and the only eigenvalue of L0 on the unit disk is 1 and its geometric multiplicity is 1. As 1 is semisimple, it is simple as required. This concludes proof of (A2) and (A3). From the previous case, there exists r > 0 and  ∈ (0, 1) such that such that for all |t| > r we have ‖J 2t ‖ ≤ 1− . From this we have, ‖L2t‖ = ‖a2J 2t + a(1− a)JtKt + (1− a)aKtJt + (1− a)2K2t ‖ ≤ 1− a2. Hence, for all |t| > r, for all N , ‖LNt ‖ ≤ (1 − a2)bN/2c which gives us (A4) with no restrictions on r1. Therefore, SN admits Edgeworth expansions of all orders as before. As in the previous section, an analysis can be carried out when (3.43) fails. The conclusions are exactly the same. 3.5.4 One dimensional piecewise expanding maps. Here we check assumptions (3.1), (A1)–(A4) for piecewise expanding maps of the interval using the results of [5, 37]. 97 Let f : [0, 1]→ [0, 1] be such that there is a finite partition A0 of [0, 1] (except possibly a measure 0 set) into open intervals such that for all I ∈ A0, f |I extends to a C2 map on an interval containing I. In other words f is a piece-wise C2 map. F∨urther, assume that f ′ ≥ λ > 1 i.e. f is uniformly expanding. Next, let n A = T−jn A0 and suppose for each n there is Nn such that for all I ∈ An, k=0 fNnI = [0, 1]. Such maps are called covering. Statistical properties of piece-wise C2 covering expanding maps of an interval, are well-understood. For example, see [37]. In particular, such a function f has a unique absolutely continuous invariant measure with a strictly positive density h ∈ BV[0, 1] and the associated transfer operator ∑ L ϕ(y)0ϕ(x) = f ′(y) y∈f−1(x) has a spectral gap. Let g be C2 except possibly at finite number of points and admitting a C2 extension on each interval of smoothness. Define Xn = g ◦ fn and consider it as a random variable with x distributed according to some measure ρ(x)dx, ρ ∈ BV[0, 1]. Define a family of operators Lt : BV[0, 1]→ BV[0, 1] by ∑ itg(y) L etϕ(x) = ′ ϕ(y)f (y) y∈f−1(x) where t = 0 corresponds to the transfer operator. Because g is bounded, writing eitg(y) as a power series we can conclude t → Lt is analytic for all t. This gives (A1). 98 (A2) follows from the fact that L0 has a spectral gap. We further assume that g is not cohomologous to a piece-wise constant function. (3.46) In particular, g is not a BV coboundary. The assumption (3.46) is reasonable. Indeed, suppose that g is piece-wise constant taking values c1, c2 . . . ck. 
Then Sn takes less than n k−1 distinct values so the maximal jump is of order at least n−(k−1) so Sn can not admit Edgeworth expansion of order (2k − 2) in contrast to the case where (3.46) holds as we shall see below. A direct computation gives, ∫ √ 1 E(eitSn/ n) = Ln √t/ nρ(x) dx. 0 Therefore, there exists A such that, E it Sn√−nA 2 2 lim (e n ) = e−t σ /2 (3.47) n→∞ where σ2 ≥ 0. It is well know that σ2 > 0 ⇐⇒ g is a BV coboundary (see [24]). From (3.47) it is clear that Sn satisfies the CLT. To show (A3) holds, we first normalize the family of operators, ∑ eitg(y)L h(y)tv(x) = ′ v(y)f (y)h ◦ f(y) f(y)=x Then, L −1t = H ◦Lt ◦H where H is multiplication by the function h. Therefore, Lt and Lt have the same spectrum. However, the eigenfunction corresponding to the eigenvalue 1 of L0 changes to the constant function 1. 99 Assume eiθ is an eigenvalue of Lt. Then, there exists u ∈ BV[0, 1] with L iθtu(x) = e u(x). Observe that, ∑ L | | |u(y)|h(y)0 u (x) = ∣ f ′(y)h ◦ f(y)f∣(y∑)=x ≥ ∣∣ ∣eitg(y)u(y)h(y) ∣∣ iθ′ = |Ltu(x)| = |e u(x)| = |u(x)|f (y)h ◦ f(y) ∣ f(y)=x n Also note that, L0 is a positive operator. Hence, L0 |u|(x) ≥ |u(x)| for all n. How- ever, ∫ n lim (L0 |u|)(x) = |u(y)| · 1 dy n→∞ because 1 is the eigenfunction co∫rresponding to the top eigenvalue. So for all x, |u(y)| dy ≥ |u(x)| This implies that |u(x)| is constant. WLOG |u(x)| ≡ 1. So we can write u(x) = eiγ(x). Then, ∑ L h(y)u(x) = ei(tg(y)+γ(y)) = ei(θ+γ(x))t f ′(y)h ◦ f(y) ∑f(y)=x ⇒ h(y)= ei(tg(y)+γ(y)−γ(f(y))−θ) f ′ = 1 (y)h ◦ f(y) f(y)=x for all x. Since, ∑ L h(y)01 = f ′ = 1 (y)h ◦ f(y) f(y)=x and ei(tg(y)+γ(y)−γ(x)−θ) are unit vectors, it follows that tg(y) + γ(y)− γ(f(y))− θ = 0 mod 2π (3.48) for all y. Because g is not cohomologous to a piecewise constant function we have a contradiction. Therefore, Lt and hence Lt does not have an eigenvalue on the unit circle when t 6= 0. 100 To complete the proof of (A3) one has to show that the spectral radius of Lt is at most 1 and that the essential spectral radius of Lt is strictly less than 1. This is clear from Lasota-Yorke type inequality in [5, Lemma 1]. In fact, there is a uniform κ ∈ (0, 1) such that ress(Lt) ≤ κ for all t. Next, we describe in detail how the estimate in [5, Proposition 1] gives us (A4). To make the notation easier we assume t > 0 and we replace |t| by t. [5, Proposition 1] implies that there exist c and C such that if K1 large enough (we fix one such K1) then for all t > K1, ‖Ldc ln tet u‖ ≤ e−Cdc ln tet ‖u‖t (3.49) where ‖h‖t = (1 + t)−1‖h‖BV + ‖h‖L1 . Therefore, ‖Lkdc ln teu‖ ≤ e−Cdc ln te‖L(k−1)dc ln tet t u‖t ≤ · · · ≤ e−Ckdc ln te‖u‖t Also, ‖Lt‖t ≤ 1. So, if n = kdc ln te+ r where 0 ≤ r < dc ln te then kdc ln te ‖Lnu‖ ≤ e−Ckdc ln te r k ‖L u‖ ≤ e−Cn kdc ln te+rt t t t ‖u‖ −Cn t ≤ e k+1‖u‖t However, (1 + t)−1‖h‖BV ≤ ‖h‖t ≤ [1 + (1 + t)−1]‖h‖BV Therefore, (1 + t)−1‖Lnu‖ ≤ [1 + (1 + t)−1]e−Cn k BV k+1t ‖u‖BV which gives us ‖Ln k t ‖BV ≤ (t+ 2)e −Cn k+1 101 n and here k = k(n, t) = b c. When K ≤ |t| ≤ nr n11 , kmin = b c anddc ln te dc lnnr1e kmin → k1 as n→∞. Also, 1 ≥ ≥ kmin and, kmin + 1 k + 1 kmin + 1 k(n,t) k n −Cn min‖L ‖ ≤ (t+ 2)e−Cn k(n,t)+1 r1 k +1t BV ≤ 2n e min kmin 1 Choosing n0 such that for all n > n0, > (so this choice of n0 works for kmin + 1 2 all t) we can conclude that, ‖Ln‖ ≤ 2nr1e−Cn/2t BV r − 1 This proves (A4) for all choices of r1. In particular given r, we can choose r1 > 2 in the above proof. This implies that Edgeworth expansions of all orders exist. 3.5.5 Multidimensional expanding maps. 
LetM be a compact Riemannian manifold and f :M→M be a C2 expand- ing map. Let g : M → R be a C2 function which is non homologous to constant. The proof of Lemma 3.13 in [13] shows that this condition is equivalent to g not being infinitesimally integrable in the following sense. The natural extension of f acts on the space of pairs ({yn}n∈N, x) where f(yn+1) = yn for n > 0 and fy1 = x. Given such pair let [∑ ] [ ]n−1 ∑n ∑∞ Γ({y } ∂, x) = lim g(fk ∂ ∂n yn) = lim g(yk) = g(yk). n→∞ ∂x n→∞ ∂x ∂x k=0 k=1 k=1 g is called infinitesimally integrable if Γ({yn}, x) actually depends only on x but not on {yn}. Let Xn = g ◦fn. We want to verify (A1)–(A4) when x is distributed according to a smooth density ρ. Note that assumption (3.1) holds with v = ρ, ` being the 102 Lebesgue measure and ∑ eitg(y) (Ltφ)(x) = ∣∣ ( )det ∂f ∣∣φ(y). y∈f−1(x) ∂x We will check (A1)–(A4) for L acting on C1t (M). The proof of (A1)–(A3) is the same as in section 3.5.4. In particular, for (A3) we need Lasota–Yorke inequality (see (3.52) below) which is proven in [13, equation (19)]. The proof of (A4) is also similar to section 3.5.4, so we just explain the differ- ences. As before we assume that t >(0. Given a small c)onstant κ let κ‖Dφ‖C0‖φ‖t = max ‖φ‖C0 , . 1 + t Then by [13, Proposition 3.16] ‖Lnt φ‖t ≤ ‖φ‖t (3.50) provided that n ≥ C1 ln t. By [13, Lemma 3.18] if g is not infinitesimally integrable then there exists a constant η < 1 such that ‖Lnt φ‖L1 ≤ ηn‖φ‖t. (3.51) The Lasota–Yorke inequality says that there is a constant θ < 1, such that ‖D (Lnφ)‖ ≤ C (t‖φ‖ + θnt C0 3 C0 ‖Dφ‖C0) (3.52) Also, ‖Lnt φ‖C0 ≤ ‖L n 0 (|φ|)‖C0 ≤ C4 (‖ |φ| ‖L1 + θn‖ |φ| ‖Lip) (3.53) where the last step relies on L0 having a spectral gap on the space of Lipshitz functions. Combing (3.50) through (3.53), we conclude that Lt satisfies (3.49). The rest of the argument is the same as in section 3.5.4. 103 Chapter 4: Large Deviation Principles. 4.1 Asymptotics for Cramér’s Theorem. In this section, we focus on sequences of i.i.d. random variables. First, we prove the existence of weak asymptic expansions for Cramér’s LDP – Theorem 1.2. Next, we deduce existence of the strong expansion in special cases. As expected, a stronger assumption on the regularity of the law of the random variables is required for the second step. 4.1.1 Weak asymptotic expansions. We recall that a random variable X is called l−Diophantine if there exist C positive constants t0 and C such that |E(eitX)| < 1 − for |t| > t0. It is known|t|l that when X is l−Diophantine and r+2 moments exist weak Edgeworth expansions exist. For example, see [4] and Section 3.5.1. Given a random variable X with distribution function F , we define YX,γ to be a random variable with distribution function Gγ given by, yγ γ e dF (y)dG (y) = (4.1) µ(γ) 104 ∫ where µ(γ) = eyγdF (y). Therefore, ∫∫yeyγdF (y)E[YX,γ] = . (4.2) eyγdF (y) In Section 3.1 we defined the functio(n spaces Fm mk : f ∈ Fk if f is m) times continuously differentiable and Cmk (f) = max ‖f (j)‖L1 + max ‖xjf‖L1 < ∞. 0≤j≤m 0≤j≤k We call a function f , (left) exponential of order α, if lim |e−αxf(x)| = 0. Denote x→−∞ by F km,α the collection of all f ∈ F km with f (k) is exponential of order α. We note that due to assumption f ∈ F k , f (k)m being exponential of order α is enough to guarantee that f (l) is exponential of order α for all 0 ≤ l ≤ k. To see this suppose f, f ′ ∈ L1. Then, lim f(x) = 0. Suppose f ′ is exponential of order |x|→∞ α. 
In Section 3.1 we defined the function spaces $\mathcal F^m_k$: $f\in\mathcal F^m_k$ if $f$ is $m$ times continuously differentiable and
\[ C^m_k(f)=\max_{0\le j\le m}\|f^{(j)}\|_{L^1}+\max_{0\le j\le k}\|x^jf\|_{L^1}<\infty. \]
We call a function $f$ (left) exponential of order $\alpha$ if $\lim_{x\to-\infty}|e^{-\alpha x}f(x)|=0$. Denote by $\mathcal F^m_{k,\alpha}$ the collection of all $f\in\mathcal F^m_k$ such that $f^{(m)}$ is exponential of order $\alpha$. We note that, due to the assumption $f\in\mathcal F^m_k$, $f^{(m)}$ being exponential of order $\alpha$ is enough to guarantee that $f^{(l)}$ is exponential of order $\alpha$ for all $0\le l\le m$. To see this, suppose $f,f'\in L^1$. Then $\lim_{|x|\to\infty}f(x)=0$. Suppose $f'$ is exponential of order $\alpha$. Then, given $\varepsilon>0$ there is $M>0$ such that for $x<-M$, $-\varepsilon e^{\alpha x}<f'(x)<\varepsilon e^{\alpha x}$. So,
\[ -\varepsilon\int_{-\infty}^xe^{\alpha y}\,dy\le\int_{-\infty}^xf'(y)\,dy\le\varepsilon\int_{-\infty}^xe^{\alpha y}\,dy\implies-\frac{\varepsilon}{\alpha}e^{\alpha x}\le f(x)\le\frac{\varepsilon}{\alpha}e^{\alpha x} \]
for $x<-M$. So $f$ is also of exponential order $\alpha$. Since $f^{(l)}\in L^1$ for all $0\le l\le m$, we can repeat the same argument starting from $m$ and conclude that all lower order derivatives are of exponential order $\alpha$.

It is clear that $\mathcal F^m_{k,\alpha}\subset\mathcal F^m_{k,\beta}$ if $\alpha>\beta$. Finally, define $\mathcal F^m_{k,\infty}=\bigcap_{\alpha>0}\mathcal F^m_{k,\alpha}$. This intersection is non-empty. For example, the family of Gaussian functions and $C^\infty_c(\mathbb R)$ are in $\mathcal F^m_{k,\alpha}$ for all $\alpha>0$.

Recall from Chapter 1 that for a function $f:\mathbb R\to(-\infty,\infty]$ with $f\not\equiv\infty$, $D_f=\{x\in\mathbb R\,|\,f(x)<\infty\}$ and $f^*(x)=\sup_{t\in\mathbb R}[tx-f(t)]$. If $f$ is convex, lower semi-continuous with $\mathring D_f=(a,b)$ and $f\in C^2(a,b)$ with $f''>0$ on $(a,b)$, then $\mathring D_{f^*}=(A,B)$ where $A=\lim_{t\to a^+}f'(t)$ and $B=\lim_{t\to b^-}f'(t)$, and $f^*$ is continuously differentiable on $(A,B)$. For any $f$ satisfying the above properties and any $x\in\mathring D_{f^*}$, the supremum in the definition of $f^*(x)$ is achieved at a unique point. $f$ is called steep if $\lim_{t\to a^+}|f'(t)|=\lim_{t\to b^-}|f'(t)|=\infty$.

Theorem 4.1.1. Let $X$ be a non-constant, real-valued, centred random variable. Assume that the logarithmic moment generating function $h(\theta)=\log E(e^{\theta X})$ is finite on a neighbourhood of $0$. Further assume that there is $l\in\mathbb N$ such that for all $\theta\in\mathring D_h$, $Y_{X,\theta}$ is $l$–Diophantine. Let $X_n$ be a sequence of i.i.d. copies of $X$. Let $r\in\mathbb N$ and $a\in(0,\sup(\mathrm{supp}\,X))$. Let $\theta_a$ be the unique $\theta$ such that
\[ I(a)=\sup_{\theta\in\mathring D_h}\Big(a\theta-\log\int e^{y\theta}dF(y)\Big)=a\theta_a-\log\int e^{y\theta_a}dF(y). \]
Take $q>\frac{l(r+2)}{2}+1$ and $\alpha>\theta_a$. Then, for every $f\in\mathcal F^q_{r+1,\alpha}$ we have
\[ E(f(S_N-aN))e^{I(a)N}=\sum_{p=0}^{\lfloor r/2\rfloor}\frac{1}{N^{p+\frac12}}\int P_p(z)f_{\theta_a}(z)\,dz+C^q_{r+1}(f_{\theta_a})\cdot o_{r,\theta_a}\Big(\frac{1}{N^{\frac{r+1}{2}}}\Big) \qquad (4.3) \]
where $f_\theta(x)=\frac{1}{2\pi}e^{-\theta x}f(x)$ and the $P_p(z)$ are polynomials depending on $a$.

Proof. Assuming $F$ to be the distribution function of $X$, we can define $Y_{X,\gamma}$ by (4.1). Let the $Y_i$ be i.i.d. copies of $Y_{X,\gamma}$ and take $\tilde S_N=Y_1+\dots+Y_N$. A simple computation gives us
\[ dG^\gamma_N(y)=\frac{e^{y\gamma}\,dF_N(y)}{\mu(\gamma)^N} \]
where $F_N$ is the distribution function of $S_N$ and $G^\gamma_N$ is the distribution function of $\tilde S_N$. Now, we formally compute
\[ E(f(S_N-aN))e^{a\gamma N}=E(e^{a\gamma N}f(S_N-aN))=E\big(e^{\gamma S_N}\,2\pi f_\gamma(S_N-aN)\big)=\int e^{\gamma y}\,2\pi f_\gamma(y-aN)\,dF_N(y) \]
\[ =\mu(\gamma)^N\int2\pi f_\gamma(y-aN)\,dG^\gamma_N(y)=\mu(\gamma)^NE^\gamma\big(2\pi f_\gamma(\tilde S_N-aN)\big) \]
where $f_\gamma(s)=\frac{1}{2\pi}e^{-s\gamma}f(s)$. Hence,
\[ E(f(S_N-aN))e^{(a\gamma-\log\mu(\gamma))N}=E^\gamma\big(2\pi f_\gamma(\tilde S_N-aN)\big). \qquad (4.4) \]
Put $\gamma=\theta_a$. Then $Y_{X,\gamma}$ has mean $a$ (see [17, Chapter 2]). Since $f\in\mathcal F^q_{r+1,\alpha}$ with $\theta_a<\alpha$, we have $f_{\theta_a}\in\mathcal F^q_{r+1}$. We prove this when $r=0$ and $q=1$. The argument for general $q$ and $r$ is similar. Suppose $f(x),f'(x),xf(x)\in L^1$ and $f'(x)$ is continuous. It is immediate that $(e^{-\theta_ax}f(x))'=-\theta_ae^{-\theta_ax}f(x)+e^{-\theta_ax}f'(x)$ is continuous. We need to show $e^{-\theta_ax}f(x),\ (e^{-\theta_ax}f(x))',\ xe^{-\theta_ax}f(x)\in L^1$. Since $f$ and $f'$ are of exponential order, it is enough to show that $e^{-\theta_ax}g(x),\ xe^{-\theta_ax}g(x)\in L^1$ whenever $g\in L^1$ is exponential of order $\alpha\,(>\theta_a)$. This is true because there is $M>0$ such that for $x<-M$, $|e^{-\theta_ax}g(x)|<e^{(\alpha-\theta_a)x}$ and $|xe^{-\theta_ax}g(x)|<-xe^{(\alpha-\theta_a)x}$.

Therefore, from [4], the RHS of (4.4) admits the weak Edgeworth expansion whose coefficients are determined by the moments of $Y_{X,\theta_a}$. Therefore, we have that for all functions $f\in\mathcal F^q_{r+1,\alpha}$,
\[ E(f(S_N-aN))e^{I(a)N}=\sum_{p=0}^{\lfloor r/2\rfloor}\frac{1}{N^{p+\frac12}}\int P_{p,l}(z)f_{\theta_a}(z)\,dz+C^q_{r+1}(f_{\theta_a})\cdot o\Big(\frac{1}{N^{\frac{r+1}{2}}}\Big). \]
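As a quick illustration of the quantities appearing in the theorem (an example not worked out in the text), let $X=\mathcal E-1$ where $\mathcal E$ is standard exponential. Then $h(\theta)=-\theta-\log(1-\theta)$ for $\theta<1$, and solving $h'(\theta_a)=a$ gives
\[ \theta_a=\frac{a}{1+a},\qquad I(a)=a\theta_a-h(\theta_a)=a-\log(1+a),\qquad a>0. \]
Moreover, $Y_{X,\theta}$ has a density for every $\theta<1$, hence is $0$–Diophantine, and $\sup(\mathrm{supp}\,X)=\infty$, so the expansion (4.3) is available for every $a>0$ in this example.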
Remark 4.1.1.

1. The assumption of $X$ being centred is just to simplify the notation. One can easily reformulate the results for non-centred $X$ using the corresponding results for $X-E(X)$. Therefore, from now on we discuss results for centred random variables only.

2. A similar result holds for $a\in(\inf(\mathrm{supp}\,X),0)$. In fact, one can deduce the corresponding results for $a<0$ by considering $-X$ and $(-a)>0$. But, for simplicity, we focus only on $a>0$ hereafter.

3. Note that the requirement to expand $E^\gamma(f_{\theta_a}(\tilde S_N-aN))$ is $f_{\theta_a}\in\mathcal F^q_{r+1}$, which is indeed the case when $f\in\mathcal F^q_{r+1,\alpha}$ for some $\alpha>\theta_a$. In particular, this result holds for $f\in C^q_c(\mathbb R)$.

4. In addition, if $h(\theta)$ is steep then $\sup(\mathrm{supp}\,X)=\infty$ (see [30, Chapter 1]) and the expansion holds for all $a>0$.

We note that for a large class of random variables $X$, $Y_{X,\theta}$ is $l$–Diophantine. For example, if $X$ is $0$–Diophantine then so is $Y_{X,\theta}$, because $X$ is absolutely continuous with respect to $Y_{X,\theta}$ (see [1, Lemma 4]). Also, we claim that if $X$ is compactly supported and $l$–Diophantine for $l>0$, then so is $Y_{X,\theta}$. We recall from [4] that a random variable $X$ with distribution function $F$ is $l$–Diophantine if and only if there exist $C_1,C_2>0$ such that for all $|x|>C_1$,
\[ \inf_{y\in\mathbb R}\int_{\mathbb R}\{ax+y\}^2\,dF(a)\ge\frac{C_2}{|x|^l} \]
where $\{z\}=\mathrm{dist}(z,\mathbb Z)$. If $X$ is compactly supported (say on $[c,d]$) then
\[ \int\{ax+y\}^2\,dG^\theta(a)=\frac{1}{\int_c^de^{\theta a}\,dF(a)}\int_c^d\{ax+y\}^2e^{\theta a}\,dF(a)\ge\frac{e^{\theta c}}{\int_c^de^{\theta a}\,dF(a)}\int_c^d\{ax+y\}^2\,dF(a). \]
Thus, for all $|x|>C_1$,
\[ \inf_{y\in\mathbb R}\int\{ax+y\}^2\,dG^\theta(a)\ge\frac{e^{\theta c}}{\int_c^de^{\theta a}\,dF(a)}\cdot\frac{C_2}{|x|^l}. \]
So the random variable $Y_{X,\theta}$ with distribution function $G^\theta$ is $l$–Diophantine as claimed earlier. From this we obtain the following corollary.

Corollary 4.1.2. Let $X$ be a non-constant, real-valued, compactly supported, $l$–Diophantine, centred random variable. Let $X_n$ be a sequence of i.i.d. copies of $X$. Let $r\in\mathbb N$ and $a\in(0,\sup(\mathrm{supp}\,X))$. Let $\theta_a$ be the unique $\theta$ such that
\[ I(a)=\sup_{\theta\in\mathring D_h}\Big(a\theta-\log\int e^{y\theta}dF(y)\Big)=a\theta_a-\log\int e^{y\theta_a}dF(y). \]
Then, for every $f\in\mathcal F^q_{r+1,\alpha}$ with $q>\frac{l(r+2)}{2}+1$ and $\alpha>\theta_a$ we have
\[ E(f(S_N-aN))e^{I(a)N}=\sum_{p=0}^{\lfloor r/2\rfloor}\frac{1}{N^{p+\frac12}}\int P_p(z)f_{\theta_a}(z)\,dz+C^q_{r+1}(f_{\theta_a})\cdot o_{r,\theta_a}\Big(\frac{1}{N^{\frac{r+1}{2}}}\Big) \]
for some polynomials $P_p(z)$ depending on $a$.

4.1.2 Strong asymptotic expansions.

We prove a lemma that gives conditions for the point-wise limit of a sequence of functions uniformly bounded in $\mathcal F^q_{r+1}$ to satisfy the asymptotic expansions.

Lemma 4.1.3. Let $q\ge0$. Suppose $\{f_k\}$ is a sequence in $\mathcal F^q_{r+1}$ such that $S_N$ admits the weak local Edgeworth expansion for each $f_k$, $C^q_{r+1}(f_k)\le C$ for all $k$, the $f_k$ are uniformly bounded in $L^\infty(\mathbb R)$, $f_k\to f$ point-wise, and for all $p$,
\[ \lim_{k\to\infty}\int P_p(z)f_k(z)\,dz=\int P_p(z)f(z)\,dz. \qquad (4.5) \]
Then,
\[ \sqrt N\,E(f(S_N))=\frac{1}{2\pi}\sum_{p=0}^{\lfloor r/2\rfloor}\frac{1}{N^p}\int P_p(z)f(z)\,dz+C\cdot o_{r,\beta}(N^{-r/2}). \]

Proof. For large $N$,
\[ \Big|\sqrt N\,E(f_k(S_N))-\frac{1}{2\pi}\sum_{p=0}^{\lfloor r/2\rfloor}\frac{1}{N^p}\int P_p(z)f_k(z)\,dz\Big|\le C^q_{r+1}(f_k)\cdot o_{r,\beta}(N^{-r/2})\le C\cdot o_{r,\beta}(N^{-r/2}). \qquad (4.6) \]
The LDCT gives us that
\[ \lim_{k\to\infty}E(f_k(S_N))=E(f(S_N)). \]
This, along with assumption (4.5), allows us to take the limit $k\to\infty$ in (4.6) and to conclude
\[ \Big|\sqrt N\,E(f(S_N))-\frac{1}{2\pi}\sum_{p=0}^{\lfloor r/2\rfloor}\frac{1}{N^p}\int P_p(z)f(z)\,dz\Big|\le C\cdot o_{r,\beta}(N^{-r/2}) \]
which implies the result.

Remark 4.1.2. The same would hold if we replaced weak local by weak global. However, our focus here is on weak local expansions.

The next theorem specifies when the existence of weak expansions implies the existence of strong expansions.

Theorem 4.1.4. Let $X_n$ be a sequence of random variables, not necessarily i.i.d. Suppose $S_N=X_1+\dots+X_N$ admits the weak asymptotic expansion of order $r$ for large deviations in the range $(0,L)$ for $f\in\mathcal F^1_{r+1,L_+}$, where $L_+>L$ when $L<\infty$ and $L_+=\infty$ if $L=\infty$. That is,
\[ E(f(S_N-aN))e^{I(a)N}=\sum_{p=0}^{\lfloor r/2\rfloor}\frac{1}{N^{p+1/2}}\int P_p(z)f_{\theta_a}(z)\,dz+C^1_{r+1}(f_{\theta_a})\cdot o_{r,\theta_a}\Big(\frac{1}{N^{\frac{r+1}{2}}}\Big) \]
for all $a\in(0,L)$, where $I(a)$ and $\theta_a$ are as in (4.11). Then, $S_N$ admits the strong asymptotic expansion of order $r$ for large deviations in $(0,L)$.

Proof. If $f\in C^\infty_c$ then $f_\theta\in\mathcal F^1_{r+1}$ for all $\theta$. Therefore, we approximate $1_{[0,\infty)}$ by a sequence $f_k$ of $C^\infty_c$ functions such that the $(f_k)_{\theta_a}$ are uniformly bounded in $\mathcal F^1_{r+1}$ (see Appendix A.3 for such a sequence) and invoke Lemma 4.1.3 to establish
\[ \mathbb P(S_N\ge aN)e^{I(a)N}=\frac{1}{2\pi}\sum_{p=0}^{\lfloor r/2\rfloor}\frac{1}{N^{p+1/2}}\int_0^\infty P_p(z)e^{-\theta_az}\,dz+C\cdot o_{r,\theta_a}\Big(\frac{1}{N^{\frac{r+1}{2}}}\Big). \]
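For concreteness, one convenient family of approximations (a sketch in the spirit of the construction carried out in Appendix A.3, whose precise sequence differs in details) is
\[ f_k(x)=\chi_k(x)\Big(\frac{1}{\pi}\tan^{-1}(kx)+\frac12\Big), \]
where $\chi_k\in C^\infty$ equals $1$ on $[-1,k]$, vanishes outside $(-2,k+1)$, and has $|\chi_k'|$ bounded uniformly in $k$. Then $0\le f_k\le1$, $f_k\to1_{[0,\infty)}$ pointwise, and for any fixed $\theta_a>0$ the tilted functions $(f_k)_{\theta_a}(x)=\frac{1}{2\pi}e^{-\theta_ax}f_k(x)$ are uniformly bounded in $\mathcal F^1_{r+1}$, since $e^{-\theta_ax}$ is integrable against polynomials on $[-2,\infty)$ and $\|\frac{d}{dx}\tan^{-1}(kx)\|_{L^1}=\pi$ for every $k$.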
Remark 4.1.3. Note that the coefficients of the strong expansion are
\[ C_p(a)=\frac{1}{2\pi}\int_0^\infty P_p(z)e^{-\theta_az}\,dz, \]
obtained by replacing $f$ with $1_{[0,\infty)}$ in the coefficients of the weak expansions. Since the $f_k$ are bounded in $\mathcal F^1_{r+1}$, we can do this without altering the order of the error. However, for any $q>1$, $1_{[0,\infty)}$ is not a pointwise limit of a sequence of functions $f_k$ in $\mathcal F^q_r$ with $C^q_{r+1}(f_k)$ bounded. To see this, assume that $\|f_k\|_1,\|f'_k\|_1,\|f''_k\|_1$ are uniformly bounded and $f_k\to1_{[0,\infty)}$ point-wise. Then, for all $\phi\in C^\infty_c(\mathbb R)$,
\[ \int\delta'\,\phi=-\int\delta\,\phi'=\int1_{[0,\infty)}\phi''=\lim_{k\to\infty}\int f_k\,\phi''=\lim_{k\to\infty}-\int f'_k\,\phi'=\lim_{k\to\infty}\int f''_k\,\phi. \]
This implies that $\frac{|\phi'(0)|}{\|\phi\|_\infty}\le\sup_k\|f''_k\|_1$ for all $\phi\in C^\infty_c(\mathbb R)$. Clearly, this is a contradiction. Therefore, Theorem 4.1.1 does not automatically give us strong expansions.

Now we are in a position to state and prove the main result of this section, which extends Cramér's LDP for i.i.d. random variables when the random variables have a sufficiently regular density.

Theorem 4.1.5. Let $X$ be a non-constant, real-valued, centred random variable. Assume that the logarithmic moment generating function $h(\theta)=\log E(e^{\theta X})$ is finite on a neighbourhood of $0$. Further assume that $X$ is $0$–Diophantine. Let $r\in\mathbb N$. Then for all $a\in(0,\sup(\mathrm{supp}\,X))$, there are constants $C_p(a)$ such that
\[ \mathbb P(S_N\ge aN)e^{I(a)N}=\sum_{p=0}^{\lfloor r/2\rfloor}\frac{C_p(a)}{N^{p+\frac12}}+o\Big(\frac{1}{N^{\frac{r+1}{2}}}\Big) \]
where
\[ C_p(a)=\frac{1}{2\pi}\int_0^\infty e^{-\theta_az}P_p(z)\,dz \]
for some polynomials $P_p(z)$ depending on $a$,
\[ I(a)=\sup_{\theta\in\mathbb R}\Big(a\theta-\log\int e^{y\theta}dF(y)\Big), \]
and $\theta_a$ is the unique point at which the supremum is achieved.

Proof. If $X$ is $0$–Diophantine then so is $Y_{X,\theta}$, as $X$ is absolutely continuous with respect to $Y_{X,\theta}$ (see [1, Lemma 4]). Since $Y_{X,\theta}$ has moments of all orders, $Y_{X,\theta}$ admits the strong Edgeworth expansion of all orders. Therefore, for each $r\in\mathbb N$, $Y_{X,\theta}$ admits the weak local Edgeworth expansion of order $r$ for $f\in\mathcal F^1_r$ (see Appendix A.2). From (4.4) we know that
\[ E(f(S_N-aN))e^{I(a)N}=E^\gamma\big(2\pi f_{\theta_a}(\tilde S_N-aN)\big) \]
where the summands of $\tilde S_N$ have mean $a$. The assumptions allow us to expand the RHS using the weak local Edgeworth expansion and obtain
\[ E(f(S_N-aN))e^{I(a)N}=\sum_{p=0}^{\lfloor r/2\rfloor}\frac{1}{N^{p+\frac12}}\int P_p(z)f_{\theta_a}(z)\,dz+C^1_{r+1}(f_{\theta_a})\cdot o_{r,\beta}(N^{-r/2}) \]
for $f\in C^\infty_c(\mathbb R)$. Now, we approximate $1_{[0,\infty)}$ by a sequence $f_k\in C^\infty_c(\mathbb R)$ such that the $(f_k)_{\theta_a}$ are bounded in $\mathcal F^1_{r+1}$ (see Appendix A.3 for such a sequence) and use Theorem 4.1.4 to obtain the required expansion.

Remark 4.1.4. This gives us an alternative proof of [1, Theorem 2] for $X$ satisfying Cramér's condition (which corresponds to Case 1 there). There are two ways the coefficients $C_p(a)$ depend on $a$. First, note that $\theta_a$ depends on the choice of $a$. Also, from Section 3.3, we know exactly how the coefficients of $P_p$ depend on the first $p+2$ asymptotic moments of $\tilde S_N$ and thus on the first $p+2$ moments of $Y_{X,\theta_a}$. So the dependence of $C_p(a)$ on $a$ is explicit and one can compute these coefficients. In addition, $C_p(a)$ does not depend on $r$ because the $P_p(z)$ do not.
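The leading term in Theorem 4.1.5 can be sanity-checked numerically in the centred exponential example above. The sketch below is an illustration only: the constant $1/(a\sqrt{2\pi})$ is the classical first order coefficient of [1] for this example, not computed from the polynomials $P_p$ above, and it uses the fact that for $X=\mathcal E-1$ the event $\{S_N\ge aN\}$ coincides with $\{\Gamma_N\ge(1+a)N\}$ for a Gamma$(N,1)$ variable $\Gamma_N$, so the tail probability is available exactly.

\begin{verbatim}
# Numerical check of P(S_N >= aN) * exp(I(a)N) * sqrt(N) -> C_0(a)
# for X = E - 1, E ~ Exp(1):  I(a) = a - log(1+a),  C_0(a) = 1/(a*sqrt(2*pi)).
import numpy as np
from scipy.stats import gamma

a = 0.5
I = a - np.log1p(a)                          # rate function I(a)
C0 = 1.0 / (a * np.sqrt(2.0 * np.pi))        # classical leading coefficient
for N in [50, 200, 800, 3200]:
    tail = gamma.sf((1.0 + a) * N, N)        # P(S_N >= aN), exactly
    print(N, tail * np.exp(I * N) * np.sqrt(N), C0)
\end{verbatim}

The printed ratios should approach $C_0(a)\approx0.798$, in agreement with the $N^{-1/2}$ leading term of the expansion.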
4.2 Higher order asymptotics in the non–i.i.d. case.

Let $X_n$ be a sequence of random variables, not necessarily i.i.d., with asymptotic mean $0$. Suppose that there exist a Banach space $\mathbb B$, a family of bounded linear operators $\mathcal L_z:\mathbb B\to\mathbb B$ and vectors $v\in\mathbb B$, $\ell\in\mathbb B'$ such that
\[ E\big(e^{zS_N}\big)=\ell(\mathcal L^N_zv),\qquad z\in\mathbb C, \qquad (4.7) \]
satisfying the following.

(B1) There exists $\delta>0$ such that $z\mapsto\mathcal L_z$ is continuous on the strip $|\mathrm{Re}(z)|<\delta$ and holomorphic on the disc $|z|<\delta$.

(B2) $1$ is an isolated and simple eigenvalue of $\mathcal L_0$, all other eigenvalues of $\mathcal L_0$ have absolute value less than $1$, and its essential spectrum is contained strictly inside the disk of radius $1$ (spectral gap).

(B1) and (B2), along with the perturbation theory of operators (see [33]), imply that there is $\delta_0\in(0,\delta)$ such that
\[ \mathcal L_z=\mu(z)\Pi_z+\Lambda_z,\qquad|z|<\delta_0, \qquad (4.8) \]
where $\mu(z)$ is the top eigenvalue of $\mathcal L_z$, $\Pi_z$ is the corresponding eigen–projection, $\Pi_z\Lambda_z=\Lambda_z\Pi_z=0$, and $z\mapsto\mu(z)$, $z\mapsto\Pi_z$ and $z\mapsto\Lambda_z$ are holomorphic. In addition,
\[ \Big\|\frac{d^k}{dz^k}\Lambda^N_z\Big\|<\beta_k^N\quad\text{with }0<\beta_k<1. \]
Therefore,
\[ \mathcal L^N_z=\mu(z)^N\Pi_z+\Lambda^N_z. \]
Combining this with (4.7) we have
\[ E(e^{zS_N})=\mu(z)^N\ell(\Pi_zv)+\ell(\Lambda^N_zv). \qquad (4.9) \]
Then, plugging in $z=0$ and taking $N\to\infty$, we conclude that $\ell(\Pi_0v)=1$. Also, taking the derivative at $z=0$, dividing by $N$ and taking the limit as $N\to\infty$, we obtain
\[ \frac{d}{dz}\mu(z)\Big|_{z=0}=\lim_{N\to\infty}\frac{E(S_N)}{N}=0. \]
Taking the second derivative at $z=0$, dividing by $N$ and taking the limit as $N\to\infty$, we obtain
\[ \frac{d^2}{dz^2}\mu(z)\Big|_{z=0}=\lim_{N\to\infty}\frac{E(S_N^2)}{N}. \]
In addition, it follows from [24, Theorem 2.4] that there exists $\sigma^2\ge0$ such that $\frac{S_N}{\sqrt N}\xrightarrow{\ d\ }\mathcal N(0,\sigma^2)$. Since our interest is in $S_N$ that satisfies the CLT, we assume from now on that $\sigma^2>0$. We also assume the following:

(B3) $\mu(\theta)>0$ for all $\theta\in(-\delta_0,\delta_0)$ (here $\delta_0$ is as in (4.8)).

Define $\Omega(\theta)=\log\mu(\theta)$ for $\theta\in(-\delta_0,\delta_0)$. Then $\Omega(0)=\log\mu(0)=0$ and $\Omega'(0)=\frac{\mu'(0)}{\mu(0)}=0$. Also, $\Omega''(0)=\frac{\mu''(0)\mu(0)-\mu'(0)^2}{\mu(0)^2}=\mu''(0)=\sigma^2>0$. Since $\Omega''$ is continuous, there exists $\delta_1\in(0,\delta_0)$ such that $\Omega$ is strictly convex on $(-\delta_1,\delta_1)$. Note that due to convexity, $\Omega'(-\delta_1)<0<\Omega'(\delta_1)$. In addition, when $\theta\neq0$, $\mu(\theta)>\mu(0)=1$ by convexity.

Next, we consider the Legendre transform $I$ of $\Omega$, given by
\[ I(a)=\sup_{\theta\in(-\delta_1,\delta_1)}[a\theta-\Omega(\theta)],\qquad a\in[0,\Omega'(\delta_1)), \]
which is itself a strictly convex function. Because $\Omega'$ is strictly increasing and continuous on $[0,\delta_1]$, for each $a\in[0,\Omega'(\delta_1))$ the equation $a-\Omega'(\theta)=0$ has a unique solution $\theta_a$, which depends continuously on $a$. Note that $I(a)\ge0$ for all $a$ and $I(a)=0\iff a=0$. Also, $I(a)$ is continuous because $I$ is convex and $I(0)=0$. In addition, $I(\Omega'(\delta_1))=\delta_1\Omega'(\delta_1)-\Omega(\delta_1)$.
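A minimal finite-state illustration (not taken from the text, but in the spirit of Section 3.5.2): let $x_n$ be the Markov chain on $\{1,2\}$ with transition probabilities $p(x,x)=p$ and $p(x,y)=1-p$ for $x\neq y$, and let $X_n=f(x_n)$ with $f(1)=1$, $f(2)=-1$. Then $\mathcal L_\theta$ is the $2\times2$ matrix $\big(p(x,y)e^{\theta f(y)}\big)_{x,y}$ and a direct computation gives
\[ \mu(\theta)=p\cosh\theta+\sqrt{p^2\sinh^2\theta+(1-p)^2}, \]
so $\mu(0)=1$ and $\mu'(0)=0$; for $p\in(0,1)$ one checks that $\Omega(\theta)=\log\mu(\theta)$ is strictly convex on all of $\mathbb R$, and for $p=\tfrac12$ this reduces to the i.i.d. coin-tossing case $\mu(\theta)=\cosh\theta$.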
Now we are in a position to prove a Large Deviation Principle for $S_N$ using Theorem 1.3. The following lemma shows that Theorem 1.3 applies in our case.

Lemma 4.2.1. Suppose (B1), (B2) and (B3) hold. Then, there exists $0<\delta_2\le\delta_1$ such that for $\theta\in(-\delta_2,\delta_2)$,
\[ \lim_{N\to\infty}\frac1N\log E(e^{\theta S_N})=\log\mu(\theta). \]

Proof. Because $\ell(\Pi_0v)>0$, there exist $\delta_2$ and $m>0$ such that for $\theta\in[-\delta_2,\delta_2]$, $\ell(\Pi_\theta v)>2m$. Because the spectral radius of $\Lambda_\theta$ is strictly smaller than $\mu(\theta)$, we have $\lim_{N\to\infty}\mu(\theta)^{-N}\ell(\Lambda^N_\theta v)=0$. Hence, there exists $N_0$ such that for $N>N_0$,
\[ m<\ell(\Pi_\theta v)+\mu(\theta)^{-N}\ell(\Lambda^N_\theta v)<3m. \]
Hence,
\[ \lim_{N\to\infty}\frac1N\ln\big[\ell(\Pi_\theta v)+\mu(\theta)^{-N}\ell(\Lambda^N_\theta v)\big]=0. \]
Now, for $\theta\in(-\delta_2,\delta_2)$ we can rewrite (4.9) as
\[ \frac1N\log E(e^{\theta S_N})=\log\mu(\theta)+\frac1N\log\big[\ell(\Pi_\theta v)+\mu(\theta)^{-N}\ell(\Lambda^N_\theta v)\big]. \]
This implies that
\[ \lim_{N\to\infty}\frac1N\log E(e^{\theta S_N})=\log\mu(\theta). \]

Combining this lemma with Theorem 1.3 and the analysis preceding it, we have the following LDP.

Theorem 4.2.2. Suppose (B1), (B2) and (B3) hold. Then, there exists $\delta_2\in(0,\delta_1]$ such that for all $a\in\big(0,\frac{\log\mu(\delta_2)}{\delta_2}\big)$,
\[ \lim_{N\to\infty}\frac1N\log\mathbb P(S_N\ge aN)=-I(a) \qquad (4.10) \]
where
\[ I(a)=\sup_{\theta\in(-\delta_2,\delta_2)}[a\theta-\log\mu(\theta)]=a\theta_a-\log\mu(\theta_a) \qquad (4.11) \]
and $\theta_a$ is the unique $\theta$ solving $(\log\mu(\theta))'=\frac{\mu'(\theta)}{\mu(\theta)}=a$.

Remark 4.2.1. The range of $a$ for which the LDP holds is constrained by the assumptions (B1), (B2) and (B3). We require a positive top eigenvalue $\mu(\theta)$ to exist, $\log\mu(\theta)$ to be strictly convex, and $\ell(\Pi_\theta v)>0$. The larger the range of $\theta$ for which these hold, the larger the range of $a$. In particular, if they hold for all $\theta\in\mathbb R$, then by convexity $B=\lim_{\delta\to\infty}\frac{\log\mu(\delta)}{\delta}$ exists as an extended real number, and the LDP holds for all $a\in(0,B)$.

Next, we compute higher order asymptotics of this LDP. To this end, we make two more assumptions about $\mathcal L_z$.

(B4) For all $\theta\in(-\delta_2,\delta_2)$ and all real numbers $t\neq0$, $\mathrm{sp}(\mathcal L_{\theta+it})\subset\{|z|<\mu(\theta)\}$.

(B5) There are positive real numbers $r_1,r_2,C,K$ and $N_0$ such that for all $\theta\in(-\delta_2,\delta_2)$, all $N>N_0$ and all $K<|t|<N^{r_1}$, $\|\mathcal L^N_{\theta+it}\|\le C\frac{\mu(\theta)^N}{N^{r_2}}$.

Remark 4.2.2. As in Remark 3.1.1, it follows that by slightly decreasing $r_1$ we can assume $r_2$ to be as large as required for large enough $N$.

Pick $a\in\big(0,\frac{\log\mu(\delta_2)}{\delta_2}\big)$. Then,
\[ E(f(S_N-aN))e^{a\theta N}=E\big(e^{\theta S_N}e^{-(S_N-aN)\theta}f(S_N-aN)\big)=\int\hat f_\theta(t)\,e^{-iatN}\,\ell(\mathcal L^N_{\theta+it}v)\,dt \]
where $f_\theta(x)=\frac{1}{2\pi}e^{-\theta x}f(x)$. Now define $\mathbb L_{\theta+it}=\frac{e^{-iat}}{\mu(\theta)}\mathcal L_{\theta+it}$. Then,
\[ E(f(S_N-aN))e^{a\theta N}=\mu(\theta)^N\int\hat f_\theta(t)\,\ell(\mathbb L^N_{\theta+it}v)\,dt. \]
From this we have
\[ E(f(S_N-aN))e^{[a\theta-\log\mu(\theta)]N}=\int\hat f_\theta(t)\,\ell(\mathbb L^N_{\theta+it}v)\,dt. \]
In particular,
\[ E(f(S_N-aN))e^{I(a)N}=\int\hat f_{\theta_a}(t)\,\ell(\mathbb L^N_{\theta_a+it}v)\,dt. \qquad (4.12) \]
Note that for $|\theta_a+it|<\delta_0$ the top eigenvalue of $\mathbb L_{\theta_a+it}$ is
\[ \nu(t)=\frac{e^{-iat}}{\mu(\theta_a)}\,\mu(\theta_a+it). \]
As a function of $t$, $\nu$ is analytic in a neighbourhood of $0$ by (4.8). Further,
\[ \nu(0)=1,\qquad\nu'(0)=-ia+i\frac{\mu'(\theta_a)}{\mu(\theta_a)}=0,\qquad\nu''(0)=a^2-\frac{\mu''(\theta_a)}{\mu(\theta_a)}=-\sigma_a^2 \]
with $\sigma_a>0$. Thus, there exists $\delta$ such that
\[ |\nu(t)|<e^{-\sigma_a^2t^2/4},\qquad|t|<\delta. \qquad (4.13) \]
We also notice that
\[ \lim_{N\to\infty}\frac{\ell(\Lambda^N_\theta v)}{\mu(\theta)^N}=0 \]
because the spectral radius of $\Lambda_\theta$ is strictly smaller than $\mu(\theta)$. Combining this with $E(e^{\theta S_N})=\mu(\theta)^N\ell(\Pi_\theta v)+\ell(\Lambda^N_\theta v)$, we conclude that for all $\theta$,
\[ \ell(\Pi_\theta v)=\lim_{N\to\infty}\frac{E(e^{\theta S_N})}{\mu(\theta)^N}. \]

The following lemma allows us to obtain asymptotics of (4.12). We note that it is analogous to Theorem 3.1.4, where asymptotics of $E(f(S_N-aN))$ for $f\in\mathcal F^{q+2}_{r+1}$ are discussed, and it can be proven using the ideas in the proof of Theorem 3.1.4. One just has to replace $\mathcal L_t$ there by $\mathbb L_{\theta_a+it}$ and introduce the corresponding changes.

Lemma 4.2.3. Suppose (B1) through (B5) hold. Let $r\in\mathbb N$. Then, there exists $\delta_2\in(0,\delta)$ such that for all $a\in\big(0,\frac{\log\mu(\delta_2)}{\delta_2}\big)$ there are polynomials $P_p(z)$ such that for $g\in\mathcal F^{q+1}_{r+1}$ with $q>\frac{r+1}{2r_1}$,
\[ \int\hat g(t)\,\ell(\mathbb L^N_{\theta_a+it}v)\,dt=\sum_{p=0}^{\lfloor r/2\rfloor}\frac{1}{N^{p+1/2}}\int P_p(z)g(z)\,dz+C^{q+1}_{r+1}(g)\cdot o_{r,\theta_a}\Big(\frac{1}{N^{\frac{r+1}{2}}}\Big) \]
where $\theta_a$ is as in (4.11).

Proof. We indicate how to estimate the LHS away from $0$. The rest of the proof, which contains the construction of the polynomials $P_p$, is identical to that of Theorem 3.1.5 with $it$ replaced by $\theta_a+it$.

Fix $\delta>0$ as in (4.13). By (B4), for $\delta\le|t|\le K$ there exists $c_0\in(0,1)$ such that $\|\mathbb L^n_{\theta_a+it}\|\le c_0^n$. Thus,
\[ \Big|\int_{\delta<|t|<K}\hat g(t)\,\ell(\mathbb L^n_{\theta_a+it}v)\,dt\Big|\le C\|g\|_1c_0^n. \]
Also, by (B5) and Remark 4.2.2, for $r_2>r_1+\frac{r+1}{2}$,
\[ \Big|\int_{K<|t|<n^{r_1}}\hat g(t)\,\ell(\mathbb L^n_{\theta_a+it}v)\,dt\Big|\le C\|g\|_1\int_{K<|t|<n^{r_1}}\|\mathbb L^n_{\theta_a+it}\|\,dt\le\frac{C\|g\|_1}{n^{r_2-r_1}}. \]
Finally, $q>\frac{r+1}{2r_1}$ implies
\[ \Big|\int_{|t|>n^{r_1}}\hat g(t)\,\ell(\mathbb L^n_{\theta_a+it}v)\,dt\Big|\le\int_{|t|>n^{r_1}}|\hat g(t)|\,dt\le\int_{|t|>n^{r_1}}\Big|\frac{\widehat{g^{(q)}}(t)}{t^q}\Big|\,dt\le\frac{\|\widehat{g^{(q)}}\|_1}{n^{r_1q}}=\|\widehat{g^{(q)}}\|_1\,o(n^{-(r+1)/2}). \qquad (4.14) \]
Therefore,
\[ \Big|\int_{|t|>\delta}\hat g(t)\,\ell(\mathbb L^n_{\theta_a+it}v)\,dt\Big|=o(n^{-(r+1)/2}). \qquad (4.15) \]
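Schematically (a summary of the estimates above, not an additional statement), the integral in the lemma is split as
\[ \int_{\mathbb R}=\int_{|t|\le\delta}+\int_{\delta<|t|\le K}+\int_{K<|t|<N^{r_1}}+\int_{|t|\ge N^{r_1}}, \]
where the first piece produces the polynomials $P_p$ via the expansion of the top eigenvalue around $t=0$ as in (4.13), the second is exponentially small by (B4), the third is $O(N^{r_1-r_2})$ by (B5), and the last is $o(N^{-(r+1)/2})$ by the smoothness of $g$ as in (4.14).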
Remark 4.2.3.

1. The proof is almost identical to the proof of Theorem 3.1.4 and hence the coefficients of the polynomials $P_p$ can be computed as shown in Section 3.3. In particular, they depend on exponential moments of $S_N$.

2. Since $\theta_a$ depends on $a$, the coefficients of the polynomials $P_p$ also depend on $a$.

As a direct consequence of Lemma 4.2.3 and equation (4.12), we have the following theorem.

Theorem 4.2.4. Suppose (B1) through (B5) hold. Let $r\in\mathbb N$. Then, for $a\in\big(0,\frac{\log\mu(\delta_2)}{\delta_2}\big)$ there exist $\theta_a\in(0,\delta_2)$ and polynomials $P_p(z)$ such that for $f\in\mathcal F^{q+2}_{r+1,\alpha}$ with $q>\frac{r+1}{2r_1}$ and $\alpha>\delta_2$,
\[ E(f(S_N-aN))e^{I(a)N}=\sum_{p=0}^{\lfloor r/2\rfloor}\frac{1}{N^{p+1/2}}\int P_p(z)f_{\theta_a}(z)\,dz+C^{q+2}_{r+1}(f_{\theta_a})\cdot o_{r,\theta_a}\Big(\frac{1}{N^{\frac{r+1}{2}}}\Big) \]
where $f_\theta(x)=\frac{1}{2\pi}e^{-\theta x}f(x)$, and $I$ and $\theta_a$ are as in (4.11).

Remark 4.2.4. In particular, the theorem holds for all $f\in C^\infty_c(\mathbb R)$. This is the weak asymptotic expansion which gives us the required higher order asymptotics for (4.10), the LDP in Theorem 4.2.2.

Next, we replace (B5) by the following stronger assumption, which allows us to conclude the existence of strong expansions for the LDP. Compare this assumption with assumption (A5) in Chapter 3.

$(\widetilde{\mathrm{B5}})$ There are positive real numbers $r_1,r_2,r_3,C,K$ and $N_0$ such that for all $\theta\in(-\delta_2,\delta_2)$, all $N>N_0$ and all $|t|>K$, $\|\mathcal L^N_{\theta+it}\|\le C\frac{\mu(\theta)^N}{N^{r_2}|t|^{r_3}}$.

As in the case of (B5), we can assume $r_2$ and $r_3$ to be large after slightly reducing $r_1$. Therefore, we have the following theorem.

Theorem 4.2.5. Suppose (B1) through (B4) and $(\widetilde{\mathrm{B5}})$ hold. Let $r\in\mathbb N$. Then, there exists $0<\delta_2\le\delta$ such that $S_N$ admits a weak asymptotic expansion for the LDP in the range $\big(0,\frac{\log\mu(\delta_2)}{\delta_2}\big)$ for $f\in\mathcal F^1_{r+1,\alpha}$ with $\alpha>\delta_2$.

In particular, for all $a\in\big(0,\frac{\log\mu(\delta_2)}{\delta_2}\big)$ there exist constants $C_p(a)$ such that
\[ \mathbb P(S_N\ge aN)e^{I(a)N}=\sum_{p=0}^{\lfloor r/2\rfloor}\frac{C_p(a)}{N^{p+1/2}}+C_{r,\theta_a}\,o\Big(\frac{1}{N^{\frac{r+1}{2}}}\Big) \]
where
\[ C_p(a)=\frac{1}{2\pi}\int_0^\infty e^{-\theta_az}P_p(z)\,dz \]
for some polynomials $P_0(z),\dots,P_r(z)$ depending on $a$ and the unique $\theta_a\in(0,\delta_2)$ such that
\[ I(a)=\sup_{\theta\in(-\delta_2,\delta_2)}[a\theta-\log\mu(\theta)]=a\theta_a-\log\mu(\theta_a). \]

Proof. The proof of the first part is similar to that of Theorem 4.2.4. The only difference is the estimate (4.14). Since $f\in\mathcal F^1_{r+1,\alpha}$, we have $g=f_\theta\in\mathcal F^1_{r+1}$. So $t\hat g(t)=(-i)\widehat{g'}(t)$. WLOG assume $r_3>\frac{r+1}{2r_1}$. Then,
\[ \Big|\int_{|t|>n^{r_1}}\hat g(t)\,\ell(\mathbb L^n_{\theta_a+it}v)\,dt\Big|\le C\int_{|t|>n^{r_1}}|\hat g(t)|\,\|\mathbb L^n_{\theta_a+it}\|\,dt\le C\int_{|t|>n^{r_1}}\frac{|\widehat{g'}(t)|}{|t|^{1+r_3}}\,dt\le\frac{C\|g'\|_1}{n^{r_1r_3}}=\|g'\|_1\,o(n^{-(r+1)/2}). \]
Now, the existence of the strong expansion follows from the first part of the theorem and Theorem 4.1.4.

As in the i.i.d. case, $C_p(a)$ does not depend on $r$ because $\theta_a$ and $P_p$ do not. Also, there are two ways the coefficients $C_p(a)$ depend on $a$. First, note that $\theta_a$ depends on the choice of $a$. Also, from Section 3.3, we know exactly how the coefficients of $P_p$ depend on the derivatives of $\mu(z)$ and $\ell(\Pi_z(\cdot))$ at $\theta_a$, and thus on the exponential moments of $S_N$. Since this dependence of $C_p(a)$ on $a$ is explicit, one can compute these coefficients.
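For orientation (a reduction not spelled out in the text): in the i.i.d. setting of Section 4.1 one may take $\mathbb B=\mathbb C$, $v=1$, $\ell=\mathrm{id}$ and $\mathcal L_z$ to be multiplication by $E(e^{zX})$, so that (4.7) holds trivially, $\mu(z)=E(e^{zX})=e^{h(z)}$, $\Pi_z=\mathrm{id}$ and $\Lambda_z=0$. Then $I(a)$ in (4.11) is the Legendre transform of $h$, (B4) amounts to $|E(e^{(\theta+it)X})|<E(e^{\theta X})$ for $t\neq0$, i.e. to $Y_{X,\theta}$ being non-lattice, and (B5), $(\widetilde{\mathrm{B5}})$ are quantitative versions of the Diophantine conditions used in Section 4.1.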
4.3 An application to Markov Chains.

Take $x_n$ to be a time homogeneous Markov process on a compact connected manifold $\mathcal M$ with smooth transition density $p(x,y)$ which is bounded away from $0$, and $X_n=h(x_{n-1},x_n)$ for a smooth function $h:\mathcal M\times\mathcal M\to\mathbb R$. We assume that $h(x,y)$ cannot be written in the form
\[ h(x,y)=H(y)-H(x)+c(x,y) \qquad (4.16) \]
where $c(x,y)$ is piece-wise constant. (An equivalent condition is given in Lemma 3.5.1.) This is exactly the setting we worked in Section 3.5.3.1. We need the following lemma to establish (B1) through (B5).

Lemma 4.3.1. Let $K(x,y)$ be a smooth positive function on $\mathcal M\times\mathcal M$. Let $P$ be the operator on $L^\infty(\mathcal M)$ given by $Pu(x)=\int_{\mathcal M}K(x,y)u(y)\,dy$. Then $P$ has a simple leading eigenvalue $\lambda>0$ and the corresponding eigenfunction $g$ is positive and smooth.

Proof. From the Weierstrass theorem, $K(x,y)$ is a uniform limit of functions of the form $\sum_{r\le n}J_r(x)L_r(y)$. Therefore, $P$ can be approximated by finite rank operators. So $P$ is compact. Since $P$ is an operator which leaves the cone of positive functions invariant, by a direct application of Birkhoff theory (see [2]), $P$ has a leading eigenvalue $\lambda$ which is positive and simple. The corresponding eigenfunction $g$ is also positive.

Because $P$ is compact, there is $l\in(0,\lambda)$ such that $\mathrm{sp}_{L^\infty}(P)\cap\{|z|>l\}=\{\lambda\}$. Next, we consider $P$ acting on $C^1(\mathcal M)$. Observe that
\[ \frac{d}{dx}(Pu)(x)=\int_{\mathcal M}\frac{\partial K}{\partial x}(x,y)u(y)\,dy. \]
So $\|Pu\|_{C^1}\le C\|u\|_\infty$ for some $C$. Since $\|\cdot\|_\infty\le\|\cdot\|_{C^1}$, the unit ball with respect to $\|\cdot\|_{C^1}$ is relatively compact with respect to $\|\cdot\|_\infty$. Therefore the essential spectral radius is $0$ by [24, Lemma 2.2]. This gives us $\mathrm{sp}_{C^1}(P)\cap\{|z|>l\}\subseteq\{\lambda\}$. To see that equality holds, note that the constant function $1\in C^1(\mathcal M)$. By positivity of $P$,
\[ 1\ge\frac{g}{\sup g}\implies P^n1\ge\frac{P^ng}{\sup g}=\frac{\lambda^ng}{\sup g}\implies|\!|\!|P^n|\!|\!|\ge\|P^n1\|_{C^1}\ge\lambda^n\frac{\sup g}{\sup g}=\lambda^n, \]
where $|\!|\!|\cdot|\!|\!|$ is the operator norm of $P$ acting on $C^1(\mathcal M)$. Therefore, the spectral radius of $P$ on $C^1(\mathcal M)$ is at least $\lambda$. This establishes that $g\in C^1$. We can repeat the argument and show $g\in C^r$ for all $r\in\mathbb N$.

Take $\mathbb B=L^\infty(\mathcal M)$ and consider the family of integral operators
\[ (\mathcal L_zu)(x)=\int_{\mathcal M}p(x,y)e^{zh(x,y)}u(y)\,dy,\qquad z\in\mathbb C. \]
Let $\mu$ be the initial distribution of the Markov chain. Then, using the Markov property, we have $E_\mu[e^{zS_N}]=\mu(\mathcal L^N_z1)$. Now, we check conditions (B1) through (B5). Conditions (B1) and (B2) coincide with the conditions (A1) and (A2) in Chapter 3 and we verify them in Section 3.5.3.1. In particular, (B1) holds with $\delta=\infty$. Note that, for all $\theta$, $\mathcal L_\theta$ is of the form of $P$ in Lemma 4.3.1. Therefore, (B3) holds for all $\theta$. Take $\lambda(\theta)$ to be the top eigenvalue and $g_\theta$ to be the corresponding eigenfunction. Then $g_\theta$ is smooth.

To show (B4) and (B5) we define a new operator $Q_\theta$ as follows:
\[ (Q_\theta u)(x)=\frac{1}{\lambda(\theta)}\int_{\mathcal M}e^{\theta h(x,y)}p(x,y)\frac{g_\theta(y)}{g_\theta(x)}\,u(y)\,dy. \]
It is easy to see that
\[ p_\theta(x,y)=\frac{e^{\theta h(x,y)}p(x,y)}{g_\theta(x)\lambda(\theta)}\qquad\text{and}\qquad dm_\theta(y)=g_\theta(y)\,dy \]
define a new Markov chain $x^\theta_n$ with associated Markov operator $Q_\theta$. That is, $Q_\theta$ is a positive operator and
\[ Q_\theta1=\frac{1}{\lambda(\theta)}\int_{\mathcal M}e^{\theta h(x,y)}p(x,y)\frac{g_\theta(y)}{g_\theta(x)}\,dy=1 \]
because $g_\theta$ is the eigenfunction corresponding to the eigenvalue $\lambda(\theta)$ of $\mathcal L_\theta$. Now, we can repeat the argument in Section 3.5.3.1 to establish properties of the perturbed operator given by
\[ (Q_{\theta+it}u)(x)=\int_{\mathcal M}e^{ith(x,y)}p_\theta(x,y)\,u(y)\,dm_\theta(y). \]
Since (4.16) does not hold, we conclude that $\mathrm{sp}(Q_{\theta+it})\subset\{|z|<1\}$. Take $G_\theta$ to be the operator on $L^\infty(\mathcal M)$ that corresponds to multiplication by $g_\theta$. Then $\mathcal L_{\theta+it}=\lambda(\theta)\,G_\theta Q_{\theta+it}G_\theta^{-1}$. Therefore, $\mathrm{sp}(\mathcal L_{\theta+it})$ is $\mathrm{sp}(Q_{\theta+it})$ scaled by $\lambda(\theta)$. This implies $\mathrm{sp}(\mathcal L_{\theta+it})\subset\{|z|<\lambda(\theta)\}$ as required.

Since (4.16) does not hold, the asymptotic variance $\sigma^2_\theta$ of $X^\theta_n=h(x^\theta_{n-1},x^\theta_n)$ is positive. Taking $\gamma(\theta+it)$ to be the top eigenvalue of $Q_{\theta+it}$, we have $\lambda(\theta+it)=\lambda(\theta)\gamma(\theta+it)$. Thus,
\[ (\log\lambda(\theta))''=-\frac{d^2}{dt^2}\Big|_{t=0}\log\lambda(\theta+it)=-\frac{d^2}{dt^2}\Big|_{t=0}\log\gamma(\theta+it)=-\frac{\gamma''(\theta)}{\gamma(\theta)}+\Big(\frac{\gamma'(\theta)}{\gamma(\theta)}\Big)^2=-\gamma''(\theta)+\gamma'(\theta)^2 \]
(since $\gamma(\theta)=1$), where $\gamma'(\theta)$ and $\gamma''(\theta)$ denote the $t$–derivatives of $\gamma(\theta+it)$ at $t=0$. Put $S^\theta_N=X^\theta_1+\dots+X^\theta_N$. Since $E(e^{itS^\theta_N})=\int Q^N_{\theta+it}1\,d\mu$, from (3.37) we have that $\gamma'(\theta)^2-\gamma''(\theta)=\sigma^2_\theta$. Thus $(\log\lambda(\theta))''=\sigma^2_\theta>0$. Therefore, $\log\lambda(\theta)$ is a strictly convex function.
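In practice, $\lambda(\theta)$ and the rate function can be computed numerically by discretizing the integral operator. The sketch below is an illustration only; the kernel, the observable and the grid size are hypothetical choices, not taken from the text. It uses a Nyström-type discretization on $\mathcal M=S^1$ and reads off $I(a)=\sup_\theta[a\theta-\log\lambda(\theta)]$ on a grid of $\theta$'s.

\begin{verbatim}
# Discretize (L_theta u)(x) = \int p(x,y) e^{theta h(x,y)} u(y) dy on the circle,
# take the top eigenvalue lambda(theta), and Legendre-transform log lambda(theta).
import numpy as np

n = 200
x = np.arange(n) / n                          # grid on S^1 ~ [0,1)
dy = 1.0 / n
D = x[None, :] - x[:, None]                   # y - x on the grid
p = np.exp(np.cos(2 * np.pi * D))             # smooth positive kernel
p /= p.sum(axis=1, keepdims=True) * dy        # normalize rows: a transition density
h = np.tile(np.sin(2 * np.pi * x), (n, 1))    # observable h(x,y) = sin(2*pi*y)

def Omega(theta):
    L = p * np.exp(theta * h) * dy            # matrix approximation of L_theta
    return np.log(np.max(np.abs(np.linalg.eigvals(L))))

thetas = np.linspace(-3.0, 3.0, 301)
Om = np.array([Omega(t) for t in thetas])
for a in [0.1, 0.3, 0.5]:
    print(a, np.max(a * thetas - Om))         # I(a) = sup_theta [a*theta - Omega(theta)]
\end{verbatim}

Since the discretized $\mathcal L_0$ is (approximately) a stochastic matrix, $\Omega(0)\approx0$, matching $\lambda(0)=1$.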
Note that $\mathcal L_\theta=\lambda(\theta)\Pi_\theta+\Lambda_\theta$ where $\Pi_\theta$ is the projection onto the top eigenspace. From [27, Chapter III], $\Pi_\theta=g_\theta\otimes\varphi_\theta$ where $\varphi_\theta$ is the top eigenfunction of $Q^*_\theta$, the adjoint of $Q_\theta$. Because $Q^*_\theta$ is itself a positive compact operator acting on $(L^\infty)^*$ (the space of finitely additive finite signed measures), $\varphi_\theta$ is a finite positive measure. Hence, $\mu(\Pi_\theta1)=\varphi_\theta(1)\mu(g_\theta)>0$ for all $\theta$.

As a result, Lemma 4.2.1 holds with $\delta_2$ arbitrarily large, and hence Theorem 4.2.2 holds with $\delta_2$ arbitrarily large. So the rate function $I(a)$ in Theorem 4.2.2 is finite for $a\in(0,B)$ where $B=\lim_{\theta\to\infty}\frac{\log\lambda(\theta)}{\theta}$. We observe that $B<\infty$ because $h$ is bounded, i.e. $\frac{S_N}{N}\le\|h\|_\infty$ surely. In fact, we claim that $B=\lim_{N\to\infty}\frac{B_N}{N}$, where
\[ B_N=\sup_{x_0,\dots,x_N}\sum_{j=1}^Nh(x_{j-1},x_j) \]
(the supremum taken over all possible realizations of the Markov chain $x_n$).

First note that $B_N$ is subadditive. So $\lim_{N\to\infty}\frac{B_N}{N}$ exists and is equal to $\inf_N\frac{B_N}{N}$. Given $a>\lim_N\frac{B_N}{N}$, there exists $N_0$ such that for all $N>N_0$, $\frac{S_N}{N}\le\frac{B_N}{N}<a$. Therefore, $\mathbb P(S_N\ge aN)=0$ for all $N>N_0$ and hence $I(a)=\infty$. Next, given $a<\lim_N\frac{B_N}{N}$, we have $B_N>aN$ for all $N$. Fix $N$. Then, there exists a realization $x_0,x_1,\dots,x_N$ such that $aN<\sum_{j=1}^Nh(x_{j-1},x_j)\le B_N$. Since $h$ is uniformly continuous on $\mathcal M\times\mathcal M$, there exists $\delta>0$ such that by choosing $y_j$ from a ball of radius $\delta$ centred at $x_j$, i.e. $y_j\in B(x_j,\delta)$, we still have $aN<\sum_{j=1}^Nh(y_{j-1},y_j)$. We estimate the probability of choosing such a realization $y_0,y_1,\dots,y_N$ and obtain a lower bound for $\mathbb P(S_N\ge aN)$:
\[ \mathbb P(S_N\ge aN)\ge\int_{B(x_N,\delta)}\cdots\int_{B(x_1,\delta)}\int_{B(x_0,\delta)}p(y_{N-1},y_N)\cdots p(y_0,y_1)\,d\mu(y_0)\,dy_1\cdots dy_N\ge\mu(B(x_0,\delta))\Big(\min_{x,y\in\mathcal M}p(x,y)\,\mathrm{vol}(B_\delta)\Big)^N. \]
Therefore, $I(a)<\infty$ as required.

Also, because $g_\theta$ is smooth, we can repeat the argument in Section 3.5.3.1 to obtain (3.45) for $Q_{\theta+it}$. That is, there are $\epsilon_\theta>0$ and $r_\theta$ such that $\|Q^2_{\theta+it}\|\le1-\epsilon_\theta$ for all $|t|>r_\theta$. Therefore,
\[ \|\mathcal L^N_{\theta+it}\|=\lambda(\theta)^N\|G_\theta Q^N_{\theta+it}G_\theta^{-1}\|\le\lambda(\theta)^N\|G_\theta\|\,\|Q^N_{\theta+it}\|\,\|G_\theta^{-1}\|\le C\lambda(\theta)^N(1-\epsilon_\theta)^{\lfloor N/2\rfloor}. \]
This establishes (B5). Since the rate in (B5) is exponential and Theorem 4.2.2 holds for $(0,B)$, we conclude that for all $r\in\mathbb N$, these Markov chains admit weak expansions for large deviations of order $r$ in the range $(0,B)$ for $\mathcal F^3_{r+1,B_+}$, where $B_+=\infty$ if $B=\infty$ and $B_+>B$ if $B<\infty$.

We need a stronger assumption on $h$ to establish $(\widetilde{\mathrm{B5}})$. Suppose:
\[ \text{for all }x,y,\text{ the critical points of }z\mapsto h(x,z)+h(z,y)\text{ are non-degenerate.} \qquad (4.17) \]
Since the critical points of $z\mapsto h(x,z)+h(z,y)$ are non-degenerate, we can use the stationary phase asymptotics in [48, Chapter VIII.2] to obtain
\[ \Big|\int_{\mathcal M}e^{it(h(x,z)+h(z,y))}p(x,z)p(z,y)e^{\theta(h(x,z)+h(z,y))}\,dz\Big|\le\frac{M}{|t|^{d/2}} \]
where $M$ is a constant and $d$ is the dimension of $\mathcal M$. Therefore, $\|\mathcal L^2_{\theta+it}\|\le\frac{M}{|t|^{d/2}}$. Choose $K=(2M)^{2/d}$. Then for all $|t|>K$, $\|\mathcal L^2_{\theta+it}\|\le\frac12$ and hence
\[ \|\mathcal L^N_{\theta+it}\|\le\|\mathcal L^{N-2}_{\theta+it}\|\,\|\mathcal L^2_{\theta+it}\|\le\Big(\frac12\Big)^{\lfloor(N-2)/2\rfloor}\frac{M}{t^{d/2}},\qquad|t|>K. \]
By convexity, $\lambda(\theta)>1$. Thus,
\[ \|\mathcal L^N_{\theta+it}\|\le M\lambda(\theta)^N\Big(\frac12\Big)^{\lfloor(N-2)/2\rfloor}\frac{1}{t^{d/2}},\qquad|t|>K. \]
This establishes $(\widetilde{\mathrm{B5}})$.

In particular, when $h$ depends only on one variable, i.e. $h(x,y)=H(x)$ for some $H$, we have that $h(x,z)+h(z,y)=H(x)+H(z)$. Then the condition (4.17) reduces to the critical points of $H$ being non-degenerate.

Again, because Theorem 4.2.2 holds for all of $(0,B)$ and the rate in $(\widetilde{\mathrm{B5}})$ is exponential, we conclude that these strongly ergodic Markov chains admit strong expansions for large deviations of all orders in the range $(0,B)$.

Chapter A: Appendix

A.1 Convergence of X.

We need some background information. Given a piecewise smooth function $g:\mathbb R^d\to\mathbb R$ of compact support, its Siegel transform is the function on the space of lattices defined by
\[ S(g)(L)=\sum g(w), \]
the sum running over
w∈L\{0} We need an identity of Siegel, see ( [38, Section 3.7] or [46, Lecture XV]) saying that ∫ EL(S(g)) = g(w)dw. (A.1) Rd In particular, if B is a set in Rd with piecewise smooth boundary not containing 0 then PL(L ∩B 6= ∅) ≤ P(S(1B)(L) ≥ 1) ≤ EL(S(1B)) = Vol(B). (A.2) sin(2πχ(w)) Proof of Lemma 2.1.2. Let L+ = {w ∈ L : y(w) > 0}. Since is even y(w) it is enough to restrict the attention to w ∈ L+. Throughout the proof we fix two numbers ε > 0, τ < 1 such that ε (1−τ) 1. It is easy to see using (A.2) and Borel-Cantelli Lemma that for almost every lattice 128 L C, there exists C and β such that y(w) > . It follows that ∑ ‖w‖ β sin 2πχ(w) ∑ e−||x(w)|| 2 ≤ 2εC||w||βe−||w|| y(w) w∈L+: ||x(w)||≥||w||ε w∈L+ converges absolutely. Hence it suffices to establish the convergence of ∑ X̄ sin 2πχ(w):= e−||x(w)||2 . y(w) w∈L+: ||x(w)||≤||w||ε 0, ∫ ∫ ∫ ∞ |(fk) (x)| dx = |e−γxγ fk(x)| dx ≤ e−γx dx = Cγ,1 <∞ −2 because 0 ≤ fk ≤ 1. 135 Since |f ′k| ≤ 5 on [k, k + 1], 0 ≤ fk ≤ 1 and fk is increasing on [−2, k],∫ ∫ k+1 |((f ′k)γ) (x)|dx = ∫ |γe −γxfk(x) + e −γxf ′k(x)| dx −2 k+1 ( ) ≤ γe−γx∫ fk(∫x) + e −γx|f ′k(x)| dx −2 k k ∫ k+1 ≤ γ −γx ′ −γx −γx −2 ∫e dx+ fk(x) dx+ (γe + 5e ) dx−1 kk+1 ≤ 1 + (5 + γ)e−γx dx = Cγ,2 <∞ −2 Also, note that |xlf l −γxk(x)| ≤ x e for all x ∈ [−2, k + 1]. Hence, ∫ ∫ ∞ |xlf l −γxk(x)| dx ≤ x e dx = Jγ,l <∞ −2 Put Jr(γ) = max Jγ,l and Cγ(r) = max{Jr(γ), Cγ,1, Cγ,2}. Then, Cγ(r) is finite and 1≤l≤r depends only on γ and r. Now, we have the following, 1. C1r+1((fk)γ) ≤ Cγ(r) for all k. 1 1 2. Since, tan−1(kx) + converges pointwise to 1[0,∞)(x), it is easy to see that π 2 fk → 1[0,∞) pointwise. 3. Since for each p, e−γzPp(z)fk(z) converges pointwise to e −γzPp(z)1[0,∞)(z), e−γz|P (z)|1 −γz −γzp [−2,∞) is integrable and |e Pp(z)fk(z)| ≤ e |Pp(z)|1[−2,∞), we can apply the LDCT to conclude, ∫ ∫ ∞ ∫ ∞ Pp(z)g −γz −γz k (z) dz = e Pp(z)fk(z) dz → e Pp(z) dz. −2 0 136 Bibliography [1] Bahadur, R. R., Ranga Rao R.; On Deviations of the Sample Mean. Ann. Math. Statist. 31 (1960), no. 4, 1015-1027. [2] Birkhoff, G. Extensions of Jentzschs theorem. Trans. Amer. Math. Soc. 85 (1957), no. 1, 219–227. [3] Bhattacharya, R. N., Ranga Rao R.; Normal Approximation and Asymptotic Expansions, first edition, John Wiley and Sons, 1976, xiv+274 pp. [4] Breuillard, E. Distributions diophantiennes et theoreme limite local sur Rd. Probab. Theory Related Fields 132 (2005), no. 1, 39–73. [5] Butterley, O., Eslami, P. Exponential mixing for skew products with disconti- nuities. Trans. Amer. Math. Soc. 369 (2017), no. 2, 783–803. [6] Bougerol, P., Lacroix, J.; Products of random matrices with applications to Schrödinger operators, Progress in Probability and Statistics, first edition, Birkhäuser Basel, Boston, 1985, xi+284 pp. [7] Chaganty, N. R., Sethuraman, J., Strong Large Deviation and Local Limit The- orems, Ann. Probab. 21 (1993), no. 3, 1671-1690. [8] Chebyshev, P. L. Sur le développement des fonctions à une seule variable. Bull, de l’Acad. Imp. des Sci. de St. Petersbourg 3(1) (1860), 193-200. [9] Coelho, Z., Parry, W. Central limit asymptotics for shifts of finite type. Israel J. Math. 69 (1990), no. 2, 235–249. [10] Cramér, H. On the composition of elementary errors. Skand. Aktuarietidskr. 1 (1928), 13–74; 141–180. [11] Cramér, H.; Random variables and probability distributions. Cambridge Tracts in Mathematics no. 36, Cambridge, 1937, 122 pp. 137 [12] Dolgopyat, D. A Local Limit Theorem for sum of independent random vectors, Electronic J. Prob. 21 (2016) paper 39. [13] Dolgopyat, D. 
On mixing properties of compact group extensions of hyperbolic systems. Israel J. Math. 130 (2002), 157–205. [14] Dolgopyat D., Fayad B. Deviations of ergodic sums for toral translations: Con- vex bodies, GAFA 24 (2014) 85–115. [15] Dolgopyat D., Fayad B. Limit theorems for toral translations, Proc. Sympos. Pure Math 89 (2015) 227–277. [16] Dolgopyat, D., Fernando, K. An error term in the Central Limit Theorem for sums of discrete random variables. preprint. [17] Dembo, A., Zeitouni O.; Large Deviations Techniques and Applications, second edition, Springer–Verlag Berlin Heidelberg, 2010, XVI+396. [18] Eskin A., McMullen C. Mixing, counting, and equidistribution in Lie groups, Duke Math. J. 71 (1993) 181–209. [19] Esséen, C.–G. Fourier analysis of distribution functions. A mathematical study of the Laplace-Gaussian law, Acta Math. 77 (1945) 1–125. [20] Feller, W.; An introduction to probability theory and its applications Vol. II., Second edition, John Wiley & Sons, Inc., New York-London-Sydney, 1971, xxiv+669. [21] Fernando, K., Liverani, C. Edgeworth expansions for weakly dependent random variables. arXiv:1803.07667 [math.PR]. [22] Götze F., Hipp C. Asymptotic Expansions for sums of Weakly Dependent Ran- dom Vectors, Z. Wahrscheinlickeitstheorie verw., 64 (1983) 211-239. [23] Gnedenko B.V., Kolmogorov A.N.; Limit distributions for sums of independent random variables, Trans. K.L. Chung, Revised edition, Addison-Wesley, 1968, ix+264 pp. [24] Gouëzel, S. Limit theorems in dynamical systems using the spectral method. Hyperbolic dynamics, fluctuations and large deviations, Proc. Sympos. Pure Math., 89 (2015) 161–193, AMS, Providence, RI. [25] Guivarc'h, Y., Hardy J. Théorèmes limites pour une classe de châınes de Markov et applications aux difféomorphismes d 'Anosov, Annales de l'I.H.P. Probabilités et statistiques, 24 (1) (1988) 73-98. [26] Hall, P. Contributions of Rabi Bhattacharya to the Central Limit Theory and 138 Normal Approximation. In Manfred Denker & Edward C. Waymire (Eds.), Rabi N. Bhattacharya Selected Papers, (pp 3–13). Birkhäuser Basel, 2016. [27] Hennion, H., Hervé, L.; Limit Theorems for Markov Chains and Stochastic Properties of Dynamical Systems by Quasi-Compactness, Lecture Notes in Mathematics, first edition, Springer-Verlag, Berlin Heidelberg, 2001, viii+125 pp. [28] Hebbar, P., Nolen, J. The asymptotics of solutions to parabolic PDE with peri- odic coefficients, preprint. [29] Hervé, L., Pène, F. The Nagaev-Guivarc'h method via the Keller-Liverani the- orem, Bull. Soc. Math. France 138 (2010) no. 3, 415–489. [30] den Hollander, F.; Large Deviations, Fields Institute Monographs 14, American Mathematical Society, Providence, RI, 2000, x+142 pp. [31] Ibragimov, I. A., Linnik, Y. V.; Independent and stationary sequences of ran- dom variables. With a supplementary chapter by I. A. Ibragimov and V. V. Petrov. Translation from the Russian edited by J. F. C. Kingman. Wolters- Noordhoff Publishing, Groningen, 1971, 443 pp. [32] Joutard, C. Strong large deviations for arbitrary sequences of random variables, Ann. Inst. Stat. Math. 65 (2013) no. 1, 49-67. [33] Kato, T.; Perturbation theory for linear operators, Classics in Mathematics, Reprint of the 1980 edition, Springer-Verlag, Berlin, 1995, xxii+619. [34] Kesten H. Uniform distribution mod 1, part I: Ann. of Math. 71 (1960) 445–471, part II: Acta Arith. 7 (1961/1962) 355–380. [35] Kleinbock D. Y., Margulis G. A. Bounded orbits of nonquasiunipotent flows on homogeneous spaces, AMS Transl. 171 (1996) 141–172. [36] Kleinbock D. 
Y., Margulis G. A. Logarithm laws for flows on homogeneous spaces, Invent. Math. 138 (1999) 451–494. [37] Liverani, C. Decay of correlations for piecewise expanding maps. J. Statist. Phys. 78 (1995), no. 3-4, 1111–1129. [38] Marklof J. The n-point correlations between values of a linear form, Erg. Th., Dynam. Sys. 20 (2000) 1127–1172. [39] Marklof J., Strombergsson A. The distribution of free path lengths in the pe- riodic Lorentz gas and related lattice point problems, Ann. Math. 172 (2010) 1949–2033. 139 [40] Nagaev S. V. More Exact Statement of Limit Theorems for Homogeneous Markov Chain, Theory Probab. Appl., 6(1) (1961) 62–81. [41] Nagaev S. V. Some Limit Theorems for Stationary Markov Chains, Theory Probab. Appl., 2(4) (1959) 378–406. [42] Pène F. Mixing and decorrelation in infinite measure: the case of periodic Sinai billiard, arXiv:1706.04461v1 [math.DS]. [43] Rubin H., Sethuraman J. Probabilities of moderate deviations, Sankhya Ser. A, 27 (1965) 325–346. [44] Rubin H., Sethuraman J. Bayes risk efficiency, Sankhya Ser. A, 27 (1965) 347–356. [45] Shah N. A. Limit distributions of expanding translates of certain orbits on ho- mogeneous spaces, Proc. Indian Acad. Sci. Math. Sci. 106 (1996) 105–125. [46] Siegel C. L.; Lectures on the geometry of numbers, Springer, Berlin, 1989. x+160 pp. [47] Sprindzuk V. G.; Metric theory of Diophantine approximations, Scripta Series in Math. V. H. Winston & Sons, Washington, D.C. 1979. xiii+156 pp. [48] Stein E. M.; Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals, Princeton University Press, first edition, 1993. xiv+695 pp. 140