ABSTRACT
Title of dissertation: MEANS AND AVERAGING
ON RIEMANNIAN MANIFOLDS
Bijan Afsari, Doctor of Philosophy, 2009
Dissertation directed by: Professor P. S. Krishnaprasad
Department of Electrical and Computer
Engineering, and Program of Applied
Mathematics and Scientific Computations
Professor K. Grove
Department of Mathematics
Processing of manifold-valued data has received considerable attention in re-
cent years. Standard data processing methods are not adequate for such data.
Among many related data processing tasks finding means or averages of manifold-
valued data is a basic and important one. Although means on Riemannian man-
ifolds have a long history, there are still many unanswered theoretical questions
about them, some of which we try to answer. We focus on two classes of means: the
Riemannian Lp mean and the recursive-iterative means. The Riemannian Lp mean
is defined as the solution(s) of a minimization problem, while the recursive-iterative
means are defined based on the notion of Mean-Invariance (MI) in a recursive and
iterative process. We give a new existence and uniqueness result for the Rieman-
nian Lp mean. The significant consequence is that it shows the local and global
definitions of the Riemannian Lp mean coincide under an uncompromised condi-
tion which guarantees the uniqueness of the local mean. We also study smoothness,
isometry compatibility, convexity and noise sensitivity properties of the Lp mean. In
particular, we argue that positive sectional curvature of a manifold can cause high
sensitivity to noise for the L2 mean which might lead to a non-averaging behavior
of that mean. We show that the L2 mean on a manifold of positive curvature can
have an averaging property in a weak sense. We introduce the notion of MI, and
study a large class of recursive-iterative means. MI means are related to an inter-
esting class of dynamical systems that can find Riemannian convex combinations.
A special class of the MI means called pairwise mean, which through an iterative
scheme called Perimeter Shrinkage is related to cyclic pursuit on manifolds, is also
studied. Finally, we derive results specific to the special orthogonal group and the
Grassmannian manifold, as these manifolds appear naturally in many applications.
We distinguish the 2-norm Finsler balls of appropriate radius in these manifolds
as domains for existence and uniqueness of the studied means. We also introduce
some efficient numerical methods to perform the related calculations in the specified
manifolds.
MEANS AND AVERAGING ON
RIEMANNIAN MANIFOLDS
by
Bijan Afsari
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2009
Advisory Committee:
Professor P. S. Krishnaprasad, Co-Chair and Advisor
Professor Karsten Grove, Co-Chair and Co-Advisor
Professor Pamela Abshire (Dean?s Representative)
Professor Ramalingam Chellappa
Professor Ricardo Nochetto
c?Copyright by
Bijan Afsari
2009
Dedication
This dissertation is dedicated to my parents.
ii
Acknowledgments
I feel privileged to have the opportunity to learn from two fine scientists:
Professors P. S. Krishnaprasad and Karsten Grove. Prof. Krishnaprasad introduced
me to geometry and geometrical thinking in studying many applied problems. His
methodology of thinking will have a lasting impact on me. I am also thankful for
his supports of all kind during the past several years, as well as for his patience in
dealing with a not-so-patient student like me. I have learnt a lot about Riemannian
geometry from Prof. Grove, and I have become an admirer of his thinking style.
I certainly enjoyed my trips to the University of Notre Dame and discussions with
him there. I have to thank my advisors for giving me the freedom and independence
to think about problems that I found interesting.
As far as this dissertation is concerned, realistically, after my advisors the next
entity to thank is the Google search engine, for obvious reasons!
I should also thank my Ph.D. committee members for their insightful com-
ments.
I cannot forget the helps and cares from the people in the math department
especially Alverda McCoy and Celeste Regaldo. Another group of people whom I
have to thank is the nice people of International Education Services (IES) office at
UMD.
Throughout my undergraduate studies in Sharif University, and my graduate
studies in Tehran Polytechnic and the University of Maryland, I have benefited
from attending many classes taught by excellent teachers. Among them I would
iii
like to thank Professors: H. Amindavar, M. Dehghan, M. Freidlin, H. Hashemi, M.
Jahanbegloo, M. Machedon, D. Madan, R. Nochetto, H. Sheikhzadeh, and A. Tits.
My list of friends who made my graduate life almost always enjoyable is too
long, all of whom I thank. Fortunately, the list of those who directly influenced
this dissertation is short: I thank Arash for his helps with LATEXand Pedram for
discussions about consensus algorithms. I also thank Ali Hirsa for several reasons.
I thank my family for their constant support. I cannot imagine finishing this
dissertation without the helps from brother Bahman.
This research was supported in part by the Army Research Office under the
ODDR&E MURI01 Program Grant No. DAAD19-01-1-0465 to the Center for
Communicating Networked Control Systems (through Boston University), by NSF-
NIH Collaborative Research in Computational Neuroscience Program (CRCNS2004)
NIH-NIBIB grant 1 R01 EB004750-1, and by NSF grants DMS-0204671 and DMS-
0706791.
iv
Table of Contents
List of Tables viii
List of Figures ix
1 Introduction 1
1.1 Scope and Main Contributions of this Dissertation . . . . . . . . . . . 3
1.2 Outline of this Dissertation . . . . . . . . . . . . . . . . . . . . . . . 6
2 Preliminaries from Riemannian Geometry 8
2.1 Introduction and Notations . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1 Some Notational Conventions . . . . . . . . . . . . . . . . . . 8
2.2 First Variation of Arc Length . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Structure of Convex Sets in Riemannian Manifolds . . . . . . . . . . 11
2.4 Bounds on the Hessian of the Distance Function and Its Powers . . . 16
2.4.1 A Useful Length Comparison Estimate . . . . . . . . . . . . . 22
2.5 Cartan-Alexandrov-Toponogov (C.A.T.) Comparison Theorems . . . 24
3 Existence and Uniqueness of the Riemannian Lp Mean 30
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.1 On the History of the Riemannian Center of Mass . . . . . . . 32
3.1.2 Contributions and Outline of the Chapter . . . . . . . . . . . 38
3.2 Existence and Uniqueness of the Riemannian Lp Means . . . . . . . . 39
3.2.1 Proof of Theorem 3.2.1 . . . . . . . . . . . . . . . . . . . . . . 40
3.2.2 The Case of an Arbitrary Probability Measure in M . . . . . . 51
3.3 Some Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4 Properties of the Riemannian Lp Mean 54
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1.1 Contributions and Outline of the Chapter . . . . . . . . . . . 54
4.2 The L? Center of Mass and the Minimal Ball . . . . . . . . . . . . . 56
4.2.1 Lipschitz Continuity of f?(x) . . . . . . . . . . . . . . . . . . 59
4.3 Smoothness Properties of the Lp Means . . . . . . . . . . . . . . . . . 60
4.4 Isometry Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . 64
v
4.5 Convexity Properties of the Lp Means . . . . . . . . . . . . . . . . . . 65
4.5.1 A Refined Result for Constant Curvature Manifolds . . . . . . 70
4.6 Sensitivity Properties of the L2 Mean . . . . . . . . . . . . . . . . . . 72
4.6.1 Lack of Averaging Property in Positively Curved Manifolds . . 74
4.6.2 Weak Averaging Property in Positively Curved Manifolds . . . 79
4.6.3 A General Bound for Sensitivity . . . . . . . . . . . . . . . . . 82
4.7 Computing the Lp Means and an Example for Points on a Sphere . . 85
4.7.1 Gradient Descent Flow for Finding ?xp (1 < p <?) . . . . . . 86
4.7.2 Gradient Descent Method for Finding ?xp . . . . . . . . . . . . 88
4.7.3 An Example for Points on the Sphere in R3 . . . . . . . . . . 89
4.8 Appendix: Some Facts About Convex Sets in Euclidean Space . . . . 92
5 Recursive-Iterative Approach in Defining Means 94
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.1.1 Contributions and Outline of the Chapter . . . . . . . . . . . 96
5.2 Convex Mean Functions and Their Finite and Infinite Compositions . 98
5.2.1 Finite Compositions of Strictly Convex Mean Functions . . . . 101
5.2.2 Infinite Compositions of Convex Vectorial Mean Functions . . 104
5.2.3 Higher Orders of Smoothness . . . . . . . . . . . . . . . . . . 108
5.2.4 Isometry Compatibility . . . . . . . . . . . . . . . . . . . . . . 109
5.2.5 Dynamical System Point of View . . . . . . . . . . . . . . . . 110
5.2.6 Relation to Infinite Products of Stochastic Matrices . . . . . . 113
5.3 Perimeter Shrinkage, Cyclic Pursuit and a Pairwise Mean . . . . . . . 116
5.3.1 Perimeter Shrinkage Scheme and a Pairwise-Iterative Mean . . 116
5.3.2 Cyclic Pursuit on Manifolds: Discrete-time Case . . . . . . . . 124
5.3.3 Cyclic Pursuit Schemes in Continuous-time . . . . . . . . . . . 127
5.4 Means Based on Mean-Invariance . . . . . . . . . . . . . . . . . . . . 129
5.4.1 Weighted Mean-invariance Means . . . . . . . . . . . . . . . . 131
5.4.2 Existence, Uniqueness and Other Properties . . . . . . . . . . 134
5.4.3 Generalized Weighted Mean-Invariance Means . . . . . . . . . 140
5.5 Discussion and Examples . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.5.1 Perimeter Shrinkage and Pairwise Mean: An Example . . . . . 143
5.5.2 Cyclic Pursuit on the Sphere . . . . . . . . . . . . . . . . . . . 146
5.5.3 An Example of GMI Means on the Sphere . . . . . . . . . . . 147
6 Results Specific to the Orthogonal Group and the Grassmannian 151
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
6.1.1 Contributions and Outline of the Chapter . . . . . . . . . . . 152
6.2 Riemannian Geometry of SO(n) and Gp(n) . . . . . . . . . . . . . . . 153
6.2.1 Riemannian Geometry of SO(n) . . . . . . . . . . . . . . . . . 154
6.2.2 Riemannian Geometry of Gk(n) . . . . . . . . . . . . . . . . . 158
6.3 Lp and MI Means in Riemannian Balls . . . . . . . . . . . . . . . . . 163
6.4 Large Non-ball Convex Domains and a Finsler Structure . . . . . . . 164
6.4.1 The Matrix 2-Norm as a Finsler Norm . . . . . . . . . . . . . 166
6.4.2 Hessian of the Distance Function in Symmetric Spaces . . . . 171
vi
6.4.3 2-norm Finsler Balls in SO(n) . . . . . . . . . . . . . . . . . . 174
6.4.4 2-norm Finsler Balls in Gk(n) . . . . . . . . . . . . . . . . . . 175
6.5 Local Riemannian Lp Mean in 2-norm Finsler Balls . . . . . . . . . . 177
6.6 Mean-Invariance Means in 2-norm Finsler Balls . . . . . . . . . . . . 179
6.7 Efficient Geodesic Interpolation and Extrapolation . . . . . . . . . . . 180
6.7.1 Efficient Geodesic Interpolation and Extrapolation in SO(n) . 181
6.7.2 Efficient Geodesic Interpolation and Extrapolation in Gk(n) . 186
6.8 Two Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
6.8.1 Nelder-Mead Direct Search Algorithm in SO(n) . . . . . . . . 191
6.8.2 Clustering of Subspaces . . . . . . . . . . . . . . . . . . . . . 195
7 Conclusions and Future Research Directions 199
Bibliography 203
vii
List of Tables
4.1 Pseudo-code for gradient descent to find Lp Mean . . . . . . . . . . . 89
5.1 Angular distance between different means in example in Section 5.5.1 146
6.1 Pseudo-code for dyadic geodesic interpolation and extrapolation. . . . 184
6.2 MATLAB c? code for dyadic and direct geodesic interpolation . . . . . 185
6.3 Comparison between two methods to constructs geodesics in SO(n) . 186
6.4 Pseudo-code for the Nelder-Mead algorithm on a Riemannian manifold193
6.5 Optimization problem on SO(n) solved by the Nelder-Mead algorithm195
6.6 Pseudo-code for the K-means algorithm on a Riemannian manifold . 196
6.7 Table showing the performance of the K-means algorithm in G3(20) . 198
viii
List of Figures
2.1 First variation of arc length . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 The setting to define Jacobi fields and geodesic linearization . . . . . 17
2.3 A figure depicting the C.A.T. comparison theorems . . . . . . . . . . 29
3.1 A configuration used to prove insideness part of Theorem 3.2.1 . . . . 45
3.2 A configuration used to prove the uniqueness part of Theorem 3.2.1 . 49
3.3 An example showing nonuniqueness of the L1 mean in large balls . . 53
4.1 An example of minimal ball of N = 5 points . . . . . . . . . . . . . . 58
4.2 Lack of strong averaging property of the L2 mean on a sphere . . . . 76
4.3 Weak averaging property of the L2 mean on a sphere . . . . . . . . . 80
4.4 Different Lp means of N = 4 points on the unit sphere in R3 . . . . . 91
5.1 Convex and strictly convex mean functions . . . . . . . . . . . . . . . 100
5.2 Dynamical systems point of view of recursive-iterative means . . . . . 112
5.3 An example of Perimeter Shrinkage scheme . . . . . . . . . . . . . . . 119
5.4 Example of Perimeter Shrinkage on the sphere . . . . . . . . . . . . 145
5.5 Example of cyclic pursuit on the sphere . . . . . . . . . . . . . . . . . 148
5.6 Example of different GMI means on the sphere . . . . . . . . . . . . . 150
ix
6.1 Riemannian and 2-norm Finsler balls in so(n) . . . . . . . . . . . . . 176
x
Chapter 1
Introduction
It is an irony of the history of mathematics that Gauss?s efforts in the analysis of
astronomical measurements led to the theory of statistics in Euclidean space rather
than to the theory of statistics in manifolds [65, p. 12]1. Gauss was interested in
predicting spherical quantities but since the measurements were concentrated in a
small region he approximate the sphere by its tangent space around that point of
concentration, and hence reached at the celebrated Linear Mean Square estimation
technique. In fact, the approach of linearization of a manifold in order to process a
data set which takes values on the manifold is the most natural approach, and is used
in everyday life. A very common example is the case of directional data, e.g, the
heading of a vehicle moving in a plane or the direction of wind with respect to the
north. In both cases the data values live on the unit circle S1, but are represented
by a single number ? in an interval such as [0,2?). One immediate problem is
1Of course, we should not ignore the fact that it was almost 30 years later than Gauss?s
initial calculations that mathematicians obtained a better understanding of the geometry of non-
Euclidean spaces. The efforts of Gauss and many others in this direction culminated in the 1854
introduction of Riemannian Geometry by his student B. Riemann. The interesting point is that
statistics in Euclidean space developed and flourished independently, while statistics in Riemannian
manifolds is still in its development stage. One reason is that Riemannian geometry itself was
developed mostly in the 20th century.
1
that mathematically there is no preferred reference point on S1 and it is not clear
that if we choose another point of reference how the result of our subsequent data
processing will be affected. In other words the choice of the point around which we
linearize the manifold as well as the actual linearization could affect our calculations.
Of course, with a smart linearization and especially if the data points are very close
to the point of linearization, we can get away with potential problems, as Gauss
did. However, a more systematic way is to consider the geometry of the underlying
manifold and try to define data processing tasks in an intrinsic fashion which depend
only on the geometry of the manifold and not specific linearizations.
In recent years there has been an increasing interest in geometric methods for
processing of manifold-valued due to both diverse applications and the understand-
ing that such methods can be beneficial. Manifold-valued data appear naturally
in many diverse areas such crystallography [44], diffusion tensor medical imaging
[12, 75], computer graphics [17, 83], robotics [74, 92], multi-input multi-out wireless
communications systems [64, 48], statistical analysis of shapes [62], array processing
[84, 85] and computer vision [87, 7]. The general approach to define geometric signal
processing tasks for such data has been to redefine or generalize the standard tasks
via appropriate geometrical concepts. A very basic signal processing task is to find
the mean or average of a set of data points on a manifold. The definition of the
weighted average via the Riemannian center of mass is an example of the mentioned
redefinition. Several other data processing tasks also have been redefined e.g., the
Principal Component Analysis as in [29]. Actual numerical implementation of these
geometrical methods requires specific algorithms for computations on Riemannian
2
manifolds which also have received considerable attention from statistics, signal pro-
cessing and numerical analysis communities e.g., [84, 26, 2, 30, 45, 61, 3]. We should
also mention several interesting Ph.D. dissertations such as [10],[56] and [6] which
study statistics and data processing on Riemannian manifolds.
1.1 Scope and Main Contributions of this Dissertation
Although, there has been a great number of interesting works related to processing
of manifold-valued data, still the literature on this subject lacks many fundamental
pieces and there are many un-answered theoretical and applied questions in this
regard. The effort of this dissertation is to answer some of these questions which
relate to the notion of averaging of data points on a Riemannian manifold 1. For
example, the notion of L2 center of mass or Riemannian L2 mean which dates back
to E. Cartan has been used extensively in applied research. The definition Cartan
used for such a mean was a global definition which in a manifold of non-positive
curvature gives a well defined unique center of mass for a probability measure with
compact support. On the other hand, so far (with the exception of [61] and [17])
available existence and uniqueness results for the Riemannian L2 mean on manifolds
of positive curvature have been about the local L2 mean or center of mass [34, 50, 52].
In Chapter 3 of this dissertation, we give an existence and uniqueness result for the
global Riemannian mean, which in particular shows the equivalence of the local and
global definitions under an uncompromised condition (see Section 3.1.1 for more
1The focus of this dissertation is on the so-called intrinsic means and intrinsic averaging on
Riemannian manifolds. The other approach which is called the extrinsic or embedding approach
has been popular in the statistics community and is much more studied [65]. We shall touch upon
the notion of extrinsic mean in few places such as Sections 5.5.1 and 6.7.
3
details)1. We treat the larger class of Riemannian Lp(1?p??) means together
and in particular we pay special attention to the L? center of mass (see Section 4.2).
Another theoretical question is the relation between the Riemannian center of mass
of a set of points and the convex hull of the set. In Section 4.5 we address this issue in
detail, and answer a related question by [32]. Another fundamental question is about
the nature of the operation of averaging in a Riemannian manifold. By this we mean,
to what extent averaging reduces the effect of noise? Above all the main reason to
average a set of data points is to reduce the effect of noise! In Section 4.6, we define
different forms of averaging properties for the Riemannian L2 mean (see Definition
4.6.1), and show that in a manifold of positive curvature averaging might in some
occasions, unfortunately, amplify the noise! This is the direct consequence of positive
curvature and it seems to be unavoidable. However, we show that in a small enough
domain in a manifold of nonnegative curvature the L2 mean can have an averaging
property in a weak sense (see Theorem 4.6.1). Although the Riemannian L2 mean
is the most popular mean used, interestingly, there are many other possibilities for
defining the notion of mean in a Riemannian manifold. For example in [6] and
[5] two different forms of mean have been introduced which have certain desirable
properties. In particular, in [5], for points on the cone of positive definite matrices,
a constructive recursive-iterative approach has been given to define the geometric
mean of N points based on the mean of N ?1 points, and recursively down to
N = 2 points. In Chapter 5, we analyze a wide class of means which are defined in a
1Although the intended goal of current study has been data analysis on manifolds, some of the
derived results and specifically Theorems 3.2.1 and 4.5.1 might be of broad interest to researchers
in Riemannian geometry, as well.
4
similar or more general recursive-iterative manner. In particular, we give existence
and uniqueness results for such means. Our results are more general than those
obtained in [5], since we introduce and use different tools. In particular, we introduce
the notion of a primitive mean function, which generalizes the notion of a primitive
stochastic matrix (see Section 5.2 and Definition 5.2.3). The study of such recursive-
iterative means leads to the analysis of an interesting class of dynamical systems
that find Riemannian convex combinations. We use the diameter of the smallest
closed ball containing the points comprising the state as a Lyapunov function. We
call that ball the minimal ball (a.k.a. circumscribed ball, see Definition 4.2.1). Such
a system finds the mean of the set of initial data points by iteratively replacing
each data point with a ?convex combination? of some or all data points and the
state of the system converges to a single point which will be the mean of the initial
state. In each step the mean of the points which comprise the state of the system
remains unchanged. We call this property Mean Invariance (MI). A particular class
of the mentioned means is called the pairwise mean (see Section 5.3), which can
be implemented efficiently in certain symmetric (matrix related) manifolds such as
the special orthogonal group SO(n) and the Grassmannian Gk(n). In Chapter 6
we give more specific (i.e., stronger) results for these two manifolds. In particular,
we place emphasis on identifying large domains (compared with Riemannian balls,
see Section 6.4 and [36] for more details) for the existence and uniqueness of the
means introduced in the previous chapters. Such larger domains are not balls in
the Riemannian metric but rather balls in a Finsler metric which is derived from
the matrix 2-norm (instead of the trace or Frobenius norm) in the Lie algebra
5
of SO(n). We also introduce some efficient techniques to construct (and extend)
the geodesic between two points in SO(n) or Gk(n) without the use of matrix
exponential and logarithm -which are computationally very demanding-. We apply
the derived methods to two applications.
1.2 Outline of this Dissertation
Each of the subsequent chapters (except for Chapter 2) in this dissertation has its
own introduction, contribution and outline sections (see Sections 3.1.2, 4.1.1, 5.1.1
and 6.1.1 for more details). Here, we briefly give an outline of the chapters for
ease of use and navigation. Throughout the text, it is assumed that the reader is
familiar with the basics of Riemannian geometry at the level of a first year graduate
course on the subject. Still, in Chapter 2 we have collected some less commonly
used results, mostly from global Riemannian geometry, which are critical for our
later developments. In particular, results on the structure of convex sets, bounds
on the Hessian of the distance function and triangle comparison theorems are very
essential for our developments, and are collected in this chapter. In Chapter 3 after
some historical background, Theorem 3.2.1 is proved which gives a new existence
and uniqueness result for the Riemannian Lp mean. In Chapter 4 further properties
of the Riemannian Lp mean such as smooth dependence on data points, isometry
compatibility, convexity, and averaging properties are studied. Also a simple ex-
ample is given regarding numerical calculation of the mean. In Chapter 5 a large
class of recursive-iterative means based on the notion of Mean-Invariance (MI) is
studied. Interesting relations to the Birkhoff Curve Shortening Scheme, Perron-
6
Frobenius theorem, consensus algorithms [71] and cyclic pursuit on manifolds are
also unveiled. In Chapter 6, after reviewing the geometries of SO(n) and Gk(n)
we study existence and uniqueness of different means in Riemannian and 2-norm
Finsler balls; and we also derive some efficient geodesic calculations which can be
used to efficiently calculate the MI means. At the end of this chapter we also give
two applications in which our methods fit very well. We conclude the dissertation
in Chapter 7 with a summary of the dissertation, and discussions on open problems
and possible future research directions.
7
Chapter 2
Preliminaries from Riemannian Geometry
2.1 Introduction and Notations
In this chapter we briefly review some specific ingredients needed from Riemannian
geometry to deal with our problems of interest. We assume the reader is familiar
with basic concepts of Riemannian geometry. We state theorems with references for
their proofs. Also since there is little consensus about the exact definition of some
terms, especially those related to convexity, we take the opportunity to set forth the
definitions.
2.1.1 Some Notational Conventions
Throughout this manuscript M is an n-dimensional (or, if specified, m-dimensional)
complete Riemannian manifold equipped with a Riemannian metric g. We might
write Mn to show the dimension of M. We only deal with the unique Levi-Civita
connection derived from g. We denote the tangent space of M at q ?M by TqM.
We denote the Riemannian metric at q by gq. Recall that gq(.,.) : TqM?TqM ?R
is a positive definite inner product which changes smoothly with q. Usually, instead
8
of gq(X,Y) which indicates the inner product of two tangent vector X,Y ?TqM we
write?X,Y?q and also when the base point q is understood unambiguously we omit
the subscript and write ?X,Y?. Similarly and for convenience instead of gq(X,X)
we writebardblXbardblq and evenbardblXbardbl. We call a unit-speed geodesic a normal geodesic. Such
a geodesic is parameterized by its arc-length. By a minimizing, a minimal geodesic
or a segment between two points we mean a geodesic whose length is equal to the
distance between the two points. The letter B is reserved for metric balls and a ball
B(o,?) is an open ball with center o?M and radius ?. We also denote a ball of
radius ? in TqM with center 0?TqM by B(0,?). Unless otherwise stated, by a ball
we mean an open ball. ?A, intA and ?A denote the closure, interior and boundary
of the set A?M, respectively. The exponential map of M at q?M is denoted by
expq : TqM ?M and whenever it is invertible its inverse is denoted by exp?1q . We
assume the injectivity radius of M to be nonzero; and we denote it by injM. Note
that expq is invertible on any ball B(q,?) with ? < injM. We denote the curvature
tensor of g by R. At point q ? M, Kq(X,Y) = gq(X,R(X,Y)Y) is the sectional
curvature of the section spanned by orthonormal tangent vectors X,Y ?TqM. If a
is a matrix or a vector aT denotes its transpose. We usually use subscripts and less
often superscripts to denote indices of a sequence of points in a set. We also denote
a sequence whose kth term is xk by?xk??k=1 and when there is no confusion by?xk?k
or?xk?.
9
2.2 First Variation of Arc Length
The calculation of the first variation of arch length functional is the standard method
to characterize geodesics as stationary points (curves) of the arch length functional
(e.g., [18, p.74]). Without going into further details we mention, for future use, the
following useful immediate consequence of this calculation.
Proposition 2.2.1. Consider two points q1,q2 ?M with d(q1,q2) < injM and let
? : [0,1]?M be the minimizing geodesic connecting the two points with ?(0) = q1
and ?(1) = q2. Let c : (??,+?) ? M with c(0) = q2 be a smooth curve passing
through q2 such that it makes an acute angle with ? at q2, i.e., the angle between
velocity vectors ???(1) and ?c(0) is less than ?2. Then, there exists ? > 0 such that
for all 0 < t < ?, d(q1,c(t)) < d(q1,q2).
The angle condition ensures that the arc length functional for curves con-
necting q1 to c(t) is strictly decreasing for small t > 0. Figure 2.1 explains the
proposition. We will use this result in few occasions.
c(0) = q2
q1
c(t)
? < ?2 ?d(q1,c(t)) < d(q1,q2), 0 < t < ?
?
Figure 2.1: The first variation of arc length formula implies that, pro-
vided ? < ?2, by moving along c(t) the distance to q1 decreases at least
for small times.
10
2.3 Structure of Convex Sets in Riemannian Manifolds
In order to study means in a Riemannian manifold M in the subsequent chapters,
we need to have a good understanding of the structure of convex sets and balls in
a Riemannian manifold. Theorems 2.3.1 and 2.3.2 below contain the main facts we
need. We also prove some simple but useful corollaries of these theorems. For sake
of completeness we recall some definitions first.
Definition 2.3.1 ([78, p.168]). Let C be a non-empty subset of a Riemannian
manifold M.
1. C is strongly convex if for any two points q1,q2 ? C, there exists a unique
minimizing geodesic ? in M connecting q1 and q2, and ? is contained in C.
2. If for any q ? ?C, the closure of C, there exists an ?(q) > 0 such that C?
B(q,?(q)) is strongly convex, then C is said to be locally convex.
If C is strongly convex, then it is connected and locally convex. However, local
convexity does not imply strong convexity. For example, on the unit sphere in R3,
a closed hemisphere is locally convex but it is not strongly convex; since any two
antipodal points on the boundary can be connected by two minimizing geodesics
which lie entirely in the hemisphere. Small enough balls are strongly convex and
the injectivity radius and upper bound on the sectional curvature of M determine
a lower bound on the radius of largest strongly convex balls ([76, p. 177] and [18,
p.404]):
11
Theorem 2.3.1. Let M be a complete Riemannian manifold with injectivity radius
of injM and sectional curvature upper bound of ?. Let
rcx := 12 min{injM, ???}, (2.1)
we interpret 1? when ??0 as?1. Then for any o?M the open ball B(o,?) with
??rcx and closed ball B(o,?) with ? < rcx are strongly convex, and the distance
function xmapsto?d(o,x) is strictly convex along non-radial directions2.
This theorem and the definition of rcx are very important to us, as Theorem
3.2.1 which describes existence and uniqueness conditions concerns with strongly
convex balls. We shall use the following simple corollary:
Corollary 2.3.1. A closed ball of radius ? < rcx is strongly convex; and the interior
of the geodesic segment connecting any two points on the boundary lies in the
interior of the ball.
Proof. Since B(o,?) ? B(o,??) for ? < ?? < rcx; and since x mapsto? d(o,x) is strictly
convex in B(o,??) along non-radial directions, it follows that if q1negationslash= q2 are two points
on the boundary of B(o,?) and ? : [0,1]?M is a geodesic connecting them (with
?(0) = q1,?(1) = q2), then d(?(t),o) < ? for t?(0,1).
One can show that the intersection of any family of strongly convex (locally
convex) sets in a complete Riemannian manifold is strongly convex (locally convex).
1We shall follow this convention in all similar bounds we derive.
2Recall that the exponential map expo : ToM ?M is a radial isometry, therefore d(o,?(1
2)) =d(o,?(0))+d(o,?(1))
2 for a geodesic ? : [0,1]?M connecting o to x = ?(1) any x inside the injectivityball of o.
12
We are mainly interested in strongly convex sets, so we define:
Definition 2.3.2. Let M be a complete Riemannian manifold. Let A?M lie in a
strongly convex subset of M. The convex hull of A denoted by ConvHull(A) is the
intersection of all strongly convex sets containing A.
Definition 2.3.3 ([76] p.145). A submanifold S of the complete Riemannian man-
ifold M is called totally geodesic if geodesics in S are also geodesics in M and any
geodesic in M which is tangent to S at some point lies in S for some time. S is
globally totally geodesic if such a geodesic stays in S for all time.
According to this definition, a strongly convex open ball in M is a totally
geodesic submanifold. This holds true for more general convex sets as the following
theorem suggests. We state some results from [19] which are relevant.
Theorem 2.3.2 (Cheeger and Gromoll). Let C be a connected locally convex subset
of a Riemannian n-dimensional manifold M. Then the closure of C, C, carries the
structure of an embedded k dimensional (0?k?n) topological submanifold N of
M with smooth totally geodesic interior intC and (possibly non-smooth or empty)
boundary ?C. For each x?C define the tangent cone
Cx ={X?TxM|expxparenleftbigt XbardblXbardblparenrightbig?N for some positive t}?{0?TxM}. (2.2)
Then Cx\{0} is a k-dimensional open convex cone in TxM. In case x ? intC,
Cx = TxintC. In particular, if there is a point q ? intC and a normal minimal
geodesic ? : [0,d(x,d(x,q)]?C from x??C to q such that the distance between
13
q and ?C is equal to d(x,q) then Cx\{0}is a half-space. Moreover, if q?intC and
? : [0,d(x,q)]?M is the normal minimizing geodesic connecting x??C to q, then
?((0,d(x,q)])?intC.
The term?interior? here isnot meant initstopological meaning; asforexample
a line segment in R3 has no topological interior yet it is closed and convex in R3.
It is used in the same sprit as the ?relative interior? or ?interior? of a convex set in
Euclidean space (see Appendix 4.8 for definitions). Therefore, we have the following
useful corollary:
Corollary 2.3.2. Let x be point on the boundary of B(o,?) ? M with ? < rcx.
Then Bx\{0} ? TxM is an open half-space and there exists a vector nx ? Cx
pointing inward B such that for any point q?B the (velocity vector of the) minimal
geodesics connecting connecting x to q makes an angle less than ?2 with nx.
Proof. Note that since Bx\{0} is an open convex set it must have a supporting
hyperplane at 0 ? TxM and there should be a vector nx ? TxM pointing inward
the cone which makes angles less than ?2 with all vectors pointing inside the cone
Bx\{0}(see Appendix 4.8).
The convex hull of a set is not necessarily closed. Moreover, in contrast to the
Euclidean case, it is unknown whether the convex hull of a set of finite points in
a Riemannian manifold of non-constant curvature is closed [9, p. 231]. Therefore,
when dealing with points on the boundary of the convex hull of a set we need to
make sure that the closure of the convex hull is strongly convex.
14
Proposition 2.3.1. Let C ?M be a strongly convex set such that C belongs to
another bounded strongly convex set U. Then C is strongly convex. Therefore,
if A ? M is such that the closure of its convex hull, ConvHull(A), belongs to a
bounded strongly convex set, then ConvHull(A) is strongly convex.
Proof. It suffices to show that the unique minimal geodesic segment connecting
q1,q2 ? ?C belongs to C. Note that C is compact. Let ?qk1?k and ?qk2?k be two
sequences in intC with limits q1 and q2, respectively. Consider constant speed min-
imizing geodesics ?k : [0,1] ? intC such that ?k(0) = qk1 and ?k(1) = qk2. Note
that the speed of ?k is equal to d(qk1,qk2). Therefore, speeds of these geodesics are
uniformly bounded; and hence the sequence ?k becomes an equicontinuous sequence
on C. By the Arzela-Ascoli theorem we can extract a subsequence which converges
uniformly to a continuous curve ? : [0,1]? ?C. We know from the theory of length
spaces (see e.g., [15, Proposition 2.5.17 on p. 48]) that ? is a shortest continuous
curve connecting q1 and q2 in U. Therefore, due to strong convexity of U, ? must
be the unique minimizing geodesic segment connecting q1 and q2.
Note that on the unit sphere S2 ? R3 an open hemisphere is the largest
strongly convex set and its closure is not strongly convex. Therefore, the condi-
tion on C to belong to another strongly convex set is necessary to guarantee the
conclusion.
15
2.4 Bounds on the Hessian of the Distance Function and Its Powers
It is obvious that the convexity properties of the Riemannian distance function are
closely related to existence and uniqueness properties of the Lp Riemannian means.
Therefore, we review the bounds on the Hessian of the distance function and its
powers. The gist of these comparison results is that in an appropriate comparison
setting, as curvature increases the Hessian of the distance function (or any power of
it) decreases. Recall that the Hessian of f : M ?R at x?M is the symmetric bi-
linear form Hessfvextendsinglevextendsinglex : TxM?TxM ?R, such that Hessfvextendsinglevextendsinglex(??(t), ??(t)) = d2dt2f(?(t)),
where ? : R ? M is any smooth curve or geodesic passing through x at t. The
treatment presented here is a standard one and is borrowed collectively from [78,
50]. Figure 2.2 shows the framework. Let q ? M be a given point and x mapsto?
d(q,x) the distance function from that point. For convenience, we also denote this
function by dq(x). Consider ?(?) a curve (or possibly a geodesic) such that at
specific time t, ?(t) is in the injectivity ball (or domain) around q, i.e., there is a
unique minimizing geodesic connecting q to ?(t). The goal is calculate the second
derivative of f(?(t)) = dq(?(t)). Consider the family of geodesics c(s,t) = ct(s) such
that for each t, ct(s) : [0,d(q,?(t))]?M is a normal (i.e., unit speed) minimizing
geodesic connecting q to ?(t). Denote the covariant derivative along ct(s) by D?s.
Also we denote differentiation with respect to t by ? and with respect to s by ?.
Then s mapsto? ct(s) satisfies the geodesic equation for every t, i.e., D?sc?t(s) = 0. The
geodesic equation is a second order nonlinear differential equation. In order to
linearize it around smapsto?ct(s) one finds a linear differential equation for the derivative
16
Jt(s) = ?c(s,t)?t ?Tct(s). smapsto?Jt(s) is called the Jacobi field along geodesic smapsto?ct(s).
The Jacobi field satisfies the Jacobi equation:
D2
?s2Jt +R(Jt,c
?)c? = 0, (2.3)
where R is the curvature tensor of M. There are two initial conditions Jt(0) and
J?t(0) associated with this equation. Note that the Jacobi equation is invariant
?c?(dq(x),t)
q = c(0,t)
c(s,t)
??(t)
J(s,t)
t
s
x = ?(t) = c(d(q,x),t)
J(d(q,x),t)
?(?)
?c||(d(q,x),t)
Figure 2.2: For each t the curve smapsto?c(s,t) is a normal minimal geodesic connecting
q = c(0,t) to ?(t), where ?(?) is a curve passing through x?M. J(s,t) = ??tc(s,t)
is the associated Jacobi vector at c(s,t). At each point on the geodesic smapsto?ct(s),
J(s,t) has two components: one parallel to ct(s), denoted by Jbardbl(s,t) and the other
perpendicular to that, denoted by J?(s,t). These two components are not shown
in the picture, but the two mentioned directions are shown.
under linear reparametrization of the geodesic smapsto?ct(s). In many occasions, it is
more convenient to assume that the original geodesic corresponds to t = 0; and
as t deviates from 0, ct(s) deviates from c0(s). We also omit the subscripts from
Jt and from c0(s). In order to be able to compare c(s) with a nearby geodesic
leaving q but with a different small velocity vector we need to assume J(0) = 0.
17
At any point c(s) we can decompose J(s) ? Tc(s)M to a component along the
geodesic smapsto?c(s) denoted by Jbardbl(s) and another one perpendicular to the geodesic
denoted by J?(s). The parallel component is not so interesting since it can be
computed explicitly as Jbardbl(s) = (a + ts)c?(s) for constants a,b ?R. However, the
behavior of J?(s) depends on the curvature and is more interesting. It is possible
to show that ?J?(s),c?(s)? = ?J?(0),c?(0)?s +?J(0),c?(0)? (see e.g., [24, p. 118]).
Therefore, once we have J(0) = 0 and choose J?(0) perpendicular to c?(0), J(s)
remains perpendicular to c?(s). A conjugate point of q along geodesic c(s) is a point
qc = c(sc) and not equal to q, such that J(sc) = 0 for some initial conditions of
J?(0). Let qs1 be the first such a point. A crucial fact is that for s?(0,s1) and any
X ?Tc(s)M one can find a unique Jacobi field with J(0) = 0 such that J(s) = X,
where X is a given tangent vector in Tc(s1)M ([78, Lemma 2.4 p. 37]). Now let
x?M\parenleftbigCq?{q}parenrightbig where Cq is the cut locus of q, i.e., the set of cut points of q1.
For such an x let cs : [0,d(q,x)] ? M be the unique minimizing normal geodesic
connecting q to x, i.e., c(0) = q and c(d(q,x)) = x. For every X?TxM we can find
a Jacobi field J(s) along c(s) with J(d(q,x)) = X with J(0) = 0. Then one can
show that (see e.g., [78, Lemma 4.10 p. 109])
Hessdq|x(X,X) =?J??(dq(x)),J?(dq(x))?. (2.4)
Note that J?? = (J?)? and if J(dq(x)) = X is parallel to c(dq(x)) then J?(dq(x)) = 0
and Hessdq|x(X,X) = 0. This means that the Hessian of x mapsto? dq(x) is indefinite
1Recall that a point z?M is a cut point of q along a geodesic ? emanating from q, if z is the
first point along ? after which ? ceases to be length minimizing.
18
along the geodesic connecting q to x although it might be positive definite along
other directions. Equation (2.4) shows why we needed Jacobi fields to describe the
Hessian of the distance function.
Combining (2.3) and (2.4) is not straightforward. But one can find lower and
upper bounds on the Hessian of x mapsto?dq(x) based on the upper and lower bounds
on the sectional curvatures of M, respectively. For that we need to solve (2.3) for
constant curvature. If the sectional curvature is a constant ?, then (2.3) reads as
D2
ds2J
?(s) +?J?(s) = 0. (2.5)
As a matter uniform notations it is convenient to define:
sn?(s) =
??
???
???
???
????
1?
? sin(
??s) ? > 0
s ? = 0
1?
|?| sinh(
radicalbig|?|s) ? < 0
, cs?(s) =
??
???
???
???
????
cos(??s) ? > 0
1 ? = 0
cosh(radicalbig|?|s) ? < 0
.
(2.6)
It is easy to see that the solution to (2.5), can be written as J?(s) = J?(0)cs?(s) +
(J?)?(0)sn?(s).
Let ? ? K ? ? be the lower and upper bounds on the sectional curva-
tures of M. One can bound ?J??(s),J?(s)? in terms of ?(J?)??(s),(J?)?(s)? and
?(J?)??(s),(J?)?(s)? and this results in the following (see e.g., [78, Lemma 2.9 p.
153]):
Theorem 2.4.1. If x?B(q,?), where ? < min{injqM, ???}if ? > 0 and ? < injpM
19
if ??0, then
cs?(dq(x))
sn?(dq(x))?X
?,X???Hess dqvextendsinglevextendsingle
x(X,X)?
cs?(dq(x))
sn?(dq(x))?X
?,X??, (2.7)
where X? is the component of X ?TxM orthogonal to the geodesic connecting q
to x, which is the same as the direction of the gradient of xmapsto?dq(x) at x.
From this theorem we observe that the upper curvature bound is the crucial
factor in determining the lower bound on the Hessian (and its positiveness) of the
distance function. In particular, if M is of non-positive curvature, then xmapsto?dq(x) is
convex within injectivity balls around q. In addition, if M is simply connected and
hence its injectivity radius is infinity, then the distance function in M is globally
convex. Along directions other than the direction of the geodesic connecting q
to ?(t), the Hessian of t mapsto? dq(?(t)) is positive definite and the distance function
is strictly convex. Note that the excluded or degenerate direction is exactly the
direction of the gradient of x mapsto? dq(x) at x = ?(t). We recall the same property
for the distance function in Rn. If M is of constant curvature ?, then the lower
and upper bounds are equal, hence the Hessian of xmapsto?dq(x) is Hessdq|x(X,X) =
cs?(dq(x))
sn?(dq(x))?X
?,X??for X?TxM.
Let Sn? be a simply connected complete Riemannian manifold of constant cur-
vature ? and dimension n = dimM. It is well-known that these conditions de-
termine Sn? together with a Riemannian structure uniquely, up to isometry [78,
p.137]. Therefore, if ? > 0 we can assume that Sn? is the standard sphere in Rn+1
with radius 1?? whose injectivity radius is ???. When ??0, as mentioned before,
20
the injectivity radius is infinity. Now again consider our manifold M with upper
and lower curvature bounds of ? > 0 and ? ? 0, respectively. The correspond-
ing spaces of constant curvature are (Sn?,?g) and (Sn? ,??g) with distances functions
?d and ??d. Consider points q,x ? M with d(q,x) < ? as in Theorem 2.4.1. Also
consider pairs of points ?q,?x?Sn? and ??q, ??x?Sn? , counterpart to q and x, such that
d(q,x) = ?d(?q, ?x) = ??d(??q, ??x). Let ?,?? and ??? be unit speed geodesics passing through x,
?x and ??x, respectively; such that the angles between each of them and the geodesics
connecting q (?q and ??q) to x (?x and ??x) are equal. Then we write:
Hess?d?q ?Hessdq ?Hess??d??q (2.8)
meaning that:
Hess?d?qvextendsinglevextendsingle??(t)(???(t), ???(t))?Hessdq(??(t), ??(t))?Hess??d??qvextendsinglevextendsingle???(t)(????(t), ????(t)). (2.9)
We have this corollary:
Corollary 2.4.1. Let M be a complete Riemannian manifold of upper bounds on
sectional curvature ?. Define f(x) = 1pdp(q,x) which we also write as f(x) = 1pdpq(x).
Let ? : R?M be a unit speed geodesic. Assume the angle between the geodesic
connecting q to x = ?(t) and ?(t) be ? then
df
dt(?(t)) = d
p?1(q,?(t))?cos? (2.10)
21
and
d2f
dt2 (?(t))?(p?1)?d
p?2(q,?(t))?cos2 ?+dp?1(q,?(t))?cs?d(q,?(t))
sn?d(q,?(t))?sin
2 ? (2.11)
The latter is equivalent to
Hessf|q ?Hess ?f|?q. (2.12)
If M ?Sn? we have equality in the above.
Proof. We just need to calculate the first and second order derivatives of f(?(t)) =
1
pd
p
q(?(t)) by the chain rule:
df
dt(?(t)) = d
p?1
q (?(t))
d
dtdq(?(t)) = d
p?1
q ??dq(?(t)), ??(t)? (2.13)
from which we have:
d2f
dt2 (?(t)) = (p?1)d
p?2
q (?(t))??dq(?(t)), ??(t)?
2 +dp?1
q (?(t))
d2
dt2dq(?(t)). (2.14)
Now the claims follow from Theorem 2.4.1.
2.4.1 A Useful Length Comparison Estimate
Let q,x,y ? B(o,?) ? M where ? < rcx and assume X,Y ? TqM are such that
x = expq X and y = expq Y. Assume that M has lower and upper bounds of ??0
and ? ? 0 on its sectional curvature. We want to find upper and lower bounds
on d(x,y) in terms of bardblX ?Ybardbl and the curvature bounds. Let ? : [0,1] ? M
22
be any curve in B(o,?) connecting x to y such that ?(0) = x and ?(1) = y. Let
ct(s) : [0,1]?M, for fixed t, be a geodesic connecting q to ?(t) such that ct(0) = q
and ct(1) = ?(t). Denote by J(s,t) the Jacobi field along ct(s) with J(0,t) = 0
and J?(0,t) arbitrary. We are using the same conventions established before in this
section. From Rauch?s comparison theorem (e.g., [18, p. 390]) we have
sn?l2t(s)bardblJ?(0,t)bardbl?bardblJ(s,t)bardbl?sn?l2t(s)bardblJ?(0,t)bardbl, (2.15)
where lt = d(q,?(t)) (see (2.6)). The above holds for arbitrary J?(0,t) since we as-
sumed ??0 and ??0, otherwise it would only hold for J?(0,t) orthogonal to c?t(0)
[18, p. 390]. Note that since ?(t) with velocity ??(t) = J(1,t) is a curve connecting
x to y we have d(x,y)?integraltext10 bardbl??bardbl= integraltext10 bardblJ(1,t)bardbldt. Let J(t,s) be a particular Jacobi
field such that J?(0,t) = Y ?X, then we have d(x,y)? maxt sn?l2t(1)?bardblY ?Xbardbl.
The best upper bound estimate for lt is twice the radius of the ball i.e., lt < 2?. So
we have
max
t
sn?l2t(1)? 12?radicalbig|?|sinh2?
radicalbig
|?|. (2.16)
Now let ? : [0,1]?M be the geodesic connecting x to y. We havebardbl??(t)bardbl= d(x,y).
Back in TqM there is a curve ? : [0,1]?TqM with velocity vector ??(t) = J?(0,t)
which induces the Jacobi field J(t,s) with bardblJ(1,t)bardbl = d(x,y). Now ? is a curve
connecting X to Y in TqM, therefore
bardblX?Ybardbl?
integraldisplay 1
0
bardbl??bardbl=
integraldisplay 1
0
bardblJ?(0,t)bardbldt? 1min
t sn?l2t(1)
d(x,y). (2.17)
23
Note that since the the distances d(q,x) and d(q,y) can be larger than rcx we have
no restriction on lt based on convexity of the distance function; that is we cannot
conclude that lt?max{bardblXbardbl,bardblYbardbl}. However, as before we have lt < 2?. As a result
we have
min
t
sn?l2t(1)? 12??? sin2???. (2.18)
Therefore, in a manifold of nonnegative curvature with curvature bounded from
above by ? > 0 we have:
sin2???
2??? ?bardblX?Ybardbl?d(x,y)?bardblX?Ybardbl (2.19)
and in a manifold of non-positive curvature with curvature bounded from below by
? < 0 we have:
bardblX?Ybardbl?d(x,y)? sinh2?
radicalbig|?|
2?radicalbig|?| ?bardblX?Ybardbl. (2.20)
We will use these estimates in Section 4.6.2 when we consider the sensitivity prop-
erties of the L2 mean.
2.5 Cartan-Alexandrov-Toponogov (C.A.T.) Comparison Theorems
The material presented here can be found collectively in [15, pp.238-240], [51,
pp.197-203] and [18, pp.399-403 and p.420].
Definition 2.5.1. A(geodesic) trianglein(M,d)consists of threevertices q1,q2,q3?
M and three minimal geodesics segments connecting the three points. The angle
24
between the geodesic side connecting q1 to q2 and the geodesic side connecting q1 to
q3 is denoted by ?q2q1q3. We denote the side between q1 and q2 by q1q2. The two
sides q1q2 and q1q3 and angle ?q2q1q3 = ? constitute a hinge at q1 with angle ? and
sides q1q2 and q1q3. If the vertices are such that the sides of the triangle are deter-
mined uniquely, we denote the triangle by ?q1q2q3. In referring to the vertices of
the triangle, sometimes it is convenient to use modulo 3 cyclic indexing, e.g., q4?q1
and so forth. The perimeter of a triangle with vertices q1,q2,q3 is summationtext3i=1 d(qi,qi+1).
As noted in the definition, a triangle is not necessarily determined uniquely
by its vertices. However, if for a triangles with vertices q1,q2,q3 ? M, we have
d(qi,qi+1) < injM for 1 = 1,2,3, then the geodesic sides are uniquely determined.
In particular, if the perimeter of that triangle is smaller than 2injM, then the
sides are defined uniquely. Since, otherwise, say d(q1,q2)?injM, then the triangle
inequality requires d(q2,q3) + d(q1,q3)?injM, which contradicts the condition on
the perimeter.
Definition 2.5.2. Let q1,q2 and q3 be vertices of a triangle in M. A comparison
triangle in (Sn?, ?d), if it exists, is a triangle with vertices ?q1, ?q2, ?q3 ? Sn? such that
?d(?qi, ?qi+1) = d(qi,qi+1),i = 1,2,3. A comparison hinge in (Sn?, ?d) is a hinge with
vertex ?q1 ? Sn? and sides ?q1?q2 and ?q1?q3 such that ??q2?q1?q3 = ?q2q1q3, ?d(?q1, ?q2) =
d(q1,q2) and ?d(?q1, ?q3) = d(q1,q3).
When the lengths of two sides and the angle between them from a triangle
in a manifold of constant curvature are given we can calculate the length of the
third side and the two other angles. The calculation is carried out using the well-
25
known spherical, hyperbolic or Euclidean laws of cosines. For a triangle with vertices
q1q2q3 in the model space (Sn?,d), with ? negationslash= 0, the Law of Cosines for finding the
side opposite to q1 reads as [18, p. 103]:
cs?d(q2,q3) = cs?d(q1,q2)?cs?d(q1,q3) +??sn?d(q1,q2)?sn?d(q1,q3)?cos?q2q1q3,
(2.21)
where functions sn? and cs? are defined in (2.6). However, in a general Riemannian
manifold Mn, where the sectional curvatures are not necessarily constant, we cannot
perform this calculation so neatly and explicitly. Nevertheless, one can find upper
and lower bounds on the length of the third side based on lower and upper bounds
on the sectional curvature of the manifold, respectively. Let ? ? ? ? R denote
upper and lower sectional curvature bounds of Mn, respectively. The main idea is
to compare the unknown length of the third side with the lengths of the third sides
of comparison triangles (more accurately hinges, see below) in the corresponding
model spaces, i.e., Sn? and Sn?. A class of very useful triangle comparison theorems
known as Toponogov or Cartan-Alexandrov-Toponogov (C.A.T) comparison theo-
rems serve this purpose. These theorems can also be used to perform other forms
comparisons: angle or secant comparisons. The statement given in the following is
a very comprehensive one which we will use in different occasions [51, pp. 197-198]:
Theorem 2.5.1 (C.A.T Comparison Theorems). Let (Mn,d) be a complete Rie-
mannian manifold. For K, the sectional curvature of M assume ??K??. Denote
the injectivity radius of M by injM > 0. Let rcx be as in (2.1).
26
1. (Upper Bound on Curvature) If the perimeter of a triangle with vertices
q1,q2,q3?is smaller than 4rcx, i.e., summationtext3i=1 d(qi,qi+1) < 4rcx, then a comparison
triangle??q1?q2?q3 in Sn? exists and is unique up to congruence.
(a) (Angle version) The angles of ??q1?q2?q3 are not smaller than the corre-
sponding angles of?q1q2q3, i.e., ??qi?1?qi?qi+1 ??qi?1qiqi+1.
(b) (Secant version) Let x and y be two points on two different sides of
?q1q2q3. Via the obvious 1-1 correspondence between the points on the
perimeters of the two triangles find two points ?x and ?y (corresponding to
x and y, respectively) on the perimeter of ??q1?q2?q3. Then we have that
the secant connecting ?x to ?y is not shorter than the secant connecting x
to y, i.e., ?d(?x, ?y)?d(x,y)
(c) (Hinge version) Corresponding to the hinge (?q2q1q3,q1q2,q1q3) a com-
parison hinge (??q2?q1?q3, ?q1?q2, ?q1?q3) in Sn? exists and d(q2,q3)? ?d(?q2, ?q3).
2. (Lower Bound on Curvature: Toponogov Theorems) With curvature bounded
from below there is no need for limiting the size of a triangle in M and for any
triangle with vertices q1q2q3 a comparison triangle in Sn? exists. Moreover, with
? > 0 we have summationtexti d(qi,qi+1)?2 ??? and equality holds only if M = Sn? . The
Angle, Secant and Hinge versions in this part are exactly as in the previous
part except that the directions of the inequalities are reversed.
If the bounds on the curvature are strict bounds (e.g., ? < K), then the correspond-
ing inequalities can be replaced by strict inequalities.
27
One can think of the comparison triangle in Sn? as fatter than our original
triangle in M and the comparison triangle in Sn? as thinner than our original one
in M. As a side, we mention that the remarkable property that the lower positive
bound on curvature limits the perimeter of a triangle is the main theme in a class
problems in global Riemannian geometry known as ?pinching problems?. Figure 2.3
pictorially explains the theorem.
28
??d(??x
i, ??qi) = d(xi,qi) = ?d(?xi, ?qi)
q1
q2
q3 ?q1
?q2
?q3
??q2
??q1
??q3
x1
x2??x
2
?x1
?x2
arrowvertexdblarrowdblbt
??x1
(Sn? , ??d) (Sn?, ?d)(Mn,d)
??d(??q
i, ??qi+1) = d(qi,qi+1) = ?d(?qi, ?qi+1)(i = 1,2,3)
???qi?1??qi??qi+1??qi?1qiqi+1 ???qi?1?qi?qi?1(i = 1,2,3),
??d(??x
1, ??x2)?d(x1,x2)? ?d(?x1,?x2)
[a]
??d(??q
2, ??q3)?d(q2,q3)? ?d(?q2, ?q3)
q1
q2
q3
???q2??q1??q3 = ?q2q1q3 = ??q2?q1?q3
?
?q2
?q3
?q1
??d(??q
1, ??qi) = d(q1,qi) = ?d(?q1, ?qi) (i = 1,2)
(Mn,d) (Sn?, ?d)
??q3
??q1
??q2(S
n
? ,
??d)
[b]
Figure 2.3: C.A.T comparison theorems (Theorem 2.5.1). Part [a] describes the
angle and secant versions and part [b] describes the hinge version. ? and ? are
lower and upper bounds on the sectional curvature of Mn, respectively. we construct
comparison triangles in (Sm? , ??d) and (Sm?, ?d) with vertices ??q1, ??q2, ??q3 and ?q1, ?q1, ?q1,
respectively. Corresponding sides of the triangles have equal lengths. Corresponding
to a secant x1x2 we find secants ?x1?x2 and ??x1??x2 in other two triangles. The triangle
in Sm? is fatter and the triangle in Sm? is thinner than the original one in M, i.e.,
the angles and secants in M are not smaller than those in Sm? and not larger than
those in Sm?.
29
Chapter 3
Existence and Uniqueness of the Riemannian Lp Mean
3.1 Introduction
We start with the following definition:
Definition 3.1.1. Let M be a complete Riemannian manifold of dimension n with
Riemannian structure g and corresponding distance function d. Depending on the
situation we might write (M,g) or (M,d). Assume w is a probability measure on
M with support supp(w). Define
fp(x) =
??
??
??
1
p
integraltext dp(x,s)dw(s) 1?p <?
maxs?supp(w) d(x,s) p =?.
(3.1)
The Riemannian Lp center of mass or mean with respect to measure w is the set
of (global) minimizers of fp on M. Most of the times we assume that w is a finite
support probability mass function with support {xi}Ni=1, where w(xi) = wi > 0;
and because of our intended applications we might refer to xi?s as data points. We
call vector w = [w1,...,wN]T a weight vector. Unless otherwise stated, we always
assume a weight vector is a vector of positive entries which add up to one. We
30
rewrite (3.1) as
fp(x) =
??
??
???
1
p
summationtextN
i=1 wid
p(x,xi) 1?p <?
maxi d(x,xi) p =?.
(3.2)
The Lp Riemannian center of mass or mean for the set {xi}Ni=1 ? M with
corresponding weight vector w is the set of (global) minimizers of fp in M. We
might use alternative terms such as intrinsic Lp mean or Riemannian Lp average,
as well. When we refer to the Lp mean of {xi}Ni=1, we implicitly assume that the
weight vector is known. We denote the Lp mean of{xi}Ni=1 by ?xp; and in case there
is no ambiguity we might drop the subscript p.
Remark 3.1.1. Although we write{xi}Ni=1 as a set we are not ruling out the pos-
sibility that some of xi?s might coincide; but obviously we are not interested in and
rule out the trivial case where all of them coincide.
This definition in (M,d) is inspired by the standard corresponding definition in
the Euclidean space (Rn,bardbl.bardbl) with the standard distance functionbardbl.bardbl. Perfunctorily,
one excepts the two to have similar properties, including existence and uniqueness
properties, provided oneconfines thepoints{xi}Ni=1 in?small enoughdomains? inM.
Note that the Lp mean in M is not necessarily a single point. The most immediate
and valid questions are the accurate existence and uniqueness conditions for the
Lp Riemannian means and their other properties. The main difficulty in answering
these questions arises when M is a manifold of positive sectional curvature, where
the Riemannian distance function to a point looses its convexity in distances far
away. As it will become more evident the notion of convexity plays a central with
31
regards to the Lp means and their properties. Our goal in this chapter is to prove
or more accurately to improve the existence and uniqueness theorems for the Lp
means; and in the next chapter we shall investigate some important properties of
the means. Before further exposition of details about the results we shall give a
historical introduction and perspective about the subject which will help us relate
our results to the previous ones. Even before that we introduce some notations and
conventions.
3.1.1 On the History of the Riemannian Center of Mass
Here, we give a brief chronological history of the notion of Riemannian center of
mass and related existence and uniqueness results. Our account is based on the
most widely used and available texts; and it might not be complete. According to
many sources (e.g., [9, p. 235]) Cartan was the first to define and use the center of
mass in the context of Riemannian geometry. In 1920?s, he defined the L2 center
of mass in complete simply connected manifolds of nonpositive curvature (a.k.a.
Hadamard manifolds) and proved the existence and uniqueness of the center. He
used the L2 center of mass to prove that any compact subgroup of the isometry
group of an Hadamard manifold has a fixed point. A proof of Cartan?s theorem
using the L? center of mass instead, can be found in [78, p. 225] or [76, p. 164]1.
Aside from an unpublished work by Calabi pointed out in [9, p. 233], the next usage
is by Grove and Karcher in [34], where motivated by solving pinching problems, they
1The benefit of resorting to the L? mean instead of the L2 mean is that one does not need to
resort to the existence of a bi-invariance probability measure on the isometry group. I thank Prof.
P. Petersen for this remark.
32
define the L2 center of mass in general Riemannian manifolds but for probability
measures with support in small enough balls. As a result, Grove and Karcher give
an extension of the result of Cartan to arbitrary manifolds. As one expects and
it will become clear later here, positive curvature and finite injectivity radius of
M are what might bring about non-uniqueness of the center. In a series of papers
[34, 35, 36] Grove, Karcher and Ruh improve i.e., enlarge the domain of existence
and uniqueness and consider new applications of the L2 center of mass. In [36],
for compact Lie groups, the domain of uniqueness is enlarged and is shown to be a
strongly convex non-ball set. Next, Karcher in [50] gives new proofs, applications
and improvements of the previous results.
Two points are worth of elaboration at this time: First, the definition used by
Grove and Karcher for the L2 center of mass is what we call the ?local definition?,
i.e., given a probability measure w with support in a small ball B(o,?) the local
L2 center of mass is defined as a point in B(o,?) at which f2 is minimized. In
other words, the function under minimization is f2|?B, the restriction f2 to B(o,?).
This definition serves the purposes of finding a unique zero of the gradient vector
field of f2 in B and deriving useful related estimates. However, as pointed out by
Groisser in [32] , the center defined in this fashion is susceptible to depend on the
candidate ball chosen; and the local definition does not immediately rule out the
annoying possibility that a fixed probability measure might have different centers
depending on which candidate ball we use. We refer the reader to [32] for more
on this and alternative definitions and ways to ameliorate this problem. On the
other hand, evidently the global definition does not suffer from this problem and
33
as soon as its uniqueness is established, the center is independent of the candidate
ball1. Another possible benefit of resorting to global minimization of fp is that one
might expect the global minimizer to be a ?better mean? than a local minimizer.
For these reasons the global definition seems to be a more natural one. As it has
become standard, the global mean is also referred to as the Frech?et mean (e.g., [61]).
Second, the detailed techniques used and the Jacobi field estimates derived to show
the uniqueness of the local L2 center of mass in [34, 35, 36] and in [50] are rather
different, but the grand strategies used are very similar. Assume supp(w)?B(o,?).
The strategy in [34, 35, 36] is to make sure that ? is small enough such that the
Hessian of x mapsto? d2(s,x) is positive definite at any critical point x of f2 in B(o,?)
and for all s?B(o,?). The strategy in [50] is to choose ? small enough such that
the Hessian of xmapsto?d2(x,s) is positive definite for all x,s?B(o,?). In other words,
both strategies use the fact that in order for the Hessian of f2 to be positive definite
at x ? B(o,?), it is sufficient (but not necessary) for the Hessian of x mapsto? d2(x,s)
to be positive definite for all s ? B(o,?). One expects that the upper bounds on
? derived based on the mentioned sufficient condition might not be very sharp and
can be improved. Before delving into this issue, we briefly recall the result in [50].
Let injM denote the injectivity radius of M and ? an upper bound on the sectional
curvatures of M. Let
? < rcx = 12 min{injM, ???}. (3.3)
Denote the gradient vector field of f2 by ?f2. Since B(o,?) is strongly convex
1As we shall see, under a uniqueness condition, the two means coincide; therefore, the men-
tioned unappealing scenario will not happen.
34
(see Theorem 2.3.1), ??f2 will be pointing inward the ball on its boundary (see
Theorem 3.2 for more details). Therefore, the minimum of f2|?B cannot happen on
the boundary of the ball and the minimizer must lie inside the ball. In order to
guarantee the uniqueness of the minimizer, as explained before, Karcher requires
xmapsto?d2(x,s) to be strictly convex for all x,s?B(o,?). Based on the Jacobi field
estimates (see Section 2.4.1), this happens when d(x,s) < ?2??, which in turn is
guaranteed if ? < ?4?. Therefore, the local L2 center of mass in B(o,?) is unique if
? < 12 min{injM, ?2??}. (3.4)
Unfortunately, as pointed out by Karcher [50], for the unit sphere in R3 the above
gives ?4 as the upper bound on the radius of the largest ball containing supp(w)
to ensure the uniqueness of the center of mass. Nevertheless, it is obvious that
for at least two point masses with equal weights lying in a hemisphere (i.e., if
? < ?2), the midpoint of the shortest geodesic between them is the unique L2 center
of mass. Roughly speaking, in most occasions, bound (3.4) underestimates the
radius of uniqueness by a factor of two. It is shown by W. S. Kendall [52], that
? < ?4?? is not necessary to ensure the uniqueness of the local L2 center of mass
in B(o,?). In [52], he initially calls any local minimizer of f2 on M a ?Karcher
mean?. However, later in the paper Kendall extends his definition to the situation
where in the definition of f2 the distance function d is replaced by its restriction
to B(o,?), i.e., d|B. Groisser calls this latter mean the ?solipsistic Karcher mean?
[32]. Kendall shows that given supp(w) ? B(o,?), if ? < min{injM, ?2??}, then
35
there exists a unique solipsistic Karcher mean in B(o,?) (see Theorem (7.3) and the
remark above it in [52]). A ball whose radius ? satisfies Kendall?s bound is weakly
convex but not necessarily strongly convex [18, p.405], i.e., any two points in it can
be connected by a unique geodesic which is the unique shortest curve between the
two points among all curves in the ball and not entire M; and that is why ? can be
larger than 12injM in Kendall?s bound. It is interesting to note that the mentioned
extended definition had appeared in [16], although probably it was not noticed by
Kendall. The definition of the center of mass used in [18, p. 407] is as such. For our
applications this definition seems to add no clear benefit. Strong convexity of the
ball is needed to ensure the existence of the Riemannian center of mass. Therefore,
as also alluded to in [32], when one uses the global distance in defining f2 (as we do
in (3.1)), Kendall?s result yields the bound in (3.3) to ensure both the existence and
uniqueness of the local L2 center of mass in B(o,?) for a probability measure whose
support is in B(o,?). Observe that on the unit sphere in R3, by (3.3) the local L2
center of mass is unique if ? < ?2. As mentioned long time ago in [34], interest in
Riemannian center of mass also exists among probabilists who investigate Brownian
motions in manifolds and their relations to harmonic maps. Interestingly, Kendall?s
work belongs to this area of research. His proof of his uniqueness result requires
some preparations and is rooted in the mentioned literature.
The next step in this journey is the result of Buss and Fillmore [17]. Their
work is motivated by the use of the Riemannian L2 center of mass in spherical
interpolation for many applications including computer graphics and animation.
Buss and Fillmore show that for a probability mass function with finite support on
36
the unit sphere in Rn, if the support is within a hemisphere, the global L2 center
of mass is unique and belongs to the hemisphere. They use an interesting reflection
argument to show that the center must lie inside the ball. They also use an elegant
technique of pairing the points in order to show that any stationary point of f2
inside the hemisphere is a local minimizer; hence, they prove the uniqueness of the
L2 center. Their uniqueness proof can be considered as an alternative proof for
Kendall?s result on a constant curvature manifold and for finite support probability
mass functions. Although, they are not specific about it, Buss and Fillmore show
that for such a measure on the unit sphere the ?local? and ?global? L2 centers of
mass coincide. We mention that it is usually possible to obtain a compromised
global existence and uniqueness result from a local existence and uniqueness result.
For example, assume we know that ? < r? for some r? > 0 guarantees the existence
and uniqueness of the local Lp center of mass for a probability measure with support
in B(o,?) and that the center is the only stationary point of fp in B(o,?). Then just
by the triangle inequality we obtain the existence and uniqueness of the global Lp
center of mass for a measure with support in B(o,?) where ? < 12r?. Le in [61] uses
a rather similar argument to show that in a general (i.e., nonconstant curvature)
manifold ? < 12rcx guarantees existence and uniqueness of the global L2 mean for a
probability measure with support in B(o,?). Le?s work is motivated by applications
in the statistical analysis of shapes.
We should mention that statisticians have dealt with data points on circles
and spheres since early 20th century in a context known as ?statistical analysis
of directional and spherical data? [28, p. xi]. A survey of references such as [65,
37
28] shows that the framework of Riemannian geometry and especially the related
existence-uniqueness issues, however, have not appeared explicitly in the mentioned
context. As this body of literature is more concerned with some specific parametric
distributions (e.g., von Mises distribution) on the circle and the sphere. Another
reason is that usually what is considered as the mean, is the so-called extrinsic mean
which arises from the standard embedding of the sphere or any other manifold
in a related Euclidean space of higher dimension. In contrast, what we call the
Riemannian mean can also be called ?intrinsic mean.? The extrinsic mean in most
cases is the projection of the standard Euclidean mean of the data points computed
in the ambient space onto the (sub)manifold; and in the case of the sphere or few
other manifolds it is much easier to numerically compute than the Riemannian or
intrinsic mean. Until recently with works such as [11] the Riemannian mean had
not have gained much attention in the statistics literature.
3.1.2 Contributions and Outline of the Chapter
In this chapter we provide a unified and extensive analysis of the existence and
uniqueness of global Riemannian Lp means for 1?p??. The main contributions
of this chapter are as follows:
1. In Theorem 3.2.1 (or Theorem 3.2.21) we derive a new bound on the radius
1We derive our results in this and other chapters for the case of finite number of points with
positive weights which we have introduced and dealt with so far. The reason is that in an applied
and computation-oriented setting this is the more realistic framework. However, most of the results
in this and the next chapter carry over to the case of arbitrary probability measures in M with no
or little extra work. Since in the related literature the existence-uniqueness results are stated for
such measures, in Section 3.2.2 we state Theorem 3.2.2 which is the counterpart of Theorem 3.2.1
for arbitrary probability measures.
38
of the ball containing the support of a probability measure on M to ensure
uniqueness of the (global) Lp mean with respect to that measure. In particular,
for p = 2 our existence and uniqueness result, improves the result by Le in
[61] regarding the global L2 mean. More significantly, our result leads to the
discovery that under the best available existence and uniqueness condition
for the local Lp mean, the local and global Lp means coincide; therefore,
in the sense described before, ours is an uncompromised global existence and
uniqueness result. On the route to prove Theorem 3.2.1, we extend (with some
modifications) the method of Buss-Fillmore in [17], via comparison arguments,
to arbitrary manifolds.
2. As a part the proof of Theorem 3.2.1 (or Theorem 3.2.2) we give an alternative
and more geometrical proof for the uniqueness bound by W. S. Kendall for
the L2 mean.
The organization of the chapter is as follows: Entire Section 3.2 is devoted to the
proof of the existence and uniqueness theorems for the Lp means. In Section 3.3 we
discuss the uniqueness issue when 1?p < 2 and open problems.
3.2 Existence and Uniqueness of the Riemannian Lp Means
As mentioned before, the existence and uniqueness of properties of the Lp means
in manifolds of non-positive curvature especially if they are simply connected, look
like their counterparts in Euclidean space in many aspects. However, the case of a
manifold of positive curvature is more complicated as can be seen in the example
39
of the sphere. In fact, the sphere gives us a guideline as what to expect. This fact
will be our guide in using comparison theorems to prove the following existence and
uniqueness theorem:
Theorem 3.2.1. Let (M,d) be a complete Riemannian manifold with sectional
curvature upper bound of ? and injectivity radius of injM. Define
??,p =
?
???
???
1
2min{injM,
?
2??} 1?p < 2
1
2min{injM,
??
?}(= rcx) 2?p??.
(3.5)
Let {xi}Ni=1 ? B(o,?) ? M with ? < ??,p, then this set of points has a unique
Lp mean for 1 < p ??, which lies in B(o,?). For p = 1, unless all the data
points lie on a geodesic, again the mean exists, is unique and lies in B(o,?); and in
the degenerate case the mean might not be unique but it lies in the interior of the
geodesic. Moreover, for 1 < p <?the Lp mean is the only stationary point of fp
(see (3.2)) in B(o,?). In addition, for 1 < p < 2, fp is strictly convex in B(o,?);
and for p = 1 it is again strictly convex unless all x?i lie on a geodesic in which case
f1 will be only convex.
The following subsections include the proof of this theorem with some extra
descriptions.
3.2.1 Proof of Theorem 3.2.1
The following subsections constitute the proof of Theorem 3.2.1. Before starting
the proof notice that we can write the expression for the gradient vector field of
40
fp(1 < p <?) at x?B(o,?) as
?fp(x) =?
Nsummationdisplay
i=1
widp?2(x,xi)exp?1x xi. (3.6)
Note that for p = 1 the above remains valid at xnegationslash= xi(1?i?N).
3.2.1.1 Insideness
We show that a minimum of fp, if it exists, should happen in B(o,?) if 1 < p??
and in B(o,?) if p = 1. First, let us resolve the situation for points on the boundary
of the ball. By Corollary 2.3.1 the interior of a minimizing geodesic connecting any
two points in B(o,?) belongs to the interior of the ball; therefore, exp?1x xi ?TxM
belongs to the interior of Bx for any xnegationslash= xi on the boundary of B(o,?), where Bx
is the tangent cone of B at x (see Theorem 2.3.2). Hence, ?fp(x)(1 < p < ?)
which is a linear combination of exp?1x xi?s also belongs to the interior of Bx and will
be pointing inward the ball at any point x on the boundary of the ball; therefore,
x cannot be a minimizer of fp. This also is valid for p = 1, unless when x = xi
for some i in which case the gradient of f1 at x is not defined. Note that at x,
since the boundary of the ball is smooth and exp?1x xi lies in the interior of Bx
it makes an acute angle with the radial geodesic connecting x to o. Hence, by
a first order variation of the arc-length (see Proposition 2.2.1), if we move along
this radial geodesic towards the center, we can still reduce d(x,xi) for 1 ? i?N
and consequently we can reduce f?(x). Therefore, for p =? also a point on the
boundary of B(o,?) cannot minimize f?. Next, we show that for any point outside
41
B(o,?) like q there is q? inside the ball such that fp(q) > fp(q?). In order to show
that, of course, we show that d(q,xi) > d(q?,xi) for 1?i?N. If d(o,q)?2?, then
from the triangle inequality d(q,xi) ? d(o,q)?d(o,xi) > ?, hence fp(q) > fp(o)
and q cannot be a minimizer. So we only consider a q which is in the annular
region A = {x ? M|? < d(o,x) < 2?}. Since d(q,o) < 2? < injM, there is a
unique minimal geodesic that connects q to o. This geodesic meets the boundary of
B(o,?) perpendicularly at a unique point, qc, which lies between q and qc.Denote the
corresponding normal vector pointing inward B(o,?) by nqc (see Corollary 2.3.2).
We choose a point q?, the reflection of q inside the ball, at distance d(q,qc) between
qc and o on the geodesic connecting q to o. A minimizing geodesic emanating from
qc to any of xi?s makes an angle less than ?2 with nqc at qc (by Theorem 2.3.2).
Note that since d(qc,xi) < 2? < injM, the mentioned geodesic is unique. Now, we
perform two sets of comparisons: one for the case ??0 and one for the case ? > 0.
The Case ??0
Our model or comparison space is (Sn0 ?Rn, ?d), where ?d is the standard Euclidean
distance function. Consider the same configuration as the one in M, in the model
space: a ball of radius ? centered at ?o and a point ?q outside the ball at distance
d(o,q) from the center, i.e., ?d(?o, ?q) = d(o,q). Figure 3.1 shows the configuration in
both M and Sn?. In that figure, corresponding equal angles or sides in M and Sn?
are marked. From ?q? choose a geodesic which makes an angle equal to ?oq?xi with
the geodesic connecting o to q? and choose the point ?xi such that ?d(?q?, ?xi) = d(q?,xi).
Then by applying the hinge version of Theorem 2.5.1 with upper bound on curvature
42
to the triangles?q?oxi and??q??o?xi we see that d(?o, ?xi) < d(o,xi), hence xi?B(?o,?).
Therefore, from Corollary 2.3.2, ??i = ??q??qc?xi < ?2. Now applying the usual Law of
Cosines formula to two triangles??qc?q??xi and??q?qc?xi yields
?d(?q, ?xi) > ?d(?q?, ?xi). (3.7)
Next, we apply Theorem 2.5.1 to triangles ?qq?xi and ??q?q??xi. To verify that we
can apply the theorem, we note that:
d(q,q?) = 2d(qc,q), d(xi,q?)?d(xi,o)+d(o,q?) < 2??d(qc,q). (3.8)
If d(xi,q)?2??d(qc,q), then we already have the claim that d(xi,q?) < d(xi,q), so
we assume d(xi,q) < 2??d(qc,q). Therefore, the perimeter of?qq?xi is smaller than
4? < 4rcx. Note that by construction ?xiq?q = ???oq?xi and ??xi?q??q = ????o?q??xi;
therefore ?xiq?q = ??xi?q??q. Applying Theorem 2.5.1 to the two triangles together
with (3.7) yield d(xi,q)? ?d(?xi, ?q) > d(xi,q?).
The Case ? > 0
In this case we proceed exactly as the case ??0, except that we need to use the
Spherical Law of Cosines to get (3.7). We elaborate on that. Let ??q??qc?xi = ??i. Then
??q?qc?xi = ????i. The Law of Cosines (2.21) in the triangles ??q??qc?xi and ??q?qc?xi
43
reads as
cos???d(?q?,?xi) = cos???d(?q?, ?qc).cos???d(?xi, ?qc)
+ sin???d(?q?, ?qc).sin???d(?xi, ?qc).cos ??i (3.9)
and
cos???d(?q, ?xi) = cos???d(?q, ?qc).cos???d(?xi, ?qc)
? sin???d(?q, ?qc).sin???d(?xi, ?qc).cos ??i, (3.10)
respectively. Noting that ?d(?q, ?qc) = ?d(?q?, ?qc) < ???, ?d(?xi, ?qc) < ???, ??i < ?2 and
comparing these two equations we see that (3.7) is valid in this case, too. The rest
of the proof is as the previous case.
3.2.1.2 Existence
By the previous part we have minB(o,?) fp ?infMfp. Considering continuity of fp,
this shows existence of a minimizer of fp on M. Note that the minimum of fp on
M happens in B(o,?) for 1 < p??, while it can happen inside the ball as well as
on the boundary when p = 1, in which case it must coincide with one of the data
points. We observe that for 1 < p <?the gradient of fp is continuous, and the Lp
mean is a stationary point of?fp in B(o,?); while, for p = 1 the center is a critical
point of f1 in B(o,?).
44
?i
qc
q
q?
o
xi
2?
?
???i
Pq
Mn
??i
2?
?
?xi
?o
?q
?qc
?q?
P?q
????i
Sn?
Figure 3.1: We apply comparison theorems to show that for any point q in the an-
nular region A ={x?M|? < d(o,x) < 2?}there is the point q? inside B(o,?) which
is the reflection of q across the boundary of B(o,?), such that d(q,xi) > d(q?,xi).
This is essentially the result of the fact that ?i < ?2. The left and right pictures
show the configurations in M and the model space Sn?, respectively. Corresponding
equal angles or sides in M and Sn? are marked.
3.2.1.3 Uniqueness
We assume 1?p <?and postpone the cases of p =?to the end of the section.
The Case ??0
Recall that since ? < ??,p = injM2 each x ? B(o,?) can be connected by a unique
geodesic to all xi?s. For ??0 and 1 < p <?, from (2.6) and Corollary 2.4.1, we
see that the Hessian of xmapsto? 1pdp(x,xi) is positive definite at xnegationslash= xi. From this and
since xmapsto? 1pdp(x,xi) is continuous at x = xi and d(xi,xi) = 0 < d(x,xi), we conclude
that x mapsto? 1pdp(x,xi) (and hence fp) is a strictly convex function inside B(o,?) for
1 < p <?. If p = 1 againunless allxi lie on a geodesic segment againxmapsto?f1(x) will
45
be strictly convex, otherwise, it will be only convex. It remains to consider p =?.
Let q,q? ? B(o,?) be two distinct minimizers of f?. Let r = f?(q) = f?(q?).
Note that we must have r < rcx. The closed balls B(q,r) and B(q?,r) must contain
{xi}Ni=1. By the strict convexity of the distance function along non-radial directions,
the distance between any point in{xi}Ni=1 and the midpoint of the unique minimizing
geodesic connecting q and q? is smaller than r. This contradicts the minimality of
both q and q?. Hence, the L? mean must be unique. This is all we want.
The Case ? > 0
Recall Corollary 2.4.1 and Definition 2.6 with positive ?. Unfortunately, for ? > 0,
the previous rather un-tuned method, which is based on showing the strict convexity
ofeach individual terms infp, gives convexity onlywhen ? < min{12injM, ?4??}. This
is because cot??d(x,xi) is positive only when d(xi,x) < ?2??. Therefore, in order
to guarantee strict convexity of xmapsto? 1pdp(x,xi) (and hence fp), one is forced to have
? < ?4??. Note that this is enough to prove our claim for 1?p < 2. For p =?also
the argument used for ??0 works for ? > 0. Henceforth, we assume 2?p <?
and ? > 0. The goal is to show that any stationary point of fp in B(o,?) is a local
minimizer of fp. Let q?B(o,?) be a stationary point of fp i.e., ?fp(q) = 0. Note
that if q coincides with the center o, then we have d(q,xi) < ?2?? for 1?i?N. In
this special case the Hessian of each individual term xmapsto? 1pdp(x,xi) at x = q = o
will be positive definite; and hence q will be a local minimizer of fp. Henceforth, we
assume that qnegationslash= o or equivalently d(q,o) > 0. Let tmapsto??(t) be a unit speed geodesic
leaving q?B(o,?) at t = 0. We denote the angle between the geodesic connecting
46
xi ?B(o,?) to q and ? at q by ?i. For convenience let us write d(xi,q) by di(q) or
even di. By Corollary 2.4.1 we have
d2fp
dt2 (?(t))|t=0?
Nsummationdisplay
i=1
wi
parenleftbigg
(p?1)?dp?2i (q)?cos2 ?i+dp?1i (q)?
?
??cot
?
?di(q)?sin2 ?i
parenrightbigg
.
(3.11)
The summand can be written as
(p?2)?dp?2i ?cos2 ?i +dp?2i ?parenleftbig1?sin2 ?i(1???.di?cot??di)parenrightbig. (3.12)
Since??di(q)cot??di(q)?1, we have1
d2fp
dt2 (?(t))|t=0?
Nsummationdisplay
i=1
widp?1i (q)??cot??di(q). (3.13)
We need to show that the above is positive if?fp(q) = 0. Let ?qo denote the unit
speed minimal geodesic connecting q to o. Let ?i denote the angle between the
geodesic connecting q to xi ?B(o,?) and ?qo. The projection of ?fp(q) along the
direction of ?qo at q is zero. Therefore, we have
summationdisplay
widp?1(q,xi)cos?i = 0. (3.14)
1 For 1 ? p < 2, we cannot get a relation similar to (3.13), i.e., in (3.12) we cannot get rid
of the effect of the angle ?i and reach at a lower bound independent of ?i which only involves di
and ?i. We will elaborate on the implications of this fact in Section 3.3. It is possible that a finer
analysis for 1?p < 2 will result in a better bound on ?, possibly depending on p. This can be a
topic for further research.
47
Our goal is to show that this is enough to guarantee that the Hessian of fp is positive
definite at q. We do this by comparison with the spherical case. Consider a ball
B(?o,?) ? Sn?. Choose a point ?q inside the ball such that ?d(?o, ?q) = d(o,q). Then
choose the point ?xi such that ?d(?q, ?xi) = d(q,xi) and ??xi?q?o = ?xiqo = ?i. Note that
by this construction ?xi might not belong to B(?o,?). However, applying Theorem
2.5.1 to the triangles??xi?q?o and?xiqo we observe that ?d(?o, ?xi)?d(xi,o) < ?; and
therefore, ?xi ?B(?o,?). Also note that since d(o,xi),d(o,q) < ? and d(q,xi) < 2?,
the perimeter of?xiqo is less than 4rcx and we are allowed to use Theorem 2.5.1.
By construction we have
Nsummationdisplay
i=1
wi ?dp?1(?q, ?xi)cos?i = 0, (3.15)
where as mentioned before ??o?q?xi = ?i for 1?i?N.
Note that B(?o,?) is inside the larger hemisphere H = B(?o, ???). The boundary
of a hemisphere is a ?global? totally geodesic submanifold of Sn?, i.e., a geodesic
connecting two points on the boundary belongs to the boundary for all times [9,
p.208]. We use this fact as follows. Continue the radial geodesic connecting ?q to
?o till it meets the boundary of the hemisphere at ?z. Also continue the geodesic
connecting ?q to ?xi to meet the boundary of H at ?yi (1 ? i ? N). This geodesic
meets the boundary of B(?o,?) at ?y?i. Figure (3.2) depicts the situation for a single
data point ?xi. For the reason explained before we have ??q?z?yi = ?2 (1 ? i ? N).
Then the application of the Spherical Law of Cosines twice in the triangle ??q?yi?z,
once for the angle ??q?z?yi = ?2 and once for the angle ??yi?q?z = ?i yields the crucial
48
???q?o
?z
?q
?xi
??
?
?
?y?i
?o
?yi
?i
Figure 3.2: ?o is the center of a hemisphere H of radius ??? in Sn?. Every data point
?xi belongs to B(?o,?) ? H, and ?q ? B is a critical point of ??fp. The minimal
geodesic connecting ?q to ?xi hits the boundary of the hemisphere perpendicularly at
?z. The minimal geodesic connecting ?q to ?xi hits the boundaries of B and H at ?y?i
and ?yi, respectively.
relation
cot???d(?yi, ?q) = cos?i cot???d(?z, ?q), 1?i?N. (3.16)
This relation enables us to relate ?i to ?d(?xi, ?q) via a constant (i.e., ?d(?z, ?q) which is
independent of i. Since the derivative of tmapsto?cott is smaller than?1 on (0,?) it is
easy to see that
cott1?cott2 ??(t1?t2), t1,t2?(0,?). (3.17)
Note that
?d(?yi, ?q)? ?d(?xi, ?q) = ?d(?xi, ?yi). (3.18)
Also note that ?d(?xi, ?yi)? ?d(?y?i, ?yi) and ?d(?y?i, ?yi)? ?2???? (see Figure (3.2)). There-
49
fore we have
?d(?yi, ?q)? ?d(?xi, ?q)? ?
2????. (3.19)
So from (3.17) we have
cot???d(?xi, ?q)?cot???d(?yi, ?q) + (?2 ????), 1?i?N. (3.20)
Combining this together with (3.16) we have ?d(?xi, ?q) < ?d(?yi, ?q) < ??? we have
cot???d(?xi, ?q)?cot???d(?z, ?q)cos?i+(?2????) > cot???d(?z, ?q)cos?i, 1?i?N.
(3.21)
Recall that ? < ?2??; hence we have the strict inequality. By our construction, the
above implies
cot??d(q,xi) > cot???d(?q, ?z)cos?i. (3.22)
Plugging the above in (3.13) and using (3.14), we conclude
d2
dt2fp(??(t))|t=0 > cot
???d(?q,?z)summationdisplayw
idp?1(q,xi)cos?i = 0, (3.23)
i.e., the Hessian of fp is positive definite at q. Since on the boundary of B(o,?),
??fp is pointing inward, the Poincar?e-Hopf Index Theorem implies that q is the
only zero of?fp in B. Hence, by our insideness argument in Subsection 3.2.1.1, q
is the unique minimum of fp in M. This completes the proof of the theorem.
50
3.2.2 The Case of an Arbitrary Probability Measure in M
Our existence and uniqueness result in Theorem 3.2.1 concerns with finite number
of points in M with corresponding weights. For most data processing applications
this is the natural framework. However, it is customary to consider existence and
uniqueness of themean for arbitraryprobability measures in M. It isstraightforward
to extend Theorem 3.2.1 to this more general case. In fact, the proof we gave for
Theorem 3.2.1 carries over to this case:
Theorem 3.2.2. Let w be a probability measure with support inside the ball B(o,?)
with ? < ??,p where ??,p is defined in Theorem 3.2.1. Then for 1 < p ??, the
Lp center of mass with respect to w is a unique point and lies inside B. For p = 1,
unless the w-measure of a geodesic segment is 1, again the center exists, is unique
and lies in B; and in the degenerate case the center might not be unique. Moreover,
for 1 < p <?the Lp center is in fact the only stationary point of fp (see (3.1)) in
B(o,?). In addition, for 1 < p < 2, fp is strictly convex in B(o,?); and for p = 1
it is again strictly convex unless in the degenerate case mentioned before in which
case f1 will be only convex.
Proof. The proof is essentially the same as that of Theorem 3.2.1.
3.3 Some Comments and Non-uniqueness of the L1 Mean in Manifolds
with Positive Curvature
As mentioned before, we could not get rid of the effect of the angles ?i in (3.12) for
1 ? p < 2; hence, ??,p for 1 ? p < 2 is smaller than the one for p? 2. It might
51
be possible to improve the bound for 1 ? p < 2. Nonetheless, a simple example
shows that the bound for p?2 is not valid for at least values of p equal or close to
1. Note that since xmapsto?d(q,x) in not C1 at q, every xi is a critical point of f1, and
for this reason the mean might be a critical point of f1 which is not a stationary
point, i.e., it can be one of the xi?s. In this situation ?x1 will not satisfy?f1(?x1) = 0,
necessarily. Now, consider the unit upper hemisphere in R3 and let o denote the
north pole. Consider three points x1,x2 and x3 which are located at (arc) distance
r from o such that the geodesics connecting o to xi?s make equal angles of ?3 with
each other. Figure 3.3 shows the top view of the upper hemisphere and the triangle
?x1x2x3. By symmetry, the triangle ?x1x2x3 is an equilateral triangle whose L2
(and in fact Lp,p?2 ) center of mass is o for r < ?2 by Theorem 3.2.1. At the same
time, as long as r < ?4, again by symmetry and the same theorem o is the only Lp
mean for 1?p < 2. Denote the side length of the triangle by a. The Spherical Law
of Cosines determines the dependence of a on r as:
cosa = cos2 r?12 sin2 r. (3.24)
Note that f1(o) = 3r and f1(xi) = 2a for i = 1,2,3. It is easy to check that there
is a number r1(?0.4248?), such that if r = r1, then f1(xi) = f1(o) for i = 1,2,3;
hence, o can no longer be the unique L1 mean. A direct numerical evaluation of f1
shows that the four points o,x1,x2 and x3 minimize f1. Note that at xi(i = 1,2,3)
the gradient of f1 is not defined; whereas, at o we have?f1(o) = 0. For r little bit
larger than r1, we have f1(xi) < f1(o) (i = 1,2,3). The same holds for values of
52
p close to 1; hence, o cannot be the unique Lp mean for those values of p. As an
example, for r = 0.4250?, x1,x2 and x3 (but not o) are the minimizers of f1 on the
sphere. For this value of r the sum of the internal angles of the triangle?x1x2x3 is
approximately 408?! Despite this example, it is left open whether the derived bound
for 1?p < 2 can be improved. Nevertheless, from the proof of Theorem 3.2.1, we
maintain that although there might be more than one Lp(1?p < 2) center of mass
when ? < rcx, they still must belong to the interior of B(o,?).
o
x1
r
?
2
120?
?
x2 x3
?
r ?
aa
a
r
120?
120?
Figure 3.3: Top view of the upper unit hemisphere in R3. o is the north pole
and ?x1x2x3 is an equilateral triangle whose vertices are symmetrically located
at distance r from o. At o the gradient of f1 is zero (we assume equal weights).
However, for values of r close to ?2 we have f1(xi) < f1(o) and the three vertices
themselves constitute their L1 center of mass.
53
Chapter 4
Properties of the Riemannian Lp Mean
4.1 Introduction
In this chapter we study some important properties of the Riemannian Lp means.
One expects the Riemannian Lp means restricted to small enough strongly convex
balls have more or less similar properties as their Euclidean counterparts. In most
occasions that is the case; however, there are exceptions, mainly in manifolds of
positive curvature.
4.1.1 Contributions and Outline of the Chapter
Two main contributions of this chapter are:
1. In Theorem 4.5.1 we give an affirmative answer to the question asked by
Groisser in [32] that whether finding the global L2 (or more generally Lp(1 <
p <?)) mean of{xi}Ni=1 automatically finds a zero of the gradient vector field
of fp which is in the convex hull of{xi}Ni=1? In Theorem 4.5.2 we give a refined
result for manifolds of constant curvature.
2. In Definition 4.6.1 of Section 4.6 we define some notions which help us bet-
54
ter understand the sensitivity or averaging properties of the L2 mean, and
in Proposition 4.6.1 and Theorem 4.6.1 discover some averaging and non-
averaging properties of theL2 mean in positively curved manifolds. The results
of this section are important, since they invite us to be cautious in averaging
data on positively curved manifolds.
The organization of this chapter is as follows: In Section 4.2 we define the minimal
ball of {xi}Ni=1 (also often ambiguously referred to as the ?circumscribed ball?),
describe its relation to the L? mean of{xi}Ni=1 and study its other properties. The
minimal ball is crucial to our developments in Chapter 5. In Section 4.3 we derive
smoothness properties of the Lp mean. Continuous dependence of the Lp mean on
the data, weights and p are very important facts which we shall use to derive other
results. In Section 4.4 we study the important property of isometry compatibility of
the Lp means. In Section 4.5 the convexity properties of the Lp mean, by which we
mean the relative position of the mean with respect to the convex hull of the data
points are studied. As mentioned before, in Section 4.6 sensitivity and averaging
properties of the L2 mean are investigated. In Section 4.7 we consider gradient
flow and steepest descent algorithms for finding the Lp mean, and give a numerical
example for finding the Lp mean on a sphere. In Appendix 4.8 we have gathered
some facts about convex sets and cones in Euclidean space which we need especially
in Section 4.5.
55
4.2 The L? Center of Mass and the Minimal Ball
In Theorem 3.2.1 we gave the existence and uniqueness condition for the L? cen-
ter of mass of {xi}Ni=1 ?M. Since the L? mean or center of mass has significant
geometrical meaning which we shall use in the next chapter for our further devel-
opments, it is useful to study it further. An important observation is that the L?
mean of{xi}Ni=1 is the center of the smallest closed ball containing the set.
Definition 4.2.1. Let{xi}Ni=1?M. We call a closed ball of minimum radius that
contains{xi}Ni=1 a minimal ball of{xi}Ni=1.
Remark 4.2.1. A minimal ball is neither unique nor convex. However, if {xi}Ni=1
lies in a strongly convex ball of radius ? < rcx, then by Theorem 3.2.1 the minimal
ball is both strongly convex and unique.
Remark 4.2.2. The minimal ball of a bounded set in an Hadamard manifold or the
related Bruhat-Tits space has been studied e.g., in [58]. Also, the minimal ball in
manifolds of positive curvature seems to have been studied or used very little. The
only reference that we are aware of which studies the minimal ball of a closed set
in a manifold of positive curvature is [1] where under the name of ?circumscribed
ball? it is defined for a particular simply connected manifold with positive upper
curvature. Very often, in Euclidean space also the minimal ball is referred to as the
?circumscribed ball.? However, note that e.g., for N = 3 points in the plane the
minimal ball of{xi}3i=1 might not coincide with the usual ?circumscribed circle? of
{xi}3i=1. To avoid any confusions we choose the name ?minimal ball?.
56
Theorem 4.2.1. Let ? < rcx. If {xi}Ni=1 lies in the closed ball B(o,?), then
B(?x?,f?(?x?)) where ?x? ? B(o,?) is the unique minimal ball of {xi}Ni=1 which
is strongly convex. Moreover, at least two of xi?s belong to the boundary of the ball
B(?x?,f?(?x?)). If exactly two of them lie on the boundary of the ball then they
are at antipodal positions. B(?x?,f?(?x?)) is the smallest closed strongly convex
ball containing{xi}Ni=1.
Proof. The fact that ?x?, the center of the minimal ball, is unique follows from
Theorem 3.2.1. By definition the radius of the minimal ball cannot be large than ?;
hence, the ball is strongly convex.
Now by a gradient-like method we establish the properties for the points on the
boundary of the minimal ball. Obviously one of xi?s, say x1, lies on the boundary of
B(?x?,f?(?x?)) . In addition to x1 another point, say x2, should lie on the boundary
of the minimal ball. Otherwise, we could find a smaller ball by moving the center
toward x1 and then shrinking the radius, thereby reducing f?(x) = d(x,x1) in a
small neighborhood of ?x?. Now assume that there are exactly two points x1 and x2
on the boundary of B(?x?,f?(?x?)). In this case f?(?x?) = d(x1, ?x?) = d(x2, ?x?)
and in a small neighborhood of ?x?, f?(x) = maxi=1,2 d(x,xi). If ?x1x?x2 = ? < ?,
then we could reduce f?(?x?) further. To see this, let ? : [0,1]?M with ?(0) = ?x?
be a geodesic leaving ?x? whose velocity vector is such that it makes an acute angle
1
2? with the two sides of the hinge (?x1?x?x2, ?x?x1, ?x?x2). Since the mentioned
angle is acute, from Proposition 2.2.1, we see that for small enough nonzero t,
d(?(t),xi) < d(?x?,xi) = f?(?x?), where i = 1,2 . Therefore, by moving ?x?
57
infinitesimally along ? we could reduce the radius of the minimal ball further, which
contradicts the minimality assumption. Hence, x1 and x2 ought to be at antipodal
positions. Notice, that we can do this radius reduction until x1 and x2 become
antipodal on the ball or a third point lies on the ball. This completes the proof.
Figure (4.1) shows the minimal ball of N = 5 points. Three of the points are
on the boundary and two of them are in antipodal positions. This need not be the
case in general, i.e., we can have three points on the boundary and no two of them
in antipodal positions. As it can be seen the center or the L? mean is blind to the
points in the interior. However, if x1 moves either inside or outside the ball or if x4
moves outside the ball, then the center of the minimal ball will change.
r = f?(?x?)
?x?x1 x2
x3
x4
x5
Figure 4.1: The minimal ball for{xi}5i=1. At least two of the points (here three of
them) are on the boundary of the ball. The center of the ball, denoted by ?x? is the
L? mean of the points. The radius of the ball is f?(?x?). The center is blind to
the data points inside the ball, i.e., small movements of these points have no effect
on the center or mean.
58
4.2.1 Lipschitz Continuity of f?(x)
Except for p = 1 and p =?, fp is at least continuously differentiable everywhere in
B(o,?) with ? < rcx (see (2.1)). For p = 1 or p =?, however, fp is only Lipschitz
with Lipschitz constant of L = 1. Recall that f : A ? M ?R is Lipschitz with
constant L if
|f(x)?f(y)|?L d(x,y), (4.1)
for all x,y ?A. The fact that given the weights wi, f1 is Lipschitz with constant
L = 1 is immediate by the triangle inequality. To see the Lipschitz continuity for
p =?we need little extra work. Let us define:
?fp = parenleftbig
Nsummationdisplay
i=1
widp(x,xi)parenrightbig1p, (4.2)
where the weights are kept fixed independent of p. Note that ?fp(x) ? f?(x) as
p ?? for all x; and that given the conditions in Theorem 3.2.1, both fp and ?fp
have ?xp as their unique minimizers. For our purposes and in this part we can assume
that all the functions of interest have the compact ball A = B(o,?), (? < ??,p),
as their fixed domains. Next, we show that ??fp(x)?p is a uniformly bounded and
equicontinuous sequence. Note that ?fp(x)?2?. The gradient of ?fp at x?A is
??fp(x) =?(
summationdisplay
i
widp(x,xi))1p?1
Nsummationdisplay
i=1
widp?2(x,xi)exp?1x xi. (4.3)
59
One can easily check that bardbl??fpbardblx ? 1 for all x ? A. Therefore, ?fp is Lipschitz
continuous with L = 1 for 1 ? p < ?. We use this fact to prove the following
proposition for our later use:
Proposition 4.2.1. Let {xi}Ni=1 ? B(o,?) with ? < 12injM. Then the sequence
??fp?p has a subsequence ??fpj?pj, which converges to f? uniformly on B(o,?) as
p??. This implies that f? is Lipschitz on B(o,?) with Lipschitz constant L = 1.
Proof. Since ?fp is Lipschitz with constant L = 1 for all 1 < p < ?, ??fp?p is
an equicontinuous and uniformly bounded sequence. The Arzela-Ascoli Theorem
guarantees the existence of a subsequence??fpj?which converges to f? uniformly on
B(o,?). Since the convergence is uniform, the limit function f? is Lipschitz with
constant L = 1, as well. To see this observe that for every ? > 0 and large pj
|f?(x)?f?(y)|?|f?(x)??fpj(x)|+|?fpj(x)??fpj(y)|+|?fpj(y)?f?(y)|??+d(x,y).
(4.4)
4.3 Smoothness Properties of the Lp Means
Note that ?xp depends on the data points{xi}Ni=1, on the weights wi and on the order
p. We show that ?xp depends continuously on these factors; however, higher orders
of smoothness do not hold always as we shall elaborate on. Unless otherwise stated,
by saying ?f is a smooth function?, we mean ?f has continuous derivatives of all
orders.? Before proving the next theorem, we recall some smoothness properties of
60
the Riemannian distance function xmapsto?d(q,x):
Proposition 4.3.1. Let (M,d) be a complete Riemannian manifold of injectivity
radius injM. Denote by Cq the cut-locus of q. Then we have:
1. The distance function xmapsto?d(q,x) is C? on M\parenleftbig{q}?Cqparenrightbig. In particular it is
C? on B(q,injM)\{q}.
2. At x = q, xmapsto?dp(x,q) has the same smoothness properties as ?xmapsto?bardbl?x??qbardblp
has at ?x = ?q where ?x, ?q?Rn andbardbl.bardblis the standard Euclidean norm in Rn.
Proof. The first part is a result of diffeomorphic properties of the exponential map
[78, p. 108]. The second part is the immediate property of the exponential map and
the Riemannian distance function.
Theorem 4.3.1. Let{xi}Ni=1 belong to a ball of radius ? defined in Theorem 3.2.1.
Then
1. For 1?p <?, if p is an even integer, then ?xp is a smooth function of wi?s and
xi?s; otherwise, ?xp is at least a C[p?1] function of wi?s and xi?s (for p = 1 the
non-degeneracy is assumed). Here, [a] is the integer part of a?R. Moreover,
if ?xp does not lie on any xi?s, then ?xp changes smoothly as wi?s and xi?s change
in a small neighborhood.
2. For p =?, ?xp is a continuous function of wi?s and xi?s.
Proof. First, we assume 2?p <?. ?xp isthesolutiontoF(x;x1 ...,xN,w1,...,wN) =
??fp = 0 (see (3.6)). We invoke the Implicit Function Theorem (IFT). The deriva-
61
tive of Fp with respect to x is nothing but the Hessian of fp which is positive
definite as shown before. However, note that Fp is not (local) C? at x = xi and
the degree of its smoothness at x = xi depends on p. For x /?{xi}Ni=1, Fp is a
local C? function of x. Similar, to the Euclidean case, if p is an even integer,
Fp is a local C? function of x. For other values of p, again similar to the Eu-
clidean case, Fp is only a C[p?1] function, because it is so at x = xi?s. Now the
result is immediate result of the Implicit Function Theorem. For 1?p < 2, note
that Fp is not differentiable at xi?s. Therefore, we can not invoke the IFT1. For
1 ? p < 2, we prove continuous dependence of ?xp on wi?s and xi?s directly. Let
v = [x1,...,xN,w1,...,wN]T. We take a sequence vk = [xk1,...,xkN,wk1,...,wkN]T
which converges to v = [x1,...,xN,w1,...,wN]T in MN?[0,1]N and we show that
the sequence ?xkp, i.e., the minimizer of ?f(k)p = ?fp,(x,vk) converges to ?xp. Recall the
definition of ?fp in (4.2) and that??fkp?k is an equicontinuous and uniformly bounded
sequence of functions on B(o,?). Therefore, by the Arzela-Ascoli theorem we can
extract a subsequence??fkjp ?kj which converges to ?fp uniformly. Now by the uniform
convergence we observe that given any ? > 0 for large enough kj
?? < ?fp(x;vkj)? ?fp(x;v) < ?. (4.5)
Therefore, min ?fp(x;vkj)?min ?fp(x;v) as vk ?v. Using this fact, the continuity
1The standard version of the Implicit Function Theorem, requires Fp to be at least C1 in all
variables and its continuity is not enough. In the current situation, i.e., p < 2, the maximum we
can expect some form local Holder smoothness, which might require stronger versions of the IFT.
However, here, we prove continuous dependence by a different approach
62
of ?fp(.;.) and noting that
|?fp(?xkjp ;v)? ?fp(?xp;v)|?|?fp(?xkjp ;v)? ?fp(?xkjp ;vkj)|+|?fp(?xkjp ;vkj)? ?fp(?xp;v)|, (4.6)
we have lim ?fp(?xkjp ;v) = fp(?xp;v). Due to compactness of B(o,?),??xkjp ?has a con-
verging subsequence. Denotethissubsequence againby ?xkjp . Wehave ?fp(lim ?xkjp ;v) =
lim ?fp(?xkjp ;v) = fp(?xp;v). Now, since ?xp is the unique minimizer of ?fp we must have
lim ?xkjp = ?xp. We maintain that lim ?xkp = ?xp. Otherwise, we have an infinite subse-
quence of ?xkp, whose all terms will stay away from ?xp, but by the previous argument
this subsequence must have a subsequence converging to ?xp which is a contradiction.
The above argument works forp = 1 as long asf1 hasa unique minimizer which
is ensured if{xi}Ni=1 are not on a geodesic. It also works for p =?because of Propo-
sition 4.2.1, which implies the equicontinuity of the sequence ?f?(x;xk1,...,xkN)?k.
Note that for p =?we have no weights. The proof is complete.
Probably, lack of high order smoothness for p =?is rather surprising, since
as p increases smoothness of ?xp in its dependence on xi?s increases. In Figure 4.1,
note that if x4 moves inside the minimal ball, the center will not change, but as it
moves outside the ball the center has to change. This can explain the nonsmooth
dependence of ?x? on the points on the boundary of the minimal ball. The following
simple corollary will be useful in Section 5.2:
Corollary 4.3.1. Let {xi}Ni=1 ? B(o,?) with ? < rcx. Then the radius of the
minimal ball of the set{xi}Ni=1 is a continuous function of the data points xi?s.
63
The following theorem concerns the dependence of ?xp on p.
Theorem 4.3.2. Let 1 ? p0 ? ? and p ? p0 (for p0 = 1 we only mean p
approaching 1 from the right). Assume{xi}Ni=1?B(o,?) where ? < min{??,p0,??,p}
for p close to p0. For p0 = 1 assume the data points are not on a geodesic. Then
?xp??xp0.
Proof. The proof can be exactly as the one in the second part of the proof of the
previous theorem, except here we take a sequence ??fp?p as p ? p0 which is again
equicontinuous and uniformly bounded (as explained before).
4.4 Isometry Compatibility
An important property of the L2 center of mass used in [34] is its isometric com-
patibly (see also [50]). By this we mean that if ? : M ?M is an isometry of M,
then the center of mass of{?(xi)}Ni=1 can be found by application of ? to the center
of mass of{xi}Ni=1.
Theorem 4.4.1. Let{xi}Ni=1 ?B(o,?) with ? < ?p,?. Assume ? is an isometry of
M. Then the Lp center of mass of{?(xi)}Ni=1 is ?(?xp).
Proof. For 1 < p < ?, ?xp is the only zero of ?fp(x;x1,...,xN) inside B(o,?).
Since ? is an isometry, from the chain rule it follows that ?(?xp) is a zero of the gra-
dient of fp(x;,?(x1),...,?(xN)) in B(?(o),?) (see (3.6) and note that d(xi, ?xp) =
d(?(xi),?(?xp)). Because {?(xi)}Ni=1 has a unique center in B(?(o),?), we conclude
that ?(?xp) must be this unique center. For p = 1,?the result follows from conti-
64
nuity with respect to p. Note that for p = 1 the claim holds even in the degenerate
case.
4.5 Convexity Properties of the Lp Means
The Lp mean of{xi}Ni=1 exists uniquely as soon as{xi}Ni=1 lies in a strongly convex
ball of radius ? determined in Theorem 3.2.1. Note that the mean is independent
of the mentioned ball and the existence of other candidate balls has no effect on
the position of the mean. Therefore, the mean belongs to the intersection of all
strongly convex balls containing{xi}Ni=1. One would like the mean to belong to the
convex hull of {xi}Ni=1. The proof of this fact is not immediate, since the convex
hull of a set is defined as the intersection of all strongly convex ?sets? containing
that set and not merely the ?balls.? To appreciate the difference, let M be an open
disk in R2 and{xi}3i=1 vertices of a triangle in M. The convex hull of{xi}3i=1 is the
triangle?x1x2x3 itself, but we cannot construct the triangle by intersection of any
collection of open balls in M containing {xi}3i=1, since these balls are of bounded
radius. However, if M = R2, then we could construct the triangle as an intersection
of infinitely many open circles of unbounded radius. Still, we can prove the following
theorem using the uniqueness of the stationary points of fp inside the candidate ball:
Theorem 4.5.1. If the set of points{xi}Ni=1 lies in a ball of radius ? < ??,p defined
in Theorem 3.2.1, then the Lp mean for 1 ? p ??, in general, belongs to the
closure of the convex hull of the points. More precisely,
1. For 1 < p <?, if at least one of the xi?s belongs to the interior of the convex
65
hull then ?xp also belongs to the interior of the convex hull; otherwise, it belongs
the closure of the convex hull.
2. For p = 1,?, ?xp belongs to the closure of the convex hull of{xi}Ni=1.
Moreover, for 1 < p??, ?xp belongs to the interior of the minimal ball of{xi}Ni=1
and for p = 1 it might belong to the boundary of the ball in addition to its interior.
The first part of the previous statement holds for the Lp mean of any K elements
of{xi}Ni=1; more precisely, the Lp mean of any K elements of{xi}Ni=1 where at least
two of them are distinct belongs to the interior of the minimal ball of{xi}Ni=1.
Proof. Let B(o,?) be the candidate ball. We know that ConvHull({xi}Ni=1), the
convex hull of {xi}Ni=1, belongs to B(o,?). Since we deal with finite number of
points we can assume that the closure of the convex hull also belongs to B(o,?)
(and in fact to its interior). First, let 1 < p < ?. We already know that fp is
strictly convex in B(o,?) and hence in the closure of ConvHull({xi}Ni=1) which is a
strongly convex subset of B(o,?) by Proposition 2.3.1. About the vector field??fp
on the boundary of ConvHull({xi}Ni=1) we have:
Lemma 4.5.1. Let r < ??,p and{xi}Ni=1 belong to a ball of radius r. Then:
1. For 1 < p < ?, if at least one of xi?s, say x1, belongs to the interior of the
convex hull of{xi}Ni=1, then at every point on the boundary of the convex hull
the vector field??fp is pointing inward.
2. For 1 < p <?, at every point on the boundary of the minimal ball of{xi}Ni=1
the vector field??fp is pointing inward.
66
Proof. Denote the convex hull of{xi}Ni=1 by C. The condition on r guarantees that
any pair of points of interest can be connected by a unique minimizing geodesic and
that the convex hull and the minimal ball exist uniquely. Let x be a point on the
boundary of C. From Theorem 2.3.2 we know that Cx, the tangent cone of intC
at x, is an open convex cone as a subset of TxM. Let ?i : [0,d(x,xi)]?M be the
normal geodesic that connects x to xi. For ?i two scenarios might happen:
1. If for some t, ?i(t) ? intC then, by definition, ??i(0) ? Cx. In particular,
??1(0)?Cx.
2. If for all t ? [0,d(x,xi)], ?i(t) /? intC, then since by Proposition 2.3.1 ?C is
strongly convex, ?(t) ? ?C for all t ? [0,d(x,xi)]. Let ?yk??k=1 with yk ?
intC be a sequence of points converging to xi and ck : [0,d(x,yk)] ? ?C the
sequence of geodesics that connect x to yk?s. Note that ?ck(0) ? Cx. The
sequence of initial velocities ??ck(0)?k is a bounded sequence in TxM; and it
has a convergent subsequence which must converge to ??i(0); therefore ??i(0)
belongs to the boundary of Cx.
Therefore, ??i(0)?s belong to ?Cx ? TxM and at least ??1(0) belongs to the interior
of the closed convex cone ?Cx. Since ??fp(x) is a linear combination of ??i(0)?s
with positive weights, we conclude that??fp(x) belongs to the interior of Cx (see
Appendix 4.8). Hence, along the boundary of the convex hull??fp(x) points inward
the hull. This proves the first claim. The situation for the minimal ball is easier,
since its boundary is smooth. In this case for any point x on the boundary of the
ball, including those xi?s that lie on the boundary, each ??i(0) (defined as before) is
67
pointing inward (see Corollary 2.3.1); so is??fp(x).
Recall that the interior of C = ConvHull({xi}Ni=1) is a smooth convex subman-
ifold of M. Assuming x1 is in the interior of the convex hull, the vector field??fp
is pointing inward the convex hull, and the minimum of fp|?C, the restriction of fp to
the compact set ?C should happen in the interior of the hull. This point should be a
stationary point of fp|?C, i.e., where the projection of?fp(?xp) onto T?xpintC is zero.
From Proposition 2.3.1, ?C is strongly convex; therefore, each unique minimizing
geodesic connecting ?xp to xi?s (which might lie on the boundary) belongs to ?C. Us-
ing this and since the interior of C is totally geodesic by Theorem 2.3.2 each velocity
vector exp?1?xp xi?T?xpM (1?i?N) has no component out of T?xpintC. This means
that at ?xp actually ?fp(?xp) = 0 (see (3.6)). Now recall from Theorem 3.2.1 that
?fp has ?xp as its only zero in B(o,?). Therefore, ?xp should belong to the interior
of the convex hull of {xi}Ni=1. A similar argument shows that, for 1 < p < ?, ?xp
belongs to the interior of the minimal of {xi}Ni=1. This is independent of whether
any of xi?s belong to the interior of the minimal ball, since ??fp will be always
pointing inward the minimal ball on the boundary of the minimal ball by Part 2 of
Lemma 4.5.1.
The rest of the proof is comprised of limit arguments to address all the re-
maining cases:
1. First, for1 < p <?andif noneofxi?sbelong to theinterior ofConvHull({xi}Ni=1),
we use the smoothness results from the previous section (Theorem 4.3.1). Let
all xi?s belong to the boundary of the convex hull and let?xk1?k be a sequence
68
of interior points that converges to x1. Consider new sets of data points
{xk1,x1,...,xN} comprised of N + 1 points and corresponding weight vector
[12w1, 12w1,w2,...,wN]T in defining the mean. All these sets have the same
convex hull as our original set. Denote the Lp mean of each set by ?xkp. By
the preceded argument ?xkp belongs to the interior of ConvHull({xi}Ni=1) for ev-
ery k. Since the Lp mean (of N + 1 points) is a continuous function of the
points we have ?xkp ? ?xp as xk1 ?x1. Therefore, ?xp belongs to the closure of
ConvHull({xi}Ni=1).
2. Second, for p =?we have already shown that when{xi}Ni=1 (and the weights)
are fixed, as p ??, ?xp ? ?x?. This implies that ?x? must belong to the
closure of the convex hull of{xi}Ni=1. The same argument also applies to the
case p = 1, provided we exclude the degenerate situation. However, for the
degenerate situation even though the minimizer of f1 might not be unique it
belongs to the interior of the convex hull. This completes the proof of the
theorem.
The proof of the theorem suggests that ?x1 or ?x? might lie on the boundary of
the convex hull although some of xi?s lie in the interior of the convex hull. One can
see this easily in examples in the plane.
Remark 4.5.1. Note that the Lp mean of a subset of{xi}Ni=1 belongs to the interior
of the minimal ball of{xi}Ni=1 according to this theorem, however, it will not belong
69
to the interior of the convex hull of{xi}Ni=1 even in the case of M ?Rn. This shows
a dramatic difference in inclusion properties of the convex hull and the minimal ball.
Remark 4.5.2. Our proof suggests that in the case we have countable number of
points with positive weights also the theorem claims hold.
Remark 4.5.3. In [32], Groisser asks the question whether (global) minimization
of f2 (see (3.1)) automatically finds a zero of the vector field?f2 which is inside the
convex hull of the support of the underlying probability measure w. Theorem 4.5.1
gives an affirmative answer to this question with two technical caveats. The first one
is that we assume w is a probability mass function. Note that for a more general
probability measure the question might become trivial. Second, we showed that the
center belongs to the ?closure of the convex hull? rather than to the convex hull
itself. Resolving this technicality seems to be difficult, but in the next subsection
we elaborate on that further.
4.5.1 A Refined Result for Constant Curvature Manifolds
The exact structure of the convex hull of a finite set of points in a manifold of
non-constant is not known. Specifically it is not known whether the convex hull
is closed or not [9, p. 231]. If it were a closed set, as in the case of Euclidean
space, our statement in part 1 of Theorem 4.5.1 would include ?convex hull? rather
then ?closure of the convex hull.? Here, we show that in a manifold of constant
curvature the Lp mean (1 < p <?) of{xi}Ni=1 belongs to the interior of the convex
hull of the set. The main property that helps us show this is the fact that these
70
manifolds posses generic totally geodesic submanifolds of lower dimension, which
is rare in the case of manifolds of non-constant curvature. We recall the following
axiom of plane from [78, p. 136]: Let M be a manifold of constant curvature
of dimension n. Let W be any r dimensional subspace of TqM, q ? M. Then
the set S?(W) = {expq X|X ? W,bardblXbardbl < ?} is an r-dimensional totally geodesic
submanifold of M for every ? < injM (see Definition 2.3.3).
Theorem 4.5.2. Let M be a complete Riemannian manifold of constant curvature
?. Define ??,p as in (3.5) with 1 < p <?and let ? < ??,p. Then the Lp mean of
{xi}Ni=1?B(o,?) belongs to the interior of the convex hull of{xi}Ni=1.
Proof. As before denote the convex hull by C. Recall our definitions and arguments
in the proof of Theorem 4.5.1 and Lemma 4.5.1. We only need to show that??fp(x)
is pointing inward the convex hull. As mentioned before, exp?1x xi belongs to Cx for
1 ? i ? N. For convenience, let ?i = dp?2(x,xi)exp?1x xi. Note that ??fp(x) =
summationtext
i wi?i (see (3.6)). We assume that all ?i?s belong to the boundary of the cone
Cx; otherwise, similar to Theorem 4.5.1,??fp(x) will be pointing inward C. First,
assume that the dimension of C is n. Let all ?i?s belong to a face F of dimension k <
n. Consequently, ?i?s and hence exp?1x xi?s belong to a k-dimensional subspace W ?
TpM, which in turn implies that{xi}Ni=1 belongs to a totally geodesic submanifold of
dimension smaller than n. This is a contradiction since we assumed the dimension of
the convex hull to be n. Now, assume there is no face of dimension less than n which
contains all ?i?s. If summationtexti wi?i does not belong to the interior of the hull, it belongs to
a face F? of dimension k? < n. It follows from the properties of closed convex cones
71
and their faces that ?i?F? for 1?i?N which is again a contradiction. Therefore,
if the dimension of the convex hull is n,??fp(x) must be pointing inward the hull
at every point x on the boundary of the hull. Next, if the dimension of C (or Cx)
is less than n,{xi}Ni=1 belongs to a totally geodesic submanifold of lower dimension
and we can apply the same argument to this submanifold. Note that our claim
is trivial if the data points lie on a geodesic, i.e, a totally geodesic submanifold of
dimension one. Therefore, we conclude that on the boundary of the convex hull of
{xi}Ni=1, the vector field??fp is pointing inward the hull.
We emphasize that the assumption of wi > 0 in the definition of the mean
is crucial for this result to hold. The main obstacle in extending the preceding
argument to the non-constant curvature case is that a face of the cone Cx does not
map to ak-dimensional (1 < k < n) totally geodesic submanifold via the exponential
map, necessarily. Whether constant curvature is necessary to ensure the mean to
belong to the interior of the convex hull is not clear.
4.6 Sensitivity Properties of the L2 Mean
The standard Euclidean mean enjoys an appealing averaging or smoothing property;
and in fact that is the main reason for its use in everyday life. We distinguish three
forms of averaging properties:
Definition 4.6.1 (Averaging properties). Given B(o,?) ? M, let ?w,B : BN =
B?...B ?B denote the L2 mean function on B, i.e., ? = ?wN,B(x1,...,xN) is
the L2 mean of{xi}Ni=1?B with corresponding weight vector wN = [w1,...,wN]T.
72
Then we say
1. ?wN,B is strongly averaging if for any two sets of points{xi}Ni=1 and{x?i}Ni=1 in
B we have
d(?wN,B(x1,...,xN),?wN,B(x?1,...,x?N))?
summationdisplay
i
wid(xi,x?i). (4.7)
2. ?wN,B is averaging if for any two sets of points {xi}Ni=1 and {x?i}Ni=1 in B we
have
d(?wN,B(x1,...,xN),?wN,B(x?1,...,x?N))?max
i
d(xi,x?i). (4.8)
3. ?wN,B is weakly averaging if for any set of points {xi}Ni=1 in B replacing a
single point xj by another point x?j does not change the position of the mean
by more than d(xj,x?j), i.e.,
d(?wN,B(x1,...,xN),?wN,B(x1,...,xj?1,x?j,xj+1,...,xN))?d(xj,x?j). (4.9)
In occasions we might refer to the L2 mean without being specific about B(o,?),
wN, or N; in which case we assume that ? < ??,2, o, w or N are arbitrary.
Strong averaging implies averaging which in turn implies weak averaging. It is
easy to see that the Euclidean L2 mean in Rn is strongly averaging. More generally,
in Theorem 4.6.1 we show that the Riemannian L2 mean in a manifold of nonpos-
itive curvature also has the same property. For manifolds of positive curvature the
situation is completely different and the L2 mean is not strongly averaging. The
73
interpretation is that, the mean can have high sensitivity with respect to the noise.
4.6.1 Lack of (Strong) Averaging Property for L2 Mean in Manifolds
of Nonnegative Curvature
Here, we show why the L2 mean cannot be strongly averaging if M is of nonnegative
sectional curvature 1. Let q ? M and ?(t) = expq tX be a unit speed geodesic
emanating from q with velocity vector X ? TqM. Take two points x1 = expq t1X
and x2 = expq t1X on ? with 0 < t1 < t2 < injM. The L2 mean of the set{x1,x2}
with weight vector w2 = [w,1?w]T(0 < w < 1) is ?x = expq(wt1+(1?w)t2)X. Now,
consider another geodesic ?(t) = expq tX? emanating from q with velocity vector X?
of unit norm. Take perturbed versions of x1 and x2 on ?, as x?1 = expq t1X? and x?2 =
expq t1X?. Obviously, the L2 mean of{x?1,x?2}with weight vector w2 = [w,1?w]T
is ?x? = expq parenleftbigwt1 + (1?w)t2parenrightbigX?. It is a standard matter to show that (e.g. [66])
d(expq tX,expq tX?) = tbardblX?X?bardblqparenleftbig1?Kq(X,X
?)
12 (1 +?X,X
??q)t2parenrightbig+o(t), (4.10)
where Kq(X,X?) is the sectional curvature of M at q along the plane spanned by
unit norm tangent vectors X,X? ?TqM and o(t)t3 ?0 as t?0. If Kq(X,X?) > 0,
then tmapsto?h(t) = d(expq tX,expq tX?) is a strictly concave function on the interval
(0,?) for small enough ?. Therefore, for 0 < t1 < t2 < ? and 0 < w < 1 we
have wh(t1) + (1?w)h(t2) < h(wt1 + (1?w)t2). This means that d(?x, ?x?) >
wd(x1,x?1)+(1?w)d(x2,x?2), which implies that the L2 mean for two points cannot
1Tacitly we assume that the sectional curvature is not identically zero.
74
be strongly averaging. By choosing repeated points we can conclude that the L2
mean for more than two points also cannot be strongly averaging. Note that for
small enough ?, tmapsto?d(expq tX,expq tX?) is strictly increasing on the interval [0,?].
Therefore, d(?x, ?x?) < d(x2,x?2) for small t2. This suggests that although the L2 mean
is not strongly averaging it might be averaging in small domains. Next, we show
that this is not possible if the sectional curvatures of M are bounded below by ? > 0
and above by ?. First, we consider the constant curvature case and specifically the
unit sphere (in R3). Figure 4.2 explains the situation. Consider two geodesics (great
circles) t mapsto??(t) = expq1 tX and t mapsto??(t) = expq1 tX? with unit velocities X and
X?, respectively, that leave q1 at angle ?. The two geodesics meet again at q2,
the antipodal of q1. As before we consider the function tmapsto?d(expq1 tX,expq1 tX?).
The interesting point is that this function achieves a maximum at ?t = ?2 which
corresponds to midpoints ?x = expq1 ?tX and ?x? = expq1 ?tX?. The two geodesics
deviate from each other for 0 < t < ?t and become closer for ?t < t??. Now, if we
choose x1 = expq1(?t?wl)X and x2 = expq1(?t?(1?w)l)X, then the L2 mean of
{x1,x2}with weight vector w2 = [w,1?w]T is ?x for 0 < w < 1 and all 0?l??t. On
? we choose x?1 = expq1(?t?wl)X? and x?2 = expq1(?t?(1?w)l)X? whose L2 mean is
?x?. Obviously, d(?x, ?x?) > maxi d(xi,x?i). This means that the L2 mean of two points
in a ball which contains our data points cannot be averaging. The same argument
by choosing repeated points that coincide with x1 and by varying w implies that
the L2 mean cannot be averaging for more than two points, as well. Note that by
choosing ? and l small enough we see that the L2 mean cannot be averaging even
in arbitrary small balls on S2.
75
?x? o
?
x?2
x?1 x1
x2
?
q2
?
XX?
?
q1
?x
Figure 4.2: Top view for the upper unit hemisphere in R3. The pairs of points
x1,x2 and x?1,x?2 lie on two geodesics separated by angle ? and connecting the two
poles q1 and q2, as shown. ?x and ?x? are the midpoints of the two geodesics. x1 and
x2 (x?1 and x?2) are mirror images with respect to ?x(?x?). For all values of ? > 0
and no matter how close x1 and x2 (and x?1 and x?2) to each other are, we have
that d(?x, ?x?) > d(xi,x?i), i = 1,2. Therefore, the L2 mean on the sphere cannot be
averaging, even in very small balls.
In M, we shall use a more general argument using the properties of Jacobi
fields to show that one can always find data points{x1,x2}and their perturbations
{x?1,x?2} for which the L2 mean is not averaging. Consider q1 ?M and the family
of geodesics tmapsto?f(t,s) = expq1 t(X + sY) where X ?Tq1M and Y ?TXTq1M 1 .
Denote the geodesic tmapsto?f(t,0) by ?(t). In the following we denote all derivatives,
either the Levi-Civita covariant derivative of a vector field or the standard derivative
of function, with respect to t by ?. The covariant derivatives of the related vector
fields are taken along t mapsto? ?(t). Assume that bardbl??(0)bardbl = bardblXbardbl = 1. The distance
between points f(t,s) and f(t,0) can be written as:
d(expq1 tX,expq1 t(X +sY)) =bardblJ(t,0)bardbls+o(t,s), (4.11)
1Here, to avoid confusion and for convenience we shall departure from our original notation on
Jacobi fields introduced in Section 2.4 and interchange the role of the t and s variables.
76
where tmapsto?J(t,0) = J0(t) = J(t) is the Jacobi field along tmapsto?expq1 tX with initial
conditions J(0) = 0 and ?J(0) = Y; and lims?0+ o(t,s)s2 = 0. We know that there
exists an orthonormal basis of parallel vector fields{??(t),E1(t),...,En?1(t)}along
?. Recall, that a vector field is parallel along the geodesic tmapsto??(t), if its covariant
derivative along ? is identically zero. We take the two dimensional section along ?
spanned by{??(t),E1(t)}, and denote the sectional curvature of M along this section
by K(??(t),E1(t)) = K(t). Note that 0 < ? ? K(t) ? ?. Consider a Jacobi field
J(t) with J(0) = 0 along this section, which is orthogonal to ??(t) along ?(t):
J(t) = g(t)E1(t), (4.12)
where g(0) = 0 , ?g(0)negationslash= 0 and g(t) : R?R satisfies
?g(t) +K(t)g(t) = 0. (4.13)
This equation is derived from the Jacobi equation (2.3); and ?J(0) = ?g(0)E1(0) = Y.
The latter is the only restriction that we impose on Y introduced before. Due to
linearity of (4.13) we can assume that ?g1(0) > 0. It is a well-known consequence of
Sturm?s comparison theorems for ordinary differential equations (e.g. [24, pp.210-
211]) that
?g(0)sin
??t
?? ?g(t)? ?g(0)sin
??t
?? (4.14)
for t > 0. In the above, each inequality is valid as long as its left side remains
positive. It is easy to see that at some time ??? ?tc ? ???, g(tc) = 0 or J(tc) = 0;
77
but for t ? (0,tc) we have g(t) > 0. Note that q2 = ?(tc), is a conjugate point
of q1 along ?, but not necessarily the first conjugate point. Since g(t) > 0 on the
interval (0,tc) we have bardblJ(t)bardbl = g(t); and since g(0) = g(tc) = 0 there should be
a time 0 < ?t < tc at which bardblJ(t)bardblachieves its maximum on the interval [0,tc]. At
?t, we have ?g(?t) = 0 and from (4.13) we have ?g(t) =?K(t)g(t) < 0 in the interval
(0,tc) i.e., g is strictly concave in that interval. Therefore, ?t is the only maximizer
of bardblJ(t)bardbl = g(t) on [0,tc]. The existence of such a ?t is the main ingredient we
needed. We proceed very similarly to the sphere case now. Let ?x = expq1 ?tX and
?x? = expq1 ?t(X+sY) and choose t1,t2?(0,tc)\{?t}such that t1 < ?t < t2. We choose
two points x1 = expq1 t1X and x1 = expq1 t2X on tmapsto?expq1 tX. Similarly, we choose
x?1 = expq1 t1(X + sY ) and x?2 = expq1 t2(X + sY) on tmapsto?expq1 t(X + sY ) and on
both sides of ?x?. Let w = ?t?t1t2?t1; we have 0 < w < 1. Now, ?x is the L2 mean of
{x1,x2} with the weight vector w = [w,1?w]T and ?x? is the L2 mean of{x?1,x?2}
with the same weight vector w. Also from (4.11) we conclude that for small enough
s which depends on t1 and t2
d(expq1 ?tX,expq1 ?t(X +sY )) > d(expq1 tiX,expq1 ti(X +sY)) (4.15)
for i = 1,2. Therefore, we have:
d(?x, ?x?) > max
i=1,2
d(xi,x?i). (4.16)
Note that this result holds even for t1 and t2 very close to each other and for small
78
s. Therefore, the L2 mean of two points is not averaging even in small domains.
This argument, by using repetitive points and changing the weight vector, can be
extended to more than two points.
To summarize we state the following:
Proposition 4.6.1. The L2 mean is not strongly averaging in a manifold of non-
negative sectional curvature (excluding zero curvature). It also is not averaging in
a manifold with positive lower sectional curvature bound.
Note that the sectional curvature of a 1-dimensional manifold is zero and the
above excludes the case of unit circle in R2, for example. A few words are due at
this stage. First, note that we have not ruled out the possibility of finding a set
of points and their perturbations whose L2 means satisfy the averaging or strong
averaging properties: we just showed that (4.7) and (4.8) cannot hold for arbitrary
sets of points in a manifold of positive curvature. Second, we emphasized on the lack
of averaging property even in small domains, since one might suspect that this non-
averaging behavior is a global phenomenon. On the other hand, as we show next,
one can achieve weak averaging by restricting the data points to a small enough
domain.
4.6.2 Weak Averaging Property of the L2 Mean in Positively Curved
Manifolds: An Example
In manifolds of positive curvature, one can attain weak averaging in small domains.
First, we give an example which shows that the L2 mean can not be weakly averaging
79
in large domains. Consider the unit sphere S2 in R3. Let x1 and x2 belong to the
upper hemisphere and assume that they are very close to the boundary of the
hemisphere in almost antipodal positions as shown in Figure 4.3. In this figure x1 is
on the yz plane and x2 is in a small neighborhood of the plane and close to the xy
plane. The z axis is normal to the page and upward pointing. ?x is the midpoint of
geodesic segment connecting x1 and x2. Now if x2 gets reflected to the left of the yz
plane to a new point x?2, which is still close to x2, the midpoint of the new geodesic
segment between x1 and x2 will be ?x? which will be very far from ?x. This means
that L2 mean on the sphere is not weakly averaging in a ?large? domain such as
a hemisphere. This is a global phenomenon; and stems from the fact that x1 and
x2 are close to the cut locus of each other. Recall that the lack of strong averaging
property on the sphere was due to the local effect of curvature.
y
x1
x?2 x2
d(x?2,x2)
d(?x?,?x) ?x?x?
x
Figure 4.3: Top view of the upper unit hemisphere in R3. ?x and ?x? are the midpoints
of the geodesics connecting x1 to x2 and x1 to x?2, respectively. We have d(?x, ?x?)?
d(x?2,x2) while x2 and x?2 are very close to each other. Therefore, sadly, the L2 mean
is not even weakly averaging in the hemisphere!
We try to quantify this phenomenon. Let x1 ?M and X2,X?2?Tx1M be two
tangent vectors at x1 such that x2 = expx1(X2) and x?2 = expx1(X?2). Assume that
80
the points belong to a ball B(o,?) with ? < rcx. The midpoints of the geodesic
segments joining x1 and x2 and joining x1 and x?2 are ?x = expx1(X22 ) and ?x? =
expx1(X?22 ), respectively. If M is of non-negative curvature with curvature upper
bound ?, then using (2.19) twice gives d(?x, ?x?) ? 12 ? 2?
??
sin2??? ?d(x2,x
?
2). Applying
the hinge version of the C.A.T comparison theorem with ? = 0, i.e., comparing with
a triangle in Rm shows that 12d(x2,x?2) ? d(?x,?x?); and together with the previous
relation we conclude
1
2d(x2,x
?
2)?d(?x, ?x
?)? 1
2?
2???
sin2????d(x2,x
?
2). (4.17)
If M is of non-positive curvature with lower curvature bound of ? < 0, applying
(2.20) yields 12 ? 2?
?
|?|
sinh2?
?
|?|d(x2,x
?
2) ? d(?x, ?x
?). If we apply the hinge version of the
C.A.T comparison theorem with curvature upper bound of ? = 0 and use the
previous relation we have
1
2?
2?radicalbig|?|
sinh2?radicalbig|?|?d(x2,x
?
2)?d(?x, ?x
?)? 1
2d(x2,x
?
2). (4.18)
Therefore, in a manifold of negative curvature the equal weight L2 mean (of two
points) shows better sensitivity behavior than the L2 mean in the Euclidean space.
However, in a manifold of positive curvature we need to restrict the data to small
domains in order to guarantee a weakly averaging behavior of the L2 mean. For
example, if we require
? < min{injM2 , 0.3???}, (4.19)
81
then d(?x, ?x?) < d(x2,x?2). Note that this bound is more restrictive than ??,2 < rcx
but better than Karcher?s bound (3.4).
4.6.3 A General Bound for Sensitivity
Here, we show that the L2 mean in a manifold of nonpositive curvature is always
strongly averaging, and in a manifold of positive curvature it can be only weakly
averaging inside small enough balls as described below. The main result that we use
is (4.24) which is essentially Corollary 1.6 in [50]. However, there the underlying
constants are not given explicitly, while we would like to have them explicitly. To
derive (4.24) we need the following result which is given as Theorem 1.5.1 in [50]:
Let {xi}Ni=1 ?B(o,?) with ? < 12 min{injM, ?2??}. For ?x2 the L2 mean of {xi}Ni=1
(with respect to a weight vector w), and any x?B(o,?) we have
bardbl?f2(x)bardbl?d(x, ?x2)?min{1,2?cs?2?sn
?2?
}= d(x, ?x2)?
?
???
???
1 ??0
2????cot(2???) ? > 0
(4.20)
where sn? and cs? are defined in (2.6). Consider another set of points, i.e., the
perturbations of xi?s, as {x?i}Ni=1 ? B(o,?). We denote the L2 mean of {x?i}Ni=1 by
?x?2. Since the weights in the definition of f2 are fixed and only xi?s change, we use
f2(x;x1,...,xN) to show the dependence of f2 on xi?s. Now, for x = ?x?2 in (4.20)
we have
d(?x,?x?)?bardbl?f2(?x?;x1,...,xN)bardbl?
?
???
???
1 ??0
1
2????cot(2???) ? > 0
. (4.21)
82
Recall the definition of?f2(x;x1,...,xN) and the fact that?f2(?x?;x?1,...,x?N) = 0;
so we have
?f2(?x?;x1,...,xN) = ?f2(?x?;x1,...,xN)??f2(?x?;x?1,...,x?N)
=
Nsummationdisplay
i=1
wiparenleftbig?exp?1?x? xi + exp?1?x? x?iparenrightbig. (4.22)
From (2.19) and (2.20) we have
bardblexp?1?x? xi?exp?1?x? x?ibardbl?d(?xi,?x?i)?
?
???
???
1 ??0
2???
sin(2???) ? > 0
. (4.23)
Recall that the condition needed to use (2.19) and (2.20) is ? < rcx which holds
here. Now, (4.21), (4.22) and (4.23) yield:
d(?x,?x?)?
?
???
???
1 ??0
1
cos(2???) ? > 0
?
???
????
Nsummationdisplay
i=1
wid(xi,x?i). (4.24)
Note that for this bound we need ? to satisfy (3.4), which is the condition we
assumed in (4.20). We state the following theorem:
Theorem 4.6.1. The L2 mean of N points is strongly averaging in a manifold of
nonpositive curvature. In a manifold with upper curvature bound of ? > 0, the L2
mean is weakly averaging in a ball of radius ? if ? satisfies
? < 12 min{injM, cos
?1(maxi wi)
?? }. (4.25)
83
Proof. From (4.24) we see that L2 mean is strongly averaging in a manifold of
nonpositive curvature. By restricting ? we can achieve weak averaging in B(o,?): if
cos2????maxi wi, then d(?x,?x?)?d(xj,x?j), where in the data set {xi}Ni=1, xj is
replaced by x?j. To satisfy the mentioned restriction ? must obey (4.25).
Remark 4.6.1. In the case of equal weights (4.25) reads as
? < 12 min{injM, cos
?1( 1
N)?
? }. (4.26)
Note as N increases the upper bound converges to 12 min{injM, ?2??}, which is
Karcher?s bound (3.4). The bound (4.26) is more conservative than the bound
we found in (4.19) for N = 2. The reason is that (4.20) requires ? < ?4??. This
is exactly the same restriction that we could surpass by using the method of Buss-
Fillmore in Section 3.2. A look at the proof of (4.20) in [50] shows that in order to
improve (4.20), i.e., to relate the gradient of f2(x) to d(x,?x) for larger values of ? we
need to find a better lower bound for Hessf2, which seems to be more complicated.
The example in the previous subsection, shows that in the case N = 2 the worst
displacement of the mean by displacing only one of the points happens when the
two data points are close to the cut points of each other, i.e., where the exponential
map is close to becoming singular. In the case of equal weights, as (4.26) suggests,
one expects that as N increases the restriction on ? to become looser. Therefore,
we state the following conjecture which we already proved for N = 2:
Conjecture 4.6.1. The L2 mean of N points with equal weights in B(o,?)?M is
weakly averaging if ? < min{injM2 , 0.3???}.
84
4.6.3.1 A Lipschitz Property
The estimate in (4.24) can be used to establish the Lipschitz continuity of the L2
mean as a function from MN = M?...?MN to M. Recall the product structure
of MN and denote the induced distance function by dMN(.,.). We also use the
convention of denoting (x1,...,xN) ? MN by x ? MN. As before, the L2 mean
of {xi}Ni=1 with weight vector w is denoted by ?w,B(x). The following proposition
follows easily from (4.24):
Proposition 4.6.2. If ? < 12 min{injM, ?2??}, then ?w,B : MN ?M is a Lipschitz
function with Lipschitz constant of L = 1cos2???(summationtexti w2i)12, more precisely we have
d(?w,B(x),?w,B(y))?LdMN(x,y) (4.27)
for any x,y ? MN, such that {xi}Ni=1 and {yi}Ni=1 lie in B(o,?). Here, ? ? 0 is
treated as ? = 0.
Note that for equal weights and if M is of nonpositive curvature the Lipschitz
constant is L = 1?N, and if M is of positive curvature the Lipschitz constant is L >
1?
N; which again is a testimony of higher sensitivity in the case of positively curved
manifolds. We shall see some consequences of this fact in our further developments
in Section 5.4.
4.7 Computing the Lp Means and an Example for Points on a Sphere
In this section we consider the gradient descent method for finding the Lp means.
85
4.7.1 Gradient Descent Flow for Finding ?xp (1 < p <?)
The following theorem shows that a (continuous-time) gradient flow with initial
condition in a uniqueness ball will find ?xp for 1 < p <?:
Theorem 4.7.1. Let 1 < p <?and{xi}Ni=1 ?B(o,?) where ? < ??,p. Then the
differential equation
dz
dt(t) =??fp(z(t)), z(0) = z0?B(o,?), (4.28)
where?fp is defined in (3.6), induces a gradient descent flow on M which will take
z0 to ?xp as t??.
Proof. First, we need to establish existence and possibly uniqueness of a solution of
(4.28) on the interval [0,?). Since on the boundary of B(o,?), ??fp is pointing
inward the ball any solution t mapsto? z(t) cannot leave B(o,?). Therefore, since ? <
injM, B(o,?) is a local coordinate neighborhood in which the solution stays, and
exp?1o : B(o,?)?M ?B(0,?)?TpM ?= Rn is the corresponding local coordinate
function. Let D?Rn be a bounded open set. Recall that an ordinary differential
equation of the form dzdt = f(z) with the initial condition z(0) = z0 ? D and
f : D?Rn:
1. has a local solution, i.e., a solution that solves the equation only on a small
interval [0,?), if f is continuous [20, p. 178];
2. has a global solution, i.e., a solution which solves it on [0,+?), but not
86
necessarily unique if f is continuous and has linear growth, that is there are
constants a,b > 0 such thatbardblf(x)bardbl?abardblxbardbl+b for all x?Rn [20, p. 178].
3. has a global and unique solution if f is locally Lipschitz and any solution
starting in the compact set ?D?Rn stays in ?D for entire time [53, p. 94].
This result carries over to a local coordinate neighborhood in M such as B(o,?).
Now, note that for 2?p <?,?fp is a locally Lipschitz vector field (see Proposition
4.3.1 also) and since z(t) does not leave B(o,?) we can apply part 3 of the above.
Therefore, when 2 ? p < ? the solution of (4.28) exists and is unique for t ? 0.
For 1 < p < 2 the existence is guaranteed by the part 2 of the above. To see this in
the Euclidean case note that, for x?B(o,?)?Rn we have
bardblx?xibardblp?1?(2?)p?2(bardblxbardbl+bardblxibardbl); (4.29)
and therefore, ?fp(x) has linear growth in B(o,?) (see (3.6)). Since z(t) will not
leave B(o,?), from part 2 we conclude that (4.28) in Rn has a solution defined on
the interval [0,+?) for 1 < p < 2. As explained before this extends to B(o,?)?M,
immediately. In summary, for 1 < p <?the gradient flow in (4.28) has a solution
defined on the interval [0,?). If z(t) is such a solution, since ddtfp(z(t)) < 0 for
z(t)negationslash= ?xp and since z(t)?B(o,?), by LaSalle?s invariance principle or Lyapunov?s
Theorem [42, 53] we have z(t)??xp as t??. This completes the proof.
We just mention that for p = 1 and?one might be able to define generalized
gradient flows as discussed in [20]; however, since their numerical implementation
87
will impose extra difficulty we avoid this approach. In Section 4.3 we saw that ?x1 or
?x? can be approximated by ?xp where p is very large or p is close to 1, respectively.
4.7.2 Gradient Descent Method for Finding ?xp
Numerical computation of the Lp mean in a manifold can be challenging. One
reason is that the explicit evaluation of the exponential map, except for some special
manifolds, is complicated. As we mentioned in Theorem 3.2.1 and Section 4.7.1, if
{xi}Ni=1 lies in B(o,?) the continuous-time gradient flow (4.28) for minimization of
fp starting in B(o,?) will converge to ?xp. Unfortunately, the continuous-time flow
is not implementable on a digital computer; and in a more realistic situation we
would either discretize the flow or use a gradient descent algorithm instead. In
either case, a new iterate might not belong to B(o,?) or even B(o,??,p). One can
impose an extra constraint to keep the iterates inside the ball as part of a gradient
descent algorithm; however, keeping track of the iterates will be demanding. Note
that in reality even though we might know apriori that the data points are close
enough, we might not know the actual small ball which contains the points. Finding
such a ball can be as difficult as finding the mean itself. Another complicated issue
with a gradient descent algorithm is finding a good step-size such that the new
iterate reduces the cost function by the most. Again, finding such a step-size can
be very difficult or impossible in most cases. In Table 4.1 we present an idealistic
gradient descent for finding the Lp mean for 1 < p <?. Finding the step-size hk as
prescribed there and keeping the iterates inside a suitable ball make the algorithm
idealistic. There are possible step-size selection methods, such as the Armjio step-
88
size selection, that make the algorithm realistic. However, in practice a method that
might be the most efficient is to use a small constant step-size. In the following we
give an application of such a gradient descent in finding the Lp means for points on
the unit sphere in R3. Newton-type methods also have been proposed for finding
the mean e.g., [17, 32].
Idealistic Gradient Descent Algorithm for finding the Lp(1 < p <?) mean
1. Let{xi}Ni=1?B(o,?) and ? < ??,p. Choose ? > 0.
2. Set ?x0p to an arbitrary point in B(o,?).
3. Find ??fp(?x0p) from (3.6)
4. Whilebardbl?fp(?xk)bardbl> ?
(a) Set ?xk+1p = exp?xkp parenleftbig?h?fp(?xkp)parenrightbig and find hk = argminhfp(?xk+1p )
such that ?xk+1p ?B(o,??,p).
(b) Find ?xk+1p = exp?xkp parenleftbig?hk?fp(?xkp)parenrightbig
(c) Find??fp(?xk+1p ) from (3.6)
(d) k?k + 1
5. Set ?xp = ?xkp.
Table 4.1: Pseudo-code of a gradient descent algorithm for finding the Lp mean in
M. This algorithm is idealistic but it is guaranteed to converge to the mean.
4.7.3 An Example for Points on the Sphere in R3
In this example, we find the equal weight Lp mean of N = 4 points {xi}Ni=1 which
lie on the upper left part of the unit hemisphere in R3. The sphere S2 is endowed
with the Riemannian metric induced from the standard metric of R3. In this metric
the distance between x,y?S2?R3 is
d(x,y) = cos?1?x,y?, (4.30)
89
where ?.,.?is the standard inner product in R3. We implement a gradient descent
algorithm with fixed step-size hk = 1. The exponential map corresponding to this
metric is calculated as follows [17]. We describe expq : TqS2?S2 at the north pole
q =
bracketleftbigg
00
1
bracketrightbigg
. At other points it is found by a simple rotation with respect to the north
pole. Identify TqS2 with R2, so Z = bracketleftbigZ1Z2 bracketrightbig is a tangent vector at q. Then
z = expq Z =
bracketleftbigg
Z1 sinrr
Z2 sinrr
cosr
bracketrightbigg
, (4.31)
where r =bardblZbardbl=?Z1 +Z2 and sin00 = 1. We require r < ? which is the injectivity
radius of S2. The inverse of expq can be found easily. If z =
bracketleftbigg
z1z
2z
3
bracketrightbigg
?S2?R3 is not
antipodal to q then
Z = bracketleftbigZ1Z2 bracketrightbig = exp?1q z = rsinrbracketleftbigz1z2 bracketrightbig, (4.32)
where r = d(z,q) = cos?1 z3. We denote by ?i and ?i the angles that xi?R3 makes
with the z axis and the angle that the projection of xi on the xy plane makes with
the x axis, respectively. They are the standard polar angles. Then
xi =
bracketleftbiggsin?
i cos?i
sin?i sin?i
cos?i
bracketrightbigg
; (4.33)
90
and we choose the following pairs of (?i,?i)
(?1,?1) = (0,0)
(?2,?2) = (2?10,0)
(?3,?3) = (?5, ?4)
(?4,?4) = (2?5 , ?2).
(4.34)
Figure 4.4 shows the four points and their Lp means for p = 1.1,2,3,10. We used
?1
?0.5
0
0.5
1
?1
?0.5
0
0.5
1
0
0.2
0.4
0.6
0.8
1
x
x4
x3
x1
x2
y
z
x1,...,x4
x1.1
x2
x3
x10
Figure 4.4: An example for equal weight Lp means of N = 4 points on the unit hemi-
sphere for p = 1.1,2,3,10. The means are found by a gradient descent algorithm.
As p increases the mean moves towards the L? center of{xi}4i=1. For this data set
three points x1,x2 and x4 lie on the boundary of the minimal ball of{xi}4i=1.
the gradient descent algorithm in Table 4.1 with fixed step-size hk = 1. It is clear
as p increases ?xp moves towards x4: d(x4,?x1.1) = 0.316? while d(x4, ?x2) = 0.268?
and d(x4, ?x10) = 0.220?. This demonstrates the fact that as p increases, the outliers
contribute more in determining the mean. Unfortunately, finding the Lp mean for
larger values of p becomes numerically very sensitive, since it amounts to computing
91
large powers of d(x,xi) (see (3.6) for the formula of the gradient of fp) which causes
loss of numeric precision very fast. For example, in finding ?x20, with two different
initial conditions we got two answers that are apart by 3.6?, while for p = 10 we
got two answers that are apart by only 8.5377?10?7 degrees. Nevertheless, by
approximating ?x20 we see that d(x1, ?x20)?d(x2, ?x20)?d(x4, ?x20)?0.212?. Recall
that ?xp??x? as p increases. In fact, we have d(?x1.1, ?x20)?17.5? while d(?x10, ?x20)?
4.8?. For our configuration of points, from Theorem 4.2.1 we conclude that x1,x2 and
x4 must lie on the boundary of the minimal ball of{xi}4i=1 and the center of the ball
is located at approximate polar coordinates of (??,??) = (0.220?,0.337?). Hence,
we see that the points belong to a ball of radius smaller than ?4 which Theorem 3.2.1
requires for uniqueness of ?x1.1. In running the algorithm for small p (1 < p < 5),
we did not observe divergence or iterates leaving the hemisphere. For p = 10 and
for some initial conditions the iterate leaves the upper hemisphere but it returns
to it and finally converges to ?xp. Overall, this is rather remarkable since we have
used a fixed step-size of hk = 1. In [17] it is reported that no divergence has been
observed for a gradient descent algorithm with fixed step-size of 1 for p = 2 on the
unit sphere. In [61] it is shown that, in particular, in S2 and for p = 2, if ? < 310rcx,
then the gradient descent algorithm with fixed step size of hk = 1 converges to ?x2.
4.8 Appendix: Some Facts About Convex Sets in Euclidean Space
A reference for the following material is [82]. Let A be a subset of the vector space
Rn which is endowed with the standard inner product?.,.?. We say that A is convex
if for all v1,v2 ?A and 0???1 we have ?v1 + (1??)v2 ?A. Note that if the
92
previous inclusion holds for one 0 < ? < 1 it holds for all ??[0,1]. We say that the
convex set A is of dimension r if the smallest affine subspace of Rn which contains A
is of dimension r. We denote this subspace bySA. A is closed (open) if it is closed
(open) in SA. By the relative interior (relative boundary) or simply the interior
(boundary) of A we mean the interior (boundary) of A as a subset of SA. If the
dimension of A is r we can always embed A in an affine subspace of Rn of dimension
r < n. A face of a closed convex set A is a convex subset F ? A such that for
?v1,v2 ? C and a 0 < ? < 1, ?v1 + (1??)v2 ? F implies v1,v2 ? F. The only
n-dimensional face of an n-dimensional closed set A is A itself. The rest of the
faces are of lower dimensions. Any face F negationslash= A lies on the boundary of A. Any
two distinct faces of A have disjoint interiors and A can be decomposed to disjoint
unions of the interiors of its faces from dimension k = 0 to k = n. A hyperplane Px
passing through x?A is called a supporting hyperplane of A if A lies completely in
one side of Px. Alternatively, this means that a normal vector of Px, like xx, exists
such that ?y?x,nx??0 for all y ?A, or equivalently y?x makes an angle not
larger than ?2 with nx. Any convex set has a supporting hyperplane at any of its
boundary points.
A convex cone C?Rn is a convex set for which x?C implies tx?C for all
t?0. Note that the origin belongs to C. The closure of C is a convex cone again
denoted by ?C. The sum of a point on the boundary and a point in the interior of
C belongs to the interior of C. F ?C is a face of C if and only if summationtextNi=1 wivi ?F
with wi > 0 and summationtextNi=1 wi = 1 imply vi?F for 1?i?N.
93
Chapter 5
Recursive-Iterative Approach in Defining Means
5.1 Introduction
The idea of minimizing fp (see (3.2)) in order to define the center of mass or mean
for a set of points on a Riemannian manifold is obviously based on our intuition
about the properties of the standard Euclidean center of mass. As we explained in
Subsection 4.6, in the case of a manifold of positive curvature, some rather counter-
intuitive situations can happen which question our reliance on the properties of the
Euclidean mean. Nevertheless, the standard Euclidean or arithmetic mean has also
some other properties, which might be used to define other kinds of means. One
of them is the property that the Euclidean arithmetic mean of e.g., three points
is equal to the arithmetic mean of the pairwise means of the three points; and
if we repeat this process, i.e., replace the points by their pairwise means, then the
newly generated points will eventually converge to the arithmetic mean of the initial
points. One interpretation is that all the intermediate sets of points generated in
this process have the same mean, i.e., the mean remains invariant under the pairwise
mean replacement. We call this the Mean-Invariance (MI) property of the arithmetic
94
mean. In this chapter we want to define new means based on similar ideas or
their generalizations, i.e., to construct the mean of N points through recursion and
iteration based on the mean of K < N points. Here, we consider recursion based
definitions as well as pairwise definitions. One can mix the given data points and
produce convex combinations via pairwise or multi-point interactions in order to
generate the mean of the data points. In most occasions we generate a sequence of
points which lie in the closure of the convex hull and in the of minimal ball of the
data points, and eventually converge to a single point, i.e., the mean of the data
points. The analysis of these iterative schemes is very interesting and we develop
powerful tools for that. As explained in Section 4.5, the common property of all
the Lp Riemannian means for different values of p is that each of them assigns a
point in the closure of convex of hull of a given set of data points. However, the
actual form of this assignment depends on p. The recursive-iterative approach also
results in similar assignments. The process to define an iterative mean for N points
on a manifold M induces a very interesting dynamical system on MN. In all these
dynamics the diagonal of MN is an invariant set, and we would like the orbit of a
point in MN to converge to the diagonal of MN. If the data points belong to a small
enough strongly convex ball or a suitable strongly convex set this type of convergence
happens. However, if the data points are not close enough, a periodic behavior
might occur. As an example which shows this periodic behavior, we investigate a
Perimeter Shrinkage (PS) scheme whose invariant sets are equidistance points on
closed or periodic geodesics in M.
95
5.1.1 Contributions and Outline of the Chapter
The main contribution of this chapter is introduction of the notions of convex,
strictly convex and primitive vectorial mean functions (see Definitions 5.2.2 and
5.2.3), which generalize the notions of stochastic, positive stochastic and primitive
stochastic matrices to Riemannian manifolds, respectively. Theorem 5.2.2 is a more
general version of the Perron-Frobenius Theorem about the infinite power of a prim-
itive matrix and enables us to prove our existence and uniqueness theorems for a
large class of recursive-iterative means. We use the size or radius of the minimal ball
of the iterates as a (so-called) Lyapunov function to prove Theorem 5.2.2. Section
5.2 is devoted to the development of the theory of primitive convex mean functions
and their properties. After developing all required results we digress little, at the
end of the section, and consider the dynamical systems aspect of the iterative ap-
plications of a primitive vectorial mean function and relations to infinite products
of stochastic matrices in Subsections 5.2.5 and 5.2.6, respectively.
We study two subclasses of recursive-iterative means: the pairwise mean and
Mean-Invariance (MI) based means1. We introduce the notion of pairwise mean in
conjunction with the Primitive Shrinkage (PS) scheme in Section 5.3. As another
contribution in Subsections 5.3.2 and 5.3.3 we study the problem of cyclic pursuit on
a manifold and identify the equidistance points on closed geodesics as its invariant
set (see Theorem 5.3.2 for details). The results in Section 5.3 are not entirely new
by themselves. The PS scheme, in a rather different formulation, has been known
1The pairwise mean is also based on the notion of mean-invariance; however, we prefer to
distinguish the two concepts since we use different tools to analyze them.
96
to geometers interested in studying the existence of closed geodesics as the Birkhoff
Curve Shortening Scheme. In the course of this research some of the properties of
the Birkhoff Curve Shortening were re-discovered. Nevertheless, to our knowledge
the results of Theorem 5.3.2 are new in the context of cyclic pursuit schemes. In
Section 5.4 we introduce the MI means and their generalizations, and investigate
some of their properties. The main idea of building the class of means which we
call the Mean Invariance (MI) based means (see Section 5.4) was given by Ando, Li
and Mathias in [5]. They defined such means on the cone of positive (semi) definite
matrices to define geometric for such matrices (see also [77]). Later Lawson and Lim
in [59] extended this definition to manifolds of nonpositive curvature and specifically
to Hadamard manifolds. However, the tools and techniques used in [59] do not allow
for extension to the case of positive curvature manifolds 1. Despite the generalities
of our approach based on primitive mean functions, there is a special case studied
in Subsection 5.4.2.2, where we could define the MI means in domains which are not
Riemannian balls, necessarily. We do this by an extra assumption and departure
from resorting to the radius of the minimal to prove the convergence. Finally, in
Section 5.5 we give some numerical examples and provide related discussions.
1To be accurate, in order to define mean-invariance based means, Theorem 3.14 in [59] requires
the base case (see Definitions 5.4.1, 5.4.2 and 5.4.3) to have a property which the authors call
nonexpansiveness. This property is the same as what we called averaging property in Definition
4.6.1; and as we showed in Section 4.6.1 the L2 mean lacks this property in manifolds of positive
curvature (even locally). For this reason and few others, Lawson and Lim?s theory cannot be
employed to yield results as general as ours in Section 5.4.
97
5.2 Convex Mean Functions and Their Finite and Infinite
Compositions
Our ultimate goal is to build the mean of N > 2 data points based on the mean
of N ?1 or less number of points in a recursive and iterative process. To have
a unified and coherent theory it is useful to introduce the notions of convex and
strictly convex mean functions and their compositions.
Definition 5.2.1. A convex mean or averaging function of order N ?2 in M is a
function ? : MN ?M?...?M ?M that assigns to an N-tuple (x1,...,xN)?MN
a point ?(x1,...,xN) called the ?-mean of (x1,...,xN) such that
1. ?(x1,...,xN) belongs to the closure of the convex hull of{xi}Ni=1,
2. there exist a natural number K ? N and indices 1 ? i1 < ... < iK ? N
such that ?(x1,...,xN) belongs to the interior of any closed strongly con-
vex ball containing {xi}Ni=1, unless when xi1 = ... = xiK = x in which case
?(x1,...,xN) = x,
3. ?(x1,...,xN) is continuous in its arguments.
If in the above, item 2 is replaced by
2?. ?(x1,...,xN) belongs to the interior of any closed strongly convex ball con-
taining{xi}Ni=1, unless when x1 = ... = xN = x, in which case ?(x,...,x) = x,
then we call ? a strictly convex mean function of order N.
98
In many occasions we denote an element (x1,...,xN)?MN by the boldface
letter x. Then we write x = (x1,...,xN) and ?(x) will be interpreted accordingly.
Unless otherwise stated, we consider the domain of ? to be of the form
D? ={(x1,...,xN)?MN|?o?M and ????,{xi}Ni=1?B(o,?)}, (5.1)
where ?? < rcx is a small enough real number such that ? is well defined for all
x?D?. A convex mean function is permutation invariant if for every x?D? we
have ?(x?(1),...,x?(N)) = ?(x1,...,xN), where ? is any permutation of the numbers
1,...,N.
Figure 5.1 shows the difference between a convex and strictly convex mean
function. In the left panel, ? is a convex mean function of order N = 6. The set
{xi}Ni=1 lies in the closed strongly convex ball shown, and the data points x1,x2
and x3 coincide and lie on the boundary of the ball containing, while the rest of
the data points lie inside the ball. The mean lies on the boundary of the ball and
?(x1,...,x6) = x1 = x2 = x3. In the left panel, ? is a strictly convex mean function,
and for the same set of data points, the mean lies inside the ball, even though some
of the data coincide and lie on the boundary of the ball.
Example 5.2.1 (Lp Means as Convex Mean Functions). In Theorem 3.2.1, we
showed that the Lp mean of{xi}Ni=1 (with positive weights) for 1 < p??belongs
to the interior of any closed ball containing {xi}Ni=1. Moreover, in Theorem 4.5.1
we showed that the mean also belongs to the closure of the convex hull of{xi}Ni=1.
Therefore, the Lp mean is a strictly convex mean function of (x1,...,xN). In the
99
?(x)
x4
x5
x6
x4
x5 x6
x1 = x2 = x3
? is a convex mean function ? is a strictly convex mean functionn
x1 = x2 = x3 = ?(x)
Figure 5.1: Left: A convex mean function. Right: A strictly convex mean function.
case of equal weights it is also permutation invariant. The Lp(1 < p??) mean of
{xi}Ni=1, if some of the points appear with zero weight, is only a convex mean function
of (x1,...,xN). On the other hand, the L1 mean of{xi}Ni=1 is not a strictly convex
mean function of (x1,...,xN), since the L1 mean of {xi}Ni=1 does not necessarily
belong to the interior of the minimal ball of{xi}Ni=1; but, the L1 mean is a convex
mean function of (x1,...,xN).
In the definition of a strictly convex mean function, a more natural requirement
could have been to require ?(x1,...,xN) to belong to the interior of the convex hull
of{xi}Ni=1. However, this ?more natural requirement? can be restrictive and bring
about difficulties for two technical reasons. First, as explained in the previous
example and we discussed before in Section 4.5, the Lp(1 < p ??) mean does
not satisfy this more natural requirement or at least we do not know whether it
does or not. Second, our ultimate goal is to study infinite compositions of convex
mean functions. Since the shape of the convex hull of a finite set of points in a
non-constant curvature manifold M is not well understood, it would be difficult to
define, in a sound way, the notion of reduction of the size of the convex of hull of
the points in order prove convergence results. On the other hand, working with
100
balls, specifically the minimal ball, is quite straightforward. Also note that the
requirement of ?(x1,...,xN) belonging to the interior of the convex hull of{xi}Ni=1
is less general than the items 1 and 2 (or 2? ) of Definition 5.2.1 combined together.
Therefore, in summary, by having the two items 1 and 2 (or 2?) in Definition 5.2.1
we can develop a coherent theory, avoid unnecessary technicalities and consider a
large class of means.
The following proposition will give us significant notational convenience:
Proposition 5.2.1. Let ?1 be a strictly convex mean function of order N1?2 and
let N2 > N1 be another natural number. Then ?2 : MN2 ?M, the extension of ?1
to MN2, defined as
?2(x1,...,xN1,...,xN2) = ?1(x1,...,xN1) (5.2)
is a convex (but not a strictly) mean function of order N2.
Proof. The claim follows from the observation that the closure of the convex hull of
{xi}N1i=1 is a subset of the closure of the convex hull of{xi}N2i=1 and that any closed
ball that contains{xi}N2i=1 contains{xi}N1i=1, too.
5.2.1 Finite Compositions of Strictly Convex Mean Functions
We are interested in compositions of mean functions to construct higher order mean
functions from lower order ones. The simplest approach is to consider finite com-
positions of strictly convex mean function in the sense described in the following
theorem (whose proof is obvious):
101
Theorem 5.2.1. Let ?(x1,...,xN) be a strictly convex mean function. If one or
moreof the input arguments arereplaced by strictly convex functions of (x1,...,xN),
then the resultant function is still a strictly convex mean function of (x1,...,xN).
Next, we consider a more tangible construction:
Example 5.2.2 (Geodesic Averaging). Let ??(x1,x2) for 0 < ? < 1 be a second
order strictly convex mean function that assigns to x1 and x2, a point on the unique
minimizing geodesic connecting x1 to x2 at distance ?d(x1,x2) from x1. Then we
define
?3(x1,x2,x3) = ?2
3
(?1
2
(x1,x2),x3). (5.3)
It is easy to see that ?3(x1,x2,x3) is a third order strongly convex mean. Note
that in Rn, ?3(x1,x2,x3) = 13(x1 + x2 + x3) which is the Euclidean L2 average of
x1,x2 and x3 (with equal weights). However, in a general Riemannian manifold,
?3(x1,x2,x3) is not necessarily the same as the L2 mean of xi?s with equal weights.
More disturbing is that ?3(.,.,.) is not necessarily a permutation-invariant strictly
convex mean function either. We can factorize any convex combination summationtextNi=1 wixi
to a series of nested pairwise convex combinations and translate that to a series of N
nested applications of the geodesic averaging ??i(.,.)?s (0 < ?i < 1 and 1?i?N)
defined above. A similar idea has been used in [88] to define a linear combination
of N > 2 points on a manifold based on pairwise linear or more precisely geodesic
combinations without convexity constraints. One can show that ?N defined as
?N(x1,...,xN) = ??1parenleftbigx1,??2(x2,??3(...,??N?1(xN?1,xN)))parenrightbig, (5.4)
102
is a strictly convex mean function of order N.
Besides not being permutation invariant another valid criticism about the
mean functions built in this example is that in their construction very little infor-
mation about the global geometry of M has been used. As a concrete example, let
N = 3 and {xi}Ni=1 belong to the unit sphere S2 in R3. Assume that the points
are in an open hemisphere. One can imagine another manifold M embedded in R3
such {xi}3i=1 is in M, the geodesic segments connecting x2 and x3 in S2 and in M
coincide; and the geodesic segments connecting ??2(x2,x3) in S2 and M also coin-
cide. In this situation ?3 will assign, in both manifolds, the same point as the mean.
However, the corresponding triangle?x1x2x3 in S2 and M can have very different
shapes. In this sense the above mean functions do not involve much ?search? or
?sweeping? as compared to, e.g., the L2 Riemannian mean which seeks the mean
globally. On the other hand the nested pairwise mean function in (5.4) is certainly
much easier to calculate. In the sequel, we extend the pairwise mean calculations
in two directions by infinite iterative processes. In one direction, we construct a
mean which in not permutation invariant for K > 3 points, however, it includes
more search or sweeping of the manifold. In other direction, again via an infinite
iterative process, we define a large class of permutation invariant means of higher
orders based on means of lower orders. For that we need to introduce the idea of
infinite combinations of strictly convex mean functions.
103
5.2.2 Infinite Compositions of Convex Vectorial Mean Functions
Now, we introduce the notions of convex and strictly convex vectorial mean functions
from MN to MN, which can be considered as generalizations of the notions of
stochastic and positive stochastic matrices, respectively.
Definition 5.2.2. A (strictly) convex vectorial mean function of order N in M is
a function ? : MN ?MN defined as
?(x) = (?1(x),...,?N(x))?MN, (5.5)
where each ?i : MN ? M is a (strictly) convex mean function of order N. We
call ?i the ith component of ?. The composition of two or more convex vectorial
mean functions is defined the same way as the standard composition of functions
on MN. Let ?1 and ?2 be two convex vectorial mean functions of order N in M by
?3 = ?1??2 we mean
?3(x) = ?2(?1(x)) (5.6)
for all x?MN. We denote the kth power of ? by ?k.
The following important proposition is an immediate consequence of the above
definition:
Proposition 5.2.2. Let ? be a convex vectorial mean function of order N in M.
If y = ?(x), then the radius of the minimal ball of {yi}Ni=1 is not larger than the
radius of the minimal ball of {xi}Ni=1. Moreover, if ? is strictly convex, then the
104
radius of the minimal ball of {yi}Ni=1 is less than the radius of the minimal ball of
{xi}Ni=1, unless x1 = ... = xN, where both balls have radius of zero.
Therefore, applying a strictly convex vectorial mean function ? to a set of N
data points will generate new N points whose minimal ball radius is smaller than
that of the input points, unless all the points coincide. However, note that the
minimal ball of the output points is not necessarily inside the the minimal of the
input points.
Example 5.2.3. Let ?? be the geodesic averaging function with parameter ? as in
Example 5.2.2. Assume {xi}3i=1 lies in a strongly convex ball. The mean function
??12? (x1,x2,x3) = ??(x1,x2) is only a convex mean function of order 3. Now define
? as
?(x) = (??1(x1,x2),??2(x2,x3),??3(x3,x1)), (5.7)
where 0 < ?i < 1 for i = 1,2,3. Again ? is a convex but not a strictly convex
vectorial mean function of order 3. This is because if x1 = x2 and y = ?(x), then
y1 = x1 and y1 belongs to the boundary of the minimal ball of {xi}3i=1. However,
note that ?2, the second power of ?, will be strictly convex.
This example leads us to the following definition (borrowed from the theory of
positive matrices), which introduces a notion similar to the primitivity for stochastic
matrices [43, p. 519] but for convex vectorial mean functions:
Definition 5.2.3. Aconvex vectorial mean function?of order N is called primitive,
if there exists a natural number ns such that ?ns is strictly convex. We call the
smallest value for ns, the index of primitivity of ?.
105
Note that any finite composition of strictly convex vectorial mean functions
of order N is a strictly convex vectorial mean function by Theorem 5.2.1. The
following important theorem shows that the infinite power of a primitive vectorial
mean function also is (or converges to) a strictly convex vectorial mean function,
which has a special feature that all its components are the same function.
Theorem 5.2.2. Let ? be a primitive vectorial mean function of order N with
primitivity index ns?N (i.e., ?ns is a strictly convex vectorial mean function). The
sequence of convex vectorial mean functions???k??k=1 where
??k(x) = ?k(x) (5.8)
converges pointwise to a strictly convex vectorial mean function ?? whose all N
components areequal, i.e., ?? = (??,..., ??), where ??is a strictly convex meanfunction
of order N.
Proof. Let x = (x1,...,xN)?MN be a point in the domain of ?. We write
xk+1 = ??k+1(x) = ?(xk), (5.9)
with x0 = x. We use the convention xk = (xk1,...,xNk ). Denote the minimal ball
of {xki}N1=1 by B(qk,rk). Note that by the definition of strictly convex vectorial
mean function, {xki}Ni=1 will stay in ConvHull({xki}Ni=1) ? B(q0,r?0) for all k ? ns,
where r?0 < r0. Moreover, note that {xi}Ni=1 lies in a ball of radius smaller than
rcx; therefore, the minimal ball of{xi}Ni=1 exists and is unique. Also by Proposition
106
5.2.2 we have rk+1 ?rk; therefore, there exists r? ?0 such limk rk = r?. We show
that r? = 0. Since ?xk?k stays in a compact subset of MN, it has a converging
subsequence (in the product topology)?xkj?kj with a limit ?x. Now by continuity of
? we have
lim
kj
?ns(xkj) = ?ns(?x) (5.10)
or
lim
kj
xkj+ns = ?ns(?x) = ?x, (5.11)
for a point ?x?MN. Note that by continuity of the radius of the minimal ball with
respect to the data points established in Corollary 4.3.1, the radius of the minimal
ball of{?xi}Ni=1 is equal to r?. The sequence of radii of minimal balls of{xkj+nsi }Ni=1,
also has r? as its limit. On one hand, due to continuity, the minimal ball of{?xi}Ni=1
must have radius r?; on the other hand, from (5.11) and due to strict convexity
of ?ns, the radius of the minimal ball of {?xi}Ni=1 should be less than r?. This is a
contradiction and we must have r? = 0, which means that ?x1 = ... = ?xN = ?x. Note
that for a large enough index Kj, {xKji }Ni=1 will be in a small enough closed ball
around ?x. However, after that, due to convexity of ?,{xki}Ni=1 will stay in that ball
for all k?Kj. Therefore, limk xk = ?x.
Consider the assignment xmapsto? ??(x) = (??(x),..., ??(x)). Note that due to strict
convexity of ?ns, ?x must belong to the interior of any ball containing{xi}Ni=1, unless
all the data points coincide. Also, by construction, ?x belongs to the closure of the
convex hull of {xi}Ni=1. We only need to show the continuous dependence of ??(x)
on x to prove that ?? is a strictly convex mean function (see Definitions 5.2.1 and
107
5.2.2). To this end, first note that given ? > 0 we can find large enough K such
?K(x)?B(??(x), ?2)?MN. Next, note that when K, x and ? are fixed, we can find
? > 0, such that y?B(x,?)?MN implies ?k(y)?B(?k(x), ?2)?MN. Therefore,
we have that given ? > 0 there are K and ? > 0 such that if y ? B(x,?), then
?K(?y) ? B(??(x), ?2). However, because of the convexity of ?, ?k(y) will stay in
B(??(x),?), for k?K. Hence, ??(y) must be in B(??(x),?) and this shows continuity
of ?.
5.2.3 Higher Orders of Smoothness
Assuming ? is continuous, the strictly convex mean function ?? constructed in The-
orem 5.2.2 is continuous. Whether ?? can achieve higher orders of smoothness, if ?
itself has higher orders of smoothness, is an interesting question. In general, it seems
to be difficult to answer this question; the main reason is that we are dealing with
infinite compositions of functions. However, if we could find uniform (in k) bounds
on the norms of the derivatives of ?k we might be able to answer the question using
standard tools. A special, yet important case is when ? is Lipschitz with constant
L = 1 i.e., there is L = 1 such that
dMN(?(x),?(y))?LdMN(x,y), (5.12)
for x,y such that{xi}Ni=1 and{yi}Ni=1 lie in a ball B(o,??).
Theorem 5.2.3. If in addition to conditions in Theorem 5.2.2, ? is Lipschitz with
108
Lipschitz constant L = 1 (see 5.12) in its domain, then the new strictly convex mean
functions ??and ?? also are Lipschitz with constants L = 1 and L = 1?N, respectively.
Proof. Observe that the compositions of Lipschitz functions result in a Lipschitz
function whose constant is the product of the Lipschitz constants of the participating
functions. As a result, all the members in the sequence ???k?k defined by ??k = ?k,
are Lipschitz with Lipschitz constant L = 1. Therefore, ????k is an equicontinuous
sequence andby the Arzela-Ascolli theorem, there exists a subsequence???kj?kj which
converges to ?? uniformly. Hence, ?? must be Lipschitz with constant L = 1. Note
that because of the product metric on MN, this means that ?? : MN ?M, will be
Lipschitz with constant L = 1?N.
The fact that Lipschitz constant of ?k is Lk, explains why we might not have
higher order smoothness in this infinite process. For example, if ? has its gradient
bounded by 1 we couldapply the above theorem, and conclude that ??is differentiable
almost everywhere, since it is Lipschitz. However, if the gradient of ? was bounded
by L > 1, we could not have the same conclusion, at least through this path.
5.2.4 Isometry Compatibility
We mentioned about the isometry compatibility of the Lp means in Section 4.4. For
a convex mean function ? of order N, if ?(?(x1),...,?(xN)) = ?(?(x1,...,xN))
for any isometry ? of M, then ? is called isometry compatible. Similarly, we
say that a convex vectorial mean function ? of order N is isometry compatible
if ?(?(x1),...,?(xN)) = parenleftbig?(?1(x)),...,?(?N(x))parenrightbig. The following theorem shows
109
that if we start with isometry compatible means our higher order means also will
be isometry compatible.
Theorem 5.2.4. If in Theorem 5.2.2, ? is isometry compatible, then ?? also will be
isometry compatible.
Proof. The proof follows from the fact that any isometry of M is a continuous
function and that we have pointwise convergence in Theorem 5.2.2.
Therefore, all the means that we construct are isometry compatible.
5.2.5 Dynamical System Point of View
The repetitive application of ? induces an interesting dynamical system on MN.
Let us denote the diagonal of MN by
?(MN) ={(x1,...,xN)?MN|x1 = ... = xN}. (5.13)
Define a cylinder with cubical or square cross section around the diagonal as follows:
C? ={(x1,...,xN)?MN|?q = (q,...,q)??(MN) such that max
i
d(q,xi)???},
(5.14)
where ?? is small enough such that all the components of ? are well-defined. Notice
the similarity between this definition and Definition 5.1. Now for any point x?C?
the iteration
xk+1 = ?(xk) = ?k+1(x) (5.15)
110
is well-defined. More interestingly if B(qk,rk) is the minimal ball of {xki}Ni=1, then
the distance between xk and ?(MN) measured in the L? norm is minimized by
qk = (qk,...,qk) and
rk = min
q??(MN)
max
i
d(q,xki). (5.16)
Note that qk is the projection xk onto ?(MN) under the distance induced by the
L? norm, and the center of the minimal ball of{xki}Ni=1 is the preimage of qk under
the diagonal map, i.e., the map which maps M to ?(MN). Now rk decreases at
least every ns iterations and the L? distance between xk and ?(MN) decreases,
accordingly. We showed in Theorem 5.2.2 that the orbit of x, converges to the diag-
onal of MN and that the diagonal of MN is the only invariant set of this dynamical
system. The orbit will eventually meet the diagonal at ??(x), which is the image of
the mean of {xi}Ni=1 on ?(MN). Figure 5.2 helps us visualize the behavior of the
dynamical system induced by a primitive mean function. One can visualize that
the smallest cubical cylinder containing xk shrinks by every ns applications of ?.
Note that the diagonal ?(MN) also coincides with the continuum of the equilibria
of the system (5.15). Moreover, any equilibrium point of the system is stable in the
sense of Lyapunov [53], but it is not asymptotically stable since a perturbation of
x??(MN) will not eventually come back to x, necessarily.
Remark 5.2.1. In what preceded we alluded to the L?-projection interpretation
of the L? mean. More generally, the Lp mean of {xi}Ni=1 (with weight vector w)
can be considered as the projection of x ? MN onto ?(MN) under the distance
111
??(x)
MN
?(MN)
C?
x
?(x)
?2(x)
?k(x)
Figure 5.2: Repetitive application of the primitive mean function ? brings a point
x in the cylinder C? closer (in the L? sense) to ?(MN), the diagonal of MN. In
the limit the orbit of x meets the diagonal at ??(x).
function induced by the Lp norm in MN. For x,y?MN we define
dMN,p(x,y) = parenleftbig
summationdisplay
i=1
widp(xi,yi)parenrightbig1p, (5.17)
and for points which are close enough to ?(MN) define
?xp = argminy??(MN)dMN,p(x,y). (5.18)
Therefore, we have ?xp = (?xp,..., ?xp) where ?xp is the Lp mean of{xi}Ni=1 with weights
w = [w1,...,wN]T. Theorem 3.2.1 states that if x belongs to a cylinder of the form
in (5.14) whose radius is smaller than ??,p, then the Lp projection of x onto ?(MN)
112
defined in (5.18) is well-defined and yields a unique point.
5.2.6 Relation to Infinite Products of Stochastic Matrices
Theorem 5.2.2 is very similar to the famous Perron-Frobenius theorem which gives
a condition for the convergence of the powers Ak of an N?N stochastic matrix A.
More general results about the behavior of the infinite product of different matrices
are available, e.g. the result of Wolfowitz [90]. It seems that our method of using the
radius of minimal ball to study the behavior of the infinite composition of a single
convex vectorial mean function with itself can be useful in deriving similar results
for the more general case of different convex vectorial mean functions or matrices.
Since it is not of direct interest to us, we just give a simple example of such a result
here, but we believe that deeper results are possible.
Theorem 5.2.5. Let ??k?k be a sequence of convex vectorial mean functions of
order N with common domains. Assume that there exists a subsequence of ??k?k
which converges uniformly to a strictly convex vectorial mean function. Then the
infinite composition of??k?k defined by
??k = ?k??????1 = ?k???k?1 (5.19)
converges pointwise to a strictly convex vectorial mean function ??, whose all com-
ponents are equal.
Proof. Let??kj?kj be a subsequence which converges uniformly to ??. Due to com-
pactness, the subsequence ?xkj?kj must in turn have a subsequence (denoted again
113
by ?xkj?kj for convenience) which converges to a point ?x ? MN. We show that
limkj ?kj(xkj) = limkj ??kj(?x) = ??(?x). We denote the Riemannian distance function
induced by the product structure of MN by dMN(.,.). We have
dMN(?kj(xkj), ??(?x))?dMN(?kj(xkj), ??(xkj)) +dMN(??(xkj), ??(?x)). (5.20)
Now given ? > 0, due to the uniform convergence of ?kj to ??, we can find a natural
number K1(?) such that for all indices kj larger than K1(?),
dMN(?kj(x), ??(x))? ?2, (5.21)
for all x in the common domain of ?k?s. Also due to continuity of ?? and since
xkj ? ?x, there exists another natural number K2(?), such that for all indices kj
larger than K2(?) we have
dMN(??(xkj), ??(?x))? ?2. (5.22)
The above three relations mean that given ? > 0, for all indices kj larger than
max{K1(?),K2(?)}
dMN(?kj(xkj), ??(?x))? ?2. (5.23)
Therefore, limkj ?kj(xkj) = ??(?x). The rest of the proof is essentially the proof of
Theorem 5.2.2 with ns = 1, except that the above relation replaces (5.10) in that
proof.
114
Corollary 5.2.1. In Theorem 5.2.5 if all ?k?s are Lipschitz continuous with Lips-
chitz constant of L = 1, then ??k converges to ?? where ?? itself is a strictly convex
mean function with equal components; and moreover, it is Lipschitz continuous with
Lipschitz constant of L = 1.
Corollary 5.2.2. Let ??k?k be a sequence of vectorial strictly convex mean func-
tions of order N in M. If there exists a function which appears infinitely many often
in that sequence, then the result of Theorem 5.2.5 holds.
Remark 5.2.2. In [69], Moreau introduces a general theory of infinite compositions
of certain convex nonlinear operators and in particular stochastic matrices to ana-
lyze consensus algorithms with time-varying communication links or topologies. The
goal of a consensus algorithm is to bring a set of autonomous agents to agreement on
a state or quantity via local communications between the agents. The theory devel-
oped in [69] is based on the analysis of the graphs associated with the interaction of
the agents and the properties of the convex linear or nonlinear operators employed.
The convexity assumptions for the those functions mentioned in [69] are similar to
the ones we made for strictly convex vectorial mean functions in Definition 5.2.2
except that the assumptions in [69] do not include (or are not equivalent to) the
strict inclusion with respect to closed balls and only a strict inclusion with respect
to the convex hull is assumed. Hence, one cannot use the convergence results intro-
duced in [69] to define the class of means we define here. The main reason is that
the proofs in [69] rely on the inclusion properties of the convex hull of finite points
in Euclidean space which are not valid in a Riemannian manifold. Nevertheless, we
115
believe that the rest of the ideas used in [69] can be used to prove stronger versions
of Theorem 5.2.5.
5.3 Perimeter Shrinkage Scheme, Cyclic Pursuit and a Pairwise Based
Mean
In this section, we use the simplest mean function of order N = 2, i.e., the geodesic
midpoint assignment or more generally the geodesic averaging, to design mean func-
tions of higher orders. Although, it is possible to analyze this process using the
methods from the previous section, it would more interesting to use other tech-
niques.
5.3.1 Perimeter Shrinkage Scheme and a Pairwise-Iterative Mean
Let ?? be the geodesic averaging function defined in Example 5.2.2. Consider the
points {xi}Ni=1 ? M. We use a cyclic indexing: xN+1 = x1 and xN+2 = x2. Now
assume that xi and xi+1 for all 1?i?N are close enough such that ??(xi,xi+1) is
well defined i.e., is a single point. For example, if each pair of (xi,xi+1) lies in a ball
of radius less than injM2 , where injM is the injectivity radius of M, then ??(xi,xi+1)
is well-defined. In particular, if all of the points lie in a strongly convex ball things
become very simple, as we shall see. Consider the following iterative scheme:
xk+1i = ??(xki,xki+1), 1?i?N (5.24)
116
where k is the iteration step and x0i = xi. Figure (5.3) schematically shows the
first iteration of the process for N = 4 and ? = 12. We call this iterative scheme
a perimeter shrinkage scheme because of the next lemma. We define the perimeter
shrinkage map PS? : MN ?MN as
PS?(x) = (??(x1,x2),...,??(xN,x1)), (5.25)
where x = (x1 ...,xN)?MN. Here, we have implicitly assumed that the geodesic
averagings are well-defined. We denote (x1,...,xN) ?MN by x?MN and write
(5.24) compactly as
xk+1 = PS?(xk). (5.26)
If we assume the domain of PS? to be as in (5.1) with ?PS? ? rcx, then PS?
is a convex vectorial mean function of order N and it is easy to check that the
N ?1th power of PS?, is a strictly convex vectorial mean function of order N.
Therefore, we could use Theorem 5.2.2 to show the convergence of the iteration,
when initial data set{xi}Ni=1 lies in a strongly convex ball. However, since there is
another interesting mechanism involved, i.e., the perimeter shrinkage mechanism,
and we want to consider a domain different from the one mentioned above, we use
a different approach.
Another function which we need is the perimeter function P : MN ?Rdefined
as:
P(x1,...,xN) =
Nsummationdisplay
i=1
d(xi,xi+1), (5.27)
117
which is also continuous. Similarly, the energy function E : MN ?R, which is the
sum of the squares of the side lengths of the geodesic polygon x1 ...xN, defined as
E(x1,...,xN) =
Nsummationdisplay
i=1
d2(xi,xi+1) (5.28)
is continuous, too. Also recall the definition of a closed geodesic:
Definition 5.3.1. On a Riemannian manifold a closed geodesic is a loop which is
geodesic at all of its points.
Less formally, a closed geodesic is a geodesic that comes back to its initial
point but at an angle which is equal to ?. A geodesic that is only closed, i.e., it
closes the loop at an angle different from ? is called a geodesic lasso [24, pp. 255].
A closed geodesic is also called a periodic geodesic [9]. A simple closed geodesic
is a closed geodesic that does not intersect itself. Closed geodesics, their existence
and counting them have very old and deep roots in Riemannian geometry [9]. They
correspond to the periodic solutions of the geodesic flow.
The following simple lemma is very useful:
Lemma 5.3.1. Let{xki}Ni=1 and{xk+1i }Ni=1 be two consecutive sets of points gener-
ated by the PS process with parameter ?. Let Pk and Pk+1 be the perimeters of the
closed geodesic polygons xk1xk2 ...xkN and xk+11 xk+12 ...xk+1N , respectively. Also let Ek
and Ek+1 denote the sums of the squares of the side lengths of the closed geodesic
polygons xk1xk2 ...xkN and xk+11 xk+12 ...xk+1N , respectively. We have
1. Pk+1?Pk with Pk+1 = Pk if and only if the geodesic polygon xk1xk2 ...xkN is a
118
x01
x02
x12
x11
x03
x13
x04
x14
Figure 5.3: The initial geodesic polygon x01x02x03x04 is transformed to the new geodesic
polygon x11x12x13x14 by inserting the new vertices at the midpoint of each side. The
perimeter of x01x02x03x04 is strictly less than that of x11x12x13x14, unless the latter is a
closed geodesic. We assume that the initial points are such that the sides of the
initial geodesic can be determined uniquely, i.e, the shortest geodesics of interest
are unique.
closed geodesic1.
2. Ek+1 ?Ek with Ek+1 = Ek if and only if the geodesic polygon xk1 ...xkN is of
equal side lengths and it is also a closed geodesic.
Here, a point is assumed to be a closed geodesic of length zero.
Proof. The proof of the first part is essentially applying the triangle identity. Recall
that for x,y,z?M:
d(x,y)?d(x,z) +d(z,y) (5.29)
with equality if and only if z lies between x and y on the shortest geodesic connecting
1To be pedantic, we have to say that ?the geodesical polygon xk1xk2 ...xk
N can be parameterized
as a unit speed closed geodesic,? since geodesics are curves, i.e., functions defined on the real line.
119
these two points. Therefore, we have:
d(xk+1i ,xk+1i+1 )?d(xk+1i ,xki+1)+d(xki+1,xk+1i+1 ) = (1??)?d(xki,xki+1)+??d(xki+1,xki+2),
(5.30)
for 1 ? i ? N; and hence Pk+1 ? Pk. The equality only happens when we have
equalities in all the triangle inequalities employed. But that means that the consec-
utive geodesic segments xkixki+1 and xki+1xki+2 have to be geodesic at the connecting
vertex xki+1. As a result, the whole geodesic polygon xk1xk2 ...xkN has to be a closed
a geodesic. For the second part first apply the triangle inequality as before and
expand as
d2(xk+1i ,xk+1i+1 )?(d(xk+1i ,xki+1) +d(xki+1,xk+1i+1 ))2?(1??)2?d2(xki ,xki+1) +
2?(1??)d(xki+1,xk+1i+1 ))d(xki+1,xki+2) +?2d2(xki+1,xki+2),(5.31)
for 1?i?N. Now adding both sides of the above for 1?i?N yields:
Ek+1?(1??)2Ek + 2?(1??)
Nsummationdisplay
i=1
d(xki+1,xk+1i+1 ))d(xki+1,xki+2) +?2Ek. (5.32)
Now applying the Cauchy-Schwartz inequality gives
Ek+1?Ek((1??)2 + 2?(1??)+?2) = Ek, (5.33)
where the equality holds if and only if the geodesic polygon xk1 ...xkN is a closed
120
geodesic (as before) and the two vectors [d(x1,x2),...,d(xN,xN+1)]T and
[d(x2,x3),...,d(xN+1,xN+2)]T are parallel in RN (i.e., the equality condition for
the Cauchy-Schwartz inequality). It is easy to see that the latter requires that
d(xi,xi+1) = d(xi+1,xi+2) for 1?i?N. This completes the proof.
The above lemma shows that the set
? ={(y1,...,yN)?MN|geodesic polygon y1 ...yN is a closed geodesic}, (5.34)
is invariant under PS?, i.e., PS?(?)??. For the elements of ? the perimeter of
the corresponding geodesics remain invariant under the PS? map, too. However,
the energy function E in (5.28) is invariant under PS? only on ?E ??:
?E ={(y1,...,yN)??|d(y1,y2) = ... = d(yN,yN+1)}. (5.35)
Neither of these two sets is a fixed point set of PS?. One can see that y =
(y1,...,yN) ? MN is a fixed point of PS? if and only if y1 = ... = yN, i.e.,
the length of the corresponding closed geodesic is zero or equivalently y belongs to
the diagonal of MN. In order to define a mean we need to guarantee that the PS
scheme converges to a fixed point of PS?. If {xi}Ni=1 lies in a strongly convex set
this will be guaranteed. A strongly convex set cannot contain a non-trivial closed
geodesic, since if such a geodesic exists in that set, for two points on the geodesic
whose distance from each other is half of the length of the closed geodesic, there are
two minimizing geodesics connecting them in that set, which contradicts the defini-
121
tion of a strongly convex set. Also another consequence of the strong convexity is
that if{xi}Ni=1 lies strongly convex set A, so does{xki}Ni=1 for all k?1.
Theorem 5.3.1. Consider the perimeter shrinkage map PS? defined in (5.25). If
{xi}Ni=1 lies in a compact strongly convex set A?M, then the perimeter shrinkage
iteration scheme induced by PS? converges to a single point ?x in A, which also
belongsto theclosure oftheconvex hull of{xi}Ni=1. The assignment (x1,...,xN)mapsto??x
is a continuous function from A to A. Moreover, the function ?P,?(x1,...,xN) = ?x
with domain defined in (5.1) and with parameter ?? < rcx is a strictly convex mean
function of order N. For ? = 12 we call ?x the pairwise mean of (x1,...,xN)1.
Proof. Denote the convex hull of {xi}Ni=1 by C and its closure by C. Note that
C?A is a compact strongly convex set by Proposition 2.3.1; hence, it includes no
closed geodesic. Consider the sequence?xk??k=0 defined in (5.26). The sequence stays
in the compact setC= C?...?C. By compactness, there is a subsequence ?xkj?
which converges to a point ?x = (?x1,..., ?xN)?C. On the other hand, from Lemma
(5.3.1) the sequence of positive numbers?Pk?k and hance?Pkj?kj should have a limit
P?; and also P(PS?(x)) < P(x) in C unless P(x) = 0. Note that P(?x) = P? (by
continuity of P). If P?negationslash= 0 we have (by continuity of functions PS? and P):
P? = lim
k??
P(PS?((xk)) = lim
kj
P(PS?(xkj)) = P(PS??(?x)) < P(?x), (5.36)
which is a contradiction. Therefore, P? = 0 and ?x = (?x,..., ?x), where ?x ? C.
1The pairwise mean in general is not permutation invariant, so to be accurate we use n-tuple
(x1,...,xN) instead of{xi}Ni=1.
122
Now, after finite steps the sets{xkji }Ni=1 will be in an arbitrary small strongly convex
ball around ?x; and hence does the next set of points PS?(xkj1 ,...,xkjN) and so forth.
Therefore, xk ??x = (?x,..., ?x) in the product topology. The proof of the continuity
of xmapsto??x is similar to the proof of a similar fact in Theorem 5.2.1.
Nowassume{xi}Ni=1 lies inaclosed ballB(o,?). If (y1,...,yN) = PSN?1? (x1,...,xN)
then {yi}Ni=1 lies the smaller ball B(o,??) for some ?? < ?, unless xi?s coincide and
they lie on the boundary of B(o,?). Therefore, excluding this case, by the previous
part ?x belongs to the interior of B(o,?), which shows that ?P,?(x) is a strictly convex
mean function of order N.
Remark 5.3.1. It is appropriate here to mention both the strengths and limitations
of the notion of a primitive vectorial mean function and Theorem 5.2.1. Note that
if we define a more general version of the PS map PS? in (5.25) as
??(x1,...,xN) = (??1(x1,x2),...,??N(xN,x1)), (5.37)
where ? = [?1,...,?N]T, then ??(x1,...,xN) is a primitive vectorial mean function
with primitivity index of N?1, provided we consider its domain to be of the form
(5.54) with parameter rcx. Therefore, the iterative application of ??, by Theorem
5.2.1, induces a strictly convex mean function of order N. A look at the proof of
Lemma 5.3.1, shows that with ?i?s being different from each other, contrary to PS?,
?? is not a perimeter shrinkage map, necessarily. Hence, the use of the radius of
the minimal ball in Theorem 5.2.1, which is the consequence of the properties of
the primitive vectorial mean functions, enabled us to analyze more general mean
123
functions than those we could analyze by just using the perimeter function. At the
same time the use of perimeter function in the first part of Theorem 5.3.1, enabled
us to prove the convergence of the PS scheme in a strongly convex domain whose
shape is arbitrary, i.e., it is not a ball necessarily.
5.3.2 Cyclic Pursuit on Manifolds: Discrete-time Case
As soon as we make sure that a set of data points lies in a strongly convex ball
Theorem 5.3.1 assures us that the perimeter shrinkage iteration converges to a single
point. How about when the points are not in any strongly convex domain? Analysis
of the problem in this case has relations with the problem of cyclic pursuit in which
N ordered agents such as ants, frogs or moving vehicles follow each other in a cyclic
fashion i.e., agent i follows agent i+1 (modulus N) with certain rules of pursuit [14].
The simplest rule of pursuit can be that agent i moves along the shortest path to
agent i+1 either with speed proportional to its distance to agent i+1 or with unit
speed. In the discrete-time setting we assume that each agent hops to the midpoint
of the shortest geodesic connecting it to the agent it follows. Therefore, maybe the
behavior of frogs will be more suitable to this situation. Now with this formulation
the cyclic pursuit problem and the PS scheme are equivalent. It is well-known that
the cyclic pursuit in the plane or in Euclidean space results in convergence to a single
point [14]. Here, we consider the cyclic pursuit problem in a compact Riemannian
manifold.
Theorem 5.3.2. Let M be a compact and connected (hence complete) Riemannian
manifold. For the points{xi}Ni=1?M assume the PS scheme (or the cyclic pursuit
124
scheme) iteration is well-defined at each step (in particular, if d(xi,xi+1) < injM,
then this would be guaranteed). Then xk ??E and E(xk)?E??0, where ?E is
defined in (5.35). More precisely,
1. If E? = 0, then xki ?x??M for 1?i?N.
2. If E? > 0 and x? = (x?1,...,x?N) ? ?E is one of the cluster points of ?xk?k,
then the geodesic polygon x?1 ...x?N is a closed geodesic of length?NE? and
d(x?i,x?i+1) =
radicalBig
E?
N for 1?i?N.
Moreover, the set
?El ={(y1,...,yN)??E|d(xi,xi+1) = lN, 1?i?N}??E (5.38)
is an asymptotically unstable invariant set of the PS scheme unless l = 0, by which
we mean that for any point x ? ?El there is a small perturbation of x such that
the orbit of x never converges back to ?El. If the set of the lengths of the closed
geodesics of length smaller than l is finite1, then ?El is an unstable invariant set for
the PS scheme, meaning that there exists a small perturbation of x ? ?El whose
orbit will not stay close to ?El.
Proof. From the proof of Lemma 5.3.1 we know that if d(xki,xki+1) < injM for
1 ? i ? N, then d(xk+1i ,xk+1i+1 ) < injM for 1 ? i ? N. Hence, the mentioned
condition is sufficient to guarantee that the PS process remains well defined at
all steps. Now, the decreasing sequence ?Ek?k of real numbers must have a limit,
1This condition is satisfied automatically in the special orthogonal groups and the Grassman-
nian manifolds (see Remark 6.2.2).
125
denoted by E?. If E? = 0, then since after finite number of steps PS?(xk) belongs
to a small convex ball, the scheme converges to a single point by Theorem 5.3.1.
Now let E? > 0. The compactness of MN requires the sequence ?xk?k to have a
converging subsequence ?xkj?kj, with the limit x?. By continuity of the functions
involved we have:
E(x?) = E(lim
kj
xkj) = lim
kj
E(xkj) = E? (5.39)
and
E(PS?(x?)) = E(PS?(limxkj)) = lim
kj
E(PS?(xkj)) = E?. (5.40)
Therefore, x? belongs to ?E. This means that any cluster point of?xk?k belongs to
?E. This together with the compactness of MN implies that xk ??E. Claims 1
and 2 follow from the definition of ?E.
Now, note that for any x??El with l > 0, there exists a small perturbation
?x for which E(?x), and hence, E(?xk) for k?0 is smaller than E? = l2N. Therefore,
obviously ?xk cannot come back to ?El because of the continuity of xmapsto? E(x). If
there are only a finite number of possible values for the lengths of closed geodesics
shorter than l, then since E(?xk) eventually will differ from E? by a constant ? > 0,
the distance of ?xk from ?El also will eventually be larger than a constant ??. This
means that ?El(l > 0) is an unstable invariant set with respect to PS?.
This theorem says that in a cyclic pursuit on M the agents will converge to a
formation which consists of equidistance points on closed geodesics of fixed length
l. However, only ?E0, i.e., the diagonal of MN is an asymptotically stable invariant
126
set of the PS scheme. Note that we did not claim that the formation corresponds to
a single closed geodesic. Convergence to a single closed geodesic requires a deeper
study of the properties of the PS scheme such as the one in [21].
5.3.3 Cyclic Pursuit Schemes in Continuous-time
The formulation of the PS scheme described before is in discrete-time. It is easy to
formulate a continuous-time counterpart of the PS scheme: for all i at each instant
of time xi should pursue xi+1 along the minimizing geodesic connecting them with
speed equal to the distance between them. We implicitly have assumed that always
xi and xi+1 are such that the minimizing geodesic connecting them is unique. This
implies that xi pursues xi+1 in the sense that it tries to reduce its distance from
xi+1. We can write this scheme as:
?xi(t) =?12?d2(xi+1,x)vextendsinglevextendsinglex=x
i
= exp?1xi(t)xi+1(t), 1?i?N. (5.41)
The initial conditions for this set of equations are xi(0) = xi. A simple argument
shows that similar to the discrete-time PS scheme, the continuous-time PS scheme
also remains well-defined for all t > 0, if we start with initial conditions such that
d(xi(0),xi+1(0)) < injM for 1?i?N. Now consider the energy E defined in (5.28)
as a Lyapunov function for the dynamical system (5.41) defined on MN.
d
dtE(x1(t),...,xN(t)) =
Nsummationdisplay
i=1
??xi,?d2(x,xi+1)vextendsinglevextendsinglex=x
i
?xi(t)+??xi+1,?d2(x,xi)vextendsinglevextendsinglex=x
i+1
?xi+1(t)
(5.42)
127
or equivalently
d
dtE(x1(t),...,xN(t)) = ?2
Nsummationdisplay
i=1
parenleftbigg
?exp?1xi(t) xi+1(t),exp?1xi(t) xi+1(t)?xi(t) +
?exp?1xi+1(t) xi+2(t),exp?1xi+1(t) xi(t)?xi+1(t)
parenrightbigg
. (5.43)
Now algebraic simplification and noting that bardblexp?1xi xi+1bardblxi = bardblexp?1xi+1 xibardblxi+1 =
d(xi,xi+1) yield
d
dtE(x1(t),...,xN(t)) =?2
Nsummationdisplay
i=1
bardblexp?1xi(t) xi?1(t) + exp?1xi(t) xi+1(t)bardbl2xi(t), (5.44)
wherebardbl.bardblx is the norm induced by the Riemannian structure of M at x?M. We
conclude that along the trajectories of (5.41) we have ddtE(x1(t),...,xN(t)) = 0 if
and only if x = (x1,...,xN)??E. In the language of LaSalle?s invariance principle
[53, 42], ?E is the largest invariant set of the system in (5.41) at which the total
derivative of E is zero. Based on this observation we could state a theorem similar
to Theorem 5.3.2 for the continuous-time case; however, for the sake of space we
avoid that.
We should mention that since the closed geodesic formation is not a stable for-
mation (see Theorem 5.3.2) one might think of stabilizing this formationby requiring
the autonomous agents to apply some sort of control or feedback in a decentralized
fashion. It might be possible that second order dynamics for the agents as intro-
duced in [49] for the planar case is helpful in this case too. Also we add that a
similar control problem for stabilization of the formation of agents on the sphere
128
S2 embedded in R3 has been proposed in [73], with the caveat that the control
described is not found intrinsically. By this we mean that the control depends on
quantities which are measured and defined in the ambient space (i.e., R3). We also
refer the reader to [79, 80, 81] for related works.
5.4 Means Based on Mean-Invariance
In this section we introduce a simple scheme to construct higher order permutation
invariant strictly convex mean functions from simple geodesic midpoint assignment.
This idea was first introduced in [5] for defining a geometrical mean for positive
definite matrices. Here, we extend it to more general manifolds and also associate
it with the concept of mean-invariance. First, we give the definition of the simpler
equal weight Mean-Invariance (MI) based mean. We describe the procedure in the
following definition, and in Section 5.4.2 we prove existence and uniqueness and
some other properties of the derived means.
Definition 5.4.1. Let (M,d) be a complete Riemannian manifold and ?1
2
(.,.) de-
note the geodesic midpoint assignment map (see Example 5.2.2). Define the Nth
order Mean-Invariance (MI) based mean function ?MI,N recursively as:
1. If N = 2, then set ?MI,2(.,.) = ?1
2
(.,.),
2. Otherwise, in order to find ?MI,N(x) for any x?MN set x01 = x1,...,x0N = xN
and perform the following iteration:
xk+1i = ?MI,N?1(x?N+1?ik ), 1?i?N, (5.45)
129
where for the N-dimensional vector x ? MN, x?j ? MN?1 is an N ?1-
dimensional vector built from x by dropping the jth component of x out. We
write the above iteration more compactly as:
xk+1 = parenleftbig?MI,N?1(x?Nk ),...,?MI,N?1(x?1k )parenrightbig (5.46)
for k = 1,2,... or even more compactly we write
xk+1 = ?MI,N(xk), (5.47)
where the components of?MI,N(?) are defined in (5.46). Assuming all the steps
of this construction are well-defined and if the sequence xk = (xk1,...,xkN)?
MN converges to a point ?x = (?x,...,?x), then we call ?x the Nth order MI
mean of{xi}Ni=1 and set ?MI,N(x) = ?x.
Assuming ?MI,N(x1,...,xN) is well-defined, we observe that by construction
(i.e., the iteration in (5.45)) all the generated intermediate sets of points {xki}Ni=1
have ?x as their Nth order MI mean. Hence, the name ?mean-invariance? is justified.
In fact, the main idea in defining the Nth order MI mean, is nothing but replacing
each point by the N?1th order mean of N?1 points including the point of interest
itself and repeating this process. One canshow, using the Perron-Frobenius theorem,
that this process in Euclidean space produces the standard Euclidean mean with
equal weights. Note that ?MI,2(.,.) = ?1
2
(.,.) is permutation invariant. Now, assume
?MI,N?1 is permutation invariant. Since all N combinations of size N?1 from the
130
set {xi}Ni=1 are included in defining ?MI,N (see (5.46)), by induction we conclude
that ?MI,N must be permutation invariant.
5.4.1 Weighted Mean-invariance Means
Here, we propose two methods to incorporate weights in the defining the MI mean.
In the first method, given the set{xi}Ni=1 we can increase the number of points by
introducing new points that are repetition of the original points. For example, for
{x1,x2}, ?MI,2(x1,x2) is the midpoint of the minimizing geodesic connecting them
x1 and x2, while if we set x3?x1 then ?MI,3(x1,x2,x3) is a point on the geodesic at
distance third of 13d(x1,x2) from x1. More generally, we can introduce Ni repetitions
of xi for all i, and resemble rational weights of wi = Nisummationtext
i Ni
. We can approximate
any weight by this process; however, obviously it has the drawback of dramatically
increasing the number of total points to Nnew = summationtextNi=1 Ni.
The second idea is to include the weights in the geodesic averaging step ex-
plicitly starting with N = 2. As an example, let N = 3 and weights w1,w2 and w3
be given. We interpret these weights as global contributions of xi?s in the mean.
However, for the pairwise mean of x1 and x2 we require the weight ?1 = w1w1+w2 for
x1 and 1??1 for w2. Similarly we set ?i = wiwi+wi+1 with modulus 3 cyclic indexing.
Example 5.4.1. Let us consider an example with N = 3 points in R. We have
three weights {wi}3i=1 from which we determine the ?i as ?i = wiw
i+wi+1
(i = 1,2,3).
Denote xk = [xk1,xk2,xk3]T ?R3. Then the iteration process is obtained from (5.45)
by replacing the midpoint mean function ?MI,2 = ?1
2
that gives xk+11 ?s by ??1(.,.),
131
i.e., we set xk+11 = ??1(xk1,xk2) and so forth for i = 2,3. Therefore, we have
xk+1 = Axk, (5.48)
where
A =
?
??
??
??
?
?1 1??1 0
0 ?2 1??2
1??3 0 ?3
?
??
??
??
?
. (5.49)
Now, from the Perron-Frobenius theorem we know that Ak converges to a rank
one matrix A? whose all rows are the same [43, p.524]. This common vector is the
eigenvector corresponding to the largest eigenvalue of AT which is 1. The eigenvector
should be normalized so that the sum of its entries are 1. It is easy to calculate this
eigenvector as ?w = [?w1, ?w2, ?w3]T where
?wi = (1??i)
?1
summationtext3
j=1(1??j)?1
, i = 1,2,3. (5.50)
More concretely, if w = [w1,w2,w3]T = [13, 13, 13]T, then ? = [?1,?2,?3]T = [12, 12, 12]T
and ?w = [13, 13, 13]T, hence ?3MI(x1,x2,x3) = 13(x1 + x2 + x3). However, if w =
[14, 18, 58]T, then ? = [23, 16, 12]T and ?w = [3077, 1277, 3577]T, hence the explicitly weighted
MI mean of{xi}3i=1 corresponding to w is ?WMI,3(x1,x2,x3) = 3077x1 + 1277x2 + 3577x3.
As it can be seen in this simple example the relation between initial weights wi?s and
the actual weights wi?s is a nonlinear relation. However, our intuition that putting
a large w1 for x1 will result in large ?w1 is correct. More accurately, we can see that
132
if w1 > w2 > w3 then ?1 > ?2 > ?3, and hence ?w1 > ?w2 > ?w3.
Next, we give a formal definition for the explicitly Weighted MI (WMI) mean
based on the second idea mentioned above, since it is more practical.
Definition 5.4.2. Let (M,d) be a complete Riemannian manifold and ??(.,.) be the
geodesic averaging function. Define the Nth order weighted Mean-Invariance based
mean ?WMI,wN(.) corresponding to the positive weight vector wN = [w1,...,wN]T
recursively as:
1. If N = 2 then ?WMI,w2(.,.) = ?w1(.,.) and w2 = [w1,w2]T is the weight vector.
2. Otherwise, to find ?WMI,wN(x) where x = (x1,...,xN), set x0 = xand perform
the following iteration:
xk+1 = parenleftbig?WMI,w1N?1(x?Nk ),...,?WMI,wN
N?1
(x?1k )parenrightbig. (5.51)
In the above the corresponding weight vector wiN?1 is found as follows:
wiN?1 = w
?N+1?i
N
1TN?1w?N+1?iN i = 1,...,N + 1. (5.52)
Here 1N?1 is an N?1-dimensional vector all whose elements are 1. We write
(5.51) more compactly as
xk+1 = ?WMI,WN(xk), (5.53)
where WN is the N ?N matrix of weights whose ith row is equal to the
133
transpose of (wiN)?N+1?i. Here, for an N?1-dimensional vector x, by x?i we
mean the N-dimensional vector whose ith component is zero and the rest of
it is filled in an increasing order of indices with the elements of x also in a
increasing order of their indices (cf. x?i in Definition 5.4.1). If the sequence
of points?xk?k in MN converges to a point ?x = (?x,..., ?x), then we call ?x the
Nth order WMI mean of{xi}Ni=1 with weight vector wN = [w1,...,wN]T and
set ?WMI,wN(x1,...,xN) = ?x.
In the case of equal weights the WMI mean reduces to the MI mean. Also
note that the matrix WN built in the definition, is a stochastic matrix which has
exactly one zero at each row, and each column of it also has exactly one zero. This
suggests that the convex vectorial mean function?WMI,wN must be a primitive mean
function of index ns = 2, as we will see next.
5.4.2 Existence, Uniqueness and Other Properties
Next, we prove existence and uniqueness properties of the MI and WMI means. For
this we use Theorem 5.2.2 about convex vectorial mean functions.
Theorem 5.4.1. Define
D ={(x1,...,xN)?MN|?o?M and ? < rcx,{xi}Ni=1?B(o,?)}. (5.54)
For any integer N ? 2 and positive weight vector wN, the functions ?MI,N and
?WMI,WN defined in Definitions 5.4.1 and 5.4.2 are primitive mean functions with
primitivity index of ns = 2 on the domain D defined above. Therefore, if{xi}Ni=1 lies
134
in a ball B(o,?) with ? < rcx, then the MI mean and the WMI mean with weight
vector wN exist, are unique, belong to B(o,?) and to the closure of the convex hull
of{xi}Ni=1. Moreover, ?MI,N(x) and ?WMI,wN(x) are strictly convex mean functions
of order N with domain D as above.
Proof. The proof is by induction over N. Obviously, all the claims hold for N = 2.
Now, we assume they holds for N ?2, and we show they hold for N + 1. We only
need to show that ?MI,N+1 and ?WMI,WN+1 are primitive mean functions. The rest
of the claims follow from Theorem 5.2.2, immediately. First, note that ?MI,N+1 is
a convex vectorial mean function of order N + 1. Its convexity follows from the
fact that each of its components, i.e., xmapsto??MI,N(x?N+2?i) is convex by Proposition
5.2.1. Now suppose {xi}N+1i=1 ?B(o,?) and y = ?MI,N+1(x). If of one yi?s, say y1,
lies on the boundary of B(o,?), then by strict convexity of ?MI,N, x1 = ... = xN
and they must lie on the boundary of the ball. Assume xN+1 does not lie on the
boundary of the ball - otherwise we have nothing to prove-, then again by the strict
convexity of ?MI,N, yi (2?i?N +1) lies inside the ball. Now since only one point
among{yi}N+1i=1 lies on the boundary of the ball, for the new set of points generated
as z = ?MI,N+1(y) we see that all zi?s must belong to the interior of the ball. This
means that ?MI,N+1 is primitive with primitivity index ns = 2. The same argument
also works for ?WMI,WN+1.
This result, in particular gives a new proof for the existence, uniqueness and
continuity results derived in [5, 59].
135
5.4.2.1 A Lipschitz Property in Manifolds of Nonpositive Curvature
In the construction of the MI mean from order N?1 to N, if each mean function of
order N?1 has Lipschitz constant L, then the vectorial mean function constructed
has Lipschitz constant L? = ?N?1L. In order to guarantee the resultant mean
function of order N to be Lipschitz, according to Theorem 5.2.3 we must have L = 1.
Therefore, we need to have L? = 1?N?1. This only can happen if M is of nonpositive
curvature.
Theorem 5.4.2. In a manifold of non-positive curvature the MI mean function
?MI : MN ?M of order N is Lipschitz with constant L = 1?N.
Proof. For the L2 mean of N = 2 points with equal weights or the geodesic midpoint
assignment we found the Lipschitz constant in Proposition 4.6.2 to be L = 1?2.
Therefore, ?MI,3 : M3 ?M3 defined according to (5.47) is Lipschitz with constant
L = 1; hence, by Theorem 5.2.3 ?MI,3 will be Lipschitz with constant L = 1?3. Now
a simple induction proves the claim.
It seems that, unfortunately, opposite to the Riemannian L2 mean no form
of Lipschitz property exists for the MI mean in the case of a manifold of positive
curvature. From Proposition 4.6.2, recall that in a manifold of positive curvature
(or more precisely in a manifold of nonnegative curvature where the curvature is not
identically zero) the Riemannian L2 mean, and in particular the geodesic midpoint
assignment, are Lipschitz but with a Lipschitz constant larger than in the Euclidean
case. This results in a Lipschitz constant larger than 1 for the vectorial mean
136
function which is used to construct the MI of order N = 3. Consequently, our
iterative process, as explained in Section 5.2.3, annihilates Lipschitz property by
multiplication of the constants larger than 1.
5.4.2.2 An Existence and Uniqueness Result for Non-ball Domains
In Theorem 5.4.1, in order to have a valid MI or WMI mean we required {xi}Ni=1
to belong a to strongly convex ball of radius smaller than rcx. This in turn, was
needed because of the requirements of Theorem 5.2.2 and our requirements about
convex mean functions. As we shall see in Chapter 6 this assumption is restrictive
for our applications in homogenous spaces such as the orthogonal group and the
Grassmannian manifold, where we would like to have the same result but for points
which belong to a special non-ball strongly convex set (see Sections 6.4 and 6.6 for
more details). It is possible to prove existence and uniqueness of the MI and WMI
means in a less restrictive situation as follows:
Theorem 5.4.3. Let (M,d) be a complete Riemannian manifold. Let A?M be
a strongly convex open set with o?A such that xmapsto?d2(x,o) is a strictly convex
function in A. Also assume that there exists another strongly convex set A? such
that ?A ? A?.1 If {xi}Ni=1 ? A, then the MI and WMI means of {xi}Ni=1 exist, are
unique, depend continuously on xi?s, and belong to the closure of the convex hull of
{xi}Ni=1. Moreover, we have
d(o,?MI(x1,...,xN))?max
i
d(o,xi), (5.55)
1This is a technical assumption which is automatically satisfied when A has a specific shape
such as a Riemannian or a Finsler ball as we shall see in Section 6.6.
137
with equality if and only if xi?s coincide. The same holds for ?WMI.
Proof. The proof is based on induction over the number of points N. The assump-
tion about A? guarantees that the closure of the convex hull of{xi}Ni=1 is a strongly
convex set (see Proposition 2.3.1); and the claim that the mean, if it exits and is
unique, belongs to the closure of the convex hull of{xi}Ni=1 is the direct consequence
of the construction and closed-ness of the closure of the convex hull of{xi}Ni=1. Also
note that in case the MI or WMI mean of {xi}Ni=1 ?A exists and is unique, after
finite number of iterations the points xki will belong to a small convex ball, hence
the continuity of the mean with respect to the initial points follows from Theorem
5.4.1. Hence, we only need to prove the existence and uniqueness of the mean as
well as (5.55). As before we only consider the MI scheme. For N = 2 due to strong
convexity of A and strict convexity of d2(o,x), ??(x1,x2) is unique, belongs to the
geodesic connecting x1 and x2, d2(o,??(x1,x2))?maxi d2(o,xi) with equality if and
only if x1 = x2. Now assume the claim holds for N?1 points. Define
hk = max
i
d2(o,xki) (5.56)
Denote by IN?1j the subset of indices {1,...,N} which is formed by indices that
contribute in determining xk+1j from xki?s. Note that if max1?i?N d2(o,xk+1i ) happens
at i = j then
max
1?i?N
d2(o,xk+1i ) = d2(o,xk+1j )? max
i?IN?1j
d2(o,xki)? max
1?i?N
d2(o,xki). (5.57)
138
By the induction assumption, equality happens in the first inequality above if and
only if all the N ?1 xkis contributing in determining xk+1j for i ? IN?1j coincide.
The equality in the second inequality above holds if and only if max1?i?N d2(o,xki)
happens at some i = l ? IN?1j . Therefore, hk+1 ? hk with equality if and only
if at least N?1 of xki?s coincide and these N ?1 points are not closer to o than
the other xki. Assume this situation has happened, but not all the points coincide,
say xk1 = ... = xkN?1 but xkN?1 is different from them with d2(o,xkN) ? d2(o,xk1).
Then d2(o,xk+11 ) = d2(o,xk1) but d2(o,xk+1i ) < d2(o,xk1) for 1 < i ? N since in
determining each of these xk+1i ?s not all of contributing xki?s coincide. Therefore,
the farthest xk+1i (1?i?N) from o, which is xk+11 cannot coincide with any other
points at this step. As a result at step k + 2 we have hk+2 < hk+1 = d2(o,xk+11 ),
unless all xki?s would have coincided. Hence, in any situation we have hk+2 ? hk
with equality if and only if all xki?s coincide. Therefore, noting the continuity of
(y1,...,yN)mapsto?maxi d2(o,yi) and the continuity of the means of order N?1, it easy
to see that the subsequence ?x2k?1?k must have a converging subsequence which
converges to ?x = (?x,..., ?x) ? MN. This implies that after some finite steps k0,
{x2k0?1i }Ni=1 lies in a small enough convex ball. Therefore, from Theorem 5.4.1 this
means that all sequences?xki?k converge to a common limit which coincides with ?x.
Also it is obvious that limk hk = d2(o,?x)?h1 = maxi d2(o,xi) with equality if and
only if all xi?s coincide and this proves (5.55).
139
5.4.3 Generalized Weighted Mean-Invariance Means Resembling Lp
Means
The MI or WMI means introduced before resemble the L2 mean, as evidenced by the
case of Euclidean space where the MI mean and the equal weight L2 mean coincide.
Note that by resemblance we do not necessarily mean closeness, rather we mean
similar behavior with respect to outliers. Although, note that if the data points are
close enough, then the equal weight L2 mean and the MI mean in M will be close
(at least up to the first order of approximation). It is possible to define recursive-
iterative means which resemble other Lp means, provided we start with a base case
different from K = 2. Note that the equal weight Lp mean of two close points
coincides with the midpoint of the unique minimizing geodesic connecting them, for
all p > 1. However, that is not the case for K ?3 points. The following definition
generalizes Definitions 5.4.1 and 5.4.2 by choosing a base case at which the Lp mean
of K ? 3 points is found. We call such a mean the Generalized Weighted Mean
Invariance (GWMI) based mean of type p. The interesting point is that the same
method used in Theorem 5.4.1 can be used to prove the existence and uniqueness
of the GWMI mean, as we see in Theorem 5.4.4.
Definition 5.4.3. Let (M,d) be a complete Riemannian manifold. Assume ?p,wN :
MN ? M is the Lp(1 < p ??) mean function with the positive weight vector
wN, i.e., ?p,wN(x1,...,xN) is the Lp mean of the set{xi}Ni=1 with the corresponding
weight vector wN, provided the mean exists uniquely. Let K ?3 and 1 < p??
be given. We define the Nth order Generalized Weighted Mean-Invariance Mean
140
(GWMI) of type p or resembling the Lp mean as follows:
1. If N = K then ?GWMI,p,wN(.) = ?p,wK(.),
2. Otherwise, to find ?GWMI,p,wN(x) where x = (x1,...,xN), set x0 = x and
perform the following iteration:
xk+1 = parenleftbig?GWMI,p,w1N?1(x?Nk ),...,?GWMI,p,wN
N?1
(x?1k )parenrightbig. (5.58)
Inthe above thecorresponding weight vector wiN?1 isfound fromwN according
to (5.52) in Definition 5.4.2. We write (5.58) more compactly as
xk+1 = ?GWMI,p,WN(xk), (5.59)
where WN is defined the same as in Definition 5.4.3. If the sequence of points
xk ?MN converges to a point ?x = (?x,...,?x)?MN, then we call ?x the Nth
order GWMI mean of{xi}Ni=1 of type p with weight vector wN = [w1,...,wN]T
and set ?GWMI,p,wN(x1,...,xN) = ?x. If the weights are equal we call the mean
Generalized Mean Invariance (GMI) based mean.
We have the following existence and uniqueness theorem:
Theorem 5.4.4. For 1 < p??define
Dp ={(x1,...,xN)?MN|?o?M and ? < ??,p,{xi}Ni=1?B(o,?)}, (5.60)
where ??,p is defined in (3.5). Given any integer N ?K and positive weight vector
141
wN, the function ?GWMI,p,wN defined in Definition 5.4.3 is a primitive vectorial
mean function with primitivity index of ns = 2 on the domain Dp defined above.
Therefore, if {xi}Ni=1 lies in a ball B(o,?) with ? < ??,p, then the GWMI mean
with weight vector wN exists, is unique, belongs to B(o,?) and to the closure of the
convex hull of{xi}Ni=1. Moreover, ?GWMI,p,wN(x) is a strictly convex mean function
of order N with domain Dp as above.
Proof. The proof is the same as the proof of Theorem 5.4.1, except that the base
case of the induction proof is N = K instead of N = 2.
Note that even in Euclidean space the GWMI mean of type pnegationslash= 2 and the Lp
mean are highly nonlinear functions of the data points and there might not be any
direct relation type between them.
5.5 Discussion and Examples
Besides theoretical interest, our motivation in studying the recursive-iterative means
and especially the pairwise and WMI mean is possible computational benefits over
e.g., the gradient descent method for finding the L2 mean. Recall that finding the
Lp means in a general manifold requires knowing or using the exponential map of
the manifold, which can be very difficult or computationally intensive. For certain
homogenous manifolds such as the unit sphere Sn, the special orthogonal group
SO(n) and the Grassmannian manifold Gk(n), if we use a suitable embedding of the
manifold in a Euclidean space, then the midpoint of a geodesic between two points
can be found without explicit use of the corresponding exponential maps. This fact,
142
which we shall explore further in Section 6.7, can be useful in computing pairwise,
MI or WMI means in the mentioned manifolds. On the other hand, the MI and WMI
means suffer from a curse of dimensionality in the sense that the computational load
to calculate them increases exponentially with N. To see this, note that if CN is the
computation load for finding the MI mean of order N, then from 5.45 in Definition
5.4.1 we have CN = RNCN?1, where R is the number of iterations we perform till
achieve convergence. We assume that R is fixed. Clearly, the relation between CN
and CN?1 will result in a prohibitive exponential dependence of CN on N. Note that
each of the N sets of iterations performed in order to find the MI and WMI means of
order N can be implemented independently or in parallel which can result in faster
implementation. Still this parallel implementation can become expensive for large
N. Therefore, the practical use of MI and WMI or GWMI means are limited to the
situation where the number of data points is not large. In contrast, the pairwise
mean, which unfortunately is not permutation invariant (unless N = 3), is much
cheaper to calculate.
5.5.1 Perimeter Shrinkage and Pairwise Mean: An Example
In our first example we apply the PS scheme to the same set of points as in the
example studied in Section 4.7.3. Let{xi}4i=1 be as in that example:
x1 =
bracketleftbigg
00
1
bracketrightbigg
,x2 =
bracketleftbigg
0.58780
0.8090
bracketrightbigg
,x3 =
bracketleftbigg
0.50
0.7071
bracketrightbigg
,x4 =
bracketleftbigg
00.9511
0.3090
bracketrightbigg
. (5.61)
143
An important observation which we mentioned in the beginning of the section is that
implementation of the PS scheme in this case is quite easy and does not require any
complicated calculation. To see this note that the midpoint of the geodesic be-
tween two non-antipodal points x1,x2?Sn where Sn is considered as an embedded
submanifold of Rn is simply
?1
2
(x1,x2) = x1 +x2bardblx
1 +x2bardbl
, (5.62)
wherebardbl.bardblis the standard Euclidean norm of Rn+1. This is due to the symmetry of
Sn and the symmetry of the embedding.
Now, we apply the PS scheme to the set {xi}Ni=1. The points are fed to the
PS scheme according to their original index order. Figure 5.4 shows the perimeter
of the geodesic polygon xk1xk2xk3xk4 in terms of index iteration k. The bottom graph
show logPk in terms of k. This latter graph suggests that the convergence of the
polygon or the the points to the pairwise mean is a linear convergence, in the sense
that Pk+1 ? hPk where h < 1 is a constant. This should not be surprising since,
after a while that the points become close enough, the behavior of the PS scheme
in S3 or in any manifold will become similar to its counterpart?s in Euclidean space,
which has this linear behavior.
Also we try to feed the PS scheme with the data points in different orders.
Note that we have three distinct order possibilities to feed the PS scheme. They
are 1234, 1324 and 1342. The rest of the possibilities are equivalent to one of the
three mentioned. Associated to each of these three orders we have three different
144
0 10 20 30 40 500
1
2
3
4
P k
k
Perimeter of polgon x1kx2kx3kx4k in terms of number of iterations
0 10 20 30 40 5010
?10
100
1010
Logarithm of the perimeter of polgon x1kx2kx3kx4k in terms of number of iterations
log P
k
k
Figure 5.4: Perimeter of the polygon xk1xk2xk3xk4 in example in Section 5.5.1 under
the PS iteration. The bottom graph shows that convergence of the polygon to a
point is a linear convergence.
pairwise means ?1234, ?1324 and?1342. It would beinteresting to comparethe distance
between these three means1. We also add two other means to this comparison. First,
is the equal weight L2 mean of{xi}4i=1 denoted by ?L2, which is found by the gradient
descent algorithm. Second, is the so-called extrinsic mean ?ex which is the projection
of the Euclidean mean of{xi}4i=1 as a subset of R3 onto S2 and is quite easy to find:
?ex =
summationtextN
i=1 xi
bardblsummationtextNi=1 xibardbl, (5.63)
wheneversummationtextNi=1 xi isnot zero. Table5.1 shows theangular distances between all these
means in Degrees. As one can see all the distances are very small. Nevertheless, on
1A valid question which we have not addressed is: what is the source of this permutation
dependence? It seems that curvature is one factor. Also another factor is the number of data
points. By knowing and understanding these factors we might be able to devise means which are
simple to compute and yet their permutation dependence is under control.
145
d(.,.) ?1234 ?1324 ?1342 ?L2 ?ex
?1234 0 0.2846 0.9622 0.4031 1.0373
?1324 0.2846 0 0.8049 0.2654 1.0720
?1342 0.9622 0.8049 0 0.5659 0.6905
?L2 0.4031 0.2654 0.5659 0 0.8251
?ex 1.0373 1.0720 0.6905 0.8251 0
Table 5.1: The angular distances in Degrees between different means for the set
{xi}4i=1 in example of Section 5.5.1.
average the more sophisticated means ?1234, ?1324, ?1342 and ?L2 are closer to each
other than to ?ex, in the sense that the average of the distances between the means
in the group is less than the average of the distances between ?ex and the means in
the group.
5.5.2 Cyclic Pursuit on the Sphere
Imagine a group ofN frogs dispersed uniformly on a globe. If each frog starts chasing
its closest group member, what would happen after a while? Since the situation is
independent of labels, we assume frog i starts following frog i+1 which is its closest
group member. Obviously, by xN+1 we mean x1. This is exactly the formulation we
introduced in Section 5.3.2. We consider the discrete-time case, and assume at each
unit of time frog i hops to the midpoint of the minimal geodesic between itself and
from i + 1. Theorem 5.3.2 tells us that the frogs will either all meet at a point or
will converge to circular formations on great circles. We know that if they start in a
convex ball, they will converge to a point; however, if they are not initially located
in a strongly convex ball either of the two endings can happen. Note that by our
146
assumption of uniform placement on the globe, if N is large enough the chance for
them of not being in a strongly convex ball is low. It seems to be difficult to give
a condition to guarantee whether a given initial configuration of the group which
is dispersed on entire globe will converge to a point or to great circles. However,
as Theorem 5.3.2 suggests a great circle in not a stable formation. We can observe
this fact in a numerical simulation. We assume there are N = 200 frogs in cyclic
pursuit. Figure 5.5 shows the evolution of their positions and Pk the perimeter of
the closed geodesic polygon xk1 ...xkN in terms of the discrete-time index k. As it
can be seen, after initial approach to a great circle, due to finite precision errors in
computer simulations, Pk (k?18583) becomes smaller than 2? and converges to 0,
eventually. The conclusion is that unless the frogs commit no errors, they will meet
each other at a point. Whether they can avoid this event depends on whether they
can employ smarter rules to control their formation!
5.5.3 An Example of GMI Means on the Sphere
In this example, we find the GMI means of types p = 1.1 and p = 3, and the MI
mean for the data points{xi}Ni=1 in the example of Subsection 4.7.3. The base case
for the GMI means is K = 3. The corresponding Lp means are found by the gradient
descent algorithm. Figure 5.6 shows the resulted means. For comparison we have
shown the L1.1, L3 and L2 means of the same data set. While it seems very difficult
to give any useful quantitative measure of closeness between the Lp mean and its
GMI version, this example shows that they are close. However, as mentioned before,
the more important issue is their behavior with respect to the outliers. We see that
147
?1 0
1
?10
1
?1
0
1
k=0,Pk=13.06pi
?1 0
1
?10
1
?1
0
1
k=10,Pk=8.46pi
?1 0
1
?10
1
?1
0
1
k=100,Pk=5.36pi
?1 0
1
?10
1
?1
0
1
k=1000,Pk=3.10pi
?1 0
1
?10
1?1
0
1
k=10000,Pk=2.0014pi
?1 0 1?10
1?1
0
1
k=18583,Pk=2.0000pi
?0.7?0.65
?0.6
00.1
0.2
?0.8
?0.75
?0.7
k=100000,Pk=0.05pi
1 2 3 4 5 6 7 8 9 10
x 104
0
10
20
P k
k
Perimeter Pk in terms of k in a cyclic pursuit on a sphere
Figure 5.5: Top pictures show the evolution of the positions of a group of N = 200
frogs on a globe in discrete-time cyclic pursuit in terms of time k. Convergence is
slow and after k = 18584 iterations they reach to a formation which is the closest
to a great circle in perimeter. However, since this configuration is not stable, due
to finite precision errors in calculations they keep getting closer to each other, and
after k = 100,000 iteration the perimeter will be 0.05?. The bottom graph shows
the perimeter Pk in terms of k, which after initial ?convergence? to 2?, due to finite
precision tends towards zero.
148
the GMI means of types p = 1.1 and p = 3 have different behaviors and resemble
their Lp versions.
In the simulations for this example we observed two difficulties in computing
the GMI means:
1. The first difficulty arises when the scheme is close to convergence, in this
situation we need to find the Lp mean of data points which are very close to
each other and this can be numerically difficult. Two reasons for this can be
found. The first reason is that in such a case the formulas (4.32) and (4.31)
for finding the exponential map and its inverse at a point which is close to the
data points involves divisions by small numbers and also finding the arccos of
small angles. The second reason is that one needs to use very small stepsize or
adaptive stepsize in the gradient algorithm to make sure that the new updates
are not out of the convex hull of the data points (which is a very small).
2. The second difficulty also is related to finding the Lp mean for values of p close
to 1 or very large. We already mentioned the difficulty with large valued of
p in Subsection 4.7.3. A look at (3.6), the formula for?fp, shows that if the
iterate of the gradient descent is close to one of the data points, then evaluation
of ?fp requires again division by a small number. This adds some error in
finding the Lp mean; however, since in the iterative process we find many Lp
means the errors can accumulate. Again, the use of very small stepsize helps
in maintaining the error in acceptable levels.
149
?1
?0.5
0
0.5
1
?1 ?0.8
?0.6 ?0.4
?0.2 0
0.2 0.4
0.6 0.8
1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
 
x4
x3
x1
x2
 
data
GMI3
L3
GMI1.1
L1.1
GMI2
L2
Figure 5.6: GMI mean of types p = 1.1 and p = 3 together with the MI mean for
four points{xi}4i=1 on the unit sphere are shown. For comparison the corresponding
Lp means are also shown. The base case for the GMI means is K = 3.
150
Chapter 6
Results Specific to the Orthogonal Group and the
Grassmannian
6.1 Introduction
Feasibility of numerical computations related to manifold-valued data depends heav-
ily on our ability in computing the exponential map and geodesics of the underlying
manifold. For an arbitrary manifold computing the exponential map can be very
difficult. However, for a class of manifolds known as Riemannian symmetric spaces
that can be done with relative ease. Fortunately, Riemannian symmetric spaces nat-
urally appear in many applications, and for this reason they deserve special study.
In this chapter we study the Riemannian Lp means and recursive-iterative means in
two important symmetric spaces: the special orthogonal group SO(n) (comprised of
n?n orthogonal matrices of determinant 1), and the Grassmannian manifold Gk(n)
which is the space of k-dimensional subspaces of Rn. Finding geodesics and other
related calculations in these two manifolds boil down to matrix computations. We
place emphasize on developing efficient methods for calculation of the pairwise and
MI means in these manifolds.
151
6.1.1 Contributions and Outline of the Chapter
It is an interesting fact observed in [36] (see also [16, Chapter 7]) that existence
and uniqueness of the local Riemannian L2 mean could be extended to rather large
non-ball domains in SO(n). By large we mean relative to the diameter or volume
of SO(n) or in the sense of inclusion. Such a domain can be described via a Finsler
distance induced by the matrix 2-norm -instead of the Riemannian norm- in the
Lie algebra of SO(n) (see Section (6.4)). In this chapter we try to build our results
based on this observation. More concretely, we give large domains for existence and
uniqueness of the means. These domains are not Riemannian balls rather Finsler
balls, and are larger compared with the Riemannian balls. Not all the results we
derive are completely new; however, in most occasions the already known results are
derived based on new observations (e.g., Theorem 6.5.1). Two main contributions
of this chapter are as follows:
1. In Theorem 6.6.1 we derive the existence and uniqueness results for pairwise,
MI and WMI mean of points in Finsler 2-norm balls of appropriate size on
SO(n) and Gk(n). Also in Theorem 6.5.1 we give the same result for local Lp
means.
2. In Theorems 6.7.1 and 6.7.2 we give efficient methods to find the midpoint
of the geodesic between two points in SO(n) (or Gk(n)) and the reflection of
a point with respect to another point in SO(n) (or Gk(n)). The significance
of these results is that they allow us to interpolate or extrapolate a geodesic
between two points without the explicit use of matrix logarithm and exponen-
152
tial, both of which have high computational complexity. This will be helpful,
especially in finding the pairwise and MI means (see Sections 5.3, 5.4 and 6.8)
in these spaces.
We should emphasize that the results in this chapter most likely can still be strength-
ened. Our efforts to strengthen Theorem 6.5.1 to include the global Riemannian Lp
mean as well as larger Finsler domains were not successful.
The outline of this chapter is as follows: In Section 6.2 we give the needed
results about the geometry of SO(n) and Gk(n). In Section 6.3 we give the existence
and uniqueness results for means of points in a Riemannian ball in SO(n) or Gk(n).
In Section 6.4 we describe the Finsler balls based on the matrix 2-norm on the
tangent space and study convexity of the Riemannian distance function within these
strongly convex Finsler balls in SO(n) and Gk(n). In Section 6.5 existence and
uniqueness of the local Lp means in 2-norm Finsler balls is studied. In Section 6.6
we give existence and uniqueness results for pairwise, MI and WMI means in the
mentioned 2-norm balls. In Section 6.7 we derive efficient methods to find midpoint
of a geodesic between two point in SO(n) and Gk(n). In Section 6.8 we apply the
methods of Section 6.7 to implement two algorithms: the Nelder-Mead method of
direct search on SO(n) and the K-means clustering algorithm on Gk(n).
6.2 Riemannian Geometry of SO(n) and Gp(n)
Here, we briefly give needed results about the Riemannian geometry of SO(n) and
Gk(n).
153
6.2.1 Riemannian Geometry of SO(n)
The special orthogonal group SO(n) is a compact Lie group, and we denote its Lie
algebra by so(n) which is the space of n?n skew-symmetric matrices with the Lie
bracket operation defined by
[X,Y] = XY ?YX, X,Y ?so(n). (6.1)
The associated Lie algebra level adjoint operator adX : so(n)?so(n) is defined by
adX(Y) = [X,Y] (6.2)
for X,Y ?so(n). The corresponding group level adjoint representation Adx(Y) is
defined as
Adx(Y) = xYx?1 = xYxT (6.3)
for x?SO(n) and Y ?so(n). The two operators are related to each other by the
well-known relation
eadX(Y) = AdeX(Y) (6.4)
for X,Y ? so(n), where eX is the matrix exponential of n?n matrix X. In the
above eadX : so(n)?so(n) is a linear operator defined as
eadX(Y) = ea|a=adX(Y) =
?summationdisplay
k=0
adkX
k! (Y), (6.5)
154
where adkX(Y) = [X,adk?1X (Y)] and ad0X(Y) = Y.
Remark 6.2.1. We also will use the inverse of the matrix exponential, i.e., the
matrix logarithm. It is well-known that if a non-singular n?n matrix x has no
eigenvalues on the negative real line, then its logarithm X = logx is a unique
matrix such that x = eX and all the eigenvalues of X have imaginary parts between
?? and ?. Note that a skew-symmetric matrix has pure imaginary eigenvalues;
hence, if x ? SO(n) has no eigenvalues equal to ?1, then there exists a unique
X ?so(n) such that all the eigenvalues of X are smaller than ? in absolute value
and that x = eX. See Theorem 6.2.2 which gives the injectivity radius of SO(n) in
the standard Riemannian metric.
For x?SO(n) the left translation Lx : SO(n)?SO(n) is defined by Lx(y) =
xy for y?SO(n). The right translation can be defined similarly. A tangent vector
Y ? so(n) can be left translated to TxSO(n) via the derivative of Lx. This way,
by left or right translations, corresponding to any Y ? so(n), we can build left
or right invariant vector fields YL or YR. Therefore, we can define the Lie bracket
operation for left or right invariant vector fields based on the definition of Lie bracket
on so(n). This gives an algebraic-differential picture of SO(n). There is a natural
way to equip SO(n) and more generally any compact Lie group with a Riemannian
structure which matches the group structure, in the sense that both left and right
translations are isometries. This follows from the fact that there always exists a
Riemannian structure g which is invariant under both left and right translations,
i.e., it is bi-invariant. We do not mention the details and only mention the following
155
theorem [46, 33, 40]:
Theorem 6.2.1. There exists a bi-invariant Riemannian metric g on SO(n) for
which
1. SO(n) is a complete Riemannian manifold,
2. The geodesics throughI, theidentity ofSO(n), coincide with theone-parameter
subgroups ofSO(n), i.e, allsuch geodesics areoftheformetX where X?so(n).
Hence, all geodesics coincide with the one-parameter subgroups and their
translations. In this metric inversion is an isometric reflection with respect
to I.
3. If?is the Levi-Civita connection for g and R is its corresponding curvature
tensor at identity then:
(a) ?XY = 12[X,Y], X,Y ?so(n)
(b) R(X,Y)Z =?14[[X,Y],Z], X,Y,Z?so(n)
(c) ?R(X,Y)Z,U?I =?14?[X,Y],[Z,U]?, X,Y,Z,U ?so(n)
The most common such a Riemannian metric is constructed from the inner
product induced by the the so-called Killing form on so(n). We avoid the details
and give the important facts which we need. The mentioned inner product on so(n)
after a scaling will be
?X,Y?I = gI(X,Y) = 12tr(XTY), X,Y ?so(n). (6.6)
156
For left-invariant vector fields XL and YL we have
gx(XL(x),YL(x)) = gI(x?1XL(x),x?1YL(x)). (6.7)
We call this metric the standard Riemannian metric of SO(n). Note that at so(n)
we have gI(X,X) = 12bardblXbardbl2F, where bardbl.bardblF denotes the Frobenius norm. It can be
shown that for X,Y ?so(n) [13]
0?bardbl[X,Y]bardblF ?bardblXbardblFbardblYbardblF. (6.8)
From this and item 3c in Theorem 6.2.1 we conclude that the sectional curvature
of SO(n) in the standard Riemannian metric satisfies 0?K ? 12. Also it is quite
easy to establish the following Theorem, using the properties of matrix exponential
and skew-symmetric matrices:
Theorem 6.2.2. we have the following:
1. The injectivity radius of SO(n) in the standard Riemannian metric is ?.
2. Let n? = [n2], where [a] is the smallest integer not larger than a?R. Closed
simple geodesics in SO(n) are of length 2radicalbigl21 +...+l2n??, where li?s are non-
negative integers with no common factor other than 1. A shortest (nontrivial)
closed geodesic is of length 2?. A longest simple geodesic is of length 2?n??.
Hence, the Riemannian diameter of SO(n) is?n??.
3. In addition, for even n the farthest point from the identity I is ?I and for
157
odd n the farthest points from I are of the form
?I =?I + 2VV T, (6.9)
where v?Rn andbardblVbardbl= 1. (bardbl.bardblis the standard norm of Rn.)
6.2.2 Riemannian Geometry of Gk(n)
The following materials can be found in many references, but we mostly follow
[26, 91, 46, 33]. The Grassmannian Gk(n) is defined as the manifold of k-dimensional
subspaces of Rn, where we assume k ? bracketleftbign2bracketrightbig. Here, [a] is the integer part of real
number a. For k = 1, Gk(n) is the real projective plant RPn?1, which is found by
identifying the antipodal points on the unit sphere Sn?1. The more complicated and
interesting case is when k is larger than 1; and henceforth we assume k?2. Gk(n)
is a smooth manifold which can be endowed with a Riemannian structure. There
is a natural Riemannian structure which is closely related to the quotient topology
associated with Gk(n). We briefly mention these facts. Consider a subgroup of
SO(n) defined as
Kk(n) ={bracketleftbigo1 00 o2 bracketrightbig|o1?O(k),o2?O(n?k),det(o1o2) = 1}. (6.10)
Here, O(n) denotes the Lie group of orthogonal matrices. Kk(n) is also denoted by
S(O(k)?O(n?k)). Now let q = [Un?k|U?n?(n?k)] denote an element of SO(n). The
first k-columns of q denoted by U represent a k-dimensional subspace of Rn, and
158
the rest of the columns denoted by U? represent the orthogonal complement of that
subspace. Consider an action of Kk(n) on SO(n) which is defined by multiplication
from the right of q?SO(n). Gk(n) is identified with the set of orbits of this action
which coincides with the left coset of the subgroup Kk(n). Moreover, Gk(n) can
be considered as a Riemannian manifold whose metric is derived from the standard
Riemannian metric of SO(n). Each element q ? SO(n) can be associated to an
element {qKk(n)}? Gk(n) which consists of the orbit of Q under the action of
Kk(n):
{qKk(n)}={qo|o?Kk(n)}. (6.11)
Note that this is the collection of elements of SO(n) whose first k columns span
the same subspace as the first k columns of q. The tangent space to SO(n) at
the identity will be decomposed to two orthogonal subspaces according to Cartan
decomposition: so(n) = gk(n)?kk(n), where
kk(n) ={bracketleftbigA 00 Bbracketrightbig|A?so(k),B?so(n?k)} (6.12)
and
gk(n) ={bracketleftbig0 ?ATA 0 bracketrightbig|A?R(n?k)?k}. (6.13)
gk(n) and kk(n) are called horizontal and vertical spaces, respectively. At any other
point q?SO(n) we have the corresponding decomposition via left translation. Note
that gk(n) is identified with the tangent space of Gk(n) at{IKk(n)}. Therefore, if
Xi = [ 0 ? ?XTi?X
i 0
bracketrightbig (i = 1,2) are two tangent vectors at{IK
k(n)}, then the Riemannian
159
metric is nothing but
?X1,X2?{IKk(n)} = gI(X1,X2) =?12tr(X1X2) = tr( ?XT1 ?X2), (6.14)
where g is the standard Riemannian metric of SO(n) (6.6). We denote the corre-
sponding norm at gk(n) by bardbl.bardbl, as before. A consequence of the above relation is
that the geodesic in Gk(n) along X ? gk(n) is noting but the equivalent class of
{etXKk(n)}, where eA is the standard matrix exponential of matrix A?Rn?n. Let
?X = U(n?k)?(n?k)bracketleftbig?k?k
0
bracketrightbigV T
k?k be a Singular Value Decomposition (SVD) of ?X(n?k)?k.
We can write U = [U1,U2 ], where U1 is (n?k)?k and U2 is (n?k)?(n?2k).
Then X?gk(n) can be written as:
X = bracketleftbig 0 ? ?XT?X 0 bracketrightbig = bracketleftbigV 0 00 U1 U2 bracketrightbig
bracketleftbigg
0 ?? 0
? 0 00 0 0
bracketrightbiggbracketleftbig
V T 0 00 U
1 U2
bracketrightbigT. (6.15)
Assume that ? = diag(?1,...,?k), where ?1 ? ... ? ?k ? 0. Here, diag(a) is a
diagonal matrix whose diagonal is the vector a. Note that bardblXbardbl= (summationtext?2i)12. Now,
etX can be computed easily as
etX = bracketleftbigV 00 U bracketrightbig
bracketleftbiggC(t?
1,...,t?k) ?S(t?1,...,t?k) 0
S(t?1,...,t?k) C(t?1,...,t?k) 0
0 0 I(n?2k)?(n?2k)
bracketrightbiggbracketleftbig
V 00 U bracketrightbigT, (6.16)
where
C(?1,...,?k) = diagparenleftbigcos?1,...,cos?kparenrightbig and S(?1,...,?k) = diagparenleftbigsin?1,...,sin?kparenrightbig.
(6.17)
160
The decomposition in (6.16) is known as the CS decomposition of the orthogonal
matrix etX [31, 26]. An element s?Gk(n) can be represented by a subspace that is
a member of{etXKk(n)}as
S(tX) = bracketleftbigV 00 U bracketrightbig
bracketleftbiggC(t?
1,...,t?k)
S(t?1,...,t?k)
0(n?2k)?k
bracketrightbigg
(6.18)
We call this the standard subspace representation of s ={etXKk(n)}?Gk(n). For
any v?SO(k), S(tX)v represents the same subspace, but it is in the standard form
anymore. Note that S(tX) is found from (6.16) by dropping the right most factor
and two columns of the matrix in middle. Also note that the geodesics along ?X
(or XT) is given by {(etX)TKk(n)} which can be found simply by a transposition
operation. Therefore, a subspace S? which corresponds to reflection of S with respect
to{IKk(n)}is simply
S?(tX) = S(?tX) = bracketleftbigV 00 U bracketrightbig
bracketleftbigg C(t?
1,...,t?k)
?S(t?1,...,t?k)
0(n?2k)?k
bracketrightbigg
. (6.19)
Note that S? can be found by replacing t?i with ?t?i. The Riemannian distance
between two elements of Gk(n) can be found from their subspace representations
(standard or non-standard) easily: If s1,s2 ? Gk(n) have subspace representa-
tions S1 and S2, respectively, let ST1 S2 = UCV T be an SVD of ST1 S2, then C =
diag(cos?1,...,cos?k), where 0 ? ?1,...,?k ? ?2 are the principal angles between
161
s1 and s2 and the Riemannian distance between s1 and s2 is
d(s1,s2) = (
ksummationdisplay
i=1
?2i)12. (6.20)
Theorem 6.2.3. We have the following
1. The injectivity radius ofGk(n)inthe standardRiemannianmetric isinjGk(n) =
?
2.
2. Closed geodesics in Gk(n) are of length radicalbigl21 +...+l2k?, where l1,...,lk are
nonnegative integers with no common factor other than 1. A shortest closed
geodesic is of length ? and a longest is of length ?k?. Hence, the distance
between two farthest points inGk(n) (or equivalently the Riemannian diameter
of Gk(n)) is
?k
2 ?.
3. Let R : gk(n)?gk(n)?gk(n) ? gk(n) denote curvature tensor of Gk(n) at
{IKk(n)}. We have
R(X,Y)Z =?[[X,Y],Z], X,Y,Z?gk(n), (6.21)
where [.,.] is the matrix Lie bracket. Therefore, the sectional curvature along
two orthonormal tangent vectorsX,Y ?gk(n) is?tr([[X,Y],Y]X) =bardbl[X,Y]bardbl2,
where bardbl.bardbl is the standard norm in so(n). From this, one can conclude that
the sectional curvature of Gk(n) satisfies 0?K?2.
Remark 6.2.2. In Theorem 5.3.2, a condition on the finiteness of the number of
162
possible values for the lengths of closed geodesic is mentioned. From Theorems 6.2.2
and 6.2.3, this condition is satisfied in SO(n) and Gk(n).
6.2.2.1 Isometric Identification of Gk(n) with the Space of Orthogonal
Projections Matrices of Rank k
It is obvious that to any k-dimensional subspace of Rn we can associate, in a unique
way, an n?n orthogonal projection matrix. Such a matrix has the form SST,
where S ? Rn?k is a matrix comprised of an orthogonal basis of that subspace.
Let us denote the linear space of n?n symmetric matrices bySn and its subset of
orthogonal projection matrices of rank k byPk,n. Pk,n can be isometrically identified
with Gk(n) as Riemannian manifolds via an isometric map [89]. This map, simply
maps a tangent vector at{IKk(n)} which can be represented as X = [ 0 ? ?XT?X 0 bracketrightbig to
a tangent vector ?X = [ 0 ?XT?X 0 bracketrightbig in the tangent space to Pk,n at Ik = [Ik?k 00 0]. This
can be easily extended to other points in Gk(n) via rotating a subspace in Gk(n)
and congruence transformation inPk,n. The actual formula for the geodesic in this
space can be found e.g., in [41].
6.3 Existence and Uniqueness of Lp and MI Means in Riemannian
Balls
The following is a direct consequence of what preceded in Section 6.2:
Theorem 6.3.1. Theorems 3.2.1, 3.2.2, 5.4.1 and 5.4.4 hold true in M = SO(n)
equipped with the standard Riemannian metric in which case we have ? = 12 and
163
injM = ?, and hence (see (3.5))
??,p =
?
??
???
?2
4 ? 1?p < 2
?
2(= rcx) 2?p??.
(6.22)
The same theorems hold for M = Gk(n)(2 ? k ? [n2]) where have ? = 2 and
injM = ?2, and accordingly
??,p =
?
???
???
?2
8 ? 1?p < 2
?
4(= rcx) 2?p??.
(6.23)
The main shortcoming of the above theorem is that the balls of radius ? < ??,p
become increasingly small compared with the diameter of SO(n) or Gk(n) as n
increases (see Theorems 6.2.2 and 6.2.3). Note that ??,p remains constant as n
increases, while the diameters of SO(n) and Gk(n) increase by n. The crucial
observation made in [36] is that one can find larger non-ball strongly convex domains
in a compact Lie group in which the local center of mass is unique. This domain is
easily described in terms of the matrix 2-norm in the Lie algebra so(n) (rather than
the norm (6.6) related to the standard Riemannian metric). We shall elaborate on
this issue next.
6.4 Large Non-ball Convex Domains in SO(n) and Gk(n) and a Finsler
Structure
For our purposes a Finsler manifold is defined as follows [72]:
164
Definition 6.4.1. A Finsler metric F on a smooth manifold M assigns a norm
F(x,.) to each tangent space TxM in a smooth manner as x varies. The pair (M,F)
is called a Finsler manifold.
Remark 6.4.1. A more restricted definition requires a convexity condition on the
norm, i.e., the norm must be a Minkowski norm1. The main consequence of this
restriction is that the geodesics are well behaved in the sense of uniqueness and
smoothness. An example clears the issue [4]. InR2 the standard Euclidean norm has
only the straight lines as its length minimizing curves between two points. However,
the L? norm which is defined by bardblbracketleftbigxybracketrightbigbardbl? = max{|x|,|y|} in addition to straight
lines has some other merely piece-wise differentiable length minimizing curves too.
The same holds for the L1 norm defined asbardblbracketleftbigxybracketrightbigbardbl1 =|x|+|y|. Neither of these two
norms are Minkowski norms. Also note that in these two norms the unit sphere is
not smooth, in contrast with the unit spheres in the Lp with 1 < p <?. For our
purposes the issue of uniqueness of Finsler geodesics are not important; hence, in
what follows we shall use a norm which is not Minkowski.
1A Minkowski norm is defined as follows: Let V be a finite dimensional vector space. A
function on V is called a Minkowski norm if:
1. F(u)?0 for every u?V and F(u) = 0 if and only if u = 0,
2. F(?u) =|?|u for every u?V and ??R and
3. F is smooth on V ?{0} and its Hessian at every point y is positive definite, i.e., the
symmetric bi-linear form:
Fy(u,v) = 12 ?
2
?s?tF
2(y + su + tv)|s=t=0 (6.24)
is positive definite.
165
6.4.1 The Matrix 2-Norm as a Finsler Norm
The matrix induced 2-norm of A?Rn?n is defined as:
bardblAbardbl2 = max
bardblubardbl2=1
bardblAubardbl (6.25)
wherebardbl.bardblis the standard Euclidean norm inRn. bardblAbardbl2 is the largest singular value of
A and for a normal matrix (hence a skew-symmetric) it is the largest eigenvalue of A
in absolute value. Note that this norm is bi-invariant with respect to left and right
multiplication by orthogonal matrices, i.e., bardblxAybardbl2 = bardblAbardbl2 for x,y ? O(n). We
equip so(n) with this norm, extend it to any other tangent space via bi-invariance;
and we make SO(n) into a Finsler manifold. Now the length of a piecewise smooth
curve c : [0,1]?SO(n) is
L(c) =
integraldisplay 1
t=0
vextenddoublevextenddoubledc
dt
vextenddoublevextenddouble
2dt. (6.26)
The 2-norm Finsler distance between two points x,y ? SO(n) is defined as the
infimum of the length of all piecewise smooth curves in SO(n) connecting x and y.
We denote the 2-norm distance between x and y by d2(x,y). A curve between x and
y is a length minimizing curve or minimal if its length is equal to distance between
x and y. We refrain from using the term geodesic hence defining them as stationary
points of the length functional requires defining a connection which requires further
preparation.
Remark 6.4.2. For n = 2,3 and A?so(n) we have bardblAbardbl2 = 1?2bardblAbardblF. Therefore,
bardbl.bardbl2 is a Minkowski norm in this case. However, it is easy to see thatbardbl.bardbl2 is not a
166
Minkowski norm for n?4. For example, for n = 4, consider three so(n) elements
Y, U and V. Let Y = Q?QH with ? = diag(+i,?i,0,0), where i = ??1, be
the eigen decomposition of Y. Here QH is the conjugate transpose of the unitary
matrix Q. Set U = QDQH with D = diag(0,0,+i,?i) and V = U. For small
enough|s|and|t|(s,t?R) we havebardblY + sU + tVbardbl2 =bardblYbardbl2 = 1. As a result the
bi-linear form in (6.24) is zero with U = V negationslash= 0; and it cannot be positive definite.
This automatically extends to every n > 4. Therefore, we expect to have length
minimizing curves with cranks in SO(n) (n ? 4) as a Finsler manifold. In fact,
consider this curve in so(4):
?(t) =
?
???
???
tY +tU 0?t? 12
tY + (1?t)U 12 ?t?1,
(6.27)
where Y and U are defined as before. Note that bardbl??(t)bardbl2 = bardblYbardbl2. The piecewise
linear curve ?(t) connects the origin to Y and its length is bardblYbardbl2, so it is a length
minimizing curve. Obviously, there is the straight line segment ?(t) = tY with
0 ?t?1 which does the same, i.e, its length is equal to the distance between Y
and the origin. Theorem 6.4.1 tells us that the curve etY is a length minimizing
curve in SO(4) connecting I to eY ; and since the curve ?(t) = e?(t) also has the
same 2-norm length it too is a length minimizing curve in SO(4). Therefore, in
(SO(4),bardbl.bardbl2) the length minimizing curves are not unique, even locally, which is in
contrast with the Riemannian case- where the length minimizing curves are locally
unique and they are the geodesics.
167
Remark 6.4.3. The 2-norm is not an admissible norm with respect to the Lie
bracket operation. A normbardbl.bardblon a Lie algebra g is admissible ifbardbl[A,B]bardbl?bardblAbardblbardblBbardbl
for A,B ? g [60]. As we mentioned before bardbl.bardblF is an admissible norm on so(n).
Any induced norm like the 2-norm, can be converted to an admissible norm via
scaling. For example, if we definebardblAbardbl2? = 2bardblAbardbl2, then from the triangle inequality
and sincebardblABbardbl2?bardblAbardbl2bardblBbardbl2 we havebardbl[A,B]bardbl2? ?bardblAbardbl2?bardblBbardbl2? for all A,B?so(n).
However, the unit ball ofbardbl.bardbl2? is contained in the unit ball ofbardbl.bardbl2. In this sense we
have shrunk the unit ball. Some of our results here, are essentially derived in [36] in
the context of a general compact Lie group and with an admissible norm. Applying
those results to SO(n) with thebardbl.bardbl2? norm, results in too conservative results about
the radius of largest convex balls in SO(n).
The following theorem shows that curves of the form etX and their translates
minimize the 2 Finsler distance in addition to the Riemannian distance. Therefore,
the Finsler structure is quite simple. The proof of the theorem can be found in [16,
Chapter 7] (cf. [36]):
Theorem 6.4.1. Let X,Y ? so(n) with bardblXbardbl2 +bardblYbardbl2 < ?. Then there exists
a unique H ? so(n) with bardblHbardbl2 < ? such that h = eH = eXeY . Moreover,
bardblHbardbl2 ?bardblXbardbl2 +bardblYbardbl2. Therefore, the 1-parameter subgroups of SO(n) and their
translates locally minimize the 2-norm Finsler distances. Here, locally means on
2-norm lengths not larger than ?.
The following corollaries are immediate:
168
Corollary 6.4.1. If X,Y are as in Theorem 6.4.1 and x = eX and y = eY , then
d2(x,y) =bardbllogxTybardbl2?bardblXbardbl2 +bardblYbardbl2. (6.28)
Corollary 6.4.2. The tangent space gk(n) can be endowed with the Finsler 2-norm
induced from so(n). Hence, Gk(n) will be a Finsler manifold in which horizontal
geodesics and their translates locally minimize the 2-norm Finsler distance. Here,
locally means on 2-norm lengths not larger than ?2. We denote the 2-norm distance
between x,y?Gk(n) by d2(x,y).
6.4.1.1 Metric Decreasing Property of the Matrix Exponential in SO(n)
The exponential map in a Riemannian manifold of nonnegative curvature has a
metric decreasing property. This means that the distances between two points in
the tangent spaces is not smaller than the distance between their images under the
exponential on the manifold. One can show that on SO(n) this holds for any bi-
invariant Finsler distance. This fact was observed in [36] for admissible bi-invariant
norms (see also [16, Chapter 7]). Here, we give a rather different proof, where we
do not use the admissibility assumption. First, we give a short preparation about
the derivative of the matrix exponential (see [45] or [38]). Let f(t) = eA(t) where
A : [0,1]?so(n) is a curve in so(n). Then the derivative of f(t) is computed as
d
dtf(t) = dexpA(t)(
?A(t))f(t), (6.29)
169
where ?A(t) = dA(t)dt , and dexpX : so(n)?so(n) is defined as
dexpX(Y) = e
a?1
a |adX(Y) (6.30)
for any Y ?so(n). Here, by ea?1a |adX(Y) we mean that in the Laurant expansion of
umapsto? ea?1a , a?C is replaced by adX and then applied to Y (see (6.2)).
Theorem 6.4.2. Let bardbl.bardblbe any bi-invariant norm in so(n) and d its induced dis-
tance function in SO(n) 1. We have
d(eX,eY )?bardblX?Ybardbl, ?X,Y ?so(n). (6.31)
Proof. Consider the function f : [0,1]?[0,1]?SO(n) with f(t,s) = et(X+s(Y ?X)).
Note that f(1,0) = eX and f(1,1) = eY , i.e., smapsto?f(1,s) is a curve connecting eX
to eY , therefore:
d(eX,eY )?
integraldisplay 1
0
bardbl?f?sbardblds. (6.32)
But from (6.29) we have
?f
?s = dexpt(X+s(Y ?X))t(Y ?X)f(t,s). (6.33)
From bi-invariance of bardbl.bardbl we have bardbl?f?sbardbl = bardbldexpt(X+s(Y?X))t(Y ?X)bardbl. Now, we
1Here, d can be either the Riemannian distance function or the 2-norm Finsler distance.
170
calculate ?g?t where g(t,s) = dexpt(X+s(Y ?X))(t(Y ?X)). Note that
g(t,s) =
?summationdisplay
i=1
ti
i!ad
i?1
X+s(Y?X)(Y ?X). (6.34)
Hence
?g
?t =
?summationdisplay
i=0
ti
i!ad
i
u|u=X+s(Y?X)(Y ?X) = e
adt(X+s(Y?X))(Y ?X)
= et(X+s(Y ?X))(Y ?X)e?t(X+s(Y ?X)). (6.35)
Therefore,bardbl?g?tbardbl=bardblY?Xbardbland this immediately implies that d(eX,eY )?bardblY?Xbardbl.
6.4.2 Hessian of the Distance Function in Symmetric Spaces
Both SO(n) and Gk(n) are symmetric spaces. One consequence is that the eigen-
values of the curvature operator do not change along a geodesic, and the Jacobi
equation becomes a constant coefficient equation which can be solved explicitly. In
the sequel assume M is a symmetric Riemannian manifold of dimension m. Let
c : [0,d(q,x)] ? M be a unit speed geodesic connecting q to x. The curvature
operator Rc(t) : Tc(t) ?Tc(t) defined by
Xmapsto?Rc(t)(X,c?(t))c?(t) (6.36)
is a self-adjoint operator which has a set of m real eigenvalues ?1(t)?...??m(t).
Here, c?(t) is the velocity vector of c(t). Due to symmetry of M the eigenvalues
171
remain constant in time and due to both the symmetry and compactness of M they
are all nonnegative, therefore ?i(t) = ?i ? 0 [46]. Denote a set of orthonormal
eigenvectors of Rc(0) by {Vi(0)}ni=1. For every i by parallel translating the vector
Vi(0) along c(t), we will have a parallel vector Vi(t) which for each t is an eigen-
vector of Rc(t) associated with ?i. Hence, {Vi(t)}mi=1 will be an orthonormal frame
along c(t). Any Jacobi field J(t) along c(t) with initial condition J(0) = 0 can
be written as J(t) = summationtextmi=1 AiJi(t), where Ji(t) satisfies J??i (t) + ?iJ(t) = 0 and
bardblJ?i(0)bardbl = 1. We can solve explicitly for Ji as Ji(t) = sn?i(t)Vi(t), where sn?(t) is
defined in (2.6). Now, recalling formula (2.4) for the Hessian of xmapsto?d(q,x) we have
Hess(d(q,x))(Ji(t),Jj(t)) =?Ji(t)??,J?j (t)??ij, where ?ij is Kronecker?s delta. Note
that from this we have Hess(d(q,x))(Jm(t),Jm(t)) = 0, since Jm(t) has no normal
component along c(t). From that formula we have:
Hess(d(q,x))(Vi(d(q,x)),Vj(d(q,x))) =
?
???
???
0 i or j = m
cs?i(d(q,x))
sn?i(d(q,x))?ij 1?i,j < m.
(6.37)
Note that since ?i ? 0, we have cs?i(d(q,x))sn?
i(d(q,x))
= ??i cot(??id(q,x)), except when
?i = 0 in which case the expression is equal to 1d(q,x). It is obvious that the largest
eigenvalue of the curvature operator, i.e., ?1 is what controls the sign of the Hessian.
For SO(n) and Gk(n) it is possible to find the ?i?s more explicitly and identify the
conditions to guarantee that xmapsto?d(q,x) is positive-definite.
172
6.4.2.1 Eigenvalues of the Curvature Operator in SO(n) and Gk(n)
There are different ways to derive the results we are going to present. One short
way, which we shall follow, is to use the notion of reduced commutator introduced
in [13]. Let {?i}mi=1, where m = n(n+1)2 , be an orthonormal basis for so(n). Any
Y ?so(n) can be written as Y = summationtextmi=1 yi?i. Denote the vector
bracketleftbigg y1
...
ym
bracketrightbigg
?Rm by ?(Y),
and hence identify so(n) (as a vector space) with Rm. The reduced commutator
CadX : Rm?Rm is linear operator defined via the identity
?([X,Y]) = CadX?(Y). (6.38)
Note that CadX is a matrix representing the adjoint operator adX. We recall the
following proposition from [13]:
Proposition 6.4.1. Let n = 2n? be an even number and let ?i?1,...,?i?n? (i =
??1) be the eigenvalues of X ?so(n). Then the eigenvalues of C
adX are i(??i?
?j)(1 ? i < j ? n?) as well as zero with multiplicity n?. If n = 2n? + 1 and
?i?1,...,?i?n?,0aretheeigenvalues ofX, thenthe eigenvalues ofCadX are?i?i(1?
i?n?), i(??i??j)(1?i < j?n?) as well as zero with multiplicity n?.
Now, back to our application, we want to find the eigenvalues of the curvature
operator for SO(n) and Gk(n). From Theorems 6.2.1 and 6.2.3 we see that the
curvature operator at I and along X is given by Y mapsto??14ad2X(Y) for SO(n) and Y mapsto?
?ad2X(Y) for Gk(n) with the extra condition of bardblXbardbl = 1. The above proposition
173
enables us to find the eigenvalues of the curvature operator. Note that we can write
?1(?ad2X) = (|?1(X)|+|?2(X)|)2, (6.39)
where ?1(A) and ?2(A) are the first and second largest eigenvalues in absolute value
of the symmetric matrix (or skew-symmetric) operator A, respectively.
6.4.3 2-norm Finsler Balls in SO(n)
The following Theorem characterizes stronglyconvex 2-norm(Finsler) balls inSO(n):
Theorem 6.4.3. For o?SO(n) define the 2-norm (Finsler) ball
B2(o,?) ={oetX|X?so(n),bardblXbardbl2 < ?}. (6.40)
Then B2(o,?) is a (Riemannian) strongly convex set in SO(n) for every ? < ?2.
Moreover, the Riemannian distance function x mapsto? d(o,x) is strictly convex inside
B2(o,?) along non-radial directions; and it is only convex along radial directions.
Proof. We partially follow [36] to prove the first part. Without loss of generality
we can assume that o coincides with I, the identity of SO(n). Let X,Y ? so(n)
andbardblXbardbl2,bardblYbardbl2 < ?. Set x = eX and y = eY . We have x,y ?B2(I,?), and since
d2(x,y) < ? there is a unique minimizing geodesic connecting x to y . Let z be
the unique midpoint of the minimizing geodesic connecting x to y. Note that z?1x
and z?1y have I as their midpoint. Therefore, by inversion we have (z?1x)?1 =
z?1y and z2 = zxz?1y. We write ?x = zxz?1. Note that d2(I,?x) = d2(I,x) < ?.
174
Therefore, there exists ?X ? so(n) with bardbl?Xbardbl2 < ? such that ?x = e ?X, and we have
z2 = e ?XeY . From this and Theorem 6.4.1, due to the bi-invariance of d2, we have
d2(I,z2)?bardbl?Xbardbl2 +bardblYbardbl2 < 2? < ?, which implies that there exists Z??so(n) with
bardblZ?bardbl2 < 2? < ? with z2 = e2Z?. Therefore, z = eZ with Z = Z?2 and bardblZbardbl2 < ?
or equivalently d2(I,z) < ?. This means that the midpoint of any minimizing
geodesic connecting two points in B2(I,?) belongs to B2(I,?). This proves the
strong convexity of B2(o,?). For the second statement, let X ?so(n) be such that
bardblXbardbl2 and x = eX. In order to use (6.37) we normalize X as ?X = XbardblXbardbl = Xd(I,x) (note
thatbardblXbardblis the Riemannian norm in (6.6)). If ?1(|?1|= ?) and ?2 are the two largest
eigenvalues (in absolute value) of X, then the largest eigenvalue of?ad2?X by (6.39)
is equal to (?+|?1|)2d2(I,x) . Hence, the largest eigenvalue of the curvature operator along
X is ?1 = 14 (?+|?1|)2d2(I,x) . Therefore, we have ?1 ? ?2d2(I,x). Next, from (6.37) we see that
Hessd(I,x) will be positive definite along non-radial directions if??1d(I,x) < ?2 or
equivalently if ? < ?2. Along a radial direction the Hessian is zero.
6.4.4 2-norm Finsler Balls in Gk(n)
The next Theorem is the counterpart of Theorem 6.4.3 in Gk(n):
Theorem 6.4.4. For o?SO(n) define the 2-norm (Finsler) ball
B2({oKk(n)},?) ={{oetXKk(n)}|X?gk(n),bardblXbardbl2 < ?}. (6.41)
Then B2(o,?) is a (Riemannian) strongly convex set in Gk(n) for every ? < ?4.
Moreover, the Riemannian distance function x mapsto? d(o,x) is strictly convex inside
175
B2(o,?) along non-radial directions; and it is only convex along radial directions.
Proof. Both parts follow with a very minor modification from the proof of Theorem
3.1 in [47]. The second part also can be shown similar to the previous theorem, by
noting that here ?1 < 4?2d2({IK
k(n)},x)
.
Remark 6.4.4. These Finsler balls should not look esoteric. Recall that one can
associate [n2] angles, called angles of rotation, with any matrix in SO(n). B2(I,?)
is the set of all elements of SO(n) whose maximum angles of rotation (in absolute
value) are less than ?. A similar interpretation is true for Gk(n) with principal angles
instead. However, here we need to allow for both positive and negative principal
angles. This means that for any s?B2({IKk(n)},?), its reflection with respect to
{IKk(n)}is also in B2({IKk(n)},?). Figure 6.1 shows both B(0,?) and B2(0,?) in
so(n), which transform to B(I,?) and B2(I,?) in SO(n) via the matrix exponential.
The meaning ofB2(I,?) being larger than B(I,?) is clear inthe picture. The relation
between B(I,?) and B2(I,?) in SO(n) is very similar to the relation between the
unit circle and square in the plane.
B(0,?) ={X?so(n)|bardblXbardbl= 1?2bardblXbardblF < ?}
B2(0,?)
B(0,?)
so(n)
?
B2(0,?) ={X?so(n)|bardblXbardbl2 < ?}
B(0,?)?B2(0,?)?so(n)
Figure 6.1: In so(n), the 2-norm Finsler ball of radius ? ? ? is larger than the
Riemannian ball of radius ?.
176
6.5 Existence and Uniqueness of the Local Riemannian Lp Mean in
Finsler Balls
The following theorem gives existence and uniqueness of the local Lp in Finsler balls.
Recall that the local Lp mean in a set B is defined by solving min?B fp(x) (see (3.2)
and Section 3.1.1).
Theorem 6.5.1. Let{xi}Ni=1 be a set of data points in M = SO(n) which lies in a
Finsler ball B2(o,?) with ? < ?4. The local Lp Riemannian mean (1 < p??) with
respect to the weight vector w for this set exists is unique and lies inside B2(o,?).
For p = 1 if the points do not lie on a geodesic, again the same holds, and if the
the points lie on a geodesic the mean might not be unique. The same holds for
M = Gk(n) with ? < ?8.
Proof. We briefly give one proof for bothSO(n) and Gk(n). First, note that the local
mean exists. Next, from Theorems 6.4.3, 6.4.4 and Corollary 2.4.1 we conclude that
for any two points q,x?B(o,?), xmapsto?dp(q,x)(p > 1) is a strictly convex function;
so is x mapsto? fp(x). For p = 1, as before, we have the strict convexity of f1(x) if
not all the data points lie on a geodesic. For p = ?, we might assume that the
L? is the minimum of xmapsto? ?f(x) = maxi d2(xi,x), which is again strictly convex1.
Therefore, except for the possible degenerate case when p = 1, if the local Lp mean
does not lie on the boundary of B(o,?), then it is unique. To see that the mean
cannot lie on the boundary of B, first, recall the properties of strongly convex sets
1We borrow this trick from [76, p. 168], in which the trick is used to avoid the unpleasant
dealing with non-strict convexity of xmapsto?d(xi,x).
177
from Theorem 2.3.2. Let x be a point on the boundary of B2(o,?). Then, the
gradient of fp (1?p <?) at x is a linear combination of a finite number of vectors
pointing inside B (see (3.6)); hence, itself will be pointing inside. This shows that
the Lp(1?p <?) mean cannot be on the boundary of B. For p =?, note that
there exists a vector in TxM pointing inside B to which all thee vectors exp?1x xi
make acute angle; and by the first variation of arc length moving along this vector
reduces f?. Therefore, for 1?p??, x cannot be minimizer of fp. This completes
the proof.
Remark 6.5.1. The result about SO(n) is not completely new, it could be derived
with minor modifications from results in [36] or [16]. Ideally, one would like to have
the claims of the theorem to hold for the global Lp mean. Our efforts, unfortunately,
were not successfully in proving this. The reflection technique used by Buss and
Fillmore [17] which we also used to prove the global aspect of Theorem 3.2.1, seems
not adequate. One reason is that a 2-norm Finsler ball is not a Riemannian ball
(except for n = 2,3). As a result a radial geodesic does not meet the boundary of
the Finsler ball perpendicularly.
Remark 6.5.2. Note that the local L? mean of {xi}Ni=1 ? B2(o,?) is the center
of the minimal (Riemannian) ball of {xi}Ni=1, but this ball is not strongly convex,
necessarily.
178
6.6 Existence and Uniqueness of Mean-Invariance Means in Finsler
Balls in SO(n) and Gk(n)
The following theorem gives the existence and uniqueness condition for pairwise, MI
and WMI means for points in Finsler balls in SO(n) and Gk(n):
Theorem 6.6.1. Let M = SO(n) and assume {xi}Ni=1 ? B2(o,?) ? M where
? < ?2. The pairwise mean, MI and WMI means of this set exist, are unique and lie
in B2(o,?) and the results of Theorems 5.3.1 and 5.4.3 hold. The same statements
hold true if M = Gk(n) and ? < ?4.
Proof. The proof is based on our previous results in Chapter 5. We give one short
proof for both parts. Note that since we have finite number of data points we can
assume that the points lie in B2(o,??) with ?? < ? . Also note that all the operations
related to MI or WMI means remain valid inside B2(o,??). Now, the claim about
the pairwise mean follows from Theorem 5.3.1 with A = B2(o,??). The claim for the
case of MI and WMI means also follows similarly from Theorems 6.4.3 (and 6.4.4
in case M = Gk(n)) and 5.4.3, with A = B(o,??) and A? = B(o,?). Note that in all
cases the mean lies in B2(o,?).
Remark 6.6.1. This theorem gives the existence and uniqueness of pairwise, MI
and WMI means in 2-norm balls which are larger than the balls in Theorem 6.5.1
for local Lp means. While this might be considered as an advantage for recursive-
iterative means, we conjecture that the global Lp(p?2) mean should exist uniquely
in the same balls as Theorem 6.6.1 gives.
179
Remark 6.6.2. In Theorem 5.4.4 we proved the existence and uniqueness of GWMI
means for data points in a Riemannian ball of specified radius. A generalization of
that theorem to Finsler balls, requires accurate analysis of the boundary (which is
nonsmooth) of the Finsler ball B2(o,?) in SO(n) or Gp(n). In particular, we need
to show that for{xi}Ni=1 ?B2(o,?), the Lp mean of the data points belongs to the
interior of B(o,?). Notice that Theorems 3.2.1 and 6.5.1 have different assumptions
about the relative positions of the data points with respect to the boundary of the
balls. To show existence and uniqueness of the GWMI means, we have to establish
that the Lp(1 < p??) mean is a strongly convex mean function with respect to
Finsler balls instead of Riemannian balls (see Definition 5.2.1). This seems quite
plausible, but our efforts in showing that also were not completely successful.
Remark 6.6.3. The Perimeter Shrinkage (PS) scheme introduced in Section 5.3
remains valid if the data points do not belong to a strongly convex ball provided
d2(xi,xi+1) < ?(1 ? i ? N). However, as explained in Theorem 5.3.2 the points
might converge to equidistance positions on closed geodesics.
6.7 Efficient Geodesic Interpolation and Extrapolation
In numerical computations we need to choose a numerical representation for our
manifold-valued data. Usually, this can be accomplished by choosing a suitable
embedding of the manifold in some Euclidean space. This leads to another possible
definition of mean, which is the so-called extrinsic mean:
Definition 6.7.1. Assume (M,d) is embedded in (RnE,bardbl.bardbl), wherebardbl.bardblis the stan-
180
dard Euclidean distance. Also denote the embedding of each x ? M with the
same letter. Denote by ProjM : RnE ? M the orthogonal projection operator
from the ambient space RnE onto the submanifold M. Then the extrinsic mean
of {xi}Ni=1 ? M ? RnE with weight vector w = [w1,...,wN]T is defined by
?ex = ProjM(?EC) where ?EC = summationtextNi=1 wixi is the standard Euclidean weighted
average.
In some occasions it is easy to find the extrinsic mean. For example, as men-
tioned in Subsection 5.5.1 for unit sphere Sn in Rn+1, finding the related projection
operator is easy. It can be shown that the set of points in M whose projection onto
M is not unique has Lebesque measure zero [11]; hence, the extrinsic mean with
respect to a given embedding is almost always unique. On the other hand, since
there are many embedding possibilities, a given set of points can have many different
extrinsic means. Yet, there are some embeddings which have interesting properties
and bring about ease of calculations. The standard embedding of SO(n) in Rn?n
and the embedding of Gk(n) (as orthogonal projection matrices of rank k) in the
space of symmetric matrices are so. These two embeddings allow us to perform fast
geodesic midpoint calculation, as we shall see in the next two subsections.
6.7.1 Efficient Geodesic Interpolation and Extrapolation in SO(n)
The standard embedding of SO(n) in Rn?n is just the usual representation of
elements of SO(n) as n?n matrices. We also measure the distance between
x1,x2 ? SO(n) ? Rn?n by the Frobenius norm as bardblx1 ?x2bardblF. The projection
181
of x?Rn?n onto SO(n) is defined by
ProjSO(n)(x) = argminy?SO(n)bardblx?ybardbl2F. (6.42)
It is easy to check that, if the determinant of x is positive, then ProjO(n)(x) = UV T,
where x = U?V T is an SVD of x [43, p. 432]. Recall that an n?n matrix
can be written as x = P?, where P is positive semi-definite and ? (called the
right polar factor) is orthogonal [43, p. 413]. There is also the so-called left polar
decomposition for which x = QR, where the polar factor Q is orthogonal and R is
positive semi-definite. If x is non-singular, then the polar factors are unique and
also Q = ?. Moreover, if x is a normal matrix (i.e., xxT = xTx), then R = P. Since
x = U?UTUV T, we have ? = UV T, provided x is non-singular. With this brief
preparation we state the following theorem:
Theorem 6.7.1. Consider the standard embedding of SO(n) in Rn?n. Let x1,x2 ?
SO(n). We have the followings:
1. If ?EC(x1,x2) = x1+x22 = U?V T is an SVD of ?EC(x1,x2), then ?1
2
(x1,x2) =
?ex(x1,x2) = UV T. This operation produces a unique midpoint as long as x1
is not a cut point of x2. In particular, if d2(x1,x2) < ? the operation remains
valid.
2. The extrapolated point Ext1(x1,x2) = x1xT2 x1 is the reflection of x2 with
respect to (w.r.t) x1 along the minimizing geodesic connecting x2 tox1. Unless,
x1 and x2 are antipodal points (i.e., d2(x1,x2) = ?) the extrapolated point will
182
be different from x2.
Proof. Moakher in [67] has shown that ?ex(x1,x2) = ProjSO(n)(x1+x22 ) and e12 log(xT2 x1)
both are equal to the polar factor of x1+x22 . This holds if x1+x22 is nonsingular, in
which case the polar factor is equal to UV T, where U?V T is an SVD of x1+x22 .
It is easy to check that x1+x22 is nonsingular, and its determinant is positive if no
eigenvalue of xT2 x1 is ?1. This will be true if d2(I,xT2 x1) < ?. This proves item
1 of the theorem. The first part of item 2 follows from the fact that inversion (or
transpose) is an reflection isometry w.r.t to the identity. Unless the reflected point
and the identity lie on a closed geodesic and in an antipodal positions the reflection
will not coincide with the original point. This proves the second part of 2.
Corollary 6.7.1. The above interpolation and extrapolation operations are well-
defined if x1 and x2 lie in a strongly convex domain, in particular in a Finsler ball
of radius ? < ?2.
The above theorem helps us in constructing the geodesic between two points
or extrapolating the geodesic between them in an iterative process and without the
use of matrix logarithm and exponential. Table 6.1 gives the pseudo-codes for con-
struction of the geodesic between two points and extending the geodesic beyond
them based on midpoint calculation and reflection operation. For the interpolation
part, at each level, given a set of points on the geodesic, we generate new midpoints
and double the number of intermediate points this way. The extrapolation part is
nothing but a reflection operation, but it can be complemented by another interpo-
lation between the reflected point and the new point. Note that the direct numerical
183
Dyadic geodesics interpolation
1. Let M = SO(n) or M = Gk(n). Assume two non-antipodal points
x0,x1?M. Set i = 1 and fix integer K?1.
2. Level i: Given points{x0,...,x2i?1}, for 0?j ?2i?1?1, find xj+1
2
=
?1
2
(xj,xj+1) from item 1 of Theorem 6.7.1 or 1 of Theorem 6.7.2.
3. Re-index the 2i + 1 points to 0,...,2i?1, increment i. If i > K stop;
otherwise goto step 2.
Dyadic geodesics extrapolation
1. Let M = SO(n) or M = Gk(n). Consider x0,x1 ?M. Set i = 1 and fix
integer J ?1.
2. Level i: Given points {x0,...,xi?1,xi}, find xi+1 = Ext1(xi,xi?1) from
item 2 of Theorem 6.7.1 or item 2 of Theorem 6.7.2.
3. Increment i. If i > J stop; otherwise goto step 2.
Table 6.1: Pseudo-code for dyadic geodesic interpolation and extrapolation.
calculation of the form ?1
2
(x1,x2) = x1e12 logxT1 x2 compared with the SVD based mid-
point calculation in 1 has much higher complexity. Finding the matrix logarithm
has the heaviest load, although matrix exponential is also computationally demand-
ing. Moreover, in numerical calculation of logxT1 x2, there is no guarantee that the
result is skew-symmetric to machine precision and one might need to use special
algorithms for this purpose [23]. Without going into the details of the complexity
of the related numerical algorithms we give an example to compare the direct and
dyadic interpolation methods.
Example 6.7.1. We perform a comparison between the computational times for
direct and the dyadic geodesic construction techniques in MATLAB c?. Table 6.2
shows the main parts of the used MATLAB code. We try levels K = 1 and K = 2,
184
[U,S,V] = svd(x1+x2);\%Level 1
M11 = U*V?;
[U,S,V] = svd(M11+x1);
M21 = U*V?;
[U,S,V] = svd(M11+x2);
M22 = U*V?;
N = 2^K;%K=Level
X = logm(x1?*x2);
M = x1;
E = expm(1/N*X);
for i=1:N-1
M = M * E;% M = x1e iN logxT1 x2
end
Table 6.2: Left: Main part of a MATLAB code to compute the geodesic between
x1,x2 ?SO(n) at two levels K = 1 and K = 2 using dyadic efficient interpolation
(See Table 6.1. Right: Main part of a MATLAB code to compute the geodesic
between x1,x2?SO(n) at level K, by direct matrix logarithm-exponential calcula-
tions.
i.e., 1 and 3 intermediate points. In the direct method, instead of finding logxT1 x2
per computed point we just find E = e 12K logxT1 x2, and in the subsequent calculation
use it to make the method more efficient. Table 6.3 shows the results of the median
of the computation time for the two methods. It is clear that the dyadic method is
much faster. However, note that its efficiency decreases by the level or the number of
the generated points geodesic. As explained before, the efficiency of the algorithm
mostly stems from avoiding the use of matrix logarithm. At the same time the
direct method needs to find the logarithm only once; therefore, as K increases that
advantage of the dyadic method decreases. It is interesting to mention that the first
step in some of the most powerful methods for computing the logarithm of a matrix
A (including the method used in MATLAB) is the so-called ?inverse scaling and
squaring? [22], in which successive square roots of A, that is A 12i (i = 1,2,...), are
found. On the other hand, finding the geodesic midpoint involves finding e12 logxT1 x2,
which is nothing but the square rootofxT2 x1. Theretofore, it should not be surprising
that the dyadic method is at least 10 times faster than the direct method at Level
K = 1.
185
Level K = 1 Level K = 2
n dyadic direct
3 0.0001 0.0020
30 0.0029 0.0410
100 0.0470 0.4586
n dyadic direct
3 0.002 0.0020
30 0.0095 0.0596
100 0.1428 0.3963
Table 6.3: Comparison between computation times for two geodesic interpolations
MATLAB codes in Table 6.2. We try two levels K = 1 and K = 2. Times are in
Seconds and each is the median of 100 independent runs.
6.7.2 Efficient Geodesic Interpolation and Extrapolation in Gk(n)
We mentioned about the isometric identification ofPk,n the space of n?n orthog-
onal projection matrices of rank k and Gk(n) in Subsection 6.2.2.1. By standard
embedding ofPk,n inSn, the space of n?n positive semi-definite symmetric matri-
ces, we can define the extrinsic mean of{xi}Ni=1?Pk,n as ProjPk,n(?EC(x1,...,xN)),
as before. Here, ProjPk,n :Sn?Pk,n is defined as
ProjPk,n(x) = argminy?Snbardblx?ybardbl2F. (6.43)
The following lemma gives this projection operator explicitly:
Lemma 6.7.1. Let A ?Rn?n be symmetric and positive semi-definite. Then we
have ProjPk,n(A) = PA = UUT, where Un?(n?k) is a matrix whose columns are the
eigenvectors of A associated to its k largest eigenvalues. If the k largest eigenvalues
of A are strictly larger than the rest, then PA is unique.
Proof. We recall that any n?n orthogonal projection P of rank k, can be associated
to a k-dimensional subspace of Rn, S, and P can written as P = UPUTP where UP is
186
an n?k matrix whose columns are an orthonormal basis of S. Now let A = U?UT
be the full SVD or EVD of A, where ? = diag(?1,...,?n) with ?1 ?...??n ?0.
We want to find a P or a UP which minimizes
f(P) =bardblUPUTP ?Abardbl2F. (6.44)
Up to a additive and multiplicative constants f(P) is equal to?tr(UPUPA). So we
need to maximize maxUP tr(UPUTP A). Which is equivalent to solving
max
UP
tr(UTUP(UTUP)T?). (6.45)
Note that ?P = UTUP(UTUP)T again is a rank k orthogonal projection matrix. The
above function is equal to summationtexti ?Pii?i, where ?Pii is the ith element on the diagonal of ?P.
It is easy to check that any entry of an orthogonal projection matrix is not larger
than 1 in absolute value. Therefore, the maximum value summationtexti ?Pii?i can achieve is
summationtextk
i=1 ?i. This can happen if ?P =
bracketleftbigI
k?k 0
0 0
bracketrightbig. But by choosing U
P equal to the first k
columns of U we can achieve this particular ?P and that maximum. This completes
the proof.
Next, we have the following theorem:
Theorem 6.7.2. Let S1 and S2 be subspaces representations of s1,s2 in Gk(n), re-
spectively. Also let P1 = S1ST1 and P2 = S2ST2 denote the corresponding orthogonal
projection matrices associated with s1 and s2. We have:
1. The extrinsic meanofs1 ands2 (orP1 andP2) canbeexpressed as?ex(P1,P2) =
187
ProjPk,n(?EC(P1,P2)) = SST, where S is an n?k matrix of k eigenvectors
associated with the k largest eigenvalues of ?EC(P1,P2) = P1+P22 . Moreover,
SST is the orthogonal projection matrix associated with the midpoint of the
minimizing geodesic between s1 and s2 and S is a subspace representative of
the midpoint. The above projection gives a unique a point if d2(s1,s2) < ?2.
2. Denote the reflection of s2 w.r.t s1 along the minimizing geodesic connecting
them by Ext1(s1,s2). Then Ext1(s1,s2) has the subspace representation (2P1?
I)S2.
Proof. Without loss ofgenerality we canassume S1 = bracketleftbig Ik?k0(n?k)?k bracketrightbigandS2 = Q
bracketleftbiggC(?
1,...,?k)
S(?1,...,?k)
0(n?2k)?k
bracketrightbigg
,
where Q is of the form Q = bracketleftbigV 00 U bracketrightbig (see (6.16)). Note that P1 = bracketleftbigIk?k 00 0bracketrightbig and
P2 = Q
bracketleftbigg
C2(?1,...,?k) C(?1,...,?k)S(?1,...,?k) 0
C(?1,...,?k)S(?1,...,?k) S2(?1,...,?k) 0
0 0 0
bracketrightbigg
QT. (6.46)
Denote the matrix in the middle by ?P2(?1,...,?k). Note that if ?P = U?P2?UT?P
2
is an
SVD or EVD of P, then P2 = QU?P2?(QU?P2)T is an the SVD of P2. Moreover, for
SVD?s of ?EC = P1+P22 and E = P1+ ?P22 we have a similar relation, due to the special
forms of P1 and Q. Now, we compute an SVD of E. Note that since E is symmetric
and positive semi-definite its EVD and SVD coincide. We rewrite E more explicitly
as
E = 12
bracketleftbigg
diag(cos2 ?1+1,...,cos2 ?k+1) diag(cos?1 sin?1,...,cos?k sin?k) 0
diag(cos?1 sin?1,...,cos?k sin?k) diag(sin2 ?1,...,sin2 ?k) 0
0 0 0
bracketrightbigg
. (6.47)
Note that ?i appears in four entries of E at locations (i,i), (i,i + k), (i + k,i) and
188
(i+k,i +k). Those entries can be put in a matrix Ei of dimension 2?2 as:
Ei = 12
bracketleftbigg
1+cos2 ?i cos?i sin?i
cos?i sin?i sin2 ?i
bracketrightbigg
. (6.48)
It is easy to check that Ei has ?i,1 = cos2 ?i2 and ?i,2 = sin2 ?i2 as eigenvalues.
Note that since ?i < ?2, we have ?i,1 > ?i,2. In fact, for the same reason we have
cos2 ?i2 > 12 and sin2 ?i2 < 12; and hence, ?i,1 > ?j,2 for 1?i,j?k. The corresponding
eigenvectors can be found as vi,1 = bracketleftbigcos
?i
2
sin ?i2
bracketrightbig and v
i,2 =
bracketleftbig?sin ?i
2
cos ?i2
bracketrightbig. Now one can check
that E has 2k eigenvalues ?k,1 ? ... ? ?1,1 > ?1,2 ? ... ? ?k,2 (1 ? i ? k) and
the rest of its n?k eigenvalues are zero. The corresponding eigen matrix can be
constructed as UE = [?UE, ?U?E], where ?UE is of dimension n?2k and is defined as
?UE =
bracketleftbiggC(?1
2 ,...,
?k
2 ) ?S(
?1
2 ,...,
?k
2 )
S(?12 ,...,?k2 ) C(?12 ,...,?k2 )
0 0
bracketrightbigg
, (6.49)
and ?U?E is an n?(n?2k) matrix whose columns span the orthogonal complement
of the subspace spanned by the columns of ?UE. Also define the diagonal matrix ?
as
? = diag(cos2 ?1,...,cos2 ?1,sin2 ?1,...,sin2 ?k,0,...,0bracehtipupleft bracehtipdownrightbracehtipdownleft bracehtipupright
n?2k
). (6.50)
Note that E = UE?UTE is an SVD (or EVD) of E. By Lemma 6.7.1, the projection
of E onto the space of orthogonal projection matrices of dimension n?n and rank
k is PE = UUT where U is of dimension n?k and U =
bracketleftbiggC(?1
2 ,...,
?k
2 )
S(?12 ,...,?k2 )
0
bracketrightbigg
. This proves
the first statement. To prove item 2, we recall the reflection formula (6.19). The
geodesic reflection of S2 with respect to S1 is found by negating S(?1,...,?k) in
189
S2 = Q
bracketleftbiggC(?
1,...,?k)
S(?1,...,?k)
0(n?2k)?k
bracketrightbigg
. Note that
(2P1?In?n)S2 = Q
bracketleftbigg C(?
1,...,?k)
?S(?1,...,?k)
0(n?2k)?k
bracketrightbigg
. (6.51)
Therefore, (2P1?In?n)S2 spans the same subspace as S?2 (the geodesic reflection of
S2 with respect to S1).
6.8 Two Applications
There has been an increasing number of real-world applications of averaging or
finding the mean for manifold-valued data. Of course, most applications are related
to the cases where the underlying manifold is a homogenous space. Here, we consider
two less explored applications, where the ideas introduced in this chapter fit and
serve very well. The first one is the extension of the famous Nelder-Mead algorithm
to SO(n) for maximization of a non-smooth function. The second application is
the clustering of subspaces by extension of the K-means algorithm to Gk(n)1. Our
emphasis is on using the pairwise mean (instead of the Riemannian L2 or the MI
mean). Also for finding the pairwise mean and other related calculations we use the
efficient methods introduced before.
1The author envisioned these two applications in the early stages of this research. However,
later he found out that they had already been studied by other researchers. Papers [25, 63] and [37]
(and possibly few others) address the two mentioned applications, respectively. Nevertheless, the
treatment presented here is developed independently and differs from the ones in the mentioned
references.
190
6.8.1 Nelder-Mead Direct Search Algorithm in SO(n)
Direct search methods for optimization try to find a minimum of a function without
the use of derivative information [55]. They usually are very useful in cases where the
function is non-smooth, its derivative is expensive or impossible to compute or the
function is not a even numerical in its nature. A very popular direct search algorithm
known as Nelder-Mead simplex algorithm [70] can be easily extended from Rn to a
manifold and in fact it can be efficiently implemented in SO(n) or Gk(n) using the
methods introduced, before. Interestingly, the Nelder-Mead algorithm encompasses
all the operations we mentioned so far: finding the center of mass, geodesic midpoint
assignment and geodesic reflection of a point with respect to another one. We briefly
describe the algorithm and give an example of its use for solving a non-differentiable
maximization problem with applications in adaptive optics.
In Table 6.4 a pseudo code for the Nelder-Mead algorithm which is adapted to
a Riemannian manifold M is given. The adaptation of the original Euclidean version
to M is straightforward, and mainly is achieved by the obvious use of the exponential
map of M. The algorithm consists of very simple steps. Denote the dimension of
M by m. The algorithm will be given initial m+1 points considered as vertices of a
simplex. Then, through a sequence of function evaluations, numerical comparisons,
center of mass calculations and reflections the initial simplex will move, shrink or
expand, and typically by getting close to a minimum point the simplex starts to
shrink such that it finally will converge to that point. The stopping criterion in
Step 3, compares the size of the current simplex with a given threshold to stop. We
191
should mention that except for special cases there is no proof for convergence of
this algorithm. Nevertheless, the Nelder-Mead algorithm has remained a popular
tool among many practitioners in many different fields [55]. Here, we choose to use
the pairwise mean as the center of mass in Step 4. Note that, interestingly, all the
operations in Steps 4-8 can be calculated via pairwise operations. Also note that
the operation of expansion in Step 6 which gives xe = exp?x(?2exp?1?x xm+1) can be
viewed as the reflection of ?x with respect to xr which itself is the reflection of xm+1
with respect to ?x, and was computed in Step 5. Our concern, of course, should be
that the defined operations remain well-defined. For this also there is no guarantee,
unless we keep track of the points and force them to remain in valid distances, which
can be prohibitively expensive or impractical. Nevertheless, one hopes that in the
case of SO(n) or Gk(n) by starting in a large convex 2-norm Finsler ball the iterates
will stay there.
6.8.1.1 An Application in Adaptive Optics: Maximization of a
Non-differentiable Function in SO(n)
In [27], for applications in adaptive optics, maximization of the non-differentiable
cost function f : O(n)?R defined as
f(x) =
nsummationdisplay
i=1
max
1?j?N
(xCjxT)ii (6.52)
is proposed. Here, {Ci}Ni=1(N ? 2) is a set of symmetric matrices. Without loss
of generality we can assume that the domain of f is SO(n). The mentioned max-
192
Nelder-Mead Algorithm on a Riemannian Manifold
1. Let (M,d) be a complete Riemannian manifold of dimension m with
exponential map expx at x?M. Consider f : M ?R. Set ? > 0
2. Choose x1,...,xm+1 ?M.
3. if maxi,j d(xi,xj) < ?, then stop; else, reorder the points such that
f(x1)?...?f(xm+1) and goto step 4.
4. Find ?x the center of mass of{xi}mi=1.
5. (Reflection) Find xr the reflection of xm+1 w.r.t. ?x as xr =
exp?x(?exp?1?x xm+1)
(a) if f(x1)?f(xr) < f(xm), then xm+1 ??x and goto step 3.
6. (Expansion) if f(xr) < f(x1), then find xe the expansion of xm+1 w.r.t.
?x as xe = exp?x(?2exp?1?x xm+1).
(a) if f(xe) < f(x1), then set xm+1 ? xe and goto 3, else set
xm+1 ?xr and goto step 3
7. (Contraction) else find xc the contraction of xm+1 w.r.t. ?x as xc =
exp?x(12 exp?1?x xxm+1) = ?1
2
(?x,xm+1)
(a) if f(xc)?f(xm+1), then set xm+1 ?xc and goto 3
(b) else goto step 8
8. (Reduction) For i = 2,...,m+ 1 set xi??1
2
(xi,x1) and goto step 3
Table 6.4: Pseudo-code of the Nelder-Mead algorithm on a Riemannian manifold
M.
imization problem does not have a unique solution [27]. In the special case that
N = 2, C1 = C and C2 = 0n?n, the maximum value of f is equal to sum of the
positive eigenvalues of C, and certainly a solution in this case is the transpose of the
eigen-matrix of C [27]. For this special case we examine the Nelder-Mead algorithm
adapted to SO(n) to minimize?f. Let C be as follows (to 4 decimal digits)
C =
bracketleftbigg1.0133 0.7204 1.0904 0.1530
0.7204 ?0.4630 ?0.5926 1.4637
1.0904 ?0.5926 0.8661 ?0.9719
0.1530 1.4637 ?0.9719 ?0.6416
bracketrightbigg
. (6.53)
193
The eigenvalues of C are 2.2136,1.6085,?0.9703 and?2.0769. Therefore, the max-
imum of f which is equal to the sum of positive eigenvalues of C is 3.8220. To start
the Nelder-Mead algorithm, we need to choose a simplex with m+1 vertices, where
m is the dimension of M. Here, m = n(n+1)2 = 10. We generate m+1 = 11 random
points inside the Finsler ball B2(0, ?2) as the initial simplex. We use the pairwise
mean as the center of mass in Step 4 of the algorithm. For this step, reflection,
expansion, contraction and reduction steps (see Steps 5-8) we use efficient interpo-
lation and extrapolation schemes introduced in Theorem 6.7.1. Note that in the
reflection step 5 we have xr = ?xxTm+1?x, and in the expansion step xe is the reflection
of ?x with respect to xr; therefore xe = xr?xTxr, which can be computed efficiently
again. After almost 110 iterations the calculated value for maxf will be equal to
3.8220, in four decimal digits. Denote the solution in four decimal digits by x?. We
have x? and x?CxT? as follows:
x? =
bracketleftbigg 0.5325 ?0.5910 ?0.5908 ?0.1346
?0.7519 ?0.0551 ?0.6477 0.1102
0.3817 0.6164 ?0.4003 0.5605
?0.0734 ?0.5175 0.2670 0.8097
bracketrightbigg
and x?CxT? =
bracketleftbigg?1.0807 0.0000 0.0000 ?0.3316
0.0000 2.1020 0.2347 ?0.0000
0.0000 0.2347 1.7200 0.0000?0.3316 ?0.0000 0.0000 ?1.9665
bracketrightbigg
.
(6.54)
Note that with C1 = C as above and C2 = 04?4, we have f(x?) = 4.8311. The
convergence is slow in terms of the number of iterations. However, we should note
that the computation load for each step is not significant. To see the behavior of the
calculated maximizer in terms of iteration index, we have tabulated the approximate
maximum in 15 decimal digits in Table 6.5. The last row shows maxf in 15 decimal
digits, which is found using the (numerical) eigenvalues of C. We should note,
despite their slow convergence, in many practical problems direct search methods
194
iteration i computed maxf
1 2.99792831533123
10 3.40130355940283
20 3.74564184272491
30 3.77369724540551
40 3.81199282263081
50 3.81796650623473
..
.
..
.
110 3.822008220185086
..
.
..
.
180 3.82201461161657
190 3.82201461242503
200 3.82201461328261
15digit 3.82201461365772
Table 6.5: Computed maximum of the non-differentiable function f in (6.52) in
terms of the iteration index in the Nelder-Mead algorithm in SO(4).
including the Nelder-Mead algorithm are the only choices.
6.8.2 Clustering of Subspaces
Data clustering is an important task in many data and signal processing algorithms.
The standard clustering problem consists of clustering N data points x1,...,xN in
Rn to Nc clusters C1,...,CNc such that the members in cluster Ci are closer to
the center of Ci than to the center of any other clusters. By the center of Ci we
mean the Euclidean mean of the data points inCi. The center of each cluster can
be used as a representative of the cluster or its members in subsequent analysis.
A very popular algorithm to solve this problem is an iterative algorithm called
the K-means algorithm [39]. In occasions, we might encounter a situation where
our data points are linear subspaces themselves and we would like to cluster the
data. This problem can be named as Grassmannian Clustering [37]. Interestingly,
Grassmannian-valued data appear in different applications and clustering of such
data has many potential applications. Two interesting application are the pre-coder
codebook design [68] and channel quantization in Multi-Input Multi-Output wireless
195
communication channels [48, 54]. In [37] a Grassmannian K-means algorithm and
in [54] a similar K-means algorithm together with the related Vector Quantization
algorithm are proposed. These two algorithms use the extrinsic mean (see Section
6.7) for points on Gk(n), mostly due to the ensued ease of calculation. Here, we
implement a K-means algorithm with the pairwise mean, which is the easiest mean
to compute (among all the means we studied).
K-means Algorithm on a Riemannian Manifold
1. Let (M,d) be a complete Riemannian manifold. Consider a set of data
points{xi}Ni=1. Fix Nc?N.
2. Randomly partition{xi}Ni=1 into Nc non-empty clustersC1,...,CNc.
3. for i = 1,...,Nc, find ?xCi the center ofCi.
4. for i = 1,...,N, if d(xi, ?xCj)?d(xi, ?xCl)?lnegationslash= j then place xi inCj. if
more than one j satisfies that condition choose one of them randomly.
5. if one of the clusters is empty assign one data point to it randomly.
6. if in Step 4, none of the clusters have changed, then stop; else goto
Step 3.
Table 6.6: Pseudo-code for the K-means algorithm on a Riemannian Manifold M.
It is assumed that finding the center of mass is a well-defined operation.
6.8.2.1 An Example of Clustering of Subspaces With the K-means
Algorithm
In our example, we generate some data points in Gk(n) (k = 3 and n = 20) around
Nc = 3 centers, and then try to cluster the data points and recover the centers
via the K-means algorithm. First, we briefly describe the process to generate these
data points. Denote the centers by c1,c2,c3 ? Gk(n) and the associated subspace
representation by S1,S2,S3?Rn?k. We set d2(c1,c2) = ?24 = l2 and d2(c1,c3) = ?6 =
196
l3 and generate c2 and c3 with reference to c1; hence, we can set S1 =
bracketleftbigg
Ik?k
0
bracketrightbigg
. For
i = 2,3 we generate
Ci =
bracketleftbigg
0 ??CTi
?Ci 0
bracketrightbigg
, (6.55)
where the elements of ?Ci are of uniform distribution on the interval [0,1]. We then
normalize Ci as Ci ? CibardblCibardbl2li. Now Si is equal to the first k columns of eCi. Note
that C1 = 0. Next, we generate the actual data points around these centers. We
generate n1 = 6, n2 = 3 and n3 = 2 points around c1, c2 and c3, respectively, as
follows:
Xi,j = Ci +?
bracketleftbigg
0 ?WTi,j
Wi,j 0
bracketrightbigg
1?i?Nc, 1?j?ni, (6.56)
where Wi,j ?R(n?k)?k manifests the noise, and is of standard normal elements and
? is a constant controlling the noise power. We set Xij as the first k columns
of eXi,j. After shu?ing all these subspaces we relabel them as X1,...,XN, where
n = n1 + n2 + n3 = 11. The parameter ? is a variable that we shall change. First
we set ? = 0.05. We implement the K-means algorithm as in Table 6.6 and choose
the pairwise mean to find the center of mass of the clusters in Step 3. We perform
20 iterations to compute the pairwise mean. If the algorithm recovers the clusters
correctly, then either based on the number of points in each estimated cluster or
based on the distances d(ci,?cj) we can recover the label of each of center. We do
this relabeling and show the distances between the estimated centers of clusters and
the original centers in Table 6.7. The left panel in Table 6.7 shows the median of
the distances between the estimated centers and the original centers for T = 100
197
d(ci,?ci) ?c1 ?c2 ?c3
c1 0.0416 0.5657 0.4712
c2 0.5614 0.0392 0.4592
c3 0.4639 0.4613 0.0406
d(ci,?ci) ?c1 ?c2 ?c3
c1 0.1884 0.5268 0.3821
c2 0.5310 0.1885 0.4356
c3 0.3911 0.5160 0.1434
Table 6.7: Distances between the centers of the estimated clusters by the K-means
algorithm (on G3(20)) and the original centers in the example of Subsection 6.8.2.1.
Each number is the median of the results of 100 independent runs of the experiment.
The table on the left shows the distances for noise level ? = 0.05, and the one on
the right shows the distances for ? = 0.20. The ability of the K-means algorithm
to correctly detect the clusters reduces as ? increases.
independent runs. The distances are computed from (6.20). The centers are kept
fixed and the data around them are generated in each run. Next, we increase the
noise level ? to 0.20, and show the results of T = 100 runs in the right panel of
Table 6.7. Expectedly, the ability of the K-means algorithm to resolve the clusters
decreases by increasing the noise, and as can be seen d(ci,?ci) increases as ? increases.
198
Chapter 7
Conclusions and Future Research Directions
In this dissertation we studied some mostly theoretical aspects of the problem of
averaging on Riemannian manifolds. There are different ways of defining means of
data points on a Riemannian manifold, and we studied in detail two of them: the
Riemannian Lp mean (in Chapters 3 and 4) and the recursive-iterative means (in
Chapter 5). The most immediate question is that of existence and uniqueness of a
defined mean. In Theorem 3.2.1 one such a question is answered for Riemannian Lp
means. This result improves upon previous results such as the one in [61]. However,
there might be room for further improvement, e.g., in Theorem 3.2.1 can we get
rid of the specific shape of the domain that includes the support of the probability
measure (i.e., a ball)?
In Chapter 4, we studied the L? center of mass, smoothness and convexity
properties of the Riemannian Lp mean, and averaging or sensitivities properties of
the L2 mean. With regards to convexity there are still several open questions as
explained in Section 4.5 which especially might be of interest to pure geometers, as
the convex hull of finite points in a Riemannian manifold is still a mystery. Possible
199
practical implications of the non-averaging properties whcih we explained in Section
4.6 can be a subject for further research. The fact is that in a manifold of positive
curvature, our intuition about the notion of average or mean might turn out to be
wrong, and this issue deserves further experimental study. Also we introduced the
notion of weak averaging property which is a global notion, and in Theorem 4.6.1
we identified the radius of a ball containing the data points to guarantee the weak
averaging. We also conjecture that this radius can be improved (see Conjecture
4.6.1). We did not study in detail the computational methods for finding the Lp
mean. However, convergence analysis of gradient descent and other algorithms are
widely open questions. See [61] for a result which guarantees convergence of the
gradient descent for finding the L2 mean in symmetric spaces, provided the data
points are in a ball which is smaller than the one that is needed to have a unique
mean.
In Chapter 5 we studied a large class of recursive-iterative means which are
defined based on the notion of mean-invariance. Our analysis generalizes the theories
developed in [5] and [59]. The notion of a primitive vectorial mean function and
the use of the radius of the minimal ball as a Lyapunov function allowed us to
develop a theory, in Section 5.2, which parallels the Perron-Frobenius theory about
the infinite power of a primitive stochastic matrix. Of course, an important part of
the Perron-Frobenius theorem is the explicit description of the limit mean function,
which in a Riemannian manifold is very difficult describe. We also studied the
dynamical system point of view of recursive-iterative means. We also studied the
simple pairwise mean and its relation with Perimeter Shrinkage scheme and cyclic
200
pursuit on manifolds. An open problem in this regard is to design an algorithm
which brings a set of agents in cyclic pursuit to a closed geodesic in a stable fashion.
We also studied weighted MI and Generalized Weighted MI (GWMI) means which
can resemble the behavior of the Lp means. A main shortcoming of the recursive-
iterative means is that it is not easy to associate statistical notions to them. For
example, it is not obvious what the best way is to define a quantity associated with
the MI mean which parallels the variance in the case of the L2 mean.
In Chapter 6, we focused on results specific to SO(n) and Gk(n). These man-
ifolds appear naturally in many applications, and numerical computations on them
boil down to standard matrix calculations. Our major focus was on identifying 2-
norm Finsler balls of appropriate radius as domains of existence and uniqueness of
the Lp and MI means. Our efforts were not completely successful especially with
respect to the Lp mean. In particular, we were only able to show that in a 2-norm
ball of radius smaller than half of the largest strongly convex 2-norm Finsler ball,
the local (and not global) Lp mean exists uniquely (see Theorem 6.5.1). Also while
we could show that in the largest convex Finsler balls the MI and WMI means exist
uniquely, we were not able to extend this result to GWMI means (see Theorem 6.6.1
and remarks afterwards). As explained in the remarks of Theorem 6.6.1, we conjec-
ture that these results can beimproved. We also studied some computational aspects
of averaging on SO(n) and Gk(n). In particular, using some standard embeddings
of SO(n) and Gk(n) in Euclidean space and in the space of symmetric matrices,
respectively, we gave the formulas to find the midpoint of the minimizing geodesic
between two points without the use of the matrix exponential and logarithm. This
201
calculation can be used to efficiently find the MI and pairwise means. We also stud-
ied two simple examples: the Nelder-Mead direct search algorithm on SO(n) and
the K-means clustering algorithm on Gk(n) both of which have components that
can be implemented efficiently via the methods we introduced.
Besides the above mentioned open problems which are directly related to our
work, there is a major research direction that can be pursued. An important class of
problems arises when we know that a data set belongs to an unknown manifold, e.g.,
a low dimensional manifold and we still want to process this data set. Therefore,
one has to both learn the manifold and its exponential map, and perform data
analysis. Learning a manifold has close relations with approximating a manifold
with a graph (see e.g., [86, 57, 8]), and it would be very interesting to study existence
and uniqueness conditions and other properties of the means based on the quantities
associated with the graph of the data.
202
Bibliography
[1] U. Abresch and W. T. Meyer. Pinching below 14, injectivity radius, and conju-
gate radius. Journal of Differential Geometry, 40(3):643?691, 1994.
[2] P.-A. Absil, R. Mahony, and R. Sepulchre. Riemannian geometry of Grass-
mann manifolds with a view on algorithmic computation. Acta Applicandae
Mathematicae, 80(2):199?220, January 2004.
[3] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix
Manifolds. Princeton University Press, Princeton, NJ, 2008.
[4] J. C. Alvarez and C. Duran. An introduction to Finsler geometry.
http://www.math.poly.edu/research/finsler/intro/one.html.
[5] T. Ando, C. K. Li, and R. Mathias. Geometric means. Linear Algebra and its
Applications, 385:305?334, 2004.
[6] V. Arsigny. Processing Data in Lie Groups: An Algebraic Approach. Applica-
tion to Non-Linear Registration and Diffusion Tensor MRI. PhD thesis, ?Ecole
Polytechnique, Nov. 2006.
[7] E. Begelfor and M. Werman. Affine invariance revisited. In IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, volume 2,
pages 2087?2094, 2006.
[8] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and
data representation. Neural Computation, 15(6):13731396, 2003.
[9] M. Berger. A Panoramic View of Riemannian Geometry. Springer, 2007.
[10] A. Bhattacharya. Nonparametric Statistics On Manifolds With Applications To
Shape Spaces. PhD thesis, The University of Arizona, Nov. 2008.
[11] R. Bhattacharya and V. Patrangenaru. Large sample theory of intrinsic and
extrinsic sample means on manifolds. I. The Annals of Statistics, 31(1), 1-29
2003.
203
[12] D. Le Bihan, J.-F. Mangin, C. Poupon, C. A. Clark, S. Pappata, N. Molko, and
H. Chabriat. Diffusion tensor imaging: Concepts and applications. Journal of
Magnetic Resonance Imaging, 13:534?546, 2001.
[13] A. Blochand I. Iserles. Commutators of skew-symmetric matrices. International
Journal of Bifurcation and Chaos, 15:793?801, 2005.
[14] A. M. Bruckstein. Why the ant trails look so straight and nice. The Mathe-
matical Intelligencer, 15(2):59?62, June 1993.
[15] D. Burago, Y. Burago, and S. Ivanov. A Course in Metric Geometry, volume 33
of Graduate Studies in Mathematics. American Mathematical Society, 2001.
[16] P. Buser and H. Karcher. Gromov?s almost flat manifolds. Soci?et?e
Math?ematique de France, 1981.
[17] S.R. Buss and J.P. Fillmore. Spherical averages and application to spherical
splines and interpolation. ACM Transcations on Graphics, 20(2):95?126, April
2001.
[18] I. Chavel. Riemannian Geometry: A Modern Introduction. Cambridge Univer-
sity Press, second edition edition, 2006.
[19] J. Cheeger and D. Gromoll. On the structure of complete manifolds of nonneg-
ative curvature. Annals of Mathematics, 96(3):413?443, November 1972.
[20] F. H. Clarke, Yu. S. Ledyaev, R.J. Stern, and P. R. Wolenski. Nonsmooth
Analysis and Control Theory. Graduate Texts in Mathematics. Springer, 1998.
[21] T. H. Colding and W. P. Minicozzi II. Width and mean curvature flow, July
2007. arXiv0705.3827C.
[22] P. I. Davies and N. J. Higham. A Schur-Parlett algorithm for computing matrix
functions. SIAM Journal on Matrix Analysis and Applications, 25(2):464?485,
2003.
[23] L. Dieci. Considerations on computing real logarithms of matrices, Hamiltonian
logarithms, and skew-symmetric logarithms. Linear Algebra and its Applica-
tions, 244:35?54, 1996.
[24] M. P. do Carmo. Riemannian Geometry. Birkhauser, 1992.
[25] D. W. Dreisigmeyer. Direct search algorithms over Riemannian manifolds,
December 2006. Available at http://ddma.lanl.gov/Documents/publications
/dreisigm-2007-direct.pdf.
[26] A. Edelman, T. A. Arias, and S. Smith. The geometry of algorithms with
orthogonality constraints. SIAM Journal on Matrix Analysis and Applications,
20(2):303?353, 1998.
204
[27] B. L. Ellerbroek, C. Van Loan, N. P. Pitsianis, and R. J. Plemmons. Optimiz-
ing closed-loop adaptive-optics performance with use of multiple control band-
widths. Journal of Optical Society of America A, 11(11):2871?2886, November
1991.
[28] N. I. Fisher, T. Lewis, and B. J. J. Embleton. Statistical Analysis of Spherical
Data. Cambridge University Press, 1987.
[29] P.T. Fletcher, L. Conglin, S.M. Pizer, and S. Joshi. Principal geodesic analysis
for the study of nonlinear statistics of shape. IEEE Transactions on Medical
Imaging, 23(8):995?1005, Aug 2004.
[30] K. A. Gallivan, A. Srivastava, X. Liu, and P. Van Dooren. Efficient algorithms
for inferences on Grassmann manifolds. In Proc. 12th IEEE Workshop on
Statistical Signal Processing, St. Louis, October 2003.
[31] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins Uni-
versity Press, 1996.
[32] D. Groisser. Newton?s methods, zeros of vector fields, and the Riemannian
center of mass. Advances in Applied Mathematics, 33, Nov 2004.
[33] K. Grove. Riemannian Geometry: A Metric Entrance. Number 65 in Lecture
Note Series. Department of Mathematics, University of Aarhus, 2nd edition,
Sept. 2002.
[34] K. Grove and H. Karcher. How to conjugate C1-close group actions? Mathe-
matische Zeitschrift, 132(1):11?20, March 1973.
[35] K. Grove, H. Karcher, and E. A. Ruh. Groupactions andcurvature. Inventiones
math, 23:31?48, 1974.
[36] K. Grove, H. Karcher, and E.A. Ruh. Jacobi fields and Finsler metrics on
compact Lie groups with an application to differentiable pinching problems.
Math. Ann., 211:7?21, 1974.
[37] P. Guber and F. J. Theis. Grassmann clustering. In Proceedings of the European
Signal Processing Conference(EUSIPCO), 2006.
[38] B. C. Hall. Lie Groups, Lie Algebras, and Representations. Graduate Texts in
Mathematics. Springer, 2003.
[39] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning.
Springer Series in Statistics. Springer, 2001.
[40] S. Helgason. Differential Geometry and Symmetric Spaces. Academic Press,
1962.
[41] U. Helmke, K. Hper, and J. Trumpf. Newton?s method on Gra?mann manifolds,
2007. arXiv:0709.2205v2 [math.OC].
205
[42] U. Helmke and J.B. Moore. Optimization and Dynamical Systems. Springer,
2nd edition, 1996.
[43] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press,
1985.
[44] M. Humbert, N. Gey, J. Muller, and C. Esling. Determination of a mean
orientation from a cloud of orientations. application to electron back-scattering
pattern measurements. Journal of Applied Crystallography, 29:662?666, 1996.
[45] A. Iserles, H.Z. Munthe-Kaas, S.P. Norsett, and A. Zanna. Lie-group methods.
Acta Numerica, 9:215?365, 2000.
[46] J. Jost. Riemannian Geometry and Geometric Analysis. Universitext. SPringer,
3rd edition, 2002.
[47] J. Jost and Y.L. Xin. Bernstein type theorems for higher codimension. Journal
Calculus of Variations and Partial Differential Equations, 9(4):277?296, 1999.
[48] R. W. Heath Jr. Communication and signal processing on the
Grassmann manifold. 9th IEEE Workshop on Signal Process-
ing for Wireless Communications 2008 Plenary Talk, Available at:
http://spawc2008.org/Plenaries/SPAWC2008 Heath.pdf, July 2008.
[49] E.W. Justh and P.S. Krishnaprasad. Equilibria and steering laws for planar
formations. Systems & Control Letters, 52:25?38, 2004.
[50] H. Karcher. Riemannian center of mass and mollifier smoothing. Communica-
tions of Pure and Applied Mathematics, XXX:509?541, 1977.
[51] H. Karcher. Global Differential Geometry, volume 27 of MAA Studies in Math-
ematics, chapter Riemannian Comparison Constructions, pages 170?222. The
Mathematical Association of America, 1989.
[52] W. S. Kendall. Probability, convexity, and harmonic maps with small image I:
Uniqueness and fine existence. Proceedings of the London Mathematical Society,
61(2):371?406, 2 1990.
[53] H. K. Khalil. Nonlinear Systems. Prentice Hall, third edition, 2002.
[54] J. H. Kim, W. Zirwas, and M. Haardt. Efficient feedback via subspace-based
channel quantization for distributed cooperative antenna systems with tempo-
rally correlated channels. EURASIP Journal on Advances in Signal Process,
January 2008. Article No. 131.
[55] T. G. Kolda, R. M. Lewis, and Virginia Torczon. Optimization by direct
search: New perspectives on some classical and modern methods. SIAM Re-
view, 45(3):385?482, 2003.
206
[56] K. A. Krakowski. Geometrical Methods of Inference. PhD thesis, The University
of Western Australia, Aug. 2002.
[57] S. Lafon and A. B. Lee. Diffusion maps and coarse-graining: A unified frame-
work for dimensionality reduction, graph partitioning, and data set parame-
terization. IEEE Transactions on Pattern Analysis and Machine Intelligence,
28(9):1393?1403, Sept. 2006.
[58] S. Lang. Fundamentals of Differential Geometry. Graduate Texts in Mathe-
matics. Springer, 1999.
[59] J. Lawson and Y. Lim. A general framework for extending means to higher
orders, 2006. arXiv.org:math/0612293.
[60] M. Lazard and J. Tits. Domaines d?injectivite de l?application exponentielle.
Toplogy, 4:315?322, 1966.
[61] H. Le. Locating Fr?echet means with application to shape spaces. Advances in
Applied Probability, 33(2):324?338, July 2001.
[62] H. Le and D. G. Kendall. The Riemannian structure of Euclidean shape spaces:
A novel environment for statistics. Annals of Statistics, 21(3):1225?1271, 1993.
[63] S. Li, M. Choi, H. Kim, and F.C. Park. Geometric direct search algorithms
for image registration. IEEE Transactions on Image Processing, 16(9):2215 ?
2224, Sept. 2007.
[64] D. J. Love, R. W. Heath Jr., and T. Strohmer. Grassmannian beamforming
for Multiple-Input Multiple-Output wireless systems. IEEE Transactions on
Information Theory, 49(10):2735?2747, Oct. 2003.
[65] K. V. Mardia and P. E. Jupp. Directional Statistics. Wiley, 2000.
[66] W. Meyer. Toponogov?s theorem and applications. Lecture notes for the College
on Differential Geometry at Trietse, 1989. Available at http://wwwmath.uni-
muenster.de/u/meyer/publications/toponogov.html.
[67] M. Moakher. Means and averaging in the group of rotations. SIAM Journal
on Matrix Analysis and Applications, 24(1):1?16, 2002.
[68] B. Mondal, R. W. Heath Jr., and L. Hanlen. Quantization on the Grassmann
manifold: Applications to precoded MIMO wireless systems. IEEE Transac-
tions on Signal Processing, 5(8):1025?1028, March 2005.
[69] L. Moreau. Stability of multiagent systems with time-dependent communi-
cation links. IEEE Transactions on Automatic Control, 50(2):169?182, Feb.
2005.
[70] J. A. Nelder and R. Mead. A simplex method for function minimization. The
Computer Journal, 7(4):308?313, 1965.
207
[71] R. Olfati-Saber, J. A. Fax, and R. M. Murray. Consensus and cooperation
in networked multi-agent systems. Proceedings of the IEEE, 95(1):215?233,
January 2007.
[72] J.C. Alvarez Paiva and A.C. Thompson. Volumes on Normed and Finsler
Spaces, volume 50 of Riemann-Finsler Geometry. MSRI, 2004.
[73] D. A. Paley. Stabilization of collective motion on a sphere. Automatica 45,
45:212?216, 2009.
[74] F. C. Park and B. Ravani. Smooth invariant interpolation of rotations. ACM
Transactions on Graphics, 16(3):277?295, July 1997.
[75] X. Pennec. Statistical Computing on Manifolds for Computational Anatomy.
Habilitation `a diriger des recherches, Universit?e Nice Sophia-Antipolis, Decem-
ber 2006.
[76] P. Petersen. Riemannian Geometry. Springer, 2006.
[77] D. Petz. Means of positive matrices: Geometry and a conjecture. Annales
Mathematicae et Informaticae, 32:129?139, 2005.
[78] T. Sakai. Riemannian Geometry, volume 149. American Mathematical Society,
1996.
[79] A. Sarlette and R. Sepulchre. Consensus optimization on manifolds. SIAM
Journal on Control and Optimization, 48(1):56?76, 2009.
[80] A. Sarlette, R. Sepulchre, and N.E. Leonard. Autonomous rigid body attitude
synchronization. In Proceedings of the 46th IEEE Conference on Decision and
Control, New Orleans, LA (USA), pages 2566?2571, December 2007.
[81] A. Sarlette, R. Sepulchre, and N.E. Leonard. Cooperative attitude synchro-
nization in satellite swarms: A consensus approach. In Proceedings of the 17th
IFAC Symposium on Automatic Control in Aerospace, Toulouse (France), June
2007.
[82] R. Schneider. Convex Bodies: The Brunn-Minkowski Theory. Cambridge Uni-
versity Press, 1993.
[83] K. Shoemake. Animating rotation with quaternion curves. In ACM SIG-
GRAPH, volume 19, pages 245?254, 1985.
[84] S. T. Smith. Geometric optimization methods for adaptive filtering. PhD thesis,
Harvard University, May 1993.
[85] A. Srivastava. A Bayesian approach to geometric subspace estimation. IEEE
Transactions on Signal Processing, 48(5):1390?1400, May 2000.
208
[86] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for
nonlinear dimensionality reduction. Science, 290:23192323, 2000.
[87] P. Turaga, A. Veeraraghavan, and R. Chellappa. Statistical analysis on Stiefel
and Grassmann manifolds with applications in computer vision. In IEEE con-
ference on Computer Vision and Pattern Recognition (CVPR), pages 1?8, June
2008.
[88] J. Wallner and N. Dyn. Convergence and C1 analysis of subdivision schemes
on manifolds by proximity. Computer Aided Geometric Design, 22(7):593?622,
2005.
[89] A. Weinstein. Almost invariant submanifolds for compact group actions. Jour-
nal of the European Mathematical Society, 2(1):53?86, March 2000.
[90] J. Wolfowitz. Products of indecomposable, aperiodic, stochastic matrices. Pro-
ceedings of the American Mathematical Society, 15:733?736, 1963.
[91] Y. C. Wong. Differential geometry of grassmann manifolds. Proceedings of the
National Academy of Sciences of the United States of America, 57(3):589?594,
March 1967.
[92] M. Zefran and V. Kumar. Planning smooth motions on SE(3). In IEEE Inter-
national Conference on Robotic Automation, pages 121?126, Minneapolis, MN,
April 1996.
209