ABSTRACT

Title of dissertation: MEANS AND AVERAGING ON RIEMANNIAN MANIFOLDS
Bijan Afsari, Doctor of Philosophy, 2009
Dissertation directed by: Professor P. S. Krishnaprasad, Department of Electrical and Computer Engineering, and Program of Applied Mathematics and Scientific Computations; Professor K. Grove, Department of Mathematics

Processing of manifold-valued data has received considerable attention in recent years. Standard data processing methods are not adequate for such data. Among the many related data processing tasks, finding means or averages of manifold-valued data is a basic and important one. Although means on Riemannian manifolds have a long history, there are still many unanswered theoretical questions about them, some of which we try to answer. We focus on two classes of means: the Riemannian Lp mean and the recursive-iterative means. The Riemannian Lp mean is defined as the solution(s) of a minimization problem, while the recursive-iterative means are defined based on the notion of Mean-Invariance (MI) in a recursive and iterative process. We give a new existence and uniqueness result for the Riemannian Lp mean. The significant consequence is that it shows the local and global definitions of the Riemannian Lp mean coincide under an uncompromised condition which guarantees the uniqueness of the local mean. We also study smoothness, isometry compatibility, convexity and noise sensitivity properties of the Lp mean. In particular, we argue that positive sectional curvature of a manifold can cause high sensitivity to noise for the L2 mean, which might lead to a non-averaging behavior of that mean. We show that the L2 mean on a manifold of positive curvature can have an averaging property in a weak sense. We introduce the notion of MI and study a large class of recursive-iterative means. MI means are related to an interesting class of dynamical systems that can find Riemannian convex combinations. A special class of the MI means called the pairwise mean, which through an iterative scheme called Perimeter Shrinkage is related to cyclic pursuit on manifolds, is also studied. Finally, we derive results specific to the special orthogonal group and the Grassmannian manifold, as these manifolds appear naturally in many applications. We distinguish the 2-norm Finsler balls of appropriate radius in these manifolds as domains for existence and uniqueness of the studied means. We also introduce some efficient numerical methods to perform the related calculations in the specified manifolds.

MEANS AND AVERAGING ON RIEMANNIAN MANIFOLDS

by Bijan Afsari

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2009

Advisory Committee: Professor P. S. Krishnaprasad, Co-Chair and Advisor; Professor Karsten Grove, Co-Chair and Co-Advisor; Professor Pamela Abshire (Dean's Representative); Professor Ramalingam Chellappa; Professor Ricardo Nochetto

© Copyright by Bijan Afsari, 2009

Dedication

This dissertation is dedicated to my parents.

Acknowledgments

I feel privileged to have had the opportunity to learn from two fine scientists: Professors P. S. Krishnaprasad and Karsten Grove. Prof. Krishnaprasad introduced me to geometry and geometrical thinking in studying many applied problems. His methodology of thinking will have a lasting impact on me.
I am also thankful for his support of all kinds during the past several years, as well as for his patience in dealing with a not-so-patient student like me. I have learnt a lot about Riemannian geometry from Prof. Grove, and I have become an admirer of his thinking style. I certainly enjoyed my trips to the University of Notre Dame and my discussions with him there. I have to thank my advisors for giving me the freedom and independence to think about problems that I found interesting. As far as this dissertation is concerned, realistically, after my advisors the next entity to thank is the Google search engine, for obvious reasons! I should also thank my Ph.D. committee members for their insightful comments. I cannot forget the help and care from the people in the math department, especially Alverda McCoy and Celeste Regaldo. Another group of people whom I have to thank is the kind people of the International Education Services (IES) office at UMD. Throughout my undergraduate studies at Sharif University, and my graduate studies at Tehran Polytechnic and the University of Maryland, I have benefited from attending many classes taught by excellent teachers. Among them I would like to thank Professors H. Amindavar, M. Dehghan, M. Freidlin, H. Hashemi, M. Jahanbegloo, M. Machedon, D. Madan, R. Nochetto, H. Sheikhzadeh, and A. Tits. My list of friends who made my graduate life almost always enjoyable is too long; I thank them all. Fortunately, the list of those who directly influenced this dissertation is short: I thank Arash for his help with LaTeX and Pedram for discussions about consensus algorithms. I also thank Ali Hirsa for several reasons. I thank my family for their constant support. I cannot imagine finishing this dissertation without the help from my brother Bahman.

This research was supported in part by the Army Research Office under the ODDR&E MURI01 Program Grant No. DAAD19-01-1-0465 to the Center for Communicating Networked Control Systems (through Boston University), by the NSF-NIH Collaborative Research in Computational Neuroscience Program (CRCNS2004) NIH-NIBIB grant 1 R01 EB004750-1, and by NSF grants DMS-0204671 and DMS-0706791.

Table of Contents

List of Tables
List of Figures
1 Introduction
  1.1 Scope and Main Contributions of this Dissertation
  1.2 Outline of this Dissertation
2 Preliminaries from Riemannian Geometry
  2.1 Introduction and Notations
    2.1.1 Some Notational Conventions
  2.2 First Variation of Arc Length
  2.3 Structure of Convex Sets in Riemannian Manifolds
  2.4 Bounds on the Hessian of the Distance Function and Its Powers
    2.4.1 A Useful Length Comparison Estimate
  2.5 Cartan-Alexandrov-Toponogov (C.A.T.) Comparison Theorems
3 Existence and Uniqueness of the Riemannian Lp Mean
  3.1 Introduction
    3.1.1 On the History of the Riemannian Center of Mass
    3.1.2 Contributions and Outline of the Chapter
  3.2 Existence and Uniqueness of the Riemannian Lp Means
    3.2.1 Proof of Theorem 3.2.1
    3.2.2 The Case of an Arbitrary Probability Measure in M
  3.3 Some Comments
4 Properties of the Riemannian Lp Mean
  4.1 Introduction
    4.1.1 Contributions and Outline of the Chapter
  4.2 The L∞ Center of Mass and the Minimal Ball
    4.2.1 Lipschitz Continuity of f∞(x)
  4.3 Smoothness Properties of the Lp Means
  4.4 Isometry Compatibility
  4.5 Convexity Properties of the Lp Means
    4.5.1 A Refined Result for Constant Curvature Manifolds
  4.6 Sensitivity Properties of the L2 Mean
    4.6.1 Lack of Averaging Property in Positively Curved Manifolds
    4.6.2 Weak Averaging Property in Positively Curved Manifolds
    4.6.3 A General Bound for Sensitivity
  4.7 Computing the Lp Means and an Example for Points on a Sphere
    4.7.1 Gradient Descent Flow for Finding x̄p (1 < p < ∞)

2.2 First Variation of Arc Length

Let q1, q2 ∈ M and let c(t) be a smooth curve with c(0) = q2 whose initial velocity makes an angle θ < π/2 with a minimizing geodesic from q2 to q1. Then there exists an ε > 0 such that for all 0 < t < ε, d(q1,c(t)) < d(q1,q2). The angle condition ensures that the arc length functional for curves connecting q1 to c(t) is strictly decreasing for small t > 0. Figure 2.1 explains the proposition. We will use this result on a few occasions.

Figure 2.1: The first variation of arc length formula implies that, provided θ < π/2, by moving along c(t) the distance to q1 decreases at least for small times.

2.3 Structure of Convex Sets in Riemannian Manifolds

In order to study means in a Riemannian manifold M in the subsequent chapters, we need a good understanding of the structure of convex sets and balls in a Riemannian manifold. Theorems 2.3.1 and 2.3.2 below contain the main facts we need. We also prove some simple but useful corollaries of these theorems. For the sake of completeness we recall some definitions first.

Definition 2.3.1 ([78, p. 168]). Let C be a non-empty subset of a Riemannian manifold M.

1. C is strongly convex if for any two points q1, q2 ∈ C there exists a unique minimizing geodesic γ in M connecting q1 and q2, and γ is contained in C.

2. If for any q ∈ C̄, the closure of C, there exists an ε(q) > 0 such that C ∩ B(q,ε(q)) is strongly convex, then C is said to be locally convex.

If C is strongly convex, then it is connected and locally convex. However, local convexity does not imply strong convexity. For example, on the unit sphere in R³, a closed hemisphere is locally convex but it is not strongly convex, since any two antipodal points on the boundary can be connected by two minimizing geodesics which lie entirely in the hemisphere. Small enough balls are strongly convex, and the injectivity radius and an upper bound on the sectional curvature of M determine a lower bound on the radius of the largest strongly convex balls ([76, p. 177] and [18, p. 404]):

Theorem 2.3.1. Let M be a complete Riemannian manifold with injectivity radius injM and sectional curvature upper bound Δ. Let

rcx := (1/2) min{injM, π/√Δ},   (2.1)

where we interpret π/√Δ as +∞ when Δ ≤ 0.¹ Then for any o ∈ M the open ball B(o,ρ) with ρ ≤ rcx and the closed ball B̄(o,ρ) with ρ < rcx are strongly convex, and the distance function x ↦ d(o,x) is strictly convex along non-radial directions.²

(¹ We shall follow this convention in all similar bounds we derive. ² Recall that the exponential map expo : ToM → M is a radial isometry; therefore d(o,γ(1/2)) = (d(o,γ(0)) + d(o,γ(1)))/2 for a geodesic γ : [0,1] → M connecting o to any x = γ(1) inside the injectivity ball of o, so strict convexity can only hold along non-radial directions.)

This theorem and the definition of rcx are very important to us, as Theorem 3.2.1, which describes existence and uniqueness conditions, concerns strongly convex balls.
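To make (2.1) concrete, here is a minimal sketch (the helper name convexity_radius is ours, not the dissertation's) that evaluates rcx from an injectivity radius and a curvature upper bound, reading π/√Δ as +∞ when Δ ≤ 0:

```python
import math

def convexity_radius(inj_M: float, Delta: float) -> float:
    """Radius bound r_cx = (1/2) * min(injM, pi/sqrt(Delta)) of Theorem 2.3.1.

    Balls of radius < r_cx are strongly convex.  For Delta <= 0 the term
    pi/sqrt(Delta) is interpreted as +infinity, so only injM matters.
    """
    curv_term = math.pi / math.sqrt(Delta) if Delta > 0 else math.inf
    return 0.5 * min(inj_M, curv_term)

# Unit sphere S^2: injectivity radius pi, curvature 1  ->  r_cx = pi/2,
# consistent with open balls smaller than a hemisphere being strongly convex.
print(convexity_radius(math.pi, 1.0))   # 1.5707...
```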
We shall use the following simple corollary:

Corollary 2.3.1. A closed ball of radius ρ < rcx is strongly convex, and the interior of the geodesic segment connecting any two points on the boundary lies in the interior of the ball.

Proof. Since B̄(o,ρ) ⊂ B(o,ρ′) for ρ < ρ′ < rcx, and since x ↦ d(o,x) is strictly convex in B(o,ρ′) along non-radial directions, it follows that if q1 ≠ q2 are two points on the boundary of B̄(o,ρ) and γ : [0,1] → M is a geodesic connecting them (with γ(0) = q1, γ(1) = q2), then d(γ(t),o) < ρ for t ∈ (0,1).

One can show that the intersection of any family of strongly convex (locally convex) sets in a complete Riemannian manifold is strongly convex (locally convex). We are mainly interested in strongly convex sets, so we define:

Definition 2.3.2. Let M be a complete Riemannian manifold, and let A ⊂ M lie in a strongly convex subset of M. The convex hull of A, denoted by ConvHull(A), is the intersection of all strongly convex sets containing A.

Definition 2.3.3 ([76, p. 145]). A submanifold S of the complete Riemannian manifold M is called totally geodesic if geodesics in S are also geodesics in M and any geodesic in M which is tangent to S at some point lies in S for some time. S is globally totally geodesic if such a geodesic stays in S for all time.

According to this definition, a strongly convex open ball in M is a totally geodesic submanifold. This holds true for more general convex sets, as the following theorem suggests. We state some relevant results from [19].

Theorem 2.3.2 (Cheeger and Gromoll). Let C be a connected locally convex subset of an n-dimensional Riemannian manifold M. Then the closure C̄ of C carries the structure of an embedded k-dimensional (0 ≤ k ≤ n) topological submanifold N of M with smooth totally geodesic interior intC and (possibly non-smooth or empty) boundary ∂C. For each x ∈ C̄ define the tangent cone

Cx = {X ∈ TxM | expx(t X/‖X‖) ∈ N for some positive t} ∪ {0 ∈ TxM}.   (2.2)

Then Cx\{0} is a k-dimensional open convex cone in TxM. In case x ∈ intC, Cx = Tx intC. In particular, if there is a point q ∈ intC and a normal minimal geodesic γ : [0,d(x,q)] → C̄ from x ∈ ∂C to q such that the distance between q and ∂C is equal to d(x,q), then Cx\{0} is a half-space. Moreover, if q ∈ intC and γ : [0,d(x,q)] → M is the normal minimizing geodesic connecting x ∈ ∂C to q, then γ((0,d(x,q)]) ⊂ intC.

The term "interior" here is not meant in its topological meaning; for example, a line segment in R³ has no topological interior, yet it is closed and convex in R³. It is used in the same spirit as the "relative interior" or "interior" of a convex set in Euclidean space (see Appendix 4.8 for definitions). Therefore, we have the following useful corollary:

Corollary 2.3.2. Let x be a point on the boundary of B = B(o,ρ) ⊂ M with ρ < rcx. Then Bx\{0} ⊂ TxM is an open half-space, and there exists a vector nx ∈ Bx pointing inward B such that for any point q ∈ B the (velocity vector of the) minimal geodesic connecting x to q makes an angle less than π/2 with nx.

Proof. Note that since Bx\{0} is an open convex set it must have a supporting hyperplane at 0 ∈ TxM, and there should be a vector nx ∈
TxM pointing inward the cone which makes an angle less than π/2 with all vectors pointing inside the cone Bx\{0} (see Appendix 4.8).

The convex hull of a set is not necessarily closed. Moreover, in contrast to the Euclidean case, it is unknown whether the convex hull of a finite set of points in a Riemannian manifold of non-constant curvature is closed [9, p. 231]. Therefore, when dealing with points on the boundary of the convex hull of a set we need to make sure that the closure of the convex hull is strongly convex.

Proposition 2.3.1. Let C ⊂ M be a strongly convex set such that C̄ belongs to another bounded strongly convex set U. Then C̄ is strongly convex. Therefore, if A ⊂ M is such that the closure of its convex hull, ConvHull(A), belongs to a bounded strongly convex set, then that closure is strongly convex.

Proof. It suffices to show that the unique minimal geodesic segment connecting q1, q2 ∈ ∂C belongs to C̄. Note that C̄ is compact. Let (q1ᵏ)ₖ and (q2ᵏ)ₖ be two sequences in intC with limits q1 and q2, respectively. Consider constant speed minimizing geodesics γₖ : [0,1] → intC such that γₖ(0) = q1ᵏ and γₖ(1) = q2ᵏ. Note that the speed of γₖ is equal to d(q1ᵏ,q2ᵏ). Therefore, the speeds of these geodesics are uniformly bounded, and hence the sequence (γₖ) is equicontinuous on C̄. By the Arzelà–Ascoli theorem we can extract a subsequence which converges uniformly to a continuous curve γ : [0,1] → C̄. We know from the theory of length spaces (see e.g. [15, Proposition 2.5.17, p. 48]) that γ is a shortest continuous curve connecting q1 and q2 in U. Therefore, due to strong convexity of U, γ must be the unique minimizing geodesic segment connecting q1 and q2.

Note that on the unit sphere S² ⊂ R³ an open hemisphere is the largest strongly convex set, and its closure is not strongly convex. Therefore, the condition that C̄ belong to another strongly convex set is necessary to guarantee the conclusion.

2.4 Bounds on the Hessian of the Distance Function and Its Powers

It is obvious that the convexity properties of the Riemannian distance function are closely related to the existence and uniqueness properties of the Riemannian Lp means. Therefore, we review the bounds on the Hessian of the distance function and its powers. The gist of these comparison results is that, in an appropriate comparison setting, as curvature increases the Hessian of the distance function (or any power of it) decreases. Recall that the Hessian of f : M → R at x ∈ M is the symmetric bilinear form Hess f|x : TxM × TxM → R such that Hess f|x(γ̇(t),γ̇(t)) = (d²/dt²) f(γ(t)), where γ is any geodesic passing through x at time t. The treatment presented here is a standard one and is borrowed collectively from [78, 50]. Figure 2.2 shows the framework.

Let q ∈ M be a given point and x ↦ d(q,x) the distance function from that point. For convenience, we also denote this function by dq(x). Consider a curve (or possibly a geodesic) γ(·) such that at a specific time t, γ(t) is in the injectivity ball (or domain) around q, i.e., there is a unique minimizing geodesic connecting q to γ(t). The goal is to calculate the second derivative of f(γ(t)) = dq(γ(t)). Consider the family of geodesics c(s,t) = ct(s) such that for each t, ct : [0,d(q,γ(t))] → M is a normal (i.e., unit speed) minimizing geodesic connecting q to γ(t). Denote the covariant derivative along ct(s) by D/∂s. We also denote differentiation with respect to t by a dot and with respect to s by a prime.
Then s ↦ ct(s) satisfies the geodesic equation for every t, i.e., (D/∂s) c′t(s) = 0. The geodesic equation is a second order nonlinear differential equation. In order to linearize it around s ↦ ct(s) one finds a linear differential equation for the derivative Jt(s) = ∂c(s,t)/∂t ∈ T_{ct(s)}M. The field s ↦ Jt(s) is called the Jacobi field along the geodesic s ↦ ct(s). The Jacobi field satisfies the Jacobi equation:

(D²/∂s²) Jt + R(Jt, c′)c′ = 0,   (2.3)

where R is the curvature tensor of M. There are two initial conditions Jt(0) and J′t(0) associated with this equation. Note that the Jacobi equation is invariant under linear reparametrization of the geodesic s ↦ ct(s).

Figure 2.2: For each t the curve s ↦ c(s,t) is a normal minimal geodesic connecting q = c(0,t) to γ(t), where γ(·) is a curve passing through x ∈ M with x = γ(t) = c(d(q,x),t). J(s,t) = (∂/∂t)c(s,t) is the associated Jacobi vector at c(s,t). At each point on the geodesic s ↦ ct(s), J(s,t) has two components: one parallel to c′t(s), denoted by J‖(s,t), and the other perpendicular to it, denoted by J⊥(s,t). These two components are not shown in the picture, but the two mentioned directions are shown.

On many occasions it is more convenient to assume that the original geodesic corresponds to t = 0, and as t deviates from 0, ct(s) deviates from c0(s). We also omit the subscripts from Jt and from c0(s). In order to be able to compare c(s) with a nearby geodesic leaving q with a different small velocity vector we need to assume J(0) = 0.

At any point c(s) we can decompose J(s) ∈ T_{c(s)}M into a component along the geodesic s ↦ c(s), denoted by J‖(s), and another one perpendicular to the geodesic, denoted by J⊥(s). The parallel component is not so interesting, since it can be computed explicitly as J‖(s) = (a + bs)c′(s) for constants a, b ∈ R. However, the behavior of J⊥(s) depends on the curvature and is more interesting. It is possible to show that ⟨J(s),c′(s)⟩ = ⟨J′(0),c′(0)⟩s + ⟨J(0),c′(0)⟩ (see e.g. [24, p. 118]). Therefore, once we have J(0) = 0 and choose J′(0) perpendicular to c′(0), J(s) remains perpendicular to c′(s). A conjugate point of q along the geodesic c(s) is a point qc = c(sc), not equal to q, such that J(sc) = 0 for some nontrivial initial condition J′(0) (with J(0) = 0). Let c(s1) be the first such point. A crucial fact is that for s ∈ (0,s1) and any given tangent vector X ∈ T_{c(s)}M one can find a unique Jacobi field with J(0) = 0 such that J(s) = X ([78, Lemma 2.4, p. 37]).

Now let x ∈ M \ (Cq ∪ {q}), where Cq is the cut locus of q, i.e., the set of cut points of q.¹ For such an x let c : [0,d(q,x)] → M be the unique minimizing normal geodesic connecting q to x, i.e., c(0) = q and c(d(q,x)) = x. For every X ∈ TxM we can find a Jacobi field J(s) along c(s) with J(0) = 0 and J(d(q,x)) = X. Then one can show that (see e.g. [78, Lemma 4.10, p. 109])

Hess dq|x(X,X) = ⟨(J⊥)′(dq(x)), J⊥(dq(x))⟩.   (2.4)

Note that (J⊥)′ = (J′)⊥, and if J(dq(x)) = X is parallel to c′(dq(x)) then J⊥(dq(x)) = 0 and Hess dq|x(X,X) = 0. This means that the Hessian of x ↦ dq(x) is degenerate along the geodesic connecting q to x (it vanishes on radial directions), although it might be positive definite along other directions. Equation (2.4) shows why we needed Jacobi fields to describe the Hessian of the distance function.

(¹ Recall that a point z ∈ M is a cut point of q along a geodesic γ emanating from q if z is the first point along γ after which γ ceases to be length minimizing.)
Combining (2.3) and (2.4) is not straightforward. But one can find lower and upper bounds on the Hessian of x ↦ dq(x) based on the upper and lower bounds on the sectional curvatures of M, respectively. For that we need to solve (2.3) in constant curvature. If the sectional curvature is a constant κ, then (2.3) reads as

(D²/ds²) J⊥(s) + κ J⊥(s) = 0.   (2.5)

For uniformity of notation it is convenient to define

sn_κ(s) = (1/√κ) sin(√κ s) if κ > 0,  s if κ = 0,  (1/√|κ|) sinh(√|κ| s) if κ < 0;
cs_κ(s) = cos(√κ s) if κ > 0,  1 if κ = 0,  cosh(√|κ| s) if κ < 0.   (2.6)

It is easy to see that the solution to (2.5) can be written as J⊥(s) = J⊥(0) cs_κ(s) + (J⊥)′(0) sn_κ(s).

Let δ ≤ K ≤ Δ be the lower and upper bounds on the sectional curvatures of M. One can bound ⟨(J⊥)′(s),J⊥(s)⟩ between the corresponding quantities in the constant curvature spaces of curvatures Δ and δ, and this results in the following (see e.g. [78, Lemma 2.9, p. 153]):

Theorem 2.4.1. If x ∈ B(q,ρ), where ρ < min{inj_q M, π/√Δ} if Δ > 0 and ρ < inj_q M if Δ ≤ 0, then

(cs_Δ(dq(x))/sn_Δ(dq(x))) ⟨X⊥,X⊥⟩ ≤ Hess dq|x(X,X) ≤ (cs_δ(dq(x))/sn_δ(dq(x))) ⟨X⊥,X⊥⟩,   (2.7)

where X⊥ is the component of X ∈ TxM orthogonal to the geodesic connecting q to x (whose direction at x is the same as the direction of the gradient of x ↦ dq(x)).

From this theorem we observe that the upper curvature bound is the crucial factor in determining the lower bound on the Hessian (and its positiveness) of the distance function. In particular, if M is of non-positive curvature, then x ↦ dq(x) is convex within injectivity balls around q. If, in addition, M is simply connected, and hence its injectivity radius is infinite, then the distance function in M is globally convex. Along directions other than the direction of the geodesic connecting q to γ(t), the Hessian of t ↦ dq(γ(t)) is positive definite and the distance function is strictly convex. Note that the excluded or degenerate direction is exactly the direction of the gradient of x ↦ dq(x) at x = γ(t). We recall the same property for the distance function in Rⁿ. If M is of constant curvature κ, then the lower and upper bounds are equal; hence the Hessian of x ↦ dq(x) is Hess dq|x(X,X) = (cs_κ(dq(x))/sn_κ(dq(x))) ⟨X⊥,X⊥⟩ for X ∈ TxM.

Let Sⁿ_κ be a simply connected complete Riemannian manifold of constant curvature κ and dimension n = dim M. It is well known that these conditions determine Sⁿ_κ together with its Riemannian structure uniquely, up to isometry [78, p. 137]. Therefore, if κ > 0 we can assume that Sⁿ_κ is the standard sphere in R^{n+1} of radius 1/√κ, whose injectivity radius is π/√κ. When κ ≤ 0, as mentioned before, the injectivity radius is infinite. Now again consider our manifold M with upper and lower curvature bounds Δ > 0 and δ ≤ 0, respectively. The corresponding spaces of constant curvature are (Sⁿ_Δ, g̃) and (Sⁿ_δ, ĝ), with distance functions d̃ and d̂; we mark points of Sⁿ_Δ with a tilde and points of Sⁿ_δ with a hat. Consider points q, x ∈ M with d(q,x) < ρ as in Theorem 2.4.1. Also consider pairs of points q̃, x̃ ∈ Sⁿ_Δ and q̂, x̂ ∈ Sⁿ_δ, counterparts of q and x, such that d(q,x) = d̃(q̃,x̃) = d̂(q̂,x̂). Let γ, γ̃ and γ̂ be unit speed geodesics passing through x, x̃ and x̂, respectively, such that the angles between each of them and the geodesics connecting q (resp. q̃, q̂) to x (resp. x̃, x̂) are equal. Then we write

Hess d̃_q̃ ≤ Hess d_q ≤ Hess d̂_q̂,   (2.8)

meaning that

Hess d̃_q̃|_{γ̃(t)}(γ̃̇(t),γ̃̇(t)) ≤ Hess d_q(γ̇(t),γ̇(t)) ≤ Hess d̂_q̂|_{γ̂(t)}(γ̂̇(t),γ̂̇(t)).   (2.9)
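As a numerical companion to (2.6) and Theorem 2.4.1, the following sketch (our own helper names, assuming a unit tangent vector X at distance r from q making angle α with the radial direction) evaluates the two-sided bound (2.7):

```python
import math

def sn(kappa: float, s: float) -> float:
    """Generalized sine of (2.6)."""
    if kappa > 0:
        return math.sin(math.sqrt(kappa) * s) / math.sqrt(kappa)
    if kappa < 0:
        return math.sinh(math.sqrt(-kappa) * s) / math.sqrt(-kappa)
    return s

def cs(kappa: float, s: float) -> float:
    """Generalized cosine of (2.6)."""
    if kappa > 0:
        return math.cos(math.sqrt(kappa) * s)
    if kappa < 0:
        return math.cosh(math.sqrt(-kappa) * s)
    return 1.0

def hessian_bounds(delta, Delta, r, alpha):
    """Bounds of Theorem 2.4.1 on Hess d_q(X, X) for a unit vector X at
    distance r from q making angle alpha with the radial direction;
    only the orthogonal component X_perp (of norm sin(alpha)) contributes."""
    perp2 = math.sin(alpha) ** 2                      # <X_perp, X_perp>
    lower = cs(Delta, r) / sn(Delta, r) * perp2       # upper curvature bound
    upper = cs(delta, r) / sn(delta, r) * perp2       # lower curvature bound
    return lower, upper

# Manifold with 0 <= K <= 1 (e.g. the unit sphere), at distance 1 from q,
# orthogonal direction: Hess d_q lies between cot(1) and 1.
print(hessian_bounds(0.0, 1.0, 1.0, math.pi / 2))     # (0.642..., 1.0)
```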
We have this corollary:

Corollary 2.4.1. Let M be a complete Riemannian manifold with sectional curvature bounded above by Δ. Define f(x) = (1/p) dᵖ(q,x), which we also write as f(x) = (1/p) dqᵖ(x). Let γ : R → M be a unit speed geodesic, and let α be the angle between the geodesic connecting q to x = γ(t) and γ̇(t). Then

(d/dt) f(γ(t)) = d^{p−1}(q,γ(t)) · cos α   (2.10)

and

(d²/dt²) f(γ(t)) ≥ (p−1) · d^{p−2}(q,γ(t)) · cos²α + d^{p−1}(q,γ(t)) · (cs_Δ(d(q,γ(t)))/sn_Δ(d(q,γ(t)))) · sin²α.   (2.11)

The latter is equivalent to

Hess f|x ≥ Hess f̃|x̃,   (2.12)

where f̃ is the counterpart of f in Sⁿ_Δ. If M ≅ Sⁿ_Δ we have equality in the above.

Proof. We just need to calculate the first and second order derivatives of f(γ(t)) = (1/p) dqᵖ(γ(t)) by the chain rule:

(d/dt) f(γ(t)) = dq^{p−1}(γ(t)) (d/dt) dq(γ(t)) = dq^{p−1} ⟨∇dq(γ(t)), γ̇(t)⟩,   (2.13)

from which we have

(d²/dt²) f(γ(t)) = (p−1) dq^{p−2}(γ(t)) ⟨∇dq(γ(t)), γ̇(t)⟩² + dq^{p−1}(γ(t)) (d²/dt²) dq(γ(t)).   (2.14)

Now the claims follow from Theorem 2.4.1.

2.4.1 A Useful Length Comparison Estimate

Let q, x, y ∈ B(o,ρ) ⊂ M, where ρ < rcx, and assume X, Y ∈ TqM are such that x = expq X and y = expq Y. Assume that M has lower and upper bounds δ ≤ 0 and Δ ≥ 0 on its sectional curvature. We want to find upper and lower bounds on d(x,y) in terms of ‖X − Y‖ and the curvature bounds. Let α : [0,1] → M be any curve in B(o,ρ) connecting x to y, with α(0) = x and α(1) = y. For fixed t, let ct : [0,1] → M be a geodesic connecting q to α(t) such that ct(0) = q and ct(1) = α(t). Denote by J(s,t) the Jacobi field along ct(s) with J(0,t) = 0 and J′(0,t) arbitrary. We use the same conventions established before in this section. From Rauch's comparison theorem (e.g. [18, p. 390]) we have

sn_{Δlt²}(s) ‖J′(0,t)‖ ≤ ‖J(s,t)‖ ≤ sn_{δlt²}(s) ‖J′(0,t)‖,   (2.15)

where lt = d(q,α(t)) (see (2.6)); the factor lt² appears because ct is parametrized on [0,1] rather than by arc length. The above holds for arbitrary J′(0,t) since we assumed δ ≤ 0 and Δ ≥ 0; otherwise it would only hold for J′(0,t) orthogonal to c′t(0) [18, p. 390].

Note that since α(t), with velocity α̇(t) = J(1,t), is a curve connecting x to y, we have d(x,y) ≤ ∫₀¹ ‖α̇‖ dt = ∫₀¹ ‖J(1,t)‖ dt. Take, in particular, the curve α(t) = expq((1−t)X + tY), whose associated Jacobi fields satisfy J′(0,t) = Y − X; then we have d(x,y) ≤ max_t sn_{δlt²}(1) · ‖Y − X‖. The best upper bound estimate for lt is twice the radius of the ball, i.e., lt < 2ρ. So we have

max_t sn_{δlt²}(1) ≤ (1/(2ρ√|δ|)) sinh(2ρ√|δ|).   (2.16)

Now let α : [0,1] → M be the geodesic connecting x to y. We have ‖α̇(t)‖ = d(x,y). Back in TqM there is a curve σ : [0,1] → TqM with velocity vector σ̇(t) = J′(0,t), which induces the Jacobi field J(s,t) with ‖J(1,t)‖ = d(x,y). Now σ is a curve connecting X to Y in TqM; therefore

‖X − Y‖ ≤ ∫₀¹ ‖σ̇‖ dt = ∫₀¹ ‖J′(0,t)‖ dt ≤ (1/min_t sn_{Δlt²}(1)) d(x,y).   (2.17)

Note that since the distances d(q,x) and d(q,y) can be larger than rcx, we have no restriction on lt based on convexity of the distance function; that is, we cannot conclude that lt ≤ max{‖X‖,‖Y‖}. However, as before we have lt < 2ρ. As a result we have

min_t sn_{Δlt²}(1) ≥ sin(2ρ√Δ)/(2ρ√Δ).   (2.18)

Therefore, in a manifold of nonnegative curvature with curvature bounded above by Δ > 0 we have

(sin(2ρ√Δ)/(2ρ√Δ)) · ‖X − Y‖ ≤ d(x,y) ≤ ‖X − Y‖,   (2.19)

and in a manifold of non-positive curvature with curvature bounded below by δ < 0 we have

‖X − Y‖ ≤ d(x,y) ≤ (sinh(2ρ√|δ|)/(2ρ√|δ|)) · ‖X − Y‖.   (2.20)

We will use these estimates in Section 4.6.2 when we consider the sensitivity properties of the L2 mean.
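The two-sided estimate (2.19) is easy to test on the unit sphere S² (Δ = 1, δ = 0), where the distance and the logarithm map at q are available in closed form; the helpers below are our own naming, not the dissertation's:

```python
import numpy as np

def sphere_dist(x, y):
    return float(np.arccos(np.clip(np.dot(x, y), -1.0, 1.0)))

def sphere_log(q, x):
    """Inverse exponential map: the X in T_q S^2 with exp_q(X) = x."""
    theta = sphere_dist(q, x)
    if theta < 1e-12:
        return np.zeros_like(q)
    v = x - np.cos(theta) * q            # component of x orthogonal to q
    return theta * v / np.linalg.norm(v)

rng = np.random.default_rng(0)
q = np.array([0.0, 0.0, 1.0])            # base point, playing the role of q in (2.19)
rho = 0.4                                # sample x, y in B(q, rho)

def random_point():
    v = rng.normal(size=3)
    v -= np.dot(v, q) * q                # project to the tangent plane at q
    v *= rng.uniform(1e-6, rho) / np.linalg.norm(v)
    t = np.linalg.norm(v)
    return np.cos(t) * q + np.sin(t) * v / t     # exp_q(v)

for _ in range(100):
    x, y = random_point(), random_point()
    X, Y = sphere_log(q, x), sphere_log(q, y)
    lower = np.sin(2 * rho) / (2 * rho) * np.linalg.norm(X - Y)
    d = sphere_dist(x, y)
    assert lower - 1e-12 <= d <= np.linalg.norm(X - Y) + 1e-12
print("(2.19) holds on all samples")
```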
2.5 Cartan-Alexandrov-Toponogov (C.A.T.) Comparison Theorems

The material presented here can be found collectively in [15, pp. 238-240], [51, pp. 197-203] and [18, pp. 399-403 and p. 420].

Definition 2.5.1. A (geodesic) triangle in (M,d) consists of three vertices q1, q2, q3 ∈ M and three minimal geodesic segments connecting the three points. The angle between the geodesic side connecting q1 to q2 and the geodesic side connecting q1 to q3 is denoted by ∠q2q1q3. We denote the side between q1 and q2 by q1q2. The two sides q1q2 and q1q3 and the angle ∠q2q1q3 = θ constitute a hinge at q1 with angle θ and sides q1q2 and q1q3. If the vertices are such that the sides of the triangle are determined uniquely, we denote the triangle by △q1q2q3. In referring to the vertices of the triangle, it is sometimes convenient to use modulo 3 cyclic indexing, e.g., q4 ≡ q1 and so forth. The perimeter of a triangle with vertices q1, q2, q3 is ∑_{i=1}^{3} d(qi,qi+1).

As noted in the definition, a triangle is not necessarily determined uniquely by its vertices. However, if for a triangle with vertices q1, q2, q3 ∈ M we have d(qi,qi+1) < injM for i = 1, 2, 3, then the geodesic sides are uniquely determined. In particular, if the perimeter of the triangle is smaller than 2injM, then the sides are defined uniquely: otherwise, say d(q1,q2) ≥ injM; then the triangle inequality requires d(q2,q3) + d(q1,q3) ≥ injM, which contradicts the condition on the perimeter.

Definition 2.5.2. Let q1, q2 and q3 be vertices of a triangle in M. A comparison triangle in (Sⁿ_Δ, d̃), if it exists, is a triangle with vertices q̃1, q̃2, q̃3 ∈ Sⁿ_Δ such that d̃(q̃i,q̃i+1) = d(qi,qi+1), i = 1, 2, 3. A comparison hinge in (Sⁿ_Δ, d̃) is a hinge with vertex q̃1 ∈ Sⁿ_Δ and sides q̃1q̃2 and q̃1q̃3 such that ∠q̃2q̃1q̃3 = ∠q2q1q3, d̃(q̃1,q̃2) = d(q1,q2) and d̃(q̃1,q̃3) = d(q1,q3).

When the lengths of two sides and the angle between them of a triangle in a manifold of constant curvature are given, we can calculate the length of the third side and the two other angles. The calculation is carried out using the well-known spherical, hyperbolic or Euclidean laws of cosines. For a triangle with vertices q1q2q3 in the model space (Sⁿ_κ,d), with κ ≠ 0, the Law of Cosines for finding the side opposite to q1 reads as [18, p. 103]:

cs_κ d(q2,q3) = cs_κ d(q1,q2) · cs_κ d(q1,q3) + κ · sn_κ d(q1,q2) · sn_κ d(q1,q3) · cos ∠q2q1q3,   (2.21)

where the functions sn_κ and cs_κ are defined in (2.6). However, in a general Riemannian manifold Mⁿ, where the sectional curvatures are not necessarily constant, we cannot perform this calculation so neatly and explicitly. Nevertheless, one can find upper and lower bounds on the length of the third side based on lower and upper bounds on the sectional curvature of the manifold, respectively. Let δ ≤ Δ ∈ R denote lower and upper sectional curvature bounds of Mⁿ, respectively. The main idea is to compare the unknown length of the third side with the lengths of the third sides of comparison triangles (more accurately hinges, see below) in the corresponding model spaces, i.e., Sⁿ_δ and Sⁿ_Δ. A class of very useful triangle comparison theorems, known as Toponogov or Cartan-Alexandrov-Toponogov (C.A.T.) comparison theorems, serve this purpose. These theorems can also be used to perform other forms of comparison: angle or secant comparisons.
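Since (2.21) determines the third side of a hinge in a model space explicitly, it is convenient to package it as a routine; this is a sketch with our own naming (κ = 0 falls back to the Euclidean law of cosines):

```python
import math

def sn(kappa, s):
    if kappa > 0:  return math.sin(math.sqrt(kappa) * s) / math.sqrt(kappa)
    if kappa < 0:  return math.sinh(math.sqrt(-kappa) * s) / math.sqrt(-kappa)
    return s

def cs(kappa, s):
    if kappa > 0:  return math.cos(math.sqrt(kappa) * s)
    if kappa < 0:  return math.cosh(math.sqrt(-kappa) * s)
    return 1.0

def third_side(kappa, b, c, angle):
    """Side opposite q1 in S^n_kappa, given sides b = d(q1,q2), c = d(q1,q3)
    and the angle at q1, via the law of cosines (2.21)."""
    if kappa == 0:
        return math.sqrt(b * b + c * c - 2 * b * c * math.cos(angle))
    csa = cs(kappa, b) * cs(kappa, c) + kappa * sn(kappa, b) * sn(kappa, c) * math.cos(angle)
    if kappa > 0:
        return math.acos(max(-1.0, min(1.0, csa))) / math.sqrt(kappa)
    return math.acosh(max(1.0, csa)) / math.sqrt(-kappa)

# Unit sphere (kappa = 1): a right hinge with legs pi/2 gives a side of pi/2.
print(third_side(1.0, math.pi / 2, math.pi / 2, math.pi / 2))   # 1.5707...
```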
The statement given in the following is a comprehensive one which we will use on different occasions [51, pp. 197-198]. As before, we mark points of Sⁿ_Δ with a tilde and points of Sⁿ_δ with a hat, with distance functions d̃ and d̂.

Theorem 2.5.1 (C.A.T. Comparison Theorems). Let (Mⁿ,d) be a complete Riemannian manifold. For K, the sectional curvature of M, assume δ ≤ K ≤ Δ. Denote the injectivity radius of M by injM > 0, and let rcx be as in (2.1).

1. (Upper Bound on Curvature) If the perimeter of a triangle with vertices q1, q2, q3 is smaller than 4rcx, i.e., ∑_{i=1}^{3} d(qi,qi+1) < 4rcx, then a comparison triangle △q̃1q̃2q̃3 in Sⁿ_Δ exists and is unique up to congruence.

(a) (Angle version) The angles of △q̃1q̃2q̃3 are not smaller than the corresponding angles of △q1q2q3, i.e., ∠q̃i−1q̃iq̃i+1 ≥ ∠qi−1qiqi+1.

(b) (Secant version) Let x and y be two points on two different sides of △q1q2q3. Via the obvious 1-1 correspondence between the points on the perimeters of the two triangles, find the two points x̃ and ỹ (corresponding to x and y, respectively) on the perimeter of △q̃1q̃2q̃3. Then the secant connecting x̃ to ỹ is not shorter than the secant connecting x to y, i.e., d̃(x̃,ỹ) ≥ d(x,y).

(c) (Hinge version) Corresponding to the hinge (∠q2q1q3, q1q2, q1q3) a comparison hinge (∠q̃2q̃1q̃3, q̃1q̃2, q̃1q̃3) in Sⁿ_Δ exists, and d(q2,q3) ≥ d̃(q̃2,q̃3).

2. (Lower Bound on Curvature: Toponogov Theorems) With curvature bounded from below there is no need to limit the size of a triangle in M, and for any triangle with vertices q1q2q3 a comparison triangle in Sⁿ_δ exists. Moreover, with δ > 0 we have ∑_i d(qi,qi+1) ≤ 2π/√δ, and equality holds only if M = Sⁿ_δ. The Angle, Secant and Hinge versions in this part are exactly as in the previous part, except that the directions of the inequalities are reversed.

If the bounds on the curvature are strict (e.g., δ < K), then the corresponding inequalities can be replaced by strict inequalities.

One can think of the comparison triangle in Sⁿ_Δ as fatter than our original triangle in M, and the comparison triangle in Sⁿ_δ as thinner than our original one in M. As an aside, we mention that the remarkable property that a positive lower bound on curvature limits the perimeter of a triangle is the main theme in a class of problems in global Riemannian geometry known as "pinching problems." Figure 2.3 pictorially explains the theorem.

Figure 2.3: C.A.T. comparison theorems (Theorem 2.5.1). Part [a] describes the angle and secant versions and part [b] describes the hinge version. δ and Δ are lower and upper bounds on the sectional curvature of Mⁿ, respectively. We construct comparison triangles in (Sⁿ_δ, d̂) and (Sⁿ_Δ, d̃) with vertices q̂1, q̂2, q̂3 and q̃1, q̃2, q̃3, respectively; corresponding sides of the triangles have equal lengths. Corresponding to a secant x1x2 we find secants x̃1x̃2 and x̂1x̂2 in the other two triangles. The triangle in Sⁿ_Δ is fatter and the triangle in Sⁿ_δ is thinner than the original one in M, i.e., the angles and secants in M are not smaller than those in Sⁿ_δ and not larger than those in Sⁿ_Δ.
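As a usage example of the third_side routine from the previous sketch, the hinge versions of Theorem 2.5.1 sandwich the third side of a hinge in M between its model-space values:

```python
# Sandwich from Theorem 2.5.1 (hinge versions): for a hinge in M with sides
# b, c, angle A, and sectional curvature delta <= K <= Delta,
#     third_side(Delta, b, c, A)  <=  d(q2, q3)  <=  third_side(delta, b, c, A).
b, c, A = 0.7, 0.9, 1.2
print(third_side( 1.0, b, c, A))   # lower bound: comparison hinge in S^n_Delta (Delta = 1)
print(third_side( 0.0, b, c, A))   # Euclidean value, for reference
print(third_side(-1.0, b, c, A))   # upper bound: comparison hinge in S^n_delta (delta = -1)
```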
Chapter 3
Existence and Uniqueness of the Riemannian Lp Mean

3.1 Introduction

We start with the following definition:

Definition 3.1.1. Let M be a complete Riemannian manifold of dimension n with Riemannian structure g and corresponding distance function d. Depending on the situation we might write (M,g) or (M,d). Assume w is a probability measure on M with support supp(w). Define

fp(x) = (1/p) ∫_M dᵖ(x,s) dw(s) for 1 ≤ p < ∞, and f∞(x) = max_{s ∈ supp(w)} d(x,s).   (3.1)

A (global) Lp mean, or Lp center of mass, of w is a minimizer of fp over M.

In this dissertation we are mainly interested in the case where w is a finite weighted sum of point masses at points {xi}_{i=1}^N ⊂ M with weights wi > 0, and because of our intended applications we might refer to the xi's as data points. We call the vector w = [w1,...,wN]ᵀ a weight vector. Unless otherwise stated, we always assume a weight vector is a vector of positive entries which add up to one. We rewrite (3.1) as

fp(x) = (1/p) ∑_{i=1}^{N} wi dᵖ(x,xi) for 1 ≤ p < ∞, and f∞(x) = max_{1≤i≤N} d(x,xi).   (3.2)

3.1.1 On the History of the Riemannian Center of Mass

For the sphere, a suitable bound on the radius ρ of a ball B(o,ρ) containing the support guarantees the existence and uniqueness of the local Lp center of mass for a probability measure with support in B(o,ρ), and that the center is the only stationary point of fp in B(o,ρ); then just by the triangle inequality one obtains the existence and uniqueness of the global Lp center of mass for a measure with support in B(o,ρ) with ρ half as large. Le in [61] uses a rather similar argument to show that in a general (i.e., nonconstant curvature) manifold ρ < (1/2)rcx guarantees existence and uniqueness of the global L2 mean for a probability measure with support in B(o,ρ). Le's work is motivated by applications in the statistical analysis of shapes. We should mention that statisticians have dealt with data points on circles and spheres since the early 20th century, in a context known as "statistical analysis of directional and spherical data" [28, p. xi]. A survey of references such as [65, 37, 28] shows, however, that the framework of Riemannian geometry, and especially the related existence-uniqueness issues, has not appeared explicitly in the mentioned context, as this body of literature is more concerned with specific parametric distributions (e.g., the von Mises distribution) on the circle and the sphere. Another reason is that usually what is considered as the mean is the so-called extrinsic mean, which arises from the standard embedding of the sphere or any other manifold in a related Euclidean space of higher dimension. In contrast, what we call the Riemannian mean can also be called the "intrinsic mean." The extrinsic mean in most cases is the projection of the standard Euclidean mean of the data points, computed in the ambient space, onto the (sub)manifold; in the case of the sphere and a few other manifolds it is much easier to compute numerically than the Riemannian or intrinsic mean. The Riemannian mean had not gained much attention in the statistics literature until recently, with works such as [11].

3.1.2 Contributions and Outline of the Chapter

In this chapter we provide a unified and extensive analysis of the existence and uniqueness of global Riemannian Lp means for 1 ≤ p ≤ ∞. The main contributions of this chapter are as follows.¹

(¹ We derive our results in this and other chapters for the case of a finite number of points with positive weights, as introduced above, since in an applied and computation-oriented setting this is the more realistic framework. However, most of the results in this and the next chapter carry over to the case of arbitrary probability measures in M with little or no extra work. Since in the related literature the existence-uniqueness results are stated for such measures, in Section 3.2.2 we state Theorem 3.2.2, which is the counterpart of Theorem 3.2.1 for arbitrary probability measures.)
1. In Theorem 3.2.1 (or Theorem 3.2.2) we derive a new bound on the radius of the ball containing the support of a probability measure on M that ensures uniqueness of the (global) Lp mean with respect to that measure. In particular, for p = 2 our existence and uniqueness result improves the result by Le in [61] regarding the global L2 mean. More significantly, our result leads to the discovery that under the best available existence and uniqueness condition for the local Lp mean, the local and global Lp means coincide; therefore, in the sense described before, ours is an uncompromised global existence and uniqueness result. On the route to proving Theorem 3.2.1, we extend (with some modifications) the method of Buss-Fillmore in [17], via comparison arguments, to arbitrary manifolds.

2. As a part of the proof of Theorem 3.2.1 (or Theorem 3.2.2) we give an alternative and more geometrical proof of the uniqueness bound by W. S. Kendall for the L2 mean.

The organization of the chapter is as follows: the entire Section 3.2 is devoted to the proof of the existence and uniqueness theorems for the Lp means. In Section 3.3 we discuss the uniqueness issue when 1 ≤ p < 2, as well as open problems.

3.2 Existence and Uniqueness of the Riemannian Lp Means

As mentioned before, the existence and uniqueness properties of the Lp means in manifolds of non-positive curvature, especially simply connected ones, look in many respects like their counterparts in Euclidean space. However, the case of a manifold of positive curvature is more complicated, as can be seen in the example of the sphere. In fact, the sphere gives us a guideline as to what to expect. This fact will be our guide in using comparison theorems to prove the following existence and uniqueness theorem:

Theorem 3.2.1. Let (M,d) be a complete Riemannian manifold with sectional curvature bounded above by Δ and injectivity radius injM. Define

ρ_{Δ,p} = (1/2) min{injM, π/(2√Δ)} for 1 ≤ p < 2,  and  ρ_{Δ,p} = (1/2) min{injM, π/√Δ} (= rcx) for 2 ≤ p ≤ ∞.   (3.5)

Let {xi}_{i=1}^N ⊂ B(o,ρ) ⊂ M with ρ < ρ_{Δ,p}. Then this set of points has a unique Lp mean for 1 < p ≤ ∞, which lies in B(o,ρ). For p = 1, unless all the data points lie on a geodesic, again the mean exists, is unique and lies in B(o,ρ); in the degenerate case the mean might not be unique, but it lies in the interior of the geodesic segment. Moreover, for 1 < p < ∞ the mean is the only stationary point of fp in B(o,ρ).
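The radius (3.5) is simple to evaluate; a sketch (with our own function name, and with π/√Δ read as +∞ for Δ ≤ 0, as before):

```python
import math

def rho(Delta: float, p: float, inj_M: float) -> float:
    """Uniqueness radius rho_{Delta,p} of (3.5).  Data supported in a ball
    of radius < rho has a unique L^p mean."""
    if Delta <= 0:
        return 0.5 * inj_M
    root = math.sqrt(Delta)
    if 1 <= p < 2:
        return 0.5 * min(inj_M, math.pi / (2 * root))
    return 0.5 * min(inj_M, math.pi / root)      # 2 <= p <= infinity: r_cx

# Unit sphere (inj = pi, Delta = 1): rho = pi/4 for p in [1,2), pi/2 for p >= 2.
print(rho(1.0, 1.5, math.pi), rho(1.0, 2.0, math.pi))
```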
3.2.1 Proof of Theorem 3.2.1

3.2.1.1 The Minimizer Lies in B(o,ρ)

We first show that for any q outside B(o,ρ) there is a point q′ ∈ B(o,ρ) with fp(q) > fp(q′). In order to show that, of course, we show that d(q,xi) > d(q′,xi) for 1 ≤ i ≤ N. If d(o,q) ≥ 2ρ, then from the triangle inequality d(q,xi) ≥ d(o,q) − d(o,xi) > ρ, hence fp(q) > fp(o) and q cannot be a minimizer. So we only consider a q which is in the annular region A = {x ∈ M | ρ < d(o,x) < 2ρ}. Since d(q,o) < 2ρ < injM, there is a unique minimal geodesic that connects q to o. This geodesic meets the boundary of B(o,ρ) perpendicularly at a unique point qc, which lies between q and o. Denote the corresponding normal vector pointing inward B(o,ρ) by n_{qc} (see Corollary 2.3.2). We choose a point q′, the reflection of q inside the ball, at distance d(q,qc) from qc, between qc and o on the geodesic connecting q to o. A minimizing geodesic emanating from qc to any of the xi's makes an angle less than π/2 with n_{qc} at qc (by Theorem 2.3.2). Note that since d(qc,xi) < 2ρ < injM, the mentioned geodesic is unique. Now we perform two sets of comparisons: one for the case Δ ≤ 0 and one for the case Δ > 0.

The Case Δ ≤ 0. Our model or comparison space is (Sⁿ₀ ≅ Rⁿ, d̃), where d̃ is the standard Euclidean distance function. Consider the same configuration as the one in M in the model space: a ball of radius ρ centered at õ and a point q̃ outside the ball at distance d(o,q) from the center, i.e., d̃(õ,q̃) = d(o,q); let q̃c and q̃′ denote the counterparts of qc and q′ on the geodesic connecting q̃ to õ. Figure 3.1 shows the configuration in both M and the model space; corresponding equal angles and sides are marked there. From q̃′ choose a geodesic which makes an angle equal to ∠oq′xi with the geodesic connecting õ to q̃′, and choose the point x̃i such that d̃(q̃′,x̃i) = d(q′,xi). Then, by applying the hinge version of Theorem 2.5.1 with the upper bound on curvature to the triangles △q′oxi and △q̃′õx̃i, we see that d̃(õ,x̃i) < d(o,xi), hence x̃i ∈ B(õ,ρ). Therefore, from Corollary 2.3.2, θ̃i = ∠q̃′q̃cx̃i < π/2. Now applying the usual Law of Cosines formula to the two triangles △q̃cq̃′x̃i and △q̃q̃cx̃i yields

d̃(q̃,x̃i) > d̃(q̃′,x̃i).   (3.7)

Next, we apply Theorem 2.5.1 to the triangles △qq′xi and △q̃q̃′x̃i. To verify that we can apply the theorem, we note that

d(q,q′) = 2d(qc,q),   d(xi,q′) ≤ d(xi,o) + d(o,q′) < 2ρ − d(qc,q).   (3.8)

If d(xi,q) ≥ 2ρ − d(qc,q), then we already have the claim that d(xi,q′) < d(xi,q), so we assume d(xi,q) < 2ρ − d(qc,q). Therefore, the perimeter of △qq′xi is smaller than 4ρ < 4rcx. Note that by construction ∠xiq′q = π − ∠oq′xi and ∠x̃iq̃′q̃ = π − ∠õq̃′x̃i; therefore ∠xiq′q = ∠x̃iq̃′q̃. Applying Theorem 2.5.1 to the two triangles, together with (3.7), yields d(xi,q) ≥ d̃(x̃i,q̃) > d̃(x̃i,q̃′) = d(xi,q′).

The Case Δ > 0. In this case we proceed exactly as in the case Δ ≤ 0, except that we need to use the Spherical Law of Cosines to get (3.7). We elaborate on that. Let ∠q̃′q̃cx̃i = θ̃i. Then ∠q̃q̃cx̃i = π − θ̃i. The Law of Cosines (2.21) in the triangles △q̃′q̃cx̃i and △q̃q̃cx̃i reads as

cos(√Δ d̃(q̃′,x̃i)) = cos(√Δ d̃(q̃′,q̃c)) · cos(√Δ d̃(x̃i,q̃c)) + sin(√Δ d̃(q̃′,q̃c)) · sin(√Δ d̃(x̃i,q̃c)) · cos θ̃i   (3.9)

and

cos(√Δ d̃(q̃,x̃i)) = cos(√Δ d̃(q̃,q̃c)) · cos(√Δ d̃(x̃i,q̃c)) − sin(√Δ d̃(q̃,q̃c)) · sin(√Δ d̃(x̃i,q̃c)) · cos θ̃i,   (3.10)

respectively. Noting that d̃(q̃,q̃c) = d̃(q̃′,q̃c) < π/√Δ, d̃(x̃i,q̃c) < π/√Δ and θ̃i < π/2, and comparing these two equations, we see that (3.7) is valid in this case, too. The rest of the proof is as in the previous case.

3.2.1.2 Existence

By the previous part we have min_{B̄(o,ρ)} fp ≤ inf_M fp. Considering the continuity of fp, this shows the existence of a minimizer of fp on M. Note that the minimum of fp on M happens in B(o,ρ) for 1 < p ≤ ∞, while it can happen inside the ball as well as on the boundary when p = 1, in which case it must coincide with one of the data points. We observe that for 1 < p < ∞ any minimizer in B(o,ρ) is a stationary point of fp.

Figure 3.1: For q outside B(o,ρ) and its reflection q′ inside the ball we have d(q,xi) > d(q′,xi). This is essentially the result of the fact that θ̃i < π/2. The left and right pictures show the configurations in M and the model space Sⁿ_Δ, respectively. Corresponding equal angles or sides in M and Sⁿ_Δ are marked.

3.2.1.3 Uniqueness

We assume 1 ≤ p < ∞. When Δ ≤ 0, each term x ↦ (1/p)dᵖ(x,xi) in fp is convex in B(o,ρ) and strictly convex along directions non-radial with respect to xi (by Corollary 2.4.1); hence fp has a unique stationary point, its minimizer.

The Case Δ > 0. Recall Corollary 2.4.1 and the definition (2.6) with positive Δ. Unfortunately, for Δ > 0 the previous rather un-tuned method, which is based on showing the strict convexity of each individual term in fp, gives convexity only when ρ < min{(1/2)injM, π/(4√Δ)}. This is because cot(√Δ d(x,xi)) is positive only when d(xi,x) < π/(2√Δ). Therefore, in order to guarantee strict convexity of x ↦ (1/p)dᵖ(x,xi) (and hence of fp), one is forced to have ρ < π/(4√Δ). Note that this is enough to prove our claim for 1 ≤ p < 2. For p = ∞ the argument used for Δ ≤ 0 also works for Δ > 0. Henceforth, we assume 2 ≤ p < ∞ and Δ > 0. The goal is to show that any stationary point of fp in B(o,ρ) is a local minimizer of fp. Let q ∈ B(o,ρ) be a stationary point of fp, i.e., ∇fp(q) = 0. Note that if q coincides with the center o, then we have d(q,xi) < π/(2√Δ) for 1 ≤ i ≤ N. In this special case the Hessian of each individual term x ↦
(1/p)dᵖ(x,xi) at x = q = o will be positive definite, and hence q will be a local minimizer of fp. Henceforth, we assume that q ≠ o, or equivalently d(q,o) > 0. Let t ↦ γ(t) be a unit speed geodesic leaving q ∈ B(o,ρ) at t = 0. We denote the angle between the geodesic connecting xi ∈ B(o,ρ) to q and γ at q by αi. For convenience let us write d(xi,q) as di(q), or even di. By Corollary 2.4.1 we have

(d²fp/dt²)(γ(t))|_{t=0} ≥ ∑_{i=1}^{N} wi ( (p−1) · di^{p−2}(q) · cos²αi + di^{p−1}(q) · √Δ · cot(√Δ di(q)) · sin²αi ).   (3.11)

The summand can be written as

(p−2) · di^{p−2} · cos²αi + di^{p−2} · (1 − sin²αi (1 − √Δ · di · cot(√Δ di))).   (3.12)

Since √Δ di(q) cot(√Δ di(q)) ≤ 1, we have¹

(d²fp/dt²)(γ(t))|_{t=0} ≥ ∑_{i=1}^{N} wi di^{p−1}(q) · √Δ · cot(√Δ di(q)).   (3.13)

We need to show that the above is positive if ∇fp(q) = 0. Let γ_{qo} denote the unit speed minimal geodesic connecting q to o. Let βi denote the angle between the geodesic connecting q to xi ∈ B(o,ρ) and γ_{qo}. The projection of ∇fp(q) along the direction of γ_{qo} at q is zero. Therefore, we have

∑_i wi d^{p−1}(q,xi) cos βi = 0.   (3.14)

(¹ For 1 ≤ p < 2 we cannot get a relation similar to (3.13); i.e., in (3.12) we cannot get rid of the effect of the angle αi and arrive at a lower bound independent of αi which only involves di and Δ. We will elaborate on the implications of this fact in Section 3.3. It is possible that a finer analysis for 1 ≤ p < 2 will result in a better bound on ρ, possibly depending on p. This can be a topic for further research.)

Our goal is to show that this is enough to guarantee that the Hessian of fp is positive definite at q. We do this by comparison with the spherical case. Consider a ball B(õ,ρ) ⊂ Sⁿ_Δ. Choose a point q̃ inside the ball such that d̃(õ,q̃) = d(o,q). Then choose the point x̃i such that d̃(q̃,x̃i) = d(q,xi) and ∠x̃iq̃õ = ∠xiqo = βi. Note that by this construction x̃i might not belong to B(õ,ρ). However, applying Theorem 2.5.1 to the triangles △x̃iq̃õ and △xiqo, we observe that d̃(õ,x̃i) ≤ d(xi,o) < ρ, and therefore x̃i ∈ B(õ,ρ). Also note that since d(o,xi), d(o,q) < ρ and d(q,xi) < 2ρ, the perimeter of △xiqo is less than 4rcx and we are allowed to use Theorem 2.5.1. By construction we have

∑_{i=1}^{N} wi d̃^{p−1}(q̃,x̃i) cos βi = 0,   (3.15)

where, as mentioned before, ∠õq̃x̃i = βi for 1 ≤ i ≤ N. Note that B(õ,ρ) is inside the larger hemisphere H = B(õ, π/(2√Δ)). The boundary of a hemisphere is a "global" totally geodesic submanifold of Sⁿ_Δ, i.e., a geodesic connecting two points on the boundary belongs to the boundary for all times [9, p. 208]. We use this fact as follows. Continue the radial geodesic connecting q̃ to õ till it meets the boundary of the hemisphere at z̃. Also continue the geodesic connecting q̃ to x̃i to meet the boundary of H at ỹi (1 ≤ i ≤ N); this geodesic meets the boundary of B(õ,ρ) at ỹ′i. Figure 3.2 depicts the situation for a single data point x̃i. For the reason explained before, we have ∠q̃z̃ỹi = π/2 (1 ≤ i ≤ N).

Figure 3.2: õ is the center of a hemisphere H of radius π/(2√Δ) in Sⁿ_Δ. Every data point x̃i belongs to B(õ,ρ) ⊂ H, and q̃ ∈ B is a critical point of f̃p. The radial geodesic connecting q̃ to õ, continued, hits the boundary of the hemisphere perpendicularly at z̃. The minimal geodesic connecting q̃ to x̃i, continued, hits the boundaries of B and H at ỹ′i and ỹi, respectively.

Then the application of the Spherical Law of Cosines twice in the triangle △q̃ỹiz̃, once for the angle ∠q̃z̃ỹi = π/2 and once for the angle ∠ỹiq̃z̃ = βi, yields the crucial
relation

cot(√Δ d̃(ỹi,q̃)) = cos βi · cot(√Δ d̃(z̃,q̃)),  1 ≤ i ≤ N.   (3.16)

This relation enables us to relate βi to d̃(x̃i,q̃) via a constant (namely d̃(z̃,q̃)) which is independent of i. Since the derivative of t ↦ cot t is at most −1 on (0,π), it is easy to see that

cot t1 − cot t2 ≥ t2 − t1,  0 < t1 ≤ t2 < π.   (3.17)

Note that

d̃(ỹi,q̃) − d̃(x̃i,q̃) = d̃(x̃i,ỹi).   (3.18)

Also note that d̃(x̃i,ỹi) ≥ d̃(ỹ′i,ỹi) and d̃(ỹ′i,ỹi) ≥ π/(2√Δ) − ρ (see Figure 3.2). Therefore we have

d̃(ỹi,q̃) − d̃(x̃i,q̃) ≥ π/(2√Δ) − ρ.   (3.19)

So from (3.17) we have

cot(√Δ d̃(x̃i,q̃)) ≥ cot(√Δ d̃(ỹi,q̃)) + (π/2 − √Δ ρ),  1 ≤ i ≤ N.   (3.20)

Combining this together with (3.16), and noting d̃(x̃i,q̃) < d̃(ỹi,q̃) < π/√Δ, we have

cot(√Δ d̃(x̃i,q̃)) ≥ cot(√Δ d̃(z̃,q̃)) cos βi + (π/2 − √Δ ρ) > cot(√Δ d̃(z̃,q̃)) cos βi,  1 ≤ i ≤ N.   (3.21)

Recall that ρ < π/(2√Δ); hence we have the strict inequality. By our construction, the above implies

cot(√Δ d(q,xi)) > cot(√Δ d̃(q̃,z̃)) cos βi.   (3.22)

Plugging the above into (3.13) and using (3.14), we conclude

(d²fp/dt²)(γ(t))|_{t=0} > √Δ cot(√Δ d̃(q̃,z̃)) ∑_i wi d^{p−1}(q,xi) cos βi = 0,   (3.23)

i.e., the Hessian of fp is positive definite at q. Since on the boundary of B(o,ρ) the vector field −∇fp points inward, the Poincaré-Hopf Index Theorem implies that q is the only zero of ∇fp in B. Hence, by our insideness argument in Subsection 3.2.1.1, q is the unique minimum of fp in M. This completes the proof of the theorem.

3.2.2 The Case of an Arbitrary Probability Measure in M

Our existence and uniqueness result in Theorem 3.2.1 concerns a finite number of points in M with corresponding weights. For most data processing applications this is the natural framework. However, it is customary to consider existence and uniqueness of the mean for arbitrary probability measures on M. It is straightforward to extend Theorem 3.2.1 to this more general case; in fact, the proof we gave for Theorem 3.2.1 carries over:

Theorem 3.2.2. Let w be a probability measure with support inside the ball B(o,ρ) with ρ < ρ_{Δ,p}, where ρ_{Δ,p} is defined in Theorem 3.2.1. Then for 1 < p ≤ ∞ the Lp center of mass with respect to w is a unique point and lies inside B. For p = 1, unless the w-measure of a geodesic segment is 1, again the center exists, is unique and lies in B; in the degenerate case the center might not be unique. Moreover, for 1 < p < ∞ the center is the only stationary point of fp in B(o,ρ).

Chapter 4
Properties of the Riemannian Lp Mean

4.2.1 Lipschitz Continuity of f∞(x)

Let f̃p(x) = (∑_{i=1}^{N} wi dᵖ(x,xi))^{1/p}, so that f̃p → f∞ uniformly as p → ∞. For any ε > 0 and large enough pj we then have

|f∞(x) − f∞(y)| ≤ |f∞(x) − f̃_{pj}(x)| + |f̃_{pj}(x) − f̃_{pj}(y)| + |f̃_{pj}(y) − f∞(y)| ≤ ε + d(x,y).   (4.4)

Since ε > 0 is arbitrary, f∞ is Lipschitz with constant 1.

4.3 Smoothness Properties of the Lp Means

Note that x̄p depends on the data points {xi}_{i=1}^N, on the weights wi, and on the order p. We show that x̄p depends continuously on these factors; however, higher orders of smoothness do not always hold, as we shall elaborate. Unless otherwise stated, by saying "f is a smooth function" we mean "f has continuous derivatives of all orders." Before proving the next theorem, we recall some smoothness properties of the Riemannian distance function x ↦ d(q,x):

Proposition 4.3.1. Let (M,d) be a complete Riemannian manifold of injectivity radius injM. Denote by Cq the cut locus of q. Then we have:

1. The distance function x ↦ d(q,x) is C^∞ on M \ ({q} ∪ Cq). In particular, it is C^∞ on B(q,injM) \ {q}.

2. At x = q, x ↦ dᵖ(x,q) has the same smoothness properties as x̂ ↦ ‖x̂ − q̂‖ᵖ has at x̂ = q̂, where x̂, q̂ ∈ Rⁿ and ‖·‖ is the standard Euclidean norm in Rⁿ.

Proof. The first part is a result of the diffeomorphic properties of the exponential map [78, p.
108]. The second part is an immediate property of the exponential map and the Riemannian distance function.

Theorem 4.3.1. Let {xi}_{i=1}^N belong to a ball of radius ρ defined in Theorem 3.2.1. Then:

1. For 1 ≤ p ≤ ∞, x̄p depends continuously on the data points xi and the weights wi (for p = 1 provided the data points do not lie on a geodesic).

Proof. For p ≥ 2 one could apply the Implicit Function Theorem to the gradient of fp, but for p < 2 this is not available,¹ so we prove continuous dependence directly. Let v denote the tuple of data points and weights, let v^k → v, and write x̄p^k for the corresponding means. For any ε > 0 and large enough kj we have

−ε < f̃p(x; v^{kj}) − f̃p(x; v) < ε.   (4.5)

Therefore min_x f̃p(x; v^{kj}) → min_x f̃p(x; v) as v^k → v. Using this fact, the continuity of f̃p(·;·), and noting that

|f̃p(x̄p^{kj}; v) − f̃p(x̄p; v)| ≤ |f̃p(x̄p^{kj}; v) − f̃p(x̄p^{kj}; v^{kj})| + |f̃p(x̄p^{kj}; v^{kj}) − f̃p(x̄p; v)|,   (4.6)

we have lim f̃p(x̄p^{kj}; v) = f̃p(x̄p; v). Due to compactness of B̄(o,ρ), the sequence (x̄p^{kj}) has a converging subsequence; denote this subsequence again by (x̄p^{kj}). We have f̃p(lim x̄p^{kj}; v) = lim f̃p(x̄p^{kj}; v) = f̃p(x̄p; v). Now, since x̄p is the unique minimizer of f̃p, we must have lim x̄p^{kj} = x̄p. We maintain that lim x̄p^k = x̄p: otherwise we would have an infinite subsequence of (x̄p^k) all of whose terms stay away from x̄p, but by the previous argument this subsequence must itself have a subsequence converging to x̄p, which is a contradiction.

The above argument works for p = 1 as long as f1 has a unique minimizer, which is ensured if {xi}_{i=1}^N are not on a geodesic. It also works for p = ∞ because of Proposition 4.2.1, which implies the equicontinuity of the sequence (f∞(x; x1^k,...,xN^k))_k. Note that for p = ∞ we have no weights. The proof is complete.

(¹ The standard version of the Implicit Function Theorem requires Fp to be at least C¹ in all variables; its continuity is not enough. In the current situation, i.e., p < 2, the most we can expect is some form of local Hölder smoothness, which might require stronger versions of the IFT. However, here we prove continuous dependence by a different approach.)

Probably, the lack of high order smoothness for p = ∞ is rather surprising, since as p increases the smoothness of x̄p in its dependence on the xi's increases. In Figure 4.1, note that if x4 moves inside the minimal ball, the center will not change, but as it moves outside the ball the center has to change. This can explain the nonsmooth dependence of x̄∞ on the points on the boundary of the minimal ball. The following simple corollary will be useful in Section 5.2:

Corollary 4.3.1. Let {xi}_{i=1}^N ⊂ B(o,ρ) with ρ < rcx. Then the radius of the minimal ball of the set {xi}_{i=1}^N is a continuous function of the data points xi.

The following theorem concerns the dependence of x̄p on p.

Theorem 4.3.2. Let 1 ≤ p0 ≤ ∞ and p → p0 (for p0 = 1 we only mean p approaching 1 from the right). Assume {xi}_{i=1}^N ⊂ B(o,ρ), where ρ < min{ρ_{Δ,p0}, ρ_{Δ,p}} for p close to p0. For p0 = 1 assume the data points are not on a geodesic. Then x̄p → x̄_{p0}.

Proof. The proof can be exactly as the one in the second part of the proof of the previous theorem, except that here we take the family (f̃p)_p as p → p0, which is again equicontinuous and uniformly bounded (as explained before).

4.4 Isometry Compatibility

An important property of the L2 center of mass used in [34] is its isometry compatibility (see also [50]). By this we mean that if φ : M → M is an isometry of M, then the center of mass of {φ(xi)}_{i=1}^N can be found by applying φ to the center of mass of {xi}_{i=1}^N.

Theorem 4.4.1. Let {xi}_{i=1}^N ⊂ B(o,ρ) with ρ < ρ_{Δ,p}. Assume φ is an isometry of M. Then the Lp center of mass of {φ(xi)}_{i=1}^N is φ(x̄p).

Proof. For 1 < p < ∞, x̄p is the only zero of ∇fp(x; x1,...,xN) inside B(o,ρ). Since φ is an isometry, it follows from the chain rule that φ(x̄p) is a zero of the gradient of fp(x; φ(x1),...,φ(xN)) in B(φ(o),ρ) (see (3.6), and note that d(xi,x̄p) = d(φ(xi),φ(x̄p))).
Because {φ(xi)}_{i=1}^N has a unique center in B(φ(o),ρ), we conclude that φ(x̄p) must be this unique center. For p = 1, ∞ the result follows from continuity with respect to p. Note that for p = 1 the claim holds even in the degenerate case.

4.5 Convexity Properties of the Lp Means

The Lp mean of {xi}_{i=1}^N exists uniquely as soon as {xi}_{i=1}^N lies in a strongly convex ball of radius ρ determined in Theorem 3.2.1. Note that the mean is independent of the mentioned ball, and the existence of other candidate balls has no effect on the position of the mean. Therefore, the mean belongs to the intersection of all strongly convex balls containing {xi}_{i=1}^N. One would like the mean to belong to the convex hull of {xi}_{i=1}^N. The proof of this fact is not immediate, since the convex hull of a set is defined as the intersection of all strongly convex "sets" containing that set, and not merely the "balls." To appreciate the difference, let M be an open disk in R² and {xi}_{i=1}^3 the vertices of a triangle in M. The convex hull of {xi}_{i=1}^3 is the triangle △x1x2x3 itself, but we cannot construct the triangle as an intersection of any collection of open balls in M containing {xi}_{i=1}^3, since these balls are of bounded radius. However, if M = R², then we could construct the triangle as an intersection of infinitely many open disks of unbounded radius. Still, we can prove the following theorem using the uniqueness of the stationary points of fp inside the candidate ball:

Theorem 4.5.1. If the set of points {xi}_{i=1}^N lies in a ball of radius ρ < ρ_{Δ,p} defined in Theorem 3.2.1, then the Lp mean for 1 ≤ p ≤ ∞, in general, belongs to the closure of the convex hull of the points; for manifolds of constant curvature it in fact belongs to the interior of the convex hull (see Subsection 4.5.1). The assumption that the weights satisfy wi > 0 in the definition of the mean is crucial for this result to hold.

The main obstacle in extending the preceding argument to the non-constant curvature case is that a face of the cone Cx does not necessarily map to a k-dimensional (1 < k < n) totally geodesic submanifold via the exponential map. Whether constant curvature is necessary to ensure that the mean belongs to the interior of the convex hull is not clear.

4.6 Sensitivity Properties of the L2 Mean

The standard Euclidean mean enjoys an appealing averaging or smoothing property; in fact, that is the main reason for its use in everyday life. We distinguish three forms of averaging properties:

Definition 4.6.1 (Averaging properties). Given B = B(o,ρ) ⊂ M, let μ_{wN,B} : B^N = B × ··· × B → B denote the L2 mean function on B, i.e., μ = μ_{wN,B}(x1,...,xN) is the L2 mean of {xi}_{i=1}^N ⊂ B with corresponding weight vector wN = [w1,...,wN]ᵀ. Then we say:

1. μ_{wN,B} is strongly averaging if for any two sets of points {xi}_{i=1}^N and {x′i}_{i=1}^N in B we have

d(μ_{wN,B}(x1,...,xN), μ_{wN,B}(x′1,...,x′N)) ≤ ∑_i wi d(xi,x′i).   (4.7)

2. μ_{wN,B} is averaging if for any two sets of points {xi}_{i=1}^N and {x′i}_{i=1}^N in B we have

d(μ_{wN,B}(x1,...,xN), μ_{wN,B}(x′1,...,x′N)) ≤ max_i d(xi,x′i).   (4.8)

3. μ_{wN,B} is weakly averaging if for any set of points {xi}_{i=1}^N in B, replacing a single point xj by another point x′j does not change the position of the mean by more than d(xj,x′j), i.e.,

d(μ_{wN,B}(x1,...,xN), μ_{wN,B}(x1,...,xj−1,x′j,xj+1,...,xN)) ≤ d(xj,x′j).   (4.9)

On occasion we might refer to the L2 mean without being specific about B(o,ρ), wN, or N, in which case we assume that ρ < ρ_{Δ,2} and that o, w and N are arbitrary. Strong averaging implies averaging, which in turn implies weak averaging. It is easy to see (by the triangle inequality) that the Euclidean L2 mean in Rⁿ, i.e., the weighted linear average, is strongly averaging.
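A quick numerical illustration of (4.7) in the Euclidean case, where the L2 mean is the weighted linear average (a sketch; the data here are arbitrary random samples):

```python
import numpy as np

rng = np.random.default_rng(1)
N, dim = 5, 3
w = rng.random(N); w /= w.sum()              # weight vector, entries sum to one
X  = rng.normal(size=(N, dim))               # data set {x_i}
Xp = rng.normal(size=(N, dim))               # perturbed set {x'_i}

mu, mup = w @ X, w @ Xp                      # Euclidean L2 means
lhs = np.linalg.norm(mu - mup)
rhs = sum(w[i] * np.linalg.norm(X[i] - Xp[i]) for i in range(N))
assert lhs <= rhs + 1e-12                    # strong averaging (4.7)
print(lhs, "<=", rhs)
```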
More generally, in Theorem 4.6.1 we show that the Riemannian L2 mean in a manifold of nonpos- itive curvature also has the same property. For manifolds of positive curvature the situation is completely different and the L2 mean is not strongly averaging. The 73 interpretation is that, the mean can have high sensitivity with respect to the noise. 4.6.1 Lack of (Strong) Averaging Property for L2 Mean in Manifolds of Nonnegative Curvature Here, we show why the L2 mean cannot be strongly averaging if M is of nonnegative sectional curvature 1. Let q ? M and ?(t) = expq tX be a unit speed geodesic emanating from q with velocity vector X ? TqM. Take two points x1 = expq t1X and x2 = expq t1X on ? with 0 < t1 < t2 < injM. The L2 mean of the set{x1,x2} with weight vector w2 = [w,1?w]T(0 < w < 1) is ?x = expq(wt1+(1?w)t2)X. Now, consider another geodesic ?(t) = expq tX? emanating from q with velocity vector X? of unit norm. Take perturbed versions of x1 and x2 on ?, as x?1 = expq t1X? and x?2 = expq t1X?. Obviously, the L2 mean of{x?1,x?2}with weight vector w2 = [w,1?w]T is ?x? = expq parenleftbigwt1 + (1?w)t2parenrightbigX?. It is a standard matter to show that (e.g. [66]) d(expq tX,expq tX?) = tbardblX?X?bardblqparenleftbig1?Kq(X,X ?) 12 (1 +?X,X ??q)t2parenrightbig+o(t), (4.10) where Kq(X,X?) is the sectional curvature of M at q along the plane spanned by unit norm tangent vectors X,X? ?TqM and o(t)t3 ?0 as t?0. If Kq(X,X?) > 0, then tmapsto?h(t) = d(expq tX,expq tX?) is a strictly concave function on the interval (0,?) for small enough ?. Therefore, for 0 < t1 < t2 < ? and 0 < w < 1 we have wh(t1) + (1?w)h(t2) < h(wt1 + (1?w)t2). This means that d(?x, ?x?) > wd(x1,x?1)+(1?w)d(x2,x?2), which implies that the L2 mean for two points cannot 1Tacitly we assume that the sectional curvature is not identically zero. 74 be strongly averaging. By choosing repeated points we can conclude that the L2 mean for more than two points also cannot be strongly averaging. Note that for small enough ?, tmapsto?d(expq tX,expq tX?) is strictly increasing on the interval [0,?]. Therefore, d(?x, ?x?) < d(x2,x?2) for small t2. This suggests that although the L2 mean is not strongly averaging it might be averaging in small domains. Next, we show that this is not possible if the sectional curvatures of M are bounded below by ? > 0 and above by ?. First, we consider the constant curvature case and specifically the unit sphere (in R3). Figure 4.2 explains the situation. Consider two geodesics (great circles) t mapsto??(t) = expq1 tX and t mapsto??(t) = expq1 tX? with unit velocities X and X?, respectively, that leave q1 at angle ?. The two geodesics meet again at q2, the antipodal of q1. As before we consider the function tmapsto?d(expq1 tX,expq1 tX?). The interesting point is that this function achieves a maximum at ?t = ?2 which corresponds to midpoints ?x = expq1 ?tX and ?x? = expq1 ?tX?. The two geodesics deviate from each other for 0 < t < ?t and become closer for ?t < t??. Now, if we choose x1 = expq1(?t?wl)X and x2 = expq1(?t?(1?w)l)X, then the L2 mean of {x1,x2}with weight vector w2 = [w,1?w]T is ?x for 0 < w < 1 and all 0?l??t. On ? we choose x?1 = expq1(?t?wl)X? and x?2 = expq1(?t?(1?w)l)X? whose L2 mean is ?x?. Obviously, d(?x, ?x?) > maxi d(xi,x?i). This means that the L2 mean of two points in a ball which contains our data points cannot be averaging. The same argument by choosing repeated points that coincide with x1 and by varying w implies that the L2 mean cannot be averaging for more than two points, as well. 
Note that by choosing ? and l small enough we see that the L2 mean cannot be averaging even in arbitrary small balls on S2. 75 ?x? o ? x?2 x?1 x1 x2 ? q2 ? XX? ? q1 ?x Figure 4.2: Top view for the upper unit hemisphere in R3. The pairs of points x1,x2 and x?1,x?2 lie on two geodesics separated by angle ? and connecting the two poles q1 and q2, as shown. ?x and ?x? are the midpoints of the two geodesics. x1 and x2 (x?1 and x?2) are mirror images with respect to ?x(?x?). For all values of ? > 0 and no matter how close x1 and x2 (and x?1 and x?2) to each other are, we have that d(?x, ?x?) > d(xi,x?i), i = 1,2. Therefore, the L2 mean on the sphere cannot be averaging, even in very small balls. In M, we shall use a more general argument using the properties of Jacobi fields to show that one can always find data points{x1,x2}and their perturbations {x?1,x?2} for which the L2 mean is not averaging. Consider q1 ?M and the family of geodesics tmapsto?f(t,s) = expq1 t(X + sY) where X ?Tq1M and Y ?TXTq1M 1 . Denote the geodesic tmapsto?f(t,0) by ?(t). In the following we denote all derivatives, either the Levi-Civita covariant derivative of a vector field or the standard derivative of function, with respect to t by ?. The covariant derivatives of the related vector fields are taken along t mapsto? ?(t). Assume that bardbl??(0)bardbl = bardblXbardbl = 1. The distance between points f(t,s) and f(t,0) can be written as: d(expq1 tX,expq1 t(X +sY)) =bardblJ(t,0)bardbls+o(t,s), (4.11) 1Here, to avoid confusion and for convenience we shall departure from our original notation on Jacobi fields introduced in Section 2.4 and interchange the role of the t and s variables. 76 where tmapsto?J(t,0) = J0(t) = J(t) is the Jacobi field along tmapsto?expq1 tX with initial conditions J(0) = 0 and ?J(0) = Y; and lims?0+ o(t,s)s2 = 0. We know that there exists an orthonormal basis of parallel vector fields{??(t),E1(t),...,En?1(t)}along ?. Recall, that a vector field is parallel along the geodesic tmapsto??(t), if its covariant derivative along ? is identically zero. We take the two dimensional section along ? spanned by{??(t),E1(t)}, and denote the sectional curvature of M along this section by K(??(t),E1(t)) = K(t). Note that 0 < ? ? K(t) ? ?. Consider a Jacobi field J(t) with J(0) = 0 along this section, which is orthogonal to ??(t) along ?(t): J(t) = g(t)E1(t), (4.12) where g(0) = 0 , ?g(0)negationslash= 0 and g(t) : R?R satisfies ?g(t) +K(t)g(t) = 0. (4.13) This equation is derived from the Jacobi equation (2.3); and ?J(0) = ?g(0)E1(0) = Y. The latter is the only restriction that we impose on Y introduced before. Due to linearity of (4.13) we can assume that ?g1(0) > 0. It is a well-known consequence of Sturm?s comparison theorems for ordinary differential equations (e.g. [24, pp.210- 211]) that ?g(0)sin ??t ?? ?g(t)? ?g(0)sin ??t ?? (4.14) for t > 0. In the above, each inequality is valid as long as its left side remains positive. It is easy to see that at some time ??? ?tc ? ???, g(tc) = 0 or J(tc) = 0; 77 but for t ? (0,tc) we have g(t) > 0. Note that q2 = ?(tc), is a conjugate point of q1 along ?, but not necessarily the first conjugate point. Since g(t) > 0 on the interval (0,tc) we have bardblJ(t)bardbl = g(t); and since g(0) = g(tc) = 0 there should be a time 0 < ?t < tc at which bardblJ(t)bardblachieves its maximum on the interval [0,tc]. At ?t, we have ?g(?t) = 0 and from (4.13) we have ?g(t) =?K(t)g(t) < 0 in the interval (0,tc) i.e., g is strictly concave in that interval. 
Therefore, ?t is the only maximizer of bardblJ(t)bardbl = g(t) on [0,tc]. The existence of such a ?t is the main ingredient we needed. We proceed very similarly to the sphere case now. Let ?x = expq1 ?tX and ?x? = expq1 ?t(X+sY) and choose t1,t2?(0,tc)\{?t}such that t1 < ?t < t2. We choose two points x1 = expq1 t1X and x1 = expq1 t2X on tmapsto?expq1 tX. Similarly, we choose x?1 = expq1 t1(X + sY ) and x?2 = expq1 t2(X + sY) on tmapsto?expq1 t(X + sY ) and on both sides of ?x?. Let w = ?t?t1t2?t1; we have 0 < w < 1. Now, ?x is the L2 mean of {x1,x2} with the weight vector w = [w,1?w]T and ?x? is the L2 mean of{x?1,x?2} with the same weight vector w. Also from (4.11) we conclude that for small enough s which depends on t1 and t2 d(expq1 ?tX,expq1 ?t(X +sY )) > d(expq1 tiX,expq1 ti(X +sY)) (4.15) for i = 1,2. Therefore, we have: d(?x, ?x?) > max i=1,2 d(xi,x?i). (4.16) Note that this result holds even for t1 and t2 very close to each other and for small 78 s. Therefore, the L2 mean of two points is not averaging even in small domains. This argument, by using repetitive points and changing the weight vector, can be extended to more than two points. To summarize we state the following: Proposition 4.6.1. The L2 mean is not strongly averaging in a manifold of non- negative sectional curvature (excluding zero curvature). It also is not averaging in a manifold with positive lower sectional curvature bound. Note that the sectional curvature of a 1-dimensional manifold is zero and the above excludes the case of unit circle in R2, for example. A few words are due at this stage. First, note that we have not ruled out the possibility of finding a set of points and their perturbations whose L2 means satisfy the averaging or strong averaging properties: we just showed that (4.7) and (4.8) cannot hold for arbitrary sets of points in a manifold of positive curvature. Second, we emphasized on the lack of averaging property even in small domains, since one might suspect that this non- averaging behavior is a global phenomenon. On the other hand, as we show next, one can achieve weak averaging by restricting the data points to a small enough domain. 4.6.2 Weak Averaging Property of the L2 Mean in Positively Curved Manifolds: An Example In manifolds of positive curvature, one can attain weak averaging in small domains. First, we give an example which shows that the L2 mean can not be weakly averaging 79 in large domains. Consider the unit sphere S2 in R3. Let x1 and x2 belong to the upper hemisphere and assume that they are very close to the boundary of the hemisphere in almost antipodal positions as shown in Figure 4.3. In this figure x1 is on the yz plane and x2 is in a small neighborhood of the plane and close to the xy plane. The z axis is normal to the page and upward pointing. ?x is the midpoint of geodesic segment connecting x1 and x2. Now if x2 gets reflected to the left of the yz plane to a new point x?2, which is still close to x2, the midpoint of the new geodesic segment between x1 and x2 will be ?x? which will be very far from ?x. This means that L2 mean on the sphere is not weakly averaging in a ?large? domain such as a hemisphere. This is a global phenomenon; and stems from the fact that x1 and x2 are close to the cut locus of each other. Recall that the lack of strong averaging property on the sphere was due to the local effect of curvature. y x1 x?2 x2 d(x?2,x2) d(?x?,?x) ?x?x? x Figure 4.3: Top view of the upper unit hemisphere in R3. ?x and ?x? 
are the midpoints of the geodesics connecting x1 to x2 and x1 to x?2, respectively. We have d(?x, ?x?)? d(x?2,x2) while x2 and x?2 are very close to each other. Therefore, sadly, the L2 mean is not even weakly averaging in the hemisphere! We try to quantify this phenomenon. Let x1 ?M and X2,X?2?Tx1M be two tangent vectors at x1 such that x2 = expx1(X2) and x?2 = expx1(X?2). Assume that 80 the points belong to a ball B(o,?) with ? < rcx. The midpoints of the geodesic segments joining x1 and x2 and joining x1 and x?2 are ?x = expx1(X22 ) and ?x? = expx1(X?22 ), respectively. If M is of non-negative curvature with curvature upper bound ?, then using (2.19) twice gives d(?x, ?x?) ? 12 ? 2? ?? sin2??? ?d(x2,x ? 2). Applying the hinge version of the C.A.T comparison theorem with ? = 0, i.e., comparing with a triangle in Rm shows that 12d(x2,x?2) ? d(?x,?x?); and together with the previous relation we conclude 1 2d(x2,x ? 2)?d(?x, ?x ?)? 1 2? 2??? sin2????d(x2,x ? 2). (4.17) If M is of non-positive curvature with lower curvature bound of ? < 0, applying (2.20) yields 12 ? 2? ? |?| sinh2? ? |?|d(x2,x ? 2) ? d(?x, ?x ?). If we apply the hinge version of the C.A.T comparison theorem with curvature upper bound of ? = 0 and use the previous relation we have 1 2? 2?radicalbig|?| sinh2?radicalbig|?|?d(x2,x ? 2)?d(?x, ?x ?)? 1 2d(x2,x ? 2). (4.18) Therefore, in a manifold of negative curvature the equal weight L2 mean (of two points) shows better sensitivity behavior than the L2 mean in the Euclidean space. However, in a manifold of positive curvature we need to restrict the data to small domains in order to guarantee a weakly averaging behavior of the L2 mean. For example, if we require ? < min{injM2 , 0.3???}, (4.19) 81 then d(?x, ?x?) < d(x2,x?2). Note that this bound is more restrictive than ??,2 < rcx but better than Karcher?s bound (3.4). 4.6.3 A General Bound for Sensitivity Here, we show that the L2 mean in a manifold of nonpositive curvature is always strongly averaging, and in a manifold of positive curvature it can be only weakly averaging inside small enough balls as described below. The main result that we use is (4.24) which is essentially Corollary 1.6 in [50]. However, there the underlying constants are not given explicitly, while we would like to have them explicitly. To derive (4.24) we need the following result which is given as Theorem 1.5.1 in [50]: Let {xi}Ni=1 ?B(o,?) with ? < 12 min{injM, ?2??}. For ?x2 the L2 mean of {xi}Ni=1 (with respect to a weight vector w), and any x?B(o,?) we have bardbl?f2(x)bardbl?d(x, ?x2)?min{1,2?cs?2?sn ?2? }= d(x, ?x2)? ? ??? ??? 1 ??0 2????cot(2???) ? > 0 (4.20) where sn? and cs? are defined in (2.6). Consider another set of points, i.e., the perturbations of xi?s, as {x?i}Ni=1 ? B(o,?). We denote the L2 mean of {x?i}Ni=1 by ?x?2. Since the weights in the definition of f2 are fixed and only xi?s change, we use f2(x;x1,...,xN) to show the dependence of f2 on xi?s. Now, for x = ?x?2 in (4.20) we have d(?x,?x?)?bardbl?f2(?x?;x1,...,xN)bardbl? ? ??? ??? 1 ??0 1 2????cot(2???) ? > 0 . (4.21) 82 Recall the definition of?f2(x;x1,...,xN) and the fact that?f2(?x?;x?1,...,x?N) = 0; so we have ?f2(?x?;x1,...,xN) = ?f2(?x?;x1,...,xN)??f2(?x?;x?1,...,x?N) = Nsummationdisplay i=1 wiparenleftbig?exp?1?x? xi + exp?1?x? x?iparenrightbig. (4.22) From (2.19) and (2.20) we have bardblexp?1?x? xi?exp?1?x? x?ibardbl?d(?xi,?x?i)? ? ??? ??? 1 ??0 2??? sin(2???) ? > 0 . (4.23) Recall that the condition needed to use (2.19) and (2.20) is ? < rcx which holds here. 
Now, (4.21), (4.22) and (4.23) yield: d(?x,?x?)? ? ??? ??? 1 ??0 1 cos(2???) ? > 0 ? ??? ???? Nsummationdisplay i=1 wid(xi,x?i). (4.24) Note that for this bound we need ? to satisfy (3.4), which is the condition we assumed in (4.20). We state the following theorem: Theorem 4.6.1. The L2 mean of N points is strongly averaging in a manifold of nonpositive curvature. In a manifold with upper curvature bound of ? > 0, the L2 mean is weakly averaging in a ball of radius ? if ? satisfies ? < 12 min{injM, cos ?1(maxi wi) ?? }. (4.25) 83 Proof. From (4.24) we see that L2 mean is strongly averaging in a manifold of nonpositive curvature. By restricting ? we can achieve weak averaging in B(o,?): if cos2????maxi wi, then d(?x,?x?)?d(xj,x?j), where in the data set {xi}Ni=1, xj is replaced by x?j. To satisfy the mentioned restriction ? must obey (4.25). Remark 4.6.1. In the case of equal weights (4.25) reads as ? < 12 min{injM, cos ?1( 1 N)? ? }. (4.26) Note as N increases the upper bound converges to 12 min{injM, ?2??}, which is Karcher?s bound (3.4). The bound (4.26) is more conservative than the bound we found in (4.19) for N = 2. The reason is that (4.20) requires ? < ?4??. This is exactly the same restriction that we could surpass by using the method of Buss- Fillmore in Section 3.2. A look at the proof of (4.20) in [50] shows that in order to improve (4.20), i.e., to relate the gradient of f2(x) to d(x,?x) for larger values of ? we need to find a better lower bound for Hessf2, which seems to be more complicated. The example in the previous subsection, shows that in the case N = 2 the worst displacement of the mean by displacing only one of the points happens when the two data points are close to the cut points of each other, i.e., where the exponential map is close to becoming singular. In the case of equal weights, as (4.26) suggests, one expects that as N increases the restriction on ? to become looser. Therefore, we state the following conjecture which we already proved for N = 2: Conjecture 4.6.1. The L2 mean of N points with equal weights in B(o,?)?M is weakly averaging if ? < min{injM2 , 0.3???}. 84 4.6.3.1 A Lipschitz Property The estimate in (4.24) can be used to establish the Lipschitz continuity of the L2 mean as a function from MN = M?...?MN to M. Recall the product structure of MN and denote the induced distance function by dMN(.,.). We also use the convention of denoting (x1,...,xN) ? MN by x ? MN. As before, the L2 mean of {xi}Ni=1 with weight vector w is denoted by ?w,B(x). The following proposition follows easily from (4.24): Proposition 4.6.2. If ? < 12 min{injM, ?2??}, then ?w,B : MN ?M is a Lipschitz function with Lipschitz constant of L = 1cos2???(summationtexti w2i)12, more precisely we have d(?w,B(x),?w,B(y))?LdMN(x,y) (4.27) for any x,y ? MN, such that {xi}Ni=1 and {yi}Ni=1 lie in B(o,?). Here, ? ? 0 is treated as ? = 0. Note that for equal weights and if M is of nonpositive curvature the Lipschitz constant is L = 1?N, and if M is of positive curvature the Lipschitz constant is L > 1? N; which again is a testimony of higher sensitivity in the case of positively curved manifolds. We shall see some consequences of this fact in our further developments in Section 5.4. 4.7 Computing the Lp Means and an Example for Points on a Sphere In this section we consider the gradient descent method for finding the Lp means. 85 4.7.1 Gradient Descent Flow for Finding ?xp (1 < p 0 such thatbardblf(x)bardbl?abardblxbardbl+b for all x?Rn [20, p. 178]. 3. 
has a global and unique solution if f is locally Lipschitz and any solution starting in the compact set ?D?Rn stays in ?D for entire time [53, p. 94]. This result carries over to a local coordinate neighborhood in M such as B(o,?). Now, note that for 2?p 0. 2. Set ?x0p to an arbitrary point in B(o,?). 3. Find ??fp(?x0p) from (3.6) 4. Whilebardbl?fp(?xk)bardbl> ? (a) Set ?xk+1p = exp?xkp parenleftbig?h?fp(?xkp)parenrightbig and find hk = argminhfp(?xk+1p ) such that ?xk+1p ?B(o,??,p). (b) Find ?xk+1p = exp?xkp parenleftbig?hk?fp(?xkp)parenrightbig (c) Find??fp(?xk+1p ) from (3.6) (d) k?k + 1 5. Set ?xp = ?xkp. Table 4.1: Pseudo-code of a gradient descent algorithm for finding the Lp mean in M. This algorithm is idealistic but it is guaranteed to converge to the mean. 4.7.3 An Example for Points on the Sphere in R3 In this example, we find the equal weight Lp mean of N = 4 points {xi}Ni=1 which lie on the upper left part of the unit hemisphere in R3. The sphere S2 is endowed with the Riemannian metric induced from the standard metric of R3. In this metric the distance between x,y?S2?R3 is d(x,y) = cos?1?x,y?, (4.30) 89 where ?.,.?is the standard inner product in R3. We implement a gradient descent algorithm with fixed step-size hk = 1. The exponential map corresponding to this metric is calculated as follows [17]. We describe expq : TqS2?S2 at the north pole q = bracketleftbigg 00 1 bracketrightbigg . At other points it is found by a simple rotation with respect to the north pole. Identify TqS2 with R2, so Z = bracketleftbigZ1Z2 bracketrightbig is a tangent vector at q. Then z = expq Z = bracketleftbigg Z1 sinrr Z2 sinrr cosr bracketrightbigg , (4.31) where r =bardblZbardbl=?Z1 +Z2 and sin00 = 1. We require r < ? which is the injectivity radius of S2. The inverse of expq can be found easily. If z = bracketleftbigg z1z 2z 3 bracketrightbigg ?S2?R3 is not antipodal to q then Z = bracketleftbigZ1Z2 bracketrightbig = exp?1q z = rsinrbracketleftbigz1z2 bracketrightbig, (4.32) where r = d(z,q) = cos?1 z3. We denote by ?i and ?i the angles that xi?R3 makes with the z axis and the angle that the projection of xi on the xy plane makes with the x axis, respectively. They are the standard polar angles. Then xi = bracketleftbiggsin? i cos?i sin?i sin?i cos?i bracketrightbigg ; (4.33) 90 and we choose the following pairs of (?i,?i) (?1,?1) = (0,0) (?2,?2) = (2?10,0) (?3,?3) = (?5, ?4) (?4,?4) = (2?5 , ?2). (4.34) Figure 4.4 shows the four points and their Lp means for p = 1.1,2,3,10. We used ?1 ?0.5 0 0.5 1 ?1 ?0.5 0 0.5 1 0 0.2 0.4 0.6 0.8 1 x x4 x3 x1 x2 y z x1,...,x4 x1.1 x2 x3 x10 Figure 4.4: An example for equal weight Lp means of N = 4 points on the unit hemi- sphere for p = 1.1,2,3,10. The means are found by a gradient descent algorithm. As p increases the mean moves towards the L? center of{xi}4i=1. For this data set three points x1,x2 and x4 lie on the boundary of the minimal ball of{xi}4i=1. the gradient descent algorithm in Table 4.1 with fixed step-size hk = 1. It is clear as p increases ?xp moves towards x4: d(x4,?x1.1) = 0.316? while d(x4, ?x2) = 0.268? and d(x4, ?x10) = 0.220?. This demonstrates the fact that as p increases, the outliers contribute more in determining the mean. Unfortunately, finding the Lp mean for larger values of p becomes numerically very sensitive, since it amounts to computing 91 large powers of d(x,xi) (see (3.6) for the formula of the gradient of fp) which causes loss of numeric precision very fast. 
For example, in finding ?x20, with two different initial conditions we got two answers that are apart by 3.6?, while for p = 10 we got two answers that are apart by only 8.5377?10?7 degrees. Nevertheless, by approximating ?x20 we see that d(x1, ?x20)?d(x2, ?x20)?d(x4, ?x20)?0.212?. Recall that ?xp??x? as p increases. In fact, we have d(?x1.1, ?x20)?17.5? while d(?x10, ?x20)? 4.8?. For our configuration of points, from Theorem 4.2.1 we conclude that x1,x2 and x4 must lie on the boundary of the minimal ball of{xi}4i=1 and the center of the ball is located at approximate polar coordinates of (??,??) = (0.220?,0.337?). Hence, we see that the points belong to a ball of radius smaller than ?4 which Theorem 3.2.1 requires for uniqueness of ?x1.1. In running the algorithm for small p (1 < p < 5), we did not observe divergence or iterates leaving the hemisphere. For p = 10 and for some initial conditions the iterate leaves the upper hemisphere but it returns to it and finally converges to ?xp. Overall, this is rather remarkable since we have used a fixed step-size of hk = 1. In [17] it is reported that no divergence has been observed for a gradient descent algorithm with fixed step-size of 1 for p = 2 on the unit sphere. In [61] it is shown that, in particular, in S2 and for p = 2, if ? < 310rcx, then the gradient descent algorithm with fixed step size of hk = 1 converges to ?x2. 4.8 Appendix: Some Facts About Convex Sets in Euclidean Space A reference for the following material is [82]. Let A be a subset of the vector space Rn which is endowed with the standard inner product?.,.?. We say that A is convex if for all v1,v2 ?A and 0???1 we have ?v1 + (1??)v2 ?A. Note that if the 92 previous inclusion holds for one 0 < ? < 1 it holds for all ??[0,1]. We say that the convex set A is of dimension r if the smallest affine subspace of Rn which contains A is of dimension r. We denote this subspace bySA. A is closed (open) if it is closed (open) in SA. By the relative interior (relative boundary) or simply the interior (boundary) of A we mean the interior (boundary) of A as a subset of SA. If the dimension of A is r we can always embed A in an affine subspace of Rn of dimension r < n. A face of a closed convex set A is a convex subset F ? A such that for ?v1,v2 ? C and a 0 < ? < 1, ?v1 + (1??)v2 ? F implies v1,v2 ? F. The only n-dimensional face of an n-dimensional closed set A is A itself. The rest of the faces are of lower dimensions. Any face F negationslash= A lies on the boundary of A. Any two distinct faces of A have disjoint interiors and A can be decomposed to disjoint unions of the interiors of its faces from dimension k = 0 to k = n. A hyperplane Px passing through x?A is called a supporting hyperplane of A if A lies completely in one side of Px. Alternatively, this means that a normal vector of Px, like xx, exists such that ?y?x,nx??0 for all y ?A, or equivalently y?x makes an angle not larger than ?2 with nx. Any convex set has a supporting hyperplane at any of its boundary points. A convex cone C?Rn is a convex set for which x?C implies tx?C for all t?0. Note that the origin belongs to C. The closure of C is a convex cone again denoted by ?C. The sum of a point on the boundary and a point in the interior of C belongs to the interior of C. F ?C is a face of C if and only if summationtextNi=1 wivi ?F with wi > 0 and summationtextNi=1 wi = 1 imply vi?F for 1?i?N. 
93 Chapter 5 Recursive-Iterative Approach in Defining Means 5.1 Introduction The idea of minimizing fp (see (3.2)) in order to define the center of mass or mean for a set of points on a Riemannian manifold is obviously based on our intuition about the properties of the standard Euclidean center of mass. As we explained in Subsection 4.6, in the case of a manifold of positive curvature, some rather counter- intuitive situations can happen which question our reliance on the properties of the Euclidean mean. Nevertheless, the standard Euclidean or arithmetic mean has also some other properties, which might be used to define other kinds of means. One of them is the property that the Euclidean arithmetic mean of e.g., three points is equal to the arithmetic mean of the pairwise means of the three points; and if we repeat this process, i.e., replace the points by their pairwise means, then the newly generated points will eventually converge to the arithmetic mean of the initial points. One interpretation is that all the intermediate sets of points generated in this process have the same mean, i.e., the mean remains invariant under the pairwise mean replacement. We call this the Mean-Invariance (MI) property of the arithmetic 94 mean. In this chapter we want to define new means based on similar ideas or their generalizations, i.e., to construct the mean of N points through recursion and iteration based on the mean of K < N points. Here, we consider recursion based definitions as well as pairwise definitions. One can mix the given data points and produce convex combinations via pairwise or multi-point interactions in order to generate the mean of the data points. In most occasions we generate a sequence of points which lie in the closure of the convex hull and in the of minimal ball of the data points, and eventually converge to a single point, i.e., the mean of the data points. The analysis of these iterative schemes is very interesting and we develop powerful tools for that. As explained in Section 4.5, the common property of all the Lp Riemannian means for different values of p is that each of them assigns a point in the closure of convex of hull of a given set of data points. However, the actual form of this assignment depends on p. The recursive-iterative approach also results in similar assignments. The process to define an iterative mean for N points on a manifold M induces a very interesting dynamical system on MN. In all these dynamics the diagonal of MN is an invariant set, and we would like the orbit of a point in MN to converge to the diagonal of MN. If the data points belong to a small enough strongly convex ball or a suitable strongly convex set this type of convergence happens. However, if the data points are not close enough, a periodic behavior might occur. As an example which shows this periodic behavior, we investigate a Perimeter Shrinkage (PS) scheme whose invariant sets are equidistance points on closed or periodic geodesics in M. 95 5.1.1 Contributions and Outline of the Chapter The main contribution of this chapter is introduction of the notions of convex, strictly convex and primitive vectorial mean functions (see Definitions 5.2.2 and 5.2.3), which generalize the notions of stochastic, positive stochastic and primitive stochastic matrices to Riemannian manifolds, respectively. 
Theorem 5.2.2 is a more general version of the Perron-Frobenius Theorem about the infinite power of a prim- itive matrix and enables us to prove our existence and uniqueness theorems for a large class of recursive-iterative means. We use the size or radius of the minimal ball of the iterates as a (so-called) Lyapunov function to prove Theorem 5.2.2. Section 5.2 is devoted to the development of the theory of primitive convex mean functions and their properties. After developing all required results we digress little, at the end of the section, and consider the dynamical systems aspect of the iterative ap- plications of a primitive vectorial mean function and relations to infinite products of stochastic matrices in Subsections 5.2.5 and 5.2.6, respectively. We study two subclasses of recursive-iterative means: the pairwise mean and Mean-Invariance (MI) based means1. We introduce the notion of pairwise mean in conjunction with the Primitive Shrinkage (PS) scheme in Section 5.3. As another contribution in Subsections 5.3.2 and 5.3.3 we study the problem of cyclic pursuit on a manifold and identify the equidistance points on closed geodesics as its invariant set (see Theorem 5.3.2 for details). The results in Section 5.3 are not entirely new by themselves. The PS scheme, in a rather different formulation, has been known 1The pairwise mean is also based on the notion of mean-invariance; however, we prefer to distinguish the two concepts since we use different tools to analyze them. 96 to geometers interested in studying the existence of closed geodesics as the Birkhoff Curve Shortening Scheme. In the course of this research some of the properties of the Birkhoff Curve Shortening were re-discovered. Nevertheless, to our knowledge the results of Theorem 5.3.2 are new in the context of cyclic pursuit schemes. In Section 5.4 we introduce the MI means and their generalizations, and investigate some of their properties. The main idea of building the class of means which we call the Mean Invariance (MI) based means (see Section 5.4) was given by Ando, Li and Mathias in [5]. They defined such means on the cone of positive (semi) definite matrices to define geometric for such matrices (see also [77]). Later Lawson and Lim in [59] extended this definition to manifolds of nonpositive curvature and specifically to Hadamard manifolds. However, the tools and techniques used in [59] do not allow for extension to the case of positive curvature manifolds 1. Despite the generalities of our approach based on primitive mean functions, there is a special case studied in Subsection 5.4.2.2, where we could define the MI means in domains which are not Riemannian balls, necessarily. We do this by an extra assumption and departure from resorting to the radius of the minimal to prove the convergence. Finally, in Section 5.5 we give some numerical examples and provide related discussions. 1To be accurate, in order to define mean-invariance based means, Theorem 3.14 in [59] requires the base case (see Definitions 5.4.1, 5.4.2 and 5.4.3) to have a property which the authors call nonexpansiveness. This property is the same as what we called averaging property in Definition 4.6.1; and as we showed in Section 4.6.1 the L2 mean lacks this property in manifolds of positive curvature (even locally). For this reason and few others, Lawson and Lim?s theory cannot be employed to yield results as general as ours in Section 5.4. 
97 5.2 Convex Mean Functions and Their Finite and Infinite Compositions Our ultimate goal is to build the mean of N > 2 data points based on the mean of N ?1 or less number of points in a recursive and iterative process. To have a unified and coherent theory it is useful to introduce the notions of convex and strictly convex mean functions and their compositions. Definition 5.2.1. A convex mean or averaging function of order N ?2 in M is a function ? : MN ?M?...?M ?M that assigns to an N-tuple (x1,...,xN)?MN a point ?(x1,...,xN) called the ?-mean of (x1,...,xN) such that 1. ?(x1,...,xN) belongs to the closure of the convex hull of{xi}Ni=1, 2. there exist a natural number K ? N and indices 1 ? i1 < ... < iK ? N such that ?(x1,...,xN) belongs to the interior of any closed strongly con- vex ball containing {xi}Ni=1, unless when xi1 = ... = xiK = x in which case ?(x1,...,xN) = x, 3. ?(x1,...,xN) is continuous in its arguments. If in the above, item 2 is replaced by 2?. ?(x1,...,xN) belongs to the interior of any closed strongly convex ball con- taining{xi}Ni=1, unless when x1 = ... = xN = x, in which case ?(x,...,x) = x, then we call ? a strictly convex mean function of order N. 98 In many occasions we denote an element (x1,...,xN)?MN by the boldface letter x. Then we write x = (x1,...,xN) and ?(x) will be interpreted accordingly. Unless otherwise stated, we consider the domain of ? to be of the form D? ={(x1,...,xN)?MN|?o?M and ????,{xi}Ni=1?B(o,?)}, (5.1) where ?? < rcx is a small enough real number such that ? is well defined for all x?D?. A convex mean function is permutation invariant if for every x?D? we have ?(x?(1),...,x?(N)) = ?(x1,...,xN), where ? is any permutation of the numbers 1,...,N. Figure 5.1 shows the difference between a convex and strictly convex mean function. In the left panel, ? is a convex mean function of order N = 6. The set {xi}Ni=1 lies in the closed strongly convex ball shown, and the data points x1,x2 and x3 coincide and lie on the boundary of the ball containing, while the rest of the data points lie inside the ball. The mean lies on the boundary of the ball and ?(x1,...,x6) = x1 = x2 = x3. In the left panel, ? is a strictly convex mean function, and for the same set of data points, the mean lies inside the ball, even though some of the data coincide and lie on the boundary of the ball. Example 5.2.1 (Lp Means as Convex Mean Functions). In Theorem 3.2.1, we showed that the Lp mean of{xi}Ni=1 (with positive weights) for 1 < p??belongs to the interior of any closed ball containing {xi}Ni=1. Moreover, in Theorem 4.5.1 we showed that the mean also belongs to the closure of the convex hull of{xi}Ni=1. Therefore, the Lp mean is a strictly convex mean function of (x1,...,xN). In the 99 ?(x) x4 x5 x6 x4 x5 x6 x1 = x2 = x3 ? is a convex mean function ? is a strictly convex mean functionn x1 = x2 = x3 = ?(x) Figure 5.1: Left: A convex mean function. Right: A strictly convex mean function. case of equal weights it is also permutation invariant. The Lp(1 < p??) mean of {xi}Ni=1, if some of the points appear with zero weight, is only a convex mean function of (x1,...,xN). On the other hand, the L1 mean of{xi}Ni=1 is not a strictly convex mean function of (x1,...,xN), since the L1 mean of {xi}Ni=1 does not necessarily belong to the interior of the minimal ball of{xi}Ni=1; but, the L1 mean is a convex mean function of (x1,...,xN). 
In the definition of a strictly convex mean function, a more natural requirement could have been to require ?(x1,...,xN) to belong to the interior of the convex hull of{xi}Ni=1. However, this ?more natural requirement? can be restrictive and bring about difficulties for two technical reasons. First, as explained in the previous example and we discussed before in Section 4.5, the Lp(1 < p ??) mean does not satisfy this more natural requirement or at least we do not know whether it does or not. Second, our ultimate goal is to study infinite compositions of convex mean functions. Since the shape of the convex hull of a finite set of points in a non-constant curvature manifold M is not well understood, it would be difficult to define, in a sound way, the notion of reduction of the size of the convex of hull of the points in order prove convergence results. On the other hand, working with 100 balls, specifically the minimal ball, is quite straightforward. Also note that the requirement of ?(x1,...,xN) belonging to the interior of the convex hull of{xi}Ni=1 is less general than the items 1 and 2 (or 2? ) of Definition 5.2.1 combined together. Therefore, in summary, by having the two items 1 and 2 (or 2?) in Definition 5.2.1 we can develop a coherent theory, avoid unnecessary technicalities and consider a large class of means. The following proposition will give us significant notational convenience: Proposition 5.2.1. Let ?1 be a strictly convex mean function of order N1?2 and let N2 > N1 be another natural number. Then ?2 : MN2 ?M, the extension of ?1 to MN2, defined as ?2(x1,...,xN1,...,xN2) = ?1(x1,...,xN1) (5.2) is a convex (but not a strictly) mean function of order N2. Proof. The claim follows from the observation that the closure of the convex hull of {xi}N1i=1 is a subset of the closure of the convex hull of{xi}N2i=1 and that any closed ball that contains{xi}N2i=1 contains{xi}N1i=1, too. 5.2.1 Finite Compositions of Strictly Convex Mean Functions We are interested in compositions of mean functions to construct higher order mean functions from lower order ones. The simplest approach is to consider finite com- positions of strictly convex mean function in the sense described in the following theorem (whose proof is obvious): 101 Theorem 5.2.1. Let ?(x1,...,xN) be a strictly convex mean function. If one or moreof the input arguments arereplaced by strictly convex functions of (x1,...,xN), then the resultant function is still a strictly convex mean function of (x1,...,xN). Next, we consider a more tangible construction: Example 5.2.2 (Geodesic Averaging). Let ??(x1,x2) for 0 < ? < 1 be a second order strictly convex mean function that assigns to x1 and x2, a point on the unique minimizing geodesic connecting x1 to x2 at distance ?d(x1,x2) from x1. Then we define ?3(x1,x2,x3) = ?2 3 (?1 2 (x1,x2),x3). (5.3) It is easy to see that ?3(x1,x2,x3) is a third order strongly convex mean. Note that in Rn, ?3(x1,x2,x3) = 13(x1 + x2 + x3) which is the Euclidean L2 average of x1,x2 and x3 (with equal weights). However, in a general Riemannian manifold, ?3(x1,x2,x3) is not necessarily the same as the L2 mean of xi?s with equal weights. More disturbing is that ?3(.,.,.) is not necessarily a permutation-invariant strictly convex mean function either. We can factorize any convex combination summationtextNi=1 wixi to a series of nested pairwise convex combinations and translate that to a series of N nested applications of the geodesic averaging ??i(.,.)?s (0 < ?i < 1 and 1?i?N) defined above. 
A similar idea has been used in [88] to define a linear combination of N > 2 points on a manifold based on pairwise linear or more precisely geodesic combinations without convexity constraints. One can show that ?N defined as ?N(x1,...,xN) = ??1parenleftbigx1,??2(x2,??3(...,??N?1(xN?1,xN)))parenrightbig, (5.4) 102 is a strictly convex mean function of order N. Besides not being permutation invariant another valid criticism about the mean functions built in this example is that in their construction very little infor- mation about the global geometry of M has been used. As a concrete example, let N = 3 and {xi}Ni=1 belong to the unit sphere S2 in R3. Assume that the points are in an open hemisphere. One can imagine another manifold M embedded in R3 such {xi}3i=1 is in M, the geodesic segments connecting x2 and x3 in S2 and in M coincide; and the geodesic segments connecting ??2(x2,x3) in S2 and M also coin- cide. In this situation ?3 will assign, in both manifolds, the same point as the mean. However, the corresponding triangle?x1x2x3 in S2 and M can have very different shapes. In this sense the above mean functions do not involve much ?search? or ?sweeping? as compared to, e.g., the L2 Riemannian mean which seeks the mean globally. On the other hand the nested pairwise mean function in (5.4) is certainly much easier to calculate. In the sequel, we extend the pairwise mean calculations in two directions by infinite iterative processes. In one direction, we construct a mean which in not permutation invariant for K > 3 points, however, it includes more search or sweeping of the manifold. In other direction, again via an infinite iterative process, we define a large class of permutation invariant means of higher orders based on means of lower orders. For that we need to introduce the idea of infinite combinations of strictly convex mean functions. 103 5.2.2 Infinite Compositions of Convex Vectorial Mean Functions Now, we introduce the notions of convex and strictly convex vectorial mean functions from MN to MN, which can be considered as generalizations of the notions of stochastic and positive stochastic matrices, respectively. Definition 5.2.2. A (strictly) convex vectorial mean function of order N in M is a function ? : MN ?MN defined as ?(x) = (?1(x),...,?N(x))?MN, (5.5) where each ?i : MN ? M is a (strictly) convex mean function of order N. We call ?i the ith component of ?. The composition of two or more convex vectorial mean functions is defined the same way as the standard composition of functions on MN. Let ?1 and ?2 be two convex vectorial mean functions of order N in M by ?3 = ?1??2 we mean ?3(x) = ?2(?1(x)) (5.6) for all x?MN. We denote the kth power of ? by ?k. The following important proposition is an immediate consequence of the above definition: Proposition 5.2.2. Let ? be a convex vectorial mean function of order N in M. If y = ?(x), then the radius of the minimal ball of {yi}Ni=1 is not larger than the radius of the minimal ball of {xi}Ni=1. Moreover, if ? is strictly convex, then the 104 radius of the minimal ball of {yi}Ni=1 is less than the radius of the minimal ball of {xi}Ni=1, unless x1 = ... = xN, where both balls have radius of zero. Therefore, applying a strictly convex vectorial mean function ? to a set of N data points will generate new N points whose minimal ball radius is smaller than that of the input points, unless all the points coincide. However, note that the minimal ball of the output points is not necessarily inside the the minimal of the input points. 
Example 5.2.3. Let ?? be the geodesic averaging function with parameter ? as in Example 5.2.2. Assume {xi}3i=1 lies in a strongly convex ball. The mean function ??12? (x1,x2,x3) = ??(x1,x2) is only a convex mean function of order 3. Now define ? as ?(x) = (??1(x1,x2),??2(x2,x3),??3(x3,x1)), (5.7) where 0 < ?i < 1 for i = 1,2,3. Again ? is a convex but not a strictly convex vectorial mean function of order 3. This is because if x1 = x2 and y = ?(x), then y1 = x1 and y1 belongs to the boundary of the minimal ball of {xi}3i=1. However, note that ?2, the second power of ?, will be strictly convex. This example leads us to the following definition (borrowed from the theory of positive matrices), which introduces a notion similar to the primitivity for stochastic matrices [43, p. 519] but for convex vectorial mean functions: Definition 5.2.3. Aconvex vectorial mean function?of order N is called primitive, if there exists a natural number ns such that ?ns is strictly convex. We call the smallest value for ns, the index of primitivity of ?. 105 Note that any finite composition of strictly convex vectorial mean functions of order N is a strictly convex vectorial mean function by Theorem 5.2.1. The following important theorem shows that the infinite power of a primitive vectorial mean function also is (or converges to) a strictly convex vectorial mean function, which has a special feature that all its components are the same function. Theorem 5.2.2. Let ? be a primitive vectorial mean function of order N with primitivity index ns?N (i.e., ?ns is a strictly convex vectorial mean function). The sequence of convex vectorial mean functions???k??k=1 where ??k(x) = ?k(x) (5.8) converges pointwise to a strictly convex vectorial mean function ?? whose all N components areequal, i.e., ?? = (??,..., ??), where ??is a strictly convex meanfunction of order N. Proof. Let x = (x1,...,xN)?MN be a point in the domain of ?. We write xk+1 = ??k+1(x) = ?(xk), (5.9) with x0 = x. We use the convention xk = (xk1,...,xNk ). Denote the minimal ball of {xki}N1=1 by B(qk,rk). Note that by the definition of strictly convex vectorial mean function, {xki}Ni=1 will stay in ConvHull({xki}Ni=1) ? B(q0,r?0) for all k ? ns, where r?0 < r0. Moreover, note that {xi}Ni=1 lies in a ball of radius smaller than rcx; therefore, the minimal ball of{xi}Ni=1 exists and is unique. Also by Proposition 106 5.2.2 we have rk+1 ?rk; therefore, there exists r? ?0 such limk rk = r?. We show that r? = 0. Since ?xk?k stays in a compact subset of MN, it has a converging subsequence (in the product topology)?xkj?kj with a limit ?x. Now by continuity of ? we have lim kj ?ns(xkj) = ?ns(?x) (5.10) or lim kj xkj+ns = ?ns(?x) = ?x, (5.11) for a point ?x?MN. Note that by continuity of the radius of the minimal ball with respect to the data points established in Corollary 4.3.1, the radius of the minimal ball of{?xi}Ni=1 is equal to r?. The sequence of radii of minimal balls of{xkj+nsi }Ni=1, also has r? as its limit. On one hand, due to continuity, the minimal ball of{?xi}Ni=1 must have radius r?; on the other hand, from (5.11) and due to strict convexity of ?ns, the radius of the minimal ball of {?xi}Ni=1 should be less than r?. This is a contradiction and we must have r? = 0, which means that ?x1 = ... = ?xN = ?x. Note that for a large enough index Kj, {xKji }Ni=1 will be in a small enough closed ball around ?x. However, after that, due to convexity of ?,{xki}Ni=1 will stay in that ball for all k?Kj. Therefore, limk xk = ?x. Consider the assignment xmapsto? 
??(x) = (??(x),..., ??(x)). Note that due to strict convexity of ?ns, ?x must belong to the interior of any ball containing{xi}Ni=1, unless all the data points coincide. Also, by construction, ?x belongs to the closure of the convex hull of {xi}Ni=1. We only need to show the continuous dependence of ??(x) on x to prove that ?? is a strictly convex mean function (see Definitions 5.2.1 and 107 5.2.2). To this end, first note that given ? > 0 we can find large enough K such ?K(x)?B(??(x), ?2)?MN. Next, note that when K, x and ? are fixed, we can find ? > 0, such that y?B(x,?)?MN implies ?k(y)?B(?k(x), ?2)?MN. Therefore, we have that given ? > 0 there are K and ? > 0 such that if y ? B(x,?), then ?K(?y) ? B(??(x), ?2). However, because of the convexity of ?, ?k(y) will stay in B(??(x),?), for k?K. Hence, ??(y) must be in B(??(x),?) and this shows continuity of ?. 5.2.3 Higher Orders of Smoothness Assuming ? is continuous, the strictly convex mean function ?? constructed in The- orem 5.2.2 is continuous. Whether ?? can achieve higher orders of smoothness, if ? itself has higher orders of smoothness, is an interesting question. In general, it seems to be difficult to answer this question; the main reason is that we are dealing with infinite compositions of functions. However, if we could find uniform (in k) bounds on the norms of the derivatives of ?k we might be able to answer the question using standard tools. A special, yet important case is when ? is Lipschitz with constant L = 1 i.e., there is L = 1 such that dMN(?(x),?(y))?LdMN(x,y), (5.12) for x,y such that{xi}Ni=1 and{yi}Ni=1 lie in a ball B(o,??). Theorem 5.2.3. If in addition to conditions in Theorem 5.2.2, ? is Lipschitz with 108 Lipschitz constant L = 1 (see 5.12) in its domain, then the new strictly convex mean functions ??and ?? also are Lipschitz with constants L = 1 and L = 1?N, respectively. Proof. Observe that the compositions of Lipschitz functions result in a Lipschitz function whose constant is the product of the Lipschitz constants of the participating functions. As a result, all the members in the sequence ???k?k defined by ??k = ?k, are Lipschitz with Lipschitz constant L = 1. Therefore, ????k is an equicontinuous sequence andby the Arzela-Ascolli theorem, there exists a subsequence???kj?kj which converges to ?? uniformly. Hence, ?? must be Lipschitz with constant L = 1. Note that because of the product metric on MN, this means that ?? : MN ?M, will be Lipschitz with constant L = 1?N. The fact that Lipschitz constant of ?k is Lk, explains why we might not have higher order smoothness in this infinite process. For example, if ? has its gradient bounded by 1 we couldapply the above theorem, and conclude that ??is differentiable almost everywhere, since it is Lipschitz. However, if the gradient of ? was bounded by L > 1, we could not have the same conclusion, at least through this path. 5.2.4 Isometry Compatibility We mentioned about the isometry compatibility of the Lp means in Section 4.4. For a convex mean function ? of order N, if ?(?(x1),...,?(xN)) = ?(?(x1,...,xN)) for any isometry ? of M, then ? is called isometry compatible. Similarly, we say that a convex vectorial mean function ? of order N is isometry compatible if ?(?(x1),...,?(xN)) = parenleftbig?(?1(x)),...,?(?N(x))parenrightbig. The following theorem shows 109 that if we start with isometry compatible means our higher order means also will be isometry compatible. Theorem 5.2.4. If in Theorem 5.2.2, ? is isometry compatible, then ?? 
also will be isometry compatible. Proof. The proof follows from the fact that any isometry of M is a continuous function and that we have pointwise convergence in Theorem 5.2.2. Therefore, all the means that we construct are isometry compatible. 5.2.5 Dynamical System Point of View The repetitive application of ? induces an interesting dynamical system on MN. Let us denote the diagonal of MN by ?(MN) ={(x1,...,xN)?MN|x1 = ... = xN}. (5.13) Define a cylinder with cubical or square cross section around the diagonal as follows: C? ={(x1,...,xN)?MN|?q = (q,...,q)??(MN) such that max i d(q,xi)???}, (5.14) where ?? is small enough such that all the components of ? are well-defined. Notice the similarity between this definition and Definition 5.1. Now for any point x?C? the iteration xk+1 = ?(xk) = ?k+1(x) (5.15) 110 is well-defined. More interestingly if B(qk,rk) is the minimal ball of {xki}Ni=1, then the distance between xk and ?(MN) measured in the L? norm is minimized by qk = (qk,...,qk) and rk = min q??(MN) max i d(q,xki). (5.16) Note that qk is the projection xk onto ?(MN) under the distance induced by the L? norm, and the center of the minimal ball of{xki}Ni=1 is the preimage of qk under the diagonal map, i.e., the map which maps M to ?(MN). Now rk decreases at least every ns iterations and the L? distance between xk and ?(MN) decreases, accordingly. We showed in Theorem 5.2.2 that the orbit of x, converges to the diag- onal of MN and that the diagonal of MN is the only invariant set of this dynamical system. The orbit will eventually meet the diagonal at ??(x), which is the image of the mean of {xi}Ni=1 on ?(MN). Figure 5.2 helps us visualize the behavior of the dynamical system induced by a primitive mean function. One can visualize that the smallest cubical cylinder containing xk shrinks by every ns applications of ?. Note that the diagonal ?(MN) also coincides with the continuum of the equilibria of the system (5.15). Moreover, any equilibrium point of the system is stable in the sense of Lyapunov [53], but it is not asymptotically stable since a perturbation of x??(MN) will not eventually come back to x, necessarily. Remark 5.2.1. In what preceded we alluded to the L?-projection interpretation of the L? mean. More generally, the Lp mean of {xi}Ni=1 (with weight vector w) can be considered as the projection of x ? MN onto ?(MN) under the distance 111 ??(x) MN ?(MN) C? x ?(x) ?2(x) ?k(x) Figure 5.2: Repetitive application of the primitive mean function ? brings a point x in the cylinder C? closer (in the L? sense) to ?(MN), the diagonal of MN. In the limit the orbit of x meets the diagonal at ??(x). function induced by the Lp norm in MN. For x,y?MN we define dMN,p(x,y) = parenleftbig summationdisplay i=1 widp(xi,yi)parenrightbig1p, (5.17) and for points which are close enough to ?(MN) define ?xp = argminy??(MN)dMN,p(x,y). (5.18) Therefore, we have ?xp = (?xp,..., ?xp) where ?xp is the Lp mean of{xi}Ni=1 with weights w = [w1,...,wN]T. Theorem 3.2.1 states that if x belongs to a cylinder of the form in (5.14) whose radius is smaller than ??,p, then the Lp projection of x onto ?(MN) 112 defined in (5.18) is well-defined and yields a unique point. 5.2.6 Relation to Infinite Products of Stochastic Matrices Theorem 5.2.2 is very similar to the famous Perron-Frobenius theorem which gives a condition for the convergence of the powers Ak of an N?N stochastic matrix A. More general results about the behavior of the infinite product of different matrices are available, e.g. 
the result of Wolfowitz [90]. It seems that our method of using the radius of minimal ball to study the behavior of the infinite composition of a single convex vectorial mean function with itself can be useful in deriving similar results for the more general case of different convex vectorial mean functions or matrices. Since it is not of direct interest to us, we just give a simple example of such a result here, but we believe that deeper results are possible. Theorem 5.2.5. Let ??k?k be a sequence of convex vectorial mean functions of order N with common domains. Assume that there exists a subsequence of ??k?k which converges uniformly to a strictly convex vectorial mean function. Then the infinite composition of??k?k defined by ??k = ?k??????1 = ?k???k?1 (5.19) converges pointwise to a strictly convex vectorial mean function ??, whose all com- ponents are equal. Proof. Let??kj?kj be a subsequence which converges uniformly to ??. Due to com- pactness, the subsequence ?xkj?kj must in turn have a subsequence (denoted again 113 by ?xkj?kj for convenience) which converges to a point ?x ? MN. We show that limkj ?kj(xkj) = limkj ??kj(?x) = ??(?x). We denote the Riemannian distance function induced by the product structure of MN by dMN(.,.). We have dMN(?kj(xkj), ??(?x))?dMN(?kj(xkj), ??(xkj)) +dMN(??(xkj), ??(?x)). (5.20) Now given ? > 0, due to the uniform convergence of ?kj to ??, we can find a natural number K1(?) such that for all indices kj larger than K1(?), dMN(?kj(x), ??(x))? ?2, (5.21) for all x in the common domain of ?k?s. Also due to continuity of ?? and since xkj ? ?x, there exists another natural number K2(?), such that for all indices kj larger than K2(?) we have dMN(??(xkj), ??(?x))? ?2. (5.22) The above three relations mean that given ? > 0, for all indices kj larger than max{K1(?),K2(?)} dMN(?kj(xkj), ??(?x))? ?2. (5.23) Therefore, limkj ?kj(xkj) = ??(?x). The rest of the proof is essentially the proof of Theorem 5.2.2 with ns = 1, except that the above relation replaces (5.10) in that proof. 114 Corollary 5.2.1. In Theorem 5.2.5 if all ?k?s are Lipschitz continuous with Lips- chitz constant of L = 1, then ??k converges to ?? where ?? itself is a strictly convex mean function with equal components; and moreover, it is Lipschitz continuous with Lipschitz constant of L = 1. Corollary 5.2.2. Let ??k?k be a sequence of vectorial strictly convex mean func- tions of order N in M. If there exists a function which appears infinitely many often in that sequence, then the result of Theorem 5.2.5 holds. Remark 5.2.2. In [69], Moreau introduces a general theory of infinite compositions of certain convex nonlinear operators and in particular stochastic matrices to ana- lyze consensus algorithms with time-varying communication links or topologies. The goal of a consensus algorithm is to bring a set of autonomous agents to agreement on a state or quantity via local communications between the agents. The theory devel- oped in [69] is based on the analysis of the graphs associated with the interaction of the agents and the properties of the convex linear or nonlinear operators employed. The convexity assumptions for the those functions mentioned in [69] are similar to the ones we made for strictly convex vectorial mean functions in Definition 5.2.2 except that the assumptions in [69] do not include (or are not equivalent to) the strict inclusion with respect to closed balls and only a strict inclusion with respect to the convex hull is assumed. 
Hence, one cannot use the convergence results intro- duced in [69] to define the class of means we define here. The main reason is that the proofs in [69] rely on the inclusion properties of the convex hull of finite points in Euclidean space which are not valid in a Riemannian manifold. Nevertheless, we 115 believe that the rest of the ideas used in [69] can be used to prove stronger versions of Theorem 5.2.5. 5.3 Perimeter Shrinkage Scheme, Cyclic Pursuit and a Pairwise Based Mean In this section, we use the simplest mean function of order N = 2, i.e., the geodesic midpoint assignment or more generally the geodesic averaging, to design mean func- tions of higher orders. Although, it is possible to analyze this process using the methods from the previous section, it would more interesting to use other tech- niques. 5.3.1 Perimeter Shrinkage Scheme and a Pairwise-Iterative Mean Let ?? be the geodesic averaging function defined in Example 5.2.2. Consider the points {xi}Ni=1 ? M. We use a cyclic indexing: xN+1 = x1 and xN+2 = x2. Now assume that xi and xi+1 for all 1?i?N are close enough such that ??(xi,xi+1) is well defined i.e., is a single point. For example, if each pair of (xi,xi+1) lies in a ball of radius less than injM2 , where injM is the injectivity radius of M, then ??(xi,xi+1) is well-defined. In particular, if all of the points lie in a strongly convex ball things become very simple, as we shall see. Consider the following iterative scheme: xk+1i = ??(xki,xki+1), 1?i?N (5.24) 116 where k is the iteration step and x0i = xi. Figure (5.3) schematically shows the first iteration of the process for N = 4 and ? = 12. We call this iterative scheme a perimeter shrinkage scheme because of the next lemma. We define the perimeter shrinkage map PS? : MN ?MN as PS?(x) = (??(x1,x2),...,??(xN,x1)), (5.25) where x = (x1 ...,xN)?MN. Here, we have implicitly assumed that the geodesic averagings are well-defined. We denote (x1,...,xN) ?MN by x?MN and write (5.24) compactly as xk+1 = PS?(xk). (5.26) If we assume the domain of PS? to be as in (5.1) with ?PS? ? rcx, then PS? is a convex vectorial mean function of order N and it is easy to check that the N ?1th power of PS?, is a strictly convex vectorial mean function of order N. Therefore, we could use Theorem 5.2.2 to show the convergence of the iteration, when initial data set{xi}Ni=1 lies in a strongly convex ball. However, since there is another interesting mechanism involved, i.e., the perimeter shrinkage mechanism, and we want to consider a domain different from the one mentioned above, we use a different approach. Another function which we need is the perimeter function P : MN ?Rdefined as: P(x1,...,xN) = Nsummationdisplay i=1 d(xi,xi+1), (5.27) 117 which is also continuous. Similarly, the energy function E : MN ?R, which is the sum of the squares of the side lengths of the geodesic polygon x1 ...xN, defined as E(x1,...,xN) = Nsummationdisplay i=1 d2(xi,xi+1) (5.28) is continuous, too. Also recall the definition of a closed geodesic: Definition 5.3.1. On a Riemannian manifold a closed geodesic is a loop which is geodesic at all of its points. Less formally, a closed geodesic is a geodesic that comes back to its initial point but at an angle which is equal to ?. A geodesic that is only closed, i.e., it closes the loop at an angle different from ? is called a geodesic lasso [24, pp. 255]. A closed geodesic is also called a periodic geodesic [9]. A simple closed geodesic is a closed geodesic that does not intersect itself. 
Closed geodesics, their existence, and counting them have very old and deep roots in Riemannian geometry [9]. They correspond to the periodic solutions of the geodesic flow. The following simple lemma is very useful:

Lemma 5.3.1. Let $\{x_i^k\}_{i=1}^N$ and $\{x_i^{k+1}\}_{i=1}^N$ be two consecutive sets of points generated by the PS process with parameter $\alpha$. Let $P_k$ and $P_{k+1}$ be the perimeters of the closed geodesic polygons $x_1^kx_2^k\cdots x_N^k$ and $x_1^{k+1}x_2^{k+1}\cdots x_N^{k+1}$, respectively. Also let $E_k$ and $E_{k+1}$ denote the sums of the squares of the side lengths of these two polygons, respectively. We have:

1. $P_{k+1}\le P_k$, with $P_{k+1} = P_k$ if and only if the geodesic polygon $x_1^kx_2^k\cdots x_N^k$ is a closed geodesic.¹

[Figure 5.3: The initial geodesic polygon $x_1^0x_2^0x_3^0x_4^0$ is transformed into the new geodesic polygon $x_1^1x_2^1x_3^1x_4^1$ by inserting the new vertices at the midpoint of each side. The perimeter of $x_1^1x_2^1x_3^1x_4^1$ is strictly less than that of $x_1^0x_2^0x_3^0x_4^0$, unless the initial polygon is a closed geodesic. We assume that the initial points are such that the sides of the initial polygon are determined uniquely, i.e., the shortest geodesics of interest are unique.]

2. $E_{k+1}\le E_k$, with $E_{k+1} = E_k$ if and only if the geodesic polygon $x_1^k\cdots x_N^k$ has equal side lengths and is also a closed geodesic. Here, a point is regarded as a closed geodesic of length zero.

¹To be pedantic, we should say that "the geodesic polygon $x_1^kx_2^k\cdots x_N^k$ can be parameterized as a unit speed closed geodesic," since geodesics are curves, i.e., functions defined on the real line.

Proof. The proof of the first part is essentially an application of the triangle inequality. Recall that for $x,y,z\in M$:
$$d(x,y)\le d(x,z)+d(z,y), \qquad (5.29)$$
with equality if and only if $z$ lies between $x$ and $y$ on the shortest geodesic connecting these two points. Therefore, we have:
$$d(x_i^{k+1},x_{i+1}^{k+1}) \le d(x_i^{k+1},x_{i+1}^k) + d(x_{i+1}^k,x_{i+1}^{k+1}) = (1-\alpha)\,d(x_i^k,x_{i+1}^k) + \alpha\,d(x_{i+1}^k,x_{i+2}^k), \qquad (5.30)$$
for $1\le i\le N$; hence $P_{k+1}\le P_k$. Equality happens only when we have equality in all the triangle inequalities employed. But that means that the consecutive geodesic segments $x_i^kx_{i+1}^k$ and $x_{i+1}^kx_{i+2}^k$ must be geodesic at the connecting vertex $x_{i+1}^k$. As a result, the whole geodesic polygon $x_1^kx_2^k\cdots x_N^k$ has to be a closed geodesic.

For the second part, first apply the triangle inequality as before and expand:
$$d^2(x_i^{k+1},x_{i+1}^{k+1}) \le \big(d(x_i^{k+1},x_{i+1}^k)+d(x_{i+1}^k,x_{i+1}^{k+1})\big)^2 = (1-\alpha)^2 d^2(x_i^k,x_{i+1}^k) + 2\alpha(1-\alpha)\,d(x_i^k,x_{i+1}^k)\,d(x_{i+1}^k,x_{i+2}^k) + \alpha^2 d^2(x_{i+1}^k,x_{i+2}^k), \qquad (5.31)$$
for $1\le i\le N$. Now, summing both sides over $1\le i\le N$ yields:
$$E_{k+1} \le (1-\alpha)^2 E_k + 2\alpha(1-\alpha)\sum_{i=1}^N d(x_i^k,x_{i+1}^k)\,d(x_{i+1}^k,x_{i+2}^k) + \alpha^2 E_k. \qquad (5.32)$$
Now, applying the Cauchy-Schwarz inequality gives
$$E_{k+1} \le E_k\big((1-\alpha)^2 + 2\alpha(1-\alpha) + \alpha^2\big) = E_k, \qquad (5.33)$$
where equality holds if and only if the geodesic polygon $x_1^k\cdots x_N^k$ is a closed geodesic (as before) and the two vectors $[d(x_1,x_2),\ldots,d(x_N,x_{N+1})]^T$ and $[d(x_2,x_3),\ldots,d(x_{N+1},x_{N+2})]^T$ are parallel in $\mathbb{R}^N$ (the equality condition for the Cauchy-Schwarz inequality). It is easy to see that the latter requires $d(x_i,x_{i+1}) = d(x_{i+1},x_{i+2})$ for $1\le i\le N$. This completes the proof.

The above lemma shows that the set
$$\Gamma = \{(y_1,\ldots,y_N)\in M^N \mid \text{the geodesic polygon } y_1\cdots y_N \text{ is a closed geodesic}\} \qquad (5.34)$$
is invariant under $PS_\alpha$, i.e., $PS_\alpha(\Gamma)\subseteq\Gamma$. For the elements of $\Gamma$ the perimeter of the corresponding geodesics remains invariant under the $PS_\alpha$ map, too. However, the energy function $E$ in (5.28) is invariant under $PS_\alpha$ only on $\Gamma_E\subseteq\Gamma$:
$$\Gamma_E = \{(y_1,\ldots,y_N)\in\Gamma \mid d(y_1,y_2) = \cdots = d(y_N,y_{N+1})\}. \qquad (5.35)$$
Neither of these two sets is a set of fixed points of $PS_\alpha$. One can see that $y = (y_1,\ldots,y_N)\in M^N$ is a fixed point of $PS_\alpha$ if and only if $y_1 = \cdots = y_N$, i.e., the length of the corresponding closed geodesic is zero, or equivalently $y$ belongs to the diagonal of $M^N$. In order to define a mean we need to guarantee that the PS scheme converges to a fixed point of $PS_\alpha$. If $\{x_i\}_{i=1}^N$ lies in a strongly convex set this is guaranteed. A strongly convex set cannot contain a non-trivial closed geodesic: if such a geodesic existed in the set, then for two points on the geodesic whose distance from each other is half the length of the closed geodesic, there would be two minimizing geodesics connecting them inside the set, contradicting the definition of a strongly convex set. Another consequence of strong convexity is that if $\{x_i\}_{i=1}^N$ lies in a strongly convex set $A$, so does $\{x_i^k\}_{i=1}^N$ for all $k\ge 1$.

Theorem 5.3.1. Consider the perimeter shrinkage map $PS_\alpha$ defined in (5.25). If $\{x_i\}_{i=1}^N$ lies in a compact strongly convex set $A\subset M$, then the perimeter shrinkage iteration induced by $PS_\alpha$ converges to a single point $\bar x$ in $A$, which also belongs to the closure of the convex hull of $\{x_i\}_{i=1}^N$. The assignment $(x_1,\ldots,x_N)\mapsto\bar x$ is a continuous function from $A$ to $A$. Moreover, the function $\mu_{P,\alpha}(x_1,\ldots,x_N) = \bar x$, with domain defined in (5.1) and with parameter $\rho < r_{cx}$, is a strictly convex mean function of order $N$. For $\alpha = \frac12$ we call $\bar x$ the pairwise mean of $(x_1,\ldots,x_N)$.¹

¹The pairwise mean in general is not permutation invariant, so to be accurate we use the $N$-tuple $(x_1,\ldots,x_N)$ instead of $\{x_i\}_{i=1}^N$.

Proof. Denote the convex hull of $\{x_i\}_{i=1}^N$ by $C$ and its closure by $\bar C$. Note that $\bar C\subseteq A$ is a compact strongly convex set by Proposition 2.3.1; hence, it contains no closed geodesic. Consider the sequence $\langle x^k\rangle_{k=0}^\infty$ defined in (5.26). The sequence stays in the compact set $\bar{\mathcal C} = \bar C\times\cdots\times\bar C$. By compactness, there is a subsequence $\langle x^{k_j}\rangle$ which converges to a point $\bar x = (\bar x_1,\ldots,\bar x_N)\in\bar{\mathcal C}$. On the other hand, from Lemma 5.3.1 the sequence of positive numbers $\langle P_k\rangle_k$, and hence $\langle P_{k_j}\rangle_{k_j}$, has a limit $P_\infty$; moreover, $P(PS_\alpha(x)) < P(x)$ in $\bar{\mathcal C}$ unless $P(x) = 0$. Note that $P(\bar x) = P_\infty$ (by continuity of $P$). If $P_\infty\neq 0$, we have (by continuity of $PS_\alpha$ and $P$):
$$P_\infty = \lim_{k\to\infty}P(PS_\alpha(x^k)) = \lim_{k_j}P(PS_\alpha(x^{k_j})) = P(PS_\alpha(\bar x)) < P(\bar x), \qquad (5.36)$$
which is a contradiction. Therefore, $P_\infty = 0$ and $\bar x = (\bar x,\ldots,\bar x)$, where $\bar x\in\bar C$. Now, after finitely many steps the set $\{x_i^{k_j}\}_{i=1}^N$ will be in an arbitrarily small strongly convex ball around $\bar x$, and hence so will the next set of points $PS_\alpha(x_1^{k_j},\ldots,x_N^{k_j})$, and so forth. Therefore, $x^k\to\bar x = (\bar x,\ldots,\bar x)$ in the product topology. The proof of the continuity of $x\mapsto\bar x$ is similar to the proof of the analogous fact in Theorem 5.2.1. Now assume $\{x_i\}_{i=1}^N$ lies in a closed ball $\bar B(o,\rho)$. If $(y_1,\ldots,y_N) = PS_\alpha^{N-1}(x_1,\ldots,x_N)$, then $\{y_i\}_{i=1}^N$ lies in the smaller ball $\bar B(o,\rho')$ for some $\rho' < \rho$, unless the $x_i$'s coincide and lie on the boundary of $\bar B(o,\rho)$. Therefore, excluding this case, by the previous part $\bar x$ belongs to the interior of $\bar B(o,\rho)$, which shows that $\mu_{P,\alpha}(x)$ is a strictly convex mean function of order $N$.

Remark 5.3.1. It is appropriate here to mention both the strengths and the limitations of the notion of a primitive vectorial mean function and of Theorem 5.2.1. Note that if we define a more general version of the PS map $PS_\alpha$ in (5.25) as
$$\Psi_{\boldsymbol\alpha}(x_1,\ldots,x_N) = (\mu_{\alpha_1}(x_1,x_2),\ldots,\mu_{\alpha_N}(x_N,x_1)), \qquad (5.37)$$
where $\boldsymbol\alpha = [\alpha_1,\ldots,\alpha_N]^T$, then $\Psi_{\boldsymbol\alpha}(x_1,\ldots,x_N)$ is a primitive vectorial mean function with primitivity index $N-1$, provided we take its domain to be of the form (5.54) with parameter $r_{cx}$. Therefore, the iterative application of $\Psi_{\boldsymbol\alpha}$, by Theorem 5.2.1, induces a strictly convex mean function of order $N$. A look at the proof of Lemma 5.3.1 shows that, with the $\alpha_i$'s different from one another, $\Psi_{\boldsymbol\alpha}$, contrary to $PS_\alpha$, is not necessarily a perimeter shrinkage map. Hence, the use of the radius of the minimal ball in Theorem 5.2.1, which is a consequence of the properties of primitive vectorial mean functions, enables us to analyze more general mean functions than those we could analyze using the perimeter function alone. At the same time, the use of the perimeter function in the first part of Theorem 5.3.1 enabled us to prove the convergence of the PS scheme in a strongly convex domain of arbitrary shape, i.e., one that is not necessarily a ball.

5.3.2 Cyclic Pursuit on Manifolds: Discrete-time Case

As soon as we know that a set of data points lies in a strongly convex ball, Theorem 5.3.1 assures us that the perimeter shrinkage iteration converges to a single point. What happens when the points do not lie in any strongly convex domain? The analysis of this case is related to the problem of cyclic pursuit, in which $N$ ordered agents such as ants, frogs or moving vehicles follow each other in a cyclic fashion, i.e., agent $i$ follows agent $i+1$ (modulo $N$) with certain rules of pursuit [14]. The simplest rule of pursuit is that agent $i$ moves along the shortest path to agent $i+1$, either with speed proportional to its distance to agent $i+1$ or with unit speed. In the discrete-time setting we assume that each agent hops to the midpoint of the shortest geodesic connecting it to the agent it follows; therefore, perhaps the behavior of frogs is the more apt image here. With this formulation the cyclic pursuit problem and the PS scheme are equivalent. It is well-known that cyclic pursuit in the plane or in Euclidean space results in convergence to a single point [14]. Here, we consider the cyclic pursuit problem in a compact Riemannian manifold.

Theorem 5.3.2. Let $M$ be a compact and connected (hence complete) Riemannian manifold. For the points $\{x_i\}_{i=1}^N\subset M$, assume the PS scheme (or cyclic pursuit scheme) iteration is well-defined at each step (in particular, this is guaranteed if $d(x_i,x_{i+1}) < \mathrm{inj}M$). Then $x^k\to\Gamma_E$ and $E(x^k)\to E_\infty\ge 0$, where $\Gamma_E$ is defined in (5.35). More precisely,

1. If $E_\infty = 0$, then $x_i^k\to x_\infty\in M$ for $1\le i\le N$.

2. If $E_\infty > 0$ and $x_\infty = (x_{\infty 1},\ldots,x_{\infty N})\in\Gamma_E$ is one of the cluster points of $\langle x^k\rangle_k$, then the geodesic polygon $x_{\infty 1}\cdots x_{\infty N}$ is a closed geodesic of length $\sqrt{NE_\infty}$ and $d(x_{\infty i},x_{\infty,i+1}) = \sqrt{E_\infty/N}$ for $1\le i\le N$.

Moreover, the set
$$\Gamma_E^l = \{(y_1,\ldots,y_N)\in\Gamma_E \mid d(y_i,y_{i+1}) = \tfrac{l}{N},\ 1\le i\le N\}\subseteq\Gamma_E \qquad (5.38)$$
is an asymptotically unstable invariant set of the PS scheme unless $l = 0$, by which we mean that for any point $x\in\Gamma_E^l$ there is a small perturbation of $x$ such that the orbit of $x$ never converges back to $\Gamma_E^l$. If the set of lengths of the closed geodesics of length smaller than $l$ is finite,¹ then $\Gamma_E^l$ is an unstable invariant set for the PS scheme, meaning that there exists a small perturbation of $x\in\Gamma_E^l$ whose orbit will not stay close to $\Gamma_E^l$.

¹This condition is satisfied automatically in the special orthogonal groups and the Grassmannian manifolds (see Remark 6.2.2).

Proof. From the proof of Lemma 5.3.1 we know that if $d(x_i^k,x_{i+1}^k) < \mathrm{inj}M$ for $1\le i\le N$, then $d(x_i^{k+1},x_{i+1}^{k+1}) < \mathrm{inj}M$ for $1\le i\le N$. Hence, the mentioned condition is sufficient to guarantee that the PS process remains well defined at all steps.
Now, the decreasing sequence $\langle E_k\rangle_k$ of real numbers must have a limit, denoted by $E_\infty$. If $E_\infty = 0$, then, since after a finite number of steps $PS_\alpha(x^k)$ belongs to a small convex ball, the scheme converges to a single point by Theorem 5.3.1. Now let $E_\infty > 0$. The compactness of $M^N$ requires the sequence $\langle x^k\rangle_k$ to have a converging subsequence $\langle x^{k_j}\rangle_{k_j}$, with limit $x_\infty$. By continuity of the functions involved we have:
$$E(x_\infty) = E(\lim_{k_j}x^{k_j}) = \lim_{k_j}E(x^{k_j}) = E_\infty \qquad (5.39)$$
and
$$E(PS_\alpha(x_\infty)) = E(PS_\alpha(\lim x^{k_j})) = \lim_{k_j}E(PS_\alpha(x^{k_j})) = E_\infty. \qquad (5.40)$$
Therefore, $x_\infty$ belongs to $\Gamma_E$. This means that any cluster point of $\langle x^k\rangle_k$ belongs to $\Gamma_E$, which, together with the compactness of $M^N$, implies that $x^k\to\Gamma_E$. Claims 1 and 2 follow from the definition of $\Gamma_E$. Now, note that for any $x\in\Gamma_E^l$ with $l > 0$, there exists a small perturbation $\tilde x$ for which $E(\tilde x)$, and hence $E(\tilde x^k)$ for $k\ge 0$, is smaller than $E_\infty = \frac{l^2}{N}$. Therefore, $\tilde x^k$ obviously cannot come back to $\Gamma_E^l$, by continuity of $x\mapsto E(x)$. If there are only a finite number of possible values for the lengths of closed geodesics shorter than $l$, then, since $E(\tilde x^k)$ eventually differs from $E_\infty$ by a constant $\epsilon > 0$, the distance of $\tilde x^k$ from $\Gamma_E^l$ also eventually becomes larger than a constant $\epsilon' > 0$. This means that $\Gamma_E^l$ ($l > 0$) is an unstable invariant set with respect to $PS_\alpha$.

This theorem says that, in a cyclic pursuit on $M$, the agents converge to a formation consisting of equidistant points on closed geodesics of fixed length $l$. However, only $\Gamma_E^0$, i.e., the diagonal of $M^N$, is an asymptotically stable invariant set of the PS scheme. Note that we did not claim that the formation corresponds to a single closed geodesic. Convergence to a single closed geodesic requires a deeper study of the properties of the PS scheme, such as the one in [21].

5.3.3 Cyclic Pursuit Schemes in Continuous-time

The formulation of the PS scheme described before is in discrete-time. It is easy to formulate a continuous-time counterpart of the PS scheme: for all $i$, at each instant of time $x_i$ should pursue $x_{i+1}$ along the minimizing geodesic connecting them, with speed equal to the distance between them. We implicitly assume that $x_i$ and $x_{i+1}$ are always such that the minimizing geodesic connecting them is unique. Thus $x_i$ pursues $x_{i+1}$ in the sense that it tries to reduce its distance from $x_{i+1}$. We can write this scheme as:
$$\dot x_i(t) = -\tfrac12\nabla d^2(x_{i+1},x)\big|_{x=x_i} = \exp^{-1}_{x_i(t)}x_{i+1}(t), \quad 1\le i\le N. \qquad (5.41)$$
The initial conditions for this set of equations are $x_i(0) = x_i$. A simple argument shows that, as in the discrete-time case, the continuous-time PS scheme remains well-defined for all $t > 0$ if we start with initial conditions such that $d(x_i(0),x_{i+1}(0)) < \mathrm{inj}M$ for $1\le i\le N$. Now consider the energy $E$ defined in (5.28) as a Lyapunov function for the dynamical system (5.41) defined on $M^N$:
$$\frac{d}{dt}E(x_1(t),\ldots,x_N(t)) = \sum_{i=1}^N \big\langle\dot x_i,\nabla d^2(x,x_{i+1})\big|_{x=x_i}\big\rangle_{x_i(t)} + \big\langle\dot x_{i+1},\nabla d^2(x,x_i)\big|_{x=x_{i+1}}\big\rangle_{x_{i+1}(t)} \qquad (5.42)$$
or equivalently
$$\frac{d}{dt}E(x_1(t),\ldots,x_N(t)) = -2\sum_{i=1}^N\Big(\big\langle\exp^{-1}_{x_i(t)}x_{i+1}(t),\exp^{-1}_{x_i(t)}x_{i+1}(t)\big\rangle_{x_i(t)} + \big\langle\exp^{-1}_{x_{i+1}(t)}x_{i+2}(t),\exp^{-1}_{x_{i+1}(t)}x_i(t)\big\rangle_{x_{i+1}(t)}\Big). \qquad (5.43)$$
Now, algebraic simplification, using the fact that $\|\exp^{-1}_{x_i}x_{i+1}\|_{x_i} = \|\exp^{-1}_{x_{i+1}}x_i\|_{x_{i+1}} = d(x_i,x_{i+1})$, yields
$$\frac{d}{dt}E(x_1(t),\ldots,x_N(t)) = -\sum_{i=1}^N\big\|\exp^{-1}_{x_i(t)}x_{i-1}(t) + \exp^{-1}_{x_i(t)}x_{i+1}(t)\big\|^2_{x_i(t)}, \qquad (5.44)$$
where $\|\cdot\|_x$ is the norm induced by the Riemannian structure of $M$ at $x\in M$. We conclude that along the trajectories of (5.41) we have $\frac{d}{dt}E(x_1(t),\ldots,x_N(t)) = 0$ if and only if $x = (x_1,\ldots,x_N)\in\Gamma_E$. In the language of LaSalle's invariance principle [53, 42], $\Gamma_E$ is the largest invariant set of the system (5.41) on which the total derivative of $E$ is zero. Based on this observation we could state a theorem similar to Theorem 5.3.2 for the continuous-time case; however, for the sake of space we omit it.

We should mention that, since the closed geodesic formation is not a stable formation (see Theorem 5.3.2), one might think of stabilizing this formation by requiring the autonomous agents to apply some sort of control or feedback in a decentralized fashion. It is possible that second order dynamics for the agents, as introduced in [49] for the planar case, would be helpful here too. We also add that a similar control problem for the stabilization of the formation of agents on the sphere $S^2$ embedded in $\mathbb{R}^3$ has been proposed in [73], with the caveat that the control described there is not found intrinsically; by this we mean that the control depends on quantities which are measured and defined in the ambient space (i.e., $\mathbb{R}^3$). We also refer the reader to [79, 80, 81] for related works.

5.4 Means Based on Mean-Invariance

In this section we introduce a simple scheme to construct higher order permutation invariant strictly convex mean functions from the simple geodesic midpoint assignment. This idea was first introduced in [5] for defining a geometric mean for positive definite matrices. Here, we extend it to more general manifolds and also associate it with the concept of mean-invariance. First, we give the definition of the simpler equal-weight Mean-Invariance (MI) based mean. We describe the procedure in the following definition, and in Section 5.4.2 we prove existence, uniqueness and some other properties of the derived means.

Definition 5.4.1. Let $(M,d)$ be a complete Riemannian manifold and let $\mu_{\frac12}(\cdot,\cdot)$ denote the geodesic midpoint assignment map (see Example 5.2.2). Define the $N$th order Mean-Invariance (MI) based mean function $\mu_{MI,N}$ recursively as:

1. If $N = 2$, then set $\mu_{MI,2}(\cdot,\cdot) = \mu_{\frac12}(\cdot,\cdot)$.

2. Otherwise, in order to find $\mu_{MI,N}(x)$ for any $x\in M^N$, set $x_1^0 = x_1,\ldots,x_N^0 = x_N$ and perform the following iteration:
$$x_i^{k+1} = \mu_{MI,N-1}(x_k^{-(N+1-i)}), \quad 1\le i\le N, \qquad (5.45)$$
where, for the $N$-dimensional vector $x\in M^N$, $x^{-j}\in M^{N-1}$ is the $(N-1)$-dimensional vector built from $x$ by dropping the $j$th component of $x$. We write the above iteration more compactly as
$$x_{k+1} = \big(\mu_{MI,N-1}(x_k^{-N}),\ldots,\mu_{MI,N-1}(x_k^{-1})\big) \qquad (5.46)$$
for $k = 1,2,\ldots$, or even more compactly as
$$x_{k+1} = \Psi_{MI,N}(x_k), \qquad (5.47)$$
where the components of $\Psi_{MI,N}(\cdot)$ are defined in (5.46). Assuming all the steps of this construction are well-defined, if the sequence $x_k = (x_1^k,\ldots,x_N^k)\in M^N$ converges to a point $\bar x = (\bar x,\ldots,\bar x)$, then we call $\bar x$ the $N$th order MI mean of $\{x_i\}_{i=1}^N$ and set $\mu_{MI,N}(x) = \bar x$.

Assuming $\mu_{MI,N}(x_1,\ldots,x_N)$ is well-defined, we observe that, by construction (i.e., by the iteration in (5.45)), all the generated intermediate sets of points $\{x_i^k\}_{i=1}^N$ have $\bar x$ as their $N$th order MI mean. Hence, the name "mean-invariance" is justified.
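A direct, deliberately naive sketch of the recursion (5.45)-(5.47) follows. It assumes user-supplied routines `midpoint(x, y)` and `d(x, y)` for the geodesic midpoint and the distance on $M$; the tolerance and iteration cap are ad hoc choices, not part of the definition.

```python
# Naive sketch of the MI recursion (5.45)-(5.47), assuming a two-point geodesic
# midpoint routine midpoint(x, y) and a distance d(x, y) on M.
def mi_mean(points, midpoint, d, tol=1e-10, max_iter=1000):
    N = len(points)
    if N == 2:
        return midpoint(points[0], points[1])
    x = list(points)
    for _ in range(max_iter):
        # Component i (1-based) is the (N-1)th order MI mean of x with its
        # component N+1-i dropped, as in (5.45); indices here are 0-based,
        # so the dropped index j runs N-1, N-2, ..., 0.
        x_new = [mi_mean(x[:j] + x[j + 1:], midpoint, d, tol, max_iter)
                 for j in reversed(range(N))]
        if max(d(a, b) for a, b in zip(x, x_new)) < tol:
            return x_new[0]
        x = x_new
    return x[0]
```

Each step of order $N$ makes $N$ converged calls of order $N-1$; the resulting exponential cost is exactly the "curse of dimensionality" discussed in Section 5.5.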
In fact, the main idea in defining the $N$th order MI mean is nothing but replacing each point by the $(N-1)$th order mean of $N-1$ points, including the point of interest itself, and repeating this process. One can show, using the Perron-Frobenius theorem, that this process in Euclidean space produces the standard Euclidean mean with equal weights. Note that $\mu_{MI,2}(\cdot,\cdot) = \mu_{\frac12}(\cdot,\cdot)$ is permutation invariant. Now, assume $\mu_{MI,N-1}$ is permutation invariant. Since all $N$ combinations of size $N-1$ from the set $\{x_i\}_{i=1}^N$ are included in defining $\mu_{MI,N}$ (see (5.46)), by induction we conclude that $\mu_{MI,N}$ must be permutation invariant.

5.4.1 Weighted Mean-Invariance Means

Here, we propose two methods to incorporate weights in defining the MI mean. In the first method, given the set $\{x_i\}_{i=1}^N$, we increase the number of points by introducing new points that are repetitions of the original points. For example, for $\{x_1,x_2\}$, $\mu_{MI,2}(x_1,x_2)$ is the midpoint of the minimizing geodesic connecting $x_1$ and $x_2$, while if we set $x_3\equiv x_1$ then $\mu_{MI,3}(x_1,x_2,x_3)$ is the point on that geodesic at distance $\frac13 d(x_1,x_2)$ from $x_1$. More generally, we can introduce $N_i$ repetitions of $x_i$ for each $i$, and thereby realize rational weights $w_i = \frac{N_i}{\sum_i N_i}$. We can approximate any weight by this process; however, it obviously has the drawback of dramatically increasing the total number of points to $N_{new} = \sum_{i=1}^N N_i$.

The second idea is to include the weights explicitly in the geodesic averaging step, starting with $N = 2$. As an example, let $N = 3$ and weights $w_1$, $w_2$ and $w_3$ be given. We interpret these weights as the global contributions of the $x_i$'s to the mean. For the pairwise mean of $x_1$ and $x_2$ we use the weight $\alpha_1 = \frac{w_1}{w_1+w_2}$ for $x_1$ and $1-\alpha_1$ for $x_2$. Similarly, we set $\alpha_i = \frac{w_i}{w_i+w_{i+1}}$ with modulo-3 cyclic indexing.

Example 5.4.1. Let us consider an example with $N = 3$ points in $\mathbb{R}$. We have three weights $\{w_i\}_{i=1}^3$ from which we determine the $\alpha_i$ as $\alpha_i = \frac{w_i}{w_i+w_{i+1}}$ ($i = 1,2,3$). Denote $x_k = [x_1^k,x_2^k,x_3^k]^T\in\mathbb{R}^3$. The iteration process is obtained from (5.45) by replacing the midpoint mean function $\mu_{MI,2} = \mu_{\frac12}$ that gives $x_1^{k+1}$ by $\mu_{\alpha_1}(\cdot,\cdot)$, i.e., we set $x_1^{k+1} = \mu_{\alpha_1}(x_1^k,x_2^k)$, and similarly for $i = 2,3$. Therefore, we have
$$x_{k+1} = Ax_k, \qquad (5.48)$$
where
$$A = \begin{bmatrix} \alpha_1 & 1-\alpha_1 & 0\\ 0 & \alpha_2 & 1-\alpha_2\\ 1-\alpha_3 & 0 & \alpha_3 \end{bmatrix}. \qquad (5.49)$$
Now, from the Perron-Frobenius theorem we know that $A^k$ converges to a rank one matrix $A^\infty$ all of whose rows are the same [43, p. 524]. This common row is the eigenvector corresponding to the largest eigenvalue of $A^T$, which is $1$, normalized so that the sum of its entries is $1$. It is easy to calculate this eigenvector as $\bar w = [\bar w_1,\bar w_2,\bar w_3]^T$, where
$$\bar w_i = \frac{(1-\alpha_i)^{-1}}{\sum_{j=1}^3(1-\alpha_j)^{-1}}, \quad i = 1,2,3. \qquad (5.50)$$
More concretely, if $w = [w_1,w_2,w_3]^T = [\frac13,\frac13,\frac13]^T$, then $\alpha = [\alpha_1,\alpha_2,\alpha_3]^T = [\frac12,\frac12,\frac12]^T$ and $\bar w = [\frac13,\frac13,\frac13]^T$; hence $\mu_{MI,3}(x_1,x_2,x_3) = \frac13(x_1+x_2+x_3)$. However, if $w = [\frac14,\frac18,\frac58]^T$, then $\alpha = [\frac23,\frac16,\frac57]^T$ and $\bar w = [\frac{30}{77},\frac{12}{77},\frac{35}{77}]^T$; hence the explicitly weighted MI mean of $\{x_i\}_{i=1}^3$ corresponding to $w$ is $\mu_{WMI,3}(x_1,x_2,x_3) = \frac{30}{77}x_1 + \frac{12}{77}x_2 + \frac{35}{77}x_3$. As can be seen in this simple example, the relation between the initial weights $w_i$ and the actual weights $\bar w_i$ is nonlinear. However, our intuition that putting a large $w_1$ on $x_1$ results in a large $\bar w_1$ is correct. More precisely, one can see that if $w_1 > w_2 > w_3$ then $\alpha_1 > \alpha_2 > \alpha_3$, and hence $\bar w_1 > \bar w_2 > \bar w_3$.
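The worked numbers in Example 5.4.1 are easy to check numerically; the following sketch builds the matrix $A$ of (5.49), compares a high power of $A$ with the limiting weights given by formula (5.50), and is offered only as a verification aid.

```python
# Numerical check of Example 5.4.1 for w = [1/4, 1/8, 5/8].
import numpy as np

w = np.array([1/4, 1/8, 5/8])
alpha = np.array([w[i] / (w[i] + w[(i + 1) % 3]) for i in range(3)])  # [2/3, 1/6, 5/7]

# Iteration matrix A of (5.49).
A = np.array([[alpha[0], 1 - alpha[0], 0],
              [0, alpha[1], 1 - alpha[1]],
              [1 - alpha[2], 0, alpha[2]]])

A_inf = np.linalg.matrix_power(A, 200)                 # A^k -> rank-one limit
w_bar = (1 / (1 - alpha)) / np.sum(1 / (1 - alpha))    # formula (5.50)

print(A_inf[0])   # each row is approximately [30/77, 12/77, 35/77]
print(w_bar)      # [0.3896..., 0.1558..., 0.4545...]
```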
Next, we give a formal definition of the explicitly Weighted MI (WMI) mean, based on the second idea mentioned above, since it is the more practical one.

Definition 5.4.2. Let $(M,d)$ be a complete Riemannian manifold and let $\mu_\alpha(\cdot,\cdot)$ be the geodesic averaging function. Define the $N$th order weighted Mean-Invariance based mean $\mu_{WMI,w_N}(\cdot)$ corresponding to the positive weight vector $w_N = [w_1,\ldots,w_N]^T$ recursively as:

1. If $N = 2$, then $\mu_{WMI,w_2}(\cdot,\cdot) = \mu_{w_1}(\cdot,\cdot)$, where $w_2 = [w_1,w_2]^T$ is the weight vector.

2. Otherwise, to find $\mu_{WMI,w_N}(x)$ where $x = (x_1,\ldots,x_N)$, set $x_0 = x$ and perform the following iteration:
$$x_{k+1} = \big(\mu_{WMI,w_{N-1}^1}(x_k^{-N}),\ldots,\mu_{WMI,w_{N-1}^N}(x_k^{-1})\big). \qquad (5.51)$$
Here the corresponding weight vector $w_{N-1}^i$ is found as follows:
$$w_{N-1}^i = \frac{w_N^{-(N+1-i)}}{\mathbf{1}_{N-1}^T\,w_N^{-(N+1-i)}}, \quad i = 1,\ldots,N, \qquad (5.52)$$
where $\mathbf{1}_{N-1}$ is the $(N-1)$-dimensional vector all of whose elements are $1$. We write (5.51) more compactly as
$$x_{k+1} = \Psi_{WMI,W_N}(x_k), \qquad (5.53)$$
where $W_N$ is the $N\times N$ matrix of weights whose $i$th row is equal to the transpose of $(w_{N-1}^i)^{+(N+1-i)}$. Here, for an $(N-1)$-dimensional vector $x$, by $x^{+i}$ we mean the $N$-dimensional vector whose $i$th component is zero and whose remaining components are filled, in increasing order of indices, with the elements of $x$, also taken in increasing order of their indices (cf. $x^{-i}$ in Definition 5.4.1). If the sequence of points $\langle x_k\rangle_k$ in $M^N$ converges to a point $\bar x = (\bar x,\ldots,\bar x)$, then we call $\bar x$ the $N$th order WMI mean of $\{x_i\}_{i=1}^N$ with weight vector $w_N = [w_1,\ldots,w_N]^T$ and set $\mu_{WMI,w_N}(x_1,\ldots,x_N) = \bar x$.

In the case of equal weights the WMI mean reduces to the MI mean. Also note that the matrix $W_N$ built in the definition is a stochastic matrix which has exactly one zero in each row, and each of its columns also has exactly one zero. This suggests that the convex vectorial mean function $\Psi_{WMI,w_N}$ must be a primitive mean function of index $n_s = 2$, as we will see next.

5.4.2 Existence, Uniqueness and Other Properties

Next, we prove existence and uniqueness properties of the MI and WMI means. For this we use Theorem 5.2.2 about convex vectorial mean functions.

Theorem 5.4.1. Define
$$D = \{(x_1,\ldots,x_N)\in M^N \mid \exists\, o\in M \text{ and } \rho < r_{cx} \text{ such that } \{x_i\}_{i=1}^N\subset B(o,\rho)\}. \qquad (5.54)$$
For any integer $N\ge 2$ and positive weight vector $w_N$, the functions $\Psi_{MI,N}$ and $\Psi_{WMI,W_N}$ defined in Definitions 5.4.1 and 5.4.2 are primitive mean functions with primitivity index $n_s = 2$ on the domain $D$ defined above. Therefore, if $\{x_i\}_{i=1}^N$ lies in a ball $B(o,\rho)$ with $\rho < r_{cx}$, then the MI mean and the WMI mean with weight vector $w_N$ exist, are unique, and belong to $B(o,\rho)$ and to the closure of the convex hull of $\{x_i\}_{i=1}^N$. Moreover, $\mu_{MI,N}(x)$ and $\mu_{WMI,w_N}(x)$ are strictly convex mean functions of order $N$ with domain $D$ as above.

Proof. The proof is by induction over $N$. Obviously, all the claims hold for $N = 2$. Now, we assume they hold for $N\ge 2$, and we show they hold for $N+1$. We only need to show that $\Psi_{MI,N+1}$ and $\Psi_{WMI,W_{N+1}}$ are primitive mean functions; the rest of the claims follow immediately from Theorem 5.2.2. First, note that $\Psi_{MI,N+1}$ is a convex vectorial mean function of order $N+1$. Its convexity follows from the fact that each of its components, i.e., $x\mapsto\mu_{MI,N}(x^{-(N+2-i)})$, is convex by Proposition 5.2.1. Now suppose $\{x_i\}_{i=1}^{N+1}\subset\bar B(o,\rho)$ and $y = \Psi_{MI,N+1}(x)$. If one of the $y_i$'s, say $y_1$, lies on the boundary of $\bar B(o,\rho)$, then by strict convexity of $\mu_{MI,N}$ we must have $x_1 = \cdots = x_N$, with these points lying on the boundary of the ball. Assume $x_{N+1}$ does not lie on the boundary of the ball (otherwise there is nothing to prove); then, again by the strict convexity of $\mu_{MI,N}$, $y_i$ ($2\le i\le N+1$) lies inside the ball.
Now, since only one point among $\{y_i\}_{i=1}^{N+1}$ lies on the boundary of the ball, for the new set of points generated as $z = \Psi_{MI,N+1}(y)$ all the $z_i$'s must belong to the interior of the ball. This means that $\Psi_{MI,N+1}$ is primitive with primitivity index $n_s = 2$. The same argument works for $\Psi_{WMI,W_{N+1}}$.

This result, in particular, gives a new proof of the existence, uniqueness and continuity results derived in [5, 59].

5.4.2.1 A Lipschitz Property in Manifolds of Nonpositive Curvature

In the construction of the MI mean from order $N-1$ to order $N$, if each mean function of order $N-1$ has Lipschitz constant $L$, then the vectorial mean function constructed from them has Lipschitz constant $L' = \sqrt{N-1}\,L$. In order to guarantee that the resulting mean function of order $N$ is Lipschitz, according to Theorem 5.2.3 we must have $L' = 1$; therefore, we need $L = \frac{1}{\sqrt{N-1}}$. This can only happen if $M$ is of nonpositive curvature.

Theorem 5.4.2. In a manifold of nonpositive curvature the MI mean function $\mu_{MI} : M^N\to M$ of order $N$ is Lipschitz with constant $L = \frac{1}{\sqrt N}$.

Proof. For the $L^2$ mean of $N = 2$ points with equal weights, i.e., the geodesic midpoint assignment, we found the Lipschitz constant in Proposition 4.6.2 to be $L = \frac{1}{\sqrt 2}$. Therefore, $\Psi_{MI,3} : M^3\to M^3$, defined according to (5.47), is Lipschitz with constant $L = 1$; hence, by Theorem 5.2.3, $\mu_{MI,3}$ is Lipschitz with constant $L = \frac{1}{\sqrt 3}$. A simple induction proves the claim.

It seems that, unfortunately, in contrast with the Riemannian $L^2$ mean, no form of Lipschitz property holds for the MI mean in a manifold of positive curvature. From Proposition 4.6.2, recall that in a manifold of positive curvature (or, more precisely, in a manifold of nonnegative curvature whose curvature is not identically zero) the Riemannian $L^2$ mean, and in particular the geodesic midpoint assignment, is Lipschitz but with a Lipschitz constant larger than in the Euclidean case. This results in a Lipschitz constant larger than $1$ for the vectorial mean function used to construct the MI mean of order $N = 3$. Consequently, our iterative process, as explained in Section 5.2.3, destroys the Lipschitz property through repeated multiplication of constants larger than $1$.

5.4.2.2 An Existence and Uniqueness Result for Non-ball Domains

In Theorem 5.4.1, in order to have a valid MI or WMI mean we required $\{x_i\}_{i=1}^N$ to belong to a strongly convex ball of radius smaller than $r_{cx}$. This, in turn, was needed because of the requirements of Theorem 5.2.2 and our requirements on convex mean functions. As we shall see in Chapter 6, this assumption is restrictive for our applications in homogeneous spaces such as the orthogonal group and the Grassmannian manifold, where we would like to have the same result for points which belong to a special non-ball strongly convex set (see Sections 6.4 and 6.6 for more details). It is possible to prove existence and uniqueness of the MI and WMI means in a less restrictive situation, as follows:

Theorem 5.4.3. Let $(M,d)$ be a complete Riemannian manifold. Let $A\subseteq M$ be a strongly convex open set with $o\in A$ such that $x\mapsto d^2(x,o)$ is a strictly convex function in $A$. Also assume that there exists another strongly convex set $A'$ such that $\bar A\subseteq A'$.¹ If $\{x_i\}_{i=1}^N\subset A$, then the MI and WMI means of $\{x_i\}_{i=1}^N$ exist, are unique, depend continuously on the $x_i$'s, and belong to the closure of the convex hull of $\{x_i\}_{i=1}^N$.
Moreover, we have
$$d(o,\mu_{MI}(x_1,\ldots,x_N)) \le \max_i d(o,x_i), \qquad (5.55)$$
with equality if and only if the $x_i$'s coincide. The same holds for $\mu_{WMI}$.

¹This is a technical assumption which is automatically satisfied when $A$ has a specific shape, such as a Riemannian or a Finsler ball, as we shall see in Section 6.6.

Proof. The proof is by induction over the number of points $N$. The assumption about $A'$ guarantees that the closure of the convex hull of $\{x_i\}_{i=1}^N$ is a strongly convex set (see Proposition 2.3.1); and the claim that the mean, if it exists and is unique, belongs to the closure of the convex hull of $\{x_i\}_{i=1}^N$ is a direct consequence of the construction and of the closedness of the closure of the convex hull of $\{x_i\}_{i=1}^N$. Also note that, in case the MI or WMI mean of $\{x_i\}_{i=1}^N\subset A$ exists and is unique, after a finite number of iterations the points $x_i^k$ belong to a small convex ball; hence the continuity of the mean with respect to the initial points follows from Theorem 5.4.1. So we only need to prove the existence and uniqueness of the mean, as well as (5.55). As before, we only consider the MI scheme. For $N = 2$, due to the strong convexity of $A$ and the strict convexity of $d^2(o,x)$, $\mu_\alpha(x_1,x_2)$ is unique, belongs to the geodesic connecting $x_1$ and $x_2$, and satisfies $d^2(o,\mu_\alpha(x_1,x_2))\le\max_i d^2(o,x_i)$, with equality if and only if $x_1 = x_2$. Now assume the claim holds for $N-1$ points. Define
$$h_k = \max_i d^2(o,x_i^k). \qquad (5.56)$$
Denote by $I_j^{N-1}$ the subset of indices $\{1,\ldots,N\}$ formed by the indices that contribute to determining $x_j^{k+1}$ from the $x_i^k$'s. Note that if $\max_{1\le i\le N}d^2(o,x_i^{k+1})$ is attained at $i = j$, then
$$\max_{1\le i\le N}d^2(o,x_i^{k+1}) = d^2(o,x_j^{k+1}) \le \max_{i\in I_j^{N-1}}d^2(o,x_i^k) \le \max_{1\le i\le N}d^2(o,x_i^k). \qquad (5.57)$$
By the induction assumption, equality holds in the first inequality above if and only if all the $N-1$ points $x_i^k$, $i\in I_j^{N-1}$, contributing to $x_j^{k+1}$ coincide. Equality in the second inequality holds if and only if $\max_{1\le i\le N}d^2(o,x_i^k)$ is attained at some $i = l\in I_j^{N-1}$. Therefore, $h_{k+1}\le h_k$, with equality if and only if at least $N-1$ of the $x_i^k$'s coincide and these $N-1$ points are not closer to $o$ than the remaining one. Assume this situation has occurred, but not all the points coincide; say $x_1^k = \cdots = x_{N-1}^k$ but $x_N^k$ is different from them, with $d^2(o,x_N^k)\le d^2(o,x_1^k)$. Then $d^2(o,x_1^{k+1}) = d^2(o,x_1^k)$, but $d^2(o,x_i^{k+1}) < d^2(o,x_1^k)$ for $1 < i\le N$, since in determining each of these $x_i^{k+1}$'s not all of the contributing $x_i^k$'s coincide. Therefore, the farthest $x_i^{k+1}$ ($1\le i\le N$) from $o$, which is $x_1^{k+1}$, cannot coincide with any other point at this step. As a result, at step $k+2$ we have $h_{k+2} < h_{k+1} = d^2(o,x_1^{k+1})$, unless all the $x_i^k$'s coincide. Hence, in any situation we have $h_{k+2}\le h_k$, with equality if and only if all the $x_i^k$'s coincide. Therefore, noting the continuity of $(y_1,\ldots,y_N)\mapsto\max_i d^2(o,y_i)$ and the continuity of the means of order $N-1$, it is easy to see that the sequence $\langle x_{2k-1}\rangle_k$ must have a subsequence converging to $\bar x = (\bar x,\ldots,\bar x)\in M^N$. This implies that after some finite number of steps $k_0$, $\{x_i^{2k_0-1}\}_{i=1}^N$ lies in a small enough convex ball. By Theorem 5.4.1 this means that all the sequences $\langle x_i^k\rangle_k$ converge to a common limit, which coincides with $\bar x$. It is also obvious that $\lim_k h_k = d^2(o,\bar x)\le h_1 = \max_i d^2(o,x_i)$, with equality if and only if all the $x_i$'s coincide, which proves (5.55).

5.4.3 Generalized Weighted Mean-Invariance Means Resembling Lp Means

The MI and WMI means introduced above resemble the $L^2$ mean, as evidenced by the case of Euclidean space, where the MI mean and the equal-weight $L^2$ mean coincide.
Note that by resemblance we do not necessarily mean closeness; rather, we mean similar behavior with respect to outliers (although, if the data points are close enough, then the equal-weight $L^2$ mean and the MI mean in $M$ will be close, at least up to first order of approximation). It is possible to define recursive-iterative means which resemble other $L^p$ means, provided we start with a base case different from $K = 2$. Note that the equal-weight $L^p$ mean of two close points coincides with the midpoint of the unique minimizing geodesic connecting them, for all $p > 1$; however, that is not the case for $K\ge 3$ points. The following definition generalizes Definitions 5.4.1 and 5.4.2 by choosing a base case at which the $L^p$ mean of $K\ge 3$ points is found. We call such a mean the Generalized Weighted Mean-Invariance (GWMI) based mean of type $p$. The interesting point is that the same method used in Theorem 5.4.1 can be used to prove the existence and uniqueness of the GWMI mean, as we see in Theorem 5.4.4.

Definition 5.4.3. Let $(M,d)$ be a complete Riemannian manifold. Assume $\mu_{p,w_N} : M^N\to M$ is the $L^p$ ($1 < p\le\infty$) mean function with positive weight vector $w_N$, i.e., $\mu_{p,w_N}(x_1,\ldots,x_N)$ is the $L^p$ mean of the set $\{x_i\}_{i=1}^N$ with the corresponding weight vector $w_N$, provided the mean exists uniquely. Let $K\ge 3$ and $1 < p\le\infty$ be given. We define the $N$th order Generalized Weighted Mean-Invariance (GWMI) mean of type $p$, or resembling the $L^p$ mean, as follows:

1. If $N = K$, then $\mu_{GWMI,p,w_N}(\cdot) = \mu_{p,w_K}(\cdot)$.

2. Otherwise, to find $\mu_{GWMI,p,w_N}(x)$ where $x = (x_1,\ldots,x_N)$, set $x_0 = x$ and perform the following iteration:
$$x_{k+1} = \big(\mu_{GWMI,p,w_{N-1}^1}(x_k^{-N}),\ldots,\mu_{GWMI,p,w_{N-1}^N}(x_k^{-1})\big). \qquad (5.58)$$
In the above, the corresponding weight vector $w_{N-1}^i$ is found from $w_N$ according to (5.52) in Definition 5.4.2. We write (5.58) more compactly as
$$x_{k+1} = \Psi_{GWMI,p,W_N}(x_k), \qquad (5.59)$$
where $W_N$ is defined as in Definition 5.4.2. If the sequence of points $x_k\in M^N$ converges to a point $\bar x = (\bar x,\ldots,\bar x)\in M^N$, then we call $\bar x$ the $N$th order GWMI mean of $\{x_i\}_{i=1}^N$ of type $p$ with weight vector $w_N = [w_1,\ldots,w_N]^T$ and set $\mu_{GWMI,p,w_N}(x_1,\ldots,x_N) = \bar x$. If the weights are equal we call the mean the Generalized Mean-Invariance (GMI) based mean.

We have the following existence and uniqueness theorem:

Theorem 5.4.4. For $1 < p\le\infty$ define
$$D_p = \{(x_1,\ldots,x_N)\in M^N \mid \exists\, o\in M \text{ and } \rho < \rho_{\Delta,p} \text{ such that } \{x_i\}_{i=1}^N\subset B(o,\rho)\}, \qquad (5.60)$$
where $\rho_{\Delta,p}$ is defined in (3.5). Given any integer $N\ge K$ and positive weight vector $w_N$, the function $\Psi_{GWMI,p,W_N}$ defined in Definition 5.4.3 is a primitive vectorial mean function with primitivity index $n_s = 2$ on the domain $D_p$ defined above. Therefore, if $\{x_i\}_{i=1}^N$ lies in a ball $B(o,\rho)$ with $\rho < \rho_{\Delta,p}$, then the GWMI mean with weight vector $w_N$ exists, is unique, and belongs to $B(o,\rho)$ and to the closure of the convex hull of $\{x_i\}_{i=1}^N$. Moreover, $\mu_{GWMI,p,w_N}(x)$ is a strictly convex mean function of order $N$ with domain $D_p$ as above.

Proof. The proof is the same as the proof of Theorem 5.4.1, except that the base case of the induction is $N = K$ instead of $N = 2$.

Note that even in Euclidean space the GWMI mean of type $p\neq 2$ and the $L^p$ mean are highly nonlinear functions of the data points, and there might not be any direct relation between them.

5.5 Discussion and Examples

Besides theoretical interest, our motivation for studying the recursive-iterative means, and especially the pairwise and WMI means, is their possible computational benefit over, e.g., the gradient descent method for finding the $L^2$ mean.
Recall that finding the $L^p$ mean in a general manifold requires knowing or using the exponential map of the manifold, which can be very difficult or computationally intensive. For certain homogeneous manifolds such as the unit sphere $S^n$, the special orthogonal group $SO(n)$ and the Grassmannian manifold $G_k(n)$, if we use a suitable embedding of the manifold in a Euclidean space, then the midpoint of a geodesic between two points can be found without explicit use of the corresponding exponential maps. This fact, which we explore further in Section 6.7, can be useful in computing the pairwise, MI or WMI means in the mentioned manifolds. On the other hand, the MI and WMI means suffer from a curse of dimensionality, in the sense that the computational load to calculate them increases exponentially with $N$. To see this, note that if $C_N$ is the computational load for finding the MI mean of order $N$, then from (5.45) in Definition 5.4.1 we have $C_N = RNC_{N-1}$, where $R$ is the number of iterations performed until convergence is achieved (we assume $R$ is fixed). Clearly, this relation between $C_N$ and $C_{N-1}$ results in a prohibitive exponential dependence of $C_N$ on $N$. Note that each of the $N$ sets of iterations performed in order to find the MI and WMI means of order $N$ can be implemented independently, i.e., in parallel, which can result in a faster implementation; still, even the parallel implementation becomes expensive for large $N$. Therefore, the practical use of the MI, WMI or GWMI means is limited to situations where the number of data points is not large. In contrast, the pairwise mean, which unfortunately is not permutation invariant (unless $N = 3$), is much cheaper to calculate.

5.5.1 Perimeter Shrinkage and Pairwise Mean: An Example

In our first example we apply the PS scheme to the same set of points as in the example studied in Section 4.7.3. Let $\{x_i\}_{i=1}^4$ be as in that example:
$$x_1 = \begin{bmatrix} 0\\ 0\\ 1\end{bmatrix},\quad x_2 = \begin{bmatrix} 0.5878\\ 0\\ 0.8090\end{bmatrix},\quad x_3 = \begin{bmatrix} 0.5\\ 0.5\\ 0.7071\end{bmatrix},\quad x_4 = \begin{bmatrix} 0\\ 0.9511\\ 0.3090\end{bmatrix}. \qquad (5.61)$$
An important observation, mentioned at the beginning of the section, is that the implementation of the PS scheme in this case is quite easy and does not require any complicated calculation. To see this, note that the midpoint of the geodesic between two non-antipodal points $x_1,x_2\in S^n$, where $S^n$ is considered as an embedded submanifold of $\mathbb{R}^{n+1}$, is simply
$$\mu_{\frac12}(x_1,x_2) = \frac{x_1+x_2}{\|x_1+x_2\|}, \qquad (5.62)$$
where $\|\cdot\|$ is the standard Euclidean norm of $\mathbb{R}^{n+1}$. This is due to the symmetry of $S^n$ and the symmetry of the embedding. Now, we apply the PS scheme to the set $\{x_i\}_{i=1}^4$, feeding the points to the scheme in their original index order. Figure 5.4 shows the perimeter of the geodesic polygon $x_1^kx_2^kx_3^kx_4^k$ as a function of the iteration index $k$; the bottom graph shows $\log P_k$ versus $k$. The latter graph suggests that the convergence of the polygon, i.e., of the points, to the pairwise mean is linear, in the sense that $P_{k+1}\approx hP_k$ where $h < 1$ is a constant. This should not be surprising: once the points become close enough, the behavior of the PS scheme on $S^2$, or in any manifold, becomes similar to that of its counterpart in Euclidean space, which exhibits this linear behavior. We also feed the PS scheme with the data points in different orders. Note that there are three distinct order possibilities for feeding the PS scheme: 1234, 1324 and 1342.
The remaining orderings are equivalent to one of the three mentioned. Associated with these three orders we have three generally different pairwise means, $\mu_{1234}$, $\mu_{1324}$ and $\mu_{1342}$. It is interesting to compare the distances between these three means.¹ We also add two other means to this comparison. The first is the equal-weight $L^2$ mean of $\{x_i\}_{i=1}^4$, denoted by $\mu_{L^2}$, which is found by the gradient descent algorithm. The second is the so-called extrinsic mean $\mu_{ex}$, which is the projection of the Euclidean mean of $\{x_i\}_{i=1}^4$, regarded as a subset of $\mathbb{R}^3$, onto $S^2$, and is quite easy to find:
$$\mu_{ex} = \frac{\sum_{i=1}^N x_i}{\|\sum_{i=1}^N x_i\|}, \qquad (5.63)$$
whenever $\sum_{i=1}^N x_i$ is not zero.

¹A valid question which we have not addressed is: what is the source of this permutation dependence? It seems that curvature is one factor; another is the number of data points. By knowing and understanding these factors we might be able to devise means which are simple to compute and yet whose permutation dependence is under control.

[Figure 5.4: Perimeter of the polygon $x_1^kx_2^kx_3^kx_4^k$ in the example of Section 5.5.1 under the PS iteration (top: $P_k$ versus $k$; bottom: $\log P_k$ versus $k$). The bottom graph shows that the convergence of the polygon to a point is linear.]

Table 5.1 shows the angular distances between all these means, in degrees. As one can see, all the distances are very small. Nevertheless, on average the more sophisticated means $\mu_{1234}$, $\mu_{1324}$, $\mu_{1342}$ and $\mu_{L^2}$ are closer to each other than to $\mu_{ex}$, in the sense that the average of the distances between the means in the group is less than the average of the distances between $\mu_{ex}$ and the means in the group.

Table 5.1: The angular distances in degrees between different means for the set $\{x_i\}_{i=1}^4$ in the example of Section 5.5.1.

d(.,.)   | mu_1234 | mu_1324 | mu_1342 | mu_L2  | mu_ex
mu_1234  | 0       | 0.2846  | 0.9622  | 0.4031 | 1.0373
mu_1324  | 0.2846  | 0       | 0.8049  | 0.2654 | 1.0720
mu_1342  | 0.9622  | 0.8049  | 0       | 0.5659 | 0.6905
mu_L2    | 0.4031  | 0.2654  | 0.5659  | 0      | 0.8251
mu_ex    | 1.0373  | 1.0720  | 0.6905  | 0.8251 | 0

5.5.2 Cyclic Pursuit on the Sphere

Imagine a group of $N$ frogs dispersed uniformly on a globe. If each frog starts chasing its closest group member, what happens after a while? Since the situation is independent of labels, we assume frog $i$ starts following frog $i+1$, which is its closest group member (as usual, by $x_{N+1}$ we mean $x_1$). This is exactly the formulation we introduced in Section 5.3.2. We consider the discrete-time case, and assume that at each unit of time frog $i$ hops to the midpoint of the minimal geodesic between itself and frog $i+1$. Theorem 5.3.2 tells us that the frogs will either all meet at a point or will converge to circular formations on great circles. We know that if they start in a strongly convex ball they will converge to a point; however, if they are not initially located in a strongly convex ball, either of the two outcomes can happen. Note that, under our assumption of uniform placement on the globe, if $N$ is large the chance of their lying in a strongly convex ball is low. It seems difficult to give a condition guaranteeing whether a given initial configuration dispersed over the entire globe will converge to a point or to great circles. However, as Theorem 5.3.2 suggests, a great circle is not a stable formation. We can observe this fact in a numerical simulation.
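A sketch of such a simulation is given below, reusing the `sphere_average` and `perimeter` routines from the sketch in Section 5.3.1; the random seed, point count and iteration budget are arbitrary choices made only for illustration.

```python
# Sketch of discrete-time cyclic pursuit (midpoint hopping) on the sphere S^2,
# in the spirit of the simulation described next. Reuses sphere_average and
# perimeter from the earlier perimeter shrinkage sketch.
import numpy as np

rng = np.random.default_rng(0)
N = 200
X = rng.standard_normal((N, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # N points uniform on S^2

for k in range(20000):
    # Each frog hops to the geodesic midpoint between itself and the next frog.
    X = np.array([sphere_average(0.5, X[i], X[(i + 1) % N]) for i in range(N)])
    if k % 1000 == 0:
        print(k, perimeter(list(X)) / np.pi)    # perimeter P_k in units of pi
```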
We assume there are $N = 200$ frogs in cyclic pursuit. Figure 5.5 shows the evolution of their positions and the perimeter $P_k$ of the closed geodesic polygon $x_1^k\cdots x_N^k$ as a function of the discrete-time index $k$. As can be seen, after an initial approach to a great circle, due to finite precision errors in the computer simulation, $P_k$ (for $k\ge 18583$) becomes smaller than $2\pi$ and eventually converges to $0$. The conclusion is that unless the frogs commit no errors, they will meet each other at a point. Whether they can avoid this event depends on whether they can employ smarter rules to control their formation!

5.5.3 An Example of GMI Means on the Sphere

In this example, we find the GMI means of types $p = 1.1$ and $p = 3$, and the MI mean, for the data points $\{x_i\}_{i=1}^N$ in the example of Subsection 4.7.3. The base case for the GMI means is $K = 3$. The corresponding $L^p$ means are found by the gradient descent algorithm. Figure 5.6 shows the resulting means. For comparison we also show the $L^{1.1}$, $L^3$ and $L^2$ means of the same data set. While it seems very difficult to give a useful quantitative measure of closeness between the $L^p$ mean and its GMI version, this example shows that they are close. However, as mentioned before, the more important issue is their behavior with respect to outliers. We see that

[Figure 5.5: The top pictures show the evolution of the positions of a group of $N = 200$ frogs on a globe in discrete-time cyclic pursuit, at times $k = 0, 10, 100, 1000, 10000, 18583$ and $100000$, with perimeters $P_k = 13.06\pi, 8.46\pi, 5.36\pi, 3.10\pi, 2.0014\pi, 2.0000\pi$ and $0.05\pi$, respectively. Convergence is slow, and after $k = 18584$ iterations the frogs reach a formation which is closest to a great circle in perimeter. However, since this configuration is not stable, due to finite precision errors in the calculations they keep getting closer to each other, and after $k = 100{,}000$ iterations the perimeter is $0.05\pi$. The bottom graph shows the perimeter $P_k$ versus $k$, which, after the initial "convergence" to $2\pi$, tends towards zero due to finite precision.]

the GMI means of types $p = 1.1$ and $p = 3$ have different behaviors and resemble their respective $L^p$ versions. In the simulations for this example we observed two difficulties in computing the GMI means:

1. The first difficulty arises when the scheme is close to convergence: in this situation we need to find the $L^p$ mean of data points which are very close to each other, and this can be numerically difficult for two reasons. The first is that, in such a case, the formulas (4.32) and (4.31) for the exponential map and its inverse at a point close to the data points involve divisions by small numbers and the arccosine of small angles. The second is that one needs to use a very small or adaptive stepsize in the gradient algorithm to make sure that the new updates do not leave the convex hull of the data points (which is very small).

2. The second difficulty is related to finding the $L^p$ mean for values of $p$ close to $1$ or very large. We already mentioned the difficulty with large values of $p$ in Subsection 4.7.3.
A look at (3.6), the formula for $\nabla f_p$, shows that if the iterate of the gradient descent is close to one of the data points, then the evaluation of $\nabla f_p$ again requires division by a small number. This adds some error to the computation of the $L^p$ mean; and since in the iterative process we compute many $L^p$ means, these errors can accumulate. Again, the use of a very small stepsize helps keep the error at acceptable levels.

[Figure 5.6: The GMI means of types $p = 1.1$ and $p = 3$, together with the MI mean, for four points $\{x_i\}_{i=1}^4$ on the unit sphere. For comparison the corresponding $L^p$ means are also shown. The base case for the GMI means is $K = 3$.]

Chapter 6

Results Specific to the Orthogonal Group and the Grassmannian

6.1 Introduction

The feasibility of numerical computations with manifold-valued data depends heavily on our ability to compute the exponential map and geodesics of the underlying manifold. For an arbitrary manifold computing the exponential map can be very difficult. However, for the class of manifolds known as Riemannian symmetric spaces this can be done with relative ease. Fortunately, Riemannian symmetric spaces naturally appear in many applications, and for this reason they deserve special study. In this chapter we study the Riemannian $L^p$ means and recursive-iterative means in two important symmetric spaces: the special orthogonal group $SO(n)$ (comprised of $n\times n$ orthogonal matrices of determinant $1$), and the Grassmannian manifold $G_k(n)$, which is the space of $k$-dimensional subspaces of $\mathbb{R}^n$. Finding geodesics and other related calculations in these two manifolds boil down to matrix computations. We place emphasis on developing efficient methods for the calculation of the pairwise and MI means in these manifolds.

6.1.1 Contributions and Outline of the Chapter

It is an interesting fact, observed in [36] (see also [16, Chapter 7]), that existence and uniqueness of the local Riemannian $L^2$ mean can be extended to rather large non-ball domains in $SO(n)$; by large we mean relative to the diameter or volume of $SO(n)$, or in the sense of inclusion. Such a domain can be described via a Finsler distance induced by the matrix 2-norm (instead of the Riemannian norm) in the Lie algebra of $SO(n)$ (see Section 6.4). In this chapter we build our results on this observation. More concretely, we give large domains for the existence and uniqueness of the means. These domains are not Riemannian balls but Finsler balls, and they are larger than the corresponding Riemannian balls. Not all the results we derive are completely new; however, in most instances the already known results are derived based on new observations (e.g., Theorem 6.5.1). The two main contributions of this chapter are as follows:

1. In Theorem 6.6.1 we derive existence and uniqueness results for the pairwise, MI and WMI means of points in Finsler 2-norm balls of appropriate size in $SO(n)$ and $G_k(n)$. Also, in Theorem 6.5.1 we give the same result for local $L^p$ means.

2. In Theorems 6.7.1 and 6.7.2 we give efficient methods to find the midpoint of the geodesic between two points in $SO(n)$ (or $G_k(n)$) and the reflection of a point with respect to another point in $SO(n)$ (or $G_k(n)$).
The significance of these results is that they allow us to interpolate or extrapolate a geodesic between two points without explicit use of the matrix logarithm and exponential, both of which have high computational complexity. This is helpful especially in finding the pairwise and MI means (see Sections 5.3, 5.4 and 6.8) in these spaces. We should emphasize that the results in this chapter can most likely still be strengthened; our efforts to strengthen Theorem 6.5.1 to include the global Riemannian $L^p$ mean as well as larger Finsler domains were not successful.

The outline of this chapter is as follows. In Section 6.2 we give the needed results about the geometry of $SO(n)$ and $G_k(n)$. In Section 6.3 we give existence and uniqueness results for means of points in a Riemannian ball in $SO(n)$ or $G_k(n)$. In Section 6.4 we describe the Finsler balls based on the matrix 2-norm on the tangent space, and study convexity of the Riemannian distance function within these strongly convex Finsler balls in $SO(n)$ and $G_k(n)$. In Section 6.5 existence and uniqueness of the local $L^p$ means in 2-norm Finsler balls is studied. In Section 6.6 we give existence and uniqueness results for the pairwise, MI and WMI means in the mentioned 2-norm balls. In Section 6.7 we derive efficient methods to find the midpoint of a geodesic between two points in $SO(n)$ and $G_k(n)$. In Section 6.8 we apply the methods of Section 6.7 to implement two algorithms: the Nelder-Mead method of direct search on $SO(n)$ and the K-means clustering algorithm on $G_k(n)$.

6.2 Riemannian Geometry of $SO(n)$ and $G_k(n)$

Here, we briefly give the needed results about the Riemannian geometry of $SO(n)$ and $G_k(n)$.

6.2.1 Riemannian Geometry of $SO(n)$

The special orthogonal group $SO(n)$ is a compact Lie group; we denote its Lie algebra by $so(n)$, the space of $n\times n$ skew-symmetric matrices with the Lie bracket operation defined by
$$[X,Y] = XY - YX, \quad X,Y\in so(n). \qquad (6.1)$$
The associated Lie algebra level adjoint operator $\mathrm{ad}_X : so(n)\to so(n)$ is defined by
$$\mathrm{ad}_X(Y) = [X,Y] \qquad (6.2)$$
for $X,Y\in so(n)$. The corresponding group level adjoint representation $\mathrm{Ad}_x(Y)$ is defined as
$$\mathrm{Ad}_x(Y) = xYx^{-1} = xYx^T \qquad (6.3)$$
for $x\in SO(n)$ and $Y\in so(n)$. The two operators are related by the well-known relation
$$e^{\mathrm{ad}_X}(Y) = \mathrm{Ad}_{e^X}(Y) \qquad (6.4)$$
for $X,Y\in so(n)$, where $e^X$ is the matrix exponential of the $n\times n$ matrix $X$. In the above, $e^{\mathrm{ad}_X} : so(n)\to so(n)$ is the linear operator defined as
$$e^{\mathrm{ad}_X}(Y) = \sum_{k=0}^\infty \frac{\mathrm{ad}_X^k}{k!}(Y), \qquad (6.5)$$
where $\mathrm{ad}_X^k(Y) = [X,\mathrm{ad}_X^{k-1}(Y)]$ and $\mathrm{ad}_X^0(Y) = Y$.

Remark 6.2.1. We will also use the inverse of the matrix exponential, i.e., the matrix logarithm. It is well-known that if a non-singular $n\times n$ matrix $x$ has no eigenvalues on the negative real line, then its logarithm $X = \log x$ is the unique matrix such that $x = e^X$ and all the eigenvalues of $X$ have imaginary parts between $-\pi$ and $\pi$. Note that a skew-symmetric matrix has purely imaginary eigenvalues; hence, if $x\in SO(n)$ has no eigenvalue equal to $-1$, then there exists a unique $X\in so(n)$ such that all the eigenvalues of $X$ are smaller than $\pi$ in absolute value and $x = e^X$. See Theorem 6.2.2, which gives the injectivity radius of $SO(n)$ in the standard Riemannian metric.

For $x\in SO(n)$ the left translation $L_x : SO(n)\to SO(n)$ is defined by $L_x(y) = xy$ for $y\in SO(n)$; the right translation is defined similarly. A tangent vector $Y\in so(n)$ can be left translated to $T_xSO(n)$ via the derivative of $L_x$. In this way, by left or right translations, corresponding to any $Y\in so(n)$ we can build left or right invariant vector fields $Y_L$ or $Y_R$.
Therefore, we can define the Lie bracket operation for left or right invariant vector fields based on the definition of the Lie bracket on $so(n)$. This gives an algebraic-differential picture of $SO(n)$. There is a natural way to equip $SO(n)$, and more generally any compact Lie group, with a Riemannian structure which matches the group structure, in the sense that both left and right translations are isometries. This follows from the fact that there always exists a Riemannian structure $g$ which is invariant under both left and right translations, i.e., is bi-invariant. We omit the details and only mention the following theorem [46, 33, 40]:

Theorem 6.2.1. There exists a bi-invariant Riemannian metric $g$ on $SO(n)$ for which:

1. $SO(n)$ is a complete Riemannian manifold.

2. The geodesics through $I$, the identity of $SO(n)$, coincide with the one-parameter subgroups of $SO(n)$, i.e., all such geodesics are of the form $e^{tX}$ where $X\in so(n)$. Hence, all geodesics coincide with the one-parameter subgroups and their translations. In this metric, inversion is an isometric reflection with respect to $I$.

3. If $\nabla$ is the Levi-Civita connection for $g$ and $R$ is its corresponding curvature tensor at the identity, then:
(a) $\nabla_XY = \frac12[X,Y]$, $X,Y\in so(n)$;
(b) $R(X,Y)Z = -\frac14[[X,Y],Z]$, $X,Y,Z\in so(n)$;
(c) $\langle R(X,Y)Z,U\rangle_I = -\frac14\langle[X,Y],[Z,U]\rangle$, $X,Y,Z,U\in so(n)$.

The most common such Riemannian metric is constructed from the inner product induced by the so-called Killing form on $so(n)$. We avoid the details and give the important facts we need. The mentioned inner product on $so(n)$, after a scaling, is
$$\langle X,Y\rangle_I = g_I(X,Y) = \tfrac12\mathrm{tr}(X^TY), \quad X,Y\in so(n). \qquad (6.6)$$
For left-invariant vector fields $X_L$ and $Y_L$ we have
$$g_x(X_L(x),Y_L(x)) = g_I(x^{-1}X_L(x),x^{-1}Y_L(x)). \qquad (6.7)$$
We call this metric the standard Riemannian metric of $SO(n)$. Note that on $so(n)$ we have $g_I(X,X) = \frac12\|X\|_F^2$, where $\|\cdot\|_F$ denotes the Frobenius norm. It can be shown that for $X,Y\in so(n)$ [13]
$$0\le\|[X,Y]\|_F\le\|X\|_F\|Y\|_F. \qquad (6.8)$$
From this and item 3c in Theorem 6.2.1 we conclude that the sectional curvature of $SO(n)$ in the standard Riemannian metric satisfies $0\le K\le\frac12$. Using the properties of the matrix exponential and of skew-symmetric matrices, it is also quite easy to establish the following theorem:

Theorem 6.2.2. We have the following:

1. The injectivity radius of $SO(n)$ in the standard Riemannian metric is $\pi$.

2. Let $\bar n = [\frac n2]$, where $[a]$ denotes the integer part of $a\in\mathbb{R}$. Closed simple geodesics in $SO(n)$ are of length $2\sqrt{l_1^2+\cdots+l_{\bar n}^2}\,\pi$, where the $l_i$'s are non-negative integers with no common factor other than $1$. A shortest (nontrivial) closed geodesic is of length $2\pi$. A longest simple geodesic is of length $2\pi\sqrt{\bar n}$. Hence, the Riemannian diameter of $SO(n)$ is $\pi\sqrt{\bar n}$.

3. In addition, for even $n$ the farthest point from the identity $I$ is $-I$, and for odd $n$ the farthest points from $I$ are of the form
$$\bar I = -I + 2VV^T, \qquad (6.9)$$
where $V\in\mathbb{R}^n$ and $\|V\| = 1$ ($\|\cdot\|$ is the standard norm of $\mathbb{R}^n$).
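Theorem 6.2.1 and Remark 6.2.1 already give a baseline numerical recipe for geodesics in $SO(n)$, sketched below using the matrix exponential and logarithm from scipy; the skew-symmetrization step is our own guard against round-off, and Section 6.7 develops midpoint and reflection methods that avoid these expensive matrix functions altogether.

```python
# Baseline sketch of geodesics in SO(n): the geodesic from x to y is
# t -> x expm(t logm(x^T y)), valid when x^T y has no eigenvalue -1, so that
# the principal logarithm is the unique skew-symmetric log (Remark 6.2.1).
import numpy as np
from scipy.linalg import expm, logm

def so_log(x, y):
    X = logm(x.T @ y)             # principal logarithm (may carry round-off)
    return np.real(X - X.T) / 2   # project back onto so(n)

def so_geodesic(x, y, t):
    return x @ expm(t * so_log(x, y))

def so_distance(x, y):
    # Standard metric (6.6): g_I(X, X) = tr(X^T X)/2, so d = ||log||_F / sqrt(2).
    return np.linalg.norm(so_log(x, y), 'fro') / np.sqrt(2)

# Midpoint (pairwise mean of two rotations): so_geodesic(x, y, 0.5).
# Reflection of y about x: so_geodesic(x, y, -1.0), which equals x @ y.T @ x
# by the one-parameter subgroup property, a log-free identity in the spirit
# of Section 6.7.
```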
6.2.2 Riemannian Geometry of $G_k(n)$

The following material can be found in many references; we mostly follow [26, 91, 46, 33]. The Grassmannian $G_k(n)$ is defined as the manifold of $k$-dimensional subspaces of $\mathbb{R}^n$, where we assume $k\le[\frac n2]$. Here, $[a]$ is the integer part of the real number $a$. For $k = 1$, $G_k(n)$ is the real projective space $\mathbb{RP}^{n-1}$, which is obtained by identifying antipodal points on the unit sphere $S^{n-1}$. The more complicated and interesting case is $k$ larger than $1$; henceforth we assume $k\ge 2$.

$G_k(n)$ is a smooth manifold which can be endowed with a Riemannian structure. There is a natural Riemannian structure which is closely related to the quotient topology associated with $G_k(n)$. We briefly recall these facts. Consider the subgroup of $SO(n)$ defined as
$$K_k(n) = \Big\{\begin{bmatrix} o_1 & 0\\ 0 & o_2\end{bmatrix} \;\Big|\; o_1\in O(k),\ o_2\in O(n-k),\ \det(o_1o_2) = 1\Big\}. \qquad (6.10)$$
Here, $O(n)$ denotes the Lie group of orthogonal matrices. $K_k(n)$ is also denoted by $S(O(k)\times O(n-k))$. Now let $q = [U_{n\times k}\mid U^\perp_{n\times(n-k)}]$ denote an element of $SO(n)$. The first $k$ columns of $q$, denoted by $U$, represent a $k$-dimensional subspace of $\mathbb{R}^n$, and the remaining columns, denoted by $U^\perp$, represent the orthogonal complement of that subspace. Consider the action of $K_k(n)$ on $SO(n)$ defined by multiplication from the right of $q\in SO(n)$. $G_k(n)$ is identified with the set of orbits of this action, which coincide with the left cosets of the subgroup $K_k(n)$. Moreover, $G_k(n)$ can be considered as a Riemannian manifold whose metric is derived from the standard Riemannian metric of $SO(n)$. Each element $q\in SO(n)$ can be associated with an element $\{qK_k(n)\}\in G_k(n)$ which consists of the orbit of $q$ under the action of $K_k(n)$:
$$\{qK_k(n)\} = \{qo \mid o\in K_k(n)\}. \qquad (6.11)$$
Note that this is the collection of elements of $SO(n)$ whose first $k$ columns span the same subspace as the first $k$ columns of $q$. The tangent space to $SO(n)$ at the identity decomposes into two orthogonal subspaces according to the Cartan decomposition $so(n) = g_k(n)\oplus k_k(n)$, where
$$k_k(n) = \Big\{\begin{bmatrix} A & 0\\ 0 & B\end{bmatrix} \;\Big|\; A\in so(k),\ B\in so(n-k)\Big\} \qquad (6.12)$$
and
$$g_k(n) = \Big\{\begin{bmatrix} 0 & -A^T\\ A & 0\end{bmatrix} \;\Big|\; A\in\mathbb{R}^{(n-k)\times k}\Big\}. \qquad (6.13)$$
$g_k(n)$ and $k_k(n)$ are called the horizontal and vertical spaces, respectively. At any other point $q\in SO(n)$ we have the corresponding decomposition via left translation. Note that $g_k(n)$ is identified with the tangent space of $G_k(n)$ at $\{IK_k(n)\}$. Therefore, if $X_i = \begin{bmatrix} 0 & -\tilde X_i^T\\ \tilde X_i & 0\end{bmatrix}$ ($i = 1,2$) are two tangent vectors at $\{IK_k(n)\}$, then the Riemannian metric is nothing but
$$\langle X_1,X_2\rangle_{\{IK_k(n)\}} = g_I(X_1,X_2) = -\tfrac12\mathrm{tr}(X_1X_2) = \mathrm{tr}(\tilde X_1^T\tilde X_2), \qquad (6.14)$$
where $g$ is the standard Riemannian metric of $SO(n)$ (6.6). We denote the corresponding norm on $g_k(n)$ by $\|\cdot\|$, as before. A consequence of the above relation is that the geodesic in $G_k(n)$ along $X\in g_k(n)$ is nothing but the equivalence class $\{e^{tX}K_k(n)\}$, where $e^A$ is the standard matrix exponential of the matrix $A\in\mathbb{R}^{n\times n}$. Let $\tilde X = U_{(n-k)\times(n-k)}\begin{bmatrix}\Sigma_{k\times k}\\ 0\end{bmatrix}V^T_{k\times k}$ be a Singular Value Decomposition (SVD) of $\tilde X_{(n-k)\times k}$. We can write $U = [U_1,U_2]$, where $U_1$ is $(n-k)\times k$ and $U_2$ is $(n-k)\times(n-2k)$. Then $X\in g_k(n)$ can be written as:
$$X = \begin{bmatrix} 0 & -\tilde X^T\\ \tilde X & 0\end{bmatrix} = \begin{bmatrix} V & 0 & 0\\ 0 & U_1 & U_2\end{bmatrix}\begin{bmatrix} 0 & -\Sigma & 0\\ \Sigma & 0 & 0\\ 0 & 0 & 0\end{bmatrix}\begin{bmatrix} V & 0 & 0\\ 0 & U_1 & U_2\end{bmatrix}^T. \qquad (6.15)$$
Assume that $\Sigma = \mathrm{diag}(\sigma_1,\ldots,\sigma_k)$, where $\sigma_1\ge\cdots\ge\sigma_k\ge 0$. Here, $\mathrm{diag}(a)$ is a diagonal matrix whose diagonal is the vector $a$. Note that $\|X\| = (\sum\sigma_i^2)^{\frac12}$. Now, $e^{tX}$ can be computed easily as
$$e^{tX} = \begin{bmatrix} V & 0\\ 0 & U\end{bmatrix}\begin{bmatrix} C(t\sigma_1,\ldots,t\sigma_k) & -S(t\sigma_1,\ldots,t\sigma_k) & 0\\ S(t\sigma_1,\ldots,t\sigma_k) & C(t\sigma_1,\ldots,t\sigma_k) & 0\\ 0 & 0 & I_{(n-2k)\times(n-2k)}\end{bmatrix}\begin{bmatrix} V & 0\\ 0 & U\end{bmatrix}^T, \qquad (6.16)$$
where
$$C(\theta_1,\ldots,\theta_k) = \mathrm{diag}(\cos\theta_1,\ldots,\cos\theta_k) \quad\text{and}\quad S(\theta_1,\ldots,\theta_k) = \mathrm{diag}(\sin\theta_1,\ldots,\sin\theta_k). \qquad (6.17)$$
The decomposition in (6.16) is known as the CS decomposition of the orthogonal matrix $e^{tX}$ [31, 26].
An element s ∈ G_k(n) can be represented by a subspace that is a member of {e^{tX}K_k(n)} as

S(tX) = [V 0; 0 U] [C(tσ₁,...,tσ_k); S(tσ₁,...,tσ_k); 0_{(n−2k)×k}]. (6.18)

We call this the standard subspace representation of s = {e^{tX}K_k(n)} ∈ G_k(n). For any v ∈ SO(k), S(tX)v represents the same subspace, but it is not in the standard form anymore. Note that S(tX) is found from (6.16) by dropping the rightmost factor and two block columns of the matrix in the middle. Also note that the geodesic along −X (or Xᵀ) is given by {(e^{tX})ᵀK_k(n)}, which can be found simply by a transposition operation. Therefore, the subspace S⁻ which corresponds to the reflection of S with respect to {IK_k(n)} is simply

S⁻(tX) = S(−tX) = [V 0; 0 U] [C(tσ₁,...,tσ_k); −S(tσ₁,...,tσ_k); 0_{(n−2k)×k}]. (6.19)

Note that S⁻ can be found by replacing tσᵢ with −tσᵢ. The Riemannian distance between two elements of G_k(n) can be found easily from their subspace representations (standard or non-standard): if s₁,s₂ ∈ G_k(n) have subspace representations S₁ and S₂, respectively, let S₁ᵀS₂ = UCVᵀ be an SVD of S₁ᵀS₂; then C = diag(cos θ₁,...,cos θ_k), where 0 ≤ θ₁,...,θ_k ≤ π/2 are the principal angles between s₁ and s₂, and the Riemannian distance between s₁ and s₂ is

d(s₁,s₂) = (Σ_{i=1}^{k} θᵢ²)^{1/2}. (6.20)

Theorem 6.2.3. We have the following:

1. The injectivity radius of G_k(n) in the standard Riemannian metric is inj G_k(n) = π/2.

2. Closed geodesics in G_k(n) are of length √(l₁² + ... + l_k²) π, where l₁,...,l_k are nonnegative integers with no common factor other than 1. A shortest closed geodesic is of length π and a longest is of length √k π. Hence, the distance between two farthest points in G_k(n) (or equivalently the Riemannian diameter of G_k(n)) is (√k/2) π.

3. Let R : g_k(n)×g_k(n)×g_k(n) → g_k(n) denote the curvature tensor of G_k(n) at {IK_k(n)}. We have

R(X,Y)Z = −[[X,Y],Z], X,Y,Z ∈ g_k(n), (6.21)

where [·,·] is the matrix Lie bracket. Therefore, the sectional curvature along two orthonormal tangent vectors X,Y ∈ g_k(n) is −tr([[X,Y],Y]X) = ‖[X,Y]‖², where ‖·‖ is the standard norm in so(n). From this, one can conclude that the sectional curvature of G_k(n) satisfies 0 ≤ K ≤ 2.

Remark 6.2.2. In Theorem 5.3.2, a condition on the finiteness of the number of possible values for the lengths of closed geodesics is mentioned. By Theorems 6.2.2 and 6.2.3, this condition is satisfied in SO(n) and G_k(n).

6.2.2.1 Isometric Identification of Gk(n) with the Space of Orthogonal Projection Matrices of Rank k

It is obvious that to any k-dimensional subspace of ℝⁿ we can associate, in a unique way, an n×n orthogonal projection matrix. Such a matrix has the form SSᵀ, where S ∈ ℝ^{n×k} is a matrix comprised of an orthonormal basis of that subspace. Let us denote the linear space of n×n symmetric matrices by S_n and its subset of orthogonal projection matrices of rank k by P_{k,n}. P_{k,n} can be identified with G_k(n) as Riemannian manifolds via an isometric map [89]. This map simply maps a tangent vector at {IK_k(n)}, which can be represented as X = [0 −X̃ᵀ; X̃ 0], to the tangent vector X̂ = [0 X̃ᵀ; X̃ 0] in the tangent space to P_{k,n} at I_k = [I_{k×k} 0; 0 0]. This can be easily extended to other points in G_k(n) via rotating a subspace in G_k(n) and a congruence transformation in P_{k,n}. The actual formula for the geodesics in this space can be found, e.g., in [41].
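As an aside, the distance formula (6.20) is straightforward to evaluate numerically; the following MATLAB lines are a minimal sketch (ours), assuming S1 and S2 are n×k matrices with orthonormal columns.

% Sketch: Riemannian distance (6.20) on G_k(n) via principal angles.
n = 10; k = 3;
[S1,~] = qr(randn(n,k),0); [S2,~] = qr(randn(n,k),0);  % orthonormal bases
c = min(svd(S1'*S2),1);        % cosines of principal angles, clamped for rounding
d = norm(acos(c));             % d(s1,s2) = (sum_i theta_i^2)^(1/2)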
6.3 Existence and Uniqueness of Lp and MI Means in Riemannian Balls

The following is a direct consequence of what preceded in Section 6.2:

Theorem 6.3.1. Theorems 3.2.1, 3.2.2, 5.4.1 and 5.4.4 hold true in M = SO(n) equipped with the standard Riemannian metric, in which case we have Δ = ½ and inj M = π, and hence (see (3.5))

ρ_{Δ,p} = π/4 for 1 ≤ p < 2, and ρ_{Δ,p} = π/2 (= r_cx) for 2 ≤ p ≤ ∞. (6.22)

The same theorems hold for M = G_k(n) (2 ≤ k ≤ [n/2]), where we have Δ = 2 and inj M = π/2, and accordingly

ρ_{Δ,p} = π/8 for 1 ≤ p < 2, and ρ_{Δ,p} = π/4 (= r_cx) for 2 ≤ p ≤ ∞. (6.23)

The main shortcoming of the above theorem is that the balls of radius ρ < ρ_{Δ,p} become increasingly small compared with the diameter of SO(n) or G_k(n) as n increases (see Theorems 6.2.2 and 6.2.3). Note that ρ_{Δ,p} remains constant as n increases, while the diameters of SO(n) and G_k(n) grow with n. The crucial observation made in [36] is that one can find larger non-ball strongly convex domains in a compact Lie group in which the local center of mass is unique. Such a domain is easily described in terms of the matrix 2-norm in the Lie algebra so(n) (rather than the norm (6.6) related to the standard Riemannian metric). We shall elaborate on this issue next.

6.4 Large Non-ball Convex Domains in SO(n) and Gk(n) and a Finsler Structure

For our purposes a Finsler manifold is defined as follows [72]:

Definition 6.4.1. A Finsler metric F on a smooth manifold M assigns a norm F(x,·) to each tangent space T_xM in a smooth manner as x varies. The pair (M,F) is called a Finsler manifold.

Remark 6.4.1. A more restricted definition requires a convexity condition on the norm, i.e., the norm must be a Minkowski norm. The main consequence of this restriction is that the geodesics are well behaved in the sense of uniqueness and smoothness. An example clarifies the issue [4]. In ℝ² the standard Euclidean norm has only the straight lines as its length minimizing curves between two points. However, the L^∞ norm, defined by ‖[x y]ᵀ‖_∞ = max{|x|,|y|}, in addition to straight lines has some other, merely piecewise differentiable, length minimizing curves too. The same holds for the L¹ norm defined as ‖[x y]ᵀ‖₁ = |x| + |y|. Neither of these two norms is a Minkowski norm. Also note that in these two norms the unit sphere is not smooth, in contrast with the unit spheres of the L^p norms with 1 < p < ∞. Similarly, for n ≥ 4 the unit sphere of the matrix 2-norm in so(n) is not strictly convex; therefore, we expect to have length minimizing curves with corners in SO(n) (n ≥ 4) as a Finsler manifold. In fact, consider this curve in so(4):

γ(t) = tY + tU for 0 ≤ t ≤ ½, and γ(t) = tY + (1−t)U for ½ ≤ t ≤ 1, (6.27)

where Y and U are defined as before. Note that ‖γ̇(t)‖₂ = ‖Y‖₂. The piecewise linear curve γ(t) connects the origin to Y and its length is ‖Y‖₂, so it is a length minimizing curve. Obviously, there is also the straight line segment β(t) = tY with 0 ≤ t ≤ 1 which does the same, i.e., its length is equal to the distance between Y and the origin. Theorem 6.4.1 tells us that the curve e^{tY} is a length minimizing curve in SO(4) connecting I to e^Y; and since the curve α(t) = e^{γ(t)} also has the same 2-norm length, it too is a length minimizing curve in SO(4). Therefore, in (SO(4),‖·‖₂) the length minimizing curves are not unique, even locally, which is in contrast with the Riemannian case, where the length minimizing curves are locally unique and they are the geodesics.
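The definitions of Y and U precede this passage; a natural concrete choice, which the following MATLAB sketch (ours) assumes, is a pair of commuting unit-2-norm generators of rotations in the (1,2)- and (3,4)-planes.

% Sketch: in so(4), the segment between Y+U and Y-U lies on the unit
% sphere of the 2-norm, so that sphere is not strictly convex.
J = [0 -1; 1 0];
Y = blkdiag(J, zeros(2));    % rotation generator in the (1,2)-plane
U = blkdiag(zeros(2), J);    % rotation generator in the (3,4)-plane
[norm(Y+U), norm(Y-U), norm(Y)]   % all equal 1 (norm(A) is the matrix 2-norm)

In particular, with this choice both pieces of the curve γ in (6.27) have speed ‖γ̇(t)‖₂ = ‖Y‖₂ = 1, as claimed.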
Remark 6.4.3. The 2-norm is not an admissible norm with respect to the Lie bracket operation. A norm ‖·‖ on a Lie algebra g is admissible if ‖[A,B]‖ ≤ ‖A‖‖B‖ for A,B ∈ g [60]. As we mentioned before, ‖·‖_F is an admissible norm on so(n). Any induced norm, like the 2-norm, can be converted to an admissible norm via scaling. For example, if we define ‖A‖₂′ = 2‖A‖₂, then from the triangle inequality and since ‖AB‖₂ ≤ ‖A‖₂‖B‖₂ we have ‖[A,B]‖₂′ ≤ ‖A‖₂′‖B‖₂′ for all A,B ∈ so(n). However, the unit ball of ‖·‖₂′ is contained in the unit ball of ‖·‖₂; in this sense we have shrunk the unit ball. Some of our results here are essentially derived in [36] in the context of a general compact Lie group and with an admissible norm. Applying those results to SO(n) with the ‖·‖₂′ norm results in too conservative estimates of the radius of the largest convex balls in SO(n).

The following theorem shows that curves of the form e^{tX} and their translates minimize the 2-norm Finsler distance in addition to the Riemannian distance. Therefore, the Finsler structure is quite simple. The proof of the theorem can be found in [16, Chapter 7] (cf. [36]):

Theorem 6.4.1. Let X,Y ∈ so(n) with ‖X‖₂ + ‖Y‖₂ < π. Then there exists a unique H ∈ so(n) with ‖H‖₂ < π such that h = e^H = e^X e^Y. Moreover, ‖H‖₂ ≤ ‖X‖₂ + ‖Y‖₂. Therefore, the one-parameter subgroups of SO(n) and their translates locally minimize the 2-norm Finsler distance. Here, locally means on 2-norm lengths not larger than π.

The following corollaries are immediate:

Corollary 6.4.1. If X,Y are as in Theorem 6.4.1 and x = e^X and y = e^Y, then

d₂(x,y) = ‖log xᵀy‖₂ ≤ ‖X‖₂ + ‖Y‖₂. (6.28)

Corollary 6.4.2. The tangent space g_k(n) can be endowed with the Finsler 2-norm induced from so(n). Hence, G_k(n) will be a Finsler manifold in which horizontal geodesics and their translates locally minimize the 2-norm Finsler distance. Here, locally means on 2-norm lengths not larger than π/2. We denote the 2-norm distance between x,y ∈ G_k(n) by d₂(x,y).
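The bound in Theorem 6.4.1 and Corollary 6.4.1 is easy to probe numerically; the following MATLAB lines are a minimal sketch (ours).

% Sketch: checking ||H||_2 <= ||X||_2 + ||Y||_2 where e^H = e^X e^Y.
n = 5;
skew = @(A) (A - A')/2;
X = skew(randn(n)); X = X/norm(X);        % ||X||_2 = 1
Y = skew(randn(n)); Y = Y/norm(Y);        % so ||X||_2 + ||Y||_2 = 2 < pi
H = skew(real(logm(expm(X)*expm(Y))));    % the H of Theorem 6.4.1
norm(H) <= norm(X) + norm(Y)              % returns logical 1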
6.4.1.1 Metric Decreasing Property of the Matrix Exponential in SO(n)

The exponential map in a Riemannian manifold of nonnegative curvature has a metric decreasing property. This means that the distance between two points in the tangent space is not smaller than the distance between their images under the exponential map on the manifold. One can show that on SO(n) this holds for any bi-invariant Finsler distance. This fact was observed in [36] for admissible bi-invariant norms (see also [16, Chapter 7]). Here, we give a rather different proof, in which we do not use the admissibility assumption. First, we give a short preparation about the derivative of the matrix exponential (see [45] or [38]). Let f(t) = e^{A(t)}, where A : [0,1] → so(n) is a curve in so(n). Then the derivative of f(t) is computed as

d/dt f(t) = dexp_{A(t)}(Ȧ(t)) f(t), (6.29)

where Ȧ(t) = dA(t)/dt, and dexp_X : so(n) → so(n) is defined as

dexp_X(Y) = ((e^a − 1)/a)|_{a=ad_X}(Y) (6.30)

for any Y ∈ so(n). Here, by ((e^a − 1)/a)|_{a=ad_X}(Y) we mean that in the power series expansion of a ↦ (e^a − 1)/a, the variable a ∈ ℂ is replaced by ad_X and the resulting operator is applied to Y (see (6.2)).

Theorem 6.4.2. Let ‖·‖ be any bi-invariant norm on so(n) and d its induced distance function on SO(n). (Here, d can be either the Riemannian distance function or the 2-norm Finsler distance.) We have

d(e^X, e^Y) ≤ ‖X − Y‖, ∀X,Y ∈ so(n). (6.31)

Proof. Consider the function f : [0,1]×[0,1] → SO(n) with f(t,s) = e^{t(X+s(Y−X))}. Note that f(1,0) = e^X and f(1,1) = e^Y, i.e., s ↦ f(1,s) is a curve connecting e^X to e^Y; therefore

d(e^X, e^Y) ≤ ∫₀¹ ‖∂f/∂s‖ ds. (6.32)

But from (6.29) we have

∂f/∂s = dexp_{t(X+s(Y−X))}(t(Y−X)) f(t,s). (6.33)

From the bi-invariance of ‖·‖ we have ‖∂f/∂s‖ = ‖dexp_{t(X+s(Y−X))}(t(Y−X))‖. Now, we calculate ∂g/∂t where g(t,s) = dexp_{t(X+s(Y−X))}(t(Y−X)). Note that

g(t,s) = Σ_{i=1}^{∞} (tⁱ/i!) ad^{i−1}_{X+s(Y−X)}(Y−X). (6.34)

Hence

∂g/∂t = Σ_{i=0}^{∞} (tⁱ/i!) ad^{i}_{X+s(Y−X)}(Y−X) = e^{ad_{t(X+s(Y−X))}}(Y−X) = e^{t(X+s(Y−X))}(Y−X)e^{−t(X+s(Y−X))}. (6.35)

Therefore ‖∂g/∂t‖ = ‖Y−X‖; since g(0,s) = 0, this gives ‖∂f/∂s‖ = ‖g(1,s)‖ ≤ ‖Y−X‖, and (6.32) immediately implies that d(e^X,e^Y) ≤ ‖Y−X‖.

6.4.2 Hessian of the Distance Function in Symmetric Spaces

Both SO(n) and G_k(n) are symmetric spaces. One consequence is that the eigenvalues of the curvature operator do not change along a geodesic, and the Jacobi equation becomes a constant coefficient equation which can be solved explicitly. In the sequel assume M is a symmetric Riemannian manifold of dimension m. Let c : [0,d(q,x)] → M be a unit speed geodesic connecting q to x. The curvature operator R_{c(t)} : T_{c(t)}M → T_{c(t)}M defined by

X ↦ R(X, ċ(t))ċ(t) (6.36)

is a self-adjoint operator which has a set of m real eigenvalues λ₁(t) ≥ ... ≥ λ_m(t). Here, ċ(t) is the velocity vector of c(t). Due to the symmetry of M the eigenvalues remain constant in time, and due to both the symmetry and compactness of M they are all nonnegative; therefore λᵢ(t) = λᵢ ≥ 0 [46]. Denote a set of orthonormal eigenvectors of R_{c(0)} by {Vᵢ(0)}ᵢ₌₁^m. For every i, by parallel translating the vector Vᵢ(0) along c(t) we obtain a parallel vector field Vᵢ(t) which for each t is an eigenvector of R_{c(t)} associated with λᵢ. Hence, {Vᵢ(t)}ᵢ₌₁^m is an orthonormal frame along c(t). Any Jacobi field J(t) along c(t) with initial condition J(0) = 0 can be written as J(t) = Σᵢ₌₁^m AᵢJᵢ(t), where Jᵢ(t) satisfies J̈ᵢ(t) + λᵢJᵢ(t) = 0 and ‖J̇ᵢ(0)‖ = 1. We can solve explicitly for Jᵢ as Jᵢ(t) = sn_{λᵢ}(t)Vᵢ(t), where sn_λ(t) is defined in (2.6). Now, recalling formula (2.4) for the Hessian of x ↦ d(q,x), we have Hess(d(q,x))(Jᵢ(t),Jⱼ(t)) = ⟨J̇ᵢ^⊥(t), Jⱼ(t)⟩δᵢⱼ, where ⊥ denotes the component normal to ċ(t). Note that from this we have Hess(d(q,x))(J_m(t),J_m(t)) = 0, since J_m(t) has no normal component along c(t). From that formula we have

Hess(d(q,x))(Vᵢ(d(q,x)), Vⱼ(d(q,x))) = 0 if i or j = m, and = (cs_{λᵢ}(d(q,x))/sn_{λᵢ}(d(q,x))) δᵢⱼ for 1 ≤ i,j < m. (6.37)

Note that since λᵢ ≥ 0 we have cs_{λᵢ}(d(q,x))/sn_{λᵢ}(d(q,x)) = √λᵢ cot(√λᵢ d(q,x)), except when λᵢ = 0, in which case the expression is equal to 1/d(q,x). It is obvious that the largest eigenvalue of the curvature operator, i.e., λ₁, is what controls the sign of the Hessian. (On the unit sphere, for instance, where λᵢ = 1 for i < m, the Hessian is cot(d(q,x)) times the identity on the normal directions, hence positive definite exactly for d(q,x) < π/2.) For SO(n) and G_k(n) it is possible to find the λᵢ's more explicitly and identify the conditions which guarantee that the Hessian of x ↦ d(q,x) is positive definite along non-radial directions.
6.4.2.1 Eigenvalues of the Curvature Operator in SO(n) and Gk(n)

There are different ways to derive the results we are going to present. One short way, which we shall follow, is to use the notion of the reduced commutator introduced in [13]. Let {Γᵢ}ᵢ₌₁^m, where m = n(n−1)/2, be an orthonormal basis for so(n). Any Y ∈ so(n) can be written as Y = Σᵢ₌₁^m yᵢΓᵢ. Denote the vector [y₁ ... y_m]ᵀ ∈ ℝ^m by φ(Y), and hence identify so(n) (as a vector space) with ℝ^m. The reduced commutator C_{ad_X} : ℝ^m → ℝ^m is the linear operator defined via the identity

φ([X,Y]) = C_{ad_X} φ(Y). (6.38)

Note that C_{ad_X} is a matrix representing the adjoint operator ad_X. We recall the following proposition from [13]:

Proposition 6.4.1. Let n = 2n′ be an even number and let ±iω₁,...,±iω_{n′} (i = √−1) be the eigenvalues of X ∈ so(n). Then the eigenvalues of C_{ad_X} are i(±ωᵢ ± ωⱼ) (1 ≤ i < j ≤ n′) as well as zero with multiplicity n′. If n = 2n′+1 and ±iω₁,...,±iω_{n′}, 0 are the eigenvalues of X, then the eigenvalues of C_{ad_X} are ±iωᵢ (1 ≤ i ≤ n′) and i(±ωᵢ ± ωⱼ) (1 ≤ i < j ≤ n′), as well as zero with multiplicity n′.

Now, back to our application, we want to find the eigenvalues of the curvature operator for SO(n) and G_k(n). From Theorems 6.2.1 and 6.2.3 we see that the curvature operator at I along X is given by Y ↦ −¼ad²_X(Y) for SO(n) and by Y ↦ −ad²_X(Y) for G_k(n), with the extra condition ‖X‖ = 1. The above proposition enables us to find the eigenvalues of the curvature operator. In particular, we can write

λ₁(−ad²_X) = (|ω₁(X)| + |ω₂(X)|)², (6.39)

where ω₁(A) and ω₂(A) denote the largest and second largest eigenvalues in absolute value of the skew-symmetric (or symmetric) operator A.

6.4.3 2-norm Finsler Balls in SO(n)

The following theorem characterizes strongly convex 2-norm (Finsler) balls in SO(n):

Theorem 6.4.3. For o ∈ SO(n) define the 2-norm (Finsler) ball

B₂(o,ρ) = { oe^X | X ∈ so(n), ‖X‖₂ < ρ }. (6.40)

Then B₂(o,ρ) is a (Riemannian) strongly convex set in SO(n) for every ρ < π/2. Moreover, the Riemannian distance function x ↦ d(o,x) is strictly convex inside B₂(o,ρ) along non-radial directions, and it is only convex along radial directions.

Proof. We partially follow [36] to prove the first part. Without loss of generality we can assume that o coincides with I, the identity of SO(n). Let X,Y ∈ so(n) with ‖X‖₂,‖Y‖₂ < ρ. Set x = e^X and y = e^Y. We have x,y ∈ B₂(I,ρ), and since d₂(x,y) < π there is a unique minimizing geodesic connecting x to y. Let z be the unique midpoint of this minimizing geodesic. Note that z⁻¹x and z⁻¹y have I as their midpoint. Therefore, by inversion we have (z⁻¹x)⁻¹ = z⁻¹y and z² = zxz⁻¹y. Write x̄ = zxz⁻¹. Note that d₂(I,x̄) = d₂(I,x) < ρ. Therefore, there exists X̄ ∈ so(n) with ‖X̄‖₂ < ρ such that x̄ = e^{X̄}, and we have z² = e^{X̄}e^Y. From this and Theorem 6.4.1, due to the bi-invariance of d₂, we have d₂(I,z²) ≤ ‖X̄‖₂ + ‖Y‖₂ < 2ρ < π, which implies that there exists Z′ ∈ so(n) with ‖Z′‖₂ < 2ρ < π such that z² = e^{Z′}. Therefore, z = e^Z with Z = Z′/2 and ‖Z‖₂ < ρ, or equivalently d₂(I,z) < ρ. This means that the midpoint of any minimizing geodesic connecting two points in B₂(I,ρ) belongs to B₂(I,ρ), which proves the strong convexity of B₂(o,ρ).

For the second statement, let X ∈ so(n) be such that ‖X‖₂ < ρ and x = e^X. In order to use (6.37) we normalize X as X̄ = X/‖X‖ = X/d(I,x) (note that ‖X‖ is the Riemannian norm in (6.6)). If ω₁ (with |ω₁| = ‖X‖₂) and ω₂ are the two largest eigenvalues (in absolute value) of X, then by (6.39) the largest eigenvalue of −ad²_{X̄} is equal to (|ω₁|+|ω₂|)²/d²(I,x). Hence, the largest eigenvalue of the curvature operator along X̄ is λ₁ = ¼(|ω₁|+|ω₂|)²/d²(I,x), and since |ω₂| ≤ |ω₁| = ‖X‖₂ we have λ₁ ≤ ‖X‖₂²/d²(I,x). Next, from (6.37) we see that Hess d(I,x) will be positive definite along non-radial directions if √λ₁ d(I,x) < π/2, and since √λ₁ d(I,x) ≤ ‖X‖₂ < ρ, this holds if ρ ≤ π/2. Along a radial direction the Hessian is zero.
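The proof above hinges on (6.39). As a sanity check, the following MATLAB sketch (ours) builds the matrix of ad_X in an orthonormal basis of so(n) and compares the largest eigenvalue of −ad²_X with (|ω₁|+|ω₂|)².

% Sketch: numerical check of (6.39) via the reduced commutator.
n = 5;
X = randn(n); X = (X - X')/2;
m = n*(n-1)/2; B = zeros(n,n,m); A = zeros(m); l = 0;
for i = 1:n
  for j = i+1:n
    l = l + 1; E = zeros(n); E(i,j) = 1/sqrt(2); E(j,i) = -1/sqrt(2);
    B(:,:,l) = E;                                  % orthonormal basis of so(n)
  end
end
phi = @(Y) squeeze(sum(sum(B.*Y,1),2));            % coordinates of Y in the basis
for l = 1:m, A(:,l) = phi(X*B(:,:,l) - B(:,:,l)*X); end   % the matrix C_adX
w = sort(imag(eig(X)),'descend');                  % w(1) = omega_1, w(2) = omega_2
[max(real(eig(-A*A))), (w(1)+w(2))^2]              % agree up to rounding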
6.4.4 2-norm Finsler Balls in Gk(n)

The next theorem is the counterpart of Theorem 6.4.3 in G_k(n):

Theorem 6.4.4. For o ∈ SO(n) define the 2-norm (Finsler) ball

B₂({oK_k(n)},ρ) = { {oe^{X}K_k(n)} | X ∈ g_k(n), ‖X‖₂ < ρ }. (6.41)

Then B₂(o,ρ) is a (Riemannian) strongly convex set in G_k(n) for every ρ < π/4. Moreover, the Riemannian distance function x ↦ d(o,x) is strictly convex inside B₂(o,ρ) along non-radial directions, and it is only convex along radial directions.

Proof. Both parts follow with a very minor modification from the proof of Theorem 3.1 in [47]. The second part can also be shown as in the previous theorem, by noting that here λ₁ < 4ρ²/d²({IK_k(n)},x).

Remark 6.4.4. These Finsler balls should not look esoteric. Recall that one can associate [n/2] angles, called angles of rotation, with any matrix in SO(n). B₂(I,ρ) is the set of all elements of SO(n) whose maximum angle of rotation (in absolute value) is less than ρ. A similar interpretation is true for G_k(n), with principal angles instead. However, here we need to allow for both positive and negative principal angles. This means that for any s ∈ B₂({IK_k(n)},ρ), its reflection with respect to {IK_k(n)} is also in B₂({IK_k(n)},ρ).

[Figure 6.1: In so(n), the 2-norm Finsler ball B₂(0,ρ) = {X ∈ so(n) | ‖X‖₂ < ρ} of radius ρ ≤ π is larger than the Riemannian ball B(0,ρ) = {X ∈ so(n) | ‖X‖ = (1/√2)‖X‖_F < ρ}; we have B(0,ρ) ⊂ B₂(0,ρ) ⊂ so(n).]

Figure 6.1 shows both B(0,ρ) and B₂(0,ρ) in so(n), which transform to B(I,ρ) and B₂(I,ρ) in SO(n) via the matrix exponential. The meaning of B₂(I,ρ) being larger than B(I,ρ) is clear in the picture. The relation between B(I,ρ) and B₂(I,ρ) in SO(n) is very similar to the relation between the unit circle and the unit square in the plane.

6.5 Existence and Uniqueness of the Local Riemannian Lp Mean in Finsler Balls

The following theorem gives existence and uniqueness of the local Lp mean in Finsler balls. Recall that the local Lp mean in a set B is defined by solving min_{x∈B} f_p(x) (see (3.2) and Section 3.1.1).

Theorem 6.5.1. Let {xᵢ}ᵢ₌₁^N be a set of data points in M = SO(n) which lies in a Finsler ball B₂(o,ρ) with ρ < π/4. The local Lp Riemannian mean (1 < p ≤ ∞) with respect to the weight vector w for this set exists, is unique, and lies inside B₂(o,ρ). For p = 1, if the points do not lie on a geodesic, the same again holds; if the points do lie on a geodesic, the mean might not be unique. The same holds for M = G_k(n) with ρ < π/8.

Proof. We briefly give one proof for both SO(n) and G_k(n). First, note that the local mean exists. Next, from Theorems 6.4.3 and 6.4.4 and Corollary 2.4.1 we conclude that for any two points q,x ∈ B₂(o,ρ), x ↦ d^p(q,x) (p > 1) is a strictly convex function; so is x ↦ f_p(x). For p = 1, as before, we have strict convexity of f₁(x) if not all the data points lie on a geodesic. For p = ∞, we may assume that the L^∞ mean is the minimizer of x ↦ f̄(x) = maxᵢ d²(xᵢ,x), which is again strictly convex. (We borrow this trick from [76, p. 168], where it is used to avoid the unpleasant dealing with the non-strict convexity of x ↦ d(xᵢ,x).) Therefore, except for the possible degenerate case when p = 1, if the local Lp mean does not lie on the boundary of B₂(o,ρ), then it is unique. To see that the mean cannot lie on the boundary of B₂(o,ρ), first recall the properties of strongly convex sets from Theorem 2.3.2. Let x be a point on the boundary of B₂(o,ρ). Then the gradient of f_p (1 ≤ p < ∞) at x is nonzero, and moving from x into B₂(o,ρ) decreases f_p; hence x cannot be a minimizer, and the local Lp mean lies inside B₂(o,ρ).
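Although we do not dwell on computational methods for the Lp mean here, a basic intrinsic gradient descent for the L² mean is easy to state. The following MATLAB sketch (ours, with equal weights and illustrative parameters) is not the dissertation's algorithm, and its convergence is a separate matter (cf. [61]).

% Sketch: gradient descent for the Riemannian L2 mean of points in SO(n),
% generated here inside a small ball so that the local mean is unique.
n = 4; N = 5;
[q,~] = qr(randn(n)); if det(q) < 0, q(:,1) = -q(:,1); end
xs = cell(1,N);
for i = 1:N
  W = randn(n); W = 0.2*(W - W')/2; xs{i} = q*expm(W);   % cluster around q
end
mu = xs{1};
for it = 1:50
  G = zeros(n);
  for i = 1:N
    L = real(logm(mu'*xs{i})); G = G + (L - L')/2;       % sum of log-maps at mu
  end
  mu = mu*expm(G/N);                                     % step along the mean tangent
end
% At a fixed point the sum of log-maps vanishes, which characterizes the L2 mean.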
6.7.1 Efficient Geodesic Interpolation and Extrapolation in SO(n)

Dyadic geodesic interpolation

1. Let M = SO(n) or M = G_k(n). Consider x₀,x₁ ∈ M. Set i = 1 and fix an integer K ≥ 1.
2. Level i: Given the points generated so far, insert between every two adjacent points their geodesic midpoint, found from item 1 of Theorem 6.7.1 or item 1 of Theorem 6.7.2.
3. Increment i. If i > K stop; otherwise goto step 2.

Dyadic geodesic extrapolation

1. Let M = SO(n) or M = G_k(n). Consider x₀,x₁ ∈ M. Set i = 1 and fix an integer J ≥ 1.
2. Level i: Given points {x₀,...,x_{i−1},xᵢ}, find x_{i+1} = Ext₁(xᵢ,x_{i−1}) from item 2 of Theorem 6.7.1 or item 2 of Theorem 6.7.2.
3. Increment i. If i > J stop; otherwise goto step 2.

Table 6.1: Pseudo-code for dyadic geodesic interpolation and extrapolation.

A direct midpoint calculation of the form μ_{1/2}(x₁,x₂) = x₁e^{½ log x₁ᵀx₂}, compared with the SVD-based midpoint calculation in item 1 of Theorem 6.7.1, has much higher complexity. Finding the matrix logarithm carries the heaviest load, although the matrix exponential is also computationally demanding. Moreover, in the numerical calculation of log x₁ᵀx₂ there is no guarantee that the result is skew-symmetric to machine precision, and one might need to use special algorithms for this purpose [23]. Without going into the details of the complexity of the related numerical algorithms, we give an example to compare the direct and dyadic interpolation methods.

Example 6.7.1. We perform a comparison between the computation times of the direct and the dyadic geodesic construction techniques in MATLAB. Table 6.2 shows the main parts of the MATLAB code used.

[U,S,V] = svd(x1+x2); % Level 1
M11 = U*V';
[U,S,V] = svd(M11+x1);
M21 = U*V';
[U,S,V] = svd(M11+x2);
M22 = U*V';

N = 2^K; % K = level
X = logm(x1'*x2);
M = x1;
E = expm(1/N*X);
for i = 1:N-1
    M = M * E; % M = x1 e^{(i/N) log(x1' x2)}
end

Table 6.2: Top: Main part of a MATLAB code to compute the geodesic between x₁,x₂ ∈ SO(n) at the two levels K = 1 and K = 2 using dyadic interpolation (see Table 6.1). Bottom: Main part of a MATLAB code to compute the geodesic between x₁,x₂ ∈ SO(n) at level K by direct matrix logarithm-exponential calculations.

We try levels K = 1 and K = 2, i.e., 1 and 3 intermediate points. In the direct method, instead of finding log x₁ᵀx₂ once per computed point, we find E = e^{(1/2^K) log x₁ᵀx₂} once and reuse it in the subsequent calculations to make the method more efficient. Table 6.3 shows the medians of the computation times for the two methods. It is clear that the dyadic method is much faster. However, note that its efficiency advantage decreases with the level, i.e., with the number of generated points on the geodesic. As explained before, the efficiency of the algorithm mostly stems from avoiding the matrix logarithm; at the same time, the direct method needs to find the logarithm only once, so as K increases that advantage of the dyadic method shrinks. It is interesting to mention that the first step in some of the most powerful methods for computing the logarithm of a matrix A (including the method used in MATLAB) is the so-called "inverse scaling and squaring" [22], in which successive square roots of A, that is A^{1/2^i} (i = 1,2,...), are found. On the other hand, finding the geodesic midpoint involves finding e^{½ log x₁ᵀx₂}, which is nothing but the square root of x₁ᵀx₂. Therefore, it should not be surprising that the dyadic method is at least 10 times faster than the direct method at level K = 1.

Level K = 1:
n     dyadic   direct
3     0.0001   0.0020
30    0.0029   0.0410
100   0.0470   0.4586

Level K = 2:
n     dyadic   direct
3     0.0020   0.0020
30    0.0095   0.0596
100   0.1428   0.3963

Table 6.3: Comparison between the computation times of the two geodesic interpolation MATLAB codes in Table 6.2, for levels K = 1 and K = 2. Times are in seconds, and each is the median of 100 independent runs.
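The SVD-based midpoint in the first code of Table 6.2 can be checked against the direct formula; the following MATLAB lines are a minimal sketch (ours) of that comparison.

% Sketch: SVD-based midpoint versus x1 e^{(1/2) log(x1' x2)}.
n = 6;
[x1,~] = qr(randn(n)); if det(x1) < 0, x1(:,1) = -x1(:,1); end
W = randn(n); W = 0.4*(W - W')/2; x2 = x1*expm(W);      % nearby rotation
[U,~,V] = svd(x1 + x2); m1 = U*V';                      % polar/SVD midpoint
m2 = x1*expm(0.5*real(logm(x1'*x2)));                   % direct midpoint
norm(m1 - m2, 'fro')                                    % zero up to rounding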
6.7.2 Efficient Geodesic Interpolation and Extrapolation in Gk(n)

We mentioned the isometric identification of P_{k,n}, the space of n×n orthogonal projection matrices of rank k, with G_k(n) in Subsection 6.2.2.1. By the standard embedding of P_{k,n} in S_n, the space of n×n symmetric matrices, we can define the extrinsic mean of {xᵢ}ᵢ₌₁^N ⊂ P_{k,n} as Proj_{P_{k,n}}(μ_EC(x₁,...,x_N)), as before. Here, Proj_{P_{k,n}} : S_n → P_{k,n} is defined as

Proj_{P_{k,n}}(x) = argmin_{y∈P_{k,n}} ‖x − y‖²_F. (6.43)

The following lemma gives this projection operator explicitly:

Lemma 6.7.1. Let A ∈ ℝ^{n×n} be symmetric and positive semi-definite. Then we have Proj_{P_{k,n}}(A) = P_A = UUᵀ, where U_{n×k} is a matrix whose columns are the eigenvectors of A associated with its k largest eigenvalues. If the k largest eigenvalues of A are strictly larger than the rest, then P_A is unique.

Proof. We recall that any n×n orthogonal projection P of rank k can be associated with a k-dimensional subspace S of ℝⁿ, and P can be written as P = U_P U_Pᵀ, where U_P is an n×k matrix whose columns are an orthonormal basis of S. Now let A = UΛUᵀ be the full SVD or EVD of A, where Λ = diag(λ₁,...,λ_n) with λ₁ ≥ ... ≥ λ_n ≥ 0. We want to find a P, or a U_P, which minimizes

f(P) = ‖U_PU_Pᵀ − A‖²_F. (6.44)

Up to additive and multiplicative constants, f(P) is equal to −tr(U_PU_PᵀA). So we need to solve max_{U_P} tr(U_PU_PᵀA), which is equivalent to solving

max_{U_P} tr(UᵀU_P(UᵀU_P)ᵀΛ). (6.45)

Note that P̄ = UᵀU_P(UᵀU_P)ᵀ again is a rank-k orthogonal projection matrix. The above objective is equal to Σᵢ P̄ᵢᵢλᵢ, where P̄ᵢᵢ is the ith element on the diagonal of P̄. The diagonal entries of an orthogonal projection matrix lie in [0,1], and for a rank-k projection they sum to k. Therefore, the maximum value Σᵢ P̄ᵢᵢλᵢ can achieve is Σᵢ₌₁^k λᵢ. This can happen if P̄ = [I_{k×k} 0; 0 0], and by choosing U_P equal to the first k columns of U we achieve this particular P̄ and that maximum. This completes the proof.
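In MATLAB, Lemma 6.7.1 amounts to a symmetric eigendecomposition; the following minimal sketch (ours) computes Proj_{P_{k,n}}(A).

% Sketch: projection of a symmetric PSD matrix onto P_{k,n} (Lemma 6.7.1).
n = 8; k = 3;
Bm = randn(n); A = Bm*Bm';                  % symmetric positive semi-definite
[U,D] = eig((A + A')/2);
[~,ord] = sort(diag(D),'descend');
Uk = U(:,ord(1:k));                         % eigenvectors of the k largest eigenvalues
PA = Uk*Uk';                                % the projection P_A = U U'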
Next, we have the following theorem:

Theorem 6.7.2. Let S₁ and S₂ be subspace representations of s₁,s₂ ∈ G_k(n), respectively. Also let P₁ = S₁S₁ᵀ and P₂ = S₂S₂ᵀ denote the corresponding orthogonal projection matrices associated with s₁ and s₂. We have:

1. The extrinsic mean of s₁ and s₂ (or P₁ and P₂) can be expressed as μ_ex(P₁,P₂) = Proj_{P_{k,n}}(μ_EC(P₁,P₂)) = SSᵀ, where S is an n×k matrix of k eigenvectors associated with the k largest eigenvalues of μ_EC(P₁,P₂) = (P₁+P₂)/2. Moreover, SSᵀ is the orthogonal projection matrix associated with the midpoint of the minimizing geodesic between s₁ and s₂, and S is a subspace representative of that midpoint. The above projection gives a unique point if d₂(s₁,s₂) < π/2.

2. Denote by Ext₁(s₁,s₂) the reflection of s₂ with respect to s₁ along the minimizing geodesic connecting them. Then Ext₁(s₁,s₂) has the subspace representation (2P₁−I)S₂.

Proof. Without loss of generality we can assume S₁ = [I_{k×k}; 0_{(n−k)×k}] and S₂ = Q[C(θ₁,...,θ_k); S(θ₁,...,θ_k); 0_{(n−2k)×k}], where Q is of the form Q = [V 0; 0 U] (see (6.16)). Note that P₁ = [I_{k×k} 0; 0 0] and

P₂ = Q [C²(θ₁,...,θ_k) C(θ₁,...,θ_k)S(θ₁,...,θ_k) 0; C(θ₁,...,θ_k)S(θ₁,...,θ_k) S²(θ₁,...,θ_k) 0; 0 0 0] Qᵀ. (6.46)

Denote the matrix in the middle by P̄₂(θ₁,...,θ_k). Note that if P̄₂ = U_{P̄₂}ΛU_{P̄₂}ᵀ is an SVD or EVD of P̄₂, then P₂ = (QU_{P̄₂})Λ(QU_{P̄₂})ᵀ is an SVD of P₂. Moreover, the SVDs of μ_EC = (P₁+P₂)/2 and E = (P₁+P̄₂)/2 are related in the same way, due to the special forms of P₁ and Q. Now, we compute an SVD of E. Note that since E is symmetric and positive semi-definite, its EVD and SVD coincide. We rewrite E more explicitly as

E = ½ [diag(cos²θ₁+1,...,cos²θ_k+1) diag(cosθ₁ sinθ₁,...,cosθ_k sinθ_k) 0; diag(cosθ₁ sinθ₁,...,cosθ_k sinθ_k) diag(sin²θ₁,...,sin²θ_k) 0; 0 0 0]. (6.47)

Note that θᵢ appears in four entries of E, at locations (i,i), (i,i+k), (i+k,i) and (i+k,i+k). Those entries can be put in a 2×2 matrix Eᵢ:

Eᵢ = ½ [1+cos²θᵢ cosθᵢ sinθᵢ; cosθᵢ sinθᵢ sin²θᵢ]. (6.48)

It is easy to check that Eᵢ has λ_{i,1} = cos²(θᵢ/2) and λ_{i,2} = sin²(θᵢ/2) as eigenvalues. Note that since θᵢ < π/2, we have λ_{i,1} > λ_{i,2}. In fact, for the same reason we have cos²(θᵢ/2) > ½ and sin²(θᵢ/2) < ½, and hence λ_{i,1} > λ_{j,2} for all 1 ≤ i,j ≤ k. The corresponding eigenvectors can be found as v_{i,1} = [cos(θᵢ/2); sin(θᵢ/2)] and v_{i,2} = [−sin(θᵢ/2); cos(θᵢ/2)]. Now one can check that E has the 2k eigenvalues λ_{k,1} ≥ ... ≥ λ_{1,1} > λ_{1,2} ≥ ... ≥ λ_{k,2}, and the rest of its n−2k eigenvalues are zero. The corresponding eigenvector matrix can be constructed as U_E = [Ū_E, Ū_E^⊥], where Ū_E is of dimension n×2k and is defined as

Ū_E = [C(θ₁/2,...,θ_k/2) −S(θ₁/2,...,θ_k/2); S(θ₁/2,...,θ_k/2) C(θ₁/2,...,θ_k/2); 0 0], (6.49)

and Ū_E^⊥ is an n×(n−2k) matrix whose columns span the orthogonal complement of the subspace spanned by the columns of Ū_E. Also define the diagonal matrix Λ as

Λ = diag(cos²(θ₁/2),...,cos²(θ_k/2), sin²(θ₁/2),...,sin²(θ_k/2), 0,...,0), (6.50)

where the number of trailing zeros is n−2k. Note that E = U_EΛU_Eᵀ is an SVD (or EVD) of E. By Lemma 6.7.1, the projection of E onto the space of n×n orthogonal projection matrices of rank k is P_E = UUᵀ, where U is of dimension n×k and U = [C(θ₁/2,...,θ_k/2); S(θ₁/2,...,θ_k/2); 0]. This proves the first statement. To prove item 2, we recall the reflection formula (6.19). The geodesic reflection of S₂ with respect to S₁ is found by negating S(θ₁,...,θ_k) in S₂ = Q[C(θ₁,...,θ_k); S(θ₁,...,θ_k); 0_{(n−2k)×k}]. Note that

(2P₁−I_{n×n})S₂ = Q [C(θ₁,...,θ_k); −S(θ₁,...,θ_k); 0_{(n−2k)×k}]. (6.51)

Therefore, (2P₁−I_{n×n})S₂ spans the same subspace as S₂⁻ (the geodesic reflection of S₂ with respect to S₁).
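Theorem 6.7.2 makes the pairwise operations on G_k(n) essentially free of matrix exponentials and logarithms. The following MATLAB sketch (ours) computes a midpoint representative and a reflection representative directly; uniqueness of the midpoint requires d₂(s₁,s₂) < π/2, which holds generically for the random subspaces below.

% Sketch: midpoint and reflection on G_k(n) per Theorem 6.7.2.
n = 10; k = 3;
[S1,~] = qr(randn(n,k),0); [S2,~] = qr(randn(n,k),0);
P1 = S1*S1'; P2 = S2*S2';
[U,D] = eig((P1 + P2)/2);
[~,ord] = sort(diag(D),'descend');
Smid = U(:,ord(1:k));                 % subspace representative of the geodesic midpoint
Srefl = (2*P1 - eye(n))*S2;           % representative of Ext_1(s1,s2)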
6.8 Two Applications

There has been an increasing number of real-world applications of averaging or finding the mean for manifold-valued data. Of course, most applications are related to cases where the underlying manifold is a homogeneous space. Here, we consider two less explored applications where the ideas introduced in this chapter fit and serve very well. The first one is the extension of the famous Nelder-Mead algorithm to SO(n) for the maximization of a non-smooth function. The second application is the clustering of subspaces by an extension of the K-means algorithm to G_k(n). Our emphasis is on using the pairwise mean (instead of the Riemannian L² or the MI mean). Also, for finding the pairwise mean and other related calculations we use the efficient methods introduced before. (The author envisioned these two applications in the early stages of this research; however, he later found out that they had already been studied by other researchers. Papers [25, 63] and [37], and possibly a few others, address the two mentioned applications, respectively. Nevertheless, the treatment presented here was developed independently and differs from the ones in the mentioned references.)

6.8.1 Nelder-Mead Direct Search Algorithm in SO(n)

Direct search methods for optimization try to find a minimum of a function without the use of derivative information [55]. They are usually very useful in cases where the function is non-smooth, its derivative is expensive or impossible to compute, or the function is not even numerical in nature. A very popular direct search algorithm known as the Nelder-Mead simplex algorithm [70] can be easily extended from ℝⁿ to a manifold, and in fact it can be implemented efficiently in SO(n) or G_k(n) using the methods introduced before. Interestingly, the Nelder-Mead algorithm encompasses all the operations we have mentioned so far: finding the center of mass, geodesic midpoint assignment, and geodesic reflection of a point with respect to another one. We briefly describe the algorithm and give an example of its use for solving a non-differentiable maximization problem with applications in adaptive optics.

In Table 6.4 a pseudo-code for the Nelder-Mead algorithm adapted to a Riemannian manifold M is given. The adaptation of the original Euclidean version to M is straightforward, and is mainly achieved by the obvious use of the exponential map of M. The algorithm consists of very simple steps. Denote the dimension of M by m. The algorithm is given m+1 initial points considered as vertices of a simplex. Then, through a sequence of function evaluations, numerical comparisons, center of mass calculations and reflections, the initial simplex moves, shrinks or expands; typically, upon getting close to a minimum point the simplex starts to shrink so that it finally converges to that point. The stopping criterion in Step 3 compares the size of the current simplex with a given threshold. We should mention that, except for special cases, there is no proof of convergence for this algorithm. Nevertheless, the Nelder-Mead algorithm has remained a popular tool among practitioners in many different fields [55]. Here, we choose to use the pairwise mean as the center of mass in Step 4. Note that, interestingly, all the operations in Steps 4-8 can be calculated via pairwise operations. Also note that the expansion operation in Step 6, which gives x_e = exp_x̄(−2exp⁻¹_x̄ x_{m+1}), can be viewed as the reflection of x̄ with respect to x_r, which itself is the reflection of x_{m+1} with respect to x̄ computed in Step 5. Our concern, of course, should be that the defined operations remain well-defined. For this, too, there is no guarantee, unless we keep track of the points and force them to remain at valid distances, which can be prohibitively expensive or impractical. Nevertheless, one hopes that in the case of SO(n) or G_k(n), by starting in a large convex 2-norm Finsler ball, the iterates will stay there.

6.8.1.1 An Application in Adaptive Optics: Maximization of a Non-differentiable Function in SO(n)

In [27], for applications in adaptive optics, maximization of the non-differentiable cost function f : O(n) → ℝ defined as

f(x) = Σᵢ₌₁ⁿ max_{1≤j≤N} (xCⱼxᵀ)ᵢᵢ (6.52)

is proposed. Here, {Cⱼ}ⱼ₌₁^N (N ≥ 2) is a set of symmetric matrices. Without loss of generality we can assume that the domain of f is SO(n).
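The objective (6.52) is a sum of row-wise maxima of the diagonals of the matrices xCⱼxᵀ, which makes it one line of MATLAB; the sketch below (ours) also checks, for the special case N = 2 with C₂ = 0 discussed next, that the transpose of the eigen-matrix attains the claimed maximum.

% Sketch: the cost (6.52) and its maximum for N = 2, C2 = 0.
f = @(x,C) sum(max(cell2mat(cellfun(@(Cj) diag(x*Cj*x'), C, 'UniformOutput', false)), [], 2));
n = 4; Bm = randn(n); C1 = (Bm + Bm')/2;
C = {C1, zeros(n)};
[V,~] = eig(C1); x = V';                    % transpose of the eigen-matrix of C1
[f(x,C), sum(max(eig(C1),0))]               % both equal the sum of positive eigenvalues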
Nelder-Mead Algorithm on a Riemannian Manifold

1. Let (M,d) be a complete Riemannian manifold of dimension m with exponential map exp_x at x ∈ M. Consider f : M → ℝ. Set ε > 0.
2. Choose x₁,...,x_{m+1} ∈ M.
3. If max_{i,j} d(xᵢ,xⱼ) < ε, then stop; else, reorder the points such that f(x₁) ≤ ... ≤ f(x_{m+1}) and goto step 4.
4. Find x̄, the center of mass of {xᵢ}ᵢ₌₁^m.
5. (Reflection) Find x_r, the reflection of x_{m+1} with respect to x̄, as x_r = exp_x̄(−exp⁻¹_x̄ x_{m+1}).
   (a) If f(x₁) ≤ f(x_r) < f(x_m), then set x_{m+1} ← x_r and goto step 3.
6. (Expansion) If f(x_r) < f(x₁), then find x_e, the expansion of x_{m+1} with respect to x̄, as x_e = exp_x̄(−2exp⁻¹_x̄ x_{m+1}).
   (a) If f(x_e) < f(x₁), then set x_{m+1} ← x_e and goto step 3; else set x_{m+1} ← x_r and goto step 3.
7. (Contraction) Else find x_c, the contraction of x_{m+1} with respect to x̄, as x_c = exp_x̄(½ exp⁻¹_x̄ x_{m+1}) = μ_{1/2}(x̄, x_{m+1}).
   (a) If f(x_c) ≤ f(x_{m+1}), then set x_{m+1} ← x_c and goto step 3.
   (b) Else goto step 8.
8. (Reduction) For i = 2,...,m+1 set xᵢ ← μ_{1/2}(xᵢ,x₁) and goto step 3.

Table 6.4: Pseudo-code of the Nelder-Mead algorithm on a Riemannian manifold M.

The mentioned maximization problem does not have a unique solution [27]. In the special case that N = 2, C₁ = C and C₂ = 0_{n×n}, the maximum value of f is equal to the sum of the positive eigenvalues of C, and a solution in this case is the transpose of the eigen-matrix of C [27]. For this special case we examine the Nelder-Mead algorithm adapted to SO(n) to minimize −f. Let C be as follows (to 4 decimal digits):

C = [1.0133 0.7204 1.0904 0.1530; 0.7204 −0.4630 −0.5926 1.4637; 1.0904 −0.5926 0.8661 −0.9719; 0.1530 1.4637 −0.9719 −0.6416]. (6.53)

The eigenvalues of C are 2.2136, 1.6085, −0.9703 and −2.0769. Therefore, the maximum of f, which is equal to the sum of the positive eigenvalues of C, is 3.8220. To start the Nelder-Mead algorithm we need to choose a simplex with m+1 vertices, where m is the dimension of M; here m = n(n−1)/2 = 6. We generate m+1 random points inside the Finsler ball B₂(0, π/2) as the initial simplex. We use the pairwise mean as the center of mass in Step 4 of the algorithm. For this step and for the reflection, expansion, contraction and reduction steps (see Steps 5-8) we use the efficient interpolation and extrapolation schemes introduced in Theorem 6.7.1. Note that in the reflection step 5 we have x_r = x̄ x_{m+1}ᵀ x̄, and in the expansion step x_e is the reflection of x̄ with respect to x_r; therefore x_e = x_r x̄ᵀ x_r, which again can be computed efficiently. After almost 110 iterations the calculated value for max f equals 3.8220 to four decimal digits. Denote the solution, to four decimal digits, by x⋆. We have

x⋆ = [0.5325 −0.5910 −0.5908 −0.1346; −0.7519 −0.0551 −0.6477 0.1102; 0.3817 0.6164 −0.4003 0.5605; −0.0734 −0.5175 0.2670 0.8097]

and

x⋆Cx⋆ᵀ = [−1.0807 0.0000 0.0000 −0.3316; 0.0000 2.1020 0.2347 −0.0000; 0.0000 0.2347 1.7200 0.0000; −0.3316 −0.0000 0.0000 −1.9665]. (6.54)

Note that with C₁ = C as above and C₂ = 0₄ₓ₄, we have f(x⋆) = 4.8311. The convergence is slow in terms of the number of iterations; however, the computational load of each step is not significant. To see the behavior of the calculated maximum in terms of the iteration index, we tabulate the approximate maximum to 15 decimal digits in Table 6.5. The last row shows max f to 15 decimal digits, found using the (numerical) eigenvalues of C.

iteration i    computed max f
1              2.99792831533123
10             3.40130355940283
20             3.74564184272491
30             3.77369724540551
40             3.81199282263081
50             3.81796650623473
...            ...
110            3.822008220185086
...            ...
180            3.82201461161657
190            3.82201461242503
200            3.82201461328261
15-digit       3.82201461365772

Table 6.5: Computed maximum of the non-differentiable function f in (6.52) in terms of the iteration index in the Nelder-Mead algorithm in SO(4).
We should note that, despite their slow convergence, in many practical problems direct search methods, including the Nelder-Mead algorithm, are the only choice.

6.8.2 Clustering of Subspaces

Data clustering is an important task in many data and signal processing algorithms. The standard clustering problem consists of clustering N data points x₁,...,x_N in ℝⁿ into N_c clusters C₁,...,C_{N_c} such that the members of cluster Cᵢ are closer to the center of Cᵢ than to the center of any other cluster. By the center of Cᵢ we mean the Euclidean mean of the data points in Cᵢ. The center of each cluster can be used as a representative of the cluster or its members in subsequent analysis. A very popular algorithm to solve this problem is an iterative algorithm called the K-means algorithm [39]. On occasion, we might encounter a situation where our data points are linear subspaces themselves and we would like to cluster the data. This problem can be called Grassmannian clustering [37]. Interestingly, Grassmannian-valued data appear in different applications, and clustering of such data has many potential applications. Two interesting applications are precoder codebook design [68] and channel quantization in Multi-Input Multi-Output wireless communication channels [48, 54]. In [37] a Grassmannian K-means algorithm, and in [54] a similar K-means algorithm together with the related Vector Quantization algorithm, are proposed. These two algorithms use the extrinsic mean (see Section 6.7) for points on G_k(n), mostly due to the ensuing ease of calculation. Here, we implement a K-means algorithm with the pairwise mean, which is the easiest to compute among all the means we have studied.

K-means Algorithm on a Riemannian Manifold

1. Let (M,d) be a complete Riemannian manifold. Consider a set of data points {xᵢ}ᵢ₌₁^N. Fix N_c ≤ N.
2. Randomly partition {xᵢ}ᵢ₌₁^N into N_c non-empty clusters C₁,...,C_{N_c}.
3. For i = 1,...,N_c, find x̄_{Cᵢ}, the center of Cᵢ.
4. For i = 1,...,N, if d(xᵢ, x̄_{Cⱼ}) ≤ d(xᵢ, x̄_{C_l}) for all l ≠ j, then place xᵢ in Cⱼ. If more than one j satisfies this condition, choose one of them randomly.
5. If one of the clusters is empty, assign one data point to it randomly.
6. If in Step 4 none of the clusters have changed, then stop; else goto Step 3.

Table 6.6: Pseudo-code for the K-means algorithm on a Riemannian manifold M. It is assumed that finding the center of mass is a well-defined operation.
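For concreteness, the assignment step (Step 4) on G_k(n) with the distance (6.20) can be sketched in MATLAB as follows; the code is ours and the data are placeholders.

% Sketch: K-means assignment step on G_k(n) with the distance (6.20).
n = 20; k = 3; N = 11; Nc = 3;
xs = arrayfun(@(i) orth(randn(n,k)), 1:N, 'UniformOutput', false);
cs = arrayfun(@(i) orth(randn(n,k)), 1:Nc, 'UniformOutput', false);
dG = @(A,B) norm(acos(min(svd(A'*B),1)));   % Riemannian distance between subspaces
labels = zeros(1,N);
for i = 1:N
  [~,labels(i)] = min(cellfun(@(c) dG(xs{i},c), cs));   % nearest center
end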
6.8.2.1 An Example of Clustering of Subspaces with the K-means Algorithm

In our example, we generate some data points in G_k(n) (k = 3 and n = 20) around N_c = 3 centers, and then try to cluster the data points and recover the centers via the K-means algorithm. First, we briefly describe the process used to generate these data points. Denote the centers by c₁,c₂,c₃ ∈ G_k(n) and the associated subspace representations by S₁,S₂,S₃ ∈ ℝ^{n×k}. We set d₂(c₁,c₂) = π/4 = l₂ and d₂(c₁,c₃) = π/6 = l₃ and generate c₂ and c₃ with reference to c₁; hence, we can set S₁ = [I_{k×k}; 0]. For i = 2,3 we generate

Cᵢ = [0 −C̃ᵢᵀ; C̃ᵢ 0], (6.55)

where the elements of C̃ᵢ have the uniform distribution on the interval [0,1]. We then normalize Cᵢ as Cᵢ ← (Cᵢ/‖Cᵢ‖₂) lᵢ. Now Sᵢ is equal to the first k columns of e^{Cᵢ}. Note that C₁ = 0. Next, we generate the actual data points around these centers. We generate n₁ = 6, n₂ = 3 and n₃ = 2 points around c₁, c₂ and c₃, respectively, as follows:

X_{i,j} = Cᵢ + σ[0 −W_{i,j}ᵀ; W_{i,j} 0], 1 ≤ i ≤ N_c, 1 ≤ j ≤ nᵢ, (6.56)

where W_{i,j} ∈ ℝ^{(n−k)×k} manifests the noise and has standard normal elements, and σ is a constant controlling the noise power. We set X_{i,j} to the first k columns of e^{X_{i,j}}. After shuffling all these subspaces we relabel them as X₁,...,X_N, where N = n₁+n₂+n₃ = 11. The parameter σ is a variable that we shall change; first we set σ = 0.05. We implement the K-means algorithm as in Table 6.6 and choose the pairwise mean to find the centers of mass of the clusters in Step 3. We perform 20 iterations to compute the pairwise mean. If the algorithm recovers the clusters correctly, then either based on the number of points in each estimated cluster or based on the distances d(cᵢ,ĉⱼ) we can recover the label of each center. We do this relabeling and show the distances between the estimated cluster centers and the original centers in Table 6.7. The first panel of Table 6.7 shows the medians of the distances between the estimated and original centers over T = 100 independent runs. The distances are computed from (6.20). The centers are kept fixed, and the data around them are regenerated in each run. Next, we increase the noise level σ to 0.20 and show the results of T = 100 runs in the second panel of Table 6.7. Expectedly, the ability of the K-means algorithm to resolve the clusters decreases with increasing noise; as can be seen, d(cᵢ,ĉᵢ) increases as σ increases.

σ = 0.05:
d(cᵢ,ĉⱼ)   ĉ₁      ĉ₂      ĉ₃
c₁         0.0416  0.5657  0.4712
c₂         0.5614  0.0392  0.4592
c₃         0.4639  0.4613  0.0406

σ = 0.20:
d(cᵢ,ĉⱼ)   ĉ₁      ĉ₂      ĉ₃
c₁         0.1884  0.5268  0.3821
c₂         0.5310  0.1885  0.4356
c₃         0.3911  0.5160  0.1434

Table 6.7: Distances between the centers of the clusters estimated by the K-means algorithm (on G₃(20)) and the original centers in the example of Subsection 6.8.2.1. Each number is the median of the results of 100 independent runs of the experiment. The first panel shows the distances for noise level σ = 0.05, and the second for σ = 0.20. The ability of the K-means algorithm to correctly detect the clusters degrades as σ increases.

Chapter 7

Conclusions and Future Research Directions

In this dissertation we studied some mostly theoretical aspects of the problem of averaging on Riemannian manifolds. There are different ways of defining means of data points on a Riemannian manifold, and we studied two of them in detail: the Riemannian Lp mean (in Chapters 3 and 4) and the recursive-iterative means (in Chapter 5). The most immediate question is that of existence and uniqueness of a defined mean. In Theorem 3.2.1 one such question is answered for Riemannian Lp means. This result improves upon previous results such as the one in [61]. However, there might be room for further improvement; e.g., in Theorem 3.2.1, can we get rid of the specific shape of the domain that includes the support of the probability measure (i.e., a ball)? In Chapter 4, we studied the L^∞
center of mass, the smoothness and convexity properties of the Riemannian Lp mean, and the averaging and sensitivity properties of the L² mean. With regard to convexity there are still several open questions, as explained in Section 4.5, which might especially interest pure geometers, as the convex hull of finitely many points in a Riemannian manifold is still a mystery. Possible practical implications of the non-averaging properties which we explained in Section 4.6 can be a subject for further research. The fact is that in a manifold of positive curvature our intuition about the notion of average or mean might turn out to be wrong, and this issue deserves further experimental study. We also introduced the notion of the weak averaging property, which is a global notion, and in Theorem 4.6.1 we identified a radius of a ball containing the data points which guarantees weak averaging. We also conjecture that this radius can be improved (see Conjecture 4.6.1). We did not study in detail the computational methods for finding the Lp mean; the convergence analysis of gradient descent and other algorithms remains a widely open question. See [61] for a result which guarantees convergence of gradient descent for finding the L² mean in symmetric spaces, provided the data points are in a ball which is smaller than the one needed to have a unique mean.

In Chapter 5 we studied a large class of recursive-iterative means which are defined based on the notion of mean-invariance. Our analysis generalizes the theories developed in [5] and [59]. The notion of a primitive vectorial mean function and the use of the radius of the minimal ball as a Lyapunov function allowed us to develop a theory, in Section 5.2, which parallels the Perron-Frobenius theory about the infinite power of a primitive stochastic matrix. Of course, an important part of the Perron-Frobenius theorem is the explicit description of the limit mean function, which in a Riemannian manifold is very difficult to describe. We also studied the dynamical systems point of view of recursive-iterative means, as well as the simple pairwise mean and its relation with the Perimeter Shrinkage scheme and cyclic pursuit on manifolds. An open problem in this regard is to design an algorithm which brings a set of agents in cyclic pursuit to a closed geodesic in a stable fashion. We also studied weighted MI and Generalized Weighted MI (GWMI) means, which can resemble the behavior of the Lp means. A main shortcoming of the recursive-iterative means is that it is not easy to associate statistical notions with them. For example, it is not obvious what the best way is to define a quantity associated with the MI mean which parallels the variance in the case of the L² mean.

In Chapter 6, we focused on results specific to SO(n) and G_k(n). These manifolds appear naturally in many applications, and numerical computations on them boil down to standard matrix calculations. Our major focus was on identifying 2-norm Finsler balls of appropriate radius as domains of existence and uniqueness of the Lp and MI means. Our efforts were not completely successful, especially with respect to the Lp mean. In particular, we were only able to show that in a 2-norm ball of radius smaller than half of that of the largest strongly convex 2-norm Finsler ball, the local (and not global) Lp mean exists uniquely (see Theorem 6.5.1).
Also, while we could show that in the largest convex Finsler balls the MI and WMI means exist uniquely, we were not able to extend this result to GWMI means (see Theorem 6.6.1 and the remarks afterwards). As explained in the remarks after Theorem 6.6.1, we conjecture that these results can be improved. We also studied some computational aspects of averaging on SO(n) and G_k(n). In particular, using standard embeddings of SO(n) and G_k(n) in Euclidean space and in the space of symmetric matrices, respectively, we gave formulas to find the midpoint of the minimizing geodesic between two points without the use of the matrix exponential and logarithm. This calculation can be used to efficiently find the MI and pairwise means. We also studied two simple examples: the Nelder-Mead direct search algorithm on SO(n) and the K-means clustering algorithm on G_k(n), both of which have components that can be implemented efficiently via the methods we introduced.

Besides the above mentioned open problems, which are directly related to our work, there is a major research direction that can be pursued. An important class of problems arises when we know that a data set belongs to an unknown manifold, e.g., a low dimensional manifold, and we still want to process this data set. Therefore, one has to both learn the manifold and its exponential map, and perform the data analysis. Learning a manifold has close relations with approximating a manifold with a graph (see e.g., [86, 57, 8]), and it would be very interesting to study existence and uniqueness conditions and other properties of the means based on quantities associated with the graph of the data.

Bibliography

[1] U. Abresch and W. T. Meyer. Pinching below 1/4, injectivity radius, and conjugate radius. Journal of Differential Geometry, 40(3):643–691, 1994.

[2] P.-A. Absil, R. Mahony, and R. Sepulchre. Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Applicandae Mathematicae, 80(2):199–220, January 2004.

[3] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, 2008.

[4] J. C. Alvarez and C. Duran. An introduction to Finsler geometry. http://www.math.poly.edu/research/finsler/intro/one.html.

[5] T. Ando, C. K. Li, and R. Mathias. Geometric means. Linear Algebra and its Applications, 385:305–334, 2004.

[6] V. Arsigny. Processing Data in Lie Groups: An Algebraic Approach. Application to Non-Linear Registration and Diffusion Tensor MRI. PhD thesis, École Polytechnique, Nov. 2006.

[7] E. Begelfor and M. Werman. Affine invariance revisited. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 2087–2094, 2006.

[8] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

[9] M. Berger. A Panoramic View of Riemannian Geometry. Springer, 2007.

[10] A. Bhattacharya. Nonparametric Statistics on Manifolds with Applications to Shape Spaces. PhD thesis, The University of Arizona, Nov. 2008.

[11] R. Bhattacharya and V. Patrangenaru. Large sample theory of intrinsic and extrinsic sample means on manifolds. I. The Annals of Statistics, 31(1):1–29, 2003.

[12] D. Le Bihan, J.-F. Mangin, C. Poupon, C. A. Clark, S. Pappata, N. Molko, and H. Chabriat. Diffusion tensor imaging: Concepts and applications. Journal of Magnetic Resonance Imaging, 13:534–546, 2001.

[13] A. Bloch and A. Iserles.
Commutators of skew-symmetric matrices. International Journal of Bifurcation and Chaos, 15:793–801, 2005.

[14] A. M. Bruckstein. Why the ant trails look so straight and nice. The Mathematical Intelligencer, 15(2):59–62, June 1993.

[15] D. Burago, Y. Burago, and S. Ivanov. A Course in Metric Geometry, volume 33 of Graduate Studies in Mathematics. American Mathematical Society, 2001.

[16] P. Buser and H. Karcher. Gromov's almost flat manifolds. Société Mathématique de France, 1981.

[17] S. R. Buss and J. P. Fillmore. Spherical averages and applications to spherical splines and interpolation. ACM Transactions on Graphics, 20(2):95–126, April 2001.

[18] I. Chavel. Riemannian Geometry: A Modern Introduction. Cambridge University Press, second edition, 2006.

[19] J. Cheeger and D. Gromoll. On the structure of complete manifolds of nonnegative curvature. Annals of Mathematics, 96(3):413–443, November 1972.

[20] F. H. Clarke, Yu. S. Ledyaev, R. J. Stern, and P. R. Wolenski. Nonsmooth Analysis and Control Theory. Graduate Texts in Mathematics. Springer, 1998.

[21] T. H. Colding and W. P. Minicozzi II. Width and mean curvature flow, July 2007. arXiv:0705.3827.

[22] P. I. Davies and N. J. Higham. A Schur-Parlett algorithm for computing matrix functions. SIAM Journal on Matrix Analysis and Applications, 25(2):464–485, 2003.

[23] L. Dieci. Considerations on computing real logarithms of matrices, Hamiltonian logarithms, and skew-symmetric logarithms. Linear Algebra and its Applications, 244:35–54, 1996.

[24] M. P. do Carmo. Riemannian Geometry. Birkhäuser, 1992.

[25] D. W. Dreisigmeyer. Direct search algorithms over Riemannian manifolds, December 2006. Available at http://ddma.lanl.gov/Documents/publications/dreisigm-2007-direct.pdf.

[26] A. Edelman, T. A. Arias, and S. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.

[27] B. L. Ellerbroek, C. Van Loan, N. P. Pitsianis, and R. J. Plemmons. Optimizing closed-loop adaptive-optics performance with use of multiple control bandwidths. Journal of the Optical Society of America A, 11(11):2871–2886, November 1991.

[28] N. I. Fisher, T. Lewis, and B. J. J. Embleton. Statistical Analysis of Spherical Data. Cambridge University Press, 1987.

[29] P. T. Fletcher, C. Lu, S. M. Pizer, and S. Joshi. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Transactions on Medical Imaging, 23(8):995–1005, Aug. 2004.

[30] K. A. Gallivan, A. Srivastava, X. Liu, and P. Van Dooren. Efficient algorithms for inferences on Grassmann manifolds. In Proc. 12th IEEE Workshop on Statistical Signal Processing, St. Louis, October 2003.

[31] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.

[32] D. Groisser. Newton's methods, zeros of vector fields, and the Riemannian center of mass. Advances in Applied Mathematics, 33, Nov. 2004.

[33] K. Grove. Riemannian Geometry: A Metric Entrance. Number 65 in Lecture Note Series. Department of Mathematics, University of Aarhus, 2nd edition, Sept. 2002.

[34] K. Grove and H. Karcher. How to conjugate C¹-close group actions? Mathematische Zeitschrift, 132(1):11–20, March 1973.

[35] K. Grove, H. Karcher, and E. A. Ruh. Group actions and curvature. Inventiones Mathematicae, 23:31–48, 1974.

[36] K. Grove, H. Karcher, and E. A. Ruh. Jacobi fields and Finsler metrics on compact Lie groups with an application to differentiable pinching problems. Mathematische Annalen, 211:7–21, 1974.
[37] P. Gruber and F. J. Theis. Grassmann clustering. In Proceedings of the European Signal Processing Conference (EUSIPCO), 2006.

[38] B. C. Hall. Lie Groups, Lie Algebras, and Representations. Graduate Texts in Mathematics. Springer, 2003.

[39] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, 2001.

[40] S. Helgason. Differential Geometry and Symmetric Spaces. Academic Press, 1962.

[41] U. Helmke, K. Hüper, and J. Trumpf. Newton's method on Graßmann manifolds, 2007. arXiv:0709.2205v2 [math.OC].

[42] U. Helmke and J. B. Moore. Optimization and Dynamical Systems. Springer, 2nd edition, 1996.

[43] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.

[44] M. Humbert, N. Gey, J. Muller, and C. Esling. Determination of a mean orientation from a cloud of orientations. Application to electron back-scattering pattern measurements. Journal of Applied Crystallography, 29:662–666, 1996.

[45] A. Iserles, H. Z. Munthe-Kaas, S. P. Nørsett, and A. Zanna. Lie-group methods. Acta Numerica, 9:215–365, 2000.

[46] J. Jost. Riemannian Geometry and Geometric Analysis. Universitext. Springer, 3rd edition, 2002.

[47] J. Jost and Y. L. Xin. Bernstein type theorems for higher codimension. Calculus of Variations and Partial Differential Equations, 9(4):277–296, 1999.

[48] R. W. Heath Jr. Communication and signal processing on the Grassmann manifold. 9th IEEE Workshop on Signal Processing for Wireless Communications 2008, Plenary Talk. Available at: http://spawc2008.org/Plenaries/SPAWC2008_Heath.pdf, July 2008.

[49] E. W. Justh and P. S. Krishnaprasad. Equilibria and steering laws for planar formations. Systems & Control Letters, 52:25–38, 2004.

[50] H. Karcher. Riemannian center of mass and mollifier smoothing. Communications on Pure and Applied Mathematics, XXX:509–541, 1977.

[51] H. Karcher. Global Differential Geometry, volume 27 of MAA Studies in Mathematics, chapter Riemannian Comparison Constructions, pages 170–222. The Mathematical Association of America, 1989.

[52] W. S. Kendall. Probability, convexity, and harmonic maps with small image I: Uniqueness and fine existence. Proceedings of the London Mathematical Society, 61(2):371–406, 1990.

[53] H. K. Khalil. Nonlinear Systems. Prentice Hall, third edition, 2002.

[54] J. H. Kim, W. Zirwas, and M. Haardt. Efficient feedback via subspace-based channel quantization for distributed cooperative antenna systems with temporally correlated channels. EURASIP Journal on Advances in Signal Processing, January 2008. Article No. 131.

[55] T. G. Kolda, R. M. Lewis, and V. Torczon. Optimization by direct search: New perspectives on some classical and modern methods. SIAM Review, 45(3):385–482, 2003.

[56] K. A. Krakowski. Geometrical Methods of Inference. PhD thesis, The University of Western Australia, Aug. 2002.

[57] S. Lafon and A. B. Lee. Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9):1393–1403, Sept. 2006.

[58] S. Lang. Fundamentals of Differential Geometry. Graduate Texts in Mathematics. Springer, 1999.

[59] J. Lawson and Y. Lim. A general framework for extending means to higher orders, 2006. arXiv:math/0612293.

[60] M. Lazard and J. Tits. Domaines d'injectivité de l'application exponentielle. Topology, 4:315–322, 1966.

[61] H. Le. Locating Fréchet means with application to shape spaces.
[61] H. Le. Locating Fréchet means with application to shape spaces. Advances in Applied Probability, 33(2):324–338, July 2001.
[62] H. Le and D. G. Kendall. The Riemannian structure of Euclidean shape spaces: A novel environment for statistics. Annals of Statistics, 21(3):1225–1271, 1993.
[63] S. Li, M. Choi, H. Kim, and F. C. Park. Geometric direct search algorithms for image registration. IEEE Transactions on Image Processing, 16(9):2215–2224, September 2007.
[64] D. J. Love, R. W. Heath Jr., and T. Strohmer. Grassmannian beamforming for Multiple-Input Multiple-Output wireless systems. IEEE Transactions on Information Theory, 49(10):2735–2747, October 2003.
[65] K. V. Mardia and P. E. Jupp. Directional Statistics. Wiley, 2000.
[66] W. Meyer. Toponogov's theorem and applications. Lecture notes for the College on Differential Geometry at Trieste, 1989. Available at http://wwwmath.uni-muenster.de/u/meyer/publications/toponogov.html.
[67] M. Moakher. Means and averaging in the group of rotations. SIAM Journal on Matrix Analysis and Applications, 24(1):1–16, 2002.
[68] B. Mondal, R. W. Heath Jr., and L. Hanlen. Quantization on the Grassmann manifold: Applications to precoded MIMO wireless systems. IEEE Transactions on Signal Processing, 5(8):1025–1028, March 2005.
[69] L. Moreau. Stability of multiagent systems with time-dependent communication links. IEEE Transactions on Automatic Control, 50(2):169–182, February 2005.
[70] J. A. Nelder and R. Mead. A simplex method for function minimization. The Computer Journal, 7(4):308–313, 1965.
[71] R. Olfati-Saber, J. A. Fax, and R. M. Murray. Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE, 95(1):215–233, January 2007.
[72] J. C. Álvarez Paiva and A. C. Thompson. Volumes on Normed and Finsler Spaces, volume 50 of Riemann–Finsler Geometry. MSRI, 2004.
[73] D. A. Paley. Stabilization of collective motion on a sphere. Automatica, 45:212–216, 2009.
[74] F. C. Park and B. Ravani. Smooth invariant interpolation of rotations. ACM Transactions on Graphics, 16(3):277–295, July 1997.
[75] X. Pennec. Statistical Computing on Manifolds for Computational Anatomy. Habilitation à diriger des recherches, Université Nice Sophia-Antipolis, December 2006.
[76] P. Petersen. Riemannian Geometry. Springer, 2006.
[77] D. Petz. Means of positive matrices: Geometry and a conjecture. Annales Mathematicae et Informaticae, 32:129–139, 2005.
[78] T. Sakai. Riemannian Geometry, volume 149 of Translations of Mathematical Monographs. American Mathematical Society, 1996.
[79] A. Sarlette and R. Sepulchre. Consensus optimization on manifolds. SIAM Journal on Control and Optimization, 48(1):56–76, 2009.
[80] A. Sarlette, R. Sepulchre, and N. E. Leonard. Autonomous rigid body attitude synchronization. In Proceedings of the 46th IEEE Conference on Decision and Control, New Orleans, LA (USA), pages 2566–2571, December 2007.
[81] A. Sarlette, R. Sepulchre, and N. E. Leonard. Cooperative attitude synchronization in satellite swarms: A consensus approach. In Proceedings of the 17th IFAC Symposium on Automatic Control in Aerospace, Toulouse (France), June 2007.
[82] R. Schneider. Convex Bodies: The Brunn–Minkowski Theory. Cambridge University Press, 1993.
[83] K. Shoemake. Animating rotation with quaternion curves. In ACM SIGGRAPH, volume 19, pages 245–254, 1985.
[84] S. T. Smith. Geometric optimization methods for adaptive filtering. PhD thesis, Harvard University, May 1993.
[85] A. Srivastava. A Bayesian approach to geometric subspace estimation. IEEE Transactions on Signal Processing, 48(5):1390–1400, May 2000.
[86] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.
[87] P. Turaga, A. Veeraraghavan, and R. Chellappa. Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8, June 2008.
[88] J. Wallner and N. Dyn. Convergence and C1 analysis of subdivision schemes on manifolds by proximity. Computer Aided Geometric Design, 22(7):593–622, 2005.
[89] A. Weinstein. Almost invariant submanifolds for compact group actions. Journal of the European Mathematical Society, 2(1):53–86, March 2000.
[90] J. Wolfowitz. Products of indecomposable, aperiodic, stochastic matrices. Proceedings of the American Mathematical Society, 15:733–736, 1963.
[91] Y. C. Wong. Differential geometry of Grassmann manifolds. Proceedings of the National Academy of Sciences of the United States of America, 57(3):589–594, March 1967.
[92] M. Zefran and V. Kumar. Planning smooth motions on SE(3). In IEEE International Conference on Robotics and Automation, pages 121–126, Minneapolis, MN, April 1996.