ABSTRACT
Title of dissertation: TOPOLOGICAL DATA ANALYSIS
DIMENSION REDUCTION AND
COMPUTATIONAL EFFICIENCY
Nathaniel Monson
Doctor of Philosophy, 2022
Dissertation directed by: Professor Wojciech Czaja
Professor Patrick Brosnan
Department of Mathematics
In this dissertation, we present a novel stability result for the persistent ho-
mology of the Rips complex associated to a point cloud. Our theorem is narrower
than the classic result of Cohen-Steiner, Edelsbrunner, and Harer in that it does
not apply to C?ech complexes, nor to functions which are not measuring distance to
a point cloud. It is broader than the classic result in that it is ?local?; if a function
approximately preserves distances in some range, but is contractionary below or
expansionary above that range, our result still applies. The novel stability result is
paired with the Johnson-Lindenstrauss Lemma to show that, with high probability,
random projection approximately preserves persistent homology. An experimental
analysis is given of the computational speedup granted by this dimension reduction.
This is followed by some observations suggesting that even when the theoretical
bound is loose enough that we have no guarantee of homology preservation, there
is still a high chance that significant features of the dataset are preserved.
TOPOLOGICAL DATA ANALYSIS, DIMENSION REDUCTION,
AND COMPUTATIONAL EFFICIENCY
by
Nathaniel Burton Monson
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2022
Advisory Committee:
Professor Wojciech Czaja, Chair/Advisor
Professor Patrick Brosnan, Co-Advisor
Professor Jonathan Rosenberg
Professor Christian Zickert
Professor Leila de Floriani (Dean?s representative)
?c Copyright by
Nathaniel Burton Monson
2022
Dedication
To my grandmother, Nancy Moebus Heuston. Gran, I stand all amazed at the
nigh-overwhelming support, generosity, and kindness you have offered me?oh, it is
wonderful. I know you love me so much you would read this whole dissertation if I
asked you to. And I love you so much that I won?t ask.
ii
Acknowledgments
I have been incredibly fortunate in my life to have wonderful teachers, friends,
and family. As I write this, I am filled with more gratitude than I can express
for all of them. This page is too small to contain every name?indeed, this entire
dissertation is too small to contain the entirety of my thanks. Thank you, thank
you all, from the bottom of my heart. I cannot thank you enough.
While a multitude of people have enriched my life, both mathematically and
otherwise, there are six people without whom this dissertation would literally never
have happened. In chronological order of entry into my life, I?d like to start by
thanking my mother, Kary Patricia Heuston. Some of my earliest memories are of
being in the lab, ?helping? you as you worked on your own dissertation [86]. You
were a model of spontaneous kindness and generosity, and determination in the face
of overwhelming adversity. I love you and miss you.
My thanks also to my father, Roland Hintze Monson. You were largely re-
sponsible for beginning my love of math, by explaining everything from Cantor?s
diagonalization argument, to Alexander?s horned sphere, to Fourier series to me, all
before high school. You have been a pillar of love and support for me in almost
every way I can imagine. If I ever have a child, I hope I am for that child what
you?ve been for me. Thanks also to my stepmother, Dian Saderup Monson. While
your influence on my mathematical life was limited to helping me learn my times
tables in second grade, your presence in the rest of my life has been possibly the
greatest blessing in a life filled with many.
iii
Jumping forward several decades, my thanks to my advisors, Wojciech Klaudiusz
Czaja and Patrick Gerald Brosnan. You have been patient when I required patience,
and pushed me when I needed to be pushed. I am a better mathematician, bet-
ter teacher, and even a better person due to your influence. You?ve been excellent
mentors, and deserve much of the credit for any part of this dissertation that future
readers find worthwhile.
Finally, my thanks to my friend Thomas Faraday Harper. In the marathon
of my dissertation, you have done everything short of slinging me bodily over your
shoulder and hauling me for twenty miles. Thank you for breakfasts, emotional
support, ice water, evening planning, cakes, and the love which they all represent.
Without your literally hundreds of hours of encouragement and logistical support,
this dissertation would not exist.
My thanks also to my mathematical community. Thomas Hunter, Jonathan
Rosenberg, Walter Stromquist, and Larry Washington have all been wonderful pro-
fessors for me. I first learned about persistent homology from Nathan Dykas at an
event kindly organized by Niranjan Ramachandran. In dealing with the Graduate
School of University of Maryland, Tom Haines and Cristina Garcia took in what
could have been a nightmare of red tape for me, and gave me smoother sailing than
I could have hoped for. Steve Balady, Rebecca Black, JP Burelle, Jon Cohen, Ran
Cui, Michael Kreisel, Adam Lizzi, Tim Mercure, Richard Rast, Catie Schwartz, and
many others have gone from being peers and fellow grad students to friends.
I began this dissertation without any programming skills. Rose Lynn Embry,
Liam Fowl, Josh Gleason, Ilya Kavalerov, Stephen Tratz, and Ben Warren all pro-
iv
vided some combination of programming assistance and friendship, both of which
were deeply appreciated.
My brother Boyd Monson spent many hours keeping me company while I
worked on this, and provided food and encouragement. My sister Olivia Monson
gave input on many important questions, such as ?is Paul Bunyan a kaiju?? She is
also the artist responsible for Figures 2.6 and 3.14. Both of their kindness and good
cheer have been invaluble.
Christopher Green, we did for each other what we could. Maybe it was not
what we have asked for, when we have spoken. But a man can only give what he
has, being what he is. Thank you for breakfasts, dinners, board gaming, interesting
conversations, and your relentless and uncompromising support. Carry on.
v
Table of Contents
Dedication ii
Acknowledgements iii
Table of Contents vi
List of Tables viii
List of Figures ix
1 Introduction 1
1.1 Summary of results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Dissertation organization . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Topological preliminaries 11
2.1 Simplicial homology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.1 Simplicies and simplicial complexes . . . . . . . . . . . . . . . 13
2.1.2 Chain complexes . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Point clouds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3 Naive persistent homology . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.1 Birth-death matching . . . . . . . . . . . . . . . . . . . . . . . 29
2.4 Advanced persistent homology . . . . . . . . . . . . . . . . . . . . . . 37
2.4.1 Replacing point clouds with height functions . . . . . . . . . . 37
2.4.2 Alternate complexes and ease of computation . . . . . . . . . 41
2.4.2.1 Clique complexes and Rips complexes . . . . . . . . 43
2.4.2.2 Delaunay complexes . . . . . . . . . . . . . . . . . . 46
2.4.2.3 Alpha complexes . . . . . . . . . . . . . . . . . . . . 46
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
vi
3 Applications of persistent homology to path planning 49
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3 Radiological isotope measurement and modeling . . . . . . . . . . . . 51
3.4 Multi-agent search methodology . . . . . . . . . . . . . . . . . . . . . 54
3.4.1 Data collection . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.4.2 Online radioactive source location estimation . . . . . . . . . . 58
3.4.3 Terrain based source inspection . . . . . . . . . . . . . . . . . 60
3.5 Discussion of topological path planning . . . . . . . . . . . . . . . . . 67
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4 Results on stability of persistent homology and dimension reduction 74
4.1 Pipeline of persistent homology . . . . . . . . . . . . . . . . . . . . . 74
4.1.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.2 Obtaining a filtered simplicial complex from a point cloud . . 75
4.1.3 Reduction of filtered simplicial complex . . . . . . . . . . . . . 76
4.1.4 Obtaining a parameterized family of homology groups . . . . . 76
4.1.5 Displaying information in a persistence diagram or barcode . . 77
4.2 Stability of persistent homology . . . . . . . . . . . . . . . . . . . . . 77
4.2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.2.2 Quadrant lemma and box lemma . . . . . . . . . . . . . . . . 82
4.3 Using stability for linear dimension reduction . . . . . . . . . . . . . . 92
4.4 Other dimension reduction methods . . . . . . . . . . . . . . . . . . . 94
4.5 Directions for further research . . . . . . . . . . . . . . . . . . . . . . 96
5 Experimental results on computation times of persistent homology 98
5.1 Datasets and software used for benchmarks . . . . . . . . . . . . . . . 98
5.1.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.1.2 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2 Experiments for a baseline of persistent homology computation speed 103
5.2.1 Scaling with number of points . . . . . . . . . . . . . . . . . . 104
5.2.2 Scaling with number of homology groups computed . . . . . . 106
5.2.3 Scaling with intrinsic dimension of the dataset . . . . . . . . . 109
5.2.4 Scaling with ambient dimension . . . . . . . . . . . . . . . . . 110
5.3 Calculation of persistent homology via random projections . . . . . . 110
5.3.1 Reasons to expect computational slowdown or speedup . . . . 118
5.4 Heuristics for better preservation than guaranteed . . . . . . . . . . . 119
Bibliography 121
vii
List of Tables
3.1 Table of UAV/UGV experiments . . . . . . . . . . . . . . . . . . . . 55
5.1 List of benchmark datasets . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2 Dionysus computation demonstrating scaling with number of points . 105
5.3 Ripser computation demonstrating scaling with numer of points . . . 107
5.4 Dionysus computation demonstrating scaling with number of homol-
ogy groups computed. . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.5 Ripser computation demonstrating scaling with number of homology
groups computed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.6 Ripser computation of the Vicsek dataset with dimensional noise. . . 109
5.7 Dionysus computation demonstrating scaling with ambient dimension 111
5.8 Ripser time to calculate H0 pre- and post-projection . . . . . . . . . . 112
5.9 Differences in Ripser time to calculate H0 . . . . . . . . . . . . . . . . 113
5.10 Ripser time to calculate H1 pre- and post-projection . . . . . . . . . . 114
5.11 Differences in Ripser time to calculate H1 . . . . . . . . . . . . . . . . 115
5.12 Ripser time to calculate H2 pre- and post-projection . . . . . . . . . . 116
5.13 Differences in Ripser time to calculate H2 . . . . . . . . . . . . . . . . 117
viii
List of Figures
2.1 Simple simplicial example . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 An example of a 2-dimensional point cloud, X, (top) along with two
of its expansions, X.1 (bottom left) and X.5 (bottom right). . . . . . . 20
2.3 Simple C?ech complex . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Nerve Theorem example . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5 Two different C?ech complexes with the same homology . . . . . . . . 24
2.6 A point cloud with a generator of homology more stable than its
homology groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.7 Diagram showing the time dependant chain complexes for a parame-
terized family of homology groups . . . . . . . . . . . . . . . . . . . . 30
2.8 Persistence diagram for random points in a figure-8 . . . . . . . . . . 34
2.9 Bar code for random points in a figure-8 . . . . . . . . . . . . . . . . 35
2.10 Stability counterexample . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.11 A small point cloud and the Voronoi decomposition it induces . . . . 47
3.1 Deconvolution method of source localization. . . . . . . . . . . . . . . 52
3.2 Aerial scan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3 Spectral data showing source prediction . . . . . . . . . . . . . . . . . 53
3.4 RMAX with Nal detector and stereovision system in-flight. . . . . . . 56
3.5 Hexacopter with detectors . . . . . . . . . . . . . . . . . . . . . . . . 56
3.6 A UGV and UAV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.7 Radiation contour generated from the INL flights . . . . . . . . . . . 57
3.8 3d mosaic for path-planning . . . . . . . . . . . . . . . . . . . . . . . 58
3.9 UGV trajectory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.10 Ground robot path . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.11 Initial exhaustive scan result. . . . . . . . . . . . . . . . . . . . . . . 66
3.12 Breakdown of area into traversable and less-traversable regions . . . . 68
3.13 Alternate paths generated to be high-persistence. . . . . . . . . . . . 69
3.14 Topological path planning counterexample . . . . . . . . . . . . . . . 71
5.1 Persistance diagrams for benchmark datasets . . . . . . . . . . . . . . 100
5.2 Best-fit quartic for Dionysus scaling with number of points . . . . . . 105
5.3 Best-fit quartic for Ripser scaling with number of points . . . . . . . 106
ix
5.4 Persistence diagrams demonstrating high probability of preservation
of persistent homology through random projection . . . . . . . . . . . 120
x
Chapter 1: Introduction
1.1 Summary of results
The primary results of this dissertation are twofold. First, we present a novel
stability result for Rips complexes in Theorem 4.2.11. Our result differs from the
classic result of Cohen-Steiner, Edelsbrunner, and Harer (Theorem 2.4.2) in several
ways. Their result is more general in that it concerns the stability of the persistence
diagram of any tame function on a triangulable space, while ours is restricted to
the case of functions which are the distance to a finite point cloud, and ours only
approximates the persistent homology using the Rips complex, while the C?ech com-
plex is the true object of interest. While this may sound a significant restriction,
almost all practical applications we have seen in the literature are computing Rips
complexes from finite point clouds. Indeed, we have only found a single computer
implementation of an algorithm which can support computation of persistent homol-
ogy via C?ech complexes?all others default to using the Rips complex. Given that
the sacrifice of C?ech to Rips has already been made, our Theorem 4.2.11 is thus
more directly applicable. Moreover, our result is an ?element by element? result
which can be applied locally in some sense, while their result only applies if there is
a single uniform bound on the motion of the entire point cloud. We pair Theorem
1
4.2.11 with the Johnson-Lindenstrauss Lemma (Theorem 4.3.1) to show that, with
high probability, random projection approximately preserves persistent homology.
In the second major contribution of this dissertation, we give an experimen-
tal analysis of the computational speedup granted by dimension reduction through
random projection. We also give some observations suggesting that even when the
theoretical bound is loose enough that we have no guarantee of homology preserva-
tion, there is still a high chance that significant features of the dataset are preserved.
1.2 Literature review
The subfield of computational topology known as persistent homology has
grown quickly since its birth two decades ago. There are many survey articles on
persistent homology. The author recommends Ghrist?s ?Barcodes: the persistent
topology of data? [78] as an especially welcoming introduction to the subject. Very
good alternatives include articles by Edelsbrunner and Harer [65], (and its successor
[69]) Carlsson [34] or Weinberger [125]. While the field of applied topology, for which
persistent homology is the exemplar, is too young to have many textbooks, those
that exist [66,79,109,116] are uniformly good.1
The paper that introduced persistent homology in its modern incarnation
was ?Topological persistence and simplification,? by Edelsbrunner, Letscher, and
Zomorodian [68]. This paper drew on the three associated geometric notions of
1An honorable mention should be made here of [129], which was before its time, and thus
does not contain many important later results. A further honorable mention to Chambers et al.?s
Research in Computational Topology [39]. While neither a textbook nor elementary, it collects
many interesting papers springing from research conducted at a workshop in 2018.
2
Voronoi regions, Delauney triangulations, and alpha complexes to define the per-
sistent homology of a filtered complex and introduce the index-persistence plane
for visualizing their results, and also discussed a reasonably efficient algorithm for
pairing simplices in order to calculate the persistent homology. It also included a
proof-of-concept example of using persistent homology for feature detection, calcu-
lations of the persistent homology of five objects of interest (e.g., a certain protein
and a small statue). This perspective was refined shortly thereafter in ?Computing
persistent homology,? by Zomorodian and Carlsson [128]. An important contribu-
tion of their paper is the identification of persistent homology of a filtered complex
with the standard homology of a certain graded module over a polynomial ring. This
implies that persistent homology with field coefficients can be simply classified, in
contrast to persistent homology with Z coefficients. The last piece of the core of
persistent homology was built by Cohen-Steiner, Edelsbrunner, and Harer, in [45],
which showed that persistence is stable under perturbations of the underlying func-
tion, a result which our Theorem 4.2.11 is closely tied to. This stability theorem
of Cohen-Steiner, Edelsbrunner, and Harer (Theorem 2.4.2), was further refined by
Cohen-Steiner et al in [46] and [47], and by Chazal et al in [40].
Even before the 2000 paper of Edelsbrunner, Letscher, and Zomorodian, sev-
eral mathematicians were already converging on these ideas. In 1990 Frosini intro-
duced ?size functions? [76] which are equivalent to 0-dimensional persistent homol-
ogy. Robins [115] described persistent Betti numbers in 1999. While Edelsbrunner
et al independently discovered persistent homology the next year, and their lan-
guage and formalism has persisted, the fundamental ideas are all present in Robins?
3
work. Also in 1999, Cagliari et al. reinterpreted and categorified size functions into
a language which is quite close to the modern formulation of persistence in [30].
The insights and theory underlying persistent homology have been extended
to a number of wider frameworks. In [38] Carlsson and Zomorodian explored multi-
dimensional persistence, and prove that it is necessarily unsatisfactory. In [35, 36],
Carlsson, De Silva, and Morozov give an explanation of and details for computing
zigzag persistence for a chain of spaces connected by maps that are not simply in-
clusions. In [17] Bendich, Edelsbrunner, Morozov and Patel give a way to measure
the robustness of a level set, and relate it to well groups and persistent homology.
In [59] De Silva, Morozov, and Vejdemo-Johansson define ?circular persistence? for
S1-valued functions. Bubenik categorifies persistent homology in [28], drawing on
the work of [48] and [91]. He uses this approach to give a highly general stability
result in [25]. Buchet, Chazal, Oudot, and Sheehy extend persistent homology from
functions to measures in [29]. Basu, in [13], gives an explicit formula relating the
persistent homology of a filtered space to the spectral sequence induced by that
same filtration. And Dey, in [60], presents traditional persistent homology as de-
pendent on a map to R as a topological space covered by intervals, and extends this
formulation to a map whose codomain is any manifold with any cover.
There has been an explosion of interest in the last ten years on the applications
of persistent homology across many fields. A full accounting is beyond the scope
of this dissertation. Thus the following examples from medical research, image
processing, and robotics should be taken as non-exhaustive representative samples
of the broad research using computational topology.
4
Topological data analysis has seen considerable use in the study of medical data
and genomic data, with [102] as a notable early example in which topological analysis
identified a previously unknown subgroup of breast cancers clinically distinct from
others. Neuronal structures in the visual cortex have been studied this way as
well [121]. Topological analysis has also been used in image processing [1, 10] and
robotic path planning [90].
In addition to the direct uses above, numerous papers have tried to bridge the
gap between theory and practice?papers whose results are theoretical, but whose
purpose is clearly to make applications more attractive or tractable. The following
list highlights a few of these demonstrating the development of the field over time.
1. De Silva, Ghrist, and Muhammad have used persistent homology to demon-
strate that a swarm of robots with no localization capabilities can determine
if it covers an area [57].
2. Carlsson, Ishkhanov, De Silva, and Zomorodian studied the space of natural
images, using persistent homology to identify a Klein bottle structure within
it, and gave reason to think this has the potential to be useful for image
compression [37].
3. Possibly most importantly on this list is the 2008 paper by Niyogi, Smale,
and Weinberger ?Finding the homology of submanifolds with high confidence
from random samples? [103], a foundational paper in manifold learning, in
which the authors demonstrate that random finite discrete samples from a
manifold allow recovery of its homology with high confidence. Niyogi et al.
5
later extended their results to relate more explicitly to unsupervised machine
learning [104].
4. Bubenik and Kim have related this statistical perspective of Niyogi et al. more
directly to persistent homology [27].
5. Adler et al. discussed manifold learning, as well as applications of persistent
homology to statistics and probability; they drew parallels between sublevel
sets and excursion probabilities [2].
6. Walker has used persistent homology to establish coverage of sensor networks
over a domain [124].
7. Fasy et al. have used statistics to give a detailed account of how much a given
feature detected by persistent homology matters [74].
8. Sheehy has given an algorithm for a linear-size approximation of the Vietoris-
Rips complex [119], and has paired the Johnson-Lindenstrauss Lemma (The-
orem 4.3.1) with the stability theorem of Cohen-Steiner, Edelsbrunner, and
Harer (Theorem 2.4.2) to give an algorithm for approximating persistent ho-
mology somewhat faster at the cost of some accuracy [120].2
9. Bubenik et al. have introduced persistence landscapes [26], a way to make per-
sistent homology more useful for machine learning applications, while Adams
2Although Oudot, in [109], gives some reason to believe the speedup is small enough to be not
worth it in most applications. Chapter 5 of this dissertation contains some experiments to evaluate
these perspectives along with a discussion.
6
et. al refined this into the idea of persistence images [1], which have some
desirable properties that persistence landscapes lack.
10. Finally, Mandal [94] has demonstrated the usage of sparsification methods to
assist persistent homology calculations in computational biology.
In addition to the above results on the applications of persistent homology,
there is an additional current of research flowing in the direction of increasing the
computational speed of various aspects of persistent homology. There is not a sharp
division between the papers listed here and those listed above, merely a change in
emphasis; the papers below are more narrowly focused on limitations of computer
speed and memory. Before giving this list, the author wishes to give special notice
to the 2017 paper ?A Roadmap for the Computation of Persistent Homology,? by
Otter, Porter, Tillmann, Grindrod, and Harrington [107]. An excellent paper which
is highly accessible, it serves both as an inviting introduction to persistent homology
as used in practice, and as a framework for understanding many other contributions
and areas for further study. It also contains a variety of datasets suitable for bench-
marking computational speeds of persistent homology calculations on, which the
authors have made publicly available. This dissertation makes use of many of these
datasets in Chapter 5. If the reader wishes to know more about computational
topology, this is the first paper they should read.
? In 2004, Kaczynski, Mischaikow, and Mrozek published Computational Homol-
ogy [88]. While this book does not deal with persistence directly, it addresses
questions of speed of homology computations in great detail, and provides a
7
number of novel results.
? In 2013, Attali, Lieutier, and Salinas [9] give explicit conditions for the Rips
complex of a point cloud to be collapsible to the same homotopy type as the
underlying shape, and two years later, in [8], the first two authors give a (non-
constructive) result that sheds light on why collapses tend to not get stuck in
?house with two rooms?3 situations.
? Also in 2013, Kerber and Sharathkumar [89] give a relatively quick algorithm
which yields an approximation of the C?ech filtration of any point cloud.
? Again in 2013 (a highly productive year for this area) Mischaikow and Nanda,
in [98], adapt discrete Morse theory for filtrations and efficient computations
of persistent homology. This perspective was built on by Curry, Ghrist, and
Nanda in [50] for more efficient computations of sheaf cohomology, which in-
cludes persistent homology as a special case, and also by Henselman and Ghrist
in [85] which observes a novel relationship between discrete Morse theory, ma-
trix factorizations, and matroids.
? In 2020, Espinoza, Herna?ndez-Amador, Herna?ndez-Herna?ndez, and Ramonetti-
Valencia in [72] present an algorithm for computing the C?ech complex of a
planar collection of discs with different radii, and assess its performance.
? Finally, also in 2020, Malott, Sens, and Wilsey in [93] evaluate several data
reduction methods, including random point sampling vs cluster centroids.
3The ?house with two rooms? is an example of a simplicial complex which is contractible, but
has many simplices and no available collapses.
8
The bridge connecting the idea of a stability result, such as our Chapter 4 to
a speedup of computations, as we investigate in Chapter 5, is compressive sensing,
a field of math concerned with reconstructing high dimensional information with
high probability from low dimensional projections. The foundation on which all
of compressed sensing rests is the Johnson-Lindenstrauss Lemma (Theorem 4.3.1),
from their paper ?Extensions of Lipschitz mappings into a Hilbert space? [87]. While
the original proof is quite technical, Dasgupta and Gupta discovered an elementary
proof in the appropriately-named ?An elementary proof of a theorem of Johnson
and Lindenstrauss? [54]. More recently, an explosion of interest has followed the
papers ?Near optimal signal recovery from random projections: Universal encoding
strategies?? of Candes and Tao [32], closely followed by ?Stable signal recovery from
incomplete and inaccurate measurements? [33] and ?Robust uncertainty principles:
Exact signal reconstruction from highly incomplete frequency information,? [31]
both by Candes, Romberg, and Tao. See Clark [11], Cloninger [44] Doster [62],
Hafftka [81] or Murphy [101]for more recent examples of compressed sensing in
action.
A recent example of using compressed sensing for a similar purpose to that we
put it in this paper is given by Halevy, in [82], wherein he speeds the computation
of Laplacian eigenmaps using random projection.
9
1.3 Dissertation organization
In Chapter 2, we present some mathematical preliminaries, primarily topolog-
ical, with no original theorems.
In Chapter 3, we discuss applications of persistent homology to path planning,
based largely on a paper by Ghrist and Battacharya. We include some information
about experiments done with DTRA funding, which the author wrote some (ulti-
mately unused) computer code (implementing the algorithm described by Ghrist
and Battacharya) for.
In Chapter 4, we give a proof of a novel stability result for persistent homology,
which yields the most commonly used application of the stability theorem of Cohen-
Steiner, Edelsbrunner, and Harer (Theorem 2.4.2) as a special case. Following a
suggestion by Baraniuk and Wakin [12], we pair it with the Johnson-Lindenstrauss
Lemma (Theorem 4.3.1) to show that random projection preserves homology of
point clouds with sufficiently high persistence, and has the potential to do better
than that if the points lie on a manifold. We include some directions for future
research.
In Chapter 5, we examine the implications of the dimension reductions of
Chapter 4 for the speed of computing persistent homology, with an emphasis on
computing the lower dimensional homology groups. We discuss the results of some
experiments done with various software packages.
10
Chapter 2: Topological preliminaries
This Chapter is devoted to an exposition of the topological results that form
the necessary background for the rest of this dissertation, in particular the field of
persistent homology. First we introduce simplicial homology, then we relate it to
point clouds (finite sets of points in Euclidean space). We then give a simplistic
formulation of ?naive? persistent homology, which will arise naturally from our
discussions on point clouds. We then mention two extensions of this formulation?
one that aids in the speed and ease of computation, and another that extends the
definition of persistent homology to a much more general class of objects than finite
point sets in Rd. This language will allow us to quote the stability theorem of
Cohen-Steiner, Edelsbrunner, and Harer. In Chapter 4 we will prove a theorem
which, in the case of point clouds in Rd, serves as a substantial generalization of
their stability theorem.
While none of the theorems or definitions we give in this Chapter are original
to this work, some of our exposition takes a perspective we have not seen before in
the literature?in particular, the counterexample described in Figure 2.10 is one the
author wishes he had seen.
11
2.1 Simplicial homology
Homology has an illustrious history; distant ancestors appear in 1758 [73],
where Euler first discussed the eponymous Euler characteristic, as well as 1857 [114],
where Riemann defined the genus of a surface and 1870 [18], when Betti first defined
the invariants of a surface that would later come to be called Betti numbers . It was
given the name ?homology? in 1895 by Poincare? in [111]. The modern formulation
of algebraic topology, the idea of a group of cycles modulo a group of boundaries,
was first given by Noether, between 1925 and 1928.1
Our discussion here follows the standard background material; we recall some
basic definitions and state some theorems for reference. For more details, any text
which provides an introduction to algebraic topology will give a more thorough
discussion than we do here, as well as contain more proofs. Hatcher?s Algebraic
Topology [83] has a very clear discussion, and has the further advantage of being
available free online.2 Other standard references include Armstrong [6], Bredon [24],
and, for the brave who appreciate a minimalistic text with full generality, May [96].
We take the classical perspective of homology as measuring the shape of a
space, in the sense of how many holes it has. The 0th homology measures how many
connected components an object has, the first measures how many non-trivial loops
it has and so forth. An d-dimensional sphere has a hole in the dth dimension (and
1She seems to have never written a paper focusing on this. However, in the summers of 1926
and 1927, she attended a series of lectures by Hopf and her discussions and comments there are
agreed to have been original and insightful. Her viewpoints were adopted by Hopf and others
shortly thereafter. See, e.g., [61] for more details of her contributions.
2It was also consulted frequently while writing this Section.
12
no other holes) so we intuitively expect homology to ?notice? this, and it does?the
homology of the d-sphere is trivial in all dimensions except the 0th and the dth, and
is non-trivial there. In fact, it is rank 1 in each of those, noticing a single connected
component and a single d-dimensional hole.
2.1.1 Simplicies and simplicial complexes
Our eventual goal is to perform large-scale homological computations. Of the
several approaches to homology, that of simplicial complexes is most sympatico with
this goal, so we forego the lure of Whitehead?s CW-complexes approach [126] in favor
of this more combinatorial language. Simplicial complexes were used by Eilenberg
and Zilber in [71] to introduce simplicial sets, which have been used to great effect
by Quillen [112], May [97], Bousfield and Kan [22], Segal [118], and many others.
Standard references include Massey [95], Armstrong [6], and Spanier [122] (which
we use here). Fomenko [75], while less well-known, covers similar material with a
visual focus, and has some very useful illustrations. Following in their footsteps, we
now define a simplex. The following definition is equivalent to that given in Spanier
(page 108) [122].3
Definition 2.1.1 (Abstract n-simplex; faces). An abstract n-simplex 4 is a set of
n + 1 elements, [v0, v1, . . . , vn]. These vi?s should be thought of and referred to as
points or vertices. If simplex ?1 is a subset of simplex ?2, then we say ?1 is a face of
?2. If the set of vertices is equipped with an order, we call it an oriented n-simplex.
3Although he does not refer to them as abstract.
4In some references these are called combinatorial n-simplices.
13
We consider two oriented simplices equivalent if the orders of their vertices differ
by an even permutation. If ? is an oriented simplex, we write ?? for the oriented
simplex with the same vertices as ?, with an opposite orientation.
Definition 2.1.2 (Abstract simplicial complex). An abstract simplicial complex is
a collection, ?, of abstract simplices with the property that any face of a simplex
in ? is itself also in ?.
Importantly, we do not insist that the vertices of the simplices which compose
the simplicial complex be distinct. (Indeed, if we did it would be impossible to have
any simplex with more than one vertex in the simplicial complex; its faces are, by
definition, not disjoint from it). For instance,
{{1, 2, 3}, {1, 2}, {2, 3}, {1, 3}, {2, 4}, {1, 4}, {1}, {2}, {3}, {4},?}
is an abstract simplicial complex?it is the smallest simplicial complex containing
the simplices {1, 2, 3}, {1, 4}, and {2, 4}. Equivalently, it is the closure of {{1, 2, 3},
{1, 4}, {2, 4}} under the operation of taking subsets.
These abstract objects are purely combinatorial but we are ultimately inter-
ested in working with topological spaces such as subsets of Rd. To connect the two,
we make the following definitions:
Definition 2.1.3 (Standard n-simplex; realizations of n-simplices).
a) The standard n-simplex is a subset of Rn+1; it is the convex hull of {~e1, . . . , ~en+1},
the standard unit vectors. These are, of course, the vertices of the simplex.
14
b) A realization of an abstract n-simplex is a subset of Rd which is the image
of the standard n-simplex under an affine map; the realization is considered
degenerate if it is not homeomorphic.5
In geometrically realizing simplicial complexes, just as in the above case of sim-
plices, we shall define a standard realization and then generalize it with appropriate
homeomorphisms.
Definition 2.1.4 (Realizations of simplicial complexes).
a) If ? is an abstract simplicial complex with n points, a standard realization
of ? is a subset D of Rn. It is uniquely determined by a bijection between
the points of ? and the standard unit vectors ~e1, ? ? ? , ~en in Rn. Given such a
bijection, a point x of Rn is in D if and only if it is in the convex hull of a set
of ~ei?s whose corresponding points form a simplex in ?; i.e., each simplex of
? is realized in a way consistent with the bijection.
b) More generally, if D is a standard realization of a simplicial complex ? then
K ? Rk is a (non-degenerate) realization if there exists a homeomorphism
f : D ? K such that the restriction of f to the portion of D corresponding
to any simplex of ? is an affine map.
From here on, when we refer to a realization, we mean a non-degenerate realiza-
tion unless otherwise specified. See Figure 2.1 for a simple example of a realization.
5The standard n-simplex is, of course, a (non-degenerate) realization of an abstract n-simplex
via the identity map.
15
Figure 2.1: A geometric realization of the simplicial complex generated by
{{1, 2, 3}, {1, 4}, {2, 4}}.
Simplicial complexes are quite easy to work with, and spaces that are suffi-
ciently similar to them deserve a name.
Definition 2.1.5 (Triangulable space). A space is called triangulable if it is home-
omorphic to the realization of some finite simplicial complex.
In particular, a single simplex is triangulable. As the n-sphere, Sn, is homeo-
morphic to the union of faces of an n-simplex, the n-sphere is triangulable.
2.1.2 Chain complexes
We now take these geometric, combinatorial objects and use them to generate
algebraic structures through the following definitions. The following definition is
equivalent to that given in Spanier (page 159) [122].
Definition 2.1.6 (Chain groups; chain complexes).
a) Given a simplicial complex, ?, the associated chain groups are, for every natu-
ral number i, an abelian group Ci which is the set of formal sums (with integer
16
coefficients unless otherwise specified) of oriented i-simplices of ?, subject to
the relation ? = ???. There is a map, ?n : Cn ? Cn?1, called the boundary
map. On the lev?el of an individual oriented simplex, ? = {v0, v1, . . . , vn}, we
define ? (?) = n in i=0(?1) {v0, . . . , v?i, . . . , vn} (where the hat means exclude
the corresponding vertex), and we define ? of a formal sum of simplices to be
the formal sum of ? of the individual simplices.
b) Piecing these together, we have the chain complex, which is the sequence of
abelian groups
? ? ? ? ???n?+1Cn+1 ? Cn ?
??n Cn?1 ? ? ? ? ?
?
C1 ??1 C ?
??00 0,
which has the property that ?n(?n+1) = 0 for all n?that is to say, the image
of ?n+1 lies within the kernel of ?n. In symbols, Im ?n+1 CKer ?n.
We are now in a position to define the homology groups of a chain complex,
and by extension, the homology groups of a simplicial complex.
Definition 2.1.7 (Homology groups). The nth homology group of a chain complex
X with coefficients in Z, usually written Hn(X,Z) is Ker ?n/Im ? 6n+1. We will call
the n-index the dimension of the homology group. If, in the construction of the
chain complex, we take formal sums with coefficients in an abelian group G other
than the integers, we write the result as Hn(X,G).
We will frequently take homology with coefficients in a field, in which case the
6Note that these are both subgroups of Cn.
17
chain complex will be a sequence of vector spaces connected by linear maps. When
it will not hamper clarity to do so, we will omit the coefficients and write only H(X).
We will use H?(X) to refer to the direct sum of the homology over all dimensions if
we wish to emphasize the direct sum; otherwise, we may omit the subscript entirely.
If X is any object with a canonical chain complex associated to it (e.g., a simplicial
complex), we will write H(X) for the homology of the chain complex associated to
X, rather than add unnecessary clutter by creating a separate symbol for the chain
complex itself.
Homology has a variety of very useful properties. It is a homotopy invariant:
any two simplicial complexes which are homotopy equivalent to each other have
the same homology groups. Because of this, we extend our definition of homology
to include any space which is homotopy equivalent to a simplicial complex. Most
importantly for our purposes, homology is a functor?that is, a map between spaces
induces a family of maps between the corresponding homology groups. The case
that will be of most interest to us will be the fact that if U ? X, the inclusion map
i : U ?? X will induce a map on homology, i? : H(U) ? H(X). Note that this
induced map is not guaranteed to be an inclusion. Indeed, it will usually not be;
by a theorem of Whitney [127] any smooth manifold can be embedded into Rd for a
sufficiently high d, which has trivial homology above dimension 0, but most smooth
manifolds have non-trivial higher homology groups.
18
2.2 Point clouds
Homology is the tool of our study, and real-world datasets are its object. In
particular, we would like to study the homology of spaces derived from finite sets
of points inside Rd, as many datasets have a somewhat natural interpretation into
this setting.
Definition 2.2.1 (Point cloud). A point cloud in dimension d is a finite subset of
Rd. We will denote a point cloud by X = {x1, x2, ..., xn}. We denote the open ball
centered at xi of radius r by Br(xi). We use Xr to denote the union of open balls
of radius r around each point in X.
We will frequently think of and refer to r as a time parameter?imagining the
balls expanding over time. Figure 2.2 shows an example of a simple point cloud and
two of its expansions.
In order to study point clouds with the tool of homology, we must find a way
to view them as simplicial complexes. To that end, we make the following two
definitions (see Croom [49] for more details):
Definition 2.2.2 (Good cover). A good cover of a topological space is a collection
of open sets (whose union is the whole space) such that each open set, each pairwise
intersection, each 3-way intersection, etc. are all contractible.
For any r, the space Xr has a canonical good cover {Br(xi)}. This cover is
good because balls are convex, intersections of convex sets are convex, and convex
sets are contractible.
19
Figure 2.2: An example of a 2-dimensional point cloud, X, (top) along with two of
its expansions, X.1 (bottom left) and X.5 (bottom right).
20
Figure 2.3: A C?ech complex at scale 1.01, generated by four points in the plane.
Definition 2.2.3 (C?ech complex). Given a topological space and an open cover of
it, the C?ech complex (also called the nerve) of the cover is the simplicial complex
formed by a 0-simplex for each element of the cover, and an n-simplex for each n-way
intersection in the obvious way. We will use C(Xr) to refer to the C?ech complex
associated with the canonical good cover of Xr.
An example C?ech complex generated from 4 points is shown in Figure 2.3.
We will occasionally refer to the C?ech complex associated to X??this is the
maximal C?ech complex, given by a simplex for every subset of the vertices. As a
topological space, it is contractible.
Importantly, the Nerve Theorem is true:
Theorem 2.2.4 (Nerve Theorem, Borsuk [21]). Given a subset of Rd, X, and a
21
Figure 2.4: Here we see two interpretations of Figure 2.3. By the Nerve Theorem
(Theorem 2.2.4), the topological space X1.01 (top) and the C?ech complex generated
by its natural good cover (bottom) are homotopy-equivalent. In this case, both are
equivalent to S1, the one-dimensional circle.
good cover {Ui} of X, the C?ech complex of {Ui} is homotopy equivalent to X.7
We can geometrically realize the nerve of Xr with X, and a straight line be-
tween each pair of points of X within 2r of each other, triangles filling in appropriate
triples of lines, etc. (See Figure 2.4).
Note that if r1 and r2 are positive real numbers with r1 < r2, there is a
natural inclusion map i : Xr1 ?? Xr2 . By functoriality, this induces a map on
homology, i? : H(Xr1) ? H(Xr2). Indeed, the parameterized family of spaces, Xr,
yields a parameterized family of simplicial complexes, each homotopy equivalent to
the corresponding space, which in turn yields a parameterized family of homology
groups. This parameterized family might be constant for some long range of r?s.
We thus introduce the following definition:
Definition 2.2.5 (Stable homology). 8 Let X be a point cloud. If, for all s, t with
7This is in fact true for most topological spaces one might run into in practice, although there
are some pathological counterexamples within point-set topology that prevent us from asserting it
for all topological spaces.
8We note there is a different sense of this phrase in the literature which usually applies to the
22
r1 < s < t < r2, the inclusion i : Xs ? Xt is a homotopy equivalence, we say X has
stable homology from r1 to r2.
Point clouds have only a finite number of radii at which the homology might
change. If X = {x1, x2, ? ? ? xn} is a point cloud, then for any I ? {1, 2, ? ? ? , n} define
?
dI := inf{r ? R+| Br(xi) =6 ?}.
i?I
Thus, dI is the smallest distance for which there exists a point (of our ambient
space) which is within dI of every point of the subset of X indexed by I. We can
order all the dI ?s:j
0 ? dI1 ? dI2 ? ? ? ? ? dI n .2
X will have stable homology from dI to dI . The C?ech complex generated ati i+1
each of the radii between consecutive dI ?s will, in fact, be identical. X may have
stable homology for a longer range than that, but the C?ech complexes will not be
identical?see Figure 2.5.
The values where the homology changes are worth naming. It is easiest to
define the intervals where homology does not change and then name their comple-
ments. For reasons that will become apparent later, we focus on the function which
gives the distance from any point to our point cloud.
Definition 2.2.6 (Homological regular value; homological critical value). 9
homology of groups rather than spaces. In this dissertation we shall never take the homology of a
group nor use this sense of ?stable,? so this should result in no confusion.
9The definition which follows is due to Bubenik and Scott. An earlier definition for the term
?homological critical value? was due to Cohen-Steiner, Edelsbrunner, and Harer [45]. However,
23
Figure 2.5: A point cloud X and the C?ech complexes induced by the natural good
covers of X.99 (left) and X1.01 (right). While the C?ech complex changes, the homol-
ogy does not. In both cases, the induced complex is contractible.
a) A real number a is a homological regular value of a function f : Rd ?
R if there exists an open interval I which contains a such that, for every
r1, r2 with [r1, r2] ? I, the map induced by inclusion i? : H(f?1[??, r1]) ?
H(f?1[??, r2]) is an isomorphism.
b) Any value of f which is not a homological regular value we will call a homo-
logical critical value. We will say r is a homological critical value of a point
cloud X if it is a homological critical value of the function which measures
distance to the nearest point of X.
To restate what we said earlier with our new language, all of the homological
critical values of a point cloud X occur as one of the dI ?s. The converse is not
true?there exists dI ?s which are not homological critical values.
We can use this idea to translate the time progression, which is continuous, to
in [28], Bubenik and Scott give a counterexample (which is further explored by Govc in [80]) to
a theorem in Cohen-Steiner et al.?s paper, and suggest a minor correction to the definition under
which the original theorem holds. In the case we are concerned with, that of persistent homology
of finite point clouds, the two definitions are equivalent.
24
a discrete structure:
Definition 2.2.7 (Filtration). Given a simplicial complex, ?, an expanding nested
sequence of subcomplexes is called a filtration of ?. Thus we may write
?0 ? ?1 ? ? ? ? ? ?n?1 ? ?n = ?
for a filtration of ?.
A few things to note about the above definition. The containments are not
necessarily proper??i could well be equal to ?i?1 or to ?i+1, or to both. A filtration
may begin with the empty set, or indeed with several copies of the empty set. If
? ? ?, the above filtration of ? induces a filtration of ?:
(?0 ? ?) ? (?1 ? ?) ? ? ? ? ? (?n?1 ? ?) ? (?n ? ?) = ?.
Any point cloud X has a natural filtration of the fully connected C?ech complex
associated to X?, given by taking a C?ech complex between each homological critical
value.
It is reasonable to think of the parameter describing which step of the filtration
we are referring to as a time parameter, and we shall do so. This is compatible with
the notion of the subscript of Xt as a time parameter, in the sense that they both
occur in the same order. It is, however, not identical to it as the filtration time
parameter is always a positive integer, while the radius time parameter could be
any positive real. We will be explicit in pointing out when we switch from one to
25
the other to avoid confusion.
2.3 Naive persistent homology
Let us review for a moment what we have done. As this list of steps will serve
as a template for further refinements, we include some passive steps.
1. We began with a point cloud, which we left alone but will discuss modifying
in Chapter 5.
2. From this point cloud we obtained a filtration of a simplicial complex.
3. We then had a parameterized family of simplicial complexes. Again, we will
discuss modifying this in Chapter 5.
4. By taking the homology, we obtained a parameterized family of homology
groups.
5. We then had a parameterized family of homology groups, which we will want
to display in some easily analyzable format.
Why might we have done this? Frequently, we are in a situation where we
have a point cloud that represents a sample of some important data which we have
good reason to believe all lies on some manifold or algebraic variety (or finite union
thereof). Ideally, we would be able to derive knowledge of the exact manifold from
the point cloud, but this is obviously impossible?there are an infinite number of
C? manifolds which contain any given finite collection of points. But as long as the
26
embedding of the manifold is not too sharply angled, and our point cloud represents
a sufficiently large and evenly distributed sample of points, we can still expect to
calculate the major features of the manifold with high probability. In particular, if
X has stable homology from a to b, with a << b, we might well conclude that X
?really looks like? X(b?a)/2. There are a number of difficulties with this. One is that
any point cloud will have stable homology from a to ? for any a greater than the
diameter of the point cloud?specifically, the Xr?s will be contractible. Thus, we
can?t just choose the largest interval with stable homology?the homology on that
interval is trivial.
Another difficulty is a bit more subtle. Consider a subset of the cylinder
x2 + y2 = 4, 0 ? z ? 2 consisting of finitely many vertically oriented circles with
radii between .1 and 1, each tangent to at least one circle whose center is clockwise
of it and one whose center is counterclockwise.10 Now let X be a random sample
of many points from this space (See Figure 2.6). Xr is contractible for r > 2, and
X has stable homology from 1 to 2 (and is homotopy equivalent to a circle in that
range). But between 0 and 1, the homology of X can fluctuate wildly. Despite this,
there should be an element of the homology of X whose representative is also a
non-zero element of H(X1.5), even for very small . A piece of the first homology is
quite stable, even if the first homology group as a whole is not.
Both of the problems can be dealt with by focusing on the generators of ho-
mology instead of the entire collection of groups. This method for trying to extract
these individual features of the structure underlying the data is called persistent
10Geometrically, these will be slight deformations of circles, due to the curvature of the cylinder.
27
Figure 2.6: An image of the point set described in the text. A point cloud with a
generator of homology more stable than its homology groups.
28
homology.
2.3.1 Birth-death matching
Suppose we have a parameterized family of homology groups, each mapping
into the next. If we are given an element h of the homology at time t, h ? Ht(X), the
length of time it will survive is well defined?its image under the time progression
map, Ht(X)? Ht+s(X) will either be zero or not. Its ?age,? the amount of time it
existed before time t is, however, not well defined. If the preimage under the time
progression map is empty, then it clearly didn?t exist earlier, and if the preimage is
unique it clearly did, but what about a preimage with multiple elements?
Unfortunately, if we are working with coefficients in Z, things are pretty dire
for reasons we will return to. If, however, we are working with coefficients in a field
F , the situation is much more tractable.
We mentioned earlier that we will be explicit when switching from using ?time?
to refer to the real-valued, radius-like parameter to using it instead for the discrete,
integer-valued filtration parameter. Here is such a place. In the two paragraphs
above, we intended time to refer to the former, and below, we will use it for the
latter. Given an n-stage filtration of a d-dimensional simplicial complex, for which
we use upper indices to avoid confusion with the lower indices indicating dimension,
?0 ?? ?1 ?? ? ? ? ?? ?n, the associated chain complexes have the structure shown
in Figure 2.7.
Consider the direct sum of all the homology groups that result with coefficients
29
C0 ?d C
0
d?1 ? ? ?
? C0 ? 0 ?1 C0 0
C1 ?d C
1 ? 1 ? 1 ?
d?1 ? ? ? C1 C0 0
.
.. .
. .
. .. .
.
. .
.
.
Cn ? Cn ? n ? n ?d d?1 ? ? ? C1 C0 0.
Figure 2.7: Here we show a diagram of the chain groups associated to an n-stage
filtration of a d-dimensional simplicial complex. By the nature of a filtration, all the
vertical maps are simple inclusions.
in a field F . The homology groups are vector spaces over F . The diagram above
commutes, as all the vertical maps are simple inclusions, and the ? maps do not
change due to these inclusions. Thus the vertical ?time progression? maps are well-
defined on homology.
This direct sum, together with the time-progression maps, thus contains all
the information of all the homology groups at every time. We therefore name it.
Definition 2.3.1 (Persistence module). Given a filtration of a simplicial complex
and a field F , the persistence module associated to the filtration is an F [x] module.
It is the direct sum of the homology groups with coefficients in F of all the spaces in
the filtration, with the usual F -action of a vector space. The action of x is moving
to the next step in the filtration.
30
As F is a field, F [x] is an integral domain in which every ideal can be generated
by a single element. Such rings are called principal ideal domains, or PIDs. It would
take us too far afield to go deeply into the theory of PIDs, but suffice it to say there
is a well-understood theory of how to classify finitely generated modules over a
PID.11 There is essentially only one way of decomposing this module?thus, there
is a preferred basis for the module. This preferred basis gives us a canonical pairing
of birth/death times for each generator of the homology. This is precisely why the
situation is worse with integer coordinates, as we mentioned earlier?the ring Z[x]
is not a PID, so the modules over it don?t break down in the same easily classifiable
way. Attempts to develop a satisfying theory of multiparameter persistent homology
have encountered difficulties for a similar reason: F[x, y] is also not a PID.
In our much more easily classifiable case, we have the following:
Theorem 2.3.2 (Carlsson and Zomorodian [128]). A finitely generated persistence
module over a field, F , will decompose as
? ?
( xti ? F [x])? ( xrj ? (F [x]/(xsj ? F [x]))),
i j
where the i-indexed terms, which are free F [x]-modules, represent pieces of the per-
sistence module that begin to exist at the tthi time step, and never stop existing. The
j-indexed terms are those which begin to exist at time rj and which then stop existing
at time rj + sj.
11See, for instance, Chapter 12 of Dummit and Foote [63], or any other first-year text on
graduate algebra.
31
Note that, as our point clouds are finite sets, all our persistence modules auto-
matically satisfy the finitely generated condition. Earlier we mentioned a preferred
basis. Note that if not all ti or rj, sj pairs are unique, the preferred basis will not
be either. In this case, we may pick any of the preferred bases, and consider its
elements as the ?generators? we mentioned earlier.
After taking advantage of the discrete structure of the time steps, we can now
translate our integer-valued, discrete, ordinal times back into continuous, real valued
times. A note of caution: there is some inconsistency in the literature and in the
use of computer programs, with some sources translating the discrete times into
radius-like times, and others translating them into diameter-like times, which is a
natural thing to do when using the Rips complex, discussed later in this Chapter.
For the remainder of this dissertation, we shall use diameter-like times, as that
seems to be most consistent with the authors and software packages. As a result,
the following definition is not uniform throughout the literature.
Definition 2.3.3 (Birth time, death time). A generator of the persistence module
of X which first appears as an element of H(Xt /2), and has an image which is firstb
0 in H(Xt /2) is said to have birth time tb and death time td.d
The generators of homology now have well-defined times at which we can con-
sider them to have died or been born. We can graphically represent this information
in a few ways. The two most common are a persistence diagram or a barcode.
Definition 2.3.4 (Persistence diagram). A persistence diagram for dimension d is
32
a subset12 of R2 which consists of a point for every generator of the d-dimensional
homology. The x-coordinate of the point represents the t value at which the ho-
mology generator was born, and the y-coordinate represents the t value at which it
dies. It is convenient to count the line x = y as also being part of the diagram.13
Note that all points in a persistence diagram lie above the line x = y, as
everything must be born before it dies. While we may occasionally refer merely to
?a persistence diagram,? omitting the identification of dimension, it is important
to remember that a persistence diagram contains information about only a single
dimension. Figure 2.8 displays a point cloud and the associated persistence diagrams
for dimensions 0 and 1.
Another way of presenting the information contained in a persistence diagram
is the barcode: a multiset of intervals, one interval for each generator of homology,
with endpoints representing the birth and death times. Just as in the case of persis-
tence diagrams, we consider the barcodes for each dimension to be seperate objects.
Figure 2.9 shows a point cloud and associated bar code.
How can we compare two persistence diagrams? One method is to look at
all possible bijections between the points of the two, and find the bijection which
moves the most-moved point the least. The distance this least-moved point moves
is called the bottleneck distance.
Definition 2.3.5 (Bottleneck distance). Given two persistence diagrams, P1 and
12Technically a multisubset, as we allow points to have multiplicity greater than one.
13And indeed to consider all points on it as having infinite multiplicity.
33
Figure 2.8: Top, we have the persistence diagrams for dimension 0 and 1, which
display the persistent homology of a point cloud. As with any point cloud, the
dimension 0 homology, shown here as blue points, is all born at time 0, and has a
single point which never dies (displayed as a death time of 2 here, when the computer
stopped sampling). The two high off-diagonal points of H1 are suggestive of a point
cloud with 2 large loops. Bottom shows the point cloud, which consists of 100 points
picked at random from a figure-8, with 2 large loops as predicted.
34
Figure 2.9: Top shows a cloud of 100 points picked at random from a figure-8. Bot-
tom is the bar code displaying the persistent homology (calculated with 37 divisions)
of the point cloud. The two long bars in dimension 1 correspond to the two circles
of the figure-8.
35
P2, the bottleneck distance (denoted DB) between them is
DB(P1, P2) := inf sup ?x? ?(x)??,
? x
where ? ranges over all bijections between P1 and P2, and x ranges over all points
in P1.
As calculating the bottleneck distance involves quantifying over all possible
bijections, it can be quite unwieldy. Thus we make another definition:
Definition 2.3.6 (Hausdorff distance). Given two persistence diagrams, P1 and P2,
the Hausdorff distance (denoted DH(P1, P2)) between them is
DH(P1, P2) := max{sup inf ?x? y??, sup inf ?y ? x??},
x y y x
where x ranges over all points in P1 and y ranges over all points in P2.
Note that the Hausdorff distance can never be greater than the bottleneck
distance; one of the options the Hausdorff distance is taking the infimum over is the
matched point ?(x) in the bottleneck distance.
It may aid clarity to translate these distance considerations back into the
language of barcodes. If we are attempting to compare two barcodes, one method is
to put every bar of the first barcode close to whichever bar of the second barcode it
most resembles, (in the sense of endpoint-location), and then find the worst match
(the element of the first barcode whose left endpoint is farthest from its matches
left endpoint, or whose right endpoint is farthest from its right endpoint). This
36
corresponds to the Hausdorff distance between persistence diagrams. If each bar
may only correspond to a single other bar, this added restriction corresponds to
bottleneck distance. In either case, we allow any bar to be matched with the ?empty?
bar whose birth and death are both the midpoint of the non-empty bar. In a
persistence diagram, this corresponds with matching a point off the diagonal with
one on the diagonal. The closest diagonal point will be obtainable by decreasing the
death time by half of the total duration and increasing the birth time by the same
amount.
2.4 Advanced persistent homology
Persistent homology, as we have introduced it above, is a useful tool with some
major limitations. One major limitation is that it can only be used to understand
finite point sets; another is that it is very computationally expensive. In this Sec-
tion we take a different perspective, one which allows calculation of the persistence
of other objects. We also look at some alternative routes to the computation of
persistent homology.
2.4.1 Replacing point clouds with height functions
While the construction we have described above is sufficient for most real-
world applications, the full framework of persistent homology is usually approached
differently. Instead of trying to find the ?shape? of a finite set of points embedded
in Rd, we instead try to find the ?shape? of a real-valued function. If we allow an
37
arbitrary function, some pathologies will be problematic. We therefore restrict our
attention to a class of functions for which many of our previous statements hold, and
note that (on a bounded set) most functions the reader will come across in nature
are sufficiently well-behaved.
Definition 2.4.1 (Tame function). A real-valued function f on a topological space
is called tame if it satisfies the following two properties:
1. f has a finite number of homological critical points
2. Every homology group of every sublevel set of f , that is, a set of the form
f?1((??, a]) for some real a, is finitely generated.
Now we will discuss taking the persistent homology of a tame function. Just
as before, we have a parameterized family of spaces (Xr in the naive construction,
the inverse image f?1([??, r]) in the more advanced) and we can ask for what
ranges of values of the parameter are the generators of the homology of the space
stable. The naive construction is a special case of this more subtle framework; if
the function we are investigating is ?distance from a point to the set X?, we obtain
the naive construction of persistent homology. The two tameness conditions allow
us to again make use of the fundamental theorem of finitely generated modules over
a PID.
From a computational perspective, we don?t have a general method for com-
puting the inverse image of an arbitrary tame function. If our space is finite, we
can check each point individually, but Rd, which is a frequent domain of interest, is
38
not finite, nor even bounded. Arbitrarily large subsets are, however, triangulable,
which makes Rd somewhat amenable to brute-force algorithms.
We might hope that persistent homology is stable under small perturbations,
and indeed it is:
Theorem 2.4.2 (Stability theorem of Cohen-Steiner, Edelsbrunner, and Harer,
2005 [45]). Let X be a triangulable space with continuous tame functions f, g : X?
R. Then the bottleneck distance between the persistence diagrams DB(f, g) is at
most twice the L? distance14 between f and g:
dB(D(f), D(g)) ? 2?f ? g??.
The factor of 2 is not present in the original paper, as the authors are using
a radius-like scale in their persistence diagrams, unlike our diameter-like scale. In
the case of distance from a finite point set, we can rephrase this in terms of the bar
code and obtain:
Corollary 2.4.3. Suppose ? : X ? X ? is a bijection of point clouds, with d(x, ?(x)) <
 for all x ? X. Then the persistence bar code of X ? is identical to that of X, with
2 differences. First, the endpoints of each element of the barcode might move by 2
at either end (and thus any element of X?s barcode which persisted for less than 4
might die). Second, an arbitrary number of new elements may be added, although
14Recall the L? distance between two continuous real-valued functions is the supremum (taken
over all inputs) of their difference.
39
each will be of size less than 4.
Proof. Let f be the function whose output at any point is the distance from that
point to the nearest point of X, and let g be the equivalent function for X ?. Per-
turbing each point in X by at most  perturbs the distance from a set by at most
, so ?f ? g?? ? . Thus, by Theorem 2.4.2, dB(D(f), D(g)) ? 2. By the way
the barcode was defined, each point in D(f) corresponds to an interval in the bar-
code, whose endpoints are determined by the coordinates of the point. As each
point in D(f) is moving each of its coordinates by less than 2, the movement of
each interval?s endpoints is also bounded by 2. Points on the diagonal of D(f),
which represented empty intervals, may move off the diagonal but only by 2 in each
coordinate, so new intervals may be created, but will be less than 4 in length.
While the stability theorem above is quite strong, we might conjecture an
even stronger version. Are there any added conditions which might guarantee, not
only that each bar in the barcode is stable, but also that no new bars are added?
If, for instance, our function is the distance in Rd to a finite point set, then there
are ranges in which we know H0 will not change?for instance, the range from 0
to the minimum pairwise distance between the points. If all of the homology is
stable in some range, are we guaranteed that an -perturbation of the points will
not introduce any new bars into the barcode? Alas, the following example shows
that we do not have this guarantee?the stability result above is, in some sense,
sharp.
Consider three points, {(?1, 0), (0, 1), (1, 0)} at the vertices of an isosceles
40
triangle. This collection of points has stable homology as t ranges between 0 and
?
2. In that range, the homology is just that of three points. For values of r between
?
2 and ?, the homology is that of a single point. But if we shift the top point
upwards by , and consider {(?1, 0), (0, 1+), (1, 0)}, the situation changes. We still
? ?
have stability from 0 to 2 + ?, but we do NOT have stability from 2 + ? to ?.
At t = 2, an element of H1 flashes briefly into existence, hanging around just long
enough to disprove our conjecture before vanishing at t = 2 +  (see Figure 2.10).
2.4.2 Alternate complexes and ease of computation
As a practical matter, how do we compute persistent homology? Once we have
a filtration of simplicial complexes, taking the homology is a relatively straightfor-
ward task. Obtaining the filtration from the function we wish to analyze, however,
is harder. In the fully general case, we would need to be able to determine the
homology of arbitrary sublevel sets of a very general class of function. To avoid
this, let us once again restrict our attention to point clouds, and consider the ways
a simplicial complex may be generated from a point cloud and a scale parameter.
While the C?ech complex described in 2.2.3 earlier is, in some theoretical sense,
the only ?correct? way to compute persistent homology, it has a major downside: it
is extremely computationally expensive to create and store. Fortunately, there are a
number of other methods for obtaining simplicial complexes from a point cloud which
can approximate the C?ech complex. These include the Vietoris-Rips complex [84],
the Delaunay complex [16], the Alpha complex [64], the Witness complex [55], and
41
Figure 2.10: A small perturbation of a point cloud can result in a new element of the
bar code. In the initial complex, the C?ech complex at scale 1 is contractible (top)
and will remain so at scale 1.001 (bottom left). However, if the top point is shifted
up by , there will be a scale at which the C?ech complex has non-trivial homology
(bottom right).
42
many others. We describe a small sampling of the many possibilities here.
2.4.2.1 Clique complexes and Rips complexes
One of the more computationally difficult parts of computing a C?ech complex
is checking every subset of X which has cardinality k + 1 to see if there is a point
within r of all of them. Although there is an algorithm to find the radius of the
bounding sphere for a finite set of points which is linear in the number of points,
the C?ech complex requires calculating this for every subset of our point cloud, and
then storing the resulting (possibly very many) simplices. Clique complexes provide
a way around this issue.
Definition 2.4.4 (Clique complexes). A clique complex is a simplicial complex
which is maximal for a given set of edges. That is to say, if {x1, ? ? ? , xk} is a
collection of points which do not form a simplex, then there is some pair {xi, xj}
which does not form an edge.
A set of edges uniquely determines a clique complex. Unsurprisingly, there is
a name for the clique complex determined by the edges of a C?ech complex.
Definition 2.4.5 (Vietoris-Rips complex). The Vietoris-Rips complex of a point
cloud X at scale r is the clique complex of the 1-skeleton of the C?ech complex at
the radius r.
The Vietoris-Rips complex is much easier to compute and store than the C?ech
complex. All the information needed to construct it is stored in the matrix of
pairwise distances, and no additional computations need to be performed to detect
43
if a given set of points form a simplex at a given scale. Moreover, the Vietoris-Rips
complex is related to the C?ech complex by the following theorem of De Silva and
Ghrist [56]:
Theorem 2.4.6. Let X be a point cloud in Rd and Cr(X) (resp. V Rr) denote the
C?ech (resp. Vietoris-Rips) complex of the natural good cover of Xr. ?Then there is
a chain of inclusions V Rr?(X) ? Cr(X) ? V Rr(X) whenever r/r? ? 2d/(d+ 1).
Moreover, this ratio is the smallest for which the inclusions hold in general.
So, if we have the persistent homology of the Rips complex of X, any element
that persists long enough corresponds to an element of the persistent homology of
the C?ech complex. Theorem 2.4.6 also guarantees that any element of the persistent
homology of the C?ech complex with sufficiently high persistence will show up as an
element of the Vietoris-Rips complex.
We might have hoped that either every element of the persistent homology of
the Vietoris-Rips complex corresponds to an element of the C?ech complex, or vice
versa, but the following examples show that is not the case.
Let X be the vertices of an equilateral triangle with side length 1. Then
the C?ech complex of X contains an element of H1(X) that begins at t = 1 and
?
persists until dying at t = 2/ 3. But the Vietoris-Rips complex never displays any
nontrivial H1.
Conversely, consider a regular polygon in the plane with 2n+ 2 vertices, with
r such that every pair of points is connected if and only if they are not antipodal
(i.e., r is just shy of including the center). The Vietoris-Rips complex of this has a
44
nontrivial element of homology of dimension n, despite the planarity of the figure!
Recall that in the C?ech complex case, we constructed a counterexample to
the idea that, for a sufficiently small perturbation, no new elements of the barcode
would appear (see Figure 2.10). Perhaps we hope that, while we do not have this
type of stability for the C?ech complex, we may have it for the Rips complex. After
all, the above complex was a triangle, so the associated Rips complex will never
have non-trivial homology above dimension 0. Alas, this also fails. Consider a set
of four points arranged in a square with side length 1. The associated Rips complex
has the homology of four points for 0 < t < 1, then has the homology of S1 in the
? ?
range 1 < t < 2. Past t = 2, the homology is once again trivial. If we perturb
the top right corner, letting it swing down in a circle around the bottom right corner
(maintaining a constant distance of 2 between the two points), then at some point
the homology vanishes.
Despite these many deficiencies, the ease-of-use of the Rips complex, together
with the theoretical guarantee given by Theorem 2.4.6, makes it the complex of
choice for almost all practical applications. Every piece of software to compute
persistent homology the author is aware of defaults to using the Rips complex. As
a result of this, we will focus our study of stability in Chapter 4 on the homology of
the Rips complex derived from a point cloud. This is also the root of our decision
to use diameter-like information rather than radius-like.
45
2.4.2.2 Delaunay complexes
Unlike the C?ech or Rips complexes above, the Delaunay complex does not
have a scale parameter. It is entirely determined by the point cloud X.
Definition 2.4.7 (Voronoi cells; Delaunay complex). Given a point cloud X ? Rd,
we will decompose the ambient space into regions with one region for each point of
X. The region corresponding to xi consists of all the points of Rd which are closer
to xi than to any other point of X. These regions are called the Voronoi cells of
the decomposition. The Delaunay complex of X is given by a 0-simplex for each
Voronoi cell, a 1-simplex for each pairwise intersection of Voronoi cells, etc.
See Figure 2.11 for an example of a point cloud and the Voronoi cells of its
decomposition.
The Delaunay complex for a point cloud in Rd will be contractible, making it
less than ideal for topological analysis.
2.4.2.3 Alpha complexes
The alpha complex is, in a precise way, a compromise between the Delaunay
and C?ech complexes.15? Recall, the C?ech complex at scale r is the nerve of the space
x?X Br(x) with respect to the obvious cover. We can obtain an alternate cover
to the space by intersecting each Br(x) with its respective Voronoi cell. This will
give us a good cover of our space, so the nerve, called the alpha complex, will be
15Alpha complexes were first introduced by Edelsbrunner in the early 1980?s [67, 70], and are
thus arguably a part of the inspiration for persistent homology in general.
46
Figure 2.11: A small point cloud (points in blue) and the Voronoi decomposition
of R2 it induces. The corresponding Delaunay complex will have a point for every
region (happily, this means a point for every blue point), an edge for every boundary
between regions, and a 2-simplex for every triple intersection, shown here as orange
points.
homotopy equivalent to the original space, retaining the best feature of the C?ech
complex. Moreover, unlike the C?ech complex, the alpha complex will, in general,
have no simplices of dimension higher than the ambient dimension of the space, Rd,
which simplifies computations considerably. In contrast, the C?ech complex can have
simplices of dimension up to |X| ? 1, which is huge! And worse is the fact that,
by Theorem 2.2.4, these simplices of the C?ech complex do not add anything to the
eventual homology calculation.
In low-dimensional situations, the alpha complex is an excellent choice. As
the dimension increases, however, it almost immediately becomes an impractical
tool. As an informal explanation, the problem is that the number of hyperplanes
bounding each cell goes up swiftly, resulting in an exponent of roughly dd/2e for
47
many computationally relevant quantities such as the number of vertices in the
Voronoi decomposition, the number of simplices in the Delaunay complex, and many
others. But for 2 and 3 dimensions, alpha complexes remain a natural choice.
2.5 Summary
In this Chapter, we have established the basic framework of homology and
point clouds; we have discussed the application of homology to measuring the shape
of point clouds by persistent homology; we have extended the framework of per-
sistent homology to analyze objects other than point clouds; and we have briefly
mentioned some intermediate objects which can replace C?ech complexes.
In Chapter 4, we will take advantage of both the stability result stated earlier,
and some of the alternate complexes proposed, to find alternate methods of speeding
up the computation of persistent homology.
First, however, we shall use Chapter 3 to take a detour into an application
of persistent homology to path-planning, suggested by Bhattacharya, Ghrist, and
Kumar [20]. While we will present no original theorems in the upcoming Chapter,
we believe that some of the insights gained in the course of implementing their
algorithm may prove useful to others.
48
Chapter 3: Applications of persistent homology to path planning
In this Chapter we will discuss the use of unmanned robots with radiological
sensors to find radioactive materials in unfamiliar environments. The first several
Sections lay out the engineering framework of radiological detection, and discuss
some proof-of-concept experiments which have been run. In Section 3.4.3 we discuss
the implementation of an algorithm of Bhattacharya, Ghrist, and Kumar which
applies concepts from persistent homology to path planning, and some insights from
working with this in a real-world context.1
3.1 Introduction
The use of unmanned aircraft to assist in radiological surveys, both pre- and
post-detonation, remains an emerging technology coincident with the evolution of
unmanned aircraft and payloads for sensing and sampling. Some of the past novel
work includes development of a tethered nuclear materials sampling payload, a spa-
tially variant deconvolution method for source localization, and collaborative Un-
1This Chapter is a lightly edited and expanded version of a paper [90] I coauthored with
Wojciech Czaja and Weilin Li at University of Maryland; Kevin Kochersberger, John Peterson,
Prashant Kumar, and John Bird at Virginia Tech; and Morgan McLean at the Remote Sensing Lab.
The research was supported by the Defense Threat Reduction Agency Basic Research Program. A
different perspective on some of the experiments described herein is given by my colleague Weilin
Li in [92].
49
manned Aerial Vehicle (UAV) Unmanned Ground Vehicle (UGV) sensing and local-
ization to safely map source distributions. As threats continue to emerge and the ac-
ceptance of unmanned aircraft grows, an expanded portfolio of uses?including event
clearing, intelligent search, post-detonation analysis, and tiered search strategies?is
being proven.
This Chapter reports on work at Virginia Tech in pre- and post-detonation
data collects with UAVs ranging from 10 kg hexacopters to 90 kg single-rotor he-
licopters. We have shown the integration of UGVs into a measurement system
designed to reduce risk to first responders while generating accurate analytics of
chemistry and location when mapping radiological distributions. Results are pre-
sented for several experiments that define use-cases of interest to first responders.
3.2 Motivation
The use of unmanned systems, both UAVs and UGVs, to assist in the search
of nuclear materials has been an outgrowth of applications in disaster response,
with significant investment after 9/11. Despite the advantages of autonomous un-
manned systems, the transition to full autonomy has been slower than with other
Unmanned Aerial System (UAS) applications, such as 3D mapping. This is due in
part to a conservative response community that has shown reluctance in the past to
embrace the full benefits of autonomy in support of human-machine teaming. More
recently, acceptance of perception and autonomy in nuclear materials identification
and localization has accelerated, and their useful applications are growing.
50
Scalable and complementary unmanned systems are a focus of research to
integrate small autonomous systems into existing search methods that include large
manned aircraft as well as backpack-carried detectors. Some of the mission protocols
include initial assessment of a post-detonation environment, localization of sources,
3D data collection, and the expanded use of sensing in the non-visible EM bands.
3.3 Radiological isotope measurement and modeling
After collection of aerial radiological measurements and terrain imagery, there
is a challenge in finding the ground-level radioactive distribution which has numerous
solution strategies. Kochersberger, et. al [43], showed that a spatially variant point
spread function could be employed in deconvolution to map the location of point
sources using raw gamma count data. Figure 3.1 shows the result of localizing
two closely spaced sources from a 60 m overhead scan using a Yamaha RMAX
autonomous helicopter. The stronger source obscures the location of the weaker
source in the summed gamma counts shown on the left.
The use of summed gamma counts is a first-order method of localization.
Incorporating spectral and visual data has the potential to identify sources with an
intensity nearly equal to the background radiation level. Computational harmonic
analysis has been applied to support dimension reduction, multiscale representation,
and multi-modal data fusion resulting in robust spectral anomaly detection. Sources
that would normally be buried in background radiation signatures are extracted with
these methods.
51
Figure 3.1: Deconvolution method of source localization. The presence of a weaker
source is hidden in the raw count data shown on the left, but is revealed when
deconvolution is applied, shown on the right.
Laplacian Eigenmaps (LE) and Schro?dinger Eigenmaps (SE) have already been
successfully used to perform, respectively, unsupervised and supervised clustering of
radiological data [53]. An example of this is seen in Figure 3.2, where the application
of the Eigenmap nonlinear dimension reduction reveals relatively subtle spectral
signatures without the need for a human spectroscopist. Class 5 in the Figure is
determined to be significant despite the fact that the gross count levels were the
same as background; small peaks in the low energy region are discerned from the
unsupervised method indicating the unexpected presence of Cs-137.
Moreover, preliminary results have shown that the Fourier Scattering Trans-
form (FST), a hybrid Fourier and neural network transform that was developed in
Czaja and Li [51,52] is effective at locating relatively weak sources. Figure 3.3 shows
a plot of predicted radiological anomaly source location using FST.
52
Figure 3.2: An aerial scan conducted by Virginia Tech with a 2? x 2? Nal
scintillation-type detector over a dispersion site of Br-82 using conventional ex-
plosive.
Figure 3.3: Spectral data showing source prediction using FST. The black points
are sample points and the red diamonds are true source locations. Half of the data
was used as a training set for the FST coefficients while the other half was used for
test.
53
3.4 Multi-agent search methodology
Operating a multi-agent search mission has benefits in complex environments
where air and ground systems provide complementary data for analysis. Aerial
search can be employed in hard-to-traverse areas, and terrain classification is sim-
plified using nadir aerial imagery. Ground robotic systems can be directed to areas
of interest based on classified traversability from the aerial imagery, and they have
the capability of long dwell data collection not possible using UAVs. Combined,
this data allows the robotic system to answer the question: what does the classified
environment and the radiation map tell us about the next place to search? Fur-
thermore, generalizing a search mission to include human agents offers additional
flexibility to the autonomous system while improving overall search efficiency.
Our approach to a multi-agent search and localization strategy is subdivided
into three main tasks: Data collection, initial radioactive source location/distribution
estimation, and final estimation. The following Sections will describe different data
collection hardware and approaches for source estimation.
3.4.1 Data collection
We have performed several experiments, listed in Table 3.4.1. Here we give
some details of the UAVs which which flew the experiments.
Methods of aerial data collection range from use of the Yamaha RMAX au-
tonomous helicopter carrying a 3? x 9? NaI scintillation-type detector, to a custom-
designed hexacopter that carries a 2? x 8? crystal or 2? x 2? crystal plus a Cadmium
54
Date Location Experiment Description
1/18 SRNL Aerial scan, identification of sources, classification,
and secondary ground robotic and aerial long-
dwell data collection
4/18 INL Aerial scan over dispersion site of Br-82. Data
is dense enough to create accurate plume maps.
Dimension reduction is used to find weak isotopes
7/18 INL Another aerial survey campaign over dispersal site
of Br-82
10/18 NNSS Scan at crater to identify interesting isotopes. The
system was designed for BVLOS flights inside the
crater
Table 3.1: This Table lists the experiments run by the UAV/UGV described in this
Chapter.
Zinc Telluride (CZT) imager. The RMAX, shown in Figure 3.4, has been equipped
with a stereovision system useful in classifying terrain traversability and perform-
ing 3D reconstructions. As a 90 kg system capable of lifting 20 kgs of payload,
the RMAX has also been used to carry a custom-designed, tether-deployed ground
sampling robot which remains tethered throughout its mission for retrieval.
A VT-designed hexacopter shown in Figure 3.5 has been developed as a low-
cost aerial platform with a 6 kg payload capacity. Its high payload capacity combined
with a Swiftnav Piksi Multi Real Time Kinematic (RTK) Global Position System
(GPS) allow it to accurately hold position while carrying the gamma-ray imager.
A smaller hexacopter adapted from a commercially available airframe, shown
in Figure 3.6 has also been utilized. An Nvida TX2 computer and Swift Piksi
Multi RTK GPS provide computational power and positional accuracy necessary
for adaptive sampling and route planning for ground systems.
55
Figure 3.4: RMAX with Nal detector and stereovision system in-flight.
Figure 3.5: VT-designed hexacopter with CZT and Nal detectors installed (16.8 kg
gross weight).
56
Figure 3.6: The UAV and UGV used in the experiments in Table 3.4.1
.
Figure 3.7: Radiation contour generated from the INL flights, July 2018.
An example of the utility of drone measurements is provided in Figure 3.7.
The spatial resolution here is much greater than can be obtained by a manned air-
craft, since the speed and altitudes of manned aircraft do not favor data resolution.
Drones fill a significant need for higher resolution data in a tiered discovery, where
initial findings from manned aircraft point to areas of interest that demand fur-
ther investigation by drones. More accurate source classification and safe perimeter
definition follow from the higher quality data products.
57
Figure 3.8: A progressive 3D mosaic created during image capture to accelerate
ground-based route planning for inspection in time-critical situations.
Radiation measurements from an unmanned aircraft should be complemented
by nadir imagery which both provides context to the radiation data and is also
used to inform ground robotic and manned searches. Using a custom stereovision
system, the Virginia Tech team has developed a data processing architecture to
compute disparity maps and find feature point matches to produce 3D mosaics for
route planning during an overflight, accelerating the speed at which ground-based
inspections can occur shown in Figure 3.8. Christie [43] showed the benefits of
reasoning about 3D and 2D information to obtain accurate classes on the ground
which reduce mission failure with low-risk route plans in a detailed search.
3.4.2 Online radioactive source location estimation
A challenge in most data collection scenarios is to extract meaningful source
information when background radiation levels can be at the same order of magnitude
58
as the source. By leveraging an analytical model between the counts observed in
a detector volume and the activity of sources of radiation, environments may be
efficiently explored to reduce misclassification risk and hypothesis uncertainty.
This analytical model was derived for a spherical detector volume with a mono-
energetic point and validated in Geant4 [3], a particle transport simulator. If the
relative position of a measurement and a source is known, then there is a linear
relationship between the unknown activity of the source and the counts recorded
in the measurement. A set of hypotheses on the position of sources within the
environment can be constructed. Then the strength as well as an uncertainty of
each source in each hypothesis may be estimated through Maximum Likelihood
Estimation. The likelihood of each hypothesis itself may be computed to identify
the most likely hypothesis out of the set.
With this model, it is possible to estimate the expected reduction in uncer-
tainty that would occur in each hypothesis if the robot were to make additional
observations. For each candidate in a set of trajectories, this improvement may
be characterized by metrics such as mutual information and the reduction in mis-
classification risk, allowing the system to select the most informative trajectory.
Misclassification risk is the risk that given a user defined threshold on source activ-
ity, a source which is currently estimated to have a strength above that threshold is
actually weaker than that threshold and vice versa. This risk may be estimated on
a per source basis by examining the estimated strength and uncertainty in strength
of each source in each hypothesis. Mutual Information, given by the difference in
entropy of the distribution, measures the reduction in uncertainty in the estimate
59
of the strength. Maximizing Mutual Information maximizes the reduction in un-
certainty. By optimizing over these rewards, the system is able to select the most
informative trajectory through the environment.
This method enables observations from multiple independent robots with sep-
arate radiation detectors to be combined to form a single estimate of the position
and strength of a source in the environment. The planning method described above
allows the paths of these robots to be coordinated to efficiently explore the environ-
ment by considering their joint effects on the distributions.
Figure 3.9 shows an example of a trajectory navigated by the UGV when
aerially collected radiation measurements were used to initialize the model for a
good initial guess on the location of the point source.
3.4.3 Terrain based source inspection
We approach the problem of source estimation as a two-step process where
the second step leverages data from the first round to improve the allocation of
air and ground sampling resources. Reconstructing the 3D environment from nadir
imagery and semantically classifying the visual 2D data allows ground vehicles to
efficiently navigate the environment. Figure 3.10 shows an experiment conducted
at Kentland Farm, near Virginia Tech, where the first aerial scan provides a source
location estimate. The UGV then navigates to the estimated source location to
collect longer dwell data using the estimated terrain classes to minimize cost.
Operational tempo is maintained in the overall search mission by generating
60
Figure 3.9: Trajectory taken by UGV to localized point source of radiation. This
trajectory was informed by aerially collected radiation measurements. Color indi-
cates the log likelihood of hypotheses on the position of the point source. Warmer
indicates more likely.
Figure 3.10: In this example, aerial data (left) is collected over two weak sources
(pink), and a single source (red) is identified. Starting 100m away, a ground robot
takes the minimum cost path (right) to visit the source based on a power consump-
tion heuristic and distance.
61
3D terrain reconstructions while the initial survey flight is in progress. Aerial data
will provide the bulk of the planning information, but ground vehicles will augment
this with higher resolution data for obstacle detection and the capacity for on-line
learning of terrain costs to reduce the dependence on comprehensive training data
sets.
Using aerial data generated by the UAV, the LE spectral clustering provides an
initial probability distribution map for the distribution of radioactive sources. The
UGV may use this probability distribution map and generate a list of candidate
locations for radioactive sources.
Once the UGV has a candidate location to inspect and a semantically classified
map, each given to it by the UAV, it must select a path. If the traversability
status of all the terrain is fully understood, the UGV can pick a path minimizing
time, distance, energy-cost, or some other criterion. If the traversability status
is uncertain, however, whether through uncertainty in the environmental semantic
classification (is the black streak asphalt or mud?) or through uncertainty in the
robot?s own capabilities, (will it get stuck in the mud?) we may be faced with the
unenviable task of trading off a safe path (one with a high degree of certainty in
traversability) with a path that is otherwise desirable (shorter, or generating more
Bayesian information).
One solution for a problem of this type was suggested by Bhattacharya, Ghrist,
and Kumar [20]. They give an algorithm that takes an image where each pixel has
been assigned a probability of traversability, along with a desired start and end
point, and returns a path from the start point to the end point that is intended to
62
be a desirable path in the face of uncertainty. Their algorithm is:
1. Pick a finite set of probabilities at which to make calculations. If unsure which
probabilities to pick, a reasonable choice is equally spaced numbers from 0 to
1, including both endpoints.
2. At each probability picked above, we create a derived image, one where each
pixel is either 1 (all pixels at or above the probability threshold are counted
as traversable) or 0 (all pixels below the probability threshold are counted as
non-traversable).
3. For each of these images, calculate all Z/2 homology classes of paths from the
start pixel to the end pixel. If there are n obstacles that may be navigated
around, there will be 2n classes of paths (roughly speaking, a path may pick
?go clockwise? or ?go counterclockwise? to avoid each obstacle), so it may
be desirable to ignore all obstacles below a certain size for computational
efficiency.
4. If 0 < p < q < 1, then every path in the image with threshold q is a legitimate
path inside the image with threshold p?at the extreme, a path with threshold
1 only goes through pixels which are guaranteed to be traversable. Calculate
the ?persistence? of a path based on this correspondence.
A point the author found to be of practical importance in implementing the
first step of this algorithm: in practice, we are not handed a ?probability of traver-
sal,? but rather an image of an area which has been semantically classified together
63
with some information about our UGV?s capabilities. Even if our semantic classifi-
cation and assessment of the UGV?s capacity is correct enough that we can assign
a ?traversability score? to every pixel and be guaranteed that the pixels with a
higher score are more likely to be traversable, there is a large amount of freedom in
how to map these scores onto probabilities from 0 to 1. Even the most straightfor-
ward methods, such as linear and log-linear mapping, often produce vastly disparate
results.
The last step in this algorithm is more subtle than it appears, and so is deserv-
ing of more explanation. Just as in Chapter 2, we have a map that can be thought
of as a time progression map (including the paths with more certain traversability
into the regions with a lower traversability threshold). Further paralleling Chapter
2, this map induces a map on the homological objects of interest. These objects
were traditional elements of homology in Chapter 2 and are paths whose homology
classes we are interested in here. Still following Chapter 2?s example, any element
not in the image of a stage of this map is one we can consider to be ?born? at that
time, and any element in the image with a unique preimage is ?persisting?. Unlike
the persistent homology groups described in Chapter 2, homology classes of paths
between two fixed points do not have a natural group structure, or distinguished 0
element, so there is no question of an element dying by ?being mapped to 0?. The
only way for a class to die is for it to merge with another class, and in that case,
which class dies, and which survives? In Chapter 2, we resolved this by resorting to
the classification of finitely generated modules over a PID, but, as the present case
lacks even a group structure, we certainly can?t make use of that. Bhattacharya,
64
Ghrist, and Kumar suggest looking at the set of preimages which mapped to the
homology class of some path ?, and pick the one with the lowest Hausdorff distance
to ??consider that as being part of the same persistence class, and declare all other
preimages dead.
Another option, original to this author, is to parameterize the paths, whether
through distance or expected travel time of the robot and take an integral over the
parameterizations of the distance between ? and each of its preimages. We use this
as the distance, then proceed as in Bhattacharya, Ghrist, and Kumar?s original idea.
This algorithm produces a path that is desirable in that it is minimally sensitive
to our uncertainty in the traversability of the terrain. If there is terrain that cannot
be traversed, this path is ?close to? one that is more likely traversable.
Once the UGV reaches a target location, it collects higher quality data for the
supervised FST algorithm to classify. The FST computes oscillatory features of the
spectral data and satisfies several desirable properties for classification: it is energy
preserving, it is stable with respect to additive noise, and it contracts sufficiently
small translations and diffeomorphisms.
After classification, the spectral information corresponding to the scouted lo-
cation is incorporated into the global dataset and the probability distribution is
recomputed using the supervised spectral clustering algorithm of SE. The path for
the UGV is recomputed according to this new distribution. This process continues
until all suspected locations are checked.
When an environment is large, complex, and includes locations inaccessible
to ground vehicles, the combined deployment of both UAVs and UGVs allows the
65
Figure 3.11: Initial exhaustive scan result. The yellow points indicate where sources
have been placed. The colored track indicates the path of the UAV and the measured
counts per second.
system to localize sources. Multi-agent task allocation ensures that the environment
is efficiently and completely explored while considering the mobility constraints of
the vehicles and the effectiveness of the sensors equipped to each vehicle.
A demonstration of a search and classification mission with variable terrain
traversability was performed at a National Labs in January, 2018. In this experi-
ment, three sources were located on concrete pads, considered traversable for the
UGV, and one source was placed in a grassy area, considered less traversable for the
UGV. An initial scan of the area provided source contours that would be used to
inform a secondary search using either the UGV or UAV, depending on the terrain
type. Figure 3.11 shows the initial scan result.
66
LE dimension reduction was performed on the exhaustive search data to es-
timate locations for secondary exploration. Additionally, the aerial imagery was
classified to determine where the UGV can explore, and a route plan was generated
for both the UAV and UGV, shown in Figure 3.12. The air and ground vehicles are
tasked to dwell at each exploration point to obtain higher quality data.
For contrast, we include the alternate, high-persistence paths generated by our
implementation of Bhattacharya, Ghrist, and Kumar?s path-planning algorithm, in
Figure 3.13. Some of the most notable differences between these paths and those
the UGV actually took, shown in Figure 3.12 are due to an eccentricity of the DOSL
library created by Bhattacharya2 [19], and used in our work?it has a preference for
straight lines in one of the 8 cardinal directions, as it is locally going pixel-by-pixel.
This results in, e.g., the parallelograms made by both the red and the blue pathways.
3.5 Discussion of topological path planning
Other than the differences created by the above feature of DOSL, the paths
generated by our path planning software are quite similar to those generated by the
standard methods. Some thoughts on the contrasting advantages of each:
1. The largest advantage of the path planning algorithm is that, as an algorithm,
it does not require human input after the initial set up. We caution, however,
that the initial set up is not trivial, and there are many choices to be made
that human thought is important for. For example, the choice of what size
2The DOSL library itself makes use of OpenCV [23].
67
Figure 3.12: The top and bottom concrete pads and the center grass area are iden-
tified as traversable and less-traversable regions respectively for the ground robot
(UGV). Yellow points are the actual radiation sources while the red points are the
estimated location. In this image, the purple line is the path the robot took, using
standard navigational methods.
68
Figure 3.13: These are alternate paths the UGV could have taken, generated by our
implementation of Bhattacharya, Ghrist, and Kumar?s path-planning algorithm.
We color the concrete areas white, representing the semantic identification of the
concrete pads as certainly traversable.
69
obstacles to exclude, or how many layers to calculate the paths within.
2. The largest disadvantage to the algorithm is the running time. Generating
Figure 3.13 took more than four hours of computer time. Downsampling the
image, calculating the paths at fewer layers, and choosing to ignore all obsta-
cles below a certain threshold can all speed up the calculations dramatically,
but all run the risk of worse outcomes.
3. The final paths generated by the topological algorithm and a human operator
were fairly similar.
Because of this, for terrain which can be semantically classified as having a very
high likelihood of traversibility, we recommend against this topological method?the
necessity of calculating two to the power of (number of obstacles) paths in each of
possibly many layers is a high computational burden for not much gain. In highly
uncertain environments, however, more investigation is warranted.
Bhattacharya, Ghrist, and Kumar do not provide a quantity of practical in-
terest which this path maximizes or minimizes. It is difficult to do so?the most
naive formulations of such a criteria have counterexamples. The counterexample
shown in Figure 3.14 actually shows that if each pixel?s chance of being traversable
is independent, no algorithm that only takes the homological classes of the paths
into account can succeed in being minimally sensitive to the traversal probabilities.
If we add the hypothesis that the traversability of the pixels is strongly corre-
lated, in the sense that ?if pixel p, with chance of traversability PT (p), is traversable,
then so is every pixel q such that PT (q) ? PT (p),? the counterexample mentioned
70
Figure 3.14: Black represents non-traversable terrain, red represents terrain with
chance of traversability p, blue represents terrain with chance of traversability 
higher than p, and pink represents terrain with an intermediate chance of traversabil-
ity. While the central path through the blue region is both shortest and has the most
persistent homological class, as a practical matter it only takes a single pixel of blue
being non-traversable to necessitate a very large divergence from that path. Then
if the second most persistent path is the backup choice, on the eastern side, then
again a small amount of bad luck can necessitate a great deal of backtracking. The
western path, while marginally less persistent, will only require a single traversable
pixel of red to be found.
71
above is no longer a problem. While this sounds highly restrictive, it is, in practice,
perhaps more plausible than we might fear?if, for instance, the terrain includes
areas that we can semantically classify as ?very muddy,? ?somewhat muddy,? and
?less muddy,? then we may intuitively expect that a robot?s ability to traverse areas
labeled ?somewhat muddy? would imply an ability to traverse ?less muddy? areas,
without guaranteeing its ability to navigate ?very muddy? areas.
Or, for an example less directly relevant to the current circumstance, consider
a configuration space. It may be that a system?s ability to enter into some types of
states, represented by a region of the configuration space, is uncertain, but that the
uncertainty is entirely caused by boundary conditions?after the system successfully
achieves one state in the uncertain region, its ability to transform into other states
in the same region is assured. Think for instance of a robotic arm with many joints,
with the position of the arm and its joints represented as a single point in the space,
in an environment containing obstacles we may or may not be able to fit the arm
between. Once the arm is in a new region, it can move freely.
There are some tantalizing parallels between this and simulations of proteins,
where we are not able to move freely through the state space, but rather are con-
strained by physical chemistry. In this case, the ?uncertainty? of being able to enter
a region is replaced by a high energy requirement to transition between otherwise
stable states. A multistage path to changing the folding structure of a protein, or un-
derstanding the natural folding pathways, could conceivably take advantage of some
of the path-planning we have discussed in this Chapter. See, e.g., [5] for a recent
example of a successful attempt to understand a path through such a configuration
72
space.3
3.6 Conclusion
This Chapter has showcased a use of persistent homology, in path planning for
UGV?s after preliminary UAV flights in response to radiological incidents. UAV?s
are able to fly at low altitudes enabling them to make effective radiation measure-
ments despite their limited payload capacities. Methods such as LE and SE have
been used to automatically detect anomalous sources in raw gamma-ray spectra.
Aerial vehicles are also suitable to carry scene mapping sensors in the form of ei-
ther Lidar or stereo vision systems. This combination of scene information and
radiation measurements not only improves the accuracy of radiation models, but
also enables terrain informed navigation for UGVs to investigate candidate source
locations. Operational tempo may be improved by utilizing online algorithms for
terrain reconstruction and source localization.
As we closed with a discussion on the use of topological path planning, one of
the major disadvantages was the length of computation time required. In the next
Chapter, we will lay a foundation for a technique to speed up the calculations of
persistent homology.
3In this case, the path represents a pair of proteins undergoing a coupled folding-binding event.
73
Chapter 4: Results on stability of persistent homology and dimen-
sion reduction
Several of the steps involved in computing persistent homology are computa-
tionally intensive, sometimes prohibitively so. In this Chapter, we will discuss some
of the theory of computational speeds of these steps, while in the next we will give
some experimental results on two popular software implementations of these.
The fastest available software still requires nearly half an hour on a very fast
computer to calculate even just through the second homology of a point cloud with
only 2000 points in 3 dimensions. It is thus desirable to try to find shortcuts, by
which we can sacrifice a small amount of accuracy for a significant gain in speed. In
this Chapter we discuss the theoretical foundations one such technique.
4.1 Pipeline of persistent homology
Let us once again make use of the computational pipeline we have described,
focusing this time on the speeds of computation.
1. We may preprocess a point cloud.
2. We obtain a parameterized family of simplicial complexes from this point
74
cloud, possibly using C?ech, Rips, or Alpha complexes, and interpret the pa-
rameterization as a filtration.
3. We may reduce the filtered simplicial complex.
4. By taking the homology, we obtain a family of homology groups. By reinter-
preting the filtration, we can consider it a parameterized family.
5. We can then display this information in a persistence diagram or a barcode.
4.1.1 Preprocessing
The point cloud may be preprocessed through either sparsification [41, 94]
or dimension reduction. Sparsification is computationally very inexpensive, and is
discussed in more detail elsewhere. Dimension reduction can also be performed
inexpensively?a random projection matrix can be calculated beforehand.
4.1.2 Obtaining a filtered simplicial complex from a point cloud
The naive implementation of the C?ech complex involves calculating an enclos-
ing radius for every n-tuple of points and thus scales with the number of subsets of
n, 2n. This is, to put it mildly, computationally infeasible. In practice, therefore,
it is almost always preferable to use a complex which is easier to calculate. The
mathematical community has settled on the Rips complex, which, as discussed in
Chapter 2, is related to the C?ech complex and thus has a theoretical accuracy guar-
antee and, being a clique complex, requires little computation beyond the pairwise
distances of the points. As this pairwise distance computation itself scales both
75
with the square of the number of points and the ambient dimension of the points,
dimension reduction is a natural candidate for speeding computation. A previous
attempt to cirumvent the ?dimensionality curse? was the invention of witness com-
plexes [55]. While these have desireable convergence properties for zero and first
dimensional homology, there exist cases where they do not give correct results for
higher dimensional homology groups [108].
4.1.3 Reduction of filtered simplicial complex
A filtered simplicial complex can be greatly reduced using discrete Morse the-
ory [98]. This is a well developed area with a great deal of its own literature to
support it, and we have nothing further to add to that illustrious tale of mathe-
matical victories?which has recently been added to by Henselman and Ghrist [85]
through their use of matroids.
4.1.4 Obtaining a parameterized family of homology groups
The simplest way of obtaining a parameterized family of homology groups
is through column reduction of the boundary matrix. This bears a great deal of
resemblence to classical matrix row reduction algorithms, and so shares speed es-
timations with them.1 Many refinements to this algorithm have been found. For
instance, refinements that greatly improve speed include taking advantage of duality
to perform easier cohomological calculations rather than homological ones [58, 59],
not bothering to reduce columns in the matrix which will not contribute to the final
1In particular, for an n? n matrix over a finite field it is O(n3) [4].
76
answer [42], and others. The author has been unable to locate full analyses of the
time complexity of the algorithm in its fully-refined splendor; indeed, Ulrich Bauer,
who wrote the code Ripser, which is the fastest implementation available, stated in
2017 that, to his knowledge, ?there is no asymptotic analysis of the algorithm? [14].
4.1.5 Displaying information in a persistence diagram or barcode
From a parameterized family of homology groups, there is negligible compu-
tation time involved in displaying the information. We include this step mostly for
completeness, and also as an excuse to point out some of the interesting work done
in persistence landscapes [26] and elaborated on with persistence images [1], which
makes the information of the persistent homology of an object a much more fertile
foundation for the application of machine learning techniques.
4.2 Stability of persistent homology
Recall Theorem 2.4.2, on page 39, which says, roughly speaking, that anything
we do to the point cloud that doesn?t interfere much with pairwise distances will not
interfere much with the persistent homology. We restate it here for convenience:
Theorem 4.2.1 (Cohen-Steiner, Edelsbrunner, Harer [45]). Let X be a triangulable
space with continuous tame functions f, g : X ? R. Then the bottleneck distance
between the persistence diagrams DB(f, g) is at most twice the l
? distance between
f and g:
dB(D(f), D(g)) ? 2?f ? g??.
77
While they do not state and prove this theorem for Rips complexes, only for
the C?ech complex, we note (for use in a few pages) their proof goes through with
no complications in the Rips complex case.
Note that Rn is triangulable, and the function which gives distance to the
nearest point of a point cloud is both continuous and tame. To ensure tameness,
recall that we defined a point cloud to necessarily be not only discrete, but finite.
In this Section, we will prove a closely related theorem, a highly general sta-
bility result for the persistent homology of the Rips complex associated to a point
cloud. Our proof will only apply to the ?naive? view of persistent homology, where
the object that has persistent homology is a point cloud, and only to the Rips
complex. As an immediate application, we will pair our Theorem 4.2.11 with the
Johnson-Lindenstrauss Lemma (Theorem 4.3.1) and obtain a result on preservation
of persistent homology through dimension reduction.
The results we show here were inspired by Baraniuk and Wakin?s paper on
random projection of smooth manifolds [12]. To the best of the authors knowledge,
this paper was the first to suggest the application of the Johnson-Lindenstrauss
Lemma (Theorem 4.3.1) to persistent homology. After finding our result, we dis-
covered that in so doing, we have overlapped with a result of Don Sheehy [120],
who showed that the persistent homology of a point cloud and its distance function
are interleaved.2 His result was strengthened just last year by Arya et al [7], who
2We borrow and slightly modify some terminology of his that we have not seen elsewhere for
Definition 4.3.2.
78
gave a result applicable to any linear transformation which preserves distances to
within a multiplicative factor of (1? ). Their results are both stronger than ours in
that they consider alternate versions of ?distance to a point cloud?, where we only
discuss Euclidean distance. On the other hand, our result is more general in that it
depends on neither linearity of the function, nor any specific form of approximate
distance preservation.
The structure of the proof we give here follows Cohen-Steiner, et al?s proof of
Theorem 2.4.2 very closely. To aid comparison of the two, we borrow much of their
language and many of their notational conventions.
4.2.1 Notation
For technical reasons which will become clear, we will not work with point
clouds?instead, we will phrase many of our theorems as concerning weighted graphs.
Definition 4.2.2 (Weighted graph). A weighted graph is a set of points together
with a positive number for every pair of points, which we shall think of and refer to
as the edge length.
While the edge length acts much like a distance, it is not guaranteed to satisfy
the triangle inequality. Any point cloud can be interpreted as a weighted graph,
with distance given by the ambient metric of Rn. We will use either d(x1, x2) or
?x1 ? x2? to refer to the length of the edge between x1 and x2?the former when
we wish to emphasize the lack of triangle inequality, the latter when we believe the
most common applications will be in the case of a weighted graph derived from a
79
point cloud.
A weighted graph naturally gives rise to a filtered Rips complex, which in turn
has persistent homology that we can consider the persistence diagram of. Just as
with point clouds, if X is a weighted graph, we use Xi for the subgraph with all
edges of length less than or equal to 2i. (The factor of two is due to the conversion
from radii to diameters).
We use R(Xi) to denote the Rips complex associated with Xi. We use H
i
k(X)
to denote Hk(R(Xi/2)), the homology of the Rips complex with diameter i with
coefficients in a field. Suppressing the k, let ?i denote the dimension of Hik(X).
For a ? b ? c, let Ha?bk be the image of Hak(X) in Hbk(X) induced by the
inclusion R(Xa/2) ?? R(Xb/2). Note that if the interval [a, b] contains none of the
pairwise distances between points of X, this map is an isomorphism.
Consider the sequence Hak(X) ? Hbk(X) ? Hck(X) induced by inclusions of
the Rips complexes. Note that this sequence induces natural maps from Ha?bk to
Ha?ck (a surjection) and from H
a?c
k to H
b?c
k (an inclusion). We will use ?
a?b for the
dimension of Ha?bk .
From here on, we fix a homology dimension k, and suppress it in the notation.
Let (ai)i=1...n be the set of edge lengths of X, ordered from smallest to largest,
and let (bi) be an interleaved sequence; ie, bi?1 < ai < bi. For any two subscripts,
we will refer to the quantity ?i?j := ?bi?1?bj ? ?bi?bj + ?bi?bj?1 ? ?bi?1?bj?1 as the
multiplicity.
Suppose we have four real numbers, none of them in (ai), and ordered w <
x < y < z. There is a natural surjection, Hx?y(X) Hx?z(X). With apologies for
80
overloading our notation, and keeping in mind that we are suppressing the subscript
k indicating homological dimension, we will write Hy,zx for the kernel of this surjec-
tion; that is, Hy,zx is elements of Hx which die between times y and z. There is an
inclusion Hy,z ?? Hy,zw x . This is because any element of Hw which dies between times
y, z must still be alive at time x. We will thus use Hy,zw,x to represent the quotient
Hy,zx /H
y,z
w .
We will write D(X) for the persistence diagram associated with the Rips
complex of X, filtered by edge length in the obvious way. This is the multiset of
points in R2, where (a , a ) has multiplicity ?i?ji j , together with all points on the
diagonal ? := (x, x) counted with infinite multiplicity. We will refer to the sum of
the multiplicities of the off-diagonal points in any region of the persistence diagram
as the size of that portion?of the persistence diagram. The size of the entire diagram
is thus #(D(X) ??) := ?i?j. The dimension of Hy,zi<j w,x(X) is exactly the size
of the portion of D(X) in a box #(D(X) ? [w, x]? [y, z]).
We will refer to the quadrant of R2 above and to the left of (x, y) by Qyx :=
(??, x]? [y,?).
Lemma 4.2.3 (k?triangle lemma of Cohen-Steiner, Edelsbrunner, Harer [45]).
Suppose we have a weighted graph X and two values x < y, with neither x nor y
being pairwise distances between points of X. Then #(D(X) ?Qy) = ?x?y.3x
3As stated originally, the theorem is more general; it concerns tame functions on any space,
while what we do here could be interpreted as piecewise linear functions on complete graphs.
81
4.2.2 Quadrant lemma and box lemma
Throughout this Section, suppose we have a bijection between weighted graphs,
? : X ? X ?? which approximately preserves edge lengths in the sense that there are
bounding functions for ?: continuous, monotonic, invertible functions L(d), U(d) :
R+ ? R+ with L(d) ? d ? U(d) with, for any x1, x2 ? X, L(?x1 ? x2?) ?
??(x1)??(x2)? ? U(?x1?x2?). We will identify the points of X with their images
in X ??. Equivalently, we may instead think of X and X ?? as two sets of edge lengths
assigned to a single set of vertices.
Note that any statements we make will have mirror statements with the roles
of X and X ?? reversed, U replaced by L?1, and L replaced by U?1. We will refer to
this as the ?mirror principle?.
Proposition 4.2.4. Given two reals, b < c, we have the inclusions
R(X ??L(b)/2) ?? R(Xb/2) ?? R(Xc/2) ?? R(X ??U(c)/2).
Proof. In order for a set of points ? ? X to not form a simplex in R(Xb/2), there
must be at least one pair of points, x1, x2 ? ? with d(x1, x2) > b. By definition
of L, d(?(x1), ?(x2)) > L(d(x1, x2)) > L(b), so the corresponding set of points do
not form a simplex in R(X ??L(b)/2), establishing the first inclusion. The second is
trivial. The third inclusion follows from the applying the mirror principle to the
first inclusion.
82
These inclusions of Rips complexes induce maps on homology groups:
HL(b)(X ??)? Hb(X)? Hc(X)? HU(c)(X ??) (4.1)
Induced by similar inclusions, we also have
Hb(X)? HU(b)(X ??)? HU(c)(X ??). (4.2)
As all the relevant maps on the level of Rips complexes are inclusions, the compo-
sition of maps in (4.2) is equivalent to the composition of the latter two maps in
(4.1).
We also have the inclusions
HL(b)?U(c)(X ??) ? Im(Hb(X)? Hc(X)? HU(c)(X ??)) ? HU(b)?U(c)(X ??), (4.3)
where the first inclusion comes from (4.1), and the second from (4.2).
Lemma 4.2.5 (Quadrant lemma).
?? ? U(c)#(D(X ) QL(b)) ? #(D(X) ?Q
c
b)
Proof. This is exactly the statement that the dimension of the image of the middle
map of (4.1) is greater than or equal to the dimension of the image of the composition
of all three maps.
Lemma 4.2.6 (Box lemma). #(D(X ??) ? [U(a), L(b)]? [U(c), L(d)]) ? #(D(X) ?
83
[a, b]? [c, d]).
Proof. Note that we may assume order is preserved; U(a) < L(b) < U(c) < L(d),
otherwise there is nothing to prove, as the smaller box becomes the empty set.
U(c),L(d)
It suffices to find a surjection from a subspace of Hc,d ??a,b(X) onto HU(a),L(b)(X ),
as the dimensions of these vector spaces are exactly the quantities in the inequality.
Consider the following commutative diagram:
Ha?d
r
(X) 1 Hb?d(X)
s1
HU(a)?L(d)
r
(X ??) 2 HL(b)?L(d)(X ??)
u1 u2 u3 u4
HU(a)?U(c)(X ??
r
) 3 HL(b)?U(c)(X ??)
s2
a?c r4 s3H (X) Hb?c(X)
i1 i2
Ea?c
r?
(X) 4 Eb?c(X)
The vertical maps labeled ui are all natural maps as discussed in the previous
Section, as are the horizontal maps labeled ri (the ri maps are, in fact, inclusions,
while the ui are surjections).
We know there is a map from Hc(X) to HU(c)(X ??). We define Eb?c to be
the preimage of the kernel of u3 under this map, intersected with the subgroup
Hb?c(X). By construction, we therefore have a map, which we call s3, from E
b?c to
HL(b)?U(c)(X ??), with Im s3 ? keru3. The only requirements for being in the domain
84
of s3 are mapping to the kernel of u3 and being in H
b?>c(X). By the first inclusion in
(4.3), everything in the domain of u3 comes from something in H
b?>c(X) so in fact
we have Im s3 = keru3. We define E
a?c by intersecting this further with Ha?c(X).
The vertical i maps are thus inclusions. The map s2 comes from the inclusion in (3).
s1 comes from the inclusion obtained by the ?mirror principle? mentioned earlier.
As all the maps are natural, the diagram commutes. Thus u4 ? i2 = s1 ?u3 ?s3.
By construction, u3 ? s3 is zero, so u4 ? i2 is as well. We have a subspace inclusion
Eb?c/Ea?c ? Hc,da,b(X).
U(c),L(d)
We will now show that dim H ?? b?c a?cU(a),L(b)(X ) ? dimE /E , which will com-
U(c),L(d)
plete the proof. Recall that H ??U(a),L(b)(X ) = keru3/ keru2. By construction of
the E spaces, and the fact that the ri maps are inclusions of subspaces, we have
Eb?c/Ea?c = keru4/ keru1. By construction, s3 is a surjection of keru4 onto keru3.
If we can show that s2(keru1) is a subspace of keru2, we will be done. For any
h ? keru1, we have u3 ? s3 ? r?4(h) = 0, because s3(Eb?c) lies entirely inside the
kernel of u3. By commutativity of the diagram, r2 ? u2 ? s2(h) is also 0. As r2 is an
inclusion map, u2 ? s2(h) = 0, so we are done.
Corollary 4.2.7. If there is a point of the persistence diagram of X at (a, b) with
multiplicity ?, then the total multiplicity of the points of the persistence diagram of
X ?? within the rectangle [L(a), U(a)]? [L(b), U(b)] is at least ?.
Proof. Immediate application of box lemma.
By our mirror principle, we also have:
85
Corollary 4.2.8. If there is a point of the persistence diagram of X ?? at (a?, b?) with
multiplicity ?, then the total multiplicity of the points of the persistence diagram of
X within the rectangle [U?1(a?), L?1(a?)]? [U?1(b?), L?1(b?)] is at least ?.
At this point, we have found a map from the points of the persistence diagram
of X to those of the persistence diagram of X ?? which doesn?t move points ?very
far?. If we can strengthen this map to be a bijection, we will be done. We start
with an easy case; if the bijection of the weighted graph doesn?t change edge lengths
much, then maybe every resulting rectangle only has the minimum number of points
necessary for a bijection. In order to formalize this, we make the following:
Definition 4.2.9 (Resolution of a persistence diagram). Given a weighted graph
X, the k-dimensional resolution of the persistence diagram of X is the minimum
distance between two points of the persistence diagram for the k-dimensional ho-
mology, at least one of which is off the diagonal. Denoting this by ?X,k, we have
?X,k := min{?p1 ? p2?? | p1 ? Dk(X), p2 ? Dk(X)??}.
Taking the minimum over all k, we obtain the resolution of the persistence diagram
of X:
?X := min ?X,k.
Another way of considering resolution: if every point of D(X) has a square
with side length ?X centered on it, then no two squares will intersect, unless they
are both centered on the diagonal.
86
We have the following:
Lemma 4.2.10 (Easy bijection lemma of Cohen-Steiner, Edelsbrunner, Harer [45]).
Suppose ? : X ? X ?? is a bijection of weighted graphs which does not change the
pairwise distances by more than the resolution of X; ie, L(d) = d? and U(d) = d+
are bounding functions for ?, with  < ? .4X Then there is a bijection ? : D(X) ?
D(X ??) with ?p? ?(p)?? ?  for all p ? D(X).
Proof. From our Corollaries 4.2.7 and 4.2.8, if ? is the multiplicity of a point of
D(X) located at (a, b), we have
? ? #(D(X ??)?[a?, a+]?[b?, b+]) ? #(D(X)?[a?2, a+2]?[b?2, b+2]).
As the resolution of D(X) is larger than , the point at (a, b) is the only point of
D(X) within this rectangle, so we have a (local) bijection. Iterating this over every
off-diagonal point of D(X), the only way for a point of D(X ??) to remain unmatched
is if it is within  of the diagonal. Matching all such points with their nearest point
on the diagonal, we have thus constructed a bijection which moves no point further
than , and so are done.
Earlier, we mentioned we could take the point of view of X and X ?? as two
different assignations of edge lengths to the same underlying point set. We now
take that viewpoint, and put an entire family of edge lengths on this point set.
4In fact, these are not properly bounding functions as we have defined them, because they are
not invertible functions R+ ? R+; d+  is not onto, thus not invertible, and d?  has a codomain
which is slightly too large. As our point clouds are finite sets, we take any bounding function
which has the correct values at all pairwise distances. A bijection will never cause the distance
between two points to become non-positive, so this creates no issues.
87
Letting c be the maximum change in edge length, we denote the weighted graph
with accompanying function by X t, with t ? [0, c], where the edge length between
x1 and x2 in X
t is given by
?????min{(?x1 ? x2?+ t), ??(x1)? ?(x2)?}, ??(x1)? ?(x2)? > ?x1 ? x2?
length = ????max{(?x1 ? x2? ? t), ??(x1)? ?(x2)?}, ??(x1)? ?(x2)? < ?x1 ? x2?.
The X ts form a continuous family of spaces transitioning from X0 = X to
Xc = X ??. We will divide this into intervals which are small enough for us to use
Lemma 4.2.10 on each interval, and concatenate the bijections.
We are now in a position to prove
Theorem 4.2.11 (Generalized stability theorem). Let X be a point cloud, and let
? : X ? Rn be a map which approximately preserves distances, in the sense that
there are continuous, monotonic, invertible functions L(d), U(d) : R+ ? R+ with
L(d) ? d ? U(d) such that for any x1, x2 ? X we have L(?x1 ? x2?) ? ??(x1) ?
?(x2)? ? U(?x1 ? x2?). Then there exists a bijection ? between the points of the
persistence diagrams of the Rips complexes associated to X and ?(X) which satisfies
the following condition: if (a, b) is a point of the persistence diagram of X, and
?(a, b) = (a?, b?) a point of the persistence diagram of ?(X), then L(a) ? a? ? U(a)
and L(b) ? b? ? U(b).
Existence of a bijection. Note that the resolution of the persistence diagrams as-
sociated to the X ts are always positive numbers, ?Xt . The set of open intervals
? ?
J = (t? Xt , t+ Xtt ) is thus an open cover of the closed interval [0, c]. By compact-4 4
88
ness, it has a finite minimal subcover. Let t1 < t2 < ? ? ? < tn be the midpoints of the
intervals in some minimal subcover. By minimality, any two consecutive intervals
will have non-empty intersection, so we have
ti+1 ? ti ? (?Xti + ?Xti+1 )/4 ? max{?Xti , ?Xti+1}/2.
We thus have a family of bijections, which we shall label ? : X t ? X ti i i+1. These
each have bounding functions Ui(d) = d+ (ti+1 ? ti) and L(d) = d? (ti+1 ? ti). By
construction, this is less than the resolution of the relevant persistence diagram, so
by Lemma 4.2.10 there is a bijection ?i : D(X
ti)? D(X ti+1) with ?x? ?i(x)?? ?
t ? t .5i+1 i Chaining all these bijections together, we obtain a bijection between the
points of D(X) and those of D(X ?).
Analysis of distance traveled. How far does this bijection move a point of D(X)
which begins at (a, b)? To check this, we decompose the bijections into two steps:
let X ? be Rips complex where the edge length d(x1, x2) is given by the minimum of
their edge length in X and their edge length in X ??, and let X ? t be the continuously
varying family of weighted graphs between X and X ?, where the edge length between
x1, x2 in X
? t is given by
???????x1 ? x2?, ??(x1)? ?(x2)? > ?x1 ? x2?length = ???max{(?x1 ? x2? ? t), ??(x1)? ?(x2)?}, ??(x1)? ?(x2)? < ?x1 ? x2?.
5Because we are taking the max of the two ??s, we don?t know if we have a bijection from
D(Xt)? D(Xt+1) or vice-versa. But it?s a bijection, so this is fine.
89
We thus have X ? 0 = X and X ? 1 = X ?. With the same set of ti?s as before,
we label these contracting bijections ?? : X ? ti ? X ? ti+1i . What are the bounding
functions for these ?i? The edge lengths are never increasing, so we may take
Ui(d) = d. For Li(d), our global bound has improved by tj+1 ? tj after each step,
or ti total. Obviously, no lower bounding function can be above the identity, so we
have ??????d, L(d) + ti > dLi(d) = ???max{L(d) + ti, d? (ti+1, ti)} else.
By Corollary 4.2.7, each of the bijections may only move the point(s) at (a, b)
left and down. The movement is guaranteed to stop at or before the time when
L(d) + ti = d, so the composition of the bijections results in a point beginning at
(a, b) ending within the rectangle [L(a), a]? [L(b), b].
The same argument applies to expanding X ? to X ??, moving points up and to
the right, where they end within the rectangle [L(a), U(a)]? [L(b), U(b)].
This theorem resembles, but is not identical to, the stability theorem of Cohen-
Steiner, Edelsbrunner, and Harer (Theorem 4.2.1). Their theorem allows us to take
persistent homology of many kinds of functions, not merely functions which give
distance to a point cloud, and our Theorem 4.2.11 here is restricted to that case.
In that case, our theorem gives a result which mirrors theirs if we let L(t) = t? ,
U(t) = t + . As our theorem concerns Rips complexes, and theirs concerns C?ech
complexes, this mirroring is not a direct overlap.
While there may be some concern that the C?ech complex is more an object
90
of interest than the Rips complex, in practical applications involving point clouds,
almost all calculations are done using the Rips complex or approximations or spar-
sifications of it. Indeed, Otter et al.?s paper [107] lists seven computational topology
software packages which compute persistent homology from a point cloud. Six of
the seven have the Rips complex as as option?for the most state-of-the-art, it is the
only option?and no other complex has more than three software packages. This
theorem is thus more directly applicable than one which concerns the C?ech complex
itself.
Also note that if U(t) becomes very large after some point?eg, if we have
?????t(1 + ), x ? c,
U(t) = ?????, x > c,
then, although we have no guarantees about persistent homology classes which come
into existence after c, any that began existing before c will still exist, and will not
have shrunk ?too much?. Similarly, if we have bounds which are relatively tight after
c, but very loose beforehand (perhaps we have both a source of noise we wish to
model as a perturbation of up to a constant magnitude, and a distortion that could
affect the pairwise distances, a distortion we are modeling as being bounded by a
constant multiple), we may have better guarantees about any persistent homology
classes which die after c than those which die early.
91
4.3 Using stability for linear dimension reduction
In addition to the theoretical interest, this result will allow us to speed up the
computations of persistent homology. For a given point cloud, it isn?t obvious how
perturbing the points will lead to swifter computations.6 However, reducing the
ambient dimension of the point cloud can frequently speed up computations. If the
number of points in X is higher than the ambient dimension (which it almost always
is), we don?t expect to embed X into a lower dimension perfectly isometrically.
However, with high probability we can still do so approximately, by the following:
Theorem 4.3.1 (Johnson-Lindenstrauss Lemma [54, 87]). Let X be a point cloud
in RD, and fix 0 <  < 1, and
d ? 4 ln (#X).
2/2? 3/3
There exists a linear map ? : RD ? Rd which approximately preserves distances, in
the sense that for every x, y ? X we have
(1? )?x? y?2 ? ??(x)? ?(y)?2 ? (1 + )?x? y?2.
Note that by taking square roots, we obtain the inner inequalities of the fol-
6Indeed, to the best of the author?s knowledge, in general it won?t.
92
lowing
? ?
(1? )?x? y? ? 1? ?x? y? ? ??(x)??(y)? ? 1 + ?x? y? ? (1 + )?x? y?,
where the outer inequalities are true because  is less than 1, and squaring decreases
positive numbers less than 1 and increases those greater than 1.
Definition 4.3.2 (-JL projection for X). An -JL projection of a point cloud X
is a linear function ? : RD ? Rd such that any two points x1, x2 of X satisfy
(1? )d(x1, x2) ? d(?(x1), ?(x2)) ? (1 + )d(x1, x2).
While there exist versions of Theorem 4.3.1 that include information on the
likelihood of a projection onto a random subspace preserving distances, we will not
make use of that. We will instead take an -JL projection as given.
Pairing the two theorems seems natural?we have one which states that di-
mension reduction doesn?t shift distances very much, and another which states that
if the distance function doesn?t change much then neither does the persistent ho-
mology.
Corollary 4.3.3. Let X be a point cloud in RD, and let ? be an -JL projection. If
h is an element of the persistence diagram of the Rips complex associated to X with
birth time tb and end time te, then there is an element of D(?(X)) with birth time
in [tb(1? ), tb(1 + )] and end time in [te(1? ), te(1 + )].
93
Proof. Follows from Theorem 4.2.11 using L(t) = t(1? ) and U(t) = t(1 + ).
Corollary 4.3.3 is of interest for two reasons. From one point of view, the fact
that random projection has a high probability of preserving persistent homology
serves as validation of random projection as a dimension reduction scheme. In
a much more practical way, this gives us a method by which we might hope to
reduce the computation times of persistent homology. Chapter 5 takes the second
perspective, and we will evaluate its usefulness there.
4.4 Other dimension reduction methods
The above property of approximately preserving distances is a desirable one in
any dimension reduction method, so we may expect many dimension reduction meth-
ods to generally preserve persistent homology, not just the reduction of ?project onto
a random subspace?. As an example, we consider the following dimension reduction
method, which we will refer to as Grammian reduction. Essentially equivalent to
principal component analysis, or PCA, it is a prototypical example of a dimension
reduction method, and the forefather of basically all spectral methods of dimension
reduction [77,117].
1. Given a point cloud X with |X| = n, we construct the Gram matrix, G, whose
(i, j)th entry is the inner product < xi, xj >. G is, of course, n? n.
2. As the Gram matrix is symmetric, the spectral theorem tells us it factors as
G = QDQT , where Q is an orthogonal matrix and D is diagonal. Again, Q
94
and D are n ? n. Without loss of generality, we order the eigenvalues of D
and the columns of Q so that i < j =? Di,i ? Dj,j.
3. We can truncate D into a smaller k ? k matrix, D?, which contains just the
largest k eigenvalues, and similarly truncate Q into an n? k matrix Q?.
4. Observe that if we let X? = D?1/2Q?T , then the columns of X? form a set of n
points inside Rk, and their Gram matrix G? = X?T X? ? G, so the columns of X?
are a family of points whose pairwise distances is approximately equal to the
pairwise distances between elements of X.
Theorem 4.4.1. Suppose X is a point cloud with n points, and Gram matrix G =
QDQT , where Q is orthogonal and D is diagonal, with values Di,i = ?i and X? is a
point cloud with Gram matrix G? = QD?QT , and D?i,i := ?i for i ? k, and D?i,i := 0 for
i > k. If h is an element of the persistence diagram of the Rips complex associated
with X with birth time tb and death time td, the persistence diagram for the Rips
complex of X? has an element with birth time t? and death time t? , which satisfy
????? ???? ???? ???n n n ?ntb? 2 ?i ? t? ? t ?b + 2 ?i and td? 2? ?i ? t? ? t + 2?d ?i.
i=k+1 i=k+1 i=k+1 i=k+1
Proof. The original point cloud is isometric to the columns of D1/2QT , so we can
view this as a perturbation moving each point by a succession of orthogonal moves.
As Q is orthogonal, each movement is by at most the eigenvalue being eliminated.
Changing any two eigenvalues results in two movements orthogonal to each other,
95
so the total magnitude of the movement is, by the Pythagorean Theorem, bounded
by the squareroot of the squares of the values of D1/2 other words, by the squareroot
of the sum of the eigenvalues. Thus the distances move by at most twice that, and
so, by Theorem 4.2.11, this Grammian reduction approximately preserves persistent
homology.
While the entire theory of dimension reduction techniques is far too large
to go into here, we wish to emphasize that any theoretical guarantee of distance
preservation on any scale translates directly via Theorem 4.2.11 to a theorem on
the preservation of persistent homology. ISOMAPS, Locally Linear Embeddings,
Laplacian Eigenmaps, Schro?dinger eigenmaps, and diffusion mapping are all tech-
niques which have either local distance convergence guarantees or strong reasons
to expect such to be true, thus we expect there to be a version of Theorem 4.4.1
for each of them. New dimension reduction methods are proposed frequently; see
Njeunje [105] for an example. We hope this proof serves as a template for future
proofs for many dimension reduction techniques.
4.5 Directions for further research
Theorem 4.2.11 as we have proven it applies only to Rips complexes. If I is
a subset of points in X, then let dI be the radius of the smallest ball enclosing
all points in I. If, at each of the dI times, we have approximate pairwise distance
preservation, in the sense of Theorem 4.2.11, then a C?ech complex version of our
result should hold.
96
If the U(t) and L(t) functions are replaced by a single probability distribution,
it seems clear that a probabilistic verstion of Theorem 4.2.11 should still obtain.
Moreover, it seems intuitively clear that an element of persistent homology with
many representatives should be more stable, and while the deterministic version of
our result cannot take advantage of a many-representative situation, such a proba-
balistic version should be able to. As we will see in the next Chapter, the extent to
which persistent homology is preserved under random projection is higher than the
theory might lead us to expect.
97
Chapter 5: Experimental results on computation times of persistent
homology
In the previous Chapter, we showed that persistent homology can be preserved
through reduction of the ambient dimension. A natural question to now explore is
how does the speed of calculating the persistent homology of a point cloud scale
with ambient dimension in practice? Is Corollary 4.3.3 of practical significance, or
merely theoretical interest?
While there has been a great deal of work in speeding up the computation
of persistent homology (see our Section 1.2 for some examples), the author is not
aware of many tests of the practical effect of these in the literature. We thus take
this opportunity to fill in the gap by running experiments intended to establish how
time to calculate persistent homology scales with several other variables of interest
and make some conjectures based on the results we obtain. We hope these serve to
guide future research.
5.1 Datasets and software used for benchmarks
We use datasets from [107], which have been put forward as a standard collec-
tion of benchmarks for persistent homology speed computations and since been used
98
Dataset denotation Filename on github # Points Dim
Celegans celegans weighted undirected reindexed for 297 202
matlab.txt maxdist 2.6429 SP distmat.txt
point cloud.txt
Dragon1000 dragon vrip.ply.txt 1000 .txt 1000 3
Dragon2000 dragon vrip.ply.txt 2000 .txt 2000 3
Fractal fractal 9 5 2 linear edge list.txt 512 257
1866.1116 point cloud.txt
HIV1 HIV1 2011.all.nt.concat.fa hdm.txt 1088 673
point cloud.txt
House house104 edge list.txt 0.72344 point cloud.txt 445 261
Human human gene2 sampled reindexed for matlab.txt 1397 688
maxdist 91.9097 SP distmat.txt point cloud.txt
Klein klein bottle pointcloud new 400.txt 400 3
Network network379 edge list.txt 38.3873 point cloud.txt 379 300
Random random point cloud 50 16 .txt 50 16
Senate senate104 edge list.txt 0.68902 point cloud.txt 103 60
Vicsek Vicsek particles 300 distance 1 noise 300 3
0.1 v0 0.03 box 25 timestep 1500 of 3000.txt
Table 5.1: List of datasets taken from Nina Otter?s github page [106] and used for
experiments in this chapter.
for that in many places (e.g. Morozov and Nigmetov [100] or just last year in Pe?rez
et al. [110]). We use two software packages, Dionysus 2 [99] and Ripser [15,123].1
5.1.1 Datasets
For ease of replicability of results, we give in Table 5.1 the correspondance
between the longer file names used by Otter et al. in [107] and graciously made
available at [106], and the shorter names which we shall use to refer to the datasets.
We also include images of their first two persistent homology diagrams in Figure 5.1.
While the datasets appear relatively unstructured, there are still things to note.
1The original Ripser package was written by Ulrich Bauer in C++. The version we use here is
a python adaptation, written by Christopher Tralie, Nathaniel Saul, and Rann Bar-On.
99
Celegans Dragon1000 Dragon2000
Fractal HIV1 House
Human Klein Network
Random Senate Vicsek
Figure 5.1: Persistence diagrams for the datasets in H0,H1. Note the expected
extremely high persistence element of H1 in Klein, as the Klein bottle has non-
trivial 1st homology. Also note the high isolated element of H0 in Senate?this
could indicate a senator or small group of senators with very distinctive voting
behaviour, or (more likely) the Democratic/Republican split.
100
1. Celegans is a neural net derived from the nervous system of the roundworm.
Each point is a neuron, and the distances between the points are given by the
inverse of the weight between the neuronal connections.
2. Dragon1000 and Dragon2000 consist of points on the surface of a statue
of a small dragon.
3. Fractal is a synthetic self-similar network. Due to the self-similarity, note
that the H0?s are very evenly distributed
4. Each point of HIV1 is a genomic sequence sampled from HIV, with distances
given by the Hamming distance. As Hamming distances tend to be large,
there are no low-persistence points of H0?the points are fairly isolated from
each other.
5. House and Senate represent the voting behaviour of the 104th US Congress.
Nodes are more or less similar depending on how often they voted together.
Note the high isolated element of H0 in Senate?this could indicate a senator
or small group of senators with very distinctive voting behaviour, or (more
likely) the Democratic/Republican split. The lack of a similar point in House
is due to the higher number of ?moderate? representatives.
6. Each point of Human is a human gene, with edge lengths given by correlation
level of expression of corresponding genes.
7. Klein is a synthetic data set sampled from a Klein bottle immersed into R3.
Hence, the presence of a very highly persistent element of H1.
101
8. Each node of Network is a scientist publishing papers on networks. Distances
are given by inverses of number of collaborations in publication history.
9. Random is a synthetic dataset consisting of random points drawn indepen-
dantly from a uniform distribution on a unit hypercube. The relatively tight
clustering of the H0 points is due to the infamous ?curse of dimensionality??
distances between random points become more similar as the ambient dimen-
sion increases.
10. Vicsek is a synthetic dataset modeling points contained in a box.
We might intuitively expect that, given a list of genes, if the expression of
each is well-correlated with the expression of the adjacent genes on the list, the
expressions of any two genes on the list should be well-correlated. Similarly, we
may expect a similar pattern with voting records. We see this expectation fulfilled
in the relative flatness of the H1 diagrams for the Human, Senate, and House
datasets. We have no such expectation for points on the surface of a statue, and
this is reflected in the cloud of off-diagonal points in the two Dragon datasets.
These datasets represent a wide cross section of some of the uses of persistent
homology. They contain both real-world and synthetic data, both naturally geomet-
ric and non-geometric data. Many of them have been used in published works using
persistent homology to study other fields (See [107] for more details). They thus
represent a good cross-section for us to study the practical aspects of computing
persistent homology. In the remainder of this chapter, we will measure the speed of
performing persistent homology calculations with these datasets.
102
5.1.2 Software
We benchmark against two sets of software to avoid any software-package spe-
cific effects. We use Ripser, as it is the fastest available software, and Dionysus as it
is one of the easiest to use. Dionysus represents a relatively straightforward imple-
mentation of the standard algorithms, and was written/is maintained by Morozov,
who is responsible for many of the original results in the field of computational
topology. Ripser takes advantage of many adaptations proposed earlier.2 The au-
thor has been unable to locate in the literature a detailed theoretical breakdown
of how much each of these speeds up the computations, although Bauer has some
experimental results in [15]. As used within this chapter, each takes in a list of
lists of floating point numbers, together with a maximum homology dimension, and
returns the persistent homology of the point cloud for that dimension and all lower.
5.2 Experiments for a baseline of persistent homology computation
speed
We expect the calculation speed of persistent homology to scale with several
factors. Most obviously, the speed of the calculation should depend on the number
of points in the point cloud, and whether we calculate just H[0], or H[0] and H[1],
and so forth. Moreover, we might expect it to depend on the intrinsic dimension of
the point cloud?the smallest Rd which the point cloud isometrically embeds into?as
2The Ripser paper itself [15] has a long section detailing influences and giving a geneology of
the major improvements, which we will not replicate here.
103
well as the ambient dimension.3 In this section, we run experiments attempting to
isolate the practical effect of each of these.
5.2.1 Scaling with number of points
A fully naive algorithm calculating H0,H1,H2 for a point cloud with( n) points
would calculate the birth times of all 3-simplices, which requires roughly n , which
4
is O(n4) simplices, followed by matrix reduction on the square matrix, which is
(# of simplices) ? (# of simplices), and as mentioned earlier, the simplest matrix
reduction algorithm on an n? n matrix is O(n3) . The two programs we are using
here are far from naive, so we expect them to do better than the implied O(n12)
bound, and they do.
Table 5.2 shows the results of an experiment to check the scaling of Dionysus
computation times as number of points increases.
We plot the best fit quartic in Figure 5.2. Interestingly, the residual of the
best-fit quartic for the final point is very small, suggesting that the best-fit quartic
for the initial nine points was successfully predicting the tenth point?a good sign
for our curve.
We run the same experiment with Ripser (Table 5.3 and Figure 5.3) and obtain
substantially the same result. While Ripser is much, much faster, it doesn?t scale
differently. This is perhaps unsurprising; several of Ripser?s improvements may
be expected to reduce the number of simplices for which calculations need to be
3Indeed, if the speed doesn?t depend on ambient dimension, our result of the previous chapter
will not help us in speeding it up.
104
# of points mean min max std dev
5 0.00005 0.00004 0.00013 0.00002
8 0.00023 0.00022 0.00027 0.00001
11 0.001 0.0009 0.0011 0.0001
17 0.009 0.007 0.015 0.001
25 0.048 0.043 0.055 0.003
38 0.355 0.308 0.504 0.049
57 2.366 2.08 2.888 0.2
85 17.437 15.44 21.061 1.269
128 118.477 107.598 139.04 7.827
192 849.037 754.377 961.577 51.48
288 6250.355 - - -
Table 5.2: Dionysus computation of subsamples of Dragon2000 dataset. Each row
except the last of this table is a set of 20 runs of Dionysus to calculate H0,H1,H2
for a random subsample of the dataset Dragon2000. The number of points in the
subsample is increased by 50% each row. The last row represents only a single run,
due to prohibitively high run time. For the set of 20 times, the mean, min, max,
and standard deviation are all given. All times are in seconds.
Figure 5.2: A graph of the points of Table 5.2, with the best-fit quartic line. Both
the polynomial and the image were created with Wolfram Alpha. The r2 value is
.999997.
105
Figure 5.3: A graph of the points of Table 5.3, with the best-fit quartic line. Both
the polynomial and the image were created with Wolfram Alpha. The r2 value is
greater than .999999.
performed by a constant factor but this does not necessarily improve scaling.
We conjecture that the implementation of persistent homology calculation for
Hd and below of a point cloud of n points in Dionysus is O(n
d+2), and the same for
Ripser.
5.2.2 Scaling with number of homology groups computed
Most applications of persistent homology in the literature that the author is
familiar with involve at most computations of H0 and H1. This is understandable?
the ?meaning? of non-trivial elements of the higher order homology groups of a
point cloud quickly becomes unclear. It is fortunate for us that these are rarely
useful in applications, as the expectation is that these higher groups would grow
to be unweildy to compute very swiftly, an expectation we will see confirmed here.
106
# of points mean min max std dev
50 0.026 0.025 0.03 0.001
75 0.084 0.080 0.097 0.003
112 0.181 0.173 0.199 0.005
169 0.602 0.569 0.645 0.018
253 2.191 2.02 2.272 0.068
380 7.937 7.272 8.397 0.339
570 27.947 24.783 29.468 1.226
854 98.891 91.065 110.853 5.226
1281 382.773 348.107 411.482 17.295
1922 1624.179 1437.346 1734.625 81.949
Table 5.3: Ripser computation of subsamples of Dragon2000 dataset. Each row of
this table is a set of 20 runs of Ripser to calculate H0,H1,H2 for a random subsample
of the dataset Dragon2000. The number of points in the subsample is increased by
50% each row. For the set of 20 times, the mean, min, max, and standard deviation
are all given. All times are in seconds.
Moreover, although testing this is beyond the scope of this dissertation, we expect
each additional higher homology group to make the scaling times with number of
points worse.
In Table 5.4, we show the time for Dionysus to compute the first several homol-
ogy groups of Random.4 Computational limitations prevent us from calculating
a high sample from H6, or indeed calculating any groups beyond that. With this
small a number of points, accurate curve fitting is not feasible, but it seems that
log of the number of seconds is increasing at a nearly constant rate. Table 5.5 gives
the same information for Ripser, with very similar results. As noted b(ef)ore, for the
naive algorithm to calculate Hk of n points requires calculations for
n simplices,
k
which (for k << n) grows roughly as O(nk). We thus observe that both theory and
observation is consistent with exponential growth, and feel some relief that these
4Random was picked due to low enough dimensionality to be tractable, but high enough to
have nontrivial higher homology.
107
Highest Hi computed mean min max std dev
0 0.019 0.018 0.03 0.003
1 0.29 0.287 0.308 0.004
2 3.672 3.649 3.699 0.012
3 36.161 35.999 36.553 0.149
4 285.756 284.89 287.456 0.516
5 1998.12 1978.995 2005.671 5.689
6 15041.805 - - -
Table 5.4: Dionysus computation of Random dataset. Each row of this table except
the last is a set of 20 runs of Dionysus to calculate H0 through Hi for the dataset
Random. The maximum homology dimension calculated increases each row. The
last row represents only a single run, due to prohibitively high run time. For the set
of 20 times, the mean, min, max, and standard deviation are all given. All times
are in seconds.
Highest Hi computed mean min max std dev
0 0.001 0.001 0.003 0
1 0.003 0.003 0.005 0
2 0.028 0.027 0.032 0.001
3 0.171 0.152 0.189 0.015
4 1.522 1.497 1.609 0.027
5 12.731 12.598 13.077 0.111
6 85.465 84.58 86.007 0.366
Table 5.5: Ripser computation of Random dataset. Each row of this table is a
set of 20 runs of Ripser to calculate H0 through Hi for the dataset Random. The
maximum homology dimension calculated increases each row. For the set of 20
times, the mean, min, max, and standard deviation are all given. All times are in
seconds.
108
intrinsic dimension mean min max std dev
3 3.773 3.76 3.809 0.012
53 3.786 3.766 3.872 0.025
103 3.774 3.765 3.804 0.012
153 3.764 3.757 3.796 0.011
203 3.762 3.74 3.818 0.016
253 3.756 3.749 3.789 0.011
303 3.76 3.749 3.794 0.015
353 3.81 3.748 4.552 0.171
403 3.754 3.732 3.795 0.012
453 3.769 3.742 4.02 0.06
Table 5.6: Ripser computation of the Vicsek dataset with dimensional noise.
Ripser computation of the Vicsek dataset with dimensional noise. Each row of
this table is a set of 20 runs of Ripser to calculate H0,H1,H2 for the dataset
Vicsek. First, the points are randomly rotated into RIntrinsic dimension. Then, a
small amount of noise is added to them (The noise is normally distributed with
mean 0 and standard deviation 5% of the diameter of the point set). Then the
points are randomly rotated into 1000 dimensions, and their persistent homology
is calculated. For the set of 20 times, the mean, min, max, and standard deviation
are all given. All times are in seconds.
higher homology groups are not desired for most applications.
5.2.3 Scaling with intrinsic dimension of the dataset
We don?t expect the calculation of persistent homology to depend much, if
at all, on the intrinsic dimension of the dataset. In Table 5.6, we see that indeed,
it does not?adding higher dimensional noise to the dataset has almost no effect
on the times to calculate persistent homology. Even going from 3 dimensions to
453 dimensions has almost no effect on the average time to calculate the persistent
homology through H2.
We do not perform this calculation with Dionysus because, as we will see in
the next Section, Dionysus scales quite badly with ambient dimension.
109
5.2.4 Scaling with ambient dimension
The extent to which the speed of persistent homology calculations depends
upon various parameters will, of course, change depending not only on what al-
gorithms are used to calculate it, but on the exact methods the software uses to
implement those calculations. Nowhere is that more apparent than in the effect of
the ambient dimension upon the speed of calculation. Ripser takes the input point
cloud, and immediately converts it to a matrix of pairwise distances; from then
on, a ?point? is merely a reference to a certain column of a matrix. Dionysus, on
the other hand, continues to view a point in Rd as a list of d coordinates, so any
manipulations of it require manipulating all the coordinates. Thus, a reduction in
the ambient dimension affects the computation time of Dionysus in almost direct
proportion. Ripser, in contrast, is affected through the first step, which is computa-
tionally already very fast, and further effects are primarily through changes which
affect the actual simplices and matrix reduction itself.
Table 5.7 shows the computation time for Dionysus to calculate the through
H1 of the Random dataset. The scaling is roughly linear in the number of ambient
dimensions. Ripser, on the other hand, is almost totally insensitive to the number
of ambient dimensions, as we will see in the next section.
5.3 Calculation of persistent homology via random projections
Corollary 4.3.3 in Chapter 4 tells us that persistent homology can be approxi-
mately preserved through random projections. In this Section, we test the increase
110
Ambient dimension mean min max std dev
1024 11.142 11.111 11.194 0.02
512 5.62 5.616 5.676 0.013
256 2.865 2.864 2.868 0.001
128 1.491 1.49 1.505 0.003
64 0.759 0.756 0.765 0.002
32 0.42 0.415 0.422 0.002
16 0.242 0.24 0.245 0.002
Table 5.7: Dionysus computation of the Random dataset isometrically embedded
in larger dimensions. Each row of this table is a set of 20 runs of Dionysus to
calculate H0,H1 for the dataset Random, after the points are randomly rotated
into Rambient dimension. Notice that halving the ambient dimension results in roughly
halving the computation time. For the set of 20 times, the mean, min, max, and
standard deviation are all given. All times are in seconds.
in speed this gives us with the fastest currently available software. Tables 5.8, 5.9,
5.10, 5.11, 5.12, 5.13 show the time it takes to calculate each dataset when it has
been random rotated into 1000 dimensions, and compares it to the time to calculate
after being randomly projected back down.
As the comparative importance of the time to calculate the matrix of pairwise
distances changes dramatically depending on how many homology groups we are
calculating, we include the results with just H0, H1 with H0, and all three of the
lowest groups through H2.
In the H0 alone case, the conversion from point cloud to distance matrix is
responsible for a higher portion of the run time. Thus, the advantages of random
projection are more pronounced. The run times for every one of our benchmark
datasets is decreased, on median by roughly 8%. One third of the datasets showed
speed increases ranging from nearly 25% to over 50%!
In the H1 case, our picture is still very promising. All but one dataset showed
111
Dataset mean min max std dev
Celegans (pre) 0.025 0.024 0.026 <.001
Celegans (post) 0.023 0.022 0.024 <.001
Dragon1000 (pre) 0.222 0.208 0.227 0.003
Dragon1000 (post) 0.217 0.189 0.221 0.007
Dragon2000 (pre) 0.895 0.889 0.919 0.007
Dragon2000 (post) 0.875 0.857 0.894 0.014
Fractal (pre) 0.075 0.075 0.075 <.001
Fractal (post) 0.047 0.047 0.048 <.001
HIV1 (pre) 0.257 0.255 0.257 0.001
HIV1 (post) 0.226 0.226 0.227 <.001
House (pre) 0.057 0.056 0.057 <.001
House (post) 0.043 0.042 0.043 <.001
Human (pre) 0.417 0.413 0.419 0.002
Human (post) 0.405 0.383 0.414 0.012
Klein (pre) 0.045 0.045 0.045 <.001
Klein (post) 0.042 0.042 0.043 <.001
Network (pre) 0.041 0.041 0.042 <.001
Network (post) 0.038 0.036 0.039 0.001
Random (pre) 0.002 0.002 0.002 <.001
Random (post) 0.001 0.001 0.001 <.001
Senate (pre) 0.005 0.005 0.005 <.001
Senate (post) 0.003 0.003 0.003 <.001
Vicsek (pre) 0.026 0.025 0.027 <.001
Vicsek (post) 0.024 0.023 0.026 0.001
Table 5.8: Ripser computation of many datasets, first randomly rotated into 1000
dimensions, then projected down to 10. Each row of this table is a set of 20 runs of
Ripser to calculate H0. For each set of 20 times, the mean, min, max, and standard
deviation are all given. All times are in seconds.
112
Dataset mean min max std dev
Celegans difference 0.002 0.002 0.003 <.001
% difference 8.478 7.9 11.482 0.707
Dragon1000 difference 0.005 -0.011 0.034 0.008
% difference 2.369 -5.266 15.193 3.541
Dragon2000 difference 0.02 -0.001 0.037 0.012
% difference 2.284 -0.151 4.041 1.301
Fractal difference 0.028 0.027 0.028 <.001
% difference 36.943 36.558 37.397 0.226
HIV1 difference 0.03 0.029 0.031 0.001
% difference 11.833 11.351 12.238 0.248
House difference 0.014 0.013 0.015 <.001
% difference 24.383 23.78 25.537 0.407
Human difference 0.012 0.003 0.032 0.011
% difference 2.893 0.786 7.775 2.639
Klein difference 0.003 0.003 0.003 <.001
% difference 6.673 5.597 7.556 0.531
Network difference 0.003 0.003 0.005 0.001
% difference 8.46 6.996 12.403 1.552
Random difference 0.001 0.001 0.001 <.001
% difference 53.377 51.346 55.234 1.425
Senate difference 0.002 0.002 0.002 <.001
% difference 41.845 40.434 43.082 0.83
Vicsek difference 0.002 0.001 0.003 0.001
% difference 6.357 3.261 10.508 2.59
Table 5.9: Ripser computation of many datasets, first randomly rotated into 1000
dimensions, then projected down to 10. Each row of this table is the differences
between a set of 20 runs of Ripser to calculate H0 pre- and post-projection. For
each set of 20 differences, the mean, min, max, and standard deviation are all given.
All times are in seconds.
113
Dataset mean min max std dev
Celegans (pre) 0.079 0.078 0.082 0.001
Celegans (post) 0.078 0.075 0.082 0.001
Dragon1000 (pre) 0.675 0.673 0.677 0.001
Dragon1000 (post) 0.668 0.61 0.719 0.027
Dragon2000 (pre) 3.541 3.422 3.696 0.068
Dragon2000 (post) 3.367 3.028 3.884 0.208
Fractal (pre) 0.185 0.183 0.188 0.001
Fractal (post) 0.165 0.145 0.187 0.014
HIV1 (pre) 1.353 1.344 1.379 0.008
HIV1 (post) 0.999 0.898 1.087 0.049
House (pre) 0.138 0.136 0.14 0.001
House (post) 0.12 0.106 0.143 0.013
Human (pre) 1.038 1.031 1.043 0.003
Human (post) 1.43 1.374 1.497 0.03
Klein(pre) 0.284 0.281 0.286 0.001
Klein (post) 0.213 0.142 0.329 0.052
Network (pre) 0.154 0.153 0.155 <.001
Network (post) 0.077 0.075 0.08 0.001
Random (pre) 0.004 0.004 0.004 <.001
Random (post) 0.003 0.002 0.003 <.001
Senate (pre) 0.011 0.011 0.011 <.001
Senate (post) 0.009 0.009 0.009 <.001
Vicsek (pre) 0.087 0.087 0.088 <.001
Vicsek (post) 0.049 0.046 0.053 0.002
Table 5.10: Ripser computation of many datasets, first randomly rotated into 1000
dimensions, then projected down to 10. Each row of this table is a set of 20 runs of
Ripser to calculate H1. For each set of 20 times, the mean, min, max, and standard
deviation are all given. All times are in seconds.
114
Dataset mean min max std dev
Celegans difference 0.0004 -0.001 0.004 0.001
% difference 0.514 -1.42 4.939 1.543
Dragon1000 difference 0.008 -0.042 0.066 0.027
% difference 1.124 -6.246 9.763 3.993
Dragon2000 difference 0.173 -0.237 0.515 0.211
% difference 4.868 -6.489 14.457 5.914
Fractal difference 0.019 -0.001 0.038 0.013
% difference 10.5 -0.456 20.863 7.124
HIV1 difference 0.354 0.263 0.455 0.048
% difference 26.179 19.498 33.619 3.582
House difference 0.018 -0.005 0.031 0.012
% difference 13.158 -3.263 22.713 8.858
Human difference -0.392 -0.454 -0.335 0.03
% difference -37.826 -43.496 -32.19 2.922
Kleinbottle difference 0.071 -0.045 0.142 0.052
% difference 25.076 -15.872 49.918 18.374
Network difference 0.077 0.073 0.079 0.001
% difference 50.131 47.921 51.125 0.809
Random difference 0.001 0.001 0.002 <.001
% difference 36.444 31.911 43.05 3.01
Senate difference 0.002 0.001 0.002 <.001
% difference 16.909 11.244 20.696 2.541
Vicsek difference 0.038 0.034 0.041 0.002
% difference 43.78 39.217 46.789 1.966
Table 5.11: Ripser computation of many datasets, first randomly rotated into 1000
dimensions, then projected down to 10. Each row of this table is the differences
between a set of 20 runs of Ripser to calculate H1 pre- and post-projection. For
each set of 20 differences, the mean, min, max, and standard deviation are all given.
All times are in seconds.
115
Dataset mean min max std dev
Celegans (pre) 3.534 3.495 3.986 0.104
Celegans (post) 3.527 3.314 3.718 0.106
Dragon1000 (pre) 166.588 165.612 169.937 0.831
Dragon1000 (post) 162.747 154.296 171.653 4.366
Dragon2000 (pre) 1716.02 1696.383 1739.451 10.616
Dragon2000 (post) 1751.798 1692.084 1833.825 38.766
Fractal (pre) 19.115 19.015 19.291 0.075
Fractal (post) 19.997 17.429 21.938 1.09
HIV1 (pre) 273.694 272.051 275.807 0.998
HIV1 (post) 247.278 226.706 266.279 10.406
House (pre) 13.73 13.645 13.844 0.045
House (post) 12.942 11.957 13.942 0.544
Human (pre) 504.035 501.405 505.167 0.809
Human (post) 554.189 504.662 581.235 17.456
Klein (pre) 14.852 14.737 14.986 0.05
Klein (post) 14.7 13.119 16.288 0.885
Network (pre) 12.281 12.193 12.349 0.046
Network (post) 8.132 7.48 8.406 0.231
Random (pre) 0.028 0.028 0.031 0.001
Random (post) 0.026 0.024 0.027 0.001
Senate (pre) 0.146 0.144 0.148 0.001
Senate (post) 0.111 0.106 0.129 0.005
Vicsek(pre) 3.729 3.709 3.793 0.02
Vicsek (post) 3.687 3.474 3.882 0.109
Table 5.12: Ripser computation of many datasets, first randomly rotated into 1000
dimensions, then projected down to 10. Each row of this table is a set of 20 runs of
Ripser to calculate H2. For each set of 20 times, the mean, min, max, and standard
deviation are all given. All times are in seconds.
116
Dataset mean min max std dev
Celegans difference 0.007 -0.142 0.268 0.112
% difference 0.153 -4.041 6.722 3.088
Dragon1000 difference 3.84 -1.716 11.316 3.998
% difference 2.31 -1.01 6.833 2.403
Dragon2000 difference -35.778 -117.059 21.915 38.649
% difference -2.087 -6.86 1.279 2.253
Fractal difference -0.882 -2.766 1.595 1.078
% difference -4.613 -14.426 8.383 5.637
HIV1 difference 26.416 7.674 46.131 10.485
% difference 9.65 2.801 16.908 3.829
House difference 0.788 -0.205 1.728 0.55
% difference 5.737 -1.495 12.625 4.001
Human difference -50.154 -77.709 -0.14 17.605
% difference -9.952 -15.433 -0.028 3.494
Klein difference 0.153 -1.448 1.742 0.884
% difference 1.029 -9.756 11.724 5.96
Network difference 4.15 3.87 4.814 0.233
% difference 33.789 31.526 39.157 1.886
Random difference 0.003 0.0004 0.005 0.001
% difference 9.477 1.512 15.898 3.76
Senate difference 0.035 0.015 0.039 0.005
% difference 23.756 10.558 26.991 3.453
Vicsek difference 0.041 -0.164 0.296 0.12
% difference 1.1 -4.419 7.806 3.204
Table 5.13: Ripser computation of many datasets, first randomly rotated into 1000
dimensions, then projected down to 10. Each row of this table is the differences
between a set of 20 runs of Ripser to calculate H2 pre- and post-projection. For
each set of 20 differences, the mean, min, max, and standard deviation are all given.
All times are in seconds.
117
an increase, and the median speedup surprisingly increased, nearly doubling to
roughly 15%! Nearly half the datasets had speed increases of over 25%, and again
the greatest increase was over 50%.
Random projection results in less of a speed up in the H2 case. While the
median case is still faster post-projection, it is only by about 1.7%, and, while two
of the datasets showed were sped up by significantly more than 10%, it is only those
two. Additionally, for three datasets, the computations were actually slowed by the
lower ambient dimension.
As we have previously discussed, applied computational topology rarely makes
use of the homology groups above H1 in practice, so random projection may be useful
in cases where a sacrifice of accuracy in favor of speed is desirable.
5.3.1 Reasons to expect computational slowdown or speedup
Ideally, we would be able to characterize what datasets are good candidates
for significant speedup or slowdown through random projection. In particular, the
datasets Human and HIV1 are similar on many metrics?similar number of points,
similar ambient dimension, both modeling genomic data, albeit with different mod-
els.
We can?t even claim that the computational speedup for H0 is correlated with
that for H2?the datasets Fractal and Random each have very significant speedups
for H0, but Fractal experiences the second-worst slow-down for H2, while Random
is one of the four most positively effected.
118
5.4 Heuristics for better preservation than guaranteed
We close with some discussion of the accuracy of this method of estimating
persistent homology. Ramamurthy, Varshney, and Thiagarajan [113] have given
some empirical evidence that persistent betti numbers can be approximately recon-
structed with high probability after random projection. And even in cases that the
Johnson Lindenstrauss Lemma (Theorem 4.3.1) does not give any guarantees, it is
not uncommon for significant features of homology to be well-preserved, as in Figure
5.4.
119
Figure 5.4: Persistence diagrams for the Dragon1000 dataset, randomly rotated in
1000 dimensions, and projected down to 10. Notice how, in most of the diagrams,
a green dot representing an element of H2 has high persistence. On the other hand,
notice the difference in scales?between images 5 and 10, the vertical scale nearly
doubles! To avoid unconscious bias effects, Dragon1000 was the first dataset the
author performed this with, and these are the results of the first 20 attempts.
120
Bibliography
[1] H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman, S. Che-
pushtanova, E. Hanson, F. Motta, and L. Ziegelmeier. Persistence images:
A stable vector representation of persistent homology. Journal of Machine
Learning Research, 18, 2017.
[2] R. J. Adler, O. Bobrowski, M. S. Borman, E. Subag, S. Weinberger, et al.
Persistent homology for random fields and complexes. In Borrowing strength:
theory powering applications?a Festschrift for Lawrence D. Brown, pages 124?
143. Institute of Mathematical Statistics, 2010.
[3] S. Agostinelli, J. Allison, K. a. Amako, J. Apostolakis, H. Araujo, P. Arce,
M. Asai, D. Axen, S. Banerjee, G. . Barrand, et al. GEANT4?a simulation
toolkit. Nuclear instruments and methods in physics research section A: Accel-
erators, Spectrometers, Detectors and Associated Equipment, 506(3):250?303,
2003.
[4] H. Alt. Computational discrete mathematics: advanced lectures, volume 2122.
Springer, 2003.
[5] A. Antoszewski, C.-J. Feng, B. P. Vani, E. H. Thiede, L. Hong, J. Weare,
A. Tokmakoff, and A. R. Dinner. Insulin dissociates by diverse mechanisms
of coupled unfolding and unbinding. The Journal of Physical Chemistry B,
124(27):5571?5587, 2020.
[6] M. A. Armstrong. Basic topology. Springer Science & Business Media, 2013.
[7] S. Arya, J.-D. Boissonnat, K. Dutta, and M. Lotz. Dimensionality reduc-
tion for k-distance applied to persistent homology. Journal of Applied and
Computational Topology, 5(4):671?691, 2021.
[8] D. Attali and A. Lieutier. Geometry-driven collapses for converting a C?ech
complex into a triangulation of a nicely triangulable shape. Discrete & Com-
putational Geometry, 54(4):798?825, 2015.
121
[9] D. Attali, A. Lieutier, and D. Salinas. Vietoris?Rips complexes also provide
topologically correct reconstructions of sampled shapes. Computational Ge-
ometry, 46(4):448?465, 2013.
[10] W. Bae, J. Yoo, and J. Chul Ye. Beyond deep residual learning for image
restoration: Persistent homology-guided manifold simplification. In Proceed-
ings of the IEEE conference on computer vision and pattern recognition work-
shops, pages 145?153, 2017.
[11] R. Balan, M. Begue?, C. Clark, and K. Okoudjou. Optimization methods for
frame conditioning and application to graph Laplacian scaling. In Frames and
Other Bases in Abstract and Function Spaces, pages 27?45. Springer, 2017.
[12] R. G. Baraniuk and M. B. Wakin. Random projections of smooth manifolds.
Foundations of computational mathematics, 9(1):51?77, 2009.
[13] S. Basu and L. Parida. Spectral sequences, exact couples and persistent ho-
mology of filtrations. Expositiones Mathematicae, 35(1):119?132, 2017.
[14] U. Bauer. Ripser: Efficient computation of Vietoris?Rips persistence barcodes.
[15] U. Bauer. Ripser: efficient computation of Vietoris-Rips persistence barcodes.
Journal of Applied and Computational Topology, 2021.
[16] U. Bauer and H. Edelsbrunner. The Morse theory of C?ech and Delaunay
complexes. Transactions of the American Mathematical Society, 369(5):3741?
3762, 2017.
[17] P. Bendich, H. Edelsbrunner, D. Morozov, and A. Patel. The robustness of
level sets. In European Symposium on Algorithms, pages 1?10. Springer, 2010.
[18] E. Betti. Sopra gli spazi di un numero qualunque di dimensioni. Annali di
Matematica Pura ed Applicata (1867-1897), 4(1):140?158, 1870.
[19] S. Bhattacharya. Discrete optimal search library (dosl): A template-
based C++ library for discrete optimal search, 2017. Available at
https://github.com/subh83/DOSL.
[20] S. Bhattacharya, R. Ghrist, and V. Kumar. Persistent homology for path plan-
ning in uncertain environments. IEEE Transactions on Robotics, 31(3):578?
590, 2015.
[21] K. Borsuk. On the imbedding of systems of compacta in simplicial complexes.
Fundamenta Mathematicae, 35(1):217?234, 1948.
[22] A. K. Bousfield and D. M. Kan. Homotopy limits, completions and localiza-
tions, volume 304. Springer Science & Business Media, 1972.
[23] G. Bradski. The OpenCV Library. Dr. Dobb?s Journal of Software Tools,
2000.
122
[24] G. E. Bredon. Topology and geometry, volume 139. Springer Science & Busi-
ness Media, 2013.
[25] P. Bubenik, V. De Silva, and J. Scott. Metrics for generalized persistence
modules. Foundations of Computational Mathematics, 15(6):1501?1531, 2015.
[26] P. Bubenik et al. Statistical topological data analysis using persistence land-
scapes. J. Mach. Learn. Res., 16(1):77?102, 2015.
[27] P. Bubenik and P. T. Kim. A statistical approach to persistent homology.
Homology, homotopy and Applications, 9(2):337?362, 2007.
[28] P. Bubenik and J. A. Scott. Categorification of persistent homology. Discrete
& Computational Geometry, 51(3):600?627, 2014.
[29] M. Buchet, F. Chazal, S. Y. Oudot, and D. R. Sheehy. Efficient and robust
persistent homology for measures. Computational Geometry, 58:70?96, 2016.
[30] F. Cagliari, M. Ferri, and P. Pozzi. Size functions from a categorical viewpoint.
Acta Applicandae Mathematica, 67(3):225?235, 2001.
[31] E. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: Exact
signal reconstruction from highly incomplete frequency information. IEEE
Trans. Inform. Theory, 52(2):489?509, 2006.
[32] E. Candes and T. Tao. Near optimal signal recovery from random projections:
Universal encoding strategies? IEEE Trans. Inform. Theory, 52(12):5406?
5425, 2006.
[33] E. J. Candes, J. K. Romberg, and T. Tao. Stable signal recovery from incom-
plete and inaccurate measurements. Communications on Pure and Applied
Mathematics: A Journal Issued by the Courant Institute of Mathematical Sci-
ences, 59(8):1207?1223, 2006.
[34] G. Carlsson. Topology and data. Bulletin of the American Mathematical
Society, 46(2):255?308, 2009.
[35] G. Carlsson and V. De Silva. Zigzag persistence. Foundations of computational
mathematics, 10(4):367?405, 2010.
[36] G. Carlsson, V. De Silva, and D. Morozov. Zigzag persistent homology and
real-valued functions. In Proceedings of the twenty-fifth annual symposium on
Computational geometry, pages 247?256. ACM, 2009.
[37] G. Carlsson, T. Ishkhanov, V. De Silva, and A. Zomorodian. On the local
behavior of spaces of natural images. International journal of computer vision,
76(1):1?12, 2008.
[38] G. Carlsson and A. Zomorodian. The theory of multidimensional persistence.
Discrete & Computational Geometry, 42(1):71?93, 2009.
123
[39] E. W. Chambers, B. T. Fasy, and L. Ziegelmeier. Research in Computational
Topology. Springer, 2018.
[40] F. Chazal, D. Cohen-Steiner, M. Glisse, L. J. Guibas, and S. Y. Oudot. Prox-
imity of persistence modules and their diagrams. In Proceedings of the twenty-
fifth annual symposium on Computational geometry, pages 237?246, 2009.
[41] F. Chazal, B. Fasy, F. Lecci, B. Michel, A. Rinaldo, and L. Wasserman. Sub-
sampling methods for persistent homology. In International Conference on
Machine Learning, pages 2143?2151. PMLR, 2015.
[42] C. Chen and M. Kerber. Persistent homology computation with a twist. In
Proceedings 27th European workshop on computational geometry, volume 11,
pages 197?200, 2011.
[43] G. Christie, A. Shoemaker, K. Kochersberger, P. Tokekar, L. McLean, and
A. Leonessa. Radiation search operations using scene understanding with
autonomous uav and ugv. Journal of Field Robotics, 34(8):1450?1468, 2017.
[44] A. Cloninger. Exploiting data-dependent structure for improving sensor ac-
quisition and integration. PhD thesis, University of Maryland, College Park,
2014.
[45] D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. Stability of persistence dia-
grams. In Proceedings of the twenty-first annual symposium on Computational
geometry, pages 263?271. ACM, 2005.
[46] D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. Extending persistence using
Poincare? and Lefschetz duality. Foundations of Computational Mathematics,
9(1):79?103, 2009.
[47] D. Cohen-Steiner, H. Edelsbrunner, J. Harer, and D. Morozov. Persistent
homology for kernels, images, and cokernels. In Proceedings of the twenti-
eth annual ACM-SIAM symposium on Discrete algorithms, pages 1011?1020.
SIAM, 2009.
[48] W. Crawley-Boevey. Decomposition of pointwise finite-dimensional persis-
tence modules. Journal of Algebra and its Applications, 14(05):1550066, 2015.
[49] F. H. Croom. Basic concepts of algebraic topology. Springer Science & Business
Media, 2012.
[50] J. Curry, R. Ghrist, and V. Nanda. Discrete morse theory for computing cellu-
lar sheaf cohomology. Foundations of Computational Mathematics, 16(4):875?
897, 2016.
[51] W. Czaja and W. Li. Analysis of time-frequency scattering transforms. Applied
and Computational Harmonic Analysis, 47(1):149?171, 2019.
124
[52] W. Czaja and W. Li. Rotationally invariant time?frequency scattering trans-
forms. Journal of Fourier Analysis and Applications, 26(1):1?23, 2020.
[53] W. Czaja, B. Manning, L. McLean, and J. M. Murphy. Fusion of aerial gamma-
ray survey and remote sensing data for a deeper understanding of radionuclide
fate after radiological incidents: examples from the Fukushima Dai-Ichi re-
sponse. Journal of Radioanalytical and Nuclear Chemistry, 307(3):2397?2401,
2016.
[54] S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and
Lindenstrauss. Random Structures & Algorithms, 22(1):60?65, 2003.
[55] V. De Silva and G. E. Carlsson. Topological estimation using witness com-
plexes. SPBG, 4:157?166, 2004.
[56] V. De Silva and R. Ghrist. Coverage in sensor networks via persistent homol-
ogy. Algebraic & Geometric Topology, 7(1):339?358, 2007.
[57] V. De Silva, R. Ghrist, and A. Muhammad. Blind swarms for coverage in 2-d.
In Robotics: Science and Systems, pages 335?342, 2005.
[58] V. De Silva, D. Morozov, and M. Vejdemo-Johansson. Dualities in persistent
(co) homology. Inverse Problems, 27(12):124003, 2011.
[59] V. De Silva, D. Morozov, and M. Vejdemo-Johansson. Persistent cohomology
and circular coordinates. Discrete & Computational Geometry, 45(4):737?759,
2011.
[60] T. K. Dey, F. Me?moli, and Y. Wang. Multiscale mapper: Topological sum-
marization via codomain covers. In Proceedings of the twenty-seventh annual
acm-siam symposium on discrete algorithms, pages 997?1013. SIAM, 2016.
[61] A. Dick and H. Weyl. Emmy Noether, 1882-1935. Springer, 1981.
[62] T. Doster. Harmonic analysis inspired data fusion for applications in remote
sensing. PhD thesis, University of Maryland, College Park, 2014.
[63] D. S. Dummit and R. M. Foote. Abstract algebra. Wiley Hoboken, 3 edition,
2004.
[64] H. Edelsbrunner. Alpha shapes-a survey. Tessellations in the Sciences, 27:1?
25, 2010.
[65] H. Edelsbrunner and J. Harer. Persistent homology-a survey. Contemporary
mathematics, 453:257?282, 2008.
[66] H. Edelsbrunner and J. Harer. Computational topology: an introduction.
American Mathematical Soc., 2010.
125
[67] H. Edelsbrunner, D. Kirkpatrick, and R. Seidel. On the shape of a set of points
in the plane. IEEE Transactions on information theory, 29(4):551?559, 1983.
[68] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence
and simplification. In Foundations of Computer Science, 2000. Proceedings.
41st Annual Symposium on, pages 454?463. IEEE, 2000.
[69] H. Edelsbrunner and D. Morozov. Persistent homology: theory and prac-
tice. Technical report, Lawrence Berkeley National Lab.(LBNL), Berkeley,
CA (United States), 2012.
[70] H. Edelsbrunner and E. P. Mu?cke. Three-dimensional alpha shapes. ACM
Transactions on Graphics (TOG), 13(1):43?72, 1994.
[71] S. Eilenberg and J. A. Zilber. Semi-simplicial complexes and singular homol-
ogy. Annals of Mathematics, pages 499?513, 1950.
[72] J. F. Espinoza, R. Herna?ndez-Amador, H. A. Herna?ndez-Herna?ndez, and
B. Ramonetti-Valencia. A numerical approach for the filtered generalized
C?ech complex. Algorithms, 13(1):11, 2020.
[73] L. Euler. Elementa doctrinae solidorum. Novi commentarii academiae scien-
tiarum Petropolitanae, pages 109?140, 1758.
[74] B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan, and A. Singh.
Confidence sets for persistence diagrams. The Annals of Statistics, 42(6):2301?
2339, 2014.
[75] A. T. Fomenko. Visual geometry and topology. Springer Science & Business
Media, 2012.
[76] P. Frosini. A distance for similarity classes of submanifolds of a Euclidean
space. Bulletin of the Australian Mathematical Society, 42(3):407?415, 1990.
[77] K. Fukumizu, F. R. Bach, and M. I. Jordan. Kernel dimension reduction in
regression. The Annals of Statistics, 37(4):1871?1905, 2009.
[78] R. Ghrist. Barcodes: the persistent topology of data. Bulletin of the American
Mathematical Society, 45(1):61?75, 2008.
[79] R. W. Ghrist. Elementary applied topology, volume 1. Createspace Seattle,
WA, 2014.
[80] D. Govc. On the definition of the homological critical value. Journal of
Homotopy and Related Structures, 11(1):143?151, 2016.
[81] A. Hafftka. Tensor completion for multidimensional inverse problems with
applications to magnetic resonance relaxometry. PhD thesis, University of
Maryland, College Park, 2016.
126
[82] A. Halevy. Extensions of Laplacian eigenmaps for manifold learning. PhD
thesis, University of Maryland, College Park, 2011.
[83] A. Hatcher. Algebraic topology. Cambridge university press, 2005.
[84] J.-C. Hausmann et al. On the Vietoris-Rips complexes and a cohomology
theory for metric spaces. Annals of Mathematics Studies, 138:175?188, 1995.
[85] G. Henselman and R. Ghrist. Matroid filtrations and computational persistent
homology. arXiv:1606.00199, 2016.
[86] K. P. Heuston. The subfornical organ and water deprivation in pigeons. Har-
vard University, 1994.
[87] W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into
a Hilbert space. Contemporary mathematics, 26:189?206, 1984.
[88] T. Kaczynski, K. M. Mischaikow, and M. Mrozek. Computational homology,
volume 157. Springer, 2004.
[89] M. Kerber and R. Sharathkumar. Approximate C?ech complex in low and high
dimensions. In International Symposium on Algorithms and Computation,
pages 666?676. Springer, 2013.
[90] K. Kochersberger, J. Peterson, P. Kumar, J. Bird, M. McLean, W. Czaja,
W. Li, and N. Monson. Unmanned aircraft applications in radiological sur-
veys. In 2018 IEEE International Symposium on Technologies for Homeland
Security (HST), pages 1?5. IEEE, 2018.
[91] M. P. Lesnick. Multidimensional interleavings and applications to topological
inference. Stanford University, 2012.
[92] W. Li. Topics in Harmonic Analysis, Sparse Representations, and Data Anal-
ysis. PhD thesis, University of Maryland, College Park, 2018.
[93] N. O. Malott, A. M. Sens, and P. A. Wilsey. Topology preserving data reduc-
tion for computing persistent homology. In 2020 IEEE International Confer-
ence on Big Data (Big Data), pages 2681?2690. IEEE, 2020.
[94] S. Mandal. Applications of Persistent Homology and Cycles. PhD thesis, The
Ohio State University, 2020.
[95] W. S. Massey. A basic course in algebraic topology, volume 127. Springer,
2019.
[96] J. P. May. A concise course in algebraic topology. University of Chicago press,
1999.
[97] J. P. May. The geometry of iterated loop spaces, volume 271. Springer, 2006.
127
[98] K. Mischaikow and V. Nanda. Morse theory for filtrations and efficient
computation of persistent homology. Discrete & Computational Geometry,
50(2):330?353, 2013.
[99] D. Morozov. Dionysus. Software available at http://www. mrzv.
org/software/dionysus, 2012.
[100] D. Morozov and A. Nigmetov. Towards lockfree persistent homology. In
Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and
Architectures, pages 555?557, 2020.
[101] J. M. Murphy. Anisotropic harmonic analysis and integration of remotely
sensed data. PhD thesis, University of Maryland, College Park, 2015.
[102] M. Nicolau, A. J. Levine, and G. Carlsson. Topology based data analysis iden-
tifies a subgroup of breast cancers with a unique mutational profile and excel-
lent survival. Proceedings of the National Academy of Sciences, 108(17):7265?
7270, 2011.
[103] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submani-
folds with high confidence from random samples. Discrete & Computational
Geometry, 39(1-3):419?441, 2008.
[104] P. Niyogi, S. Smale, and S. Weinberger. A topological view of unsupervised
learning from noisy data. SIAM Journal on Computing, 40(3):646?663, 2011.
[105] F. O. N. Njeunje. Computational methods in machine learning: transport
model, Haar wavelet, DNA classification, and MRI. PhD thesis, University of
Maryland, College Park, 2018.
[106] N. Otter. Ph-roadmap, 2017.
[107] N. Otter, M. A. Porter, U. Tillmann, P. Grindrod, and H. A. Harrington.
A roadmap for the computation of persistent homology. EPJ Data Science,
6(1):17, 2017.
[108] S. Y. Oudot. On the topology of the restricted Delaunay triangulation and
witness complex in higher dimensions. arXiv:0803.1296, 2008.
[109] S. Y. Oudot. Persistence theory: from quiver representations to data analysis,
volume 209. American Mathematical Society Providence, 2015.
[110] J. B. Pe?rez, S. Hauke, U. Lupo, M. Caorsi, and A. Dassatti. giotto-ph: A
python library for high-performance computation of persistent homology of
Vietoris-Rips filtrations. arXiv:2107.05412, 2021.
[111] H. Poincare?. Analysis situs. Gauthier-Villars, 1895.
[112] D. Quillen. Higher algebraic k-theory: I. In Higher K-theories, pages 85?147.
Springer, 1973.
128
[113] K. N. Ramamurthy, K. R. Varshney, and J. J. Thiagarajan. Computing per-
sistent homology under random projection. In 2014 IEEE Workshop on Sta-
tistical Signal Processing (SSP), pages 105?108. IEEE, 2014.
[114] B. Riemann. Theorie der Abel?schen functionen. Georg Reimer, 1857.
[115] V. Robins. Towards computing homology from finite approximations. In
Topology proceedings, volume 24, pages 503?532, 1999.
[116] M. Robinson. Topological signal processing, volume 81. Springer, 2014.
[117] L. K. Saul, K. Q. Weinberger, F. Sha, J. Ham, and D. D. Lee. Spectral
methods for dimensionality reduction. Semi-supervised learning, 3, 2006.
[118] G. Segal. Classifying spaces and spectral sequences. Publications
Mathe?matiques de l?IHE?S, 34:105?112, 1968.
[119] D. R. Sheehy. Linear-size approximations to the Vietoris?Rips filtration. Dis-
crete & Computational Geometry, 49(4):778?796, 2013.
[120] D. R. Sheehy. The persistent homology of distance functions under random
projection. In Proceedings of the thirtieth annual symposium on Computational
geometry, pages 328?334, 2014.
[121] G. Singh, F. Memoli, T. Ishkhanov, G. Sapiro, G. Carlsson, and D. L. Ringach.
Topological analysis of population activity in visual cortex. Journal of vision,
8(8):11?11, 2008.
[122] E. H. Spanier. Algebraic topology. Springer Science & Business Media, 1989.
[123] C. Tralie, N. Saul, and R. Bar-On. Ripser.py: A lean persistent homology
library for python. The Journal of Open Source Software, 3(29):925, Sep
2018.
[124] B. Walker. Topological Structure of Spatially-Distributed Network Coded In-
formation. PhD thesis, University of Maryland, College Park, 2014.
[125] S. Weinberger. What is... persistent homology? Notices of the AMS, 58(1):36?
39, 2011.
[126] J. H. Whitehead. Combinatorial homotopy. i. Bulletin of the American Math-
ematical Society, 55(3):213?245, 1949.
[127] H. Whitney. Differentiable manifolds. Annals of Mathematics, pages 645?680,
1936.
[128] A. Zomorodian and G. Carlsson. Computing persistent homology. Discrete &
Computational Geometry, 33(2):249?274, 2005.
[129] A. J. Zomorodian. Topology for computing, volume 16. Cambridge university
press, 2005.
129