ABSTRACT

Title of Dissertation: Statistical Network Analysis of High-Dimensional Neuroimaging Data With Complex Topological Structures

Tong Lu
Doctor of Philosophy, 2023

Dissertation Directed by: Professor Shuo Chen
Department of Mathematics

This dissertation contains three projects that collectively tackle statistical challenges in the field of high-dimensional brain connectome data analysis and enhance our understanding of the intricate workings of the human brain. Project 1 proposes a novel network method for detecting brain-disease-related alterations in voxel-pair-level brain functional connectivity with spatial constraints, thus improving spatial specificity and sensitivity. Its effectiveness is validated through extensive simulations and real data applications in nicotine addiction and schizophrenia studies. Project 2 introduces a multivariate multiple imputation method specifically designed for voxel-level neuroimaging data in high dimensions, based on Bayesian models and Markov chain Monte Carlo processes. On both synthetic data and real neurovascular water exchange data extracted from a neuroimaging dataset in a schizophrenia study, our method achieves high imputation accuracy and computational efficiency. Project 3 develops a multi-level network model based on graph combinatorics that captures vector-to-matrix associations between brain structural imaging measures and functional connectomic networks. The validity of the proposed model is demonstrated through extensive simulations and a real structure-function imaging dataset from UK Biobank. These three projects contribute innovative methodologies and insights that advance neuroimaging data analysis, including improvements in spatial specificity, statistical power, imputation accuracy, and computational efficiency when revealing the brain’s complex neurological patterns.
STATISTICAL NETWORK ANALYSIS OF HIGH-DIMENSIONAL NEUROIMAGING DATA WITH COMPLEX TOPOLOGICAL STRUCTURES

by

Tong Lu

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy
2023

Advisory Committee:
Professor Shuo Chen, Chair/Advisor
Professor Vince Lyzinski
Professor Tianzhou Ma
Professor Paul Smith
Professor Xin He

© Copyright by Tong Lu 2023

Preface

This dissertation represents the culmination of a research journey spanning several years in the field of brain imaging data analysis and its implications in unlocking the intricate workings of the human brain. The completion of this research endeavor has been made possible through funding from the National Institutes of Health under Award Numbers 1DP1DA04896801, EB008432, and EB008281. It is with immense pride and gratitude that I present this work to the academic community.

The motivation behind this research stemmed from the analytical challenges posed by the complex and entangled nature of high-dimensional neuroimaging data, and from the need to advance statistical methodologies in order to disentangle such data and further reveal various pathological and structural association mechanisms within the brain functional connectome. Through the course of this dissertation, I embarked on three distinct projects, each aimed at addressing specific statistical challenges and offering solutions to the field of neuroscience. These projects have not only introduced novel statistical methodologies on a theoretical level, but have also shed light on their practical applicability in neuroimaging data analysis.

It is my sincerest hope that this dissertation contributes to the field of neuroscience and serves as a stepping stone for future research in statistical network models and in understanding the human brain connectome.
May it inspire further exploration, spark curiosity, and foster innovation in the scientific community.

Acknowledgments

I owe my heartfelt gratitude to all the people who have made this thesis possible. Their unwavering support and contributions have profoundly shaped my PhD experience into one that I will cherish forever.

First and foremost, I would like to express my sincere gratitude to my advisor, Professor Shuo Chen, for granting me an invaluable opportunity to engage in challenging yet immensely fascinating and meaningful projects on brain connectome data over the past five years. His steadfast dedication, guidance, support, and patience have played a crucial role in making this five-year journey exceptionally rewarding and unforgettable. It has been a pleasure to work with and learn from such an extraordinary individual.

I would also like to thank my committee members, Professor Vince Lyzinski, Professor Tianzhou Ma, Professor Paul Smith, and Professor Xin He for graciously agreeing to serve on my thesis committee. Their willingness to dedicate their individual time to reviewing my manuscript and offering constructive feedback has been truly invaluable.

I owe my deepest thanks to my family - my mother and father, who have always provided unconditional love and support. Their immense encouragement has propelled me forward, even in the face of daunting challenges. I am indebted to them for their belief in my abilities and for enabling me to pursue my undergraduate and Ph.D. studies in the United States. Words cannot express the gratitude I feel towards them. Without them, everything I own today would have remained a distant dream.

Lastly, I am also grateful to my significant half, Luke, whose presence and support have been a constant source of strength throughout my life and academic journey. I am fortunate to have him by my side. I extend my best wishes to him as he embarks on his own pursuit of a Ph.D. degree.
Thank you all for making this five-year journey a magical one.

Table of Contents

Preface
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations

Chapter 1: Introduction
  1.1 Background of neuroimaging data
    1.1.1 Common data structures
    1.1.2 Biological significance
  1.2 Research questions and literature review
    1.2.1 Voxel-level and region-level analysis
    1.2.2 Research questions
    1.2.3 Current methods
  1.3 Proposed methods
    1.3.1 Spatially constrained and connected networks (SCCN)
    1.3.2 High-dimensional multiple imputation (HIMA)
    1.3.3 Multi-level network association method (MOAT)
  1.4 Organization of the Dissertation

Chapter 2: Network analysis with spatial-contiguity constraints (SCCN)
  2.1 Introduction
  2.2 Methods
    2.2.1 Background
    2.2.2 Detecting densely altered sub-area pairs from an ROI pair
    2.2.3 Statistical inference of {(Uc, Vd)} pairs
  2.3 Simulations
    2.3.1 Primary analysis
    2.3.2 Negative control analysis
  2.4 Real data application
    2.4.1 Nicotine-addiction research study
    2.4.2 Schizophrenia research study
  2.5 Discussion

Chapter 3: High Dimensional Multiple Imputation (HIMA)
  3.1 Introduction
  3.2 Methods
    3.2.1 Background
    3.2.2 HIMA model
    3.2.3 Posterior mode estimation
    3.2.4 Algorithms
  3.3 Data example
    3.3.1 Semi-synthetic data analysis
    3.3.2 Real data analysis
  3.4 Discussion

Chapter 4: Multi-level network association analysis (MOAT)
  4.1 Introduction
  4.2 Our method
    4.2.1 Data structure and problem set up
    4.2.2 Multi-level graph structure for {�(ij),k}
    4.2.3 Bc suppressing false positive findings
    4.2.4 Multi-level sub-network extraction
    4.2.5 Inference for B̂c
  4.3 Simulation
    4.3.1 Synthetic data
    4.3.2 Performance evaluation
  4.4 Study of FC-SI associations in brain connectome data
    4.4.1 UK Biobank sample and neuroimaging data
    4.4.2 Results
  4.5 Discussion

Appendix: SCCN
  2A. Spatial-contiguity constraints
    2A.1. Formal definition of spatial-contiguity
    2A.2. Implementation of spatial-contiguity constraints
  2B. Within-region vFC association analysis
    2B.1 Background
    2B.2 Dense sub-network extraction
    2B.3 Simulation
    2B.4 Real data application
  2C. Proofs and derivations
    2C.1 Proof of Lemma 2.1
    2C.2. Proof of Theorem 2.2
    2C.3. Proof of Theorem 2.3
    2C.4. Construction of the MDL-based test statistics
  2D. Additional information on schizophrenia data analysis
    2D.1. fMRI data acquisition and pre-processing procedures
    2D.2. Salience network disrupted connectivity
    2D.3. Temporal-thalamic disrupted connectivity
  2E. Additional information on UK Biobank smoking data analysis
    2E.1. Subject selection
    2E.2. fMRI data acquisition and pre-processing procedures
    2E.3. Covariates and Confounders
    2E.4. Network detection results
  2F. Additional information on negative control analysis

Appendix: HIMA
  3A. Additional information on real imaging data
  3B. Theoretical justifications of HIMA
  3C. Impropriety of NNGP in neuroimaging data imputation

Appendix: MOAT
  4A. Estimation of �1, �2
  4B. Proofs
    4B.1. Proof of Lemma 1
    4B.2. Proof of Theorem 1
  4C. Additional information on real imaging data
    4C.1. UK Biobank imaging data collection and preprocessing
    4C.2. Imaging data confounder control

List of Tables

A.1 Subject Demographic Information

List of Figures

2.1 Patterns of Disease-Related Connections: Examples and Insights
2.2 SCCN pipeline
2.3 A 2D visualization of performance by different methods
2.4 Simulation results
2.5 Detected sub-area pairs from a nicotine-addiction study
2.6 Detected sub-area pairs in salience network from a schizophrenia study (2D)
2.7 Detected sub-area pairs in salience network from a schizophrenia study (3D)
3.1 An example of missingness distribution in neuroimaging data
3.2 Running time against the number of voxels using MICE and HIMA
3.3 Imputation performance on semi-synthetic data
3.4 Trace plots of convergence performance
3.5 Imputation results on real schizophrenia data
4.1 The detection pipeline of systematic FC-SI association patterns by MOAT
4.2 An illustration of a multi-level graph with a FC-SI associated sub-network B1
4.3 Application of MOAT and comparative methods on synthetic data
4.4 Inference results of MOAT and comparative methods under different settings
4.5 Application of MOAT on a real neuroimaging dataset obtained from the UK Biobank
4.6 Extracted FC-SI associated sub-networks by MOAT
4.7 20 selected white matter tracts strongly associated with identified FC sub-network
A.1 An illustration of the concept spatial contiguity
A.2 A 2D visualization of within-region performance by different network methods
A.3 Simulation results
A.4 Detected results within cingulate from a schizophrenia study
A.5 Detected results within insular from a schizophrenia study
A.6 Detected results within salience network from a schizophrenia study
A.7 Detected results within W(Tem_right, Tha_left) network from a schizophrenia study
A.8 Detected results within W(Tem_right, Tha_right) network from a schizophrenia study
A.9 Results of negative control analysis
B.1 Scatter plot of voxel-pair correlations against voxel-pair spatial distance
C.1 White matter tracts defined following the ENIGMA protocols

List of Abbreviations

ACC       Anterior Cingulate Cortex
AI        Anterior Insula
ALFF      Amplitude Of Low-Frequency Fluctuation
BG        Basal Ganglia
BH-FDR    Benjamini–Hochberg FDR
BOLD      Blood-Oxygenation-Level Dependent
BSGP      Bipartite Spectral Graph Partitioning
CT        Cortical Thickness
DMN       Default Mode Network
DTI       Diffusion Tensor Imaging
FA        Fractional Anisotropy
FABIA     Factor Analysis for Bicluster Information Acquisition
FC        Functional Connectivity
FCN       Functional Connectomic Networks
FDR       False Discovery Rate
fMRI      Functional Magnetic Resonance Imaging
FPR       False Positive Rate
FWER      Family Wise Error Rate
HIMA      High-Dimensional Multiple Imputation
ICBM      International Consortium for Brain Mapping
ITL       Information Theoretic Learning
IW        Inverse Wishart
KL        Kullback-Leibler
MAP       Maximum a Posteriori
MAR       Missing At Random
MCAR      Missing Completely At Random
MCMC      Markov Chain Monte Carlo
MDL       Minimum Description Length
MI        Multiple Imputation
MICE      Multivariate Imputation by Chained Equations
MNAR      Missing Not At Random
MOAT      Multilayer Network Association Method
MRI       Magnetic Resonance Imaging
MVN       Multivariate Normal
NNGP      Nearest Neighbor Gaussian Processes
NP        Nondeterministic Polynomial
PCA       Principal Component Analysis
PET       Positron Emission Tomography
PMA       Penalized Matrix Decomposition
RBN       Region-Level Brain Network
RLA       Region-Level Analysis
ROI       Regions Of Interest
rs-fMRI   Resting-State Functional Magnetic Resonance Imaging
SCCA      Sparse Canonical Correlation Analysis
SCCN      Spatially Constrained and Connected Networks
SI        Structural Imaging
SZ        Schizophrenia
TNR       True Negative Rate
TPR       True Positive Rate
vFC       Voxel-wise Functional Connectivity
VLA       Voxel-Level Analysis
wMAE      Weighted Mean Absolute Error
wMBE      Weighted Mean Bias Error
wMSE      Weighted Mean Square Error

Chapter 1: Introduction

Brain imaging data, with its diverse data structures and applications, opens up a realm of possibilities for unlocking the mysteries of the human brain. By unraveling fundamental brain structures and functions, brain imaging techniques provide researchers with valuable insights into complex neurological processes. Statistical analysis of brain imaging data has continuously driven groundbreaking research (Bullmore and Sporns, 2009; Cao et al., 2014; Fornito et al., 2016; Rubinov and Sporns, 2010; Simpson et al., 2013). As both neuroimaging technology and statistical methodology advance, the future holds even greater potential for understanding the brain and its role in human cognition and behavior.

Motivated by this potential, this dissertation develops three distinct statistical models to systematically disentangle the intricate workings of the human brain: identifying pathophysiological sub-community patterns in the brain functional connectome, robustly imputing missing values in imaging data for further analysis, and revealing systematic association patterns between brain structure and function. The applications of these models help pave the way for further discoveries in neuroscience and assist clinical predictions concerning disease diagnosis and treatment selection.

1.1 Background of neuroimaging data

Brain imaging data encompasses information obtained through a range of non-invasive imaging techniques, enabling visualization of the brain’s structure, function, and connectivity.
Commonly utilized imaging techniques include magnetic resonance imaging (MRI), diffusion tensor imaging (DTI), functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and electroencephalography (EEG). MRI produces high-resolution images of the brain’s structure, providing valuable physical information such as size, shape, and cortical thickness. DTI assesses the integrity of white matter microstructure by measuring fractional anisotropy (FA). fMRI records dynamic changes in blood flow within different brain regions, facilitating the measurement of localized neural activity and functional connectivity (FC). PET provides information about brain function and metabolism by measuring the distribution of a radioactive tracer. EEG measures the electrical activity of the brain, allowing researchers to study the timing and synchronization of neural processes. All these diverse imaging modalities play essential roles in understanding the complexities of brain activity and contribute to various fields of neuroscience research. By collecting data from these imaging modalities, researchers can capture different aspects of brain activity and organization.

1.1.1 Common data structures

Neuroimaging data can take on various data structures, with the most common ones being:

a) Volumetric Data: Volumetric data characterizes a three-dimensional (3D) representation of the brain’s structure. It is commonly acquired through MRI scans and provides detailed information about brain anatomy, allowing researchers to study brain regions, their sizes, and shapes (Milchenko and Marcus, 2013; Reiss et al., 1995; Verellen et al., 2008).

b) Structural data: Structural data is related to volumetric data but refers to a broader category of information that characterizes the anatomical properties and organization of the brain. It includes measures such as cortical thickness, surface area, volume of brain regions, and connectivity patterns.
In statistical analysis, structural data are often stored in vectors (Bullmore and Sporns, 2009; Derado et al., 2010; Smith et al., 2004). For example, a vector $X = \{x_k\}_{k=1}^{m}$ stores a list of $m$ integrity measures on different white matter tracts.

c) Functional Connectivity Data: Functional connectivity data, derived from fMRI or EEG, examines the temporal correlation between different brain regions. It provides insights into how brain regions communicate and work together, enabling researchers to understand brain networks and their involvement in various cognitive processes. In statistical analysis, functional connectivity data is often stored in a binary or weighted adjacency matrix $Y_{n \times n}$ (Penny et al., 2011; Wig et al., 2014; Xia and Li, 2017), where each element {yij}1i