ABSTRACT

Title of Dissertation: Statistical Network Analysis of
High-Dimensional Neuroimaging Data
With Complex Topological Structures

Tong Lu
Doctor of Philosophy, 2023

Dissertation Directed by: Professor Shuo Chen
Department of Mathematics

This dissertation contains three projects that collectively tackle statistical challenges in

the field of high-dimensional brain connectome data analysis and enhance our understanding

of the intricate workings of the human brain. Project 1 proposes a novel network method for

detecting brain-disease-related alterations in voxel-pair-level brain functional connectivity with

spatial constraints, thus improving spatial specificity and sensitivity. Its effectiveness is validated

through extensive simulations and real data applications in nicotine addiction and schizophrenia

studies. Project 2 introduces a multivariate multiple imputation method specifically designed for

voxel-level neuroimaging data in high dimensions based on Bayesian models and Markov chain

Monte Carlo processes. On both synthetic data and real neurovascular water exchange

data extracted from a neuroimaging dataset in a schizophrenia study, our method achieves high

imputation accuracy and computational efficiency. Project 3 develops a multi-level network

model based on graph combinatorics that captures vector-to-matrix associations between brain

structural imaging measures and functional connectomic networks. The validity of the proposed

model is justified through extensive simulations and a real structure-function imaging dataset

from UK Biobank. These three projects contribute innovative methodologies and insights that

advance neuroimaging data analysis, including improvements in spatial specificity, statistical


power, imputation accuracy, and computational efficiency when revealing the brain’s complex

neurological patterns.


STATISTICAL NETWORK ANALYSIS OF HIGH-DIMENSIONAL
NEUROIMAGING DATA WITH COMPLEX TOPOLOGICAL STRUCTURES

by

Tong Lu

Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment

of the requirements for the degree of
Doctor of Philosophy

2023

Advisory Committee:
Professor Shuo Chen, Chair/Advisor
Professor Vince Lyzinski
Professor Tianzhou Ma
Professor Paul Smith
Professor Xin He


© Copyright by
Tong Lu

2023


Preface

This dissertation represents the culmination of a research journey spanning several years in

the field of brain imaging data analysis and its implications in unlocking the intricate workings of

the human brain. The completion of this research endeavor has been made possible through funding

from the National Institutes of Health under Award Numbers 1DP1DA04896801, EB008432,

and EB008281. It is with immense pride and gratitude that I present this work to the academic

community.

The motivation behind this research stemmed from the analytical challenges posed by the

complex and entangled nature of neuroimaging data in high dimensions and the need to advance

the statistical methodologies in order to disentangle the complex data and further reveal various

pathological and structural association mechanisms within the brain functional connectome. Through

the course of this dissertation, I embarked on three distinct projects, each aimed at addressing

specific statistical challenges and offering solutions to the field of neuroscience. These projects

have not only introduced novel statistical methodologies on a theoretical level, but have also shed

light on their practical applicability in neuroimaging data analysis.

It is my sincerest hope that this dissertation contributes to the field of neuroscience and

serves as a stepping stone for future research in statistical network models and understanding

the human brain connectome. May it inspire further exploration, spark curiosity, and foster innovation

in the scientific community.



Acknowledgments

I owe my heartfelt gratitude to all the people who have made this thesis possible. Their

unwavering support and contributions have profoundly shaped my PhD experience into one that I

will cherish forever.

First and foremost, I would like to express my sincere gratitude to my advisor, Professor

Shuo Chen, for granting me an invaluable opportunity to engage in challenging yet immensely

fascinating and meaningful projects on brain connectome data over the past five years. His

steadfast dedication, guidance, support, and patience have played a crucial role in making this

five-year journey exceptionally rewarding and unforgettable. It has been a pleasure to work with

and learn from such an extraordinary individual.

I would also like to thank my committee members, Professor Vince Lyzinski, Professor

Tianzhou Ma, Professor Paul Smith, and Professor Xin He for graciously agreeing to serve on my

thesis committee. Their willingness to dedicate their individual time to reviewing my manuscript

and offering constructive feedback has been truly invaluable.

I owe my deepest thanks to my family - my mother and father, who have always provided

unconditional love and support. Their immense encouragement has propelled me forward, even

in the face of daunting challenges. I am indebted to them for their belief in my abilities and for

enabling me to pursue my undergraduate and Ph.D. studies in the United States. Words cannot

express the gratitude I feel towards them. Without them, everything I own today would have

remained a distant dream.

Lastly, I am also grateful to my significant half, Luke, whose presence and support have

been a constant source of strength throughout my life and academic journey. I am fortunate to

have him by my side. I extend my best wishes to him as he embarks on his own pursuit of a Ph.D.

degree.

Thank you all for making this five-year journey a magical one.



Table of Contents

Preface ii

Acknowledgments iii

Table of Contents v

List of Tables viii

List of Figures ix

List of Abbreviations x

Chapter 1: Introduction 1
1.1 Background of neuroimaging data . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.1.1 Common data structures . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Biological significance . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2 Research questions and literature review . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Voxel-level and region-level analysis . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Research questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.3 Current methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3 Proposed methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.1 Spatially constrained and connected networks (SCCN) . . . . . . . . . . 10
1.3.2 High-dimensional multiple imputation (HIMA) . . . . . . . . . . . . . . 11
1.3.3 Multi-level network association method (MOAT) . . . . . . . . . . . . . 12

1.4 Organization of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Chapter 2: Network analysis with spatial-contiguity constraints (SCCN) 14
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.2 Detecting densely altered sub-area pairs from an ROI pair . . . . . . . . . 22
2.2.3 Statistical inference of {(Uc, Vd)} pairs . . . . . . . . . . . . . . . . . . . 30

2.3 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.1 Primary analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3.2 Negative control analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.4 Real data application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.1 Nicotine-addiction research study . . . . . . . . . . . . . . . . . . . . . 38

2.4.2 Schizophrenia research study . . . . . . . . . . . . . . . . . . . . . . . . 42
2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Chapter 3: High Dimensional Multiple Imputation (HIMA) 50
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2.2 HIMA model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.2.3 Posterior mode estimation . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2.4 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.3 Data example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.3.1 Semi-synthetic data analysis . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3.2 Real data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

Chapter 4: Multi-level network association analysis (MOAT) 73
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.2 Our method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

4.2.1 Data structure and problem set up . . . . . . . . . . . . . . . . . . . . . 78
4.2.2 Multi-level graph structure for {�(ij),k} . . . . . . . . . . . . . . . . . . 80
4.2.3 Bc suppressing false positive findings . . . . . . . . . . . . . . . . . . . 83
4.2.4 Multi-level sub-network extraction . . . . . . . . . . . . . . . . . . . . . 85
4.2.5 Inference for B̂c . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

4.3 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.3.1 Synthetic data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.3.2 Performance evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

4.4 Study of FC-SI associations in brain connectome data . . . . . . . . . . . . . . . 97
4.4.1 UK Biobank sample and neuroimaging data . . . . . . . . . . . . . . . . 97
4.4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

Appendix : SCCN 105
2A. Spatial-contiguity constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

2A.1. Formal definition of spatial-contiguity . . . . . . . . . . . . . . . . . . . . 105
2A.2. Implementation of spatial-contiguity constraints . . . . . . . . . . . . . . 106

2B. Within-region vFC association analysis . . . . . . . . . . . . . . . . . . . . . . . 107
2B.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
2B.2 Dense sub-network extraction . . . . . . . . . . . . . . . . . . . . . . . . . 109
2B.3 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2B.4 Real data application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

2C. Proofs and derivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
2C.1 Proof of Lemma 2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
2C.2. Proof of Theorem 2.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
2C.3. Proof of Theorem 2.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
2C.4. Construction of the MDL-based test statistics . . . . . . . . . . . . . . . . 121

2D. Additional information on schizophrenia data analysis . . . . . . . . . . . . . . . 123
2D.1. fMRI data acquisition and pre-processing procedures . . . . . . . . . . . . 123
2D.2. Salience network disrupted connectivity . . . . . . . . . . . . . . . . . . . 124
2D.3. Temporal-thalamic disrupted connectivity . . . . . . . . . . . . . . . . . . 126

2E. Additional information on UK Biobank smoking data analysis . . . . . . . . . . . 128
2E.1. Subject selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
2E.2. fMRI data acquisition and pre-processing procedures . . . . . . . . . . . . 130
2E.3. Covariates and Confounders . . . . . . . . . . . . . . . . . . . . . . . . . 131
2E.4. Network detection results . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

2F. Additional information on negative control analysis . . . . . . . . . . . . . . . . . 133

Appendix : HIMA 135
3A. Additional information on real imaging data . . . . . . . . . . . . . . . . . . . . . 135
3B. Theoretical justifications of HIMA . . . . . . . . . . . . . . . . . . . . . . . . . . 136
3C. Impropriety of NNGP in neuroimaging data imputation . . . . . . . . . . . . . . . 138

Appendix : MOAT 139
4A. Estimation of �1, �2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
4B. Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

4B.1. Proof of Lemma 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4B.2. Proof of Theorem 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

4C. Additional information on real imaging data . . . . . . . . . . . . . . . . . . . . . 145
4C.1. UK Biobank imaging data collection and preprocessing . . . . . . . . . . . 145
4C.2. Imaging data confounder control . . . . . . . . . . . . . . . . . . . . . . . 146


List of Tables

A.1 Subject Demographic Information . . . . . . . . . . . . . . . . . . . . . . . . . 124


List of Figures

2.1 Patterns of Disease-Related Connections: Examples and Insights . . . . . . . . . 15
2.2 SCCN pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 A 2D visualization of performance by different methods . . . . . . . . . . . . . . 33
2.4 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.5 Detected sub-area pairs from a nicotine-addiction study . . . . . . . . . . . . . . 41
2.6 Detected sub-area pairs in salience network from a schizophrenia study (2D) . . . 44
2.7 Detected sub-area pairs in salience network from a schizophrenia study (3D) . . . 45

3.1 An example of missingness distribution in neuroimaging data . . . . . . . . . . . 51
3.2 Running time against the number of voxels using MICE and HIMA . . . . . . . . 52
3.3 Imputation performance on semi-synthetic data . . . . . . . . . . . . . . . . . . 66
3.4 Trace plots of convergence performance . . . . . . . . . . . . . . . . . . . . . . 67
3.5 Imputation results on real schizophrenia data . . . . . . . . . . . . . . . . . . . . 69

4.1 The detection pipeline of systematic FC-SI association patterns by MOAT . . . . 76
4.2 An illustration of a multi-level graph with a FC-SI associated sub-network B1 . . 81
4.3 Application of MOAT and comparative methods on synthetic data . . . . . . . . 93
4.4 Inference results of MOAT and comparative methods under different settings . . . 95
4.5 Application of MOAT on a real neuroimaging dataset obtained from the UK Biobank. 99
4.6 Extracted FC-SI associated sub-networks by MOAT . . . . . . . . . . . . . . . . 101
4.7 20 selected white matter tracts strongly associated with identified FC sub-network 102

A.1 An illustration of the concept spatial contiguity . . . . . . . . . . . . . . . . . . 106
A.2 A 2D visualization of within-region performance by different network methods . 112
A.3 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
A.4 Detected results within cingulate from a schizophrenia study . . . . . . . . . . . 116
A.5 Detected results within insular from a schizophrenia study . . . . . . . . . . . . 118
A.6 Detected results within salience network from a schizophrenia study . . . . . . . 127
A.7 Detected results within W(Temright,Thaleft) network from a schizophrenia study . . 128
A.8 Detected results within W(Temright,Tharight) network from a schizophrenia study . . 129
A.9 Results of negative control analysis . . . . . . . . . . . . . . . . . . . . . . . . . 134

B.1 Scatter plot of voxel-pair correlations against voxel-pair spatial distance . . . . . 138

C.1 White matter tracts defined following the ENIGMA protocols . . . . . . . . . . . 148


List of Abbreviations

ACC Anterior Cingulate Cortex
AI Anterior Insula
ALFF Amplitude Of Low-Frequency Fluctuation
BG Basal Ganglia
BH-FDR Benjamini–Hochberg FDR
BOLD Blood-Oxygenation-Level Dependent
BSGP Bipartite Spectral Graph Partitioning
CT Cortical Thickness
DMN Default Mode Network
DTI Diffusion Tensor Imaging
FA Fractional Anisotropy
FABIA Factor Analysis for Bicluster Information Acquisition
FC Functional Connectivity
FCN Functional Connectomic Networks
FDR False Discovery Rate
fMRI Functional Magnetic Resonance Imaging
FPR False Positive Rate
FWER Family Wise Error Rate
HIMA High-Dimensional Multiple Imputation
ICBM International Consortium for Brain Mapping
ITL Information Theoretic Learning
IW Inverse Wishart
KL Kullback Leibler
MAP Maximum a Posteriori
MAR Missing At Random
MCAR Missing Completely At Random
MCMC Markov Chain Monte Carlo
MDL Minimum Description Length
MI Multiple Imputation
MICE Multivariate Imputation by Chained Equations
MNAR Missing Not At Random
MOAT Multilayer Network Association Method
MRI Magnetic Resonance Imaging
MVN Multivariate Normal
NNGP Nearest Neighbor Gaussian Processes
NP Nondeterministic Polynomial

PCA Principal Component Analysis
PET Positron Emission Tomography
PMA Penalized Matrix Decomposition
RBN Region-Level Brain Network
RLA Region-Level Analysis
ROI Regions Of Interest
rs-fMRI Resting-State Functional Magnetic Resonance Imaging
SCCA Sparse Canonical Correlation Analysis
SCCN Spatially Constrained and Connected Networks
SI Structural Imaging
SZ Schizophrenia
TNR True Negative Rate
TPR True Positive Rate
vFC Voxel-wise Functional Connectivity
VLA Voxel-Level Analysis
wMAE Weighted Mean Absolute Error
wMBE Weighted Mean Bias Error
wMSE Weighted Mean Square Error


Chapter 1: Introduction

Brain imaging data, with its diverse data structures and applications, opens up a realm of

possibilities for unlocking the mysteries of the human brain. By unraveling fundamental brain

structures and functions, brain imaging techniques provide researchers with valuable insights

into complex neurological processes. Statistical analysis of brain imaging data has continuously

driven groundbreaking research (Bullmore and Sporns, 2009; Cao et al., 2014; Fornito et al., 2016;

Rubinov and Sporns, 2010; Simpson et al., 2013). As both neuroimaging technology and statistical

methodology advance, the future holds even greater potential for understanding the brain and its

role in human cognition and behavior. Motivated by this immense potential, this dissertation aims

to develop three distinct statistical models to systematically disentangle the intricate workings

of the human brain, including identifying pathophysiological sub-community patterns in the brain

functional connectome, robustly imputing missing values in imaging data for further analysis, and

revealing systematic association patterns between brain structure and function. The applications

of these models help pave the way for further discoveries in neuroscience, and assist clinical

predictions concerning disease diagnosis and treatment selection.


1.1 Background of neuroimaging data

Brain imaging data encompasses information obtained through a range of non-invasive

imaging techniques, enabling visualization of the brain’s structure, function, and connectivity.

Commonly utilized imaging techniques include magnetic resonance imaging (MRI), diffusion

tensor imaging (DTI), functional magnetic resonance imaging (fMRI), positron emission tomography

(PET), and electroencephalography (EEG). MRI produces high-resolution images of the brain’s

structure, providing valuable physical information such as size, shape, and cortical thickness.

DTI assesses the integrity of white matter microstructure by measuring fractional anisotropy

(FA). fMRI records dynamic changes in blood flow within different brain regions, facilitating

the measurement of localized neural activity and functional connectivity (FC). PET provides

information about brain function and metabolism by measuring the distribution of a radioactive

tracer. EEG measures the electrical activity of the brain, allowing researchers to study the timing

and synchronization of neural processes. All these diverse imaging modalities play essential roles

in understanding the complexities of brain activity and contribute to various fields of neuroscience

research. By collecting data from these imaging modalities, researchers can capture different

aspects of brain activity and organization.

1.1.1 Common data structures

Neuroimaging data can take on various data structures, with the most common ones being:

a) Volumetric Data: Volumetric data characterizes a three-dimensional (3D) representation

of the brain’s structure. It is commonly acquired through MRI scans and provides detailed

information about brain anatomy, allowing researchers to study brain regions, their sizes, and

shapes (Milchenko and Marcus, 2013; Reiss et al., 1995; Verellen et al., 2008).

b) Structural data: Structural data is related to volumetric data but refers to a broader

category of information that characterizes the anatomical properties and organization of the

brain. It includes measures such as cortical thickness, surface area, volume of brain regions, and

connectivity patterns. In statistical analysis, structural data are often stored in vectors (Bullmore

and Sporns, 2009; Derado et al., 2010; Smith et al., 2004). For example, a vector $X = \{x_k\}_{k=1}^{m}$

stores a list of $m$ integrity measures on different white matter tracts.

c) Functional Connectivity Data: Functional connectivity data, derived from fMRI or EEG,

examines the temporal correlation between different brain regions. It provides insights into how

brain regions communicate and work together, enabling researchers to understand brain networks

and their involvement in various cognitive processes. In statistical analysis, functional connectivity

data is often stored in a binary or weighted adjacency matrix $Y_{n \times n}$ (Penny et al., 2011; Wig et al.,

2014; Xia and Li, 2017), where each element $\{y_{ij}\}_{1 \le i < j \le n}$ characterizes the strength of functional

connectivity between brain regions i and j.

d) Graph Data: Graph-based data structures, such as G = (V,E), represent brain networks

as nodes (brain regions) and edges (connections between regions), where V denotes the node set

with a size of, for example, $|V| = n$, and $E$ denotes the edge set with a size of $|E| = \binom{n}{2}$. Graph

analysis allows researchers to study the network properties of the brain, including node centrality,

community structure, and information flow (Fornito et al., 2016; Loewe et al., 2014; Zalesky

et al., 2010). This approach is particularly useful for understanding the complex organization and

functioning of brain systems.
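
As a minimal sketch of how structures (c) and (d) relate, the Python snippet below builds a weighted FC matrix from simulated regional time series using Pearson correlation, then lists the edges of the corresponding complete graph. The simulated signals, the sizes, and the function name fc_matrix are illustrative assumptions and do not come from any dataset or pipeline used in this dissertation.

```python
from math import comb

import numpy as np


def fc_matrix(ts):
    """Pearson-correlation FC matrix from a (time points x regions) array."""
    y = np.corrcoef(ts, rowvar=False)  # n x n symmetric correlation matrix
    np.fill_diagonal(y, 0.0)           # drop self-connections
    return y


rng = np.random.default_rng(0)
n_regions, n_timepoints = 5, 200                     # hypothetical sizes
ts = rng.standard_normal((n_timepoints, n_regions))  # simulated regional signals
Y = fc_matrix(ts)

# Edge set E of the complete graph on n nodes: all pairs (i, j) with i < j.
n_edges = comb(n_regions, 2)                         # |E| = n choose 2
i_idx, j_idx = np.triu_indices(n_regions, k=1)
edges = list(zip(i_idx.tolist(), j_idx.tolist()))
weights = Y[i_idx, j_idx]                            # y_ij for 1 <= i < j <= n
```

Storing only the upper triangle is a common choice here because the correlation matrix is symmetric, so each of the n-choose-2 region pairs appears exactly once.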


1.1.2 Biological significance

Brain imaging data has revolutionized neuroscience research, providing researchers with

unprecedented opportunities to explore and elucidate crucial aspects of the brain and its disorders.

Brain imaging data typically contributes to research in the following ways:

a) Studying Brain Function: Functional brain imaging techniques like fMRI and EEG

help researchers investigate brain activity during various tasks, providing insights into cognitive

processes, perception, attention, and memory (Cole et al., 2010; Poldrack, 2008).

b) Mapping Brain Connectivity: Brain imaging data allows the mapping of structural and

functional connections between different brain regions. This helps researchers understand how

information is transmitted and processed within the brain, leading to discoveries about functional

networks and their roles in different behaviors and disorders (Drevets et al., 2008; Kemmer et al.,

2018; Smith et al., 2004).

c) Unraveling Neurological Disorders: Brain imaging data aids in studying neurological

and psychiatric disorders (Siuly and Zhang, 2016; Tae et al., 2018). By comparing brain images of

healthy individuals and patients, researchers can identify structural and functional abnormalities

associated with conditions such as Alzheimer’s disease, schizophrenia, and depression. This

knowledge enhances early detection, treatment, and monitoring of these disorders.

d) Personalized Medicine and Brain-Computer Interfaces: Brain imaging data can

contribute to personalized medicine by providing individualized information about brain structure

and function (Dilsizian and Siegel, 2014; Eckelman et al., 2008; Lambin et al., 2017). It also plays

a vital role in developing brain-computer interfaces, allowing direct communication between the

brain and external devices.


1.2 Research questions and literature review

1.2.1 Voxel-level and region-level analysis

In this work, we aim to statistically study neuroimaging data from two perspectives: voxel-

level and region-level analysis, which are two major categories in this domain.

Voxel-Level Analysis (VLA)

VLA is a widely used technique that enables researchers to investigate brain activity and

connectivity at the level of individual voxels, which are three-dimensional pixels that compose

an image. The most commonly employed method for VLA is fMRI. fMRI measures changes in

blood oxygenation levels as a proxy for neuronal activity, providing researchers with insights into

brain function.

One of the key advantages of VLA is its ability to capture fine-grained spatial details.

Researchers can identify specific brain regions that exhibit significant activation or deactivation

during different cognitive processes or in response to external stimuli. VLA has been instrumental

in advancing our understanding of brain function and its relationship to various cognitive and

psychological processes.

Region-Level Analysis (RLA)

RLA takes a broader perspective by grouping voxels into anatomically defined brain regions.

This approach aims to understand the overall functioning of specific brain regions or networks

rather than focusing on individual voxels. RLA is commonly used in both structural and functional

brain imaging studies.

In structural brain imaging, such as MRI, RLA involves segmenting the brain into anatomical

regions of interest. By quantifying the volume, shape, or cortical thickness of these regions,

researchers can investigate structural differences associated with various neurological conditions

or developmental changes. Functional RLA is often performed using resting-state fMRI data,

which captures spontaneous brain activity in the absence of explicit tasks. The data is processed

to identify functional connectivity patterns between different brain regions. Various techniques,

including seed-based correlation analysis, independent component analysis (ICA), and graph

theory approaches, are used to map and quantify these functional networks.

RLA provides a macroscopic view of brain organization and interregional communication.

It allows researchers to study large-scale brain networks and investigate how these networks

contribute to various cognitive processes, such as attention, memory, and decision-making.

1.2.2 Research questions

We are interested in three research questions focusing on neuroimaging data. The first two

questions delve into the voxel-level analysis, while the final question pertains to region-level

analysis:

1. How to build a model that can, at the voxel-pair level, identify the functionally altered brain

sub-area pairs between two large regions caused by a certain brain disease (e.g., Alzheimer’s

disease, Parkinson’s disease, schizophrenia, etc.)?

2. How to construct a missing-data imputation technique that is specifically effective for

high-dimensional neuroimaging data, in which missingness often results from data acquisition

limitations and susceptibility artifacts?

3. How to develop a model that can reveal the underlying systematic association patterns

between brain structure (e.g., cortical thickness) and brain function (e.g., functional connectivity

between brain regions)?

1.2.3 Current methods

The solutions to questions (1) and (3) typically involve population-level multiple testing

and covariance/association analysis in order to reveal hidden association patterns. Traditional

multiple testing methods, such as the false-discovery rate (FDR) and family-wise error rate

(FWER) control, provide statistical safeguards against the inflation of Type I error rates in

situations involving multiple comparisons. However, in many applications, these methods can be

conservative, resulting in reduced power and potential false negatives. More recently,

permutation-based methods and FDR estimation methods have emerged

as alternatives to traditional multiple testing methods. These newer methods offer additional

flexibility and adaptability to various research scenarios, addressing some of the limitations

associated with traditional approaches. However, these methods frequently do not apply to

multivariate voxel/region pairs, as they are unable to account for brain anatomical restrictions

and recover inherent systematic patterns of voxels/regions that are associated with covariates of

interest, such as clinical status and brain structural measures.
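
To make the contrast between FWER and FDR control concrete, the sketch below applies the Bonferroni correction and the Benjamini-Hochberg (BH) step-up procedure to a small set of p-values. The p-values are invented for illustration, and neither function is a method proposed in this dissertation; both treat each test independently and ignore any network or spatial structure.

```python
import numpy as np


def bonferroni(pvals, alpha=0.05):
    """FWER control: reject p_i when p_i <= alpha / m."""
    pvals = np.asarray(pvals, dtype=float)
    return pvals <= alpha / pvals.size


def benjamini_hochberg(pvals, alpha=0.05):
    """FDR control: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= k * alpha / m."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # index of the largest passing rank
        reject[order[: k + 1]] = True
    return reject


# Hypothetical edge-wise p-values, e.g. one two-sample test per connection.
p = np.array([0.001, 0.008, 0.012, 0.030, 0.041, 0.20, 0.55, 0.90])
n_bonf = int(bonferroni(p).sum())
n_bh = int(benjamini_hochberg(p).sum())
```

With these eight p-values and alpha = 0.05, Bonferroni rejects one hypothesis and BH rejects three, which illustrates why FWER control tends to be the more conservative of the two.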

In recent years, researchers have introduced advanced statistical methods to extract

sub-community structures while addressing the need for multiple-testing correction. For instance, Xia and

Li (2017) developed a localized statistical inference approach that takes into account network

properties. Chen et al. (2016) proposed a Bayesian hierarchical model to identify voxel-level

connectivity patterns related to clinical covariates, and subsequently used voxel-wise functional

connectivity (vFC) patterns to infer region-level connections. These innovative approaches

have shown enhanced inference results and localized specificity. However, these methods have

limited capacity to integrate spatial information, thereby presenting a potential area for

improvement.

In addition, other advanced statistical methods have been developed to jointly model two

sets of neuroimaging features by leveraging techniques such as regularization, low rank, and

projection models (Kong et al., 2019; Li et al., 2012b; Wang et al., 2011; Zhu et al., 2014).

These methods have been successfully applied in multimodal imaging data analysis, yielding

intriguing findings (Ball et al., 2017; Hayden et al., 2006; Wehrle et al., 2020; Zhang et al., 2022).

These statistical methods can be broadly classified into two categories. The first category utilizes

regularization-based methods (Wang et al., 2020; Zhou and Li, 2014; Zhu et al., 2017) that aim to

select a parsimonious set of associations between FC and neuroimaging features. However, a major

limitation of these methods is their failure to consider the systematic impact of the features on the

functional connectivity network. The second category employs dimensionality reduction strategies,

such as principal component analysis (PCA) (Chachlakis et al., 2019; Hotelling, 1933; Jolliffe and

Cadima, 2016). These methods project both FCs and neuroimaging features onto a reduced set of

principal components, followed by regression analysis. However, as an unsupervised dimension

reduction technique, PCA-based analysis often captures fewer relevant principal components,

resulting in the omission of underlying true association pairs. Sparse canonical correlation analysis

(sCCA) methods, which integrate elements from both categories, have gained popularity (Lin

et al., 2013; Uurtio et al., 2019; Witten et al., 2009). However, sCCA methods typically operate as

vector-to-vector association analyses and may overlook systematic vector-to-network association

patterns. As a result, a methodological gap persists in effectively modeling vector-to-matrix

associations, such as the associations between the neuroimaging feature vector and the FCN

matrix-variate outcome, while incorporating latent topological network structures.

Lastly, question (2) involves the problem of missingness in neuroimaging data, which

commonly arises in neuroimaging studies due to data acquisition limitations and susceptibility

artifacts. Simply omitting missing entries may lead to the exclusion of areas of particular interest

and reduce statistical power. Mean/mode imputation is one of the most commonly used

imputation methods; it replaces missing values with the mean (for continuous variables)

or the mode (for categorical variables) of the observed data. While straightforward to implement,

mean/mode imputation assumes that the missing values share the same statistical characteristics

as the observed values. To improve upon simple imputation, multiple imputation (MI) offers a powerful

and flexible approach. MI generates multiple plausible imputations for missing values based

on observed data. Each imputation is analyzed separately, and the results are combined using

specific rules to obtain valid statistical inferences. MI considers the uncertainty associated with

imputed values and provides more reliable estimates compared to single imputation methods.

Furthermore, the Expectation-Maximization (EM) algorithm is an iterative procedure used for

estimating missing values based on maximum likelihood estimation. It assumes that the data

are missing at random (MAR) and iteratively estimates the missing values until convergence. It

is worth noting that neuroimaging data is typically stored in high dimensions, which can pose

challenges for existing imputation techniques, including the intractability of large matrix sampling

and high computational complexity. Therefore, there is a potential for improvement in specifically

addressing the imputation of high-dimensional brain imaging data.

1.3 Proposed methods

In this dissertation, I present three statistical methods that address each of the questions

raised in the previous section (see Section 1.2.2). Each method is specifically designed to tackle the

respective research question, and I provide a brief overview of the motivation, current challenges,

proposed solutions, and performance evaluation for each method in the following three subsections.

1.3.1 Spatially constrained and connected networks (SCCN)

Brain connectome analysis commonly compresses high-resolution brain scans (typically

composed of millions of voxels) down to only hundreds of regions of interest (ROIs) by averaging

within-ROI signals. This huge dimension reduction improves computational speed and the

morphological properties of anatomical structures; however, it also comes at the cost of substantial

losses in spatial specificity and sensitivity, especially when the signals exhibit high within-ROI

heterogeneity. Oftentimes, abnormally expressed functional connectivity (FC) between a pair of

ROIs caused by a brain disease is primarily driven by only small subsets of voxel pairs within

the ROI pair. This project proposes a new network method for detection of voxel-pair-level

neural dysconnectivity with spatial constraints. Specifically, focusing on an ROI pair, our model

aims to extract dense sub-areas that contain aberrant voxel-pair connections while ensuring that

the involved voxels are spatially contiguous. In addition, we develop sub-community-detection

algorithms to realize the model, and the consistency of these algorithms is justified. Comprehensive

simulation studies demonstrate our method’s effectiveness for reducing the false-positive rate

while increasing statistical power, detection replicability, and spatial specificity. We apply our

approach to reveal: (i) disrupted voxel-wise FC patterns related to nicotine addiction between

the basal ganglia, hippocampus, and insular gyrus from 3269 participants using UK Biobank

data; (ii) voxel-wise schizophrenia-altered FC patterns within the salience and temporal-thalamic

network from 330 participants in a schizophrenia study. The detected results align with previous

medical findings but include improved localized information.

1.3.2 High-dimensional multiple imputation (HIMA)

Neuroimaging data typically contain missing entries due to data acquisition limitations and

susceptibility artifacts. Simply omitting missing entries may exclude areas of particular interest and

decrease statistical power. Moreover, many existing model-based imputation methods struggle with

high-dimensional data due to the intractability of large matrix sampling and high computational

complexity. This project proposes a multivariate multiple imputation method, HIMA, which is

particularly designed for high-dimensional neuroimaging data. To account for approximately

normally distributed brain signals, HIMA employs a joint multivariate normal model and constructs

conditional probabilities based on Bayesian models using Markov chain Monte Carlo processes.

While the normal mean vector is Gibbs sampled, HIMA samples the normal covariance matrix

from the posterior mode (i.e., maximum a posteriori probability). We show that the posterior

mode has good asymptotic properties. Given high-dimensional imaging data, the relaxed

posterior sampling step largely enhances numerical stability and imputation accuracy while

reducing computational complexity from $O(Cp^3)$ to $O(Cp)$, where $C$ depends on sample size,

number of iterations, etc., and $p$ is the variable space dimension. We evaluated HIMA on two

imaging datasets (semi-synthetic and real data) and compared it with commonly used imputation

methods. The results showed that HIMA is robust against large datasets ($n \ll p$) and it expanded

brain map coverage with improved imputed results (reduced bias and dispersion) and significantly

improved computational efficiency ($10^3$ times faster than the popular multiple imputation model

MICE).

1.3.3 Multi-level network association method (MOAT)

The goal of our research is to model the association between brain structural imaging (SI)

measures and functional connectomic networks (FCN) derived from neuroimaging data. In this

analysis, the outcomes are off-diagonal elements of functional connectivity (FC) matrices, while

predictors are a multivariate vector of SI variables and nuisance variables. We propose a vector-to-

matrix multi-level network model to capture latent association patterns between subsets of SIs

and FC sub-networks. The first layer network is a bipartite graph characterizing the association

between all SI variables and FC outcomes, where an edge denotes a non-zero FC-SI association.

Previous findings show that a large proportion of edges are often located within dense bipartite

subgraphs, while other edges are randomly and sparsely distributed in the rest of the graph.

The second layer network represents a connectomic graph, where most FC outcomes from the

first layer dense subgraphs comprise dense clique subgraphs. This globally sparse and locally

dense multi-level network model helps to reveal which FCN sub-networks are systematically

influenced by which subsets of SIs. We develop algorithms to identify the underlying multi-level

sub-networks and propose a statistical inference framework to test these sub-networks. We perform

extensive simulation analysis to benchmark the validity and performance of the proposed method.

We further apply our approach to 4242 participants from UK Biobank to evaluate the effects of

whole-brain white matter microstructure integrity and cortical thickness on the whole-brain FCN.

1.4 Organization of the Dissertation

The remaining chapters of this dissertation are organized as follows: Chapter 2 presents

the SCCN method, a network-based approach for detecting brain-disease-related alterations in

voxel-pair-level brain functional connectivity while incorporating spatial constraints. Chapter

3 presents the HIMA method, a multiple imputation technique specifically designed to address

missingness in high-dimensional neuroimaging data. Chapter 4 presents the MOAT method,

a multi-level network approach that uncovers the vector-to-matrix associations between brain

structural imaging measures and functional connectomic networks.

Chapter 2: Network analysis with spatial-contiguity constraints (SCCN)

2.1 Introduction

Statistical network analysis and graph theory have been fundamental in the study of the

intricate neural circuits in human brains (the “human connectome”) (Bullmore and Sporns, 2009;

Rubinov and Sporns, 2010). A large body of literature has revealed that the human connectome is

a well-organized network, and it exhibits graph properties of intelligent networks such as social

networks and the Internet (Bahrami et al., 2019; Cao et al., 2014). Built on graph theory, brain

network analysis depicts the brain connectome as a graph in which cortical regions are denoted as

nodes and the connections between regions are edges. Under this framework, abundant statistical

models have been developed to study the associations between complex neural connections and

experimental/clinical conditions (e.g., Fornito et al., 2016; Simpson et al., 2013). These models

can help to enhance our understanding of the underlying pathophysiological mechanisms of

brain diseases (e.g., Alzheimer’s disease and Parkinson’s disease) and assist clinical predictions

concerning disease diagnosis and treatment selection.

Figure 2.1: (a) shows the heterogeneity of functional connectivity (FC) among intra-ROI voxels through a seed-to-voxel analysis using the insula as a seed ROI. While both the cingulate cortex and hippocampus are well-known ROIs, their interior FC with the insula varies substantially. (b) shows a simplified example of covariate-related FC of voxel pairs located in sub-area pairs $(U_1, V_1)$, $(U_1, V_2)$, and $(U_2, V_3)$ within a larger ROI pair (Region A, Region B).

In brain network studies, regions of interest (ROIs) are often considered as basic units of analysis, and these are equivalent to nodes/vertices in graph theory. The popularity of region-level brain network (RBN) analysis comes from its high anatomical consistency and computational tractability. When a whole-brain connectome is considered, RBN analysis dramatically reduces the search dimensions from trillions ($10^6 \times 10^6$) to thousands ($10^2 \times 10^2$). However, RBN analysis relies on the assumption of signal homogeneity among intra-ROI voxels, which is often violated in reality. When significant intra-ROI heterogeneity is present, RBN analysis can lead to several analytical flaws:

1. Variability negligence. Simply averaging the time series of voxels within an ROI can lead to

voxel-level information variability loss (e.g., Figure 2.1(a));

2. Spatial specificity loss. A clinical covariate may alter the ROI-pair connections by disrupting

only a small proportion of intra-ROI voxel pairs. In such cases, RBN analysis fails to

precisely distinguish the localized alterations;

3. Power loss. The averaging process mixes both significant and non-significant voxel-level

connections, which often attenuates the effect size and statistical power.

Recently, many brain network studies have shifted focus from RBN analysis to voxel-level

network analysis (Loewe et al., 2014; Wu et al., 2013). Traditional multiple testing methods (e.g.,

the false-discovery rate (FDR) and the family-wise error rate (FWER) control) are not applicable

to high-dimensional multivariate voxel pairs since they are unable to take into account anatomical

restrictions and inherent systematic patterns of disease-associated voxels in ROIs. Some other

existing methods may also have limitations, such as not utilizing rich voxel-level information to

complement region-level connectivity characterization, or yielding relatively hard-to-interpret

results for various reasons (e.g., under-represented neurobiological structures or biases in the

seed-selection process). Several advanced statistical methods have been proposed to address

these limitations. For example, Xia and Li (2017) provided localized statistical inference by

accounting for the network properties. Chen et al. (2016) proposed a Bayesian hierarchical model

to identify the voxel-level connectivity patterns associated with clinical covariates and then used

the voxel-wise functional connectivity (vFC) patterns to infer region-level connections. These

novel approaches yield improved inference results and localized specificity. Nonetheless, they are

not directly applicable to our input data of interest (i.e., an $m \times n$ “bi-cluster” rather than an $n \times n$

adjacency matrix), and they do not regulate involved voxels to be spatially contiguous. Unlike

RBN analysis, spatial contiguity is crucial for vFC analysis because: (i) it preserves anatomical

homogeneity, and it hence preserves the interpretability of the vFC results (Thirion et al., 2006);

(ii) it better controls the FDR and FWER since phenotype-related vFC is often intrinsically linked

with the topological structure of the brain connectome (Fan et al., 2012).

Figure 2.2: SCCN pipeline. (a) Preprocess the fMRI data and transform it into a standard brain template. (b) Define voxels in ROIs as nodes and bonds between voxels as edges. Extract the time series of brain signals from each voxel. (c) Calculate the connectivity matrix between voxels from regions A and B for each subject. (d) Calculate the connectivity inference matrix, where each element is a test statistic per edge between clinical groups. A hotter point in the heatmap suggests a larger between-group difference. (e) Construct the spatial-contiguity constraint matrices for ROIs A and B (see detailed matrix construction in Section 2.1). In (e1), each dot represents a voxel in 3D coordinates, where red dots represent positive voxels. Voxels connected by yellow lines form a spatially contiguous area. (f) Detect the disease-related connections contained in sub-area pairs based on (d) and (e) jointly. (f) is obtained by re-ordering the nodes in (d), with the densely altered sub-networks pushed to the top (i.e., (d) and (f) are isomorphic graphs). (g) Conduct the proposed MDL-based network-level statistical inference. The sub-area pairs that pass the statistical tests are highlighted.

In this study, our goal was to identify altered vFC patterns between spatially contiguous sub-area pairs from a larger region pair. More specifically, given a region pair of interest, we sought to extract interior sub-area pairs that could maximally cover spatially adjacent covariate-related vFC with well-controlled FDR and FWER values (e.g., Figure 2.1(b)). Our sub-area extraction approach is fundamentally distinct from other commonly used brain parcellation methods such as anatomy-based and data-driven approaches (e.g., gradient- or similarity-based

mappings) (Craddock et al., 2012; Wig et al., 2014); these parcellation methods seek to segment

an ROI into different sub-regions, and every single voxel is assigned to a corresponding sub-region.

In contrast to parcellation methods in which every voxel is processed, our sub-area extraction

approach only selects subsets of voxels that are covariate-related and are constrained in spatially

contiguous spaces. All other non-selected voxels are considered to be covariate-indifferent. Sub-

area extraction is more suited to our study because: (i) it is likely that the covariate-related

differences across clinical groups may gather in the vFC between a sub-area in Region A and an

intersection of multiple sub-areas grouped by the existing parcellation methods in Region B; (ii) it

is often found that only a small proportion of voxels in regions A and B are disrupted, and thus a

comprehensive parcellation across the entire ROI is not necessary (Cao et al., 2014).

To achieve the desired sub-area extraction and address the limitations discussed above, we

propose a new statistical network framework to extract Spatially Constrained and Connected

Networks, hereafter referred to as SCCN. SCCN is a two-step method (Figure 2.2) that focuses

on a pair of ROIs, say A and B, believed to contain aberrant functional connections

caused by a brain disease. In step 1, SCCN extracts spatially coherent sub-area pairs that

maximally contain disease-altered vFC between regions A and B. In step 2, we formally test each

extracted sub-area pair to determine whether it is significantly covariate-associated with multiple

testing controls. If no sub-area pairs are found to be significant, we then consider the region-pair

connectivity as covariate-unrelated. If significant results are seen, the association between the

covariate of interest and the ROI-pair connections can be traced down to smaller but much more

precise sub-areas consisting of extracted voxels. These vFC results may provide insights into

understanding the latent neurophysiological mechanisms of diseases.

In this chapter, we show that SCCN provides a consistent estimate for the true community

structure in the sense that the error of edge assignments is negligible in large region pairs. We

empirically evaluate the performance of SCCN through extensive simulation studies. The results

show that SCCN achieves satisfactory performance in increasing statistical power and spatial

specificity while controlling the false-positive rate. Notably, SCCN is easily scalable to both small

and large datasets. In addition, we apply SCCN to two real data examples: a nicotine-addiction

research study using UK Biobank$^1$ data with 3269 participants, and a schizophrenia research study

with 330 participants. Through these applications, we systematically investigate disease-related

sub-network structures using SCCN with rigorously controlled FDR.

2.2 Methods

2.2.1 Background

2.2.1.1 Data structure

Given two ROIs, A and B, consisting of $n$ and $m$ voxels, respectively, the vFC association patterns can be represented by a general $(n+m) \times (n+m)$ outcome matrix. Specifically, the $(n+m) \times (n+m)$ connectivity matrix can be decomposed into three sub-matrices of sizes $n \times n$, $m \times m$, and $n \times m$, which encompass within-A, within-B, and between-region connection information, respectively. Herein, we focus on presenting the new methodology for vFC analysis between ROIs A and B (i.e., the $n \times m$ connectivity matrix), which is motivated by the growing interest in clinical investigations aimed at exploring neuropsychiatric disorder-related inter-regional vFC changes (Agosta et al., 2013; Rogers et al., 2007; Wu et al., 2011). Due to space limitations, we provide the statistical framework for within-region vFC analysis (i.e., the $n \times n$ and $m \times m$ connectivity matrices) along with additional simulations and real data applications in Appendix 2B.

$^1$UK Biobank is a large-scale biomedical database and research resource containing in-depth genetic and health information from half a million UK participants.

For a subject $s \in [S] := \{1, \dots, S\}$, let $Z^{A,s}_{n \times T}$ and $Z^{B,s}_{m \times T}$ represent the matrices of voxel-level blood-oxygenation-level-dependent (BOLD) signals at $T$ different time points for ROIs A and B. The outcome variables are the functional connectivity measures quantified by similarity matrices between the time series of voxels in A and in B. For example, $Y^s_{ij}$, the connectivity strength between voxel $i$ in A and voxel $j$ in B, can be computed by $Y^s_{ij} = f(Z^{A,s}_{i\cdot}, Z^{B,s}_{j\cdot})$, where $Z^{A,s}_{i\cdot}$ and $Z^{B,s}_{j\cdot}$ are the BOLD time series for voxels $i$ and $j$, and $f$ is a similarity metric (e.g., Fisher’s z-transformed Pearson correlation). Collecting all $Y^s_{ij}$ for each voxel pair $(i, j) \in [n] \times [m]$ gives an inter-region connectivity matrix $\mathbf{Y}^s_{n \times m}$. Additionally, a covariate vector $X^s_{1 \times p}$ is observed for each subject $s$, and this contains demographic and clinical information.
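As a concrete illustration of the similarity metric $f$, the following minimal sketch computes one subject's inter-region connectivity matrix using Fisher's z-transformed Pearson correlation. The function names (`pearson`, `connectivity_matrix`) and the toy time series are illustrative assumptions, not part of the original analysis pipeline.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def connectivity_matrix(Z_A, Z_B):
    """Return the n x m matrix Y with Y[i][j] = atanh(corr(Z_A[i], Z_B[j])).

    Z_A holds n voxel time series (each of length T) from ROI A and Z_B holds
    m voxel time series from ROI B. math.atanh is Fisher's z-transform; it is
    undefined for |r| = 1, so perfectly correlated series would need
    separate handling.
    """
    return [[math.atanh(pearson(zi, zj)) for zj in Z_B] for zi in Z_A]

# Toy example (hypothetical data): n = 2 voxels in A, m = 3 voxels in B, T = 5.
Z_A = [[1, 2, 3, 4, 5], [2, 1, 2, 1, 3]]
Z_B = [[2, 1, 4, 3, 5], [5, 3, 4, 1, 2], [1, 3, 2, 5, 3]]
Y = connectivity_matrix(Z_A, Z_B)
```

Repeating this computation for every subject yields the collection of $n \times m$ matrices that serve as outcomes in the edge-wise analysis.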

Our goal is to identify clinical/behavioral-related functional connectivity (FC) patterns at the voxel level, because voxel-level findings can reveal altered FC with improved statistical power and enhanced spatial specificity and resolution. To achieve this, multivariate statistical inference is required for the $n \times m$ vFC outcomes (usually in high dimension, e.g., millions) with spatial constraints. We first test the associations between each outcome $Y^s_{ij}$ and a regressor of primary interest $x^s_1 \in X^s$ (clinical status in our application, e.g., patient or control):

\[ E(Y^s_{ij} \mid X^s) = \alpha_0 + x^s_1 \beta_{ij} + X^s_{1 \times (p-1)} \alpha, \]

where $\beta_{ij}$ is the coefficient of $x^s_1$ and $\alpha$ is a coefficient vector for the remaining covariates $X^s_{1 \times (p-1)}$ (e.g., age, ethnicity, etc.). We denote $\beta := \{\beta_{ij}\}_{i \in [n], j \in [m]}$ and aim to systematically extract vFC whose $\beta \neq 0$ with high accuracy. We further summarize the significance levels of $\beta$ by a connectivity inference matrix $\mathbf{W}_{n \times m}$. Each entry of $\mathbf{W}_{n \times m}$ is computed by $W_{ij} = -\log p_{ij}$, where $p_{ij}$ is the p-value for $\beta_{ij}$. In neuroimaging statistics, the selection of $\beta \neq 0$ is not only determined by the level of statistical significance but also by spatial constraints. In addition to these two factors, $\beta$ is also intrinsically linked with an underlying $n \times m$ bipartite graph between ROIs A and B. Therefore, we require both graph and spatial information to assist in identifying vFC whose $\beta \neq 0$. We present the detailed graph and spatial constructions as follows.
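The construction of the connectivity inference matrix can be sketched as follows. This is a simplified illustration under stated assumptions: the regression is reduced to a two-group comparison with no additional covariates, and the p-value uses a large-sample normal approximation to a Welch-type statistic rather than the exact regression t-test. The helper name `inference_matrix` and the toy data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

def inference_matrix(Y_subjects, group):
    """Edge-wise W[i][j] = -log(p_ij) for a binary regressor x_1.

    Y_subjects: list of S connectivity matrices (each n x m, nested lists).
    group:      list of S labels, 1 = patient, 0 = control.
    Assumption: no other covariates, so beta_ij reduces to a difference in
    group means, tested here with a normal approximation.
    """
    n, m = len(Y_subjects[0]), len(Y_subjects[0][0])
    W = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            y1 = [Y[i][j] for Y, g in zip(Y_subjects, group) if g == 1]
            y0 = [Y[i][j] for Y, g in zip(Y_subjects, group) if g == 0]
            se = math.sqrt(variance(y1) / len(y1) + variance(y0) / len(y0))
            z = (mean(y1) - mean(y0)) / se
            p = 2.0 * NormalDist().cdf(-abs(z))    # two-sided p-value
            W[i][j] = -math.log(max(p, 1e-300))    # guard against log(0)
    return W

# Toy data (hypothetical): 6 subjects, one voxel in A, two voxels in B.
group = [1, 1, 1, 0, 0, 0]
Y_subjects = [[[0.60, 0.50]], [[0.90, 0.40]], [[0.75, 0.60]],
              [[0.20, 0.60]], [[0.50, 0.40]], [[0.35, 0.50]]]
W = inference_matrix(Y_subjects, group)
```

In this toy setting the first edge differs between groups and receives a large entry, while the second edge has identical group means and receives an entry of zero.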

2.2.1.2 Graph representation

To decipher the complex voxel-pair connectome, we consider a bipartite graph structure $G = \{U, V\}$ underlying the inference matrix $\mathbf{W}_{n \times m}$. The node sets $U$ and $V$ represent voxels in ROIs A and B, respectively, where $|U| = n$ and $|V| = m$. We assume that, after spatial normalization and registration of the fMRI data, all subjects share a common set of nodes, namely, $(U^s, V^s) \equiv (U, V)$, $\forall s \in [S]$.

2.2.1.3 Spatial contiguity

Each node in our dataset corresponds to a voxel at a certain spatial position in 3D brain imaging (e.g., Figure 2.2(e1)). When we map each detected subgroup of voxels back to the 3D brain space, we desire these voxels to emerge as a spatially adjacent cluster (i.e., a connected component). This requirement, stated formally, is referred to as spatial contiguity. Specifically, we define an “infrastructure graph” $\mathbf{S}_A$ between all nodes within ROI A to accommodate spatial contiguity. Each entry $S_{ii'}$ in $\mathbf{S}_A$ is a spatial-adjacency indicator variable between voxels $i$ and $i'$ in ROI A, where $S_{ii'} = 1$ if $d_{ii'} \le \varepsilon$, and $S_{ii'} = 0$ otherwise ($d_{ii'}$ is the Euclidean distance between voxels $i$ and $i'$). For example, in a 3D grid space, when $\varepsilon$ is set to be $\sqrt{3}$, a centroid voxel $i$ in a cube will have 26 surrounding voxels $i'$ such that $S_{ii'} = 1$. We define and interpret $\mathbf{S}_B$ for nodes within ROI B similarly. $\mathbf{S}_A$ and $\mathbf{S}_B$ will be used to prescribe the spatial-contiguity constraints when implementing SCCN. We provide more rigorous mathematical definitions of spatial contiguity, $\mathbf{S}_A$, and $\mathbf{S}_B$ in Appendix 2A.1.
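The construction of the infrastructure graph can be illustrated on a small grid. The sketch below is a minimal example assuming voxel coordinates on an integer 3D lattice; the function name `infrastructure_graph` and the small numerical tolerance are illustrative choices rather than part of the original implementation.

```python
import math
from itertools import product

def infrastructure_graph(coords, eps=math.sqrt(3)):
    """Return S with S[i][k] = 1 if the Euclidean distance d_ik <= eps, else 0.

    A tiny tolerance is added (an implementation assumption) so that
    distances equal to eps are not lost to floating-point rounding.
    Note that S[i][i] = 1 because d_ii = 0.
    """
    tol = 1e-9
    return [[1 if math.dist(a, b) <= eps + tol else 0 for b in coords]
            for a in coords]

# A 3 x 3 x 3 cube of voxels; the centroid voxel is (1, 1, 1).
coords = list(product(range(3), repeat=3))
S = infrastructure_graph(coords)
center = coords.index((1, 1, 1))
n_neighbors = sum(S[center]) - 1   # exclude the voxel itself: 26 neighbors
```

A corner voxel of the same cube has only 7 such neighbors, which shows how the indicator encodes local geometry rather than a fixed neighborhood size.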

We propose the SCCN model to systematically select vFC with $\beta_{ij} \neq 0$ by jointly considering the information of voxel-pair-level statistical significance, underlying graph structures, and spatial constraints. We integrate these into a weighted graph $G = \{\mathbf{W}, \mathbf{S}_A, \mathbf{S}_B\}$ as the input of our method.

2.2.2 Detecting densely altered sub-area pairs from an ROI pair

2.2.2.1 Spatial-contiguity-constrained objective function

The node set $U$ corresponding to voxels in ROI A can reportedly be partitioned into mutually non-overlapping sub-areas $\{U_c\}$, denoted by $U = \bigoplus_{c=1}^{C} U_c$ (Eickhoff et al., 2015). Similarly, we have $V = \bigoplus_{d=1}^{D} V_d$ for ROI B. In this chapter, we aim to extract sub-area pairs $\{(U_c, V_d)\}$ that dominantly contain disease-related voxel pairs, and we call these “densely altered” sub-area pairs. Formally, a sub-area pair $(U_c, V_d)$ is considered densely altered if

\[ \frac{\sum_{(i,j) \in (U_c, V_d)} I(\beta_{ij} \neq 0)}{|U_c|\,|V_d|} \gg \frac{\sum_{(i,j) \in (U'_c, V'_d)} I(\beta_{ij} \neq 0)}{|U'_c|\,|V'_d|}, \]

where $U'_c$ and $V'_d$ are the complements of node sets $U_c$ and $V_d$. We are therefore inspired to devise a regularized objective function to generate a checkerboard-like network structure underlying the connectivity inference matrix $\mathbf{W}$. This network structure reshuffles $\mathbf{W}$ and reveals densely altered $\{(U_c, V_d)\}$ pairs from $(U, V)$. In addition, we impose spatial contiguity on $U_c$ and $V_d$ to improve biological interpretability and prohibit isolated false-positive edges. Finally, the objective function is formulated as follows:

\[ \underset{\substack{C,\, D,\, U = \bigoplus_{c=1}^{C} U_c,\, V = \bigoplus_{d=1}^{D} V_d \\ (U_c, V_d \text{ subject to spatial contiguity})}}{\arg\max} \int \sum_{c=1}^{C} \sum_{d=1}^{D} \left\{ \log \frac{\sum_{i \in U_c, j \in V_d} W_{ij} \cdot I(W_{ij} > r)}{|U_c|\,|V_d|} + \lambda \log(|U_c|\,|V_d|) \right\} g(r)\, dr, \tag{2.1} \]

where $\lambda \in [0, 1]$ is a tuning parameter, $r$ is a threshold below which there is no disease-related effect on $W_{ij}$, and $g(r)$ is the distribution function for $r$. Both $g(r)$ and $\lambda$ can be chosen by prior knowledge or by a data-driven method proposed in Section 2.2.2.3.

The tuning parameter $\lambda$ falls in the range $[0, 1]$: when $\lambda = 0$, maximizing (2.1) is equivalent to maximizing $f_1 = \frac{\sum_{i \in U_c, j \in V_d} W_{ij} \cdot I(W_{ij} > r)}{|U_c|\,|V_d|}$, which is a popular definition for connection density; when $\lambda = 1$, maximizing (2.1) is simply maximizing $f_2 = \sum_{i \in U_c, j \in V_d} W_{ij} \cdot I(W_{ij} > r)$, which quantifies the magnitude of significant voxel pairs contained by the sub-area pair $(U_c, V_d)$. Direct optimization of the connection density $f_1$ tends to detect a dense subgraph with a minuscule size, while the optimization of $f_2$ can trigger an oversized subgraph. Theorem 2.2 shows that function (2.1) provides a consistent estimate for the targeted topological structure (collections of edge-induced sub-area pairs) in the sense that the error of edge assignments is negligible in large region pairs. Extensive simulation studies also show that function (2.1) performs well in balancing the size and density when detecting subgraphs.
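To make the role of the tuning parameter explicit, the summand of (2.1) for a single sub-area pair can be rewritten in terms of the density $f_1$ and the magnitude $f_2$. The identity below is a short worked step that follows directly from $f_1 = f_2 / (|U_c|\,|V_d|)$; it introduces no assumptions beyond the definitions in the text.

```latex
\begin{align*}
\log f_1 + \lambda \log(|U_c|\,|V_d|)
  &= \log f_2 - \log(|U_c|\,|V_d|) + \lambda \log(|U_c|\,|V_d|) \\
  &= \log f_2 - (1 - \lambda)\,\log(|U_c|\,|V_d|).
\end{align*}
```

Setting $\lambda = 0$ recovers $\log f_1$ and setting $\lambda = 1$ recovers $\log f_2$; intermediate values penalize the log-size of the sub-area pair with weight $1 - \lambda$, which is how the objective trades off subgraph size against density.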

2.2.2.2 Optimization of objective function (2.1) for given $g(r)$ and $\lambda$

In this section, we focus on optimizing function (2.1) for a given configuration of $g(r)$ and $\lambda$, which are the density function for the threshold $r$ and the tuning parameter in (2.1). We will then discuss how to determine $g(r)$ and $\lambda$ in the next section. Unfortunately, even with a given $g(r)$ and $\lambda$, direct optimization of (2.1) is still an NP-hard problem. Moreover, traditional optimization methods, such as gradient descent, cannot be used because the problem is combinatorial and non-convex. Here, we present an alternative strategy for optimizing (2.1). The essential idea is that we integrate $\mathbf{W}$ with the spatial-contiguity constraints and then estimate the targeted community structure using modified spectral clustering algorithms via iterative procedures. As presented earlier, the targeted network structure is $\{U_c, V_d\}$ partitioned from $(U, V)$ (i.e., the collection of edge-induced sub-area pairs, or in other words, the voxel memberships of $U_c$ and $V_d$), where $U = \bigoplus_{c=1}^{C} U_c$ and $V = \bigoplus_{d=1}^{D} V_d$.

According to the spectral clustering algorithm, applying singular value decomposition to the Laplacian matrix of $\mathbf{W} = \mathbf{U} \Sigma \mathbf{V}^\top$ and then clustering $\mathbf{U}$ and $\mathbf{V}$ (here the matrices of left and right singular vectors) will give partitions of regions A and B, respectively. Now, since the columns of $\mathbf{V}$ are the eigenvectors of $\mathbf{W}^\top \mathbf{W}$, spectral clustering on the Laplacian matrix of $\mathbf{W}^\top \mathbf{W}$ will simply give the partitions of Region B. Similarly, spectral clustering on the Laplacian matrix of $\mathbf{W} \mathbf{W}^\top$ will provide the partitions of Region A. Therefore, our community-detection algorithm can be conducted based on $\mathbf{W} \mathbf{W}^\top$ and $\mathbf{W}^\top \mathbf{W}$. Next, to incorporate the spatial-contiguity constraints into the optimization, we make use of the two within-region “infrastructure graphs” $\mathbf{S}_A$ and $\mathbf{S}_B$ introduced earlier. Specifically, we define

\[ \mathbf{W}_A = \mathbf{W} \mathbf{W}^\top \circ \mathbf{S}_A \quad \text{and} \quad \mathbf{W}_B = \mathbf{W}^\top \mathbf{W} \circ \mathbf{S}_B, \tag{2.2} \]

where $\circ$ is an element-wise product. As pointed out by Kamvar et al. (2003) and Craddock et al. (2012), $\mathbf{S}_A$ and $\mathbf{S}_B$ force the similarity between all pairs of non-adjacent voxels to zero, which breaks edges between isolated voxels in the graph. Based on this, the entry $W_{A(ii')}$ of the $n \times n$ matrix $\mathbf{W}_A$ (where $i$ and $i'$ are two voxels in A) is greater if the two voxels are spatially adjacent and have a similar profile linking to voxels in Region B. The spatial-contiguity constraints enable our method to produce results that better honor the neurobiological background regarding the coherence of neighboring neuron populations (Thirion et al., 2006).

We can now fit a stochastic block model to $\mathbf{W}_A$ (and another to $\mathbf{W}_B$) using the spectral clustering algorithm and then perform a grid search for the optimizer of function (2.1). We further examine whether the estimated $\hat U_c$ and $\hat V_d$ satisfy the spatial-contiguity constraints; empirically, we find that the constraints are typically satisfied, so no further modification step is needed. We formally present our clustering procedure in Algorithm 1.

Algorithm 1 Optimization of objective function (2.1) with given $\lambda$

 1: procedure ALGORITHM (Input: $\lambda$ and $G = \{\mathbf{W}, \mathbf{S}_A, \mathbf{S}_B\}$)
 2:   function SCCN.partition ($\lambda$, $G$)
 3:     for $C = 1, 2, \dots, |U|$ do
 4:       Ratio-cut spectral clustering of $\mathbf{W}_A$ into $C$ networks: $U = \bigoplus_{c=1}^{C} U_c$
 5:       for $D = 1, 2, \dots, |V|$ do
 6:         Ratio-cut spectral clustering of $\mathbf{W}_B$ into $D$ networks: $V = \bigoplus_{d=1}^{D} V_d$
 7:         Substitute network sets $U$ and $V$ into objective function (2.1), and obtain the output value
 8:       end for
 9:     end for
10:     return $C$, $D$, $U = \bigoplus_{c=1}^{C} U_c$, and $V = \bigoplus_{d=1}^{D} V_d$ that yield the maximum output value
11:   end function
12: end procedure
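The grid search in Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under several assumptions that are not taken from the original implementation: $g(r)$ is a discrete distribution given by `thresholds` and `g_weights`, the search is capped at a hypothetical `max_k` clusters per region to keep the example small, ratio-cut clustering is approximated by k-means on the eigenvectors of the unnormalized Laplacian with a deterministic farthest-point initialization, and a block without any supra-threshold entry is assigned an objective of negative infinity. All function names are illustrative.

```python
import numpy as np

def ratio_cut_labels(sim, k, n_iter=50):
    """Approximate ratio-cut clustering of a symmetric similarity matrix."""
    lap = np.diag(sim.sum(axis=1)) - sim          # unnormalized Laplacian
    _, vecs = np.linalg.eigh(lap)                 # eigenvalues in ascending order
    X = vecs[:, :k]                               # embedding from the k smallest
    centers = [X[0]]                              # farthest-point initialization
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d2))])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):                       # plain Lloyd iterations
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):               # keep old center if cluster is empty
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def objective(W, u, v, lam, thresholds, g_weights):
    """Discrete-g(r) version of (2.1) for row labels u and column labels v."""
    total = 0.0
    for r, g in zip(thresholds, g_weights):
        Wr = np.where(W > r, W, 0.0)              # W_ij * I(W_ij > r)
        for c in np.unique(u):
            for d in np.unique(v):
                block = Wr[np.ix_(u == c, v == d)]
                if block.sum() <= 0:              # log(0): no supra-threshold edge
                    return -np.inf
                total += g * (np.log(block.sum() / block.size)
                              + lam * np.log(block.size))
    return total

def sccn_partition(W, S_A, S_B, lam, thresholds, g_weights, max_k=3):
    """Grid search over (C, D) as in Algorithm 1, capped at max_k (assumption)."""
    W_A = (W @ W.T) * S_A                         # equation (2.2)
    W_B = (W.T @ W) * S_B
    best = (-np.inf, None, None)
    for C in range(1, max_k + 1):
        u = ratio_cut_labels(W_A, C)
        for D in range(1, max_k + 1):
            v = ratio_cut_labels(W_B, D)
            val = objective(W, u, v, lam, thresholds, g_weights)
            if best[1] is None or val > best[0]:
                best = (val, u, v)
    return best

# Toy input (hypothetical): 6 x 6 inference matrix with one dense 3 x 3 block,
# and voxels arranged on a line so that neighbors differ by at most one index.
W = np.full((6, 6), 0.2)
W[:3, :3] = 5.0
idx = np.arange(6)
chain = (np.abs(idx[:, None] - idx[None, :]) <= 1).astype(float)
value, u, v = sccn_partition(W, chain, chain, lam=0.5,
                             thresholds=[0.1], g_weights=[1.0])
```

The sketch only mirrors the control flow of Algorithm 1; it does not check spatial contiguity of the returned clusters and is not tuned to recover any particular structure in the toy input.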

Consistency for subgraph detection. In Lemma 2.1, we first establish that, given the true numbers of sub-areas $C^*$ and $D^*$, the solution that optimizes the objective function (2.1) provides a consistent estimate for the topological structure of the target community ($\{U_c, V_d\}$, the collection of edge-induced sub-area pairs) in the sense that false-positive edge assignments are negligible in very large bipartite graphs $G = (U, V)$ with $|U| \to \infty$ and $|V| \to \infty$. Then, we establish the convergence of Algorithm 1 to optimize function (2.1) based on Theorems 2.2 and 2.3. In Theorem 2.2, we prove that our algorithm can provide a consistent estimate of the number of sub-areas, $C$ and $D$. In Theorem 2.3, we prove that the implementation of Algorithm 1 converges to the optimal solution of the objective function (2.1).

To present the theoretical results, we consider the following settings. Let $\{e^1_{ij}\}$ and $\{e^0_{ij}\}$ be the sets of positive (e.g., disease-related) and negative edges, respectively. For a weighted matrix $\mathbf{W}$, we assume that $w_{ij} \mid e^1_{ij} \overset{iid}{\sim} f_1$ and $w_{ij} \mid e^0_{ij} \overset{iid}{\sim} f_0$, where $f_1$ and $f_0$ are two probability density functions with means and variances $(\mu_1, \sigma_1^2)$ and $(\mu_0, \sigma_0^2)$, respectively. In addition, let $M^*$ be the true membership of edges (the community index of edges falling in sub-area pair $(U_c, V_d)$). Furthermore, let $\hat M_{(\hat C, \hat D)}$ be the membership estimated by function (2.1) with $\hat C$ sub-areas in Region A and $\hat D$ sub-areas in Region B.

Lemma 2.1. (Consistency with known sub-area numbers $C^*$ and $D^*$). Assume that $E(\mathbf{W}\mathbf{W}^\top)$ is of rank $C^*$ with smallest absolute nonzero eigenvalue of at least $\Lambda_A$, and $E(\mathbf{W}^\top\mathbf{W})$ is of rank $D^*$ with smallest absolute nonzero eigenvalue of at least $\Lambda_B$. Assume further that $\max(\mu_0, \mu_1, \sigma_0^2, \sigma_1^2) \le d$ for some $d \ge \max(\log n / n, \log m / m)$. Then, if $(2 + \varepsilon_A) \frac{ndCD}{\Lambda_A^2} < \tau_A$ and $(2 + \varepsilon_B) \frac{mdCD}{\Lambda_B^2} < \tau_B$ for some $\tau_A, \tau_B, \varepsilon_A, \varepsilon_B > 0$, the output $\hat M_{(\hat C, \hat D)}$ that maximizes function (2.1) is consistent with the true membership $M^*_{(C^*, D^*)}$ underlying the latent community structure up to a permutation.

Equivalently, let $\hat S_c$, $\hat S_d$ be the estimated node sets for the subgraphs $G_c$, $G_d$ (induced by $U_c$ and $V_d$), respectively. Then $\hat S_c \cap U_c$ represents the nodes in $G_c$ whose assignments can be guaranteed; $\hat S_d \cap V_d$ follows the same definition. With probability at least $1 - \max(n, m)^{-1}$, up to a permutation, we have

\[ \sum_{c=1}^{C} \sum_{d=1}^{D} \left[ 1 - \frac{\left| (\hat S_c \cap U_c) \otimes (\hat S_d \cap V_d) \right|}{|U_c|\,|V_d|} \right] \le \tau_A^{-1} (2 + \varepsilon_A) \frac{ndCD}{\Lambda_A^2} + \tau_B^{-1} (2 + \varepsilon_B) \frac{mdCD}{\Lambda_B^2}, \]

where $\otimes$ denotes the edge set that connects the two node sets on its left and right sides.

Theorem 2.2. (Consistency for grid-searched $C$, $D$). Let the sizes of subgraph pairs $|U_c| \times |V_d|$
($\forall c \in [C^*], d \in [D^*]$) be generated from a multinomial distribution with probabilities
$\pi = (\pi_1, \ldots, \pi_{C^* \times D^*})$. Assume $\exists\, \delta > 0$ such that

$$\mu_1 > \mu_0\,\frac{1 + \delta}{1 - \delta}\left(1 + \sqrt{1 + \frac{\pi_{\min}^2}{\pi_1^2 + \cdots + \pi_{C^* \times D^*}^2}}\right),$$

then under the conditions in Lemma 2.1 and tuning parameter $\lambda = 0.5$, the number of mis-assigned
edges $N_{\text{edge}}$ satisfies

$$N_{\text{edge}} = o_p(n_{\min}\, m_{\min}) \quad \text{as } |U|, |V| \to \infty,$$

where $n_{\min}$, $m_{\min}$ are the sizes of the smallest possible subgraphs in $U$ and $V$, respectively.

Theorem 2.3. (Convergence of Algorithm 1). Let $\tilde{U} = \bigoplus_{c=1}^{\tilde{C}} U_c$, $\tilde{V} = \bigoplus_{d=1}^{\tilde{D}} V_d$ be the partitions
yielded by ratio-cut spectral clustering on $\mathbf{W}_A$ and $\mathbf{W}_B$ that maximize function (2.1) with cluster
numbers $\tilde{C}$, $\tilde{D}$. Then $\tilde{U}$, $\tilde{V}$ converge almost surely to the true community structure, where
false-positive edge assignments to each sub-bicluster are negligible.

Proof. Proofs of Lemma 2.1 and Theorems 2.2 and 2.3 are provided in Appendix 2C.

In summary, the above results provide theoretical evidence that the solution of the proposed

objective function (2.1) and Algorithm 1 converge to the target community structure ({Uc, Vd}).

Moreover, extensive simulation analyses in multiple settings with a wide range of different

sample sizes demonstrate that SCCN can accurately reveal the true community structure with low

false-positive and false-negative rates.

2.2.2.3 Determining g(r) and $\lambda$

Determining g(r). Following Efron (2012), we can choose g(·) to be a discrete distribution

on thresholds {r1, . . . , rp}. A simple example would be as follows. Suppose that the voxel-pair-

level FDRs yielded by pre-selected thresholds r1, r2, and r3 are 0.20, 0.10, and 0.05, respectively.

We can then assign a higher probability mass to rp that yields a lower FDR, for example, g(r1) =

0.1, g(r2) = 0.3, and g(r3) = 0.6. In addition, r1, r2, and r3 can be chosen from commonly used

thresholds in MRI studies, such as $-\log(0.005)$ and $-\log(0.001)$ ($W_{ij}$ is $-\log p_{ij}$ after screening,

and $r$ is a threshold for $W_{ij}$).
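
As a small illustration of the discrete prior described above, the following sketch builds g(r) on three thresholds. The weighting rule (mass proportional to 1/FDR), the function name `discrete_prior`, the cut-off p = 0.01, and the use of the natural logarithm are assumptions made here for illustration only; the text merely requires that thresholds yielding a lower FDR receive more probability mass, and the values 0.1, 0.3, and 0.6 in the example are one hand-chosen instance of that requirement.

```python
import math


def discrete_prior(fdr_by_threshold):
    """Map {threshold r: estimated FDR} to a prior g(r) that sums to 1.

    Assumption (not from the text): mass is proportional to 1 / FDR,
    so a threshold with a lower FDR receives more probability mass.
    """
    weights = {r: 1.0 / fdr for r, fdr in fdr_by_threshold.items()}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


# Thresholds on W_ij = -log(p_ij); the p-value cut-offs are illustrative.
thresholds = [-math.log(p) for p in (0.01, 0.005, 0.001)]
# Voxel-pair-level FDRs taken from the example in the text.
fdrs = [0.20, 0.10, 0.05]

g = discrete_prior(dict(zip(thresholds, fdrs)))
for r in thresholds:
    print(f"r = {r:.3f}  g(r) = {g[r]:.3f}")
```

Any other monotone rule would serve equally well, as long as the resulting masses sum to one.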

Selecting $\lambda$. As mentioned above, the tuning parameter $\lambda$ adjusts the balance between the
subgraph size and the connection density; it thus plays a critical role in our method. A large $\lambda$
encourages large $|U_c|$ and $|V_d|$, whereas a small $\lambda$ is stricter on the connection densities of $(U_c, V_d)$
pairs. Essentially, the selection of $\lambda$ is related to the network structure of $\beta_{ij}$. In practice, we
have observed from many datasets that the coefficients $\beta_{ij} \neq 0$ usually exhibit a block model.

To reflect this, we assume the following hierarchical model. Suppose there exists a non-random,

latent hyperparameter $\boldsymbol{\beta} \in \mathbb{R}^{n \times m}$ with all nonzero elements. We can generate a bipartite similarity

matrix $\eta \in \{0, 1\}^{n \times m}$ from a bipartite stochastic block model with blocks $\{(U_c, V_d)\}$ and the
corresponding connection probabilities $\{\pi_{ij}\}$, such that $\eta_{ij} \sim \mathrm{Bernoulli}(\pi_{ij})$ are independent of
each other, where

$$\pi_{ij} = \begin{cases} \pi_{cd}(\lambda), & i \in U_c,\ j \in V_d, \\ \pi_0(\lambda), & \text{otherwise.} \end{cases}$$

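We select the $\lambda$ value that maximizes the likelihood for this block model. In practice, the $\eta_{ij}$ values
are not directly observable, and we replace them by $\eta_{ij}(r_0) := I(w_{ij} > r_0)$. The log-likelihood
function for $\lambda$ is:

$$l_\lambda\left(\pi_{cd}, \forall c \in [C], d \in [D] \mid \eta_{ij}(r_0)\right) = \sum_{c,d}\ \sum_{(i,j) \in U_c \times V_d} \left[\eta_{ij}(r_0) \log \pi_{cd} + \left(1 - \eta_{ij}(r_0)\right) \log\left(1 - \pi_{cd}\right)\right].$$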

To eliminate the arbitrariness in choosing the threshold $r_0$, we integrate the likelihood
function with respect to $r_0$ over a prior distribution $g_0(r_0)$ determined by the method above. This
yields the following criterion:

$$\lambda_{\text{optimal}} = \arg\max_{\lambda}\left\{\int \max_{U_c, V_d, \pi_{cd}} l^{r}_{\lambda}\left(\pi_{cd}(\lambda), \forall c \in [C], d \in [D] \mid \eta_{ij}(r_0)\right) g_0(r_0)\, dr_0\right\}.$$

We formally present the procedure to select the tuning parameter $\lambda$ in Algorithm 2. The
overall complexity of the algorithm is $O(Knm)$, where $K$ is the size of a sufficient search range for $\lambda$,
$n = |U|$, and $m = |V|$.

Since the inference results between clinical groups across S subjects are captured in W, the

algorithm complexity no longer involves sample size S, indicating the scalability of SCCN for large

datasets. In addition, clustering algorithms typically involve computing the first K eigenvectors

of a potentially high-throughput similarity matrix. Our input similarity matrix W is sparse after

applying screening and the spatial-contiguity constraints (usually only 0.2%–5.0% of edges are

non-zero entries after processing), which notably reduces computational expense. It is, however,

worth noting that since our algorithm is based on a single region pair, the computational burden

may become heavy when investigating multiple different pairs, especially when a whole-brain

analysis is needed.

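Algorithm 2 Grid search for $\lambda$.
1: procedure ALGORITHM
2:   for $0 \le \lambda \le 1$ do
3:     Obtain $U = \bigoplus_{c=1}^{C} U_c$ and $V = \bigoplus_{d=1}^{D} V_d$ from the SCCN.partition function in Algorithm 1
4:     for $r_0 = (r_0)_1$ to $(r_0)_q$ do
5:       Compute the log-likelihood: $l_\lambda(\hat{\pi}^{\mathrm{MLE}}_{c \times d}, \forall c \in [C], d \in [D] \mid \eta_{ij}(r_0))$
6:     end for
7:     Integrate the log-likelihood w.r.t. $r_0$:
8:     $l_\lambda = \sum_{i=1}^{q} l_\lambda(\hat{\pi}^{\mathrm{MLE}}_{c \times d}, \forall c \in [C], d \in [D] \mid \eta_{ij}((r_0)_i))\, g_0((r_0)_i)$
9:   end for
10:  return $\hat{\lambda}$ that yields the maximized $l_\lambda$
11: end procedure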
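
The grid search can be sketched in a few lines, under stated assumptions. In the sketch below, `partition_fn` is a hypothetical stand-in for the SCCN.partition step of Algorithm 1 (it is not implemented here), `toy_partition` and the 4-by-4 matrix are invented purely to exercise the code, and the integral over the threshold is replaced by a weighted sum over a discrete prior. The block-wise MLE of each connection probability is the proportion of supra-threshold entries in that block.

```python
import math


def block_loglik(W, row_labels, col_labels, r0):
    """Bernoulli block log-likelihood of eta_ij = I(w_ij > r0),
    evaluated at the block-wise MLEs of pi_cd."""
    ones, size = {}, {}
    for i, row in enumerate(W):
        for j, w in enumerate(row):
            key = (row_labels[i], col_labels[j])
            size[key] = size.get(key, 0) + 1
            ones[key] = ones.get(key, 0) + (1 if w > r0 else 0)
    total = 0.0
    for key, n_block in size.items():
        k = ones[key]
        pi_hat = k / n_block  # MLE of pi_cd for this block
        if 0 < k < n_block:   # blocks with pi_hat in {0, 1} contribute 0
            total += k * math.log(pi_hat) + (n_block - k) * math.log(1.0 - pi_hat)
    return total


def select_lambda(W, partition_fn, lambdas, thresholds, prior):
    """Return (lambda_hat, score) maximizing the prior-weighted log-likelihood."""
    best_lam, best_score = None, -math.inf
    for lam in lambdas:
        row_labels, col_labels = partition_fn(W, lam)
        score = sum(g * block_loglik(W, row_labels, col_labels, r)
                    for r, g in zip(thresholds, prior))
        if score > best_score:  # ties keep the first lambda encountered
            best_lam, best_score = lam, score
    return best_lam, best_score


def toy_partition(W, lam):
    """Hypothetical partition rule used only for this demonstration."""
    if lam < 0.5:
        return [0, 0, 1, 1], [0, 0, 1, 1]
    return [0, 0, 0, 0], [0, 0, 0, 0]


W = [[3.0, 3.0, 0.0, 0.0],
     [3.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 3.0]]
lam_hat, score = select_lambda(W, toy_partition, [0.25, 0.75], [1.0, 2.0], [0.5, 0.5])
print(lam_hat, round(score, 3))
```

Passing the partition step in as a callable keeps the sketch independent of how Algorithm 1 is implemented.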

2.2.3 Statistical inference of {(Uc, Vd)} pairs

Recall that our ultimate goal is to extract a few most-densely connected subgraph pairs from

$\{(U_c, V_d)\}$ based on the block partition $\{U_c, V_d : c \in [C], d \in [D]\}$ that we have already obtained
at this point. A natural idea is to inspect each $(U_c, V_d)$ pair and perform a statistical test on it
with the alternative hypothesis that the subgraph $U_c \otimes V_d$ is unusually dense. Here, we derive a
cluster-wise permutation test (Nichols and Holmes, 2002) with FWER control. The hypotheses
are:

$H_0$: $G$ is a random graph (i.e., no dense subgraph $U_c \otimes V_d$ exists),

$H_a$: At least one dense subgraph $U_c \otimes V_d$ exists.

More specifically, under $H_0$, the connection density of $U_c \otimes V_d$ should be close to the density

of sub-area pairs obtained by randomly shuffling edges in the bipartite graph. Built upon the

minimum description length (MDL) principle proposed by Grünwald (2007), we devise the

following MDL-based test statistic:

$$\mathrm{MDL}(U_c, V_d) = \log_2\left[\binom{n}{|U_c|}\binom{m}{|V_d|}\right] + \left(\frac{1 - \mu_1^2}{2 \ln 2} - L_\zeta\right)|U_c| \times |V_d|,$$

where $\mu_1$ is the mean value of the edge-wise test statistics $\zeta_{ij}$ for edges within $U_c \times V_d$, and
$L_\zeta = -\int \phi(\zeta_{ij}) \log_2\left(\phi(\zeta_{ij})\right) d\zeta_{ij} + C$ is an information entropy measure based on the standard
normal distribution $\phi$ for $\zeta_{ij}$. Detailed derivations for the MDL-based test statistic and its

connections to our inference goal are provided in Appendix 2C.4. We formally present the

cluster-wise permutation test for each observed sub-area pair (Uc, Vd) in Algorithm 3. The number

of permutations $H$ in this algorithm can be determined based on the sample size, the targeted
computational expense, and the precision of the test (for example, $H = 1000$). Compared to
conventional multiple testing correction methods (e.g., FDR and FWER control), the MDL-based
cluster-wise permutation test yields fewer false-positive findings and shows improved statistical
power in real-data examples and simulations.
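
The p-value step of the cluster-wise permutation test can be sketched as follows. The function name `permutation_p_values` and all numerical values are hypothetical. The sketch assumes that the observed statistics $T^0_{c,d}$ and the per-permutation maxima $T^h$ have already been computed, and it does not implement the MDL statistic itself.

```python
def permutation_p_values(observed, null_max):
    """P_cd = #{h : T^h > T^0_cd} / H for each observed sub-area pair.

    observed: dict mapping a (U_c, V_d) label to its statistic T^0_cd.
    null_max: list of the maximal statistics T^h from H permutations.
    """
    H = len(null_max)
    return {pair: sum(t_h > t0 for t_h in null_max) / H
            for pair, t0 in observed.items()}


# Hypothetical observed statistics and permutation maxima (H = 4 only for brevity).
observed = {("U1", "V1"): 9.0, ("U2", "V3"): 2.5}
null_max = [1.0, 2.0, 3.0, 4.0]

p = permutation_p_values(observed, null_max)
significant = [pair for pair, pv in p.items() if pv < 0.05]
print(p, significant)
```

Comparing every observed statistic against the same distribution of permutation maxima is what provides family-wise control across the sub-area pairs tested.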

Algorithm 3 MDL-based cluster-wise permutation test for each $(U_c, V_d)$ pair
1: procedure ALGORITHM
2:   Compute $T^0_{c,d} = \mathrm{MDL}(U_c, V_d)$ for each $(U_c, V_d)$ pair yielded with the true covariate labels
3:   for $h = 1, \ldots, H$ do
4:     Permute the covariate labels and obtain the new inference connectivity matrix $\mathbf{W}^h$
5:     Obtain $U^h = \bigoplus_{c=1}^{C} U^h_c$ and $V^h = \bigoplus_{d=1}^{D} V^h_d$ by substituting $\mathbf{W}^h$ in Algorithms 1 and 2
6:     Record $T^h = \max_{c,d}\left(\mathrm{MDL}(U^h_c, V^h_d)\right)$
7:   end for
8:   Compute the p-value for each observed $(U_c, V_d)$ pair: $P_{c,d} = \frac{\sum_{h=1}^{H} I(T^h > T^0_{c,d})}{H}$
9:   return the significance of each observed $(U_c, V_d)$ pair based on $P_{c,d}$ at a predetermined $\alpha$-level
10: end procedure

2.3 Simulations

In the simulation study, we probed whether SCCN can extract densely altered sub-area pairs
with better performance than common existing methods. Specifically, we evaluated the
performance from two perspectives: (i) multivariate edge-level inference: whether the extracted voxel
pairs have a high true-positive rate (TPR) and a low false-positive rate (FPR); (ii) network-level
inference: whether the extracted sub-areas contain maximal true-positive voxels, compared to
other unextracted sub-areas.
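
For concreteness, the edge-level criteria in (i) can be computed as in the sketch below. The function name `edge_rates` and the toy 0/1 matrices are assumptions for illustration; the sketch also assumes that the truth contains at least one positive and one negative edge.

```python
def edge_rates(truth, detected):
    """Edge-wise TPR and FPR from two 0/1 matrices of the same shape.

    truth[i][j] = 1 if edge (i, j) is truly disease-related;
    detected[i][j] = 1 if the method reports edge (i, j).
    """
    tp = fp = pos = neg = 0
    for t_row, d_row in zip(truth, detected):
        for t, d in zip(t_row, d_row):
            if t:
                pos += 1
                tp += 1 if d else 0
            else:
                neg += 1
                fp += 1 if d else 0
    return tp / pos, fp / neg


truth = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0]]
detected = [[1, 1, 0, 0],
            [1, 0, 0, 1],
            [0, 0, 0, 0]]
tpr, fpr = edge_rates(truth, detected)
print(tpr, fpr)
```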

2.3.1 Primary analysis

We first generated a bipartite graph $G = (U, V)$ to represent the brain connectome between
two brain regions A and B for $S$ subjects, where $U$ corresponds to the voxel set in Region A,
and $V$ corresponds to that in Region B. We assume all $S$ subjects share a common set of nodes
after spatial normalization and registration, i.e., $(U^s, V^s) \equiv (U, V)$, $\forall s \in [S]$. Next, we simulated
covariates of interest $\{\mathbf{X}^1, \ldots, \mathbf{X}^S\}$ that contain clinical information of all $S$ subjects. Lastly, we
simulated the Fisher’s z-transformed connectivity matrices $\{\mathbf{Z}^1, \ldots, \mathbf{Z}^S\}$ between regions A
and B, where $\mathbf{Z}^s \in \mathbb{R}^{n \times m}$, $n = |U|$, $m = |V|$. Specifically, each element $z^s_{ij}$ in $\mathbf{Z}^s$ was set to
follow $N(h(z^s_{ij}), \sigma^2)$, where $h(z^s_{ij}) = \mathbf{X}^s \beta_{ij}$ is location-specific within regions A and B.

Figure 2.3: A 2D visualization of performance by different methods. In (a1), the true spatial locations of sub-areas
$U_1$ and $U_2$ are displayed in a simulated $30 \times 30$ grid space while $V_1$, $V_2$, and $V_3$ are displayed in a simulated $40 \times 40$
grid space. Sub-areas with the same color contain disease-related edges from A to B, i.e., $(U_1, V_1)$, $(U_1, V_2)$, and
$(U_2, V_3)$ are the positive sub-area pairs. (a2) shows the scenario with false-negative and false-positive noise added
to mimic the real vFC patterns in the brain connectome. (a3) shows the connectivity inference matrix $\mathbf{W}$ obtained
based on (a2). (b)-(e) show the detected disease-related voxel pairs (again, only regions with the same color form
a pair) under different variances $\sigma$ and sample sizes $S$. The last row shows the isomorphic graphs of (a3) with the
extracted sub-area pairs pushed to the top when $\sigma = 1$. We highlight the voxels from the supra-threshold voxel
pairs that were yielded by FDR control and FWER control, and the voxels in sub-area pairs that were extracted by
BSGP and SCCN. Multiple testing with FDR control and FWER control tends to extract an excess of voxels with high
false-positive error rates. BSGP better controls the error rates, but it extracts voxel pairs without differentiating the
correct area-wise connections, i.e., $(U_1, V_1)$, $(U_1, V_2)$, and $(U_2, V_3)$. In contrast, SCCN can simultaneously recover
the spatially contiguous sub-areas in A and B, respectively, and reveal the correct disease-related vFC patterns.
(f)-(j) show that no single differentially expressed sub-area pair was extracted by the biclustering algorithms listed.

In the following, we show the numerical settings under the above simulation framework:

1. For the two pre-defined brain regions of interest, we simulated $|U| = 900$ voxels in Region A
and $|V| = 1600$ voxels in Region B. Within $U$ and $V$, we also randomly simulated three
disease-related sub-area pairs $(U_1, V_1)$, $(U_1, V_2)$, and $(U_2, V_3)$. The true spatial locations of
these five sub-areas in the simulated 2D grid spaces are presented in Figure 2.3(a1). Not every
possible pair $\{(U_c, V_d), c \in [2], d \in [3]\}$ was associated with the disease; only regions with
the same color exhibited dysconnectivity from A to B. The sizes of these sub-area pairs were
$|U_1||V_1| = 84 \times 70 = 5880$, $|U_1||V_2| = 84 \times 64 = 5376$, and $|U_2||V_3| = 96 \times 117 = 11232$.
In addition, we included spatially isolated abnormal voxels as well as noise within regions A
and B to mimic more realistic neural connectivity (Figure 2.3(a2)).

2. For the Fisher’s z-transformed connectivity matrices $\{\mathbf{Z}^s, s \in [S]\}$, we set
$h(z^s_{ij}) = \beta_0 + \beta_{ij,1} x^s_1 + \beta_{ij,2} x^s_2 + \beta_{ij,3} x^s_3$, where $x^s_1$ and $x^s_2$ store the age and sex information for
subject $s$, and $x^s_3$ represents their clinical status ($x^s_3 = 1$ if subject $s$ has a mental disorder, and
0 for a healthy control). In addition, while $\beta_{ij,1}$ and $\beta_{ij,2}$ are typically not spatially variant,
$\beta_{ij,3}$ is considered brain-region specific:

$$\beta_{ij,3} = \begin{cases} 0.9, & \text{if } (i,j) \in (U_1, V_1) \cup (U_1, V_2), \\ 0.13, & \text{if } (i,j) \in (U_2, V_3), \\ 0, & \text{if } (i,j) \in (U, V) \setminus \{(U_1, V_1) \cup (U_1, V_2) \cup (U_2, V_3)\}. \end{cases}$$

3. To control standardized effect sizes, we set $\sigma^2 = 0.5, 1.0, 2.0$ in $z^s_{ij} \sim N(h(z^s_{ij}), \sigma^2)$.
Additionally, four sample sizes, $S = 100$, $200$, $2000$, and $20{,}000$, were used, each with
balanced healthy controls and patients. All settings with different $(\sigma, S)$ were simulated
1000 times to assess the variability of the TPR and FPR.
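
The data-generating step in the settings above can be illustrated with a deliberately small sketch. It is not the simulation code used in this study. The grid size (6 by 8), the single rectangular block, the zero age and sex effects, the alternating group labels, and the function names are all simplifying assumptions; only the model $z^s_{ij} \sim N(\beta_{ij,3} x^s_3, \sigma^2)$ with $\beta_{ij,3} = 0.9$ inside a disease-related block is taken from the text.

```python
import random


def simulate_connectivity(n, m, block, beta3, sigma, S, seed=0):
    """Return (x3, Z): clinical labels and S simulated n-by-m matrices.

    z^s_ij ~ N(beta_{ij,3} * x^s_3, sigma^2), with beta_{ij,3} = beta3 inside
    `block` and 0 elsewhere; age and sex effects are set to 0 in this sketch.
    """
    rng = random.Random(seed)
    rows, cols = block
    x3 = [s % 2 for s in range(S)]  # balanced: alternate control (0) / patient (1)
    Z = []
    for s in range(S):
        Zs = [[(beta3 * x3[s] if (i in rows and j in cols) else 0.0)
               + rng.gauss(0.0, sigma) for j in range(m)] for i in range(n)]
        Z.append(Zs)
    return x3, Z


def group_mean_difference(x3, Z):
    """Edge-wise mean(patients) minus mean(controls)."""
    n, m = len(Z[0]), len(Z[0][0])
    n1 = sum(x3)
    n0 = len(x3) - n1
    diff = [[0.0] * m for _ in range(n)]
    for xs, Zs in zip(x3, Z):
        w = 1.0 / n1 if xs == 1 else -1.0 / n0
        for i in range(n):
            for j in range(m):
                diff[i][j] += w * Zs[i][j]
    return diff


block = ({0, 1, 2}, {0, 1, 2, 3})  # hypothetical disease-related sub-area pair
x3, Z = simulate_connectivity(n=6, m=8, block=block, beta3=0.9, sigma=1.0, S=200)
diff = group_mean_difference(x3, Z)
print(round(diff[0][0], 2), round(diff[5][7], 2))
```

In the full pipeline the group difference would be replaced by an edge-wise test adjusted for covariates, whose $-\log p$ values populate $\mathbf{W}$.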

We implemented Algorithms 1 and 2 of SCCN to identify sub-area pairs from each simulated

dataset, and we then applied Algorithm 3 to conduct cluster-wise inference on the sub-area pairs

detected. To assess the performance of the multivariate edge-wise inference, we considered two

conventional multiple-testing controls (FDR and FWER). Specifically, we used the voxel-wise

permutation test (with 1000 permutations) to control the FWER and the Benjamini–Hochberg

procedure (with q = 0.05 as a cut-off) to control the FDR (Benjamini and Hochberg, 1995). To

assess the accuracy of the cluster-wise performance, our goal was to compare true disease-related

subgraphs {(Uc, Vd)} with the estimated subgraphs {(Ûc, V̂d)} produced by five commonly used

biclustering algorithms (i.e., Cheng and Church, Plaid, OPSM, xMOTIF, and Spectral Biclustering

(J. K. Gupta, 2013)).
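
For reference, the Benjamini–Hochberg step-up rule used as the FDR comparator can be sketched as follows. The function name and the five p-values are hypothetical, and the sketch is a plain textbook version of the procedure rather than the exact implementation used in this study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank  # largest rank whose p-value passes its threshold
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


p = [0.001, 0.008, 0.039, 0.041, 0.6]  # hypothetical edge-wise p-values
reject = benjamini_hochberg(p, q=0.05)
print(reject)
```

Each rejection here concerns a single edge, in contrast to the cluster-wise inference of Algorithm 3.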

The edge-wise inference results are presented in Table 1, and graph illustrations of the

results are shown in Figure 2.3. For the edge-wise inference performance across all $(\sigma, S)$ settings,

SCCN outperforms the two traditional multiple testing correction methods (i.e., FDR and FWER

control) in terms of TPR, while its ability to control the FPR falls between the two. SCCN’s
relatively weaker control of the FPR (compared to its sensitivity) can be explained as
follows: in traditional multiple testing methods with universal
thresholds, one false-positive finding corresponds to exactly one false-positive edge. However,
SCCN detects altered edges by partitioning voxels within each ROI; therefore, one false-positive
finding by SCCN corresponds to one false-positive voxel, say $v_i \in U_c$, which will lead to $n$
false-positive edges when $V_d$ ($|V_d| = n$) is found to connect to $U_c$. The greater the size of $V_d$, the more

false-positive edges will be yielded. Nonetheless, even with such a heavy penalty for detecting

one false-positive voxel, SCCN still controls the FPR and shows better performance when jointly

considering the TPR and FPR. More importantly, false-positive edges discovered by the traditional

FDR and FWER correction approaches cover almost all within-ROI voxels, which leads to a

substantial loss of spatial specificity when identifying covariate-related vFC patterns.

Table 1: Simulation results. The four sub-tables show the inference results given different sample sizes and variances,
where TPR and FPR correspond to the edge-wise true positive rate and false positive rate. Network detection results
indicate whether the algorithm can successfully extract the correct connection patterns between disease-related
sub-area pairs.

Regarding the network-level inference performance, all common biclustering methods failed

to detect any positive biclusters (differentially expressed sub-area pairs) except for BSGP. However,

BSGP nonetheless failed to ensure spatial contiguity, and the precise connection between the

extracted sub-areas was not correctly revealed. That is, unlike the results yielded by SCCN

(Figure 2.3(e)), BSGP (Figure 2.3(d)) could not effectively differentiate between yellow and

blue clusters. In comparison, SCCN shows outstanding network-level performance for detecting

community structures and incorporating spatial contiguity.

2.3.2 Negative control analysis

We further performed a negative control analysis to evaluate the FPR of our method. We
considered a scenario in which the connections between a pre-selected ROI pair are unrelated to a
clinical condition of interest. We generated $|U| = 900$ and $|V| = 1600$ voxels in regions A and
B. We labeled the patient and control groups as 1 and 0, but since there were no abnormal
sub-area pairs $\{(U_c, V_d)\}$ across groups, we simply set the connectivity matrices $\mathbf{Z}^s \sim N(0, \sigma^2)$
over the entire regions for all $S$ subjects. Based on $\mathbf{Z}^s$, we obtained the inference matrix $\mathbf{W}_0$
across clinical groups. Since the network detection was validated to be scalable to different sample
sizes and sample variances, we evaluated the configuration $(S = 1000, \sigma = 1)$ as a proof of
concept. Finally, we implemented SCCN on $\mathbf{W}_0$. Since the false-positive voxel pairs tended
to be distributed randomly, no sub-area pairs were significant. Therefore, the number of
sub-area-level false-positive findings was 0. The edge-wise FPR (supra-threshold voxel pairs) among 1000
iterations was $6.82 \times 10^{-5}$ (std. $1.29 \times 10^{-5}$), which is consistent with the pre-determined alpha
level ($E(p) = 0.00005$). We have provided a graph visualization of these results in Appendix 2F.

In summary, we have shown that the sub-area detection is robust to different values
of variance $\sigma^2$, sample size $S$, and other sources of noise. SCCN also yields vFC patterns with
high sensitivity and low FPRs. The spatial-contiguity constraints allow positive edges to borrow
strength from each other within a data-driven sub-area; sensitivity is thus notably increased.

Data-driven sub-areas with these constraints can also exclude false-positive edges that bridge

voxels that are randomly scattered in ROIs. False-positive findings are therefore largely suppressed.

In addition, the jointly improved sensitivity (and thus statistical power) and control of the FPR

yield almost identical voxel sets across all simulated datasets. Replicability is hence remarkably

improved.

2.4 Real data application

In this section, we apply SCCN to two real datasets to investigate the voxel-level altered

connections under different clinical settings. Dataset 1 contains 3269 subjects from a nicotine-

addiction study using fMRI data collected from the UK Biobank database. Dataset 2 includes 330

subjects from a schizophrenia (SZ) research study using fMRI data collected in Baltimore, MD.

2.4.1 Nicotine-addiction research study

2.4.1.1 Sample characteristics

Our primary dataset consists of 3269 individuals from the UK Biobank database, including

1353 current smokers (M/F: 737/616, age: 48.6± 15.3) and 1916 previous light smokers (M/F:

1187/729, age: 32.9 ± 18.1). Additional information on the selection of these 3269 subjects is

provided in Appendix 2E.1. Specifically, we define current smokers as participants who currently

smoke more than ten cigarettes per day, indicating nicotine addiction.² Conversely, we define

previous light smokers as individuals who had tried only a few cigarettes in the past but are not

currently addicted to nicotine products, serving as controls.³ For detailed information on fMRI

imaging acquisition and preprocessing procedures, please refer to Appendix 2E.2.

2.4.1.2 Clinical background

Abundant literature shows that the basal ganglia (BG), hippocampus (Hippo), and insular

gyrus (Ins) play important roles in nicotine addiction (Ersche et al., 2011; Gaznick et al., 2014;

McClernon et al., 2016). We therefore intend to look into the disrupted connectivity patterns

between these three bilateral ROIs, resulting in a total of 12 pairs. To maintain conciseness,

we present the results for the (left BG, left Ins) pair in the main text, while the remaining 11

cases are provided in Appendix 2E.4. By investigating the altered vFC patterns across different

clinical groups, we aim to gain insights into the underlying neurological mechanisms of nicotine

dependence and ultimately assist smokers in resisting nicotine cravings.

We labeled the left BG and left Ins using the Brainnetome Atlas (Fan et al., 2016) (left BG:
2345 voxels; left Ins: 1762 voxels). For each subject, we calculated the vFC matrix between the
left BG and left Ins, with each entry representing a Fisher’s z-transformed Pearson correlation
coefficient. Next, we calculated the population-level statistical inference matrix $\mathbf{W}^{(\mathrm{BG_{left}},\mathrm{Ins_{left}})}_{2345 \times 1762}$

across all subjects while adjusting for age, sex, site, educational level, and Body Mass Index

(BMI). More details regarding the selection of nuisance covariates can be found in Appendix

2E.3. Applying SCCN and the MDL-based test, we extracted abnormal sub-area pairs from
$\mathbf{W}^{(\mathrm{BG_{left}},\mathrm{Ins_{left}})}$ with spatial-contiguity constraints. Lastly, we compared these results with those
obtained from comparative methods.

² ACE touchscreen question: “About how many cigarettes do you smoke on average each day?”
³ ACE touchscreen question: “In the past, how often have you smoked tobacco?”
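
The subject-level vFC computation described above can be sketched as follows. This is a minimal illustration rather than the preprocessing pipeline of this study: the function names and the toy time series are assumptions, and real inputs would be preprocessed fMRI time series with one series per voxel.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher's z-transformation, 0.5 * log((1 + r) / (1 - r)); requires |r| < 1."""
    return math.atanh(r)


def vfc_matrix(ts_a, ts_b):
    """Entry (i, j) is the Fisher z-transformed Pearson correlation between
    voxel i of region A and voxel j of region B."""
    return [[fisher_z(pearson(a, b)) for b in ts_b] for a in ts_a]


# Toy time series: two voxels in region A, two voxels in region B.
ts_a = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]]
ts_b = [[1, 3, 2, 5, 4], [5, 3, 4, 1, 2]]
Zs = vfc_matrix(ts_a, ts_b)
print([[round(z, 3) for z in row] for row in Zs])
```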

2.4.1.3 Network-level results

Each entry in the inference matrix $\mathbf{W}^{(\mathrm{BG_{left}},\mathrm{Ins_{left}})}$ is a $-\log p$ value testing the
vFC difference between current smokers and previous light smokers (Figure 2.5(1)). Implementing
Algorithm 2 returned the MLE $\hat{\lambda} = 0.75$. Given the estimated $\hat{\lambda}$, Algorithm 1 returned the number
of clusters $\hat{C} = 306$, $\hat{D} = 210$ for $\mathbf{W}^{(\mathrm{BG_{left}},\mathrm{Ins_{left}})}$. The MDL-based test returned six abnormal
sub-area pairs, which are marked in red in Figure 2.5(2). A 3D demonstration of the detected sub-area
pairs from $\mathbf{W}^{(\mathrm{BG_{left}},\mathrm{Ins_{left}})}$ is shown in Figure 2.5(a)–(e) (with a significance level of 0.05 selected
for the MDL-based permutation test). All extracted sub-area pairs show well-organized topological
structures. Results indicate that the majority of aberrant vFC patterns from $\mathbf{W}^{(\mathrm{BG_{left}},\mathrm{Ins_{left}})}$ are
gathered between the medial inferior part of the left basal ganglia and the left insula.

2.4.1.4 Biological interpretation of detected sub-areas

The detected sub-areas consist of several locations that are believed to be frequently
associated with nicotine addiction, including the medial inferior part of the basal ganglia and
the posterior insula. We also observed decreased connectivity within these regions in current
smokers, which aligns with the previous finding that decreased resting-state functional
connectivity is correlated with increased nicotine-addiction severity (Fedota and Stein, 2015;
Sutherland and Stein, 2018). The incorporated spatial-contiguity constraints help unfold the
sub-areas within the BG, Hippo, and Ins that maximally cover addiction-related vFC. These
novel findings improve the spatial specificity of addiction-related locations in the three brain
regions and may lead to future guidance for resisting the urge to use nicotine products.

Figure 2.5: Detected sub-area pairs from a nicotine-addiction study. (1) A heatmap of $\mathbf{W}^{(\mathrm{BG_{left}},\mathrm{Ins_{left}})}$: rows and
columns correspond to voxels from the left basal ganglia and the left insula, respectively. (2) Results yielded by
SCCN: altered sub-area pairs that pass the MDL-based permutation test are highlighted in red boxes. (3) Results
yielded by BH-FDR: the hypothesis testing error measure was set to $q = 0.05$ as a cut-off. No sub-area pairs were
detected. (4) Results yielded by BSGP: only one positive yet much less dense sub-area pair was detected. The detected
sub-area pair also lacks spatial contiguity and specificity. (a)-(d) show a 3D demonstration of the six detected altered
sub-areas from $\mathbf{W}^{(\mathrm{BG_{left}},\mathrm{Ins_{left}})}$. Based
on the p-values from the MDL-based permutation test shown in (e), most positive sub-area pairs are located in the
medial inferior part of the left basal ganglia and the left insula.

2.4.1.5 Comparisons with existing methods

For comparison purposes, we again performed the BH-FDR correction edge-wise and
BSGP cluster-wise on $\mathbf{W}^{(\mathrm{BG_{left}},\mathrm{Ins_{left}})}$. In an initial edge-wise significance
test between the current and previous light smoker groups, only 7.29% of the edges were found
to be significant ($p < 0.005$). However, no edges showed significance after applying BH-FDR
correction with $q = 0.01$ (Figure 2.5(3)). When applying BSGP to $\mathbf{W}^{(\mathrm{BG_{left}},\mathrm{Ins_{left}})}$, only one
abnormal sub-area pair was detected (Figure 2.5(4)), with 49.5% of the edges in the detected pair
having $p > 0.005$, compared to 3.12% for SCCN. In comparison to the two existing
methods, SCCN yields much more densely altered vFC contained in spatially contiguous sub-area
pairs with strong topological structures.

2.4.2 Schizophrenia research study

2.4.2.1 Sample characteristics

Our primary dataset contains 330 individuals, including 148 SZ patients (M/F 84/64, age

37.5±14.4) and 182 healthy controls (M/F 80/102, age 37.0±16.1). The participants were recruited

for a large ongoing study of the effects of cognitive deficits in SZ. Specifically, the study probed

how cognitive deficits contributed to functional disability in SZ patients and how they were related

to altered functional networks that serve cognition. All subjects were assessed at local research

centers in the greater Baltimore area between 2004 and 2016 using uniform recruitment criteria,

and neurological and clinical assessments. Detailed information about participant demographics,

the recruitment process, imaging acquisition, and fMRI preprocessing procedures can be found in

Appendix 2D.1.

2.4.2.2 Salience network disrupted connectivity

Clinical background. The salience network, which is mainly composed of the bilateral insula

and cingulate cortices, is related to several core SZ symptoms. A vast amount of literature in

neuroimaging research suggests that the connectivity in the salience network is disturbed during

information processing in SZ patients (Palaniyappan et al., 2012). We therefore intend to focus

on the bilateral insula and cingulate cortices and study the schizophrenic-altered vFC patterns

between them. Specifically, we want to extract schizophrenic-impacted edges that connect voxels

from spatially coherent sub-areas within the insula to those within the cingulate cortex. This data-

driven extraction of sub-areas caused by vFC abnormality in SZ may provide insights for more

effective clinical treatments (e.g., by transcranial magnetic stimulation or deep-brain-stimulation

therapies). We labeled the bilateral insula and cingulate cortices based on the Brainnetome Atlas

(Fan et al., 2016) (left insula: 1762 voxels; right insula: 1577 voxels; cingulate cortex: 5768

voxels). We applied SCCN and the MDL-based test to the edge-wise connectivity inference
matrices $\mathbf{W}^L_{1762 \times 5768}$ and $\mathbf{W}^R_{1577 \times 5768}$ for the (left Ins, cingulate) and (right Ins, cingulate) ROI
pairs, respectively, while adjusting for age and sex. $\mathbf{W}^L$ and $\mathbf{W}^R$ were obtained by the same

computational procedures as in Dataset 1. Lastly, we compared the detection results with those

obtained by comparative methods.

Figure 2.6: Detected sub-area pairs in the salience network from a schizophrenia study (2D). (1) A heatmap of $\mathbf{W}^L$:
rows and columns correspond to the voxels from the left insula and the cingulate cortex, respectively. A hotter entry
indicates a more differentially expressed voxel pair between clinical groups, adjusted for other covariates. (2) Results
yielded by SCCN: positive sub-area pairs that pass the MDL-based permutation test are highlighted in red boxes.
There are many edges with small p-values outside the red boxes (e.g., in the bottom left corner) because they are
not spatially contiguous to those inside the boxes and are automatically excluded by SCCN. (3) Results yielded by
BH-FDR: with $q = 0.05$, no sub-area pairs were detected. (4) Results yielded by BSGP: only one informative yet
much less dense sub-area pair was detected. The detected sub-area pair also lacked spatial contiguity and
specificity.

Figure 2.7: Detected sub-area pairs in the salience network from a schizophrenia study (3D). Let $LI_i$ be the $i$-th
disrupted sub-area detected from the left insula that is connected to the $j$-th sub-area from the cingulate cortex, $CL_j$.
Let $N_{LI_i}$ denote the number of voxels in sub-area $LI_i$, and similarly $N_{CL_j}$ for $CL_j$. (a) and (c) show the images of the
original left insula and cingulate cortex; (b) shows the SZ-affected sub-areas in the left insula that are connected to
those in the cingulate cortex highlighted in (d); (e) shows the architecture of interconnections between the detected
sub-areas from $\mathbf{W}^L$ and the associated p-values from the MDL-based permutation test. A 3D demonstration of the
detected results from $\mathbf{W}^R$ is provided in Appendix 2D.2.

Network-level results. Each element in the vFC inference matrix $\mathbf{W}^L$ is $W^L_{ij} = -\log(p^L_{ij})$,
where $p^L_{ij}$ is the p-value testing the case-control vFC difference for the $(i, j)$ pair between the
left insula and cingulate cortex (Figure 2.6(L1)). We then perform screening on $\mathbf{W}^L$ using
a pre-selected threshold (e.g., $p = 0.05$): $W^L_{ij} = (\mathbf{W}^L)_{ij} \cdot I\left((\mathbf{W}^L)_{ij} \ge -\log(0.05)\right)$. The
post-screened inference matrix $\mathbf{W}^L$ can effectively exclude most non-informative false-positive
edges while maintaining a high proportion of true-positive edges (Fan and Lv, 2008; Li et al.,
2012a). Similar settings apply to $\mathbf{W}^R$ (Figure 2.6(R1)). Implementing Algorithm 2 returned a
maximum-likelihood estimate (MLE) of $\hat{\lambda}_L = 0.625$ for $\mathbf{W}^L$ and $\hat{\lambda}_R = 0.75$ for $\mathbf{W}^R$. Given
the estimated $\hat{\lambda}$, Algorithm 1 returned the number of clusters $\hat{C}_L = 135$, $\hat{D}_L = 107$ for $\mathbf{W}^L$, and
$\hat{C}_R = 225$, $\hat{D}_R = 226$ for $\mathbf{W}^R$. The MDL-based test returned nine abnormal sub-area pairs for
$\mathbf{W}^L$ and ten abnormal sub-area pairs for $\mathbf{W}^R$ (marked in red in Figure 2.6(L2) and (R2)). A 3D
demonstration of the detected results from $\mathbf{W}^L$ is shown in Figure 2.7 (using a significance level
of 0.05 from the MDL-based permutation test). Information regarding the precise sizes, p-values,
and locations is also specified in Figure 2.7. All extracted sub-area pairs show well-organized
topological structures. Overall, the aberrant vFC patterns from $\mathbf{W}^L$ are gathered between the
dorsal insula and anterior cingulate cortex (ACC). Detailed detection results for $\mathbf{W}^R$ are provided
in Appendix 2D.2.
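
The screening step applied to the inference matrices above can be sketched as follows. The function name `screen`, the 2-by-2 matrix of p-values, and the use of the natural logarithm are assumptions made for illustration. Entries whose $-\log p$ value falls below $-\log(0.05)$ are set to zero, which is what makes the input matrix sparse.

```python
import math


def screen(P, alpha=0.05):
    """Turn a matrix of p-values into a screened inference matrix:
    W_ij = -log(p_ij) if -log(p_ij) >= -log(alpha), and 0 otherwise."""
    r = -math.log(alpha)
    W = []
    for row in P:
        W.append([(-math.log(p) if -math.log(p) >= r else 0.0) for p in row])
    return W


# Hypothetical edge-wise p-values for a 2-by-2 region pair.
P = [[0.001, 0.20],
     [0.05, 0.60]]
W = screen(P)
density = sum(w > 0 for row in W for w in row) / 4
print(W, density)
```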

Biological interpretation of detected sub-areas. The detected sub-areas consist of several

well-known brain regions that are believed to be frequently associated with SZ disorder, including,

most remarkably, the anterior insula (AI) and ACC. Emotions that most strongly engage the AI,

such as anger and fear, are those that SZ patients tend to have the most difficulty recognizing

(Wylie and Tregellas, 2010). Furthermore, the densities of neurons, axons, and synapses are

found to be abnormal in the ACCs of people with SZ (Arnold and Trojanowski, 1996). All of the

aberrant edges detected showed decreased or equivalent connections in SZ patients. This aligns

with medical findings that SZ is a “dysconnectivity” disorder with primarily reduced FC across

the salience network (Lynall et al., 2010), although medication effects cannot be completely ruled

out. The imposed spatial-contiguity constraints help unfold brain sub-areas of the bilateral insula

and cingulate cortices that maximally cover disease-related vFC. These novel findings improve

the spatial specificity of SZ-related dysconnectivity in the well-known salience network and may

lead to guidance for future treatments.

Comparisons with existing methods. For comparison purposes, we performed the
Benjamini–Hochberg FDR (BH-FDR) correction edge-wise and a commonly used biclustering algorithm,
bipartite spectral graph partitioning (BSGP), cluster-wise. By first conducting an initial correlation

analysis between vFC and schizophrenic status, 17.84% of the edges in WL were found to have

p < 0.005 significance, where p = 0.005 is a commonly used yet uncorrected threshold in

neuroimaging studies (Derado et al., 2010). After applying BH-FDR correction, 9.45% of the

edges were found to be significant using the threshold of q = 0.01 (Figure 2.6(L3)), and no

community structure was revealed. For WR, 13.50% of edges had p-values less than 0.005, and

only 3.61% of edges remained significant after BH-FDR correction with q = 0.01 (Figure 2.6(R3));



again, no community structure was found in WR. When applying BSGP to both WL and WR,

only one abnormal sub-area pair was detected (Figure 2.6(L4) and (R4)), with more than 36.80%

edges of p > 0.005 included compared to SCCN. In comparison to the existing methods, SCCN

yields spatially contiguous sub-area pairs that contain a much higher density of

schizophrenia-associated vFC and show stronger topological structures.
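
The edge-wise BH-FDR comparison can be illustrated with a minimal sketch of the Benjamini–Hochberg step-up rule applied to a matrix of voxel-pair p-values. The function name `bh_fdr`, the toy matrix `P`, and its planted block are hypothetical and serve illustration only; they are not the data or code used in this study.

```python
import numpy as np

def bh_fdr(pvals, q=0.01):
    """Boolean mask of p-values rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float).ravel()
    m = p.size
    order = np.argsort(p)                      # ranks p-values from smallest to largest
    thresholds = q * np.arange(1, m + 1) / m   # rank-specific cutoffs q * k / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()         # largest rank that meets its cutoff
        reject[order[:k + 1]] = True           # reject everything up to that rank
    return reject.reshape(np.shape(pvals))

# Toy stand-in for an edge-wise p-value matrix (ASSUMPTION: not real vFC data).
rng = np.random.default_rng(0)
P = rng.uniform(size=(40, 30))
P[:10, :8] = rng.uniform(0, 1e-4, size=(10, 8))  # planted block of strong effects

uncorrected = (P < 0.005).mean()       # proportion of edges with p < 0.005
corrected = bh_fdr(P, q=0.01).mean()   # proportion surviving BH-FDR at q = 0.01
print(f"uncorrected: {uncorrected:.2%}, BH-FDR: {corrected:.2%}")
```

The mask marks individual edges only; it carries no information about how those edges are organized, which is consistent with the observation that edge-wise correction alone revealed no community structure.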

2.4.2.3 Temporal-thalamic disrupted connectivity

In contrast to the reduced salience network connections in SZ patients, many studies have

shown that SZ patients have greater thalamic connectivity with multiple sensory-motor regions,

including, most remarkably, the temporal gyrus (Cetin et al., 2014; Ferri et al., 2018). More

specifically, thalamus to middle temporal gyrus connectivity was positively correlated with many

core SZ features, such as hallucinations and delusions. We therefore applied SCCN to examine

vFC between the right middle temporal gyrus and the bilateral

thalamus in SZ patients. Based on the Brainnetome Atlas, there are 3566 voxels in the right

middle temporal gyrus (labeled 82, 84, 86, and 88) and 3275 voxels in the bilateral thalamus

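(labeled 231–246). We computed the vFC connectivity inference matrices $W^{(\mathrm{Tem}_{\mathrm{right}},\,\mathrm{Tha}_{\mathrm{left}})}_{3566 \times 1727}$ and

$W^{(\mathrm{Tem}_{\mathrm{right}},\,\mathrm{Tha}_{\mathrm{right}})}_{3566 \times 1548}$ between clinical groups and then implemented SCCN. Due to limited space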

here, we provide the results for the selections of all parameters and densely altered sub-area pairs

in Appendix 2D.3.
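
As a rough illustration of how a voxel-pair inference matrix between two regions might be assembled, the sketch below runs a Welch-type two-sample test on every voxel pair and stores the negative log p-values. It rests on several assumptions that are not taken from this study: the matrix entries are assumed to be $-\log_{10} p$ values, the p-values use a normal approximation to the t distribution (reasonable only for large samples), and the function name `inference_matrix` and all toy dimensions are hypothetical. The real matrices here would be of size 3566 × 1727 and 3566 × 1548.

```python
import math
import numpy as np

def inference_matrix(fc_patients, fc_controls):
    """Voxel-pair-wise Welch statistic and -log10 p-value between two groups.

    fc_*: arrays of shape (n_subjects, n_voxels_a, n_voxels_b) holding
    subject-level connectivity values for each voxel pair.
    ASSUMPTION: two-sided p-values come from a normal approximation.
    """
    n1, n0 = fc_patients.shape[0], fc_controls.shape[0]
    diff = fc_patients.mean(axis=0) - fc_controls.mean(axis=0)
    se = np.sqrt(fc_patients.var(axis=0, ddof=1) / n1
                 + fc_controls.var(axis=0, ddof=1) / n0)
    t = diff / se
    # Two-sided normal tail probability: P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p = np.vectorize(math.erfc)(np.abs(t) / math.sqrt(2.0))
    return t, -np.log10(np.maximum(p, 1e-300))  # clip to avoid log10(0)

# Toy data (ASSUMPTION: 60 subjects per group, 6 x 5 voxel pairs).
rng = np.random.default_rng(1)
controls = rng.normal(0.0, 0.3, size=(60, 6, 5))
patients = rng.normal(0.0, 0.3, size=(60, 6, 5))
patients[:, :3, :2] += 1.0  # planted block of increased connectivity in patients

t_stat, W = inference_matrix(patients, controls)
print(W.round(1))
```

In this toy example, the planted block appears as a dense cluster of large entries in `W`, which is the kind of pattern a sub-area pair detection method is meant to recover.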



2.5 Discussion

Psychiatric and neurological disorders are often associated with a disrupted brain connectome.

To improve the spatial specificity and sensitivity for detecting a disease-impacted brain connectome,

in this work, we focused on voxel-level connectivity network analysis. We developed statistical

models focusing on extracting abnormal voxel pairs from a region pair of interest, which can

be further extended to whole-brain connectome analysis. We have attempted to simultaneously

address two challenges: controlling the FPR under multiple voxel-pair testing and imposing

spatial-contiguity constraints in vFC analysis.
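
To make the notion of a spatial-contiguity constraint concrete, the following sketch checks whether a set of voxel coordinates forms a single connected component. It assumes face adjacency (the 6-neighbourhood on a 3D grid) as the definition of contiguity, which may differ from the definition adopted in this work; the function name `is_contiguous` is hypothetical.

```python
from collections import deque

def is_contiguous(voxels):
    """True if the voxels form one face-connected (6-neighbour) component.

    voxels: iterable of integer (x, y, z) grid coordinates.
    ASSUMPTION: contiguity means face adjacency on a regular 3D grid.
    """
    cells = {tuple(v) for v in voxels}
    if not cells:
        return False
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from an arbitrary starting voxel
        x, y, z = queue.popleft()
        for dx, dy, dz in steps:
            nb = (x + dx, y + dy, z + dz)
            if nb in cells and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(cells)  # every voxel reached means one component

print(is_contiguous([(0, 0, 0), (1, 0, 0), (1, 1, 0)]))  # face-connected chain
print(is_contiguous([(0, 0, 0), (2, 0, 0)]))             # gap between voxels
```

A check of this kind could be applied to each candidate sub-area so that only spatially connected voxel sets are retained.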

In addition, the brain parcellation used to extract sub-areas is usually based on commonly used

brain atlases (e.g., Brodmann’s map or the International Consortium for Brain Mapping atlas), which

were built on comprehensively studied cortical anatomy, such as complex gyro-sulcal folding

patterns. Regions separated by gyri and sulci tend to show distinct neurobiological

structures and functions, and these atlases can thus serve as a good foundation to investigate

sub-area community structures. However, to further overcome the limitation of using existing

brain parcellations, one can consider combining any extracted spatially adjacent sub-areas from

a pair of spatially adjacent regions if the combination is statistically coherent and biologically

meaningful.

The centerpiece of our proposed method is the identification of sub-area pairs containing an

unusually high density of phenotype-related voxel pairs. By leveraging th