ABSTRACT

Title of Dissertation: Application of Causal Inference in
Large-Scale Biomedical Data

Zhiwei Zhao
Doctor of Philosophy, 2024

Dissertation Directed by: Professor Shuo Chen and Chixiang Chen
Department of Mathematics

This dissertation contains three projects that tackle the challenges in the application of

causal inference on large-scale biomedical data. Project 1 proposes a novel mediation analysis

framework with the existence of multiple mediators and outcomes. It can extract the mediation

pathway efficiently and estimate the mediation effect from multiple mediators simultaneously.

The effectiveness of the proposed method is validated through extensive simulation and a real

data application focusing on human connectome study. Project 2 introduces a doubly machine

learning based method, assisted by algorithm ensemble, for estimating longitudinal causal effects.

This approach reduces estimation bias and accommodates high-dimensional covariates. The va-

lidity of the proposed method is justified by simulation studies and an application to adolescent

brain cognitive development data, specifically evaluating the impact from sleep insufficiency on

youth cognitive development. Project 3 develops a new bias-reduction estimation that addresses

unmeasured confounding by leveraging proximal learning and negative control outcome tech-

niques. This method can handle a moderate number of exposures and multivariate outcomes in


the presence of unmeasured confounders. Both numerical experiment and data application using

UK Biobank demonstrate that the proposed method effectively reduces estimation bias caused by

unmeasured confounding. Collectively, these three projects introduce innovative methodologies

for causal inference in neuroimaging, advancing mediation analysis in neuroimaging, improving

longitudinal causal effect estimation, and reducing estimation bias in the presence of unmeasured

confounding.


APPLICATION OF CAUSAL INFERENCE
IN LARGE-SCALE BIOMEDICAL DATA

by

Zhiwei Zhao

Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment

of the requirements for the degree of
Doctor of Philosophy

2024

Advisory Committee:
Professor Shuo Chen, Chair/Advisor
Professor Chixiang Cheng, Co-Advisor
Professor Vince Lyzinski
Professor Tianzhou Ma
Professor Yan Li


© Copyright by
Zhiwei Zhao

2024


Acknowledgments

I owe my heartful gratitude to all the people who have made this thesis possible and because

of whom my graduate experience has been one that I will cherish forever.

First and foremost I’d like to thank my advisor, Professor Shuo Chen for giving me an

invaluable opportunity to work and explore on challenging and extremely interesting projects in

a rising area over the past years. He has always made himself available for help and advice for

all sorts of things and there has never been an occasion when I’ve knocked on his door and he

hasn’t given me time. It has been a pleasure to work with and learn from such an extraordinary

individual.

I would like to thank my co-advisor, Professor Chixiang Chen. Without his extraordi-

nary theoretical ideas and methodological expertise, this thesis would have been a distant dream.

Thanks are due to Professor Vince Lyzinski, Professor Tianzhou Ma, and Professor Yan Li for

agreeing to serve on my thesis committee and for sparing their invaluable time reviewing the

manuscript.

My cohort and also friends at Pittsburgh, Maryland, and Emory have enriched my graduate

life in many ways and deserve a special mention. Their fruitful insights on my projects, uncon-

ditional supports, and tireless encouragements are the most important motivation throughout this

journey.

I owe my deepest thanks to my family - my mother and father who have always stood by

ii


me and unconditionally supported me through my study. I would also like to thank my cousin

Hannah, and to other families and friends. Without them, none of my work would have been

possible.

Lastly, I would like to acknowledge financial support from Dr. Emerson Wickwire during

the past two years of my PhD study. It was a pleasure to collaborate with him.

iii


Table of Contents

Acknowledgements ii

Table of Contents iv

List of Tables vii

List of Figures ix

List of Abbreviations xii

Chapter 1: Introduction 1
1.1 Introduction of Causal Inference . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.1.1 Causal Graphical Models and Identifications . . . . . . . . . . . . . . . . 2
1.1.2 Mediation Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 Causal Inference with Time-varying Covariates . . . . . . . . . . . . . . 6
1.1.4 Unmeasured Confounding . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2 Large-Scale Biomedical Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.1 Human Connectome Project . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.2 Adolescent Brain Cognitive Development . . . . . . . . . . . . . . . . . 9
1.2.3 UK Biobank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.1 multivariate-mediator and multivariate-outcome mediation model (MMO) 11
1.3.2 Marginal Structure Ensemble Learning Model (MASE) . . . . . . . . . 12
1.3.3 Unmeasured Confounding with Multiple Outcomes and Exposures . . . . 12

1.4 Organization of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Chapter 2: multivariate-mediator and multivariate-outcome mediation model (MMO) 14
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.2.1 Multivariate mediation model . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.2 Model estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.3 Inference for dense bi-clusters . . . . . . . . . . . . . . . . . . . . . . . 25

2.3 Data Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.4.1 Parameters and settings . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.4.2 Comparable methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

iv


2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Chapter 3: Marginal Structure Ensemble Learning Model (MASE) 38
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.2.1 Marginal Structural Model with Time-varying Exposure . . . . . . . . . 41
3.2.2 Doubly Robust Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2.3 Ensemble Learning For time-varying PS and ICEs . . . . . . . . . . . . . 45

3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Chapter 4: Unmeasured Confounding with Multiple Outcomes and Exposures 59
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.2.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.2.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2.3 Comparison of Existing Methods . . . . . . . . . . . . . . . . . . . . . . 66

4.3 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.3.1 Estimation of S . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.3.2 Estimation of βsl with S . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.3.3 Estimation of βsl with Ŝ . . . . . . . . . . . . . . . . . . . . . . . . . . 74

4.4 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.4.1 Metabolic Dysfunction Effects on Brain White Matters . . . . . . . . . . 76
4.4.2 Subgroup Analysis on APOE4 Carriers . . . . . . . . . . . . . . . . . . 78

4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

Appendix A: MMO 82
A.1 Mediation Bi-luster and Effect Estimation . . . . . . . . . . . . . . . . . . . . . 82

A.1.1 Dense Bi-Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
A.1.2 Proof of Theorem 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
A.1.3 Choice of Loss Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 85
A.1.4 Causal Mediation Estimation . . . . . . . . . . . . . . . . . . . . . . . . 86
A.1.5 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
A.1.6 Mediation Effect Estimation . . . . . . . . . . . . . . . . . . . . . . . . 88

A.2 Data Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
A.2.1 Data Acquisition and Processing . . . . . . . . . . . . . . . . . . . . . . 90
A.2.2 Mediation Pathways . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
A.2.3 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
A.2.4 Selected Regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

A.3 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
A.3.1 Data Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
A.3.2 Additional Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
A.3.3 Low-rank mediators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

v


Appendix B: MASE 99
B.1 Marginal Structure Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

B.1.1 DAG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
B.1.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
B.1.3 Proof of Lemma 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

B.2 Data Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
B.2.1 Covariates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
B.2.2 Other Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
B.2.3 Positivity Assumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
B.2.4 Other Summary Information . . . . . . . . . . . . . . . . . . . . . . . . 112

Appendix C: Unmeasured Confounding with Multiple Outcomes and Exposures 118
C.1 Proof of Lemma 1 and Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . . . 118
C.2 Extra Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

C.2.1 Numerical results for Simulation . . . . . . . . . . . . . . . . . . . . . . 119
C.2.2 Influence of Estimated Ŝ . . . . . . . . . . . . . . . . . . . . . . . . . . 119
C.2.3 Other De-bias Shrinkage Method . . . . . . . . . . . . . . . . . . . . . . 119

C.3 Application Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
C.3.1 Data Preprocessing and Region Names . . . . . . . . . . . . . . . . . . . 121
C.3.2 Estimation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

Bibliography 127

vi


List of Tables

2.1 Edge-wise results of extracting mediation pattern Uc ⊗ Vd by all methods. We
demonstrate the false discovery rate (FDR), sensitivity (sens) and specificity (spec)
across different settings for all comparable methods. ∗ represents a rounded number. 34

2.2 Simulation results for mediation effect estimation (step 2). We compare the es-
timated mediation effects by Low Rank Model (medLRM) and MMO, with ref-
erence to the estimated mediation effect based on the oracle model (with known
mediating imaging factors). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.1 Causal effect estimation (insufficient sleep vs. sufficient sleep) and 95% confi-
dence interval (in parentheses) for age-corrected outcomes in ABCD study. *
represents the significant CIs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.2 Results for 500 Monte Carlo simulations. We compared the performance of all
methods with sample size n = 1000, different dimensions of confoudners (p),
and various effect sizes. The estimated SE’s are the averages of standard error
estimates for the 500 simulations. . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4.1 Sensitivity and False Discovery Rate (FDR) of the greedy L0 optimization. . . . . 71
4.2 The coverage probability of the 95% bootstrap confidence interval based on 500

simulations and 500 bootstrap samples. . . . . . . . . . . . . . . . . . . . . . . . 72

A.1 Performance of Different Function in order to select λ . . . . . . . . . . . . . . . 87
A.2 Simulation results (Step 1) for simulated datasets with non-normal mediators and

outcomes: the accuracy of mediation pattern Uc⊗Vd extraction. We demonstrate
the edge-wise false discovery rate (FDR), sensitivity (sens), and specificity (spec).
The standard Cauchy distribution and Laplace distribution are used to generate
the mediators and outcomes of the synthetic datasets with the sample size is n =
200. ∗ represents a rounded number. . . . . . . . . . . . . . . . . . . . . . . . . 95

A.3 Simulation results (Step 2) for simulated datasets with non-normal mediators and
outcomes: mediation effect estimation. We perform Step 2 analysis based on the
above results (Step 1 analysis results based on datasets generated by the Cauchy
distribution or Laplace distribution) and compare the estimated mediation effects
by Low Rank Model (medLRM) and MMO, with reference to the estimated me-
diation effect based on the oracle model (with known mediating imaging factors). 96

vii


A.4 Simulation results (Step 1) for simulated datasets with non-orthogonal mediators:
the accuracy of mediation pattern Uc ⊗ Vd extraction. We demonstrate the edge-
wise false discovery rate (FDR), sensitivity (sens) and specificity (spec). The
correlations of non-orthogonal factors vary from 0.5 to 0.8 with sample size is
n = 200. ∗ represents a rounded number. . . . . . . . . . . . . . . . . . . . . . . 96

A.5 Simulation results (Step 2) for simulated datasets with non-orthogonal mediators:
the estimated mediation effects. We compare the estimated mediation effects by
Low Rank Model (medLRM) and MMO, with reference to the estimated me-
diation effect based on the oracle model (with known mediating imaging fac-
tors). The correlations of simulated non-orthogonal factors vary from 0.5 to 0.8
withn = 200. ∗ represents rounded numbers. . . . . . . . . . . . . . . . . . . . . 97

B.1 Causal effect estimation (insufficient sleep vs. sufficient sleep) and 95% confi-
dence interval (in parentheses) for age-uncorrected outcomes. The results are for
the primary analysis (upper), and complete cases only data (lower). * represents
the significant CIs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

B.2 Causal effect estimation (insufficient sleep vs. sufficient sleep) and 95% confi-
dence interval (in parentheses) for age-corrected outcomes in ABCD study. The
results are for the analysis using complete cases only data. * represents the sig-
nificant CIs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

B.3 Causal effect estimation (insufficient sleep vs. sufficient sleep) and 95% con-
fidence interval (in parentheses) for age-corrected (upper) and age-uncorrected
(lower) outcomes. * represents the significant CIs. . . . . . . . . . . . . . . . . . 113

B.4 Baseline demographic statistics. Summarized by the group of sufficient/insufficient
(nine hours sleep) at the first visit measure for data used in the two models. The
number of participants is counted for categorical variables. . . . . . . . . . . . . 116

B.5 Proportions of sufficient versus insufficient sleep at each visit. We provided such
values for the original dataset, the dataset we used for Model 1 and that of Model 2.117

C.1 The mean absolute errors ×100 and infinity norm ×100 of estimates bias from
different methods across scenarios. . . . . . . . . . . . . . . . . . . . . . . . . . 120

C.2 The mean absolute value (MAE), infinity norm (|| · ||∞) of estimates of βsl when
Ŝ is mis-specified. We considered strong effect of βsl = 5, r = 3 with normal
residuals for all scenarios. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

C.3 Comparing the mean absolute errors ×100 and infinity norm ×100 of estimating
bias using SCAD, adLASSO, LASSO, Direct estimates across scenarios. . . . . . 122

C.4 The names of regions of interests for the 39 white matter regions. . . . . . . . . . 124
C.5 The effect estimations of LDL and VLDL on the seven regions with the 95%

confidence intervals. ‘-’ represents the effect estimate is not significant for that
exposure factor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

C.6 The effect estimations of LDL and VLDL on the seven regions with the 95%
confidence intervals. ‘-’ represents the effect estimate is not significant for that
exposure factor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

viii


List of Figures

1.1 Three major types of relationship in causal DAGs. . . . . . . . . . . . . . . . . . 3
1.2 The illustration of the relationships, effects, and sources of confounders in me-

diation analysis. Z∗, ∗ = 1, 2, 3 represent subsets of confounders observed for
corresponding paths (see corresponding assumptions below). . . . . . . . . . . . 4

1.3 The illustration time varying exposures and covariates causal DAGs. . . . . . . . 6
1.4 Three approaches resolving unmeasured confounding problem. . . . . . . . . . . 7

2.1 We illustrate the multivariate mediator multivariate outcome mediation analysis:
(a) is the mediation model based on observed data with massive and complicated
mediation patterns; (b) is the mediation model with recognized systematic medi-
ation patterns, showing X affecting a subset of outcomes (Y1) through a corre-
sponding set of mediators (M1), the bold arrows represent relations between sets;
and (c) is the mediation model with factorized mediators for a specific mediator
set M̃c and outcome set Yd. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.2 Selected Regions for CBF (mediators) and ReHo (outcomes): (a) is the six re-
gions for CBF (mediators) including the six region names which are close to
the bilateral primary cerebral arteries, (b)-(f) demonstrate the selected regions of
ReHo (outcomes) in the areas of frontal, corona, temporal, parietal and midbrain. 28

2.3 The mediation analysis results for the data example. The mediation variable is
systolic blood pressure (SBP). (a) is the heatmap of− log(p)-matrix based on the
input data. (b) is the heatmap with detected systematic mediation patterns. (c) is
the mediation results based on the mediating factor. β and θ effects are averaged
across related outcome ROIs. Overall, the mediation proportion is high. . . . . . 29

2.4 Edge-wise bi-cluster simulation results for all methods: the scenarios on x axis
correspond to the scenarios in Table 2.1 from left to right, top to bottom. . . . . 33

3.1 ABCD tabulate data structure. The solid lines represent the impact of variables to
the others, the dashed line represents possible impact of previous measurements
on the later ones. Notations A,Z, and Y are defined in section 3.2.1. . . . . . . . 42

3.2 Overview of MASE: the left top box is the procedure for estimating propensity
scores (PS), the right box is the procedure for estimating the outcome regres-
sions (ICE); The left bottom box it the illustration of estimating the ATE (θ̂). We
illustrate the stratification estimate for ICE within the right box (orange box). . . 50

3.3 Forest plots for long-term effects estimates of insufficient sleep on cognitive de-
velopment on ABCD data. The reference level is sufficient sleep. Estimates from
different methods are represented using different types of points. . . . . . . . . . 54

ix


4.1 Illustration graphs of negative control bridge method [1] (Left) and the parallel
outcome method [2] (Right). The lines without arrows represent the effect can be
either way. The illustration is based on different line relationship colors. . . . . . 68

4.2 Box-plots for the estimates biases for the five methods. The results for p = 100
and p = 50 are very similar so we only visualized the p = 100 results. The metric
used here is MAE in red boxes (right of each method) and || · ||∞ in blue boxes
(left of each method). Each row of the plots represents scenarios for different
effect sizes. They are (from top to bottom): Weak Effect, Strong Effect, Mixed
Effect, and Uniform Residuals. . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4.3 Box-plot for the estimates biases for the five methods with estimated Ŝ. The
metric used here is MAE for the red boxes (left of each method) and infinity
norm for the blue boxes (right of each method). The above row is for n = 100
and the bottom row is for n = 500. The FDR are correspondingly in Table 4.1. . . 75

4.4 Visualization of the brain regions that are significantly influenced by HDL or
LDL. The left three columns are for effects estimates of HDL with axial (left),
coronal (middle), and sagittal (right) views. The last column is the results for
effects from LDL with axial view. The top row is for whole dataset and the
second row is for APOE4 non-carriers only. Blue regions represent the effects
are negative and red regions represent the effects are positive. . . . . . . . . . . 78

4.5 Visualization of the regions in brain axial (left) and coronal (right) views from
the subgroup analysis for APOE4 carriers. The regions are affected by VLDL
and LDL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

A.1 Demonstration of a dense bi-cluster. The left figure is a − log(p)-matrix (weight
matrix) before extraction, while the right figure is a p-matrix with re-ordered
region indices to highlight the extracted dense bi-cluster. A dense pattern refers
to that within an extracted bi-cluster the proportion of edges with suprathreshold
p-values is much higher than the edges in the rest of the graph. . . . . . . . . . . 83

A.2 The Mediation Discovery procedure for determining the mediation pathway. (a)
is the marginal correlation between SBP, CBF, ReHo. (b) is the conditional cor-
relation between the three variables, conditional on the boxed variables. The
conditional correlation between SBP and ReHo given CBF is not significant. (c)
is all the possible DAGs based on the results in (b). The green box is the most
possible DAG for our data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

A.3 Sensitivity analysis to assess the impact of the violated assumptions for our data
example. The sensitivity analysis is performed based on the procedure described
in [3]. The sequential ignorability holds when ρ = 0 (ρ is the correlation of resid-
uals of mediators and outcomes conditional on the exposure). The dashed hori-
zontal line is the estimated mediation effect with the valid assumption of sequen-
tial ignorability. When ρ deviates further from 0, the bias is larger. The results
suggested that our mediation results were generally robust to (mild to moderate)
violation of this assumption (e.g., ρ = 0–0.375 ). . . . . . . . . . . . . . . . . . 94

x


A.4 Results of mediation effect estimation based on mediators by orthogonal factor-
ization method and a composite mediator based on the weighted sum method.
Based on 1000 simulations, both methods perform well, while the factorization
method slightly outperforms the weighted sum component method. . . . . . . . 98

B.1 We illustrate the Directed Acyclic Graph for longitudinal causal relationship for
the first two visits. The DAG can be extended to more repeated measures with
similar relationship between the treatment history and covariates history. . . . . . 99

B.2 Forest plots for long-term effects estimates of insufficient sleep on age-uncorrected
cognitive development on (a) primary data analysis for ABCD study (left); and
(b) ABCD data with complete cases only (right). The reference level is sufficient
sleep. Estimates from different methods are represented using different types of
points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

B.3 Forest plots for long-term effects estimates of insufficient sleep on cognitive de-
velopment on ABCD data with complete cases only. The reference level is suf-
ficient sleep. Estimates from different methods are represented using different
types of points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

B.4 Forest plots for long-term effects estimates of insufficient sleep on cognitive de-
velopment on ABCD data with three time points measurements. (a) is the plot
for age-corrected variables (left); and (b) is the plot for age-uncorrected variables
(right). The reference level is sufficient sleep. Estimates from different methods
are represented using different types of points. . . . . . . . . . . . . . . . . . . 114

B.5 Positivity assumption check with propensity score density check with insufficient
(red) versus sufficient sleep (blue) groups. . . . . . . . . . . . . . . . . . . . . . 115

C.1 Correlation maps for all the metabolic biomarkers grouped into two clusters using
kmeans. The first dense graph in the left bottom is for the first group (HDL) and
the dense group on the top right is for the second (LDL&IDL). The darker the
color, the stronger the correlation is. . . . . . . . . . . . . . . . . . . . . . . . . 125

xi


List of Abbreviations

ABCD Adolescent Brain Cognitive Development
ACP Amish Connectome Project
APOE4 Apolipoprotein ε4 gene
ASL Arterial-spin Labeling
ATE Average Treatment Effect
BH-FDR Benjamini-Hochberg False Discovery Rate
BOLD Blood Oxygenation Level-Dependent
CBF Cerebral Blood Flow
DAG Directed Acyclic Graphs
DR Doubly Robust
DTI Diffusion Tensor Imaging
FA Fractrional Anisotropy
FDR False Discovery Rate
fMRI funcitonal Magnetic Resonance Imaging
FWE Family-Wise-Error
HCP Human Connectome Project
HDL High-density Lipoprotein
ICE Iterative Conditional Mean Estimation
IDL Intermediate-density Lipoprotein
IV Instrumental Variable
KL Kullback-Leibler
LASSO Least Absolute Shrinkage and Selection Operator
LDL Low-density Lipoprotein
LSEM Linear Structural Equation Model
lTMLE longitudinal Targeted Maximum Likelihood
MAE Mean Absolute Error
MASE Marginal Structure Ensemble Learning Model
MLS Multivariate Longitudinal Sequential
MMO Multivariate-Mediator and Multivariate-Outcome mediation model
MRI Magnetic Resonance Imaging
MSM Marginal Structural Model
(N)DE (Nature) Direct Effect
(N)IE (Nature) Indirect Effect
NCO Negative Control Outcome

xii


NCE Negative Ctonrol Exposure
PS Propensity Score
ReHo Regional Homogeneity
ROI Regions of Interest
SCAD Smoothly Clipped Absolute Deviation
SBP Systolic Blood Pressure
SURVA Stable Unit Treatment Value Assumption
TMLE Targeted Maximum Likelihood
UKB United Kingdom Biobank
VLDL Very-low-denisty Lipoprotein
WM White Matter

xiii


Chapter 1: Introduction

Causal Inference is a widely used technique in fields such as economics, biomedical stud-

ies, behavior science, and political science. Its goal is to determine the actual effect of or more

causal variables (often referred to as treatments) on outcomes of interest. The primary challenge

in causal inference is identifying the true causal effect in real-world applications, where other

factors—such as confounding—can complicate the relationship. Researchers from various dis-

ciplines, including statistics [4], epidemiology [5], econometrics [6], and computer science [7],

have developed different methods to address these challenges. In statistics, causal inference has

been a significant area of research over the past half-century (see a review in 8).

To date, the research of causal inference have been focusing on three major areas: causal

discovery based on graphical models [9, 10, 11, 12, 13], identification problem answering the

question of if observational data can identify the causal effect (the g-formula, 14, the back door

criteria 15, etc.), and estimation problem (the g-computation 16, the inverse probability treatment

weighting 17, the marginal structural models 18, the targeted maximum likelihood estimation 19,

and etc.).

1


1.1 Introduction of Causal Inference

The main goal of causal inference is to understand the effect of a certain causal variable(s)

on an outcome of interest. In real-world applications, identifying causal relationships can be

far more challenging than detecting associations, as demonstrated by phenomena like Simpson’s

paradox [20]). One of the most widely used frameworks is the so-called “counterfactual” out-

come framework, first proposed by [21] and later developed by [22]. In this framework, let A

represents the exposure of interest (the treatment variable) and Y be the outcome. The counter-

factual outcome is defined as Y (A = a) representing the outcome we would have observed if the

exposure A were set to A = a. This differs from the observed outcome Y |A = a, which repre-

sents the outcome we observe in a real experiment when A = a. The key distinction is that we

can only observe one outcome given a specific A = a, and we can never know the counterfactual

outcome under a different value of A = a′. Causal inference is such a technique that seeks to

understand the underlying data generating mechanism, bridge the gap between the counterfactual

and real word, and find models to estimate the effects.

1.1.1 Causal Graphical Models and Identifications

In causal inference, the causal graphical models are a powerful tool that offer both visual

and mathematical approaches for studying the underlying causal relationships in data. Central

to these models are Directed Acyclic Graphs (DAGs) [23, 24]. A DAG consists of nodes (rep-

resenting variables) and directed edges (arrows that depict causal relationships between these

variables). A key requirement for DAGs is that they must be ”acyclic,” meaning they cannot

contain any loops, ensuring a one-way causal influence from one variable to another. Common

2


relationships depicted in DAGs include confounders (Figure 1.1 (a)), mediator (Figure 1.1 (b)),

and collider (Figure 1.1 (c)). The arrowed lines represent causal relationships, while dashed lines

represent underlying relationships when conditioning on a variable Z (a property of colliders,

also known as Berkson’s paradox 25). In this dissertation, I will focus on the estimation problem,

with a particular emphasis on applying causal inference to large-scale biomedical data.

A

Z

Y

A M Y

A

Z

Y
(a) Confounder (b) Mediator (c) Collider

Figure 1.1: Three major types of relationship in causal DAGs.

To bridge the gap between the counterfactual world and real world, we focus on the iden-

tifiability of causal effects. Several key assumptions help achieve this identifiability. First, we

assume consistency, meaning that the counterfactual outcome Y (A = a) is equal to the observed

outcome with A = a (i.e., Y (A = a) = Y |A = a). Second, we assume conditional ignorability,

which states that there is no unmeasured confounders between A and Y given the observed co-

variates. Let Z represents the observed covariates, this assumption requires that A |= Y |Z. Third,

we assume positivity, meaning that the probability of being exposed to any level of A = a is

greater than zero given values of covariates, i.e., P (A = a|Z) > 0. Other assumptions, such as

the Stable Unit Treatment Value Assumption (SUTVA, 26), which requires no interference be-

tween subjects, and assumptions specific to Directed Acyclic Graphs (DAGs), like the the causal

Markov condition [27] and faithfulness [24, 28], also play a role. However, these assumptions

do not directly influence the common identification of the average treatment effect (ATE), which

is the primary causal effect estimated in this dissertation. Therefore, we will focus on the three

3


major assumptions: consistency, conditional ignorability, and positivity.

Here, we define the ATE as following (for binary exposure A as an example):

ATE := E(Y (A = 1)− Y (A = 0)). (1.1)

1.1.2 Mediation Analysis

One of the most important branches of causal inference is mediation analysis, an approach

used to understand and explain the mechanisms through which an independent variable influences

a dependent variable via one or more intermediate variables [29]. These intermediate variables,

known as mediators (M , Figure 1.1 (b)), help elucidate the pathways through which causal effects

operate. Mediation analysis is crucial for moving beyond simple causal relationships, enabling

researchers to explore the underlying processes and mechanisms driving these effects.

A

M

Y

Z3 Z2

Z1

4

γ

α β

Figure 1.2: The illustration of the relationships, effects, and sources of confounders in mediation
analysis. Z∗, ∗ = 1, 2, 3 represent subsets of confounders observed for corresponding paths (see
corresponding assumptions below).

Mediation analysis relies on several common assumptions, particularly regarding condi-

tional ignorability. Using the same counterfactual outcome framework and defining the observed

4


confounders as Z, we assume the following:

(I) Y (A = a,M = m) |= A | Z

(II) Y (A = a,M = m) |= M | Z,A = a

(III) M(A = a) |= A | Z

(IV) Y (A = a,M = m) |= M(A = a′) | Z. (1.2)

We illustrated such confounding relationships in Figure 1.2. It is required that there should be no

unmeasured confounders between: (I) treatment and outcome (Z1); (II) mediator and outcome

(Z2); (III) treatment and mediator (Z3); and (IV) mediator and outcome, considering the path-

way affected by the treatment (path 4 in the figure). These conditions are known as sequential

ignorability assumptions [30]. The positivity assumption is adjusted as: (I) treatment positivity:

P (A = a|Z = z) > 0ifp(Z = z) > 0; (II) mediator positivity: P (M = m|A = a, Z = z) >

0ifP (Z = z) > 0. The consistency assumption remains the same as previously defined.

The causal effect of A on Y can be decomposed into two components: the direct effect and

the indirect effect. Specifically, we consider the natrural direct (NDE), which is the effect of A

on Y that occurs directly, bypassing the mediator (fixed A, γ in Figure 1.2). The indirect effect

(NIE), on the other hand, captures the effect of A on Y through the mediator M (fixed M , αβ in

Figure 1.2). For binary A, we have the following decomposition:

ATE = [E(Y (1,M(1)))− E(Y (1,M(0)))]︸ ︷︷ ︸
NIE

+ [E(Y (1,M(0)))− E(Y (0,M(0)))]︸ ︷︷ ︸
NDE

(1.3)

5


1.1.3 Causal Inference with Time-varying Covariates

Another challenging aspect of causal inference is estimating causal effects in longitudinal

data. In this context, the effect of A on Y is dynamic, as it can be influenced by or interact

with covariates that change over time. This evolving nature of both treatment and covariates

adds complexity to the estimation process, requiring methods that can account for time-varying

confounders and interactions.

Z0 A0 Z1 A1 Y

U0 U1

(a) Sequential randomization

Z0 A0 Z1 A1 Y

U0 U1

Z0 A0 Z1 A1 Y

U0 U1

(b) Sequential conditional randomization (c) Unmeasured confounding

Figure 1.3: The illustration time varying exposures and covariates causal DAGs.

We illustrate three scenarios of causal relationships with time-varying confounders in Fig-

ure 1.3. In these scenarios, U represents unmeasured confounding, and the graphs are as follows:

6


(a) sequential randomization trials with no covariates influencing the treatments A0, A1; (b) se-

quential conditional randomization, where only the observed confounders (Z0, Z1) influence the

treatments; and (c) unmeasured confounding, with U0, U1 influence the treatments. In practice,

scenario (b) is the most commonly studied, under the assumption of sequential exchangeability.

With Āk = (A0, A1, . . . , Ak), Z̄k = (Z0, Z1, . . . , Zk) being the treatment and covariates history,

the assumption states that : Y (Āk = āk) |= Ak | Āk−1, Z̄k. The positivity assumption is adjusted

as: P (Ak = ak|Z̄k, Āk−1 > 0, k = 1, . . . , K. Commonly used techniques in estimating the

causal effect under this scenario include G-computation [31], structural nested model [32], and

marginal structural model [18, 33, 34].

1.1.4 Unmeasured Confounding

Throughout the discussion, the identification and estimation of causal effects rely on the

conditional ignorability assumption, which requires that there be no unmeasured confounding

between the variables of interest. However, this assumption is often violated in observational

studies [35], making it challenging to accurately estimate causal effects.

A

U

M Y A

IV U

Y A

U

Y

W

(a) Front-Door Model (b) Instrumental Variable (IV ) (c) Negative Control (W )

Figure 1.4: Three approaches resolving unmeasured confounding problem.

In recent years, several techniques have been developed to address the issue of unmeasured

confounding, denoted as U . Three commonly used methods, illustrated in Figure 1.4, are: the

front-door model [23, 36], which adjusts for unmeasured confounding through a mediator that is

7


the sole path through which A affects the outcome (i.e., complete mediation). The instrumental

variable method [37, 38], where an IV influences A but is independent of U . An important appli-

cation of the IV method is Mendelian randomization, which uses genes as IVs to estimate causal

effects [39, 40]. And lastly, the negative control variables [1, 41, 42], where the NC variable

is influenced by U and acts as a proxy for adjusting the effect of U on the causal relationship

[43, 44]. However, these methods come with their own sets of assumptions, which may be more

or less reasonable depending on the application. For example, the front door model requires no

unmeasured confounding between A and M or between M and Y . The instrumental variable

requires exclusion restriction assumption, meaning the IV influences the outcome only through

its effect on A [45]. The negative control variables, depending on whether they are NC exposures

or NC outcomes, require different exclusion relationships between A and Y (see 1 for details).

All NC variables must also satisfy the condition of U-comparability, meaning they share a similar

confounding structure [41].

1.2 Large-Scale Biomedical Data

1.2.1 Human Connectome Project

The Human Connectome Project (HCP) is a collaborative research effort designed to com-

prehensively study how the brain’s structural and functional network connections contribute

to cognitive and behavioral capabilities, ultimately influencing personality and mental health

[46, 47, 48]. Specifically, we focus on a subset of the HCP known as the Amish Connectome

Project (ACP, 49). It includes 450 participants aged 18-85 from the Amish population, which

is characterized by greater genetic homogeneity, a consistent lifestyle, lower exposure to envi-

8


ronmental variation, and a reduced risk of substance misuse. Data collection follows the broader

guidelines of the HCP and includes standard demographic information, structural, functional, and

diffusion imaging scans, clinical data with genomic information, and behavioral data such as NIH

Toolbox assessments, diagnostic details, and genotypes. We provide a detailed description of the

cohort used in our study, along with the measurements and pre-processing procedures, later in

Chapter 2 and Appendix A.

1.2.2 Adolescent Brain Cognitive Development

The Adolescent Brain Cognitive Development (ABCD) study is the largest long-term study

of brain development and child health in the United States. It recruits over 11, 000 children cross

the country, following them from ages 9-10 into young adulthood. The primary goal of the study

is to understand how various factors, including genetics, environment, physical activity, and sub-

stance use, influence brain development and overall health during adolescence [50, 51]. The

study collects a wide range of data encompassing neuroimaing, genetic, behavioral and cognitive

assessments, environmental and lifestyle, and substance use and mental health data. Participants

first enrolled at ages 9-10 with a collection of comprehensive set of baseline data. This is followed

by annual behavioral and cognitive assessments, surveys, physical measurements, and environ-

mental data, with neuroimaging data acquired every other year. In this dissertation, we focus

primarily on the survey and questionnaire data, with further details provided in Chapter 3 and

Appendix B.

9


1.2.3 UK Biobank

The UK Biobank (UKB) is a large-scale biomedical database designed to enhance our

understanding of the genetic and environmental factors that contribute to disease development. It

is a prospective cohort study of approximately 500, 000 individuals, aged 40 to 69 at recruitment,

from across United Kingdom [52, 53]. The UKB provides a comprehensive resource that allows

researchers to explore the interplay between genetics, environment, and lifestyle factors in the

development of diseases such as cancer, cardiovascular disease, diabetes, mental health disorders,

and neurodegenerative diseases. The study collects data from various sources, including genetic

information, lifestyle and environmental data, health records, biological samples, and imaging

data from a subcohort of around 100, 000 participants. It also provides some repeated assessments

on, for example, imaging, and questionnaire surveys. It also includes repeated assessments, such

as imaging and questionnaire surveys. In our study, we focus on structural brain imaging data

along with biological testing variables and demographic information. Further details on the data

and analysis are provided in Chapter 4 and Appendix C.

1.3 Overview

In this dissertation, I aim to address the application of causal inference in the context of

large-scale biomedical data, with a particular focus on the estimation of causal effects. Three

key research questions are explored: (i) the mediation analysis with the existence of multivariate

mediators and multivariate outcomes; (ii) the longitudinal causal effect estimation with high-

dimensional covariates; and (iii) the unmeasured confounding problem with multiple outcomes

and moderate exposures. The following subsections provide the summaries of the motivation,

10


current challenges, proposed solutions, and brief conclusion for each project.

1.3.1 multivariate-mediator and multivariate-outcome mediation model (MMO)

Multimodal neuroimaging data have attracted increasing attention for brain research. An

integrated analysis of multimodal neuroimaging data and behavioral or clinical measurements

provides a promising approach for comprehensively and systematically investigating the under-

lying neural mechanisms of different phenotypes. However, such an integrated data analysis is

intrinsically challenging due to the complex interactive relationships between the multimodal

multivariate imaging variables. To address this challenge, a novel multivariate-mediator and

multivariate-outcome mediation model (MMO) is proposed to simultaneously extract the latent

systematic mediation patterns and estimate the mediation effects based on a dense bi-cluster

graph approach. A computationally efficient algorithm is developed for dense bi-cluster struc-

ture estimation and inference to identify the mediation patterns with multiple testing correction.

The performance of the proposed method is evaluated by an extensive simulation analysis with

comparison to the existing methods. The results show that MMO performs better in terms of

both the false discovery rate and sensitivity compared to existing models. The MMO is applied

to a multimodal imaging dataset from the Human Connectome Project to investigate the effect

of systolic blood pressure on whole-brain imaging measures for the regional homogeneity of the

blood oxygenation level-dependent signal through the cerebral blood flow.

11


1.3.2 Marginal Structure Ensemble Learning Model (MASE)

Evaluating the effects of time-varying exposures is essential for longitudinal studies. The

effect estimation becomes increasingly challenging when dealing with hundreds of longitudinally

measured confounders. We propose a Marginal Structure Ensemble Learning Model (MASE) to

provide a marginal structure model (MSM)-based robust estimator under such longitudinal set-

ting. The proposed model integrates multiple machine learning algorithms to model propensity

scores and a sequence of conditional outcome means such that it allows many confounders with

potential non-linear confounding effects to reduce the risk of inconsistent estimation. Extensive

simulation analysis demonstrates the superiority of MASE over benchmark methods (e.g., MSM,

G-computation), yielding smaller estimation bias and improved inference accuracy. We further

apply MASE to the adolescent cognitive development study to investigate the time-varying ef-

fects of sleep insufficiency on cognitive performance. The results reveal an aggregated negative

impact of insufficient sleep on cognitive development among youth.

1.3.3 Unmeasured Confounding with Multiple Outcomes and Exposures

A key challenge in applying causal inference to real-world data is the presence of unmea-

sured confounding. To address this issue, we propose a novel approach using the idea of proximal

learning. The proposed model leverage the information across multiple outcomes with the ex-

istence of multiple exposures based on linear structural equation models. Extensive simulation

studies demonstrates the superiority of our method. We apply the proposed method to the UK

Biobank data to study the influence of plasma metabolic dysfunciton on brain white matter. The

results identified six brain regions influenced by metabolic biomarkers in the overall data, and

12


four distinct regions in individuals who are non-carriers of the apolipoprotein ε4 gene. A fur-

ther subgroup analysis on the spolipoprotein ε4 carriers validate the effectiveness of our method.

Additionally, we provide a review of existing debiasing methods for handling unmeasured con-

founding in both methodology and comprehensive numerical experiments.

1.4 Organization of the Dissertation

In the remainder of this dissertation, we address the three key questions surrounding the

application of causal inference in large-scale biomedical data. In Chapter 2, we introduce the

model, estimation, and application of the multivariate-mediator and multivariate-outcome medi-

ation model, along with a thorough evaluation of its numerical performance. Chapter 3 presents

the framework for the marginal structure ensemble learning model, including application results

and numerical experiments. In Chapter 4, we present a proximal learning-based method for han-

dling unmeasured confounding with multiple outcomes and exposures. This chapter includes a

comprehensive numerical study, comparing our method to existing approaches in terms of both

experimental performance and methodological differences, alongside two illustrative data appli-

cations.

In general, this dissertation explores the application of causal inference in large-scale

biomedical data from three distinct but interrelated perspectives. The methods developed and

applied in these studies pave the way for future advancements in causal inference applications

and support clinical investigators in disease diagnosis and treatment selection.

13


Chapter 2: multivariate-mediator and multivariate-outcome mediation model

(MMO)

2.1 Introduction

The joint analysis of multiple types of neuroimaging data (i.e., multimodal imaging) has

garnered increasing interest for brain research, as these data can characterize brain functions and

structure from different yet complementary angles [54]. For example, functional magnetic res-

onance imaging (fMRI) can capture neural responses to external stimuli. Resting-state fMRI

is commonly used to assess functional connectivity between neural populations from different

brain areas. Diffusion-weighted MRI evaluates the molecular function and micro-architecture.

Recently, regional homogeneity (ReHo) has been widely discussed in the resting-state literature

[55, 56]. ReHo is a measure of brain activity that evaluates the summarized functional connec-

tivity between a voxel and its nearest neighbors. It is based on the blood oxygenation level-

dependent (BOLD) signal. These imaging variables from multimodal datasets can be influenced

by peripheral and behavioral conditions simultaneously and, at the same time, interact with each

other.

The current research is a multimodal imaging study investigating cardiovascular disease

risks, such as the effects of systolic blood pressure (SBP) on ReHo via the influence of cerebral

14


blood flow (CBF). We develop the hypothesis based on previous findings of the physiological

effects of SBP on CBF [57], and CBF on ReHo [58, 59]. This study uses two brain imaging

modalities and thus, has two sets of spatially dependent multivariate variables. One way to sys-

tematically study the underlying mechanism of SBP → CBF → ReHo is to use a multivariate-

mediator and multivariate-outcome mediation model (MMO). We use multivariate mediation

methods in the integrated analysis of SBP through multimodal imaging mediation.

Statistical mediation analysis has been widely applied in neuroimaging studies [60, 61, 62].

Certain models have been adopted for image-based mediators. For example, [63] proposed an

orthogonal mediator decomposition method and [64] used a sparse principal component analysis

in a multiple-mediator analysis. Advanced models have also been developed to handle mediation

analysis with multivariate exposures and multivariate mediators [65, 66]. However, there is still a

methodological gap in mediation analysis where the mediator and outcome are both multivariate.

To fill this gap, we propose a novel mediation model to handle multivariate-mediators and

multivariate-outcomes. In our application, extracting the underlying mediation pathways is nat-

urally challenging because the mediation pathways are tangled between imaging modalities with

complex data structures. In our motivation data example, a small set of CBF mediators related to

the primary brain arteries can mediate most of the effects of peripheral blood pressure (exposure)

on localized ReHo measures (outcomes). Here, we mimic this neurobiological property (i.e., sys-

tematic mediation patterns) by introducing the concept of dense bi-clusters. We use the following

example to illustrate the concept of dense bi-clusters while referring the readers to Section 2.2.2.1

for the detailed introduction. In a dataset with 100 mediators and 100 outcomes, there exist 10000

potential pathways, among which 200 (2%) are positive pathways, and 9800 are negative. If pos-

itive pathways are evenly distributed, a bi-cluter consisting of 10 mediators and 20 outcomes is

15


expected to cover around four positive pathways (10 × 20 × 2%). The 10-mediator-20-outcome

bi-cluster is dense if it covers a large number (say 120) of positive pathways suggesting unevenly

distributed positive pathways. The dense bi-clusters is particularly useful for our application

because they can reveal the underlying complex mediation pathways, which is required for me-

diation effect estimation. We develop a computationally efficient algorithm that is tailored to

extract the latent mediation pathways among multivariate mediators and multivariate outcomes

with guaranteed optimum convergence. Based on the extracted mediation dense bi-clusters, we

estimate mediation effects by performing an orthogonal transformation for the selected media-

tors. A new form of statistical inference is adopted to assess the significance of the mediation

dense bi-clusters.

We apply this approach to the Amish Connectome Project, which is a subcohort of the Hu-

man Connectome Project, to investigate the influence of SBP on ReHo through CBF. Six CBF

regions are identified as momentous mediators, and 59 ReHo regions are affected by SBP through

them. These CBF regions are analogous to those found in previous studies, which proved that

they can trigger changes to metabolism and higher-order information processing. Our findings

are aligned well with previous research in the literature, since a negative influence is observed

from SBP to the CBF latent factor, and there are positive effects between the CBF factor and

ReHo regions. Our method is among the first to use a mediation analysis with high-dimensional

mediators and outcomes. Unlike conventional multivariate three-step mediation inference (e.g.,

61) or a marginal mediation model with Benjamini–Hochberg false discovery rate (BH-FDR)

correction, MMO can substantially improve the statistical power and prohibit the finding of false

positives by fully leveraging the latent organized mediation patterns. While the proposed frame-

work is applicable to causal mediation analysis for datasets with manipulable causal variables and

16


mediators, we can also use it for pathway analysis in observational studies (i.e., causal variables

and mediators are not manipulable). In the neuroimaging statistics literature, pathway analysis

has been widely used to investigate the potential pathways among exposures - imaging variables -

behavior outcomes [66, 67]. We performed the pathway analysis as our data example is collected

from an observational study.

This chapter is organized as follows. Section 2.2 describes MMO and the steps for dense

bi-cluster extraction, mediation effect estimation, and inference. Section 2.3 applies our frame-

work to the Amish Connectome data and explains the identified CBF regions and mediation

effects. Section 2.4 implements simulations to demonstrate the empirical performance of our

method. We conclude the chapter with a discussion in Section 2.5.

2.2 Methods

2.2.1 Multivariate mediation model

We consider a setup with a single-exposure, multivariate-mediator, and multivariate-outcome

mediation model, as illustrated in Figure 2.1. Our model includes p mediating imaging variables

from the first imaging modality and q outcome imaging variables from the second imaging modal-

ity. To perform the mediation analysis on a sample of n participants, let X = (X1, . . . , Xn)
′ ∈ Rn

be the observed exposure vector for n subjects. Let M = {M(i)}pi=1 denote the set of mediat-

ing imaging variables, where M(i) ∈ Rn is the i-th mediator. Similarly, define Y = {Y(j)}qj=1

to be the set of imgaing outcomes, where Y(j) ∈ Rn is the j-th outcome. Then, the univariate

mediation analysis is denoted by X→M(i) → Y(j). However, this may not reveal the systematic

mediation pattern nor lead to an accurate inference. Therefore, the goal of the current research is

17


to develop MMO to identify and estimate the underlying systematic mediation effects.

(a) Mediation model with observed data

(b) Mediation model with identified (c) Mediation model with
systematic patterns mediating factors

Figure 2.1: We illustrate the multivariate mediator multivariate outcome mediation analysis: (a)
is the mediation model based on observed data with massive and complicated mediation patterns;
(b) is the mediation model with recognized systematic mediation patterns, showing X affect-
ing a subset of outcomes (Y1) through a corresponding set of mediators (M1), the bold arrows
represent relations between sets; and (c) is the mediation model with factorized mediators for a
specific mediator set M̃c and outcome set Yd.

The systematic mediation effects may involve only a subset of the p mediating imaging

variables and a subset of the q outcomes. We further assume that the systematic mediation pattern

is in an organized structure. That is, the exposure X influences a subset of outcomes Yd =

{Y(j)
d }

Jd
j=1 (d = 1, . . . , D) only through the subset of mediatorsMc = {M(i)

c }Ici=1 (c = 1, . . . , C),

where Ic and Jd are the number of mediators and outcomes in Mc and Yd respectively. For

example, in our real data analysis, we have n = 204, p = q = 107 as the input data, and

18


D = C = 1, I1 = 6, J1 = 59 based on Step 1 analysis (see Section 2.3). We denote a systematic

mediator-outcome-set pair by {Mc,Yd} [Figure 2.1(b)],
⋃C

c=1Mc ⊂ M and
⋃D

d=1 Yd ⊂ Y . In

practice, {Mc,Yd} pairs are unknown. We provide a graph model-based procedure for estimating

{Mc,Yd} in Section 2.2.2. Here, we first present the mediation model with known mediation

patterns
⋃
{Mc,Yd}.

Given a systematic mediation set pair {Mc,Yd}, we can model the mediation X→Mc →

Yd. In practice, the mediators in imaging data are highly correlated and thus obscure the identifi-

cation of the mediation effect. To ensure the identifiability, we calculate a set of orthogonal latent

factors M̃c = {M̃
(l)

c }Lc
l=1 ofMc by factorization model (see section 2.2.2.2), where M̃c is a vari-

able set consisting of Lc orthogonal factors corresponding toMc. For example, M̃(l)

c , M̃(l′)

c ∈ M̃c

(1 ≤ l < l′ ≤ Lc) are two orthogonal factors [Figure 2.1(c)].

Specifically, we present the MMO mediation model for each {M̃c,Yd} by a linear struc-

tural equation model (LSEM) as:

E(M̃(l)

c |X = x) = γl0 + αlx,

E(Y(j)
d |X = x,M̃c = {m̃(1)

c , . . . , m̃(Lc)
c }) = υj0 + θjx +

Lc∑
l=1

βljm̃(l)
c ,

(2.1)

where j = 1, . . . , Jd and l = 1, . . . , Lc. Lower-case variables (e.g., x and m̃(l)) denote manipu-

lated values.

Our method can be applied for the causal mediation analysis when the causal variable (or

exposure) X and mediators M can be manipulated in experiments (see the specification and

assumptions of causal mediation model in A.1.4). In most observational studies where X and

M cannot be manipulated, we apply the MMO to identify the potential mediation pathways and

19


estimate the mediation effects. In the neuroimaging statistics literature, this approach is often

referred to as pathway analysis [66, 67]. We consider mediation analysis and pathway analysis

as interchangeable terms for the rest of the chapter.

2.2.2 Model estimation

In this section, we estimate the parameters in MMO using a two-step procedure. In step 1,

we aim to extract potential mediation pairs {Mc,Yd} from X→M→ Y. Then, we estimate the

latent factors M̃c and the mediation effects in step 2.

2.2.2.1 Step 1: Mediation bi-cluster estimation

We estimate
⋃
{Mc,Yd} by leveraging a graph model. Let bipartite graph G(U, V,E)

denote all potential marginal mediation pathways of X → M → Y and let W be the weighted

matrix of the edges in G. The node set U represents the mediator imaging variables with |U | = p,

whereas the node set V denotes the outcome imaging variables with |V | = q. The edge set

E, where |E| = p · q, refers to all potential mediation pathways. The weighted edge set W is

calculated with the sample data. For example, wij = − log pij , where pij is the p-value for the

mediation model X→M(i) → Y(j) [68]. The analysis of X→M(i) → Y(j) can be considered as

a sure independent screening procedure for multivariate mediation analysis by [69]. In our model,

this procedure aims to extract the latent mediation patterns instead of estimating the mediation

effect.

When systematic mediation patterns exist, we define a dense bi-cluster Uc⊗ Vd ⊂ G as the

Cartesian product of Uc ⊂ U and Vd ⊂ V , and P (wij > wi′j′) = 1 for i and j from {Uc⊗Vd} and

20


i′ and j′ not in {Uc ⊗ Vd} (see formal definition of dense bi-cluster in the A.1.1). Our goal is to

extract all dense bi-clusters {Uc ⊗ Vd} from G based on W from the sample data. The node sets

of an extracted Uc⊗Vd become the mediation pair {Mc,Yd} for the above-mentioned mediation

analysis in section 2.2.1

Since the combinatorial of {Uc⊗ Vd} from G is infinite, a sound heuristic is required to re-

cover {Uc⊗Vd}. Therefore, for the sample matrix W, we tend to assign wij with high values into

{Uc ⊗ Vd} while maximizing the proportion of high-value edges in each dense bi-cluster. This

heuristic can be translated into an objective function that maximally includes informative edges

with a set of subgraphs under minimal size. The maximal weight inclusion ensures high sensi-

tivity, whereas the subgraph size penalty prohibits the finding of false positives. The objective

function is given by:

argmax
{Uc⊗Vd}

dλ(Uc, Vd) = argmax
{Uc⊗Vd}

∑
c,d

∑
i∈Uc,j∈Vd

[log(|wij|)−
1

2
λ log(|Uc||Vd|)], (2.2)

where λ is a tuning parameter for the size penalty term. When λ is large, our approach tends to

extract more parsimonious {Uc ⊗ Vd} with a reduced size and increased density.

Estimating Eq. (2.2) is an NP hard problem [70]. Thus, we translate the objective func-

tion into an adaptive dense directed subgraph detection problem using greedy algorithms [71].

Multiple dense bi-clusters {Ûc⊗V̂d} can be captured by an iterative procedure [70]. When imple-

menting the greedy algorithms, the choice of λ is critical for extracting {Uc⊗ Vd} [72]. To avoid

an ad hoc parameter selection and ensure maximal reproducibility, we develop a data-driven ap-

proach to automatically determine λ based on the Kullback–Leibler (KL) divergence criterion

[73].

21


Recall that the KL divergence is given by:

D(P ||Q) =
∑
i,j

P (δij) log
P (δij)

Q(δij)
, (2.3)

which measures the discrepancy between two distributions. Specifically, for a systematic dense

bi-cluster, let δij be an indicator variable for the edge set E. δij = 1 if i ∈ U and j ∈ V are

connected, and δij = 0 otherwise. P and Q are two distributions of δij , which follows a mixture

Bernoulli distribution P (δij). If there are dense bi-clusters in G:

P (δij) =


Bern(π1), if {i, j} ∈ {Uc ⊗ Vd},

Bern(π0), otherwise,

(2.4)

where π1 > π0 are the parameters of the two Bernoulli distributions. In contrast, when {Uc ⊗

Vd} = ∅, δij follows a Bernoulli distribution Q(δij), where

Q(δij) = Bern(π), (2.5)

and π is uniform in G.

The KL divergence measures the discrepancy between the distributions of the systematic

dense bi-cluster mediation patterns (2.4) and the null model (2.5) [74]. The divergence reaches

a maximum when the estimated {Ûc ⊗ V̂d} bi-cluster structures are the same as the underlying

true patterns {Uc ⊗ Vd} (Theorem 1). This can guide the tuning. As we assume that there are

systematic mediation patterns, a larger KL divergence indicates that the underlying systematic

patterns {Uc⊗Vd} are better recovered from W [75]. Therefore, we maximize the KL divergence

22


in (2.3) to objectively select the tuning parameter λ by:

argmax
λ

D(P{Ûc⊗V̂d(λ)}∥Q)

= argmax
λ

∑
c,d

 ∑
i,j∈Ûc⊗V̂d(λ)

(
δijπ1 log

π1

π
+ (1− δij)(1− π1) log

(1− π1)

(1− π)

)

+
∑

i,j /∈Ûc⊗V̂d(λ)

(
δijπ0 log

π0

π
+ (1− δij)(1− π0) log

(1− π0)

(1− π)

) . (2.6)

In practice, δij is unknown and can be approximated by δ̃ij = I(wij > r) [76]. The choice

of the threshold r depends on a prior distribution h0(r) (such as the mixture binomial model, as

we assumed) and can be integrated out by
∫
D(P{Uc⊗Vd(λ)}||Q)h0(r)dr. Further, π1, π0, and π are

also unknown and can be estimated by maximum likelihood estimation based on the estimated

dense bi-cluster structures (2.4) (details are in the A.1.3 of Appendix). Last, we optimize λ using

a grid search to maximize (2.6).

We integrate the objective function optimization and objective tuning parameter selection

in Algorithm 3 in the A.1.5. The output of this algorithm is {Ûc ⊗ V̂d}. By Theorem 1, we show

that the optimum can be attained under mild regularity conditions.

Theorem 1. Let {Uc ⊗ Vd} be the true underlying mediation dense bi-cluster pattern and {U ′
c ⊗

V ′
d} be the dense bi-cluster pattern different from {Uc ⊗ Vd}. Then, we have

D(P{Uc⊗Vd}||Q) > D(P{U ′
c⊗V ′

d}||Q). (2.7)

The proof of this Theorem is in the A.1.2 of the Appendix.

23


2.2.2.2 Step 2: Mediation effect estimation

For each estimated {Mc,Yd} (i.e., Uc⊗Vd), we next estimate the mediation effects. We first

calculate the orthogonal mediating factors from a set of correlated mediating imaging variables

Mc to ensure mediation identifiability. As demonstrated in Figure 2.1(c), let Mc ∈ Rn×Lc be the

observed matrix ofMc, c = 1, . . . , C. For the orthogonal factor matrix M̃c of M̃c, we apply the

following factorization model: M̃c = Mcηc+ϵc. Here, η(c) ∈ RIc×Lc represents the corresponding

loading, and ϵc ∈ Rn is the error term. The factorization model above can be solved with either

classical low-rank methods or using the direction of mediation method designed for multiple

mediators [63]. Although low-rank models are commonly used to estimate the aggregating effect

of correlated multiple mediators (e.g., orthogonal factorization and sparse princinple component

analysis by 64), the alternative approach (e.g., the weighted sum method) can also be applied (see

A.3.3).

Given the orthogonal mediating factors, we follow the commonly used imaging mediation

procedure to estimate indirect effect (IE) and direct effect (DE). Specifically, with estimated

{Mc,Yd}, the estimated indirect and total effects are:

ÎEd =
1

Jd

Jd∑
j=1

Lc∑
l=1

β̂ljα̂l,

T̂Ed =
1

Jd

Jd∑
j=1

{
Lc∑
l=1

β̂ljα̂l + θ̂j},

(2.8)

where β̂lj and α̂l are estimated using equation (2.1) based on the orthogonal factors. The media-

tion proportion then will be: ÎEd

T̂Ed
.

In practice, the numeric ranges of the mediating imaging variables and imaging outcomes

24


can vary, the directly estimated mediation effects are less informative about the level of the me-

diation effect. Therefore, we adopt the commonly used partial correlation as the standardized

mediation effect size (77, see details in the A.1.6).

2.2.3 Inference for dense bi-clusters

The goal of our statistical inference is to test the systematic mediation pattern for each

estimated dense bi-cluster Ûc ⊗ V̂d. Since Uc ⊗ Vd are not specified before the data analysis, the

classical statistical inference methods for the fixed parameter(s) are not applicable. We consider

Ûc⊗V̂d as a data-driven “cluster” object. In the large body of the neuroimaging statistics literature,

permutation test-based family-wise-error (FWE) methods are widely used to test the statistical

significance of clusters [78, 79].

In our application, we implement the FWE-based inference procedure as follows. First,

we permute the labels of mediators and outcomes to generate B permutation datasets Db =

(Xb,Mb,Yb), b = 1, . . . , B, where, for example, B = 10000. Next, we apply the MMO Algo-

rithm 3 to each permuted dataset and the observed dataset. For each permutation dataset Db, the

test statistic is calculated by:

Tb = max
{Ûc,V̂d}

∑
i∈Ûc,j∈V̂d

− log(pbij)√
|Ûc||V̂d|

, (2.9)

where pbij is calculated from the mediation analysis Xb → Mb(i) → Yb(j). The test statistic

integrates the mediation effect size, and the extent and the density of the mediation dense bi-

cluster, and thus, it is superior than the commonly used cluster extent [80]. Last, we calculate the

FWE q-values for the observed mediation dense bi-clusters based on the percentile ranks of their

25


test statistics among {Tb} from the permuted datasets.

2.3 Data Example

We applied the proposed method to the multimodal brain imaging data from the Amish

Connectome Project (ACP), which is a subcohort of the Human Connectome Project [49]. In this

study, there were 94 male and 110 female participants (i.e. n = 204) with ages 39.3± 16.9. One

goal of the Amish Connectome Project is to investigate the association between the risk factors

for cardiovascular disease and brain structures and functions (i.e., heart and brain). Multiple

clinical measurements for the risk factors for cardiovascular disease were collected, including

SBP and total cholesterol. Multimodal imaging data were also acquired, including CBF, which

was calculated from arterial-spin labeling imaging data [81], and ReHo, which was calculated

from resting-state fMRI data [55]. Specifically, ReHo was derived from the BOLD signal using

Kendall’s coefficient of concordance. In our data example, both mediators (CBF) and outcomes

(ReHo) are region-level brain imaging measures based on a commonly used brain atlas of 107

regions (JHU-MNI, 82). Previous studies have revealed significant correlations between SBP and

CBF and between SBP and ReHo [83, 84]. Covariates (e.g., age and sex) were adjusted in the

analysis. However, it remains unclear how SBP can affect ReHo by influencing CBF, and this

naturally requires a complex mediation analysis with high-dimensional mediators and outcomes.

The details of the multimodal imaging data acquisition and preprocessing, and the validity of the

studied pathways are provided in the A.2.1 of the Appendix.

The two-step procedure (MMO) was applied to the multimodal imaging data with n = 204

and p = q = 107. We first performed mediation analyses to obtain the W matrix and then

26


extracted mediation dense bi-clusters using Algorithm 3. The KL criterion selected λ = 1.8.

The results show one significant dense bi-cluster with six CBF ROIs and 59 ReHo ROIs (i.e.,

C = D = 1, I1 = 6, J1 = 59) with an FWE rate adjusted q-value of 0.019. Figures 2.3(a)

and 2.3(b) illustrate the input matrix and outcome matrix for step 1, and Figure 2.2 shows CBF

and ReHo ROIs overlaid onto a 3D brain imaging template. Next, we applied the factoriza-

tion method to estimate the orthogonal vectors from the six CBF ROIs. Interestingly, only one

vector was extracted. It explains 88% of the variance of the six mediators. Last, we estimated

the systematic mediation effect. The mean of the mediation partial correlation coefficients for

SBP→ CBF component→ 59 ReHo is −0.094 (a medium effect size, 85) with a standard devi-

ation of 0.015. The negative mediation effect suggests that SBP can decrease ReHo in multiple

brain areas by first reducing CBF in the six areas where CBF regions are positively associated

with ReHo. The partial correlations between pairs of SBP, CBF, and ReHo are shown in Fig-

ure 2.3(c). The average mediation proportion across all outcomes is 88.30%.

The systematic mediation pattern discovered by MMO reveals the critical pathways for

CBF: cingulate gyrus left (CingG L), cingulate gyrus right (CingG R), supramarginal gyrus left

(SMG L), supramarginal gyrus right (SMG R), cingulum (cingulate gyrus) left (CGC L), and

inferior occipital gyrus right (IOG R) [Figure 2.2(a)]. Specifically, for the six CBF regions,

CingG L, CingG R, and CGC L are centred around the middle cerebral, while SMG L, SMG R,

and IOG R are close to posterior cerebral arteries, which play a central role in affecting ReHo.

Regional-specific changes partially influence ReHo scores related to neuropsychiatric illness and

CBF, where CBF is itself influenced by SBP. The 59 ReHo regions cover a large proportion of

the brain, including superior frontal gyrus left (SFG L), middle frontal gyrus right (MFG R),

etc. in the frontal lobe [Figure 2.2(b)]; superior corona radiata left (SCR L), CGC R, etc. in the

27


(a) CBF Regions (b) ReHo Frontal

(c) ReHo Corona (d) ReHo Temporal

(e) ReHo Parietal (f) ReHo midbrain

Figure 2.2: Selected Regions for CBF (mediators) and ReHo (outcomes): (a) is the six regions for
CBF (mediators) including the six region names which are close to the bilateral primary cerebral
arteries, (b)-(f) demonstrate the selected regions of ReHo (outcomes) in the areas of frontal,
corona, temporal, parietal and midbrain.

28


(a) Mediation p-matrix (b) Mediation dense bi-cluster

(c) Mediation effect by partial correlation

Figure 2.3: The mediation analysis results for the data example. The mediation variable is systolic
blood pressure (SBP). (a) is the heatmap of − log(p)-matrix based on the input data. (b) is
the heatmap with detected systematic mediation patterns. (c) is the mediation results based on
the mediating factor. β and θ effects are averaged across related outcome ROIs. Overall, the
mediation proportion is high.

29


corona radiata [Figure 2.2(c)]; parahippocampal gyrus left (PHG L), superior temporal gyrus

right (STG R), etc. in the temporal lobe [Figure 2.2(d)]; superior parietal lobule left (SPG L),

SMG R, etc. in the cuneus [Figure 2.2(e)]; and insular left (Ins L) and putamen right (Put R)

in the midbrain [Figure 2.2(f)]. These brain areas are involved in multiple cognitive, language,

and motor functions. For example, CingG is related to metabolic reduction for higher executive

functions [86]. The SMG, located in the parietal area, is highly correlated with information

processing and metabolism [87]. The CBF from six regions supports ReHo in the above regions.

Our findings suggest that a reduced CBF can lead to a lower level of ReHo. These explain,

in part, how the risk factors for cardiovascular disease can adversely affect daily functions by

influencing the CBF and, accordingly, ReHo regions. The details of the selected CBF and ReHo

regions and the corresponding full names of the regions are provided in the A.2.4 of the Ap-

pendix. We further performed a sensitivity analysis by following [3] to assess the effects of the

potential violation of the mediation assumptions. The results suggest that our mediation results

are generally robust to moderate violation of assumption (details are in the A.2.3).

For comparison, we also applied the classical BH-FDR methods and three-step regression

methods in [61]. However, no regions were identified under either method, and the systematic

mediation pattern was not discovered. The BH-FDR is conservative when there is enough noise

[88]. We also applied the Pathway Lasso method [89]. Although this method allows us to select

only the CBF regions, we applied it to each outcome and took the union of the results. Conse-

quently, 32 CBF regions were selected. They contain the regions selected by our method.

30


2.4 Simulation

We simulated the multimodal imaging data for the mediation analysis, including a scalar

exposure variable, multivariate mediating imaging variables, and multivariate imaging outcomes.

Specifically, we simulated the exposure X, orthogonal mediator factors M̃, and outcomes Y with

(X, M̃,Y) ∼ N(µ,Σ), where the dimension of the multinormal distribution was the sum of

the number of latent factors and outcomes plus the exposure, and µ = (µX , µM̃, µY)
′ and Σ is

covariance matrix (see details in A.3.1 in the Appendix). We then generated observed mediator

variables M, as described in step 2.

2.4.1 Parameters and settings

For simplicity, we fixed µ = 0. The parameters in the data simulation were the sample size

n, effect size of the mediation, dimension of mediating and outcome variables, and the cardinality

of the mediation structure. We let p = q = 100 and 300, n = 80 and 200, set the mediation effect

as±0.24 and 0.16, and included two cardinalities |Uc⊗Vd| = 10×10 and |Uc⊗Vd| = 20×20. We

assessed the performance of our method for combinations of different sample sizes, mediator and

outcome dimensions, mediation dense bi-cluster sizes (cardinalities), and effect sizes. We also

evaluated the performance of our method under the settings of non-normally distributed mediators

and outcomes. Specifically, we considered the Cauchy distribution and Laplace distribution.

Moreover, we tested our method under the settings with mediators generated from non-orthogonal

factors (factor correlations varying from 0.5–0.8). The details of data generation and analysis

results of these two settings are provided in A.3.2 in the Appendix.

31


2.4.2 Comparable methods

We first assessed the accuracy of the multivariate mediator and multivariate outcome me-

diation analysis using the results of step 1. The goal of the step 1 analysis is to recover the

systematic mediation pattern. We compared the results for our method with those of existing

methods, including univariate mass methods BH-FDR and a three-step LSEM from [61] and a

pathway least absolute shrinkage and selection operator (Lasso) method (PathL) from [89]. Since

pathway Lasso by default handles a single outcome, we aggregated selected mediators for each

outcome. We calculated the sensitivity and FDR of the mediation patterns as the evaluation cri-

teria by comparing the estimated Ûc ⊗ V̂d with the true Uc ⊗ Vd. Specifically, the sensitivity

is ∑
c,d |(Ûc ⊗ V̂d) ∩ (Uc ⊗ Vd)|∑

c,d |Uc ⊗ Vd|
,

and the FDR is ∑
c,d(|Ûc ⊗ V̂d| − |(Ûc ⊗ V̂d) ∩ (Uc ⊗ Vd)|)∑

c,d |Ûc ⊗ V̂d|
,

where |(Ûc ⊗ V̂d) ∩ (Uc ⊗ Vd)| is the number of edges intersecting between the estimated set and

the true set.

We evaluated the accuracy of the mediation effect estimation for step 2, given the estimated

mediation patterns. We compared our model with the ‘oracle model’, which directly estimates

the mediation effects based on true latent mediating factors and a mediation low-rank factoriza-

tion model (medLRM) that decomposes all mediating variables without leveraging the estimated

mediation pattern. We calculated the estimation bias of the mediation effects of MMO and

medLRM in contrast to the oracle model. We also calculated the coverage probabilities with

32


asymptotic 95% confidence intervals [90].

2.4.3 Results

We summarized the edge-wise results for mediation pattern detection in Figure 2.4 and Ta-

ble 2.1. Our approach MMO achieved high sensitivity across all settings with a well-controlled

FDR. These results indicated that MMO can accurately reveal the mediation pattern. In compari-

son, the sensitivity rates of the competing methods were lower when the sample size or effect size

was relatively small, although the FDRs were similar. The reason for this difference is mainly

because MMO can lend strength to mediators and outcomes, and thus, it joins the weak signals

in organized structures. When the sample size and effect size were larger, all methods had excel-

lent performance. Moreover, the performance of MMO was more robust for both negative and

positive mediation effects.

(a) FDR Comparison (b) Sensitivity Comparison

Figure 2.4: Edge-wise bi-cluster simulation results for all methods: the scenarios on x axis cor-
respond to the scenarios in Table 2.1 from left to right, top to bottom.

33


Mediation effect size
and Sample size Method

Cluster Size = 10× 10 Cluster Size = 20× 20
FDR sens spec FDR sens spec

BH 0 0.021(0.135) 1 0 0.155(0.359) 1
Effect=0.24 3-step 0.118(0.247) 0.371(0.459) 1∗ 0.014(0.101) 0.296(0.433) 1∗

n=80 PathL 0.222(0.245) 0.490(0.179) 0.973(0.042) 0.098(0.018) 0.365(0.127) 0.979(0.042)
MMO 0.106(0.188) 0.955(0.129) 0.998(0.004) 0.057(0.116) 0.950(0.131) 0.997(0.001)

BH 0.017(0.048) 0.728(0.434) 1∗ 0.001(0.013) 0.895(0.299) 1∗

Effect=0.24 3-step 0.149(0.162) 0.952(0.161) 0.998(0.003) 0.022(0.24) 0.974(0.131) 1∗

n = 200 PathL 0.129(0.102) 0.740(0.182) 0.984(0.017) 0.106(0.125) 0.629(0.086) 0.980(0.023)
MMO 0.040(0.084) 0.994(0.028) 1∗ 0.004(0.020) 1 1∗

BH 0 0 1 0 0.017(0.128) 1
Effect=0.16 3-step 0.082(0.215) 0.204(0.368) 1∗ 0.009(0.025) 0.112(0.294) 1∗

n = 80 PathL 0.208(0.248) 0.530(0.142) 0.971(0.046) 0.106(0.181) 0.405(0.128) 0.976(0.043)
MMO 0.136(0.208) 0.950(0.154) 0.997(0.005) 0.124(0.129) 0.937(0.152) 0.993(0∗)

BH 0 0.607(0.481) 1 0.004(0.019) 0.758(0.409) 1∗

Effect=0.16 3-step 0.114(0.160) 0.899(0.207) 0.998(0.003) 0.037(0.082) 0.946(0.189) 0.998(0.005)
n = 200 PathL 0.086(0.193) 0.460(0.135) 0.993(0.015) 0.036(0.095) 0.379(0.141) 0.996(0.009)

MMO 0.041(0.076) 1 1∗ 0.047(0.078) 1 0.998(0.004)

BH 0∗ 0.763(0.420) 1 0.007(0.027) 0.863(0.341) 1∗

Effect=-0.24 3-step 0.003(0.019) 0.150(0.359) 0.999(0.003) 0.0002(0.002) 0.111(0.313) 1∗

n = 200 PathL 0.167(0.214) 0.630(0.134) 0.981(0.027) 0.057(0.121) 0.445(0.107) 0.993(0.016)
MMO 0.030(0.099) 0.999(0.009) 0.999(0.002) 0.023(0.070) 0.993(0.057) 0.999(0.004)

Sample size = 200 Sample size = 80
p, q = 300 BH 0.017(0.060) 0.504(0.492) 1∗ 0 0 1

Effect=0.16 3-step 0.131(0.155) 0.712(0.437) 0.999(.0002) 0.025(0.033) 0.096(0.287) 1∗

Cluster Size = 30× 30 PathL 0.087(0.084) 0.667(0.096) 0.993(0.007) 0.238(0.149) 0.393(0.113) 0.985(0.011)
MMO 0.076(0.129) 0.966(0.106) 0.999(0.003) 0.089(0.126) 0.867(0.219) 0.999(0.002)

Table 2.1: Edge-wise results of extracting mediation pattern Uc⊗ Vd by all methods. We demon-
strate the false discovery rate (FDR), sensitivity (sens) and specificity (spec) across different
settings for all comparable methods. ∗ represents a rounded number.

34


We listed the estimates of the mediation effect size (step 2 of MMO) in Table 2.2. Since the

true mediators and corresponding outcomes were implicit as the input to each method, we used

the results from the oracle model analysis, which used true mediating factors M̃, as a reference.

We then performed a low-rank mediation analysis with MMO and medLRM and compared the

estimated mediation effect size with that of the oracle model. We calculated the average medi-

ation effect size for all outcomes for each simulated dataset. In general, MMO outperformed

medLRM across all settings because it can more accurately capture the true mediators. In addi-

tion, our model is generally robust to the non-normally distributed mediators and outcomes (See

Table A.2 and A.3 in the A.3.2 ). When non-orthogonal mediating factors present, the mediation

pathway extraction by the Step 1 of our method remain accurate. However, the performance of

the mediation effect estimation can vary due to non-orthogonal factors (See Table A.4 and A.5 in

the A.3.2).

Effect size
and Sample size Method

signal region = 10× 10 signal region = 20× 20
Mean Bias Coverage Prob Mean Bias Coverage Prob

effect=0.24
n=80

‘Oracle Model’ 0.295(0.055) 0.292(0.058)
medLRM 0.073(0.093) 0.222(0.093) 15.9% 0.142(0.111) 0.149(0.105) 47.8%

MMO 0.259(0.095) 0.036(0.078) 92.5% 0.285(0.058) 0.007(0.027) 98.9%

effect=0.24
n=200

‘Oracle Model’ 0.279(0.047) 0.285(0.039)
medLRM 0.071(0.086) 0.208(0.081) 9.7% 0.229(0.086) 0.056(0.072) 74.6%

MMO 0.266(0.052) 0.012(0.021) 98.8% 0.282(0.040) 0.003(0.005) 99.29%

effect=0.16
n=80

‘Oracle Model’ 0.218(0.047) 0.193(0.049)
medLRM 0.008(0.021) 0.210(0.047) 0% 0.034(0.057) 0.160(0.061) 14.7%

MMO 0.192(0.073) 0.026(0.052) 93.4% 0.174(0.069) 0.019(0.043) 94.2%

effect=0.16
n=200

‘Oracle Model’ 0.195(0.035) 0.193(0.035)
medLRM 0.018(0.033) 0.178(0.044) 0.9% 0.028(0.041) 0.165(0.049) 5.0%

MMO 0.186(0.041) 0.010(0.017) 83.9% 0.182(0.039) 0.011(0.017) 93.8%

Effect=-0.24
n=200

‘Oracle Model’ -0.250(0.044) -0.249(0.047)
medLRM -0.015(0.032) 0.235(0.050) 0.7% -0.094(0.094) 0.156(0.092) 29.5%

MMO -0.233(0.062) 0.017(0.044) 96.7% -0.233(0.068) 0.016(0.053) 96.5%
Sample Size = 200 Sample Size = 80

Mean Bias Coverage Prob Mean Bias Coverage Prob
p, q = 300
effect=0.16

Cluster Size = 30× 30

‘Oracle Model’ 0.191(0.037) 0.194(0.051)
medLRM 0.010(0.016) 0.181(0.040) 0.7% 0.025(0.047) 0.170(0.056) 8.2%

MMO 0.174(0.038) 0.017(0.033) 95.7% 0.179(0.056) 0.016(0.031) 96.0%

Table 2.2: Simulation results for mediation effect estimation (step 2). We compare the estimated
mediation effects by Low Rank Model (medLRM) and MMO, with reference to the estimated
mediation effect based on the oracle model (with known mediating imaging factors).

35


2.5 Discussion

This chapter attempts to solve the mediation problem with high-dimensional-mediators

and high-dimensional outcomes. This dual high-dimensionality can lead to a massive number of

mediation pathways, which, thus, gives rise to numerical challenges for mediation analysis. We

proposed a mediation dense bi-cluster structure to identify systematic mediation patterns under

the bipartite graph framework, which reflects that the exposure affects multiple outcomes via

influencing a corresponding set of mediators. We developed cluster-wise inference for set-to-set

mediation patterns and statistical methods for mediation estimation.

This chapter makes several methodological contributions to neuroimaging statistics. First,

our method extracts multivariate-mediator and multivariate-outcome mediation patterns using a

graph-based bi-cluster model. The model effectively reveals the underlying systematic mediation

pathways, which can be recovered with the guaranteed optimality of pattern extraction. Second,

our method provides an approach for systematic mediation effect estimation. Within a dense

bi-cluster structure, we factorize the mediators and take the product of partial correlations as the

standardized mediation effects. Our model can also handle dense bi-clusters sharing the identical

outcome set by concatenating mediator sets to estimate mediating factors. Finally, we provide a

computationally efficient algorithm for an integrated multimodal high-dimensional data analysis

and an open-source toolkit for the mediation analysis.

Our method has limitations. MMO cannot identify the sequential mediation model if the

previous mediators (or outcomes) can influence later ones in the proper order [89]. In addition,

estimating mediation effects on orthogonalized factors can lead to an unstraightforward inter-

pretation. Last, we could extend the existing model to handle longitudinal multimodal imaging

36


data.

In our analysis of application data, we first identify the systematic mediation pattern SBP→

CBF → ReHo. The CBF regions and ReHo regions can better reveal how cardiovascular risk

factors affect brain-related functions through neural pathways related to CBF around vast areas

of ReHo. The joint analysis of multimodal imaging data via a mediation model provides neu-

rological insights into the influence of SBP on ReHo via CBF. These findings may lead to more

effective treatments for brain diseases related to blood oxygen levels. In contrast, the traditional

methods, including BH-FDR, may miss the mediation pattern. Therefore, our method may be an

alternative tool for analyzing integrated multimodal imaging data.

In summary, we developed MMO, a method for the joint analysis of multimodal neu-

roimaging data in a mediation framework. The multivariate-mediator and multivariate-outcome

mediation tool can also be applied to other multimodal high-dimensional data, for example,

omics-imaging data and omics-omics data [91]. Our algorithm is scalable since the computa-

tional cost for the above application is moderate. The software package for the proposed method

is available at https://github.com/zhuivv/MMO.git.

37

https://github.com/zhuivv/MMO.git


Chapter 3: Marginal Structure Ensemble Learning Model (MASE)

3.1 Introduction

Recent large-scale longitudinal observational studies have collected extensive data, includ-

ing hundreds of repeated measurements of exposures, covariates, and outcomes. For example,

the Adolescent Brain Cognitive Development (ABCD) study, which investigates youth cognitive

development, provides numerous variables encompassing cultural environment, physical, mental,

neurocognitive, and genetic factors. To understand the influence of these exposures, it is imper-

ative to uncover the underlying causal effects. The current literature offers methods to address

limited number of confounders with linear confounding effects, including inverse probability

weighting [92, 93], conditional mean imputation [32], and matching [94]. When confronted with

high-dimensional confounders, machine learning methods have proven effective in handling the

massive covariates with intricate and non-linear effects in estimating causal effects [95, 96, 97].

However, these approaches have typically focused on studies with a single visit and are inade-

quate for longitudinal data. The implementation of machine learning methods in studies with

time-varying exposure and confounders requires more evaluations and non-trivial efforts due to

the complex treatment-confounder feedback, where time-dependent confounders might be influ-

enced by past exposure.

To estimate the effects of time-varying exposures in the presence of time-varying con-

38


founders, the Marginal Structural Model (MSM) has become a popular tool in longitudinal causal

inference. This method constructs a pseudo-population and incorporates both confounders and

time-varying exposures into weights. However, correctly modeling propensity scores is crucial

and can be challenging when dealing with high-dimensional confounders or non-linear effects.

To address these challenges, Doubly Robust (DR) methods have been introduced [34, 98, 99]. DR

estimators are derived by subtracting the projection onto Hilbert space and generally involve two

nuisance parameters: treatment estimator (e.g., propensity scores) and outcome estimator (e.g.,

iterative conditional mean) [100]. They are ‘doubly robust’ in the sense that if either the treatment

model or the outcome model is correct, the final estimate will be consistent. More importantly,

a doubly robust estimator will result in a faster convergence rate when nuisance parameters are

all correctly estimated. This property enables the use of machine learning in estimating nuisance

parameters, as the estimation uncertainty from machine learning becomes negligible under mild

conditions [95, 101]. As a result, the combination of machine learning methods with doubly

robust estimators offers a promising solution to tackle this challenge [102, 103]. Particularly,

ensemble learning has gained increased popularity due to its robust and superior performance,

achieved by integrating multiple machine learning models [97, 104]. However, there remains a

gap in the application of causal machine and ensemble learning to longitudinal data, particularly

for machine learning-assisted doubly robust MSM.

Motivating Study: Our current research is inspired by an investigation into the influence

of insufficient sleep on adolescent cognitive development, utilizing data from the ABCD study

[105]. The ABCD study collects data from 11, 868 adolescent participants at ages 10, 12, and

14. During each visit, various measures are recorded, including sleep patterns, cognitive per-

formance, and a range of psychographic variables. Insufficient sleep is a modifiable and health-

39


related factor that can adversely affect the cognitive development of young individuals. However,

sleep insufficiency can be time-varying due to the increasing use of electronic devices along ag-

ing. In addition, hundreds of time-varying covariates (e.g., environmental factors, nutrition, and

socioeconomic status) can act as confounders. Therefore, it is challenging to achieve a consistent

estimate of the effect of sleep insufficiency in a large-scale longitudinal study with numerous

repeatedly measured confounders and potential non-linear confounding effects.

To bridge the gap, we propose a Marginal Structure Ensemble Learning Model (MASE).

MASE is developed based on the MSM solved by an efficient score, with the estimation of

propensity scores and iterative conditional means as nuisance parameters. We develop a compu-

tationally efficient ensemble model to integrate multiple machine learning algorithms to model

these nuisance parameters. We also study the theory and provide the analytical solution for the

inference of causal parameter estimations. In numerical studies, we observe that our method

effectively handles high-dimensional confounders, appropriately accounts for the longitudinal

data structure, and outperforms existing methods. More importantly, the application of MASE

contributes significantly to bridging gaps in our understanding of the long-term relationship be-

tween insufficient sleep and cognitive outcomes using ABCD data. Our method aligns with the

asymptotic theory of doubly robust machine learning estimation in longitudinal setting developed

by [106]. The main difference is that our method combined the ensemble learning method and

considered the non-compatible problem for outcome regression [34].

The rest of the chapter is organized as follows. Section 3.2 describes the proposed model,

including the estimation process and inference method. Section 3.3 showcases the application

of our method to the ABCD study, outlining the effect estimation and presenting the inference

results. Section 3.4 implements simulations to demonstrate the empirical performance of our

40


method with finite sample sizes. We conclude the chapter with a discussion in Section 3.5.

Technical details and extra application results can be found in the Appendix.

3.2 Methods

3.2.1 Marginal Structural Model with Time-varying Exposure

We consider a longitudinal study featuring a single binary exposure, multiple continuous

outcomes, and high-dimensional confounders encompassing various data types. For i = 1, . . . , n,

let Di = (Ait,Zit,Yit) represent the independent identically distributed data for subject i across

visit t = 1, . . . , T . Specifically, Ait = ait ∈ {0, 1} denotes a univariate binary exposure vari-

able, Zit = (Z1
it, . . . , Z

pt
it )

T is a pt-dimensional vector of observed confounders, with the number

of covariates not necessarily being uniform at each visit. Further, Yit = (Y 1
it , . . . , Y

q
it)

T repre-

sents q outcomes, which can be either continuous or categorical. For any variable {vt}1≤t≤T ,

we denote v̄t = (v1, . . . , vt), t = 1, . . . , T as the historical measurements of variable v up to

the t-th visit. We define the counterfactual outcome [107] for the j-th outcome at time T as

Y j
i (ĀiT = {ai1, . . . , aiT}) for subject i. The population mean of the counterfactual outcome is

given by E(Yj(ĀT = {a1, . . . , aT})|Z0), where Z0 ∈ Rn×p0 is the baseline covariates matrix, and

At ∈ Rn is the exposure vector.

Figure 3.1 illustrates the directed acyclic graph (DAG) of our model, with all variables re-

peatedly measured at two time points (using a single outcome for demonstration purposes). Be-

cause both high-dimensional confounders and exposures are longitudinally measured and time-

varying, estimating the aggregated causal effect is challenging. Classic methods, such as MSM,

estimate the causal effect by leveraging propensity score weighting. Inspired by our data analysis,

41


which suggests that previous cognitive outcomes can affect subsequent results (e.g., the crystal-

lized composite score), we propose a novel propensity score model that incorporates the effects

of all time-varying confounders along with past outcomes:

πi1(ai1, zi1) = P (Ai1 = ai1|Zi1 = zi1)

πit(āit, z̄it, ȳi(t−1)) = P (Ait = ait|Āi(t−1) = āi(t−1), Z̄it = z̄it, Ȳi(t−1) = ȳi(t−1)), 2 ≤ t ≤ T.

(3.1)

To ease the notation, we redefine Z̄it = (Z̄it, Ȳi(t−1), t = 2, . . . , T . Then, the second PS function

in (3.1) can be simplified as πit(āit, z̄it).

Figure 3.1: ABCD tabulate data structure. The solid lines represent the impact of variables to the
others, the dashed line represents possible impact of previous measurements on the later ones.
Notations A,Z, and Y are defined in section 3.2.1.

To ensure the identifiability of average causal effect τ(āt), we adopt the commonly used

assumptions consistent with MSM. Specifically, in addition to the Stable Unit Treatment Value

42


Assumption (SUTVA, 108), we assume that, for each individual i and each outcome j, the fol-

lowing hold: (1) consistency: Y j
i (āiT ) = Y j

iT if ĀiT = āiT ; (2) no unmeasured confounding

(NUC): for all āiT and t, Y j
i (āiT ) |= Ait|Zit, Āi(t−1) = āi(t−1); and (3) positivity: for all t and

āit, πit(āit, z̄it) > 0. Throughout the chapter, we refer to these assumptions as the Multivariate

Longitudinal Sequential (MLS) assumptions. We then provide the MSM for the j-th outcome as:

E(Yj(āT )|Z0) = τ(āT ,Z0; θ
j),∀j, (3.2)

where τ(·, ·; ·) is a known function that models the j-th outcome based on exposures and co-

variates at baseline, and θj is the vector of estimands. For example, we have τ(ā2,Z0; θ
j) =

βj
0 + βj

1a1 + βj
2a2 as a linear function with θj = (βj

0, β
j
1, β

j
2)

′, where a1, a2 correspond to the ado-

lescent sleep time measured at visits one and two. Or moreover, with effect modification, we can

have τ(ā2,Z0; θ
j) = βj

0+βj
z0z0+βj

1a1+βj
2a2+βj

z1z0∗a1+βj
z2z0∗a2. We will focus on the previous

model with a simple linear form, and we are interested in estimating the average treatment effect

(ATE) for population getting sufficient sleep all the time (treatment regime (0, 0)) versus popula-

tion not having sufficient sleep at all (treatment regime (1, 1)) as: ATE := θjATE = βj
1 + βj

2,∀j.

3.2.2 Doubly Robust Estimator

Propensity score (PS) weighting is a common approach for estimating the MSM equa-

tion (3.2). The weights can be calculated as wit = 1/ΠT
t=1πit(āit, z̄it). However, when massive

time-varying confounders are present, the model specified for propensity score can become in-

correct due to numerical challenges. These incorrect modeling of propensity scores can lead to

inconsistent estimates of causal effects within the MSM framework. Therefore, we are moti-

43


vated from the doubly robust estimator [34] and aim to develop a causal machine and ensemble

learning-based estimator mitigating the issue of model mis-specification.

The doubly robust estimator combines the estimates of conditional means and propensity

score weights, ensuring consistent estimates of causal effects within the MSM framework when

either the estimate of conditional means or propensity scores is corretly specified. We first intro-

duce an Iterative Conditional Mean Estimation (ICE) procedure:

ηjT (āT , z̄T ) = E(Yj
T |ĀT = āT , Z̄T = z̄T )

ηjt (āT , z̄t) = E(ηjt+1(āT , Z̄t+1)|Āt = āt, Z̄t = z̄t), t = T − 1, T − 2, . . . , 1

ηj0(āT , z0) = E(ηj1(āT ,Z1)|Z0 = z0), (3.3)

for any j = 1, . . . , q.

For simplicity, in the rest of this chapter, we will eliminate the superscript j for η’s. The

doubly robust estimator for MSM is derived from the following estimating equation. Without

loss of generality, we demonstrate the estimator with two timepoints. The estimating equation

integrates the information from both conditional means and propensity scores [34]:

E(S0 + S1 + S2) = 0, (3.4)

44


where:

S2 =
d(Ā2,Z0)

π1(A1,Z1)π2(Ā2, Z̄2)
{Y2 − η2(Ā2, Z̄2)}

S1 =
∑

a2∈A2

d(A1, a2,Z0)

π1(A1,Z1)
{η2(A1, a2, Z̄2)− η1(A1, a2,Z1)} (3.5)

S0 =
∑

a1∈A1

∑
a2∈A2

d(ā2,Z0){η1(ā2,Z1)− τ(ā2,Z0; θ
j)}.

Here, τ(ā2,Z0; θ
j) is given in (3.2), At, t = 1, 2 represents the exposure set, for example, A2 =

{a1 = {0,1}, a2 = {0,1}}. Additionally, d(·) is any function with the same dimension as

the parameter of interest θj . A simple example is to take d(·) = ∂τ(ā2,Z0; θ
j)/∂θj . When

τ(·, ·; θj) is a linear function, a closed-form solution for the estimating equation is available (see

Appendix B.1) . For a non-linear τ(·, ·; θj), we can solve equation (3.4) using the Newton-

Raph