ABSTRACT

Title of Dissertation: NEW STATISTICAL METHODS FOR HIGH-DIMENSIONAL INTERCONNECTED DATA WITH UNIFORM BLOCKS

Yifan Yang
Doctor of Philosophy, 2023

Dissertation Directed by: Professor Shuo Chen, Department of Mathematics

Empirical analyses of high-dimensional biomedical data, including genomics, proteomics, microbiome, and neuroimaging data, consistently reveal strong modularity in the dependence patterns. In these analyses, highly correlated features often form a few distinct communities or modules, which can be interconnected with each other. While the interconnected community structure has been extensively studied in biomedical research (e.g., gene co-expression networks), its potential to assist in statistical modeling and inference remains largely unexplored. To address this research gap, we propose novel statistical models and methods that capitalize on the prevalent community structures observed in large covariance and precision matrices derived from high-dimensional biomedical interconnected data.

The first objective of this dissertation is to delve into the algebraic properties of the proposed interconnected community structures at the population level. Specifically, this pattern partitions the population covariance matrix into uniform (i.e., equal variances and covariances) blocks. To accomplish this objective, we introduce a block Hadamard product representation in Chapter 2, which relies on two lower-dimensional "coordinate" matrices and a pre-specified vector. This representation enables explicit expressions for the square or power, determinant, inverse, eigendecomposition, canonical form, and other matrix functions of the original larger-dimensional matrix on the basis of these lower-dimensional "coordinate" matrices.

Estimating a covariance matrix is central to high-dimensional data analysis. Our second objective is to consistently estimate a large covariance or precision matrix having an interconnected community structure with uniform blocks. In Chapter 3, we derive the best-unbiased estimators for covariance and precision matrices in closed form and provide theoretical results on their asymptotic properties. Our proposed method improves the accuracy of covariance and precision matrix estimation and demonstrates superior performance compared to the competing methods in both simulations and real data analyses.

In Chapter 4, our goal is to investigate the effects of alcohol intake (as an exposure) on metabolomic outcome features. However, similar to other omics data, metabolomic outcomes often consist of numerous features that exhibit a structured dependence pattern, such as a co-expression network with interconnected modules. Effectively addressing this dependence structure is crucial for accurate statistical inference and the identification of alcohol intake-related metabolomic outcomes. Nevertheless, incorporating structured dependence patterns into multivariate outcome regression models still poses difficulties for accurate estimation and inference. To bridge this gap, we propose a novel multivariate regression model that accounts for the correlations among outcome features using a network structure composed of interconnected modules. Additionally, we derive closed-form estimators of the regression parameters and provide inference tools.
Extensive simulation analysis demonstrates that our approach yields much-improved sensitivity with a well-controlled false discovery rate when benchmarked against existing multivariate regression models.

Confirmatory factor analysis (CFA) models play a crucial role in revealing underlying latent common factors within sets of correlated variables. However, their implementation often relies on a strong prior theory to categorize variables into distinct classes, which is frequently unavailable (e.g., in omics data analysis scenarios). To address this limitation, in Chapter 5, we propose a novel strategy based on network analysis that allows data-driven discovery to substitute for the lacking prior theory. By leveraging the detected interconnected community structure, our approach offers an elegant statistical interpretation and yields closed-form uniformly minimum variance unbiased estimators for all unknown matrices. To evaluate the effectiveness of our proposed estimation procedure, we compare it to conventional numerical methods and thoroughly validate it through extensive Monte Carlo simulations and real-world applications.

NEW STATISTICAL METHODS FOR HIGH-DIMENSIONAL INTERCONNECTED DATA WITH UNIFORM BLOCKS

by
Yifan Yang

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2023

Advisory Committee:
Professor Shuo Chen, Chair/Advisor
Professor Benjamin Kedem, Co-Chair
Professor Vince Lyzinski
Professor Tianzhou Ma
Professor Xin He

© Copyright by Yifan Yang 2023

Acknowledgments

I would like to express my heartfelt gratitude to Dr. Shuo Chen, my doctoral research advisor, for his invaluable guidance throughout this journey at the University of Maryland, College Park. He is always available for academic advice whenever his students need it. Dr. Chen not only taught me the foundations of research, academic writing, and data analysis in real-world applications but also nurtured my collaborative skills. His insightful suggestions and unwavering support have been instrumental in shaping my future career.

I extend my sincere appreciation to all the professors and staff in the Department of Mathematics at the University of Maryland, College Park. In particular, I am grateful to Dr. Benjamin Kedem for his exceptional expertise in mathematical statistics, time series analysis, Bayesian statistics, and various other fields. Dr. Paul Smith's teachings on linear models, along with his guidance during numerous fruitful RITs, have been immensely valuable. I would also like to thank Dr. Eric Slud for his mentorship in computational statistics and categorical data analysis. Dr. Vince Lyzinski, an expert in graph theory and a member of my dissertation committee, deserves special thanks. Dr. Takumi Saegusa's instruction on survival analysis and other remarkable statistical methods in semiparametric models has been invaluable. I am indebted to Cristina Garcia for her constant kindness and careful planning for the students, as well as to Stephanie Padgett for assisting us in reserving rooms. I am grateful to Matthew Griffin, the instructor of STAT 100, for his collaboration and for imparting valuable teaching techniques. Working with Susan Mazzullo, a dedicated instructor for several undergraduate statistical classes, has been a pleasure.

I am also grateful to the faculty members in other departments who have contributed to my academic journey.
Dr. Chixiang Chen deserves thanks for proofreading my papers and providing invaluable suggestions. Dr. Tianzhou Ma's leadership in bi-weekly group meetings has provided fresh insights and perspectives. Dr. Xin He, a member of my dissertation committee, has been unfailingly kind and supportive.

I am deeply thankful to my friends in Maryland, including Chuan Bi, Yunjiang Ge, Hwiyoung Lee, Tong Lu, Yezhi Pan, Qiong Wu, Zhenyao Ye, Nathan Yu, Yiran Zhang, Xiaocui Zhang, Xuze Zhang, Zhiwei Zhao, Xiaoyu Zhou, and more.

Lastly, I would like to express my profound appreciation to my parents, Guangping Yang and Xiaohui Huang. Thank you for introducing me to this remarkable and beautiful world, enabling me to discover my passion and complete this degree. Your love and support have been my driving force.

Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures

Chapter 1: Introduction
1.1 Estimation of Large Covariance Matrix
1.2 Classical Covariance Structures
1.3 Recent Developments of Covariance Structures
1.4 Interconnected Community Structure
1.5 A New Interconnected Community Structure with Uniform Blocks
1.6 Organization

Chapter 2: A New Representation of Uniform-Block Matrix and Applications
2.1 Introduction
2.2 Uniform-Block Structure and Uniform-Block Matrix
2.3 Testing A Specific Mean for One-Sample
2.4 Testing the Equality of Means for Multiple-Sample
2.5 Discussion

Chapter 3: Covariance Matrix Estimation for High-Throughput Biomedical Data with Dependence Structure of Interconnected Communities
3.1 Introduction
3.2 Methodology
3.2.1 A parametric covariance model with a uniform-block structure
3.2.2 Matrix estimation for the uniform-block structure with a small K
3.2.3 Matrix estimation for the uniform-block structure with a large K
3.3 Numerical Studies
3.3.1 Simulations
3.3.2 Scenario 1: comparison for small K covariance matrix
3.3.3 Scenario 2: comparison for large K covariance matrix
3.3.4 Scenario 3: simulation analysis under model misspecification
3.4 Data Examples
3.4.1 Proteomics data analysis
3.4.2 Brain imaging data analysis
3.5 Discussion

Chapter 4: Modeling Multivariate Outcomes with Interconnected Modules: Evaluating the Impact of Alcohol Intake on Plasma Metabolomics
4.1 Introduction
4.2 Methodology
4.2.1 A simultaneous autoregressive regression model
4.2.2 The MAUD
4.2.3 Estimation of the MAUD parameters
4.2.4 Inference about the MAUD parameters
4.3 Simulation Studies
4.3.1 Data generation
4.3.2 Evaluation of estimation of the scaled autoregressive dependence parameters by the MAUD
4.3.3 Evaluation of statistical inference about the regression coefficients by the MAUD
4.3.4 Misspecification analysis of the MAUD
4.4 Investigation of the Effect of Alcohol Intake on Plasma Metabolomics
4.5 Discussion

Chapter 5: Semi-Confirmatory Factor Analysis for Multivariate Data with Interconnected Community Structures
5.1 Introduction
5.2 Methodology
5.2.1 A confirmatory factor analysis model
5.2.2 The SCFA model
5.2.3 The estimation procedure
5.3 Simulation Studies
5.3.1 Data generation procedure
5.3.2 Study 1: performance of parameter estimation
5.3.3 Study 2: performance of factor scores estimation
5.3.4 Study 3: misspecification analysis
5.4 Real Application
5.5 Discussion

Chapter 6: Conclusions

Appendix A: Supplementary Materials for Chapter 2
A.1 Technical Proofs
A.1.1 Proofs in Section 2.2
A.1.2 Proofs in Section 2.3 and Section 2.4

Appendix B: Supplementary Materials for Chapter 3
B.1 Extra Data Examples
B.2 Exact Covariance Estimators for θ̃
B.3 Technical Proofs
B.3.1 Proof of Lemma 3.2.1
B.3.2 Proof of Corollary 3.2.1
B.3.3 Derivations of the maximum likelihood estimator and its property
B.3.4 Proof of Theorem 3.2.1
B.3.5 Proof of Corollary 3.2.2
B.3.6 Proofs of Corollary 3.2.3 and Corollary B.2.1
B.3.7 Proof of Theorem 3.2.2
B.4 Extra Simulation Studies
B.4.1 Extra simulation studies in Scenario 1
B.4.2 Extra simulation study in Scenario 3

Appendix C: Supplementary Materials for Chapter 4
C.1 Definition
C.2 Properties of UB-Matrices
C.3 Plug-In Estimators
C.4 Technical Conditions and Proofs
C.5 The Case of $\Sigma_\epsilon \neq I_R$

Appendix D: Supplementary Materials for Chapter 5
D.1 Technical Proof
D.1.1 Proof of Corollary 5.2.1

Bibliography

List of Tables

3.1 Estimation results (×100) for $a_{0,kk}$ and $b_{0,kk'}$ in Scenario 1 under n = 50, 100, or 150, where bias denotes the average bias, MCSD denotes the Monte Carlo standard deviation, ASE denotes the average standard errors, and CP denotes the coverage probability based on a 95% Wald-type confidence interval.

4.1 Estimation results of γ under n = 100, where "bias" denotes the average of estimation bias, "MCSD" denotes the Monte Carlo standard deviation, "ASE" denotes the average asymptotic standard error, and "95% CP" denotes the coverage probability based on a 95% Wald-type confidence interval.

4.2 Statistical results of the effect of alcohol intake frequency on the biomarkers associated with the lipoprotein at level α = 0.05, where "IDL" refers to the intermediate-density lipoprotein, "VLDL" refers to the very-low-density lipoprotein, "LDL" refers to the low-density lipoprotein, "HDL" refers to the high-density lipoprotein, and + and − refer to the signs of estimated regression coefficients.

5.1 Estimation results of $A = \operatorname{diag}(a_{11}, a_{22}, a_{33})$ and $B = (b_{kk'})$ with $b_{k'k} = b_{kk'}$ for k ≠ k′ in Study 1 by using the proposed method under various n and p, where "bias" denotes the average of estimation bias, "MCSD" denotes the Monte Carlo standard deviation, "ASE" denotes the average asymptotic standard error, and "95% CP" denotes the coverage probability based on a 95% Wald-type confidence interval. The computational times of 100 replicates are around 0.05 seconds and 0.07 seconds for p = 50 and p = 100, respectively.

5.2 Estimation results of $A = \operatorname{diag}(a_{11}, a_{22}, a_{33})$ and $B = (b_{kk'})$ with $b_{k'k} = b_{kk'}$ for k ≠ k′ in Study 1 by using the lavaan package under various n and p, where "bias" denotes the average of estimation bias, "MCSD" denotes the Monte Carlo standard deviation, and "ASE" denotes the average asymptotic standard error. The computational times of 100 replicates are around 97 seconds and 1776 seconds for p = 50 and p = 100, respectively.
5.3 Euclidean losses $\sum_{i=1}^{n} \|\hat{f}_i - f_i\|$ in Study 2 by using a standard SCFA model, CFA model, and EFA model, respectively, under various n and p, based on 100 replicates.

5.4 Misspecification analysis results of $A = \operatorname{diag}(a_{11}, \ldots, a_{33})$ and $B = (b_{kk'})$ with $b_{k'k} = b_{kk'}$ for k ≠ k′ in Study 3 by using the proposed method under various σ and p, where "bias" denotes the average of estimation bias, "MCSD" denotes the Monte Carlo standard deviation, "ASE" denotes the average asymptotic standard error, and "95% CP" denotes the coverage probability based on a 95% Wald-type confidence interval.

5.5 Estimation of the correlation matrix of 227 combinations of neurometabolites and brain regions, where "SE" denotes the estimated standard error, "95% CI" denotes the 95% Wald-type confidence interval, and † denotes a 95% CI containing 0.

B.1 Estimation results of the total number of rejections R(t), the number of correct rejections S(t), and the false discovery proportion (FDP) using the true covariance matrix, the POET estimator, and the proposed estimator.

List of Figures

1.1 The plots in the first column are the heat maps of sample correlation matrices for the raw datasets. The plots in the second column are the heat maps of sample correlation matrices with reordered features/variables using community detection algorithms. We call these structures interconnected community structures with (generalized) well-organized blocks (if singletons exist). The plots in the third column are the heat maps of sample correlation matrices with features/variables that are extracted from the reordered ones. We call these structures interconnected community structures with well-organized blocks. The plots in the fourth column are the heat maps of population correlation matrices that are assumed to have interconnected community structures with uniform blocks.

2.1 Illustration of the block Hadamard product representation of a UB matrix $\Sigma[A, B, p] = A \circ I[p] + B \circ J[p]$, where $p = (2, 3)^\top$, K = 2, p = 5, each square represents an element of $\Sigma[A, B, p]$, and different colors represent different values.

3.1 A: top, heat map of the sample correlation matrix calculated by a K-medoids clustering algorithm for Spellman's yeast genome study, showing 8 by 8 well-organized blocks (without singletons) in the structure; bottom, an assumed population correlation matrix with an 8 by 8 uniform-block structure. B and C: top, heat maps of the sample correlation matrices calculated by a network detection algorithm for proteomics and brain imaging datasets, respectively, showing the block patterns (with singletons); middle, heat maps in the black frames in the top parts, showing 7 by 7 and 5 by 5 well-organized blocks respectively in the structure; bottom, assumed population correlation matrices with a 7 by 7 and a 5 by 5 uniform-block structure, respectively. D: top, illustration of a network model with 5 communities; bottom, a corresponding population matrix with a 5 by 5 uniform-block structure.

3.2 Results of the losses in the Frobenius and spectral norms and the execution time for the proposed covariance- and precision-matrix estimators and the conventional estimators in simulation studies.
The proposed estimators outperform the competitors in terms of smaller losses and shorter computational times.

3.3 A: the first two are the heat maps of the sample correlation matrices for the proteomics dataset; the third one exhibits the estimates of $b_{0,kk'}$ (their standard errors are between 0.03 and 0.07; for each k, the estimate of $a_{0,kk}$ is one minus the estimate of $b_{0,kk}$ with standard errors around 0.01); and the fourth one contains the confidence intervals for $b_{0,kk'}$ (+ refers to a 95% confidence interval on the right of 0, − refers to a 95% confidence interval on the left of 0, and 0 refers to a 95% confidence interval containing 0). B: the first two are the heat maps of the sample correlation matrices for the brain imaging dataset; the third one exhibits the estimates of $b_{0,kk'}$ (their standard errors are between 0.07 and 0.11; for each k, the estimate of $a_{0,kk}$ is equal to one minus the estimate of $b_{0,kk}$ with standard errors around 0.01); and the fourth one contains the confidence intervals for $b_{0,kk'}$ (+ refers to a 95% confidence interval on the right of 0, − refers to a 95% confidence interval on the left of 0, and 0 refers to a 95% confidence interval containing 0).

4.1 A: 249 by 249 sample correlation matrix (of residuals) calculated from a subset of raw NMR data (Ritchie et al., 2023); B: an interconnected community structure with generalized (with singletons) well-organized blocks in the sample correlation matrix provided by a community detection algorithm (Chen et al., 2023); C: a 170 by 170 sample correlation matrix (of residuals) extracted from the black frame in B, which is called an interconnected community structure with well-organized blocks; D: a population correlation matrix with an interconnected community structure with (5 by 5) uniform blocks; E: a partition-size vector $\ell = (L_1, L_2)^\top = (2, 3)^\top$ with R = 5 and G = 2; F: an illustration of the block Hadamard product representation $N(A, B, \ell) = A \circ I(\ell) + B \circ J(\ell)$; G: an illustration of a uniform-block matrix $N(A, B, \ell)$, where the cells with different colors in the matrices represent elements with different values, and the cells with different colors in ℓ represent features in different communities.

4.2 ROC curves under various levels of effect size κ for different covariance matrix estimators, different statistical models, and different noise levels of covariance matrices.

4.3 Forest plot of 95% confidence intervals for the regression coefficients of 249 biomarkers, where biomarkers in black refer to the significant ones in 5 communities, biomarkers in blue refer to the significant singletons, and biomarkers in grey refer to the non-significant ones.

5.1 Illustration of the workflow for the SCFA, CFA, and EFA models: we first apply a community detection algorithm to the raw data; if the interconnected community structure is detected, then we use the SCFA model; otherwise, we consider the conventional EFA or CFA models.

5.2 Real examples of the interconnected community structures (the first row) and the corresponding assumed population covariance matrices (the second row).
A: the sample and assumed population covariance matrices for a seed quality study (Perrot-Dockès et al., 2022); B: the sample and assumed population covariance matrices for a nuclear magnetic resonance study (Ritchie et al., 2023); C: the sample and assumed population covariance matrices for an echo-planar spectroscopic imaging study (Chiappelli et al., 2019); D and E: the sample and assumed population covariance matrices for an environmental study involving exposome and metabolites (ISGlobal, 2021).

B.1 Heat maps for exposome and metabolite data in an environmental research study.

B.2 Box plots of the off-diagonal entries in the diagonal blocks and of all entries in the diagonal blocks.

Chapter 1: Introduction

1.1 Estimation of Large Covariance Matrix

Technological innovation has had a profound impact on scientific discoveries by integrating the analysis of large-scale datasets into various fields. Statistically, the covariance matrix or the precision matrix (i.e., the inverse of a covariance matrix) is crucial for understanding the intricate relationships among massive, dependent covariates. Thus, considerable attention has been devoted to the use of an appropriate covariance (or precision or correlation) matrix estimator. A practical issue arises from the fact that the sample size n in research is often inadequate in comparison to the number of covariates, say p. When the number of covariates exceeds the sample size, i.e., p > n, the sample covariance matrix, which is the natural choice of covariance matrix estimator in the case p < n, fails to be a reliable estimator due to the absence of positive definiteness (Dykstra, 1970) and the inconsistency of its eigenvalues and eigenvectors (Johnstone, 2001; Johnstone and Lu, 2009; Johnstone and Paul, 2018). Consequently, Fan (2005) asserted that it is fundamentally challenging to obtain an appropriate covariance estimator without imposing a structural assumption on a large covariance matrix.

1.2 Classical Covariance Structures

Attention has been devoted to structured or patterned matrices, not only because they hold significance in theoretical analyses of multivariate models but also because they serve as fundamental tools in numerous applications. Many notable instances of structured matrices have found applications in multivariate analysis.

One straightforward example involves statistically testing a given (known) covariance matrix based on multivariate normal observations (Korin, 1968). Another example is the sphericity structure adopted by Mauchly (1940), where the diagonal entries are identical and the off-diagonal entries are zero. Wilks (1946) then proposed an intraclass (also called uniform, or complete symmetry) structure, characterized by equal diagonal entries and equal off-diagonal entries. Wilks developed likelihood ratio tests (LRTs) to statistically evaluate hypotheses related to population means or population covariance structures. The estimation and hypothesis testing problems under uniform covariance structures were further explored by Roy and Murthy (1960); Geisser (1964); Aitkin et al. (1968); Haq (1974); Mathai and Katiyar (1979); Clement et al. (1981); Bhoj (1987), and others. Building upon Wilks' uniform structure, Votaw (1948) introduced two types of compound symmetry structures, wherein the blocks exhibit uniform structures, and derived the LRTs for the corresponding hypotheses.
The exact distributions of the LRTs were examined by Consul (1968) and Mathai and Rathie (1970). From a symmetry perspective, matrices with Toeplitz or circular Toeplitz structures have proven essential in various applications, including physics, mathematics, and signal processing (Olkin and Press, 1969; Olkin, 1972).

Many classical covariance structures can be generalized to linearly structured covariance matrices, wherein only a few covariance parameters are unknown (Srivastava, 1966; Anderson, 1969; Anderson et al., 1970; Anderson, 1973). Furthermore, reducible and totally reducible matrices were investigated by Rogers and Young (1975), Young (1976), and Sinha and Wieand (1979).

In addition to parameterizing a covariance matrix that is unknown up to several parameters, another approach designs the covariance matrix by adopting blocks in the covariance structure. For instance, Afifi and Elashoff (1969) partitioned the covariance matrix according to dichotomous and continuous variables. Block versions of Wilks' uniform structure were discussed by Fleiss (1966), Arnold (1973), and Rogers and Young (1974), while block forms of Votaw's compound symmetry structures were extended by Szatrowski (1982) and Žežula et al. (2018).

1.3 Recent Developments of Covariance Structures

Recent statistical methods have made successful advances in addressing the estimation problem for high-dimensional structured covariance matrices within the asymptotic framework where the number of covariates p grows with the sample size n. For example, a shrinkage method shrinks the sample covariance matrix toward the identity matrix (Ledoit and Wolf, 2004). Several penalty methods have also been established to estimate large-scale covariance and precision matrices (Friedman et al., 2008; Rothman et al., 2008; Lam and Fan, 2009; Ravikumar et al., 2011). Under the assumption of bandability, i.e., the entries of the covariance matrix decrease to zero moving away from the diagonal, the banding and (its smooth version) tapering covariance estimators have been proposed (Wu and Pourahmadi, 2003; Bickel and Levina, 2008a; Bickel and Gel, 2011). Additionally, various thresholding covariance estimators, along with their theoretical properties such as optimal convergence rates under different matrix norms, have been studied (Karoui, 2008; Bickel and Levina, 2008b; Rothman et al., 2009; Cai and Liu, 2011; Cai and Yuan, 2012).

However, the sparsity assumption may be violated in many real-life applications, for instance, in studies of financial returns and house prices (Fan et al., 2016) and of interactive features in biomedical data (Chen et al., 2018). The spiked sparse covariance matrix has been explored by Johnstone (2001). Alternatively, the conditional sparsity assumption, i.e., a structural assumption combining sparsity with a low-rank approximation, has been proposed, and large covariance matrices have been consistently estimated using multi-factor models (Fan et al., 2008, 2013, 2018). Under another non-sparsity assumption, i.e., the Toeplitz structure (a special case of the bandable covariance structure), Cai et al. (2013) and Cai et al. (2016) studied the problem of estimating large covariance matrices and the optimal convergence rates. Expository literature on structured covariance matrix estimation can be found in Pourahmadi (2013), Wainwright (2019), and Fan et al. (2020).
1.4 Interconnected Community Structure

The growing availability of community detection, network division, and clustering techniques has led to enormous scientific discoveries, as well as challenges, in many studies involving large-scale networks in fields such as biology, biomedicine, plant science, computer science, finance, and social networks (Newman, 2006). Characterizing the structures or latent patterns in networked systems is crucial, as they may quantitatively describe the complex interactions among high-dimensional features or variables (Wu et al., 2021). Novel discoveries based on structured networks could deepen our understanding of the scientific mechanisms behind them; see various real examples in He et al. (2015), He et al. (2019), Pal et al. (2020), and Perrot-Dockès et al. (2022). Conventional community detection methods typically impose an independence assumption that the latent communities are uncorrelated, resulting in block-diagonal structures in networks, particularly when the number of features in the network is extremely high (Zhao, 2017; Lee and Wilkinson, 2019). However, modeling networks with independent communities may mislead subsequent statistical analysis due to oversimplification. Alternatively, an interconnected community structure is much more flexible for network analysis, allowing non-null connections among features at the community level. In other words, we may add non-null off-diagonal blocks to the structure of networks to represent correlations among communities. Wu et al. (2021) examined the relationships between independent communities and interconnected communities, as well as between overlapping communities and interconnected communities.

1.5 A New Interconnected Community Structure with Uniform Blocks

In the present dissertation, we propose an interconnected community structure that exists in a wide range of high-dimensional datasets. This structure can be implemented in population covariance matrices, correlation matrices, or weighted adjacency matrices in various fields, including genetics, proteomics, brain imaging, and RNA expression data, among many others. From a sample perspective, an interconnected community structure with well-organized blocks is widely observed. It exhibits several characteristics: (1) it is latent, meaning that network or community detection algorithms need to be employed to analyze the raw data in a preliminary study; (2) it is non-sparse, i.e., the elements of the covariance or correlation matrix have small but non-zero values; (3) it demonstrates an almost constant-valued block form, where the elements within each block are nearly identical, exhibiting low variability; and (4) it may contain singletons or isolated nodes. When singletons are identified, we refer to the proposed structure as an interconnected community structure with generalized well-organized blocks. In other words, we can form an interconnected community structure with well-organized blocks from a generalized version by extracting the features that are reordered by community detection algorithms and excluding the singletons. The population versions of the interconnected community structures with (generalized) well-organized blocks are referred to as interconnected community structures with (generalized) uniform blocks.

Yeast Genome Study. One example of the interconnected community structure with well-organized blocks (without singletons) is observed in the yeast genome study (Spellman et al., 1998).
We plot a heat map of the sample correlation matrix for 724 selected genes in Figure 1.1(B), where a K-medoids clustering algorithm was applied to the raw dataset; see Figure 1.1(A). Figure 1.1(C) depicts a population correlation matrix assumed to have 8 by 8 uniform blocks.

Proteomics Study. Another example is from a proteomics study (Yildiz et al., 2007), shown in Figure 1.1(D). The sample correlation matrix is analyzed, revealing an interconnected community structure with generalized well-organized blocks among 184 protein features using the NICE algorithm (Chen et al., 2018) in Figure 1.1(E). Extracting the structure without singletons results in a sample correlation matrix with 107 protein features, shown in Figure 1.1(F). This suggests a population correlation matrix with the corresponding uniform-block structure in Figure 1.1(G).

Seed Quality Study. A 923 by 923 sample correlation matrix is calculated from raw seed quality data (Perrot-Dockès et al., 2022), plotted in Figure 1.1(H). An interconnected community structure (without singletons), as shown in Figure 1.1(I), is revealed in the sample correlation matrix by a hierarchical clustering algorithm. This implies a population correlation matrix with the corresponding 7 by 7 uniform-block structure; see the illustration in Figure 1.1(J).

NMR Study. We observe a 249 by 249 sample correlation matrix calculated from a subset of raw NMR data (Ritchie et al., 2023) in Figure 1.1(K). An interconnected community structure with generalized well-organized blocks is revealed in the sample correlation matrix by a community detection algorithm (Chen et al., 2023); see Figure 1.1(L). A 170 by 170 sample correlation matrix (of residuals) is extracted in Figure 1.1(M), which implies a population correlation matrix with the corresponding 5 by 5 uniform-block structure in Figure 1.1(N).

Exposome and Metabolites Study. Another real-data example of the pattern of well-organized blocks in sample correlation matrices is found in a study involving the exposome and metabolites (ISGlobal, 2021). Figures 1.1(O) and (S) are the heat maps of the correlation matrices among exposome and metabolite features, respectively. In this study, there are 169 exposome variables and 221 metabolite variables for 1192 subjects. In particular, Figure 1.1(P) contains the heat map of the sample correlation matrix calculated by the clustering method (Wu et al., 2021) for the exposome variables, while Figure 1.1(T) contains the heat map of the sample correlation matrix calculated by the clustering method for the metabolite variables. Omitting the singletons, 89 out of 169 exposome variables are identified to form 9 by 9 well-organized blocks (Figure 1.1(Q)), and 141 out of 221 metabolite variables (Figure 1.1(U)) are identified to form 7 by 7 well-organized blocks. We may assume that their population correlation matrices have block forms, as shown in Figures 1.1(R) and (V), respectively.

EPSI Study. The presence of an interconnected community pattern is detected in the sample correlation matrix for an echo-planar spectroscopic imaging (EPSI) dataset, including 445 combinations of neurometabolites and regions-of-interest (ROIs), detected by the NICE algorithm (Chen et al., 2018) and shown in Figures 1.1(W) and (X). Among these 445 combinations, the sample correlation matrix of 227 combinations is extracted, implying that the underlying population correlation matrix has the structure of uniform blocks in Figures 1.1(Y) and (Z).

Other Studies.
In addition to the above studies, analogs of the interconnected community pattern were uncovered in a cutaneous melanoma dataset from a genome-wide association study (He et al., 2015), a neuroimaging activation and connectivity dataset, a DNA methylation dataset (Chen et al., 2016), and a gene expression profiling dataset from a host peripheral blood study (Chen et al., 2018). Recently, it was extracted from a multiple myeloma dataset (He et al., 2019), a citation network study (Pal et al., 2020), and an RNA-seq dataset from the cancer genome atlas study of acute myeloid leukemia (Wu et al., 2021).

Throughout the current research, we concentrate on population covariance matrices having interconnected community structures with uniform blocks, as illustrated in Figure 1.1(C), (G), (J), (N), (R), (V), and (Z). The proposed interconnected community structure with uniform blocks has a two-fold advantage. On the one hand, it dramatically reduces the number of covariance parameters, from p(p+1)/2 to K + (K+1)K/2, where K is the number of diagonal blocks; for example, with K = 7 communities among p = 107 features, as in the proteomics study of Chapter 3, this is a reduction from 5778 to 35 parameters. At the same time, it retains adequate flexibility to handle arbitrary dependency between covariates. On the other hand, the non-sparse uniform correlations between blocks might represent some stability (e.g., with respect to time or location in applications (Votaw et al., 1950)) and symmetry in experimental quantities, thereby improving interpretability.

1.6 Organization

The rest of the dissertation is organized as follows. The definitions and properties of uniform-block matrices and their applications in hypothesis testing are presented in Chapter 2. An estimation procedure for a covariance matrix that is assumed to have the uniform-block structure is developed under both low- and high-dimensional frameworks in Chapter 3. Multivariate outcomes having an interconnected community structure with uniform blocks are analyzed using autoregressive regression models in Chapter 4. A semi-confirmatory factor model with a uniform-block interconnected community structure is studied, and an estimation procedure is proposed for the unknown matrices and factor scores, in Chapter 5. All technical proofs and supplementary materials are appended at the end of the dissertation.

Figure 1.1: The plots in the first column are the heat maps of sample correlation matrices for the raw datasets. The plots in the second column are the heat maps of sample correlation matrices with reordered features/variables using community detection algorithms. We call these structures interconnected community structures with (generalized) well-organized blocks (if singletons exist). The plots in the third column are the heat maps of sample correlation matrices with features/variables that are extracted from the reordered ones. We call these structures interconnected community structures with well-organized blocks. The plots in the fourth column are the heat maps of population correlation matrices that are assumed to have interconnected community structures with uniform blocks.

Chapter 2: A New Representation of Uniform-Block Matrix and Applications

2.1 Introduction

Covariance matrices with specific structures or patterns have been extensively studied for their crucial roles in theoretical and practical applications of multivariate analysis. Numerous examples of structured covariance matrices have been employed in multivariate analysis.
Mauchly (1940) proposed a spherical covariance matrix with identical positive variance parameters along the diagonal and zero correlation parameters off the diagonal. Wilks (1946) extended the sphericity structure to have equal non-zero values for the off-diagonal correlation parameters, terming it the uniform (intraclass, or complete symmetry) structure in an application to parallel forms of a test in educational studies. Furthermore, Votaw (1948) expanded Wilks' complete symmetry structure by incorporating the interchangeability of mutually exclusive subsets of variables, introducing two types of compound symmetry covariance structures, which were utilized in medical experiments (Votaw et al., 1950). In addition to the spherical and intraclass symmetric structures, Olkin and Press (1969) proposed another covariance structure known as circular symmetry, which was applied in physical studies and time series analysis.

A number of technological breakthroughs have led to significantly large-dimensional variables in real-world practice, necessitating more complex covariance structures to reduce dimensionality. Customarily, various covariance structures have been developed, including bandability (Wu and Pourahmadi, 2003; Bickel and Levina, 2008a), sparsity (Karoui, 2008; Bickel and Levina, 2008b; Cai and Liu, 2011), and the combination of sparsity and low rank (Fan et al., 2008, 2011). Alternatively, to address the high dimensionality problem, covariance matrices can be assumed to have a block structure, where the number of unknown parameters is remarkably smaller than the original dimension. For instance, Rogers and Young (1974) generalized Wilks' intraclass structure to an arbitrary order in an educational study, such that all diagonal blocks have the same intraclass form, as do all off-diagonal blocks. Szatrowski (1976) studied covariance matrices with block compound symmetry structures, including type I and type II, and applied them to the analysis of educational testing data (Szatrowski, 1982). Olkin (1972) introduced circular symmetry structures in blocks and proposed a more general structure known as block circular symmetry for applications in physics. Roy and Leiva (2011), Roy et al. (2015), Roy et al. (2016), and Žežula et al. (2018) have extensively investigated a block structure referred to as blocked compound symmetry or equicorrelation (partition) (Leiva, 2007; Roy and Leiva, 2008), and applied it in brain imaging and bone densitometry studies. In this chapter, our focus is on a covariance or correlation matrix with a particular block structure that is commonly observed in empirical applications.

We concentrate on investigating a specific block pattern called the uniform-block (UB) structure, motivated by its numerous real-world applications in Chapter 1. Specifically, the UB structure is characterized by the diagonal and off-diagonal elements within each diagonal submatrix being equal to two constants and all elements within each off-diagonal submatrix being equal to a constant. A partitioned matrix having a uniform-block structure is called a uniform-block matrix. The concept of UB structures is not completely new and has been introduced by various researchers in different contexts. For instance, Geisser (1963) referred to it as the uniform case of order m and derived an information test statistic for the population mean vector when the covariance matrix of a normal population has a UB structure of order m = 1 or 2.
Morrison (1972) extended Geisser's information test statistic to a more general order. Huang and Yang (2010) investigated random sampling issues in the presence of a UB structure in the correlation matrix. Cadima et al. (2010) referred to it as a k-group block structure and studied the eigendecomposition of correlation matrices with a UB structure. Roustant and Deville (2017) named a correlation matrix with a UB structure a parametric block correlation matrix with p blocks, and provided necessary and sufficient conditions for its positive definiteness. Roustant et al. (2020) investigated Gaussian process regression problems using the name generalized compound symmetry block covariance matrices for UB matrices. Recently, Archakov and Hansen (2022) examined this structured matrix, referring to it as a block matrix with block partition, and provided canonical forms for both symmetric and nonsymmetric cases.

However, to the best of our knowledge, there have been limited comprehensive studies of the algebraic properties of UB matrices, which restricts their applications in various fields, including statistics, biometrics, economics, finance, and others. For example, Geisser (1963) was the first to derive the null distributions of the information test statistics concerning the population mean vector(s) for both single and multiple samples, given a covariance matrix with a 2 by 2 UB structure. Specifically, Geisser (1963) derived an analogous version of Hotelling's (generalized) $T^2$-statistic for the population mean vector based on a single normal sample, and an analogous version of Hotelling's (generalized) $T_0^2$-statistic, also known as the Hotelling-Lawley trace (Lawley, 1938; Hotelling, 1947, 1951), for testing the equality of population mean vectors based on multiple normal samples. Under the null hypotheses, both of Geisser's information test statistics follow distributions identical to linear combinations of independent F-variates. Although Geisser (1963) and Morrison (1972) also extended these results to a general case with an arbitrary number of diagonal blocks, proofs were omitted.

In this study, we present the algebraic properties of UB matrices through a novel block Hadamard product representation. In essence, given a vector consisting of the block sizes, a UB matrix can be uniquely determined by a diagonal matrix and a symmetric matrix of much smaller dimensions. Moreover, these two lower-dimensional matrices (together with the block-size vector) can be viewed as the "coordinates" of a UB matrix, since many important algebraic calculations on UB matrices depend only on their "coordinate" matrices. As a result, this representation greatly simplifies algebraic operations, including the power computation, inverse calculation, eigenvalue determination, and determinant evaluation of a UB matrix, by leveraging its "coordinate" matrices. As an application in statistics, we revisit and rigorously establish the exact null distributions of Geisser's information test statistics for a general order, including the single- and multiple-sample cases.

We organize the remainder of this chapter as follows. Section 2.2 presents the definitions and properties of UB matrices. Section 2.3 and Section 2.4 demonstrate the exact null distributions of Geisser's information test statistics for the one-sample and multiple-sample cases, respectively. Lastly, we summarize our findings and provide remarks and discussion in Section 2.5.
Technical proofs are given in Chapter A. Throughout this chapter, let ⊤ denote the transpose of a vector or matrix. Let $I_n, J_n \in \mathbb{R}^{n \times n}$ denote the identity matrix and the all-one matrix, respectively. Let $0_{n \times m}, 1_{n \times m} \in \mathbb{R}^{n \times m}$ denote the all-zero matrix and the all-one matrix, respectively. Let diag(·) and Bdiag(·) denote a diagonal matrix and a block-diagonal matrix, respectively. Let tr(·) and det(·) denote the trace and determinant of a square matrix, respectively. Let sum(·) denote the sum of all elements of a matrix. Let $\operatorname{corr}(\Sigma) = \operatorname{diag}^{-1/2}(\sigma_{11}, \ldots, \sigma_{pp}) \times \Sigma \times \operatorname{diag}^{-1/2}(\sigma_{11}, \ldots, \sigma_{pp})$ denote the correlation matrix of a covariance matrix Σ with diagonal elements $\sigma_{11}, \ldots, \sigma_{pp}$.

2.2 Uniform-Block Structure and Uniform-Block Matrix

In this section, we begin by defining a uniform-block structure and matrix. Next, we introduce a block Hadamard product representation for uniform-block matrices, which unveils their algebraic properties.

Definition 2.2.1 (partition-size vector and partitioned matrix by a partition-size vector). Given a p by p matrix $N \in \mathbb{R}^{p \times p}$ and a positive integer $K \in \mathbb{Z}_+$ such that K < p, we define: (1) a column vector $p = (p_1, \ldots, p_K)^\top \in \mathbb{Z}_+^K$ is a partition-size vector if $p_k > 1$ for every k and $p = p_1 + \cdots + p_K$; (2) the K by K partitioned matrix $(N_{kk'})$ of N is the partitioned matrix of N by p if the (k, k′)-th block $N_{kk'}$ has dimensions $p_k$ by $p_{k'}$ for $k, k' = 1, \ldots, K$.

Definition 2.2.2 (uniform-block structure and matrix). Given a partition-size vector $p = (p_1, \ldots, p_K)^\top$ and the K by K partitioned matrix $(N_{kk'})$ of a symmetric matrix N by p, we define: (1) the structure of $(N_{kk'})$ is a uniform-block structure if there exist real numbers $a_{kk}$ and $b_{kk'}$ satisfying that the diagonal block $N_{kk} = a_{kk} I_{p_k} + b_{kk} J_{p_k}$ for every k and the off-diagonal block $N_{kk'} = b_{kk'} 1_{p_k \times p_{k'}}$ with $b_{k'k} = b_{kk'}$ for every k ≠ k′; (2) the partitioned matrix $(N_{kk'})$ is a uniform-block matrix if it has a uniform-block structure. Furthermore, let $N[A, B, p] = (N_{kk'})$ denote this uniform-block matrix, where $A = \operatorname{diag}(a_{11}, \ldots, a_{KK})$ is a K by K diagonal matrix and $B = (b_{kk'})$ is a K by K symmetric matrix with $b_{k'k} = b_{kk'}$ for every k ≠ k′.

Remark (non-symmetric uniform-block structure and matrix). We impose the condition of symmetry on a uniform-block structure or matrix, as defined in Definition 2.2.2, because we will be considering covariance or correlation matrices with this structure. However, it is worth noting that a non-symmetric uniform-block structure and matrix can also be defined by removing the condition $b_{k'k} = b_{kk'}$ for every k ≠ k′, i.e., allowing B to be an arbitrary K by K matrix. Nonetheless, throughout this chapter, unless explicitly stated otherwise, we refer to a uniform-block structure or matrix as symmetric.

Following Definition 2.2.2, we introduce two important instances of UB matrices: by a pre-determined partition-size vector $p = (p_1, \ldots, p_K)^\top$, the partitioned matrices of an identity matrix $I_p$ and an all-one matrix $J_p$ are UB matrices. For simplicity, we will write $I[p]$ and $J[p]$ instead of $I[I_K, 0_{K \times K}, p]$ and $J[0_{K \times K}, J_K, p]$ throughout the chapter:

$$I[p] = I[I_K, 0_{K \times K}, p] = I_p = \begin{pmatrix} I_{p_1} & 0_{p_1 \times p_2} & \cdots & 0_{p_1 \times p_K} \\ 0_{p_2 \times p_1} & I_{p_2} & \cdots & 0_{p_2 \times p_K} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{p_K \times p_1} & 0_{p_K \times p_2} & \cdots & I_{p_K} \end{pmatrix},$$

$$J[p] = J[0_{K \times K}, J_K, p] = J_p = \begin{pmatrix} J_{p_1} & 1_{p_1 \times p_2} & \cdots & 1_{p_1 \times p_K} \\ 1_{p_2 \times p_1} & J_{p_2} & \cdots & 1_{p_2 \times p_K} \\ \vdots & \vdots & \ddots & \vdots \\ 1_{p_K \times p_1} & 1_{p_K \times p_2} & \cdots & J_{p_K} \end{pmatrix}.$$
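For concreteness, here is the smallest interesting case written out (this worked example is our addition, matching the setting of Figure 2.1): with $p = (2, 3)^\top$, $A = \operatorname{diag}(a_{11}, a_{22})$, and $B = (b_{kk'})$, Definition 2.2.2 gives the 5 by 5 UB matrix

$$N[A, B, p] = \begin{pmatrix} a_{11}+b_{11} & b_{11} & b_{12} & b_{12} & b_{12} \\ b_{11} & a_{11}+b_{11} & b_{12} & b_{12} & b_{12} \\ b_{12} & b_{12} & a_{22}+b_{22} & b_{22} & b_{22} \\ b_{12} & b_{12} & b_{22} & a_{22}+b_{22} & b_{22} \\ b_{12} & b_{12} & b_{22} & b_{22} & a_{22}+b_{22} \end{pmatrix},$$

whose diagonal blocks are $a_{11} I_2 + b_{11} J_2$ and $a_{22} I_3 + b_{22} J_3$, and whose off-diagonal blocks equal $b_{12} 1_{2 \times 3}$ and its transpose.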
Using the notations $I[p]$ and $J[p]$, we propose the following novel block Hadamard product representation of a UB matrix, which greatly simplifies algebraic calculations involving UB matrices.

Lemma 2.2.1 (block Hadamard product representation of a UB matrix). Given a pre-determined partition-size vector $p = (p_1, \ldots, p_K)^\top$, suppose the K by K partitioned matrix $(N_{kk'})$ of a p by p symmetric matrix N by p is a UB matrix $N[A, B, p]$, where $A = \operatorname{diag}(a_{11}, \ldots, a_{KK})$ is a diagonal matrix and $B = (b_{kk'})$ is a symmetric matrix with $b_{k'k} = b_{kk'}$ for every k ≠ k′. Then,

$$N[A, B, p] = A \circ I[p] + B \circ J[p]$$

holds uniquely for A and B, where ◦ denotes the block Hadamard product satisfying that $A \circ I[p]$ is the block-diagonal matrix $\operatorname{Bdiag}(a_{11} I_{p_1}, \ldots, a_{KK} I_{p_K})$ and $B \circ J[p]$ is the symmetric block matrix $(b_{kk'} 1_{p_k \times p_{k'}})$.

Remark (block Hadamard product representation). The matrix operator ◦ can be regarded as a specialized form of block Hadamard product that is tailored for block matrices, as discussed in Horn et al. (1991) and Günther and Klotz (2012). We provide an illustration of Lemma 2.2.1 in Figure 2.1. A proof of Lemma 2.2.1 is available in Chapter A. From the proof, it is evident that the block Hadamard product representation holds (except for uniqueness) when $p_k = 1$ for some k. Furthermore, the representation also holds for non-symmetric uniform-block matrices.

Figure 2.1: Illustration of the block Hadamard product representation of a UB matrix $\Sigma[A, B, p] = A \circ I[p] + B \circ J[p]$, where $p = (2, 3)^\top$, K = 2, p = 5, each square represents an element of $\Sigma[A, B, p]$, and different colors represent different values.

By Lemma 2.2.1, the block Hadamard product representation is crucial for a UB matrix, as it provides an explicit expression involving the diagonal matrix A, the symmetric matrix B, and the partition-size vector p. Therefore, we use the notation $N[A, B, p]$ in Definition 2.2.2 (instead of the usual notation $N(A, B, p)$) throughout this chapter to emphasize the importance of this representation. As demonstrated below, A, B, and p are sufficient and necessary for determining the expressions for the power, inverse (if it exists), and eigenvalues of a UB matrix $N[A, B, p]$.

Corollary 2.2.1 (algebraic properties of UB matrices). Given a common partition-size vector $p = (p_1, \ldots, p_K)^\top$, suppose $N = N[A, B, p]$, $N_1 = N_1[A_1, B_1, p]$, and $N_2 = N_2[A_2, B_2, p]$ are UB matrices with K by K diagonal matrices $A = \operatorname{diag}(a_{11}, \ldots, a_{KK})$, $A_1$, $A_2$ and K by K symmetric matrices $B = (b_{kk'})$, $B_1$, $B_2$. Let $\Delta = A + B \times P \in \mathbb{R}^{K \times K}$ with $P = \operatorname{diag}(p_1, \ldots, p_K)$.

(1) (Addition/Subtraction) Suppose $N^* = N_1 \pm N_2$; then the partitioned matrix of $N^*$ by p is a UB matrix, denoted by $N^*[A^*, B^*, p]$, where $A^* = A_1 \pm A_2$ and $B^* = B_1 \pm B_2$.

(2) (Product) Suppose $N^* = N_1 \times N_2$; in general, $N^*$ is not a UB matrix. But if $N_1$ and $N_2$ commute, i.e., $N_1 \times N_2 = N_2 \times N_1$, then $N^*$ is a UB matrix, denoted by $N^*[A^*, B^*, p]$, where $A^* = A_1 \times A_2 = A^{*\top}$ and $B^* = A_1 \times B_2 + B_1 \times A_2 + B_1 \times P \times B_2 = B^{*\top}$. In particular:

(2-1) (Square) Suppose $N^* = N \times N$; then the partitioned matrix of $N^*$ by p is a UB matrix, denoted by $N^*[A^*, B^*, p]$, where $A^* = A \times A$ and $B^* = A \times B + B \times A + B \times P \times B$.

(2-2) (Power) Suppose $N^* = N \times \cdots \times N = N^m$ with integer m ≥ 2; then the partitioned matrix of $N^*$ by p is a UB matrix, denoted by $N^*[A^{(m)}, B^{(m)}, p]$, where $A^{(1)} = A$, $B^{(1)} = B$, $A^{(m')} = A^{(m'-1)} \times A$, and $B^{(m')} = A^{(m'-1)} \times B + B^{(m'-1)} \times A + B^{(m'-1)} \times P \times B$ for $m' = 2, \ldots, m$.
(3) (Eigenvalues) $N[A, B, p]$ has p real eigenvalues in total: $a_{kk}$ with multiplicity $(p_k - 1)$ for $k = 1, \ldots, K$, and the remaining K eigenvalues are identical to those of Δ.

(4) (Determinant) $N[A, B, p]$ has determinant $\left(\prod_{k=1}^{K} a_{kk}^{p_k - 1}\right) \times \det(\Delta)$.

(5) (Inverse) Suppose N is invertible and $N^* = N^{-1}$; then the partitioned matrix of $N^*$ by p is a UB matrix, denoted by $N^*[A^*, B^*, p]$, where $A^* = A^{-1}$ and $B^* = -\Delta^{-1} \times B \times A^{-1}$.

(6) (Canonical Form) Let $\bar{p}_0 = 0$ and $\bar{p}_k = \sum_{k'=1}^{k} p_{k'}$ for $k = 1, \ldots, K$ (so $\bar{p}_K = p$), and let $\lambda_j$ denote the j-th eigenvalue of $N[A, B, p]$, where $\lambda_1 = \cdots = \lambda_{\bar{p}_1 - 1} = a_{11}$, $\lambda_{\bar{p}_1 + 1} = \cdots = \lambda_{\bar{p}_2 - 1} = a_{22}$, ..., $\lambda_{\bar{p}_{K-1} + 1} = \cdots = \lambda_{\bar{p}_K - 1} = a_{KK}$, and the rest $\lambda_{\bar{p}_1}, \lambda_{\bar{p}_2}, \ldots, \lambda_{\bar{p}_K}$ are identical to the eigenvalues of Δ (in decreasing order). Thus, there exists a p by p orthogonal matrix Γ satisfying $\Gamma \times N[A, B, p] \times \Gamma^\top = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$, and Γ can be constructed from K Helmert submatrices and K row vectors as follows:

$$\Gamma = \begin{pmatrix} \tilde{H}_1 & 0_{(p_1-1) \times p_2} & \cdots & 0_{(p_1-1) \times p_K} \\ \xi_{1,1} 1_{1 \times p_1} & \xi_{1,2} 1_{1 \times p_2} & \cdots & \xi_{1,K} 1_{1 \times p_K} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{(p_K-1) \times p_1} & 0_{(p_K-1) \times p_2} & \cdots & \tilde{H}_K \\ \xi_{K,1} 1_{1 \times p_1} & \xi_{K,2} 1_{1 \times p_2} & \cdots & \xi_{K,K} 1_{1 \times p_K} \end{pmatrix},$$

where $\tilde{H}_k \in \mathbb{R}^{(p_k-1) \times p_k}$ is the submatrix of a standard Helmert matrix of order $p_k$ without the first row (Lancaster, 1965) and $\xi_k = (\xi_{k,1}, \xi_{k,2}, \ldots, \xi_{k,K})^\top \in \mathbb{R}^{K \times 1}$ is the eigenvector (normalized to unit length) of Δ corresponding to the eigenvalue $\lambda_{\bar{p}_k}$ for every k.

Remark (sufficient and necessary condition for positive definiteness). Let $P = \operatorname{diag}(p_1, \ldots, p_K)$. We observe that the term $\Delta = A + B \times P$ plays a critical role in determining the eigenvalues, determinant, and inverse of an invertible UB matrix $N[A, B, p]$. Although Δ is not symmetric in general, it has K real eigenvalues because $\Delta = (A P^{-1} + B) P$ is similar to the real symmetric matrix $P^{1/2} (A P^{-1} + B) P^{1/2}$ (they have the same characteristic polynomial), which has K real eigenvalues. Therefore, $N[A, B, p]$ is positive definite or invertible if and only if A is positive definite (i.e., $a_{kk} > 0$ for every k) and Δ has K positive eigenvalues.

Remark (quadratic subspace). Consider the trace as an inner product, and let $\mathcal{A}$ denote the finite-dimensional Hilbert space of p by p real symmetric matrices. A subspace $\mathcal{B}$ of $\mathcal{A}$ is said to be a quadratic subspace of $\mathcal{A}$ if $B \in \mathcal{B}$ implies $B^2 \in \mathcal{B}$ (Seely, 1971). By the square property in Corollary 2.2.1, the collection of all UB matrices having a common partition-size vector forms a quadratic subspace. Quadratic subspaces are useful in studying the completeness of minimal sufficient statistics in a family of multivariate normal distributions (Seely, 1971, 1977; Zmyślony, 1980). For example, Szatrowski (1980) explored the relationship between the quadratic subspace and the explicit representation of maximum likelihood estimators for covariance matrices in a normal model. Roy et al. (2016) proved the optimal properties of the unbiased estimator that they derived for estimating a blocked compound symmetry covariance matrix.

Remark (algebraic properties for non-symmetric UB matrices). Most results in Corollary 2.2.1 also hold for non-symmetric uniform-block matrices. Specifically, the sum, difference, and product of two non-symmetric UB matrices are still non-symmetric UB matrices with the same expressions for $A^*$ and $B^*$ as in Corollary 2.2.1. The determinant and inverse (if it exists) of a non-symmetric UB matrix are also given by the same expressions for $A^*$ and $B^*$. However, it is worth noting that although a non-symmetric UB matrix $N[A, B, p]$ still has p eigenvalues, i.e., $a_{kk}$ with multiplicity $(p_k - 1)$ for every k and the K eigenvalues of Δ, some of the K eigenvalues of Δ may be complex. Subsequently, we may rearrange the K Helmert submatrices in Γ below the remaining K row vectors, resulting in a block-diagonal canonical form for a non-symmetric UB matrix (see Theorem 1 in Archakov and Hansen (2022)). If Δ is diagonalizable, the canonical form will have a diagonal structure, and Γ remains the same as in Corollary 2.2.1, where $\lambda_{\bar{p}_1}, \ldots, \lambda_{\bar{p}_K}$ may be ordered by decreasing real parts.

The results in Corollary 2.2.1 highlight the following two-fold advantage of using the block Hadamard product representations of UB matrices. First, calculations on K by K matrices A and B can replace calculations on the larger p by p matrix N, where K is typically much smaller than p; e.g., a proteomics study has K = 7 and p = 107 and a brain imaging study has K = 5 and p = 227 in Chapter 3. This reduction in matrix size can significantly improve computational efficiency. Second, results involving the addition or subtraction of UB matrices with a common partition-size vector, as well as operations such as taking the square (or power) and computing the eigenvalues, determinant, and inverse (if it exists) of a UB matrix, can be expressed in terms of the "coordinates" A, B, and p. These results greatly facilitate the use of UB matrices in various fields of application. For example, in Chapter 3, we propose the best-unbiased covariance- and precision-matrix estimators when the number of diagonal blocks K is fixed, as well as a modified hard-thresholding covariance matrix estimator when K grows with the sample size.
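As a quick numerical illustration of these coordinate-based computations, the following sketch checks the eigenvalue, determinant, and inverse formulas of Corollary 2.2.1 against direct dense computations. This is our minimal example, assuming only numpy; the helper name `build_ub` and all numeric values are illustrative, not from the dissertation.

```python
# Minimal numerical check of Corollary 2.2.1 (our sketch; assumes numpy,
# and the helper name `build_ub` plus all numeric values are illustrative).
import numpy as np

def build_ub(A, B, p):
    """Assemble the p x p UB matrix N[A, B, p] = A o I[p] + B o J[p]."""
    K = len(p)
    N = np.block([[B[k, l] * np.ones((p[k], p[l])) for l in range(K)]
                  for k in range(K)])
    N[np.diag_indices_from(N)] += np.repeat(np.diag(A), p)  # add a_kk on the diagonal
    return N

p = np.array([2, 3, 4])                       # partition-size vector, so p = 9
A = np.diag([2.0, 3.0, 4.0])                  # diagonal "coordinate" matrix
B = np.array([[0.5, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.3, 0.7]])               # symmetric "coordinate" matrix
N = build_ub(A, B, p)
Delta = A + B @ np.diag(p)                    # Delta = A + B x P

# (3) Eigenvalues: a_kk with multiplicity p_k - 1, plus the K eigenvalues of Delta.
eigs = np.concatenate([np.repeat(np.diag(A), p - 1), np.linalg.eigvals(Delta).real])
assert np.allclose(np.sort(eigs), np.linalg.eigvalsh(N))

# (4) Determinant: (prod_k a_kk^(p_k - 1)) * det(Delta).
assert np.isclose(np.prod(np.diag(A) ** (p - 1)) * np.linalg.det(Delta),
                  np.linalg.det(N))

# (5) Inverse: coordinates A* = A^{-1} and B* = -Delta^{-1} B A^{-1}.
A_inv = np.linalg.inv(A)
B_inv = -np.linalg.solve(Delta, B) @ A_inv
assert np.allclose(build_ub(A_inv, B_inv, p), np.linalg.inv(N))
```

All three checks manipulate only the K by K coordinates (here K = 3) except when forming the dense matrix for verification, which is exactly the computational saving the corollary promises.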
The results in Corollary 2.2.1 highlight a two-fold advantage of using the block Hadamard product representations of UB matrices. First, calculations on the K by K matrices A and B can replace calculations on the larger p by p matrix N, where K is typically much smaller than p; e.g., a proteomics study in Chapter 3 has $K = 7$ and $p = 107$, and a brain imaging study has $K = 5$ and $p = 227$. This reduction in matrix size can significantly improve computational efficiency. Second, the results involving addition or subtraction of UB matrices with a common partition-size vector, as well as operations such as taking the square (or power) and computing the eigenvalues, determinant, and inverse (if it exists) of a UB matrix, can be expressed in terms of the "coordinates" A, B, and p. These results greatly facilitate the use of UB matrices in various fields of application. For example, in Chapter 3, we propose the best-unbiased covariance- and precision-matrix estimators when the number of diagonal blocks K is fixed, as well as a modified hard-thresholding covariance matrix estimator when K grows with the sample size.

Before proceeding to the hypothesis testing problems in the next sections, we specify the relationships between a covariance matrix and its precision and correlation matrices.

Corollary 2.2.2 (covariance matrix with a UB structure). Given a partition-size vector $p = (p_1, \ldots, p_K)^\top$, suppose $\Sigma = \Sigma[A, B, p]$ is a p by p positive definite covariance matrix with a uniform-block structure, where $A = \operatorname{diag}(a_{11}, \ldots, a_{KK})$ and $B = (b_{kk'})$ with $b_{k'k} = b_{kk'}$ for every $k \neq k'$. Then, the partitioned matrix of $\Theta = \Sigma^{-1}$ by p is a UB matrix, denoted by $\Theta[A_\Theta, B_\Theta, p]$, and the partitioned matrix of $\Xi = \operatorname{corr}(\Sigma)$ by p is a UB matrix, denoted by $\Xi[A_\Xi, B_\Xi, p]$, where
\[
\begin{cases}
A_\Theta = A^{-1} \\
B_\Theta = -\Delta^{-1} \times B \times A^{-1}
\end{cases},
\qquad
\begin{cases}
A_\Xi = C^{-1/2} \times A \times C^{-1/2} \\
B_\Xi = C^{-1/2} \times B \times C^{-1/2}
\end{cases},
\]
with $\Delta = A + B \times P$, $P = \operatorname{diag}(p_1, \ldots, p_K)$, $C = \operatorname{diag}(c_{11}, \ldots, c_{KK})$, and $c_{kk} = a_{kk} + b_{kk}$ for every k.
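Continuing the earlier sketch (same illustrative names, reusing ub_matrix, A, B, and p from above), the precision and correlation coordinates of Corollary 2.2.2 can be computed with K by K arithmetic only:

```python
import numpy as np

# Coordinates of Theta = Sigma^{-1} and Xi = corr(Sigma), per Corollary 2.2.2
P = np.diag(p)
Delta = A + B @ P
A_Theta = np.diag(1.0 / np.diag(A))
B_Theta = -np.linalg.inv(Delta) @ B @ A_Theta
C_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(A) + np.diag(B)))   # c_kk = a_kk + b_kk
A_Xi = C_inv_sqrt @ A @ C_inv_sqrt
B_Xi = C_inv_sqrt @ B @ C_inv_sqrt

Sigma = ub_matrix(A, B, p)
assert np.allclose(np.linalg.inv(Sigma), ub_matrix(A_Theta, B_Theta, p))
D = np.diag(1.0 / np.sqrt(np.diag(Sigma)))                     # corr(Sigma) = D Sigma D
assert np.allclose(D @ Sigma @ D, ub_matrix(A_Xi, B_Xi, p))
```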
2.3 Testing A Specific Mean for One-Sample

In the case where the number of diagonal blocks $K = 1$ or $K = 2$, Geisser (1963) proposed an information test statistic for testing a specific mean vector based on a multivariate normal sample and derived its exact null distribution in closed form. The distribution of Geisser's information test statistic under the null hypothesis is identical to the distribution of a sum of several independent F-variates. For the general case $K > 2$, however, Geisser (1963) provided an algorithm for calculating the information test statistic and explicitly formulated the exact null distribution, but omitted the proofs. In this section, we present the exact null distribution of Geisser's one-sample information test statistic using the notations of UB matrices: this exact null distribution is equivalent to the distribution of a linear combination of mutually independent F-variates, where the last variate is exactly Hotelling's $T^2$ statistic. Specifically, given p-dimensional normal vectors $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \Sigma[A, B, p])$, the null and alternative hypotheses are given by
\[
H_0 : \mu = \mu_0 \quad \text{versus} \quad H_1 : \mu \neq \mu_0, \tag{2.3.1}
\]
where the covariance matrix is known to have a UB structure, $A = \operatorname{diag}(a_{11}, \ldots, a_{KK})$ is an unknown diagonal matrix, $B = (b_{kk'})$ is an unknown symmetric matrix with $b_{k'k} = b_{kk'}$ for every $k \neq k'$, $p = (p_1, \ldots, p_K)^\top$ is a known partition-size vector, and $\mu_0 \in \mathbb{R}^p$ is a pre-determined vector. To guarantee the positive definiteness of $\Sigma[A, B, p]$, we assume that A is positive definite and that $\Delta = A + B \times P$ has positive eigenvalues only, with $P = \operatorname{diag}(p_1, \ldots, p_K)$.

Before deriving Geisser's information test statistic for a specific mean vector, we introduce closed-form estimators of A and B based on a multivariate normal sample. Let $\bar{X} = n^{-1}(X_1 + \cdots + X_n)$, $S = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^\top$, and $(S_{kk'})$ denote the sample mean, the (unbiased) sample covariance matrix, and the partitioned matrix of S by p, respectively. If the sample size is larger than the total number of unknown covariance parameters, i.e., $n > K + (K+1)K/2$, then we can obtain the best-unbiased estimators of A and B, denoted by $\widehat{A} = \operatorname{diag}(\widehat{a}_{11}, \ldots, \widehat{a}_{KK})$ and $\widehat{B} = (\widehat{b}_{kk'})$ with $\widehat{b}_{k'k} = \widehat{b}_{kk'}$ for every $k \neq k'$, respectively, where
\[
\widehat{a}_{kk} = \frac{p_k \times \operatorname{tr}(S_{kk}) - \operatorname{sum}(S_{kk})}{p_k \times (p_k - 1)},
\qquad
\widehat{b}_{kk'} =
\begin{cases}
\dfrac{\operatorname{sum}(S_{kk'})}{p_k \times p_{k'}}, & k \neq k' \\[1.5ex]
\dfrac{\operatorname{sum}(S_{kk}) - \operatorname{tr}(S_{kk})}{p_k \times (p_k - 1)}, & k = k'
\end{cases} \tag{2.3.2}
\]
for every k and k' (see the details in Chapter 3). It is clear that $\widehat{b}_{kk}$ is exactly the average of the off-diagonal elements within the (k, k)-th diagonal block of $(S_{kk'})$; $\widehat{a}_{kk}$ is exactly the average of the diagonal elements within the (k, k)-th diagonal block minus $\widehat{b}_{kk}$; and $\widehat{b}_{kk'}$ ($k \neq k'$) is the average of all elements within the (k, k')-th off-diagonal block. By Corollary 2.2.2, the plug-in estimators of $\Sigma[A, B, p]$ and $\Theta[A_\Theta, B_\Theta, p]$ are $\widehat{\Sigma}[\widehat{A}, \widehat{B}, p]$ and $\widehat{\Theta}[\widehat{A}_\Theta, \widehat{B}_\Theta, p]$, where $\widehat{A}$ is assumed to be positive definite, $\widehat{\Delta} = \widehat{A} + \widehat{B} \times P$ is assumed to have positive eigenvalues only, and $\widehat{A}_\Theta$ and $\widehat{B}_\Theta$ are given by $\widehat{A}_\Theta = \widehat{A}^{-1}$ and $\widehat{B}_\Theta = -\widehat{\Delta}^{-1} \times \widehat{B} \times \widehat{A}^{-1}$.

Theorem 2.3.1 (exact null distribution of Geisser's one-sample information test statistic). Geisser's one-sample test statistic for the hypotheses in (2.3.1) is given by
\[
U = n \times (\bar{X} - \mu_0)^\top \times \widehat{\Theta}[\widehat{A}_\Theta, \widehat{B}_\Theta, p] \times (\bar{X} - \mu_0).
\]
Under $H_0$, it follows a distribution U that is identical with the distribution of
\[
\sum_{k=1}^{K} (p_k - 1)\, F^{(k)}_{(p_k - 1),\, (p_k - 1)(n - 1)} + T^2,
\]
where $T^2 = K(n-1)(n-K)^{-1} F^{(K+1)}_{K,\, n-K}$ is the Hotelling $T^2$-statistic and $F^{(k)}_{\mathrm{df}_1, \mathrm{df}_2}$ are $(K+1)$ mutually independent F-variates with degrees of freedom $\mathrm{df}_1$ and $\mathrm{df}_2$ for $k = 1, \ldots, K+1$.

Remark (information test statistic U). Geisser's information criterion was proposed by Geisser (1963) to test the hypotheses in (2.3.1) using analysis of variance tables. In the distribution U, the last variate is precisely Hotelling's (generalized) $T^2$-statistic, which is among the most widely used statistics in multivariate inference (Anderson, 1992).
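Because the null distribution in Theorem 2.3.1 is a known mixture of independent F-variates, its percentiles can be approximated by direct Monte Carlo simulation. The sketch below (function name and the sample sizes are our own, hypothetical choices) draws from the distribution U; its upper-$\alpha$ percentile $U(\alpha)$ is what the simultaneous confidence intervals in the following remark require.

```python
import numpy as np
from scipy import stats

def sample_null_U(p_sizes, n, size=100_000, seed=0):
    """Monte Carlo draws from the null distribution of U (Theorem 2.3.1):
    sum_k (p_k - 1) F_{(p_k-1),(p_k-1)(n-1)} + K(n-1)/(n-K) F_{K,n-K}."""
    rng = np.random.default_rng(seed)
    K = len(p_sizes)
    u = np.zeros(size)
    for pk in p_sizes:
        u += (pk - 1) * stats.f.rvs(pk - 1, (pk - 1) * (n - 1),
                                    size=size, random_state=rng)
    u += K * (n - 1) / (n - K) * stats.f.rvs(K, n - K, size=size,
                                             random_state=rng)
    return u

u_null = sample_null_U(p_sizes=(2, 3), n=50)
print(np.quantile(u_null, 0.95))          # estimate of U(0.05)
```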
Remark (related distributions under $H_0$). (1) As $n \to \infty$, U asymptotically follows a chi-square distribution $\chi^2_p$, where $p = p_1 + \cdots + p_K$ (Geisser, 1963). (2) Given a significance level $\alpha$ and an arbitrary vector $a \in \mathbb{R}^p$, the $100(1-\alpha)\%$ simultaneous confidence interval for the linear function $a^\top \mu$ has the form
\[
a^\top \bar{X} \pm \sqrt{U(\alpha) \times a^\top \widehat{\Sigma}[\widehat{A}, \widehat{B}, p]\, a / n},
\]
where $U(\alpha)$ denotes the upper $\alpha$-th percentile of the distribution U (Morrison, 1972). (3) An approximation to U was suggested as the distribution of $C_1 F_{(p, C_2)}$ by Morrison (1971), where the scale coefficient $C_1$ and the second degree of freedom $C_2$ are determined by equating the first two cumulants of $C_1 F_{(p, C_2)}$ to those of U. The specific values of $C_1$ and $C_2$ for $K = 2$ can be found in Spjøtvoll (1972) and Young (1976). Furthermore, Dyer (1982) considered the distribution of a sum of generalized F-variates, and Lee and Hu (1996) extended the above result to independent F-variates with arbitrary coefficients and degrees of freedom.

Remark (related distributions under $H_1$). (1) The non-null distribution is analogous to the null distribution U, except that one or more F-variates are non-central (Geisser, 1963). (2) An approximate non-null distribution of U can be represented as $D_1 F_{(p, D_2)}(\delta)$, where the noncentrality parameter is $\delta = n (\mu - \mu_0)^\top \Theta[A_\Theta, B_\Theta, p] (\mu - \mu_0)$, and the scale coefficient $D_1$ and the second degree of freedom $D_2$ are determined by equating the first two cumulants of $D_1 F_{(p, D_2)}(\delta)$ to those of the non-null distribution of U.

2.4 Testing the Equality of Means for Multiple-Sample

We consider a general M-sample mean test ($M > 1$), where the samples are drawn from M normal distributions with means $\mu^{(m)} \in \mathbb{R}^p$, for $m = 1, \ldots, M$, and an equal covariance matrix with a UB structure $\Sigma[A, B, p] \in \mathbb{R}^{p \times p}$. Specifically, suppose the m-th sample $X^{(m)}_1, \ldots, X^{(m)}_{n_m}$ has a size of $n_m$, for $m = 1, 2, \ldots, M$. The grand sample size is denoted by $n = n_1 + \cdots + n_M$, and we assume $n > \max\{M, K + (K+1)K/2\}$. Let
\[
\bar{X}^{(m)} = n_m^{-1}\left(X^{(m)}_1 + \cdots + X^{(m)}_{n_m}\right), \qquad
\bar{X} = n^{-1}\left(n_1 \bar{X}^{(1)} + \cdots + n_M \bar{X}^{(M)}\right),
\]
and
\[
S = (n - M)^{-1} \sum_{m=1}^{M} \sum_{j=1}^{n_m} \left(X^{(m)}_j - \bar{X}^{(m)}\right)\left(X^{(m)}_j - \bar{X}^{(m)}\right)^\top
\]
denote the m-th sample mean, the grand sample mean, and the (pooled) unbiased estimator of the common covariance matrix, respectively. The estimators $\widehat{A}$ and $\widehat{B}$ can be obtained similarly to those in (2.3.2), yielding the estimators $\widehat{A}_\Theta$, $\widehat{B}_\Theta$, and $\widehat{\Theta}[\widehat{A}_\Theta, \widehat{B}_\Theta, p]$, respectively. Therefore, the null and alternative hypotheses can be written as
\[
H^{(M)}_0 : \mu^{(1)} = \cdots = \mu^{(M)} \quad \text{versus} \quad H^{(M)}_1 : \mu^{(m')} \neq \mu^{(m)} \text{ for some } m' \neq m. \tag{2.4.1}
\]

Theorem 2.4.1 (exact null distribution of Geisser's multiple-sample information test statistic). Geisser's multiple-sample information test statistic for the hypotheses in (2.4.1) is given by
\[
U_M = \sum_{m=1}^{M} n_m \left(\bar{X}^{(m)} - \bar{X}\right)^\top \widehat{\Theta}[\widehat{A}_\Theta, \widehat{B}_\Theta, p] \left(\bar{X}^{(m)} - \bar{X}\right).
\]
Under $H^{(M)}_0$, it follows a distribution $U_M$ that is identical with the distribution of
\[
\sum_{k=1}^{K} (M - 1)(p_k - 1)\, F^{(k)}_{(M-1)(p_k-1),\, (n-M)(p_k-1)} + T_0^2,
\]
where $T_0^2$ is the Hotelling $T_0^2$-statistic and $F^{(k)}_{\mathrm{df}_1, \mathrm{df}_2}$ are K mutually independent F-variates (and independent of $T_0^2$) with degrees of freedom $\mathrm{df}_1$ and $\mathrm{df}_2$ for $k = 1, \ldots, K$.
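Given the plug-in UB precision estimate, the statistic $U_M$ itself is a few lines of linear algebra. The sketch below uses our own hypothetical helper name; Theta_hat stands for $\widehat{\Theta}[\widehat{A}_\Theta, \widehat{B}_\Theta, p]$, assembled from the pooled estimators (e.g., via a ub_matrix-style helper as sketched earlier).

```python
import numpy as np

def geisser_multi_sample_stat(samples, Theta_hat):
    """U_M = sum_m n_m (xbar_m - xbar)^T Theta_hat (xbar_m - xbar)."""
    ns = np.array([x.shape[0] for x in samples])         # group sizes n_1, ..., n_M
    means = np.stack([x.mean(axis=0) for x in samples])  # M x p group means
    grand = ns @ means / ns.sum()                        # grand mean, weighted by n_m
    return float(sum(n_m * (m - grand) @ Theta_hat @ (m - grand)
                     for n_m, m in zip(ns, means)))
```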
Remark (information test statistic $U_M$). The Hotelling (generalized) $T_0^2$-statistic, also known as the Hotelling-Lawley trace, is commonly used to test the equality of multiple population means, assuming these multiple normal populations have the same (arbitrary) population covariance matrix (Lawley, 1938; Hotelling, 1947, 1951). However, the exact null and non-null distributions of Hotelling's $T_0^2$-statistic are intractable, and various approximations have been proposed in the literature (Ito, 1956, 1960; Pillai and Young, 1971; Siotani, 1971; McKeon, 1974).

2.5 Discussion

In this chapter, we concentrate on the algebraic properties of a specific type of block matrices, in which each block is uniform. We chose to parameterize matrices in this way for two key reasons. First, the uniform-block pattern has been widely observed in a large body of large-scale biological data. Second, from a biological perspective, the variables that are clustered into the same community may exhibit stochastic equivalence or comparable patterns, while variables from different communities may have coherent connections at the community level. Compared to the conventional diagonal or block-diagonal structures, the proposed uniform-block structure offers more flexibility and is better suited for real data analysis, since the information contained in the non-zero off-diagonal blocks can potentially provide valuable insights into the underlying scientific mechanisms.

In addition to defining a uniform-block structure, we have discovered a unique block Hadamard product representation for a uniform-block matrix. This representation plays an important role because it allows for the transformation of a large-scale uniform-block matrix into two lower-dimensional matrices and an integer-valued vector, and it simplifies the computations related to uniform-block matrices. With these algebraic properties, uniform-block matrices are applicable to various statistical problems: for example, covariance estimation with the uniform-block structure in Chapter 3, hypothesis testing with the information test statistics, the multivariate linear regression models in Chapter 4, and the confirmatory factor analysis models in Chapter 5. In conclusion, a uniform-block matrix (or structure), its associated algebraic properties, and the block Hadamard product representation have broad applications in a range of fields, including linear algebra, statistics, economics, and many others.

Chapter 3: Covariance Matrix Estimation for High-Throughput Biomedical Data with Dependence Structure of Interconnected Communities

3.1 Introduction

Technological innovations in biomedicine have facilitated the generation of high-dimensional datasets with simultaneous measurements of up to millions of biological features (Fan and Lv, 2008). In the past few decades, numerous statistical methods have been developed to analyze these large-dimensional datasets. Estimating a covariance matrix (or a precision matrix) is fundamental to these analyses (Fan et al., 2016; Cai et al., 2016; Wainwright, 2019) because a covariance matrix not only describes the complex interactive relations among variables but also leads to accurate inferential and predictive results for clinical outcomes (He et al., 2019; Ke et al., 2022). Since the dimensionality of the variables is much larger than the sample size, we resort to advanced statistical methods rather than traditional covariance estimation strategies (Johnstone, 2001; Johnstone and Paul, 2018). Shrinkage and thresholding methods can provide reliable and robust covariance estimators under the sparsity assumption (Ledoit and Wolf, 2004; Bickel and Levina, 2008b; Rothman et al., 2009; Cai and Liu, 2011).
In addition, prior knowledge of a covariance structure can greatly improve the accuracy of estimation and statistical inference (Fan, 2005; Bien, 2019). For example, recent methods can accommodate Toeplitz, banded, and block-diagonal covariance structures (Cai et al., 2013; Bickel and Levina, 2008a; Devijver and Gallopin, 2018).

In the present research, we consider a widely prevalent covariance structure in a large body of high-dimensional datasets (see the examples in Figure 3.1, extra examples in Chapter B, and more examples in Figure 1.1). A well-organized block pattern is widely observed in most commonly used biomedical data types, including genetics, proteomics, brain imaging, and RNA expression data, among many others (Spellman et al., 1998; Yildiz et al., 2007; Chiappelli et al., 2019; Chen et al., 2016; He et al., 2019, 2015; Wu et al., 2021). This block pattern exhibits several properties of highly organized networks. For example, there is high modularity, as some variables are clustered into multiple, coherently correlated communities; there is small-worldness, as these communities are interconnected; and the network is scale-free, as the remaining variables are isolated singletons (see the top parts of Figure 3.1(B) and (C)) (Newman, 2006). Therefore, we can specify this well-organized block pattern by assigning the high-dimensional variables to multiple interconnected communities (see the middle parts of Figure 3.1(B) and (C)), which are more informative, and a set of detected singletons. The interconnected community structures might not be directly available from the high-dimensional biomedical data, but they can be estimated by several clustering algorithms and network detection methods (Magwene, 2021; Wu et al., 2021).

Although there are potential benefits to leveraging the estimated interconnected community structure to enhance the estimation of large covariance matrices, existing statistical methods fall short of establishing a connection between this structure and the covariance or precision parameters and of providing precise estimates. To address this methodological gap, we propose a novel statistical procedure that yields closed-form estimators of large covariance and precision matrices and supports robust statistical inference.

Figure 3.1: A: top, heat map of the sample correlation matrix calculated by a K-medoids clustering algorithm for Spellman's yeast genome study, showing 8 by 8 well-organized blocks (without singletons) in the structure; bottom, an assumed population correlation matrix with an 8 by 8 uniform-block structure. B and C: top, heat maps of the sample correlation matrices calculated by a network detection algorithm for proteomics and brain imaging datasets, respectively, showing the block patterns (with singletons); middle, heat maps of the regions in the black frames in the top parts, showing 7 by 7 and 5 by 5 well-organized blocks, respectively; bottom, assumed population correlation matrices with a 7 by 7 and a 5 by 5 uniform-block structure, respectively. D: top, illustration of a network model with 5 communities; bottom, a corresponding population matrix with a 5 by 5 uniform-block structure.

We propose a parametric covariance model that subdivides the covariance matrix into blocks or submatrices and assigns each block to either a community or an interconnection between two communities based on the observed interconnected community structure.
By linking the covariance parameters to the underlying network topological structure, we obtain a closed-form estimator of each covariance parameter (using only the elements within the corresponding block) and establish the asymptotic properties of the proposed estimators. Specifically, we derive the explicit estimators using advanced matrix theory, including the block Hadamard product representation of a covariance matrix with the above structure, together with thresholding covariance regularization in the high-dimensional setting, where the number of covariance parameters in all blocks exceeds the sample size.

Our method makes at least three novel contributions. First, we have developed a fast (closed-form) and accurate procedure for estimating a large covariance (and precision) matrix with a particular structure that is applicable to various high-throughput biomedical data, including genomics, metabolomics, proteomics, neuroimaging data, and many others. Our method quantitatively characterizes the interconnected community structure by estimating parameters in both the diagonal submatrices (i.e., communities) and the off-diagonal submatrices (i.e., interactions between communities), and thus better reveals the interactive mechanisms of a complex biosystem. By utilizing this interconnected community structure, our large covariance matrix estimation procedure also outperforms comparable methods in terms of accuracy and is numerically robust to model misspecification. Consequently, our approach can lead to a more precise selection of biological features of interest (e.g., cancer-related gene expressions) in the context of multiple testing, which relies on accurate and reliable large covariance- and precision-matrix estimation (Fan et al., 2012; Fan and Han, 2017). Second, we have derived the exact variance estimators of the covariance parameter estimators and established their asymptotic properties, which enables us to evaluate covariance patterns and provide confidence intervals. Lastly, we have extended our method to accommodate scenarios where the number of diagonal blocks exceeds the sample size, allowing for scalability to ultra-high-dimensional datasets.

The rest of the chapter is organized as follows. Section 3.2 introduces our proposed method: we mathematically define the uniform-block structure and the uniform-block matrix in Section 3.2.1; in Section 3.2.2, we derive the best-unbiased covariance- and precision-matrix estimators for small K by taking advantage of the block Hadamard product representation; and we generalize the estimating procedure to a diverging number of diagonal blocks K in Section 3.2.3. Section 3.3 contains thorough numerical evaluations of our method under various scenarios. Section 3.4 illustrates the use of our method in two real-world applications. We provide a discussion in Section 3.5. Additional data examples, the exact covariance estimators for the covariance parameters in the blocks, more simulation studies, and all the technical proofs are given in Chapter B.

3.2 Methodology

3.2.1 A parametric covariance model with a uniform-block structure

Suppose $X \in \mathbb{R}^{n \times p}$ is an n by p observed data matrix containing n independent and identically distributed p-variate normal vectors $X_1, \ldots, X_n \sim N(\mu, \Sigma)$ with mean $\mu := E(X_1) \in \mathbb{R}^p$, positive definite covariance matrix $\Sigma := \operatorname{cov}(X_1) \in \mathbb{R}^{p \times p}$ (denoted by $\Sigma \succ 0$), and precision matrix $\Omega := \Sigma^{-1}$.
Let $S^*$ and S denote the biased and unbiased versions of the sample covariance matrix, respectively:
\[
S := (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^\top
\quad \text{and} \quad
S^* := n^{-1}(n-1) S,
\]
with $\bar{X} := n^{-1}(X_1 + \cdots + X_n)$, where $M^\top$ denotes the transpose of a matrix (or a vector) M. Moreover, we require that the covariance matrix have the uniform-block structure described below, whose parameterization proceeds in two steps.

First, we use a vector to characterize the community sizes. Specifically, given the dimension p of the covariance matrix $\Sigma$ and the number of communities K, let $p_1, \ldots, p_K$ be positive integers satisfying $p_k > 1$ ($k = 1, \ldots, K$) and $p = p_1 + \cdots + p_K$, and let $p := (p_1, \ldots, p_K)^\top$ be the partition-size vector, which is assumed to be fixed throughout the paper. Given p, we can express $\Sigma$ partitioned by p in block form:
\[
(\Sigma_{kk'}) :=
\begin{pmatrix}
\Sigma_{11} & \Sigma_{12} & \cdots & \Sigma_{1K} \\
\Sigma_{21} & \Sigma_{22} & \cdots & \Sigma_{2K} \\
\vdots & \vdots & \ddots & \vdots \\
\Sigma_{K1} & \Sigma_{K2} & \cdots & \Sigma_{KK}
\end{pmatrix}, \tag{3.2.1}
\]
where $\Sigma_{kk'} \in \mathbb{R}^{p_k \times p_{k'}}$ ($k, k' = 1, \ldots, K$). Second, following (3.2.1), we specify the diagonal submatrices $\Sigma_{kk} := a_{kk} I_{p_k} + b_{kk} J_{p_k}$ for every k and the off-diagonal submatrices $\Sigma_{kk'} := b_{kk'} 1_{p_k \times p_{k'}}$ with $b_{kk'} = b_{k'k}$ for every $k \neq k'$, where $I_s$, $J_s$, and $1_{s \times t}$ are the s by s identity matrix, the s by s all-one matrix, and the s by t all-one matrix, respectively. Since $K < p_1 + \cdots + p_K = p$, we can represent the large covariance matrix $\Sigma$ by a smaller diagonal matrix $A := \operatorname{diag}(a_{11}, \ldots, a_{KK})$, a smaller symmetric matrix $B := (b_{kk'})$ with $b_{kk'} = b_{k'k}$ for every $k \neq k'$, and the known vector p:
\[
\Sigma(A, B, p) := (\Sigma_{kk'}) =
\begin{pmatrix}
a_{11} I_{p_1} + b_{11} J_{p_1} & b_{12} 1_{p_1 \times p_2} & \cdots & b_{1K} 1_{p_1 \times p_K} \\
b_{21} 1_{p_2 \times p_1} & a_{22} I_{p_2} + b_{22} J_{p_2} & \cdots & b_{2K} 1_{p_2 \times p_K} \\
\vdots & \vdots & \ddots & \vdots \\
b_{K1} 1_{p_K \times p_1} & b_{K2} 1_{p_K \times p_2} & \cdots & a_{KK} I_{p_K} + b_{KK} J_{p_K}
\end{pmatrix}. \tag{3.2.2}
\]

In this chapter, we say that the pattern in (3.2.2) is a uniform-block structure, and if $\Sigma(A, B, p)$ has the structure in (3.2.2), it is a uniform-block matrix. This covariance parameterization strategy based on $\Sigma_{kk}$ and $\Sigma_{kk'}$ has been widely used in the statistics literature. For example, generalized estimating equations with a compound symmetry working correlation structure and linear mixed-effects models with a random intercept both have this pattern. In practice, this parameterization strategy can characterize the covariance structure well using a parsimonious model (as shown in Figure 3.1). In Section 3.3, we demonstrate that the performance of this parameterization is robust under misspecification and matrix perturbation. By building this parsimonious and effective covariance-matrix specification, we can develop reliable and accurate covariance-matrix estimators using the likelihood approach and achieve optimal properties.

Notice that the partition-size vector p is assumed to be known throughout this chapter. In practice, it can be learned by a preliminary algorithm (for example, a K-medoids clustering algorithm (Magwene, 2021) or a network detection algorithm (Wu et al., 2021)) before estimating the covariance matrix. Also, the above definition of a uniform-block matrix does not guarantee positive definiteness in general; thus, additional constraints should be imposed on the uniform-block matrix $\Sigma(A, B, p)$ to ensure that it is a valid covariance matrix. We defer the discussion of these constraints to the next section.
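For the simulation studies later in this chapter, data with a uniform-block covariance are straightforward to generate once the positivity constraints (formalized in Corollary 3.2.1 below) are checked. A minimal sketch, reusing the ub_matrix helper from the Chapter 2 sketch (an illustrative name of ours) with hypothetical coordinate values:

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([4, 3, 5])                      # partition-size vector: K = 3, p = 12
A = np.diag([1.5, 2.0, 1.0])
B = np.array([[0.4, 0.1, -0.1],
              [0.1, 0.5, 0.2],
              [-0.1, 0.2, 0.3]])

# Validity check: A > 0 and Delta = A + B P has positive eigenvalues only
Delta = A + B @ np.diag(p)
assert np.all(np.diag(A) > 0) and np.all(np.linalg.eigvals(Delta).real > 0)

Sigma = ub_matrix(A, B, p)                   # assemble Sigma(A, B, p)
n = 50
X = rng.multivariate_normal(np.zeros(p.sum()), Sigma, size=n)
S = np.cov(X, rowvar=False)                  # unbiased sample covariance S
S_star = (n - 1) / n * S                     # biased version S*
```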
3.2.2 Matrix estimation for the uniform-block structure with a small K

Given a partition-size vector p and the parametric covariance matrix $\Sigma(A, B, p)$ with the uniform-block structure (3.2.2), we define the q-dimensional parameter vector
\[
\theta := (a_{11}, \ldots, a_{KK}, b_{11}, \ldots, b_{1K}, b_{22}, \ldots, b_{KK})^\top
\]
consisting of the covariance parameters in the blocks; that is, the parameters of interest are those in the upper triangular part of $\Sigma(A, B, p)$, and $q = K + K(K+1)/2$. Thus, the problem of estimating a p by p symmetric covariance matrix reduces to that of estimating the q-dimensional parameter vector $\theta$. In practice, q is considerably smaller than $p(p+1)/2$, thereby substantially reducing the dimensionality of the parameters of interest: for instance, with $K = 7$ and $p = 107$ as in the proteomics study mentioned earlier, $q = 35$ whereas an unstructured covariance matrix has $p(p+1)/2 = 5778$ free parameters.

The small K setting can be specified as follows: $q < n$, while K, p, and q are fixed and p can be greater than n. In other words, both the number of diagonal blocks K and the number of parameters in the blocks $q = K + K(K+1)/2$ are smaller than the sample size n. Moreover, we require that K and the dimension p of the covariance matrix, which is proportional to K, are fixed, so that q is also fixed; p can still be large enough to exceed n with small K. In this section, we first introduce an explicit maximum likelihood estimator of $\theta$ with asymptotic properties and then refine it to the best-unbiased estimator, whose exact variance estimator is also provided in closed form. Finally, we provide the best unbiased covariance- and precision-matrix estimators for small K.

We start with the maximum likelihood estimation. Specifically, let $(S^*_{kk'})$ be the block form of the biased sample covariance matrix $S^*$ partitioned by p. If the data X are normally distributed, then the log-likelihood function of the data can be expressed as
\[
\ell_n(\theta; X) \propto \frac{n}{2} \log\left( \det\left[ \{\Sigma(A, B, p)\}^{-1} \right] \right) - \frac{n}{2} \operatorname{tr}\left[ (S^*_{kk'}) \times \{\Sigma(A, B, p)\}^{-1} \right].
\]
A typical way to estimate $\theta$ in the literature is to derive the score function by taking the first-order partial derivative of the log-likelihood function with respect to $\theta_j$:
\[
\frac{\partial}{\partial \theta_j} \ell_n(\theta; X) = \frac{n}{2} \operatorname{tr}\left[ \{\Sigma(A, B, p) - (S^*_{kk'})\} \times \frac{\partial \{\Sigma(A, B, p)\}^{-1}}{\partial \theta_j} \right] \quad (j = 1, \ldots, q), \tag{3.2.3}
\]
where $\theta_j \in \{a_{11}, \ldots, a_{KK}, b_{11}, \ldots, b_{1K}, b_{22}, \ldots, b_{KK}\}$ denotes the j-th component of $\theta$ and $\partial \{\Sigma(A, B, p)\}^{-1} / \partial \theta_j \in \mathbb{R}^{p \times p}$ is a matrix whose entries are functions of $\theta_j$. However, solving the score equation (3.2.3) is challenging. Although the unknown entries $a_{kk}$ and $b_{kk'}$ are uniformly and elegantly arranged in $\Sigma(A, B, p)$, they are entangled in a complex way in the precision matrix $\Omega = \{\Sigma(A, B, p)\}^{-1}$. In other words, $\theta_j$ is implicit in $\Omega$, so a closed form of $\Omega$ is not accessible in general, and the complexity of the calculation increases with the dimension of the precision matrix. Alternatively, the existing numerical algorithms for solving for $\theta$ (for example, the method of averaging (Szatrowski, 1980)) rely on iterative updating schemes, which require a long computational time and may lead to unstable estimates. These facts motivate us to reconsider the possibility of deriving a closed-form estimator of $\theta$. Thus, we aim to find an explicit expression for $\Omega$ in terms of $a_{kk}$ and $b_{kk'}$ by taking advantage of the special covariance structure in (3.2.2). More precisely, we speculate that $\Omega$ has a form analogous to (3.2.2), which indeed holds by the following block Hadamard product representation.
Lemma 3.2.1. Given a partition-size vector $p = (p_1, \ldots, p_K)^\top$ satisfying $p_k > 1$ for every k and $p = p_1 + \cdots + p_K$, the matrices $I_p$ and $J_p$ partitioned by p are uniform-block matrices, expressed by $I(p) := I_p(I_K, 0_{K \times K}, p) = \operatorname{Bdiag}(I_{p_1}, \ldots, I_{p_K})$ and $J(p) := J_p(0_{K \times K}, 1_{K \times K}, p) = (1_{p_k \times p_{k'}})$, respectively, where $0_{r \times s}$ denotes the r by s zero matrix and $\operatorname{Bdiag}(\cdot)$ denotes a block-diagonal matrix. Suppose a p by p matrix N partitioned by p is a uniform-block matrix of the form (3.2.2), expressed by N(A, B, p); then the following representation is unique:
\[
N(A, B, p) = A \circ I(p) + B \circ J(p),
\]
where $\circ$ denotes the block Hadamard product satisfying $A \circ I(p) := \operatorname{Bdiag}(a_{11} I_{p_1}, \ldots, a_{KK} I_{p_K})$ and $B \circ J(p) := (b_{kk'} 1_{p_k \times p_{k'}})$.

Based on Lemma 3.2.1, we derive several basic properties of a uniform-block matrix, which are summarized in Chapter 2. These properties reveal how A, B, and p determine the algebraic operations for a uniform-block matrix N(A, B, p) and how a collection of uniform-block matrices with the same p forms a quadratic subspace (Seely, 1971). If we view A, B, and p as the "coordinates" of a uniform-block matrix, then using the notation N(A, B, p) can simplify the mathematical operations between p by p uniform-block matrices into those between their corresponding lower-dimensional K by K "coordinates". Following Chapter 2, we obtain two useful results for the covariance and precision matrices.

Corollary 3.2.1. (1) Given $\Sigma(A, B, p)$ defined in (3.2.2), $\Sigma(A, B, p) \succ 0$ if and only if $A \succ 0$ and $\Delta$ has positive eigenvalues only, where $\Delta := A + B \times P$ and $P := \operatorname{diag}(p_1, \ldots, p_K)$. (2) Suppose $\Sigma(A, B, p) \succ 0$ and $\Omega = \{\Sigma(A, B, p)\}^{-1}$ is the precision matrix. Then $\Omega$ partitioned by p is a uniform-block matrix as well, which can be written as $\Omega(A_\Omega, B_\Omega, p) = A_\Omega \circ I(p) + B_\Omega \circ J(p)$, where
\[
A_\Omega = A^{-1}, \qquad B_\Omega = -\Delta^{-1} \times B \times A^{-1}. \tag{3.2.4}
\]

The first assertion of Corollary 3.2.1 finalizes the additional constraints that guarantee $\Sigma(A, B, p) \succ 0$. The second assertion confirms that the precision matrix $\Omega = \{\Sigma(A, B, p)\}^{-1}$ partitioned by p is a uniform-block matrix, expressed by $\Omega(A_\Omega, B_\Omega, p)$; furthermore, (3.2.4) provides the relations between $A_\Omega$, $B_\Omega$ and A, B. Therefore, applying the representation of the precision matrix in Corollary 3.2.1, we can rewrite the partial derivative of the log-likelihood in (3.2.3) as
\[
\frac{\partial}{\partial \theta_j} \ell_n(\theta; X) = \frac{n}{2} \operatorname{tr}\left[ \{\Sigma(A, B, p) - (S^*_{kk'})\} \times \left\{ \frac{\partial A_\Omega}{\partial \theta_j} \circ I(p) + \frac{\partial B_\Omega}{\partial \theta_j} \circ J(p) \right\} \right], \tag{3.2.5}
\]
where $\theta_j \in \{a_{11}, \ldots, a_{KK}, b_{11}, \ldots, b_{1K}, b_{22}, \ldots, b_{KK}\}$ ($j = 1, \ldots, q$). In contrast to (3.2.3), the derivatives in (3.2.5) can be calculated explicitly:
\[
\frac{\partial A_\Omega}{\partial a_{kk}} = -a_{kk}^{-2} E_{kk}, \qquad
\frac{\partial B_\Omega}{\partial a_{kk}} = \Delta^{-1} E_{kk} \Delta^{-1} B A^{-1} + a_{kk}^{-2} \Delta^{-1} B E_{kk} \quad (k = 1, \ldots, K),
\]
\[
\frac{\partial A_\Omega}{\partial b_{kk'}} = 0_{K \times K}, \qquad
\frac{\partial B_\Omega}{\partial b_{kk'}} =
\begin{cases}
-\Delta^{-1} E_{kk} P \Delta^{-1} P^{-1}, & k = k' \\
-\Delta^{-1} (E_{kk'} + E_{k'k}) P \Delta^{-1} P^{-1}, & k \neq k'
\end{cases}
\quad (k, k' = 1, \ldots, K),
\]
where $E_{kk'} \in \mathbb{R}^{K \times K}$ denotes the matrix with 1 in the (k, k') entry and 0 elsewhere. The explicit forms of the derivatives highlight the advantage of (3.2.5) over (3.2.3). Owing to (3.2.5), we can derive the analytic form of the maximum likelihood estimator of $\theta$:
\[
\widetilde{a}^*_{kk} := \frac{\operatorname{tr}(S^*_{kk})}{p_k - 1} - \frac{\operatorname{sum}(S^*_{kk})}{p_k \times (p_k - 1)}, \qquad
\widetilde{b}^*_{kk'} :=
\begin{cases}
\dfrac{\operatorname{sum}(S^*_{kk'})}{p_k \times p_{k'}}, & k \neq k' \\[1.5ex]
\dfrac{\operatorname{sum}(S^*_{kk}) - \operatorname{tr}(S^*_{kk})}{p_k \times (p_k - 1)}, & k = k'
\end{cases}, \tag{3.2.6}
\]
for every k and k', where $\operatorname{sum}(M) = \sum_{j=1}^{r} \sum_{j'=1}^{s} m_{jj'}$ denotes the sum of all entries of $M := (m_{jj'}) \in \mathbb{R}^{r \times s}$. Technical details are deferred to Chapter B. We denote the maximum likelihood estimator of $\theta$ by $\widetilde{\theta}^* := (\widetilde{a}^*_{11}, \ldots, \widetilde{a}^*_{KK}, \widetilde{b}^*_{11}, \ldots, \widetilde{b}^*_{1K}, \widetilde{b}^*_{22}, \ldots, \widetilde{b}^*_{KK})^\top$.
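The closed form (3.2.6) turns maximum likelihood estimation into per-block traces and sums. A sketch of the computation (the helper name is ours; S_star and p are as in the earlier sampling sketch):

```python
import numpy as np

def ub_mle(S_star, p):
    """Closed-form MLE (3.2.6): block traces/sums of the biased sample
    covariance S* give the coordinate estimates directly."""
    K = len(p)
    edges = np.concatenate([[0], np.cumsum(p)])
    A_hat, B_hat = np.zeros((K, K)), np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            blk = S_star[edges[k]:edges[k + 1], edges[l]:edges[l + 1]]
            if k == l:
                A_hat[k, k] = (np.trace(blk) / (p[k] - 1)
                               - blk.sum() / (p[k] * (p[k] - 1)))
                B_hat[k, k] = (blk.sum() - np.trace(blk)) / (p[k] * (p[k] - 1))
            else:
                B_hat[k, l] = blk.sum() / (p[k] * p[l])
    return A_hat, B_hat

A_mle, B_mle = ub_mle(S_star, p)
# Using S in place of S_star yields the unbiased estimator (3.2.7) below,
# i.e., a factor n/(n-1) rescaling of the MLE.
```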
The strong consistency, asymptotic efficiency, and asymptotic normality of $\widetilde{\theta}^*$ can be derived by standard procedures in the literature (Ferguson, 1996; van der Vaart and Wellner, 1996; van der Vaart, 2000); please refer to Chapter B. Despite these asymptotic properties, $\widetilde{\theta}^*$ is biased under a finite sample size and, therefore, is not a uniformly minimum variance unbiased estimator; namely, $\widetilde{\theta}^*$ is not the best unbiased estimator of $\theta$. Thus, it is natural to consider an unbiased estimator of $\theta$ by replacing $S^*$ with the unbiased version S in (3.2.6). Specifically, let $(S_{kk'})$ be the block form of S partitioned by p. Substituting $S_{kk'}$ for $S^*_{kk'}$ in (3.2.6), for every k and k', we have
\[
\widetilde{a}_{kk} := \frac{\operatorname{tr}(S_{kk})}{p_k - 1} - \frac{\operatorname{sum}(S_{kk})}{p_k \times (p_k - 1)}, \qquad
\widetilde{b}_{kk'} :=
\begin{cases}
\dfrac{\operatorname{sum}(S_{kk'})}{p_k \times p_{k'}}, & k \neq k' \\[1.5ex]
\dfrac{\operatorname{sum}(S_{kk}) - \operatorname{tr}(S_{kk})}{p_k \times (p_k - 1)}, & k = k'
\end{cases}. \tag{3.2.7}
\]
We denote the above unbiased estimator by $\widetilde{\theta} := (\widetilde{a}_{11}, \ldots, \widetilde{a}_{KK}, \widetilde{b}_{11}, \ldots, \widetilde{b}_{1K}, \widetilde{b}_{22}, \ldots, \widetilde{b}_{KK})^\top$. Alternatively, since $S = n(n-1)^{-1} S^*$, we have the relation between the maximum likelihood estimator $\widetilde{\theta}^*$ and the unbiased estimator $\widetilde{\theta}$:
\[
\widetilde{a}_{kk} = \frac{n}{n-1} \widetilde{a}^*_{kk}, \qquad \widetilde{b}_{kk'} = \frac{n}{n-1} \widetilde{b}^*_{kk'} \quad (k, k' = 1, \ldots, K), \qquad \widetilde{\theta} = \frac{n}{n-1} \widetilde{\theta}^*.
\]
Moreover, the following theorem summarizes the optimal property of $\widetilde{\theta}$.

Theorem 3.2.1 (Optimal property of $\widetilde{\theta}$). $\widetilde{\theta}$ is the best-unbiased estimator of $\theta$, and
\[
\widetilde{\mu} := \bar{X}, \qquad \widetilde{\Sigma}(\widetilde{A}, \widetilde{B}, p) = \widetilde{A} \circ I(p) + \widetilde{B} \circ J(p) \tag{3.2.8}
\]
are the best unbiased estimators of $\mu$ and $\Sigma(A, B, p)$, where $\widetilde{A} := \operatorname{diag}(\widetilde{a}_{11}, \ldots, \widetilde{a}_{KK})$ and $\widetilde{B} := (\widetilde{b}_{kk'})$ with $\widetilde{b}_{kk'} = \widetilde{b}_{k'k}$ for every $k \neq k'$ are the best unbiased estimators of A and B.

Geisser (1963) and Morrison (1972) derived the same estimator $\widetilde{\theta}$ based on an analysis of variance table, but they did not show the above optimal property. Furthermore, after calculating $\widetilde{\Delta} := \widetilde{A} + \widetilde{B} \times P$ with the best unbiased estimators $\widetilde{A}$ and $\widetilde{B}$, we derive the best-unbiased estimator of the precision matrix, as summarized in the following corollary.

Corollary 3.2.2. If both $\widetilde{A}$ and $\widetilde{\Delta}$ are positive definite, then
\[
\widetilde{\Omega}(\widetilde{A}_\Omega, \widetilde{B}_\Omega, p) = \widetilde{A}_\Omega \circ I(p) + \widetilde{B}_\Omega \circ J(p) \tag{3.2.9}
\]
is the best unbiased estimator of $\Omega(A_\Omega, B_\Omega, p)$, where $\widetilde{A}_\Omega := \widetilde{A}^{-1}$ and $\widetilde{B}_\Omega := -\widetilde{\Delta}^{-1} \times \widetilde{B} \times \widetilde{A}^{-1}$.

The covariance matrix of the proposed estimator $\widetilde{\theta}$ can be calculated from the Fisher information matrix based on the asymptotic normality. Given a small K (say, $K \leq 3$, empirically), the Fisher information matrix and its inverse may have explicit expressions. For some real applications in which K is relatively large (say, $K > 3$), computing the inverse of the Fisher information matrix might be burdensome and unstable. Alternatively, we provide the exact variance and covariance estimators of the elements of $\widetilde{\theta}$ under a finite sample size in Corollary 3.2.3 and Chapter B, respectively.

Corollary 3.2.3. The exact variance estimators of $\widetilde{a}_{kk}$ and $\widetilde{b}_{kk'}$ are
\[
\operatorname{var}(\widetilde{a}_{kk}) = \frac{2 a_{kk}^2}{(n-1)(p_k - 1)},
\]
\[
\operatorname{var}(\widetilde{b}_{kk'}) =
\begin{cases}
\dfrac{2}{(n-1) p_k (p_k - 1)} \left\{ (a_{kk} + p_k b_{kk})^2 - (2 a_{kk} + p_k b_{kk}) b_{kk} \right\}, & k = k' \\[1.5ex]
\dfrac{1}{2(n-1) p_k p_{k'}} \left\{ p_k p_{k'} (b_{kk'}^2 + b_{k'k}^2) + 2 (a_{kk} + p_k b_{kk})(a_{k'k'} + p_{k'} b_{k'k'}) \right\}, & k \neq k'
\end{cases}
\]
for every k and k'. The exact covariance estimators of $\widetilde{a}_{\ell\ell}$ and $\widetilde{b}_{kk'}$ for every $\ell$, k, and k' are given in Chapter B.
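The exact variances in Corollary 3.2.3 are simple functions of the coordinates, so plugging in $\widetilde{A}$ and $\widetilde{B}$ gives finite-sample standard errors without inverting a Fisher information matrix. A sketch (the function name is ours), transcribing the Corollary's formulas:

```python
import numpy as np

def ub_exact_variances(A, B, p, n):
    """Exact variances of a_tilde_kk and b_tilde_kk' (Corollary 3.2.3);
    plug in A_tilde, B_tilde for the corresponding variance estimators."""
    a, K = np.diag(A), len(p)
    var_a = 2.0 * a**2 / ((n - 1) * (p - 1))
    var_b = np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            if k == l:
                var_b[k, k] = (2.0 / ((n - 1) * p[k] * (p[k] - 1))
                               * ((a[k] + p[k] * B[k, k])**2
                                  - (2 * a[k] + p[k] * B[k, k]) * B[k, k]))
            else:
                var_b[k, l] = (1.0 / (2 * (n - 1) * p[k] * p[l])
                               * (p[k] * p[l] * (B[k, l]**2 + B[l, k]**2)
                                  + 2 * (a[k] + p[k] * B[k, k])
                                      * (a[l] + p[l] * B[l, l])))
    return var_a, var_b
```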
At the end of this section, we briefly describe how the above estimation procedure extends to situations with singletons. In addition to the communities assumed to follow the uniform-block structure (for example, the blocks in the black frames in the top parts of Figure 3.1(B) and (C)), singletons on the diagonal are also commonly detected in real applications (see the diagonal entries outside the black frames in the top parts of Figure 3.1(B) and (C)), such as the proteomics study (Yildiz et al., 2007) and the brain imaging study (Chiappelli et al., 2019). A structure of well-organized communities with singletons implies a population covariance matrix of the form
\[
\Sigma^{(\mathrm{full})} :=
\begin{pmatrix}
\Sigma(A, B, p) & D_1 \\
D_1^\top & D_2
\end{pmatrix} \in \mathbb{R}^{(p+d) \times (p+d)},
\]
where $D_1 \in \mathbb{R}^{p \times d}$, $D_2 \in \mathbb{R}^{d \times d}$, and d denotes the number of singletons. Following the decomposition idea in Fan et al. (2013), we can estimate $\Sigma^{(\mathrm{full})}$ in two steps. First, we partition the unbiased sample covariance matrix $S^{(\mathrm{full})} := \{(S_{kk'}), S_1; S_1^\top, S_2\}$ in the same way. Second, we obtain the best unbiased estimator $\widetilde{\Sigma}(\widetilde{A}, \widetilde{B}, p)$ based on $(S_{kk'})$ and the hard-thresholding estimator $\widetilde{R}^{(\mathrm{full})}$ of the matrix $R^{(\mathrm{full})} := \{0_{p \times p}, S_1; S_1^\top, S_2\}$ (Bickel and Levina, 2008a), yielding the estimator $\widetilde{\Sigma}^{(\mathrm{full})} := \operatorname{Bdiag}\{\widetilde{\Sigma}(\widetilde{A}, \widetilde{B}, p), 0_{d \times d}\} + \widetilde{R}^{(\mathrm{full})}$, which is consistent under a matrix norm if p is fixed while d may grow with the sample size n.

3.2.3 Matrix estimation for the uniform-block structure with a large K

In Section 3.2.2, we estimated the covariance matrix with the uniform-block structure and its precision matrix for small K. However, there are applications where the covariance matrices are uniform-block matrices with more diagonal blocks than the sample size. More specifically, a large K occurs when $K, q > n$ and K, p, and q grow with n. In other words, the number of diagonal blocks K is greater than the sample size n, and so is the number of covariance parameters in the blocks, $q = K + K(K+1)/2$. Moreover, we require that K, the dimension p of the covariance matrix (which is proportional to K), and q grow with n and diverge as n goes to infinity. In this section, we generalize the proposed estimation procedure for small K and introduce a consistent covariance-matrix estimator for large K by modifying the hard-thresholding method. Denote by $\|M\|_F = (\sum_{j=1}^{r} \sum_{j'=1}^{r} m_{jj'}^2)^{1/2}$ and $\|M\|_S = \max_{\|x\|_2 = 1} \|Mx\|_2$ the Frobenius norm and the spectral norm of $M := (m_{jj'}) \in \mathbb{R}^{r \times r}$, respectively, where $\|x\|_2 := (\sum_{j=1}^{r} x_j^2)^{1/2}$ for $x := (x_1, \ldots, x_r)^\top \in \mathbb{R}^r$.

For normal data $X \in \mathbb{R}^{n \times p}$ with population mean $\mu = 0_{p \times 1}$, the unbiased sample covariance matrix $S = X^\top X / n$ (unbiased because the mean is known to be zero), and a population covariance matrix with the uniform-block structure $\Sigma(A, B, p)$ for large K, we propose a new thresholding approach based on the work by Bickel and Levina (2008b). Since the covariance matrix $\Sigma(A, B, p)$ is fully determined by A, B, and p according to Lemma 3.2.1, we threshold the estimates of A and B, rather than S, to yield a covariance-matrix estimate. Specifically, given a thresholding level $\lambda = \lambda_n > 0$, let
\[
\widehat{a}_{kk}(\lambda) := \widetilde{a}_{kk} \times I(|\widetilde{a}_{kk}| > \lambda), \qquad
\widehat{b}_{kk'}(\lambda) := \widetilde{b}_{kk'} \times I(|\widetilde{b}_{kk'}| > \lambda) \tag{3.2.10}
\]
be the hard-thresholding estimators of $a_{kk}$ and $b_{kk'}$, respectively, where $\widetilde{a}_{kk}$ and $\widetilde{b}_{kk'}$ are the best unbiased estimators for every k and k' and $I(\cdot)$ is the indicator function. We regard the new covariance-matrix estimator
\[
\widehat{\Sigma}_\lambda(\widehat{A}_\lambda, \widehat{B}_\lambda, p) = \widehat{A}_\lambda \circ I(p) + \widehat{B}_\lambda \circ J(p) \tag{3.2.11}
\]
as the modified hard-thresholding estimator of $\Sigma(A, B, p)$, where $\widehat{A}_\lambda := \operatorname{diag}\{\widehat{a}_{11}(\lambda), \ldots, \widehat{a}_{KK}(\lambda)\}$ and $\widehat{B}_\lambda := \{\widehat{b}_{kk'}(\lambda)\}$ with $\widehat{b}_{kk'}(\lambda) = \widehat{b}_{k'k}(\lambda)$ for every $k \neq k'$.
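For large K, the modification in (3.2.10)-(3.2.11) thresholds the K by K coordinates rather than the p by p sample covariance. A sketch (names are ours; in practice the constant C would be chosen by the resampling rule cited after Theorem 3.2.2):

```python
import numpy as np

def ub_hard_threshold(A_tilde, B_tilde, p, n, C=1.0):
    """Modified hard-thresholding estimator (3.2.10)-(3.2.11): zero out small
    coordinate estimates at level lambda = C * sqrt(log(K) / n)."""
    K = len(p)
    lam = C * np.sqrt(np.log(K) / n)
    a = np.diag(A_tilde)
    A_lam = np.diag(np.where(np.abs(a) > lam, a, 0.0))
    B_lam = np.where(np.abs(B_tilde) > lam, B_tilde, 0.0)
    return A_lam, B_lam   # feed into ub_matrix(A_lam, B_lam, p) for Sigma_hat
```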
The consistency of this modified hard-thresholding estimator is summarized below.

Theorem 3.2.2 (Consistency of the modified hard-thresholding estimator). Given a positive-definite covariance matrix with the uniform-block structure $\Sigma(A, B, p) = (\sigma_{jj'})$ defined as in (3.2.2), suppose $K = K(n) > n \to \infty$ and $\log(K)/n \to 0$ as $n \to \infty$, and suppose there exist constants $0 < C_{p_0}, C_{q_0} < \infty$ such that $\sum_{j=1}^{p} \sum_{j'=1}^{p} |\sigma_{jj'}|^{p_0} \leq C_{p_0}$ and $\max_j \sum_{j'=1}^{p} |\sigma_{jj'}|^{q_0} \leq C_{q_0}$ for $0 < p_0 < 2$ and $0 < q_0 < 1$. Then,
\[
\|\widehat{\Sigma}_\lambda(\widehat{A}_\lambda, \widehat{B}_\lambda, p) - \Sigma(A, B, p)\|_F^2 \leq O_P(1)\, C_{p_0} \left\{ \frac{\log(K)}{n} \right\}^{1 - \frac{p_0}{2}},
\]
\[
\|\widehat{\Sigma}_\lambda(\widehat{A}_\lambda, \widehat{B}_\lambda, p) - \Sigma(A, B, p)\|_S \leq O_P(1)\, C_{q_0} \left\{ \frac{\log(K)}{n} \right\}^{\frac{1 - q_0}{2}},
\]
where we choose $\lambda = C\{\log(K)/n\}^{1/2}$ for some positive constant C.

The performance of the modified hard-thresholding estimator (3.2.10) relies on the choice of $\lambda$, which can be determined by applying the resampling rule in Bickel and Levina (2008a,b).

3.3 Numerical Studies

3.3.1 Simulations

To evaluate the performance of the proposed method comprehensively, we simulate data and benchmark our method against comparable estimation methods for large covariance (and precision) matrices in the following three scenarios. In Scenario 1 (Section 3.3.2), we generate normal data for n subjects using a covariance matrix with a uniform-block structure $\Sigma_{0,1}(A_0, B_0, p_1)$ for small K; that is, the number of diagonal blocks K is small, so that the number of covariance parameters in the blocks q is smaller than the sample size n, whereas the dimension of the covariance matrix p can be greater than n. We first focus on evaluating the finite-sample performance of the parameter vector estimator $\widetilde{\theta}$ in (3.2.7) by comparing the estimates with the ground truth. We also assess the accuracy of the exact covariance estimator for $\widetilde{\theta}$, as presented in Chapter B. We then present the performance of the covariance estimator