ABSTRACT

Title of Dissertation: NEW STATISTICAL METHODS FOR HIGH-DIMENSIONAL INTERCONNECTED DATA WITH UNIFORM BLOCKS

Yifan Yang
Doctor of Philosophy, 2023

Dissertation Directed by: Professor Shuo Chen, Department of Mathematics

Empirical analyses of high-dimensional biomedical data, including genomics, proteomics, microbiome, and neuroimaging data, consistently reveal strong modularity in the dependence patterns. In these analyses, highly correlated features often form a few distinct communities or modules, which can be interconnected with each other. While the interconnected community structure has been extensively studied in biomedical research (e.g., gene co-expression networks), its potential to assist in statistical modeling and inference remains largely unexplored. To address this research gap, we propose novel statistical models and methods that capitalize on the prevalent community structures observed in large covariance and precision matrices derived from high-dimensional biomedical interconnected data.

The first objective of this dissertation is to delve into the algebraic properties of the proposed interconnected community structures at the population level. Specifically, this pattern partitions the population covariance matrix into uniform (i.e., equal variances and covariances) blocks. To accomplish this objective, we introduce a block Hadamard product representation in Chapter 2, which relies on two lower-dimensional "coordinate" matrices and a pre-specified vector. This representation enables explicit expressions for the square or power, determinant, inverse, eigendecomposition, canonical form, and other matrix functions of the original larger-dimensional matrix on the basis of these lower-dimensional "coordinate" matrices.

Estimating a covariance matrix is central to high-dimensional data analysis. Our second objective is to consistently estimate a large covariance or precision matrix having an interconnected community structure with uniform blocks. In Chapter 3, we derive the best-unbiased estimators for covariance and precision matrices in closed form and provide theoretical results on their asymptotic properties. Our proposed method improves the accuracy of covariance and precision matrix estimation and demonstrates superior performance compared to the competing methods in both simulations and real data analyses.

In Chapter 4, our goal is to investigate the effects of alcohol intake (as an exposure) on metabolomic outcome features. However, similar to other omics data, metabolomic outcomes often consist of numerous features that exhibit a structured dependence pattern, such as a co-expression network with interconnected modules. Effectively addressing this dependence structure is crucial for accurate statistical inference and the identification of alcohol intake-related metabolomic outcomes. Nevertheless, incorporating structured dependence patterns into multivariate outcome regression models still poses difficulties for accurate estimation and inference. To bridge this gap, we propose a novel multivariate regression model that accounts for the correlations among outcome features using a network structure composed of interconnected modules. Additionally, we derive closed-form estimators of the regression parameters and provide inference tools.
Extensive simulation analysis demonstrates that our approach yields much-improved sensitivity with a well-controlled false discovery rate when benchmarked against existing multivariate regression models.

Confirmatory factor analysis (CFA) models play a crucial role in revealing underlying latent common factors within sets of correlated variables. However, their implementation often relies on a strong prior theory to categorize variables into distinct classes, which is frequently unavailable (e.g., in omics data analysis scenarios). To address this limitation, in Chapter 5, we propose a novel strategy based on network analysis that allows data-driven discovery to substitute for the lacking prior theory. By leveraging the detected interconnected community structure, our approach offers an elegant statistical interpretation and yields closed-form uniformly minimum variance unbiased estimators for all unknown matrices. To evaluate the effectiveness of our proposed estimation procedure, we compare it to conventional numerical methods and thoroughly validate it through extensive Monte Carlo simulations and real-world applications.

NEW STATISTICAL METHODS FOR HIGH-DIMENSIONAL INTERCONNECTED DATA WITH UNIFORM BLOCKS

by
Yifan Yang

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2023

Advisory Committee:
Professor Shuo Chen, Chair/Advisor
Professor Benjamin Kedem, Co-Chair
Professor Vince Lyzinski
Professor Tianzhou Ma
Professor Xin He

© Copyright by Yifan Yang 2023

Acknowledgments

I would like to express my heartfelt gratitude to Dr. Shuo Chen, my doctoral research advisor, for his invaluable guidance throughout this journey at the University of Maryland, College Park. He is always available for academic advice whenever his students need it. Dr. Chen not only taught me the foundations of research, academic writing, and data analysis in real-world applications but also nurtured my collaborative skills. His insightful suggestions and unwavering support have been instrumental in shaping my future career.

I extend my sincere appreciation to all the professors and staff in the Department of Mathematics at the University of Maryland, College Park. In particular, I am grateful to Dr. Benjamin Kedem for his exceptional expertise in mathematical statistics, time series analysis, Bayesian statistics, and various other fields. Dr. Paul Smith's teachings on linear models, along with his guidance during numerous fruitful RITs, have been immensely valuable. I would also like to thank Dr. Eric Slud for his mentorship in computational statistics and categorical data analysis. Dr. Vince Lyzinski, an expert in graph theory and a member of my dissertation committee, deserves special thanks. Dr. Takumi Saegusa's instruction on survival analysis and other remarkable statistical methods in semiparametric models has been invaluable. I am indebted to Cristina Garcia for her constant kindness and careful planning for the students, as well as to Stephanie Padgett for assisting us in reserving rooms. I am grateful to Matthew Griffin, the instructor of STAT 100, for his collaboration and for imparting valuable teaching techniques. Working with Susan Mazzullo, a dedicated instructor for several undergraduate statistical classes, has been a pleasure.

I am also grateful to the faculty members in other departments who have contributed to my academic journey.
Dr. Chixiang Chen deserves thanks for proofreading my papers and providing invaluable suggestions. Dr. Tianzhou Ma's leadership in bi-weekly group meetings has provided fresh insights and perspectives. Dr. Xin He, a member of my dissertation committee, has been unfailingly kind and supportive.

I am deeply thankful to my friends in Maryland, including Chuan Bi, Yunjiang Ge, Hwiyoung Lee, Tong Lu, Yezhi Pan, Qiong Wu, Zhenyao Ye, Nathan Yu, Yiran Zhang, Xiaocui Zhang, Xuze Zhang, Zhiwei Zhao, Xiaoyu Zhou, and more.

Lastly, I would like to express my profound appreciation to my parents, Guangping Yang and Xiaohui Huang. Thank you for introducing me to this remarkable and beautiful world, enabling me to discover my passion and complete this degree. Your love and support have been my driving force.

Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures

Chapter 1: Introduction
1.1 Estimation of Large Covariance Matrix
1.2 Classical Covariance Structures
1.3 Recent Developments of Covariance Structures
1.4 Interconnected Community Structure
1.5 A New Interconnected Community Structure with Uniform Blocks
1.6 Organization

Chapter 2: A New Representation of Uniform-Block Matrix and Applications
2.1 Introduction
2.2 Uniform-Block Structure and Uniform-Block Matrix
2.3 Testing A Specific Mean for One-Sample
2.4 Testing the Equality of Means for Multiple-Sample
2.5 Discussion

Chapter 3: Covariance Matrix Estimation for High-Throughput Biomedical Data with Dependence Structure of Interconnected Communities
3.1 Introduction
3.2 Methodology
3.2.1 A parametric covariance model with a uniform-block structure
3.2.2 Matrix estimation for the uniform-block structure with a small K
3.2.3 Matrix estimation for the uniform-block structure with a large K
3.3 Numerical Studies
3.3.1 Simulations
3.3.2 Scenario 1: comparison for small K covariance matrix
3.3.3 Scenario 2: comparison for large K covariance matrix
3.3.4 Scenario 3: simulation analysis under model misspecification
3.4 Data Examples
3.4.1 Proteomics data analysis
3.4.2 Brain imaging data analysis
3.5 Discussion

Chapter 4: Modeling Multivariate Outcomes with Interconnected Modules: Evaluating the Impact of Alcohol Intake on Plasma Metabolomics
4.1 Introduction
4.2 Methodology
4.2.1 A simultaneous autoregressive regression model
4.2.2 The MAUD
4.2.3 Estimation of the MAUD parameters
4.2.4 Inference about the MAUD parameters
4.3 Simulation Studies
4.3.1 Data generation
4.3.2 Evaluation of estimation of the scaled autoregressive dependence parameters by the MAUD
4.3.3 Evaluation of statistical inference about the regression coefficients by the MAUD
4.3.4 Misspecification analysis of the MAUD
4.4 Investigation of the Effect of Alcohol Intake on Plasma Metabolomics
4.5 Discussion

Chapter 5: Semi-Confirmatory Factor Analysis for Multivariate Data with Interconnected Community Structures
5.1 Introduction
5.2 Methodology
5.2.1 A confirmatory factor analysis model
5.2.2 The SCFA model
5.2.3 The estimation procedure
5.3 Simulation Studies
5.3.1 Data generation procedure
5.3.2 Study 1: performance of parameter estimation
5.3.3 Study 2: performance of factor scores estimation
5.3.4 Study 3: misspecification analysis
5.4 Real Application
5.5 Discussion

Chapter 6: Conclusions

Appendix A: Supplementary Materials for Chapter 2
A.1 Technical Proofs
A.1.1 Proofs in Section 2.2
A.1.2 Proofs in Section 2.3 and Section 2.4

Appendix B: Supplementary Materials for Chapter 3
B.1 Extra Data Examples
B.2 Exact Covariance Estimators for θ̃
B.3 Technical Proofs
B.3.1 Proof of Lemma 3.2.1
B.3.2 Proof of Corollary 3.2.1
B.3.3 Derivations of the maximum likelihood estimator and its property
B.3.4 Proof of Theorem 3.2.1
B.3.5 Proof of Corollary 3.2.2
B.3.6 Proofs of Corollary 3.2.3 and Corollary B.2.1
B.3.7 Proof of Theorem 3.2.2
B.4 Extra Simulation Studies
B.4.1 Extra simulation studies in Scenario 1
B.4.2 Extra simulation study in Scenario 3

Appendix C: Supplementary Materials for Chapter 4
C.1 Definition
C.2 Properties of UB-Matrices
C.3 Plug-In Estimators
C.4 Technical Conditions and Proofs
C.5 The Case of $\Sigma_\epsilon \neq I_R$

Appendix D: Supplementary Materials for Chapter 5
D.1 Technical Proof
D.1.1 Proof of Corollary 5.2.1

Bibliography

List of Tables

3.1 Estimation results (×100) for $a_{0,kk}$ and $b_{0,kk'}$ in Scenario 1 under n = 50, 100, or 150, where bias denotes the average bias, MCSD denotes the Monte Carlo standard deviation, ASE denotes the average standard errors, and CP denotes the coverage probability based on a 95% Wald-type confidence interval.

4.1 Estimation results of γ under n = 100, where "bias" denotes the average of estimation bias, "MCSD" denotes the Monte Carlo standard deviation, "ASE" denotes the average asymptotic standard error, and "95% CP" denotes the coverage probability based on a 95% Wald-type confidence interval.

4.2 Statistical results of the effect of alcohol intake frequency on the biomarkers associated with the lipoprotein at level α = 0.05, where "IDL" refers to the intermediate-density lipoprotein, "VLDL" refers to the very-low-density lipoprotein, "LDL" refers to the low-density lipoprotein, "HDL" refers to the high-density lipoprotein, and + and − refer to the signs of estimated regression coefficients.

5.1 Estimation results of $A = \operatorname{diag}(a_{11}, a_{22}, a_{33})$ and $B = (b_{kk'})$ with $b_{k'k} = b_{kk'}$ for k ≠ k′ in Study 1 by using the proposed method under various n and p, where "bias" denotes the average of estimation bias, "MCSD" denotes the Monte Carlo standard deviation, "ASE" denotes the average asymptotic standard error, and "95% CP" denotes the coverage probability based on a 95% Wald-type confidence interval. The computational times of 100 replicates are around 0.05 seconds and 0.07 seconds for p = 50 and p = 100, respectively.

5.2 Estimation results of $A = \operatorname{diag}(a_{11}, a_{22}, a_{33})$ and $B = (b_{kk'})$ with $b_{k'k} = b_{kk'}$ for k ≠ k′ in Study 1 by using the lavaan package under various n and p, where "bias" denotes the average of estimation bias, "MCSD" denotes the Monte Carlo standard deviation, and "ASE" denotes the average asymptotic standard error. The computational times of 100 replicates are around 97 seconds and 1776 seconds for p = 50 and p = 100, respectively.
5.3 Euclidean losses $\sum_{i=1}^{n} \|\hat{f}_i - f_i\|$ in Study 2 by using a standard SCFA model, CFA model, and EFA model, respectively, under various n and p, based on 100 replicates.

5.4 Misspecification analysis results of $A = \operatorname{diag}(a_{11}, \ldots, a_{33})$ and $B = (b_{kk'})$ with $b_{k'k} = b_{kk'}$ for k ≠ k′ in Study 3 by using the proposed method under various σ and p, where "bias" denotes the average of estimation bias, "MCSD" denotes the Monte Carlo standard deviation, "ASE" denotes the average asymptotic standard error, and "95% CP" denotes the coverage probability based on a 95% Wald-type confidence interval.

5.5 Estimation of the correlation matrix of 227 combinations of neurometabolites and brain regions, where "SE" denotes the estimated standard error, "95% CI" denotes the 95% Wald-type confidence interval, and † denotes a 95% CI containing 0.

B.1 Estimation results of the total number of rejections R(t), the number of correct rejections S(t), and the false discovery proportion (FDP) using the true covariance matrix, the POET estimator, and the proposed estimator.

List of Figures

1.1 The plots in the first column are the heat maps of sample correlation matrices for the raw datasets. The plots in the second column are the heat maps of sample correlation matrices with reordered features/variables using community detection algorithms. We call these structures interconnected community structures with (generalized) well-organized blocks (if singletons exist). The plots in the third column are the heat maps of sample correlation matrices with features/variables that are extracted from the reordered ones. We call these structures interconnected community structures with well-organized blocks. The plots in the fourth column are the heat maps of population correlation matrices that are assumed to have interconnected community structures with uniform blocks.

2.1 Illustration of the block Hadamard product representation of a UB matrix $\Sigma[A, B, p] = A \circ I[p] + B \circ J[p]$, where $p = (2, 3)^\top$, K = 2, p = 5, each square represents an element of $\Sigma[A, B, p]$, and different colors represent different values.

3.1 A: top, heat map of the sample correlation matrix calculated by a K-medoids clustering algorithm for Spellman's yeast genome study, showing 8 by 8 well-organized blocks (without singletons) in the structure; bottom, an assumed population correlation matrix with an 8 by 8 uniform-block structure. B and C: top, heat maps of the sample correlation matrices calculated by a network detection algorithm for proteomics and brain imaging datasets, respectively, showing the block patterns (with singletons); middle, heat maps in the black frames in the top parts, showing 7 by 7 and 5 by 5 well-organized blocks respectively in the structure; bottom, assumed population correlation matrices with a 7 by 7 and a 5 by 5 uniform-block structure, respectively. D: top, illustration of a network model with 5 communities; bottom, a corresponding population matrix with a 5 by 5 uniform-block structure.

3.2 Results of the losses in the Frobenius and spectral norms and the execution time for the proposed covariance- and precision-matrix estimators and the conventional estimators in simulation studies.
The proposed estimators outperform the competitors in terms of smaller losses and shorter computational times.

3.3 A: the first two are the heat maps of the sample correlation matrices for the proteomics dataset; the third one exhibits the estimates of $b_{0,kk'}$ (their standard errors are between 0.03 and 0.07; for each k, the estimate of $a_{0,kk}$ is one minus the estimate of $b_{0,kk}$ with standard errors around 0.01); and the fourth one contains the confidence intervals for $b_{0,kk'}$ (+ refers to a 95% confidence interval on the right of 0, − refers to a 95% confidence interval on the left of 0, and 0 refers to a 95% confidence interval containing 0). B: the first two are the heat maps of the sample correlation matrices for the brain imaging dataset; the third one exhibits the estimates of $b_{0,kk'}$ (their standard errors are between 0.07 and 0.11; for each k, the estimate of $a_{0,kk}$ is equal to one minus the estimate of $b_{0,kk}$ with standard errors around 0.01); and the fourth one contains the confidence intervals for $b_{0,kk'}$ (+ refers to a 95% confidence interval on the right of 0, − refers to a 95% confidence interval on the left of 0, and 0 refers to a 95% confidence interval containing 0).

4.1 A: 249 by 249 sample correlation matrix (of residuals) calculated from a subset of raw NMR data (Ritchie et al., 2023); B: an interconnected community structure with generalized (with singletons) well-organized blocks in the sample correlation matrix provided by a community detection algorithm (Chen et al., 2023); C: a 170 by 170 sample correlation matrix (of residuals) extracted from the black frame in B, which is called an interconnected community structure with well-organized blocks; D: a population correlation matrix with an interconnected community structure with (5 by 5) uniform blocks; E: a partition-size vector $\ell = (L_1, L_2)^\top = (2, 3)^\top$ with R = 5 and G = 2; F: an illustration of the block Hadamard product representation $N(A, B, \ell) = A \circ I(\ell) + B \circ J(\ell)$; G: an illustration of a uniform-block matrix $N(A, B, \ell)$, where the cells with different colors in the matrices represent elements with different values, and the cells with different colors in ℓ represent features in different communities.

4.2 ROC curves under various levels of effect size κ for different covariance matrix estimators, different statistical models, and different noise levels of covariance matrices.

4.3 Forest plot of 95% confidence intervals for the regression coefficients of 249 biomarkers, where biomarkers in black refer to the significant ones in 5 communities, biomarkers in blue refer to the significant singletons, and biomarkers in grey refer to the non-significant ones.

5.1 Illustration of the workflow for the SCFA, CFA, and EFA models: we first apply a community detection algorithm to the raw data; if the interconnected community structure is detected, then we use the SCFA model; otherwise, we consider the conventional EFA or CFA models.

5.2 Real examples of the interconnected community structures (the first row) and the corresponding assumed population covariance matrices (the second row).
A: the sample and assumed population covariance matrices for a seed quality study (Perrot-Dockès et al., 2022); B: the sample and assumed population covariance matrices for a nuclear magnetic resonance study (Ritchie et al., 2023); C: the sample and assumed population covariance matrices for an echo-planar spectroscopic imaging study (Chiappelli et al., 2019); D and E: the sample and assumed population covariance matrices for an environmental study involving exposome and metabolites (ISGlobal, 2021).

B.1 Heat maps for exposome and metabolite data in an environmental research study.

B.2 Box plots of the off-diagonal entries in the diagonal blocks and of all entries in the diagonal blocks.

Chapter 1: Introduction

1.1 Estimation of Large Covariance Matrix

Technological innovation has had a profound impact on scientific discoveries by integrating the analysis of large-scale datasets into various fields. Statistically, the covariance matrix or the precision matrix (i.e., the inverse of a covariance matrix) is crucial for understanding the intricate relationships among massive, dependent covariates. Thus, considerable attention has been devoted to the use of an appropriate covariance (or precision or correlation) matrix estimator. A practical issue arises from the fact that the sample size n in research is often inadequate in comparison to the number of covariates, say p. When the number of covariates exceeds the sample size, i.e., p > n, the sample covariance matrix, which is the natural choice of covariance matrix estimator in the case p < n, fails to be a reliable estimator due to the absence of positive definiteness (Dykstra, 1970) and the inconsistency of its eigenvalues and eigenvectors (Johnstone, 2001; Johnstone and Lu, 2009; Johnstone and Paul, 2018). Consequently, Fan (2005) asserted that it is fundamentally challenging to obtain an appropriate covariance estimator without imposing a structural assumption on a large covariance matrix.

1.2 Classical Covariance Structures

Attention has been devoted to structured or patterned matrices, not only because they hold significance in theoretical analyses of multivariate models but also because they serve as fundamental tools in numerous applications. Many notable instances of structured matrices have found applications in multivariate analysis.

One straightforward example involves statistically testing a given (known) covariance matrix based on multivariate normal observations (Korin, 1968). Another example is the sphericity structure adopted by Mauchly (1940), where the diagonal entries are identical and the off-diagonal entries are zero. Wilks (1946) then proposed an intraclass (also called uniform, or complete symmetry) structure, characterized by equal diagonal entries and equal off-diagonal entries. Wilks developed likelihood ratio tests (LRTs) to statistically evaluate hypotheses related to population means or population covariance structures. The estimation and hypothesis testing problems under uniform covariance structures were further explored by Roy and Murthy (1960); Geisser (1964); Aitkin et al. (1968); Haq (1974); Mathai and Katiyar (1979); Clement et al. (1981); Bhoj (1987), and others. Building upon Wilks' uniform structure, Votaw (1948) introduced two types of compound symmetry structures, wherein the blocks exhibit uniform structures, and derived the LRTs for the corresponding hypotheses.
The exact distributions of the LRTs were examined by Consul (1968) and Mathai and Rathie (1970). From a symmetry perspective, matrices with Toeplitz or circular Toeplitz structures have proven essential in various applications, including physics, mathematics, and signal processing (Olkin and Press, 1969; Olkin, 1972).

Many classical covariance structures can be generalized to linearly structured covariance matrices, wherein only a few covariance parameters are unknown (Srivastava, 1966; Anderson, 1969; Anderson et al., 1970; Anderson, 1973). Furthermore, reducible and totally reducible matrices were investigated by Rogers and Young (1975), Young (1976), and Sinha and Wieand (1979).

In addition to parameterizing a covariance matrix that is unknown up to several parameters, another approach designs the covariance matrix by adopting blocks in the covariance structure. For instance, Afifi and Elashoff (1969) partitioned the covariance matrix according to dichotomous and continuous variables. Block versions of Wilks' uniform structure were discussed by Fleiss (1966), Arnold (1973), and Rogers and Young (1974), while block forms of Votaw's compound symmetry structures were extended by Szatrowski (1982) and Žežula et al. (2018).

1.3 Recent Developments of Covariance Structures

Recent statistical methods have made successful advances in addressing the estimation problem for high-dimensional structured covariance matrices within the asymptotic framework where the number of covariates p grows with the sample size n. For example, a shrinkage method shrinks the sample covariance matrix toward the identity matrix (Ledoit and Wolf, 2004). Several penalty methods have also been established to estimate large-scale covariance and precision matrices (Friedman et al., 2008; Rothman et al., 2008; Lam and Fan, 2009; Ravikumar et al., 2011). Under the assumption of bandability, i.e., the entries of the covariance matrix decrease to zero moving away from the diagonal, the banding and (its smooth version) tapering covariance estimators have been proposed (Wu and Pourahmadi, 2003; Bickel and Levina, 2008a; Bickel and Gel, 2011). Additionally, various thresholding covariance estimators, along with their theoretical properties such as optimal convergence rates under different matrix norms, have been studied (Karoui, 2008; Bickel and Levina, 2008b; Rothman et al., 2009; Cai and Liu, 2011; Cai and Yuan, 2012).

However, the sparsity assumption may be violated in many real-life applications, for instance, in studies of financial returns and house prices (Fan et al., 2016) and of interactive features in biomedical data (Chen et al., 2018). The spiked sparse covariance matrix has been explored by Johnstone (2001). Alternatively, the conditional sparsity assumption, i.e., a structural assumption combining sparsity with a low-rank approximation, has been proposed, and large covariance matrices have been consistently estimated using multi-factor models (Fan et al., 2008, 2013, 2018). Under another non-sparsity assumption, i.e., the Toeplitz structure (a special case of the bandable covariance structure), Cai et al. (2013) and Cai et al. (2016) studied the problem of estimating large covariance matrices and the optimal convergence rates. Expository literature on structured covariance matrix estimation can be found in Pourahmadi (2013), Wainwright (2019), and Fan et al. (2020).
1.4 Interconnected Community Structure

The growing availability of community detection, network division, and clustering techniques has led to enormous scientific discoveries, as well as challenges, in many studies involving large-scale networks in fields such as biology, biomedicine, plant science, computer science, finance, and social networks (Newman, 2006). Characterizing the structures or latent patterns in networked systems is crucial, as they may quantitatively describe the complex interactions among high-dimensional features or variables (Wu et al., 2021). Novel discoveries based on structured networks could deepen our understanding of the scientific mechanisms behind them; see various real examples in He et al. (2015), He et al. (2019), Pal et al. (2020), and Perrot-Dockès et al. (2022). Conventional community detection methods typically impose an independence assumption that the latent communities are uncorrelated, resulting in block-diagonal structures in networks, particularly when the number of features in the network is extremely high (Zhao, 2017; Lee and Wilkinson, 2019). However, modeling networks with independent communities may mislead subsequent statistical analysis due to oversimplification. Alternatively, an interconnected community structure is much more flexible for network analysis, allowing non-null connections among features at the community level. In other words, we may add non-null off-diagonal blocks to the structure of networks to represent correlations among communities. Wu et al. (2021) examined the relationships between independent communities and interconnected communities, as well as between overlapping communities and interconnected communities.

1.5 A New Interconnected Community Structure with Uniform Blocks

In the present dissertation, we propose an interconnected community structure that exists in a wide range of high-dimensional datasets. This structure can be implemented in population covariance matrices, correlation matrices, or weighted adjacency matrices in various fields, including genetics, proteomics, brain imaging, and RNA expression data, among many others. From a sample perspective, an interconnected community structure with well-organized blocks is widely observed. It exhibits several characteristics: (1) it is latent, meaning that network or community detection algorithms need to be employed to analyze the raw data in a preliminary study; (2) it is non-sparse, i.e., the elements of the covariance or correlation matrix have small but non-zero values; (3) it demonstrates an almost constant-valued block form, where the elements within each block are nearly identical, exhibiting low variability; and (4) it may contain singletons or isolated nodes. When singletons are identified, we refer to the proposed structure as an interconnected community structure with generalized well-organized blocks. In other words, we can form an interconnected community structure with well-organized blocks from a generalized version by extracting the features that are reordered by community detection algorithms and excluding the singletons. The population versions of the interconnected community structures with (generalized) well-organized blocks are referred to as interconnected community structures with (generalized) uniform blocks.

Yeast Genome Study. One example of the interconnected community structure with well-organized blocks (without singletons) is observed in the yeast genome study (Spellman et al., 1998).
We plot a heat map of the sample correlation matrix for 724 selected genes in Figure 1.1(B), where a K-medoids clustering algorithm was applied to the raw dataset; see Figure 1.1(A). Figure 1.1(C) depicts a population correlation matrix assumed to have 8 by 8 uniform blocks.

Proteomics Study. Another example is from a proteomics study (Yildiz et al., 2007), shown in Figure 1.1(D). The sample correlation matrix is analyzed, revealing an interconnected community structure with generalized well-organized blocks among 184 protein features using the NICE algorithm (Chen et al., 2018) in Figure 1.1(E). Extracting the structure without singletons results in a sample correlation matrix with 107 protein features, shown in Figure 1.1(F). This suggests a population correlation matrix with the corresponding uniform-block structure in Figure 1.1(G).

Seed Quality Study. A 923 by 923 sample correlation matrix is calculated from raw seed quality data (Perrot-Dockès et al., 2022), plotted in Figure 1.1(H). An interconnected community structure (without singletons), as shown in Figure 1.1(I), is revealed in the sample correlation matrix by a hierarchical clustering algorithm. This implies a population correlation matrix with the corresponding 7 by 7 uniform-block structure; see the illustration in Figure 1.1(J).

NMR Study. We observe a 249 by 249 sample correlation matrix calculated from a subset of raw NMR data (Ritchie et al., 2023) in Figure 1.1(K). An interconnected community structure with generalized well-organized blocks is revealed in the sample correlation matrix by a community detection algorithm (Chen et al., 2023); see Figure 1.1(L). A 170 by 170 sample correlation matrix (of residuals) is extracted in Figure 1.1(M), which implies a population correlation matrix with the corresponding 5 by 5 uniform-block structure in Figure 1.1(N).

Exposome and Metabolites Study. Another real-data example of the pattern of well-organized blocks in sample correlation matrices is found in a study involving the exposome and metabolites (ISGlobal, 2021). Figures 1.1(O) and (S) are the heat maps of the correlation matrices among exposome and metabolite features, respectively. In this study, there are 169 exposome variables and 221 metabolite variables for 1192 subjects. In particular, Figure 1.1(P) contains the heat map of the sample correlation matrix calculated by the clustering method (Wu et al., 2021) for the exposome variables, while Figure 1.1(T) contains the heat map of the sample correlation matrix calculated by the clustering method for the metabolite variables. Omitting the singletons, 89 out of 169 exposome variables are identified to form 9 by 9 well-organized blocks (Figure 1.1(Q)), and 141 out of 221 metabolite variables (Figure 1.1(U)) are identified to form 7 by 7 well-organized blocks. We may assume that their population correlation matrices have block forms, as shown in Figures 1.1(R) and (V), respectively.

EPSI Study. The presence of an interconnected community pattern is detected in the sample correlation matrix for an echo-planar spectroscopic imaging (EPSI) dataset, including 445 combinations of neurometabolites and regions-of-interest (ROIs), detected by the NICE algorithm (Chen et al., 2018) and shown in Figures 1.1(W) and (X). Among these 445 combinations, the sample correlation matrix of 227 combinations is extracted, implying that the underlying population correlation matrix has the structure of uniform blocks in Figures 1.1(Y) and (Z).

Other Studies.
In addition to the above studies, analogs of the interconnected community pattern were uncovered in a cutaneous melanoma dataset from a genome-wide association study (He et al., 2015), a neuroimaging activation and connectivity dataset, a DNA methylation dataset (Chen et al., 2016), and a gene expression profiling dataset from a host peripheral blood study (Chen et al., 2018). Recently, it was extracted from a multiple myeloma dataset (He et al., 2019), a citation network study (Pal et al., 2020), and an RNA-seq dataset from the cancer genome atlas study of acute myeloid leukemia (Wu et al., 2021).

Throughout the current research, we concentrate on population covariance matrices having interconnected community structures with uniform blocks, as illustrated in Figure 1.1(C), (G), (J), (N), (R), (V), and (Z). The proposed interconnected community structure with uniform blocks has a two-fold advantage. On the one hand, it dramatically reduces the number of covariance parameters, from p(p+1)/2 to K + (K+1)K/2, where K is the number of diagonal blocks; for example, with K = 7 communities among p = 107 features, as in the proteomics study of Chapter 3, this is a reduction from 5778 to 35 parameters. At the same time, it retains adequate flexibility to handle arbitrary dependency between covariates. On the other hand, the non-sparse uniform correlations between blocks might represent some stability (e.g., with respect to time or location in applications (Votaw et al., 1950)) and symmetry in experimental quantities, thereby improving interpretability.

1.6 Organization

The rest of the dissertation is organized as follows. The definitions and properties of uniform-block matrices and their applications in hypothesis testing are presented in Chapter 2. An estimation procedure for a covariance matrix that is assumed to have the uniform-block structure is developed under both low- and high-dimensional frameworks in Chapter 3. Multivariate outcomes having an interconnected community structure with uniform blocks are analyzed using autoregressive regression models in Chapter 4. A semi-confirmatory factor model with a uniform-block interconnected community structure is studied, and an estimation procedure is proposed for the unknown matrices and factor scores, in Chapter 5. All technical proofs and supplementary materials are appended at the end of the dissertation.

Figure 1.1: The plots in the first column are the heat maps of sample correlation matrices for the raw datasets. The plots in the second column are the heat maps of sample correlation matrices with reordered features/variables using community detection algorithms. We call these structures interconnected community structures with (generalized) well-organized blocks (if singletons exist). The plots in the third column are the heat maps of sample correlation matrices with features/variables that are extracted from the reordered ones. We call these structures interconnected community structures with well-organized blocks. The plots in the fourth column are the heat maps of population correlation matrices that are assumed to have interconnected community structures with uniform blocks.

Chapter 2: A New Representation of Uniform-Block Matrix and Applications

2.1 Introduction

Covariance matrices with specific structures or patterns have been extensively studied for their crucial roles in theoretical and practical applications of multivariate analysis. Numerous examples of structured covariance matrices have been employed in multivariate analysis.
Mauchly (1940) proposed a spherical covariance matrix with identical positive variance parameters along the diagonal and zero correlation parameters off the diagonal. Wilks (1946) extended the sphericity structure to have equal non-zero values for the off-diagonal correlation parameters, terming it the uniform (intraclass, or complete symmetry) structure in an application to parallel forms of a test in educational studies. Furthermore, Votaw (1948) expanded Wilks' complete symmetry structure by incorporating the interchangeability of mutually exclusive subsets of variables, introducing two types of compound symmetry covariance structures, which were utilized in medical experiments (Votaw et al., 1950). In addition to the spherical and intraclass symmetric structures, Olkin and Press (1969) proposed another covariance structure known as circular symmetry, which was applied in physical studies and time series analysis.

A number of technological breakthroughs have led to significantly large-dimensional variables in real-world practice, necessitating more complex covariance structures to reduce dimensionality. Customarily, various covariance structures have been developed, including bandability (Wu and Pourahmadi, 2003; Bickel and Levina, 2008a), sparsity (Karoui, 2008; Bickel and Levina, 2008b; Cai and Liu, 2011), and the combination of sparsity and low rank (Fan et al., 2008, 2011). Alternatively, to address the high dimensionality problem, covariance matrices can be assumed to have a block structure, where the number of unknown parameters is remarkably smaller than the original dimension. For instance, Rogers and Young (1974) generalized Wilks' intraclass structure to an arbitrary order in an educational study, such that all diagonal blocks have the same intraclass form, as do all off-diagonal blocks. Szatrowski (1976) studied covariance matrices with block compound symmetry structures, including type I and type II, and applied them to the analysis of educational testing data (Szatrowski, 1982). Olkin (1972) introduced circular symmetry structures in blocks and proposed a more general structure known as block circular symmetry for applications in physics. Roy and Leiva (2011), Roy et al. (2015), Roy et al. (2016), and Žežula et al. (2018) have extensively investigated a block structure referred to as blocked compound symmetry or equicorrelation (partition) (Leiva, 2007; Roy and Leiva, 2008), and applied it in brain imaging and bone densitometry studies. In this chapter, our focus is on a covariance or correlation matrix with a particular block structure that is commonly observed in empirical applications.

We concentrate on investigating a specific block pattern called the uniform-block (UB) structure, motivated by its numerous real-world applications in Chapter 1. Specifically, the UB structure is characterized by the diagonal and off-diagonal elements within each diagonal submatrix being equal to two constants and all elements within each off-diagonal submatrix being equal to a constant. A partitioned matrix having a uniform-block structure is called a uniform-block matrix. The concept of UB structures is not completely new and has been introduced by various researchers in different contexts. For instance, Geisser (1963) referred to it as the uniform case of order m and derived an information test statistic for the population mean vector when the covariance matrix of a normal population has a UB structure of order m = 1 or 2.
Morrison (1972) extended Geisser's information test statistic to a more general order. Huang and Yang (2010) investigated random sampling issues in the presence of a UB structure in the correlation matrix. Cadima et al. (2010) referred to it as a k-group block structure and studied the eigendecomposition of correlation matrices with a UB structure. Roustant and Deville (2017) named a correlation matrix with a UB structure a parametric block correlation matrix with p blocks, and provided necessary and sufficient conditions for its positive definiteness. Roustant et al. (2020) investigated Gaussian process regression problems using the name generalized compound symmetry block covariance matrices for UB matrices. Recently, Archakov and Hansen (2022) examined this structured matrix, referring to it as a block matrix with block partition, and provided canonical forms for both symmetric and nonsymmetric cases.

However, to the best of our knowledge, there have been limited comprehensive studies of the algebraic properties of UB matrices, which restricts their applications in various fields, including statistics, biometrics, economics, finance, and others. For example, Geisser (1963) was the first to derive the null distributions of the information test statistics concerning the population mean vector(s) for both single and multiple samples, given a covariance matrix with a 2 by 2 UB structure. Specifically, Geisser (1963) derived an analogous version of Hotelling's (generalized) $T^2$-statistic for the population mean vector based on a single normal sample, and an analogous version of Hotelling's (generalized) $T_0^2$-statistic, also known as the Hotelling-Lawley trace (Lawley, 1938; Hotelling, 1947, 1951), for testing the equality of population mean vectors based on multiple normal samples. Under the null hypotheses, both of Geisser's information test statistics follow distributions identical to linear combinations of independent F-variates. Although Geisser (1963) and Morrison (1972) also extended these results to a general case with an arbitrary number of diagonal blocks, proofs were omitted.

In this study, we present the algebraic properties of UB matrices through a novel block Hadamard product representation. In essence, given a vector consisting of the block sizes, a UB matrix can be uniquely determined by a diagonal matrix and a symmetric matrix of much smaller dimensions. Moreover, these two lower-dimensional matrices (together with the block-size vector) can be viewed as the "coordinates" of a UB matrix, since many important algebraic calculations on UB matrices depend only on their "coordinate" matrices. As a result, this representation greatly simplifies algebraic operations, including the power computation, inverse calculation, eigenvalue determination, and determinant evaluation of a UB matrix, by leveraging its "coordinate" matrices. As an application in statistics, we revisit and rigorously establish the exact null distributions of Geisser's information test statistics for a general order, including the single- and multiple-sample cases.

We organize the remainder of this chapter as follows. Section 2.2 presents the definitions and properties of UB matrices. Section 2.3 and Section 2.4 demonstrate the exact null distributions of Geisser's information test statistics for the one-sample and multiple-sample cases, respectively. Lastly, we summarize our findings and provide remarks and discussion in Section 2.5.
Technical proofs are given in Chapter A. Throughout this chapter, let ⊤ denote the transpose of a vector or matrix. Let $I_n, J_n \in \mathbb{R}^{n \times n}$ denote the identity matrix and the all-one matrix, respectively. Let $0_{n \times m}, 1_{n \times m} \in \mathbb{R}^{n \times m}$ denote the all-zero matrix and the all-one matrix, respectively. Let diag(·) and Bdiag(·) denote a diagonal matrix and a block-diagonal matrix, respectively. Let tr(·) and det(·) denote the trace and determinant of a square matrix, respectively. Let sum(·) denote the sum of all elements of a matrix. Let $\operatorname{corr}(\Sigma) = \operatorname{diag}^{-1/2}(\sigma_{11}, \ldots, \sigma_{pp}) \times \Sigma \times \operatorname{diag}^{-1/2}(\sigma_{11}, \ldots, \sigma_{pp})$ denote the correlation matrix of a covariance matrix Σ with diagonal elements $\sigma_{11}, \ldots, \sigma_{pp}$.

2.2 Uniform-Block Structure and Uniform-Block Matrix

In this section, we begin by defining a uniform-block structure and matrix. Next, we introduce a block Hadamard product representation for uniform-block matrices, which unveils their algebraic properties.

Definition 2.2.1 (partition-size vector and partitioned matrix by a partition-size vector). Given a p by p matrix $N \in \mathbb{R}^{p \times p}$ and a positive integer $K \in \mathbb{Z}_+$ such that K < p, we define: (1) a column vector $p = (p_1, \ldots, p_K)^\top \in \mathbb{Z}_+^K$ is a partition-size vector if $p_k > 1$ for every k and $p = p_1 + \cdots + p_K$; (2) the K by K partitioned matrix $(N_{kk'})$ of N is the partitioned matrix of N by p if the (k, k′)-th block $N_{kk'}$ has dimensions $p_k$ by $p_{k'}$ for $k, k' = 1, \ldots, K$.

Definition 2.2.2 (uniform-block structure and matrix). Given a partition-size vector $p = (p_1, \ldots, p_K)^\top$ and the K by K partitioned matrix $(N_{kk'})$ of a symmetric matrix N by p, we define: (1) the structure of $(N_{kk'})$ is a uniform-block structure if there exist real numbers $a_{kk}$ and $b_{kk'}$ satisfying that the diagonal block $N_{kk} = a_{kk} I_{p_k} + b_{kk} J_{p_k}$ for every k and the off-diagonal block $N_{kk'} = b_{kk'} 1_{p_k \times p_{k'}}$ with $b_{k'k} = b_{kk'}$ for every k ≠ k′; (2) the partitioned matrix $(N_{kk'})$ is a uniform-block matrix if it has a uniform-block structure. Furthermore, let $N[A, B, p] = (N_{kk'})$ denote this uniform-block matrix, where $A = \operatorname{diag}(a_{11}, \ldots, a_{KK})$ is a K by K diagonal matrix and $B = (b_{kk'})$ is a K by K symmetric matrix with $b_{k'k} = b_{kk'}$ for every k ≠ k′.

Remark (non-symmetric uniform-block structure and matrix). We impose the condition of symmetry on a uniform-block structure or matrix, as defined in Definition 2.2.2, because we will be considering covariance or correlation matrices with this structure. However, it is worth noting that a non-symmetric uniform-block structure and matrix can also be defined by removing the condition $b_{k'k} = b_{kk'}$ for every k ≠ k′, i.e., allowing B to be an arbitrary K by K matrix. Nonetheless, throughout this chapter, unless explicitly stated otherwise, we refer to a uniform-block structure or matrix as symmetric.

Following Definition 2.2.2, we introduce two important instances of UB matrices: by a pre-determined partition-size vector $p = (p_1, \ldots, p_K)^\top$, the partitioned matrices of an identity matrix $I_p$ and an all-one matrix $J_p$ are UB matrices. For simplicity, we will write $I[p]$ and $J[p]$ instead of $I[I_K, 0_{K \times K}, p]$ and $J[0_{K \times K}, J_K, p]$ throughout the chapter:

$$I[p] = I[I_K, 0_{K \times K}, p] = I_p = \begin{pmatrix} I_{p_1} & 0_{p_1 \times p_2} & \cdots & 0_{p_1 \times p_K} \\ 0_{p_2 \times p_1} & I_{p_2} & \cdots & 0_{p_2 \times p_K} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{p_K \times p_1} & 0_{p_K \times p_2} & \cdots & I_{p_K} \end{pmatrix},$$

$$J[p] = J[0_{K \times K}, J_K, p] = J_p = \begin{pmatrix} J_{p_1} & 1_{p_1 \times p_2} & \cdots & 1_{p_1 \times p_K} \\ 1_{p_2 \times p_1} & J_{p_2} & \cdots & 1_{p_2 \times p_K} \\ \vdots & \vdots & \ddots & \vdots \\ 1_{p_K \times p_1} & 1_{p_K \times p_2} & \cdots & J_{p_K} \end{pmatrix}.$$
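For concreteness, here is the smallest interesting case written out (this worked example is our addition, matching the setting of Figure 2.1): with $p = (2, 3)^\top$, $A = \operatorname{diag}(a_{11}, a_{22})$, and $B = (b_{kk'})$, Definition 2.2.2 gives the 5 by 5 UB matrix

$$N[A, B, p] = \begin{pmatrix} a_{11}+b_{11} & b_{11} & b_{12} & b_{12} & b_{12} \\ b_{11} & a_{11}+b_{11} & b_{12} & b_{12} & b_{12} \\ b_{12} & b_{12} & a_{22}+b_{22} & b_{22} & b_{22} \\ b_{12} & b_{12} & b_{22} & a_{22}+b_{22} & b_{22} \\ b_{12} & b_{12} & b_{22} & b_{22} & a_{22}+b_{22} \end{pmatrix},$$

whose diagonal blocks are $a_{11} I_2 + b_{11} J_2$ and $a_{22} I_3 + b_{22} J_3$, and whose off-diagonal blocks equal $b_{12} 1_{2 \times 3}$ and its transpose.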
Using the notations $I[p]$ and $J[p]$, we propose the following novel block Hadamard product representation of a UB matrix, which greatly simplifies algebraic calculations involving UB matrices.

Lemma 2.2.1 (block Hadamard product representation of a UB matrix). Given a pre-determined partition-size vector $p = (p_1, \ldots, p_K)^\top$, suppose the K by K partitioned matrix $(N_{kk'})$ of a p by p symmetric matrix N by p is a UB matrix $N[A, B, p]$, where $A = \operatorname{diag}(a_{11}, \ldots, a_{KK})$ is a diagonal matrix and $B = (b_{kk'})$ is a symmetric matrix with $b_{k'k} = b_{kk'}$ for every k ≠ k′. Then,

$$N[A, B, p] = A \circ I[p] + B \circ J[p]$$

holds uniquely for A and B, where ◦ denotes the block Hadamard product satisfying that $A \circ I[p]$ is the block-diagonal matrix $\operatorname{Bdiag}(a_{11} I_{p_1}, \ldots, a_{KK} I_{p_K})$ and $B \circ J[p]$ is the symmetric block matrix $(b_{kk'} 1_{p_k \times p_{k'}})$.

Remark (block Hadamard product representation). The matrix operator ◦ can be regarded as a specialized form of block Hadamard product that is tailored for block matrices, as discussed in Horn et al. (1991) and Günther and Klotz (2012). We provide an illustration of Lemma 2.2.1 in Figure 2.1. A proof of Lemma 2.2.1 is available in Chapter A. From the proof, it is evident that the block Hadamard product representation holds (except for uniqueness) when $p_k = 1$ for some k. Furthermore, the representation also holds for non-symmetric uniform-block matrices.

Figure 2.1: Illustration of the block Hadamard product representation of a UB matrix $\Sigma[A, B, p] = A \circ I[p] + B \circ J[p]$, where $p = (2, 3)^\top$, K = 2, p = 5, each square represents an element of $\Sigma[A, B, p]$, and different colors represent different values.

By Lemma 2.2.1, the block Hadamard product representation is crucial for a UB matrix, as it provides an explicit expression involving the diagonal matrix A, the symmetric matrix B, and the partition-size vector p. Therefore, we use the notation $N[A, B, p]$ in Definition 2.2.2 (instead of the usual notation $N(A, B, p)$) throughout this chapter to emphasize the importance of this representation. As demonstrated below, A, B, and p are sufficient and necessary for determining the expressions for the power, inverse (if it exists), and eigenvalues of a UB matrix $N[A, B, p]$.

Corollary 2.2.1 (algebraic properties of UB matrices). Given a common partition-size vector $p = (p_1, \ldots, p_K)^\top$, suppose $N = N[A, B, p]$, $N_1 = N_1[A_1, B_1, p]$, and $N_2 = N_2[A_2, B_2, p]$ are UB matrices with K by K diagonal matrices $A = \operatorname{diag}(a_{11}, \ldots, a_{KK})$, $A_1$, $A_2$ and K by K symmetric matrices $B = (b_{kk'})$, $B_1$, $B_2$. Let $\Delta = A + B \times P \in \mathbb{R}^{K \times K}$ with $P = \operatorname{diag}(p_1, \ldots, p_K)$.

(1) (Addition/Subtraction) Suppose $N^* = N_1 \pm N_2$; then the partitioned matrix of $N^*$ by p is a UB matrix, denoted by $N^*[A^*, B^*, p]$, where $A^* = A_1 \pm A_2$ and $B^* = B_1 \pm B_2$.

(2) (Product) Suppose $N^* = N_1 \times N_2$; in general, $N^*$ is not a UB matrix. But if $N_1$ and $N_2$ commute, i.e., $N_1 \times N_2 = N_2 \times N_1$, then $N^*$ is a UB matrix, denoted by $N^*[A^*, B^*, p]$, where $A^* = A_1 \times A_2 = A^{*\top}$ and $B^* = A_1 \times B_2 + B_1 \times A_2 + B_1 \times P \times B_2 = B^{*\top}$. In particular:

(2-1) (Square) Suppose $N^* = N \times N$; then the partitioned matrix of $N^*$ by p is a UB matrix, denoted by $N^*[A^*, B^*, p]$, where $A^* = A \times A$ and $B^* = A \times B + B \times A + B \times P \times B$.

(2-2) (Power) Suppose $N^* = N \times \cdots \times N = N^m$ with integer m ≥ 2; then the partitioned matrix of $N^*$ by p is a UB matrix, denoted by $N^*[A^{(m)}, B^{(m)}, p]$, where $A^{(1)} = A$, $B^{(1)} = B$, $A^{(m')} = A^{(m'-1)} \times A$, and $B^{(m')} = A^{(m'-1)} \times B + B^{(m'-1)} \times A + B^{(m'-1)} \times P \times B$ for $m' = 2, \ldots, m$.
(3) (Eigenvalues) $N[A, B, p]$ has p real eigenvalues in total: $a_{kk}$ with multiplicity $(p_k - 1)$ for $k = 1, \ldots, K$, and the remaining K eigenvalues are identical to those of Δ.

(4) (Determinant) $N[A, B, p]$ has determinant $\left(\prod_{k=1}^{K} a_{kk}^{p_k - 1}\right) \times \det(\Delta)$.

(5) (Inverse) Suppose N is invertible and $N^* = N^{-1}$; then the partitioned matrix of $N^*$ by p is a UB matrix, denoted by $N^*[A^*, B^*, p]$, where $A^* = A^{-1}$ and $B^* = -\Delta^{-1} \times B \times A^{-1}$.

(6) (Canonical Form) Let $\bar{p}_0 = 0$ and $\bar{p}_k = \sum_{k'=1}^{k} p_{k'}$ for $k = 1, \ldots, K$ (so $\bar{p}_K = p$), and let $\lambda_j$ denote the j-th eigenvalue of $N[A, B, p]$, where $\lambda_1 = \cdots = \lambda_{\bar{p}_1 - 1} = a_{11}$, $\lambda_{\bar{p}_1 + 1} = \cdots = \lambda_{\bar{p}_2 - 1} = a_{22}$, ..., $\lambda_{\bar{p}_{K-1} + 1} = \cdots = \lambda_{\bar{p}_K - 1} = a_{KK}$, and the rest $\lambda_{\bar{p}_1}, \lambda_{\bar{p}_2}, \ldots, \lambda_{\bar{p}_K}$ are identical to the eigenvalues of Δ (in decreasing order). Thus, there exists a p by p orthogonal matrix Γ satisfying $\Gamma \times N[A, B, p] \times \Gamma^\top = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$, and Γ can be constructed from K Helmert submatrices and K row vectors as follows:

$$\Gamma = \begin{pmatrix} \tilde{H}_1 & 0_{(p_1-1) \times p_2} & \cdots & 0_{(p_1-1) \times p_K} \\ \xi_{1,1} 1_{1 \times p_1} & \xi_{1,2} 1_{1 \times p_2} & \cdots & \xi_{1,K} 1_{1 \times p_K} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{(p_K-1) \times p_1} & 0_{(p_K-1) \times p_2} & \cdots & \tilde{H}_K \\ \xi_{K,1} 1_{1 \times p_1} & \xi_{K,2} 1_{1 \times p_2} & \cdots & \xi_{K,K} 1_{1 \times p_K} \end{pmatrix},$$

where $\tilde{H}_k \in \mathbb{R}^{(p_k-1) \times p_k}$ is the submatrix of a standard Helmert matrix of order $p_k$ without the first row (Lancaster, 1965) and $\xi_k = (\xi_{k,1}, \xi_{k,2}, \ldots, \xi_{k,K})^\top \in \mathbb{R}^{K \times 1}$ is the eigenvector (normalized to unit length) of Δ corresponding to the eigenvalue $\lambda_{\bar{p}_k}$ for every k.

Remark (sufficient and necessary condition for positive definiteness). Let $P = \operatorname{diag}(p_1, \ldots, p_K)$. We observe that the term $\Delta = A + B \times P$ plays a critical role in determining the eigenvalues, determinant, and inverse of an invertible UB matrix $N[A, B, p]$. Although Δ is not symmetric in general, it has K real eigenvalues because $\Delta = (A P^{-1} + B) P$ is similar to the real symmetric matrix $P^{1/2} (A P^{-1} + B) P^{1/2}$ (they have the same characteristic polynomial), which has K real eigenvalues. Therefore, $N[A, B, p]$ is positive definite or invertible if and only if A is positive definite (i.e., $a_{kk} > 0$ for every k) and Δ has K positive eigenvalues.

Remark (quadratic subspace). Consider the trace as an inner product, and let $\mathcal{A}$ denote the finite-dimensional Hilbert space of p by p real symmetric matrices. A subspace $\mathcal{B}$ of $\mathcal{A}$ is said to be a quadratic subspace of $\mathcal{A}$ if $B \in \mathcal{B}$ implies $B^2 \in \mathcal{B}$ (Seely, 1971). By the square property in Corollary 2.2.1, the collection of all UB matrices having a common partition-size vector forms a quadratic subspace. Quadratic subspaces are useful in studying the completeness of minimal sufficient statistics in a family of multivariate normal distributions (Seely, 1971, 1977; Zmyślony, 1980). For example, Szatrowski (1980) explored the relationship between the quadratic subspace and the explicit representation of maximum likelihood estimators for covariance matrices in a normal model. Roy et al. (2016) proved the optimal properties of the unbiased estimator that they derived for estimating a blocked compound symmetry covariance matrix.

Remark (algebraic properties for non-symmetric UB matrices). Most results in Corollary 2.2.1 also hold for non-symmetric uniform-block matrices. Specifically, the sum, difference, and product of two non-symmetric UB matrices are still non-symmetric UB matrices with the same expressions for $A^*$ and $B^*$ as in Corollary 2.2.1. The determinant and inverse (if it exists) of a non-symmetric UB matrix are also given by the same expressions for $A^*$ and $B^*$. However, it is worth noting that although a non-symmetric UB matrix $N[A, B, p]$ still has p eigenvalues, i.e., $a_{kk}$ with multiplicity $(p_k - 1)$ for every k and the K eigenvalues of Δ, some of the K eigenvalues of Δ may be complex. Subsequently, we may rearrange the K Helmert submatrices in Γ below the remaining K row vectors, resulting in a block-diagonal canonical form for a non-symmetric UB matrix (see Theorem 1 in Archakov and Hansen (2022)). If Δ is diagonalizable, the canonical form will have a diagonal structure, and Γ remains the same as in Corollary 2.2.1, where $\lambda_{\bar{p}_1}, \ldots, \lambda_{\bar{p}_K}$ may be ordered by decreasing real parts.

The results in Corollary 2.2.1 highlight the following two-fold advantage of using the block Hadamard product representations of UB matrices. First, calculations on K by K matrices A and B can replace calculations on the larger p by p matrix N, where K is typically much smaller than p; e.g., a proteomics study has K = 7 and p = 107 and a brain imaging study has K = 5 and p = 227 in Chapter 3. This reduction in matrix size can significantly improve computational efficiency. Second, results involving the addition or subtraction of UB matrices with a common partition-size vector, as well as operations such as taking the square (or power) and computing the eigenvalues, determinant, and inverse (if it exists) of a UB matrix, can be expressed in terms of the "coordinates" A, B, and p. These results greatly facilitate the use of UB matrices in various fields of application. For example, in Chapter 3, we propose the best-unbiased covariance- and precision-matrix estimators when the number of diagonal blocks K is fixed, as well as a modified hard-thresholding covariance matrix estimator when K grows with the sample size.
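As a quick numerical illustration of these coordinate-based computations, the following sketch checks the eigenvalue, determinant, and inverse formulas of Corollary 2.2.1 against direct dense computations. This is our minimal example, assuming only numpy; the helper name `build_ub` and all numeric values are illustrative, not from the dissertation.

```python
# Minimal numerical check of Corollary 2.2.1 (our sketch; assumes numpy,
# and the helper name `build_ub` plus all numeric values are illustrative).
import numpy as np

def build_ub(A, B, p):
    """Assemble the p x p UB matrix N[A, B, p] = A o I[p] + B o J[p]."""
    K = len(p)
    N = np.block([[B[k, l] * np.ones((p[k], p[l])) for l in range(K)]
                  for k in range(K)])
    N[np.diag_indices_from(N)] += np.repeat(np.diag(A), p)  # add a_kk on the diagonal
    return N

p = np.array([2, 3, 4])                       # partition-size vector, so p = 9
A = np.diag([2.0, 3.0, 4.0])                  # diagonal "coordinate" matrix
B = np.array([[0.5, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.3, 0.7]])               # symmetric "coordinate" matrix
N = build_ub(A, B, p)
Delta = A + B @ np.diag(p)                    # Delta = A + B x P

# (3) Eigenvalues: a_kk with multiplicity p_k - 1, plus the K eigenvalues of Delta.
eigs = np.concatenate([np.repeat(np.diag(A), p - 1), np.linalg.eigvals(Delta).real])
assert np.allclose(np.sort(eigs), np.linalg.eigvalsh(N))

# (4) Determinant: (prod_k a_kk^(p_k - 1)) * det(Delta).
assert np.isclose(np.prod(np.diag(A) ** (p - 1)) * np.linalg.det(Delta),
                  np.linalg.det(N))

# (5) Inverse: coordinates A* = A^{-1} and B* = -Delta^{-1} B A^{-1}.
A_inv = np.linalg.inv(A)
B_inv = -np.linalg.solve(Delta, B) @ A_inv
assert np.allclose(build_ub(A_inv, B_inv, p), np.linalg.inv(N))
```

All three checks manipulate only the K by K coordinates (here K = 3) except when forming the dense matrix for verification, which is exactly the computational saving the corollary promises.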
The results in Corollary 2.2.1 highlight a two-fold advantage of using the block Hadamard product representations of UB matrices. First, calculations on the K by K matrices A and B can replace calculations on the larger p by p matrix N, where K is typically much smaller than p; e.g., a proteomics study in Chapter 3 has $K = 7$ and $p = 107$, and a brain imaging study has $K = 5$ and $p = 227$. This reduction in matrix size can significantly improve computational efficiency. Second, the results involving addition or subtraction of UB matrices with a common partition-size vector, as well as operations such as taking the square (or power) and computing the eigenvalues, determinant, and inverse (if it exists) of a UB matrix, can be expressed in terms of the "coordinates" A, B, and p. These results greatly facilitate the use of UB matrices in various fields of application. For example, in Chapter 3, we propose the best-unbiased covariance- and precision-matrix estimators when the number of diagonal blocks K is fixed, as well as a modified hard-thresholding covariance matrix estimator when K grows with the sample size.

Before proceeding to the hypothesis testing problems in the next sections, we specify the relationships between a covariance matrix and its precision and correlation matrices.

Corollary 2.2.2 (covariance matrix with a UB structure). Given a partition-size vector $p = (p_1, \ldots, p_K)^\top$, suppose $\Sigma = \Sigma[A, B, p]$ is a p by p positive definite covariance matrix with a uniform-block structure, where $A = \operatorname{diag}(a_{11}, \ldots, a_{KK})$ and $B = (b_{kk'})$ with $b_{k'k} = b_{kk'}$ for every $k \neq k'$. Then, the partitioned matrix of $\Theta = \Sigma^{-1}$ by p is a UB matrix, denoted by $\Theta[A_\Theta, B_\Theta, p]$, and the partitioned matrix of $\Xi = \operatorname{corr}(\Sigma)$ by p is a UB matrix, denoted by $\Xi[A_\Xi, B_\Xi, p]$, where
\[
\begin{cases}
A_\Theta = A^{-1} \\
B_\Theta = -\Delta^{-1} \times B \times A^{-1}
\end{cases},
\qquad
\begin{cases}
A_\Xi = C^{-1/2} \times A \times C^{-1/2} \\
B_\Xi = C^{-1/2} \times B \times C^{-1/2}
\end{cases},
\]
with $\Delta = A + B \times P$, $P = \operatorname{diag}(p_1, \ldots, p_K)$, $C = \operatorname{diag}(c_{11}, \ldots, c_{KK})$, and $c_{kk} = a_{kk} + b_{kk}$ for every k.
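Continuing the earlier sketch (same illustrative names, reusing ub_matrix, A, B, and p from above), the precision and correlation coordinates of Corollary 2.2.2 can be computed with K by K arithmetic only:

```python
import numpy as np

# Coordinates of Theta = Sigma^{-1} and Xi = corr(Sigma), per Corollary 2.2.2
P = np.diag(p)
Delta = A + B @ P
A_Theta = np.diag(1.0 / np.diag(A))
B_Theta = -np.linalg.inv(Delta) @ B @ A_Theta
C_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(A) + np.diag(B)))   # c_kk = a_kk + b_kk
A_Xi = C_inv_sqrt @ A @ C_inv_sqrt
B_Xi = C_inv_sqrt @ B @ C_inv_sqrt

Sigma = ub_matrix(A, B, p)
assert np.allclose(np.linalg.inv(Sigma), ub_matrix(A_Theta, B_Theta, p))
D = np.diag(1.0 / np.sqrt(np.diag(Sigma)))                     # corr(Sigma) = D Sigma D
assert np.allclose(D @ Sigma @ D, ub_matrix(A_Xi, B_Xi, p))
```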
2.3 Testing A Specific Mean for One-Sample

In the case where the number of diagonal blocks $K = 1$ or $K = 2$, Geisser (1963) proposed an information test statistic for testing a specific mean vector based on a multivariate normal sample and derived its exact null distribution in closed form. The distribution of Geisser's information test statistic under the null hypothesis is identical to the distribution of a sum of several independent F-variates. For the general case $K > 2$, however, Geisser (1963) provided an algorithm for calculating the information test statistic and explicitly formulated the exact null distribution, but omitted the proofs. In this section, we present the exact null distribution of Geisser's one-sample information test statistic using the notations of UB matrices: this exact null distribution is equivalent to the distribution of a linear combination of mutually independent F-variates, where the last variate is exactly Hotelling's $T^2$ statistic. Specifically, given p-dimensional normal vectors $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \Sigma[A, B, p])$, the null and alternative hypotheses are given by
\[
H_0 : \mu = \mu_0 \quad \text{versus} \quad H_1 : \mu \neq \mu_0, \tag{2.3.1}
\]
where the covariance matrix is known to have a UB structure, $A = \operatorname{diag}(a_{11}, \ldots, a_{KK})$ is an unknown diagonal matrix, $B = (b_{kk'})$ is an unknown symmetric matrix with $b_{k'k} = b_{kk'}$ for every $k \neq k'$, $p = (p_1, \ldots, p_K)^\top$ is a known partition-size vector, and $\mu_0 \in \mathbb{R}^p$ is a pre-determined vector. To guarantee the positive definiteness of $\Sigma[A, B, p]$, we assume that A is positive definite and that $\Delta = A + B \times P$ has positive eigenvalues only, with $P = \operatorname{diag}(p_1, \ldots, p_K)$.

Before deriving Geisser's information test statistic for a specific mean vector, we introduce closed-form estimators of A and B based on a multivariate normal sample. Let $\bar{X} = n^{-1}(X_1 + \cdots + X_n)$, $S = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^\top$, and $(S_{kk'})$ denote the sample mean, the (unbiased) sample covariance matrix, and the partitioned matrix of S by p, respectively. If the sample size is larger than the total number of unknown covariance parameters, i.e., $n > K + (K+1)K/2$, then we can obtain the best-unbiased estimators of A and B, denoted by $\widehat{A} = \operatorname{diag}(\widehat{a}_{11}, \ldots, \widehat{a}_{KK})$ and $\widehat{B} = (\widehat{b}_{kk'})$ with $\widehat{b}_{k'k} = \widehat{b}_{kk'}$ for every $k \neq k'$, respectively, where
\[
\widehat{a}_{kk} = \frac{p_k \times \operatorname{tr}(S_{kk}) - \operatorname{sum}(S_{kk})}{p_k \times (p_k - 1)},
\qquad
\widehat{b}_{kk'} =
\begin{cases}
\dfrac{\operatorname{sum}(S_{kk'})}{p_k \times p_{k'}}, & k \neq k' \\[1.5ex]
\dfrac{\operatorname{sum}(S_{kk}) - \operatorname{tr}(S_{kk})}{p_k \times (p_k - 1)}, & k = k'
\end{cases} \tag{2.3.2}
\]
for every k and k' (see the details in Chapter 3). It is clear that $\widehat{b}_{kk}$ is exactly the average of the off-diagonal elements within the (k, k)-th diagonal block of $(S_{kk'})$; $\widehat{a}_{kk}$ is exactly the average of the diagonal elements within the (k, k)-th diagonal block minus $\widehat{b}_{kk}$; and $\widehat{b}_{kk'}$ ($k \neq k'$) is the average of all elements within the (k, k')-th off-diagonal block. By Corollary 2.2.2, the plug-in estimators of $\Sigma[A, B, p]$ and $\Theta[A_\Theta, B_\Theta, p]$ are $\widehat{\Sigma}[\widehat{A}, \widehat{B}, p]$ and $\widehat{\Theta}[\widehat{A}_\Theta, \widehat{B}_\Theta, p]$, where $\widehat{A}$ is assumed to be positive definite, $\widehat{\Delta} = \widehat{A} + \widehat{B} \times P$ is assumed to have positive eigenvalues only, and $\widehat{A}_\Theta$ and $\widehat{B}_\Theta$ are given by $\widehat{A}_\Theta = \widehat{A}^{-1}$ and $\widehat{B}_\Theta = -\widehat{\Delta}^{-1} \times \widehat{B} \times \widehat{A}^{-1}$.

Theorem 2.3.1 (exact null distribution of Geisser's one-sample information test statistic). Geisser's one-sample test statistic for the hypotheses in (2.3.1) is given by
\[
U = n \times (\bar{X} - \mu_0)^\top \times \widehat{\Theta}[\widehat{A}_\Theta, \widehat{B}_\Theta, p] \times (\bar{X} - \mu_0).
\]
Under $H_0$, it follows a distribution U that is identical with the distribution of
\[
\sum_{k=1}^{K} (p_k - 1)\, F^{(k)}_{(p_k - 1),\, (p_k - 1)(n - 1)} + T^2,
\]
where $T^2 = K(n-1)(n-K)^{-1} F^{(K+1)}_{K,\, n-K}$ is the Hotelling $T^2$-statistic and $F^{(k)}_{\mathrm{df}_1, \mathrm{df}_2}$ are $(K+1)$ mutually independent F-variates with degrees of freedom $\mathrm{df}_1$ and $\mathrm{df}_2$ for $k = 1, \ldots, K+1$.

Remark (information test statistic U). Geisser's information criterion was proposed by Geisser (1963) to test the hypotheses in (2.3.1) using analysis of variance tables. In the distribution U, the last variate is precisely Hotelling's (generalized) $T^2$-statistic, which is among the most widely used statistics in multivariate inference (Anderson, 1992).
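Because the null distribution in Theorem 2.3.1 is a known mixture of independent F-variates, its percentiles can be approximated by direct Monte Carlo simulation. The sketch below (function name and the sample sizes are our own, hypothetical choices) draws from the distribution U; its upper-$\alpha$ percentile $U(\alpha)$ is what the simultaneous confidence intervals in the following remark require.

```python
import numpy as np
from scipy import stats

def sample_null_U(p_sizes, n, size=100_000, seed=0):
    """Monte Carlo draws from the null distribution of U (Theorem 2.3.1):
    sum_k (p_k - 1) F_{(p_k-1),(p_k-1)(n-1)} + K(n-1)/(n-K) F_{K,n-K}."""
    rng = np.random.default_rng(seed)
    K = len(p_sizes)
    u = np.zeros(size)
    for pk in p_sizes:
        u += (pk - 1) * stats.f.rvs(pk - 1, (pk - 1) * (n - 1),
                                    size=size, random_state=rng)
    u += K * (n - 1) / (n - K) * stats.f.rvs(K, n - K, size=size,
                                             random_state=rng)
    return u

u_null = sample_null_U(p_sizes=(2, 3), n=50)
print(np.quantile(u_null, 0.95))          # estimate of U(0.05)
```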
Remark (related distributions under $H_0$). (1) As $n \to \infty$, U asymptotically follows a chi-square distribution $\chi^2_p$, where $p = p_1 + \cdots + p_K$ (Geisser, 1963). (2) Given a significance level $\alpha$ and an arbitrary vector $a \in \mathbb{R}^p$, the $100(1-\alpha)\%$ simultaneous confidence interval for the linear function $a^\top \mu$ has the form
\[
a^\top \bar{X} \pm \sqrt{U(\alpha) \times a^\top \widehat{\Sigma}[\widehat{A}, \widehat{B}, p]\, a / n},
\]
where $U(\alpha)$ denotes the upper $\alpha$-th percentile of the distribution U (Morrison, 1972). (3) An approximation to U was suggested as the distribution of $C_1 F_{(p, C_2)}$ by Morrison (1971), where the scale coefficient $C_1$ and the second degree of freedom $C_2$ are determined by equating the first two cumulants of $C_1 F_{(p, C_2)}$ to those of U. The specific values of $C_1$ and $C_2$ for $K = 2$ can be found in Spjøtvoll (1972) and Young (1976). Furthermore, Dyer (1982) considered the distribution of a sum of generalized F-variates, and Lee and Hu (1996) extended the above result to independent F-variates with arbitrary coefficients and degrees of freedom.

Remark (related distributions under $H_1$). (1) The non-null distribution is analogous to the null distribution U, except that one or more F-variates are non-central (Geisser, 1963). (2) An approximate non-null distribution of U can be represented as $D_1 F_{(p, D_2)}(\delta)$, where the noncentrality parameter is $\delta = n (\mu - \mu_0)^\top \Theta[A_\Theta, B_\Theta, p] (\mu - \mu_0)$, and the scale coefficient $D_1$ and the second degree of freedom $D_2$ are determined by equating the first two cumulants of $D_1 F_{(p, D_2)}(\delta)$ to those of the non-null distribution of U.

2.4 Testing the Equality of Means for Multiple-Sample

We consider a general M-sample mean test ($M > 1$), where the samples are drawn from M normal distributions with means $\mu^{(m)} \in \mathbb{R}^p$, for $m = 1, \ldots, M$, and an equal covariance matrix with a UB structure $\Sigma[A, B, p] \in \mathbb{R}^{p \times p}$. Specifically, suppose the m-th sample $X^{(m)}_1, \ldots, X^{(m)}_{n_m}$ has a size of $n_m$, for $m = 1, 2, \ldots, M$. The grand sample size is denoted by $n = n_1 + \cdots + n_M$, and we assume $n > \max\{M, K + (K+1)K/2\}$. Let
\[
\bar{X}^{(m)} = n_m^{-1}\left(X^{(m)}_1 + \cdots + X^{(m)}_{n_m}\right), \qquad
\bar{X} = n^{-1}\left(n_1 \bar{X}^{(1)} + \cdots + n_M \bar{X}^{(M)}\right),
\]
and
\[
S = (n - M)^{-1} \sum_{m=1}^{M} \sum_{j=1}^{n_m} \left(X^{(m)}_j - \bar{X}^{(m)}\right)\left(X^{(m)}_j - \bar{X}^{(m)}\right)^\top
\]
denote the m-th sample mean, the grand sample mean, and the (pooled) unbiased estimator of the common covariance matrix, respectively. The estimators $\widehat{A}$ and $\widehat{B}$ can be obtained similarly to those in (2.3.2), yielding the estimators $\widehat{A}_\Theta$, $\widehat{B}_\Theta$, and $\widehat{\Theta}[\widehat{A}_\Theta, \widehat{B}_\Theta, p]$, respectively. Therefore, the null and alternative hypotheses can be written as
\[
H^{(M)}_0 : \mu^{(1)} = \cdots = \mu^{(M)} \quad \text{versus} \quad H^{(M)}_1 : \mu^{(m')} \neq \mu^{(m)} \text{ for some } m' \neq m. \tag{2.4.1}
\]

Theorem 2.4.1 (exact null distribution of Geisser's multiple-sample information test statistic). Geisser's multiple-sample information test statistic for the hypotheses in (2.4.1) is given by
\[
U_M = \sum_{m=1}^{M} n_m \left(\bar{X}^{(m)} - \bar{X}\right)^\top \widehat{\Theta}[\widehat{A}_\Theta, \widehat{B}_\Theta, p] \left(\bar{X}^{(m)} - \bar{X}\right).
\]
Under $H^{(M)}_0$, it follows a distribution $U_M$ that is identical with the distribution of
\[
\sum_{k=1}^{K} (M - 1)(p_k - 1)\, F^{(k)}_{(M-1)(p_k-1),\, (n-M)(p_k-1)} + T_0^2,
\]
where $T_0^2$ is the Hotelling $T_0^2$-statistic and $F^{(k)}_{\mathrm{df}_1, \mathrm{df}_2}$ are K mutually independent F-variates (and independent of $T_0^2$) with degrees of freedom $\mathrm{df}_1$ and $\mathrm{df}_2$ for $k = 1, \ldots, K$.
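Given the plug-in UB precision estimate, the statistic $U_M$ itself is a few lines of linear algebra. The sketch below uses our own hypothetical helper name; Theta_hat stands for $\widehat{\Theta}[\widehat{A}_\Theta, \widehat{B}_\Theta, p]$, assembled from the pooled estimators (e.g., via a ub_matrix-style helper as sketched earlier).

```python
import numpy as np

def geisser_multi_sample_stat(samples, Theta_hat):
    """U_M = sum_m n_m (xbar_m - xbar)^T Theta_hat (xbar_m - xbar)."""
    ns = np.array([x.shape[0] for x in samples])         # group sizes n_1, ..., n_M
    means = np.stack([x.mean(axis=0) for x in samples])  # M x p group means
    grand = ns @ means / ns.sum()                        # grand mean, weighted by n_m
    return float(sum(n_m * (m - grand) @ Theta_hat @ (m - grand)
                     for n_m, m in zip(ns, means)))
```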
Remark (information test statistic $U_M$). The Hotelling (generalized) $T_0^2$-statistic, also known as the Hotelling-Lawley trace, is commonly used to test the equality of multiple population means, assuming these multiple normal populations have the same (arbitrary) population covariance matrix (Lawley, 1938; Hotelling, 1947, 1951). However, the exact null and non-null distributions of Hotelling's $T_0^2$-statistic are intractable, and various approximations have been proposed in the literature (Ito, 1956, 1960; Pillai and Young, 1971; Siotani, 1971; McKeon, 1974).

2.5 Discussion

In this chapter, we concentrate on the algebraic properties of a specific type of block matrices, in which each block is uniform. We chose to parameterize matrices in this way for two key reasons. First, the uniform-block pattern has been widely observed in a large body of large-scale biological data. Second, from a biological perspective, the variables that are clustered into the same community may exhibit stochastic equivalence or comparable patterns, while variables from different communities may have coherent connections at the community level. Compared to the conventional diagonal or block-diagonal structures, the proposed uniform-block structure offers more flexibility and is better suited for real data analysis, since the information contained in the non-zero off-diagonal blocks can potentially provide valuable insights into the underlying scientific mechanisms.

In addition to defining a uniform-block structure, we have discovered a unique block Hadamard product representation for a uniform-block matrix. This representation plays an important role because it allows for the transformation of a large-scale uniform-block matrix into two lower-dimensional matrices and an integer-valued vector, and it simplifies the computations related to uniform-block matrices. With these algebraic properties, uniform-block matrices are applicable to various statistical problems: for example, covariance estimation with the uniform-block structure in Chapter 3, hypothesis testing with the information test statistics, the multivariate linear regression models in Chapter 4, and the confirmatory factor analysis models in Chapter 5. In conclusion, a uniform-block matrix (or structure), its associated algebraic properties, and the block Hadamard product representation have broad applications in a range of fields, including linear algebra, statistics, economics, and many others.

Chapter 3: Covariance Matrix Estimation for High-Throughput Biomedical Data with Dependence Structure of Interconnected Communities

3.1 Introduction

Technological innovations in biomedicine have facilitated the generation of high-dimensional datasets with simultaneous measurements of up to millions of biological features (Fan and Lv, 2008). In the past few decades, numerous statistical methods have been developed to analyze these large-dimensional datasets. Estimating a covariance matrix (or a precision matrix) is fundamental to these analyses (Fan et al., 2016; Cai et al., 2016; Wainwright, 2019) because a covariance matrix not only describes the complex interactive relations among variables but also leads to accurate inferential and predictive results for clinical outcomes (He et al., 2019; Ke et al., 2022). Since the dimensionality of the variables is much larger than the sample size, we resort to advanced statistical methods rather than traditional covariance estimation strategies (Johnstone, 2001; Johnstone and Paul, 2018). Shrinkage and thresholding methods can provide reliable and robust covariance estimators under the sparsity assumption (Ledoit and Wolf, 2004; Bickel and Levina, 2008b; Rothman et al., 2009; Cai and Liu, 2011).
In addition, prior knowledge of a covariance structure can greatly improve the accuracy of estimation and statistical inference (Fan, 2005; Bien, 2019). For example, recent methods can accommodate Toeplitz, banded, and block-diagonal covariance structures (Cai et al., 2013; Bickel and Levina, 2008a; Devijver and Gallopin, 2018).

In the present research, we consider a widely prevalent covariance structure in a large body of high-dimensional datasets (see the examples in Figure 3.1, extra examples in Chapter B, and more examples in Figure 1.1). A well-organized block pattern is widely observed in most commonly used biomedical data types, including genetics, proteomics, brain imaging, and RNA expression data, among many others (Spellman et al., 1998; Yildiz et al., 2007; Chiappelli et al., 2019; Chen et al., 2016; He et al., 2019, 2015; Wu et al., 2021). This block pattern exhibits several properties of highly organized networks. For example, there is high modularity, as some variables are clustered into multiple, coherently correlated communities; there is small-worldness, as these communities are interconnected; and the network is scale-free, as the remaining variables are isolated singletons (see the top parts of Figure 3.1(B) and (C)) (Newman, 2006). Therefore, we can specify this well-organized block pattern by assigning the high-dimensional variables to multiple interconnected communities (see the middle parts of Figure 3.1(B) and (C)), which are more informative, and a set of detected singletons. The interconnected community structures might not be directly available from the high-dimensional biomedical data, but they can be estimated by several clustering algorithms and network detection methods (Magwene, 2021; Wu et al., 2021).

Although there are potential benefits to leveraging the estimated interconnected community structure to enhance the estimation of large covariance matrices, existing statistical methods fall short of establishing a connection between this structure and the covariance or precision parameters and of providing precise estimates. To address this methodological gap, we propose a novel statistical procedure that yields closed-form estimators of large covariance and precision matrices and supports robust statistical inference.

Figure 3.1: A: top, heat map of the sample correlation matrix calculated by a K-medoids clustering algorithm for Spellman's yeast genome study, showing 8 by 8 well-organized blocks (without singletons) in the structure; bottom, an assumed population correlation matrix with an 8 by 8 uniform-block structure. B and C: top, heat maps of the sample correlation matrices calculated by a network detection algorithm for proteomics and brain imaging datasets, respectively, showing the block patterns (with singletons); middle, heat maps of the regions in the black frames in the top parts, showing 7 by 7 and 5 by 5 well-organized blocks, respectively; bottom, assumed population correlation matrices with a 7 by 7 and a 5 by 5 uniform-block structure, respectively. D: top, illustration of a network model with 5 communities; bottom, a corresponding population matrix with a 5 by 5 uniform-block structure.

We propose a parametric covariance model that subdivides the covariance matrix into blocks or submatrices and assigns each block to either a community or an interconnection between two communities based on the observed interconnected community structure.
By linking the covariance parameters to the underlying network topological structure, we obtain a closed-form estimator of each covariance parameter (using only the elements within the corresponding block) and establish the asymptotic properties of the proposed estimators. Specifically, we derive the explicit estimators using advanced matrix theory, including the block Hadamard product representation of a covariance matrix with the above structure, together with thresholding covariance regularization in the high-dimensional setting, where the number of covariance parameters in all blocks exceeds the sample size.

Our method makes at least three novel contributions. First, we have developed a fast (closed-form) and accurate procedure for estimating a large covariance (and precision) matrix with a particular structure that is applicable to various high-throughput biomedical data, including genomics, metabolomics, proteomics, neuroimaging data, and many others. Our method quantitatively characterizes the interconnected community structure by estimating parameters in both the diagonal submatrices (i.e., communities) and the off-diagonal submatrices (i.e., interactions between communities), and thus better reveals the interactive mechanisms of a complex biosystem. By utilizing this interconnected community structure, our large covariance matrix estimation procedure also outperforms comparable methods in terms of accuracy and is numerically robust to model misspecification. Consequently, our approach can lead to a more precise selection of biological features of interest (e.g., cancer-related gene expressions) in the context of multiple testing, which relies on accurate and reliable large covariance- and precision-matrix estimation (Fan et al., 2012; Fan and Han, 2017). Second, we have derived the exact variance estimators of the covariance parameter estimators and established their asymptotic properties, which enables us to evaluate covariance patterns and provide confidence intervals. Lastly, we have extended our method to accommodate scenarios where the number of diagonal blocks exceeds the sample size, allowing for scalability to ultra-high-dimensional datasets.

The rest of the chapter is organized as follows. Section 3.2 introduces our proposed method: we mathematically define the uniform-block structure and the uniform-block matrix in Section 3.2.1; in Section 3.2.2, we derive the best-unbiased covariance- and precision-matrix estimators for small K by taking advantage of the block Hadamard product representation; and we generalize the estimating procedure to a diverging number of diagonal blocks K in Section 3.2.3. Section 3.3 contains thorough numerical evaluations of our method under various scenarios. Section 3.4 illustrates the use of our method in two real-world applications. We provide a discussion in Section 3.5. Additional data examples, the exact covariance estimators for the covariance parameters in the blocks, more simulation studies, and all the technical proofs are given in Chapter B.

3.2 Methodology

3.2.1 A parametric covariance model with a uniform-block structure

Suppose $X \in \mathbb{R}^{n \times p}$ is an n by p observed data matrix containing n independent and identically distributed p-variate normal vectors $X_1, \ldots, X_n \sim N(\mu, \Sigma)$ with mean $\mu := E(X_1) \in \mathbb{R}^p$, positive definite covariance matrix $\Sigma := \operatorname{cov}(X_1) \in \mathbb{R}^{p \times p}$ (denoted by $\Sigma \succ 0$), and precision matrix $\Omega := \Sigma^{-1}$.
Let $S^*$ and S denote the biased and unbiased versions of the sample covariance matrix, respectively:
\[
S := (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^\top
\quad \text{and} \quad
S^* := n^{-1}(n-1) S,
\]
with $\bar{X} := n^{-1}(X_1 + \cdots + X_n)$, where $M^\top$ denotes the transpose of a matrix (or a vector) M. Moreover, we require that the covariance matrix have the uniform-block structure described below, whose parameterization proceeds in two steps.

First, we use a vector to characterize the community sizes. Specifically, given the dimension p of the covariance matrix $\Sigma$ and the number of communities K, let $p_1, \ldots, p_K$ be positive integers satisfying $p_k > 1$ ($k = 1, \ldots, K$) and $p = p_1 + \cdots + p_K$, and let $p := (p_1, \ldots, p_K)^\top$ be the partition-size vector, which is assumed to be fixed throughout the paper. Given p, we can express $\Sigma$ partitioned by p in block form:
\[
(\Sigma_{kk'}) :=
\begin{pmatrix}
\Sigma_{11} & \Sigma_{12} & \cdots & \Sigma_{1K} \\
\Sigma_{21} & \Sigma_{22} & \cdots & \Sigma_{2K} \\
\vdots & \vdots & \ddots & \vdots \\
\Sigma_{K1} & \Sigma_{K2} & \cdots & \Sigma_{KK}
\end{pmatrix}, \tag{3.2.1}
\]
where $\Sigma_{kk'} \in \mathbb{R}^{p_k \times p_{k'}}$ ($k, k' = 1, \ldots, K$). Second, following (3.2.1), we specify the diagonal submatrices $\Sigma_{kk} := a_{kk} I_{p_k} + b_{kk} J_{p_k}$ for every k and the off-diagonal submatrices $\Sigma_{kk'} := b_{kk'} 1_{p_k \times p_{k'}}$ with $b_{kk'} = b_{k'k}$ for every $k \neq k'$, where $I_s$, $J_s$, and $1_{s \times t}$ are the s by s identity matrix, the s by s all-one matrix, and the s by t all-one matrix, respectively. Since $K < p_1 + \cdots + p_K = p$, we can represent the large covariance matrix $\Sigma$ by a smaller diagonal matrix $A := \operatorname{diag}(a_{11}, \ldots, a_{KK})$, a smaller symmetric matrix $B := (b_{kk'})$ with $b_{kk'} = b_{k'k}$ for every $k \neq k'$, and the known vector p:
\[
\Sigma(A, B, p) := (\Sigma_{kk'}) =
\begin{pmatrix}
a_{11} I_{p_1} + b_{11} J_{p_1} & b_{12} 1_{p_1 \times p_2} & \cdots & b_{1K} 1_{p_1 \times p_K} \\
b_{21} 1_{p_2 \times p_1} & a_{22} I_{p_2} + b_{22} J_{p_2} & \cdots & b_{2K} 1_{p_2 \times p_K} \\
\vdots & \vdots & \ddots & \vdots \\
b_{K1} 1_{p_K \times p_1} & b_{K2} 1_{p_K \times p_2} & \cdots & a_{KK} I_{p_K} + b_{KK} J_{p_K}
\end{pmatrix}. \tag{3.2.2}
\]

In this chapter, we say that the pattern in (3.2.2) is a uniform-block structure, and if $\Sigma(A, B, p)$ has the structure in (3.2.2), it is a uniform-block matrix. This covariance parameterization strategy based on $\Sigma_{kk}$ and $\Sigma_{kk'}$ has been widely used in the statistics literature. For example, generalized estimating equations with a compound symmetry working correlation structure and linear mixed-effects models with a random intercept both have this pattern. In practice, this parameterization strategy can characterize the covariance structure well using a parsimonious model (as shown in Figure 3.1). In Section 3.3, we demonstrate that the performance of this parameterization is robust under misspecification and matrix perturbation. By building this parsimonious and effective covariance-matrix specification, we can develop reliable and accurate covariance-matrix estimators using the likelihood approach and achieve optimal properties.

Notice that the partition-size vector p is assumed to be known throughout this chapter. In practice, it can be learned by a preliminary algorithm (for example, a K-medoids clustering algorithm (Magwene, 2021) or a network detection algorithm (Wu et al., 2021)) before estimating the covariance matrix. Also, the above definition of a uniform-block matrix does not guarantee positive definiteness in general; thus, additional constraints should be imposed on the uniform-block matrix $\Sigma(A, B, p)$ to ensure that it is a valid covariance matrix. We defer the discussion of these constraints to the next section.
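For the simulation studies later in this chapter, data with a uniform-block covariance are straightforward to generate once the positivity constraints (formalized in Corollary 3.2.1 below) are checked. A minimal sketch, reusing the ub_matrix helper from the Chapter 2 sketch (an illustrative name of ours) with hypothetical coordinate values:

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([4, 3, 5])                      # partition-size vector: K = 3, p = 12
A = np.diag([1.5, 2.0, 1.0])
B = np.array([[0.4, 0.1, -0.1],
              [0.1, 0.5, 0.2],
              [-0.1, 0.2, 0.3]])

# Validity check: A > 0 and Delta = A + B P has positive eigenvalues only
Delta = A + B @ np.diag(p)
assert np.all(np.diag(A) > 0) and np.all(np.linalg.eigvals(Delta).real > 0)

Sigma = ub_matrix(A, B, p)                   # assemble Sigma(A, B, p)
n = 50
X = rng.multivariate_normal(np.zeros(p.sum()), Sigma, size=n)
S = np.cov(X, rowvar=False)                  # unbiased sample covariance S
S_star = (n - 1) / n * S                     # biased version S*
```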
3.2.2 Matrix estimation for the uniform-block structure with a small K

Given a partition-size vector p and the parametric covariance matrix $\Sigma(A, B, p)$ with the uniform-block structure (3.2.2), we define the q-dimensional parameter vector
\[
\theta := (a_{11}, \ldots, a_{KK}, b_{11}, \ldots, b_{1K}, b_{22}, \ldots, b_{KK})^\top
\]
consisting of the covariance parameters in the blocks; that is, the parameters of interest are those in the upper triangular part of $\Sigma(A, B, p)$, and $q = K + K(K+1)/2$. Thus, the problem of estimating a p by p symmetric covariance matrix reduces to that of estimating the q-dimensional parameter vector $\theta$. In practice, q is considerably smaller than $p(p+1)/2$, thereby substantially reducing the dimensionality of the parameters of interest: for instance, with $K = 7$ and $p = 107$ as in the proteomics study mentioned earlier, $q = 35$ whereas an unstructured covariance matrix has $p(p+1)/2 = 5778$ free parameters.

The small K setting can be specified as follows: $q < n$, while K, p, and q are fixed and p can be greater than n. In other words, both the number of diagonal blocks K and the number of parameters in the blocks $q = K + K(K+1)/2$ are smaller than the sample size n. Moreover, we require that K and the dimension p of the covariance matrix, which is proportional to K, are fixed, so that q is also fixed; p can still be large enough to exceed n with small K. In this section, we first introduce an explicit maximum likelihood estimator of $\theta$ with asymptotic properties and then refine it to the best-unbiased estimator, whose exact variance estimator is also provided in closed form. Finally, we provide the best unbiased covariance- and precision-matrix estimators for small K.

We start with the maximum likelihood estimation. Specifically, let $(S^*_{kk'})$ be the block form of the biased sample covariance matrix $S^*$ partitioned by p. If the data X are normally distributed, then the log-likelihood function of the data can be expressed as
\[
\ell_n(\theta; X) \propto \frac{n}{2} \log\left( \det\left[ \{\Sigma(A, B, p)\}^{-1} \right] \right) - \frac{n}{2} \operatorname{tr}\left[ (S^*_{kk'}) \times \{\Sigma(A, B, p)\}^{-1} \right].
\]
A typical way to estimate $\theta$ in the literature is to derive the score function by taking the first-order partial derivative of the log-likelihood function with respect to $\theta_j$:
\[
\frac{\partial}{\partial \theta_j} \ell_n(\theta; X) = \frac{n}{2} \operatorname{tr}\left[ \{\Sigma(A, B, p) - (S^*_{kk'})\} \times \frac{\partial \{\Sigma(A, B, p)\}^{-1}}{\partial \theta_j} \right] \quad (j = 1, \ldots, q), \tag{3.2.3}
\]
where $\theta_j \in \{a_{11}, \ldots, a_{KK}, b_{11}, \ldots, b_{1K}, b_{22}, \ldots, b_{KK}\}$ denotes the j-th component of $\theta$ and $\partial \{\Sigma(A, B, p)\}^{-1} / \partial \theta_j \in \mathbb{R}^{p \times p}$ is a matrix whose entries are functions of $\theta_j$. However, solving the score equation (3.2.3) is challenging. Although the unknown entries $a_{kk}$ and $b_{kk'}$ are uniformly and elegantly arranged in $\Sigma(A, B, p)$, they are entangled in a complex way in the precision matrix $\Omega = \{\Sigma(A, B, p)\}^{-1}$. In other words, $\theta_j$ is implicit in $\Omega$, so a closed form of $\Omega$ is not accessible in general, and the complexity of the calculation increases with the dimension of the precision matrix. Alternatively, the existing numerical algorithms for solving for $\theta$ (for example, the method of averaging (Szatrowski, 1980)) rely on iterative updating schemes, which require a long computational time and may lead to unstable estimates. These facts motivate us to reconsider the possibility of deriving a closed-form estimator of $\theta$. Thus, we aim to find an explicit expression for $\Omega$ in terms of $a_{kk}$ and $b_{kk'}$ by taking advantage of the special covariance structure in (3.2.2). More precisely, we speculate that $\Omega$ has a form analogous to (3.2.2), which indeed holds by the following block Hadamard product representation.
Lemma 3.2.1. Given a partition-size vector $p = (p_1, \ldots, p_K)^\top$ satisfying $p_k > 1$ for every k and $p = p_1 + \cdots + p_K$, the matrices $I_p$ and $J_p$ partitioned by p are uniform-block matrices, expressed by $I(p) := I_p(I_K, 0_{K \times K}, p) = \operatorname{Bdiag}(I_{p_1}, \ldots, I_{p_K})$ and $J(p) := J_p(0_{K \times K}, 1_{K \times K}, p) = (1_{p_k \times p_{k'}})$, respectively, where $0_{r \times s}$ denotes the r by s zero matrix and $\operatorname{Bdiag}(\cdot)$ denotes a block-diagonal matrix. Suppose a p by p matrix N partitioned by p is a uniform-block matrix of the form (3.2.2), expressed by N(A, B, p); then the following representation is unique:
\[
N(A, B, p) = A \circ I(p) + B \circ J(p),
\]
where $\circ$ denotes the block Hadamard product satisfying $A \circ I(p) := \operatorname{Bdiag}(a_{11} I_{p_1}, \ldots, a_{KK} I_{p_K})$ and $B \circ J(p) := (b_{kk'} 1_{p_k \times p_{k'}})$.

Based on Lemma 3.2.1, we derive several basic properties of a uniform-block matrix, which are summarized in Chapter 2. These properties reveal how A, B, and p determine the algebraic operations for a uniform-block matrix N(A, B, p) and how a collection of uniform-block matrices with the same p forms a quadratic subspace (Seely, 1971). If we view A, B, and p as the "coordinates" of a uniform-block matrix, then using the notation N(A, B, p) can simplify the mathematical operations between p by p uniform-block matrices into those between their corresponding lower-dimensional K by K "coordinates". Following Chapter 2, we obtain two useful results for the covariance and precision matrices.

Corollary 3.2.1. (1) Given $\Sigma(A, B, p)$ defined in (3.2.2), $\Sigma(A, B, p) \succ 0$ if and only if $A \succ 0$ and $\Delta$ has positive eigenvalues only, where $\Delta := A + B \times P$ and $P := \operatorname{diag}(p_1, \ldots, p_K)$. (2) Suppose $\Sigma(A, B, p) \succ 0$ and $\Omega = \{\Sigma(A, B, p)\}^{-1}$ is the precision matrix. Then $\Omega$ partitioned by p is a uniform-block matrix as well, which can be written as $\Omega(A_\Omega, B_\Omega, p) = A_\Omega \circ I(p) + B_\Omega \circ J(p)$, where
\[
A_\Omega = A^{-1}, \qquad B_\Omega = -\Delta^{-1} \times B \times A^{-1}. \tag{3.2.4}
\]

The first assertion of Corollary 3.2.1 finalizes the additional constraints that guarantee $\Sigma(A, B, p) \succ 0$. The second assertion confirms that the precision matrix $\Omega = \{\Sigma(A, B, p)\}^{-1}$ partitioned by p is a uniform-block matrix, expressed by $\Omega(A_\Omega, B_\Omega, p)$; furthermore, (3.2.4) provides the relations between $A_\Omega$, $B_\Omega$ and A, B. Therefore, applying the representation of the precision matrix in Corollary 3.2.1, we can rewrite the partial derivative of the log-likelihood in (3.2.3) as
\[
\frac{\partial}{\partial \theta_j} \ell_n(\theta; X) = \frac{n}{2} \operatorname{tr}\left[ \{\Sigma(A, B, p) - (S^*_{kk'})\} \times \left\{ \frac{\partial A_\Omega}{\partial \theta_j} \circ I(p) + \frac{\partial B_\Omega}{\partial \theta_j} \circ J(p) \right\} \right], \tag{3.2.5}
\]
where $\theta_j \in \{a_{11}, \ldots, a_{KK}, b_{11}, \ldots, b_{1K}, b_{22}, \ldots, b_{KK}\}$ ($j = 1, \ldots, q$). In contrast to (3.2.3), the derivatives in (3.2.5) can be calculated explicitly:
\[
\frac{\partial A_\Omega}{\partial a_{kk}} = -a_{kk}^{-2} E_{kk}, \qquad
\frac{\partial B_\Omega}{\partial a_{kk}} = \Delta^{-1} E_{kk} \Delta^{-1} B A^{-1} + a_{kk}^{-2} \Delta^{-1} B E_{kk} \quad (k = 1, \ldots, K),
\]
\[
\frac{\partial A_\Omega}{\partial b_{kk'}} = 0_{K \times K}, \qquad
\frac{\partial B_\Omega}{\partial b_{kk'}} =
\begin{cases}
-\Delta^{-1} E_{kk} P \Delta^{-1} P^{-1}, & k = k' \\
-\Delta^{-1} (E_{kk'} + E_{k'k}) P \Delta^{-1} P^{-1}, & k \neq k'
\end{cases}
\quad (k, k' = 1, \ldots, K),
\]
where $E_{kk'} \in \mathbb{R}^{K \times K}$ denotes the matrix with 1 in the (k, k') entry and 0 elsewhere. The explicit forms of the derivatives highlight the advantage of (3.2.5) over (3.2.3). Owing to (3.2.5), we can derive the analytic form of the maximum likelihood estimator of $\theta$:
\[
\widetilde{a}^*_{kk} := \frac{\operatorname{tr}(S^*_{kk})}{p_k - 1} - \frac{\operatorname{sum}(S^*_{kk})}{p_k \times (p_k - 1)}, \qquad
\widetilde{b}^*_{kk'} :=
\begin{cases}
\dfrac{\operatorname{sum}(S^*_{kk'})}{p_k \times p_{k'}}, & k \neq k' \\[1.5ex]
\dfrac{\operatorname{sum}(S^*_{kk}) - \operatorname{tr}(S^*_{kk})}{p_k \times (p_k - 1)}, & k = k'
\end{cases}, \tag{3.2.6}
\]
for every k and k', where $\operatorname{sum}(M) = \sum_{j=1}^{r} \sum_{j'=1}^{s} m_{jj'}$ denotes the sum of all entries of $M := (m_{jj'}) \in \mathbb{R}^{r \times s}$. Technical details are deferred to Chapter B. We denote the maximum likelihood estimator of $\theta$ by $\widetilde{\theta}^* := (\widetilde{a}^*_{11}, \ldots, \widetilde{a}^*_{KK}, \widetilde{b}^*_{11}, \ldots, \widetilde{b}^*_{1K}, \widetilde{b}^*_{22}, \ldots, \widetilde{b}^*_{KK})^\top$.
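The closed form (3.2.6) turns maximum likelihood estimation into per-block traces and sums. A sketch of the computation (the helper name is ours; S_star and p are as in the earlier sampling sketch):

```python
import numpy as np

def ub_mle(S_star, p):
    """Closed-form MLE (3.2.6): block traces/sums of the biased sample
    covariance S* give the coordinate estimates directly."""
    K = len(p)
    edges = np.concatenate([[0], np.cumsum(p)])
    A_hat, B_hat = np.zeros((K, K)), np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            blk = S_star[edges[k]:edges[k + 1], edges[l]:edges[l + 1]]
            if k == l:
                A_hat[k, k] = (np.trace(blk) / (p[k] - 1)
                               - blk.sum() / (p[k] * (p[k] - 1)))
                B_hat[k, k] = (blk.sum() - np.trace(blk)) / (p[k] * (p[k] - 1))
            else:
                B_hat[k, l] = blk.sum() / (p[k] * p[l])
    return A_hat, B_hat

A_mle, B_mle = ub_mle(S_star, p)
# Using S in place of S_star yields the unbiased estimator (3.2.7) below,
# i.e., a factor n/(n-1) rescaling of the MLE.
```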
The strong consistency, asymptotic efficiency, and asymptotic normality of $\widetilde{\theta}^*$ can be derived by standard procedures in the literature (Ferguson, 1996; van der Vaart and Wellner, 1996; van der Vaart, 2000); please refer to Chapter B. Despite these asymptotic properties, $\widetilde{\theta}^*$ is biased under a finite sample size and, therefore, is not a uniformly minimum variance unbiased estimator; namely, $\widetilde{\theta}^*$ is not the best unbiased estimator of $\theta$. Thus, it is natural to consider an unbiased estimator of $\theta$ by replacing $S^*$ with the unbiased version S in (3.2.6). Specifically, let $(S_{kk'})$ be the block form of S partitioned by p. Substituting $S_{kk'}$ for $S^*_{kk'}$ in (3.2.6), for every k and k', we have
\[
\widetilde{a}_{kk} := \frac{\operatorname{tr}(S_{kk})}{p_k - 1} - \frac{\operatorname{sum}(S_{kk})}{p_k \times (p_k - 1)}, \qquad
\widetilde{b}_{kk'} :=
\begin{cases}
\dfrac{\operatorname{sum}(S_{kk'})}{p_k \times p_{k'}}, & k \neq k' \\[1.5ex]
\dfrac{\operatorname{sum}(S_{kk}) - \operatorname{tr}(S_{kk})}{p_k \times (p_k - 1)}, & k = k'
\end{cases}. \tag{3.2.7}
\]
We denote the above unbiased estimator by $\widetilde{\theta} := (\widetilde{a}_{11}, \ldots, \widetilde{a}_{KK}, \widetilde{b}_{11}, \ldots, \widetilde{b}_{1K}, \widetilde{b}_{22}, \ldots, \widetilde{b}_{KK})^\top$. Alternatively, since $S = n(n-1)^{-1} S^*$, we have the relation between the maximum likelihood estimator $\widetilde{\theta}^*$ and the unbiased estimator $\widetilde{\theta}$:
\[
\widetilde{a}_{kk} = \frac{n}{n-1} \widetilde{a}^*_{kk}, \qquad \widetilde{b}_{kk'} = \frac{n}{n-1} \widetilde{b}^*_{kk'} \quad (k, k' = 1, \ldots, K), \qquad \widetilde{\theta} = \frac{n}{n-1} \widetilde{\theta}^*.
\]
Moreover, the following theorem summarizes the optimal property of $\widetilde{\theta}$.

Theorem 3.2.1 (Optimal property of $\widetilde{\theta}$). $\widetilde{\theta}$ is the best-unbiased estimator of $\theta$, and
\[
\widetilde{\mu} := \bar{X}, \qquad \widetilde{\Sigma}(\widetilde{A}, \widetilde{B}, p) = \widetilde{A} \circ I(p) + \widetilde{B} \circ J(p) \tag{3.2.8}
\]
are the best unbiased estimators of $\mu$ and $\Sigma(A, B, p)$, where $\widetilde{A} := \operatorname{diag}(\widetilde{a}_{11}, \ldots, \widetilde{a}_{KK})$ and $\widetilde{B} := (\widetilde{b}_{kk'})$ with $\widetilde{b}_{kk'} = \widetilde{b}_{k'k}$ for every $k \neq k'$ are the best unbiased estimators of A and B.

Geisser (1963) and Morrison (1972) derived the same estimator $\widetilde{\theta}$ based on an analysis of variance table, but they did not show the above optimal property. Furthermore, after calculating $\widetilde{\Delta} := \widetilde{A} + \widetilde{B} \times P$ with the best unbiased estimators $\widetilde{A}$ and $\widetilde{B}$, we derive the best-unbiased estimator of the precision matrix, as summarized in the following corollary.

Corollary 3.2.2. If both $\widetilde{A}$ and $\widetilde{\Delta}$ are positive definite, then
\[
\widetilde{\Omega}(\widetilde{A}_\Omega, \widetilde{B}_\Omega, p) = \widetilde{A}_\Omega \circ I(p) + \widetilde{B}_\Omega \circ J(p) \tag{3.2.9}
\]
is the best unbiased estimator of $\Omega(A_\Omega, B_\Omega, p)$, where $\widetilde{A}_\Omega := \widetilde{A}^{-1}$ and $\widetilde{B}_\Omega := -\widetilde{\Delta}^{-1} \times \widetilde{B} \times \widetilde{A}^{-1}$.

The covariance matrix of the proposed estimator $\widetilde{\theta}$ can be calculated from the Fisher information matrix based on the asymptotic normality. Given a small K (say, $K \leq 3$, empirically), the Fisher information matrix and its inverse may have explicit expressions. For some real applications in which K is relatively large (say, $K > 3$), computing the inverse of the Fisher information matrix might be burdensome and unstable. Alternatively, we provide the exact variance and covariance estimators of the elements of $\widetilde{\theta}$ under a finite sample size in Corollary 3.2.3 and Chapter B, respectively.

Corollary 3.2.3. The exact variance estimators of $\widetilde{a}_{kk}$ and $\widetilde{b}_{kk'}$ are
\[
\operatorname{var}(\widetilde{a}_{kk}) = \frac{2 a_{kk}^2}{(n-1)(p_k - 1)},
\]
\[
\operatorname{var}(\widetilde{b}_{kk'}) =
\begin{cases}
\dfrac{2}{(n-1) p_k (p_k - 1)} \left\{ (a_{kk} + p_k b_{kk})^2 - (2 a_{kk} + p_k b_{kk}) b_{kk} \right\}, & k = k' \\[1.5ex]
\dfrac{1}{2(n-1) p_k p_{k'}} \left\{ p_k p_{k'} (b_{kk'}^2 + b_{k'k}^2) + 2 (a_{kk} + p_k b_{kk})(a_{k'k'} + p_{k'} b_{k'k'}) \right\}, & k \neq k'
\end{cases}
\]
for every k and k'. The exact covariance estimators of $\widetilde{a}_{\ell\ell}$ and $\widetilde{b}_{kk'}$ for every $\ell$, k, and k' are given in Chapter B.
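The exact variances in Corollary 3.2.3 are simple functions of the coordinates, so plugging in $\widetilde{A}$ and $\widetilde{B}$ gives finite-sample standard errors without inverting a Fisher information matrix. A sketch (the function name is ours), transcribing the Corollary's formulas:

```python
import numpy as np

def ub_exact_variances(A, B, p, n):
    """Exact variances of a_tilde_kk and b_tilde_kk' (Corollary 3.2.3);
    plug in A_tilde, B_tilde for the corresponding variance estimators."""
    a, K = np.diag(A), len(p)
    var_a = 2.0 * a**2 / ((n - 1) * (p - 1))
    var_b = np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            if k == l:
                var_b[k, k] = (2.0 / ((n - 1) * p[k] * (p[k] - 1))
                               * ((a[k] + p[k] * B[k, k])**2
                                  - (2 * a[k] + p[k] * B[k, k]) * B[k, k]))
            else:
                var_b[k, l] = (1.0 / (2 * (n - 1) * p[k] * p[l])
                               * (p[k] * p[l] * (B[k, l]**2 + B[l, k]**2)
                                  + 2 * (a[k] + p[k] * B[k, k])
                                      * (a[l] + p[l] * B[l, l])))
    return var_a, var_b
```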
At the end of this section, we briefly describe how the above estimation procedure extends to situations with singletons. In addition to the communities assumed to follow the uniform-block structure (for example, the blocks in the black frames in the top parts of Figure 3.1(B) and (C)), singletons on the diagonal are also commonly detected in real applications (see the diagonal entries outside the black frames in the top parts of Figure 3.1(B) and (C)), such as the proteomics study (Yildiz et al., 2007) and the brain imaging study (Chiappelli et al., 2019). A structure of well-organized communities with singletons implies a population covariance matrix of the form
\[
\Sigma^{(\mathrm{full})} :=
\begin{pmatrix}
\Sigma(A, B, p) & D_1 \\
D_1^\top & D_2
\end{pmatrix} \in \mathbb{R}^{(p+d) \times (p+d)},
\]
where $D_1 \in \mathbb{R}^{p \times d}$, $D_2 \in \mathbb{R}^{d \times d}$, and d denotes the number of singletons. Following the decomposition idea in Fan et al. (2013), we can estimate $\Sigma^{(\mathrm{full})}$ in two steps. First, we partition the unbiased sample covariance matrix $S^{(\mathrm{full})} := \{(S_{kk'}), S_1; S_1^\top, S_2\}$ in the same way. Second, we obtain the best unbiased estimator $\widetilde{\Sigma}(\widetilde{A}, \widetilde{B}, p)$ based on $(S_{kk'})$ and the hard-thresholding estimator $\widetilde{R}^{(\mathrm{full})}$ of the matrix $R^{(\mathrm{full})} := \{0_{p \times p}, S_1; S_1^\top, S_2\}$ (Bickel and Levina, 2008a), yielding the estimator $\widetilde{\Sigma}^{(\mathrm{full})} := \operatorname{Bdiag}\{\widetilde{\Sigma}(\widetilde{A}, \widetilde{B}, p), 0_{d \times d}\} + \widetilde{R}^{(\mathrm{full})}$, which is consistent under a matrix norm if p is fixed while d may grow with the sample size n.

3.2.3 Matrix estimation for the uniform-block structure with a large K

In Section 3.2.2, we estimated the covariance matrix with the uniform-block structure and its precision matrix for small K. However, there are applications where the covariance matrices are uniform-block matrices with more diagonal blocks than the sample size. More specifically, a large K occurs when $K, q > n$ and K, p, and q grow with n. In other words, the number of diagonal blocks K is greater than the sample size n, and so is the number of covariance parameters in the blocks, $q = K + K(K+1)/2$. Moreover, we require that K, the dimension p of the covariance matrix (which is proportional to K), and q grow with n and diverge as n goes to infinity. In this section, we generalize the proposed estimation procedure for small K and introduce a consistent covariance-matrix estimator for large K by modifying the hard-thresholding method. Denote by $\|M\|_F = (\sum_{j=1}^{r} \sum_{j'=1}^{r} m_{jj'}^2)^{1/2}$ and $\|M\|_S = \max_{\|x\|_2 = 1} \|Mx\|_2$ the Frobenius norm and the spectral norm of $M := (m_{jj'}) \in \mathbb{R}^{r \times r}$, respectively, where $\|x\|_2 := (\sum_{j=1}^{r} x_j^2)^{1/2}$ for $x := (x_1, \ldots, x_r)^\top \in \mathbb{R}^r$.

For normal data $X \in \mathbb{R}^{n \times p}$ with population mean $\mu = 0_{p \times 1}$, the unbiased sample covariance matrix $S = X^\top X / n$ (unbiased because the mean is known to be zero), and a population covariance matrix with the uniform-block structure $\Sigma(A, B, p)$ for large K, we propose a new thresholding approach based on the work by Bickel and Levina (2008b). Since the covariance matrix $\Sigma(A, B, p)$ is fully determined by A, B, and p according to Lemma 3.2.1, we threshold the estimates of A and B, rather than S, to yield a covariance-matrix estimate. Specifically, given a thresholding level $\lambda = \lambda_n > 0$, let
\[
\widehat{a}_{kk}(\lambda) := \widetilde{a}_{kk} \times I(|\widetilde{a}_{kk}| > \lambda), \qquad
\widehat{b}_{kk'}(\lambda) := \widetilde{b}_{kk'} \times I(|\widetilde{b}_{kk'}| > \lambda) \tag{3.2.10}
\]
be the hard-thresholding estimators of $a_{kk}$ and $b_{kk'}$, respectively, where $\widetilde{a}_{kk}$ and $\widetilde{b}_{kk'}$ are the best unbiased estimators for every k and k' and $I(\cdot)$ is the indicator function. We regard the new covariance-matrix estimator
\[
\widehat{\Sigma}_\lambda(\widehat{A}_\lambda, \widehat{B}_\lambda, p) = \widehat{A}_\lambda \circ I(p) + \widehat{B}_\lambda \circ J(p) \tag{3.2.11}
\]
as the modified hard-thresholding estimator of $\Sigma(A, B, p)$, where $\widehat{A}_\lambda := \operatorname{diag}\{\widehat{a}_{11}(\lambda), \ldots, \widehat{a}_{KK}(\lambda)\}$ and $\widehat{B}_\lambda := \{\widehat{b}_{kk'}(\lambda)\}$ with $\widehat{b}_{kk'}(\lambda) = \widehat{b}_{k'k}(\lambda)$ for every $k \neq k'$.
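For large K, the modification in (3.2.10)-(3.2.11) thresholds the K by K coordinates rather than the p by p sample covariance. A sketch (names are ours; in practice the constant C would be chosen by the resampling rule cited after Theorem 3.2.2):

```python
import numpy as np

def ub_hard_threshold(A_tilde, B_tilde, p, n, C=1.0):
    """Modified hard-thresholding estimator (3.2.10)-(3.2.11): zero out small
    coordinate estimates at level lambda = C * sqrt(log(K) / n)."""
    K = len(p)
    lam = C * np.sqrt(np.log(K) / n)
    a = np.diag(A_tilde)
    A_lam = np.diag(np.where(np.abs(a) > lam, a, 0.0))
    B_lam = np.where(np.abs(B_tilde) > lam, B_tilde, 0.0)
    return A_lam, B_lam   # feed into ub_matrix(A_lam, B_lam, p) for Sigma_hat
```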
The consistency of this modified hard-thresholding estimator is summarized below.

Theorem 3.2.2 (Consistency of the modified hard-thresholding estimator). Given a positive-definite covariance matrix with the uniform-block structure $\Sigma(A, B, p) = (\sigma_{jj'})$ defined as in (3.2.2), suppose $K = K(n) > n \to \infty$ and $\log(K)/n \to 0$ as $n \to \infty$, and suppose there exist constants $0 < C_{p_0}, C_{q_0} < \infty$ such that $\sum_{j=1}^{p} \sum_{j'=1}^{p} |\sigma_{jj'}|^{p_0} \leq C_{p_0}$ and $\max_j \sum_{j'=1}^{p} |\sigma_{jj'}|^{q_0} \leq C_{q_0}$ for $0 < p_0 < 2$ and $0 < q_0 < 1$. Then,
\[
\|\widehat{\Sigma}_\lambda(\widehat{A}_\lambda, \widehat{B}_\lambda, p) - \Sigma(A, B, p)\|_F^2 \leq O_P(1)\, C_{p_0} \left\{ \frac{\log(K)}{n} \right\}^{1 - \frac{p_0}{2}},
\]
\[
\|\widehat{\Sigma}_\lambda(\widehat{A}_\lambda, \widehat{B}_\lambda, p) - \Sigma(A, B, p)\|_S \leq O_P(1)\, C_{q_0} \left\{ \frac{\log(K)}{n} \right\}^{\frac{1 - q_0}{2}},
\]
where we choose $\lambda = C\{\log(K)/n\}^{1/2}$ for some positive constant C.

The performance of the modified hard-thresholding estimator (3.2.10) relies on the choice of $\lambda$, which can be determined by applying the resampling rule in Bickel and Levina (2008a,b).

3.3 Numerical Studies

3.3.1 Simulations

To evaluate the performance of the proposed method comprehensively, we simulate data and benchmark our method against comparable estimation methods for large covariance (and precision) matrices in the following three scenarios. In Scenario 1 (Section 3.3.2), we generate normal data for n subjects using a covariance matrix with a uniform-block structure $\Sigma_{0,1}(A_0, B_0, p_1)$ for small K; that is, the number of diagonal blocks K is small, so that the number of covariance parameters in the blocks q is smaller than the sample size n, whereas the dimension of the covariance matrix p can be greater than n. We first focus on evaluating the finite-sample performance of the parameter vector estimator $\widetilde{\theta}$ in (3.2.7) by comparing the estimates with the ground truth. We also assess the accuracy of the exact covariance estimator for $\widetilde{\theta}$, as presented in Chapter B. We then present the performance of the covariance estimator