ABSTRACT Title of dissertation: GOODNESS OF FIT TESTS FOR GENERALIZED LINEAR MIXED MODELS Min Tang, Doctor of Philosophy, 2010 Dissertation directed by: Professor Eric V. Slud Department of Mathematics Dr. Ruth M. Pfei?er National Cancer Institute Generalized Linear mixed models (GLMMs) are widely used for regression analysis of data, continuous or discrete, that are assumed to be clustered or corre- lated. Assessing model ?t is important for valid inference. We therefore propose a class of chi-squared goodness-of-?t tests for GLMMs. Our test statistic is a quadratic form in the di?erences between observed values and the values expected under the estimated model in cells de?ned by a partition of the covariate space. We show that this test statistic has an asymptotic chi-squared distribution. We study the power of the test through simulations for two special cases of GLMMs, linear mixed models (LMMs) and logistic mixed models. For LMMs, we further derive the analytical power of the test under contiguous local alternatives and compare it with simulated empirical power. Three examples are used to illustrate the proposed test. GOODNESS OF FIT TESTS FOR GENERALIZED LINEAR MIXED MODELS by Min Tang Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial ful?llment of the requirements for the degree of Doctor of Philosophy 2010 Advisory Committee: Dr. Eric V. Slud, Co-Advisor Dr. Ruth M. Pfei?er, Co-Advisor Dr. Paul J. Smith Dr. Abram M. Kagan Dr. Wolfgang Losert c? Copyright by Min Tang 2010 Dedication To my parents and Jun. ii Acknowledgments It is a great honor for me to thank all the people who made this thesis possible. First and foremost, I owe my deepest gratitude to my two coadvisors, Dr. Eric V. Slud and Dr. Ruth M. Pfei?er. I am grateful to Dr. Eric V. Slud for his expert guidance and continuous support during my research and study at University of Maryland, College Park. Dr. Slud has been a signi?cant presence in my life. His perpetual energy and his rigor and enthusiasm in research have been motivating me through my whole doctoral study. His insights have strengthened this study signi?cantly. I will always be thankful for his wisdom, knowledge, deep concern and constant encouragement. I would like to show my greatest gratitude and sincere thanks to Dr. Ruth M. Pfei?er who o?ered me the predoctoral training at National Cancer Institute (NCI) which led me to the world of Biostatistics and cancer research. Her rigor and passion on research in?uenced me a lot. I am deeply indebted to her for her being always ready to help and her precious time on guiding me through my research. I could not have ?nished this thesis within four years without her invaluable help. She has also made available her support in a number of other ways, such as revising my CV, giving suggestions to my job talk rehearsal. She is the best mentor I have ever met. It has been an honor for me to work with her. I am grateful to all my committee members Dr. Paul J. Smith, Dr. Abram Kagan and Dr. Wolfgang Losert for their valuable comments and suggestions to this thesis. I would like to thank Dr. Paul J. Smith for all his generous help and iii guidance on taking courses in my entire graduate period. I also want to thank Dr. Abram M. Kagan for his interest and discussion with me on the MLE consistency proof in this thesis. My special thanks go to Dr. Partha Lahiri for initially recommending me to the Biostatistics Branch in NCI. Many thanks to all my colleagues and collaborators involved in the projects I have been working on at NCI. I would also show my sincere thanks to all my friends at the Mathematics Department, especially Ziliang Li for all the discussions on my thesis, Tinghui Yu for his generous help on my statistical questions, Ning Jiang for his encouragement during my qualify exam stage, Huilin Li for her help and sharing information during my stay at NCI and Lingyan Cao, Yong Zheng, Guanhua Lu, Denise, Shihua Wen, Shu Zhang, Wei Guo, Neung Ha and Rongrong Wang, for their sincere help and all the good time with them during my graduate study. I am grateful to my cousins Lifang Liu and Juan Tang who provided tremen- dous help when I was applying for graduate school, who took care of my parents in China when I was pursuing my Ph.D. in the US, who have been standing by my side at all stages of my life. I also want to thank my good friend Caixia Kou and all my other friends in China for their constant supports. My deepest gratitude goes to my family for their un?agging love and support throughout my life. Finally, I am forever indebted to Jun. His encouragement and companionship have turned my journey through graduate school into a pleasure. This dissertation is simply impossible without them. iv Table of Contents List of Tables vii List of Figures viii 1 Introduction 1 1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Overview of thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2 Goodness of ?t tests for linear mixed models 7 2.1 Linear mixed models (LMMs) . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Goodness of ?t test statistic . . . . . . . . . . . . . . . . . . . . . . . 10 2.2.1 Test statistic and its asymptotic properties for LMMs when parameters are estimated by maximum likelihood . . . . . . . 10 2.2.1.1 LMM with a single random e?ect . . . . . . . . . . . 10 2.2.1.2 LMM with additive random e?ects . . . . . . . . . . 15 2.2.2 Test statistic and its asymptotic properties for two-level LMM with parameters estimated by least squares and method of moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.2.3 Power of the test . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.3 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.3.1 Normally distributed covariates (Scenario I) . . . . . . . . . . 30 2.3.1.1 Impact of choice of the cell partition on power . . . . 34 2.3.1.2 Robustness of T with respect to error distribution . . 35 2.3.1.3 A summary parameter related to the power . . . . . 36 2.3.2 Normally distributed interacting covariates (Scenario II) . . . 38 2.4 Data examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.4.1 Birth weight data . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.4.2 Alcohol data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 2.4.3 Factors impacting thyroglobulin levels in an iodine de?cient population . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 2.6 Technical details for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . 48 2.6.1 Proof of Theorem 2.3 . . . . . . . . . . . . . . . . . . . . . . . 48 2.6.2 Proof of Corollary 2.7 . . . . . . . . . . . . . . . . . . . . . . 52 2.6.3 Proof of Theorem 2.10 . . . . . . . . . . . . . . . . . . . . . . 52 2.6.4 Proof of Theorem 2.12 . . . . . . . . . . . . . . . . . . . . . . 53 2.6.5 Derivation of the power of T . . . . . . . . . . . . . . . . . . . 59 2.6.5.1 Limit of (X?)TV?1X and (X?)TV?1(X?) in (2.22) . 64 2.6.5.2 Limit of ? in (2.22) . . . . . . . . . . . . . . . . . . 70 v 3 Goodness of ?t tests for generalized linear mixed models 73 3.1 Generalized linear mixed models (GLMMs) . . . . . . . . . . . . . . . 73 3.1.1 Mixed-e?ects logistic models . . . . . . . . . . . . . . . . . . . 74 3.1.2 Mixed-e?ects Poisson models . . . . . . . . . . . . . . . . . . 76 3.2 Proof of the consistency of MLE for GLMMs . . . . . . . . . . . . . . 77 3.3 Goodness of ?t test for GLMMs . . . . . . . . . . . . . . . . . . . . . 81 3.3.1 Derivation of the power of T . . . . . . . . . . . . . . . . . . . 85 3.4 Simulations for logistic mixed models . . . . . . . . . . . . . . . . . . 88 3.4.1 Computational issues and code checking . . . . . . . . . . . . 89 3.4.2 Checking the size of the test in simulations . . . . . . . . . . . 91 3.4.3 Simulations to assess empirical power of the test . . . . . . . . 92 3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 3.6 Technical details for Chapter 3 . . . . . . . . . . . . . . . . . . . . . . 95 3.6.1 Checking B.2 in Assumption 3.1 for logistic mixed model . . . 95 3.6.2 Checking B.3 in Assumption 3.1 for Logistic mixed model . . 97 3.6.3 Checking B.3 in Assumption 3.1 for Poisson mixed model . . . 99 3.6.4 Simpli?cation of ?? in equation (3.11) . . . . . . . . . . . . . . 102 4 Discussion and further research 105 4.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 4.2 Further research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 5 Appendix: Three general lemmas 108 Bibliography 111 vi List of Tables 2.1 Empirical size of the test under di?erent levels (LMM) . . . . . . . . . 32 2.2 Empirical size of the test under di?erent cell partitions (LMM). . . . . . . 33 2.3 Power and robustness study for Scenario I (LMM). . . . . . . . . . . . . 34 2.4 Impact of cell partition on empirical power for Scenario I (LMM). . . . . . 35 2.5 Empirical size of the test for Scenario II under di?erent cell partitions with standard deviations in brackets (LMM). . . . . . . . . . . . . . . . . . . 39 2.6 Impact of cell partition on empirical power for Scenario II (LMM). . . . . 39 3.1 Empirical size of the test under di?erent levels (logistic mixed model). . 92 3.2 Empirical size of the test under di?erent cell partitions (logistic mixed model). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 3.3 Impact of cell partition on empirical power I (logistic mixed model). . . . 93 3.4 Impact of cell partition on empirical power II (logistic mixed model). . . . 94 vii List of Figures 2.1 The impact of (?2a,?2?) on analytical power (Scenario I, LMM) . . . . 29 2.2 The impact of (?13,?23) on analytical power (Scenario I, LMM) . . . 29 2.3 Analytical power plot for ?31 = ?32 = 0, x-axis is ?3/(?2a +?2?). Case 1: ?2a = .25,?2? = 1; Case 2: ?2a = .625,?2? = .625; Case 3: ?2a = 1,?2? = .25. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.4 Empirical and Asymptotic distribution of the test statistic 2.13 in Scenario I with correlations ?13 = ?23 = 0. . . . . . . . . . . . . . . . 32 2.5 Empirical power vs summary parameter ? . . . . . . . . . . . . . . . 38 2.6 Theimpactof(?2a,?2?)onanalyticalpower, ?12 = 0, x-axisis?3/(?2a +?2?). Case 1: ?2a = .25,?2? = 1; Case 2: ?2a = .625,?2? = .625; Case 3: ?2a = 1,?2? = .25. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.7 Normality assumption checking for birth weight data. . . . . . . . . . 41 2.8 Normality checking for residuals of ?nal model for birth weight data . 42 2.9 Normality checking for log(TG) for Alcohol data. . . . . . . . . . . . 43 2.10 Normality checking for residuals of the ?nal model for Alcohol data . 44 3.1 Histogram of 1000 test statistics for logistic mixed model . . . . . . . 91 viii List of Notation and Abbreviations ? a space of outcomes of an experiment P a probability measure on ? R the real line, (??, ?) R+ the non-negative real line, [0,?) Z+ the set of positive integer numbers I{A} indicator function, where A is either a set or an event de?ned by a property enclosed in brackets, de?ned in terms of a random variable N(?,?2) Normal distribution with mean ? and variance ?2 E(X) mean of the random variable X Var(X) variance of the random variable X ?{?} the ?-algebra generated by random variables in {?} E(Y|X) the conditional expectation of Y w.r.t. a ?-algebra generated by X Var(Y|X) the conditional variance of Y given X i.i.d. independent and identically distributed A ? B A?B P? 0 d(?,?0) Euclidean distance ????0?2 between ? and ?0 aT transpose of the vector or matrix a IN N ?N identity matrix 1n a n?1 vector of all 1s a?2 Kronecker product of vector a, aaT 1?2n the n?n matrix of all 1s A?1 inverse of matrix A tr(A) trace of matrix A, de?ned to be the sum of the diagonal elements of matrix A ?A? Euclidean norm of matrix A P? convergence in probability D? convergence in distribution op(1) a random variable sequence Xn = op(1) means Xn P? 0 ix Chapter 1 Introduction 1.1 Background The linear mixed model (LMM) (McCulloch and Searle, 2001) extends the standard linear regression model by including random e?ects in addition to the usual ?xed e?ects in the linear predictors. LMMs can be expressed as Y = X?+Z?+?, where Y is a vector of observations, X is a matrix of known covariates,? is a vector of unknown ?xed regression coe?cients which are called ?xed e?ects, Z is a known matrix, ? is a vector of unknown random e?ects and ? is a vector of unobservable random errors. By incorporating random e?ects, LMMs can accommodate clus- tered or correlated or longitudinal data. For example, in medical studies, various measurements are often collected from the same individual over time. It is then reasonable to assume that the observations for the same individual are correlated. Examples of applying LMM to longitudinal data can be found for example in Laird and Ware (1982), Weiss (2005, Chapter 9) and Lee et al. (2006). Generalized linear mixed models (GLMMs) further extend the LMM family to discrete or categorical exponential family data. Examples include logistic mixed models for binomial data and Poisson mixed models for count data. Data examples of GLMMs are given in McCullagh and Nelder (1989, Section 14.5) and GLMMs have numerous appli- cations in medical research and the survey area. Jiang and Lahiri (2006) gave a 1 very good general literature review of prediction based on GLMMs, with particular application to small-area estimation. There are various methods of estimating GLMM parameters. The method of maximum likelihood is widely used. A full maximum likelihood analysis requires numerical integration techniques to calculate the log-likelihood function and thus the distribution of the random e?ects needs to be known. Jiang (1998b) proposed estimating equations that apply to GLMMs not necessarily having a block-diagonal covariance matrix structure. Jiang (1999) proposed a method of inference which in many ways resembles the method of least squares in linear models and relies on weak distributional assumptions about random e?ects. In this thesis, we focus on the maximum likelihood method for parameter estimation in GLMMs. Developments in model ?tting algorithms and their implementations in statis- tical packages have greatly facilitated the applications of LMMs and GLMMs. The commonly used functions for mixed modeling in the statistical software package SAS, version 9.2, are PROC MIXED, PROC NLMIXED and PROC GLIMMIX. The commonly used functions for mixed modeling in R, version 2.11.1, are ? linear mixed models: aov(), lme() in library(nlme), lmer() in library(lme4); ? generalized linear mixed models: glmmPQL() in library(MASS), glmer() in library(lme4), MCMCglmm() in library(MCMCglmm). Two important steps in modeling are selecting a model and checking its ?t. Fre- quently, model selection is done by comparing nested models, via likelihood ratio, wald or score tests, as part of model building and there are approaches for com- 2 paring non-nested models (Cox, 1961; Godfrey, 1988). AIC (Akaike?s information criterion), BIC (Bayesian information criterion) and other model selection principles (Rao and Wu, 1989; Shao, 1997) focus on selection of covariates. Rao, Wu et al. (2001) gave a concise review on the subject of the statistical model selection. These methods select the best statistical model from a set of potential models chosen by the researcher, given the observed data. Even though the ?nally selected model may be the best in the class of potential models, it might still not provide a good ?t to the data. Thus once a model is selected, its ?t should be assessed. This is often done by checking residuals and formal goodness of ?t tests. There are various diagnostics and graphical techniques in assessing goodness of ?t of models. Lin, Wei and Ying (2002) developed objective and informative model-checking techniques for a variety of statistical models and data structures, including generalized linear models with independent or dependent observations, by taking the cumulative sums of residuals over certain coordinates. Lange and Ryan (1989) described a graphical method for checking distributional assumptions about the random e?ects in random e?ects models. Park and Lee (2004) proposed residual plots to investigate the goodness of ?t for repeated measure data, where they mainly focus on the mean model diagnostics. Jacqmin-Gadda et al. (2007) discussed the ?t of a linear mixed model through Cholesky residuals and conditional residuals. Pan and Lin (2005) developed graphical and numerical methods for checking the adequacy of generalized linear mixed models, by comparing the cumulative sums of residuals with certain Gaussian processes. Regarding formal tests for the model adequacy, goodness of ?t tests for generalized linear models for ?xed e?ects can be found in Chapter 4 3 in Agresti (2002). However, the literature for formally assessing the overall ?t for GLMMs is limited. Some procedures to assess model misspeci?cation have been proposed. Testing for the presence of random e?ects in LMMs has been discussed by Self and Liang (1987) and by Crainiceanu and Ruppert (2004). Jiang (2001) and Ritz (2004) assessed the distributional assumptions for the random e?ects in LMMs. Claeskens and Hart (2009) proposed formal tests for testing the normality of random e?ects and/or error terms in LMMs. Khuri, Mathew and Sinha (1998) presented derivations of both exact and optimal tests regarding variance component models, as well as guidance on using such tests for hypothesis testing for the ?xed e?ect part. These are separate tools for checking ?xed-e?ect speci?cation or for separately checking the residuals or forms of the random e?ect speci?cation. Testing the overall adequacy of a proposed model has been discussed in the literature for several types of models with ?xed e?ects. Tsiatis (1980) proposed a goodness-of-?t test to test the overall ?t of a logistic regression model. The test is originally established based on the e?cient scores test and after simpli?cation, it is reduced to a quadratic form of observed counts minus the expected counts in regions of the covariate space. However, for logistic mixed models, with the presence of random e?ects, this e?cient scores test can not be simpli?ed as a quadratic form of observed counts minus the expected counts because of the integrals involved in the likelihood function. For survival data, Schoenfeld (1980) presented a class of omnibus chi-squared goodness of ?t tests for the proportional hazards regression model. Slud and Kedem (1994) adapted the idea of Schoenfeld to generalized linear time series models and discussed ?xed e?ects binary-response models with time- 4 dependent covariates. Kedem and Fokianos (2002) extended this approach to various other goodness of ?t tests based on categorical time series residuals. In this thesis, we adopt the idea of Schoenfeld (1980) and develop a class of goodness of ?t tests for GLMMs by comparing the observed and expected values computed from the model within cells of a partition of the covariate space. This class of goodness of ?t tests primarily assesses the adequacy of ?t of the ?xed e?ects part in the presence of random e?ects. 1.2 Overview of thesis In Chapter 2, we present the linear mixed models (LMMs). We adopt the idea of Schoenfeld (1980) and propose a class of goodness of ?t tests for testing the statistical adequacy of the selected LMM. We study two classes of LMMs, the general LMM with additive random e?ects Y = X?+?Rr=1 Zr?r +?, where the random e?ects?r,r = 1,...,R are normally distributed and in a moderate to large sample setting, the two-level LMM yij = xTij? + ?i + ?ij, that is, the LMM with one random intercept, where no distributional assumption is made on the random e?ect ?i or the error term ?ij. For this two-level LMM, i = 1,...,m, m denotes the number of clusters, and j = 1,...,ni, ni denotes the size of cluster i. To deal with technical issues, the covariate matrix X is is assumed in di?erent settings to be a matrix either of ?xed constants or of random variables. We propose a test statistic based on di?erences between observations and their expected values computed under the model aggregated over cells of a partition of the covariate space. 5 We ?rst discuss assumptions needed, and then derive the asymptotic properties of this test statistic as the total number of observations N tends to in?nity under the null hypothesis and under local alternatives. For the two-level LMM, N = ?m i=1 ni = ( ?m i=1 ni/m)m. Under the assumption of the existence of ?m i=1 ni/m, N tending to in?nity is equivalent to m tending to in?nity. For the general LMM with additive normal random e?ects, we estimate parameters using maximum likelihood estimators (MLEs) and assume that the covariate matrix X is ?xed and nonrandom. For the two-level LMM with no distributional assumptions made on the random e?ect or the error term, we assume that (xi,ni), where i is the cluster index, are i.i.d random vectors and estimate the parameters using least squares and method of moments. In Chapter 2, we also check the theoretical power in simulations, and study the impact of choice of cell partitions on the test as well as the robustness of the test with respect to the error distribution. We illustrate this test in three real datasets. In Chapter 3, we extend the test to random intercept generalized linear mixed models (GLMMs) and derive its theoretical power. To do that, we also prove the MLE consistency of GLMMs under certain assumptions. The covariate matrix X in this Chapter is considered to be random variables. Again, we conduct simulations to assess factors with an impact on the power of the test in GLMMs. In Chapter 4, we discuss the applicability of the results and point to future research directions. There is a separate technical appendix at the end of each chapter. Appendix A contains three lemmas cited in this thesis. 6 Chapter 2 Goodness of ?t tests for linear mixed models 2.1 Linear mixed models (LMMs) A linear mixed model (LMM) has the form Y = X?+Zu+?, (2.1) where YN?1 is the vector of observations; XN?p is the design matrix for the ?xed e?ects part of the model;?is a p?1 vector of unknown ?xed e?ects parameters; u is a vector of random e?ects and?is a vector of errors. Typically u and?are assumed to be independent of each other and each independently normally distributed with mean 0 and unknown variances. In this thesis, we only consider the LMM (2.1) with Z?=?Rr=1 Zr?r, i.e. Y = X?+ R? r=1 Zr?r +?. (2.2) Here Zr, an N ?mr matrix of constants, is the design matrix for the random e?ect ?r, r = 1,...,R. The quantity ?r is an mr ?1 random vector, r = 1,...,R. Also, components of ?r are i.i.d. within the vector, ?1,...,?R are independent and are also independent of ?. In this thesis, ?1,...,?R are always assumed normal 7 except for the 2-level LMM case discussed in Section 2.2.2 where no distributional assumptions are made on either the random e?ect or the error term. Let ? = (?2?,?21,...,?2R), the parameter vector of all variance components and let? = (?,?). Let Gr = ZrZTr , r = 1,...,R and G0 = IN. The X matrix can be ?xed or random. To deal with technical issues, X is considered to be ?xed in Section 2.2.1 and to be random in Section 2.2.2. As an important case of model (2.2), we now introduce the linear mixed model with a single random e?ect: yij = xTij?+?i +?ij, i = 1,...,m, j = 1,...,ni, (2.3) where the 1?p vector xTij = (1,xij1,...,xij(p?1)) denotes covariates for ?xed e?ects for the jth observation within the ith cluster. The cluster speci?c random e?ects ?i ? N(0,?2a) are assumed to be independent of the error terms ?ij, ?ij ? N(0,?2?). To accommodate an intercept term in the model, the ?rst entry in xij is 1. We write N = ?mi=1 ni and yi = (yi1,...,yini) denotes the vector of observations for the ith cluster. Under these assumptions, Y ? N(X?,V), with a block diagonal covariance matrix V, where each of the m ni ?ni blocks has the structure Vi = ? ?? ?? ?? ? ?2a +?2? ??? ?2a ... ... ... ?2a ??? ?2a +?2? ? ?? ?? ?? ? ni?ni . (2.4) 8 For asymptotic analysis of the LMM model (2.3), we always assume that m goes to in?nity, thus N also goes to in?nity. The following assumptions are made on model (2.2) throughout Chapter 2. Assumption 2.1 The true parameter point ?0 = (?0,?0) is an interior point of = (Rp,(R+)R+1). For the 2-level LMM (2.3), R = 1. Assumption 2.2 The covariate matrix X can be either ?xed or random. If X is assumed to be ?xed, then it is assumed to have full rank p. If X is assumed to be a matrix of random variables which is the 2-level LMM (2.3) discussed in Section 2.2.2, then (xi,ni) are assumed i.i.d. with ?E(x ?2 i )? < ? and E(n 2 i) < ?. Assumption 2.3 When the covariate matrix X is considered to be ?xed, we al- ways assume that, with E1,...,El being a cell partition of the covariate space, for l = 1,...,L, N?1?Nk=1 I{xk?El}xTk exists. Remark 2.1 For the 2-level LMM with ?xed covariate matrix X, Assumption 2.3 is equivalent to the existence of N?1?mi=1?nij=1 I{xij?El}xTij. When the covariate matrix X is considered to be random, the existence of N?1?Nk=1 I{xk?El}xTk is ensured by Assumption 2.2. 2 9 2.2 Goodness of ?t test statistic 2.2.1 Test statistic and its asymptotic properties for LMMs when parameters are estimated by maximum likelihood 2.2.1.1 LMM with a single random e?ect In this Section, we discuss the 2-level LMM (2.3) with normality assumptions on both the random e?ect and the error term. The covariate matrix X is considered to be ?xed. We derive our test statistic for the setting where the model parameters ? = (?,?) = (?,?2a,?2?) are estimated by the maximum likelihood. Since we assume normality both for the random intercept term ?i and for the error term ?ij, we can use the result of Miller (1977) to show the consistency and asymptotic normality of the maximum likelihood estimator (MLE) ??. The following assumptions are made for the two-level LMM. Assumption 2.4 J?? = limN??XTV?1X/N exists and is positive de?nite; Here the positive de?niteness assumption for J?? is equivalent to the assumption that X has full rank. Assumption 2.5 The 2?2 matrix M with elements de?ned below exists and is positive de?nite; [M]st = 12 lim N?? (trV?1G sV?1Gt )/N, s,t = 0,1, where G0 = I is the N ?N identity matrix and G1 = 1?2 is the N ?N matrix of 10 all 1s. Under Assumptions 2.1 ? 2.5, based on Miller (1977), the maximum likelihood estimators (MLEs) for model (2.3) exist and are consistent. To test the goodness of ?t of the LMM (2.3), we ?rst divide the covariate space into L disjoint regions E1, ..., EL. We compute the observed and expected sums in each region El as fl = m? i=1 ni? j=1 I{xij?El}yij, (2.5) el(?) = m? i=1 ni? j=1 I{xij?El}E(yij) = m? i=1 ni? j=1 I{xij?El}xTij?, (2.6) where I denotes the indicator function. Remark 2.2 The cell partition can also be based on covariates not included in model (2.3). In this case, if we let xij denote the vector of all available covariates and x?ij only includes covariates used in the regression, then we would de?ne el(??) = ?m i=1 ?ni j=1 I{xij?El}(x ? ij) T??, where ?? corresponds to the coe?cients of x? ij. But no matter which kind of cell partition we choose, we employ the expressions in (2.5) and (2.6) in the whole thesis for notational simplicity. 2 With the notation f = (f1,...,fL) and e(?) = (e1,...,eL), the observed minus the expected vector is f?e(?0) = ? ?? ?? ?? ? ?m i=1 ?ni j=1 I{xij?E1}(yij ?x T ij?0) ... ?m i=1 ?ni j=1 I{xij?EL}(yij ?x T ij?0) ? ?? ?? ?? ? . (2.7) 11 Since the true parameter vector?0 is unknown, we replace?0 in (2.7) by its MLE ?? for Theorem 2.3. The proof of this Theorem is given in Section 2.6.1. The following assumption is made to ensure the existence of components of the variance covariance matrix for the test statistic. Assumption 2.6 limK??limsupm?? 1m ?mi=1 I{n2i?K} n2i ? 0. Theorem 2.3 For model (2.3), let E1,...,EL constitute a disjoint partition of the covariate space generated by X. Under Assumptions 2.1 ? 2.6, as N ??, ?N ? ?? ? (f ?e(?0))/N ????0 ? ?? ? D? N(0,DVDT), where with the (L+p)?N matrix D given in equation (2.29), DVDT = ? ?? ? H J?1?? J?1?? T J?1?? ? ?? ? (L+p)?(L+p) . (2.8) The entries of H, and J?? are given below Hlk = ?2a lim N?? 1 N m? i=1 [( n i? j=1 I{xij?Ek} )( n i? j=1 I{xij?El} )] , (2.9) Hll = ?2? lim N?? 1 N m? i=1 ni? j=1 I{xij?El} +?2a 1N m? i=1 ( n i? j=1 I{xij?El} )2 , (2.10) l = lim N?? 1 N m? i=1 ni? j=1 I{xij?El}xTij, (2.11) J?? = lim N?? XTV?1X/N. (2.12) 12 Consistent estimators for these quantities are given in Corollary 2.5 below. Remark 2.4 Assumption 2.3 ensures the existence of in (2.8). Assumptions 2.6 and 2.4 ensure the existence of H, the limiting variance covariance matrix for (f?e(?0))/N, and J?? in (2.8). 2 Corollary 2.5 Consistent estimators for elements in the block matrix DVDT in Theorem 2.3 are: ?Hlk = ??2a 1 N m? i=1 [( n i? j=1 I{xij?Ek} )( n i? j=1 I{xij?El} )] , ?Hll = ??2? 1 N m? i=1 ni? j=1 I{xij?El} + ??2a 1N m? i=1 ( n i? j=1 I{xij?El} )2 , ? l = 1 N m? i=1 ni? j=1 I{xij?El}xTij, ?I?? = XT ?V?1X/N, where ?Hlk, ?Hll are estimators for o?-diagonal and diagonal elements of H, and ? l estimates the l-th row of . Remark 2.6 If X is random, under the more restrictive assumption that xij,i = 1,...,m;j = 1,...,ni are i.i.d. and are independent of ni, the diagonal and o?- diagonal elements of H given in (2.9) and (2.10) are Hlk = ?2a ? E(n 2 1 ?n1) E(n1) P (x11 ? El)P (x11 ? Ek), ? l ?= k and Hll = (?2? +?2a)P(x11 ? El)+?2aE(n 2 1 ?n1) E(n1) [P(x11 ? El)] 2 . 13 In this case, another way of estimating the diagonal and o?-diagonal elements of H is to use ?Hlk = ??2a ?E(n21)? ?E(n1)? E(n1) ?P(x11 ? El)?P(x11 ? Ek) and ?Hll = (??2? + ??2a)?P(x11 ? El) + ??2a ?E(n21 ?n1)? E(n1) [? P(x11 ? El) ]2 , where ?E(n21) =?mi=1 n2i/m, ?E(n1) =?mi=1 ni/m and ?P(x11 ? El) =?mi=1?nij=1 I{x ij?El}/N. In the R code for simulations and data anal- ysis below, we use the estimators in Corollary 2.5 to estimate H and in calculating the theoretical power in the analytical power study Section, the estimators in this Remark are applied. 2 Corollary 2.7 Under Assumptions 2.1 ? 2.6, (f ? e(??))/?N D? N(0, ), where = H? J?1?? T is an L?L matrix and can be replaced by its consistent estimator ? 0 = ?H? ? ?J?1?? ? T based on Corollay 2.5. We compute Singular Value Decomposition for ? 0. For each eigenvalue of ? 0, we compare it with a small preset threshold, such as 10?4 ?. For any eigenvalue less than ?, we instead set this eigenvalue to be 0 and reconstruct the ? 0 matrix using the non-zero eigenvalues and their corresponding eigenvectors. We denote this reconstructed matrix as ? . Based on Corollary 5.3 given in the Appendix, P(rank (? ) = rank ( )) ? 1. 14 Our goodness of ?t test statistic is then given by the following quadratic form T = 1N(f?e(??))T ? ?1(f?e(??)), (2.13) where ? ?1 denotes the Moore-Penrose pseudoinverse of ? . Under the null hypoth- esis, T has an asymptotic central ?2k distribution, where k = rank(? ) = rank( ) for large N, based on Corollary 5.3. 2.2.1.2 LMM with additive random e?ects We consider next the LMM with additive random e?ects (2.2), that is, Y = X?+ R? r=1 Zr?r +?. (2.14) The covariate matrix X is considered to be ?xed numbers in this Section. This is model (1) in Miller (1977). We ?rst state and comment on the assumptions Miller (1977) made to ensure consistency and asymptotic normality of the MLE for ? in (2.2). Assumption 2.7 A.1 The partitioned matrix [X : Zr] has rank greater than p, r = 1,...,R. A.2 The matrices G0,G1,...,GR are linearly independent; that is, ?Rr=0 ?rGr = 0 implies ?r = 0, r = 0,1,...,R. A.3 N and each mr, r = 1,...,R, tend to in?nity. A.4 Let m0 = N. Then for each s,t = 0,1,...,R, either limN??ms/mt = ?st or 15 limN??mt/ms = ?ts exists. If ?st = 0, then let ?ts = ? for notational convenience. Without loss of generality, let Zr be labeled so that for s < t, ?st > 0; i.e., the mr are in decreasing order of magnitude. Generate a partition of the integers 0,1,...,R, S0,S1,...,Sc, so that for indices r in the same set Ss, the associated mr?s have the same order of magnitude. Such a partition is generated as follows: i) r0 = 0; S0 = {0}; r1 = 1. ii) For s = 1,2,..., it is true that rs ? Ss. Then for r = rs+1,rs+2,..., include r in Ss until ?rs,r = ?; call the ?rst value of r where this occurs rs+1; then rs+1 ? Ss+1. iii) Continue as in step ii until R has been placed in a set. Call this set Sc. There are then c+ 1 sets in partitions, S0,S1,...,Sc, and Ss = {rs,...,rs+1 ?1}. For each r = 1,2,...,R, r ? Ss for some s = 1,2,...,c. De?ne sequences Kr (depending on N) as follows: Kr = rank[Zrs : Zrs+1 : ??? : ZR] ? rank[Zrs : ??? : Zr?1 : Zr+1 : ??? : ZR], r = 1,2,...,R, K0 = N ?rank[Z1,...,ZR]. (The Kr so de?ned are closely related to the degrees of freedom of sums of squares in the analysis of variance.) A.5 Each of the limN?Kr/mr, r = 1,...,R exists and is positive. Let V0 =?Rr=1 ?2rZr be the true covariance matrix. A.6 There exists a sequence KR+1 (depending on N) increasing to in?nity such that the p?p matrix C0 de?ned by C0 = limN??[X?V?10 X]/KR+1 exists and is positive de?nite. 16 De?ne the (R + 1)?(R + 1) matrix C1 by [C1]st = 12 lim N?? [trV?10 GsV?10 Gt]/K12s K12t , s,t = 0,1,...,R. A.7 Each of the limits used in de?ning [C1]st exists, s,t = 0,1,...,R. The matrix C1 is positive de?nite. Remark 2.8 Assumption A.1 requires that the ?xed e?ects not be confounded with any of the random e?ects. A.2 requires that the random e?ects not be confounded with each other. For the LMM (2.3) or hierarchical linear mixed models with nested blocks such as (2.18), the matrices Zr,r = 1,...,R, consist only of 0?s or 1?s and satisfy A.1?A.2. Assumptions A.1?A.2 are su?cient to guarantee identi?ability of the MLE ??. Assumptions A.3?A.7, which correspond to Assumptions 3.1?3.5 in Miller (1977), are used to ensure the consistency of the MLE. Assumption A.3 is natural and necessary for the consistency property of MLE estimators of both ? and the variance components ?2? and ?2r,r = 1,...,R, because the sample size used to estimate ? and ?2? is N and the sample size used to estimate ?2r is mr. Assumptions A.6?A.7 are used to establish the existence and positive de?niteness of the limiting variance-covariance matrix of the MLE ??. 2 InadditiontoAssumption2.7, takenfromMiller(1977), wealsorequirethefollowing Assumption to ensure the existence of components in the variance covariance matrix for the test statistic. 17 Assumption 2.8 H = limN??FVF? exists and is positive de?nite, with F given in (2.16). Remark 2.9 Assumption 2.3 ensures the existence of in (2.17). Assumption 2.8 ensures the existence and positive de?niteness of H in . H is the limiting variance covariance matrix for (f?e(?0))/N, which involves empirical moments of the covariates and empirical moments of cluster sizes at di?erent levels. At the end of this Section, we give speci?c forms of H and ways of estimating H for the special case of 3-level LMM (2.18). 2 We next state our goodness of ?t test for model (2.2) in Theorem 2.10 below. The details of the proof are given in Section 2.6.3. The covariate space is divided into L disjoint regions E1,...,EL. This partition can also be based on covariates not included in model (2.2). As discussed in Remark 2.2 in Section 2.2.1.1, for notational simplicity, the following notation applies whether or not the cell partition is based only on covariates in the model. For l = 1,2,...,L, de?ne fl = N? k=1 I{xk?El}yk, el(?) = N? k=1 I{xk?El}E(yk) = N? k=1 I{xk?El}xTk?. Let f = (f1,...,fL), e(?) = (e1(?),...,eL(?)). Theorem 2.10 For model (2.2), let E1,...,EL constitute a disjoint partition of the 18 covariate space generated by X. Under Assumptions 2.1 , 2.2 and 2.7 ? 2.8, (f?e(??))T ? ?1(f?e(??))/?N D? ?2k, (2.15) where ?? is the MLE, ? is the reconstructed matrix by applying Singular Value De- composition on a consistent estimator of = H? J?1?? T, ? ?1 denotes the Moore- Penrose pseudoinverse of ? , k = rank(? ) = rank( ). Here J?? = limN??XTV?1X/N, H = limN??FVFT, with F = 1?N ? ?? ?? ?? ? I{x1?E1}???I{xN?E1} ... I{x1?EL}???I{xN?EL} ? ?? ?? ?? ? (2.16) and = ? ?? ?? ?? ? ?T1 ... ?TL ? ?? ?? ?? ? L?p = lim N?? ? ?? ?? ?? ? 1 N ?N k=1 I{xk?E1}x T k ... 1 N ?N k=1 I{xk?EL}x T k ? ?? ?? ?? ? . (2.17) Remark 2.11 The detailed steps used in deriving the matrix ? with small-eigenvalue eigenspaces project to 0 are exactly analogous to those described after Corollary 2.7 in Section 2.2.1.1. For the special case of a 2-level LMM (2.3), an explicit form of H, which also follows from (2.16), and two consistent estimators were provided in Section 2.2.1.1 under the more restrictive assumption that xij,i = 1,...,m;j = 1,...,ni are i.i.d. random variables and are independent of ni. The explicit form 19 of H is also given for the 3-level LMM (2.18) in this Remark. Even if the alternate forms of the estimators for H and are used, we still need to do the small-eigenvalue thresholding in de?ning ? . For the 3-level hierarchical block nested LMM yijt = xTijt?+ui +vij +?ijt, i = 1,...,m; j = 1,...,ni; t = 1,...,nij, (2.18) the matrix V has the structure V = ? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ? ?? ?? ?? ? a ??? b ... ... ... b ??? a ? ?? ?? ?? ? ni1?ni1 ??? ? ?? ?? ?? ? ?21 ??? ?21 ... ... ... ?21 ??? ?21 ? ?? ?? ?? ? ni1?ni,ni1 ... ? ?? ?? ?? ? ?21 ??? ?21 ... ... ... ?21 ??? ?21 ? ?? ?? ?? ? ni1?ni,ni1 ??? ? ?? ?? ?? ? a ??? b ... ... ... b ??? a ? ?? ?? ?? ? ni,ni1?ni,ni1 ? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? , where ?21 = Var(ui),?22 = Var(vij),?20 = Var(?ijt),a = ?20 + ?21 + ?22,b = ?21 + ?22. In this case, ? l ?= k, the o?-diagonal elements of H are Hlk = ?21 lim N?? 1 N m? i=1 [( n i? j=1 nij? h=1 I{xijh?El} )( n i? j=1 nij? h=1 I{xijh?Ek} )] + ?22 lim N?? 1 N m? i=1 ni? j=1 (n ij? h=1 I{xijh?El} )(n ij? h=1 I{xijh?Ek} ) . 20 For l = 1,??? ,L, the diagonal elements of H are Hll = ?20 lim N?? 1 N m? i=1 ni? j=1 nij? k=1 I{xijk?El} +?21 lim N?? 1 N m? i=1 ( n i? j=1 nij? k=1 I{xijk?El} )2 + ?22 lim N?? 1 N m? i=1 ni? j=1 (n ij? k=1 I{xijk?El} )2 . Similar to the 2-level LMM, under the assumption that the covariate vectors are i.i.d. random variables and the cluster sizes are independent of the covariate vectors, the Hlk and Hll here can be expressed as functions of moments of n1, n11 and x111. This can be done by applying Law of Large Numbers theory and by taking iterated conditional expectations ?rst conditioning on {n1,n11}, which is similar to what was done in (2.9), (2.10). Because of the complexity of these functions in the 3-level LMM model (2.18), we recommend directly estimating Hlk as ?Hlk = ??21 1 N m? i=1 [( n i? j=1 nij? k=1 I{xijk?El} )( n i? j=1 nij? k=1 I{xijk?Ek} )] + ??22 1N m? i=1 ni? j=1 (n ij? k=1 I{xijk?El} )(n ij? k=1 I{xijk?Ek} ) . This also applies to estimating Hll. 2 21 2.2.2 Test statistic and its asymptotic properties for two-level LMM with parameters estimated by least squares and method of mo- ments We consider the LMM (2.3), but now only require that E(?i) = E(?ij) = 0,Var(?i) = ?2a,Var(?ij) = ?2?, instead of assuming normality of ?i and ?ij. The covariate vectors xij are considered to be random variables in this Section and (xi,ni) are assumed to be i.i.d. This model, also called Nested Error Regression Model, is widely used and studied in small area estimation (Prasad and Rao, 1990). We estimate ? by the generalized least squares estimator ?? = (XT ?V?1X)?1(XT ?V?1Y) = (XTV?1X)?1(XTV?1Y) +op(1), as N ??, where V is a function of the variance components? = (?2a,?2?), which are estimated by the method of moments by equating the right-hand sides of E [ m? i=1 ni? j=1 (yij ? ?yi.)2 ] = m? i=1 ni? j=1 (xT ij?? ?x T i.? )2 +(N ?m)?2 ? (2.19) E [ m? i=1 ni? j=1 (?yi. ? ?y..)2 ] = m? i=1 ni(?xTi.?? ?xT..?)2 + ( N ? 1N m? i=1 n2i ) ?2a + (m?1)?2? (2.20) 22 respectively to their estimates SSW = m? i=1 ni? j=1 (yij ? ?yi.)2 and SSB = m? i=1 ni? j=1 (?yi. ? ?y..)2 = m? i=1 ni?y2i. ?N?y2.. respectively, where SSW is the Sum of Squares Within groups and SSB is the Sum of Squares Between groups in the analysis of variance. Because di?erent clusters (over index i) are independent and the three quantities?nij=1(yij??yi.)2, ni?y2i. and?nij=1 yij have ?nite second moments, SSW/m and SSB/m satisfy Laws of Large Numbers. Equations (2.19), (2.19) and (2.20) are estimating equations for the parameter vector ? = (?,?2a,?2?), which can be solved iteratively. The solutions of equations (2.19), (2.19) and (2.20) ??, is consistent. We ?rst divide the covariate space into L disjoint regions E1, ..., EL and compute the observed and expected values in each cell El as given in (2.5) and (2.6). We then state our goodness of ?t test in Theorem 2.12 below with details of the proof in Section 2.6.4. Theorem 2.12 For the LMM (2.3) with ?nite second moments for both ?i and ?ij, under Assumptions 2.1 and 2.2, (f?e(??))/?N D? N(0, ), where = H? J?1?? ?. Thus (f ? e(??))? ? ?1(f ? e(??))/N D? ?2k, where ? is the reconstructed matrix by applying Singular Value Decomposition on a consistent estimator of and k = rank(? ). 23 Based on Corollary 5.3, k = rank(? ) = rank( ) for large N. Detailed steps used in de?ning a reconstructed matrix ? with all eigenvalues lower-bounded away from 0 are exactly analogous to those following Corollary 2.7 in Section 2.2.1.1. The matrices J??, and H are the same as for the two-level LMM (2.3) where normality was assumed for both ?i and ?ij and MLEs are used, with formulas given in (2.9), (2.10), (2.11) and (2.12). 2.2.3 Power of the test For the multi-level LMM (2.2), we derive the theoretical power under local, and more speci?cally under contiguous alternatives for the test in (2.13) in the situation where some covariates that in?uence the outcome y have been omitted from model (2.2). Let X be the true N ? p covariate matrix and X? be a submatrix of X of dimension N ? p? used in model (2.2), with p? < p. Let the null hypothesis be H0 : ?N =?0. We assess the power of T under the alternative H1 :?N =?0 + a?N, (2.21) with ?0 = (?0,?0), where several components of ?0 are 0. We use the vector ??0 to denote the non-zero components of ?0. The indexing of ??0 as a sub-vector of ?0 corresponds to the same index subset as the columns of X? within X. Based on the derivation for Theorem 2.10, we have that under H0, 1? N(f?e( ???)) H0? N(0, ?), 24 where ? is the limiting variance covariance matrix of (f?e(???))/?N. By checking the condition (2.14) in Le Cam?s third lemma in Section 2.6.5, we ?nd that under the alternative hypothesis H1 in (2.21), 1? N(f?e( ???)) H1? N(?, ?), where ? = lim N?? { ? ?[(X?)TV ?1X?]?1[(X?)TV ?1X]}a, (2.22) with given by expression (2.17) and ? corresponds to the same expression, but computed using X? and ??. Thus under H1, T? has a limiting noncentral ?2 distribution T? = 1N[f?e(???)]T( ?)?1[f?e(???)] H1? ?2k(?), (2.23) where k = rank( ?) and the non centrality parameter is ? = ?T( ?)?1?. For a given type I error level ?, the power is thus P(T? > ?2k,?), where ?2k,? is the 1 ? ? quantile of the central ?2k distribution and P denotes the non central ?2k(?) distribution. We then substitute all parameter values in ? with their MLEs and reconstruct this consistent estimator of by applying Singular Value Decomposition as explained in the context after Corollary 2.7 in Section 2.2.1.1. With ? ? denoting the reconstructed matrix, ?T? = 1 N[f?e( ???)]T(? ?)?1[f?e(???)] H1? ?2k(??), (2.24) 25 where k = rank(? ) = rank( ) for large N, based on Corollary 5.3. We now compute ? and ? explicitly for two-level LMMs for the setting of three covariates xij = (xij1,xij2,xij3) that are from a multivariate normal distribu- tion (2.25) (as studied in the simulation Section 2.3.1), where xij,i = 1,...,m;j = 1,...,ni are i.i.d., and xij and ni are independent. We assume Y ? N(XT?,V), where X = (1,x1,x2,x3) and V is given in (2.4), but then omit x3 in ?tting the model, leading to X? = (1,x1,x2). Here a in ? is equal to (0,0,0,?3). With derivation details given in Sections 2.6.5.1 and 2.6.5.2, ? = lim N?? 1 N(X ?)TV?1X? = 1?2 ? ? ?? ?? ?? ? 1 Ex1 Ex2 Ex1 Ex21 E(x1x2) Ex2 E(x1x2) Ex22 ? ?? ?? ?? ? ? 1E(n)? 2 a ?2? ? ?? ?? ?? ? c1 c1Ex1 c1Ex2 c1Ex1 h1 h2 c1Ex2 h2 h3 ? ?? ?? ?? ? , where c1 = En[n2/(?2? +n?2a)], c2 = En[n/(?2? +n?2a)], c3 = c1 ?c2, h1 = c2Ex21 + c3(Ex1)2, h2 = c2E(x1x2)+c3Ex1Ex2 and h3 = c2Ex22 +c3(Ex2)2 . 26 And the component (X?)TV?1X in ? satis?es 1 N(X ?)TV?1X P? 1 ?2? ? ?? ?? ?? ? 1 Ex1 Ex2 Ex3 Ex1 Ex21 E(x1x2) E(x1x3) Ex2 E(x1x2) Ex22 E(x2x3) ? ?? ?? ?? ? ? 1E(n)? 2 a ?2? ? ?? ?? ?? ? c1 c1Ex1 c1Ex2 c1Ex3 c1Ex1 h1 h2 h4 c1Ex2 h2 h3 h5 ? ?? ?? ?? ? , where h4 = c2E(x1x3) +c3Ex1Ex3 and h5 = c2E(x2x3) +c3Ex2Ex3. When the cell partition E1,...,El is based on the omitted covariate x3, the elements of in ? are 1 N m? i=1 ni? j=1 I{xij,3?El} P?? ? El f3(x3)dx3 1 N m? i=1 ni? j=1 I{xij,3?El}xij,1 P?? ? x1 ? El x1f(x1,x3)(x1,x3)dx3dx1 1 N m? i=1 ni? j=1 I{xij,3?El}xij,3 P?? ? El x3f3(x3)dx3. With (x1,x2,x3) jointly normal, which is the Scenario I in Section 2.3.1, F3(x3) and f(x1,x3)(x1,x3) are the corresponding normal and bivariate normal densities. Based on the above quantities, we next study the impact of the magnitude of the variance components ?2a and ?2? and the correlations ?13 and ?23 in (2.25) between the omitted covariate x3 and the covariates in the model (x1 and x2) on 27 the theoretical power when the cell partition is based on quantiles of the omitted covariate x3 with L = 8 cells. For ?13 = .5 and ?23 = .6, Figure 2.1 plots the theoretical power against ?3/(?2a + ?2?) for three choices of (?2a,?2?) and varying ?3 on the x-axis. For any ?xed pair of (?2a,?2?), the power of the test not surprisingly increases as a function of ?3, the coe?cient of the omitted covariate x3. This ob- servation can be made by a Taylor Expansion to the theoretical power formula. Let G(x,?) = P?2k, ?([?2k, ?,?)) be the theoretical power, then by a Taylor Expansion to G(x,?) around ? = 0, we get G(x,?) ? G(x,?) + ???G(x,?) ?. Thus the theoretical power G(x,?) for ? close to zero is approximately a linear function of ? =?T( ?)?1?, which is a function of ?23. For any ?xed ?3, the power increases when the random e?ect ?2a decreases compared to the error term ?2?. Figure 2.2 plots the power for ?xed ?2a = 1, ?2? = .25 and di?erent choices of (?13,?23). It shows that the power of test increases as ?213+?223 decreases. When we set ?13 = 0 and ?23 = 0, that is, when x3 is correlated with neither x1 nor x2, we can see from Figure 2.3 that the power is not a?ected by the individual values of ?2a and ?2?, as long as ?2a +?2? is ?xed. 28 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.40 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 ?3/(?a2+??2)1/2 ?a2=1; ??2=.25? a2=.25; ??2=1? a2=.625; ??2=.625 Figure 2.1: The impact of (?2a,?2?) on analytical power (Scenario I, LMM) 0.05 0.1 0.15 0.2 0.250.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 ?3/(?a2+??2)1/2 ?13=0; ?23=0? 13=.1; ?23=.2? 13=.3; ?23=.4? 13=.05; ?23=.55 Figure 2.2: The impact of (?13,?23) on analytical power (Scenario I, LMM) 29 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.2 0.4 0.6 0.8 1.0 Analytic Power Plot (Scenerao I) x Analytic P ower Case 1Case 2 Case 3 Figure 2.3: Analytical power plot for ?31 = ?32 = 0, x-axis is ?3/(?2a +?2?). Case 1: ?2a = .25,?2? = 1; Case 2: ?2a = .625,?2? = .625; Case 3: ?2a = 1,?2? = .25. 2.3 Simulations 2.3.1 Normally distributed covariates (Scenario I) We simulate data from the following setting. For m = 500, we ?rst generate cluster sizes ni ? uniform on {2,3,4,5} and compute N =?mi=1 ni. Thus the total number of observations in each repetition is always around N = 1750. We then draw N covariates xij = (x1ij,x2ij,x3ij), ?xed in the sense that they are simulated only once for the entire simulation of K = 1000 repetitions, independently from a multivariate normal distribution, ? ?? ?? ?? ? x1 x2 x3 ? ?? ?? ?? ? ? N ? ?? ?? ?? ? ? ?? ?? ?? ? 0 0 0 ? ?? ?? ?? ? , ? ?? ?? ?? ? 1 0 ?13 0 1 ?23 ?13 ?23 1 ? ?? ?? ?? ? ? ?? ?? ?? ? . (2.25) 30 Given X = (1,x1,x2,x3) and parameters ?, ?a and ??, we generate Y from a multivariate normal distribution Y ? N(X??,V), where V is given in (2.4). We ?rst do simulations to show that our goodness of ?t test statistic (2.13) indeed has asymptotic ?2 distribution. We choose ?13 = ?23 = 0 and set true parameter values ?a = 1, ?? = .5, ? = (?0,?1,?2,?3) = (1,1,1,.25). We then ?t model (2.3) with all covariates that in?uence the response y. The number of cells L in the computation of T is twelve based on empirical quantiles of x1 and x2. With the number of iterations being 1000, we then have 1000 test statistic values N?1(f?e(??))? ? ?1(f?e(??)). Figure 2.4 gives the histogram of these 1000 independently calculated test statistics which turns out to be close to ?212, with p value from the Kolmogorov-Smirnov test being around .5 and p value from Pearson?s chi-square goodness of ?t test being around .27. The simulation result coincides with the theory we claim for (2.13). Let ? be the level of signi?cance. We show in Table 2.1 the empirical size of the test, which is the proportion of iterations that have p values less than or equal to ?. For example, the ?rst row of Table 2.1 says that 4.7% of the 1000 simulations have p-values less than or equal to .05. The third column is the standard deviation of the corresponding empirical size (ES), which is calculated as?ES(1?ES)/1000. We also check the size of the test under various choices of cell partition based on X. We choose ?13 = ?23 = 0 in (2.25) and let ?a = 1, ?? = .5, ? = (?0,?1,?2,?3) = (1,1,1,1), and ?t model (2.3) with all covariates X in the model. Cell partitions in the computation of T are always based on empirical quantiles of each generated covariate matrix in each repetition. For example, the ?rst row in Table 2.2 means 31 stat.ML prob 5 10 15 20 25 30 0.00 0.02 0.04 0.06 0.08 density of chisq, df=12 Hist of test.stat.ML Figure 2.4: Empirical and Asymptotic distribution of the test statistic 2.13 in Sce- nario I with correlations ?13 = ?23 = 0. Table 2.1: Empirical size of the test under di?erent levels (LMM) m = 500;E(N) = 1750; 3 = :25; 13 = 23 = 0;K = 1000. signi?cance level Empirical Size(ES) Standard Deviation of ES 0.05 0.047 0.0067 0.1 0.097 0.0094 0.2 0.189 0.0124 0.3 0.279 0.0142 0.4 0.385 0.0154 0.5 0.505 0.0158 0.6 0.604 0.0155 0.7 0.703 0.0144 0.8 0.797 0.0127 that in each of K = 1000 repetitions, the cell partition is based on the each generated x1 with number of cells being 8. The second row in Table 2.2 means that in each of K = 1000 repetitions, the cell partition is based on both the generated x1 and x2 using cross tabulation. Table 2.2 shows empirical sizes (Emp. Size) were all close to the nominal ? levels of 0.05 and 0.1 for all choices of cell partitions. 32 Table 2.2: Empirical size of the test under di?erent cell partitions (LMM). m = 500;E(N) = 1750; 3 = 1; 13 = 23 = 0;K = 1000. L Emp. Size Emp. Size 8 (x1) 0.05 0.056 0.1 0.097 3?4 (x1;x2) 0.05 0.058 0.1 0.109 5?4 (x1;x3) 0.05 0.048 0.1 0.088 6?7 (x2;x3) 0.05 0.050 0.1 0.097 To assess the power of the test, we ?t model (2.3) to the data without includ- ing x3 among the covariates. We then use a cell partition based on the empirical quantiles of the omitted x3 with L = 8 cells when the number of clusters m = 500 or 50. For m = 20, we use ?xed cell boundaries, based on theoretical quantiles of the distribution of x3, to divide x3 into L = 8 cells. We set (?13,?23) = (.5,.6), ?a = 1, ?? = .5, ? = (?0,?1,?2,?3) = (1,1,1,.25). For a given design matrix X, we compute the theoretical power (Theo.Pow.), the mean estimated theoretical power (Theo.Pow.hat), and the mean empirical power (Empi.Pow.n) for K = 1000 iterations. We compute the theoretical power of T? in (2.23) based on the asymp- totic ?2 distribution with the true values of the variance components and empirical moments for X used in the calculation of the non-centrality parameter (2.22). We compare these values to the estimated theoretical power, that is computed based on the asymptotic ?2 distribution with estimated variance components and empir- ical moments for X in (2.22). We repeat the power computations for D = 500 randomly generated matrices X. Table 2.3 shows means and variances of the 500 distinct power estimates (each based on K=1000 iterations) varying over the design matrices for m = 500,50 and m = 20 clusters. The theoretical power, the empirical power and the estimated theoretical power agree very well, even when m is small. 33 Table 2.3: Power and robustness study for Scenario I (LMM). L = 8;K = 1000;D = 500;( 13; 23) = (:5;:6); a = 1; ? = :5. Power m = 500;EN = 1750 m = 50;EN = 175 m = 20;EN = 70 3 = :25 mean stan.dev. mean stan.dev. mean stan.dev. Theo.Pow. .798 .0392 .122 .0236 .084 .0175 Theo.Pow.hat .796 .0383 .127 .0241 .089 .0184 Empi.Pow.n .796 .0388 .113 .0227 .062 .0181 Empi.Pow.t3 .797 .0383 .113 .0234 .061 .0184 Empi.Pow.t5 .797 .0375 .113 .0237 .061 .0180 Power m = 500;EN = 1750 m = 50;EN = 175 m = 20;EN = 70 3 = :8 mean stan.dev. mean stan.dev. mean stan.dev. Theo.Pow. 1 0 .853 .0967 .533 .1842 Theo.Pow.hat 1 0 .827 .0895 .506 .1443 Empi.Pow.n 1 0 .827 .0954 .439 .1586 Empi.Pow.t3 1 0 .828 .0942 .443 .1629 Empi.Pow.t5 1 .0001 .827 .0941 .440 .1591 However, only for m = 500 was there adequate power to detect lack of ?t when ?3 = 0.25, which was substantially smaller than the coe?cients ?1 = ?2 = 1 of x1, and x2, the covariates included in the model. When the e?ect of the omitted covariate was larger, ?3 = 0.8, the test statistic had approximately 80% power even for m = 50 clusters. 2.3.1.1 Impact of choice of the cell partition on power As is true for Pearson?s chi-square test, the choice of cell partition plays an important role for our goodness-of-?t test. We now illustrate the impact of the cell partition on the power of our test. We let ?13 = .5 and ?23 = 0 and set ? = (?0,?1,?2,?3) = (1,1,1,.15), ?a = 1 and ?? = .5, for di?erent choices of cell partitions. Again, we generate y from a model whose proper speci?cation includes x1, x2 and x3 but then analyze the 34 Table 2.4: Impact of cell partition on empirical power for Scenario I (LMM). m = 500; 3 = :15; a = 1; ? = :5;K = 1000. Parti 13 = 0; 23 = 0 13 = 0:2; 23 = 0:3 13 = 0:4; 23 = 0:5 L = 12 L = 42 L = 12 L = 42 L = 12 L = 42 x1 0.059 0.044 0.055 0.055 0.058 0.051 x2 0.052 0.044 0.044 0.046 0.041 0.053 x3 0.991 0.871 0.907 0.708 0.539 0.292 x1;x2 0.041 0.044 0.053 0.047 0.060 0.053 x1;x3 0.962 0.860 0.922 0.737 0.566 0.360 x2;x3 0.961 0.871 0.889 0.723 0.569 0.368 results of omitting x3 in the subsequent model ?tting. We choose six di?erent cell partitions: partitions based on only x1, only x2, only x3, both x1 and x2, both x1 and x3, or both x2 and x3. For all cell partitions, we use empirical quartiles based on data to divide the covariates. The number of replications in our simulation study is K = 1000. The results in Table 2.4 show that a lack of ?t is detecTable by our test statistic only when the cell partition involves the omitted covariate, x3. Similar results are observed when x3 is independent of x1 and x2. 2.3.1.2 Robustness of T with respect to error distribution In Table 2.3 we also assessed the impact of misspeci?cation of the error dis- tribution on the power of the test statistic. Using the same setting as in the power calculations given above, we generated ? from a t distribution with k = 3 or 5 de- gree of freedom instead of from a N(0,?2?). We rescaled the variance of ? so that Var(??tk?k ?2/?k) = ?2? to ensure that the noise had the same variance as in the normal case. The power of the test under a t-distribution is virtually the same as the power of the test with a normal error distribution. For example, for m = 50 35 the power was 0.83 for the normal error distribution and for t-distributions with 3 and 5 degrees of freedom (Table 2.3). Thus, by comparing the last three rows of Table 2.3, we can see that our test is very robust with respect to symmetric error distributions. 2.3.1.3 A summary parameter related to the power The power of a statistical test is the probability that the test will reject the null hypothesis when the null hypothesis is not true. The rejection ratio in Table 1, which is de?ned as total number of iterations with p value < .05 over the total number of simulation iterations, can be considered as an estimator of the power for our goodness-of-?t test. We performed many simulations to show that the rejection ratio is closely related to a summary parameter ? which is de?ned as the ratio of two variances. Suppose the true model is yij = h(x1ij,x2ij,x3ij) + ?i + ?ij, where the true covariates that impact y are x1,x2 and x3. But we ?t the data only using covariates x1 and x2. Then the summary parameter ? = Var[h(x1,x2,x3)? ? x1,x2 h(x1,x2,x3)] Var(?i +?ij) , (2.26) where ?x1,x2 h(x1,x2,x3) = ?E(h(X1,X2,X3)|X1,X2) is the linear projection. One simulation scenario we considered is similar to the one we already ex- plained in Section 2.3.1, where both the random e?ect ?i and the error term ?ij are generated from the normal distribution. The true model is de?ned to have covariate 36 vector generated from the multivariate normal distribution: ? ?? ?? ?? ? x1 x2 x3 ? ?? ?? ?? ? ? N ? ?? ?? ?? ? ? ?? ?? ?? ? 12 2 24 ? ?? ?? ?? ? , ? ?? ?? ?? ? 1 0 .5 0 .5 .8 .5 .8 2 ? ?? ?? ?? ? ? ?? ?? ?? ? . Underthisjointdistribution, one veri?esthatx3|x1,x2 ?N(.5x1+1.6x2+14.8,0.47). But we ?t the data only using x1 and x2, omitting x3. Then the summary parameter (2.26) is simpli?ed to be ?23Var(x3|x1,x2)/Var(?) = .47?23/(?2a + ?2?). With the covariate set x ?xed for each iteration, we change the magnitude of the coe?cient vector ? and the two variance components ?2a and ?2?. For each set of parameters {?,?2a,?2?}, there is a corresponding summary parameter ?, which we plotted on the x-axis. We then ran simulations with 1000 iterations to get the corresponding rejection ratio, which we plotted on the y-axis. The simulation results in Figure 2.5 indicate that the empirical power of the test increases as the summary parameter ? increases and this is true for three di?erent cell partitions. As we can see from the graph, there is a nearly linear relationship between the estimated power and the de?ned summary parameter ? and we can get the slope from the Taylor expansion of the theoretical power around 0. We then compared what we found here with what we saw by using the analytic power formula for the special case when (x1,x2,x3) comes from a multivariate normal (2.25), the simulation scenario we carefully studied in Section 2.3.1. In this case, the numerator of ? in (2.26) becomes Var(x3|x1,x2) = 1 ? (?213 + ?223). Thus, 37 0.00 0.02 0.04 0.06 0.08 0.2 0.4 0.6 0.8 Empirical Power vs Summary Parameter Summary Parameter Empir ical P ower Cell Partition on x3Cell Partition on x1,x3 Cell Partition on x2,x3 Figure 2.5: Empirical power vs summary parameter ? ? = (1?(?213 + ?223))/(?2a + ?2?). This agrees with what we found from Figure 2.2, i.e. the power of test increases as ?213 + ?223 decreases. However, this ? doesn?t include other information we got in Section 2.3.1, such as the way that the power increases as ?3 increase and the relationship of the power to the relative magnitude of ?2a and ?2?, given ?xed ?2a +?2?. 2.3.2 Normally distributed interacting covariates (Scenario II) We again generatey from a linear model that depends on three covariatesx1,x2 and x3. However, we now let x1 and x2 arise from a bivariate normal distribution with mean (1,0.5), with variances ?21 = ?22 = 1 and covariance ?12 = 0, and de?ne x3 as their product, i.e. x3 = x1x2. Again, we choose m = 500 and generate the cluster size ni from a uniform distribution on {2,3,4,5}. We let ?a = 1, ?? = .5, 38 ? = (?0,?1,?2,?3) = (1,1,1,.2) and use empirical quantiles of the covariates to de?ne the cell partition. First, we ?t model (2.3) with all covariates (x1,x2,x3) that in?uence the response y and check the size of the test. Again, the empirical sizes were all close to the nominal ? level under di?erent choices of cell partitions, as shown in Table 2.5. We then ?t model (2.3) without x3 and study the power of Table 2.5: Empirical size of the test for Scenario II under di?erent cell partitions with standard deviations in brackets (LMM). m = 500;E(N) = 1750; 3 = :2; a = 1; ? = :5; 12 = 0;K = 1000. L Emp. Size Emp. Size 8(x1) 0.05 0.048 (0.0068) 0.1 0.102 (0.0096) 3?4(x1;x2) 0.05 0.045 (0.0066) 0.1 0.085 (0.0088) 5?4(x1;x3) 0.05 0.060 (0.0075) 0.1 0.098 (0.0094) 6?7(x2;x3) 0.05 0.051 (0.0070) 0.1 0.100 (0.0095) the test using di?erent cell partitions. Results in Table 2.6 indicate that the test has adequate power only when the cell partition is based on x1 and x2, or on the omitted interaction term x3 = x1x2, but not if the cell partition is based on either x1 or x2 alone, for ?12 = 0 and ?12 = .3. We next study the impact of the magnitude of the variance components ?2a and ?2? on the theoretical power when the cell partition is based on the omitted covariate Table 2.6: Impact of cell partition on empirical power for Scenario II (LMM). m = 500;E(N) = 1750; 3 = :2; a = 1; ? = :5;K = 1000. Partition 12 = 0 12 = 0:3 L = 12 L = 42 L = 12 L = 42 x1 0.049 0.049 0.256 0.182 x2 0.038 0.041 0.273 0.173 x3 0.893 0.771 0.859 0.749 x1;x2 0.991 0.966 0.989 0.975 x1;x3 0.843 0.938 0.885 0.939 x2;x3 0.936 0.912 0.956 0.928 39 0.05 0.10 0.15 0.20 0.2 0.4 0.6 0.8 1.0 Analytic Power Plot (Scenario II) x Analytic P ower Case 1Case 2 Case 3 Figure 2.6: The impact of (?2a,?2?) on analytical power, ?12 = 0, x-axis is ?3/(?2a +?2?). Case 1: ?2a = .25,?2? = 1; Case 2: ?2a = .625,?2? = .625; Case 3: ?2a = 1,?2? = .25. x3 with L = 8 cells using ?xed cell boundaries. For ?12 = 0, Figure 2.6 plots the theoretical power against ?3/(?2a+?2?) for three choice of (?2a,?2?) and varying ?3 on the x-axis. Our conclusions are consistent with what we saw in Section 2.3.1. For any ?xed pair of (?2a,?2?), the power of the test increases as a function of ?3, the coe?cient of the omitted covariate x3. For any ?xed ?3, the power increases when the random e?ect ?2a decreases compared to the error term ?2?. 2.4 Data examples 2.4.1 Birth weight data These data were obtained from the book [19] and consist of fetal birth weights of 432 boys from 108 families of size 4, and BMI of the mother, age of the mother, 40 0 40 80 2500 4500 Family index Birthweight(grams) (a) Birth weight in families ?3 ?1 1 2 3?1000 500 Theoretical Quantiles Residuals (b) Normal plot of res ?2 0 1 2 3000 4500 Theoretical Quantiles Family means(grams) (c) Family means qqnorm 20 30 2500 4500 BMI Birthweight(grams) (d) Birth weight vs BMI Figure 2.7: Normality assumption checking for birth weight data. birth order, gestation time and a family indicator. Lee et al. [19] analyzed this data set using a linear mixed model with only an intercept term. To use all covariates, we excluded individuals who had missing covariates and modi?ed possible recording errors on the covariate order. The complete data set consists of 104 families with 370 individuals. Family sizes di?ered for di?erent families, but all family sizes were four or less. We repeated the graphical analysis in Chapter 5 of [19] on the modi?ed data and got similar results. There is a strong familial e?ect of birth weight and the within-family variation is normally distributed. We conclude that the weight data satisfy assumptions for (2.3). We tried various covariate combinations for the ?xed part in (2.3), including interactions. Based on the signi?cance of the covariates and BIC (or AIC) criteria, the best model is the LMM with 3 covariates: intercept, mom.age and gestation. Residual plots further con?rm the normality assumptions in our LMM. We then apply the goodness of ?t test to our model. We use the cell partition based on the covariate order, which is not in our ?nal model, and obtain 41 ?3?11 3?1500 ?500 0 500 1500 Theoretical Quantiles Sample Quantiles (b) qqnorm: a_i+e_ij ?2 0 2 ?500 0 500 1000 Theoretical Quantiles Sample Quantiles (c) qqnorm: a_i+e_i. Figure 2.8: Normality checking for residuals of ?nal model for birth weight data the following number of observations in the cells: 61, 74, 79, 80, 31, 45. The value of the test statistic is T = 2.44, which corresponds to a p value of .87 for a chi- square distribution with 6 degrees of freedom. However, when we tested the ?t of the intercept only model in [19] using the same cell partition, we also found that it had adequate ?t to the data with T = 4.97. Thus both of these two models are judged adequate by our test. 2.4.2 Alcohol data These data come from a Women?s Alcohol Study [8], where 53 healthy, non- smoking postmenopausal women completed a random-order, three-period (8-week treatment for each period) study with a crossover design in which each woman re- ceived 0, 15 or 30 g of alcohol per day. Participants were not told the amount of alcohol they were consuming and each controlled feeding period was preceded by a two to ?ve week washout period during which time the participant consumed no alcohol. During the controlled feeding period, all food and beverages for the 42 ?2 ?1 0 1 2 4.0 4.5 5.0 5.5 6.0 Theoretical Quantiles Sample Quantiles QQ plot for log(TG) Figure 2.9: Normality checking for log(TG) for Alcohol data. participants were supplied by the study investigators. For each woman in the data set, three blood measurements for the three pe- riods are recorded, corresponding to the three randomized alcohol intake periods. We assessed the association of Plasma Triglycerides level (TG) with alcohol in- take. Other relevant covariates were race, age, height, weight and BMI (Body Mass Index). A log transformation of TG met the normality assumption by a formal goodness of ?t check on the response variable. We then tried all possible covariate combinations for the ?xed-e?ect terms of (2.3), including interactions. Assessed by the signi?cance of the covariates and BIC (or AIC) criteria, our ?nal model in- cluded an intercept, age, BMI and alcohol. The residual plots further con?rmed the normality assumptions in (2.3). To apply our goodness of ?t test on this model, we use the cell partition based on the covariates height and weight, which are not in our ?nal model, and obtain the following number of observations in the cells: 19,19,13,17,15,11,13,15,14,22. The value of the test statistic is T = 8.98, which corresponds to a p values of .53 for a chi-square distribution with 10 degrees of 43 ?2 0 1 2 ?0.5 0.0 0.5 Theoretical Quantiles Sample Quantiles qqnorm for a_i+e_ij ?2?10 1 2?0.6 ?0.4 ?0.2 0.0 0.2 0.4 0.6 0.8 Theoretical Quantiles Sample Quantiles qqnorm for a_i+e_i. Figure 2.10: Normality checking for residuals of the ?nal model for Alcohol data freedom. Thus our ?nal model ?tted well and we concluded, not surprisingly, that alcohol intake a?ects Plasma Triglycerides levels. 2.4.3 Factors impacting thyroglobulin levels in an iodine de?cient population On April 26, 1986, an accident at the Chernobyl power plant located in north- western Ukraine, close to the border with Belarus, released large amounts of radioac- tive materials including iodine-131 (I-131) into the atmosphere from the destroyed reactor. Deposition of these radioactive materials resulted in serious contamination of the territory and exposed its residents. Because the thyroid gland concentrates iodine, the doses to the thyroid due to consumption of I-131 contaminated milk were much greater than those to any other organs. The National Cancer Institute, NIH is involved in a cohort study in Belarusian individuals exposed to the accident [34]. While the main objective of the study is to evaluate the relationship between I-131 44 doses and risk of thyroid cancer, investigators were also interested in describing the levels of iodine in this population, as historical data suggest that the study area could be mildly iodine de?cient and iodine de?ciency impacts absorption of I-131. We therefore evaluated the relationship between levels of serum thyroglobulin (TG), a sensitive marker of iodine de?ciency, and patients? characteristics including age, sex, thyroid volume and other demographic and clinical variables that might re?ect or in?uence the intake of dietary iodine and were identi?ed in a initial screen of variables. We restrict our example to men from four of the ?ve study regions, who had complete information on the covariates. After excluding observations with TG > 80, log(TG) was normally distributed. The ?nal dataset was comprised of m = 933 individuals, of whom 404 had a single TG measurement, 484 had two, 42 three and 3 four TG measurements during follow-up, resulting in a total of N = 1510 observations. An initial screening of the variables, one at a time by simple linear regres- sion, indicated that age at time of exam, age at time of the accident, rural or urban residence, smoking status, urinary iodine levels, serum thyroid-stimulating hormone (TSH) levels, serum anti-thyroglobulin antibody (ATG) levels, thyroid volume, presence of thyroid nodules, presence of goiter and presence of any thyroid abnormality may impact TG levels. We ?t various of the LMM (2.3) model, including combinations of those co- variates and their interaction terms using PROC GLIMMIX (SAS 9.2). A model (Model 1) that included all the variables mentioned above, with the exception of 45 presence of nodules, and in addition included an interaction term of ATG levels with presence of any thyroid abnormality that was marginally signi?cant (Wald p- value p = 0.054), had a log-likelihood value of 1625.2. The random e?ect variance estimate for Model 1 was ?2a = 0.29 and the error variance estimate was ?2? = 0.25. A second model (Model 2) that had no interaction term, but included presence of nodules, resulted in a log-likelihood of 1621.3. The variance component estimates were similar to model 1, ?2a = 0.29 and ?2? = 0.27. However, as models 1 and 2 are not nested, we could not compare them using a likelihood ratio test. To assess the ?t of Models 1 and 2, we formed the cell partition for the test based on L = 8 cells de?ned by quantiles of ATG and presence of any thyroid abnormality. Based on a chi-squared test statistic with eight degrees of freedom, there was no indication of lack of ?t for either model, with corresponding p-values p = 0.32 and p = 0.40 for Models 1 and 2 respectively. We repeated the calculation of the test statistic for a second cell partition based on presence of nodules and presence of goiter, resulting in L = 4 cells, with corresponding p-values p = 0.19 and p = 0.70 for Models 1 and 2 respectively. These results suggested that both models provided an adequate ?t to the data. When we checked a third model, that included neither presence of nodules not the interaction term (log-likelihood 1621.5), there was also no evidence of lack of ?t based on eight (p = 0.39) or 4 degrees of freedom chi square tests (p = .21). However, with a model that included all the covariates, the interaction term of ATG levels with presence of any thyroid abnormality but excluded presence of goiter, we did detect a lack of ?t in the cell partition de?ned by presence of nodules and goiter, 46 with p = 0.006, while there was no evidence of lack of ?t based on the L = 8 cell partition (p = 0.28). This highlights the importance of presence of any thyroid abnormality in the model. However, omitting the interaction term of this variable with ATG levels does not a?ect ?t to the data. 2.5 Discussion Schoenfeld (1980) presented a class of chi-squared goodness of ?t tests for the proportional hazards regression model. We adapted this idea and proposed a class of goodness of ?t tests for testing the statistical adequacy of a linear mixed model. We described the asymptotic properties of the test when parameters were estimated and developed its theoretical power under the local, or contiguous, alternative. We assessed factors that impact the power, the impact of choice of cell partitions on the test as well as the robustness of the test with respect to symmetric error distri- bution in simulations. We found that when a speci?c covariate that is associated with outcome is omitted, especially interaction terms or a covariate correlated with covariates already in the model, cell partitions based on the omitted covariate result in adequate power of the test. However, if the cell partition is based only on covari- ates already in the model, this test has no power to detect any lack of model ?t. We also found that the estimated theoretical power calculated using Le Cam?s third lemma was reliable at least when the number of clusters m is above 50. However, when m is very small, it may be advisable to rely on the empirical power computed through simulations. Our test was also very robust to violations of the normality 47 assumption of the error distribution. This goodness of ?t test can be used to test the statistical adequacy of the ?nally selected LMM in real application. The test statistic is very easy to implement, as all that is needed in order to apply the test are the ?nal model parameter estimates andtheirvariancecovariancematrix, whicharestandardoutputsfromanystatistical software. As a note of caution, in applying the test one needs to check the rank of the estimated variance covariance matrix ? in (2.13) to ensure the correct degrees of freedom for the test statistic. In Chapter 3, we extend this test statistic to assess the ?t of generalized linear mixed models. 2.6 Technical details for Chapter 2 2.6.1 Proof of Theorem 2.3 Proof: Let J be the limit of the sample information matrix per observation, J = lim N?? 1 N ? ?? ? ? ?2l??i??j ? ?2l??i??j ? ?2l??i??j ? ?2l??i??j ? ?? ?= ? ?? ? J?? J?? JT?? J?? ? ?? ?. (2.27) The MLE ?? in model (2.3) is consistent as follows from results by Miller (1977). By Taylor series expansion of the score function S(??), we obtain ?N(???? 0) ? ( ? 1N ?S(?0)?? )?1 1 ?NS(?0) ? J?1 1?NS(?0). (2.28) 48 Under model (2.3), J?? = 0 in (2.27) (Wand 2007). The Fisher information is therefore a block diagonal matrix and J?1 = ? ?? ? J?1?? 0 0 J?1?? ? ?? ?. As Y?X?? N(0,V), the score function for ?, which is the ?rst p components of S(?), is S?(?) = (?/??)l(?) = XTV?1(Y?X?). By extracting the ?rst p components of (2.28), we have ?N(???? 0) ? J ?1 ?? 1? NS?(?0) = J ?1 ??X TV?1(Y?X? 0)/ ?N. Thus, ?N ? ?? ? (f ?e(?0))/N ????0 ? ?? ? ? ? ?? ?? ?? ?? ?? ? N?1/2 [I{x11?E1}???I{xmnm?E1}] ... N?1/2 [I{x11?EL}???I{xmnm?EL}] N?1/2 J?1??XTV?1 ? ?? ?? ?? ?? ?? ? (Y?X?0) = D(L+p)?N(Y?X?0), (2.29) which is a linear combination of Gaussian random variables. Therefore, as N ??, ?N ? ?? ? (f ?e(?0))/N ????0 ? ?? ? D? N(0,DVDT), 49 where DVDT = ? ?? ? H J?1?? J?1?? T J?1?? ? ?? ?, with = ? ?? ?? ?? ? ?T1 ... ?TL ? ?? ?? ?? ? L?p = lim N?? ? ?? ?? ?? ? N?1?mi=1?nij=1 I{xij?E1}xTij ... N?1?mi=1?nij=1 I{xij?EL}xTij ? ?? ?? ?? ? , (2.30) and H is a symmetric matrix of dimension L ? L. For l = 1,??? ,L ? 1; k = l +1,??? ,L, the o?-diagonal elements of H are Hlk = ?2a lim N?? 1 N m? i=1 [( n i? j=1 I{xij?El} )( n i? j=1 I{xij?Ek} )] (2.31) Similarly, for l = 1,??? ,L, the diagonal elements of H are Hll = ?2? lim N?? 1 N m? i=1 ni? j=1 I{xij?El} +?2a lim N?? 1 N m? i=1 ( n i? j=1 I{xij?El} )2 . (2.32) Remark 2.13 If xij are random variables, under Assumption 2.2, 50 ?l = E{x1,n1}(?n1j=1 I{x1j?El}x1j)/E(n1), and Hlk = ?2a ? limm?? 1(?m i=1 ni)/m ? 1m m? i=1 [( n i? s=1 I{xis?El} )( n i? t=1 I{xit?Ek} )] = ? 2 a E(n1) ?E(x1,n1) [( n 1? s=1 I{x1s?El} )( n 1? t=1 I{x1t?Ek} )] = ? 2 a E(n1) ?En1 { Ex1 [( n 1? s=1 I{x1s?El} )( n 1? t=1 I{x1t?Ek} ) n 1 ]} . The expressions of Hlk and Hll depend on the joint distribution of xi and ni. Under the more restrictive assumption that xij,i = 1,...,m;j = 1,...,ni are i.i.d. and are independent of ni, then Hlk and Hll can be further simpli?ed as Hlk = ? 2 a E(n1) ?En1 {n 1Ex11 (I {x11?El}I{x11?Ek} )+ (n2 1 ?n1)E (I {x11?El} )E(I {x11?Ek} )} = ?2a ? E(n 2 1 ?n1) E(n1) P (I {x11?El} )P(I {x11?Ek} ) and Hll = (?2? +?2a)E(I{x11?El})+?2aE(n 2 1 ?n1) E(n1) [ P(I{x11?El}) ]2 . 2 51 2.6.2 Proof of Corollary 2.7 Proof: Under the asymptotic normality of ?N(????0), with A ? B denot- ing A?B P? 0, 1? N ( f?e(??) ) = 1?N (f?e(?0)) + 1?N ( e(?0)?e(??) ) ? 1?N (f?e(?0))? 1?N?e(?0) (? ???0 ) ? 1?N (f?e(?0))? ?N (? ???0 ) = (I | ? )?N ? ?? ? (f ?e(?0))/N ????0 ? ?? ?. Since N?1/2 ( f?e(??) ) is a linear combination of components in ?N ? ?? ? (f ?e(?0))/N ????0 ? ?? ?, thus 1? N ( f?e(??) ) D ? N(0, ), (2.33) with = H? J?1?? T. 2.6.3 Proof of Theorem 2.10 Since the 2-level LMM model (2.3) is a special case of the LMM model (2.2) under normality assumptions for both random e?ects and the error term, the proof of Theorem 2.10 for model (2.2) is quite similar to the proof of Theorem 2.3 for model (2.3) as stated in Section 2.6.1. Two results that we need to use for the proof 52 of Theorem 2.10 are, again, the MLE consistency (Miller, 1977) and the fact that the o?-diagonal matrix J?? of J, the limit of the sample information matrix per observation, is 0 (Wand 2007, equation (3)). Key steps of the proof are ?rst to do a Taylor expansion to the MLE ??, and then to use the fact that the response vector Y is normally distributed to show the asymptotic normality of the observed minus estimated expected vector. In this case, H = limN??FVFT is a symmetric L?L matrix, with F = 1?N ? ?? ?? ?? ? I{x1?E1}???I{xN?E1} ... I{x1?EL}???I{xN?EL} ? ?? ?? ?? ? . (2.34) and = ? ?? ?? ?? ? ?T1 ... ?TL ? ?? ?? ?? ? L?p = lim N?? ? ?? ?? ?? ? 1 N ?N k=1 I{xk?E1}x T k ... 1 N ?N k=1 I{xk?EL}x T k ? ?? ?? ?? ? . (2.35) 2.6.4 Proof of Theorem 2.12 We use the multivariate Central Limit Theorem to prove Theorem 2.12. Proof: Let the ni ?p covariate matrix for the i-th cluster be xi = ? ?? ?? ?? ? 1 xi1,1 ??? xi1,p?1 ... ... ... 1 xini,1 ??? xini,p?1 ? ?? ?? ?? ? , 53 then with Vi given in (2.4), ?N(???? 0) = ?N(XT ?V?1X)?1XT ?V?1(Y?X? 0) ? (XTV?1X N )?1 1 ?N m? i=1 xTi V?1i (yi ?xi?0). Let zil =?nij=1 I{xij?El}(yij ?E(yij)) =?nij=1 I{xij?El}(yij ?xij?0), i = 1,...,m; l = 1,...,L. Then f?e(?0) = ? ?? ?? ?? ? ?m i=1 zi1 ... ?m i=1 ziL ? ?? ?? ?? ? , and our test statistic is based on a quadratic form in (f?e(??))/?N = (f?e(?0))/?N + ( e(?0)?e(??) ) /?N ? (f?e(?0))/?N ??e(?0)(????0)/?N ? 1?N ? ?? ?? ?? ? ?m i=1 zi1 ... ?m i=1 ziL ? ?? ?? ?? ? ? 1?N ?e(?0)N (XTV?1X N )?1 m? i=1 xTi V?1i (yi ?xi?0) = m? i=1 1? N ? ?? ?? ?? ? ? ?? ?? ?? ? zi1 ... ziL ? ?? ?? ?? ? ? ?e(?0)N (XTV?1X N )?1 xTi V?1i (yi ?xi?0) ? ?? ?? ?? ? . 54 Let ? = N?1?e(?0), ?J?? = N?1XTV?1X. Then under Assumption 2.4, ?J?? P? J??, and under Assumption 2.2, ? P? (Remark 2.1), with given in (2.30). We next show that (f?e(??))/?N has a limiting Gaussian distribution by using the multivariate Central Limit Theorem. For any constant vector C = (C1,...,CL), since the inverse of Vi given in equation (2.4) is Vi = Ini/?2? ??2a/(?2?(?2? +ni?2a))1 ?2 , we have CTN?1/2 ( f?e(??) ) ? m? i=1 1? N [ L? l=1 Clzil ?CT ? ?I?1??xTi V?1i (yi ?xi?0) ] = m? i=1 1? N [ L? l=1 Cl ni? j=1 I{xij?El}(yij ?xij?0)? CT ? ?I?1??xTi ( 1 ?2? Ini ? ?2a ?2?(?2? +ni?2a)1 ?2) (yi ?xi?0) ] = m? i=1 1? N { ni? j=1 [ L? l=1 ClI{xij?El} ] (?i +?ij)? CT ? ?I?1?? ni? j=1 [ 1 ?2? xij ? ni?2a ?2?(?2? +ni?2a)?xi. ] (?i +?ij) } = m? i=1 { 1? N ni? j=1 [ L? l=1 ClI{xij?El} ?CT ? ?I?1?? ( 1 ?2? xij ? ni?2a ?2?(?2? +ni?2a)?xi. )]} ?i + m? i=1 ni? j=1 1? N [ L? l=1 ClI{xij?El} ?CT ? ?I?1?? ( 1 ?2? xij ? ni?2a ?2?(?2? +ni?2a)?xi. )] ?ij = m? i=1 ci,ni?i + N? s=1 ws?s, 55 where the double index (i,j) is placed in one-to-one correspondence with the single index s. Because{?i}mi=1 are i.i.d and{?s}Ns=1 are i.i.d, we can show both of the above sums have limiting normal distributions as m ??, by checking the conditions (a) and (b) of Lemma 5.1 in the Appendix. First we bound |ci,ni| = 1?N ni? j=1 { L? l=1 ClI[xij?El] ?CT ? ?I?1?? ( 1 ?2? xij ? ni?2a ?2?(?2? +ni?2a)?xi. )} = 1?N ni? j=1 L? l=1 ClI[xij?El] ?CT ? ?I?1?? ( 1 ?2? ni?xi. ? ni?2a ?2?(?2? +ni?2a)ni?xi. ) = 1??m i=1 ni/m 1? m L? l=1 Cl ni? j=1 I{xij?El} ?CT ? ?I?1?? ni? 2 a ?2? +ni?2a 1 ?2a?xi. ? 1??m i=1 ni/m 1? m ( ni L? l=1 Cl + CT ? ?I?1?? 1 ?2a?xi. ) ? 1?E(n) ?0 ?i = 1,...,m Next, m? i=1 c2i,ni = 1?m i=1 ni/m 1 m m? i=1 { L? l=1 Cl ni? j=1 I{xij?El} ?CT ? ?I?1?? ni? 2 a ?2? +ni?2a 1 ?2a?xi. }2 ? 1?m i=1 ni/m 1 m m? i=1 2 ? ? ? ( L? l=1 Cl )2 n2i + ( 1 ?2a )2 CT ? ?I?1???xi.?xTi.?I?1?? ? TC ? ? ? P? 2 E(n) ? ? ? ( L? l=1 Cl )2 En2 + 1?4 a CT ? ?I?1?? [ limm?? 1m m? i=1 ?xi.?xTi. ] ?I?1?? ? TC ? ? ? = a ?nite constant. (2.36) 56 Thus condition (a) in Lemma 5.1 holds, i.e. max 1?i?m |ci,ni| P? 0, as m ??. Since (xi,ni) are assumed i.i.d., and based on (2.36), the c2i,ni terms can be written as g(xi,ni)/m after removing the factor m/(?mi=1 ni). This function g is the same across i, with E(g(xi,ni)) < ?. Thus the Law of Large Numbers Theorem holds for (xi,ni), condition (b) in Lemma 5.1 holds, i.e. m? i=1 c2i,ni P? a ?nite constant. Then ?mi=1 ci,ni?i is asymptotically normally distributed since both conditions (a) and (b) in Lemma 5.1 are satis?ed. We do the same checking for ?Ns=1 ws?s. |ws| = 1?N L? l=1 ClI{xij?El} ?C? ? ?I?1?? ( 1 ?2? xij ? ni?2a ?2?(?2? +ni?2a)?xi. ) ? 1?N [ L? l=1 ClI{xij?El} + C? ? ?I?1?? 1 ?2? xij + C? ? ?I?1?? ni?2a ?2?(?2? +ni?2a)?xi. ] ? 1?N [ L? l=1 |Cl|+ 1?2 ? C? ? ?I?1??xij + C? ? ?I?1???xi. ] P? 0 ?s = 1,...,N. (2.37) 57 Based on (2.37) and the fact that (x+y +z)2 ? 3(x2 +y2 +z2), we have N? s=1 w2s = 1N N? s=1 [ L? l=1 ClI{xij?El} ?C? ? ?J?1?? ( 1 ?2? xij ? ni?2a ?2?(?2? +ni?2a)?xi. )]2 ? 1N N? s=1 [ L? l=1 |Cl|+ 1?2 ? C? ? ?J?1??xij + 1?2 ? C? ? ?J?1???xi. ]2 ? 1N N? s=1 3 ? ? ( L? l=1 |Cl| )2 + ( 1 ?2? )2 C? ? ?J?1??xij 2 +( 1 ?2? )2 C? ? ?J?1???xi. 2 ? ? P? 3 ( L? l=1 |Cl| )2 +3 ( 1 ?2a )2 C? ? ?J?1?? ( lim N?? 1 N m? i=1 ni? j=1 xijx?ij ) ?J?1?? ? ?C + 3 ( 1 ?2a )2 C? ? ?J?1?? ( lim N?? 1 N m? i=1 ni?xi.?x?i. ) ?J?1?? ? ?C = a ?nite constant. Therefore, we have max 1?s?N |ws| P? 0, as N ??. With the same argument that we did on ?mi=1 ci,ni?i, we get the asymptotic nor- mality of ?Ns=1 ws?s by checking conditions (a) and (b) of Lemma 5.1. We have shown above that both?mi=1 ci,ni?i and?Ns=1 ws?s are asymptotically normal. Because ?i and ?ij are independent, the sums ?mi=1 ci,ni?i and ?Ns=1 ws?s are conditionally independent given (xi,ni). These two sums are jointly normal and asymptotically uncorrelated. Therefore they are asymptotically independent. Thus the limiting distribution of C?N?1/2(f?e(??)), which is asymptotically the sum of these two quantities, is normal. Moreover, for any constant vector C, its limiting 58 variance is of the form CT C with the same ?xed , = lim N?? 1 N m? i=1 Var ? ?? ?? ?? ? ?ni j=1 I{xij?E1}(yij ?xij?0) ... ?ni j=1 I{xij?EL}(yij ?xij?0) ? ?? ?? ?? ? ? lim N?? [?e(? 0) N ][XTV?1X N ]?1[?e(? 0) N ]T = H? J?1?? T. Therefore, N?1/2(f ? e(??)) D? N(0, ), and (f ? e(??))T ?1(f ? e(??))/N D? ?2k, where k = rank( ). We replace with ? , the reconstructed matrix by applying Singular Value Decomposition on a consistent estimator of . One such consistent estimator of is to replace all parameters in with their MLEs. Based on Lemma 5.3, rank (? ) = rank ( ) for large N. Thus (f?e(??))T ? ?1(f?e(??))/N D? ?2k. 2.6.5 Derivation of the power of T We derive the power of the test for multi level LMM (2.1) under contiguous alternatives, based on Le Cam?s third lemma (Van der Vaart, 2000), as stated below. Lemma 2.14 (Le Cam?s third lemma) Let Pm and Qm be two measures on a measurable space, corresponding to a null distribution under investigation, and an 59 alternative hypothesis respectively. If (Wm,log dQmdP m ) Pm?NL+1 ? ?? ? ? ?? ? ? ??2?/2 ? ?? ?, ? ?? ? ? ? ?T ?2? ? ?? ? ? ?? ?, (2.38) then Wm Qm? NL(?+?,?). 2 Let H0 :?N =?0, H1 :?N =?0 + a?N, where a is a constant vector, a/?N ? 0, as N ? ?. Thus ?N ??0, as N ? ?. By Taylor expansion, under Theorem 5.21 in Van der Vaart (2000), log dQNdP N = log Likelihood(?N;Y,X)Likelihood(? 0;Y,X) ?= log L(?N) L(?0) ? (?log(L(?0)))T a?N + 12 a T ?N (??2 log(L(?0))) a?N ? (SN(?0))T a?N ? 12aTJ(?0)a, 60 where SN(?0) = ?log(L(?0)) = ? ?? ?? ?? ? XTV?1(Y?X?0) ?12tr(V?1 ?V??2 a )+ 12(Y?X?0)TV?1 ?V??2 a V?1(Y?X?0) ?12tr(V?1)+ 12(Y?X?0)TV?1V?1(Y?X?0) ? ?? ?? ?? ? (2.39) is the score function for all observations (Chapter 6, McCulloch and Searle, 2001). Under Assumptions 2.4 and 2.5, we get the existence of J(?0) = lim N?? ?? ?2 log(L(?0))/N, the limit of the sample Fisher information per observation and lim N?? Var(SN(?0)/?N) = J(?0). Thus log dQNdP N PN? N(?1 2a TJ(?0)a,aTJ(?0)a). For the special case when we ?t a reduced model to the data, using X?N?p? instead of XN?p with p? < p, we only estimate the coe?cient ?? corresponding to X?. The e(?) function in (2.7) el(?) = m? i=1 ni? j=1 I{xij?El}E(yij) = m? i=1 ni? j=1 I{xij?El}xTij? 61 has Rp as its domain. Let function e?(?) have the same meaning of function e(?), but with Rp? as its domain. e?l(??) = m? i=1 ni? j=1 I{xij?El}E?(yij) = m? i=1 ni? j=1 I{xij?El}(x?ij)T??. Let WN = (f?e?(???))/?N be the ?rst vector component of (2.38). Under the null hypothesis PN, WN is asymptotically normal, WN?N(0, ?), based on Corollary 2.7. Next, we compute the variance-covariance matrix in (2.38), which is equiv- alent to the variance-covariance matrix of aTSN(?0)/?N and (f?e?(???))/?N. 1? N(f?e ?(???)) ? 1? N(f?e ?(?? 0))? 1? N?e ?(?? 0)( ?? ? ??? 0) ? 1?N(f?e?(??0))? ??N( ??? ???0) ? 1?N(f?e?(??0))? ?(J???)?1(X?)TV?1(Y?X???0)/?N = 1?N(A?B)?(Y?X???0), where J??? denotes the information matrix when the second derivative for the log- likelihood function is taken with respect to ??, and A = ? ?? ?? ?? ? I{x11?E1}???I{xmnm?E1} ... I{x11?EL}???I{xmnm?EL} ? ?? ?? ?? ? , B = ?(J???)?1(X?)TV?1. 62 Thus, Cov(f?e ?(???) ?N ,log dQndP n ) = Cov(f?e ?(???) ?N , a TSN(?0) ?N ) = 1NCov(f?e?(???),aT1 S? +a2S?2a +a3S?2?). (2.40) Under equation (2.39), since both tr(V?1(?V/??2a)) and tr(V?1) are constants, we have Cov ( f?e?(???),S?2a ) = Cov ( (A?B)?(Y?X???0), 12(Y?X?0)TV?1 ?V??2 a V?1(Y?X?0) ) = (A?B) Cov ( Y?X???0, 12(Y?X???0)TV?1 ?V??2 a V?1(Y?X???0) ) = 0, because Cov(Y?X???0, 12(Y?X???0)TV?1 ?V??2 a V?1(Y?X???0)) = 0. Similarly, we get Cov(f?e?(???),S?2?) = 0. 63 Therefore (2.40) becomes Cov(f?e ?(???) ?N ,log dQndP n ) = 1N Cov ( f?e?(???),aT1 S? ) = 1N Cov((A?B)?(Y?X???0),aT1 XTV?1(Y?X???0)) = 1N (A?B) Var(Y) V?1Xa1 = 1N (A?B)Xa1 = { ? 1N ?(J???)?1[(X?)TV?1X] } a1 = { ? ?[(X?)TV?1(X?)]?1[(X?)TV?1X] } a1. (2.41) Since both f?e?(???) and aT1 S? can be written as a matrix multiply by the same normal vector Y?X?0 = Y?X???0, we easily get the asymptotic joint normality of f ? e?(???) and aT1 S?. Because f ? e?(???) is asymptotically uncorrelated with both S?2a and S?2? as shown in the above context, we also get the asymptotic jointly normality of f?e?(???) and aTSN(?0). 2 We next calculate the limits of the terms in Cov ( N?1/2(f?e?(???)),log(dQn/dPn) ) in Section 2.6.5.1 and 2.6.5.2. 2.6.5.1 Limit of (X?)TV?1X and (X?)TV?1(X?) in (2.22) The analytical expressions in (2.41) calculated in this subsection as well as in Subsection 2.6.5.2 are used to get the theoretical powers in the settings discussed in the simulation Section 2.3, where xij,i = 1,...,m;j = 1,...,ni are i.i.d., and xij 64 and ni are independently generated. Note that (X?)TV?1(X?) is a submatrix of (X?)TV?1X. To get the limit of (X?)TV?1X/N forthesettinginthesimulationSection, weassumeX = (1,x1,x2,x3) andX? = (1,x1,x2). The ith block matrix (2.4) can then be written as ?2?Ini+?2a1?2ni , with its inverse matrix being Ini/?2? ??2a/(?2?(?2? +ni?2a))1?2ni . Thus 65 1 N(X ?)TV?1X = 1N ? ?? ?? ?? ?? ?? ?? ?? ? ... 1 xi1,1 xi1,2 ... 1 xini,1 xini,2 ... ? ?? ?? ?? ?? ?? ?? ?? ? T [diag(?2 ?Ini +? 2 a1 ?2 ni ) ]?1 ? ?? ?? ?? ?? ?? ?? ?? ? ... 1 xi1,1 xi1,2 xi1,3 ... 1 xini,1 xini,2 xini,3 ... ? ?? ?? ?? ?? ?? ?? ?? ? = 1N m? i=1 ? ?? ?? ?? ? 1 ??? 1 xi1,1 ??? xini,1 xi1,2 ??? xini,2 ? ?? ?? ?? ? [ 1 ?2? Ini ? ?2a ?2?(?2? +ni?2a)1 ?2 ni ] ? ? ?? ?? ?? ? 1 xi1,1 xi1,2 xi1,3 ... 1 xini,1 xini,2 xini,3 ? ?? ?? ?? ? = 1?2 ? 1 N m? i=1 ? ?? ?? ?? ? 1 ??? 1 xi1,1 ??? xini,1 xi1,2 ??? xini,2 ? ?? ?? ?? ? ? ?? ?? ?? ? 1 xi1,1 xi1,2 xi1,3 ... 1 xini,1 xini,2 xini,3 ? ?? ?? ?? ? ? 1 N m? i=1 ?2a ?2?(?2? +ni?2a) ? ?? ?? ?? ? 1 ??? 1 xi1,1 ??? xini,1 xi1,2 ??? xini,2 ? ?? ?? ?? ? 1?2ni ? ?? ?? ?? ? 1 xi1,1 xi1,2 xi1,3 ... 1 xini,1 xini,2 xini,3 ? ?? ?? ?? ? . 66 Here the ?rst sum is 1 ?2? 1 N m? i=1 ? ?? ?? ?? ? 1 ??? 1 xi1,1 ??? xini,1 xi1,2 ??? xini,2 ? ?? ?? ?? ? ? ?? ?? ?? ? 1 xi1,1 xi1,2 xi1,3 ... 1 xini,1 xini,2 xini,3 ? ?? ?? ?? ? = 1?2 ? 1 N m? i=1 ni? j=1 ? ?? ?? ?? ? 1 xij,1 xij,2 xij,3 xij,1 x2ij,1 xij,1xij,2 xij,1xij,3 xij,2 xij,1xij,2 x2ij,2 xij,2xij,3 ? ?? ?? ?? ? . As N ??, its limit is 1 ?2? ? ?? ?? ?? ? 1 Ex1 Ex2 Ex3 Ex1 Ex21 E(x1x2) E(x1x3) Ex2 E(x1x2) Ex22 E(x2x3) ? ?? ?? ?? ? . The second sum is 1 N m? i=1 ?2a ?2?(?2? +ni?2a) ? ?? ?? ?? ? 1 ??? 1 xi1,1 ??? xini,1 xi1,2 ??? xini,2 ? ?? ?? ?? ? 1?2ni ? ?? ?? ?? ? 1 xi1,1 xi1,2 xi1,3 ... 1 xini,1 xini,2 xini,3 ? ?? ?? ?? ? = 1N m? i=1 ?2a ?2?(?2? +ni?2a)? ? ?? ?? ?? ? n2i ni?nij=1 xij,1 ni?nij=1 xij,2 ni?nij=1 xij,3 ni?nij=1 xij,1 (?nij=1 xij,1)2 d1 d2 ni?nij=1 xij,2 d1 (?nij=1 xij,2)2 d3 ? ?? ?? ?? ? 67 where d1 = (?nij=1 xij,1)(?nij=1 xij,2), d2 = (?nij=1 xij,1)(?nij=1 xij,3), d3 = (?nij=1 xij,2)(?nij=1 xij,3) and 1 N m? i=1 ?2a ?2?(?2? +ni?2a)n 2 i = 1 (?mi=1 ni)/m ? 1 m m? i=1 ?2a ?2?(?2? +ni?2a)n 2 i P? 1 E(n) ?E [ ?2a ?2?(?2? +n?2a) ?n 2] = 1E(n) ? ? 2 a ?2? ?c1, where c1 = E(n2/(?2? +n?2a)). 1 N m? i=1 ?2a ?2?(?2? +ni?2a)ni ni? j=1 xij,1 = 1(?m i=1 ni/m) ? 1m m? i=1 ni?2a ?2?(?2? +ni?2a) ( n i? j=1 xij,1 ) P? 1 E(n) ?E(x,n) [ n?2a ?2?(?2? +n?2a) ( n? j=1 xj,1 )] = 1E(n) ?En { Ex [ n?2a ?2?(?2? +n?2a) ( n? j=1 xj,1 ) n ]} = 1E(n) ?En { n2?2 a ?2?(?2? +n?2a) ?EX1 } = EX1E(n) ? ? 2 a ?2? ?c1. 68 1 N m? i=1 ?2a ?2?(?2? +ni?2a) ( n i? j=1 xij,1 )( n i? j=1 xij,2 ) = 1(?m i=1 ni)/m ? 1m m? i=1 ?2a ?2?(?2? +ni?2a) ( n i? j=1 xij,1 )( n i? j=1 xij,2 ) P? 1 E(n) ?E(x,n) { ?2a ?2?(?2? +n?2a) ( n? j=1 xj,1 )( n? j=1 xj,2 )} = 1E(n) ?En { Ex [ ?2a ?2?(?2? +n?2a) ( n? j=1 xj,1 )( n? j=1 xj,2 ) n ]} = 1E(n) ?En { ?2 a ?2?(?2? +n?2a) [nE(x 1x2) + (n2 ?n)?Ex1 ?Ex2 ]} = 1E(n) ? ? 2 a ?2? { En ( n ?2? +n?2a ) ?E (x1x2) +En ( n2 ?n ?2? +n?2a ) ?Ex1Ex2 } . Let c2 = En[n/(?2? +n?2a)], then the above limit is 1 E(n) ? ?2a ?2? {c2 ?E(x1x2)+c3 ?Ex1Ex2}, where c3 = c1 ?c2. Combining the above expressions, we have the limit for the second component of 1 N(X ?)TV?1X: 1 E(n) ? ?2a ?2? ? ?? ?? ?? ? c1 c1Ex1 c1Ex2 c1Ex3 c1Ex1 h1 h2 h4 c1Ex2 h2 h3 h5 ? ?? ?? ?? ? , with h1 = c2Ex21 + c3(Ex1)2, h2 = c2E(x1x2) + c3Ex1Ex2, h3 = c2Ex22 + c3(Ex2)2, h4 = c2E(x1x3) +c3Ex1Ex3 and h5 = c2E(x2x3)+c3Ex2Ex3. 69 Finally, 1 N(X ?)TV?1X = 1 ?2? 1 N m? i=1 (X?i)TXi ? 1N m? i=1 ?2a ?2?(?2? +ni?2a)(X ? i) T1?2 ni Xi P? 1 ?2? ? ?? ?? ?? ? 1 Ex1 Ex2 Ex3 Ex1 Ex21 E(x1x2) E(x1x3) Ex2 E(x1x2) Ex22 E(x2x3) ? ?? ?? ?? ? ? 1E(n)? 2 a ?2? ? ?? ?? ?? ? c1 c1Ex1 c1Ex2 c1Ex3 c1Ex1 h1 h2 h4 c1Ex2 h2 h3 h5 ? ?? ?? ?? ? , where c1 = En[n2/(?2? +n?2a)], c2 = En[n/(?2? +n?2a)] and c3 = c1 ? c2, h4 = c2E(x1x3) +c3Ex1Ex3 and h5 = c2E(x2x3) +c3Ex2Ex3. Since (X?)TV?1X? is a submatrix of (X?)TV?1X, ? = lim N?? 1 N(X ?)TV?1X? = 1?2 ? ? ?? ?? ?? ? 1 Ex1 Ex2 Ex1 Ex21 E(x1x2) Ex2 E(x1x2) Ex22 ? ?? ?? ?? ? ? 1E(n)? 2 a ?2? ? ?? ?? ?? ? c1 c1Ex1 c1Ex2 c1Ex1 h1 h2 c1Ex2 h2 h3 ? ?? ?? ?? ? . 2.6.5.2 Limit of ? in (2.22) We discuss the case when the cell partition is based on x3. Again, the results got from this Section is used to calculate the theoretical power in the scenarios of Section 2.3. 70 Let El be the lth cell of the cell partition, l = 1,...,L, then the elements of ? are: 1 N m? i=1 ni? j=1 I{xij,3?El} P?? ? El f3(x3)dx3 1 N m? i=1 ni? j=1 I{xij,3?El}xij,1 P?? ? x1 ? El x1f(x1,x3)(x1,x3)dx3dx1 1 N m? i=1 ni? j=1 I{xij,3?El}xij,3 P?? ? El x3f3(x3)dx3. When (x1,x2,x3) are jointly normal, which is the Scenario I in Section 2.3.1, F3(x3) and f(x1,x3)(x1,x3) are the corresponding normal and bivariate normal densities. When x3 = x1x2 and x1 and x2 independent, i.e. ?12 = 0, which is the Scenario II in Section 2.3.2, we can calculate F3(x3) and f(x1,x3)(x1,x3) as follows: F3(z) = P(x1x2 ? z) = P(x1x2 ? z,x1 > 0) +P(x1x2 ? z,x1 < 0) = ? ? {x20} f(x1x2)(x1,x2)dx1dx2 + ? ? {x2>z/x1,x1<0} f(x1x2)(x1,x2)dx1dx2 = ? ? x1=0 ? z x1 x2=?? fx1(x1)fx2(x2)dx1dx2 + ? 0 x1=?? ? ? x2= zx1 fx1(x1)fx2(x2)dx1dx2 = ? ? 0 F2 ( z x1 ) f1(x1)dx1 + ? 0 ?? [ 1?F2 ( z x1 )] f1(x1)dx1 f3(z) = dF3(z)dz = ? ? 0 f2 ( z x1 ) 1 x1f1(x1)dx1 ? ? 0 ?? f2 ( z x1 ) 1 x1f1(x1)dx1. To get the joint distribution of (x1,x3) = (x1,x1x2), 71 let ? ??? ??? y1 = x1 y2 = x1x2 , then equivalently, ? ??? ??? x1 = y1 x2 = y2y1 . So the Jacobian matrix J = ?x1 ?y1 ?x1 ?y2 ?x2 ?y1 ?x2 ?y2 = 1 0 ?y2y2 1 1 y1 = 1|y1| Thus f(y1,y2)(y1,y2) = f(x1,x2) ( y1, y2y 1 ) ?J = fx1(y1)?fx2 (y 2 y1 ) ? 1|y 1| Similarly, the joint distribution of (x2,x3) is f(x2,x3)(x2,x3) = f(x1,x2) (x 3 x2,x2 ) ? 1|x 2| = fx1 (x 3 x2 ) fx2(x2)? 1|x 2| Remark 2.15 The limiting terms calculated in Section 2.6.5.1 and 2.6.5.2 are used to calculate the theoretical power in Section 2.2.3. Figure 2.1, 2.2 and 2.3 are plotted by making use of these limit expressions. 2 72 Chapter 3 Goodness of ?t tests for generalized linear mixed models 3.1 Generalized linear mixed models (GLMMs) For i = 1,...,m, and j = 1,...,ni, let xij (with the ?rst component be- ing 1) and yij be the covariate and outcome value for the jth subject in cluster i respectively, and let yi = (yi1,...,yini). The GLMM has the following form: E(yij|ui) = g(xTij?+wTijui), where g(.) is a known strictly monotonic and di?eren- tiable function, the i.i.d. cluster random e?ects ui are assumed to be from a known probability distribution, fu(ui), with unknown parameter vector ? = (?21,...,?2s), and wij are covariates for the random e?ects. We assume that the conditional dis- tribution of yij given ?ij = xTij?+wTijui and ?, follows a distribution from the expo- nential family with density function fy|?(y|?,?) = exp{[yQ(?)?b(?)]a(?)+c(y,?)}, where ? is generally an unknown parameter related to Var(yij). In this Chapter, we restrict the exponential family to the canonical form, that is Q(?) = ?. One can always de?ne a transformed parameter to convert an exponential family to canonical form. The response variables yij are conditionally independent given the random e?ect ui. The covariates xij are usually treated as ?xed in practice. However, to deal with technical issues arising when we prove the consistency of the maximum likelihood estimators and the asymptotic properties of the test statistic, we assume throughout this chapter that (xi,ni) are i.i.d. 73 The likelihood function for ? = (?,?,?) is L(?) = m? i=1 ? ni? j=1 fy|?(yij|?ij)dF(?ij). The maximum likelihood estimator (MLE) of ?, denoted by ??, is the solution to S(?) = 0, where the S denotes the score function S(?) = ?logL(?)?? = m? i=1 Si(?). In what follows, we will focus on GLMMs with a single random intercept, given by E(yij|?i) = g(xTij?+?i), ?i ? N(0,?2). (3.1) In this case, the parameter vector ? for variance components reduces to a single parameter ?2. The linear mixed model, which is an important case of GLMMs, has been carefully studied in Chapter 2. We now describe another two special cases of the GLMMs with random intercept. 3.1.1 Mixed-e?ects logistic models For mixed-e?ects logistic models with a cluster speci?c random intercept, yij given pij, where pij = P(yij = 1), comes from a binomial distribution: yij ? 74 Binom(1,pij) and logit(pij) = log(pij/(1?pij)) = xTij?+?i, where the i.i.d. cluster random e?ects ?i are assumed to be N(0,?2). The response variables yij are condi- tionally independent given ?i. The unknown parameter vector is now ? = (?,?2). The marginal probability of the response for the ith cluster, conditionally given (xij,ni), under the random intercept logistic mixed model yij ? Binom(1,pij), logit(pij) = xTij?+?i, ?i ? N(0,?2) (3.2) is P(Yij = yij) = ? ni? j=1 pyijij q1?yijij dF(?i), where qij = 1?pij. The means and variance-covariances for yij, i = 1,...,m; j = 1,...,ni, , conditionally given (xij,ni), are E(yij) = E(E(yij|?i)) = E(pij) = ? pij dF(?i), Var(yij) = E[Var(yij|?i)] +Var[E(yij|?i)] = E[pij(1?pij)] +Var(pij) = E(pij)?[E(pij)]2, Cov(yis,yit) = ? pispitdF(?i)?E(pis)E(pit), ?s ?= t. 75 3.1.2 Mixed-e?ects Poisson models For the mixed-e?ects Poisson regression model with a cluster speci?c random e?ect, responses yij given ?ij, i = 1,...,m, j = 1,...,ni, follow Poisson distribu- tions with means ?ij and g?1(?ij) = xTij? + ?i, where the i.i.d. cluster random e?ects ?i are assumed N(0,?2) and g is a known monotonic and di?erentiable link function. When g?1 is the log function, this link function is canonical, transforming the mean ?ij to the natural exponential parameter. The response variables yij are conditionally independent given the random e?ects ?i. The unknown parameter vector in this case is ? = (?,?2). The marginal probability of the response for the ith cluster, conditionally given (xij,ni), under the mixed-e?ect Poisson regression model is P(Yij = yij) = ? ni? j=1 ?yijij exp(??ij) yij! dF(?i). When the link function is canonical, as we assume from now on, the means and variance-covariances for the response variables yij, i = 1,...,m; j = 1,...,ni, conditionally given (xij,ni), are E(yij) = E[E(yij|?ij)] = E(?ij) = exp(xTij?+?2/2), Var(yij) = E[Var(yij|?ij)] +Var[E(yij|?ij)] = exp(xTij?+?2/2) + exp(2xTij?+ 2?2)?exp(2xTij?+?2), 76 Cov(yis,yit) = [exp(2?2)?exp(?2)]exp(xTis?+xTit?), ? s ?= t. 3.2 Proof of the consistency of MLE for GLMMs Let f(yi;?) be the likelihood function for the ith cluster, i = 1,...,m. The log likelihood function for the whole set of observations is ?mi=1 logf(yi;?). The normalized score function is Sm(?) = 1m m? i=1 ?logf(yi;?) = 1m m? i=1 (?/??)logf(yi;?). The maximum likelihood estimator (MLE) ?? solves Sm(?) = 0. We always assume natural and canonical parameterization. With probability 1 as m gets large, the MLE exists and is unique (Bickel and Doksum 2006). Let B?(?0) = {? : d(?,?0) ? ?} be the ?-neighborhood of the true parameter vector ?0, which is an open convex Borel set in Rp. Let ?Mm(?) = 1 m m? i=1 logf(yi;?), ?M(?) = E[logf(yi;?)] and J(?;?0) = ?E?0[? ?2 logf(y1,x1,n1;?)]. The following set of assumptions (Assumption 3.1) are used for the consistency proof of the MLE ?? as well as for the asymptotic properties of the test statistic that will be discussed in the Section following. We comment on this set of Assumptions in Remark 3.1 below. 77 Assumption 3.1 B.0 (xi,ni) are i.i.d. B.1 ?M(?) and J(?;?0) exist for all ?; B.2 J(?0;?0) is positive de?nite; B.3 J(?;?0) is continuous at ?0 as a function of ?; B.4 (?/??)??2 logf(y1,x1,n1;?) exists and is integrable . B.5 Derivatives and expectations are interchangeable for logf(y1,x1,n1;?) up to the third derivative; B.6 The true parameter point ?0 = (?0,?20) is an interior point of = (Rp,R+); Remark 3.1 B.1 in Assumption 3.1 ensures that we can apply the Law of Large Numbers (LLN) to (m?1)?mi=1 logf(yi,xi,ni;?) and ?(m?1)?mi=1??2 logf(yi,xi,ni;?). In order for B.1 ? B.5 to hold, assumptions are needed on (xi,ni). We provide su?cient assumptions on (xi,ni) and check B.2 for the random intercept logistic mixed model stated as Lemma 3.8 in Section 3.6.1. To check B.3 and B.5 in Assumption 3.1 , based on the Dominated Convergence Theorem, it su?ces to show that ? ?? , there exists B?(?), such that ?E?0[ sup ???B?(?) ??2 logf(y1,x1,n1;??)] < ?. (3.3) We refer to condition (3.3) as the dominatedness condition. We provide su?cient assumptions on (xi,ni) and check (3.3) for both the random intercept logistic mixed model (stated as Lemma 3.9 in Section 3.6.2 ) and the random intercept Poisson mixed model (stated as Lemma 3.10 in Section 3.6.3). 2 78 Lemma 3.2 Under Assumption 3.1 within model (3.1), there exists an open neigh- borhood U of ?0, such that P( ?Mm(?) is a concave function of ? on U) ? 1, as m ? +?. (3.4) Proof: Based on the existence of ?M(?) and J(?;?0) as stated in B.1 of Assumption 3.1, by the Law of Large Numbers on logf(yi,xi,ni;?) and ??2 logf(yi,xi,ni;?), pointwise for each ?? ?, ?Mm(?) P? ?M(?), (3.5) and ? 1m m? i=1 ??2 logf(yi,xi,ni;?) ? J(?;?0), as m ? +?. (3.6) By B.2 { B.3 of Assumption 3.1, there exists U = B?(?0) such that ???U, J(?;?0) is positive de?nite. By B.1 together with B.4 of Assumption 3.1, the uniform LLN holds for (m?1)?mi=1??2 logf(yi,xi,ni;?) (van der Vaart, 2000, page 271, Example 19.7), i.e. sup ??U ?? 1m m? i=1 ??2 logf(yi,xi,ni;?)?J(?;?0)?? 0, as m ? +?. Then it follows that ?M(?) = (m?1)?mi=1 logf(yi,xi,ni;?) is a concave function of ? ?U with probability approaching 1. 2 To show consistency of the MLE, we use two theorems from Andersen and Gill (1982) and van der Vaart (2000) which are stated here before we summarize our result as a theorem. 79 Theorem 3.3 (Theorem II.1 in Appendix II of Andersen and Gill (1982)) Let E be an open convex subset of Rp and let M1,M2,..., be a sequence of random concave functions on E such that ? ? ? E, Mm(?) P? M(?), as m??, where M is some real function on E. Then M is also concave and for all compact A ? E, sup ??A |Mm(?)?M(?)| P? 0,as m??. (3.7) Theorem 3.4 (van der Vaart 2000, p.45) Let Mm be random functions and let M be a ?xed function of ? such that for every ? > 0, sup ?? |Mm(?)?M(?)| P? 0, sup ?:d(?,?0)?? M(?) < M(?0). Then any sequence of estimators ??m with Mm(??m) ? Mm(?0) ? op(1) converges in probability to ?0. Theorem 3.5 Under Assumption 3.1, the MLE ?? of the GLMM (3.1) is consistent. Proof: Let Mm(?) = ?Mm(?)I{ ~Mm concave on U}. By Lemma 3.2, the MLE ?? is also the solution to ?Mm(?) = 0, with probability approaching 1. Together with (3.5), based on Theorem 3.3, we get (3.7), which is the uniform convergence of Mm(?). Based on Theorem 3.4, the MLE ?? is consistent. 2 Remark 3.6 Whether or not the natural parameterization is used, with probability 80 approaching 1 for large m, the MLE ?? is unique in B?(?0) and is the unique solution of the likelihood score equation in that neighborhood. 2 3.3 Goodness of ?t test for GLMMs To test the goodness of ?t for a proposed GLMM with random intercept, ?rst we partition the covariate space into non-overlapping cells E1, ..., EL. For l = 1,...,L, de?ne fl = m? i=1 ni? j=1 I{xij?El}yij, el(?) = m? i=1 ni? j=1 I{xij?El}E(yij). (3.8) Let f = (f1,...,fL), e(?) = (e1(?),...,eL(?)). For simplicity, we denote E?(yij)|?=^? = ?E(yij). Theteststatisticisbasedontheobservedminustheexpected counts, f?e(??) = ? ?? ?? ?? ? ?m i=1 ?ni j=1 I{xij?E1}(yij ? ?E(yij)) ... ?m i=1 ?ni j=1 I{xij?EL}(yij ? ?E(yij)) ? ?? ?? ?? ? , a vector of length L. Theorem 3.7 For GLMM with random intercept (3.1), let E1,...,EL constitute a partition of the covariate space generated by X into disjoint sets such that E(?nij=1 I{xij?El}) > 0 for all l = 1,...,L. Under Assumption 3.1 in Section 3.2, as m ??, (f?e(??))? ? ?1svd(f?e(??))/m D? ?2k, 81 where ? svd is the reconstructed L ? L square matrix by applying Singular Value Decomposition on a consistent estimator ? , given in (3.11), of = Var(?i) in (3.10). k = rank ( ) = rank (? svd) for large m. ? ?1svd is the Moore ? Penrose pseudoinverse. Proof: The notation A ? B is used to indicate A?B P? 0. Since the MLE ?? is consistent (Section 3.2), by a ?rst order Taylor series expansion of the score function S(??) around?0, we obtain the approximation up to terms asymptotically negligible in probability, ????0 ? (? 1 m ?S(?0) ?? ) ?1 1 mS(?0)) = ?J?10 1 m m? i=1 Si(?0), where Si(?0) is the score function for the ith cluster and the conditional sample ?sher information ?J0 = ?m?1?/??0S(?0)|(xi,ni) P? J(?0;?0). The existence of J(?0;?0) is ensured by B.1 in Assumption 3.1, and its invertibility by B.2. For the rest of this proof, we use J0 to denote J(?0;?0). With E(yij) = E?i[g(xTij? + ?i)], di?erentiation under the integral sign by virtue of B.5 implies that ?E(yij) ?? = E?i[g ?(xT ij?+?i)]?xij = ? ?i g?(xTij?+?i)f?i d?i ?xij, and ?E(yij) ??2 = ? ?i g(xTij?+?i)?f?i??2 d?i. 82 By applying the LLN to the summands over i de?ning el(?0), as N ??, ? = 1 m ?e(?0) = ? ?? ?? ?? ? m?1?mi=1?nij=1 I{xij?E1} ??? 0 E(yij) ... m?1?mi=1?nij=1 I{xij?EL} ??? 0 E(yij) ? ?? ?? ?? ? P? ? ?? ?? ?? ? ?T1 ... ?TL ? ?? ?? ?? ? = . For l = 1,...,L, ?Tl = E(xi,ni) ( n i? j=1 I{xij?El}?E?i(g(xij T?+?i)) ??0 ) . Let zil =?nij=1 I{xij?El}(yij ?E(yij)), i = 1,...,m; l = 1,...,L. Then f?e(?) = ( m? i=1 zi1,..., m? i=1 ziL )T and our test statistic is based on a quadratic form in the vector (f?e(??))/?m = (f?e(?0))/?m+ (e(?0)?e(??))/?m ? (f?e(?0))/?m??e(?0)(????0)/?m ? 1?m ( m? i=1 zi1,..., m? i=1 ziL )T ? 1?m?e(?0)?J?10 1m m? i=1 Si(?0) = 1?m m? i=1 [ (zi1,...,ziL)T ? ? ?J?10 Si(?0) ] = 1?m m? i=1 ?i. (3.9) Under the assumption that (xi,ni) are i.i.d. for i = 1,...,m, also the variables ?i = ?i(yi,xi,ni;?) are i.i.d. with mean 0. Under B.1 { B.3 in Assumption 3.1, 83 Var(?i) = exists. By multivariate Central Limit Theorem, (f?e(??))/?m D? N(0, ). (3.10) Therefore 1 m(f?e( ??))? ?1(f?e(??)) D? ?2k, where k = rank( ). We estimate using ? = 1 m m? i=1 ?Var(?i|xi,ni), (3.11) where ?Var(?i|xi,ni) means that the MLE ?? is substituted for ? in Var(?i|xi,ni). ? is a consistent estimator of and its simpli?ed form is given in Section 3.6.4. Then by Slutsky?s theorem, the test statistic T = 1m(f?e(??))? ? ?1(f?e(??)) D? ?2k, (3.12) We then compute Singular Value Decomposition for ? . For each eigenvalue of ? , we compare it with a preset upper bound ?. For any eigenvalue less than ?, we instead set this eigenvalue to be 0 and reconstruct the ? matrix using the non-zero eigenvalues and their corresponding eigenvectors. We denote this reconstructed matrix as ? svd. Based on Corollary 5.3 given in the Appendix, P(rank (? svd) = 84 rank ( )) ? 1. The test statistic to be used is 1 m(f?e( ??))? ? ?1svd(f?e(??)) D? ?2k. (3.13) 3.3.1 Derivation of the power of T We derive the power of the test for the 2-level GLMM under contiguous alter- natives, based on Le Cam?s third lemma (Lemma 2.14). For a ?xed constant vector a, let H0 :?m =?0, H1 :?m =?0 + a?m, where one or more components of?0 in?0 are 0s. Denote the non-zero components of ?0 as ??0, which is a sub-vector of ?0. Note that ?m ??0, as m ??. By Taylor expansion, using Theorem 5.21 in Van der Vaart (2000), log dQmdP m = log Likelihood(?m;Y,X)Likelihood(? 0;Y,X) ?= log L(?m) L(?0) ? (?log(L(?0)))T a?m + 12 a T ?m(??2 log(L(?0))) a?m ? (Sm(?0))T a?m ? 12aTJ0a, Based on the LLN on ?m?1 ?mi=1??2 logf(yi,xi,ni;?) (3.6), with J0 = 85 J(?0;?0), limm??Var(Sm(?0)/?m) = J0. Thus log dQmdP m Pm? N(?1 2a TJ0a,aTJ0a). Since several components of ?0 are 0s, under H0, we ?t a reduced model to the data, using X?N?p? instead of XN?p with p? < p. We only estimate the coe?cient ?? corresponding to X?. The e(?) function in (3.8) el(?) = m? i=1 ni? j=1 I{xij?El}E(yij) = m? i=1 ni? j=1 I{xij?El} ? g(xTij?+?i)?(?i)d?i has Rp ?R+ as its domain, where p is the dimension of ? and R+ is the domain for ?2, the variance component for the random e?ect ?i. Let function e?(?) be the same meaning of function e(?), but has Rp? ?R+ as its domain e?l(?) = m? i=1 ni? j=1 I{xij?El}E?(yij) = m? i=1 ni? j=1 I{xij?El} ? g((x?ij)T?? +?i)?(?i)d?i. Let Wm = (f?e?(???))/?m. Under the null hypothesis H0, Wm is asymptot- ically normal, Wm?N(0, ?), by (3.10). The joint distribution of log(dQm/dPm) and (f?e?(???))/?m is asymptotically equal in probability to the joint distribution of aTSm(?0)/?m and (f?e?(???))/?m. We next show the jointly normality of these two quantities. 86 By the same sequence of steps as in (3.9), 1? m(f?e ?(???)) ? 1? m m? i=1 [ (z?i1,...,z?iL)T ? ? ?(?J?0)?1S?i(??0) ] , where the ? in z?il and ? ? means that the X and ? in zil and ? are replaced with X? and ??. ?J?0 and S?i(??0) denote the information matrix and the score when ?rst and second derivatives for the log-likelihood function are taken with respect to ??. Thus for all constant (L + 1)-vectors C, with C{1:L} denoting the ?rst L components of C, CT ? ?? ? ( f?e?(???) ) /?m aTSm(?0)/?m ? ?? ? = 1?m m? i=1 [ L? l=1 Clz?il ?CT{1:L}? ?(?J?0)?1S?i(??0) +CL+1aTSi(?0) ] . This quantity has a limiting Gaussian distribution under Assumption 3.1. Let ? in (2.38) be the variance-covariance matrix between aTSm(?0)/?m and (f ? e?(???))/?m. Then by Le Cam?s third lemma (Lemma 2.14), under the contigu- ous alternative H1, ( f?e?(???) ) /?m ? N(?, ?). Thus under H1, with ? =?T( ?)?1?, 1 m ( f?e?(???) )T ( ?)?1 ( f?e?(???) )H 1? ?2 k(?), 87 and 1 m ( f?e?(???) )T(? ?svd )?1( f?e?(???) )H 1? ?2 k(??), where ? ?svd is the reconstructed matrix by applying Singular Value Decomposition on a consistent estimators of ?, ?? is a consistent estimators of?, k = rank(? ?svd) = rank( ?) (Corollary 2.7) and the non-centrality parameter is ?? = ??T(? ?svd)?1??. For GLMMs, it is di?cult to get the explicit form of?, thus ??, a consistent estimator of ?, can be taken as the empirical variance -covariance matrix between aTSm(?0)/?m and (f?e?(???))/?m, which is 1 m m? i=1 (z?i1,...,z?iL)T (S?i(??0))T a? ? ?(?J?0)?1 1m m? i=1 Si(?0)(S?i(??0))T a, with MLE ?? substituted for parameters in the above expression. For a given type I error level ?, the approximate limiting power is thus P(T? > ?2k,?), where ?2k,? is the 1?? quantile of the central ?2k distribution and P denotes the non central ?2k(??) distribution. 3.4 Simulations for logistic mixed models In this Section, we implement the goodness of ?t test for the 2-level logistic mixed models in R and study its performance in simulation. The two stages of the model are: yij|pij ? Binom(1,pij), i = 1,...,m, j = 1,...,ni (3.14) 88 logit(pij) = xTij?+?i, ?i ? N(0,?2). Data are simulated from the following setting. We choose m = 500 clusters and let ni = 5 for all i = 1,...,m. Thus the total number of observation N = ?mi=1 ni = 2500. Then N ?xed covariates xij = (x1ij,x2ij,x3ij), were independently drew from a multivariate normal distribution, ? ?? ?? ?? ? x1 x2 x3 ? ?? ?? ?? ? ? N ? ?? ?? ?? ? ? ?? ?? ?? ? 0 0 0 ? ?? ?? ?? ? , ? ?? ?? ?? ? 1 0 ?13 0 1 ?23 ?13 ?23 1 ? ?? ?? ?? ? ? ?? ?? ?? ? . (3.15) Given the variance component parameter ?2, for m = 500, we generate ?i,i = 1,...,m independently from N(0,?2). Given ? and the generated xij, pij,i = 1,...,m, j = 1,...,ni are calculated from the equation logit(pij) = xTij? + ?i. The last step of generating data is to independently generate yij from (3.14). 3.4.1 Computational issues and code checking Implementing the goodness of ?t test for logistic mixed models is more com- plicated than for linear mixed models because each E(yij) and also the variance and covariance terms involve numerical integration. We used the Gaussian quadrature numerical integration method with 60 quadrature points to calculate those integrals, which are used to calculate the ? in our test statistic. The numerical errors intro- duced by approximating so many integrals through the Gaussian quadrature rule may result in instability when we invert the ? matrix. It may even destroy the 89 invertibility of the matrix, which is used to calculate the test statistics. Thus we computed a Singular Value Decomposition for each estimated ? . For each eigen- value of ? , we compare it with a preset upper bound ? (e.g. ? = .0005). For any eigenvalue less than ?, we instead set this eigenvalue to be 0 and reconstruct the ? matrix using the non-zero eigenvalues and their corresponding eigenvectors. We then did the following checks to make sure that our R code works. ? We compared the empirical variance covariance matrix of (f??e(??))/?m with ? in equation (3.11); ? Letting L(?) denote the likelihood function, and f and e be de?ned in equation (3.8), we compared the empirical variance covariance matrix of f?e(?0) and??logL(?)|?=?0 , computed based on 30,000 simulated data sets, with its estimated analytical variance, which involves terms in ? in equation (3.11); ? We checked in simulations that the goodness of ?t test statistic (3.12) indeed has an asymptotic ?2 distribution. We choose ?13 = ?23 = 0 and set the true parameter values ?2 = .5, ? = (?0,?1,?2,?3) = (.1,.5,?.5,.5). We then ?t the logistic mixed model with all covariates that in?uence the response y. We select L = 12 cells in the computation of T in (3.12) based on x1 and x2 by using ?xed cell boundaries. With the number of iterations K being 5000, we then have 5000 test statistics (f ? e(??))? ? ?1(f ? e(??))/m. Figure 3.1 gives the histogram of these 5000 independently calculated test statistics, which is close to ?211, with p value from the Kolmogorov-Smirnov test being .9294 and 90 test statistics prob 0 5 10 15 20 25 30 35 0.00 0.02 0.04 0.06 0.08 density of chisq, df=11 Histogram of test stat for logistic mixed model Figure 3.1: Histogram of 1000 test statistics for logistic mixed model p value from Pearson?s chi-square goodness of ?t test .40 when the number of cells used is 20. The simulation result agrees with the theory. 3.4.2 Checking the size of the test in simulations We checked the size of the test under various choices of cell partitions based on X. We chose ?13 = ?23 = 0 in (3.15) and let ?2 = .5, ? = (?0,?1,?2,?3) = (.1,.5,?.5,.5), and ?t the logistic mixed model with all covariates X in the model. All cell partitions in the computation of T in (3.12) were based on ?xed cut o?s. Table 3.2 shows that the empirical size estimates (Emp. Size) were close to the nominal ? levels of 0.05 and 0.1 for all choices of cell partitions. This closeness was observed with greater consistency when K = 5000. Table 3.1 gives the empirical sizes of the test under di?erent nominal ? levels when the cell partition is based on 91 x1 and the number of cells is L = 8. They were all very close. The third column is the standard deviation of the corresponding empirical size (ES), which is calculated as ?ES(1?ES)/K. Table 3.1: Empirical size of the test under di?erent levels (logistic mixed model). m = 500;ni = 5;? = (:1;:5;?:5;:5); 2 = :5; 13 = 23 = 0;L = 12;K = 5000. signi?cance level Empirical Size(ES) Standard Deviation of ES 0.05 0.049 0.0031 0.1 0.099 0.0042 0.2 0.199 0.0056 0.3 0.297 0.0065 0.4 0.406 0.0069 0.5 0.505 0.0071 0.6 0.605 0.0069 0.7 0.702 0.0065 0.8 0.801 0.0056 Table 3.2: Empirical size of the test under di?erent cell partitions (logistic mixed model). m = 500;ni = 5;? = (:1;:5;?:5;:5); 2 = :5; 13 = 23 = 0. L Empirical Size Empirical Size K=1000 K=5000 K=1000 K=5000 8 (x1) 0.05 0.054 0.0508 0.1 0.1080 0.1034 3?4 (x1;x2) 0.05 0.048 0.0492 0.1 0.0950 0.0994 5?4 (x1;x3) 0.05 0.044 0.0494 0.1 0.0940 0.0986 6?7 (x2;x3) 0.05 0.056 0.0490 0.1 0.1090 0.1012 3.4.3 Simulations to assess empirical power of the test To assess the power of the test, we ?t the logistic mixed model to the data without including x3 among the covariates. We then tried six di?erent cell partitions based on subsets of the design matrix X with L = 12 and 42 cells. We used ?xed cuto?s to do the cell partitions. We set ?2 = .5 and (?0,?1,?2) = (0,.8,?.8) for all 92 Table 3.3: Impact of cell partition on empirical power I (logistic mixed model). m = 500; 13 = 23 = 0;( 0; 1; 2) = (0;:8;?:8);K = 1000. Parti 3 = :2 3 = :3 3 = :4 L = 12 L = 42 L = 12 L = 42 L = 12 L = 42 x1 0.048 0.050 0.047 0.049 0.046 .056 x2 0.055 0.061 0.051 0.067 0.052 .071 x3 0.787 0.497 0.997 0.939 1 .999 x1;x2 0.053 0.054 0.048 0.047 0.047 .051 x1;x3 0.724 0.478 0.989 0.918 1 1 x2;x3 0.729 0.468 0.992 0.917 1 .999 simulations in this power study Section. Weset(?13,?23) = (0,0)andstudytheimpactofthemagnitudeof?3 onpower. For a given design matrix X, we simulated K = 1000 sets of Y and computed the empirical power of the test over K = 1000 iterations for ?3 = .2, .3 or .4 separately. Table 3.3 shows that with all other settings being the same, the power of the test increases as the magnitude of ?3 increases except with those choices of partition (the ?rst second and fourth) where power is e?ectively constant at 0.05. To study the impact of (?13,?23) on power, we then ?x ?3 = .3 and choose three di?erent pairs of (?13,?23). For each pair of (?13,?23), we simulated a set of design matrix X. We then simulated K = 5000 sets of Y based on this X and computed the empirical power of the test over these K iterations. Table 3.4 shows that with all other settings being the same, the power of the test decreases as the correlation between the x3 and x1, x2 increases, where x3 is the omitted covariate. From Tables 3.3 and 3.4 we can also see that the choice of cell partition strongly a?ects the power. When a covariate that should be in the model is omitted, cell partitions based on that omitted covariate result in adequate power of the test. 93 Table 3.4: Impact of cell partition on empirical power II (logistic mixed model). m = 500;?3 = (0;:8;?:8;:3); 2 = :5;K = 5000. Parti 13 = 0; 23 = 0 13 = 0:2; 23 = 0:3 13 = 0:4; 23 = 0:5 L = 12 L = 42 L = 12 L = 42 L = 12 L = 42 x1 0.047 0.049 0.046 0.054 0.054 0.058 x2 0.051 0.067 0.05 0.053 0.047 0.047 x3 0.997 0.939 0.988 0.883 0.902 0.657 x1;x2 0.048 0.047 0.047 0.046 0.046 0.061 x1;x3 0.989 0.918 0.974 0.862 0.831 0.630 x2;x3 0.992 0.917 0.975 0.850 0.818 0.557 However, if the cell partition is based only on covariates already in the model, this test has low power to detect any lack of model ?t. All the above ?ndings and conclusions in the logistic mixed model agree with what we saw earlier in the linear mixed models. 3.5 Discussion In this Chapter, we extended the goodness of ?t tests developed for Linear Mixed Models (LMMs) in Chapter 2 to Generalized Linear Random Intercept Mod- els (GLMMs). We described the asymptotic properties of the tests when parameters were estimated through maximum likelihood. We assessed factors that impact the power and the impact of choice of cell partitions on the test in simulations for lo- gistic mixed models. We obtained conclusions consistent with those found for the LMMs in Chapter 2, that when a speci?c covariate that is associated with outcome is omitted, cell partitions based on the omitted covariate result in adequate power of the test. However, if the cell partition is based only on covariates already in the model, this test has low power to detect any lack of model ?t. 94 This type of goodness of ?t test can be used to test the statistical adequacy of the ?nally selected GLMM in real applications. All that is needed in order to implement the test are the ?nal model parameter estimates and their variance covariance matrix as well as the estimated means for the outcome y under the model. Implementing logistic mixed models is more complicated that implementing LMMs, as all the means and variances of the outcome y involve integrals. The covariance matrix between the estimated ?xed e?ect parameters and the estimated variance components is not available in R, but is available in PROC nlmixed in SAS. As a note of caution, in applying the test in GLMMs, one needs to check the eigenvalues of the estimated variance covariance matrix ? in (3.11) and eliminate the tiny eigenvalues to make sure its inverse does not blow up. This also helps to get the correct rank of the estimated variance covariance matrix ? in (3.11) to ensure the correct degrees of freedom for the test statistic. 3.6 Technical details for Chapter 3 3.6.1 Checking B.2 in Assumption 3.1 for logistic mixed model Lemma 3.8 Let E be the event that ?mi=1?nij=1 x?2ij has full rank, then the condi- tional distribution of (b?,c)??,?2logf(yi|{xij}j,ni) given {(xij,ni)} on the event E is non-degenerate at ?0. Proof: In order to show that J(?0;?0) is positive de?nite for the random intercept logistic mixed model, we just need to show that ? (b?,c) non-trival, where b is any p ? 1 constant vector and c is any constant, (b?,c)??,?2 logfi(?0,?20) is non- 95 degenerate. We show this in the random intercept logistic model (3.2), where pij,0 = 1/(1+e?(x?ij?0+?0ai)) and {xij}j = {xij,j = 1,...,ni}. Then with ai ? N(0,1) (b?,c)??,?2 logf(yi|{xij}j,ni,ai) = b? ???logf(yi|{xij}j,ni,ai)+c ???2logf(yi|{xij}j,ni,ai) = b? ni? j=1 (yij ?pij,0)xij +c 12? 0 ni? j=1 (yij ?pij,0)ai = ni? j=1 (yij ?pij,0)(b?xij +c ai2? 0 ). Let g1(xTij?0,?0) =?ai pij,0 ?(ai)dai, and g2(xTij?0,?0) =?ai pij,0?i?(ai)dai, integrat- ing out ai then gives (bT,c)??,?2 logf(yi|{xij}j,ni) = ? ai ni? j=1 (yij ?pij,0)(bTxij +c ai2? 0 )?(ai)dai = ni? j=1 [ yijbTxij +yij c2? 0 E?i ? ? ai pij,0 ?(ai)dai bTxij ? c2? 0 ? ai pij,0?i?(ai)dai ] = ni? j=1 (y ij ?g1(xTij?0,?0) )xT ijb? c 2?0 ni? j=1 g2(xTij?0,?0), (3.16) where ?nij=1 g2(xTij?0,?0) is free of yij. With the fact that conditional on {(xij,ni),s.t. P( m? i=1 ni? j=1 x?2ij has full rank) > 0}, every yi ? {0,1}ni has positive probability, (3.16) equals 0 only when both b = 0 and c = 0. 96 3.6.2 Checking B.3 in Assumption 3.1 for Logistic mixed model Lemma 3.9 For the random intercept logistic mixed model (3.2), the dominatedness condition (3.3) holds under the assumption that ?nij=1?x?2ij ? is bounded. Proof: With ai ? N(0,1), the conditional likelihood for the ith cluster, condition- ing on ({xij}j,ni,ai), is logf (yi|{xij}j,ni,ai) = ? j (xT ij? +?ai )y ij ? ? j log ( 1 +exTij?+?ai ) . The ?rst derivatives of the conditional likelihood are ?logf(yi|{xij}j,ni,ai) ?? = ? j ( yij ? 11+e?(xT ij?+?ai) ) xij; ?logf(yi|{xij}j,ni,ai) ??2 = 1 2? ?logf(yi|{xij}j,ni,ai) ?? = 12? ? j ( yij ? 11+e?(xT ij?+?ai) ) ai. The second derivatives of the conditional likelihood are ?? 2logf(yi|{xij}j,ni,ai) ?(?)2 = ? j exTij?+?ai (1 +exTij?+?ai)2xijx T ij; ?? 2logf(yi|{xij}j,ni,ai) ????2 = 1 2? ? j exTij?+?ai (1 +exTij?+?ai)2aixij; 97 ?? 2logf(yi|{xij}j,ni,ai) ?(?2)2 = ? 12? ??? [ 1 2? ? j ( yij ? 11+e?(xT ij?+?ai) ) ai ] = 14?3 ? j [ ai ( yij ? 11 +e?(xT ij?+?ai) +?ai e x?ij?+?ai (1 +exTij?+?ai)2 )] . We ?rst take expectations with respect to yij, conditionally given all other, that is, {xij}j,ni,ai, to integrate out yij. E?0 [ ? ? 2 ??2logf(yi|{xij}j,ni,ai) ] = ? j exTij?+?ai (1 +exTij?+?ai)2xijx T ij, (3.17) E?0 [ ? ? 2 ????2logf(yi|{xij}j,ni,ai) ] = 12? ? j exTij?+?ai (1 +exTij?+?ai)2aixij. (3.18) E?0 [ ? ? 2 ?(?2)2logf (yi|{xij}j,ni,ai) ] = 1 4?3 ? j [ ai ( 1 1 +e?(xTij?0+?0ai) ? 1 1 +e?(xTij?+?ai) +?ai exTij?+?ai (1 +ex?ij?+?ai)2 )] . The two expectations in (3.17) and (3.18) have exactly the same forms as without taking the expectations since there is no term involving yij. We next integrate out ai and with pij = 1/(1+e?(xTij?+?ai)), get E?0 [ ? ? 2 ??2logf(yi|{xij}j,ni) ] = ? j ? ai pij(1?pij)dF(ai)xijx?ij, 98 E?0 [ ? ? 2 ????2logf(yi|{xij}j,ni) ] = 12? ? j (? ai pij(1?pij)aidF(ai) ) xij. E?0 [ ? ? 2 ?(?2)2logf(yi|{xij}j,ni) ] = 1 4?3 ? j (? ai 1 1 +e?(xTij?0+?0ai)aidF(ai)? ? ai pijaidF(ai) +? ? ai pij(1?pij)a2idF(ai) ) . Since pij is bounded, the dominatedness condition (3.3) will hold by inspection if ?nij=1?x?2ij ? is bounded. 3.6.3 Checking B.3 in Assumption 3.1 for Poisson mixed model Lemma 3.10 For the random intercept Poisson mixed model, the dominatedness condition (3.3) holds under the assumption that E(?j ec?k |xijk|) < ?, ? c ? maxk|?k|+?. Proof: With ai ? N(0,1), the conditional likelihood for the ith cluster, condi- tioned on ({xij}j,ni,ai), is logf(yi|{xij}j,ni,ai) = ? j [( xTij? +?ai)yij ?ex?ij?+?ai ?log(yij!) ] . The ?rst derivatives of the conditional likelihood are ?logf(yi|{xij}j,ni,ai) ?? = ? j ( yij ?ex?ij?+?ai ) xij; 99 ?logf(yi|{xij}j,ni,ai) ??2 = 1 2? ?logf(yi|{xij}j,ni,ai) ?? = 12? ? j ( yij ?ex?ij?+?ai ) ai. The second derivatives of the conditional likelihood are ?? 2logf(yi|{xij}j,ni,ai) ?(?)2 = ? j ex?ij?+?aixijx?ij; ?? 2logf(yi|{xij}j,ni,ai) ????2 = 1 2? ? j ex?ij?+?aiaixij; ?? 2logf(yi|{xij}j,ni,ai) ?(?2)2 = ? 1 2? ? ?? [ 1 2? ? j ( yij ?ex?ij?+?ai ) ai ] = 14?3 ? j [ ai ( yij ?ex?ij?+?ai ) +?a2iex?ij?+?ai ] . We ?rst take expectations with respect to yij, conditionally given {xij}j,ni,ai, to integrate out yij. E?0 [ ? ? 2 ??2logf(yi|{xij}j,ni,ai) ] = ? j ex?ij?+?aixijx?ij, (3.19) E?0 [ ? ? 2 ????2logf(yi|{xij}j,ni,ai) ] = 12? ? j ex?ij?+?aiaixij. (3.20) 100 E?0 [ ? ? 2 ?(?2)2logf(yi|{xij}j,ni,ai) ] = 14?2 ? j a2iex?ij?+?ai + 1 4?3 ? j [ ai ( ex?ij?0+?0ai ?ex?ij?+?ai )] . The two expectations in (3.19) and (3.20) have exactly the same forms as without taking the expectations since there is no term involving yij. We next integrate out ai. By making use of the three equations E(e?ai) = e?2/2, Eai(a2ie?ai) = (1+?2) e?2/2 and Eai(aie?ai) = ?e?2/2, we get E?0 [ ? ? 2 ??2logf(yi|{xij}j,ni) ] = (? j ex?ij?xijx?ij ) e?2/2. E?0 [ ? ? 2 ????2logf(yi|{xij}j,ni) ] = 12 e?2/2 (? j ex?ij?xij ) . E?0 [ ? ? 2 ?(?2)2logf(yi|{xij}j,ni) ] = 14?2 (? j ex?ij? ) E(a2ie?ai)+ 1 4?3 [(? j ex?ij?0 ) E (aie?0ai)? (? j ex?ij? ) E (aie?ai) ] = 14?2 (? j ex?ij? ) e?2/2 (1 +?2)+ 1 4?3 [(? j ex?ij?0 ) ?0e?20/2 ? (? j ex?ij? ) ?e?2/2 ] = 14e?2/2 (? j ex?ij? ) + ?04?3e?20/2 (? j ex?ij?0 ) . (3.3) will hold by inspection if assumptions are imposed ensuring that sup??B?(?j ex?ij?(1 +x?ijxij)) is integrable. The following condition, 101 E(?j ec?k |xijk|) < ?, ? c ? maxk|?k|+?, is su?cient to ensure that. 3.6.4 Simpli?cation of ?? in equation (3.11) In this section, we manipulate the expression for the estimated asymptotic variance ? in equation (3.11). The notations VarC,CovC,EC mean the corre- sponding quantities involve integrals over yi alone, conditionally given {xi,ni}i = {(xi,ni), i = 1,...,m}. Because ?J0 and ? are functions of only {xi,ni}i, condi- tionally given {xi,ni}i, VarC(?i) = Var(?i|{xi,ni}i) = VarC (zi1,...,ziL)T +VarC(? ?J?10 Si(?0))? 2CovC [ (zi1,...,ziL)T , ? ?J?10 Si(?0) ] = VarC (zi1,...,ziL)T + ? ?J?10 VarC(Si(?0))?J?10 ? T ? 2 ( CovC ( zi1, ? ?J?10 Si(?0) ) ,...,CovC ( ziL, ? ?J?10 Si(?0) ))T , where CovC ( zil, ? ?J?10 Si(?0) ) = ECyi(zilSTi (?0))?J?10 ? T. We next simplify ECyi(zilSTi (?0)), with notation fCyi meaning the conditional density of fyi given {xi,ni}i, ECyi(zilSTi (?0)) = ? yi ( zil ?f C yi fCyi ) fCyi dyi = ? yi zil ( ?fCyi ) dyi = ? ? yi zil fCyi dyi ? ? yi (?zil) fCyi dyi, 102 and the ?rst of these last two integrals is 0 because of the identity: ? yi zil fCyidyi = ECyi(zil) = 0. Thus ECyi(zilSTi (?0)) = ? ? yi (?zil)fCyidyi = ? yi ( n i? j=1 I{xij?El}?ECyij ) fCyi dyi = ni? j=1 I{xij?El}?ECyij. The last equality holds because ?ECyij is not a function of yi and ?yi fCyi dyi = 1. By the preceding calculations, an by de?nition of e(?0), ? ?? ?? ?? ? ?m i=1 E C[zi1ST i (?0) ] ... ?m i=1 E C[ziLST i (?0) ] ? ?? ?? ?? ? = ? e(?0). (3.21) Let ?? =?mi=1 Var(?i | {xi,ni}i)/m and ? = limm?? ??. Then 103 ?? = 1 m m? i=1 Var(?i | {xi,ni}i) = 1m m? i=1 VarC (zi1,...,ziL)T + 1m m? i=1 ? ?J?10 VarC(Si(?0))?J?10 ? T ? 1 m m? i=1 2 ( CovC ( zi1, ? ?J?10 Si(?0) ) ,...,CovC ( ziL, ? ?J?10 Si(?0) ))T = 1m m? i=1 VarC (zi1,...,ziL)T + ? ?J?10 ? T ?2? ?J?10 ? T = 1m m? i=1 VarC (zi1,...,ziL)T ? ? ?J?10 ? T, The existence of m?1 ?mi=1 Var(zi1,...,ziL)T is ensured by Assumption 3.1. Under the assumption that (xi,ni) are i.i.d, ? converge in probability to , the limiting variance covariance matrix of (f?e(??))/?m. With ? in ? replaced by its MLE ??, ? = 1m?mi=1 ?Var(?i|xi,ni) is a consistent estimator of . 104 Chapter 4 Discussion and further research 4.1 Discussion Schoenfeld (1980) presented a class of omnibus chi-squared goodness of ?t tests for the proportional hazards regression model. We adapted this idea and proposed a class of goodness of ?t tests for testing the statistical adequacy of a 2-level generalized linear mixed model (GLMM). We described the asymptotic properties of the test when parameters were estimated through maximum likelihood. For a special case of linear mixed models (LMMs), we extended this test to 2-level LMMs with no distributional assumptions for either the random e?ect ?i or the error term ?ij. We also extended the test to multi-level LMMs. We assessed factors that impact the power and the impact of choice of cell partitions on the test in simulations for both linear mixed models (LMMs) and lo- gistic mixed models. We found that when a speci?c covariate that is associated with outcome is omitted, cell partitions based on the omitted covariate result in adequate power of the test. However, if the cell partition is based only on covariates already in the model, this test has low power to detect any lack of model ?t. For LMMs, we also conducted simulations to show that our test was very robust to violations of the normality assumption of the error distribution when we use symmetric distributions. For LMMs, we developed the theoretical power of the test under the local 105 alternative. We then checked this theoretical power, obtained from Le Cam?s third lemma, with simulations. We found that the estimated theoretical power calculated using Le Cam?s third lemma was reliable at least when the number of clusters m is above 50. However, when m is very small, it may be advisable to rely on the empirical power computed through simulations. For LMMs, we also proposed a criteria parameter ? that is closely related to the power. This goodness of ?t test can be used to test the statistical adequacy of the ?nally selected GLMM in real applications. All that is needed in order to implement the test are the ?nal model parameter estimates and their variance covariance matrix as well as the estimated means for the outcome y under the model. It is easy to implement the test in LMMs as these are standard outputs from any statistical software. Implementing logistic mixed models is relatively more complicated, as all the means and variances of the outcome y involve integrals. Also, the covariance matrix between the estimated ?xed e?ect parameters and the estimated variance components is not available in R, but is available in PROC nlmixed in SAS. As a note of caution, in applying the test in GLMMs, one needs to check the eigenvalues of the estimated variance covariance matrix ? in (3.12) and eliminate the tiny eigenvalues to make sure its inverse won?t blow up. This also helps to get the correct rank of the estimated variance covariance matrix ? in (2.13) to ensure the correct degrees of freedom for the test statistic. 106 4.2 Further research In this thesis, I applied the Schoenfeld?s residual approach [30] to test the goodness of ?t of classical generalized linear mixed models, via quadratic goodness of ?t statistics. I then applied this test to three biomedical data, testing the goodness of ?t of two-level linear mixed models. One of the further research will be applying this test to multilevel and longitudinal data once we have these kinds of data. Another aspect of further research will be comparing our proposed goodness of ?t test with the existing bootstrap and Bayesian approaches. 107 Chapter 5 Appendix: Three general lemmas The following three Lemmas are used in several places in the text. Lemma 5.1 Suppose that {uin : n ? 1, 1 ? i ? n} is a triangular array of independent identically distributed random variables within each row (i.e., across i) with mean 0 and ?nite variance ?2u, and that these variables are independent of the random array {cin : n ? 1, 1 ? i ? n} which satis?es the additional properties that as n ?? (a) max 1?i?n |cin| ? 0 and (b) n? i=1 c2in ? ? in probability, where ? ? (0,?). Then ?ni=1 cinuin D? N(0,?) as n ??. Proof. {?ki=1 cinuin}nk=1 is a martingale with respect to the ?ltration Fkn = ?({cin, .uin : 1 ? i ? k}). The Lemma is an immediate consequence of the Martingale Central Limit Theorem (D. McLeish 1974; P. Hall and C. Heyde 1981). A direct proof from the standard Lindeberg Central Limit Theorem (applied condi- tionally given {cin}ni=1) is also not di?cult. 2 Lemma 5.2 Let be a q ?q covariance matrix of rank k with smallest non-zero 108 eigenvalue ?, and let ? ? (0,?) be arbitrary. Let ? N be a sequence of random q?q covariance matrices such that with probability approaching 1 as N ??, the smallest positive eigenvalue of ??N is greater than ?. If ? N P? , then P ( rank (? N) = k ) ? 1, as N ??. Proof. Let V and V? be the range and null space of respectively, and dim(V) = rank( ) = k. Because ? N P? , we have that ? w ? V? with w ?= 0, ?? Nw? P? 0, thus P(?? Nw?? ??w?) ? 1. By the hypothesis on ? N stated in the Lemma, P ( [ ?? Nw?? ??w? ] ? [ w ?= 0 ] ) ? 0. Thus P(w ?{null space of ? N}) ? 1, ? w ?V?, where V? is the null space of . Similarly, ? v ?V with v ?= 0, ? Nv P? v. Thus ?? Nv? P?? v?? ??v? > ??v?, which leads to P(? Nv ?= 0) ? 1. Overall, ? w ?V? with w ?= 0 and ? v ?V with v ?= 0, P(w ? null space of ? N, v ? range of ? N) ? 1. Therefore, P ( rank (? N) = k ) ? 1, as N ??. 2 109 Corollary 5.3 Let the symmetric matrix ? 0N be a consistent estimator sequence for a q ? q covariance matrix whose smallest non-zero eigenvalue is ?. Let ? be an arbitrary number, where 0 < ? < ?. If we represent ? 0N in terms of an orthonormal eigenbasis by ? 0N = ?qk=1 ckvkNvTkN and de?ne the random matrix ? N =?qk=1 ckI[c k>?] vkNv T kN, then P(rank (? N) = rank ( )) ? 1. Proof. Let V and V? be the range and null space of respectively. Because ? 0N P? , we have that ? w ?V? with w ?= 0 and ? v ?V with v ?= 0, ? 0Nw ? 0, (? 0N ? )v P? 0. By de?nition of the matrix ? N, { w : ?? 0Nw?? ??w? } ? { w : ? Nw = 0 } , and ?? Nv? > ??v?. Thus P(? Nw = 0) ? 1 and P(? Nv ?= 0) ? 1. Overall, ? w ?V? with w ?= 0 and ? v ?V with v ?= 0, P(w ? range of ? N, v ? null space of ? N) ? 1. Therefore, P ( rank (? N) = rank ( ) ) ? 1, as N ??. 2 110 Bibliography [1] Agresti, A. Categorical Data Analysis, 2nd ed. New York: Wiley, 2002. [2] Andersen, P. K. and Gill, R. D. (1982). Cox?s regression model for counting processes: a large sample study. Ann. Statist. 10, 1100-1120. [3] Bickel, P. J. and Doksum, K. A. Mathematical Statistics, 2nd ed. Prentice Hall; 2006. [4] Christiansen, C. L. and Morris, C. N. (1997). Hierarchical Poisson regression modeling. Journal of the American Statistical Assocation. 92, pp. 618-632. [5] Claeskens, G. and Hart, J. D. (2009). Goodness-of-?t tests in mixed models. TEST. 18, pp. 213-239. [6] Cox, D. R. (1961). Tests of separate families of hypotheses. Proc. Fourth Berke- ley Symp. on Math. Statist. and Prob., Vol. 1, 105-123. [7] Crainiceanu, C. M. and Ruppert D. (2004). Likelihood ratio tests in linear mixed models with one variance component. J. R. Statist. Soc. B 66, 165-185. [8] Dorgan, J. F. , Baer, D. J. , Albert, P. S. , Judd, J. T. , Brown, E. D. , Corle, D. K. , et al. (2001) Serum hormones and the alcohol-breast cancer association in postmenopausal women. J Natl Cancer Inst 93, pp. 710-5. [9] Godfrey, L. G. Misspeci?cation Tests in Econometrics. Cambridge, 1988. [10] Jacqmin-Gadda, H., Sibillot, S., Proust, C., Molina, J.-M., and Thi?ebaut, R. (2007). Robustness of the linear mixed model to misspeci?ed error distribution. Computational Statistics and Data Analysis, Vol. 51, pp, 5142-5154. [11] Jiang, J. (1998b). Consistent estimators in generalized linear mixed models. Journal of the American Statistical Association, 93, pp. 720-729. [12] Jiang, J. (1999). Conditional inference about generalized linear mixed models. The Annals of Statistics. 27, pp. 1974-2007. [13] Jiang, J. (2001). Goodness-of-?t tests for mixed model diagnostics. The Annals of Statistics 4, 1137-1164. 111 [14] Jiang, J. and Lahiri, P. (2006). Mixed Model Prediction and Small Area Esti- mation. Test. Vol. 15, No. 1, pp. 1-96. [15] Kedem, B. and Fokianos, K. (2002). Regression Models for Time Series Anal- ysis. Wiley, New York. [16] Khuri, A. I., Mathew, T. and Sinha, B. K. (1998). Statistical Tests for Mixed Linear Models. New York: Wiley. [17] Laird, N. M. and Ware, J. H. (1982). Random-E?ects Models for Longitudinal Data. Biometrics. Vol. 38, No. 4, pp. 963-974. [18] Lange, N. and Ryan, L. (1989). Assessing normality in random e?ects models. Annals of Statistics. Vol. 17, No, 2, pp. 624-642. [19] Lee, Y., Nelder, J. A., and Pawitan, Y.. Generalized Linear Models with Ran- dom E?ects Uni?ed Analysis via H-likelihood. Chapman Hall/CRC, 2006. [20] Lin, D. Y., Wei, L. J. and Ying, Z. (2002) Model-checking techniques based on cumulative residuals. Biometrics, Vol 58, No. 1, pp. 1-12. [21] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman and Hall, London, 2nd ed. [22] McCulloch, C. E. and Searle, S. R. Generalized, Linear, and Mixed Models. Wiley, 2001. [23] Miller, J. J. (1977). Asymptotic properties of maximum likelihood estimates in the mixed model of the analysis of variance. The Annals of Statistics. 5, 746-762. [24] Pan, Zhiying and Lin, D. Y. (2005). Goodness-of-Fit Methods for Generalized Linear Mixed Models. Biometrics. 61, 1000-1009. [25] Park, T. and Lee, S.-Y. (2004). Model diagnostic plots for repeated measures data. Biometrical Journal, 46, pp. 441-452. [26] Prasad, N. G. N. and Rao, J. N. K. (1990). The Estimation of the Mean Squared Error of Small-Area Estimators. Journal of the American Statitical Association. Vol. 85, No. 409, pp. 163-171. [27] Rao, C. R. and Wu, Y. (1989) A strongly consistent procedure for model selec- tion in a regression problem. Biometrika, Vol. 76, No. 2, pp. 369-374. 112 [28] Rao, C. R., Wu, Y., Konishi, S. and Mukerjee, R. (2001) On model selection. IMS Lecture Notes - Monograph Series, Vol. 38, pp. 1-64. [29] Ritz, C. (2004). Goodness-of-?t tests for mixed models. Board of the Foundation of the Scandinavian Journal of Statistics 31, 443-458. [30] Schoenfeld, D. (1980). Chi-squared goodness-of-?t tests for the proportional hazards regression model. Biometrika 67, 145-153. [31] Self, S. G. and Liang, K. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association 82, 605-610. [32] Shao, J. (1997) An asymptotic theory for linear model selection. Statistica Sinica 7, 221-264. [33] Slud, E. and Kedem, B. (1994). Partial likelihood analysis of logistic regression and autoregression. Statistica Sinica 4, 89-106. [34] Stezhko, VA, Buglova, EE, Danilova, LI, et al (2004). A cohort study of thyroid cancer and other thyroid diseases after the Chornobyl accident: Objectives, design and methods. RADIATION RESEARCH Volume 161, Issue 4, 481- 492. [35] Tsiatis, A. A. (1980). A note on a goodness-of-?t test for the logistic regression model. Biometrika 67, Issue 1, pp. 250-251. [36] Van der Vaart, A. W. , Asymptotic Statistics. Cambridge, 2000. [37] Wand, M. P. (2007). Fisher information for generalised linear mixed models. Journal of Multivariate Analysis. 98, 1412-1416. [38] Weiss, R. E. Modelling Longitudinal data. Springer, 2005. 113