ABSTRACT
Title of dissertation: GOODNESS OF FIT TESTS FOR
GENERALIZED LINEAR MIXED MODELS
Min Tang, Doctor of Philosophy, 2010
Dissertation directed by: Professor Eric V. Slud
Department of Mathematics
Dr. Ruth M. Pfei?er
National Cancer Institute
Generalized Linear mixed models (GLMMs) are widely used for regression
analysis of data, continuous or discrete, that are assumed to be clustered or corre-
lated. Assessing model ?t is important for valid inference. We therefore propose a
class of chi-squared goodness-of-?t tests for GLMMs. Our test statistic is a quadratic
form in the di?erences between observed values and the values expected under the
estimated model in cells de?ned by a partition of the covariate space. We show that
this test statistic has an asymptotic chi-squared distribution. We study the power of
the test through simulations for two special cases of GLMMs, linear mixed models
(LMMs) and logistic mixed models. For LMMs, we further derive the analytical
power of the test under contiguous local alternatives and compare it with simulated
empirical power. Three examples are used to illustrate the proposed test.
GOODNESS OF FIT TESTS FOR
GENERALIZED LINEAR MIXED MODELS
by
Min Tang
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial ful?llment
of the requirements for the degree of
Doctor of Philosophy
2010
Advisory Committee:
Dr. Eric V. Slud, Co-Advisor
Dr. Ruth M. Pfei?er, Co-Advisor
Dr. Paul J. Smith
Dr. Abram M. Kagan
Dr. Wolfgang Losert
c? Copyright by
Min Tang
2010
Dedication
To my parents and Jun.
ii
Acknowledgments
It is a great honor for me to thank all the people who made this thesis possible.
First and foremost, I owe my deepest gratitude to my two coadvisors, Dr. Eric
V. Slud and Dr. Ruth M. Pfei?er.
I am grateful to Dr. Eric V. Slud for his expert guidance and continuous
support during my research and study at University of Maryland, College Park. Dr.
Slud has been a signi?cant presence in my life. His perpetual energy and his rigor
and enthusiasm in research have been motivating me through my whole doctoral
study. His insights have strengthened this study signi?cantly. I will always be
thankful for his wisdom, knowledge, deep concern and constant encouragement.
I would like to show my greatest gratitude and sincere thanks to Dr. Ruth
M. Pfei?er who o?ered me the predoctoral training at National Cancer Institute
(NCI) which led me to the world of Biostatistics and cancer research. Her rigor and
passion on research in?uenced me a lot. I am deeply indebted to her for her being
always ready to help and her precious time on guiding me through my research. I
could not have ?nished this thesis within four years without her invaluable help.
She has also made available her support in a number of other ways, such as revising
my CV, giving suggestions to my job talk rehearsal. She is the best mentor I have
ever met. It has been an honor for me to work with her.
I am grateful to all my committee members Dr. Paul J. Smith, Dr. Abram
Kagan and Dr. Wolfgang Losert for their valuable comments and suggestions to
this thesis. I would like to thank Dr. Paul J. Smith for all his generous help and
iii
guidance on taking courses in my entire graduate period. I also want to thank Dr.
Abram M. Kagan for his interest and discussion with me on the MLE consistency
proof in this thesis.
My special thanks go to Dr. Partha Lahiri for initially recommending me to
the Biostatistics Branch in NCI.
Many thanks to all my colleagues and collaborators involved in the projects I
have been working on at NCI. I would also show my sincere thanks to all my friends
at the Mathematics Department, especially Ziliang Li for all the discussions on my
thesis, Tinghui Yu for his generous help on my statistical questions, Ning Jiang for
his encouragement during my qualify exam stage, Huilin Li for her help and sharing
information during my stay at NCI and Lingyan Cao, Yong Zheng, Guanhua Lu,
Denise, Shihua Wen, Shu Zhang, Wei Guo, Neung Ha and Rongrong Wang, for their
sincere help and all the good time with them during my graduate study.
I am grateful to my cousins Lifang Liu and Juan Tang who provided tremen-
dous help when I was applying for graduate school, who took care of my parents
in China when I was pursuing my Ph.D. in the US, who have been standing by my
side at all stages of my life. I also want to thank my good friend Caixia Kou and
all my other friends in China for their constant supports.
My deepest gratitude goes to my family for their un?agging love and support
throughout my life.
Finally, I am forever indebted to Jun. His encouragement and companionship
have turned my journey through graduate school into a pleasure. This dissertation
is simply impossible without them.
iv
Table of Contents
List of Tables vii
List of Figures viii
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Overview of thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Goodness of ?t tests for linear mixed models 7
2.1 Linear mixed models (LMMs) . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Goodness of ?t test statistic . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 Test statistic and its asymptotic properties for LMMs when
parameters are estimated by maximum likelihood . . . . . . . 10
2.2.1.1 LMM with a single random e?ect . . . . . . . . . . . 10
2.2.1.2 LMM with additive random e?ects . . . . . . . . . . 15
2.2.2 Test statistic and its asymptotic properties for two-level LMM
with parameters estimated by least squares and method of
moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.3 Power of the test . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.1 Normally distributed covariates (Scenario I) . . . . . . . . . . 30
2.3.1.1 Impact of choice of the cell partition on power . . . . 34
2.3.1.2 Robustness of T with respect to error distribution . . 35
2.3.1.3 A summary parameter related to the power . . . . . 36
2.3.2 Normally distributed interacting covariates (Scenario II) . . . 38
2.4 Data examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.4.1 Birth weight data . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.4.2 Alcohol data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.4.3 Factors impacting thyroglobulin levels in an iodine de?cient
population . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.6 Technical details for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . 48
2.6.1 Proof of Theorem 2.3 . . . . . . . . . . . . . . . . . . . . . . . 48
2.6.2 Proof of Corollary 2.7 . . . . . . . . . . . . . . . . . . . . . . 52
2.6.3 Proof of Theorem 2.10 . . . . . . . . . . . . . . . . . . . . . . 52
2.6.4 Proof of Theorem 2.12 . . . . . . . . . . . . . . . . . . . . . . 53
2.6.5 Derivation of the power of T . . . . . . . . . . . . . . . . . . . 59
2.6.5.1 Limit of (X?)TV?1X and (X?)TV?1(X?) in (2.22) . 64
2.6.5.2 Limit of ? in (2.22) . . . . . . . . . . . . . . . . . . 70
v
3 Goodness of ?t tests for generalized linear mixed models 73
3.1 Generalized linear mixed models (GLMMs) . . . . . . . . . . . . . . . 73
3.1.1 Mixed-e?ects logistic models . . . . . . . . . . . . . . . . . . . 74
3.1.2 Mixed-e?ects Poisson models . . . . . . . . . . . . . . . . . . 76
3.2 Proof of the consistency of MLE for GLMMs . . . . . . . . . . . . . . 77
3.3 Goodness of ?t test for GLMMs . . . . . . . . . . . . . . . . . . . . . 81
3.3.1 Derivation of the power of T . . . . . . . . . . . . . . . . . . . 85
3.4 Simulations for logistic mixed models . . . . . . . . . . . . . . . . . . 88
3.4.1 Computational issues and code checking . . . . . . . . . . . . 89
3.4.2 Checking the size of the test in simulations . . . . . . . . . . . 91
3.4.3 Simulations to assess empirical power of the test . . . . . . . . 92
3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.6 Technical details for Chapter 3 . . . . . . . . . . . . . . . . . . . . . . 95
3.6.1 Checking B.2 in Assumption 3.1 for logistic mixed model . . . 95
3.6.2 Checking B.3 in Assumption 3.1 for Logistic mixed model . . 97
3.6.3 Checking B.3 in Assumption 3.1 for Poisson mixed model . . . 99
3.6.4 Simpli?cation of ?? in equation (3.11) . . . . . . . . . . . . . . 102
4 Discussion and further research 105
4.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.2 Further research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5 Appendix: Three general lemmas 108
Bibliography 111
vi
List of Tables
2.1 Empirical size of the test under di?erent  levels (LMM) . . . . . . . . . 32
2.2 Empirical size of the test under di?erent cell partitions (LMM). . . . . . . 33
2.3 Power and robustness study for Scenario I (LMM). . . . . . . . . . . . . 34
2.4 Impact of cell partition on empirical power for Scenario I (LMM). . . . . . 35
2.5 Empirical size of the test for Scenario II under di?erent cell partitions with
standard deviations in brackets (LMM). . . . . . . . . . . . . . . . . . . 39
2.6 Impact of cell partition on empirical power for Scenario II (LMM). . . . . 39
3.1 Empirical size of the test under di?erent  levels (logistic mixed model). . 92
3.2 Empirical size of the test under di?erent cell partitions (logistic mixed
model). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.3 Impact of cell partition on empirical power I (logistic mixed model). . . . 93
3.4 Impact of cell partition on empirical power II (logistic mixed model). . . . 94
vii
List of Figures
2.1 The impact of (?2a,?2?) on analytical power (Scenario I, LMM) . . . . 29
2.2 The impact of (?13,?23) on analytical power (Scenario I, LMM) . . . 29
2.3 Analytical power plot for ?31 = ?32 = 0, x-axis is ?3/(?2a +?2?). Case
1: ?2a = .25,?2? = 1; Case 2: ?2a = .625,?2? = .625; Case 3: ?2a =
1,?2? = .25. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4 Empirical and Asymptotic distribution of the test statistic 2.13 in
Scenario I with correlations ?13 = ?23 = 0. . . . . . . . . . . . . . . . 32
2.5 Empirical power vs summary parameter ? . . . . . . . . . . . . . . . 38
2.6 Theimpactof(?2a,?2?)onanalyticalpower, ?12 = 0, x-axisis?3/(?2a +?2?).
Case 1: ?2a = .25,?2? = 1; Case 2: ?2a = .625,?2? = .625; Case 3:
?2a = 1,?2? = .25. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.7 Normality assumption checking for birth weight data. . . . . . . . . . 41
2.8 Normality checking for residuals of ?nal model for birth weight data . 42
2.9 Normality checking for log(TG) for Alcohol data. . . . . . . . . . . . 43
2.10 Normality checking for residuals of the ?nal model for Alcohol data . 44
3.1 Histogram of 1000 test statistics for logistic mixed model . . . . . . . 91
viii
List of Notation and Abbreviations
? a space of outcomes of an experiment
P a probability measure on ?
R the real line, (??, ?)
R+ the non-negative real line, [0,?)
Z+ the set of positive integer numbers
I{A} indicator function, where A is either a set or an event de?ned by a
property enclosed in brackets, de?ned in terms of a random variable
N(?,?2) Normal distribution with mean ? and variance ?2
E(X) mean of the random variable X
Var(X) variance of the random variable X
?{?} the ?-algebra generated by random variables in {?}
E(Y|X) the conditional expectation of Y w.r.t. a ?-algebra generated by X
Var(Y|X) the conditional variance of Y given X
i.i.d. independent and identically distributed
A ? B A?B P? 0
d(?,?0) Euclidean distance ????0?2 between ? and ?0
aT transpose of the vector or matrix a
IN N ?N identity matrix
1n a n?1 vector of all 1s
a?2 Kronecker product of vector a, aaT
1?2n the n?n matrix of all 1s
A?1 inverse of matrix A
tr(A) trace of matrix A, de?ned to be the sum of the diagonal elements
of matrix A
?A? Euclidean norm of matrix A
P? convergence in probability
D? convergence in distribution
op(1) a random variable sequence Xn = op(1) means Xn P? 0
ix
Chapter 1
Introduction
1.1 Background
The linear mixed model (LMM) (McCulloch and Searle, 2001) extends the
standard linear regression model by including random e?ects in addition to the usual
?xed e?ects in the linear predictors. LMMs can be expressed as Y = X?+Z?+?,
where Y is a vector of observations, X is a matrix of known covariates,? is a vector
of unknown ?xed regression coe?cients which are called ?xed e?ects, Z is a known
matrix, ? is a vector of unknown random e?ects and ? is a vector of unobservable
random errors. By incorporating random e?ects, LMMs can accommodate clus-
tered or correlated or longitudinal data. For example, in medical studies, various
measurements are often collected from the same individual over time. It is then
reasonable to assume that the observations for the same individual are correlated.
Examples of applying LMM to longitudinal data can be found for example in Laird
and Ware (1982), Weiss (2005, Chapter 9) and Lee et al. (2006). Generalized linear
mixed models (GLMMs) further extend the LMM family to discrete or categorical
exponential family data. Examples include logistic mixed models for binomial data
and Poisson mixed models for count data. Data examples of GLMMs are given
in McCullagh and Nelder (1989, Section 14.5) and GLMMs have numerous appli-
cations in medical research and the survey area. Jiang and Lahiri (2006) gave a
1
very good general literature review of prediction based on GLMMs, with particular
application to small-area estimation.
There are various methods of estimating GLMM parameters. The method of
maximum likelihood is widely used. A full maximum likelihood analysis requires
numerical integration techniques to calculate the log-likelihood function and thus
the distribution of the random e?ects needs to be known. Jiang (1998b) proposed
estimating equations that apply to GLMMs not necessarily having a block-diagonal
covariance matrix structure. Jiang (1999) proposed a method of inference which
in many ways resembles the method of least squares in linear models and relies on
weak distributional assumptions about random e?ects. In this thesis, we focus on
the maximum likelihood method for parameter estimation in GLMMs.
Developments in model ?tting algorithms and their implementations in statis-
tical packages have greatly facilitated the applications of LMMs and GLMMs. The
commonly used functions for mixed modeling in the statistical software package
SAS, version 9.2, are PROC MIXED, PROC NLMIXED and PROC GLIMMIX.
The commonly used functions for mixed modeling in R, version 2.11.1, are
? linear mixed models: aov(), lme() in library(nlme), lmer() in library(lme4);
? generalized linear mixed models: glmmPQL() in library(MASS), glmer() in
library(lme4), MCMCglmm() in library(MCMCglmm).
Two important steps in modeling are selecting a model and checking its ?t. Fre-
quently, model selection is done by comparing nested models, via likelihood ratio,
wald or score tests, as part of model building and there are approaches for com-
2
paring non-nested models (Cox, 1961; Godfrey, 1988). AIC (Akaike?s information
criterion), BIC (Bayesian information criterion) and other model selection principles
(Rao and Wu, 1989; Shao, 1997) focus on selection of covariates. Rao, Wu et al.
(2001) gave a concise review on the subject of the statistical model selection.
These methods select the best statistical model from a set of potential models
chosen by the researcher, given the observed data. Even though the ?nally selected
model may be the best in the class of potential models, it might still not provide a
good ?t to the data. Thus once a model is selected, its ?t should be assessed. This is
often done by checking residuals and formal goodness of ?t tests. There are various
diagnostics and graphical techniques in assessing goodness of ?t of models. Lin, Wei
and Ying (2002) developed objective and informative model-checking techniques
for a variety of statistical models and data structures, including generalized linear
models with independent or dependent observations, by taking the cumulative sums
of residuals over certain coordinates. Lange and Ryan (1989) described a graphical
method for checking distributional assumptions about the random e?ects in random
e?ects models. Park and Lee (2004) proposed residual plots to investigate the
goodness of ?t for repeated measure data, where they mainly focus on the mean
model diagnostics. Jacqmin-Gadda et al. (2007) discussed the ?t of a linear mixed
model through Cholesky residuals and conditional residuals. Pan and Lin (2005)
developed graphical and numerical methods for checking the adequacy of generalized
linear mixed models, by comparing the cumulative sums of residuals with certain
Gaussian processes. Regarding formal tests for the model adequacy, goodness of
?t tests for generalized linear models for ?xed e?ects can be found in Chapter 4
3
in Agresti (2002). However, the literature for formally assessing the overall ?t for
GLMMs is limited. Some procedures to assess model misspeci?cation have been
proposed. Testing for the presence of random e?ects in LMMs has been discussed
by Self and Liang (1987) and by Crainiceanu and Ruppert (2004). Jiang (2001)
and Ritz (2004) assessed the distributional assumptions for the random e?ects in
LMMs. Claeskens and Hart (2009) proposed formal tests for testing the normality
of random e?ects and/or error terms in LMMs. Khuri, Mathew and Sinha (1998)
presented derivations of both exact and optimal tests regarding variance component
models, as well as guidance on using such tests for hypothesis testing for the ?xed
e?ect part. These are separate tools for checking ?xed-e?ect speci?cation or for
separately checking the residuals or forms of the random e?ect speci?cation.
Testing the overall adequacy of a proposed model has been discussed in the
literature for several types of models with ?xed e?ects. Tsiatis (1980) proposed a
goodness-of-?t test to test the overall ?t of a logistic regression model. The test
is originally established based on the e?cient scores test and after simpli?cation,
it is reduced to a quadratic form of observed counts minus the expected counts in
regions of the covariate space. However, for logistic mixed models, with the presence
of random e?ects, this e?cient scores test can not be simpli?ed as a quadratic form
of observed counts minus the expected counts because of the integrals involved in
the likelihood function. For survival data, Schoenfeld (1980) presented a class of
omnibus chi-squared goodness of ?t tests for the proportional hazards regression
model. Slud and Kedem (1994) adapted the idea of Schoenfeld to generalized linear
time series models and discussed ?xed e?ects binary-response models with time-
4
dependent covariates. Kedem and Fokianos (2002) extended this approach to various
other goodness of ?t tests based on categorical time series residuals. In this thesis,
we adopt the idea of Schoenfeld (1980) and develop a class of goodness of ?t tests
for GLMMs by comparing the observed and expected values computed from the
model within cells of a partition of the covariate space. This class of goodness of ?t
tests primarily assesses the adequacy of ?t of the ?xed e?ects part in the presence
of random e?ects.
1.2 Overview of thesis
In Chapter 2, we present the linear mixed models (LMMs). We adopt the
idea of Schoenfeld (1980) and propose a class of goodness of ?t tests for testing
the statistical adequacy of the selected LMM. We study two classes of LMMs, the
general LMM with additive random e?ects Y = X?+?Rr=1 Zr?r +?, where the
random e?ects?r,r = 1,...,R are normally distributed and in a moderate to large
sample setting, the two-level LMM yij = xTij? + ?i + ?ij, that is, the LMM with
one random intercept, where no distributional assumption is made on the random
e?ect ?i or the error term ?ij. For this two-level LMM, i = 1,...,m, m denotes
the number of clusters, and j = 1,...,ni, ni denotes the size of cluster i. To
deal with technical issues, the covariate matrix X is is assumed in di?erent settings
to be a matrix either of ?xed constants or of random variables. We propose a
test statistic based on di?erences between observations and their expected values
computed under the model aggregated over cells of a partition of the covariate space.
5
We ?rst discuss assumptions needed, and then derive the asymptotic properties of
this test statistic as the total number of observations N tends to in?nity under
the null hypothesis and under local alternatives. For the two-level LMM, N =
?m
i=1 ni = (
?m
i=1 ni/m)m. Under the assumption of the existence of
?m
i=1 ni/m, N
tending to in?nity is equivalent to m tending to in?nity. For the general LMM with
additive normal random e?ects, we estimate parameters using maximum likelihood
estimators (MLEs) and assume that the covariate matrix X is ?xed and nonrandom.
For the two-level LMM with no distributional assumptions made on the random
e?ect or the error term, we assume that (xi,ni), where i is the cluster index, are
i.i.d random vectors and estimate the parameters using least squares and method
of moments. In Chapter 2, we also check the theoretical power in simulations, and
study the impact of choice of cell partitions on the test as well as the robustness of
the test with respect to the error distribution. We illustrate this test in three real
datasets.
In Chapter 3, we extend the test to random intercept generalized linear mixed
models (GLMMs) and derive its theoretical power. To do that, we also prove the
MLE consistency of GLMMs under certain assumptions. The covariate matrix X in
this Chapter is considered to be random variables. Again, we conduct simulations
to assess factors with an impact on the power of the test in GLMMs.
In Chapter 4, we discuss the applicability of the results and point to future
research directions.
There is a separate technical appendix at the end of each chapter. Appendix
A contains three lemmas cited in this thesis.
6
Chapter 2
Goodness of ?t tests for linear mixed models
2.1 Linear mixed models (LMMs)
A linear mixed model (LMM) has the form
Y = X?+Zu+?, (2.1)
where YN?1 is the vector of observations; XN?p is the design matrix for the ?xed
e?ects part of the model;?is a p?1 vector of unknown ?xed e?ects parameters; u is
a vector of random e?ects and?is a vector of errors. Typically u and?are assumed
to be independent of each other and each independently normally distributed with
mean 0 and unknown variances.
In this thesis, we only consider the LMM (2.1) with Z?=?Rr=1 Zr?r, i.e.
Y = X?+
R?
r=1
Zr?r +?. (2.2)
Here Zr, an N ?mr matrix of constants, is the design matrix for the random e?ect
?r, r = 1,...,R. The quantity ?r is an mr ?1 random vector, r = 1,...,R. Also,
components of ?r are i.i.d. within the vector, ?1,...,?R are independent and
are also independent of ?. In this thesis, ?1,...,?R are always assumed normal
7
except for the 2-level LMM case discussed in Section 2.2.2 where no distributional
assumptions are made on either the random e?ect or the error term. Let ? =
(?2?,?21,...,?2R), the parameter vector of all variance components and let? = (?,?).
Let Gr = ZrZTr , r = 1,...,R and G0 = IN. The X matrix can be ?xed or random.
To deal with technical issues, X is considered to be ?xed in Section 2.2.1 and to be
random in Section 2.2.2.
As an important case of model (2.2), we now introduce the linear mixed model
with a single random e?ect:
yij = xTij?+?i +?ij, i = 1,...,m, j = 1,...,ni, (2.3)
where the 1?p vector xTij = (1,xij1,...,xij(p?1)) denotes covariates for ?xed e?ects
for the jth observation within the ith cluster. The cluster speci?c random e?ects
?i ? N(0,?2a) are assumed to be independent of the error terms ?ij, ?ij ? N(0,?2?).
To accommodate an intercept term in the model, the ?rst entry in xij is 1. We write
N = ?mi=1 ni and yi = (yi1,...,yini) denotes the vector of observations for the ith
cluster.
Under these assumptions, Y ? N(X?,V), with a block diagonal covariance
matrix V, where each of the m ni ?ni blocks has the structure
Vi =
?
??
??
??
?
?2a +?2? ??? ?2a
... ... ...
?2a ??? ?2a +?2?
?
??
??
??
?
ni?ni
. (2.4)
8
For asymptotic analysis of the LMM model (2.3), we always assume that m goes to
in?nity, thus N also goes to in?nity.
The following assumptions are made on model (2.2) throughout Chapter 2.
Assumption 2.1 The true parameter point ?0 = (?0,?0) is an interior point of
 = (Rp,(R+)R+1). For the 2-level LMM (2.3), R = 1.
Assumption 2.2 The covariate matrix X can be either ?xed or random. If X is
assumed to be ?xed, then it is assumed to have full rank p. If X is assumed to be
a matrix of random variables which is the 2-level LMM (2.3) discussed in Section
2.2.2, then (xi,ni) are assumed i.i.d. with ?E(x
?2
i )? < ? and E(n
2
i) < ?.
Assumption 2.3 When the covariate matrix X is considered to be ?xed, we al-
ways assume that, with E1,...,El being a cell partition of the covariate space, for
l = 1,...,L, N?1?Nk=1 I{xk?El}xTk exists.
Remark 2.1 For the 2-level LMM with ?xed covariate matrix X, Assumption 2.3 is
equivalent to the existence of N?1?mi=1?nij=1 I{xij?El}xTij. When the covariate matrix
X is considered to be random, the existence of N?1?Nk=1 I{xk?El}xTk is ensured by
Assumption 2.2. 2
9
2.2 Goodness of ?t test statistic
2.2.1 Test statistic and its asymptotic properties for LMMs when
parameters are estimated by maximum likelihood
2.2.1.1 LMM with a single random e?ect
In this Section, we discuss the 2-level LMM (2.3) with normality assumptions
on both the random e?ect and the error term. The covariate matrix X is considered
to be ?xed. We derive our test statistic for the setting where the model parameters
? = (?,?) = (?,?2a,?2?) are estimated by the maximum likelihood. Since we assume
normality both for the random intercept term ?i and for the error term ?ij, we can
use the result of Miller (1977) to show the consistency and asymptotic normality of
the maximum likelihood estimator (MLE) ??. The following assumptions are made
for the two-level LMM.
Assumption 2.4 J?? = limN??XTV?1X/N exists and is positive de?nite;
Here the positive de?niteness assumption for J?? is equivalent to the assumption
that X has full rank.
Assumption 2.5 The 2?2 matrix M with elements de?ned below exists and is
positive de?nite;
[M]st = 12 lim
N??
(trV?1G
sV?1Gt
)/N, s,t = 0,1,
where G0 = I is the N ?N identity matrix and G1 = 1?2 is the N ?N matrix of
10
all 1s.
Under Assumptions 2.1 ? 2.5, based on Miller (1977), the maximum likelihood
estimators (MLEs) for model (2.3) exist and are consistent.
To test the goodness of ?t of the LMM (2.3), we ?rst divide the covariate space
into L disjoint regions E1, ..., EL. We compute the observed and expected sums
in each region El as
fl =
m?
i=1
ni?
j=1
I{xij?El}yij, (2.5)
el(?) =
m?
i=1
ni?
j=1
I{xij?El}E(yij) =
m?
i=1
ni?
j=1
I{xij?El}xTij?, (2.6)
where I denotes the indicator function.
Remark 2.2 The cell partition can also be based on covariates not included in
model (2.3). In this case, if we let xij denote the vector of all available covariates
and x?ij only includes covariates used in the regression, then we would de?ne el(??) =
?m
i=1
?ni
j=1 I{xij?El}(x
?
ij)
T??, where ?? corresponds to the coe?cients of x?
ij. But no
matter which kind of cell partition we choose, we employ the expressions in (2.5)
and (2.6) in the whole thesis for notational simplicity. 2
With the notation f = (f1,...,fL) and e(?) = (e1,...,eL), the observed minus
the expected vector is
f?e(?0) =
?
??
??
??
?
?m
i=1
?ni
j=1 I{xij?E1}(yij ?x
T
ij?0)
...
?m
i=1
?ni
j=1 I{xij?EL}(yij ?x
T
ij?0)
?
??
??
??
?
. (2.7)
11
Since the true parameter vector?0 is unknown, we replace?0 in (2.7) by its MLE ??
for Theorem 2.3. The proof of this Theorem is given in Section 2.6.1. The following
assumption is made to ensure the existence of components of the variance covariance
matrix for the test statistic.
Assumption 2.6 limK??limsupm?? 1m ?mi=1 I{n2i?K} n2i ? 0.
Theorem 2.3 For model (2.3), let E1,...,EL constitute a disjoint partition of the
covariate space generated by X. Under Assumptions 2.1 ? 2.6, as N ??,
?N
?
??
?
(f ?e(?0))/N
????0
?
??
?
D? N(0,DVDT),
where with the (L+p)?N matrix D given in equation (2.29),
DVDT =
?
??
?
H  J?1??
J?1?? T J?1??
?
??
?
(L+p)?(L+p)
. (2.8)
The entries of H,  and J?? are given below
Hlk = ?2a lim
N??
1
N
m?
i=1
[( n
i?
j=1
I{xij?Ek}
)( n
i?
j=1
I{xij?El}
)]
, (2.9)
Hll = ?2? lim
N??
1
N
m?
i=1
ni?
j=1
I{xij?El} +?2a 1N
m?
i=1
( n
i?
j=1
I{xij?El}
)2
, (2.10)
 l = lim
N??
1
N
m?
i=1
ni?
j=1
I{xij?El}xTij, (2.11)
J?? = lim
N??
XTV?1X/N. (2.12)
12
Consistent estimators for these quantities are given in Corollary 2.5 below.
Remark 2.4 Assumption 2.3 ensures the existence of  in (2.8). Assumptions
2.6 and 2.4 ensure the existence of H, the limiting variance covariance matrix for
(f?e(?0))/N, and J?? in (2.8). 2
Corollary 2.5 Consistent estimators for elements in the block matrix DVDT in
Theorem 2.3 are:
?Hlk = ??2a 1
N
m?
i=1
[( n
i?
j=1
I{xij?Ek}
)( n
i?
j=1
I{xij?El}
)]
,
?Hll = ??2? 1
N
m?
i=1
ni?
j=1
I{xij?El} + ??2a 1N
m?
i=1
( n
i?
j=1
I{xij?El}
)2
,
? l = 1
N
m?
i=1
ni?
j=1
I{xij?El}xTij, ?I?? = XT ?V?1X/N,
where ?Hlk, ?Hll are estimators for o?-diagonal and diagonal elements of H, and ? l
estimates the l-th row of  .
Remark 2.6 If X is random, under the more restrictive assumption that xij,i =
1,...,m;j = 1,...,ni are i.i.d. and are independent of ni, the diagonal and o?-
diagonal elements of H given in (2.9) and (2.10) are
Hlk = ?2a ? E(n
2
1 ?n1)
E(n1) P (x11 ? El)P (x11 ? Ek), ? l ?= k
and
Hll = (?2? +?2a)P(x11 ? El)+?2aE(n
2
1 ?n1)
E(n1) [P(x11 ? El)]
2 .
13
In this case, another way of estimating the diagonal and o?-diagonal elements of H
is to use
?Hlk = ??2a ?E(n21)? ?E(n1)?
E(n1)
?P(x11 ? El)?P(x11 ? Ek)
and
?Hll = (??2? + ??2a)?P(x11 ? El) + ??2a ?E(n21 ?n1)?
E(n1)
[?
P(x11 ? El)
]2
,
where ?E(n21) =?mi=1 n2i/m, ?E(n1) =?mi=1 ni/m and
?P(x11 ? El) =?mi=1?nij=1 I{x
ij?El}/N. In the R code for simulations and data anal-
ysis below, we use the estimators in Corollary 2.5 to estimate H and in calculating
the theoretical power in the analytical power study Section, the estimators in this
Remark are applied. 2
Corollary 2.7 Under Assumptions 2.1 ? 2.6, (f ? e(??))/?N D? N(0, ), where
 = H? J?1?? T is an L?L matrix and can be replaced by its consistent estimator
? 0 = ?H? ? ?J?1?? ? T based on Corollay 2.5.
We compute Singular Value Decomposition for ? 0. For each eigenvalue of ? 0,
we compare it with a small preset threshold, such as 10?4 ?. For any eigenvalue
less than ?, we instead set this eigenvalue to be 0 and reconstruct the ? 0 matrix
using the non-zero eigenvalues and their corresponding eigenvectors. We denote
this reconstructed matrix as ? . Based on Corollary 5.3 given in the Appendix,
P(rank (? ) = rank ( )) ? 1.
14
Our goodness of ?t test statistic is then given by the following quadratic form
T = 1N(f?e(??))T ? ?1(f?e(??)), (2.13)
where ? ?1 denotes the Moore-Penrose pseudoinverse of ? . Under the null hypoth-
esis, T has an asymptotic central ?2k distribution, where k = rank(? ) = rank( )
for large N, based on Corollary 5.3.
2.2.1.2 LMM with additive random e?ects
We consider next the LMM with additive random e?ects (2.2), that is,
Y = X?+
R?
r=1
Zr?r +?. (2.14)
The covariate matrix X is considered to be ?xed numbers in this Section. This is
model (1) in Miller (1977). We ?rst state and comment on the assumptions Miller
(1977) made to ensure consistency and asymptotic normality of the MLE for ? in
(2.2).
Assumption 2.7 A.1 The partitioned matrix [X : Zr] has rank greater than p,
r = 1,...,R.
A.2 The matrices G0,G1,...,GR are linearly independent; that is, ?Rr=0 ?rGr = 0
implies ?r = 0, r = 0,1,...,R.
A.3 N and each mr, r = 1,...,R, tend to in?nity.
A.4 Let m0 = N. Then for each s,t = 0,1,...,R, either limN??ms/mt = ?st or
15
limN??mt/ms = ?ts exists. If ?st = 0, then let ?ts = ? for notational convenience.
Without loss of generality, let Zr be labeled so that for s < t, ?st > 0; i.e.,
the mr are in decreasing order of magnitude. Generate a partition of the integers
0,1,...,R, S0,S1,...,Sc, so that for indices r in the same set Ss, the associated
mr?s have the same order of magnitude. Such a partition is generated as follows:
i) r0 = 0; S0 = {0}; r1 = 1.
ii) For s = 1,2,..., it is true that rs ? Ss. Then for r = rs+1,rs+2,..., include r in
Ss until ?rs,r = ?; call the ?rst value of r where this occurs rs+1; then rs+1 ? Ss+1.
iii) Continue as in step ii until R has been placed in a set. Call this set Sc.
There are then c+ 1 sets in partitions, S0,S1,...,Sc, and Ss = {rs,...,rs+1 ?1}.
For each r = 1,2,...,R, r ? Ss for some s = 1,2,...,c. De?ne sequences Kr
(depending on N) as follows:
Kr = rank[Zrs : Zrs+1 : ??? : ZR] ? rank[Zrs : ??? : Zr?1 : Zr+1 : ??? : ZR],
r = 1,2,...,R,
K0 = N ?rank[Z1,...,ZR].
(The Kr so de?ned are closely related to the degrees of freedom of sums of squares
in the analysis of variance.)
A.5 Each of the limN?Kr/mr, r = 1,...,R exists and is positive.
Let V0 =?Rr=1 ?2rZr be the true covariance matrix.
A.6 There exists a sequence KR+1 (depending on N) increasing to in?nity such that
the p?p matrix C0 de?ned by C0 = limN??[X?V?10 X]/KR+1 exists and is positive
de?nite.
16
De?ne the (R + 1)?(R + 1) matrix C1 by
[C1]st = 12 lim
N??
[trV?10 GsV?10 Gt]/K12s K12t , s,t = 0,1,...,R.
A.7 Each of the limits used in de?ning [C1]st exists, s,t = 0,1,...,R. The matrix
C1 is positive de?nite.
Remark 2.8 Assumption A.1 requires that the ?xed e?ects not be confounded with
any of the random e?ects. A.2 requires that the random e?ects not be confounded
with each other. For the LMM (2.3) or hierarchical linear mixed models with nested
blocks such as (2.18), the matrices Zr,r = 1,...,R, consist only of 0?s or 1?s and
satisfy A.1?A.2. Assumptions A.1?A.2 are su?cient to guarantee identi?ability
of the MLE ??. Assumptions A.3?A.7, which correspond to Assumptions 3.1?3.5
in Miller (1977), are used to ensure the consistency of the MLE. Assumption A.3 is
natural and necessary for the consistency property of MLE estimators of both ? and
the variance components ?2? and ?2r,r = 1,...,R, because the sample size used to
estimate ? and ?2? is N and the sample size used to estimate ?2r is mr. Assumptions
A.6?A.7 are used to establish the existence and positive de?niteness of the limiting
variance-covariance matrix of the MLE ??. 2
InadditiontoAssumption2.7, takenfromMiller(1977), wealsorequirethefollowing
Assumption to ensure the existence of components in the variance covariance matrix
for the test statistic.
17
Assumption 2.8 H = limN??FVF? exists and is positive de?nite, with F given
in (2.16).
Remark 2.9 Assumption 2.3 ensures the existence of  in (2.17). Assumption
2.8 ensures the existence and positive de?niteness of H in  . H is the limiting
variance covariance matrix for (f?e(?0))/N, which involves empirical moments of
the covariates and empirical moments of cluster sizes at di?erent levels. At the end
of this Section, we give speci?c forms of H and ways of estimating H for the special
case of 3-level LMM (2.18). 2
We next state our goodness of ?t test for model (2.2) in Theorem 2.10 below.
The details of the proof are given in Section 2.6.3. The covariate space is divided
into L disjoint regions E1,...,EL. This partition can also be based on covariates not
included in model (2.2). As discussed in Remark 2.2 in Section 2.2.1.1, for notational
simplicity, the following notation applies whether or not the cell partition is based
only on covariates in the model.
For l = 1,2,...,L, de?ne
fl =
N?
k=1
I{xk?El}yk,
el(?) =
N?
k=1
I{xk?El}E(yk) =
N?
k=1
I{xk?El}xTk?.
Let f = (f1,...,fL), e(?) = (e1(?),...,eL(?)).
Theorem 2.10 For model (2.2), let E1,...,EL constitute a disjoint partition of the
18
covariate space generated by X. Under Assumptions 2.1 , 2.2 and 2.7 ? 2.8,
(f?e(??))T ? ?1(f?e(??))/?N D? ?2k, (2.15)
where ?? is the MLE, ? is the reconstructed matrix by applying Singular Value De-
composition on a consistent estimator of  = H? J?1?? T, ? ?1 denotes the Moore-
Penrose pseudoinverse of ? , k = rank(? ) = rank( ). Here
J?? = limN??XTV?1X/N, H = limN??FVFT, with
F = 1?N
?
??
??
??
?
I{x1?E1}???I{xN?E1}
...
I{x1?EL}???I{xN?EL}
?
??
??
??
?
(2.16)
and
 =
?
??
??
??
?
?T1
...
?TL
?
??
??
??
?
L?p
= lim
N??
?
??
??
??
?
1
N
?N
k=1 I{xk?E1}x
T
k
...
1
N
?N
k=1 I{xk?EL}x
T
k
?
??
??
??
?
. (2.17)
Remark 2.11 The detailed steps used in deriving the matrix ? with small-eigenvalue
eigenspaces project to 0 are exactly analogous to those described after Corollary 2.7
in Section 2.2.1.1. For the special case of a 2-level LMM (2.3), an explicit form
of H, which also follows from (2.16), and two consistent estimators were provided
in Section 2.2.1.1 under the more restrictive assumption that xij,i = 1,...,m;j =
1,...,ni are i.i.d. random variables and are independent of ni. The explicit form
19
of H is also given for the 3-level LMM (2.18) in this Remark. Even if the alternate
forms of the estimators for H and  are used, we still need to do the small-eigenvalue
thresholding in de?ning ? .
For the 3-level hierarchical block nested LMM
yijt = xTijt?+ui +vij +?ijt, i = 1,...,m; j = 1,...,ni; t = 1,...,nij, (2.18)
the matrix V has the structure
V =
?
??
??
??
??
??
??
??
??
??
??
??
??
?
??
??
??
?
a ??? b
... ... ...
b ??? a
?
??
??
??
?
ni1?ni1
???
?
??
??
??
?
?21 ??? ?21
... ... ...
?21 ??? ?21
?
??
??
??
?
ni1?ni,ni1
...
?
??
??
??
?
?21 ??? ?21
... ... ...
?21 ??? ?21
?
??
??
??
?
ni1?ni,ni1
???
?
??
??
??
?
a ??? b
... ... ...
b ??? a
?
??
??
??
?
ni,ni1?ni,ni1
?
??
??
??
??
??
??
??
??
??
??
??
??
,
where ?21 = Var(ui),?22 = Var(vij),?20 = Var(?ijt),a = ?20 + ?21 + ?22,b = ?21 + ?22. In
this case, ? l ?= k, the o?-diagonal elements of H are
Hlk = ?21 lim
N??
1
N
m?
i=1
[( n
i?
j=1
nij?
h=1
I{xijh?El}
)( n
i?
j=1
nij?
h=1
I{xijh?Ek}
)]
+ ?22 lim
N??
1
N
m?
i=1
ni?
j=1
(n
ij?
h=1
I{xijh?El}
)(n
ij?
h=1
I{xijh?Ek}
)
.
20
For l = 1,??? ,L, the diagonal elements of H are
Hll = ?20 lim
N??
1
N
m?
i=1
ni?
j=1
nij?
k=1
I{xijk?El} +?21 lim
N??
1
N
m?
i=1
( n
i?
j=1
nij?
k=1
I{xijk?El}
)2
+ ?22 lim
N??
1
N
m?
i=1
ni?
j=1
(n
ij?
k=1
I{xijk?El}
)2
.
Similar to the 2-level LMM, under the assumption that the covariate vectors are i.i.d.
random variables and the cluster sizes are independent of the covariate vectors, the
Hlk and Hll here can be expressed as functions of moments of n1, n11 and x111.
This can be done by applying Law of Large Numbers theory and by taking iterated
conditional expectations ?rst conditioning on {n1,n11}, which is similar to what was
done in (2.9), (2.10). Because of the complexity of these functions in the 3-level
LMM model (2.18), we recommend directly estimating Hlk as
?Hlk = ??21 1
N
m?
i=1
[( n
i?
j=1
nij?
k=1
I{xijk?El}
)( n
i?
j=1
nij?
k=1
I{xijk?Ek}
)]
+ ??22 1N
m?
i=1
ni?
j=1
(n
ij?
k=1
I{xijk?El}
)(n
ij?
k=1
I{xijk?Ek}
)
.
This also applies to estimating Hll.
2
21
2.2.2 Test statistic and its asymptotic properties for two-level LMM
with parameters estimated by least squares and method of mo-
ments
We consider the LMM (2.3), but now only require that E(?i) = E(?ij) =
0,Var(?i) = ?2a,Var(?ij) = ?2?, instead of assuming normality of ?i and ?ij. The
covariate vectors xij are considered to be random variables in this Section and
(xi,ni) are assumed to be i.i.d. This model, also called Nested Error Regression
Model, is widely used and studied in small area estimation (Prasad and Rao, 1990).
We estimate ? by the generalized least squares estimator
?? = (XT ?V?1X)?1(XT ?V?1Y)
= (XTV?1X)?1(XTV?1Y) +op(1), as N ??,
where V is a function of the variance components? = (?2a,?2?), which are estimated
by the method of moments by equating the right-hand sides of
E
[ m?
i=1
ni?
j=1
(yij ? ?yi.)2
]
=
m?
i=1
ni?
j=1
(xT
ij?? ?x
T
i.?
)2 +(N ?m)?2
? (2.19)
E
[ m?
i=1
ni?
j=1
(?yi. ? ?y..)2
]
=
m?
i=1
ni(?xTi.?? ?xT..?)2 +
(
N ? 1N
m?
i=1
n2i
)
?2a + (m?1)?2?
(2.20)
22
respectively to their estimates
SSW =
m?
i=1
ni?
j=1
(yij ? ?yi.)2
and
SSB =
m?
i=1
ni?
j=1
(?yi. ? ?y..)2 =
m?
i=1
ni?y2i. ?N?y2..
respectively, where SSW is the Sum of Squares Within groups and SSB is the Sum of
Squares Between groups in the analysis of variance. Because di?erent clusters (over
index i) are independent and the three quantities?nij=1(yij??yi.)2, ni?y2i. and?nij=1 yij
have ?nite second moments, SSW/m and SSB/m satisfy Laws of Large Numbers.
Equations (2.19), (2.19) and (2.20) are estimating equations for the parameter vector
? = (?,?2a,?2?), which can be solved iteratively. The solutions of equations (2.19),
(2.19) and (2.20) ??, is consistent.
We ?rst divide the covariate space into L disjoint regions E1, ..., EL and
compute the observed and expected values in each cell El as given in (2.5) and
(2.6). We then state our goodness of ?t test in Theorem 2.12 below with details of
the proof in Section 2.6.4.
Theorem 2.12 For the LMM (2.3) with ?nite second moments for both ?i and ?ij,
under Assumptions 2.1 and 2.2, (f?e(??))/?N D? N(0, ), where  = H? J?1?? ?.
Thus (f ? e(??))? ? ?1(f ? e(??))/N D? ?2k, where ? is the reconstructed matrix by
applying Singular Value Decomposition on a consistent estimator of  and k =
rank(? ).
23
Based on Corollary 5.3, k = rank(? ) = rank( ) for large N. Detailed steps used
in de?ning a reconstructed matrix ? with all eigenvalues lower-bounded away from
0 are exactly analogous to those following Corollary 2.7 in Section 2.2.1.1. The
matrices J??,  and H are the same as for the two-level LMM (2.3) where normality
was assumed for both ?i and ?ij and MLEs are used, with formulas given in (2.9),
(2.10), (2.11) and (2.12).
2.2.3 Power of the test
For the multi-level LMM (2.2), we derive the theoretical power under local, and
more speci?cally under contiguous alternatives for the test in (2.13) in the situation
where some covariates that in?uence the outcome y have been omitted from model
(2.2). Let X be the true N ? p covariate matrix and X? be a submatrix of X of
dimension N ? p? used in model (2.2), with p? < p. Let the null hypothesis be
H0 : ?N =?0. We assess the power of T under the alternative
H1 :?N =?0 + a?N, (2.21)
with ?0 = (?0,?0), where several components of ?0 are 0. We use the vector ??0
to denote the non-zero components of ?0. The indexing of ??0 as a sub-vector of ?0
corresponds to the same index subset as the columns of X? within X.
Based on the derivation for Theorem 2.10, we have that under H0,
1?
N(f?e(
???)) H0? N(0, ?),
24
where  ? is the limiting variance covariance matrix of (f?e(???))/?N. By checking
the condition (2.14) in Le Cam?s third lemma in Section 2.6.5, we ?nd that under
the alternative hypothesis H1 in (2.21),
1?
N(f?e(
???)) H1? N(?, ?),
where
? = lim
N??
{ ? ?[(X?)TV ?1X?]?1[(X?)TV ?1X]}a, (2.22)
with  given by expression (2.17) and  ? corresponds to the same expression, but
computed using X? and ??.
Thus under H1, T? has a limiting noncentral ?2 distribution
T? = 1N[f?e(???)]T( ?)?1[f?e(???)] H1? ?2k(?), (2.23)
where k = rank( ?) and the non centrality parameter is ? = ?T( ?)?1?. For
a given type I error level ?, the power is thus P(T? > ?2k,?), where ?2k,? is the
1 ? ? quantile of the central ?2k distribution and P denotes the non central ?2k(?)
distribution. We then substitute all parameter values in  ? with their MLEs and
reconstruct this consistent estimator of by applying Singular Value Decomposition
as explained in the context after Corollary 2.7 in Section 2.2.1.1. With ? ? denoting
the reconstructed matrix,
?T? = 1
N[f?e(
???)]T(? ?)?1[f?e(???)] H1? ?2k(??), (2.24)
25
where k = rank(? ) = rank( ) for large N, based on Corollary 5.3.
We now compute ? and  ? explicitly for two-level LMMs for the setting of
three covariates xij = (xij1,xij2,xij3) that are from a multivariate normal distribu-
tion (2.25) (as studied in the simulation Section 2.3.1), where xij,i = 1,...,m;j =
1,...,ni are i.i.d., and xij and ni are independent. We assume Y ? N(XT?,V),
where X = (1,x1,x2,x3) and V is given in (2.4), but then omit x3 in ?tting the
model, leading to X? = (1,x1,x2). Here a in ? is equal to (0,0,0,?3). With
derivation details given in Sections 2.6.5.1 and 2.6.5.2,
 ? = lim
N??
1
N(X
?)TV?1X?
= 1?2
?
?
??
??
??
?
1 Ex1 Ex2
Ex1 Ex21 E(x1x2)
Ex2 E(x1x2) Ex22
?
??
??
??
?
? 1E(n)?
2
a
?2?
?
??
??
??
?
c1 c1Ex1 c1Ex2
c1Ex1 h1 h2
c1Ex2 h2 h3
?
??
??
??
?
,
where c1 = En[n2/(?2? +n?2a)], c2 = En[n/(?2? +n?2a)], c3 = c1 ?c2, h1 = c2Ex21 +
c3(Ex1)2, h2 = c2E(x1x2)+c3Ex1Ex2 and h3 = c2Ex22 +c3(Ex2)2 .
26
And the component (X?)TV?1X in ? satis?es
1
N(X
?)TV?1X
P? 1
?2?
?
??
??
??
?
1 Ex1 Ex2 Ex3
Ex1 Ex21 E(x1x2) E(x1x3)
Ex2 E(x1x2) Ex22 E(x2x3)
?
??
??
??
?
? 1E(n)?
2
a
?2?
?
??
??
??
?
c1 c1Ex1 c1Ex2 c1Ex3
c1Ex1 h1 h2 h4
c1Ex2 h2 h3 h5
?
??
??
??
?
,
where h4 = c2E(x1x3) +c3Ex1Ex3 and h5 = c2E(x2x3) +c3Ex2Ex3.
When the cell partition E1,...,El is based on the omitted covariate x3, the
elements of  in ? are
1
N
m?
i=1
ni?
j=1
I{xij,3?El} P??
?
El
f3(x3)dx3
1
N
m?
i=1
ni?
j=1
I{xij,3?El}xij,1 P??
?
x1
?
El
x1f(x1,x3)(x1,x3)dx3dx1
1
N
m?
i=1
ni?
j=1
I{xij,3?El}xij,3 P??
?
El
x3f3(x3)dx3.
With (x1,x2,x3) jointly normal, which is the Scenario I in Section 2.3.1, F3(x3) and
f(x1,x3)(x1,x3) are the corresponding normal and bivariate normal densities.
Based on the above quantities, we next study the impact of the magnitude
of the variance components ?2a and ?2? and the correlations ?13 and ?23 in (2.25)
between the omitted covariate x3 and the covariates in the model (x1 and x2) on
27
the theoretical power when the cell partition is based on quantiles of the omitted
covariate x3 with L = 8 cells. For ?13 = .5 and ?23 = .6, Figure 2.1 plots the
theoretical power against ?3/(?2a + ?2?) for three choices of (?2a,?2?) and varying ?3
on the x-axis. For any ?xed pair of (?2a,?2?), the power of the test not surprisingly
increases as a function of ?3, the coe?cient of the omitted covariate x3. This ob-
servation can be made by a Taylor Expansion to the theoretical power formula. Let
G(x,?) = P?2k, ?([?2k, ?,?)) be the theoretical power, then by a Taylor Expansion
to G(x,?) around ? = 0, we get
G(x,?) ? G(x,?) + ???G(x,?) ?.
Thus the theoretical power G(x,?) for ? close to zero is approximately a linear
function of ? =?T( ?)?1?, which is a function of ?23.
For any ?xed ?3, the power increases when the random e?ect ?2a decreases
compared to the error term ?2?. Figure 2.2 plots the power for ?xed ?2a = 1, ?2? = .25
and di?erent choices of (?13,?23). It shows that the power of test increases as ?213+?223
decreases. When we set ?13 = 0 and ?23 = 0, that is, when x3 is correlated with
neither x1 nor x2, we can see from Figure 2.3 that the power is not a?ected by the
individual values of ?2a and ?2?, as long as ?2a +?2? is ?xed.
28
0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.40
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
?3/(?a2+??2)1/2 
 
?a2=1; ??2=.25?
a2=.25; ??2=1?
a2=.625; ??2=.625
Figure 2.1: The impact of (?2a,?2?) on analytical power (Scenario I, LMM)
0.05 0.1 0.15 0.2 0.250.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
?3/(?a2+??2)1/2 
 
?13=0; ?23=0?
13=.1; ?23=.2?
13=.3; ?23=.4?
13=.05; ?23=.55
Figure 2.2: The impact of (?13,?23) on analytical power (Scenario I, LMM)
29
0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18
0.2
0.4
0.6
0.8
1.0
Analytic Power Plot (Scenerao I)
x
Analytic P
ower
Case 1Case 2
Case 3
Figure 2.3: Analytical power plot for ?31 = ?32 = 0, x-axis is ?3/(?2a +?2?). Case 1:
?2a = .25,?2? = 1; Case 2: ?2a = .625,?2? = .625; Case 3: ?2a = 1,?2? = .25.
2.3 Simulations
2.3.1 Normally distributed covariates (Scenario I)
We simulate data from the following setting. For m = 500, we ?rst generate
cluster sizes ni ? uniform on {2,3,4,5} and compute N =?mi=1 ni. Thus the total
number of observations in each repetition is always around N = 1750. We then
draw N covariates xij = (x1ij,x2ij,x3ij), ?xed in the sense that they are simulated
only once for the entire simulation of K = 1000 repetitions, independently from a
multivariate normal distribution,
?
??
??
??
?
x1
x2
x3
?
??
??
??
?
? N
?
??
??
??
?
?
??
??
??
?
0
0
0
?
??
??
??
?
,
?
??
??
??
?
1 0 ?13
0 1 ?23
?13 ?23 1
?
??
??
??
?
?
??
??
??
?
. (2.25)
30
Given X = (1,x1,x2,x3) and parameters ?, ?a and ??, we generate Y from a
multivariate normal distribution Y ? N(X??,V), where V is given in (2.4).
We ?rst do simulations to show that our goodness of ?t test statistic (2.13)
indeed has asymptotic ?2 distribution. We choose ?13 = ?23 = 0 and set true
parameter values ?a = 1, ?? = .5, ? = (?0,?1,?2,?3) = (1,1,1,.25). We then
?t model (2.3) with all covariates that in?uence the response y. The number of
cells L in the computation of T is twelve based on empirical quantiles of x1 and
x2. With the number of iterations being 1000, we then have 1000 test statistic
values N?1(f?e(??))? ? ?1(f?e(??)). Figure 2.4 gives the histogram of these 1000
independently calculated test statistics which turns out to be close to ?212, with p
value from the Kolmogorov-Smirnov test being around .5 and p value from Pearson?s
chi-square goodness of ?t test being around .27. The simulation result coincides with
the theory we claim for (2.13).
Let ? be the level of signi?cance. We show in Table 2.1 the empirical size of
the test, which is the proportion of iterations that have p values less than or equal
to ?. For example, the ?rst row of Table 2.1 says that 4.7% of the 1000 simulations
have p-values less than or equal to .05. The third column is the standard deviation of
the corresponding empirical size (ES), which is calculated as?ES(1?ES)/1000.
We also check the size of the test under various choices of cell partition based on
X. We choose ?13 = ?23 = 0 in (2.25) and let ?a = 1, ?? = .5, ? = (?0,?1,?2,?3) =
(1,1,1,1), and ?t model (2.3) with all covariates X in the model. Cell partitions
in the computation of T are always based on empirical quantiles of each generated
covariate matrix in each repetition. For example, the ?rst row in Table 2.2 means
31
stat.ML
prob
5 10 15 20 25 30
0.00
0.02
0.04
0.06
0.08
density of chisq, df=12
Hist of test.stat.ML
Figure 2.4: Empirical and Asymptotic distribution of the test statistic 2.13 in Sce-
nario I with correlations ?13 = ?23 = 0.
Table 2.1: Empirical size of the test under di?erent  levels (LMM)
m = 500;E(N) = 1750; 3 = :25; 13 =  23 = 0;K = 1000.
signi?cance level  Empirical Size(ES) Standard Deviation of ES
0.05 0.047 0.0067
0.1 0.097 0.0094
0.2 0.189 0.0124
0.3 0.279 0.0142
0.4 0.385 0.0154
0.5 0.505 0.0158
0.6 0.604 0.0155
0.7 0.703 0.0144
0.8 0.797 0.0127
that in each of K = 1000 repetitions, the cell partition is based on the each generated
x1 with number of cells being 8. The second row in Table 2.2 means that in each of
K = 1000 repetitions, the cell partition is based on both the generated x1 and x2
using cross tabulation. Table 2.2 shows empirical sizes (Emp. Size) were all close
to the nominal ? levels of 0.05 and 0.1 for all choices of cell partitions.
32
Table 2.2: Empirical size of the test under di?erent cell partitions (LMM).
m = 500;E(N) = 1750; 3 = 1; 13 =  23 = 0;K = 1000.
L  Emp. Size  Emp. Size
8 (x1) 0.05 0.056 0.1 0.097
3?4 (x1;x2) 0.05 0.058 0.1 0.109
5?4 (x1;x3) 0.05 0.048 0.1 0.088
6?7 (x2;x3) 0.05 0.050 0.1 0.097
To assess the power of the test, we ?t model (2.3) to the data without includ-
ing x3 among the covariates. We then use a cell partition based on the empirical
quantiles of the omitted x3 with L = 8 cells when the number of clusters m = 500
or 50. For m = 20, we use ?xed cell boundaries, based on theoretical quantiles of
the distribution of x3, to divide x3 into L = 8 cells. We set (?13,?23) = (.5,.6),
?a = 1, ?? = .5, ? = (?0,?1,?2,?3) = (1,1,1,.25). For a given design matrix
X, we compute the theoretical power (Theo.Pow.), the mean estimated theoretical
power (Theo.Pow.hat), and the mean empirical power (Empi.Pow.n) for K = 1000
iterations. We compute the theoretical power of T? in (2.23) based on the asymp-
totic ?2 distribution with the true values of the variance components and empirical
moments for X used in the calculation of the non-centrality parameter (2.22). We
compare these values to the estimated theoretical power, that is computed based
on the asymptotic ?2 distribution with estimated variance components and empir-
ical moments for X in (2.22). We repeat the power computations for D = 500
randomly generated matrices X. Table 2.3 shows means and variances of the 500
distinct power estimates (each based on K=1000 iterations) varying over the design
matrices for m = 500,50 and m = 20 clusters. The theoretical power, the empirical
power and the estimated theoretical power agree very well, even when m is small.
33
Table 2.3: Power and robustness study for Scenario I (LMM).
L = 8;K = 1000;D = 500;( 13; 23) = (:5;:6); a = 1; ? = :5.
Power m = 500;EN = 1750 m = 50;EN = 175 m = 20;EN = 70
 3 = :25 mean stan.dev. mean stan.dev. mean stan.dev.
Theo.Pow. .798 .0392 .122 .0236 .084 .0175
Theo.Pow.hat .796 .0383 .127 .0241 .089 .0184
Empi.Pow.n .796 .0388 .113 .0227 .062 .0181
Empi.Pow.t3 .797 .0383 .113 .0234 .061 .0184
Empi.Pow.t5 .797 .0375 .113 .0237 .061 .0180
Power m = 500;EN = 1750 m = 50;EN = 175 m = 20;EN = 70
 3 = :8 mean stan.dev. mean stan.dev. mean stan.dev.
Theo.Pow. 1 0 .853 .0967 .533 .1842
Theo.Pow.hat 1 0 .827 .0895 .506 .1443
Empi.Pow.n 1 0 .827 .0954 .439 .1586
Empi.Pow.t3 1 0 .828 .0942 .443 .1629
Empi.Pow.t5 1 .0001 .827 .0941 .440 .1591
However, only for m = 500 was there adequate power to detect lack of ?t when
?3 = 0.25, which was substantially smaller than the coe?cients ?1 = ?2 = 1 of
x1, and x2, the covariates included in the model. When the e?ect of the omitted
covariate was larger, ?3 = 0.8, the test statistic had approximately 80% power even
for m = 50 clusters.
2.3.1.1 Impact of choice of the cell partition on power
As is true for Pearson?s chi-square test, the choice of cell partition plays an
important role for our goodness-of-?t test. We now illustrate the impact of the cell
partition on the power of our test.
We let ?13 = .5 and ?23 = 0 and set ? = (?0,?1,?2,?3) = (1,1,1,.15), ?a = 1
and ?? = .5, for di?erent choices of cell partitions. Again, we generate y from
a model whose proper speci?cation includes x1, x2 and x3 but then analyze the
34
Table 2.4: Impact of cell partition on empirical power for Scenario I (LMM).
m = 500; 3 = :15; a = 1; ? = :5;K = 1000.
Parti  13 = 0; 23 = 0  13 = 0:2; 23 = 0:3  13 = 0:4; 23 = 0:5
L = 12 L = 42 L = 12 L = 42 L = 12 L = 42
x1 0.059 0.044 0.055 0.055 0.058 0.051
x2 0.052 0.044 0.044 0.046 0.041 0.053
x3 0.991 0.871 0.907 0.708 0.539 0.292
x1;x2 0.041 0.044 0.053 0.047 0.060 0.053
x1;x3 0.962 0.860 0.922 0.737 0.566 0.360
x2;x3 0.961 0.871 0.889 0.723 0.569 0.368
results of omitting x3 in the subsequent model ?tting. We choose six di?erent cell
partitions: partitions based on only x1, only x2, only x3, both x1 and x2, both x1
and x3, or both x2 and x3. For all cell partitions, we use empirical quartiles based
on data to divide the covariates. The number of replications in our simulation study
is K = 1000. The results in Table 2.4 show that a lack of ?t is detecTable by our
test statistic only when the cell partition involves the omitted covariate, x3. Similar
results are observed when x3 is independent of x1 and x2.
2.3.1.2 Robustness of T with respect to error distribution
In Table 2.3 we also assessed the impact of misspeci?cation of the error dis-
tribution on the power of the test statistic. Using the same setting as in the power
calculations given above, we generated ? from a t distribution with k = 3 or 5 de-
gree of freedom instead of from a N(0,?2?). We rescaled the variance of ? so that
Var(??tk?k ?2/?k) = ?2? to ensure that the noise had the same variance as in the
normal case. The power of the test under a t-distribution is virtually the same as
the power of the test with a normal error distribution. For example, for m = 50
35
the power was 0.83 for the normal error distribution and for t-distributions with 3
and 5 degrees of freedom (Table 2.3). Thus, by comparing the last three rows of
Table 2.3, we can see that our test is very robust with respect to symmetric error
distributions.
2.3.1.3 A summary parameter related to the power
The power of a statistical test is the probability that the test will reject the
null hypothesis when the null hypothesis is not true. The rejection ratio in Table
1, which is de?ned as total number of iterations with p value < .05 over the total
number of simulation iterations, can be considered as an estimator of the power for
our goodness-of-?t test. We performed many simulations to show that the rejection
ratio is closely related to a summary parameter ? which is de?ned as the ratio of
two variances. Suppose the true model is yij = h(x1ij,x2ij,x3ij) + ?i + ?ij, where
the true covariates that impact y are x1,x2 and x3. But we ?t the data only using
covariates x1 and x2. Then the summary parameter
? = Var[h(x1,x2,x3)?
?
x1,x2 h(x1,x2,x3)]
Var(?i +?ij) , (2.26)
where ?x1,x2 h(x1,x2,x3) = ?E(h(X1,X2,X3)|X1,X2) is the linear projection.
One simulation scenario we considered is similar to the one we already ex-
plained in Section 2.3.1, where both the random e?ect ?i and the error term ?ij are
generated from the normal distribution. The true model is de?ned to have covariate
36
vector generated from the multivariate normal distribution:
?
??
??
??
?
x1
x2
x3
?
??
??
??
?
? N
?
??
??
??
?
?
??
??
??
?
12
2
24
?
??
??
??
?
,
?
??
??
??
?
1 0 .5
0 .5 .8
.5 .8 2
?
??
??
??
?
?
??
??
??
?
.
Underthisjointdistribution, one veri?esthatx3|x1,x2 ?N(.5x1+1.6x2+14.8,0.47).
But we ?t the data only using x1 and x2, omitting x3. Then the summary parameter
(2.26) is simpli?ed to be ?23Var(x3|x1,x2)/Var(?) = .47?23/(?2a + ?2?). With the
covariate set x ?xed for each iteration, we change the magnitude of the coe?cient
vector ? and the two variance components ?2a and ?2?. For each set of parameters
{?,?2a,?2?}, there is a corresponding summary parameter ?, which we plotted on
the x-axis. We then ran simulations with 1000 iterations to get the corresponding
rejection ratio, which we plotted on the y-axis. The simulation results in Figure 2.5
indicate that the empirical power of the test increases as the summary parameter
? increases and this is true for three di?erent cell partitions. As we can see from
the graph, there is a nearly linear relationship between the estimated power and the
de?ned summary parameter ? and we can get the slope from the Taylor expansion
of the theoretical power around 0.
We then compared what we found here with what we saw by using the analytic
power formula for the special case when (x1,x2,x3) comes from a multivariate normal
(2.25), the simulation scenario we carefully studied in Section 2.3.1. In this case,
the numerator of ? in (2.26) becomes Var(x3|x1,x2) = 1 ? (?213 + ?223). Thus,
37
0.00 0.02 0.04 0.06 0.08
0.2
0.4
0.6
0.8
Empirical Power vs Summary Parameter
Summary Parameter
Empir
ical P
ower
Cell Partition on x3Cell Partition on x1,x3
Cell Partition on x2,x3
Figure 2.5: Empirical power vs summary parameter ?
? = (1?(?213 + ?223))/(?2a + ?2?). This agrees with what we found from Figure 2.2,
i.e. the power of test increases as ?213 + ?223 decreases. However, this ? doesn?t
include other information we got in Section 2.3.1, such as the way that the power
increases as ?3 increase and the relationship of the power to the relative magnitude
of ?2a and ?2?, given ?xed ?2a +?2?.
2.3.2 Normally distributed interacting covariates (Scenario II)
We again generatey from a linear model that depends on three covariatesx1,x2
and x3. However, we now let x1 and x2 arise from a bivariate normal distribution
with mean (1,0.5), with variances ?21 = ?22 = 1 and covariance ?12 = 0, and de?ne
x3 as their product, i.e. x3 = x1x2. Again, we choose m = 500 and generate the
cluster size ni from a uniform distribution on {2,3,4,5}. We let ?a = 1, ?? = .5,
38
? = (?0,?1,?2,?3) = (1,1,1,.2) and use empirical quantiles of the covariates to
de?ne the cell partition. First, we ?t model (2.3) with all covariates (x1,x2,x3)
that in?uence the response y and check the size of the test. Again, the empirical
sizes were all close to the nominal ? level under di?erent choices of cell partitions,
as shown in Table 2.5. We then ?t model (2.3) without x3 and study the power of
Table 2.5: Empirical size of the test for Scenario II under di?erent cell partitions with
standard deviations in brackets (LMM).
m = 500;E(N) = 1750; 3 = :2; a = 1; ? = :5; 12 = 0;K = 1000.
L  Emp. Size  Emp. Size
8(x1) 0.05 0.048 (0.0068) 0.1 0.102 (0.0096)
3?4(x1;x2) 0.05 0.045 (0.0066) 0.1 0.085 (0.0088)
5?4(x1;x3) 0.05 0.060 (0.0075) 0.1 0.098 (0.0094)
6?7(x2;x3) 0.05 0.051 (0.0070) 0.1 0.100 (0.0095)
the test using di?erent cell partitions. Results in Table 2.6 indicate that the test
has adequate power only when the cell partition is based on x1 and x2, or on the
omitted interaction term x3 = x1x2, but not if the cell partition is based on either
x1 or x2 alone, for ?12 = 0 and ?12 = .3.
We next study the impact of the magnitude of the variance components ?2a and
?2? on the theoretical power when the cell partition is based on the omitted covariate
Table 2.6: Impact of cell partition on empirical power for Scenario II (LMM).
m = 500;E(N) = 1750; 3 = :2; a = 1; ? = :5;K = 1000.
Partition  12 = 0  12 = 0:3
L = 12 L = 42 L = 12 L = 42
x1 0.049 0.049 0.256 0.182
x2 0.038 0.041 0.273 0.173
x3 0.893 0.771 0.859 0.749
x1;x2 0.991 0.966 0.989 0.975
x1;x3 0.843 0.938 0.885 0.939
x2;x3 0.936 0.912 0.956 0.928
39
0.05 0.10 0.15 0.20
0.2
0.4
0.6
0.8
1.0
Analytic Power Plot (Scenario II)
x
Analytic P
ower
Case 1Case 2
Case 3
Figure 2.6: The impact of (?2a,?2?) on analytical power, ?12 = 0, x-axis is
?3/(?2a +?2?). Case 1: ?2a = .25,?2? = 1; Case 2: ?2a = .625,?2? = .625; Case 3:
?2a = 1,?2? = .25.
x3 with L = 8 cells using ?xed cell boundaries. For ?12 = 0, Figure 2.6 plots the
theoretical power against ?3/(?2a+?2?) for three choice of (?2a,?2?) and varying ?3 on
the x-axis. Our conclusions are consistent with what we saw in Section 2.3.1. For
any ?xed pair of (?2a,?2?), the power of the test increases as a function of ?3, the
coe?cient of the omitted covariate x3. For any ?xed ?3, the power increases when
the random e?ect ?2a decreases compared to the error term ?2?.
2.4 Data examples
2.4.1 Birth weight data
These data were obtained from the book [19] and consist of fetal birth weights
of 432 boys from 108 families of size 4, and BMI of the mother, age of the mother,
40
0 40 80
2500
4500
Family index
Birthweight(grams)
(a) Birth weight in families
?3 ?1 1 2 3?1000
500
Theoretical Quantiles
Residuals
(b) Normal plot of res
?2 0 1 2
3000
4500
Theoretical Quantiles
Family means(grams)
(c) Family means qqnorm
20 30
2500
4500
BMI
Birthweight(grams)
(d) Birth weight vs BMI
Figure 2.7: Normality assumption checking for birth weight data.
birth order, gestation time and a family indicator. Lee et al. [19] analyzed this data
set using a linear mixed model with only an intercept term. To use all covariates,
we excluded individuals who had missing covariates and modi?ed possible recording
errors on the covariate order. The complete data set consists of 104 families with
370 individuals. Family sizes di?ered for di?erent families, but all family sizes were
four or less. We repeated the graphical analysis in Chapter 5 of [19] on the modi?ed
data and got similar results. There is a strong familial e?ect of birth weight and
the within-family variation is normally distributed. We conclude that the weight
data satisfy assumptions for (2.3). We tried various covariate combinations for the
?xed part in (2.3), including interactions. Based on the signi?cance of the covariates
and BIC (or AIC) criteria, the best model is the LMM with 3 covariates: intercept,
mom.age and gestation. Residual plots further con?rm the normality assumptions
in our LMM. We then apply the goodness of ?t test to our model. We use the cell
partition based on the covariate order, which is not in our ?nal model, and obtain
41
?3?11 3?1500
?500
0
500
1500
Theoretical Quantiles
Sample Quantiles
(b) qqnorm: a_i+e_ij
?2 0 2
?500
0
500
1000
Theoretical Quantiles
Sample Quantiles
(c) qqnorm: a_i+e_i.
Figure 2.8: Normality checking for residuals of ?nal model for birth weight data
the following number of observations in the cells: 61, 74, 79, 80, 31, 45. The value
of the test statistic is T = 2.44, which corresponds to a p value of .87 for a chi-
square distribution with 6 degrees of freedom. However, when we tested the ?t of
the intercept only model in [19] using the same cell partition, we also found that
it had adequate ?t to the data with T = 4.97. Thus both of these two models are
judged adequate by our test.
2.4.2 Alcohol data
These data come from a Women?s Alcohol Study [8], where 53 healthy, non-
smoking postmenopausal women completed a random-order, three-period (8-week
treatment for each period) study with a crossover design in which each woman re-
ceived 0, 15 or 30 g of alcohol per day. Participants were not told the amount
of alcohol they were consuming and each controlled feeding period was preceded
by a two to ?ve week washout period during which time the participant consumed
no alcohol. During the controlled feeding period, all food and beverages for the
42
?2 ?1 0 1 2
4.0
4.5
5.0
5.5
6.0
Theoretical Quantiles
Sample Quantiles
QQ plot for log(TG)
Figure 2.9: Normality checking for log(TG) for Alcohol data.
participants were supplied by the study investigators.
For each woman in the data set, three blood measurements for the three pe-
riods are recorded, corresponding to the three randomized alcohol intake periods.
We assessed the association of Plasma Triglycerides level (TG) with alcohol in-
take. Other relevant covariates were race, age, height, weight and BMI (Body Mass
Index). A log transformation of TG met the normality assumption by a formal
goodness of ?t check on the response variable. We then tried all possible covariate
combinations for the ?xed-e?ect terms of (2.3), including interactions. Assessed
by the signi?cance of the covariates and BIC (or AIC) criteria, our ?nal model in-
cluded an intercept, age, BMI and alcohol. The residual plots further con?rmed the
normality assumptions in (2.3). To apply our goodness of ?t test on this model,
we use the cell partition based on the covariates height and weight, which are not
in our ?nal model, and obtain the following number of observations in the cells:
19,19,13,17,15,11,13,15,14,22. The value of the test statistic is T = 8.98, which
corresponds to a p values of .53 for a chi-square distribution with 10 degrees of
43
?2 0 1 2
?0.5
0.0
0.5
Theoretical Quantiles
Sample Quantiles
qqnorm for a_i+e_ij
?2?10 1 2?0.6
?0.4
?0.2
0.0
0.2
0.4
0.6
0.8
Theoretical Quantiles
Sample Quantiles
qqnorm for a_i+e_i.
Figure 2.10: Normality checking for residuals of the ?nal model for Alcohol data
freedom. Thus our ?nal model ?tted well and we concluded, not surprisingly, that
alcohol intake a?ects Plasma Triglycerides levels.
2.4.3 Factors impacting thyroglobulin levels in an iodine de?cient
population
On April 26, 1986, an accident at the Chernobyl power plant located in north-
western Ukraine, close to the border with Belarus, released large amounts of radioac-
tive materials including iodine-131 (I-131) into the atmosphere from the destroyed
reactor. Deposition of these radioactive materials resulted in serious contamination
of the territory and exposed its residents. Because the thyroid gland concentrates
iodine, the doses to the thyroid due to consumption of I-131 contaminated milk were
much greater than those to any other organs. The National Cancer Institute, NIH
is involved in a cohort study in Belarusian individuals exposed to the accident [34].
While the main objective of the study is to evaluate the relationship between I-131
44
doses and risk of thyroid cancer, investigators were also interested in describing the
levels of iodine in this population, as historical data suggest that the study area
could be mildly iodine de?cient and iodine de?ciency impacts absorption of I-131.
We therefore evaluated the relationship between levels of serum thyroglobulin
(TG), a sensitive marker of iodine de?ciency, and patients? characteristics including
age, sex, thyroid volume and other demographic and clinical variables that might
re?ect or in?uence the intake of dietary iodine and were identi?ed in a initial screen
of variables.
We restrict our example to men from four of the ?ve study regions, who had
complete information on the covariates. After excluding observations with TG > 80,
log(TG) was normally distributed. The ?nal dataset was comprised of m = 933
individuals, of whom 404 had a single TG measurement, 484 had two, 42 three
and 3 four TG measurements during follow-up, resulting in a total of N = 1510
observations.
An initial screening of the variables, one at a time by simple linear regres-
sion, indicated that age at time of exam, age at time of the accident, rural or
urban residence, smoking status, urinary iodine levels, serum thyroid-stimulating
hormone (TSH) levels, serum anti-thyroglobulin antibody (ATG) levels, thyroid
volume, presence of thyroid nodules, presence of goiter and presence of any thyroid
abnormality may impact TG levels.
We ?t various of the LMM (2.3) model, including combinations of those co-
variates and their interaction terms using PROC GLIMMIX (SAS 9.2). A model
(Model 1) that included all the variables mentioned above, with the exception of
45
presence of nodules, and in addition included an interaction term of ATG levels
with presence of any thyroid abnormality that was marginally signi?cant (Wald p-
value p = 0.054), had a log-likelihood value of 1625.2. The random e?ect variance
estimate for Model 1 was ?2a = 0.29 and the error variance estimate was ?2? = 0.25.
A second model (Model 2) that had no interaction term, but included presence of
nodules, resulted in a log-likelihood of 1621.3. The variance component estimates
were similar to model 1, ?2a = 0.29 and ?2? = 0.27. However, as models 1 and 2 are
not nested, we could not compare them using a likelihood ratio test.
To assess the ?t of Models 1 and 2, we formed the cell partition for the test
based on L = 8 cells de?ned by quantiles of ATG and presence of any thyroid
abnormality. Based on a chi-squared test statistic with eight degrees of freedom,
there was no indication of lack of ?t for either model, with corresponding p-values
p = 0.32 and p = 0.40 for Models 1 and 2 respectively. We repeated the calculation
of the test statistic for a second cell partition based on presence of nodules and
presence of goiter, resulting in L = 4 cells, with corresponding p-values p = 0.19
and p = 0.70 for Models 1 and 2 respectively. These results suggested that both
models provided an adequate ?t to the data.
When we checked a third model, that included neither presence of nodules
not the interaction term (log-likelihood 1621.5), there was also no evidence of lack
of ?t based on eight (p = 0.39) or 4 degrees of freedom chi square tests (p = .21).
However, with a model that included all the covariates, the interaction term of ATG
levels with presence of any thyroid abnormality but excluded presence of goiter, we
did detect a lack of ?t in the cell partition de?ned by presence of nodules and goiter,
46
with p = 0.006, while there was no evidence of lack of ?t based on the L = 8 cell
partition (p = 0.28). This highlights the importance of presence of any thyroid
abnormality in the model. However, omitting the interaction term of this variable
with ATG levels does not a?ect ?t to the data.
2.5 Discussion
Schoenfeld (1980) presented a class of chi-squared goodness of ?t tests for the
proportional hazards regression model. We adapted this idea and proposed a class
of goodness of ?t tests for testing the statistical adequacy of a linear mixed model.
We described the asymptotic properties of the test when parameters were estimated
and developed its theoretical power under the local, or contiguous, alternative. We
assessed factors that impact the power, the impact of choice of cell partitions on
the test as well as the robustness of the test with respect to symmetric error distri-
bution in simulations. We found that when a speci?c covariate that is associated
with outcome is omitted, especially interaction terms or a covariate correlated with
covariates already in the model, cell partitions based on the omitted covariate result
in adequate power of the test. However, if the cell partition is based only on covari-
ates already in the model, this test has no power to detect any lack of model ?t.
We also found that the estimated theoretical power calculated using Le Cam?s third
lemma was reliable at least when the number of clusters m is above 50. However,
when m is very small, it may be advisable to rely on the empirical power computed
through simulations. Our test was also very robust to violations of the normality
47
assumption of the error distribution.
This goodness of ?t test can be used to test the statistical adequacy of the
?nally selected LMM in real application. The test statistic is very easy to implement,
as all that is needed in order to apply the test are the ?nal model parameter estimates
andtheirvariancecovariancematrix, whicharestandardoutputsfromanystatistical
software. As a note of caution, in applying the test one needs to check the rank of
the estimated variance covariance matrix ? in (2.13) to ensure the correct degrees
of freedom for the test statistic.
In Chapter 3, we extend this test statistic to assess the ?t of generalized linear
mixed models.
2.6 Technical details for Chapter 2
2.6.1 Proof of Theorem 2.3
Proof: Let J be the limit of the sample information matrix per observation,
J = lim
N??
1
N
?
??
?
? ?2l??i??j ? ?2l??i??j
? ?2l??i??j ? ?2l??i??j
?
??
?=
?
??
?
J?? J??
JT?? J??
?
??
?. (2.27)
The MLE ?? in model (2.3) is consistent as follows from results by Miller (1977). By
Taylor series expansion of the score function S(??), we obtain
?N(????
0) ?
(
? 1N ?S(?0)??
)?1 1
?NS(?0) ? J?1 1?NS(?0). (2.28)
48
Under model (2.3), J?? = 0 in (2.27) (Wand 2007). The Fisher information is
therefore a block diagonal matrix and J?1 =
?
??
?
J?1?? 0
0 J?1??
?
??
?. As Y?X?? N(0,V),
the score function for ?, which is the ?rst p components of S(?), is
S?(?) = (?/??)l(?) = XTV?1(Y?X?).
By extracting the ?rst p components of (2.28), we have
?N(????
0) ? J
?1
??
1?
NS?(?0) = J
?1
??X
TV?1(Y?X?
0)/
?N.
Thus,
?N
?
??
?
(f ?e(?0))/N
????0
?
??
? ?
?
??
??
??
??
??
?
N?1/2 [I{x11?E1}???I{xmnm?E1}]
...
N?1/2 [I{x11?EL}???I{xmnm?EL}]
N?1/2 J?1??XTV?1
?
??
??
??
??
??
?
(Y?X?0)
= D(L+p)?N(Y?X?0), (2.29)
which is a linear combination of Gaussian random variables.
Therefore, as N ??,
?N
?
??
?
(f ?e(?0))/N
????0
?
??
?
D? N(0,DVDT),
49
where
DVDT =
?
??
?
H  J?1??
J?1?? T J?1??
?
??
?,
with
 =
?
??
??
??
?
?T1
...
?TL
?
??
??
??
?
L?p
= lim
N??
?
??
??
??
?
N?1?mi=1?nij=1 I{xij?E1}xTij
...
N?1?mi=1?nij=1 I{xij?EL}xTij
?
??
??
??
?
, (2.30)
and H is a symmetric matrix of dimension L ? L. For l = 1,??? ,L ? 1; k =
l +1,??? ,L, the o?-diagonal elements of H are
Hlk = ?2a lim
N??
1
N
m?
i=1
[( n
i?
j=1
I{xij?El}
)( n
i?
j=1
I{xij?Ek}
)]
(2.31)
Similarly, for l = 1,??? ,L, the diagonal elements of H are
Hll = ?2? lim
N??
1
N
m?
i=1
ni?
j=1
I{xij?El} +?2a lim
N??
1
N
m?
i=1
( n
i?
j=1
I{xij?El}
)2
. (2.32)
Remark 2.13 If xij are random variables, under Assumption 2.2,
50
?l = E{x1,n1}(?n1j=1 I{x1j?El}x1j)/E(n1), and
Hlk = ?2a ? limm?? 1(?m
i=1 ni)/m
? 1m
m?
i=1
[( n
i?
s=1
I{xis?El}
)( n
i?
t=1
I{xit?Ek}
)]
= ?
2
a
E(n1) ?E(x1,n1)
[( n
1?
s=1
I{x1s?El}
)( n
1?
t=1
I{x1t?Ek}
)]
= ?
2
a
E(n1) ?En1
{
Ex1
[( n
1?
s=1
I{x1s?El}
)( n
1?
t=1
I{x1t?Ek}
) 
  n
1
]}
.
The expressions of Hlk and Hll depend on the joint distribution of xi and ni. Under
the more restrictive assumption that xij,i = 1,...,m;j = 1,...,ni are i.i.d. and
are independent of ni, then Hlk and Hll can be further simpli?ed as
Hlk = ?
2
a
E(n1) ?En1
{n
1Ex11
(I
{x11?El}I{x11?Ek}
)+ (n2
1 ?n1)E
(I
{x11?El}
)E(I
{x11?Ek}
)}
= ?2a ? E(n
2
1 ?n1)
E(n1) P
(I
{x11?El}
)P(I
{x11?Ek}
)
and
Hll = (?2? +?2a)E(I{x11?El})+?2aE(n
2
1 ?n1)
E(n1)
[
P(I{x11?El})
]2
.
2
51
2.6.2 Proof of Corollary 2.7
Proof: Under the asymptotic normality of ?N(????0), with A ? B denot-
ing A?B P? 0,
1?
N
(
f?e(??)
)
= 1?N (f?e(?0)) + 1?N
(
e(?0)?e(??)
)
? 1?N (f?e(?0))? 1?N?e(?0)
(?
???0
)
? 1?N (f?e(?0))? ?N
(?
???0
)
= (I | ? )?N
?
??
?
(f ?e(?0))/N
????0
?
??
?.
Since N?1/2
(
f?e(??)
)
is a linear combination of components in
?N
?
??
?
(f ?e(?0))/N
????0
?
??
?, thus
1?
N
(
f?e(??)
) D
? N(0, ), (2.33)
with  = H? J?1?? T.
2.6.3 Proof of Theorem 2.10
Since the 2-level LMM model (2.3) is a special case of the LMM model (2.2)
under normality assumptions for both random e?ects and the error term, the proof
of Theorem 2.10 for model (2.2) is quite similar to the proof of Theorem 2.3 for
model (2.3) as stated in Section 2.6.1. Two results that we need to use for the proof
52
of Theorem 2.10 are, again, the MLE consistency (Miller, 1977) and the fact that
the o?-diagonal matrix J?? of J, the limit of the sample information matrix per
observation, is 0 (Wand 2007, equation (3)). Key steps of the proof are ?rst to do a
Taylor expansion to the MLE ??, and then to use the fact that the response vector
Y is normally distributed to show the asymptotic normality of the observed minus
estimated expected vector. In this case, H = limN??FVFT is a symmetric L?L
matrix, with
F = 1?N
?
??
??
??
?
I{x1?E1}???I{xN?E1}
...
I{x1?EL}???I{xN?EL}
?
??
??
??
?
. (2.34)
and
 =
?
??
??
??
?
?T1
...
?TL
?
??
??
??
?
L?p
= lim
N??
?
??
??
??
?
1
N
?N
k=1 I{xk?E1}x
T
k
...
1
N
?N
k=1 I{xk?EL}x
T
k
?
??
??
??
?
. (2.35)
2.6.4 Proof of Theorem 2.12
We use the multivariate Central Limit Theorem to prove Theorem 2.12.
Proof: Let the ni ?p covariate matrix for the i-th cluster be
xi =
?
??
??
??
?
1 xi1,1 ??? xi1,p?1
... ... ...
1 xini,1 ??? xini,p?1
?
??
??
??
?
,
53
then with Vi given in (2.4),
?N(????
0) =
?N(XT ?V?1X)?1XT ?V?1(Y?X?
0)
?
(XTV?1X
N
)?1 1
?N
m?
i=1
xTi V?1i (yi ?xi?0).
Let zil =?nij=1 I{xij?El}(yij ?E(yij)) =?nij=1 I{xij?El}(yij ?xij?0),
i = 1,...,m; l = 1,...,L. Then
f?e(?0) =
?
??
??
??
?
?m
i=1 zi1
...
?m
i=1 ziL
?
??
??
??
?
,
and our test statistic is based on a quadratic form in
(f?e(??))/?N
= (f?e(?0))/?N +
(
e(?0)?e(??)
)
/?N
? (f?e(?0))/?N ??e(?0)(????0)/?N
? 1?N
?
??
??
??
?
?m
i=1 zi1
...
?m
i=1 ziL
?
??
??
??
?
? 1?N ?e(?0)N
(XTV?1X
N
)?1 m?
i=1
xTi V?1i (yi ?xi?0)
=
m?
i=1
1?
N
?
??
??
??
?
?
??
??
??
?
zi1
...
ziL
?
??
??
??
?
? ?e(?0)N
(XTV?1X
N
)?1
xTi V?1i (yi ?xi?0)
?
??
??
??
?
.
54
Let ? = N?1?e(?0), ?J?? = N?1XTV?1X. Then under Assumption 2.4, ?J?? P?
J??, and under Assumption 2.2, ? P?  (Remark 2.1), with  given in (2.30).
We next show that (f?e(??))/?N has a limiting Gaussian distribution by using
the multivariate Central Limit Theorem. For any constant vector C = (C1,...,CL),
since the inverse of Vi given in equation (2.4) is
Vi = Ini/?2? ??2a/(?2?(?2? +ni?2a))1
?2
,
we have
CTN?1/2
(
f?e(??)
)
?
m?
i=1
1?
N
[ L?
l=1
Clzil ?CT ? ?I?1??xTi V?1i (yi ?xi?0)
]
=
m?
i=1
1?
N
[ L?
l=1
Cl
ni?
j=1
I{xij?El}(yij ?xij?0)?
CT ? ?I?1??xTi
( 1
?2? Ini ?
?2a
?2?(?2? +ni?2a)1
?2)
(yi ?xi?0)
]
=
m?
i=1
1?
N
{ ni?
j=1
[ L?
l=1
ClI{xij?El}
]
(?i +?ij)?
CT ? ?I?1??
ni?
j=1
[ 1
?2? xij ?
ni?2a
?2?(?2? +ni?2a)?xi.
]
(?i +?ij)
}
=
m?
i=1
{
1?
N
ni?
j=1
[ L?
l=1
ClI{xij?El} ?CT ? ?I?1??
( 1
?2? xij ?
ni?2a
?2?(?2? +ni?2a)?xi.
)]}
?i
+
m?
i=1
ni?
j=1
1?
N
[ L?
l=1
ClI{xij?El} ?CT ? ?I?1??
( 1
?2? xij ?
ni?2a
?2?(?2? +ni?2a)?xi.
)]
?ij
=
m?
i=1
ci,ni?i +
N?
s=1
ws?s,
55
where the double index (i,j) is placed in one-to-one correspondence with the single
index s. Because{?i}mi=1 are i.i.d and{?s}Ns=1 are i.i.d, we can show both of the above
sums have limiting normal distributions as m ??, by checking the conditions (a)
and (b) of Lemma 5.1 in the Appendix. First we bound
|ci,ni|
= 1?N
  
  
 
ni?
j=1
{ L?
l=1
ClI[xij?El] ?CT ? ?I?1??
( 1
?2? xij ?
ni?2a
?2?(?2? +ni?2a)?xi.
)}   
  
= 1?N
  
  
 
ni?
j=1
L?
l=1
ClI[xij?El] ?CT ? ?I?1??
( 1
?2? ni?xi. ?
ni?2a
?2?(?2? +ni?2a)ni?xi.
)   
  
= 1??m
i=1 ni/m
1?
m
  
  
 
L?
l=1
Cl
ni?
j=1
I{xij?El} ?CT ? ?I?1?? ni?
2
a
?2? +ni?2a
1
?2a?xi.
  
  
 
? 1??m
i=1 ni/m
1?
m
(
ni
  
  
 
L?
l=1
Cl
  
  
 +
  
  CT ? ?I?1?? 1
?2a?xi.
  
  
)
? 1?E(n) ?0 ?i = 1,...,m
Next,
m?
i=1
c2i,ni
= 1?m
i=1 ni/m
1
m
m?
i=1
{ L?
l=1
Cl
ni?
j=1
I{xij?El} ?CT ? ?I?1?? ni?
2
a
?2? +ni?2a
1
?2a?xi.
}2
? 1?m
i=1 ni/m
1
m
m?
i=1
2
?
?
?
( L?
l=1
Cl
)2
n2i +
( 1
?2a
)2
CT ? ?I?1???xi.?xTi.?I?1?? ? TC
?
?
?
P? 2
E(n)
?
?
?
( L?
l=1
Cl
)2
En2 + 1?4
a
CT ? ?I?1??
[
limm?? 1m
m?
i=1
?xi.?xTi.
]
?I?1?? ? TC
?
?
?
= a ?nite constant. (2.36)
56
Thus condition (a) in Lemma 5.1 holds, i.e.
max
1?i?m
|ci,ni| P? 0, as m ??.
Since (xi,ni) are assumed i.i.d., and based on (2.36), the c2i,ni terms can be
written as g(xi,ni)/m after removing the factor m/(?mi=1 ni). This function g is the
same across i, with E(g(xi,ni)) < ?. Thus the Law of Large Numbers Theorem
holds for (xi,ni), condition (b) in Lemma 5.1 holds, i.e.
m?
i=1
c2i,ni P? a ?nite constant.
Then ?mi=1 ci,ni?i is asymptotically normally distributed since both conditions (a)
and (b) in Lemma 5.1 are satis?ed.
We do the same checking for ?Ns=1 ws?s.
|ws| = 1?N
  
  
 
L?
l=1
ClI{xij?El} ?C? ? ?I?1??
( 1
?2? xij ?
ni?2a
?2?(?2? +ni?2a)?xi.
)   
  
? 1?N
[  
  
 
L?
l=1
ClI{xij?El}
  
  
 +
  
  C? ? ?I?1?? 1
?2? xij
  
  +
  
  C? ? ?I?1?? ni?2a
?2?(?2? +ni?2a)?xi.
  
  
]
? 1?N
[ L?
l=1
|Cl|+ 1?2
?
  
 C? ? ?I?1??xij
  
 +
  
 C? ? ?I?1???xi.
  
 
]
P? 0 ?s = 1,...,N. (2.37)
57
Based on (2.37) and the fact that (x+y +z)2 ? 3(x2 +y2 +z2), we have
N?
s=1
w2s = 1N
N?
s=1
[ L?
l=1
ClI{xij?El} ?C? ? ?J?1??
( 1
?2? xij ?
ni?2a
?2?(?2? +ni?2a)?xi.
)]2
? 1N
N?
s=1
[ L?
l=1
|Cl|+ 1?2
?
  
 C? ? ?J?1??xij
  
 + 1?2
?
  
 C? ? ?J?1???xi.
  
 
]2
? 1N
N?
s=1
3
?
?
( L?
l=1
|Cl|
)2
+
( 1
?2?
)2  
 C? ? ?J?1??xij
  
 
2 +( 1
?2?
)2  
 C? ? ?J?1???xi.
  
 
2
?
?
P? 3
( L?
l=1
|Cl|
)2
+3
( 1
?2a
)2   
  C? ? ?J?1??
(
lim
N??
1
N
m?
i=1
ni?
j=1
xijx?ij
)
?J?1?? ? ?C
  
  
 
+ 3
( 1
?2a
)2   
  C? ? ?J?1??
(
lim
N??
1
N
m?
i=1
ni?xi.?x?i.
)
?J?1?? ? ?C
  
  
 
= a ?nite constant.
Therefore, we have
max
1?s?N
|ws| P? 0, as N ??.
With the same argument that we did on ?mi=1 ci,ni?i, we get the asymptotic nor-
mality of ?Ns=1 ws?s by checking conditions (a) and (b) of Lemma 5.1.
We have shown above that both?mi=1 ci,ni?i and?Ns=1 ws?s are asymptotically
normal. Because ?i and ?ij are independent, the sums ?mi=1 ci,ni?i and ?Ns=1 ws?s
are conditionally independent given (xi,ni). These two sums are jointly normal and
asymptotically uncorrelated. Therefore they are asymptotically independent. Thus
the limiting distribution of C?N?1/2(f?e(??)), which is asymptotically the sum of
these two quantities, is normal. Moreover, for any constant vector C, its limiting
58
variance is of the form CT C with the same ?xed  ,
 = lim
N??
1
N
m?
i=1
Var
?
??
??
??
?
?ni
j=1 I{xij?E1}(yij ?xij?0)
...
?ni
j=1 I{xij?EL}(yij ?xij?0)
?
??
??
??
?
?
lim
N??
[?e(?
0)
N
][XTV?1X
N
]?1[?e(?
0)
N
]T
= H? J?1?? T.
Therefore, N?1/2(f ? e(??)) D? N(0, ), and (f ? e(??))T ?1(f ? e(??))/N D? ?2k,
where k = rank( ). We replace  with ? , the reconstructed matrix by applying
Singular Value Decomposition on a consistent estimator of  . One such consistent
estimator of  is to replace all parameters in  with their MLEs. Based on Lemma
5.3, rank (? ) = rank ( ) for large N. Thus
(f?e(??))T ? ?1(f?e(??))/N D? ?2k.
2.6.5 Derivation of the power of T
We derive the power of the test for multi level LMM (2.1) under contiguous
alternatives, based on Le Cam?s third lemma (Van der Vaart, 2000), as stated below.
Lemma 2.14 (Le Cam?s third lemma) Let Pm and Qm be two measures on a
measurable space, corresponding to a null distribution under investigation, and an
59
alternative hypothesis respectively. If
(Wm,log dQmdP
m
) Pm?NL+1
?
??
?
?
??
?
?
??2?/2
?
??
?,
?
??
?
? ?
?T ?2?
?
??
?
?
??
?, (2.38)
then Wm Qm? NL(?+?,?). 2
Let
H0 :?N =?0,
H1 :?N =?0 + a?N,
where a is a constant vector, a/?N ? 0, as N ? ?. Thus ?N ??0, as N ? ?.
By Taylor expansion, under Theorem 5.21 in Van der Vaart (2000),
log dQNdP
N
= log Likelihood(?N;Y,X)Likelihood(?
0;Y,X)
?= log L(?N)
L(?0)
? (?log(L(?0)))T a?N + 12 a
T
?N (??2 log(L(?0))) a?N
? (SN(?0))T a?N ? 12aTJ(?0)a,
60
where
SN(?0) = ?log(L(?0))
=
?
??
??
??
?
XTV?1(Y?X?0)
?12tr(V?1 ?V??2
a
)+ 12(Y?X?0)TV?1 ?V??2
a
V?1(Y?X?0)
?12tr(V?1)+ 12(Y?X?0)TV?1V?1(Y?X?0)
?
??
??
??
?
(2.39)
is the score function for all observations (Chapter 6, McCulloch and Searle, 2001).
Under Assumptions 2.4 and 2.5, we get the existence of
J(?0) = lim
N??
??
?2
log(L(?0))/N,
the limit of the sample Fisher information per observation and
lim
N??
Var(SN(?0)/?N) = J(?0).
Thus
log dQNdP
N
PN? N(?1
2a
TJ(?0)a,aTJ(?0)a).
For the special case when we ?t a reduced model to the data, using X?N?p? instead
of XN?p with p? < p, we only estimate the coe?cient ?? corresponding to X?. The
e(?) function in (2.7)
el(?) =
m?
i=1
ni?
j=1
I{xij?El}E(yij) =
m?
i=1
ni?
j=1
I{xij?El}xTij?
61
has Rp as its domain. Let function e?(?) have the same meaning of function e(?),
but with Rp? as its domain.
e?l(??) =
m?
i=1
ni?
j=1
I{xij?El}E?(yij) =
m?
i=1
ni?
j=1
I{xij?El}(x?ij)T??.
Let WN = (f?e?(???))/?N be the ?rst vector component of (2.38). Under the null
hypothesis PN, WN is asymptotically normal, WN?N(0, ?), based on Corollary
2.7.
Next, we compute the variance-covariance matrix  in (2.38), which is equiv-
alent to the variance-covariance matrix of aTSN(?0)/?N and (f?e?(???))/?N.
1?
N(f?e
?(???)) ? 1?
N(f?e
?(??
0))?
1?
N?e
?(??
0)( ??
? ???
0)
? 1?N(f?e?(??0))? ??N( ??? ???0)
? 1?N(f?e?(??0))? ?(J???)?1(X?)TV?1(Y?X???0)/?N
= 1?N(A?B)?(Y?X???0),
where J??? denotes the information matrix when the second derivative for the log-
likelihood function is taken with respect to ??, and
A =
?
??
??
??
?
I{x11?E1}???I{xmnm?E1}
...
I{x11?EL}???I{xmnm?EL}
?
??
??
??
?
, B =  ?(J???)?1(X?)TV?1.
62
Thus,
Cov(f?e
?(???)
?N ,log dQndP
n
)
= Cov(f?e
?(???)
?N , a
TSN(?0)
?N )
= 1NCov(f?e?(???),aT1 S? +a2S?2a +a3S?2?). (2.40)
Under equation (2.39), since both tr(V?1(?V/??2a)) and tr(V?1) are constants, we
have
Cov
(
f?e?(???),S?2a
)
= Cov
(
(A?B)?(Y?X???0), 12(Y?X?0)TV?1 ?V??2
a
V?1(Y?X?0)
)
= (A?B) Cov
(
Y?X???0, 12(Y?X???0)TV?1 ?V??2
a
V?1(Y?X???0)
)
= 0,
because Cov(Y?X???0, 12(Y?X???0)TV?1 ?V??2
a
V?1(Y?X???0)) = 0.
Similarly, we get
Cov(f?e?(???),S?2?) = 0.
63
Therefore (2.40) becomes
Cov(f?e
?(???)
?N ,log dQndP
n
)
= 1N Cov
(
f?e?(???),aT1 S?
)
= 1N Cov((A?B)?(Y?X???0),aT1 XTV?1(Y?X???0))
= 1N (A?B) Var(Y) V?1Xa1
= 1N (A?B)Xa1
=
{
 ? 1N ?(J???)?1[(X?)TV?1X]
}
a1
=
{
 ? ?[(X?)TV?1(X?)]?1[(X?)TV?1X]
}
a1. (2.41)
Since both f?e?(???) and aT1 S? can be written as a matrix multiply by the same
normal vector Y?X?0 = Y?X???0, we easily get the asymptotic joint normality
of f ? e?(???) and aT1 S?. Because f ? e?(???) is asymptotically uncorrelated with
both S?2a and S?2? as shown in the above context, we also get the asymptotic jointly
normality of f?e?(???) and aTSN(?0). 2
We next calculate the limits of the terms in
Cov
(
N?1/2(f?e?(???)),log(dQn/dPn)
)
in Section 2.6.5.1 and 2.6.5.2.
2.6.5.1 Limit of (X?)TV?1X and (X?)TV?1(X?) in (2.22)
The analytical expressions in (2.41) calculated in this subsection as well as in
Subsection 2.6.5.2 are used to get the theoretical powers in the settings discussed in
the simulation Section 2.3, where xij,i = 1,...,m;j = 1,...,ni are i.i.d., and xij
64
and ni are independently generated.
Note that (X?)TV?1(X?) is a submatrix of (X?)TV?1X. To get the limit of
(X?)TV?1X/N forthesettinginthesimulationSection, weassumeX = (1,x1,x2,x3)
andX? = (1,x1,x2). The ith block matrix (2.4) can then be written as ?2?Ini+?2a1?2ni ,
with its inverse matrix being Ini/?2? ??2a/(?2?(?2? +ni?2a))1?2ni . Thus
65
1
N(X
?)TV?1X
= 1N
?
??
??
??
??
??
??
??
?
...
1 xi1,1 xi1,2
...
1 xini,1 xini,2
...
?
??
??
??
??
??
??
??
?
T
[diag(?2
?Ini +?
2
a1
?2
ni )
]?1
?
??
??
??
??
??
??
??
?
...
1 xi1,1 xi1,2 xi1,3
...
1 xini,1 xini,2 xini,3
...
?
??
??
??
??
??
??
??
?
= 1N
m?
i=1
?
??
??
??
?
1 ??? 1
xi1,1 ??? xini,1
xi1,2 ??? xini,2
?
??
??
??
?
[ 1
?2? Ini ?
?2a
?2?(?2? +ni?2a)1
?2
ni
]
?
?
??
??
??
?
1 xi1,1 xi1,2 xi1,3
...
1 xini,1 xini,2 xini,3
?
??
??
??
?
= 1?2
?
1
N
m?
i=1
?
??
??
??
?
1 ??? 1
xi1,1 ??? xini,1
xi1,2 ??? xini,2
?
??
??
??
?
?
??
??
??
?
1 xi1,1 xi1,2 xi1,3
...
1 xini,1 xini,2 xini,3
?
??
??
??
?
?
1
N
m?
i=1
?2a
?2?(?2? +ni?2a)
?
??
??
??
?
1 ??? 1
xi1,1 ??? xini,1
xi1,2 ??? xini,2
?
??
??
??
?
1?2ni
?
??
??
??
?
1 xi1,1 xi1,2 xi1,3
...
1 xini,1 xini,2 xini,3
?
??
??
??
?
.
66
Here the ?rst sum is
1
?2?
1
N
m?
i=1
?
??
??
??
?
1 ??? 1
xi1,1 ??? xini,1
xi1,2 ??? xini,2
?
??
??
??
?
?
??
??
??
?
1 xi1,1 xi1,2 xi1,3
...
1 xini,1 xini,2 xini,3
?
??
??
??
?
= 1?2
?
1
N
m?
i=1
ni?
j=1
?
??
??
??
?
1 xij,1 xij,2 xij,3
xij,1 x2ij,1 xij,1xij,2 xij,1xij,3
xij,2 xij,1xij,2 x2ij,2 xij,2xij,3
?
??
??
??
?
.
As N ??, its limit is
1
?2?
?
??
??
??
?
1 Ex1 Ex2 Ex3
Ex1 Ex21 E(x1x2) E(x1x3)
Ex2 E(x1x2) Ex22 E(x2x3)
?
??
??
??
?
.
The second sum is
1
N
m?
i=1
?2a
?2?(?2? +ni?2a)
?
??
??
??
?
1 ??? 1
xi1,1 ??? xini,1
xi1,2 ??? xini,2
?
??
??
??
?
1?2ni
?
??
??
??
?
1 xi1,1 xi1,2 xi1,3
...
1 xini,1 xini,2 xini,3
?
??
??
??
?
= 1N
m?
i=1
?2a
?2?(?2? +ni?2a)?
?
??
??
??
?
n2i ni?nij=1 xij,1 ni?nij=1 xij,2 ni?nij=1 xij,3
ni?nij=1 xij,1 (?nij=1 xij,1)2 d1 d2
ni?nij=1 xij,2 d1 (?nij=1 xij,2)2 d3
?
??
??
??
?
67
where d1 = (?nij=1 xij,1)(?nij=1 xij,2), d2 = (?nij=1 xij,1)(?nij=1 xij,3),
d3 = (?nij=1 xij,2)(?nij=1 xij,3) and
1
N
m?
i=1
?2a
?2?(?2? +ni?2a)n
2
i =
1
(?mi=1 ni)/m ?
1
m
m?
i=1
?2a
?2?(?2? +ni?2a)n
2
i
P? 1
E(n) ?E
[ ?2a
?2?(?2? +n?2a) ?n
2]
= 1E(n) ? ?
2
a
?2? ?c1,
where c1 = E(n2/(?2? +n?2a)).
1
N
m?
i=1
?2a
?2?(?2? +ni?2a)ni
ni?
j=1
xij,1 = 1(?m
i=1 ni/m)
? 1m
m?
i=1
ni?2a
?2?(?2? +ni?2a)
( n
i?
j=1
xij,1
)
P? 1
E(n) ?E(x,n)
[
n?2a
?2?(?2? +n?2a)
( n?
j=1
xj,1
)]
= 1E(n) ?En
{
Ex
[
n?2a
?2?(?2? +n?2a)
( n?
j=1
xj,1
) 
  n
]}
= 1E(n) ?En
{ n2?2
a
?2?(?2? +n?2a) ?EX1
}
= EX1E(n) ? ?
2
a
?2? ?c1.
68
1
N
m?
i=1
?2a
?2?(?2? +ni?2a)
( n
i?
j=1
xij,1
)( n
i?
j=1
xij,2
)
= 1(?m
i=1 ni)/m
? 1m
m?
i=1
?2a
?2?(?2? +ni?2a)
( n
i?
j=1
xij,1
)( n
i?
j=1
xij,2
)
P? 1
E(n) ?E(x,n)
{
?2a
?2?(?2? +n?2a)
( n?
j=1
xj,1
)( n?
j=1
xj,2
)}
= 1E(n) ?En
{
Ex
[
?2a
?2?(?2? +n?2a)
( n?
j=1
xj,1
)( n?
j=1
xj,2
) 
  n
]}
= 1E(n) ?En
{ ?2
a
?2?(?2? +n?2a)
[nE(x
1x2) + (n2 ?n)?Ex1 ?Ex2
]}
= 1E(n) ? ?
2
a
?2?
{
En
( n
?2? +n?2a
)
?E (x1x2) +En
( n2 ?n
?2? +n?2a
)
?Ex1Ex2
}
.
Let c2 = En[n/(?2? +n?2a)], then the above limit is
1
E(n) ?
?2a
?2? {c2 ?E(x1x2)+c3 ?Ex1Ex2},
where c3 = c1 ?c2.
Combining the above expressions, we have the limit for the second component of
1
N(X
?)TV?1X:
1
E(n) ?
?2a
?2?
?
??
??
??
?
c1 c1Ex1 c1Ex2 c1Ex3
c1Ex1 h1 h2 h4
c1Ex2 h2 h3 h5
?
??
??
??
?
,
with h1 = c2Ex21 + c3(Ex1)2, h2 = c2E(x1x2) + c3Ex1Ex2, h3 = c2Ex22 + c3(Ex2)2,
h4 = c2E(x1x3) +c3Ex1Ex3 and h5 = c2E(x2x3)+c3Ex2Ex3.
69
Finally,
1
N(X
?)TV?1X = 1
?2?
1
N
m?
i=1
(X?i)TXi ? 1N
m?
i=1
?2a
?2?(?2? +ni?2a)(X
?
i)
T1?2
ni Xi
P? 1
?2?
?
??
??
??
?
1 Ex1 Ex2 Ex3
Ex1 Ex21 E(x1x2) E(x1x3)
Ex2 E(x1x2) Ex22 E(x2x3)
?
??
??
??
?
? 1E(n)?
2
a
?2?
?
??
??
??
?
c1 c1Ex1 c1Ex2 c1Ex3
c1Ex1 h1 h2 h4
c1Ex2 h2 h3 h5
?
??
??
??
?
,
where c1 = En[n2/(?2? +n?2a)], c2 = En[n/(?2? +n?2a)] and c3 = c1 ? c2, h4 =
c2E(x1x3) +c3Ex1Ex3 and h5 = c2E(x2x3) +c3Ex2Ex3.
Since (X?)TV?1X? is a submatrix of (X?)TV?1X,
 ? = lim
N??
1
N(X
?)TV?1X?
= 1?2
?
?
??
??
??
?
1 Ex1 Ex2
Ex1 Ex21 E(x1x2)
Ex2 E(x1x2) Ex22
?
??
??
??
?
? 1E(n)?
2
a
?2?
?
??
??
??
?
c1 c1Ex1 c1Ex2
c1Ex1 h1 h2
c1Ex2 h2 h3
?
??
??
??
?
.
2.6.5.2 Limit of ? in (2.22)
We discuss the case when the cell partition is based on x3. Again, the results
got from this Section is used to calculate the theoretical power in the scenarios of
Section 2.3.
70
Let El be the lth cell of the cell partition, l = 1,...,L, then the elements of
? are:
1
N
m?
i=1
ni?
j=1
I{xij,3?El} P??
?
El
f3(x3)dx3
1
N
m?
i=1
ni?
j=1
I{xij,3?El}xij,1 P??
?
x1
?
El
x1f(x1,x3)(x1,x3)dx3dx1
1
N
m?
i=1
ni?
j=1
I{xij,3?El}xij,3 P??
?
El
x3f3(x3)dx3.
When (x1,x2,x3) are jointly normal, which is the Scenario I in Section 2.3.1, F3(x3)
and f(x1,x3)(x1,x3) are the corresponding normal and bivariate normal densities.
When x3 = x1x2 and x1 and x2 independent, i.e. ?12 = 0, which is the Scenario
II in Section 2.3.2, we can calculate F3(x3) and f(x1,x3)(x1,x3) as follows:
F3(z) = P(x1x2 ? z)
= P(x1x2 ? z,x1 > 0) +P(x1x2 ? z,x1 < 0)
=
? ?
{x2<z/x1,x1>0}
f(x1x2)(x1,x2)dx1dx2 +
? ?
{x2>z/x1,x1<0}
f(x1x2)(x1,x2)dx1dx2
=
? ?
x1=0
? z
x1
x2=??
fx1(x1)fx2(x2)dx1dx2 +
? 0
x1=??
? ?
x2= zx1
fx1(x1)fx2(x2)dx1dx2
=
? ?
0
F2
( z
x1
)
f1(x1)dx1 +
? 0
??
[
1?F2
( z
x1
)]
f1(x1)dx1
f3(z) = dF3(z)dz =
? ?
0
f2
( z
x1
) 1
x1f1(x1)dx1 ?
? 0
??
f2
( z
x1
) 1
x1f1(x1)dx1.
To get the joint distribution of (x1,x3) = (x1,x1x2),
71
let
?
???
???
y1 = x1
y2 = x1x2
, then equivalently,
?
???
???
x1 = y1
x2 = y2y1
.
So the Jacobian matrix J =
  
  
  
  
?x1
?y1
?x1
?y2
?x2
?y1
?x2
?y2
  
  
  
  
=
  
  
  
  
1 0
?y2y2
1
1
y1
  
  
  
  
= 1|y1|
Thus
f(y1,y2)(y1,y2) = f(x1,x2)
(
y1, y2y
1
)
?J
= fx1(y1)?fx2
(y
2
y1
)
? 1|y
1|
Similarly, the joint distribution of (x2,x3) is
f(x2,x3)(x2,x3) = f(x1,x2)
(x
3
x2,x2
)
? 1|x
2|
= fx1
(x
3
x2
)
fx2(x2)? 1|x
2|
Remark 2.15 The limiting terms calculated in Section 2.6.5.1 and 2.6.5.2 are used
to calculate the theoretical power in Section 2.2.3. Figure 2.1, 2.2 and 2.3 are plotted
by making use of these limit expressions. 2
72
Chapter 3
Goodness of ?t tests for generalized linear mixed models
3.1 Generalized linear mixed models (GLMMs)
For i = 1,...,m, and j = 1,...,ni, let xij (with the ?rst component be-
ing 1) and yij be the covariate and outcome value for the jth subject in cluster
i respectively, and let yi = (yi1,...,yini). The GLMM has the following form:
E(yij|ui) = g(xTij?+wTijui), where g(.) is a known strictly monotonic and di?eren-
tiable function, the i.i.d. cluster random e?ects ui are assumed to be from a known
probability distribution, fu(ui), with unknown parameter vector ? = (?21,...,?2s),
and wij are covariates for the random e?ects. We assume that the conditional dis-
tribution of yij given ?ij = xTij?+wTijui and ?, follows a distribution from the expo-
nential family with density function fy|?(y|?,?) = exp{[yQ(?)?b(?)]a(?)+c(y,?)},
where ? is generally an unknown parameter related to Var(yij). In this Chapter,
we restrict the exponential family to the canonical form, that is Q(?) = ?. One can
always de?ne a transformed parameter to convert an exponential family to canonical
form. The response variables yij are conditionally independent given the random
e?ect ui. The covariates xij are usually treated as ?xed in practice. However, to
deal with technical issues arising when we prove the consistency of the maximum
likelihood estimators and the asymptotic properties of the test statistic, we assume
throughout this chapter that (xi,ni) are i.i.d.
73
The likelihood function for ? = (?,?,?) is
L(?) =
m?
i=1
? ni?
j=1
fy|?(yij|?ij)dF(?ij).
The maximum likelihood estimator (MLE) of ?, denoted by ??, is the solution to
S(?) = 0, where the S denotes the score function
S(?) = ?logL(?)?? =
m?
i=1
Si(?).
In what follows, we will focus on GLMMs with a single random intercept,
given by
E(yij|?i) = g(xTij?+?i), ?i ? N(0,?2). (3.1)
In this case, the parameter vector ? for variance components reduces to a single
parameter ?2.
The linear mixed model, which is an important case of GLMMs, has been
carefully studied in Chapter 2. We now describe another two special cases of the
GLMMs with random intercept.
3.1.1 Mixed-e?ects logistic models
For mixed-e?ects logistic models with a cluster speci?c random intercept,
yij given pij, where pij = P(yij = 1), comes from a binomial distribution: yij ?
74
Binom(1,pij) and logit(pij) = log(pij/(1?pij)) = xTij?+?i, where the i.i.d. cluster
random e?ects ?i are assumed to be N(0,?2). The response variables yij are condi-
tionally independent given ?i. The unknown parameter vector is now ? = (?,?2).
The marginal probability of the response for the ith cluster, conditionally given
(xij,ni), under the random intercept logistic mixed model
yij ? Binom(1,pij), logit(pij) = xTij?+?i, ?i ? N(0,?2) (3.2)
is
P(Yij = yij) =
? ni?
j=1
pyijij q1?yijij dF(?i),
where qij = 1?pij.
The means and variance-covariances for yij, i = 1,...,m; j = 1,...,ni, ,
conditionally given (xij,ni), are
E(yij) = E(E(yij|?i)) = E(pij) =
?
pij dF(?i),
Var(yij) = E[Var(yij|?i)] +Var[E(yij|?i)]
= E[pij(1?pij)] +Var(pij) = E(pij)?[E(pij)]2,
Cov(yis,yit) =
?
pispitdF(?i)?E(pis)E(pit), ?s ?= t.
75
3.1.2 Mixed-e?ects Poisson models
For the mixed-e?ects Poisson regression model with a cluster speci?c random
e?ect, responses yij given ?ij, i = 1,...,m, j = 1,...,ni, follow Poisson distribu-
tions with means ?ij and g?1(?ij) = xTij? + ?i, where the i.i.d. cluster random
e?ects ?i are assumed N(0,?2) and g is a known monotonic and di?erentiable link
function. When g?1 is the log function, this link function is canonical, transforming
the mean ?ij to the natural exponential parameter. The response variables yij are
conditionally independent given the random e?ects ?i. The unknown parameter
vector in this case is ? = (?,?2).
The marginal probability of the response for the ith cluster, conditionally given
(xij,ni), under the mixed-e?ect Poisson regression model is
P(Yij = yij) =
? ni?
j=1
?yijij exp(??ij)
yij! dF(?i).
When the link function is canonical, as we assume from now on, the means
and variance-covariances for the response variables yij, i = 1,...,m; j = 1,...,ni,
conditionally given (xij,ni), are
E(yij) = E[E(yij|?ij)] = E(?ij) = exp(xTij?+?2/2),
Var(yij) = E[Var(yij|?ij)] +Var[E(yij|?ij)]
= exp(xTij?+?2/2) + exp(2xTij?+ 2?2)?exp(2xTij?+?2),
76
Cov(yis,yit) = [exp(2?2)?exp(?2)]exp(xTis?+xTit?), ? s ?= t.
3.2 Proof of the consistency of MLE for GLMMs
Let f(yi;?) be the likelihood function for the ith cluster, i = 1,...,m. The
log likelihood function for the whole set of observations is ?mi=1 logf(yi;?). The
normalized score function is
Sm(?) = 1m
m?
i=1
?logf(yi;?) = 1m
m?
i=1
(?/??)logf(yi;?).
The maximum likelihood estimator (MLE) ?? solves Sm(?) = 0. We always assume
natural and canonical parameterization. With probability 1 as m gets large, the
MLE exists and is unique (Bickel and Doksum 2006).
Let B?(?0) = {? : d(?,?0) ? ?} be the ?-neighborhood of the true parameter
vector ?0, which is an open convex Borel set in Rp. Let
?Mm(?) = 1
m
m?
i=1
logf(yi;?), ?M(?) = E[logf(yi;?)]
and
J(?;?0) = ?E?0[?
?2
logf(y1,x1,n1;?)].
The following set of assumptions (Assumption 3.1) are used for the consistency proof
of the MLE ?? as well as for the asymptotic properties of the test statistic that will
be discussed in the Section following. We comment on this set of Assumptions in
Remark 3.1 below.
77
Assumption 3.1 B.0 (xi,ni) are i.i.d.
B.1 ?M(?) and J(?;?0) exist for all ?;
B.2 J(?0;?0) is positive de?nite;
B.3 J(?;?0) is continuous at ?0 as a function of ?;
B.4 (?/??)??2 logf(y1,x1,n1;?) exists and is integrable .
B.5 Derivatives and expectations are interchangeable for logf(y1,x1,n1;?)
up to the third derivative;
B.6 The true parameter point ?0 = (?0,?20) is an interior point of  =
(Rp,R+);
Remark 3.1 B.1 in Assumption 3.1 ensures that we can apply the Law of Large
Numbers (LLN) to (m?1)?mi=1 logf(yi,xi,ni;?) and
?(m?1)?mi=1??2 logf(yi,xi,ni;?). In order for B.1 ? B.5 to hold, assumptions
are needed on (xi,ni). We provide su?cient assumptions on (xi,ni) and check B.2
for the random intercept logistic mixed model stated as Lemma 3.8 in Section 3.6.1.
To check B.3 and B.5 in Assumption 3.1 , based on the Dominated Convergence
Theorem, it su?ces to show that ? ??  , there exists B?(?), such that
?E?0[ sup
???B?(?)
??2 logf(y1,x1,n1;??)] < ?. (3.3)
We refer to condition (3.3) as the dominatedness condition. We provide su?cient
assumptions on (xi,ni) and check (3.3) for both the random intercept logistic mixed
model (stated as Lemma 3.9 in Section 3.6.2 ) and the random intercept Poisson
mixed model (stated as Lemma 3.10 in Section 3.6.3). 2
78
Lemma 3.2 Under Assumption 3.1 within model (3.1), there exists an open neigh-
borhood U of ?0, such that
P( ?Mm(?) is a concave function of ? on U) ? 1, as m ? +?. (3.4)
Proof: Based on the existence of ?M(?) and J(?;?0) as stated in B.1 of Assumption
3.1, by the Law of Large Numbers on logf(yi,xi,ni;?) and ??2 logf(yi,xi,ni;?),
pointwise for each ?? ?,
?Mm(?) P? ?M(?), (3.5)
and
? 1m
m?
i=1
??2 logf(yi,xi,ni;?) ? J(?;?0), as m ? +?. (3.6)
By B.2 { B.3 of Assumption 3.1, there exists U = B?(?0) such that ???U,
J(?;?0) is positive de?nite. By B.1 together with B.4 of Assumption 3.1, the
uniform LLN holds for (m?1)?mi=1??2 logf(yi,xi,ni;?) (van der Vaart, 2000, page
271, Example 19.7), i.e.
sup
??U
?? 1m
m?
i=1
??2 logf(yi,xi,ni;?)?J(?;?0)?? 0, as m ? +?.
Then it follows that ?M(?) = (m?1)?mi=1 logf(yi,xi,ni;?) is a concave function of
? ?U with probability approaching 1. 2
To show consistency of the MLE, we use two theorems from Andersen and
Gill (1982) and van der Vaart (2000) which are stated here before we summarize
our result as a theorem.
79
Theorem 3.3 (Theorem II.1 in Appendix II of Andersen and Gill (1982))
Let E be an open convex subset of Rp and let M1,M2,..., be a sequence of random
concave functions on E such that ? ? ? E, Mm(?) P? M(?), as m??, where M is
some real function on E. Then M is also concave and for all compact A ? E,
sup
??A
|Mm(?)?M(?)| P? 0,as m??. (3.7)
Theorem 3.4 (van der Vaart 2000, p.45) Let Mm be random functions and let
M be a ?xed function of ? such that for every ? > 0,
sup
?? 
|Mm(?)?M(?)| P? 0,
sup
?:d(?,?0)??
M(?) < M(?0).
Then any sequence of estimators ??m with Mm(??m) ? Mm(?0) ? op(1) converges in
probability to ?0.
Theorem 3.5 Under Assumption 3.1, the MLE ?? of the GLMM (3.1) is consistent.
Proof: Let Mm(?) = ?Mm(?)I{ ~Mm concave on U}. By Lemma 3.2, the MLE ?? is also
the solution to ?Mm(?) = 0, with probability approaching 1. Together with (3.5),
based on Theorem 3.3, we get (3.7), which is the uniform convergence of Mm(?).
Based on Theorem 3.4, the MLE ?? is consistent. 2
Remark 3.6 Whether or not the natural parameterization is used, with probability
80
approaching 1 for large m, the MLE ?? is unique in B?(?0) and is the unique solution
of the likelihood score equation in that neighborhood. 2
3.3 Goodness of ?t test for GLMMs
To test the goodness of ?t for a proposed GLMM with random intercept,
?rst we partition the covariate space into non-overlapping cells E1, ..., EL. For
l = 1,...,L, de?ne
fl =
m?
i=1
ni?
j=1
I{xij?El}yij, el(?) =
m?
i=1
ni?
j=1
I{xij?El}E(yij). (3.8)
Let f = (f1,...,fL), e(?) = (e1(?),...,eL(?)). For simplicity, we denote
E?(yij)|?=^? = ?E(yij). Theteststatisticisbasedontheobservedminustheexpected
counts,
f?e(??) =
?
??
??
??
?
?m
i=1
?ni
j=1 I{xij?E1}(yij ? ?E(yij))
...
?m
i=1
?ni
j=1 I{xij?EL}(yij ? ?E(yij))
?
??
??
??
?
,
a vector of length L.
Theorem 3.7 For GLMM with random intercept (3.1), let E1,...,EL constitute a
partition of the covariate space generated by X into disjoint sets such that
E(?nij=1 I{xij?El}) > 0 for all l = 1,...,L. Under Assumption 3.1 in Section 3.2, as
m ??,
(f?e(??))? ? ?1svd(f?e(??))/m D? ?2k,
81
where ? svd is the reconstructed L ? L square matrix by applying Singular Value
Decomposition on a consistent estimator ? , given in (3.11), of  = Var(?i) in
(3.10). k = rank ( ) = rank (? svd) for large m. ? ?1svd is the Moore ? Penrose
pseudoinverse.
Proof: The notation A ? B is used to indicate A?B P? 0. Since the MLE ?? is
consistent (Section 3.2), by a ?rst order Taylor series expansion of the score function
S(??) around?0, we obtain the approximation up to terms asymptotically negligible
in probability,
????0 ? (? 1
m
?S(?0)
?? )
?1 1
mS(?0)) =
?J?10 1
m
m?
i=1
Si(?0),
where Si(?0) is the score function for the ith cluster and the conditional sample
?sher information ?J0 = ?m?1?/??0S(?0)|(xi,ni) P? J(?0;?0). The existence of
J(?0;?0) is ensured by B.1 in Assumption 3.1, and its invertibility by B.2. For the
rest of this proof, we use J0 to denote J(?0;?0).
With E(yij) = E?i[g(xTij? + ?i)], di?erentiation under the integral sign by
virtue of B.5 implies that
?E(yij)
?? = E?i[g
?(xT
ij?+?i)]?xij =
?
?i
g?(xTij?+?i)f?i d?i ?xij,
and
?E(yij)
??2 =
?
?i
g(xTij?+?i)?f?i??2 d?i.
82
By applying the LLN to the summands over i de?ning el(?0), as N ??,
? = 1
m ?e(?0) =
?
??
??
??
?
m?1?mi=1?nij=1 I{xij?E1} ???
0
E(yij)
...
m?1?mi=1?nij=1 I{xij?EL} ???
0
E(yij)
?
??
??
??
?
P?
?
??
??
??
?
?T1
...
?TL
?
??
??
??
?
=  .
For l = 1,...,L,
?Tl = E(xi,ni)
( n
i?
j=1
I{xij?El}?E?i(g(xij
T?+?i))
??0
)
.
Let zil =?nij=1 I{xij?El}(yij ?E(yij)), i = 1,...,m; l = 1,...,L. Then
f?e(?) =
( m?
i=1
zi1,...,
m?
i=1
ziL
)T
and our test statistic is based on a quadratic form in the vector
(f?e(??))/?m = (f?e(?0))/?m+ (e(?0)?e(??))/?m
? (f?e(?0))/?m??e(?0)(????0)/?m
? 1?m
( m?
i=1
zi1,...,
m?
i=1
ziL
)T
? 1?m?e(?0)?J?10 1m
m?
i=1
Si(?0)
= 1?m
m?
i=1
[
(zi1,...,ziL)T ? ? ?J?10 Si(?0)
]
= 1?m
m?
i=1
?i. (3.9)
Under the assumption that (xi,ni) are i.i.d. for i = 1,...,m, also the variables
?i = ?i(yi,xi,ni;?) are i.i.d. with mean 0. Under B.1 { B.3 in Assumption 3.1,
83
Var(?i) =  exists. By multivariate Central Limit Theorem,
(f?e(??))/?m D? N(0, ). (3.10)
Therefore
1
m(f?e(
??))? ?1(f?e(??)) D? ?2k,
where k = rank( ). We estimate  using
? = 1
m
m?
i=1
?Var(?i|xi,ni), (3.11)
where ?Var(?i|xi,ni) means that the MLE ?? is substituted for ? in Var(?i|xi,ni).
? is a consistent estimator of  and its simpli?ed form is given in Section
3.6.4. Then by Slutsky?s theorem, the test statistic
T = 1m(f?e(??))? ? ?1(f?e(??)) D? ?2k, (3.12)
We then compute Singular Value Decomposition for ? . For each eigenvalue of ? ,
we compare it with a preset upper bound ?. For any eigenvalue less than ?, we
instead set this eigenvalue to be 0 and reconstruct the ? matrix using the non-zero
eigenvalues and their corresponding eigenvectors. We denote this reconstructed
matrix as ? svd. Based on Corollary 5.3 given in the Appendix, P(rank (? svd) =
84
rank ( )) ? 1. The test statistic to be used is
1
m(f?e(
??))? ? ?1svd(f?e(??)) D? ?2k. (3.13)
3.3.1 Derivation of the power of T
We derive the power of the test for the 2-level GLMM under contiguous alter-
natives, based on Le Cam?s third lemma (Lemma 2.14).
For a ?xed constant vector a, let
H0 :?m =?0,
H1 :?m =?0 + a?m,
where one or more components of?0 in?0 are 0s. Denote the non-zero components
of ?0 as ??0, which is a sub-vector of ?0. Note that ?m ??0, as m ??. By Taylor
expansion, using Theorem 5.21 in Van der Vaart (2000),
log dQmdP
m
= log Likelihood(?m;Y,X)Likelihood(?
0;Y,X)
?= log L(?m)
L(?0)
? (?log(L(?0)))T a?m + 12 a
T
?m(??2 log(L(?0))) a?m
? (Sm(?0))T a?m ? 12aTJ0a,
Based on the LLN on ?m?1 ?mi=1??2 logf(yi,xi,ni;?) (3.6), with J0 =
85
J(?0;?0),
limm??Var(Sm(?0)/?m) = J0.
Thus
log dQmdP
m
Pm? N(?1
2a
TJ0a,aTJ0a).
Since several components of ?0 are 0s, under H0, we ?t a reduced model to the
data, using X?N?p? instead of XN?p with p? < p. We only estimate the coe?cient
?? corresponding to X?. The e(?) function in (3.8)
el(?) =
m?
i=1
ni?
j=1
I{xij?El}E(yij) =
m?
i=1
ni?
j=1
I{xij?El}
?
g(xTij?+?i)?(?i)d?i
has Rp ?R+ as its domain, where p is the dimension of ? and R+ is the domain
for ?2, the variance component for the random e?ect ?i. Let function e?(?) be the
same meaning of function e(?), but has Rp? ?R+ as its domain
e?l(?) =
m?
i=1
ni?
j=1
I{xij?El}E?(yij) =
m?
i=1
ni?
j=1
I{xij?El}
?
g((x?ij)T?? +?i)?(?i)d?i.
Let Wm = (f?e?(???))/?m. Under the null hypothesis H0, Wm is asymptot-
ically normal, Wm?N(0, ?), by (3.10). The joint distribution of log(dQm/dPm)
and (f?e?(???))/?m is asymptotically equal in probability to the joint distribution
of aTSm(?0)/?m and (f?e?(???))/?m. We next show the jointly normality of these
two quantities.
86
By the same sequence of steps as in (3.9),
1?
m(f?e
?(???)) ? 1?
m
m?
i=1
[
(z?i1,...,z?iL)T ? ? ?(?J?0)?1S?i(??0)
]
,
where the ? in z?il and ? ? means that the X and ? in zil and ? are replaced with
X? and ??. ?J?0 and S?i(??0) denote the information matrix and the score when ?rst
and second derivatives for the log-likelihood function are taken with respect to ??.
Thus for all constant (L + 1)-vectors C, with C{1:L} denoting the ?rst L
components of C,
CT
?
??
?
(
f?e?(???)
)
/?m
aTSm(?0)/?m
?
??
?
= 1?m
m?
i=1
[ L?
l=1
Clz?il ?CT{1:L}? ?(?J?0)?1S?i(??0) +CL+1aTSi(?0)
]
.
This quantity has a limiting Gaussian distribution under Assumption 3.1. Let
? in (2.38) be the variance-covariance matrix between aTSm(?0)/?m and (f ?
e?(???))/?m. Then by Le Cam?s third lemma (Lemma 2.14), under the contigu-
ous alternative H1,
(
f?e?(???)
)
/?m ? N(?, ?).
Thus under H1, with ? =?T( ?)?1?,
1
m
(
f?e?(???)
)T
( ?)?1
(
f?e?(???)
)H
1? ?2
k(?),
87
and
1
m
(
f?e?(???)
)T(?
 ?svd
)?1(
f?e?(???)
)H
1? ?2
k(??),
where ? ?svd is the reconstructed matrix by applying Singular Value Decomposition
on a consistent estimators of  ?, ?? is a consistent estimators of?, k = rank(? ?svd) =
rank( ?) (Corollary 2.7) and the non-centrality parameter is ?? = ??T(? ?svd)?1??. For
GLMMs, it is di?cult to get the explicit form of?, thus ??, a consistent estimator of
?, can be taken as the empirical variance -covariance matrix between aTSm(?0)/?m
and (f?e?(???))/?m, which is
1
m
m?
i=1
(z?i1,...,z?iL)T (S?i(??0))T a? ? ?(?J?0)?1 1m
m?
i=1
Si(?0)(S?i(??0))T a,
with MLE ?? substituted for parameters in the above expression.
For a given type I error level ?, the approximate limiting power is thus P(T? >
?2k,?), where ?2k,? is the 1?? quantile of the central ?2k distribution and P denotes
the non central ?2k(??) distribution.
3.4 Simulations for logistic mixed models
In this Section, we implement the goodness of ?t test for the 2-level logistic
mixed models in R and study its performance in simulation. The two stages of the
model are:
yij|pij ? Binom(1,pij), i = 1,...,m, j = 1,...,ni (3.14)
88
logit(pij) = xTij?+?i, ?i ? N(0,?2).
Data are simulated from the following setting. We choose m = 500 clusters and let
ni = 5 for all i = 1,...,m. Thus the total number of observation N = ?mi=1 ni =
2500. Then N ?xed covariates xij = (x1ij,x2ij,x3ij), were independently drew from
a multivariate normal distribution,
?
??
??
??
?
x1
x2
x3
?
??
??
??
?
? N
?
??
??
??
?
?
??
??
??
?
0
0
0
?
??
??
??
?
,
?
??
??
??
?
1 0 ?13
0 1 ?23
?13 ?23 1
?
??
??
??
?
?
??
??
??
?
. (3.15)
Given the variance component parameter ?2, for m = 500, we generate ?i,i =
1,...,m independently from N(0,?2). Given ? and the generated xij, pij,i =
1,...,m, j = 1,...,ni are calculated from the equation logit(pij) = xTij? + ?i.
The last step of generating data is to independently generate yij from (3.14).
3.4.1 Computational issues and code checking
Implementing the goodness of ?t test for logistic mixed models is more com-
plicated than for linear mixed models because each E(yij) and also the variance and
covariance terms involve numerical integration. We used the Gaussian quadrature
numerical integration method with 60 quadrature points to calculate those integrals,
which are used to calculate the ? in our test statistic. The numerical errors intro-
duced by approximating so many integrals through the Gaussian quadrature rule
may result in instability when we invert the ? matrix. It may even destroy the
89
invertibility of the matrix, which is used to calculate the test statistics. Thus we
computed a Singular Value Decomposition for each estimated ? . For each eigen-
value of ? , we compare it with a preset upper bound ? (e.g. ? = .0005). For any
eigenvalue less than ?, we instead set this eigenvalue to be 0 and reconstruct the ? 
matrix using the non-zero eigenvalues and their corresponding eigenvectors.
We then did the following checks to make sure that our R code works.
? We compared the empirical variance covariance matrix of (f??e(??))/?m with
? in equation (3.11);
? Letting L(?) denote the likelihood function, and f and e be de?ned in equation
(3.8), we compared the empirical variance covariance matrix of
f?e(?0) and??logL(?)|?=?0 , computed based on 30,000 simulated data sets,
with its estimated analytical variance, which involves terms in ? in equation
(3.11);
? We checked in simulations that the goodness of ?t test statistic (3.12) indeed
has an asymptotic ?2 distribution. We choose ?13 = ?23 = 0 and set the true
parameter values ?2 = .5, ? = (?0,?1,?2,?3) = (.1,.5,?.5,.5). We then ?t
the logistic mixed model with all covariates that in?uence the response y. We
select L = 12 cells in the computation of T in (3.12) based on x1 and x2 by
using ?xed cell boundaries. With the number of iterations K being 5000, we
then have 5000 test statistics (f ? e(??))? ? ?1(f ? e(??))/m. Figure 3.1 gives
the histogram of these 5000 independently calculated test statistics, which is
close to ?211, with p value from the Kolmogorov-Smirnov test being .9294 and
90
test statistics
prob
0 5 10 15 20 25 30 35
0.00
0.02
0.04
0.06
0.08
density of chisq, df=11
Histogram of test stat for logistic mixed model
Figure 3.1: Histogram of 1000 test statistics for logistic mixed model
p value from Pearson?s chi-square goodness of ?t test .40 when the number of
cells used is 20. The simulation result agrees with the theory.
3.4.2 Checking the size of the test in simulations
We checked the size of the test under various choices of cell partitions based
on X. We chose ?13 = ?23 = 0 in (3.15) and let ?2 = .5, ? = (?0,?1,?2,?3) =
(.1,.5,?.5,.5), and ?t the logistic mixed model with all covariates X in the model.
All cell partitions in the computation of T in (3.12) were based on ?xed cut o?s.
Table 3.2 shows that the empirical size estimates (Emp. Size) were close to the
nominal ? levels of 0.05 and 0.1 for all choices of cell partitions. This closeness was
observed with greater consistency when K = 5000. Table 3.1 gives the empirical
sizes of the test under di?erent nominal ? levels when the cell partition is based on
91
x1 and the number of cells is L = 8. They were all very close. The third column is
the standard deviation of the corresponding empirical size (ES), which is calculated
as ?ES(1?ES)/K.
Table 3.1: Empirical size of the test under di?erent  levels (logistic mixed model).
m = 500;ni = 5;? = (:1;:5;?:5;:5); 2 = :5; 13 =  23 = 0;L = 12;K = 5000.
signi?cance level  Empirical Size(ES) Standard Deviation of ES
0.05 0.049 0.0031
0.1 0.099 0.0042
0.2 0.199 0.0056
0.3 0.297 0.0065
0.4 0.406 0.0069
0.5 0.505 0.0071
0.6 0.605 0.0069
0.7 0.702 0.0065
0.8 0.801 0.0056
Table 3.2: Empirical size of the test under di?erent cell partitions (logistic mixed model).
m = 500;ni = 5;? = (:1;:5;?:5;:5); 2 = :5; 13 =  23 = 0.
L  Empirical Size  Empirical Size
K=1000 K=5000 K=1000 K=5000
8 (x1) 0.05 0.054 0.0508 0.1 0.1080 0.1034
3?4 (x1;x2) 0.05 0.048 0.0492 0.1 0.0950 0.0994
5?4 (x1;x3) 0.05 0.044 0.0494 0.1 0.0940 0.0986
6?7 (x2;x3) 0.05 0.056 0.0490 0.1 0.1090 0.1012
3.4.3 Simulations to assess empirical power of the test
To assess the power of the test, we ?t the logistic mixed model to the data
without including x3 among the covariates. We then tried six di?erent cell partitions
based on subsets of the design matrix X with L = 12 and 42 cells. We used ?xed
cuto?s to do the cell partitions. We set ?2 = .5 and (?0,?1,?2) = (0,.8,?.8) for all
92
Table 3.3: Impact of cell partition on empirical power I (logistic mixed model).
m = 500; 13 =  23 = 0;( 0; 1; 2) = (0;:8;?:8);K = 1000.
Parti  3 = :2  3 = :3  3 = :4
L = 12 L = 42 L = 12 L = 42 L = 12 L = 42
x1 0.048 0.050 0.047 0.049 0.046 .056
x2 0.055 0.061 0.051 0.067 0.052 .071
x3 0.787 0.497 0.997 0.939 1 .999
x1;x2 0.053 0.054 0.048 0.047 0.047 .051
x1;x3 0.724 0.478 0.989 0.918 1 1
x2;x3 0.729 0.468 0.992 0.917 1 .999
simulations in this power study Section.
Weset(?13,?23) = (0,0)andstudytheimpactofthemagnitudeof?3 onpower.
For a given design matrix X, we simulated K = 1000 sets of Y and computed the
empirical power of the test over K = 1000 iterations for ?3 = .2, .3 or .4 separately.
Table 3.3 shows that with all other settings being the same, the power of the test
increases as the magnitude of ?3 increases except with those choices of partition (the
?rst second and fourth) where power is e?ectively constant at 0.05. To study the
impact of (?13,?23) on power, we then ?x ?3 = .3 and choose three di?erent pairs of
(?13,?23). For each pair of (?13,?23), we simulated a set of design matrix X. We then
simulated K = 5000 sets of Y based on this X and computed the empirical power of
the test over these K iterations. Table 3.4 shows that with all other settings being
the same, the power of the test decreases as the correlation between the x3 and x1,
x2 increases, where x3 is the omitted covariate.
From Tables 3.3 and 3.4 we can also see that the choice of cell partition strongly
a?ects the power. When a covariate that should be in the model is omitted, cell
partitions based on that omitted covariate result in adequate power of the test.
93
Table 3.4: Impact of cell partition on empirical power II (logistic mixed model).
m = 500;?3 = (0;:8;?:8;:3); 2 = :5;K = 5000.
Parti  13 = 0; 23 = 0  13 = 0:2; 23 = 0:3  13 = 0:4; 23 = 0:5
L = 12 L = 42 L = 12 L = 42 L = 12 L = 42
x1 0.047 0.049 0.046 0.054 0.054 0.058
x2 0.051 0.067 0.05 0.053 0.047 0.047
x3 0.997 0.939 0.988 0.883 0.902 0.657
x1;x2 0.048 0.047 0.047 0.046 0.046 0.061
x1;x3 0.989 0.918 0.974 0.862 0.831 0.630
x2;x3 0.992 0.917 0.975 0.850 0.818 0.557
However, if the cell partition is based only on covariates already in the model, this
test has low power to detect any lack of model ?t. All the above ?ndings and
conclusions in the logistic mixed model agree with what we saw earlier in the linear
mixed models.
3.5 Discussion
In this Chapter, we extended the goodness of ?t tests developed for Linear
Mixed Models (LMMs) in Chapter 2 to Generalized Linear Random Intercept Mod-
els (GLMMs). We described the asymptotic properties of the tests when parameters
were estimated through maximum likelihood. We assessed factors that impact the
power and the impact of choice of cell partitions on the test in simulations for lo-
gistic mixed models. We obtained conclusions consistent with those found for the
LMMs in Chapter 2, that when a speci?c covariate that is associated with outcome
is omitted, cell partitions based on the omitted covariate result in adequate power
of the test. However, if the cell partition is based only on covariates already in the
model, this test has low power to detect any lack of model ?t.
94
This type of goodness of ?t test can be used to test the statistical adequacy
of the ?nally selected GLMM in real applications. All that is needed in order
to implement the test are the ?nal model parameter estimates and their variance
covariance matrix as well as the estimated means for the outcome y under the model.
Implementing logistic mixed models is more complicated that implementing LMMs,
as all the means and variances of the outcome y involve integrals. The covariance
matrix between the estimated ?xed e?ect parameters and the estimated variance
components is not available in R, but is available in PROC nlmixed in SAS. As a
note of caution, in applying the test in GLMMs, one needs to check the eigenvalues
of the estimated variance covariance matrix ? in (3.11) and eliminate the tiny
eigenvalues to make sure its inverse does not blow up. This also helps to get the
correct rank of the estimated variance covariance matrix ? in (3.11) to ensure the
correct degrees of freedom for the test statistic.
3.6 Technical details for Chapter 3
3.6.1 Checking B.2 in Assumption 3.1 for logistic mixed model
Lemma 3.8 Let E be the event that ?mi=1?nij=1 x?2ij has full rank, then the condi-
tional distribution of (b?,c)??,?2logf(yi|{xij}j,ni) given {(xij,ni)} on the event E
is non-degenerate at ?0.
Proof: In order to show that J(?0;?0) is positive de?nite for the random intercept
logistic mixed model, we just need to show that ? (b?,c) non-trival, where b is
any p ? 1 constant vector and c is any constant, (b?,c)??,?2 logfi(?0,?20) is non-
95
degenerate. We show this in the random intercept logistic model (3.2), where pij,0 =
1/(1+e?(x?ij?0+?0ai)) and {xij}j = {xij,j = 1,...,ni}. Then with ai ? N(0,1)
(b?,c)??,?2 logf(yi|{xij}j,ni,ai)
= b? ???logf(yi|{xij}j,ni,ai)+c ???2logf(yi|{xij}j,ni,ai)
= b?
ni?
j=1
(yij ?pij,0)xij +c 12?
0
ni?
j=1
(yij ?pij,0)ai
=
ni?
j=1
(yij ?pij,0)(b?xij +c ai2?
0
).
Let g1(xTij?0,?0) =?ai pij,0 ?(ai)dai, and g2(xTij?0,?0) =?ai pij,0?i?(ai)dai, integrat-
ing out ai then gives
(bT,c)??,?2 logf(yi|{xij}j,ni)
=
?
ai
ni?
j=1
(yij ?pij,0)(bTxij +c ai2?
0
)?(ai)dai
=
ni?
j=1
[
yijbTxij +yij c2?
0
E?i ?
?
ai
pij,0 ?(ai)dai bTxij ? c2?
0
?
ai
pij,0?i?(ai)dai
]
=
ni?
j=1
(y
ij ?g1(xTij?0,?0)
)xT
ijb?
c
2?0
ni?
j=1
g2(xTij?0,?0), (3.16)
where ?nij=1 g2(xTij?0,?0) is free of yij. With the fact that conditional on
{(xij,ni),s.t. P(
m?
i=1
ni?
j=1
x?2ij has full rank) > 0},
every yi ? {0,1}ni has positive probability, (3.16) equals 0 only when both b = 0
and c = 0.
96
3.6.2 Checking B.3 in Assumption 3.1 for Logistic mixed model
Lemma 3.9 For the random intercept logistic mixed model (3.2), the dominatedness
condition (3.3) holds under the assumption that ?nij=1?x?2ij ? is bounded.
Proof: With ai ? N(0,1), the conditional likelihood for the ith cluster, condition-
ing on ({xij}j,ni,ai), is
logf (yi|{xij}j,ni,ai) =
?
j
(xT
ij? +?ai
)y
ij ?
?
j
log
(
1 +exTij?+?ai
)
.
The ?rst derivatives of the conditional likelihood are
?logf(yi|{xij}j,ni,ai)
?? =
?
j
(
yij ? 11+e?(xT
ij?+?ai)
)
xij;
?logf(yi|{xij}j,ni,ai)
??2 =
1
2?
?logf(yi|{xij}j,ni,ai)
??
= 12?
?
j
(
yij ? 11+e?(xT
ij?+?ai)
)
ai.
The second derivatives of the conditional likelihood are
??
2logf(yi|{xij}j,ni,ai)
?(?)2 =
?
j
exTij?+?ai
(1 +exTij?+?ai)2xijx
T
ij;
??
2logf(yi|{xij}j,ni,ai)
????2 =
1
2?
?
j
exTij?+?ai
(1 +exTij?+?ai)2aixij;
97
??
2logf(yi|{xij}j,ni,ai)
?(?2)2
= ? 12? ???
[
1
2?
?
j
(
yij ? 11+e?(xT
ij?+?ai)
)
ai
]
= 14?3
?
j
[
ai
(
yij ? 11 +e?(xT
ij?+?ai)
+?ai e
x?ij?+?ai
(1 +exTij?+?ai)2
)]
.
We ?rst take expectations with respect to yij, conditionally given all other,
that is, {xij}j,ni,ai, to integrate out yij.
E?0
[
? ?
2
??2logf(yi|{xij}j,ni,ai)
]
=
?
j
exTij?+?ai
(1 +exTij?+?ai)2xijx
T
ij, (3.17)
E?0
[
? ?
2
????2logf(yi|{xij}j,ni,ai)
]
= 12?
?
j
exTij?+?ai
(1 +exTij?+?ai)2aixij. (3.18)
E?0
[
? ?
2
?(?2)2logf (yi|{xij}j,ni,ai)
]
=
1
4?3
?
j
[
ai
(
1
1 +e?(xTij?0+?0ai) ?
1
1 +e?(xTij?+?ai) +?ai
exTij?+?ai
(1 +ex?ij?+?ai)2
)]
.
The two expectations in (3.17) and (3.18) have exactly the same forms as without
taking the expectations since there is no term involving yij. We next integrate out
ai and with pij = 1/(1+e?(xTij?+?ai)), get
E?0
[
? ?
2
??2logf(yi|{xij}j,ni)
]
=
?
j
?
ai
pij(1?pij)dF(ai)xijx?ij,
98
E?0
[
? ?
2
????2logf(yi|{xij}j,ni)
]
= 12?
?
j
(?
ai
pij(1?pij)aidF(ai)
)
xij.
E?0
[
? ?
2
?(?2)2logf(yi|{xij}j,ni)
]
=
1
4?3
?
j
(?
ai
1
1 +e?(xTij?0+?0ai)aidF(ai)?
?
ai
pijaidF(ai) +?
?
ai
pij(1?pij)a2idF(ai)
)
.
Since pij is bounded, the dominatedness condition (3.3) will hold by inspection
if ?nij=1?x?2ij ? is bounded.
3.6.3 Checking B.3 in Assumption 3.1 for Poisson mixed model
Lemma 3.10 For the random intercept Poisson mixed model, the dominatedness
condition (3.3) holds under the assumption that E(?j ec?k |xijk|) < ?, ? c ?
maxk|?k|+?.
Proof: With ai ? N(0,1), the conditional likelihood for the ith cluster, condi-
tioned on ({xij}j,ni,ai), is
logf(yi|{xij}j,ni,ai) =
?
j
[(
xTij? +?ai)yij ?ex?ij?+?ai ?log(yij!)
]
.
The ?rst derivatives of the conditional likelihood are
?logf(yi|{xij}j,ni,ai)
?? =
?
j
(
yij ?ex?ij?+?ai
)
xij;
99
?logf(yi|{xij}j,ni,ai)
??2 =
1
2?
?logf(yi|{xij}j,ni,ai)
??
= 12?
?
j
(
yij ?ex?ij?+?ai
)
ai.
The second derivatives of the conditional likelihood are
??
2logf(yi|{xij}j,ni,ai)
?(?)2 =
?
j
ex?ij?+?aixijx?ij;
??
2logf(yi|{xij}j,ni,ai)
????2 =
1
2?
?
j
ex?ij?+?aiaixij;
??
2logf(yi|{xij}j,ni,ai)
?(?2)2 = ?
1
2?
?
??
[
1
2?
?
j
(
yij ?ex?ij?+?ai
)
ai
]
= 14?3
?
j
[
ai
(
yij ?ex?ij?+?ai
)
+?a2iex?ij?+?ai
]
.
We ?rst take expectations with respect to yij, conditionally given {xij}j,ni,ai, to
integrate out yij.
E?0
[
? ?
2
??2logf(yi|{xij}j,ni,ai)
]
=
?
j
ex?ij?+?aixijx?ij, (3.19)
E?0
[
? ?
2
????2logf(yi|{xij}j,ni,ai)
]
= 12?
?
j
ex?ij?+?aiaixij. (3.20)
100
E?0
[
? ?
2
?(?2)2logf(yi|{xij}j,ni,ai)
]
= 14?2
?
j
a2iex?ij?+?ai +
1
4?3
?
j
[
ai
(
ex?ij?0+?0ai ?ex?ij?+?ai
)]
.
The two expectations in (3.19) and (3.20) have exactly the same forms as without
taking the expectations since there is no term involving yij. We next integrate out
ai. By making use of the three equations E(e?ai) = e?2/2, Eai(a2ie?ai) = (1+?2) e?2/2
and Eai(aie?ai) = ?e?2/2, we get
E?0
[
? ?
2
??2logf(yi|{xij}j,ni)
]
=
(?
j
ex?ij?xijx?ij
)
e?2/2.
E?0
[
? ?
2
????2logf(yi|{xij}j,ni)
]
= 12 e?2/2
(?
j
ex?ij?xij
)
.
E?0
[
? ?
2
?(?2)2logf(yi|{xij}j,ni)
]
= 14?2
(?
j
ex?ij?
)
E(a2ie?ai)+
1
4?3
[(?
j
ex?ij?0
)
E (aie?0ai)?
(?
j
ex?ij?
)
E (aie?ai)
]
= 14?2
(?
j
ex?ij?
)
e?2/2 (1 +?2)+
1
4?3
[(?
j
ex?ij?0
)
?0e?20/2 ?
(?
j
ex?ij?
)
?e?2/2
]
= 14e?2/2
(?
j
ex?ij?
)
+ ?04?3e?20/2
(?
j
ex?ij?0
)
.
(3.3) will hold by inspection if assumptions are imposed ensuring that
sup??B?(?j ex?ij?(1 +x?ijxij)) is integrable. The following condition,
101
E(?j ec?k |xijk|) < ?, ? c ? maxk|?k|+?, is su?cient to ensure that.
3.6.4 Simpli?cation of ?? in equation (3.11)
In this section, we manipulate the expression for the estimated asymptotic
variance ? in equation (3.11). The notations VarC,CovC,EC mean the corre-
sponding quantities involve integrals over yi alone, conditionally given {xi,ni}i =
{(xi,ni), i = 1,...,m}. Because ?J0 and ? are functions of only {xi,ni}i, condi-
tionally given {xi,ni}i,
VarC(?i) = Var(?i|{xi,ni}i)
= VarC (zi1,...,ziL)T +VarC(? ?J?10 Si(?0))?
2CovC
[
(zi1,...,ziL)T , ? ?J?10 Si(?0)
]
= VarC (zi1,...,ziL)T + ? ?J?10 VarC(Si(?0))?J?10 ? T ?
2
(
CovC
(
zi1, ? ?J?10 Si(?0)
)
,...,CovC
(
ziL, ? ?J?10 Si(?0)
))T
,
where CovC
(
zil, ? ?J?10 Si(?0)
)
= ECyi(zilSTi (?0))?J?10 ? T. We next simplify
ECyi(zilSTi (?0)), with notation fCyi meaning the conditional density of fyi given
{xi,ni}i,
ECyi(zilSTi (?0)) =
?
yi
(
zil ?f
C
yi
fCyi
)
fCyi dyi =
?
yi
zil
(
?fCyi
)
dyi
= ?
?
yi
zil fCyi dyi ?
?
yi
(?zil) fCyi dyi,
102
and the ?rst of these last two integrals is 0 because of the identity:
?
yi
zil fCyidyi = ECyi(zil) = 0.
Thus
ECyi(zilSTi (?0)) = ?
?
yi
(?zil)fCyidyi
=
?
yi
( n
i?
j=1
I{xij?El}?ECyij
)
fCyi dyi =
ni?
j=1
I{xij?El}?ECyij.
The last equality holds because ?ECyij is not a function of yi and ?yi fCyi dyi = 1.
By the preceding calculations, an by de?nition of e(?0),
?
??
??
??
?
?m
i=1 E
C[zi1ST
i (?0)
]
...
?m
i=1 E
C[ziLST
i (?0)
]
?
??
??
??
?
= ? e(?0). (3.21)
Let ?? =?mi=1 Var(?i | {xi,ni}i)/m and ? = limm?? ??. Then
103
?? = 1
m
m?
i=1
Var(?i | {xi,ni}i)
= 1m
m?
i=1
VarC (zi1,...,ziL)T + 1m
m?
i=1
? ?J?10 VarC(Si(?0))?J?10 ? T ?
1
m
m?
i=1
2
(
CovC
(
zi1, ? ?J?10 Si(?0)
)
,...,CovC
(
ziL, ? ?J?10 Si(?0)
))T
= 1m
m?
i=1
VarC (zi1,...,ziL)T + ? ?J?10 ? T ?2? ?J?10 ? T
= 1m
m?
i=1
VarC (zi1,...,ziL)T ? ? ?J?10 ? T,
The existence of m?1 ?mi=1 Var(zi1,...,ziL)T is ensured by Assumption 3.1.
Under the assumption that (xi,ni) are i.i.d, ? converge in probability to  ,
the limiting variance covariance matrix of (f?e(??))/?m. With ? in ? replaced by
its MLE ??, ? = 1m?mi=1 ?Var(?i|xi,ni) is a consistent estimator of  .
104
Chapter 4
Discussion and further research
4.1 Discussion
Schoenfeld (1980) presented a class of omnibus chi-squared goodness of ?t
tests for the proportional hazards regression model. We adapted this idea and
proposed a class of goodness of ?t tests for testing the statistical adequacy of a 2-level
generalized linear mixed model (GLMM). We described the asymptotic properties
of the test when parameters were estimated through maximum likelihood. For a
special case of linear mixed models (LMMs), we extended this test to 2-level LMMs
with no distributional assumptions for either the random e?ect ?i or the error term
?ij. We also extended the test to multi-level LMMs.
We assessed factors that impact the power and the impact of choice of cell
partitions on the test in simulations for both linear mixed models (LMMs) and lo-
gistic mixed models. We found that when a speci?c covariate that is associated with
outcome is omitted, cell partitions based on the omitted covariate result in adequate
power of the test. However, if the cell partition is based only on covariates already
in the model, this test has low power to detect any lack of model ?t. For LMMs, we
also conducted simulations to show that our test was very robust to violations of the
normality assumption of the error distribution when we use symmetric distributions.
For LMMs, we developed the theoretical power of the test under the local
105
alternative. We then checked this theoretical power, obtained from Le Cam?s third
lemma, with simulations. We found that the estimated theoretical power calculated
using Le Cam?s third lemma was reliable at least when the number of clusters m
is above 50. However, when m is very small, it may be advisable to rely on the
empirical power computed through simulations. For LMMs, we also proposed a
criteria parameter ? that is closely related to the power.
This goodness of ?t test can be used to test the statistical adequacy of the
?nally selected GLMM in real applications. All that is needed in order to implement
the test are the ?nal model parameter estimates and their variance covariance matrix
as well as the estimated means for the outcome y under the model. It is easy to
implement the test in LMMs as these are standard outputs from any statistical
software. Implementing logistic mixed models is relatively more complicated, as all
the means and variances of the outcome y involve integrals. Also, the covariance
matrix between the estimated ?xed e?ect parameters and the estimated variance
components is not available in R, but is available in PROC nlmixed in SAS. As a
note of caution, in applying the test in GLMMs, one needs to check the eigenvalues
of the estimated variance covariance matrix ? in (3.12) and eliminate the tiny
eigenvalues to make sure its inverse won?t blow up. This also helps to get the
correct rank of the estimated variance covariance matrix ? in (2.13) to ensure the
correct degrees of freedom for the test statistic.
106
4.2 Further research
In this thesis, I applied the Schoenfeld?s residual approach [30] to test the
goodness of ?t of classical generalized linear mixed models, via quadratic goodness of
?t statistics. I then applied this test to three biomedical data, testing the goodness of
?t of two-level linear mixed models. One of the further research will be applying this
test to multilevel and longitudinal data once we have these kinds of data. Another
aspect of further research will be comparing our proposed goodness of ?t test with
the existing bootstrap and Bayesian approaches.
107
Chapter 5
Appendix: Three general lemmas
The following three Lemmas are used in several places in the text.
Lemma 5.1 Suppose that {uin : n ? 1, 1 ? i ? n} is a triangular array of
independent identically distributed random variables within each row (i.e., across i)
with mean 0 and ?nite variance ?2u, and that these variables are independent
of the random array {cin : n ? 1, 1 ? i ? n} which satis?es the additional
properties that as n ??
(a) max
1?i?n
|cin| ? 0 and (b)
n?
i=1
c2in ? ?
in probability, where ? ? (0,?). Then ?ni=1 cinuin D? N(0,?) as n ??.
Proof. {?ki=1 cinuin}nk=1 is a martingale with respect to the ?ltration Fkn =
?({cin, .uin : 1 ? i ? k}). The Lemma is an immediate consequence of the
Martingale Central Limit Theorem (D. McLeish 1974; P. Hall and C. Heyde 1981).
A direct proof from the standard Lindeberg Central Limit Theorem (applied condi-
tionally given {cin}ni=1) is also not di?cult. 2
Lemma 5.2 Let  be a q ?q covariance matrix of rank k with smallest non-zero
108
eigenvalue ?, and let ? ? (0,?) be arbitrary. Let ? N be a sequence of random q?q
covariance matrices such that with probability approaching 1 as N ??, the smallest
positive eigenvalue of ??N is greater than ?. If ? N P?  , then P
(
rank (? N) = k
)
?
1, as N ??.
Proof. Let V and V? be the range and null space of  respectively, and
dim(V) = rank( ) = k. Because ? N P?  , we have that ? w ? V? with w ?= 0,
?? Nw? P? 0, thus
P(?? Nw?? ??w?) ? 1.
By the hypothesis on ? N stated in the Lemma,
P
(
[ ?? Nw?? ??w? ]
?
[ w ?= 0 ]
)
? 0.
Thus
P(w ?{null space of ? N}) ? 1, ? w ?V?,
where V? is the null space of  . Similarly, ? v ?V with v ?= 0, ? Nv P?  v. Thus
?? Nv? P?? v?? ??v? > ??v?,
which leads to P(? Nv ?= 0) ? 1. Overall, ? w ?V? with w ?= 0 and ? v ?V with
v ?= 0,
P(w ? null space of ? N, v ? range of ? N) ? 1.
Therefore, P
(
rank (? N) = k
)
? 1, as N ??. 2
109
Corollary 5.3 Let the symmetric matrix ? 0N be a consistent estimator sequence
for a q ? q covariance matrix  whose smallest non-zero eigenvalue is ?. Let ?
be an arbitrary number, where 0 < ? < ?. If we represent ? 0N in terms of an
orthonormal eigenbasis by ? 0N = ?qk=1 ckvkNvTkN and de?ne the random matrix
? N =?qk=1 ckI[c
k>?] vkNv
T
kN, then P(rank (? N) = rank ( )) ? 1.
Proof. Let V and V? be the range and null space of  respectively. Because
? 0N P?  , we have that ? w ?V? with w ?= 0 and ? v ?V with v ?= 0,
? 0Nw ? 0, (? 0N ? )v P? 0.
By de?nition of the matrix ? N,
{
w : ?? 0Nw?? ??w?
}
?
{
w : ? Nw = 0
}
, and ?? Nv? > ??v?.
Thus P(? Nw = 0) ? 1 and P(? Nv ?= 0) ? 1. Overall, ? w ?V? with w ?= 0 and
? v ?V with v ?= 0,
P(w ? range of ? N, v ? null space of ? N) ? 1.
Therefore, P
(
rank (? N) = rank ( )
)
? 1, as N ??. 2
110
Bibliography
[1] Agresti, A. Categorical Data Analysis, 2nd ed. New York: Wiley, 2002.
[2] Andersen, P. K. and Gill, R. D. (1982). Cox?s regression model for counting
processes: a large sample study. Ann. Statist. 10, 1100-1120.
[3] Bickel, P. J. and Doksum, K. A. Mathematical Statistics, 2nd ed. Prentice Hall;
2006.
[4] Christiansen, C. L. and Morris, C. N. (1997). Hierarchical Poisson regression
modeling. Journal of the American Statistical Assocation. 92, pp. 618-632.
[5] Claeskens, G. and Hart, J. D. (2009). Goodness-of-?t tests in mixed models.
TEST. 18, pp. 213-239.
[6] Cox, D. R. (1961). Tests of separate families of hypotheses. Proc. Fourth Berke-
ley Symp. on Math. Statist. and Prob., Vol. 1, 105-123.
[7] Crainiceanu, C. M. and Ruppert D. (2004). Likelihood ratio tests in linear
mixed models with one variance component. J. R. Statist. Soc. B 66, 165-185.
[8] Dorgan, J. F. , Baer, D. J. , Albert, P. S. , Judd, J. T. , Brown, E. D. , Corle,
D. K. , et al. (2001) Serum hormones and the alcohol-breast cancer association
in postmenopausal women. J Natl Cancer Inst 93, pp. 710-5.
[9] Godfrey, L. G. Misspeci?cation Tests in Econometrics. Cambridge, 1988.
[10] Jacqmin-Gadda, H., Sibillot, S., Proust, C., Molina, J.-M., and Thi?ebaut, R.
(2007). Robustness of the linear mixed model to misspeci?ed error distribution.
Computational Statistics and Data Analysis, Vol. 51, pp, 5142-5154.
[11] Jiang, J. (1998b). Consistent estimators in generalized linear mixed models.
Journal of the American Statistical Association, 93, pp. 720-729.
[12] Jiang, J. (1999). Conditional inference about generalized linear mixed models.
The Annals of Statistics. 27, pp. 1974-2007.
[13] Jiang, J. (2001). Goodness-of-?t tests for mixed model diagnostics. The Annals
of Statistics 4, 1137-1164.
111
[14] Jiang, J. and Lahiri, P. (2006). Mixed Model Prediction and Small Area Esti-
mation. Test. Vol. 15, No. 1, pp. 1-96.
[15] Kedem, B. and Fokianos, K. (2002). Regression Models for Time Series Anal-
ysis. Wiley, New York.
[16] Khuri, A. I., Mathew, T. and Sinha, B. K. (1998). Statistical Tests for Mixed
Linear Models. New York: Wiley.
[17] Laird, N. M. and Ware, J. H. (1982). Random-E?ects Models for Longitudinal
Data. Biometrics. Vol. 38, No. 4, pp. 963-974.
[18] Lange, N. and Ryan, L. (1989). Assessing normality in random e?ects models.
Annals of Statistics. Vol. 17, No, 2, pp. 624-642.
[19] Lee, Y., Nelder, J. A., and Pawitan, Y.. Generalized Linear Models with Ran-
dom E?ects Uni?ed Analysis via H-likelihood. Chapman Hall/CRC, 2006.
[20] Lin, D. Y., Wei, L. J. and Ying, Z. (2002) Model-checking techniques based on
cumulative residuals. Biometrics, Vol 58, No. 1, pp. 1-12.
[21] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman
and Hall, London, 2nd ed.
[22] McCulloch, C. E. and Searle, S. R. Generalized, Linear, and Mixed Models.
Wiley, 2001.
[23] Miller, J. J. (1977). Asymptotic properties of maximum likelihood estimates
in the mixed model of the analysis of variance. The Annals of Statistics. 5,
746-762.
[24] Pan, Zhiying and Lin, D. Y. (2005). Goodness-of-Fit Methods for Generalized
Linear Mixed Models. Biometrics. 61, 1000-1009.
[25] Park, T. and Lee, S.-Y. (2004). Model diagnostic plots for repeated measures
data. Biometrical Journal, 46, pp. 441-452.
[26] Prasad, N. G. N. and Rao, J. N. K. (1990). The Estimation of the Mean Squared
Error of Small-Area Estimators. Journal of the American Statitical Association.
Vol. 85, No. 409, pp. 163-171.
[27] Rao, C. R. and Wu, Y. (1989) A strongly consistent procedure for model selec-
tion in a regression problem. Biometrika, Vol. 76, No. 2, pp. 369-374.
112
[28] Rao, C. R., Wu, Y., Konishi, S. and Mukerjee, R. (2001) On model selection.
IMS Lecture Notes - Monograph Series, Vol. 38, pp. 1-64.
[29] Ritz, C. (2004). Goodness-of-?t tests for mixed models. Board of the Foundation
of the Scandinavian Journal of Statistics 31, 443-458.
[30] Schoenfeld, D. (1980). Chi-squared goodness-of-?t tests for the proportional
hazards regression model. Biometrika 67, 145-153.
[31] Self, S. G. and Liang, K. (1987). Asymptotic properties of maximum likelihood
estimators and likelihood ratio tests under nonstandard conditions. Journal of
the American Statistical Association 82, 605-610.
[32] Shao, J. (1997) An asymptotic theory for linear model selection. Statistica
Sinica 7, 221-264.
[33] Slud, E. and Kedem, B. (1994). Partial likelihood analysis of logistic regression
and autoregression. Statistica Sinica 4, 89-106.
[34] Stezhko, VA, Buglova, EE, Danilova, LI, et al (2004). A cohort study of thyroid
cancer and other thyroid diseases after the Chornobyl accident: Objectives,
design and methods. RADIATION RESEARCH Volume 161, Issue 4, 481-
492.
[35] Tsiatis, A. A. (1980). A note on a goodness-of-?t test for the logistic regression
model. Biometrika 67, Issue 1, pp. 250-251.
[36] Van der Vaart, A. W. , Asymptotic Statistics. Cambridge, 2000.
[37] Wand, M. P. (2007). Fisher information for generalised linear mixed models.
Journal of Multivariate Analysis. 98, 1412-1416.
[38] Weiss, R. E. Modelling Longitudinal data. Springer, 2005.
113