ABSTRACT
Title of dissertation: ASYMPTOTIC NORMALITY
IN GENERALIZED
LINEAR MIXED MODELS
Min Min
Doctor of Philosophy, 2007
Dissertation directed by: Professor Paul J. Smith
Statistics Program
Department of Mathematics
Generalized Linear Mixed Models (GLMMs) extend the framework of Gener-
alized Linear Models (GLMs) by including random efiects into the linear predictor.
This will achieve two main goals of incorporating correlation and allowing broader
inference. This thesis investigates estimation of flxed efiects as the number of ran-
dom efiects grows large. This model describes cluster analysis with many clusters
and also meta-analysis.
After reviewing currently available methods, especially the penalized likeli-
hood and conditional likelihood estimators of Jiang [14], we focus on the random
intercept problem. We propose a new estimator ^^flw of regression coe?cient and
prove that when m, the number of random efiects, grows to inflnity at a slower
rate than the smallest cluster sample size, ^^flw is consistent and given the realiza-
tion of random efiects, is asymptotically normal. We also show how to estimate
the standard errors of our estimators. We also study the asymptotic distribution of
Jiang?s [14] penalized likelihood estimators. In the absence of regression coe?cients,
the normalized estimated intercept pm(^a?a0) converges to a normal distribution.
Di?culties arise in establishing the conditional asymptotic normality of Jiang?s [14]
penalized likelihood estimator ^fl of regression coe?cients for flxed efiects in a general
GLMM.
In Chapter 4, we make an extended analysis of the 2?2?m table to show
how to verify the general conditions in Chapter 3. We compare our estimator to
the Mantel-Haenszel estimator. Simulation studies and real data analysis results
validate our theoretical results.
In Chapter 5, asymptotic normality of joint flxed efiect estimate and scale
parameter estimate is proved for the case as m=N 9 0. An example was used to
verify the general conditions in this case.
Simulation studies were performed to validate the theoretical results as well
as to investigate conjectures that are not covered in the theoretical proofs. The
asymptotic theory for ^flw describes the flnite sample behavior of ^flw very accurately.
We flnd that in the case as m=N ! 0, in the random logistic and Poisson intercept
models, consistency and conditional asymptotic normality results appear to hold for
the penalized regression coe?cient estimates ^fl.
ASYMPTOTIC NORMALITY IN GENERALIZED
LINEAR MIXED MODELS
by
Min Min
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulflllment
of the requirements for the degree of
Doctor of Philosophy
2007
Advisory Committee:
Dr. Paul Smith, Chair/Advisor
Dr. C. Mitchell Dayton
Dr. Abram Kagan
Dr. Partha Lahiri
Dr. Eric Slud
c Copyright by
Min Min
2007
DEDICATION
In the memory of my grandparents,
Houzhang and Yunxuan
ii
ACKNOWLEDGEMENTS
I am deeply grateful to my advisor, Dr. Paul Smith for his teaching, guiding
and encouraging me through all these years. He has always made himself available
for help and provided valuable insights. Without his academic advising and moral
support, it is impossible to complete my dissertation. I am honored to work with
such an extraordinary scholar.
I would like to thank the committee members: Drs. Abram Kagan and Eric
Slud for reading my manuscript and making valuable comments and corrections
which enhanced the quality of this work. Drs. C. Mitchell Dayton and Partha
Lahiri for agreeing to serve on my dissertation committee.
I would also like to thank Chip Denman for his support when I was working
at stat lab.
I thank all of my friends at UMCP, especially Guanhua Lu, Huaizhong Ren,
Hsiao-Hui Tsou, Shihua Wen, Chin-Fang Weng and Zhihui Yang for their support
and loyal friendship.
I owe my deepest appreciation to my family, especially my parents for their
unconditional love and always believing in me.
Finally, I want to thank my husband for his love, understanding and sharing
the happiness of my achievement. I also want to thank my best friend Yuehan Wang
for always being there for me. A special thank you goes to my son Chance for his
patience, for bringing me great joy and enriching my life.
iii
Table of Contents
List of Tables v
List of Figures viii
1 Introduction 1
1.1 Notations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Literature Review 7
2.1 Generalized Linear Models . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Deflnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.2 Asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Quasilikelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Generalized Estimating Equations . . . . . . . . . . . . . . . . . . . . 14
2.3.1 Deflnition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.2 Asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Generalized Linear Mixed Models . . . . . . . . . . . . . . . . . . . . 22
2.4.1 Deflnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.2 Estimation approaches . . . . . . . . . . . . . . . . . . . . . . 23
2.4.3 Random intercept model (canonical link) . . . . . . . . . . . . 26
2.5 Generalized GLMM (canonical link function case) . . . . . . . . . . . 27
2.5.1 Deflnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5.2 Penalized Generalized Weighted Least Squares in Case 1 . . . 28
2.5.3 Maximum Conditional Likelihood Estimates in Case 2 . . . . 33
3 Case 1 random intercept model results. 36
3.1 Case 1 random intercept (combine a and vi) . . . . . . . . . . . . . . 37
3.2 Penalized likelihood estimator ^a in the case 1 random intercept model
when fl=0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3 Discussion of Jiang [14] penalized likelihood estimator ^fl in case 1
random intercept model (fl 6= 0) . . . . . . . . . . . . . . . . . . . . . 44
3.4 Proofs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.4.1 Proof of Lemma 3.1.1 . . . . . . . . . . . . . . . . . . . . . . . 48
3.4.2 Proof of Theorem 3.1.2 . . . . . . . . . . . . . . . . . . . . . . 51
3.4.3 Proof of Theorem 3.1.3 . . . . . . . . . . . . . . . . . . . . . . 55
3.4.4 Proof of Corollary 3.1.4 . . . . . . . . . . . . . . . . . . . . . 59
3.4.5 Proof of Theorem 3.2.1 . . . . . . . . . . . . . . . . . . . . . . 60
4 Logistic 2?2?m table 64
4.1 Logistic 2?2?m table . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2 Simulation results for 2?2?m table. . . . . . . . . . . . . . . . . . 73
4.2.1 Unconditional convergence in distribution . . . . . . . . . . . 74
4.2.2 Conditional Convergence in Distribution . . . . . . . . . . . . 84
4.3 Analysis of real data for a 2?2?22 table . . . . . . . . . . . . . . . 89
iv
5 Case 2 Results. 91
5.1 Case 2 consistency results of Jiang [14]. . . . . . . . . . . . . . . . . . 91
5.2 The Simple case (fi = 0). . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.3 The logistic model logitP(yijk = 1jbij) = ?+bij: . . . . . . . . . . . . 99
6 Simulation 106
6.1 Case 1 simulations for ^^flw . . . . . . . . . . . . . . . . . . . . . . . . 107
6.1.1 Case 1 combining a with vi. . . . . . . . . . . . . . . . . . . . 107
6.2 Case 1 logistic random intercept simulation for flxed realizations of
random efiects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3 Case 1 random intercept simulations for penalized likelihood estimators121
6.3.1 Case 1 logistic random intercept combining a with vi. . . . . . 122
6.4 Case 2 logistic simple example . . . . . . . . . . . . . . . . . . . . . . 126
7 Conclusions and Future Work. 130
7.1 Theoretical Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.2 Conclusions from the simulation studies and real data analysis . . . . 131
7.3 Future work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
A Simulation Results 133
A.1 Case 1 random intercept model for penalized likelihood estimator . . 133
A.1.1 Case 1 logistic random intercept combining a with vi. . . . . . 133
A.1.2 Case 1 logistic random intercept when a and vi are estimated
separately. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
A.1.3 Case 1 Poisson random intercept combining a and vi. . . . . . 151
A.1.4 Case 1 Poisson random intercept with a and vi estimated sep-
arately . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Bibliography 159
v
List of Tables
1.1 Notations used throughout the Thesis . . . . . . . . . . . . . . . . . . 5
1.2 Table 1.1 (Continued): Notation used throughout the Thesis . . . . . 6
4.1 2?2?m table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2 Simulated estimates of logistic odds ratio in 2?2?m balanced table. 76
4.3 Simulated standardized estimates of log odds ratio (standardized by
true and estimated conditional standardized error) in 2?2?m bal-
anced table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.4 Simulated estimates of log odds ratio (standardized by true and esti-
mated conditional standardized error) in 2?2?m unbalanced table. 81
4.5 Simulated standardized estimates of log odds ratio (standardized by
true and estimated conditional standardized error) in 2?2?m un-
balanced table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.6 Simulated estimates of logistic odds ratio for flxed realizations of uni-
formly distributed random efiects (m = 10;n1 = 200) and fl0 = 0:5. . 85
4.7 Simulated standardized estimate of logistic odds ratio with flxed real-
izations of uniformly distributed random efiects (m = 10; n1 = 200)
and fl0 = 0:5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.8 Simulated estimates of logistic odds ratio with flxed realizations of
uniformly distributed random efiects (m = 10;n = 200) and fl0 = 1. . 87
4.9 Simulated standardized estimate of logistic odds ratio with flxed real-
izations of uniformly distributed random efiects (m = 10; n1 = 200)
and fl0 = 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.10 Summary of simulation of our estimator and Mantel-Haenszel esti-
mator of log odds ratio. . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.1 Simulated estimates of logistic and Poisson regression coe?cient com-
bining flxed intercept with uniformly distributed random efiects. . . . 110
6.2 Simulated standardized estimate of logistic and Poisson regression
coe?cient with uniformly distributed random efiects. . . . . . . . . . 111
vi
6.3 Simulated estimates of logistic and Poisson regression coe?cient com-
bining flxed intercept with normally distributed random efiects. . . . 114
6.4 Simulated standardized estimate of logistic and Poisson regression
coe?cient with normally distributed random efiects. . . . . . . . . . . 115
6.5 Simulated estimates of logistic regression coe?cient combining flxed
efiect and flxed realizations of uniformly distributed random efiects
(m = 20;n1 = 200). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.6 Simulated standardized estimate of logistic regression coe?cient with
flxedrealizationsofuniformlydistributedrandomefiects(m = 20; n1 =
200). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.7 Simulated estimates of logistic regression coe?cient combining flxed
efiect and flxed realizations of uniformly distributed random efiects
(m = 40;n1 = 240). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.8 Simulated standardized estimate of logistic regression coe?cient with
flxedrealizationsofuniformlydistributedrandomefiects(m = 40; n1 =
240). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.9 Simulatedstandardizedestimatesoflogisticregressioncoe?cient(stan-
dardized by true and estimated conditional standardized error) com-
bining flxed intercept with random efiects. . . . . . . . . . . . . . . . 124
6.10 Simulated standardized estimate of logistic regression coe?cient for
various initial values. . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.11 Simulated standardized estimate of logistic regression coe?cient for
various ?1 values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.12 Simulated standardized estimate of ? and ? in case 2 simple example
for various (m;n1) values . . . . . . . . . . . . . . . . . . . . . . . . . 128
A.1 Simulatedstandardizedestimatesoflogisticregressioncoe?cient(stan-
dardized by true and estimated conditional standardized error) com-
bining flxed intercept with random efiects. . . . . . . . . . . . . . . . 136
A.2 Simulated standardized estimate of logistic regression coe?cient for
various initial values. . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
A.3 Simulated standardized estimate of logistic regression coe?cient for
various ?1 values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
vii
A.4 Simulatedstandardizedestimatesoflogisticregressioncoe?cient(stan-
dardized by true and estimated conditional standardized error) with
a and vi estimated separately. . . . . . . . . . . . . . . . . . . . . . . 147
A.5 Simulated standardized estimate of Poisson regression coe?cient for
various (m;n1) values. . . . . . . . . . . . . . . . . . . . . . . . . . . 152
A.6 Simulated standardized estimate of logistic regression coe?cient for
various ?1 values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
viii
List of Figures
4.1 Q-Q plots for (^flw ? fl0) and (^^flw ? fl0) standardized by true and
estimated conditional standard error, for various values of (m;n) in
logistic 2?2?m balanced setup. . . . . . . . . . . . . . . . . . . . . 78
4.2 Q-Q plots for (^flw ? fl0) and (^^flw ? fl0) standardized by true and
estimated conditional standard error, for various values of (m;n) in
logistic 2?2?m unbalanced setup where n1=600, 400 for (1/3,2/3),
(1/4, 3/4) setups respectively . . . . . . . . . . . . . . . . . . . . . . . 83
6.1 Q-Q plots for (^flw ? fl0) and (^^flw ? fl0) standardized by true and
estimated standard error, for various values of (m;n) in case 1 random
intercept model with uniformly distributed random efiects. . . . . . . 112
6.2 Q-Q plots for (^flw ? fl0) and (^^flw ? fl0) standardized by true and
estimated standard error, for various values of (m;n) in case 1 random
intercept model with normally distributed random efiects. . . . . . . 116
6.3 Q-Q plots for standardized (^? ? ?0) and (^? ? ?0) by true standard
error, for various (m;n1) in case 2 logistic simple example. . . . . . . 129
A.1 Q-Q plots for (^fl1 ? fl10) standardized by true conditional standard
error combining a with vi, for various (m;n1) in logistic random in-
tercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
A.2 Q-Q plots for (^fl1 ?fl10) standardized by estimated conditional stan-
dard error combining a with vi, for various (m;n1) in logistic random
intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
A.3 Q-Q plots for (^fl1 ? fl10) standardized by true conditional standard
error combining a with vi, for various initial values of ^fl1 in logistic
random intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . 142
A.4 Q-Q plots for (^fl1 ? fl10) standardized by estimated standard error
combining a with vi, for various initial values of ^fl1 in logistic random
intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
A.5 Q-Q plots for (^fl1 ? fl10) standardized by true conditional standard
error combining a with vi, for various values of ?1 in logistic random
intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
ix
A.6 Q-Q plots for (^fl1 ?fl10) standardized by estimated conditional stan-
dard error combining a with vi, for various values of ?1 in logistic
random intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . 145
A.7 Q-Q plot for (^fl1 ? fl10) standardized by true conditional standard
error not combining a with vi, for various (m;n1) in logistic random
intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
A.8 Q-Q plot for (^fl1 ?fl10) standardized by estimated conditional stan-
dard error with estimated a with vi separately, for various (m;n1) in
logistic random intercept case 1. . . . . . . . . . . . . . . . . . . . . . 150
A.9 Q-Q plots for (^fl1 ? fl10) standardized by true conditional standard
error with a and vi estimated separately in Poisson random intercept
case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
A.10 Q-Q plots for(^fl1 ?fl10) standardized by estimated conditional stan-
dard error with a and vi estimated separately in Poisson random
intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
A.11 Q-Q plots for (^fl1?fl10) for standardized by true conditional standard
error with a and vi estimated separately, for various values of ?1 in
Poisson random intercept case 1. . . . . . . . . . . . . . . . . . . . . 156
A.12 Q-Q plots for (^fl1 ?fl10) standardized by estimated conditional stan-
dard error with a and vi estimated separately, for various values of
?1 in Poisson random intercept case 1. . . . . . . . . . . . . . . . . . 157
x
Chapter 1
Introduction
This thesis is focused on asymptotic normality of flxed efiect estimates in
Generalized Linear Mixed Models (GLMMs) for the canonical link function case
when the number of random efiects is large. Due to the wide range of applications
of GLMMs, these models have received substantial attention during the last decade.
Although asymptotic behavior of flxed efiect estimates has been wellstudied in linear
regression models, Generalized Linear Models (GLMs) and Generalized Estimating
Equations (GEEs), for GLMMs there is still a lot of work to do. These models and
the main idea of this thesis are addressed brie y below.
Generalized Linear Models (GLMs), originally introduced by Nelder and Wed-
derburn [23], provide a unifled family of models that is widely used for regression
analysis. These models are intended to describe non-normal responses. In partic-
ular, they avoid having to select a single transformation of the data to achieve the
possibly con icting objectives of normality, linearity and homogeneity of variance.
Important examples include binary and count data. They can be applied to a wide
array of discrete, continuous and censored outcomes. They are most commonly used
when the outcomes are independent. These models are described in detail in Section
2.1. Another important extension of GLM is the Quasilikelihood model approach
which can be deflned by specifying only the relation between the mean and the
1
linear predictors and between the mean and variance of observations. This model
is illustrated in Section 2.2.
However, in many applications, independence of outcomes is not a reasonable
assumption. This is particularly obvious in longitudinal studies, where multiple
measurements made on the same individual are likely to be correlated. One tech-
nique for the analysis of such general correlated data is the generalized estimating
equations (GEE) approach introduced by Liang and Zeger [37]. This approach has
the desired quality of independence between subjects while adjusting for the cor-
relation structure within subjects. These models are described in detail in Section
2.3.
We may wish to build a model that accommodates correlated data, or to
consider the levels of a factor as selected from a population of levels in order to
make inference about that population. Generalized Linear Mixed Models (GLMMs)
extend the framework of Linear Mixed Models (LMMs) and of Generalized Linear
Models (GLMs) by allowing for non-Gaussian data, nonlinear link functions and
inclusion of random efiects and of correlated error. These models are described in
detail in Section 2.4.
Asymptotic normality and consistency results for flxed efiect estimates have
been proved for GLMs ( e.g., Habermann [13] and Fahrmeir and Kaufmann [11]) and
also for GEEs ( e.g., Zeger and Liang [37] and Xie and Yang [35]). For GLMMs, most
work about asymptotic properties of flxed efiect estimates is based on eliminating
random efiects either by integrating them out (e.g., Sinha [30]) or by conditioning on
minimal su?cient statistics of random efiects (e.g, Sartori and Severini [28] ). Simply
2
eliminating random efiects may discard information. In addition, integrating out all
the random efiects requires knowledge about the distribution of all the random
efiects.
Jiang [14] extended GLMMs to generalized GLMMs by making only an expec-
tation assumption for random efiects instead of a normal distribution assumption.
He divided generalized GLMMs into two cases: m=N ! 0 (case 1) and m=N 9 0
(case 2) where m is the number of levels of random efiects and N is the sample
size. Both m and N go to inflnity. For case 1 and 2, he proposed two methods:
Penalized Generalized Weighted Least Squares (PGWLS) and Maximum Condi-
tional Likelihood (MCL). Under reasonable conditions, consistency of both flxed
and random efiect estimates was proved rigorously. Jiang [14] used three examples
to illustrate those reasonable conditions. The concepts and results mentioned above
are discussed in Section 2.5.
In order to flnd approximate tests and confldence regions, an asymptotic nor-
mality result is needed. In Chapter 3, focusing on the random intercept problem,
we propose a new estimator ^^flw of regression coe?cients and prove that when m,
the number of random efiects, grows to inflnity at a slower rate than the smallest
cluster sample size, ^^flw is consistent and given the realization of random efiects, is
asymptotically normal. We also show how to estimate the normalizer and weight
matrices of our estimators. We also study the asymptotic distribution of Jiang?s [14]
penalized likelihood estimators. In the absence of regression coe?cients, the normal-
ized estimated intercept pm(^a?a0) converges to a normal distribution. Di?culties
arise in establishing the conditional asymptotic normality of the penalized likelihood
3
estimator ^fl of regression coe?cients for flxed efiects in a general GLMM.
In Chapter 4, we make an extended analysis of the 2?2?m table to show
how to verify the general conditions in Chapter 3. We compare our estimator to
the Mantel-Haenszel estimator. Simulation studies and real data analysis results
validate our theoretical results. In Chapter 5, asymptotic normality of joint flxed
efiect estimate and scale parameter estimate is proved for the case as m=N 9 0.
An example was used to verify the general conditions in this case. In Chapter 6, in
order to check the asymptotic results of the theorems in Chapter 3 and 5, logistic
and Poisson random intercept models are simulated in case 1. For case 2, a simple
model is simulated. The Splus built-in function nlminb is used to compute the
estimates. The contents of Chapters 3-6 are new.
In Chapter 7, we summarize our theoretical and simulation results, and discuss
the potential direction of future work.
1.1 Notations.
4
Table 1.1: Notations used throughout the Thesis
Symbol Meaning
Bij b00ij(?ij0)+ 12b(3)ij (??ij)(^?ij ??ij0)
rij 12b(3)ij (??ij)(^?ij ??ij0)
~B diag(Bij)1?i?m ; 1?j?n
i
E?i Pj Bij
n? min1?i?m ni
C??1 ??1=(?1Pi(E?i )?1 +m)
^?ij ^ai +xtij^fli
^^?ij ^ai +xtij^^flw
5
Table 1.2: Table 1.1 (Continued): Notation used throughout the Thesis
Symbol Meaning
DN( ) ?@2lP( )=@2 
gN( ) @lP( )=@ 
1ni the ni ?1 vector whose elements are all 10s
kuk = (Pni=1 u2i)1=2 vector norm for vector u 2 Rn
kAk = supkuk=1jutAuj = maxj(j?jj) matrix norm for a symmetric matrix A
?max(A) = supkuk=1utAu largest eigenvalue of A
?min(A) = infku=1kutAu smallest eigenvalue of A
A =
0
BB
@
A11 A12
A21 A22
1
CC
A Partitioned matrix A
A11:2 = A11 ?A12A?122 A21 Schur component of A11
For notational simplicity, we let b00ij = b00(?ij0), b0ij = b0(?ij0), DN = DN( 0)
and gN = gN( 0), ect. where ?ij0 and  0 are the true values of ?ij and  respectively.
A1=2 orAT=2 represents a left or right square root. A? represents generalized inverse.
6
Chapter 2
Literature Review
2.1 Generalized Linear Models
Nelder and Wedderburn [23] introduced Generalized Linear Models (GLMs)
as a unifying class of models which are widely used in regression analysis. Section
2.1.1 presents their original deflnition and the extended deflnition from Fahrmeir
and Kaufmann [11]. Asymptotic properties of estimates are described in Section
2.1.2.
2.1.1 Deflnitions
GLMs were originally described as follows: A vector of observations y having
n components is assumed to be a realization of a random variable Y whose compo-
nents are independently distributed with means ?i, i = 1;:::;n. Assume that the
components Yi are independent and have a distribution in the exponential family,
taking the form
f(yi; i;`) = exp
?y
i i ?b( i)
a(`) +c(yi;`)
 
(2.1)
for some speciflc functions a(?), b(?) and c(?). Then
E(Yi) = ?i = b0( i);
Var(Yi) = b00( i)a(`):
7
Deflne g(?i) = xTi fl = ?i, where fl = (fl1;:::;flp)T is a vector of unknown para-
meters, xiT = (xi1;:::;xip) is a vector of covariates and g is a known link function
(since it links together the mean of yi and the linear form of predictors).
The large sample theory of GLM?s is derived by Fahrmeir and Kaufmann [11].
They consider GLM?s with the following structure and notations.
(i) The fyig are independent q-dimensional random variables with densities
f(yij i) = c(yi)exp( Ti yi ?b( i)) (2.2)
of the natural exponential type,  i 2 ?0 (assume ? 2 <2 to be the natural
parameter space, that is, the set of all  satisfying 0 < R c(y)exp( Ty)dy <
1). Then ? is convex, and in the interior ?0 of ?, all derivatives of b( )
and all moments of y exist (assume ?0 6= ;). In particular we have E(y) =
@b( )=@ ? ?( ) 2<2 and Cov(y) = @2b( )=@ @ T ? ?( ), a q ?q matrix.
(ii) The deterministic matrix xi in uences yi in the form of a linear combination
?i = xTi fl, where fl is a p-dimensional parameter.
(iii) The linear combination ?i is related to the mean ?( i) of yi by the injective
link function g : M ! <q where M 2 <q is the range of the function ?( ),
?i = g(?( i)). One can write  i = u(xTi fl), where u is the injective function
(g??)?1, mapping < into ?.
Here (2.2) is difierent from (2.1) because the response variable may be a vec-
tor rather than a scalar and (2.1) has an additional nuisance parameter `. The
maximum likelihood estimating equations are unchanged by the presence of a(`),
8
but the information matrix has to be multiplied by an unknown scale factor a(`),
which can be estimated consistently. Thus, without loss of generality, Fahrmeir and
Kaufmann [11] deflne GLM?s using the simpler form (2.2). By allowing y to be a
vector, Fahrmeir and Kaufmann [11] can accommodate multivariate responses, such
as multinomial data.
Then the loglikelihood of a sample y1;:::;yn is given by
ln(fl) =
nX
i=1
( Ti yi ?b( i))?Cl (2.3)
where  i = u(xTi fl), i = 1;??? ;n and Cl does not depend on fl.
Setting ?i(fl) = ?(u(xTi fl)), ?i(fl) = ?(u(xTi fl)), Ui(fl) = [@u(xTi fl)=@?]T
and difierentiating ln(fl), we flnd the score function sn(fl) and the information ma-
trix Fn(fl) to be
sn(fl) = @ln(fl)=@fl =
nX
i=1
xiUi(fl)(yi ??i(fl)) (2.4)
Fn(fl) = Cov[sn(fl)] =
nX
i=1
xiUi(fl)?i(fl)UTi (fl)xTi (2.5)
Further difierentiation yields
Hn(fl) = ?@2ln(fl)=@fl@flT: (2.6)
It is easy to see that E(sn(fl)) = 0 and E(Hn(fl)) = Fn(fl).
For canonical link functions, these expressions simplify considerably:
sn(fl) =
nX
i=1
xi(yi ??i(fl)); Fn(fl) =
nX
i=1
xi?i(fl)xTi ; Hn(fl) = Fn(fl):
The true parameter is denoted by fl0, and Fn(fl0) is written as Fn for simplicity.
Similarly we write sn and Hn, etc.
9
2.1.2 Asymptotics
Based on maximum likelihood, under reasonable conditions, asymptotic ex-
istence, consistency and asymptotic normality of flxed efiect estimates are proved
by Haberman [13] and Fahrmeir and Kaufmann [11]. The details of their work are
presented in the following.
Haberman [13] provides conditions for the asymptotic existence of the MLE by
use of the general theory for exponential models derived by Berk [2] and Barndorfi-
Nielsen [4]. Here asymptotic properties of MLE are considered for canonical link
functions when the dimension of fl grows to inflnity. The main idea is that the
parameter space grows with the sample size, but one only wants to estimate a linear
functional of the natural parameter which is determined by a flnite dimensional sub-
space of the parameter space. Suppose ? = ?( ) is the linear functional and ^?n is the
unique MLE of ?( ), and  n(?) is the asymptotic standard deviation of ^?n. Under
some technical conditions , consistency results like  n(?) !p 0 and ^?n !p ?( ) as
n ! 1 are established. Based on Newton?s method, ^?n is proved to be asymp-
totically normal with asymptotic mean ?( ), meaning that (^?n ? ?( ))= n(?) !d
N(0;1). Asymptotic confldence intervals are also considered since the MLE ^ n(?) of
 n(?) is a consistent estimate of  n(?) (e.g., ^ n(?)= n(?) !p 1). There are about six
difierent inner products appearing back and forth in Haberman?s [13] paper which
makes it intractable and not easily understood. Fahrmeir and Kaufmann [11] present
mild general conditions which, respectively, assure weak or strong convergence or
asymptotic normality of the MLE for both canonical and noncanonical link function
10
cases, where the dimension of fl is flxed.
In the canonical link function case, the normality condition, though obtained
by a difierent approach, is closely related to one of Haberman?s [13] conditions. The
assumptions of Fahrmeir and Kaufmann [11] are simpler to interpret and check and
they seem slightly weaker. In addition to regularity conditions on the parameter
space, link function and covariates, in order to derive conditions for consistency and
asymptotic normality of MLE, they deflne a sequenceNn(?), ? > 0, of neighborhoods
of fl0 (the true value of fl) as
Nn(?) = ffl : kFT=2n (fl?fl0)k? ?g
where n = 1;2;??? and A1=2 is a left square root of the positive deflnite matrix A
and AT=2 denotes (A1=2)T so that A1=2AT=2 = A. Their conditions are:
(D) Divergence: ?min(Fn) !1.
(C) Boundedness from below: for all ? > 0, Fn(fl) ? cFn is positive semideflnite,
fl 2 Nn(?), n ? n1 with some constants n1 = n1(?), c > 0 independent of ?.
(N) Convergence and continuity: for all ? > 0, maxfl2Nn(?)kVn(fl)?Ik! 0, where
Vn(fl) = F?1=2n Fn(fl)F?T=2n is the normed information matrix.
(S?) Boundedness of the eigenvalue ratio: there is a neighborhood N ? B of fl0
such that
?min(F(fl)) ? c(?max(Fn))1=2+?; fl 2 N; n ? n1:
where B, the set of admissible values of fl, is open in <p and additionally,
convex for canonical link functions and c > 0, ? > 0, n1 are some constants.
11
Fahrmeir and Kaufmann?s [11] main theorems for the canonical link function case
are the following:
Theorem 2.1.1. Under (D) and (C), there is a sequence f^flng of random variables
with
(i) P[sn(^fln) = 0] ! 1 (asymptotic existence),
(ii) ^fln !p fl0 (weak consistency).
Theorem 2.1.2. Under (D) and (S?) with a ? > 0, there is a sequence f^flng of
random variables and a random number n2 with
(i) P(sn(^fln) = 0 for all n ? n2) = 1,
(ii) ^fln !a:s: fl0 (strong consistency).
Lemma 2.1.3. Under (D) and (N), the normed score function is asymptotically
normal:
F?1=2n sn !d N(0;I):
Theorem 2.1.4. Under (D) and (N), the normed MLE is asymptotically normal:
FT=2n (^fln ?fl0) !d N(0;I):
Remark: Because B us convex and Fn(fl) is positive deflnite, there is at most
one zero of the score function. Lemma 2.1.3 and Theorem 2.1.4 hold for any version
of the matrix square root.
Remark: InpracticethenormalizingmatrixFT=2n mustbereplacedbyFT=2n (^fln).
Fahrmeir and Kaufmann [11] state that FT=2n (^fln)(^fln ?fl0) !d N(0;I) if the fol-
lowing condition holds
12
(Q) For all ? > 0, supfl2Nn(?)kF1=2n F1=2n (fl)?Ik! 0
They state that (Q) =) (N). If F1=2n is the Choleski square root, then (N) =) (Q).
Likewise, (N) =) (Q) if ?max(Fn)=?min(Fn) ? c < 1, n ? n0 for some constants c
and n0.
They also verify these general conditions in several examples, including Poisson
model, response with a bounded range, regressor with compact range and stochastic
regressor.
Noncanonical link functions enlarge the class of GLMs but they cause ad-
ditional di?culties in establishing consistency and asymptotic normality, mainly
because of the existence of the MLE can not be guaranteed except in special cases
(for a number of important examples see Wedderburn [33]). A local maximum of
the likelihood in a neighborhood of the true fl0 does not necessarily deflne a global
maximum. Here consistency and asymptotic normality results only apply to a se-
quence f^flng of solutions of maximum likelihood equations sn(fl) = 0. The same
line of arguments in the canonical link function case is followed in the proof, but
only for conditions (C), (D), (N), (S?) with Fn(fl) replaced by Hn(fl).
2.2 Quasilikelihood
Another important extension of GLMs is the Quasilikelihood model approach
which was introduced by Wedderburn [34]. To deflne a likelihood one has to specify
the form of distribution of the observations. It may be di?cult to decide what
distribution the observations follow, but to deflne a quasilikelihood function one
13
needs only to specify a model for the mean and a relation between the mean and
variance of the observations.
This is is what makes quasi-likelihood useful. By using the chain rule we can
rewrite Equation (2.4) in the following form
sn(fl) = @ln(fl)=@fl =
nX
i=1
xi@?i@?
flfl
fl?=xTi fl ??1i (fl)(yi ??i(fl)): (2.7)
We can see that the score function in (2.7) depends on the parameters only
through the mean ?i(fl) and ?i(fl) = ?i(?i(fl)), the variance function.
Fahrmeir [12] extended the analysis of correctly specifled GLM?s to misspec-
ifled GLM?s, where misspeciflcation occurs if the true density of response variable
is not of the assumed exponential type, and/or the true expectation yi can not be
represented by ?(u(xtifl)).
He followed the same line argument as Fahrmeir and Kaufmann [11] and
proved consistency and asymptotic normality of Maximum Quasilikelihood Esti-
mators (MQLE) under very similar conditions. Fahrmeir [12] also verifled these
conditions in several examples like regressors with a compact range, response y
with bounded range and univariate models, etc.
2.3 Generalized Estimating Equations
The class of GLM?s (Nelder and Wedderburn [23]) plays a central role in re-
gression problems with discrete or nonnegative responses. This class of regression
models was extended by Liang and Zeger [37] to analyze longitudinal or batch cor-
related data. The Liang and Zeger approach is known as Generalized Estimating
14
Equations (GEE). The deflnition of GEE is presented in Section 2.3.1. Several
approaches to the asymptotic properties of GEE are summarized in Section 2.3.2.
2.3.1 Deflnition
The GEE model is the following: Suppose (yij;xij) are observations for the
jth measurement on the ith subject, j = 1;2;:::;ni and i = 1;2;:::;m, where yij
is a scalar response, xij is a p?1 covariate vector, and ni is the cluster size. Assume
that the observations on difierent subjects are independent and the observations
on the same subjects are correlated. For i = 1;:::;m, let yi = (yi1;:::;yini)T and
Xi = (xi1;:::;xini)T. Liang and Zeger [37] used a generalized linear model to model
the marginal density of yij (with respect to a  -flnite measure ?):
f(yijjxij;fl;`) = exp[fyij ij ?b( ij)+c(yij)g=`]: (2.8)
As in Section 2.1,  ij = u(?ij), u is a known injective function mapping < into ? and
?ij = xTijfl. The vector fl contains the regression parameters of interest, and ` is a
nuisance scale parameter. Under such a model speciflcation, the flrst two moments
of yij are given by
?ij(fl) = E(yijjxij;fl;`) = b0( ij);  2(fl) = Cov(yijjxij;fl;`) = b00( ij)`: (2.9)
Let g(t) = (b0?u)?1(t); then g(?ij(fl)) = xtijfl. The function g(t) is the link function
and its inverse function h(s) = (b0 ? u)(s) is called the inverse link function. Of
importance are the canonical link functions, where u(s) = s, so g(t) = (b0)?1(t) and
h(s) = b0(s).
15
Let ?i(fl) = E(yi) = (?i1(fl);:::;?ini(fl))T and ?i(fl) = Cov(yi). We
write Ai(fl) = diag( 2i1(fl);:::; 2ini(fl)) and ?i(fl) = diag(u0(xTi1fl);:::;u0(xTinifl)),
where, for any vector v, diag(v) represents a diagonal matrix whose diagonal ele-
mentsaretheelementsofv. LetDi(fl) = Ai(fl)?i(fl)Xi andVi(fl;fi) = A12i (fl)Ri(fi)A12i (fl).
Here Ri(fi) is the \working" correlation matrix which one can choose freely and
which may possibly contain a nuisance parameter (or parameter vector) fi. If Ri(fi)
is equal to the true (often unspecifled) correlation matrix Ri, then Vi(fl0;fi) =
?i(fl0). Liang and Zeger [37] proposed solving the following equations:
gnm(fl) =
mX
i=1
gni;i =
mX
i=1
Di(fl)TV?1i (fl;fi)(yi ??i(fl)) = 0; (2.10)
which they called \generalized estimating equations." Let ?min(T) (?max(T)) denote
the smallest (largest) eigenvalue of the matrix T, and deflne
Mnm(fl) = Cov(gnm(fl))
=
mX
i=1
DTi (fl)V?1i (fl;fi)?i(fl)V?1i (fl;fi)Di(fl); (2.11)
Dnm(fl) = ?@gnm(fl)@flT ; (2.12)
Hnm(fl) =
mX
i=1
DTi (fl)V?1i (fl;fi)Di(fl): (2.13)
Let
gnm = gnm(fl0); Hnm = Hnm(fl0);
and
Mnm = Mnm(fl0); Fnm = HnmM?1nmHnm:
16
2.3.2 Asymptotics
The GEE method allows the nuisance parameter to be determined from the
sample, which extends the  exibility and applicability of the method. Furthermore,
the covariance matrices ?i(fl) can be estimated from the data, so that large-sample
testing and interval estimation is possible. In the past, most research in GEE
has been directed to methodological development and modeling issues. Most of
the work relies on the asymptotic results presented by Liang and Zeger [37], in
which exact conditions are not specifled. Xie and Yang [35] developed a set of
(information matrix based) general conditions, which leads to the proof of weak and
strong consistency as well as asymptotic normality. In both papers, the dimension
of fl is flxed.
Liang and Zeger [37] claim consistency and asymptotic normality of GEE
estimator under regularity conditions when the number of independent subjects goes
to inflnity and the number of observations on each subject stays bounded. Exact
conditions are not specifled. E?ciency is improved if the \working" correlation is
correct or close to correct.
Xie and Yang [35] present asymptotic properties of the GEE estimator ^flnm in
each of three distinct large sample settings:
(i) m !1 and n = n(m) = max1?i?m ni is uniformly bounded for all m,
(ii) m is bounded but n !1,
(iii) n !1 as m !1,
17
where m is the number of independent clusters and ni is the cluster size.
Liang and Zeger [37] only consider large sample setting (i). Most of the con-
ditions Xie and Yang [35] derived for consistency and asymptotic normality of GEE
estimator parallel the elegant conditions presented by Fahrmeir and Kaufmann [11].
They also ignore the nuisance parameter ` in their equations (2.10) and (2.11),
following (2.4) and (2.5) of Fahrmeir and Kaufmann [11]. Also for simplicity, they
do not study the efiect of estimating the nuisance parameter fi that appears in the
working correlation matrix Ri(fi).
In addition to regularity conditions on the parameter space, link function
and covariates, Xie and Yang [?] propose the following conditions for asymptotic
behaviors of GEE estimation:
(Iw) ?min(Fnm) !1:
(Lw) There exists a constant c0 > 0, for any r > 0 such that
P ?DTnm(fl)M?1nmDnm(fl) ? c0Fnm and
Dnm(fl) is nonsingular ;for all fl 2 Bnm(r)) ! 1
where Bnm(r) = ffl : kM?12nmHnm(fl?fl0)k? rg:
(I?w) (?nm)?1?min(Hnm) !1; where ?nm = max1?i?mf?max(R?1i (fi)Ri)g:
(L?w) There exists a constant c0, for any ? > 0 and r > 0, such that
P (Dnm(fl) ? c0Hnm and
Dnm(fl) is nonsingular; for fl 2 B?nm(r)) ! 1
18
where B?nm(r) = ffl : kH12nm(fl?fl0)k? (?nm)12rg:
(CC) For any given r > 0 and ? > 0
P
?
sup
fl2B?nm(r)
kH?12nmDnm(fl)H?12nm ?Ik < ?
!
! 1:
The matrix norm is the Euclidean matrix norm and their main theorems are the
following:
Theorem 2.3.1. Under conditions (Iw) and (Lw), there exists a sequence of random
variables ^flnm, such that
P(gnm(^flnm) = 0) ! 1
and
^flnm ! fl0 in probability:
Theorem 2.3.2. The results of Theorem 2.3.1 hold if (Iw) and (Lw) are replaced
by (I?w) and (L?w), respectively.
Theorem 2.3.3. Suppose that conditions (Iw), (Lw) and (CC) hold, or that condi-
tions (I?w) and (CC) hold. Then there exists a sequence of solutions ^flnm to the GEE
equation in B?nm(r) such that M?12nmHnm(^flnm?fl0) and M?12nmgnm are asymptotically
identically distributed.
Fort > 0, let?(t)beapositivenondecreasingfunctionsuchthatlimt!1?(t) =
1 and t?(t) is a convex function. Xie and Yang [35] use ?(t) to establish Lindeberg
conditions. Examples include ?(t) = t1=?, ? > 0 and ?(t) = exp(t).
19
Lemma 2.3.4. Under the GEE setting, suppose there exist a constant K (indepen-
dent of n) and an integer n0 such that, for j = 1;2;:::;ni and i = 1;2;:::;m, when
n > n0
E[y?2ij ?(y?2ij )] ? K
where y?i = (y?i1;:::;y?imi)T = A?12i (yi ??i). In addition, for any " > 0,
cnm~?nmn
"
?
?
"2
cnm~?nmn (D)nm
!#?1
! 0
Then, when m !1, we have
M?12nmgnm ! N(0;I) in distribution
where cnm = ?max(M?1nmHnm),
 (D)nm = max
1?i?m
?max(H?12nmDTi V?1i DiH?12nm):
Strong consistency of the GEE estimator can be proven under the following
condition.
(Ls) In a neighborhood of fl0, say N, there exists a constant c0 > 0 (independent
of m) and ? > 0 such that when m !1.
?min(Dnm(fl)TM?1nmDnm(fl)) ? c0(logm)2(1+?)
and Dnm(fl) is nonsingular a:s: for fl 2 N.
Theorem 2.3.5. Suppose gni;i, i = 1;:::;m, the summands of the GEE score
function gnm, form a inflnitesimal double array sequence. Under condition (Ls),
there exist a sequence of random variables ^flnm and a random number n0, such that
P(gnm(^flnm) = 0; for all n ? n0) = 1
20
and when m !1
^flnm ! fl0 a:s:
Also they stated that when m is bounded or n goes to inflnity too fast, as-
ymptotic normality of gnm or ^flnm does not hold without specifying the dependence
structure on each subject. They even point out that if we put Ri = I, gnm(fl) is
asymptotically normally distributed if and only if n=m ! 0. Also for settings (i)
and (iii) when n is bounded above or tends to inflnity at a limited rate as m !1,
they present a set of su?cient conditions to ensure asymptotic normality of gnm and
^flnm.
They verify these general conditions for some cases of practical importance,
such as marginal GLM with compact covariate set, marginal Poisson regression
model and marginal GLMs with bounded responses (binomial or polytomous re-
gression models).
A drawback to the GEE approach is that it assumes all subjects have the same
covariance structure. The \working" correlation matrix is not necessarily an essen-
tial feature of GEE. Jiang [16] proposed a nonparametric quasi-likelihood approach
for getting a nonparametric estimator of the unknown covariance matrices. Chiou
and M?uller [7] proposed Estimated Estimating Equations (EEE) whose covariance
structure is modeled nonparametrically as a function of the mean and therefore is
an essential component that is part of the model fltting.
The difierence between the approaches of Jiang [16] and Chiou and M?uller [7]
is that the mean function is correctly specifled in Jiang [16] but unknown in Chiou
21
and M?uller [7]. It has been shown by Jiang [16] that the estimator obtained from
the nonparametric quasi-likelihood approach has an asymptotic normal distribution,
the same as the estimator obtained from quasi-likelihood approach with true covari-
ance matrix. Moreover, the rate of convergence has been established. Chiou and
M?uller [7] gave a sketchy proof on consistency and asymptotic normality of their
EEE estimator along the lines of McCullagh and Nelder [19].
2.4 Generalized Linear Mixed Models
Generalized Linear Mixed Models (GLMM?s) are a natural extension of GLM
by including random efiects. It is usually assumed that the random efiects have
a multivariate normal distribution whose variance components are to be estimated
from the data. In Section 2.4.1 the deflnition of GLMM is presented and some
asymptotic results are discussed in Section 2.4.2.
2.4.1 Deflnitions
The structure of GLMM?s (McCulloch and Searle [22]) is the following:
The response vector y is typically, but not necessarily, assumed to consist of
conditionally independent elements, each with a distribution with density from the
exponential family:
yijfi ? independent; fYijfi(yijfi)
fYijfi(yijfi) = expf[yi i ?b( i)]=a(`)+c(yi;`)g:
22
E[yijfi] = ?i
g(?i) = xtifl +ztifi = ?i:
Here, g(?) is a known function, called the link function, xti is the ith row of the
model matrix for the flxed efiects, and fl is the flxed efiects parameter vector. To
that speciflcation we have added zti, which is the ith row of the model matrix for
the random efiects, and fi, the random efiects vector. To complete the speciflcation
we assign a distribution to the random efiects:
fi ? ffi(fijD)
E(fi) = 0
where D represents the parameters governing the distribution of fi. Often f(fijD)
is multivariate normal and D is the covariance matrix.
In GLMM, the canonical parameter  i is related to the covariate by  i =  (?i).
When  i = ?i, the link is said to be the canonical link.
2.4.2 Estimation approaches
According to Jiang [14], the GLMM setup is divided into two cases: the case
where there is enough information about the random efiects and the case where
there is not. The flrst case is characterized by m=N ! 0 (case 1), while the second
by m=N 90 (case 2), where m is the dimension of the random efiects and N is the
sample size.
In case 1 when the dimension of flxed efiects is flxed, Sartori and Severini [28]
extend Davison?s [9] conditional likelihood approach for GLMs to GLMMs. The
23
Conditional Maximum Likelihood Estimate (CMLE) is deflned by Andersen [1] in
the following way:
To describe the situation Neyman and Scott [24] introduced the concept of
structural and incidental parameters as follows. Consider a sequence of independent
random variables X1;X2;X3;::: The distribution of Xi depends on the parameters
fl and ?i, where the value of fl is the same, independent of i, while the value of
?i changes with i. Then fl is called a structural parameter and the ??s incidental
parameters.
Andersen [1] discussed a general method for obtaining consistent estimates for
a structural parametersfl of flxed dimension in the presence of an increasing number
of incidental parameters. He eliminated the incidental parameters by considering
the conditional distribution given minimal su?cient statistics for the ??s. The value
of fl that maximizes this conditional distribution is then called the Conditional
Maximum Likelihood Estimate (CMLE) for fl.
Sartori and Severini [28] showed that the conditional likelihood function is
valid for any distribution of the random efiects. Hence, the inferences about the
flxed efiects are insensitive to misspeciflcation of the random efiect distribution.
Furthermore, Andersen [1] showed that the convergence of the normalized ^fl to a
normal distribution holds given the random efiects. Hence, the asymptotic normality
of ^fl is valid for any random efiect distribution.
Li, Lindsay and Waterman [18] considered a rectangular array asymptotic
embedding for multistratum datasets, in which both the number of strata and the
number of within-stratum replications increase, and at the same rate. They pointed
24
that under this embedding the MLE is consistent but may not be e?cient owing to a
non-zero mean in its asymptotic normal distribution. By using a projection operator
on the score function, an adjusted MLE can be obtained that is asymptotically
unbiased and has a variance that attains the Cramer-Rao lower bound. The adjusted
MLE can be viewed as an approximation to the conditional MLE.
In case 2 with flxed dimension of flxed efiects, Sinha [30] develops a technique
for flnding a Robust Maximum Likelihood (RML) estimate of the model parame-
ters in GLMM?s by using Huber?s ? function and the Mahalanobis distance, which
appears to be useful in downweighting the in uential data points when estimating
the parameter. The asymptotic properties of RMLE are investigated under regu-
larity conditions. Sinha [30] also proposed a Robust Monte Carlo Newton-Raphson
(RMCNR) algorithm for fltting GLMM?s to avoid the computational problems in-
volving high-dimensional integrals. RMCNR can be considered as a modiflcation of
the Monte Carlo Newton-Raphson (MCNR) method of McCulloch [21].
Breslow and Clayton [8] considered two closely related approximate methods
of inference in GLMM?s: Penalized Quasi-likelihood (PQL), which is based on in-
tegrated quasi-likelihood for integral approximation, and Marginal Quasi-likelihood
(MQL). The major difierence between those two is that PQL hasE(yjfi) = h(XTfl+
ZTfi) in which fi is a vector of random efiects and MQL only has E(y) = h(XTfl).
Lee and Nelder [17] developed a joint likelihood, called h-likelihood, for Hier-
archical Generalized Linear Models (HGLMs) which allows extra error components
in the linear predictors of GLMM?s. In order to get a marginal likelihood, one
has to integrate out the random efiects from the joint likelihood. However, this
25
integration is often quite intractable and at the same time it makes random efiects
nonestimable. By contrast the h-likelihood is easily available and avoids the need for
burdensome integration. Under appropriate conditions, Lee and Nelder [17] showed
that a random efiect MHLE (Maximum h-likelihood estimator) is an asymptotically
best unbiased predictor and that a flxed efiect MHLE is asymptotically e?cient as
the marginal MLE. With the h-likelihood, the scaled deviance test and test statistics
for flxed and random efiects ofier a simple unifled framework of analysis. They also
proposed an extended quasi-h-likelihood and several algorithms.
2.4.3 Random intercept model (canonical link)
The penalized methods proposed by Jiang [14] are discussed in detail in Section
2.5 and we have the following important case of GLMM.
Jiang [14] considers a special case of GLMM in which the responses are clus-
tered into groups with each group associated with a single random efiect (possibly
vector valued). Suppose that given unobservable random vectors fi1 ;:::; fim satis-
fying E(fii) = 0 the responses yij, 1 ? i ? m, 1 ? j ? ni, (ni ? 1) are independent
with E(yijjfi) = b0ij(?ij), where bij(?) is difierentiable. Write,
?ij = a+xtijfl +ztifii
where a is an unknown intercept, fl = (flk)1?k?s (s is flxed) is an unknown vector
of regression coe?cients, and xij = (xijk)1?k?s and zi are known vectors. Such
models are useful, for example, in the context of small-area estimation in which
26
fii represents a random efiect associated with the ith selected area. Here we are
interested in the estimation of the flxed efiect a, flk, 1 ? k ? s, and the \area-
speciflc" random efiects vi = ztifii, 1 ? i ? m. Therefore, we may assume that in
the above model ?ij has the following expression:
?ij = a+xtijfl +vi
where v1;:::;vm are random variables with E(vi) = 0. Note that here we regard
ai = a+vi as a random intercept.
Logistic 2?2?m table is an important example in this case and it is modeled
as logitP(yij = 1jfii) = fii + xijfl where fii is the random efiect, fl is the common
log odds ratio and xij=0 or 1.
2.5 Generalized GLMM (canonical link function case)
In order to apply GLMM, one has to know the distribution of random efiects.
In fact, in many problems little is known about the distribution of random efiects.
Therefore, it is of practical interest to develop methods and models that do not
require strong distributional assumptions. In Section 2.5.1 the deflnition of gener-
alized GLMM is given. Jiang [14] proposed methods called Penalized Generalized
Weighted Least Squares (PGWLS) and Maximum Conditional Likelihood Estimate
(MCLE) which are discussed in Sections 2.5.2 and 2.5.3 respectively. Jiang?s [14]
consistency results are summarized in Section 2.5.4.
27
2.5.1 Deflnitions
Jiang [14] generalized the deflnition of GLMM?s without assuming distribu-
tions of random efiects and yi by conditioning on random efiects in the following
way.
Suppose that, given a vector fi = (fik)1?k?m of unobservable random variables
(the random efiects) satisfying
E(fi) = 0; (2.14)
the responses y1;:::;yN are independent with conditional expectation
E(yijfi) = b0i(?i) (2.15)
where bi(?) is a difierentiable function. Let suppose
?i = xtifl +ztifi (2.16)
where fl = (flj)1?j?p is a vector of unknown constants (the flxed efiects), and xi =
(xij)1?j?p, zi = (zik)1?k?m are known vectors, 1 ? i ? N. In vector notation,
? = Xfl +Zfi.
2.5.2 Penalized Generalized Weighted Least Squares in Case 1
The method of maximum likelihood is widely used for analyzing GLMMs.
A full maximum likelihood analysis requires numerical integration techniques to
calculate the log-likelihood and also the distribution of random efiects needs to be
known. Jiang [14] proposed a method of inference which in many ways resembles
28
the method of Least Squares (LS) in linear models and relies on weak distributional
assumptions about random efiects.
Assume without loss of generality that rank (X) = p and no column of Z is
0. In linear models (LMs), which correspond to (2.15) and (2.16) with bi(?i) = ?2i=2
and m = 0 (i.e., there are no random efiects), a well-known method is weighted least
squares (WLS), which deflnes the estimate of fl as the minimizer of
NX
i=1
wi(yi ??i)2; (2.17)
where wi, 1 ? i ? N, are weights, or equivalently, the maximizer of
NX
i=1
wi
 
yi?i ? ?
2
i
2
?
: (2.18)
A straightforward generalization of this method to the case of GLMM would suggest
the maximizer of the following function as the estimates of fl and fi:
NX
i=1
wi(yi?i ?bi(?i)): (2.19)
However, conditionally, the individual flxed and random efiects may not be identifl-
able. In LM there are two remedies when the identiflability problem arises, namely,
reparameterization and constraints. We shall, for now, focus on the latter. A set of
linear constraints on fi may be expressed as Pfi = 0 for some matrix P. By La-
grange?s method of multipliers, maximizing (2.19) subject to Pfi = 0 is equivalent
to maximizing
NX
i=1
wi(yi?i ?bi(?i))? ?12 kPfik2 (2.20)
without constraint, where ?1 is an additional variable. On the other hand, for flxed
?1 the last term in (2.20) may be regarded as a penalizer. The only thing that
29
needs to be specifled is the matrix P. For any matrix M and vector space V, let
B(V) = fB : B is a matrix whose columns constitute a base for Vg; N(M) = the
null-space of M = fv : Mv = 0g; PM = M(MtM)?Mt, and PM? = I?PM. Let
A 2 B(N(PX?Z)) so that PX?ZA = 0. We deflne the penalized generalized WLS
(PGWLS) estimate of  = (fl;fi) as the maximizer of
lP( ) =
NX
i=1
wi(yi?i ?bi(?i))? ?12 kPAfik2: (2.21)
where ?1 is a positive constant. The notation lP is used because (2.21) may also be
viewed as a penalized conditional quasi-log-likelihood.
Consider the expression (2.21). The reason that one needs a penalizer here is
because the flrst term, lC( ) = PNi=1 wi(yi?i ?bi(?i)), depends on  = (fl;fi) only
through ?. However,  can not be identifled by ?, so there may be many vectors  
for which ? = Xfl +Zfi is the same. The idea is therefore to consider a restricted
space S = f : PAfi = 0g such that within this subspace,  is uniquely determined
by ?.
Case 1 Consistency Results.
Jiang [14] uses the following notations:
Let B = (bij)1?i?k; 1?j?l be a matrix, v = (vi)1?i?k a vector and V a vector
space. Deflnekvk = max1?i?k jvij; kBk = ?12max(BtB),kBkR = (tr(BtB))12,kBk1 =
max1?i?kPlj=1jbijj; BV = fBv : v 2 Vg, ?min(B)jV = infv2Vnf0g(vtBv=vtv).
Jiang [14] presents consistency results for case 1 in the following Theorem 2.5.2
and 2.5.4 and Corollary 2.5.3.
Theorem 2.5.1. Let b00i (?) be continuous, let max1?i?Nfw2iEvar(yijfi0)g be bounded
30
and let
1
N
?
(max
1?u?p
jXuj2)k(XtX)?1XtZk2 +( max
1?k?m
jZkj2)
?
jPAfi0j2 !p 0: (2.22)
Let cN;dN > 0 be any sequences such that limsupkfl0k=cN < 1 and P(kfi0k=dN <
1) ! 1, Mi ? cN Ppu=1jxiuj+dN Pmk=1jzikj, 1 ? i ? N and ^ = (^fl; ^fi) be the
maximizer of lP over ?(M) = f : j?ij? Mi;1 ? i ? Ng. Then
1
N
? pX
u=1
jXuj2(^flu ?fl0u)2 +
mX
k=1
jZkj2(^fik ?fi0k)2
!
!p 0 (2.23)
provided that
p+m
N = o(!
2) (2.24)
where
! = ?min(W?1HW?1)jWS min
1?i?N
fwi inf
jhj?Mi
b00i (h)g
with
W = diag(jX1j;:::;jXpj;jZ1j;:::;jZmj):
Corollary 2.5.2. Let the conditions of Theorem 2.5.2 [including (2.24)] hold.
(i) Suppose p is flxed, and
liminf ?min(XtX)=N > 0 (2.25)
Then ^fl !p fl0.
(ii) Suppose Z = (Z(1)???Z(q)) and correspondingly, fi = (fi1;:::;fiq), where
fiu = (fiuv)1?v?mu, and each Z(u) is a standard design matrix in the same sense as
for U deflned below Lemma 2.5.1, 1 ? u ? q. Let Zuv be the vth column of Z(u) and
nuv = jZuvj2 = the number of appearances of the vth component of fiu.
31
Then
?m
uX
v=1
nuv
!?1 m
uX
v=1
nuv(^fiuv ?fi0uv)2 !p 0; 1 ? u ? q; (2.26)
where ^fiuv and fi0uv, 1 ? v ? mu, 1 ? u ? q are the corresponding components of ^fi
and fi0, respectively.
For the special case of GLMM described in Section 2.4.3 as a random intercept
model, the following theorem presents the consistency results:
Theorem 2.5.3. Let b00ij(?) be continuous; let w2ijEvar(yijjv0), jxijj be bounded, let
liminf(?N=N) > 0 where ?N = ?min(Pmi=1Pnij=1(xij ? xi)(xij ? xi)t) with xi =
n?1i Pnij=1 xij. and let v0 !p 0. Let cN, dN > 0 be such that limsupja0j_jfl0j=cN < 1
and P(kv0k=dN < 1) ! 1, Mij ? cN(1+jxijj)+dN and ^ = (^a; ^fl;^v) be the maxi-
mizer of lP over ?(M) = f : j?ijj ? Mij;all i;jg and ?N = minij infjhj?Mij b00(h).
Then ^fl !p fl0 and
1
N
mX
i=1
ni(^ai ?a0i)2 !p 0; (2.27)
where ^ai = ^a + ^vi and a0i = a0 + v0i, provided that m=N = o(?2N). If the latter is
strengthened to (min1?i?m ni)?1 = o(?2N), then, in addition, ^a !p a0, and
1
N
mX
i=1
ni(^vi ?v0i)2 !p 0; 1m
mX
i=1
(^vi ?v0i)2 !p 0: (2.28)
Note. It can be shown, by simple example, that ^fi !p fi0 and (2.28) may not hold
without min1?i?m ni !1, even if m=N ! 0.
32
2.5.3 Maximum Conditional Likelihood Estimates in Case 2
In case 2 since m=N 90, we do not have enough information to estimate all
random efiects. Jiang [14] states that it is often possible to estimate with adequacy a
subset of the random efiects, and the ones which are not estimable will be integrated
out. As in case 1, conditionally, the individual efiects may not be identiflable. A
basic technique here is reparameterization which is a map from (fl;fi) to (~fl; ~fi).
Jiang?s [14] reparameterization is stated in the following lemma:
Lemma 2.5.4. There is a map fl 7! ~fl, fi 7! ~fi such that
(i) Xfl+Zfi = ~X~fl+~Z~fi, where (~X; ~Z) is a known matrix of full column rank.
(ii) ~zi = ~z?j, i 2 Sj for some known vector ~z?j, where ~ztj is the ith row of ~Z
and where Sj is deflned below.
Let U be a standard design matrix in the sense that it consists of 0?s and 1?s
and there is exactly one 1 in each row and at least one 1 in each column. The vector
uti is the ith row of U and eM;j is the M-dimensional vector whose jth component
is 1 and whose other components are 0. Let Sj = f1 ? i ? N : ui = eM;jg,
and y(j) = (yi)i2Sj, 1 ? j ? M. Suppose that ?1;:::;?M are independent with
common distribution ?(?=?)=?, where ?(?) is a known density function and ? > 0
is an unknown scale parameter. Furthermore, we assume that there are no random
efiects nested within ?. In notation, this means that zi = z?j, i 2 Sj, 1 ? j ? M,
where z?j = (z?jk)1?k?l.
Let ? = (~fl;?), ? = (~fi;?). Then we have
f(yijfi;?) = f(yij?i); 1 ? i ? N (2.29)
33
where f(?2j?1) denotes the conditional density of ?2 given ?1.
By Lemma 2.5.1, we have
? = ~X~fl + ~Z~fi+U?:
Let ? = (~fl;?), ? = (~fi;?). By (2.29) we have
f(yj?) =
MY
j=1
f(y(j)j?)
and it is easy to show that f(y(j)j?) = gj(zt?j ~fi; ~fl;?), where
gj(s) = E
2
4Y
i2Sj
f(yijs1 +xtis(2) +sr+2?)
3
5;
s(2) = (s2;:::;sr+1) and r is the dimension of ~fl. Note that r ? p. Let n be the
dimension of ~fi, hj(s) = loggj(s), lC(?) = logf(yj?) and lC;j(?) = logf(y(j)j?) =
hj(zt?j ~fi; ~fl; ?). Then
lc(?) =
MX
j=1
lc;j(?)
Let Z? be the matrix whose jth row is zt?j, 1 ? j ? M. Let ?0 and ?0 be the
vectors corresponding to the true parameters and realization of random efiects.
Case 2 Consistency Results.
Under some regularity conditions on derivatives, smoothness, integrability of
densities f(y(j)j?), boundness of hj(s)?s second, third derivatives, etc., Jiang [14]
proved consistency results for estimates of reparameterized parameters (~fl; ~fi) but
not for the original (fl;fi), and one may never recover a consistency result for the
original parameter (fl;fi) from the reparameterized (~fl; ~fi):
Jiang [14] verifles the general conditions for consistency in three examples: for
case 1, the two way crossed logistic model logit(P(yij = 1ja;b)) = ? + ai + bj, the
34
other one is random intercept logistic regression model are analyzed. For case 2 the
only example is two stage nested logistical model logit(P(yijk = 1ja;b)) = ?+ai+bij.
35
Chapter 3
Case 1 random intercept model results.
This Chapter proposes a new estimator Pmi=1 ^wi^fli and establishes its condi-
tional asymptotic normality in the random intercept problem in case 1. Here ^fli
is the maximum likelihood estimator based on the ith group and ^wi is a matrix
weight proportional to the inverse estimated covariance of ^fli. Also this Chapter
extends the consistency results of Jiang [14] to establishing asymptotic normality of
the penalized likelihood estimators for flxed intercept in the case 1 random intercept
problem when there are no regression coe?cients. Furthermore, this Chapter dis-
cusses the di?culties of establishing asymptotic normality of Jiang?s [14] penalized
likelihood estimators in general.
Throughout this chapter we consider the canonical GLMM with
f(yijjvi) = exp[?ijyij ?b(?ij)+c(yij)]
and ?ij = a+vi +xtifl = ai +xtifl.
Assumptions:
(i) vi0 are iid, E(vi0) = 0, Var(vi0) =  2v where  2v is a constant. 9M such that
P[jvi0j? M] = 1.
(ii) 9K0 such that kxijk? K0 for all i, j.
36
3.1 Case 1 random intercept (combine a and vi)
Recalling the Estimating Equation (2.3), for the ith group in the random
intercept model we have the loglikelihood function :
ln( i) =
niX
j=1
(yij?ij ?b(?ij)?C(yij)): (3.1)
where  i = (ai;flti)t. Take Taylor expansion for partial derivative of ln( i) around
 i0:
Ur(^ i) = @ln( ri)@ 
i
(^ i) = Ur( i0)+(rUr( ?i))(^ i ? i0)
= Ur( i0)+(?Fni)(^ i ? i0)+[Fni ?Fni( ?)](^ i ? i0) (3.2)
where  ?i is between ^ i and  i0 and ?Fni( ?) = rUr( ?i).
Because the random efiects vi0 are independently and identically distributed
with mean 0 and bounded and kxijk are uniformly bounded, for all i, j, b0ij is
bounded and b00ij is bounded below and above by positive constants bl and bu, Fur-
thermore, the consistency of ^ i implies that the b(3)ij (~?ij) are bounded with high
probability where ~?ij is between ^?ij and ?ij0 .
Deflne
(Fflfli )?1 = Wbi = (Wbikl)1?kl?s ; 1?l?s
where
Wbikl =
X
j
b00ij
?
xijk ?
P
j xijkb
00
ijP
j b
00
ij
!?
xijl ?
P
j xijlb
00
ijP
j b
00
ij
!
: (3.3)
37
Lemma 3.1.1. Let liminf(?ni=ni) > 0, where ?ni = ?min(Pj(xij ?xi)(xij ?xi)T)
with xi = n?1i Pnij=1xij. Then there exists a positive constant c2 such that for any
h > 0,
inf
kuk=1
uTWbiu
ni >
1
c2: (3.4)
for all ni su?ciently large.
By considering each group separately, we construct a new estimator ^^flw =
Pm
i=1 ^wi^fli. We can treat each group as a conditional GLM, as in Fahrmeir and
Kaufmann [11]. We make the following assumptions: First we assume that (Zi;Xi)
is a full-column rank matrix and deflne a sequence Nni(?), ? > 0, of neighborhoods
of  i0 (the true value of  i) by
Nni(?) = f i : kFT=2ni ( i ? i0)k? ?g
where ni = 1;2;??? and FT=2ni any right square root of the positive deflnite ma-
trix Fni; that is, F1=2ni FT=2ni = Fni. Conditions (D), (C) and (N) of Fahrmeir and
Kaufmann [11] in our setting become:
(1) Divergence: ?minFni !1.
(2) Boundedness from below: for all ? > 0, Fni( i)?cFni is positive semideflnite,
for all i 2 Nni(?), ni ? n1 with some constants n1 = n1(?), c > 0 independent
of ?.
(3) Convergence and continuity: for all ? > 0, max i2Nni(?)kVni( i) ? Ik ! 0,
where Vni( i) = F?1=2ni Fni( i)F?T=2ni is the normed information matrix.
38
Let (^ai; ^fli) be the solution of the score equation in the ith group.
Then, given vi0,
(Fni)T=2
0
BB
@
^ai ?ai0
^fli ?fli0
1
CC
A!d N(0;I): (3.5)
Informally, given vi0,
0
BB
@
^ai
^fli
1
CC
A? AN
0
BB
@
0
BB
@
ai0
fl0
1
CC
A;F?1ni
1
CC
A; (3.6)
where
F?1ni =
0
BB
@
ZTi DiZi ZTi DiXi
XTi DiZi XTi DiXi
1
CC
A
?1
=
0
BB
@
Faai Fafli
Fflai Fflfli
1
CC
A: (3.7)
Here
Fflfli =
?
XTi DiXi ?XTi DiZi?ZTi DiZi??1ZTi DiXi
??1
and
Faai =
?
ZTi DiZi ?ZTi DiXi?XTi DiXi??1XTi DiZi
??1
where Zi = 1ni and Di = diag(b00ij)1?j?ni.
Wechoose ^wi = (Pi(^Fflfli )?1)?1(^Fflfli )?1 and ^^flw = Pi(Pi(^Fflfli )?1)?1(^Fflfli )?1^fli.
Note that a scalar version of ^^flw was proposed by Woolf (1955) to estimate the com-
mon log odds ratio in a 2?2?m contingency table.
Theorem3.1.2. Suppose that Assumptions (i) and (ii) hold and for each i, liminf[n?1i ?ni] >
0, ni(Fni)?1 !p F0i, P(infi ?min(F0i) > ? > 0) > 1?h, P(supi ?max(F0i) < M <
1) > 1?h, where F0i is a positive deflnite matrix and that the following asymptotic
relations are true:
39
(1) There exist positive numbers K1 < K2 such that
K1 < min
i
?min(Fflfli )=max
i
?max(Fflfli ) < K2;
(2)
m=min
i
ni = o(1); max
i
ni=min
i
ni = O(1);
(3)
max
i
??
max(Fni)?1=?min(Fni)?1
? = O(1); X
i
k(Fni)?1k = o(1);
(4)
max
i
E
h
(^fl?i)t^fl?i?((^fl?i)t^fl?i)jvi0
i
= Op(1);
where ^fl?i = (Fflfli )?12(^fli?E(^flijvi0)) and ?(t) is a positive nondecreasing func-
tion on [0;1) such that limt!1?(t) = 1 and t?(t) is a convex function.
Then, given v0,
? mX
i=1
(Fflfli )?1
!?1
2 mX
i=1
(Fflfli )?1(^fli ?fl0) !d N(0;I): (3.8)
Note that Fflfli depends on the unknown (ai;fl) through Di.
Let ^Di = diag(b00ij(^?ij)1?j?ni), where ^?ij = ^ai +xtij^fli, ^^Di = diag(b00ij(^^?ij)1?j?ni)
where ^^?ij = ^ai +xtij^^flw.
Deflne
^Fflfli =
 
XTi ^DiXi ?XTi ^DiZi
?
ZTi ^DiZi
??1
ZTi ^DiXi
??1
:
Then ^Fflfli estimates Fflfli and ^^flw = Pmi=1
?P
m
i=1(^F
flfl
i )
?1
??1
(^Fflfli )?1(^fli ?fl0) .
40
Deflne
^^Fflfl
i =
 
XTi ^^DiXi ?XTi ^^DiZi
?
ZTi ^^DiZi
??1
ZTi ^^DiXi
??1
:
Then ^^Fflfli estimates Fflfli .
Theorem 3.1.3. Let the hypotheses of Theorem 3.1.2 hold. Then we have
(1)  
  
  
 
? mX
i=1
(Fflfli )?1
!1
2
? mX
i=1
(^Fflfli )?1
!?1
2
?I
  
  
  = op(1);
(2)
k(^Fflfli )?1 ?(Fflfli )?1k = Op(pni);
(3)  
  
  
 
?X
i
(Fflfli )?1
!?1
2
  
  
  = Op(1=
pN); k(Fflfl
i )
1=2k = Op(1=pni):
Furthermore, given v0, we have
? mX
i=1
(^Fflfli )?1
!?1
2 mX
i=1
(^Fflfli )?1(^fli ?fl0) !d N(0;I): (3.9)
Under the same hypothesis with (1), (2) and (3) in Theorem 3.1.4 modifled by re-
placing ^Fflfli by ^^Fflfli , given v0,
? mX
i=1
(^^Fflfli )?1
!?1
2 mX
i=1
(^^Fflfli )?1(^fli ?fl0) !d N(0;I): (3.10)
According to Theorem 3.1.2 and 3.1.3, given the random efiects v0, the nor-
malized versions of ^flw and ^^flw converge in distribution to N(0;I). Since this conver-
gence holds for almost all realizations of v0, we can also claim that the convergence
in distribution holds unconditionally. This is formalized in the following Corollary.
41
Corollary 3.1.4. Under the hypotheses of Theorem 3.1.2, the convergence state-
ments (3.8), (3.9), and (3.10) hold unconditionally (except on a set of v0-probability
zero).
3.2 Penalized likelihood estimator ^a in the case 1 random intercept
model when fl=0
From Jiang [14] we can write lP( ) as follows:
lP( ) =
mX
i=1
niX
j=1
(yij?ij ?b(?ij))? ?12 mv2 (3.11)
where ?ij = a + vi and there are no regression coe?cient. Here the penalty term
(?1=2)mv2 is used to take care of the identiflability problem and we can estimate a
and vi separately.
We will write
Bij = b00ij + 12b(3)ij (??ij)(^?ij ??ij0) = b00ij +rij (3.12)
where
rij = 12b(3)ij (??ij)(^?ij ??ij0) (3.13)
and
~B = diag(Bij)1?i?m ; 1?j?n
i: (3.14)
Taylor expansion of @lP( )=@ around  0 takes the following matrix form:
0
BB
@
^a?a0
^v?v0
1
CC
A = (Q?)?1
0
BB
@
1TN(y??)
ZT(y??)??1v01tm
1
CC
A (3.15)
42
where
Q? =
0
BB
@
1TN ~B1N 1TN ~BZ
ZT ~B1N ZT ~BZ+?1=mJ
1
CC
A =
0
BB
@
Q?11 Q?12
Q?21 Q?22
1
CC
A: (3.16)
Let Znew = (1N;Z) and note that Q? is a nonsingular matrix since
Q? = eAteA+ eHteH:
where
eA = ~B12Znew eH = (0
p?
1p
m1
t
m)
Here eA is an N?(m+1) matrix with rank m, since Znew is not of full column
rank matrix, and eH is a 1 ? (m+1) matrix with rank 1. However, eA and eH are
complementary matrices since Sp(eAt) \ Sp(eHt) = f0g, we have Q? is a full-rank
matrix with rank m+1.
Then
(Q?)?1 =
0
BB
@
(Q?11:2)?1 ?(Q?11:2)?1Q?12(Q?22)?1
?(Q?22)?1Q?21(Q?11:2)?1 (Q?22:1)?1
1
CC
A: (3.17)
Since we?re only interested in the ^a terms,
(^a?a0) = (Q?11:2)?1?1TN(y??)?Q?12(Q?22)?1ZT(y??)?
+(Q?11:2)?1Q?12(Q?22)?1?1v01m: (3.18)
Here we choose pm as normalizer. Then
pm(^a?a
0) =
pm(Q?
11:2)
?1?1T
N(y??)?Q
?
12(Q
?
22)
?1ZT(y??)?
+pm(Q?11:2)?1Q?12(Q?22)?1?1v01m: (3.19)
43
Theorem 3.2.1. We assume the following conditions, which are equivalent to the
hypotheses of Theorem 2.5.4 (Jiang [14], Theorem 2.2).
(J1) E(Var(yijjv0)) and Pk x2ijk are bounded.
(J2) Let ?N = ?min(PiPj(xij ?xi)(xij ?xi)t) with xi = n?1i Pnij=1xij and ?N =
mini;j infjhj?Mij b00ij(h), liminf(?N=N) > 0 , m=N = o(?2N), (min1?i?m ni)?1 =
o(?2N) and v0 !p 0.
(J3) Let cN; dN > 0 be such that limsup(ja0j_jfl0j)=cN < 1 and P(kv0k=dN <
1) ! 1, Mij ? cN(1+(Pk x2ijk)12)+dN and let ^ = (^a; ^fl;^v) be the maximizer
of lP over ?(M) = f : j?ijj? Mij; all i;jg.
Then
pm(^a?a
0) !d N(0; 2v):
3.3 Discussion of Jiang [14] penalized likelihood estimator ^fl in case
1 random intercept model (fl 6= 0)
According to Jiang [14], for the random intercept model the penalized loglike-
lihood is stated lP( ) as equation (??) in Section 2.5.2. Here for simplicity, we let
wij = 1 so that lP( ) becomes the following:
lP( ) =
mX
i=1
niX
j=1
(yij?ij ?b(?ij))? ?12 mv2: (3.20)
By Theorem 2.5.4 in Section 2.5.4 (Jiang [14], Theorem 2.2), ^ is the maximizer of
lP( ) over ?(M). The penalizer here takes care of the identiflability problem and
44
when combining a with vi, (X;Z) is a full-column rank matrix we do not need a
penalizer term.
Based on Jiang?s [14] consistency results (^a !p a0; ^fl !p fl0; (1=m)Pmi=1(^vi?
vi0)2 !p; (1=N)Pmi=1 ni(^vi ?vi0)2 !p 0) and (J1), (J2) and (J3) in Section 3.2, we
tried to establish conditional asymptotic normality for (^fl ? fl0), given v0, under
certain conditions. Taylor expansion was used to solve lp( ) to write the estimators
as 0
BB
@
^fl?fl0
^a?a0
1
CC
A = Q?1gN (3.21)
where
gN =
0
BB
@
XT(y??)
ZT(y??)
1
CC
A
and
Q =
0
BB
@
XT ~BX XT ~BZ
ZT ~BX ZT ~BZ
1
CC
A =
0
BB
@
Q11 Q12
Q21 Q22
1
CC
A: (3.22)
Let
Q0 =
0
BB
@
XTDX XTDZ
ZTDX ZTDZ
1
CC
A =
0
BB
@
Q110 Q120
Q210 Q220
1
CC
A: (3.23)
We will write
Bij = b00ij + 12b(3)(??ij)(^?ij ??ij0) (3.24)
and
~B = diag(Bij)1?i?m ; 1?j?n
i D = diag(b
00
ij(?ij0))1?i?m; 1?j?ni (3.25)
Recall that Z is a block diagonal matrix, Z = diag(1ni)1?i?m, and X is (X1;:::;Xs)
without the flrst column 1N, after combining flxed intercept a with the random
45
efiects vi to obtain ai = a+vi. Note that Q is a nonsingular matrix since (X;Z) is
a full column rank matrix (not all elements of X can be equal).
Then
Q?1 =
0
BB
@
Q?111:2 ?Q?111:2Q12Q?122
?Q?122 Q21Q?111:2 Q?122:1
1
CC
A (3.26)
where
Q11:2 = Q11 ?Q12Q?122 Q21
Q22:1 = Q22 ?Q21Q?111 Q12:
Since we are only interested in the fl terms, we flnd that
^fl?fl0 = Q?111:2'XT(y??)?Q12(Q22)?1ZT(y??)?: (3.27)
Let
W? = XTDX?(XTDZ)(ZTDZ)?1ZTDX: (3.28)
Here we choose (W?)?12Q11:20 as the normalizer. Then
(W?)?12Q11:20(^fl?fl0)
= (W?)?12Q11:20Q?111:2'XT(y??)?Q12(Q22)?1ZT(y??)?
= I +II +III (3.29)
where
I = (W?)?12 'XT(y??)?Q120(Q220)?1ZT(y??)?; (3.30)
II = (W?)?12 ?Q11:20Q?111:2 ?I?XT(y??); (3.31)
III = (W?)?12 'Q120Q?1220 ?Q11:20Q?111:2Q12Q?122 ?Zt(y??): (3.32)
46
We can prove that under regularity conditions, given v0, I !d N(0;I) and II !p 0.
Consider III, We can write III as
III = (W?)?12 'I?Q11:20Q?111:2?Q120Q?1220Zt(y??) (3.33)
+(W?)?12Q11:20Q?111:2'Q120Q?1220 ?Q12Q?122 ?Zt(y??): (3.34)
we can also prove that k(W?)?12kpN = Op(1), kI?Q11:20Q?111:2k = op(1) and the
norm of (3.33) goes to 0. where
Q120Q?1220Zt(y??) =
?X
i
(P
j xijkb
00
ij
P
j(yij ?b
0
ij)P
j b
00
ij
)!
1?k?s
and we have
?Q
120Q?1220 ?Q12Q?122
?Zt(y??)
=
?X
i
(P
j xijkb
00
ijP
j b
00
ij
?
P
j xijkb
00
ij(?
?
ij)P
j b
00
ij(?
?
ij)
)X
j
(yij ?b0ij)
!
1?k?s
Consider the norm of (3.34)
k(W?)?12Q11:20Q?111:2'Q120Q?1220 ?Q12Q?122 ?Zt(y??)k
?kQ11:20Q?111:2kk(W?)?1=2kk'Q120Q?1220 ?Q12Q?122 kZt(y??)k
? Op(1)pN k'Q120Q?1220 ?Q12Q?122 ?Zt(y??)k
= Op(1)
0
@X
k
( mX
i=1
ni
(P
j xijkb
00
ijP
j b
00
ij
?
P
j xijkb
00
ij(?
?
ij)P
j b
00
ij(?
?
ij)
)
1
ni
X
j
(yij ?b0ij)
)21
A
1
2
(3.35)
We conjecture 1=niPj(yij ?b0ij) = Op(1=pni) and that ^?ij ??ij0 = Op(1=pni). If
so, inside of square of (3.35) would be Op(m=pN) ? Op(pm=mini ni). However,
we are unable to calculate the convergence rate of (^?ij ??ij0) because the number
47
of random efiects goes to inflnity. Li, Lindsay and Waterman [18] examined the
asymptotic behavior of MLE?s of a scalar fl as the number of nuisance parameters
goes to inflnity. They found that under rectangular asymptotics (m = cn, c flxed,
m ! 1) pN(^fl1 ?fl0) !d N(?;I?1(fl)) when ? is possibly nonzero and I?1(fl) is
the Fisher information. They proposed an alternative estimator (based on projected
scores) which satisfled pN(^fl2 ? fl0) !d N(0;I?1(fl)). In the Chapter 6 and Ap-
pendix A simulation studies, we flnd that in Logistic and Poisson random intercept
with combined flxed intercept a and random efiects vi, consistency and conditional
asymptotic normality results appear to hold.
3.4 Proofs.
3.4.1 Proof of Lemma 3.1.1
Recall the deflnition of Wbi in (3.3). We have
uTWbiu =
niX
j=1
b00ij
? sX
k=1
uk
?
xijk ?
Pni
l=1 xilkb
00
ilP
ni
l=1 b
00
il
 !2
:
Let
~Wi = diag(?i)??i?Ti
where
?i =
2
66
66
66
4
b00i1=Pj b00ij
...
b00ini=Pj b00ij
3
77
77
77
5
ni?1
:
48
Then
1
niu
TWbiu = 1
niu
T X
j
b00ijXTi ~WiXiu
= 1n
i
uT
X
j
b00ij(Xi ?Xi)T ~Wi(Xi ?Xi)u
where
Xi = (Xi1;:::;Xis);
Xik =
2
66
66
66
4
xi1k
...
xinik
3
77
77
77
5
ni?1
;
Xi = (xi11ni;:::;xis1ni);
and xik = n?1i Pj xijk.
Therefore
inf
kuk=1
uT
?P
j b
00
ij(Xi ?Xi)
T ~Wi(Xi ?Xi)
?
u
ni
= inf
kuk=1
P
j b
00
ij
?
uT(Xi ?Xi)T ~Wi(Xi ?Xi)u
?
ni
? inf
kuk=1
P
j b
00
ij?2( ~Wi)
?uT(X
i ?Xi)T(Xi ?Xi)u
?
ni :
Theboundabove?2( ~Wi)isthesecondsmallesteigenvalueof ~Wi, because(Xi?Xi)u
is orthogonal to 1ni which is the eigenvector corresponding to the unique eigenvalue
0.
Here we use the deflnition of the second smallest eigenvalue of Wi:
49
uT(Xi ?Xi)T ~Wi(Xi ?Xi)u
uT(Xi ?Xi)T(Xi ?Xi)u
? inf
kuk=1
uT(Xi ?Xi)T ~Wi(Xi ?Xi)u
uT(Xi ?Xi)T(Xi ?Xi)u = ?2(
~Wi):
Now we need to flnd a lower bound of ?2( ~Wi). Stewart?s [31] Theorem 5.1 states
the following result:
Let fi1 ? fi2 ? ??? ? fin, fl1 ? fl2 ? ??? ? fln and  1 ?  2 ? ??? ?  n be the
eigenvalues of the real Hermitian matrices A, B and C = A+B.
Then
fii +fl1 ?  i ? fii +fln:
In our case we set A = ??i?Ti and B = diag(?i), and by the special structure of A
we have fi2 = ??? = fini = 0 and fi1 = ??Ti ?i = ?Pj ?2j.
Then
min
j
b00ijP
j b
00
ij
? ?2( ~Wi): (3.36)
Consider
inf
kuk=1
uTWbiu
ni
= inf
kuk=1
uTfPj b00ij(?ij0)(Xi ?Xi)T ~Wi(Xi ?Xi)gu
ni
? inf
kuk=1
uTfPj b00ij(?ij0)(Xi ?Xi)T?2( ~Wi)(Xi ?Xi)gu
ni
? inf
kuk=1
uTfminj b00ij(?ij0)(Xi ?Xi)T(Xi ?Xi)gu
ni >
1
c2:
where c2 is a suitably large constant. Equivalently Wbi is a positive deflnite matrix.
50
3.4.2 Proof of Theorem 3.1.2
It is easy to prove that of ^flw?fl0, where ^flw = Pi
?P
i(F
flfl
i )
?1
??1
(Fflfli )?1^fli,
then ^flw !p fl0. by using the result of Lemma 3.1.1 and conditions of Theorem 3.1.2
because we have
k^flw ?fl0k
?
X
i
  
  
  
?X
i
(Fflfli )?1
!?1   
  
 
k(Fflfli )?1=2kk(Fflfli )?1=2(^fli ?fl0)k
? Op(1)
K1 mini
q
?max(Fflfli )?1
? Op(1)=K1min
i
pn
i
pb
l liminf(?ni=ni)
= op(1):
Here we use the fact that (Fflfli )?1=2(^fli ?fl0) is Op(1), as proved by Fahrmeir and
Kaufmann [11].
Since
(
X
i
(Fflfli )?1)?12
X
i
(Fflfli )?1(^fli ?fl0)
= (
X
i
(Fflfli )?1)?12
X
i
(Fflfli )?1(^fli ?E(^flijvi0)+E(^flijvi0)?fl0);
let u be a unit vector and
Ti = ut
?X
i
(Fflfli )?1
!?1
2
(Fflfli )?1(^fli ?E(^flijvi0)) (3.37)
= ut
?X
i
(Fflfli )?1
!?1
2
(Fflfli )?12(Fflfli )?12(^fli ?E(^flijvi0)): (3.38)
Recall ^fl?i = (Fflfli )?12(^fli ?E(^flijvi0)). By the Cauchy-Schwartz inequality we have
T2i ?
8<
:u
t
?X
i
(Fflfli )?1
!?1
2
(Fflfli )?1
?X
i
(Fflfli )?1
!?1
2
u
9=
;(
^fl?i)t^fl?i (3.39)
? ?max
?X
i
(Fflfli )?1
!?1
?max(Fflfli )?1(^fl?i)t^fl?i (3.40)
51
By Lemma 3.1.1 we have nik(Wbi)?1k = ni?maxFflfli = O(1), and because of bound-
ness of Pk x2ijk and b00ij
1
niu
tWbiu = 1
ni
X
j
b00ij
(X
k
uk(xijk ?
P
j xijkb
00
ijP
j b
00
ij
)
)2
= O(1):
Equivalently we have (?minFflfli )?1 = kWbik = Op(ni) and (1) of Theorem 3.1.3
follows. Now we check the Lindeberg condition:
mX
i=1
E(T2i I(jTij > ")jvi0) =
mX
i=1
E(T2i I(T2i > "2)jvi0)
?
?X
i
??1maxFflfli min
i
?minFflfli
!?1
?
mX
i=1
E
"
(^fl?i)t^fl?iI
?
(^fl?i)t^fl?i > "2?min(Fflfli )?min(
X
i
(Fflfli )?1)
!
jvi0
#
? maxi ?maxF
flfl
i
mmini ?minFflfli
mX
i=1
E[(^fl?i)t^fl?i?((^fl?i)t^fl?i)jvi0]
?
(
?
?
"2?min(Fflfli )(
X
i
??1maxFflfli )
!)?1
? Op(1)maxi ?maxF
flfl
i
mini ?minFflfli
(
?
?
mmini ?minFflfli
maxi ?maxFflfli
!)?1
= op(1)
since by condition (1) and (4), the argument of ? goes to 1.
According to equation (3.2) we have
l0n(^ i) = l0n( i0)?Fni(^ i ? i0)+(Fni ?Fni( ?))(^ i ? i0)
where  ?i is between  i0 and ^ i.
Taking the expectation on both sides and conditioning on vi0 we have
E[(^ i ? i0)jvi0]
= (Fni)?1E?(Fni ?Fni( ?))F?T=2ni FT=2ni (^ i ? i0)jvi0?
= (Fni)?1E?(1=pni)(Fni ?Fni( ?))pniF?T=2ni FT=2ni (^ i ? i0)jvi0?
52
Let u be a unit vector and consider
(1=pni)kFni ?Fni( ?i)k
= (1=pni) sup
ku=1k
flfl
flfl
flfl
X
j
(b00ij ?b00ij(??ij))
?
u1 +
sX
k=1
uk+1xijk
!2flflfl
flfl
fl
? (2sK0 +2)(1=ni)
X
j
pn
ijb
00
ij ?b
00
ij(?
?
ij)j
? (2sK0 +2)(1=2ni)
X
j
pn
ijb
(3)
ij (~?ij)jj?ij ? ^?ijj (3.41)
= Op(1): (3.42)
Since we have b(3)ij (~?ij) bounded and according to Fahrmeir and Kaufmann [11], we
have, given vi0,
pn
i(^?ij ??ij0) ? AN(0;ni(Faai +F
afl
i xij +x
t
ijF
fla
i +x
t
ijF
flfl
i xij)): (3.43)
In addition, by the assumptions of Theorem 3.1.3 we can get from (3.41) to
(3.42) and pnik(Fni)?T=2k = Op(1).
Then
kE[(^fli ?fl0)jvi0]k?kE[(^ i ? i0)jvi0]k?k(Fni)?1k?Op(1):
Now we consider the bias term
?X
i
(Fflfli )?1
!?1
2 X
i
(Fflfli )?1(E(^flijvi0)?fl0): (3.44)
53
Then
0
@ut
?X
i
(Fflfli )?1
!?1
2 X
i
(Fflfli )?1(E(^flijvi0)?fl0)
1
A
2
? m
mX
i=1
?
ut(
X
i
(Fflfli )?1)?12(Fflfli )?1(E(^flijvi0)?fl0)
!2
? m
mX
i=1
ut(
X
i
(Fflfli )?1)?12(Fflfli )?1(
X
i
(Fflfli )?1)?12u
?(E(^flijvi0)?fl0)t(Fflfli )?1(E(^flijvi0)?fl0)
? m
8
<
:maxi ?max(F
flfl
i )
?1?max
?X
i
(Fflfli )?1
!?19=
;
?
mX
i=1
(E(^flijvi0)?fl0)t(Fflfli )?1(E(^flijvi0)?fl0)
? maxi ?maxF
flfl
i
mini ?minFflfli
mX
i=1
(E(^flijvi0)?fl0)t(Fflfli )?1(E(^flijvi0)?fl0)
? Op(1)
mX
i=1
kE(^flijvi0)?fl0k2k(Fflfli )?1k
? ?2Op(1)
mX
i=1
k(Fni)?1k2k(Fflfli )?1k (3.45)
? ?2Op(1)max
i
?max(Fni)?1
?min(Fni)?1
mX
i=1
k(Fni)?1k (3.46)
= op(1)
The step from (3.45) to (3.46) follows Schott?s [29] Theorem 3.20:
If A is a m?m symmetric matrix and Ak is a leading k?k principal submatrix
we have the following inequality:
?m?i+1(A) ? ?k?i+1(Ak) ? ?k?i+1(A)
where i = 1;??? ;k and ?1 is the maximum eigenvalue of A. Here we chose A =
(Fni)?1 and Ak = Fflfli .
54
By the Central Limit Theorem and Slutsky?s Theorem, we have, given v0,
?X
i
(Fflfli )?1
!?1
2 X
i
(Fflfli )?1(^fli ?fl0) !d N(0;I):
3.4.3 Proof of Theorem 3.1.3
If we let V = Pi(Fflfli )?1 and ^V = Pi(^Fflfli )?1, then
k^^flw ?fl0k
?k^^flw ? ^flwk+k^flw ?fl0k
=
  
  
 
mX
i=1
?^
V?1
n
[(^Fflfli )?1Fflfli ?I]V+V? ^V
o?
wi(^fli ?fl0)
  
  
 +op(1)
?
X
i
  
 
?^
V?1
n
[(^Fflfli )?1Fflfli ?I]V+V? ^V
o?  
 kwi(^fli ?fl0)k+op(1)
= op(1)
Since we can prove
  
 
?^
V?1
n
[(^Fflfli )?1Fflfli ?I]V+V? ^V
o?  
 
?k^V?1kk[(^Fflfli )?1Fflfli ?I]kkVk+k^V?1kkV? ^Vk (3.47)
= op(1) (3.48)
Consider 
  
  
 
mX
i=1
2
4
?X
i
(^Fflfli )?1
!?1
2
(^Fflfli )?1 ?
?X
i
(Fflfli )?1
!?1
2
(Fflfli )?1
3
5(^fli ?fl0)
  
  
  
= k
mX
i=1
?^
V?12(^Fflfli )?1 ?V?12(Fflfli )?1
?
(^fl?fl0)k
= k
mX
i=1
V?12
?
V12 ^V?12(^Fflfli )?1 ?(Fflfli )?1
?
(Fflfli )12(Fflfli )?12(^fli ?fl0)k
?
mX
i=1
kV?12kkV12 ^V?12(^Fflfli )?1 ?(Fflfli )?1kk(Fflfli )12kk(Fflfli )?12(^fli ?fl0)k
? I1 +I2
55
where
I1 =
mX
i=1
kV?12kkV12 ^V?12 ?Ikk(^Fflfli )?1kk(Fflfli )12kk(Fflfli )?12(^fli ?fl0)k
I2 =
mX
i=1
kV?12kk(^Fflfli )?1 ?(Fflfli )?1kk(Fflfli )12kk(Fflfli )?12(^fli ?fl0)k
Consider
kV12 ^V?12 ?Ik = k(V12 ? ^V12)^V?12k (3.49)
?kV? ^Vk12k^V?12k (3.50)
The step from (3.49) to (3.50) follows from Theorem X.1.1 of Bhatia?s [3]. A function
f is said to be matrix monotone of order n if it is monotone with respect to this
order n ? n Hermitian matrices, i.e., if A ? B implies f(A) ? f(B). If f is
matrix monotone of order n for all n then we say f is matrix monotone or operator
monotone.
Then the following theorem holds:
Let f be an operator monotone function on [0;1] such that f(0) = 0. Then
for all positive operators A, B,
kf(A)?f(B)k? f(kA?Bk):
Here our operator monotone function is the square root function.
56
In addition, we have
1
NkV?
^Vk = 1
Nk
X
i
(Fflfli )?1 ?
X
i
(^Fflfli )?1k
? 1N
X
i
k(Fflfli )?1 ?(^Fflfli )?1k
= 1N
X
i
ni
  
  1
ni(F
flfl
i )
?1 ? 1
ni(
^Fflfli )?1
  
  :
Since here we have the assumption that jvi0j is bounded, the assumption of
Theorem 3.1.3 and the result of Lemma 3.1.2, considering condition (3) of Theorem
3.1.4 we have
NkV?12k2 = Nk
?X
i
(Fflfli )?1
!?1
k = N
?
?min(
X
i
(Fflfli )?1)
!?1
? N
?X
i
?min(Fflfli )?1
!?1
= N
?X
i
ni 1n
i
??1maxFflfli
!?1
= O(1):
and
1
ni
?
(Fflfli )?1 ?(^Fflfli )?1
?
= 1n
i
?X
j
xijkxijl(b00ij ?b00(^?ij))+ (
P
j xijkb
00
ij)(
P
j xijlb
00
ij)P
j b
00
ij
?(
P
j xijkb
00
ij(^?ij))(
P
j xijlb
00
ij(^?ij))P
j b
00
ij(^?ij)
!
1?k?s;1?l?s
Therefore,
1
nik((F
flfl
i )
?1 ?(^Fflfl
i )
?1)k? II1 +II2
57
where
II1 = 1n
i
sup
kuk=1
flfl
flfl
fl
X
j
(b00ij(^?ij)?b00ij)(
X
k
ukxijk)2
flfl
flfl
fl (3.51)
II2 = 1n
i
sup
kuk=1
flfl
flfl
fl
?(P
k uk
P
j xijkb
00
ij)
2
P
j b
00
ij
? (
P
k uk
P
j xijkb
00
ij(^?ij))
2
P
j b
00
ij(^?ij)
!flfl
flfl
fl: (3.52)
Let rij = b00ij(^?ij)?b00ij(?ij0) and consider
flfl
flfl
fl
(Pk ukPj xijkb00ij)2P
j b
00
ij
? (
P
k uk
P
j xijkb
00
ij(^?ij))
2
P
j b
00
ij(^?ij)
flfl
flfl
fl
=
flfl
flfl
fl
P
j b
00
ij(^?ij)(
P
k uk
P
j xijkb
00
ij)
2 ?P
j b
00
ij(
P
k uk
P
j xijkb
00
ij(^?ij))
2
P
j b
00
ij
P
j b
00
ij(^?ij)
flfl
flfl
fl
?
flfl
flfl
fl
(Pk ukPj xijkrij)2P
j b
00
ij
flfl
flfl
fl+
flfl
flfl
fl
2(Pk ukPj xijkb00ij(^?ij))(Pk ukPj xijkrij)P
j b
00
ij
flfl
flfl
fl
+
flfl
flfl
fl
P
j rij(
P
k uk
P
j xijkb
00
ij(^?ij))
2
P
j b
00
ij
P
j b
00
ij(^?ij)
flfl
flfl
fl;
By the Cauchy-Schwartz inequality
II1 ? 1n
i
sup
kuk=1
flfl
flfl
fl
X
j
(b00ij(^?ij)?b00ij)
X
k
u2k
X
k
x2ijk
flfl
flfl
fl
? K
02
ni
X
j
j12rijj;
and
II2 ? 1n
i
(PkPj x2ijk)(Pj r2ij)P
j b
00
ij
+ 1n
i
(Pj jrijj)Pk(Pj xijkb00ij(^?ij))2P
j b
00
ij(^?ij)
P
j b
00
ij
+ 1n
i
sup
kuk=1
(Pk jukjPj jxijkjb00ij(^?ij))(Pk jukjPj jxijkjjrijj)P
j b
00
ij
?
 K02 +2sK02
4bl
? 1
ni
X
j
j12rijj+
 K02
bl
? 1
ni
X
j
 1
2rij
?2
:
Since Psk=1 x2ijk < K0 where K0 is a positive constant and according to Fahrmeir
and Kaufmann [11], given vi0, we have (3.43).
58
By the delta method and assumptions of Theorem 3.1.3, we obtain
k 1n
i
?
(Fflfli )?1 ?(^Fflfli )?1
?
k = Op(1=pni):
BytheassumptionofTheorem3.1.2, wehave, (1=N)kV?^Vk? Op(1)=NPipnici =
op(1) and conditions (1) and (2) of Theorem 3.1.3 are satisfled.
We can prove (1=N)kV ? ^Vk = op(1) and NkV?1k = O(1) which leads the
result Nk^V?1k = Op(1). Also it easy to get (pni=N)kV ? ^Vk = Op(1) and
(1=ni)k(^Fflfli )?1k = Op(1), then we can get from (3.47) to (3.48) which leads the
conclusion that ^^flw is consistent. By (2) of Theorem 3.1.2, we have I1 !p 0 and
I2 !p 0. The result of Theorem 3.1.3 follows.
The similar argument follows for replacing ^Fflfli by ^^Fflfli since by the conclusion
of Theorem 3.1.3, we have, according to Fahrmeir and Kaufmann [11] and Slutsky?s
Theorem, given vi0,
pn
i(^^?ij ??ij0) ? AN(0;niFaai ):
By the delta method and assumptions of Theorem 3.1.2, we obtain
k 1n
i
?
(Fflfli )?1 ?(^^Fflfli )?1
?
k = Op(1=pni):
The asymptotic normality result follows.
3.4.4 Proof of Corollary 3.1.4
We prove the unconditional convergence in (3.8); the same method applies to
(3.9) and (3.10).
Write Zm; N =
?P
m
i=1fF
flfl
i g
?1
??1=2
(^fli ? fl0), Let B be any p-dimensional
rectangle with rational coordinates for all vertices. We know from Theorem 3.1.2
59
that
P[Zm; N 2 Bjv0] ! P(Z 2 B)
where Z ? N(0;I).
From properties of conditional expectation, E[P(Zm; N 2 Bjv0)] = P[Zm; N 2
B]. where the expectation is taken over the distribution of v0. Almost surely,
P[Zm; N 2 Bjv0] ? 1, so we can use the Dominated Convergence Theorem to write
limP[Zm; N 2 B] = limE[P(Zm; N 2 Bjv0)]
= E[limP(Zm; N 2 Bjv0)]
= E[P(Z 2 B)] = P(Z 2 B):
In fact, since the rationals are countable, the above convergence holds simul-
taneously over all rectangles whose vertices have rational components, and hence
for all Borel sets.
3.4.5 Proof of Theorem 3.2.1
Consider (Q?22)?1:
(Q?22)?1 = (ZT~BZ+ ?1mJ)?1 = (ZT~BZ)?12(I+A)?1(ZT~BZ)?12 (3.53)
where
A = (ZT~BZ)?12 ?1mJ(ZT~BZ)?12:
60
Since ZT~BZ is a symmetric matrix, using the l2 norm of a symmetric matrix we
have:
k(ZT~BZ)?12k2 = sup
kuk=1
juT(ZT~BZ)?12(ZT~BZ)?12uj
= k(ZT~BZ)?1k:
Then
kAk?k(ZT~BZ)?12k2k?1mJk = j?1jmin
ijE?ij
:
Recall that Bij deflned in (3.24) is positive and lies between b00ij and b00ij(^?ij).
By Jiang?s [14] condition (J2), we have the following:
kAk? j?1jmin
ijE?ij
? 1=(n??N) = o(1=pn?):
where n? = mini ni.
By applying the Neumann series we have
(Q?22)?1 = (ZT~BZ)?12(I?A+A2 ?A3 +???)(ZT~BZ)?12
After the calculation we can get a closed form of (Q?22)?1 as the following:
(Q?22)?1 =
2
66
66
66
4
(E?1)?1 +(E?1)?2C??1 ??? (E?1)?1(E?m)?1C??1
... ... ...
(E?1)?1(E?m)?1C??1 ??? (E?m)?1 +(E?m)?2C??1
3
77
77
77
5
m?m
(3.54)
where
C??1 =
1X
i=1
(?1m)i(
mX
k=1
(E?k)?1)i?1(?1)i = ? ?1?
1
Pm
k=1(E
?
k)?1 +m
: (3.55)
and E?i = Pj Bij.
61
Recall (3.19). We have
pm(^a?a
0) =
pm(Q?
11:2)
?1?1T
N ?Q
?
12(Q
?
22)
?1ZT?(y??)
+pm(Q?11:2)?1Q?12(Q?22)?1?1v01m
where Q?12 = (E?1; ::: ;E?m) and Q?11:2 = ?m2C??1: Then,
jpm(Q?11:2)?1?1TN(y??)?Q?12(Q?22)?1ZT(y??)?j
= 'pmj(Q?11:2)?1?1TN ?Q?12(Q?22)?1ZT?(y??)j?
= pm
flfl
flfl
fl
1
m
X
i
(E?i )?1
X
j
(yij ?b0ij)
flfl
flfl
fl
? op(1)pmn
?
j
X
i
X
j
(yij ?b0ij)j
= op(1): (3.56)
Since by (J1) we have 1=pNjPiPj(yij ?b0ij)j = Op(1).
Then pm(^a?a0) and pm(Q?11:2)?1Q?12(Q22)?1?1v01m have the same asymp-
totic distribution. Furthermore,
pm(Q?
11:2)
?1Q?
12(Q22)
?1?1v01m
= pm ?1m2C?
?1
?
mv0?1 +v0?1mC??1
X
i
(E?i )?1
!
= ? ?1mC?
?1
pmv
0 ?
v0?1p
m
X
i
(E?i )?1:
Since by (J1), condition (2) of Theorem 3.1.3 and v0 !p 0,
? ?1mC?
?1
= (1+ ?1
P
i(E
?
i )
?1
m ) !p 1 (3.57)
v0?1
P
i(E
?
i )
?1
pm !p 0 (3.58)
62
and the fact that the vi0 are iid with mean 0 and variance  2v, by the Central Limit
Theorem
pmv
0 !D N(0; 2v):
By Slutsky?s theorem, asymptotic normality follows.
63
Chapter 4
Logistic 2?2?m table
This Chapter illustrates conditional asymptotic normality results of Theorem
3.1.3 and 3.1.4. In order to check the asymptotic results, simulations are performed
to explore the asymptotic properties of our estimator. We also apply the estimator
to a real data set to compare with Mantel-Haenszel estimator.
4.1 Logistic 2?2?m table
We use this example to illustrate Theorem 3.1.2 and 3.1.3. The 2 ? 2 ? m
table example is set up as the following:
logitP(yij = 1jx) = fii +flxij
where xij=1 or 0 and the table is the following:
Table 4.1: 2?2?m table
y = 0 y = 1
x = 0 a1 b1
x = 1 c1 d1
...
y = 0 y = 1
x = 0 am bm
x = 1 cm dm
For 2?2?m tables, there are two types of models:
Model I : the number of tables m remains flxed but individual cell sizes increase
without bound.
64
Model II : the number of tables m increases but the cell sizes remained bounded.
Breslow [5] studied the properties of four commonly used estimators of the odds
ratio in Model II:
Consider a series of m pairs of independent binomial observations (di;bi) with
denominators (ni = di + ci; mi = ai + bi) and success probabilities (p1i; p0i) for
i = 1; ::: ;m. Its assumed throughout that the odds ratio ? = (p1iq0i)=(p0iq1i)
remains constant from table to table.
One of the earliest estimators (Woolf, 1955) of the common odds ratio ? is
the empirical logit estimator deflned by
log(^?G) =
" mX
i=1
wi log
?(a
i +?)(di +?)
(ci +?)(bi +?)
 #,?X
i
wi
!
where the weights are
wi = (1=(ai +?)+1=(bi +?)+1=(ci +?)+1=(di +?))?1
and ? > 0 is a constant added to each cell to avoid zero denominators. The choice
? = 1=2 is the most popular because it is thought to reduce the bias in small
examples (Anscombe, 1956). There are unconditional MLE and conditional MLE
(the details omitted here). The fouth and flnal estimator is given by famous formula
of Mantel and Haenszel (1959):
^?MH =
P
i(aidi=Ni)P
i(cibi=Ni)
where Ni = ai +bi +ci +di = mi +ni. Due largely to its simplicity, ^?MH has been
widely used by practicing statisticians and epidemiologists. Breslow [5] showed that
65
in the Model II setting, the empirical logit estimator does not converge to the true
odds ratio. The Mantel-Haenszel estimator is consistent and retains good e?ciency
even for moderately large odds ratios in the Model II setting (sparse data).
Let Ri = aidi=Ni and Si = cibi=Ni, Breslow [5] proposed an empirical estimate
of Var[^?MH] as
mdVarE(^?MH) =
P
i(Ri ? ^?MHSi)=m
(Pi Si=m)2
Robins, Breslow and Greenland [25] proposed a new estimator of Var[^?MH] as:
mdVarUS(^?MH) = m
?P
i PiRi
2R2+ +
P
i(PiSi +QiRi)
2R+S+ +
P
i QiSi
2S2+
?
(^?MH)2
where Pi = (ai + di)=Ni, Qi = (ci + bi)=Ni, R+ = Pi Ri and S+ = Pi Si. The
corresponding estimators of mVar(log ^?MH) are
mVi = mdVari(^?MH)=(^?MH)2
where i 2 US; E means using dVarUS, dVarE respectively. The Robins-Breslow-
Greenland [25] estimator is consistent under both Model I and Model II.
The estimating equation xti(yi ??i) is
0 = bi +di ?(ai +bi)exp(fii)=(1+exp(fii))
?(ci +di)exp(fii +fl)=(1+exp(fii +fl)) (4.1)
0 = di ?(ci +di)exp(fii +fl)=(1+exp(fii +fl)) (4.2)
and we can get the MLE of fl as log(aidi=bici) = logai +logdi ?logbi ?logci.
Let
Zi0 = (ai ?E(aijfii0))=
p
Var(aijfii0)
66
and
Zi1 = (ci ?E(cijfii0))=
p
Var(cijfii0)
For simplicity we assume ai + bi = ni and ci + di = ni. Since given fii0, ai belongs
to Binomial(ni;1=(1+exp(fii0))) we have
ai = ni1+exp(fi
i0)
+
p
Var(aijfii0)
?
ai ?E(aijfii0)p
Var(aijfii0)
!
= ni1+exp(fi
i0)
+Zi0exp(fii0=2)
pn
i
1+exp(fii0)
= ni1+exp(fi
i0)
 
1+Zi0exp(fii0=2)pn
i
?
;
logai ? log
 n
i
1+exp(fii0)
?
+Zi0exp(fii0=2)pn
i
?Z2i0exp(fii0)2n
i
+o( 1n
i
):
By the same argument we have
?logbi ? ?log
 n
i exp(fii0)
1+exp(fii0)
?
+Zi0exp(?fii0=2)pn
i
+Z2i0exp(?fii0)2n
i
+o( 1n
i
);
?logci ? ?log
 n
i
1+exp(fii0 +fl0)
?
?Zi1exp((fii0 +fl0)=2)pn
i
;
+Z2i1exp(fii0 +fl0)2n
i
+o( 1n
i
)
logdi ? log
 n
i exp(fii0 +fl0)
1+exp(fii0 +fl0)
?
?Zi1exp(?(fii0 +fl0)=2)pn
i
?Z2i1exp(?(fii0 +fl0))2n
i
+o( 1n
i
):
Then
^fli = fl0 + Zi0p
ni (exp(fii0=2)+exp(?fii0=2))
? Zi1pn
i
(exp((fii0 +fl0)=2)+exp(?(fii0 +fl0)=2))
? Z
2
i0
2ni (exp(fii0)?exp(?fii0))
+ Z
2
i1
2ni (exp(fii0 +fl0)?exp(?(fii0 +fl0)))+op(
1
ni):
67
The bias of ^fli is the following:
E((^fli ?fl0)jfii0)
= exp(fii0)(exp(fl0)?1)?exp(?fii0)(exp(?fl0)?1)2n
i
+o( 1n
i
):
Let
G(x) = (4+4exp(?x)+4exp(x))?1 ; (4.3)
in order to check the hypotheseses of Theorem 3.1.2, flrst we consider 1ni(Fflfli )?1 =
G(?ij) where ?ij = fii +xijfl and xij = 1 or 0. Since we assume jfii0j is bounded, it
is obvious that infi G(?ij0) > ?1 > 0 and supi G(?ij0) < M1 < 1, for some positive
constants ?1 and M1. Similar arguments follow for ni(Fni)?1.
We need to verify condition (1) of Theorem 3.1.3. In this case we have the
log-likelihood function for ith group as the following:
l = bifii +di(fii +fl)?ni log(1+exp(fii))?ni log(1+exp(fii +fl)):
Then
? 1n
i
l00(fii0;fl0) =
0
BB
@
Ii1 +Ii2 Ii2
Ii2 Ii2
1
CC
A (4.4)
where
Ii1 = exp(fii0)(1+exp(fi
i0))2
Ii2 = exp(fii0 +fl0)(1+exp(fi
i0 +fl0))2
Since in this special example Fflfli as a scalar instead of a matrix, the Lindeberg
condition veriflcation can be simplifled and for condition (1) of Theorem 3.1.2 we
68
only need to verify
K1 < mini F
flfl
i
maxi Fflfli < K2
where K1 and K2 are positive constants. Obviously here K2 = 1 since here we have
Fflfli = (1=ni)((1+exp(?fl0))exp(?fii0)+(1+exp(fl0))exp(fii0)+4).
Consider condition (3) of Theorem 3.1.2: Let
?maxF?1ni
?minF?1ni =
supkuk=1 fi(u)
infkuk=1 fi(u)
where
fi(u) = (1+exp(fii0 +fl0))
2
exp(fii0 +fl0) u
2
2 +(u1 ?u2)
2(1+exp(fii0))
2
exp(fii0) :
For the further calculation we write
fi(u) = exp(fii0)
?(exp(?fi
i0)+exp(fl0))2
exp(fl0) u
2
2 +(u1 ?u2)
2(1+exp(?fii0))2
?
= exp(?fii0)
?(1+exp(fi
i0 +fl0))2
exp(fl0) u
2
2 +(u1 ?u2)
2(1+exp(fii0))2
?
Let
B1 = (exp(?fii0)+exp(fl0))
2
exp(fl0) ; A1 = (1+exp(?fii0))
2:
and
B2 = (1+exp(fii0 +fl0))
2
exp(fl0) ; A2 = (1+exp(fii0))
2:
Here we have u1 = p1?u22 for A1, B1 and A2, B2 as the above, we have
(u22)1 = 12 ? 12
q
1?4A21=(4A21 +B21)
(u22)2 = 12 ? 12
q
1?4A22=(4A22 +B22)
69
So we have
max
i
supkuk=1 fi(u)
infkuk=1 fi(u)
= max
i
A1(1?2pA21=(4A21 +B21))+B1(12 + 12p1?4A21=(4A21 +B21))
A1(1?2pA21=(4A21 +B21))+B1(12 ? 12p1?4A21=(4A21 +B21))
and we can see that when fii0 ! +1we have maxi?supkuk=1 fi(u)=infkuk=1 fi(u)? =
O(1). Also
max
i
supkuk=1 fi(u)
infku=1kfi(u)
= max
i
A2(1?2pA22=(4A22 +B22))+B2(12 + 12p1?4A22=(4A22 +B22))
A2(1?2pA22=(4A22 +B22))+B2(12 ? 12p1?4A22=(4A22 +B22))
we can see that when fii0 ! ?1 we have maxi?supkuk=1 fi(u)=infkuk=1 fi(u)? =
O(1). Consider
F?1ni =
?
?l00(fii0;fl0)
??1
= 1=ni
0
BB
@
I?1i1 ?I?1i1
?I?1i1 I?1i1 +I?1i2
1
CC
A: (4.5)
Here kuk2 = u21 +u22 = 1 so that
mX
i
kF?1ni k =
X
i
1
ni supkuk=1
 (1+exp(fi
i0 +fl0))2
exp(fii0 +fl0) u
2
2 +(u1 ?u2)
2(1+exp(fii0))
2
exp(fii0)
?
?
mX
i
1
ni (6+(2+exp(?fl0))exp(?fii0)+(2+exp(fl0))exp(fii0))
? 6mmin
i ni
+
mX
i=1
c1 exp(jfii0j) 1n
i
? 6mmin
i ni
+ mc1min
i ni
1
m
mX
i=1
exp(jfii0j): (4.6)
Ifwehaveconditionthe(1=m)Pmi=1 exp(jfii0j) !a:s: E(exp(jfii0j)), thenbycondition
(2) of Theorem 3.1.2, (4.6)= 6m=mini ni + Op(m=mini ni) = op(1), the asymptotic
70
relation k(1=pni)[Fni ?Fni( ?)]k = Op(1) can be verifled following the same argu-
ment in the proof of Theorem 3.1.3, we verifled assumptions of Theorem 3.1.2 which
gives kpni(Fni)?T=2k = O(1), so condition (3) of Theorem 3.1.3 is satisfled.
Consider condition (4) of Theorem 3.1.2 and here we let ?(t) = t:
We have:
^fli ?E(^flijfii0)
= Zi0pn
i
(exp(fii0=2)+exp(?fii0=2))
? Zi1pn
i
(exp((fii0 +fl0)=2)+exp(?(fii0 +fl0)=2))
? 1+Z
2
i0
2ni (exp(fii0)?exp(?fii0))
+ 1+Z
2
i1
2ni (exp(fii0 +fl0)?exp(?(fii0 +fl0)))+o(
1
ni);
and
?K
i1Zi0p
ni ?
Ki3(1+Z2i0)
2ni +
Ki4(1+Z2i1)
2ni ?
Ki2Zi1p
ni +o(
1
ni)
?4
?
?flfl
flflKi1Zi0p
ni ?
Ki2Zi1p
ni
flfl
flfl+
flfl
flflKi4(1+Z2i1)
2ni ?
Ki3(1+Z2i0)
2ni +o(
1
ni)
flfl
flfl
?4
? 8
"flfl
flflKi1Zi0p
ni ?
Ki2Zi1p
ni
flfl
flfl
4
+
flfl
flflKi4(1+Z2i1)
2ni ?
Ki3(1+Z2i0)
2ni +o(
1
ni)
flfl
flfl
4#
? 64
"flfl
flflKi1Zi0p
ni
flfl
flfl
4
+
flfl
flflKi2Zi1p
ni
flfl
flfl
4
+
flfl
flflKi4(1+Z2i1)
2ni
flfl
flfl
4
+
flfl
flflo( 1
ni)?
Ki3(1+Z2i0)
2ni
flfl
flfl
4#
? 64
?K4
i1Z
4
i0
n2i +
K4i2Z4i1
n2i +
K4i4(1+Z2i1)4
16n4i +o(
8
n4i )+
8K4i3(1+Z2i0)4
16n4i
?
? 64
?K4
i1Z
4
i0
n2i +
K4i2Z4i1
n2i +
K4i4(1+Z8i1)
2n4i +o(
8
n4i )+
4K4i3(1+Z8i0)
n4i
?
:
71
Then we have
E((^fl?ti ^fl?i )2jfii0)
= (Fflfli )?2E
(?
Ki1Zi0p
ni ?
Ki3(1+Z2i0)
2ni +
Ki4(1+Z2i1)
2ni ?
Ki2Zi1p
ni +o(
1
ni)
?4
jfii0
)
? 64(Fflfli )?2E
??K4
i1Z
4
i0
n2i +
K4i2Z4i1
n2i +
K4i4(1+Z8i1)
2n4i +o(
8
n4i )+
4K4i3(1+Z8i0)
n4i
?
jfii0
 
where Ki1 = exp(fii0=2)+exp(?fii0=2), Ki2 = exp((fii0+fl0)=2)+exp(?(fii0+fl0)=2),
Ki3 = exp(fii0)?exp(?fii0), Ki4 = exp(fii0 +fl0)?exp(?(fii0 +fl0)) .
Since here Zi0 and Zi1 are normalized binomial random variables for ni with
difierentmeansni=(1+exp(fii0)), ni=(1+exp(fii0+fl0))andvariancesni exp(fii0)=(1+
exp(fii0)), ni exp(fi0+fl0)=(1+exp(fii0+fl0)) respectively, which are both in order of
ni, by M. Znidaric [38] we have E(Z4i0jfii0) = (n2i +O(n?1i ))Op(n?2i ) = Op(1). By the
same argument we have E(Z8i0jfii0) = Op(1), E(Z4i1jfii0) = Op(1) and E(Z8i1jfii0) =
Op(1), so it is easy to show that condition (4) of Theorem 3.1.2 satisfled.
Now we consider the following:
?
(
X
i
(Fflfli )?1)?12
X
i
(Fflfli )?1(E(^flijvi0)?fl0)
!2
=
(X
i
(Fflfli )?1
 K
i4 ?Ki3
2ni +op(
1
ni)
?)2
?
(X
i
ni?I?1i1 +I?1i2 ??1
)?1
? m
X
i
n2i
 I
i1Ii2
Ii1 +Ii2
?2 K
i4 ?Ki3
2ni +op(
1
ni)
?2(X
i
ni?I?1i1 +I?1i2 ??1
)?1
(4.7)
? 2m
X
i
?I?1
i1 +I
?1
i2
??2?(K2
i4 +K
2
i3)+op(1)
?
(X
i
ni?I?1i1 +I?1i2 ??1
)?1
(4.8)
? Op(1) mmin
i ni
(4.9)
The step from (4.7) to (4.8) follows because of (1) and (2) of Theorem 3.1.2, the
fact that jfii0jis bounded and maxi(K2i3 +K2i4)=mini(I?1i1 +I?1i2 ) = Op(1): The result
72
of Theorem 3.1.2 follows.
Consider (1) of Theorem 3.1.3. We have
1
Nk
^V?Vk = 1
Nk
X
i
nifG(^?ij)?G(?ij0)gk
? 1N
mX
i=1
nikG(^?ij)?G(?ij0)k = 1N
X
i
pn
iOp(1) = op(1):
The above conclusion follows from the delta method since, given fii0,
pn
i(G(^?ij)?G(?ij0)) ? AN(0;ni(G
0(?ij0))2(Faa
i +2F
afl
i +F
flfl
i )):
Also NjV ?1j = Nj(Pi niG(?ij0)j = O(1) and condition (1) of Theorem 3.1.3 is
satisfled. For condition (2) of Theorem 3.1.4, j(^Fflfli )?1 ? (Fflfli )?1j = nijG(^?ij) ?
G(?ij0)j = Op(pni), the condition (3) is easy to verifled. The result of Theorem
3.1.3 follows. Similar argument follow if we replace ^flij by ^^?ij since ,given fii0,
pn
i(G(^^?ij)?G(?ij0)) ? AN(0;ni(G
0(?ij0))2Faa
i ):
4.2 Simulation results for 2?2?m table.
We consider the 2?2?m table as the following set up:
logitP(yij = 1jfii;xij) = fii +xijfl
where fii is the random efiect and fl is flxed efiect, xij=0 or 1. The fiis are uniformly
distributed between -0.8 and 0.8 since from the Section 4.1 we assumejfii0j bounded.
We maximize the following likelihood function to get estimator of ^fli and ^fii for each
group according to Table 4.1.
L =
 1
1+exp(fii)
?ai  exp(fi
i)
1+exp(fii)
?bi  1
1+exp(fii +fl)
?ci  exp(fi
i)
1+exp(fl +fii)
?di
73
The weighted sum of the ^fl (using both true and estimated weights) are simulated.
It is easy to solve the above equation and get ^fli = log(aidi=cibi) and ^fii = logbi=ai
for the ith table.
4.2.1 Unconditional convergence in distribution
We simulated both balanced and unbalanced m tables with n observations per
table. We generated product binomial data for the flrst and second rows with success
probabilitiesexp(fii0)=(1+exp(fii0))andexp(fl0+fii0)=(1+exp(fl0+fii0))respectively.
Here ai +bi +ci +di = n and for the balanced case we have ai +bi = ci +di = n=2.
For the unbalanced case we either have ai+bi = n=3 or ai+bi = n=4. We simulated
with either fl0 = 1 or = 0:5. Various combinations of (m;n) and true values of fl
were used for both balanced and unbalanced settings.
For each combination, 1000 replications of random efiects fii and m groups
of ai, bi, ci and di were generated. Estimated regression coe?cients for various
choices of (m;n) and fl0 with fii ? Unif(?0:8;0:8) are summarized in Table 4.2 and
Table 4.3.
For the balanced setup, the tables display the means and standard errors of the
simulated values of (^flw ?fl0)=s:e:, (^^flw ?fl0)=ds:e:, ^flw ?fl0, ^^flw ?fl0, 95% confldence
bounds for (^flw?fl0) and (^^flw?fl0) based on Student?s t. From Table 4.2 and Table
4.3 under the balanced setup, we can see that for fl0 = 0:5 in all the combinations ^flw
is approximately unbiassed and for all combinations with fl0 = 1 except for (100;50)
^flw is slightly biased. For fl0 = 0:5 and fl0 = 1 ^^flw is approximately biased in all
74
combinations except the combination fl0 = 1 or 0.5, m = 100 and n = 50.
For all of the combinations in both tables, Shapiro-Wilk test p values are above
0.24 except for fl0 = 0:5 (20;400) and fl0 = 1 and (40;800) which means in those
combinations, ^flw and ^^flw given fi0 are normal. Likewise, Kolmogorov-Smirnov test
p values are larger than 0.06.
From Table 4.2 and Table 4.3, we can see that bias/se< 0:208 except for ex-
treme case (m = 100;n = 50) so the normal inference is not greatly afiected in the
cases where consistency results do not hold. For the combination (100;50), in order
to avoid 0 observations in the cells we use (Woolf, 1955)?s adjusted method to have
^fli = logf[(ai +1=2)(di +1=2)]/[(ci +1=2)(bi +1=2)]gand ^fii = log(fbi +1=2g/fai +1=2g).
We compared our estimators ^flw with true weights and ^^flw with estimated
weights to the Mantel-Haenszel estimator ^flMH. To see whether our estimators
are more e?cient, we compared the standard deviation of our estimators from 1000
replications with that of the Mantel-Haenszel estimator. From the results of difierent
combinations of (m;n) and fl0, our estimators are almost as e?cient as the Mantel-
Haenszel estimator. Since ^^flw is constructed by plugging the MLE ^fli for ith group
into the weight formula, we can also construct another empirical estimator of fl
by plugging in ^^flw. We ran similar simulations of this new empirical estimator,
which still introduced more positive bias. But the normality results hold for this
new empirical estimator. It suggests that in practice, one can stay with the simple
empirical estimator by plugging MLE ^fli from each group. The plots show that ^flw
and ^^flw are approximately normal, with departures from normality in the extreme
tails in this balanced setup.
75
Table 4.2: Simulated estimates of logistic odds ratio in 2?2?m balanced table.
(^flw ?fl0) (^^flw ?fl0)
(m, n) mean std 95% CI for (^flw ?fl0) mean std
fl0 = 0:5, balanced set up
(10,200) 0.006 0.095 (0.000, 0.001) -0.0001 0.093
(10,300) 0.002 0.077 (-0.003, 0.007) -0.002 0.076
(20,400) 0.002 0.067 (-0.002, 0.007) -0.004 0.066
(30,600) 0.001 0.032 (-0.002, 0.001) -0.002 0.032
(40,800) -0.002 0.023 (0.000, 0.003) 0.000 0.023
(100,50) -0.001 0.060 (-0.005, 0.002) -0.028 0.056
fl0 = 1, balanced setup
(10,200) 0.013 0.100 (0.006, 0.019) -0.002 0.098
(10,300) 0.012 0.080 (0.008, 0.017) 0.003 0.079
(20,400) 0.007 0.050 (0.004, 0.010) -0.001 0.050
(30,600) 0.004 0.033 (0.002, 0.006) -0.001 0.032
(40,800) 0.005 0.024 (0.003, 0.006) 0.001 0.024
(100,50) 0.000 0.063 (-0.004, 0.004) 0.061 0.057
76
Table 4.3: Simulated standardized estimates of log odds ratio (standardized by true
and estimated conditional standardized error) in 2?2?m balanced table .
(^flw ?fl0)=s:e: (^^flw ?fl0)=ds:e:
(m, n) mean std 95% CI for (^^flw ?fl0) mean std
fl0 = 0:5, balanced set up
(10,200) 0.064 1.016 (-0.006, 0.006) -0.005 0.995
(10,300) 0.026 1.011 (-0.007, 0.003) -0.029 0.997
(20,400) 0.037 1.017 (-0.008, 0.0003) -0.060 0.993
(30,600) 0.016 1.019 (-0.001, 0.002) -0.054 1.014
(40,800) 0.064 0.990 (-0.002, 0.001) -0.006 0.986
(100,50) -0.023 1.032 (-0.031, -0.024) -0.475 0.958
fl0 = 1, balanced set up
(10,200) 0.131 1.029 (-0.008, 0.004) -0.026 1.000
(10,300) 0.156 1.005 (-0.002, 0.008) 0.026 0.985
(20,400) 0.144 1.030 (-0.004, 0.003) -0.013 1.021
(30,600) 0.120 1.002 (-0.003 0.001) -0.037 1.000
(40,800) 0.190 0.979 (-0.001, 0.002) 0.035 0.972
(100,50) -0.001 1.043 (-0.064, -0.057) -0.994 0.933
77
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=30, n1=600, betawhat, beta0=0.5
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=30, n1=600, betawhathat, beta0=0.5
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=40, n1=800, betawhat, beta0=0.5
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=40, n1=800, betawhathat, beta0=0.5
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-4
-2
0
2
4
m=30, n1=600, betawhat, beta0=1
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-4
-2
0
2
m=30, n1=600, betawhathat, beta0=1
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-4
-2
0
2
m=40, n1=800, betawhat, beta0=1
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-4
-2
0
2
m=40, n1=800, betawhathat, beta0=1
Figure 4.1: Q-Q plots for (^flw?fl0) and (^^flw?fl0) standardized by true and estimated
conditional standard error, for various values of (m;n) in logistic 2?2?m balanced
setup.
78
For each combination, 1000 replications of random efiects fii and m groups of
ai, bi, ci and dis were generated. Estimated regression coe?cients for various choices
of (m;n), fl0 with fii ? Unif(?0:8;0:8) are summarized in Table 4.4 and Table 4.5.
For the unbalanced setup, the tables display the means and standard errors of the
simulated values of (^flw?fl0)=s:e:, (^^flw?fl0)=ds:e:, (^flw?fl0), (^^flw?fl0), 95% confldence
bounds for (^flw?fl0) and (^^flw?fl0) based on Student?s t. From Table 4.4 and Table
4.5 for fl0 = 0:5 or 1 with unbalanced setup (1=3;2=3) (meaning ai +bi = n=3) and
(1=4;3=4) (meaning ai + bi = n=4), we can see that in all the combinations ^flw is
approximately unbiased hold except for the combination with fl0 = 1, (1=4;3=4),
m = 10, n = 200 or fl0 = 0:5, (1=3;2=3), m = 10, n = 300 and fl0 = 0:5 with
m = 30, n = 600 or m = 20, n = 400.
But their Shapiro-Wilk test p values are above 0.19 except fl0 = 0:5, (20;400)
which means in those combinations, ^flw given fi0 are conditionally normal and
Kolmogorov-Smirnov test p values are larger than 0.10.
For ^^flw in all the combinations except for fl0 = 1 or 0.5, (1=4;3=4) or (1=3;2=3),
m = 100, n = 60 the estimator is approximately unbiased. From Table 4.4 and
Table 4.5, we can see that bias/se< 0:109 except for extreme combinations like
(m = 100;n = 60), so the normal inference is not greatly afiected in these cases
because the bias is small.
For the combination (100;60), in order to avoid 0 observations in the cells we
use (Woolf, 1955)?s adjusted method to have
^fli = log((ai +1=2)(di +1=2)=(ci +1=2)(bi +1=2))
79
and ^fii = log(bi +1=2=ai +1=2).
In order to compare our estimators ^flw with true weights and ^^flw with es-
timated weights to the Mantel-Haenszel estimator ^flMH and to see whether our
estimators are more e?cient, we compared the standard deviation of our estimators
from 1000 replication with of Mantel-Haenszel estimator. From the results of dif-
ferent combinations of (m;n) and fl0, our estimators are almost the same e?cient
as Mantel-Haenszel estimator. Since ^^flw is constructed by plugging MLE ^fli for ith
group, we can also construct another empirical estimator of fl by plugging in ^^flw,
we run the similar simulations under this new empirical estimator which introduced
more positive bias. But the normality results hold for this new empirical estimator.
It suggests that in practice, one can stay with the simpler empirical estimator by
obtained by plugging MLE ^fli into the formula for the weight matrix. The plots
show that ^flw and ^^flw are approximately normal, with departures from normality in
the extreme tails in logistic 2?2?m unbalanced setup.
Since when we introduce 1=2 adjustment into estimator we reduce bias, we
tried simulation for (10;200) in balanced setup and unbalanced setup (1=4;3=4) and
(10;300) in unbalanced setup (1=3;2=3). Adding the 1=2 adjustment can reduce
bias for ^flw ?fl0 and ^^flw ?fl0 a lot, especially for the combination (10;200).
80
Table 4.4: Simulated estimates of log odds ratio (standardized by true and estimated
conditional standardized error) in 2?2?m unbalanced table.
(^flw ?fl0) (^^flw ?fl0)
(m, n) mean std 95% CI for (^flw ?fl0) mean std
fl0 = 1, unbalanced setup (1/4,3/4)
(10,200) 0.012 0.110 (0.005, 0.019) 0.005 0.107
(20,400) 0.003 0.054 (-0.0003, 0.006) -0.0002 0.053
(100,60) 0.001 0.063 (-0.003 0.004) 0.023 0.057
fl0 = 1, unbalanced setup (1/3,2/3)
(10,300) 0.010 0.084 (0.005, 0.015) 0.004 0.083
(30,600) 0.002 0.034 (-0.0003, 0.004) -0.001 0.034
(100,60) -0.001 0.059 (-0.004, 0.003) -0.032 0.055
fl0 = 0:5, unbalanced setup (1/3,2/3)
(10,300) 0.005 0.082 (-0.0004, 0.010) 0.002 0.081
(30,600) 0.002 0.033 (0.0004, 0.004) 0.0003 0.032
(100,60) -0.003 0.056 (-0.007, 0.0004) -0.016 0.052
fl0 = 0:5, unbalanced setup (1/4,3/4)
(10,200) 0.006 0.109 (-0.0005, 0.013) 0.003 0.106
(20,400) 0.005 0.054 (0.002, 0.008) 0.004 0.054
(100,60) -0.001 0.064 (-0.003, 0.005) -0.009 0.058
81
Table 4.5: Simulated standardized estimates of log odds ratio (standardized by true
and estimated conditional standardized error) in 2?2?m unbalanced table .
(^flw ?fl0)=s:e: (^^flw ?fl0)=ds:e:
(m, n) mean std 95% CI for (^^flw ?fl0) mean std
fl0 = 1, unbalanced setup (1/4,3/4)
(10,200) 0.110 1.005 (-0.002, 0.012) 0.041 0.969
(20,400) 0.056 0.985 (-0.004, 0.003) -0.006 0.965
(100,60) 0.008 1.015 (-0.027, -0.020) -0.372 0.899
fl0 = 1, unbalanced setup (1/3,2/3)
(10,300) 0.122 1.011 (-0.001, 0.009) 0.047 0.992
(30,600) 0.054 1.004 (-0.003, 0.001) -0.035 0.996
(100,60) -0.012 1.025 (-0.035, -0.028) -0.542 0.939
fl0 = 0:5, unbalanced setup (1/3,2/3)
(10,300) 0.058 1.018 (-0.003, 0.007) 0.025 1.001
(30,600) -0.049 0.996 (-0.002, 0.002) 0.010 0.986
(100,60) -0.055 1.005 (-0.019, -0.013) -0.283 0.924
fl0 = 0:5, unbalanced setup (1/4,3/4)
(10,200) 0.059 1.018 (-0.003, 0.010) 0.029 0.980
(20,400) 0.094 1.020 (0.0002, 0.007) 0.064 1.002
(100,60) 0.017 1.066 (-0.013, -0.006) -0.149 0.947
82
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
m=20, n1=400, betawhat, beta0=1
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
m=20, n1=400, betawhathat, beta0=1
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
0
2
4
m=30, n1=600, betawhat, beta0=1
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-4
-2
0
2
4
m=30, n1=600, betawhathat, beta0=1
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
0
2
4
m=20, n1=400, betawhat, beta0=0.5
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
0
2
4
m=20, n1=400, betawhathat, beta0=0.5
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=30, n1=600, betawhat, beta0=0.5
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=30, n1=600, betawhathat, beta0=0.5
Figure 4.2: Q-Q plots for (^flw?fl0) and (^^flw?fl0) standardized by true and estimated
conditionalstandarderror, forvariousvaluesof(m;n)inlogistic2?2?munbalanced
setup where n1=600, 400 for (1/3,2/3), (1/4, 3/4) setups respectively .
83
4.2.2 Conditional Convergence in Distribution
Recall that Theorem 3.1.2 and Theorem 3.1.3 gave conditional convergence in
distribution of the normalized ^flw and ^^flw to the N(0;I) distribution. We performed
a limited study to examine this conditional convergence.
We repeated the simulations designed for the logistic 2?2?m table, but with
the following modiflcation: we generated 10 realizations of i.i.d Unif(-0.8,0.8) of fi0.
For each realization of fi0 we generated 1000 replications of y with the same fi0.
This was performed only for sample sizes m = 10, n = 200 for fl0=1 or 0.5.
From the following tables Table 4.6,Table 4.7, Table 4.8 and Table 4.9, we
found that each realization of fi0, the normalized ^flw and ^^flw had Monte Carlo mean
zero and variances near 1 in the case m = 10, n1 = 200. The Kolmogorov-Smirmov
and Shapiro-Wilk tests all indicated no signiflcant departures from normality. These
flndings are very similar to those which describe the unconditional distribution.
84
Table 4.6: Simulated estimates of logistic odds ratio for flxed realizations of uni-
formly distributed random efiects (m = 10;n1 = 200) and fl0 = 0:5.
(^flw ?fl0) ^^flw ?fl0
Realization mean std 95% CI for ^flw ?fl0 mean std
1 0.007 0.097 (0.001, 0.013) 0.000 0.096
2 0.002 0.091 (-0.005, 0.006) -0.005 0.090
3 0.004 0.100 (-0.002, 0.010) -0.001 0.100
4 0.003 0.093 (-0.003, 0.009) -0.004 0.092
5 0.011 0.095 (0.005, 0.017) 0.004 0.093
6 0.002 0.092 (-0.004, 0.008) -0.003 0.090
7 0.003 0.092 (-0.003, 0.009) -0.003 0.091
8 0.006 0.098 (0.000, 0.012) -0.001 0.097
9 0.006 0.092 (0.000, 0.012) 0.005 0.092
10 0.003 0.095 (-0.003, 0.009) -0.003 0.093
85
Table 4.7: Simulated standardized estimate of logistic odds ratio with flxed realiza-
tions of uniformly distributed random efiects (m = 10; n1 = 200) and fl0 = 0:5.
(^flw ?fl0)=s:e (^^flw ?fl0)=ds:e:
Realization mean std 95% CI for (^^flw ?fl0) mean std
1 0.071 1.030 (-0.005, 0.006) -0.003 1.000
2 0.002 0.980 (-0.011, 0.000) -0.062 0.964
3 0.045 1.089 (-0.008, 0.005) -0.019 1.069
4 0.030 1.000 (-0.009, 0.002) -0.041 0.970
5 0.116 1.000 (-0.002, 0.010) 0.037 0.975
6 0.024 1.000 (-0.009, 0.002) -0.039 0.980
7 0.032 0.991 (-0.008, 0.003) -0.033 0.974
8 0.066 1.040 (-0.006, 0.006) -0.009 1.016
9 0.068 1.007 (-0.005, 0.006) 0.002 0.993
10 0.033 1.018 (-0.009, 0.003) -0.035 0.993
86
Table 4.8: Simulated estimates of logistic odds ratio with flxed realizations of uni-
formly distributed random efiects (m = 10;n = 200) and fl0 = 1.
(^flw ?fl0) ^^flw ?fl0
Realization mean std 95% CI for ^flw ?fl0 mean std
1 0.015 0.097 (0.009, 0.021) 0.002 0.095
2 0.013 0.102 (0.007, 0.020) -0.003 0.099
3 0.010 0.100 (0.003, 0.016) -0.004 0.099
4 0.013 0.103 (0.008, 0.020) 0.000 0.102
5 0.014 0.094 (0.009, 0.020) 0.000 0.092
6 0.011 0.094 (0.005, 0.017) 0.000 0.093
7 0.013 0.099 (0.007, 0.020) -0.001 0.097
8 0.011 0.097 (0.005, 0.017) -0.003 0.095
9 0.009 0.099 (0.003, 0.015) -0.004 0.098
10 0.011 0.098 (0.005, 0.017) -0.005 0.096
87
Table 4.9: Simulated standardized estimate of logistic odds ratio with flxed realiza-
tions of uniformly distributed random efiects (m = 10; n1 = 200) and fl0 = 1.
(^flw ?fl0)=s:e (^^flw ?fl0)=ds:e:
Realization mean std 95% CI for (^^flw ?fl0) mean std
1 0.153 1.021 (-0.003, 0.008) 0.019 0.996
2 0.138 1.022 (-0.009, 0.003) -0.039 0.987
3 0.099 1.033 (-0.010, 0.002) -0.050 1.008
4 0.144 1.064 (-0.007, 0.006) -0.011 1.038
5 0.149 0.968 (-0.005, 0.006) -0.003 0.940
6 0.119 1.001 (-0.006, 0.006) -0.008 0.982
7 0.139 1.025 (-0.007, 0.006) -0.013 0.999
8 0.109 0.994 (-0.009, 0.003) -0.039 0.969
9 0.095 1.031 (-0.010, 0.002) -0.051 1.011
10 0.111 1.003 (-0.010, 0.001) -0.053 0.977
88
4.3 Analysis of real data for a 2?2?22 table
We used the data from Yusuf et al. [36] which has 22 clinical trials of beta-
blockers for reducing mortality after myocardial infarction. The data structure is
the following: Clinical trial j, where 1 ? j ? 22 (in the series to be considered for
meta-analysis), involves the use of n0j subjects in the control group and n1j in the
treatment group, giving rise to bj and dj deaths in the control and treatment groups,
respectively. Then the usual sampling models involve two independent binomial
distributions with probability of death p0j and p1j, respectively. We concentrate on
estimating the common log odds ratio which we label fl. Here we assume the model
logit(yij = 1jxij;fii) = fii + xijfl where fii is a random efiect which may represent
variation between clinical trials. We use this real data to calculate our estimator,
the Mantel-Haenszel estimator and their variance estimators, respectively. Here we
need to point out that the difierence between our asymptotic setting and that Woolf
(1955) is that we allow m ! 1, but in Woolf?s setting, m is flxed. The following
table summarizes the result.
89
Table 4.10: Summary of simulation of our estimator and Mantel-Haenszel estimator
of log odds ratio.
^^fl
w ^flMH 95% CI for
^^fl
w 95% CI for ^flMH Var(
^^fl
w) Var(^flMH)
-0.260 -0.261 (-0.359, -0.161 ) (-0.372, -0.150) 0.00253 0.00249
First, we use Woolf?s test in Splus, Breslow-Day test and Likelihood test in
SAS to test homogeneity of odds ratios whose p-values are 0.3595, 0.3149 and 0.3118
respectively. They all suggest that the common odds ratio model is appropriate for
combining the 22 clinical trials.
The above empirical variance estimator of Var[^flMH] is based on Breslow,
Greenland and Robins [25]. We bootstrapped the data as follows: for each i =
1; ::: ;22 and j = 0; 1, we generated Y ?ij ? Binomial(nij;pij), where pij = Yij=nij
is the sample proportion of successes in table i with x = j. Based on this bootstrap
sample, new estimators ^^fl?w and ^fl?MH were calculated. This process were repeated
1000 times, and the sample variance of these bootstrap replicates was used to esti-
mate Var[^^flw] and Var[^flMH].
We obtained 0:002476 for ^^flw and 0:002603 for ^flMH. These results agree with
Table 4.10 and we can see that our estimator ^^flw is very close to the Mantel-Haenszel
estimator. The Mantel-Haenszel estimator is consistent in the sparse data case,
unlike our estimator. The logic of our estimator can be extended to other types
of GLMM where m ! 1 and m=mini ni ! 0. No corresponding extension for
Mantel-Haenszel seems available in the literature.
90
Chapter 5
Case 2 Results.
This Chapter establishes asymptotic normality results for parameter estimates
in certain versions of Case 2, when m=n90. In Section 5.1 we review Jiang?s [14]
results on consistency. In Section 5.2 we state and prove our asymptotic normality
theorem, and in Section 5.3 we focus on a random efiect logistic regression model.
OurresultsfocusontheMaximumConditionalLikelihoodEstimates(MCLEs).
These estimates are based on maximizing the likelihood function conditional on the
estimable random efiects after integrating out the unestimable random efiects. The
estimates derived from this likelihood function were called MCLEs by Jiang [14].
A basic technique here is reparameterization, because, conditionally, the indi-
vidual efiects may not be identiflable. To illustrate this method, a special case is
considered. The analysis of the more general setting will be similar. In case 2 the
dimension of fl is flxed.
5.1 Case 2 consistency results of Jiang [14].
This section reviews the results on MCLE?s obtained by Jiang [14]. A key tool
in this analysis is to reparameterize the model to address identiflability.
Lemma 5.1.1. There is a map fl 7! ~fl, fi 7! ~fi such that
(i) Xfl+Zfi = ~X~fl+Z~fi, where (~X ~Z) is a known matrix of full column rank.
91
(ii) ~zi = ~z?j, i 2 Sj for some known vector ~z?j, where ~ztj is the ith row of ~Z
and Sj is deflned as below.
Vector uti is the ith row of U where U is standard in the sense that consists
of 0?s and 1?s and there is at least one 1 in each column and exactly one 1 in
each row. Vector eM;j is the M-dimensional vector whose jth component is 1 and
other components are 0. Let Sj = f1 ? i ? N : ui = eM;jg, and y(j) = (yi)i2Sj,
1 ? j ? M. Suppose that ?1;:::;?M are independent with common distribution
?(?=?)=?, where ?(?) is a known density function and ? > 0 is an unknown scale
parameter. Furthermore, we assume that there are no random efiects nested within
?. In notation, this means that zi = z?j, i 2 Sj, 1 ? j ? M, where z?j = (z?jk)1?k?l.
By Jiang?s [14] Lemma 4.1.1, we have
? = ~X~fl + ~Z~fi+U? (5.1)
Let ? = (~fl;?),  = (~fi;?). By (2.23) distribution of y given only fi is
f(yj ) =
MY
j=1
f(y(j)j ) (5.2)
and it is easy to show that f(y(j)j ) = gj(zt?j ~fi; ~fl;?), where
gj(s) = E
2
4Y
i2Sj
f(yijs1 +xtis(2) +sr+2?)
3
5
with s(2) = (s2;:::;sr+1) and r is the dimension of ~fl. Note that r ? p. Let n
be the dimension of ~fi and the ecpectation is with respect to the distribution of ?,
hj(s) = loggj(s), lC( ) = logf(yj ) and lC;j( ) = logf(y(j)j ) = hj(zt?j ~fi; ~fl; ?).
92
Then
lc( ) =
MX
j=1
lc;j( ): (5.3)
Let Z? be the matrix whose jth row is zt?j, 1 ? j ? M. Let ?0 and  0 be the
vectors corresponding to the true parameters and realization of random efiects.
Deflne s(l)M; k = PMj=1j~z?jkjl, l = 1;2;:::, tM;k = PMj=1Pl6=k j~z?jk~z?jlj,
Hj = (@2hj=@s2)
flfl
fls1=~zt
?j~fi; s(2)=~fl; sr+2=?
; (5.4)
and
A2 =
MX
j=1
0
BB
@
~z?j 0
0 Ir+1
1
CC
A(Hj( 0)?E(Hj( 0)j 0))
0
BB
@
~zt?j 0
0 Ir+1
1
CC
A (5.5)
where Il represents the l-dimensional identity matrix. Deflne
G =
0
BB
@
~Zt?~Z? 0
0 MIr+1
1
CC
A =
MX
j=1
0
BB
@
~z?j~zt?j 0
0 Ir+1
1
CC
A; (5.6)
?M( ) = min
1?j?M
?min
?
Var
?
(@hj=@s)j(~zt
?j~fi; ~fl; ?)
j 
??
; (5.7)
and ?M = ?M( 0). Let ?(l)j ( ) = (@lhj=@ls1)(~zt?j~fi; ~fl; ?), l = 1; 2,
V (1)k (") = max
1?j?M
E
?
j?(1)j ( 0)j1(~z
?jk?
(1)
j ( 0)j>1=2")
j 0
?
;
V (2)k (") = max
1?j?M
E
?
j?(2)j ( 0)?E(?(2)j ( 0)j 0)j1(~z2?jkj???????j>1=2")j?0
?
:
Theorem 5.1.2. Suppose:
(i) the conditional densities f(y(j)j ), 1 ? j ? M, are with respect to a com-
mon measure ? and have common support, and the flrst and second partial deriva-
tives of R f(yjj )d? with respect to components of  exist and can be taken under
the integral sign.
93
(ii) hj(s), 1 ? j ? M, are three times difierentiable and there exist ?, B > 0
such that
max
1?j?M
? 
max
1?u?r+2
flfl(@2h
j=@s1@su)
flfl?_ max
1?u; v; w?r+2
flfl(@3h
j=@su@sv@sw)
flfl? ? B
(5.8)
for all  such that k???0k < ?.
(iii) ~Z?k 6= 0, 1 ? k ? n, where ~Z?k is the kth column of ~Z?, and the following
are bounded:
k~Z?k1; max
1?k?n
?
s(1)M; k
s(2)M; k
!
; max
1?k?n
?
s(2)M; k
s(4)M; k
!
; max
1?k?n
?
s(4)M; k
s(2)M; k
!
and
max
1?j?M
?
j~z?jj2E
 @h
j
@s1 js0
?2!
_
?
max
2?u?r+2
E
 @h
j
@su js0
?2!
; (5.9)
(iv) ?M > 0, and there is a sequence ?M such that 0 < ?M ? ?M ^1, and the
following ! 0 in probability:
?max(G?1=2A2G?1=2)=?M; max
1?k?n
?
tM; k
s(2)M; k
!
=?M; ( nM)=?4M
and
max
l=1; 2
(logn=?2lM min
1?k?n
s(6?2l)M; k )_
 
n max
1?k?n
V (3?l)k (?lM)=?lM
?
:
Then, with probability approaching 1, there is a sequence ^ satisfying (@lC=@ )(^ ) =
0 and maxij^ i ? i0j = op(?M).
Note. In fact, it is seen from the proof of the theorem that maxij^?i ??i0j =
op(?2M).
94
Consider a special case in which there is only one random efiect factor. In such
a case, one may integrate out all the random efiects, if necessary. The resulting MCL
estimates are the maximum likelihood estimates for the flxed parameters. We have
the following.
Corollary 5.1.3. Suppose that in (5.1) fi = 0 (i.e., there are no random efiects
besides ?), and that:
(1) Part (i) of Theorem 4.1.2 holds with  replaced by ?.
(2) hj(?), 1 ? j ? M are three times difierentiable and there exists ?, B > 0 such
that
max
1?j?M
sup
k???0k??
j(any third derivative of hj)(?)j? B:
(3) ?M = min1?j?M ?min(Var((@hj=@?)(?0))j?0) > 0 and
1
M2(?M ^1)2
MX
j=1
E(kHj(?0)?EHj(?0)k2R) ! 0;
where Hj(?) = @2hj=@?2. Then, with probability approaching 1, there is a sequence
5.2 The Simple case (fi = 0).
In this case, we have the classical likelihood function after integrating out the
unestimable random efiects and do not need reparameterization, since we only have
flxed efiects and the dispersion parameter of the random efiect distribution.
95
Recall the likelihood function is deflned between (5.1) and (5.4), Then
0
BB
@
(@lC(?)=@fl)j^?
(@lC(?)=@?)j^?
1
CC
A
=
0
BB
@
(@2lC(?)=@fl@flt)j?? (@2lC(?)=@fl@?)j??
(@2lC(?)=@flt@?)j?? (@2lC(?)=@?2j??
1
CC
A
0
BB
@
^fl?fl0
^? ??0
1
CC
A
+
0
BB
@
(@lC(?)=@fl)flfl?0
(@lC(?)=@?)flfl?0
1
CC
A
where ?? is between ^? and ?0.
Then
(?H)
0
BB
@
^fl?fl0
^? ??0
1
CC
A =
0
BB
@
(@lC(?)=@fl)flfl?0
(@lC(?)=@?)flfl?0
1
CC
A
where
H =
0
BB
@
(@2lC(?)=@fl@flt)j?? (@2lC(?)=@fl@?)j??
(@2lC(?)=@flt@?)j?? (@2lC(?)=@?2)j??
1
CC
A: (5.10)
Let
C = Cov
0
BB
@
(@lC(?)=@fl)flfl?0
(@lC(?)=@?)flfl?0
1
CC
A (5.11)
Then
C?12(?H)C?12C12
0
BB
@
^fl?fl0
^? ??0
1
CC
A
= C?12
0
BB
@
(@lC(?)=@fl)flfl?0
(@lC(?)=@?)flfl?0
1
CC
A: (5.12)
96
Theorem 5.2.1. Suppose we have the following conditions:
(1) For any ? > 0, assume C?12(?H)C?12 is a positive deflnite matrix and
P(kC?12(?H)C?12 ?Ik < ?) ! 1
(2)
MkC?1k = O(1)
(3)
EfkGjk?(kGjk)g? K2
where K2 is a positive constant, ?(t) is a positive nondecreasing function mapping
from [0;infty) to [0;1) such that limt!1?(t) = 1 and t?(t) is a convex function,
and
Gj =
0
BB
@
?(@h
j=@fl)j?0
??(@h
j=@fl)j?0
?T ?(@h
j=@fl)j?0
??(@h
j=@?)j?0
?
?(@h
j=@?)j?0
??(@h
j=@fl)j?0
?T ?(@h
j=@?)j?0
?2
1
CC
A: (5.13)
Then
C12
0
BB
@
^fl?fl0
^? ??0
1
CC
A!D N(0;I): (5.14)
Proof:
Recall (5.11), (5.12). By condition (1), we have
C12
0
BB
@
^fl?fl0
^? ??0
1
CC
A
= (C?12(?H)C?12)?1C?12
0
BB
@
(@lC(?)=@fl)flfl?0
(@lC(?)=@?)flfl?0
1
CC
A
97
Then
  
  
  
  
C?12
0
BB
@
(@lC(?)=@fl)flfl?0
(@lC(?)=@?)flfl?0
1
CC
A?C
1
2
0
BB
@
^fl?fl0
^? ??0
1
CC
A
  
  
  
  
(5.15)
? k(C?12(?H)C?12)?1 ?Ik
  
  
  
  
C?12
0
BB
@
(@lC(?)=@fl)flfl?0
(@lC(?)@?)flfl?0
1
CC
A
  
  
  
  
(5.16)
= k(C?12(?H)C?12)?1 ?IkOp(1): (5.17)
The second factor of (5.16) is Op(1) from the following argument:
P
0
BB
@
  
  
  
  
C?12
0
BB
@
(@lC(?)=@fl)flfl?0
(@lC(?)@?)flfl?0
1
CC
A
  
  
  
  
< c4
1
CC
A
? 1?(1=c24)E
0
BB
@
  
  
  
  
C?12
0
BB
@
(@lC(?)=@fl)flfl?0
(@lC(?)@?)flfl?0
1
CC
A
  
  
  
  
1
CC
A
2
= 1?(1=c24)tr
0
BB
@C?1Cov
0
BB
@
(@lC(?)=@fl)flfl?0
(@lC(?)@?)flfl?0
1
CC
A
1
CC
A = 1?(p+1)=c24
By condition (1) of Theorem 5.2.1, We have
k(C?12(?H)C?12)?1 ?Ik
? k(C?12(?H)C?12)?1kkC?12(?H)C?12 ?Ik
= op(1):
So C?12
0
BB
@
(@lC(?)=@fl)flfl?0
(@lC(?)=@?)flfl?0
1
CC
A and C
1
2
0
BB
@
^fl?fl0
^? ??0
1
CC
A have the same asymptotic dis-
tribution.
98
Let u be a unit vector and
Tj = uTC?12
0
BB
@
(@hj=@fl)j?0
(@hj=@?)j?0:
1
CC
A
Recall (5.13). Then we have
T2j = uTC?12GjC?12u: (5.18)
By conditions (1), (2) and (3)
MX
j=1
E(T2j I(jTjj > "))
=
MX
j=1
E(T2j I(T2j > "2))
? kC?1k
MX
j=1
E
?
kGjkI
 
kGjk > "
2
kC?1k
? 
? kC?1k
MX
j=1
E(kGjk?fkGjkg)
?
?
 "2
kC?1k
? ?1
? MkC?1kK2
?
?
 "2
kC?1k
? ?1
= op(1):
The Lindeberg condition is satisfled and by Slutsky?s theorem, the conclusion of
Theorem 5.2.1 follows.
5.3 The logistic model logitP(yijk = 1jbij) = ?+bij:
We use the logistic model logitP(yijk = 1jbij) = ? + bij example to illustrate
Theorem 5.2.1.
99
Suppose ai = 0 and bij are iid normal with 1 ? i ? m1, 1 ? j ? n. The binary
responses yijk are conditionally independent with
logitP(yijk = 1jb) = ?+bij; 1 ? k ? r (5.19)
where r is flxed.
Recall hj(s) above (5.3). Then
hij(s) = logEexpf(s1 +s2?ij)
rX
k=1
yijk ?rlog(1+exp(s1 +s2?ij))g;
where the ?ij are independently identically distributed with a standard normal dis-
tribution.
Recall (5.11). In this example we have
C =
X
i
X
j
Var(@hij@? j?0): (5.20)
By condition (iii) of Corollary 5.1.3, C is a positive deflnite matrix. We need
to verify mn1kC?1k = O(1), equivalently as
inf
kuk=1
(utCu)=mn1 > 1=cmn1
where cmn1 is a positive constant and u is a unit vector.
By condition (iii) of Corollary 5.1.3 we have
inf
kuk=1
(utCu)=mn1 > ?M
where ?M > 0. Then condition (2) of Theorem 5.2.1 is verifled.
Recall (5.10), (5.11) and consider condition (1) of Theorem 5.2.1. In this
100
example we have
?H =
0
BB
@
?(@2lC(?)=@?2)j?? ?(@2lC(?)=@?@?)j??
?(@2lC(?)=@?@?)j?? ?(@2lC(?)=@?2)j??
1
CC
A
= C?C?H: (5.21)
Then
kC?12(?H)C?12 ?Ik
= kC?12 (C?H?C)C?12 ?Ik
= kC?12 (?H?C)C?12k:
Let
?C =
0
BB
@
C11 C12
C21 C22
1
CC
A
where
C11 = ?
X
i
X
j
E(@hij@? j?0)2;
C12 = C21 = ?
X
i
X
j
E(@hij@? j?0)(@hij@? j?0);
C22 = ?
X
i
X
j
E(@hij@? j?0)2:
Let
wij(?0;?ij) = (?0 +?0?ij)
X
k
yijk ?rlog(1+exp(?0 +?0?ij)) (5.22)
and
vij(?0;?ij) =
X
k
yijk ? rexp(?0 +?0?ij)1+exp(?
0 +?0?ij)
: (5.23)
101
Since we have P(yijk = 1j?;??ij) = exp(?+??ij)=(1+exp(?+??ij)), then
X
k
yijkj?0;?0?ij ? Binomial
 
r; exp(?0 +?0?ij)1+exp(?
0 +?0?ij)
?
:
Thus
@hij
@? j?0 =
E(exp(wij(?0;?ij)))vij(?0;?ij)
E(exp(wij(?0;?ij))) (5.24)
@hij
@? j?0 =
Ef?ij(exp(wij(?0;?ij)))vij(?0;?ij)g
E(exp(wij(?0;?ij))) (5.25)
?@
2hij
@?2 j?0 = ?
1
E(exp(wij(?0;?ij)))
'E?exp(w
ij(?0;?ij))(v2ij(?0;?ij))
? Var(
X
k
yijkj?0;?0?ij)
!)
+
?E(exp(w
ij(?0;?ij)))vij(?0;?ij)
E(exp(wij(?0;?ij)))
 2
?@
2hij
@?2 j?0 = ?
1
E(exp(wij(?0;?ij)))
'E??2
ij exp(wij(?0;?ij))v
2
ij(?0;?ij)
? Var(
X
k
yijkj?0;?0?ij)
!)
+
?Ef?
ij(exp(wij(?0;?ij)))vij(?0;?ij)g
E(exp(wij(?0;?ij)))
 2
?@
2hij
@?@?j?0 = ?
1
E(exp(wij(?0;?ij)))
'E??
ij(exp(wij(?0;?ij)))v2ij(?0;?ij)
? Var(
X
k
yijkj?0;?0?ij)
!)
+
?
E(exp(wij(?0;?ij)))vij(?0;?ij)
fE(exp(wij(?0;?ij)))g2
!
? (Ef?ij(exp(wij(?0;?ij)))vij(?0;?ij)g):
Since we have the classical likelihood function in this example, then
?E(@hij@? j?0)2 ?E(@
2hij
@?2 j?0) = 0; ?E(
@hij
@? j?0)(
@hij
@? j?0)?E(
@2hij
@?@?j?0) = 0;
?E(@hij@? j?0)2 ?E(@
2hij
@?2 j?0) = 0:
102
So now we have
?C =
0
BB
@
A?11 A?12
A?21 A?22
1
CC
A
where
A?11 =
X
i
X
j
E(@
2hij
@?2 j?0); A
?
12 = A
?
21 =
X
i
X
j
E(@
2hij
@?@?j?0);
A?22 =
X
i
X
j
E(@
2hij
@?2 j?0):
Let
?C?H =
0
BB
@
H?11 H?12
H?21 H?22
1
CC
A
where
H?11 =
X
i
X
j
?
E(@
2hij
@?2 j?0 )?
@2hij
@?2 j?0 ?
@2hij
@?2 j?? +
@2hij
@?2 j?0
 
;
H?21 = H?12 =
X
i
X
j
?
E(@
2hij
@?@?j?0)?
@2hij
@?@?j?0 ?
@2hij
@?@?j?? +
@2hij
@?@?j?0
 
;
H?22 =
X
i
X
j
?
E(@
2hij
@?2 j?0)?
@2hij
@?2 j?0 ?
@2hij
@?2 j?? +
@2hij
@?2 j?0
 
:
By Jiang?s [14] Corollary 4.1.3 (ii) and Taylor expansion
max
ij
flfl
flfl@2hij
@?2 j?? ?
@2hij
@?2 j?0
flfl
flfl?j^???0jmax
ij
flfl
flfl@3hij
@?3 j~?
flfl
flfl? Bj^???0j (5.26)
max
ij
flfl
flfl@2hij
@?2 j?? ?
@2hij
@?2 j?0
flfl
flfl?j^???0jmax
ij
flfl
flfl@3hij
@?3 j~?
flfl
flfl? Bj^???0j (5.27)
max
ij
flfl
flfl@2hij
@?@?j?? ?
@2hij
@?@?j?0
flfl
flfl?j^???0jmax
ij
flfl
flfl@3hij
@?3 j~?
flfl
flfl? Bj^???0j (5.28)
where ~? is between ?? and ?0 and Bj^???0j!p 0.
103
Since ?H?C is a symmetric matrix, then
k?H?Ck
= sup
kuk=1
jut(?H?C)uj
? 2
(?
m1nBj^???0j+
flfl
flfl
fl
X
i
X
j
(E@
2hij
@?2 j?0 ?
@2hij
@?2 j?0)
flfl
flfl
fl
!
_
?
m1nBj^???0j+
flfl
flfl
fl
X
i
X
j
(E@
2hij
@?2 j?0 ?
@2hij
@?2 j?0)
flfl
flfl
fl
!
_
?
m1nBj^???0j+
flfl
flfl
fl
X
i
X
j
(E@
2hij
@?@?j?0 ?
@2hij
@?@?j?0)
flfl
flfl
fl
!)
Since (@2hij=@?2)j?0 are iid, (@2hij=@?2)j?0 are iid, and (@2hij=@?@?)j?0 are iid.
Suppose we have 0 < ?0 < b where b is a positive constant. By Jiang?s [14]
Lemma 3.3, the second derivatives of hij are uniformly bounded. Then
E
flfl
flfl@2hij
@?2 j?0
flfl
flfl < 1 E
flfl
flfl@2hij
@?2 j?0
flfl
flfl < 1 E
flfl
flfl@2hij
@?@?j?0
flfl
flfl < 1: (5.29)
By the Law of Large numbers
1
mn1
flfl
flfl
fl
X
i
X
j
 
E@
2hij
@?2 j?0 ?
@2hij
@?2 j?0
?flflfl
flfl = op(1); (5.30)
1
mn1
flfl
flfl
fl
X
i
X
j
 
E@
2hij
@?2 j?0 ?
@2hij
@?2 j?0
?flflfl
flfl = op(1); (5.31)
1
mn1
flfl
flfl
fl
X
i
X
j
 
E@
2hij
@?@?j?0 ?
@2hij
@?@?j?0
?flflfl
flfl = op(1): (5.32)
104
Then
kC?12(?H)C?12 ?Ik
= kC?12 (?H?C)C?12k
? kC?1kk?H?Ck
= Op(1)k?H?Ckmn
1
= op(1):
Consider condition (3) of Theorem 5.2.1 and recall (5.13). We have
Gij =
0
BB
@
((@hij=@?)j?0)2 ((@hij=@?)j?0)((@hij=@?)j?0)
((@hij=@?)j?0)((@hij=@?)j?0) ((@hij=@?)j?0)2
1
CC
A:
Since Gij is a symmetric matrix, we have
kGijk? 2max
? 
@hij
@? j?0
?2
;
flfl
flfl
 @h
ij
@? j?0
? @h
ij
@? j?0
?flfl
flfl;
 @h
ij
@? j?0
?2!
:
Recall (5.24), (5.23). It is easy to show
 @h
ij
@? j?0
?2
? 4r2
since ?ij are independent identically distributed standard normal. Suppose we have
0 < ?0 < b where b is a positive constant, By Jiang?s [14] Lemma 3.3 that flrst
derivatives of hij are uniformly bounded. We have E(@hij=@?j?0)2 bounded and
flfl
flfl
 @h
ij
@? j?0
? @h
ij
@? j?0
?flfl
flfl
2
?
 @h
ij
@? j?0
?2 @h
ij
@? j?0
?2
:
Condition (1), (2) and (3) of Theorem 5.2.1 are verifled, so we have
C12
0
BB
@
^???0
^? ??0
1
CC
A!D N(0;I) (5.33)
105
Chapter 6
Simulation
In this chapter, in order to check the asymptotic results of Chapters 3 and 5,
logistic and Poisson random intercept models are simulated under case 1 for both
our new estimator (linear combination of weighted MLE) and PGWLE (Penalized
GeneralizedWeightedLeast Square Estimate)from Jiang [14]. Forcase 2, one simple
model was simulated. The asymptotic behavior of the estimates is investigated in
samples generated by Splus 2000 or R 2.6 for various sample sizes and parameter
conflgurations. The built-in function nlminb was used to compute the estimates.
Both statistical and computational questions were examined in the course of the
simulations.
In our simulations we compared Monte Carlo average of estimators to the
known true values of parameters, and we compared Monte Carlo averages of ap-
proximate variance formulas to the Monte Carlo sample variances. We also used
both the Kolmogorov-Smirnov and Shapiro-Wilk tests to assess agreement of stan-
dardized estimators with the N(0;1) distribution.
The Kolmogorov-Smirnov test is based on Dn = supxj^Fn(x) ? 'xj, where
^Fn(x) is the empirical cdf and '(x) is the N(0;1) cdf. The Shapiro-Wilk test [27]
is implemented in R using the algorithm of Royston. Intuitively, the Shapiro-Wilk
test is based on the observed correlation between an ordered sample and expected
106
values of N(0;1) order statistics. The actual calculation of the statistic and its
p-value relies on various approximations of means and variances of normal order
statistics. See Royston [26] and references there in for details.
6.1 Case 1 simulations for ^^flw
We simulate logistic and Poisson random intercept regression to investigate
consistency and asymptotic normality of the estimated regression coe?cients ^flw
and ^^flw. The results of Chapter 3 established the conditional and unconditional
asymptotic behavior of regression coe?cients, given the random efiects v0.
6.1.1 Case 1 combining a with vi.
We consider:
E(yijjvi) = b0ij(a+vi +flxij) (6.1)
where a and fl are flxed parameters, xij is a scalar valued predictor, and vi are iid
Unif (?0:8;0:8) or N(0;0:25). For our simulations we let a0 = 1. We used the
nlminb minimization function in Splus to get estimators which minimize ?lni( i) in
each cluster deflned as
?lni( i) =
X
j
(?yij?ij +bij(?ij))
where ?ij = a+vi+xijfl and = (a;fl;v1 ; ::: ;vm)t. We do not attempt to estimate
a and vi separately but only a+vi. For simplicity, we simulated the balanced model
with m clusters and n1 observations per cluster. We generated m ? n1 covariates
xij uniformly spaced between ?1 and 1, so that the ranges of fxijg and fxljg, i 6= l,
107
did not overlap. For example if we have m = 3 clusters and n1 = 2 observations per
cluster, flrst we divided interval [?1;1] into m = 3 equal sized intervals [?1;?1=3],
[?1=3;1=3] and [1=3;1], and then divide each interval into n1 = 2 subintervals to get
x11 = ?1, x12 = ?2=3, x21 = ?1=3, x22 = 0, and x31 = 1=3, x32 = 2=3 for clusters
1, 2, 3 respectively. We also generate m independent random efiects vi from the
Uniform distribution (?0:8;0:8) or N(0;0:25) according to (xij;vi) with regression
coe?cient fl0 = 1, m samples of n1 binary random variables Yij was generated
with E(Yijjvi;xij) = b0ij(?ij) where for Poisson model with b0ij(?ij) = exp(?ij) and
for logistic model with b0ij(?ij) = exp(?ij)=(1 + exp(?ij)). Various combinations of
(m;n1) are used in the simulations.
For each combination, 1000 replications of random efiects vi and responses Yij
were generated. The covariates xij were the same for all the simulations. Estimated
regression coe?cients for various choices of (m;n1) are summarized in Table 6.1
and Table 6.2. The tables display the Monte Carlo means and standard errors of
the simulated values of ^flw ? fl0, ^^flw ? fl0, (^flw ? fl0)=s:e:, (^^flw ? fl0)=ds:e: and 95%
confldence bounds for (^flw ? fl0) and ^^flw ? fl0 based on Student?s t. From Table
6.1 and Table 6.2, we can see that in combinations (30;120), (40;240) in logistic
model for ^flw, ^^flw, some bias is present (based on confldence intervals). But for all
the combinations from both tables, their Shapiro-Wilk test p values are above 0.18
and Kolmogorov-Smirnov test p values are larger than 0.1, which means in those
combinations, ^flw and ^^flw given v0 seem normal. From Table 6.1 and Table 6.2,
we can see that bias/se< 0:165 so the normal inference is not greatly afiected in
the cases where bias is present. The plots show that ^flw and ^^flw are approximately
108
normal, with departures from normality in the extreme tails when random efiects
are uniformly distributed.
109
Table 6.1: Simulated estimates of logistic and Poisson regression coe?cient combin-
ing flxed intercept with uniformly distributed random efiects.
(^flw ?fl0) ^^flw ?fl0
(m, n1) mean std 95% CI for ^flw ?fl0 mean std
Logistic fl0 = 1
(10,200) 0.020 0.808 (-0.029, 0.071) 0.002 0.794
(20,200) -0.020 1.190 (-0.089, 0.050) -0.033 1.100
(30,120) -0.224 1.432 (-0.313, -0.135) -0.229 1.421
(30,210) -0.055 1.200 (-0.130, 0.019) -0.063 1.191
(40,240) -0.197 1.226 (-0.273, -0.120) -0.202 1.221
Poisson fl0 = 1
(10,200) 0.007 0.355 (-0.015, 0.029) 0.002 0.352
(20,200) 0.028 0.510 (-0.004, 0.059) 0.023 0.504
(30,210) -0.033 0.590 (-0.069, 0.004) -0.036 0.588
(40,240) 0.003 0.648 (-0.038, 0.043) -0.001 0.644
110
Table 6.2: Simulated standardized estimate of logistic and Poisson regression coef-
flcient with uniformly distributed random efiects.
(^flw ?fl0)=s:e (^^flw ?fl0)=ds:e:
(m;n1) mean std 95% CI for (^^flw ?fl0) mean std
Logistic fl0 = 1
(10,200) 0.026 1.014 (-0.047, 0.051) 0.001 0.989
(20,200) -0.018 0.995 (-0.101, 0.035) -0.030 0.971
(30,120) 0.126 0.806 (-0.317, -0.140) -0.127 0.792
(30,210) -0.041 0.893 (-0.137, 0.010) -0.047 0.881
(40,240) -0.135 0.845 (-0.278, -0.126) -0.139 0.837
Poisson fl0 = 1
(10,200) 0.014 1.006 (-0.020, 0.024) 0.001 1.000
(20,200) 0.053 0.995 (-0.008, 0.054) 0.044 0.982
(30,210) -0.051 0.963 (-0.073, 0.0002) -0.057 0.959
(40,240) 0.005 0.976 (-0.041, 0.039) -0.0005 0.968
111
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=20, n1=200, betawhat, logistic
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=20, n1=200, betawhathat, logistic
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
m=40, n1=240, betawhat, logistic
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
m=40, n1=240, betawhathat, logistic
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=20, n1=200, betawhat, Poisson
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=20, n1=200, betawhathat, Poisson
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=40, n1=240, betawhat, Poisson
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=40, n1=240, betawhathat, Poisson
Figure 6.1: Q-Q plots for (^flw?fl0) and (^^flw?fl0) standardized by true and estimated
standard error, for various values of (m;n) in case 1 random intercept model with
uniformly distributed random efiects.
112
Our theoretical results assumed bounded random efiects, but we investigated
unbounded normal random efiects by simulation studies to those with uniform ran-
dom efiects. For each combination (m;n), 1000 replications of random N(0;0:25)
efiects vi and responses Yij were generated. The covariates xij were the same for all
the simulations. Estimated regression coe?cients for various choices of (m;n1) are
summarized in Table 6.3 and Table 6.4. The tables display the means and standard
errors of the simulated values of ^flw?fl0, ^^flw?fl0, (^flw?fl0)=s:e:, (^^flw?fl0)=ds:e: and
95% confldence bounds for (^flw?fl0), ^^flw?fl0 based on Student?s t. From Table 6.3
and Table 6.4, we can see that in combinations (30;210), (40;240) in logistic model
for ^flw, ^^flw, some bias is present. But for all the combinations from both tables,
the Shapiro-Wilk test p values are above 0.463 and the Kolmogorov-Smirnov test p
values are larger than 0.1, which means in those combinations, ^flw and ^^flw given v0
seem normal. From Table 6.3 and Table 6.4, we can see that bias/se< 0:177, so the
normal inference is not greatly afiected in the cases where bias is present. The plots
show that ^flw and ^^flw are approximately normal, with departures from normality in
the extreme tails when random efiects are normally distributed. Here we need to
point out that we have not proven asymptotic normality of our estimator in case 1
random intercept model for normally distributed random efiects, but the simulation
studies show that normality appears to hold.
113
Table 6.3: Simulated estimates of logistic and Poisson regression coe?cient combin-
ing flxed intercept with normally distributed random efiects.
(^flw ?fl0) ^^flw ?fl0
(m, n1) mean std 95% CI for ^flw ?fl0 mean std
Logistic fl0 = 1
(20,200) 0.012 1.132 (-0.058, 0.082) -0.002 1.117
(30,210) -0.206 1.219 (-0.282, -0.130) -0.214 1.209
(40,240) -0.202 1.227 (-0.278, -0.126) -0.203 1.221
Poisson fl0 = 1
(20,200) -0.001 0.512 (-0.033, 0.031) -0.005 0.510
(30,210) 0.011 0.587 (-0.026, 0.047) 0.005 0.585
(40,240) 0.024 0.662 (-0.017, 0.065) 0.021 0.659
114
Table 6.4: Simulated standardized estimate of logistic and Poisson regression coef-
flcient with normally distributed random efiects.
(^flw ?fl0)=s:e (^^flw ?fl0)=ds:e:
(m;n1) mean std 95% CI for (^^flw ?fl0) mean std
Logistic fl0 = 1
(20,200) 0.011 1.004 (-0.071, 0.068) -0.002 0.983
(30,210) -0.153 0.904 (-0.289, -0.139) -0.158 0.891
(40,240) -0.139 0.843 (-0.279, -0.127) -0.139 0.834
Poisson fl0 = 1
(20,200) -0.003 1.003 (-0.037, 0.026) -0.011 0.997
(30,210) 0.019 0.959 (-0.031, 0.414) 0.009 0.954
(40,240) 0.037 0.994 (-0.019, 0.062) 0.033 0.987
115
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=20, n1=200, betawhat, logistic
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=20, n1=200, betawhathat, logistic
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
m=40, n1=240, betawhat, logistic
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
m=40, n1=240, betawhathat, logistic
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=20, n1=200, betawhat, Poisson
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
m=20, n1=200, betawhathat, Poisson
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
0
2
m=40, n1=240, betawhat, Poisson
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
0
2
m=40, n1=240, betawhathat, Poisson
Figure 6.2: Q-Q plots for (^flw?fl0) and (^^flw?fl0) standardized by true and estimated
standard error, for various values of (m;n) in case 1 random intercept model with
normally distributed random efiects.
116
6.2 Case 1 logistic random intercept simulation for flxed realizations
of random efiects
Recall that Theorem 3.1.2 and Theorem 3.1.3 gave conditional convergence in
distribution of the normalized ^flw and ^^flw to the N(0;I) distribution. We performed
a limited study to examine this conditional convergence.
We repeated the simulations designed in Section 6.1.1 for the random intercept
problem, but with the following modiflcation: we generated 10 realizations of i.i.d
Unif(-0.8,0.8) of v0. For each realization of v0 we generated 1000 replications of y
with the same v0 and x?s as described in Section 6.1.1. This was performed only
for sample sizes m = 20, n1 = 200 and m = 40, n1 = 240, and for the logistic model
(6.1).
From the following tables Table 6.5,Table 6.6, Table 6.7 and Table 6.8, we
found that each realizations of v0, the normalized ^flw and ^^flw had Monte Carlo
mean zero and variances near 1 in the case m = 20, n1 = 200. For the case m = 40,
n1 = 240, the Monte Carlo mean and variance show some departures from the
desired 0 and 1. The Kolmogorov-Smirmov and Shapiro-Wilk tests all indicated no
signiflcant departures from normality. These flndings are very similar to those of
Section 6.1.1, which describe the unconditional distribution.
117
Table 6.5: Simulated estimates of logistic regression coe?cient combining flxed efiect
and flxed realizations of uniformly distributed random efiects (m = 20;n1 = 200).
(^flw ?fl0) ^^flw ?fl0
Realization mean std 95% CI for ^flw ?fl0 mean std
1 0.004 1.063 (-0.062, 0.070) -0.012 1.047
2 -0.208 1.133 (-0.098, 0.043) -0.042 1.117
3 -0.012 1.127 (-0.082, 0.058) -0.028 1.109
4 -0.044 1.105 (-0.113, 0.024) -0.058 1.087
5 0.028 1.066 (-0.038, 0.094) 0.012 1.048
6 0.019 1.141 (-0.052, 0.089) 0.005 1.127
7 -0.024 1.075 (-0.090, 0.043) -0.039 1.059
8 0.026 1.110 (-0.043, 0.095) 0.010 1.095
9 -0.035 1.079 (-0.102, 0.032) -0.048 1.061
10 -0.026 1.110 (-0.095, 0.042) -0.040 1.092
118
Table 6.6: Simulated standardized estimate of logistic regression coe?cient with
flxed realizations of uniformly distributed random efiects (m = 20; n1 = 200).
(^flw ?fl0)=s:e (^^flw ?fl0)=ds:e:
Realization mean std 95% CI for (^^flw ?fl0) mean std
1 0.003 0.948 (-0.077, 0.052) -0.011 0.926
2 -0.025 1.005 (-0.111, 0.027) -0.037 0.982
3 0.010 0.998 (-0.097, 0.041) -0.025 0.975
4 -0.040 0.988 (-0.125, 0.010) -0.052 0.975
5 0.025 0.946 (-0.053, 0.077) 0.011 0.924
6 0.017 1.016 (-0.065, 0.075) 0.004 0.996
7 -0.021 0.959 (-0.105, 0.027) -0.035 0.937
8 0.023 0.991 (-0.058, 0.078) 0.008 0.970
9 -0.031 0.961 (-0.114, 0.018) -0.043 0.938
10 -0.024 0.992 (-0.108, 0.027) -0.036 0.969
119
Table 6.7: Simulated estimates of logistic regression coe?cient combining flxed efiect
and flxed realizations of uniformly distributed random efiects (m = 40;n1 = 240).
(^flw ?fl0) ^^flw ?fl0
Realization mean std 95% CI for ^flw ?fl0 mean std
1 -0.210 1.207 (-0.285, -0.135) -0.214 1.202
2 -0.225 1.226 (-0.301, -0.149) -0.228 1.223
3 -0.117 1.125 (-0.194, -0.041) -0.120 1.229
4 -0.201 1.261 (-0.279, -0.122) -0.204 1.257
5 -0.164 1.261 (-0.238, -0.089) -0.204 1.225
6 -0.211 1.204 (-0.285, -0.136) -0.214 1.198
7 -0.185 1.211 (-0.260, -0.110) -0.188 1.206
8 -0.206 1.189 (-0.280, -0.132) -0.209 1.184
9 -0.205 1.203 (-0.279, -0.130) -0.207 1.200
10 -0.249 1.223 (-0.325, -0.173) -0.252 1.219
120
Table 6.8: Simulated standardized estimate of logistic regression coe?cient with
flxed realizations of uniformly distributed random efiects (m = 40; n1 = 240).
(^flw ?fl0)=s:e (^^flw ?fl0)=ds:e:
Realization mean std 95% CI for (^^flw ?fl0) mean std
1 -0.145 0.836 (-0.288, -0.139) -0.147 0.829
2 -0.150 0.845 (-0.304, -0.152) -0.157 0.839
3 -0.081 0.853 (-0.200, -0.044) -0.083 0.845
4 -0.138 0.868 (-0.282, -0.126) -0.140 0.861
5 -0.112 0.822 (-0.242, -0.093) -0.114 0.815
6 -0.145 0.831 (-0.288, -0.140) -0.147 0.823
7 -0.128 0.836 (-0.263, -0.113) -0.129 0.829
8 -0.142 0.821 (-0.283, -0.136) -0.144 0.813
9 -0.142 0.833 (-0.281, -0.133) -0.143 0.824
10 -0.172 0.845 (-0.328, -0.176) -0.174 0.839
6.3 Case 1 random intercept simulations for penalized likelihood es-
timators
We simulate logistic and Poisson random intercept regression to investigate
consistency and asymptotic normality of penalized regression coe?cient estimates by
using Jiang?s [14] PGWLE (Penalized Generalized Weighted Least Squares) method.
121
6.3.1 Case 1 logistic random intercept combining a with vi.
We consider Example 3.2 from Jiang [14]:
logitP(yij = 1jvi) = a+vi +fl1xij (6.2)
where a and fl1 are flxed parameters, xij is a scalar valued predictor, and vi are
iid N(0;Va). For our simulations we let a = 0. We used the PGWLS method by
Jiang [14] and the nlminb minimization function in Splus to get estimators which
minimize ?lP( ) derived by Jiang [14] as (??), which is deflned as
?lP( ) =
X
i
X
j
(?yij?ij +log(1+exp(?ij)))+ ?12 m(v)2
where ?ij = a+vi +xijfl1, v = Pmi vi=m, and  = (a;fl1;v1 ; ::: ;vm)t. We do not
attempt to estimate a and vi separately but only a+vi. For simplicity, we simulated
the balanced model with m clusters and n1 observations per cluster. We generated
m?n1 covariates xij uniformly spaced between -1 and 1, so that the ranges of fxijg
and fxljg, i 6= l, did not overlap. We also generate m independent random efiects vi
from the normal distribution with mean 0 and variance Va. Conditionally on (xij;vi)
with regression coe?cient fl10 = 1, a sample of mn1 binary random variables Yij was
generated with E(Yijjvi;xij) = exp(xij +vi)=(1+exp(vi +xij)).
Various combinations of (m;n1), Va, ?1 and initial values of estimated fl1,
v1 ; ::: ;vm were used.
For each combination, 1000 replications of random efiects vi and responses Yij
were generated. The covariates xij were the same for all the simulations. Estimated
regression coe?cients for various choices of (m;n1) with Va = 4, initial estimates set
122
to 0, and ?1 = 1, are summarized in Table 6.9. The tables display the means and
standard errors of the simulated values of (^fl1 ?fl10)=s:e:, (^fl1 ?fl10)=ds:e: and 95%
confldence bounds for ^fl1 ?fl10 based on Student?s t. From Table 6.9, we can see
that in the combination (20;20) with Va = 4, bias is present. But its Kolmogorov-
Smirnov test p values are larger than 0.1. This can be explained in terms of the
consistency condition logm=(logn1)2 ! 0 [Jiang ([14], Ex.3.2]. Among all the pairs
of sample sizes for simulation, (20;20) has the highest values of logm=(logn1)2 as
0:334. For all combinations from Table 6.9 whose 95% confldence intervals including
0.
In order to check whether the computations of ^fl1 are sensitive to initial values,
we ran various combinations of m, n1, Va and ?1 with initial values set to 0. For
the same vi and yij, we initialized ^fl1 at 3 and at logit(y::=(1 ? y::)). In each case
the ^vi were initialized at 0. This comparison was repeated 1000 times. The results
are summarized in Table 6.10.
123
Table 6.9: Simulated standardized estimates of logistic regression coe?cient (stan-
dardized by true and estimated conditional standardized error) combining flxed
intercept with random efiects.
(^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e:
(m, n1) mean std 95% CI for ^fl1 ?fl10 mean std
Va = 4, initials set to 0, ?1=1
(10,20) -0.017 1.021 (-0.127, 0.293) -0.015 1.010
(20,20) 0.077 1.012 (0.082, 0.665) 0.076 1.011
(20,40) 0.054 0.995 (-0.019, 0.384) 0.053 0.991
Va = 3, initials set to 0, ?1 = 1
(20,20) 0.054 0.938 (-0.008, 0.509) 0.053 0.932
Table 6.10: Simulated standardized estimate of logistic regression coe?cient for
various initial values.
(^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e:
initial ^fl1 mean std 95% CI for ^fl1 ?fl10 mean std
Va = 4, (m;n1) = (10;20), ?1=1
0 -0.017 1.021 (-0.127, 0.293) 0.015 1.011
3 -0.014 0.878 (-0.150, 0.277) 0.009 1.022
logit -0.020 0.896 (-0.150, 0.277) 0.015 1.011
124
AswecanseefromTable6.10, theestimatesarebiasedforlargerlogm=(logn1)2.
Moreover, the variance of (^fl1?fl10)=s:e: is sensitive to the initial guess of ^fl1. Also in
the combination (20;20) with Va = 4 and initial ^fl1 equal to logit, their Kolmogorov-
Smirnov test rejects the hypothesis of normality with p values 0.043 and 0.019. For
all the other combinations from Table 6.10, their goodness-flt tests suggest normal-
ity.
Since in reality we do not know how to choose ?1, we tried various values of ?1
to assess the efiect of ?1 on normality and consistency. The Monte Carlo averages
and standard deviations of (^fl1 ? fl10)=s:e: are displayed in Table 6.11 for various
(m;n1), ?1, initial values and Va. The estimates are computed for ?1 = 0:1, 1 and 5
for the same data, and this comparison was repeated 1000 times. Table 6.11 shows
that extremely large values of the penalty parameter (?1 = 5) did afiect the bias.
Table 6.11: Simulated standardized estimate of logistic regression coe?cient for
various ?1 values.
(^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e:
?1 mean std 95% CI for ^fl1 ?fl0 mean std
Va = 4, (m;n1) = (20;50), initials set to 0
1 0.048 1.012 (-0.034, 0.330) 0.048 1.009
0.1 0.022 0.858 (-0.034, 0.330) 0.048 1.009
5 0.022 0.858 (0.034, 0.400) 0.070 1.017
125
6.4 Case 2 logistic simple example
We consider Example 3.3 from Jiang [14], with fii = 0 for simplicity. We have
logitP(yijk = 1jbij) = ?+bij = ?+??ij (6.3)
where the ?ij are iid N(0;1). We estimate ? and ? by Jiang?s [14] MCLE method
based on integrating ?ij out of the likelihood. This gives us an unconditional log-
likelihood function in this special example. We also use the nlminb function in
Splus to get estimates which minimize the negative log likelihood function. In Case
2, reparameterization is used to take care of the identiflability problem. Here since
we integrate out ?ij, we only have the flxed efiect ? and scale parameter ?. We do
not need to worry about the identiflability problem, so the negative log-likelihood
function is the following:
?lC(?) = ?
X
i
X
j
hij = ?
X
i
X
j
Eexp[(?+??ij)yij+ ?rlog(1+exp(?+??ij))]
(6.4)
where yij+ = Pk yijk.
More precisely, here we have m rows and n columns for each cell we have r
observations. We generate m ? n1 random efiects bij from N(0;?20); where ?0 is a
scale parameter and set r = 2. Then we can get ?ij1 = ?ij2 = ?+bij. Conditionally
on ?ijk with k=1 or 2, a sample of m?n1?2 random variables Yijk is generated from
the Bernoulli distribution with E(Yijkj?;??ij) = exp(?+??ij)=(1+exp(?+??ij)).
Various choices of (m;n1) were simulated. For each choice, 500 replications of
bij, Yijk were generated. We used the Splus nlminb function to calculate estimates
of ? and ?. In this simulation, ?0 = 1, initials for ^? and ^? are 0 and 1, respectively.
126
Asymptotic normality results for standardized ^???0 and ^? ??0 are summarized in
Table 6.12.
For the combinations of Table (6.12), the only pair (m;n1) = (20;20) has
Kolmogorov-Smirnov test p values larger than 0.1. For (m;n1) = (10;10), the
Kolmogorov-Smirnov test rejects the normality hypothesis with p values 0.01 and 0
for (^???0)=s:e(?) and (^? ??0)=s:e:(?), respectively. For combinations (m;n1) =
(10;40) and (m;n1) = (10;30), normality only appear to hold for standardized
(^? ? ?0) with Kolmogorov-Smirnov test p values larger than 0.1. Moreover, the
Kolmogorov-Smirnov test rejects the normality hypothesis with p value less than
0.008.
The 95% confldence intervals for (^???0) for all the pairs (m;n1) include 0,
but 95% confldence intervals for (^? ??0) for (10;10) and (10;30) do not include 0.
In the combinations (m;n1) = (20;20) and (m;n1) = (10;40), both (^? ? ?0) and
(^? ??0) seem unbiased.
As a partial check on the joint normality of (^?;^?), we also examined [(^? ?
?0)+(^? ??0)]=p2. Similar results were obtained.
127
Table 6.12: Simulated standardized estimate of ? and ? in case 2 simple example
for various (m;n1) values
(^???0)=s:e(?) (^? ??0)=s:e:(?)
(m;n1) mean std mean std
?0 = 1, ?0 = 1, initials set to (0,1)
(10,10) -0.024 0.641 -0.236 1.550
(10,30) 0.079 0.835 -0.025 0.475
(20,20) 0.062 0.816 -0.015 0.523
(10,40) 0.036 0.800 -0.024 0.531
128
Quantiles of Standard Normal
Normalized estimate of mu
-3 -2 -1 0 1 2 3
-2
-1
0
1
2
m=10, n1=10, mu
Quantiles of Standard Normal
Normalized estimate of tau
-3 -2 -1 0 1 2 3
-4
-2
0
2
4
m=10, n1=10, tau
Quantiles of Standard Normal
Normalized estimate of mu
-3 -2 -1 0 1 2 3
-2
-1
0
1
2
m=10, n1=30, mu
Quantiles of Standard Normal
Normalized estimate of tau
-3 -2 -1 0 1 2 3
-2
-1
0
1
m=10, n1=30, tau
Quantiles of Standard Normal
Normalized estimate of mu
-3 -2 -1 0 1 2 3
-2
-1
0
1
2
3
m=20, n1=20, mu
Quantiles of Standard Normal
Normalized estimate of tau
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
m=20, n1=20, tau
Quantiles of Standard Normal
Normalized estimate of mu
-3 -2 -1 0 1 2 3
-2
-1
0
1
2
m=10, n1=40, mu
Quantiles of Standard Normal
Normalized estimate of tau
-3 -2 -1 0 1 2 3
-1
0
1
m=10, n1=40, tau
Figure 6.3: Q-Q plots for standardized (^???0) and (^? ??0) by true standard error,
for various (m;n1) in case 2 logistic simple example.
129
Chapter 7
Conclusions and Future Work.
7.1 Theoretical Conclusions
We have proposed a new estimator ^^flw = Pi ^wi^fli and have proven its condi-
tional asymptotic normality for the random intercept problem when m, the number
of random efiects goes to inflnity with the sample size N but at a slower rate,
characterized as m=N ! 0. Given the random efiects, the linear combinations of
weighted MLE denoted ^flw and ^^flw are asymptotically normal. The weight matri-
ces and asymptotic conditional covariance matrix of ^flw can be estimated, and the
standardized regression estimates, standardized by either the true or estimated co-
variance, have a limiting N(0;I) distribution. We have proven that in the absence
of regression coe?cients, the normalized Jiang?s [14] penalized likelihood estima-
tor of flxed intercept pm(^a ? a0) converges to a normal distribution. Di?culties
arise in establishing the conditional asymptotic normality of the penalized likelihood
estimator ^fl of regression coe?cients for flxed efiects in a general GLMM.
For the case m=N 9 0, joint asymptotic normality is proved for regression
coe?cient and scale parameter estimates after suitable standardization. A logistic
example with logitP(yijkjbij) = ?+bij is used to illustrate how to verify the general
conditions in this case.
130
7.2 Conclusions from the simulation studies and real data analysis
We focused on investigating asymptotic behavior of our new estimator ^^flw,
theoretic estimator ^flw and simulated balanced models for simplicity. In the case
m=N ! 0, logistic and Poisson random intercept models were simulated. For both
models, we considered normally and uniformly distributed random efiects. When
m=n ? 1=7, our estimators had some bias under logistic model. But for Poisson
model this problem did not occur except for (30;210) with fl0 = 1. In all the cases,
normality appears to hold and the ratio bias/s.e is less than 0.218, which means
normal inference is not greatly afiected. In the simulation studies for Jiang?s [14]
penalized estimator ^a of flxed intercept, asymptotic consistency and normality does
not hold for (30;60) with Va = 4 and (30;60), (40;60), (40;80) with Va = 2. Esti-
mates of a0 and standard deviation shows considerable bias.
We simulated logistic 2?2?m tables for both balanced and unbalanced setups
with uniformly distributed random efiects. In both setups, ^^flw has better consistency
results than ^flw. In all the cases, normality holds and the ratio bias/s.e is less than
0.194 except in extreme cases (100;50) and (100;60), which means normal inference
is not greatly afiected.
In the case m=N 9 0 for the logistic model logitP(yijk = 1jbij) = ? + bij,
among the combinations (20;20) with r = 2, unbiasedness and approximate nor-
mality hold for the intercept and scale parameter estimates. Scale parameter esti-
mates are biased if the number of clusters is too small (< 300). Normality holds
for standardized ^? ? ?0 but not for standardized ^? ? ?0 in (10;40) and (10;30)
131
combinations.
We analized a real data set of 22 clinical trials of beta-blocker for heart attach
treatment (Yusuf et al. (1985)). We flrst test homogeneity of odds ratios by using
Woolf?s test, the Breslow-Day test and the likelihood ratio test. None of them reject
the null hypothesis, which suggest common odds ratio model is appropriate. Our
estimator and Mantel-Haenszel estimator are very close. Mantel-Haenszel estimator
is consistent in sparse data case unlike our estimator. The logic of our estimator
can be extended to other types of GLMM where m ! 1 and m=mini ni ! 0. No
corresponding extension for Mantel-Haenszel is available.
7.3 Future work.
We can try to use projected score methods to overcome the di?culties in
proving conditional asymptotic normality for penalized likelihood estimates of flxed
efiects in the case 1 random intercept problem. Also we can consider conditional
logistic regression estimate of random intercept model.
We can investigate the asymptotic behavior of Penalized Generalized Weighted
Least Square (PGWLS) estimate in a more complicated random efiects models such
as two way crossed random efiects model.
132
Chapter A
Simulation Results
A.1 Case 1 random intercept model for penalized likelihood estima-
tor
A.1.1 Case 1 logistic random intercept combining a with vi.
We consider Example 3.2 from Jiang [14]:
logitP(yij = 1jvi) = a+vi +fl1xij (A.1)
where a and fl1 are flxed parameters, xij is a scalar valued predictor, and vi are
iid N(0;Va). For our simulations we let a = 0. We used the PGWLS method
by Jiang [14] and nlminb minimization function in Splus to get estimators which
minimize ?lP( ) derived by Jiang [14] as (??), which is deflned as
?lP( ) =
X
i
X
j
(?yij?ij +log(1+exp(?ij)))+ ?12 m(v)2
where ?ij = a + vi + xijfl1, v = Pmi vi=m, and  = (a;fl1;v1 ; ::: ;vm)t. We do
not attempt to estimate a and vi separately but only a + vi. For simplicity, we
simulated the balanced model with m clusters and n1 observations per cluster. We
generated m ? n1 covariates xij uniformly spaced between -1 and 1, so that the
ranges of fxijg and fxljg, i 6= l, did not overlap. For example if we have m = 3
clusters and n1 = 2 observations per cluster, flrst we divided interval [?1;1] into
133
m = 3 equal sized intervals [?1;?1=3], [?1=3;1=3] and [1=3;1], and then divide
each interval into n1 = 2 subintervals to get x11 = ?1, x12 = ?2=3, x21 = ?1=3,
x22 = 0, and x31 = 1=3, x32 = 2=3 for clusters 1, 2, 3 respectively. We also
generate m independent random efiects vi from the normal distribution with mean
0 and variance Va. Conditionally on (xij;vi) with regression coe?cient fl10 = 1,
a sample of mn1 binary random variables Yij was generated with E(Yijjvi;xij) =
exp(xij +vi)=(1+exp(vi +xij)).
Various combinations of (m;n1), Va, ?1 and initial values of estimated fl1,
v1 ; ::: ;vm were used.
For each combination, 1000 replications of random efiects vi and responses Yij
were generated. The covariates xij were the same for all the simulations. Estimated
regression coe?cients for various choices of (m;n1) with Va = 4, initial estimates
set to 0, and ?1 = 1, are summarized in Table A.1. The tables display the means
and standard errors of the simulated values of (^fl1 ?fl10)=s:e: and 95% confldence
bounds for E(^fl1 ?fl10) based on Student?s t. From Table A.1, we can see that in
combinations (20;20), (20;30) with Va = 4 and even for Va = 3 with (20;20), the
bias is present. But their Kolmogorov-Smirnov test p values are larger than 0.1.
This can be explained in terms of the consistency condition logm=(logn1)2 ! 0
[Jiang ([14], Ex.3.2]. Among all the pairs of sample sizes for simulation, (20;20) and
(20;30) have the highest values of logm=(logn1)2, 0:334 and 0:259 respectively. For
all combinations from Table A.1 whose 95% confldence intervals including 0.
In order to check whether the computations of ^fl1 are sensitive to initial values,
we ran various combinations of m, n1, Va and ?1 with initial values set to 0. For
134
the same vi and yij, we initialized ^fl1 at 3 and at logit(y::=(1?y::)). In each case the
^vi were initialized at 0. This comparison was repeated 1000 times. The estimated
standardized regression coe?cients, standardized by conditional standard error and
estimated conditional standard error, are summarized in Table A.2.
135
Table A.1: Simulated standardized estimates of logistic regression coe?cient (stan-
dardized by true and estimated conditional standardized error) combining flxed
intercept with random efiects.
(^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e:
(m, n1) mean std 95% CI for ^fl1 ?fl10 mean std
Va = 4, initials set to 0, ?1=1
(10,20) -0.017 1.021 (-0.127, 0.293) -0.015 1.010
(10,30) 0.025 1.009 (-0.075, 0.260) 0.024 1.002
(10,40) 0.021 0.980 (-0.086, 0.195) 0.021 0.977
(20,20) 0.077 1.012 (0.082, 0.665) 0.076 1.011
(20,30) 0.075 0.995 (0.059, 0.523) 0.075 0.990
(20,40) 0.054 0.995 (-0.019, 0.384) 0.053 0.991
(20,50) 0.049 1.012 (-0.034, 0.330) 0.048 1.009
Va = 3, initials set to 0, ?1 = 1
(20,20) 0.054 0.938 (-0.008, 0.509) 0.053 0.932
136
Table A.2: Simulated standardized estimate of logistic regression coe?cient for var-
ious initial values.
(^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e:
initial ^fl1 mean std 95% CI for ^fl1 ?fl10 mean std
Va = 4, (m;n1) = (10;20), ?1=1
0 -0.017 1.021 (-0.127, 0.293) 0.015 1.011
3 -0.014 0.878 (-0.150, 0.277) 0.009 1.022
logit -0.020 0.896 (-0.150, 0.277) 0.015 1.011
Va = 4, (m;n1) = (20;50), ?1 = 1
0 0.048 1.011 (-0.034, 0.330) 0.048 1.009
logit 0.022 0.058 (-0.034, 0.330) 0.048 1.009
Va = 4, (m;n1) = (20;20), ?1 = 1
0 0.077 1.012 (0.082, 0.665) 0.076 1.001
logit 0.074 0.839 (0.152, 0.696) 0.087 0.940
Va = 4, (m;n1) = (20;30), ?1 = 1
0 0.075 0.995 (0.059, 0.523) 0.075 0.990
logit 0.062 0.845 (0.114, 0.562) 0.087 0.958
137
AswecanseefromTableA.2, theestimatesarebiasedforlargerlogm=(logn1)2.
Moreover, the variance of (^fl1?fl10)=s:e: is sensitive to the initial guess of ^fl1. Also in
the combination (20;20) with Va = 4 and initial ^fl1 equal to logit, their Kolmogorov-
Smirnov test rejects the hypothesis of normality with p values 0.043 and 0.019. For
all the other combinations from Table A.2, their Kolmogorov-Smirnov test p values
are larger than 0.1.
Since in reality we do not know how to choose ?1, we tried various values
of ?1 to assess the efiect of ?1 on normality and consistency. The Monte Carlo
averages and standard deviations of (^fl1 ? fl10)=s:e: are displayed in Table A.3 for
various (m;n1), ?1, initial values and Va. The estimates are computed for ?1 = 0:1,
1 and 5 for the same data, and this comparison was repeated 1000 times. Table A.3
shows that extremely large values of the penalty parameter (?1 = 5) did afiect the
consistency. For Table A.3, the combinations (20;20) with Va = 4 have their the
Kolmogorov-Smirnov test rejects the hypothesis of normality with p values between
0.019 and 0.043, except for ?1 = 5 whose p value is bigger than 0.1. For all the other
combinations from Table A.3, their Kolmogorov-Smirnov test p values are larger
than 0.1. Generally speaking, the plots show that ^fl1 is approximately normal, with
departures from normality in the extreme tails.
138
Table A.3: Simulated standardized estimate of logistic regression coe?cient for var-
ious ?1 values.
(^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e:
?1 mean std 95% CI for ^fl1 ?fl10 mean std
Va = 4, (m;n1) = (10;20), initials set to 0
1 -0.017 0.957 (-0.229, 0.164) -0.018 0.947
0.1 -0.020 0.896 (-0.150, 0.277) -0.009 1.002
5 -0.020 0.914 (-0.164, 0.258) 0.003 1.020
Va = 3, (m;n1) = (20;20), initials set to 0
1 0.054 0.938 (-0.008, 0.509) 0.053 0.932
0.1 0.037 0.889 (-0.021, 0.520) 0.052 0.975
5 0.037 0.889 (-0.021, 0.520) 0.052 0.975
Va=4, (m;n1) = (20;50), initials set to 0
1 0.048 1.012 (-0.034, 0.330) 0.048 1.009
0.1 0.022 0.858 (-0.034, 0.330) 0.048 1.009
5 0.022 0.858 (0.034, 0.400) 0.070 1.017
Va = 4, (m;n1) = (20;20), initials set to 0
1 0.077 1.012 (0.082, 0.665) 0.076 1.001
0.1 0.074 0.839 (0.250, 0.800) 0.087 0.940
5 0.078 0.855 (0.250, 0.800) 0.108 0.959
139
Quantiles of Standard Normal
Normalized  estimate  of  beta
-2 0 2
-3
-2
-1
0
1
2
Va=4, m=10, n1=20 
Quantiles of Standard Normal
Normalized  estimate  of  beta
-2 0 2
-3
-2
-1
0
1
2
Va=4, m=20, n1=20
Quantiles of Standard Normal
Normalized  estimate  of  beta
-2 0 2
-2
-1
0
1
2
Va=3, m=20, n1=20
Quantiles of Standard Normal
Normalized  estimate  of  beta
-2 0 2
-3
-2
-1
0
1
2
3
Va=4, m=20, n1=30
Figure A.1: Q-Q plots for (^fl1?fl10) standardized by true conditional standard error
combining a with vi, for various (m;n1) in logistic random intercept case 1.
140
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
Va=4, m=10, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
Va=4, m=20, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
Va=3, m=20, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
Va=4, m=20, n1=30
Figure A.2: Q-Q plots for (^fl1?fl10) standardized by estimated conditional standard
error combining a with vi, for various (m;n1) in logistic random intercept case 1.
141
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
initials(0,rep(0,m)), (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
initials(logit,rep(0,m)), (20,50)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
initials(3,rep(0,m)), (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
initials(0,rep(0,m)), (20,50)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
initials(logit,rep(0,m)), (10,20)
Figure A.3: Q-Q plots for (^fl1?fl10) standardized by true conditional standard error
combining a with vi, for various initial values of ^fl1 in logistic random intercept case
1.
142
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
initials(0,rep(0,m)), (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
initials(0,rep(0,m)), (20,50)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
initials(3,rep(0,m)), (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
initials(logit,rep(0,m)), (20,50)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
initials(logit,rep(0,m)), (10,20)
Figure A.4: Q-Q plots for (^fl1 ? fl10) standardized by estimated standard error
combining a with vi, for various initial values of ^fl1 in logistic random intercept case
1.
143
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
lambda1=1, (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
lambda1=1, (20,50)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
lambda1=0.1, (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
lambda1=0.1, (20,50)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
lambda1=5, (10,20)
Figure A.5: Q-Q plots for (^fl1?fl10) standardized by true conditional standard error
combining a with vi, for various values of ?1 in logistic random intercept case 1.
144
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
lambda1=1, (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
lambda1=1, (20,50)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
lambda1=0.1, (20,50)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
lambda1=0.1, (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
lambda1=5, (10,20)
Figure A.6: Q-Q plots for (^fl1?fl10) standardized by estimated conditional standard
error combining a with vi, for various values of ?1 in logistic random intercept case
1.
145
A.1.2 Case 1 logistic random intercept when a and vi are estimated
separately.
We consider the following example from Jiang [14]:
logitP(yij = 1jv) = a+vi +xijfl1: (A.2)
Since ^a0 and ^fl1 have difierent convergence rates, we do not consider joint asymptotic
distribution. We are interested in ^fl1 and use the estimating equation (??) and the
Splus function nlminb to investigate the asymptotic behavior of ^fl1.
We use a balanced setup with m clusters and n1 observations per cluster. We
generate m random efiects vi from the normal distribution with mean 0 and variance
Va. Conditionally on (xij;vi) with fl10 = a0 = 1, a sample of mn1 random variables
Yij is generated from Bernoulli distribution with E(Yijjvi;xij) = exp(xijfl1 + a +
vi)=(1+exp(xijfl1 +a+vi)).
Various combinations of (m;n1), Va, ?1 were simulated and all combinations
initialized ^a, ^fl1 and ^vi at zero.
For each combination, 1000 random replications of random efiects vi and the
responses Yij are generated. Estimated standardized regression coe?cients ^fl1 stan-
dardized by true conditional standardized error and estimated conditional standard-
ized error are summarized in Table A.4.
146
Table A.4: Simulated standardized estimates of logistic regression coe?cient (stan-
dardized by true and estimated conditional standardized error) with a and vi esti-
mated separately.
(^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e:
(m;n1) mean std 95% CI for ^fl1 ?fl10 mean std
Va = 4, initials set to 0, ?1=1
(10,20) -0.007 0.999 (-0.207, 0.223) -0.009 0.976
(20,20) 0.018 0.903 (-0.344, 0.194) -0.019 0.890
(20,30) 0.011 0.970 (-0.190, 0.280) 0.010 0.963
(20,40) 0.054 1.030 (-0.110, 0.276) 0.053 0.956
Va = 3, initials set to 0, ?1 = 1
(20,20) 0.020 0.924 (-0.160, 0.370) 0.018 0.910
(20,30) 0.020 0.975 (-0.159, 0.290) 0.020 0.967
Va = 2, initials set to 0, ?1 = 1
(20,20) 0.014 0.963 (-0.190, 0.340) 0.012 0.950
For the above Table (A.4), the 95% confldence intervals for (^fl1 ?fl10) include
0. Normality holds for (^fl1 ? fl10) standardized by true conditional standard error
given v0. The mean and standard error estimates are close to 0 and 1 respectively,
except for the pairs as (20;20) and (20;30) with Va = 4. In these cases, their the
Kolmogorov-Smirnov test rejects the hypothesis of normality with p-values of 0.0179
and 0.0171 respectively. Even for Va = 3 , (m;n1) = (20;20), the p-value of 0.06 is
147
almost signiflcant.
Normality holds for (^fl1?fl10) standardized by estimated conditional standard
error given v0. The mean and standard error estimates are close to 0 and 1 respec-
tively, except for the pairs as (20;20), (20;30) with Va = 4 and (20;20) with Va = 3.
In these cases, the Kolmogorov-Smirnov test rejects the hypothesis of normality
with p-values of 0.014, 0.011, 0.04 respectively.
For all the other combinations for Table A.4, their Kolmogorov-Smirnov test
p values are larger than 0.1.
The 95% confldence intervals for (^a?a0) do not include 0 (the intervals are not
listed in the table but all were approximately (0:1;0:4)). The Kolmogorov-Smirnov
test applied to Monte Carlo distribution of ^a signiflcantly rejects the hypothesis
of normality. From the theoretical results of Chapter 3, we know the convergence
rates difier between estimated logistic regression coe?cient and intercept, with rates
approximately 1=N and 1=m, respectively. Here we have sample sizes from 200 to
1000, but number of clusters 10 or 20. Even in a small simulation with (m;n1) =
(40;60) with 200 replications, there is bias present for estimated logistic regression
intercept.
The Q-Q plots show that the standardized (^fl1?fl10) is approximately normal,
with departures from normality in the extreme tails.
148
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
Va=4, m=10, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
Va=4, m=20, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
Va=3, m=20, n1=30
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
Va=2, m=20, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
Va=4, m=20, n1=30
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
Va=3, m=20, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
3
Va=4, m=20, n1=40
Figure A.7: Q-Q plot for (^fl1?fl10) standardized by true conditional standard error
not combining a with vi, for various (m;n1) in logistic random intercept case 1.
149
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
Va=4, m=10, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
Va=4, m=20, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
Va=3, m=20, n1=30
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
Va=2, m=20, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
Va=4, m=20, n1=30
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
Va=3, m=20, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
-1
0
1
2
3
Va=4, m=20, n1=40
Figure A.8: Q-Q plot for (^fl1?fl10) standardized by estimated conditional standard
error with estimated a with vi separately, for various (m;n1) in logistic random
intercept case 1.
150
A.1.3 Case 1 Poisson random intercept combining a and vi.
We use the PGWLS method by Jiang [14] and nlminb minimization function
in Splus to get estimators which minimize negative lP( ) as (??), Here we have
?lP( ) =
X
i
X
j
(?yij?ij +exp(?ij))+ ?12 m(v)2
where ?ij = ai + fl1xij, ai = a + vi = vi (in our simulation we let a = 0) and
v = Pmi vi=m,  = (fl1;v1; ::: ;vm). Here a and fl1 are flxed parameters, xij is a
scalar valued predictor, and vi are iid N(0;Va). We do not attempt to estimate a and
vi separately. For simplicity we simulated the balanced model with m clusters and n1
observations per cluster. We generate m random efiects vi from normal distribution
with mean 0 and variance Va. Covariates xij are the same here as in logistic Case 1.
Conditionally on (xij;vi) with fl10 = 1, a sample of mn1 random variables Yij were
generated from Poisson distribution with E(Yijjvi;xij) = exp(xij +vi).
Various combinations of (m,n1), Va and ?1 values are investigated.
For each combination, 1000 replications of random efiects vi and responses Yij
were generated. Estimated regression coe?cients, for various choices of (m;n1) with
Va = 4, ?1 = 1 and initial estimates set to 0, are summarized in Table A.5. The table
displays the means and standard errors of the simulated values of (^fl1 ? fl10)=s:e:
and 95% confldence bounds for E(^fl1 ?fl10) based on Student?s t.
151
In order to check sensitivity of ?1 values, the estimates are computed for
?1 = 0:1, 1 and 5 for the same data, and this comparison was repeated 1000 times.
From Table A.5 and Table A.6, consistency and normality results hold and are not
sensitive to the values of ?1. In these cases, their Kolmogorov-Smirnov test p values
are larger than 0.1 except for the combination (20;20) with Va = 4. In this case,
the Kolmogorov-Smirnov test almost rejects the normality hypothesis with p value
0.06. The Q-Q plot shows that standardized (^fl1 ?fl10) is approximately normal.
Table A.5: Simulated standardized estimate of Poisson regression coe?cient for
various (m;n1) values.
(^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e:
(m;n1) mean std 95% CI for ^fl1 ?fl10 mean std
Va = 4, ?1 = 1, initials set to 0
(10,20) 0.018 0.970 (-0.022, 0.064) 0.018 0.969
(20,20) -0.005 1.008 (-0.050, 0.054) 0.005 1.008
(20,30) -0.0006 1.007 (-0.042, 0.039) 0.006 1.007
152
Table A.6: Simulated standardized estimate of logistic regression coe?cient for var-
ious ?1 values.
(^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e:
?1 mean std 95% CI for ^fl1 ?fl10 mean std
Va = 4, (m;n1) = (10;20), initials set to 0
1 0.018 0.970 (-0.022, 0.064) 0.018 0.969
0.1 0.015 1.100 (-0.026, 0.064) 0.013 1.024
5 0.028 1.021 (-0.018, 0.064) 0.022 0.965
Va = 4, (m;n1) = (20;30), initials set to 0
1 0.006 1.007 (-0.042, 0.039) 0.006 1.007
0.1 0.001 1.025 (-0.044, 0.033) -0.002 1.008
5 0.009 1.012 (-0.038, 0.043) 0.006 1.003
153
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
Va=4, m=10, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
Va=4, m=20, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
Va=4, m=20, n1=30
Figure A.9: Q-Q plots for (^fl1?fl10) standardized by true conditional standard error
with a and vi estimated separately in Poisson random intercept case 1.
154
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
Va=4, m=10, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
Va=4, m=20, n1=20
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
Va=4, m=20, n1=30
Figure A.10: Q-Q plots for(^fl1?fl10) standardized by estimated conditional standard
error with a and vi estimated separately in Poisson random intercept case 1.
155
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
Lamda1=1, (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
Lamda1=1, (20,30)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-4
-2
0
2
lambda1=5, (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
lambda1=5, (20,30)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-4
-2
0
2
4
lambda1=0.1, (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-4
-2
0
2
4
lambda1=0.1, (20,30)
Figure A.11: Q-Q plots for (^fl1?fl10) for standardized by true conditional standard
error with a and vi estimated separately, for various values of ?1 in Poisson random
intercept case 1.
156
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
Lamda1=1, (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
Lamda1=1, (20,30)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
0
2
lambda1=0.1, (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-2
0
2
lambda1=0.1, (20,30)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
lambda1=5, (10,20)
Quantiles of Standard Normal
Normalized estimate of beta
-2 0 2
-3
-2
-1
0
1
2
3
lambda1=5, (20,30)
Figure A.12: Q-Q plots for (^fl1?fl10) standardized by estimated conditional standard
error with a and vi estimated separately, for various values of ?1 in Poisson random
intercept case 1.
157
A.1.4 Case 1 Poisson random intercept with a and vi estimated sep-
arately
We use the PGWLS method by Jiang [14] and nlminb minimization function
in Splus to get estimators which minimize negative lP( ) as (??). Here we have
?lP( ) =
X
i
X
j
(?yij?ij +exp(?ij))+ ?12 m(v)2
where ?ij = a+vi +fl1xij and v = Pmi vi=m,  = (fl1; a; v1; ::: ;vm). Here a and
fl1 are flxed parameters, xij is a scalar valued predictor, and vi are iid N(0;Va). We
attempt to estimate a and vi separately. For simplicity we simulated the balanced
model with m clusters and n1 observations per cluster. We generate m random
efiects vi from normal distribution with mean 0 and variance Va. Covariates xij are
the same here as in logistic Case 1. Conditionally on (xij;vi) with fl10 = a0 = 1, a
sample of mn1 random variables Yij were generated from Poisson distribution with
E(Yijjvi;xij) = exp(fl1xij +vi +a).
The same choices of (m;n1) and Va values as in the previous subsection were
simulated.
Unlike the results when only vi + a were estimated, as in the previous sub-
section, when we attempted to estimate a and vi separately, none of the desired
asymptotic results for ^fl1 were observed. The estimate ^fl1 was biased and its Monte
Carlo distribution failed tests of normality.
158
Bibliography
[1] E. B. Andersen, \Asymptotic properties of conditional maximum-Likelihood
estimators", J. Roy. Statist. Soc. Ser. B 32, 283-301 (1970).
[2] R. H. Berk, \Consistency and asymptotic normality of MLE?s for exponential
models," The Annals of Mathematical Statistics, 43, 193-204 (1972).
[3] R. Bhatia, Matrix Analysis, Springer (1991).
[4] O. Barndorfi-Nielsen, \On a formula for the distribution of the maximum like-
lihood estimate," Biometrika, 70, 343-365 (1983).
[5] N. Breslow \Odds ratio estimators when the data are sparse," Biometrika, 68,
73-84 (1981).
[6] E. Cantoni, \Robust inference for generalized linear models," J. Amer. Statist.
Assoc 96, 1022-1030 (2001).
[7] J. M. Chiou and H. G. Muller, \Estimated estimating equations: semiparamet-
ric inference for clustered and longitudinal data," J. Roy. Statist. Soc. Ser.B
67, 531-553 (2005).
[8] N. E. Breslow and D. G. Clayton, \Approximate inference in generalized linear
mixed models," J. Amer. Statist. Assoc. 88, 9-25 (1993).
[9] A. C. Davison, \Approximate conditional inference in generalized linear mod-
els," J. Roy. Statist. Soc. Ser. B, 50, 445-461 (1988).
[10] I. Domowitz, \Misspecifled models with dependent observations," J. of Econo-
metrics 20, 35-38 (1982).
[11] L. Fahrmeir and H. Kaufmann, \Consistency and asymptotic normality of the
maximum likelihood estimators in generalized linear models," The Annals of
Statistics 13, 342-368 (1985).
[12] L. Fahrmeir, \Maximum likelihood estimation in misspecifled generlized linear
models," Akademie-Veriag Berlin statistics 21 4, 487-502 (1990).
[13] S. J. Haberman, \Maximum likelihood estimates in exponential response mod-
els," Annals of Statistics 5, 815-841 (1977).
159
[14] J. Jiang, \Conditional inference about generalized linear mixed models," The
Annals of Statistics 27, 1974-2007 (1999).
[15] J. Jiang, H. Jia and H. Chen, \Maximum posterior estimates of random efiects
in generalized linear mixed models," Statistica Sinica 11, 97-120 (2001).
[16] X. P. Jiang, Ph.D. thesis, \Nonparametric quasi-likelihood in longitudinal data
analysis," University of Maryland, College Park, (2004).
[17] Y. Lee and J. A. Nelder, \Hierarchical generlized linear models," J. Roy. Statist.
Soc. Ser. B 58, 619-678 (1996).
[18] H. Li, B. G. Lindsay and R. P. Waterman, \E?ciency of projected score meth-
ods in rectangular array asymptotics," J. Roy. Statist. Soc. Ser. B, 65, 191-208
(2003)
[19] P. McCullagh and J. A. Nelder, Generalized Linear Models, (2nd ed. Chapman
and Hall, New York, 1989).
[20] P. McCullagh, \Quasi-likelihood functions," Annals of Statistics 11, 59-67
(1983).
[21] C. E. McCulloch, \Maximum likelihood algorithems for generalized linear
mixed models," J. Amer. Statist. Assoc. 92, 162-170 (1997).
[22] C. E. McCulloch and S. R. Searle, Generalized, Linear, and Mixed Models
(Wiley-Interscience Publication, 2001).
[23] J. A. Nelder and R. W. M. Wedderburn, \Generalized linear models," J. Roy.
Statist. Soc. Ser.A 135, 370-384 (1972).
[24] J. Neyman and E. L. Scott, \Consistent estimates based on partially consistent
observations," Ecomometrika. 16, 1-32 (1948).
[25] J. Robins, N. Breslow and S. Greenland, \Estimators of the Mantel-Haenszel
variance consistent in both sparse data and large-strata limiting models," Bio-
metrics 42, 311-323 (1986).
[26] P. Royston \A remark on algorithm AS 181: The W test for normality" Applied
Statistics 44 547-551 (1995).
[27] S.S.Shapiro and M.B.Wilk \An analysis of variance test for normality (complete
samples)" Biometrika 52 591-611 (1965)
160
[28] N. Sartori and T. A. Severini, \Conditional likelihood inference in generalized
linear mixed models," Statistica Sinica 14, 349-360 (2004).
[29] J. R. Schott, Matrix Analysis for Statistics
[30] S. K. Sinha, \Robust analysis of generalized linear mixed models," J. Amer.
Statist. Assoc. 99, 451-460 (2004).
[31] G. W. Stewart, Introduction to Matrix Computations (Academic Press, 1973).
[32] A. W. Van der Vaart, Asymptotic Statistics (Cambridge, 1998).
[33] R. W. M. Wedderburn, \On the existence and uniqueness of the maximum
likelihoodestimatesforgeneralizedlinearmodels," Biometrika63, 27-32(1976).
[34] R. W. M. Wedderburn, \Quasi-likelihood functions, generalized linear models,
and the Gauss-Newton method," Biometrika, 61, 439-447 (1974).
[35] X. Xie and Y. Yang, \Asymptotics for generalized estimating equations with
large cluster sizes," The Annals of Statistics 31, 310-347 (2003).
[36] S. Yusuf, R. Peto, J. Lewis, R. Collins and P. Sleight \Beta blockade during
and after myocardial infartion: an overiew of the randomized trials" Progress
in Cardiovascabr Diseases 27 335-371 (1985)
[37] S. L. Zeger and K. -Y. Liang, \Longitudinal data analysis for discrete and
continuous outcomes," Biometrics, 42, 121-130 (1986).
[38] M. Znidaric, \Asymptotic expansion for inverse moments of Binomial and Pois-
son Distributions?.
161