ABSTRACT Title of dissertation: ASYMPTOTIC NORMALITY IN GENERALIZED LINEAR MIXED MODELS Min Min Doctor of Philosophy, 2007 Dissertation directed by: Professor Paul J. Smith Statistics Program Department of Mathematics Generalized Linear Mixed Models (GLMMs) extend the framework of Gener- alized Linear Models (GLMs) by including random efiects into the linear predictor. This will achieve two main goals of incorporating correlation and allowing broader inference. This thesis investigates estimation of flxed efiects as the number of ran- dom efiects grows large. This model describes cluster analysis with many clusters and also meta-analysis. After reviewing currently available methods, especially the penalized likeli- hood and conditional likelihood estimators of Jiang [14], we focus on the random intercept problem. We propose a new estimator ^^flw of regression coe?cient and prove that when m, the number of random efiects, grows to inflnity at a slower rate than the smallest cluster sample size, ^^flw is consistent and given the realiza- tion of random efiects, is asymptotically normal. We also show how to estimate the standard errors of our estimators. We also study the asymptotic distribution of Jiang?s [14] penalized likelihood estimators. In the absence of regression coe?cients, the normalized estimated intercept pm(^a?a0) converges to a normal distribution. Di?culties arise in establishing the conditional asymptotic normality of Jiang?s [14] penalized likelihood estimator ^fl of regression coe?cients for flxed efiects in a general GLMM. In Chapter 4, we make an extended analysis of the 2?2?m table to show how to verify the general conditions in Chapter 3. We compare our estimator to the Mantel-Haenszel estimator. Simulation studies and real data analysis results validate our theoretical results. In Chapter 5, asymptotic normality of joint flxed efiect estimate and scale parameter estimate is proved for the case as m=N 9 0. An example was used to verify the general conditions in this case. Simulation studies were performed to validate the theoretical results as well as to investigate conjectures that are not covered in the theoretical proofs. The asymptotic theory for ^flw describes the flnite sample behavior of ^flw very accurately. We flnd that in the case as m=N ! 0, in the random logistic and Poisson intercept models, consistency and conditional asymptotic normality results appear to hold for the penalized regression coe?cient estimates ^fl. ASYMPTOTIC NORMALITY IN GENERALIZED LINEAR MIXED MODELS by Min Min Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulflllment of the requirements for the degree of Doctor of Philosophy 2007 Advisory Committee: Dr. Paul Smith, Chair/Advisor Dr. C. Mitchell Dayton Dr. Abram Kagan Dr. Partha Lahiri Dr. Eric Slud c Copyright by Min Min 2007 DEDICATION In the memory of my grandparents, Houzhang and Yunxuan ii ACKNOWLEDGEMENTS I am deeply grateful to my advisor, Dr. Paul Smith for his teaching, guiding and encouraging me through all these years. He has always made himself available for help and provided valuable insights. Without his academic advising and moral support, it is impossible to complete my dissertation. I am honored to work with such an extraordinary scholar. I would like to thank the committee members: Drs. Abram Kagan and Eric Slud for reading my manuscript and making valuable comments and corrections which enhanced the quality of this work. Drs. C. Mitchell Dayton and Partha Lahiri for agreeing to serve on my dissertation committee. I would also like to thank Chip Denman for his support when I was working at stat lab. I thank all of my friends at UMCP, especially Guanhua Lu, Huaizhong Ren, Hsiao-Hui Tsou, Shihua Wen, Chin-Fang Weng and Zhihui Yang for their support and loyal friendship. I owe my deepest appreciation to my family, especially my parents for their unconditional love and always believing in me. Finally, I want to thank my husband for his love, understanding and sharing the happiness of my achievement. I also want to thank my best friend Yuehan Wang for always being there for me. A special thank you goes to my son Chance for his patience, for bringing me great joy and enriching my life. iii Table of Contents List of Tables v List of Figures viii 1 Introduction 1 1.1 Notations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 Literature Review 7 2.1 Generalized Linear Models . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1.1 Deflnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1.2 Asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.2 Quasilikelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3 Generalized Estimating Equations . . . . . . . . . . . . . . . . . . . . 14 2.3.1 Deflnition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.3.2 Asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.4 Generalized Linear Mixed Models . . . . . . . . . . . . . . . . . . . . 22 2.4.1 Deflnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.4.2 Estimation approaches . . . . . . . . . . . . . . . . . . . . . . 23 2.4.3 Random intercept model (canonical link) . . . . . . . . . . . . 26 2.5 Generalized GLMM (canonical link function case) . . . . . . . . . . . 27 2.5.1 Deflnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.5.2 Penalized Generalized Weighted Least Squares in Case 1 . . . 28 2.5.3 Maximum Conditional Likelihood Estimates in Case 2 . . . . 33 3 Case 1 random intercept model results. 36 3.1 Case 1 random intercept (combine a and vi) . . . . . . . . . . . . . . 37 3.2 Penalized likelihood estimator ^a in the case 1 random intercept model when fl=0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.3 Discussion of Jiang [14] penalized likelihood estimator ^fl in case 1 random intercept model (fl 6= 0) . . . . . . . . . . . . . . . . . . . . . 44 3.4 Proofs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.4.1 Proof of Lemma 3.1.1 . . . . . . . . . . . . . . . . . . . . . . . 48 3.4.2 Proof of Theorem 3.1.2 . . . . . . . . . . . . . . . . . . . . . . 51 3.4.3 Proof of Theorem 3.1.3 . . . . . . . . . . . . . . . . . . . . . . 55 3.4.4 Proof of Corollary 3.1.4 . . . . . . . . . . . . . . . . . . . . . 59 3.4.5 Proof of Theorem 3.2.1 . . . . . . . . . . . . . . . . . . . . . . 60 4 Logistic 2?2?m table 64 4.1 Logistic 2?2?m table . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.2 Simulation results for 2?2?m table. . . . . . . . . . . . . . . . . . 73 4.2.1 Unconditional convergence in distribution . . . . . . . . . . . 74 4.2.2 Conditional Convergence in Distribution . . . . . . . . . . . . 84 4.3 Analysis of real data for a 2?2?22 table . . . . . . . . . . . . . . . 89 iv 5 Case 2 Results. 91 5.1 Case 2 consistency results of Jiang [14]. . . . . . . . . . . . . . . . . . 91 5.2 The Simple case (fi = 0). . . . . . . . . . . . . . . . . . . . . . . . . . 95 5.3 The logistic model logitP(yijk = 1jbij) = ?+bij: . . . . . . . . . . . . 99 6 Simulation 106 6.1 Case 1 simulations for ^^flw . . . . . . . . . . . . . . . . . . . . . . . . 107 6.1.1 Case 1 combining a with vi. . . . . . . . . . . . . . . . . . . . 107 6.2 Case 1 logistic random intercept simulation for flxed realizations of random efiects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 6.3 Case 1 random intercept simulations for penalized likelihood estimators121 6.3.1 Case 1 logistic random intercept combining a with vi. . . . . . 122 6.4 Case 2 logistic simple example . . . . . . . . . . . . . . . . . . . . . . 126 7 Conclusions and Future Work. 130 7.1 Theoretical Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 130 7.2 Conclusions from the simulation studies and real data analysis . . . . 131 7.3 Future work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 A Simulation Results 133 A.1 Case 1 random intercept model for penalized likelihood estimator . . 133 A.1.1 Case 1 logistic random intercept combining a with vi. . . . . . 133 A.1.2 Case 1 logistic random intercept when a and vi are estimated separately. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 A.1.3 Case 1 Poisson random intercept combining a and vi. . . . . . 151 A.1.4 Case 1 Poisson random intercept with a and vi estimated sep- arately . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 Bibliography 159 v List of Tables 1.1 Notations used throughout the Thesis . . . . . . . . . . . . . . . . . . 5 1.2 Table 1.1 (Continued): Notation used throughout the Thesis . . . . . 6 4.1 2?2?m table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.2 Simulated estimates of logistic odds ratio in 2?2?m balanced table. 76 4.3 Simulated standardized estimates of log odds ratio (standardized by true and estimated conditional standardized error) in 2?2?m bal- anced table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.4 Simulated estimates of log odds ratio (standardized by true and esti- mated conditional standardized error) in 2?2?m unbalanced table. 81 4.5 Simulated standardized estimates of log odds ratio (standardized by true and estimated conditional standardized error) in 2?2?m un- balanced table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.6 Simulated estimates of logistic odds ratio for flxed realizations of uni- formly distributed random efiects (m = 10;n1 = 200) and fl0 = 0:5. . 85 4.7 Simulated standardized estimate of logistic odds ratio with flxed real- izations of uniformly distributed random efiects (m = 10; n1 = 200) and fl0 = 0:5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.8 Simulated estimates of logistic odds ratio with flxed realizations of uniformly distributed random efiects (m = 10;n = 200) and fl0 = 1. . 87 4.9 Simulated standardized estimate of logistic odds ratio with flxed real- izations of uniformly distributed random efiects (m = 10; n1 = 200) and fl0 = 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.10 Summary of simulation of our estimator and Mantel-Haenszel esti- mator of log odds ratio. . . . . . . . . . . . . . . . . . . . . . . . . . 90 6.1 Simulated estimates of logistic and Poisson regression coe?cient com- bining flxed intercept with uniformly distributed random efiects. . . . 110 6.2 Simulated standardized estimate of logistic and Poisson regression coe?cient with uniformly distributed random efiects. . . . . . . . . . 111 vi 6.3 Simulated estimates of logistic and Poisson regression coe?cient com- bining flxed intercept with normally distributed random efiects. . . . 114 6.4 Simulated standardized estimate of logistic and Poisson regression coe?cient with normally distributed random efiects. . . . . . . . . . . 115 6.5 Simulated estimates of logistic regression coe?cient combining flxed efiect and flxed realizations of uniformly distributed random efiects (m = 20;n1 = 200). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 6.6 Simulated standardized estimate of logistic regression coe?cient with flxedrealizationsofuniformlydistributedrandomefiects(m = 20; n1 = 200). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 6.7 Simulated estimates of logistic regression coe?cient combining flxed efiect and flxed realizations of uniformly distributed random efiects (m = 40;n1 = 240). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 6.8 Simulated standardized estimate of logistic regression coe?cient with flxedrealizationsofuniformlydistributedrandomefiects(m = 40; n1 = 240). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 6.9 Simulatedstandardizedestimatesoflogisticregressioncoe?cient(stan- dardized by true and estimated conditional standardized error) com- bining flxed intercept with random efiects. . . . . . . . . . . . . . . . 124 6.10 Simulated standardized estimate of logistic regression coe?cient for various initial values. . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 6.11 Simulated standardized estimate of logistic regression coe?cient for various ?1 values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 6.12 Simulated standardized estimate of ? and ? in case 2 simple example for various (m;n1) values . . . . . . . . . . . . . . . . . . . . . . . . . 128 A.1 Simulatedstandardizedestimatesoflogisticregressioncoe?cient(stan- dardized by true and estimated conditional standardized error) com- bining flxed intercept with random efiects. . . . . . . . . . . . . . . . 136 A.2 Simulated standardized estimate of logistic regression coe?cient for various initial values. . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 A.3 Simulated standardized estimate of logistic regression coe?cient for various ?1 values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 vii A.4 Simulatedstandardizedestimatesoflogisticregressioncoe?cient(stan- dardized by true and estimated conditional standardized error) with a and vi estimated separately. . . . . . . . . . . . . . . . . . . . . . . 147 A.5 Simulated standardized estimate of Poisson regression coe?cient for various (m;n1) values. . . . . . . . . . . . . . . . . . . . . . . . . . . 152 A.6 Simulated standardized estimate of logistic regression coe?cient for various ?1 values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 viii List of Figures 4.1 Q-Q plots for (^flw ? fl0) and (^^flw ? fl0) standardized by true and estimated conditional standard error, for various values of (m;n) in logistic 2?2?m balanced setup. . . . . . . . . . . . . . . . . . . . . 78 4.2 Q-Q plots for (^flw ? fl0) and (^^flw ? fl0) standardized by true and estimated conditional standard error, for various values of (m;n) in logistic 2?2?m unbalanced setup where n1=600, 400 for (1/3,2/3), (1/4, 3/4) setups respectively . . . . . . . . . . . . . . . . . . . . . . . 83 6.1 Q-Q plots for (^flw ? fl0) and (^^flw ? fl0) standardized by true and estimated standard error, for various values of (m;n) in case 1 random intercept model with uniformly distributed random efiects. . . . . . . 112 6.2 Q-Q plots for (^flw ? fl0) and (^^flw ? fl0) standardized by true and estimated standard error, for various values of (m;n) in case 1 random intercept model with normally distributed random efiects. . . . . . . 116 6.3 Q-Q plots for standardized (^? ? ?0) and (^? ? ?0) by true standard error, for various (m;n1) in case 2 logistic simple example. . . . . . . 129 A.1 Q-Q plots for (^fl1 ? fl10) standardized by true conditional standard error combining a with vi, for various (m;n1) in logistic random in- tercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 A.2 Q-Q plots for (^fl1 ?fl10) standardized by estimated conditional stan- dard error combining a with vi, for various (m;n1) in logistic random intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 A.3 Q-Q plots for (^fl1 ? fl10) standardized by true conditional standard error combining a with vi, for various initial values of ^fl1 in logistic random intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . 142 A.4 Q-Q plots for (^fl1 ? fl10) standardized by estimated standard error combining a with vi, for various initial values of ^fl1 in logistic random intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 A.5 Q-Q plots for (^fl1 ? fl10) standardized by true conditional standard error combining a with vi, for various values of ?1 in logistic random intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 ix A.6 Q-Q plots for (^fl1 ?fl10) standardized by estimated conditional stan- dard error combining a with vi, for various values of ?1 in logistic random intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . 145 A.7 Q-Q plot for (^fl1 ? fl10) standardized by true conditional standard error not combining a with vi, for various (m;n1) in logistic random intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 A.8 Q-Q plot for (^fl1 ?fl10) standardized by estimated conditional stan- dard error with estimated a with vi separately, for various (m;n1) in logistic random intercept case 1. . . . . . . . . . . . . . . . . . . . . . 150 A.9 Q-Q plots for (^fl1 ? fl10) standardized by true conditional standard error with a and vi estimated separately in Poisson random intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 A.10 Q-Q plots for(^fl1 ?fl10) standardized by estimated conditional stan- dard error with a and vi estimated separately in Poisson random intercept case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 A.11 Q-Q plots for (^fl1?fl10) for standardized by true conditional standard error with a and vi estimated separately, for various values of ?1 in Poisson random intercept case 1. . . . . . . . . . . . . . . . . . . . . 156 A.12 Q-Q plots for (^fl1 ?fl10) standardized by estimated conditional stan- dard error with a and vi estimated separately, for various values of ?1 in Poisson random intercept case 1. . . . . . . . . . . . . . . . . . 157 x Chapter 1 Introduction This thesis is focused on asymptotic normality of flxed efiect estimates in Generalized Linear Mixed Models (GLMMs) for the canonical link function case when the number of random efiects is large. Due to the wide range of applications of GLMMs, these models have received substantial attention during the last decade. Although asymptotic behavior of flxed efiect estimates has been wellstudied in linear regression models, Generalized Linear Models (GLMs) and Generalized Estimating Equations (GEEs), for GLMMs there is still a lot of work to do. These models and the main idea of this thesis are addressed brie y below. Generalized Linear Models (GLMs), originally introduced by Nelder and Wed- derburn [23], provide a unifled family of models that is widely used for regression analysis. These models are intended to describe non-normal responses. In partic- ular, they avoid having to select a single transformation of the data to achieve the possibly con icting objectives of normality, linearity and homogeneity of variance. Important examples include binary and count data. They can be applied to a wide array of discrete, continuous and censored outcomes. They are most commonly used when the outcomes are independent. These models are described in detail in Section 2.1. Another important extension of GLM is the Quasilikelihood model approach which can be deflned by specifying only the relation between the mean and the 1 linear predictors and between the mean and variance of observations. This model is illustrated in Section 2.2. However, in many applications, independence of outcomes is not a reasonable assumption. This is particularly obvious in longitudinal studies, where multiple measurements made on the same individual are likely to be correlated. One tech- nique for the analysis of such general correlated data is the generalized estimating equations (GEE) approach introduced by Liang and Zeger [37]. This approach has the desired quality of independence between subjects while adjusting for the cor- relation structure within subjects. These models are described in detail in Section 2.3. We may wish to build a model that accommodates correlated data, or to consider the levels of a factor as selected from a population of levels in order to make inference about that population. Generalized Linear Mixed Models (GLMMs) extend the framework of Linear Mixed Models (LMMs) and of Generalized Linear Models (GLMs) by allowing for non-Gaussian data, nonlinear link functions and inclusion of random efiects and of correlated error. These models are described in detail in Section 2.4. Asymptotic normality and consistency results for flxed efiect estimates have been proved for GLMs ( e.g., Habermann [13] and Fahrmeir and Kaufmann [11]) and also for GEEs ( e.g., Zeger and Liang [37] and Xie and Yang [35]). For GLMMs, most work about asymptotic properties of flxed efiect estimates is based on eliminating random efiects either by integrating them out (e.g., Sinha [30]) or by conditioning on minimal su?cient statistics of random efiects (e.g, Sartori and Severini [28] ). Simply 2 eliminating random efiects may discard information. In addition, integrating out all the random efiects requires knowledge about the distribution of all the random efiects. Jiang [14] extended GLMMs to generalized GLMMs by making only an expec- tation assumption for random efiects instead of a normal distribution assumption. He divided generalized GLMMs into two cases: m=N ! 0 (case 1) and m=N 9 0 (case 2) where m is the number of levels of random efiects and N is the sample size. Both m and N go to inflnity. For case 1 and 2, he proposed two methods: Penalized Generalized Weighted Least Squares (PGWLS) and Maximum Condi- tional Likelihood (MCL). Under reasonable conditions, consistency of both flxed and random efiect estimates was proved rigorously. Jiang [14] used three examples to illustrate those reasonable conditions. The concepts and results mentioned above are discussed in Section 2.5. In order to flnd approximate tests and confldence regions, an asymptotic nor- mality result is needed. In Chapter 3, focusing on the random intercept problem, we propose a new estimator ^^flw of regression coe?cients and prove that when m, the number of random efiects, grows to inflnity at a slower rate than the smallest cluster sample size, ^^flw is consistent and given the realization of random efiects, is asymptotically normal. We also show how to estimate the normalizer and weight matrices of our estimators. We also study the asymptotic distribution of Jiang?s [14] penalized likelihood estimators. In the absence of regression coe?cients, the normal- ized estimated intercept pm(^a?a0) converges to a normal distribution. Di?culties arise in establishing the conditional asymptotic normality of the penalized likelihood 3 estimator ^fl of regression coe?cients for flxed efiects in a general GLMM. In Chapter 4, we make an extended analysis of the 2?2?m table to show how to verify the general conditions in Chapter 3. We compare our estimator to the Mantel-Haenszel estimator. Simulation studies and real data analysis results validate our theoretical results. In Chapter 5, asymptotic normality of joint flxed efiect estimate and scale parameter estimate is proved for the case as m=N 9 0. An example was used to verify the general conditions in this case. In Chapter 6, in order to check the asymptotic results of the theorems in Chapter 3 and 5, logistic and Poisson random intercept models are simulated in case 1. For case 2, a simple model is simulated. The Splus built-in function nlminb is used to compute the estimates. The contents of Chapters 3-6 are new. In Chapter 7, we summarize our theoretical and simulation results, and discuss the potential direction of future work. 1.1 Notations. 4 Table 1.1: Notations used throughout the Thesis Symbol Meaning Bij b00ij(?ij0)+ 12b(3)ij (??ij)(^?ij ??ij0) rij 12b(3)ij (??ij)(^?ij ??ij0) ~B diag(Bij)1?i?m ; 1?j?n i E?i Pj Bij n? min1?i?m ni C??1 ??1=(?1Pi(E?i )?1 +m) ^?ij ^ai +xtij^fli ^^?ij ^ai +xtij^^flw 5 Table 1.2: Table 1.1 (Continued): Notation used throughout the Thesis Symbol Meaning DN( ) ?@2lP( )=@2 gN( ) @lP( )=@ 1ni the ni ?1 vector whose elements are all 10s kuk = (Pni=1 u2i)1=2 vector norm for vector u 2 Rn kAk = supkuk=1jutAuj = maxj(j?jj) matrix norm for a symmetric matrix A ?max(A) = supkuk=1utAu largest eigenvalue of A ?min(A) = infku=1kutAu smallest eigenvalue of A A = 0 BB @ A11 A12 A21 A22 1 CC A Partitioned matrix A A11:2 = A11 ?A12A?122 A21 Schur component of A11 For notational simplicity, we let b00ij = b00(?ij0), b0ij = b0(?ij0), DN = DN( 0) and gN = gN( 0), ect. where ?ij0 and 0 are the true values of ?ij and respectively. A1=2 orAT=2 represents a left or right square root. A? represents generalized inverse. 6 Chapter 2 Literature Review 2.1 Generalized Linear Models Nelder and Wedderburn [23] introduced Generalized Linear Models (GLMs) as a unifying class of models which are widely used in regression analysis. Section 2.1.1 presents their original deflnition and the extended deflnition from Fahrmeir and Kaufmann [11]. Asymptotic properties of estimates are described in Section 2.1.2. 2.1.1 Deflnitions GLMs were originally described as follows: A vector of observations y having n components is assumed to be a realization of a random variable Y whose compo- nents are independently distributed with means ?i, i = 1;:::;n. Assume that the components Yi are independent and have a distribution in the exponential family, taking the form f(yi; i;`) = exp ?y i i ?b( i) a(`) +c(yi;`) (2.1) for some speciflc functions a(?), b(?) and c(?). Then E(Yi) = ?i = b0( i); Var(Yi) = b00( i)a(`): 7 Deflne g(?i) = xTi fl = ?i, where fl = (fl1;:::;flp)T is a vector of unknown para- meters, xiT = (xi1;:::;xip) is a vector of covariates and g is a known link function (since it links together the mean of yi and the linear form of predictors). The large sample theory of GLM?s is derived by Fahrmeir and Kaufmann [11]. They consider GLM?s with the following structure and notations. (i) The fyig are independent q-dimensional random variables with densities f(yij i) = c(yi)exp( Ti yi ?b( i)) (2.2) of the natural exponential type, i 2 ?0 (assume ? 2 <2 to be the natural parameter space, that is, the set of all satisfying 0 < R c(y)exp( Ty)dy < 1). Then ? is convex, and in the interior ?0 of ?, all derivatives of b( ) and all moments of y exist (assume ?0 6= ;). In particular we have E(y) = @b( )=@ ? ?( ) 2<2 and Cov(y) = @2b( )=@ @ T ? ?( ), a q ?q matrix. (ii) The deterministic matrix xi in uences yi in the form of a linear combination ?i = xTi fl, where fl is a p-dimensional parameter. (iii) The linear combination ?i is related to the mean ?( i) of yi by the injective link function g : M ! 0, of neighborhoods of fl0 (the true value of fl) as Nn(?) = ffl : kFT=2n (fl?fl0)k? ?g where n = 1;2;??? and A1=2 is a left square root of the positive deflnite matrix A and AT=2 denotes (A1=2)T so that A1=2AT=2 = A. Their conditions are: (D) Divergence: ?min(Fn) !1. (C) Boundedness from below: for all ? > 0, Fn(fl) ? cFn is positive semideflnite, fl 2 Nn(?), n ? n1 with some constants n1 = n1(?), c > 0 independent of ?. (N) Convergence and continuity: for all ? > 0, maxfl2Nn(?)kVn(fl)?Ik! 0, where Vn(fl) = F?1=2n Fn(fl)F?T=2n is the normed information matrix. (S?) Boundedness of the eigenvalue ratio: there is a neighborhood N ? B of fl0 such that ?min(F(fl)) ? c(?max(Fn))1=2+?; fl 2 N; n ? n1: where B, the set of admissible values of fl, is open in

0, ? > 0, n1 are some constants. 11 Fahrmeir and Kaufmann?s [11] main theorems for the canonical link function case are the following: Theorem 2.1.1. Under (D) and (C), there is a sequence f^flng of random variables with (i) P[sn(^fln) = 0] ! 1 (asymptotic existence), (ii) ^fln !p fl0 (weak consistency). Theorem 2.1.2. Under (D) and (S?) with a ? > 0, there is a sequence f^flng of random variables and a random number n2 with (i) P(sn(^fln) = 0 for all n ? n2) = 1, (ii) ^fln !a:s: fl0 (strong consistency). Lemma 2.1.3. Under (D) and (N), the normed score function is asymptotically normal: F?1=2n sn !d N(0;I): Theorem 2.1.4. Under (D) and (N), the normed MLE is asymptotically normal: FT=2n (^fln ?fl0) !d N(0;I): Remark: Because B us convex and Fn(fl) is positive deflnite, there is at most one zero of the score function. Lemma 2.1.3 and Theorem 2.1.4 hold for any version of the matrix square root. Remark: InpracticethenormalizingmatrixFT=2n mustbereplacedbyFT=2n (^fln). Fahrmeir and Kaufmann [11] state that FT=2n (^fln)(^fln ?fl0) !d N(0;I) if the fol- lowing condition holds 12 (Q) For all ? > 0, supfl2Nn(?)kF1=2n F1=2n (fl)?Ik! 0 They state that (Q) =) (N). If F1=2n is the Choleski square root, then (N) =) (Q). Likewise, (N) =) (Q) if ?max(Fn)=?min(Fn) ? c < 1, n ? n0 for some constants c and n0. They also verify these general conditions in several examples, including Poisson model, response with a bounded range, regressor with compact range and stochastic regressor. Noncanonical link functions enlarge the class of GLMs but they cause ad- ditional di?culties in establishing consistency and asymptotic normality, mainly because of the existence of the MLE can not be guaranteed except in special cases (for a number of important examples see Wedderburn [33]). A local maximum of the likelihood in a neighborhood of the true fl0 does not necessarily deflne a global maximum. Here consistency and asymptotic normality results only apply to a se- quence f^flng of solutions of maximum likelihood equations sn(fl) = 0. The same line of arguments in the canonical link function case is followed in the proof, but only for conditions (C), (D), (N), (S?) with Fn(fl) replaced by Hn(fl). 2.2 Quasilikelihood Another important extension of GLMs is the Quasilikelihood model approach which was introduced by Wedderburn [34]. To deflne a likelihood one has to specify the form of distribution of the observations. It may be di?cult to decide what distribution the observations follow, but to deflne a quasilikelihood function one 13 needs only to specify a model for the mean and a relation between the mean and variance of the observations. This is is what makes quasi-likelihood useful. By using the chain rule we can rewrite Equation (2.4) in the following form sn(fl) = @ln(fl)=@fl = nX i=1 xi@?i@? flfl fl?=xTi fl ??1i (fl)(yi ??i(fl)): (2.7) We can see that the score function in (2.7) depends on the parameters only through the mean ?i(fl) and ?i(fl) = ?i(?i(fl)), the variance function. Fahrmeir [12] extended the analysis of correctly specifled GLM?s to misspec- ifled GLM?s, where misspeciflcation occurs if the true density of response variable is not of the assumed exponential type, and/or the true expectation yi can not be represented by ?(u(xtifl)). He followed the same line argument as Fahrmeir and Kaufmann [11] and proved consistency and asymptotic normality of Maximum Quasilikelihood Esti- mators (MQLE) under very similar conditions. Fahrmeir [12] also verifled these conditions in several examples like regressors with a compact range, response y with bounded range and univariate models, etc. 2.3 Generalized Estimating Equations The class of GLM?s (Nelder and Wedderburn [23]) plays a central role in re- gression problems with discrete or nonnegative responses. This class of regression models was extended by Liang and Zeger [37] to analyze longitudinal or batch cor- related data. The Liang and Zeger approach is known as Generalized Estimating 14 Equations (GEE). The deflnition of GEE is presented in Section 2.3.1. Several approaches to the asymptotic properties of GEE are summarized in Section 2.3.2. 2.3.1 Deflnition The GEE model is the following: Suppose (yij;xij) are observations for the jth measurement on the ith subject, j = 1;2;:::;ni and i = 1;2;:::;m, where yij is a scalar response, xij is a p?1 covariate vector, and ni is the cluster size. Assume that the observations on difierent subjects are independent and the observations on the same subjects are correlated. For i = 1;:::;m, let yi = (yi1;:::;yini)T and Xi = (xi1;:::;xini)T. Liang and Zeger [37] used a generalized linear model to model the marginal density of yij (with respect to a -flnite measure ?): f(yijjxij;fl;`) = exp[fyij ij ?b( ij)+c(yij)g=`]: (2.8) As in Section 2.1, ij = u(?ij), u is a known injective function mapping < into ? and ?ij = xTijfl. The vector fl contains the regression parameters of interest, and ` is a nuisance scale parameter. Under such a model speciflcation, the flrst two moments of yij are given by ?ij(fl) = E(yijjxij;fl;`) = b0( ij); 2(fl) = Cov(yijjxij;fl;`) = b00( ij)`: (2.9) Let g(t) = (b0?u)?1(t); then g(?ij(fl)) = xtijfl. The function g(t) is the link function and its inverse function h(s) = (b0 ? u)(s) is called the inverse link function. Of importance are the canonical link functions, where u(s) = s, so g(t) = (b0)?1(t) and h(s) = b0(s). 15 Let ?i(fl) = E(yi) = (?i1(fl);:::;?ini(fl))T and ?i(fl) = Cov(yi). We write Ai(fl) = diag( 2i1(fl);:::; 2ini(fl)) and ?i(fl) = diag(u0(xTi1fl);:::;u0(xTinifl)), where, for any vector v, diag(v) represents a diagonal matrix whose diagonal ele- mentsaretheelementsofv. LetDi(fl) = Ai(fl)?i(fl)Xi andVi(fl;fi) = A12i (fl)Ri(fi)A12i (fl). Here Ri(fi) is the \working" correlation matrix which one can choose freely and which may possibly contain a nuisance parameter (or parameter vector) fi. If Ri(fi) is equal to the true (often unspecifled) correlation matrix Ri, then Vi(fl0;fi) = ?i(fl0). Liang and Zeger [37] proposed solving the following equations: gnm(fl) = mX i=1 gni;i = mX i=1 Di(fl)TV?1i (fl;fi)(yi ??i(fl)) = 0; (2.10) which they called \generalized estimating equations." Let ?min(T) (?max(T)) denote the smallest (largest) eigenvalue of the matrix T, and deflne Mnm(fl) = Cov(gnm(fl)) = mX i=1 DTi (fl)V?1i (fl;fi)?i(fl)V?1i (fl;fi)Di(fl); (2.11) Dnm(fl) = ?@gnm(fl)@flT ; (2.12) Hnm(fl) = mX i=1 DTi (fl)V?1i (fl;fi)Di(fl): (2.13) Let gnm = gnm(fl0); Hnm = Hnm(fl0); and Mnm = Mnm(fl0); Fnm = HnmM?1nmHnm: 16 2.3.2 Asymptotics The GEE method allows the nuisance parameter to be determined from the sample, which extends the exibility and applicability of the method. Furthermore, the covariance matrices ?i(fl) can be estimated from the data, so that large-sample testing and interval estimation is possible. In the past, most research in GEE has been directed to methodological development and modeling issues. Most of the work relies on the asymptotic results presented by Liang and Zeger [37], in which exact conditions are not specifled. Xie and Yang [35] developed a set of (information matrix based) general conditions, which leads to the proof of weak and strong consistency as well as asymptotic normality. In both papers, the dimension of fl is flxed. Liang and Zeger [37] claim consistency and asymptotic normality of GEE estimator under regularity conditions when the number of independent subjects goes to inflnity and the number of observations on each subject stays bounded. Exact conditions are not specifled. E?ciency is improved if the \working" correlation is correct or close to correct. Xie and Yang [35] present asymptotic properties of the GEE estimator ^flnm in each of three distinct large sample settings: (i) m !1 and n = n(m) = max1?i?m ni is uniformly bounded for all m, (ii) m is bounded but n !1, (iii) n !1 as m !1, 17 where m is the number of independent clusters and ni is the cluster size. Liang and Zeger [37] only consider large sample setting (i). Most of the con- ditions Xie and Yang [35] derived for consistency and asymptotic normality of GEE estimator parallel the elegant conditions presented by Fahrmeir and Kaufmann [11]. They also ignore the nuisance parameter ` in their equations (2.10) and (2.11), following (2.4) and (2.5) of Fahrmeir and Kaufmann [11]. Also for simplicity, they do not study the efiect of estimating the nuisance parameter fi that appears in the working correlation matrix Ri(fi). In addition to regularity conditions on the parameter space, link function and covariates, Xie and Yang [?] propose the following conditions for asymptotic behaviors of GEE estimation: (Iw) ?min(Fnm) !1: (Lw) There exists a constant c0 > 0, for any r > 0 such that P ?DTnm(fl)M?1nmDnm(fl) ? c0Fnm and Dnm(fl) is nonsingular ;for all fl 2 Bnm(r)) ! 1 where Bnm(r) = ffl : kM?12nmHnm(fl?fl0)k? rg: (I?w) (?nm)?1?min(Hnm) !1; where ?nm = max1?i?mf?max(R?1i (fi)Ri)g: (L?w) There exists a constant c0, for any ? > 0 and r > 0, such that P (Dnm(fl) ? c0Hnm and Dnm(fl) is nonsingular; for fl 2 B?nm(r)) ! 1 18 where B?nm(r) = ffl : kH12nm(fl?fl0)k? (?nm)12rg: (CC) For any given r > 0 and ? > 0 P ? sup fl2B?nm(r) kH?12nmDnm(fl)H?12nm ?Ik < ? ! ! 1: The matrix norm is the Euclidean matrix norm and their main theorems are the following: Theorem 2.3.1. Under conditions (Iw) and (Lw), there exists a sequence of random variables ^flnm, such that P(gnm(^flnm) = 0) ! 1 and ^flnm ! fl0 in probability: Theorem 2.3.2. The results of Theorem 2.3.1 hold if (Iw) and (Lw) are replaced by (I?w) and (L?w), respectively. Theorem 2.3.3. Suppose that conditions (Iw), (Lw) and (CC) hold, or that condi- tions (I?w) and (CC) hold. Then there exists a sequence of solutions ^flnm to the GEE equation in B?nm(r) such that M?12nmHnm(^flnm?fl0) and M?12nmgnm are asymptotically identically distributed. Fort > 0, let?(t)beapositivenondecreasingfunctionsuchthatlimt!1?(t) = 1 and t?(t) is a convex function. Xie and Yang [35] use ?(t) to establish Lindeberg conditions. Examples include ?(t) = t1=?, ? > 0 and ?(t) = exp(t). 19 Lemma 2.3.4. Under the GEE setting, suppose there exist a constant K (indepen- dent of n) and an integer n0 such that, for j = 1;2;:::;ni and i = 1;2;:::;m, when n > n0 E[y?2ij ?(y?2ij )] ? K where y?i = (y?i1;:::;y?imi)T = A?12i (yi ??i). In addition, for any " > 0, cnm~?nmn " ? ? "2 cnm~?nmn (D)nm !#?1 ! 0 Then, when m !1, we have M?12nmgnm ! N(0;I) in distribution where cnm = ?max(M?1nmHnm), (D)nm = max 1?i?m ?max(H?12nmDTi V?1i DiH?12nm): Strong consistency of the GEE estimator can be proven under the following condition. (Ls) In a neighborhood of fl0, say N, there exists a constant c0 > 0 (independent of m) and ? > 0 such that when m !1. ?min(Dnm(fl)TM?1nmDnm(fl)) ? c0(logm)2(1+?) and Dnm(fl) is nonsingular a:s: for fl 2 N. Theorem 2.3.5. Suppose gni;i, i = 1;:::;m, the summands of the GEE score function gnm, form a inflnitesimal double array sequence. Under condition (Ls), there exist a sequence of random variables ^flnm and a random number n0, such that P(gnm(^flnm) = 0; for all n ? n0) = 1 20 and when m !1 ^flnm ! fl0 a:s: Also they stated that when m is bounded or n goes to inflnity too fast, as- ymptotic normality of gnm or ^flnm does not hold without specifying the dependence structure on each subject. They even point out that if we put Ri = I, gnm(fl) is asymptotically normally distributed if and only if n=m ! 0. Also for settings (i) and (iii) when n is bounded above or tends to inflnity at a limited rate as m !1, they present a set of su?cient conditions to ensure asymptotic normality of gnm and ^flnm. They verify these general conditions for some cases of practical importance, such as marginal GLM with compact covariate set, marginal Poisson regression model and marginal GLMs with bounded responses (binomial or polytomous re- gression models). A drawback to the GEE approach is that it assumes all subjects have the same covariance structure. The \working" correlation matrix is not necessarily an essen- tial feature of GEE. Jiang [16] proposed a nonparametric quasi-likelihood approach for getting a nonparametric estimator of the unknown covariance matrices. Chiou and M?uller [7] proposed Estimated Estimating Equations (EEE) whose covariance structure is modeled nonparametrically as a function of the mean and therefore is an essential component that is part of the model fltting. The difierence between the approaches of Jiang [16] and Chiou and M?uller [7] is that the mean function is correctly specifled in Jiang [16] but unknown in Chiou 21 and M?uller [7]. It has been shown by Jiang [16] that the estimator obtained from the nonparametric quasi-likelihood approach has an asymptotic normal distribution, the same as the estimator obtained from quasi-likelihood approach with true covari- ance matrix. Moreover, the rate of convergence has been established. Chiou and M?uller [7] gave a sketchy proof on consistency and asymptotic normality of their EEE estimator along the lines of McCullagh and Nelder [19]. 2.4 Generalized Linear Mixed Models Generalized Linear Mixed Models (GLMM?s) are a natural extension of GLM by including random efiects. It is usually assumed that the random efiects have a multivariate normal distribution whose variance components are to be estimated from the data. In Section 2.4.1 the deflnition of GLMM is presented and some asymptotic results are discussed in Section 2.4.2. 2.4.1 Deflnitions The structure of GLMM?s (McCulloch and Searle [22]) is the following: The response vector y is typically, but not necessarily, assumed to consist of conditionally independent elements, each with a distribution with density from the exponential family: yijfi ? independent; fYijfi(yijfi) fYijfi(yijfi) = expf[yi i ?b( i)]=a(`)+c(yi;`)g: 22 E[yijfi] = ?i g(?i) = xtifl +ztifi = ?i: Here, g(?) is a known function, called the link function, xti is the ith row of the model matrix for the flxed efiects, and fl is the flxed efiects parameter vector. To that speciflcation we have added zti, which is the ith row of the model matrix for the random efiects, and fi, the random efiects vector. To complete the speciflcation we assign a distribution to the random efiects: fi ? ffi(fijD) E(fi) = 0 where D represents the parameters governing the distribution of fi. Often f(fijD) is multivariate normal and D is the covariance matrix. In GLMM, the canonical parameter i is related to the covariate by i = (?i). When i = ?i, the link is said to be the canonical link. 2.4.2 Estimation approaches According to Jiang [14], the GLMM setup is divided into two cases: the case where there is enough information about the random efiects and the case where there is not. The flrst case is characterized by m=N ! 0 (case 1), while the second by m=N 90 (case 2), where m is the dimension of the random efiects and N is the sample size. In case 1 when the dimension of flxed efiects is flxed, Sartori and Severini [28] extend Davison?s [9] conditional likelihood approach for GLMs to GLMMs. The 23 Conditional Maximum Likelihood Estimate (CMLE) is deflned by Andersen [1] in the following way: To describe the situation Neyman and Scott [24] introduced the concept of structural and incidental parameters as follows. Consider a sequence of independent random variables X1;X2;X3;::: The distribution of Xi depends on the parameters fl and ?i, where the value of fl is the same, independent of i, while the value of ?i changes with i. Then fl is called a structural parameter and the ??s incidental parameters. Andersen [1] discussed a general method for obtaining consistent estimates for a structural parametersfl of flxed dimension in the presence of an increasing number of incidental parameters. He eliminated the incidental parameters by considering the conditional distribution given minimal su?cient statistics for the ??s. The value of fl that maximizes this conditional distribution is then called the Conditional Maximum Likelihood Estimate (CMLE) for fl. Sartori and Severini [28] showed that the conditional likelihood function is valid for any distribution of the random efiects. Hence, the inferences about the flxed efiects are insensitive to misspeciflcation of the random efiect distribution. Furthermore, Andersen [1] showed that the convergence of the normalized ^fl to a normal distribution holds given the random efiects. Hence, the asymptotic normality of ^fl is valid for any random efiect distribution. Li, Lindsay and Waterman [18] considered a rectangular array asymptotic embedding for multistratum datasets, in which both the number of strata and the number of within-stratum replications increase, and at the same rate. They pointed 24 that under this embedding the MLE is consistent but may not be e?cient owing to a non-zero mean in its asymptotic normal distribution. By using a projection operator on the score function, an adjusted MLE can be obtained that is asymptotically unbiased and has a variance that attains the Cramer-Rao lower bound. The adjusted MLE can be viewed as an approximation to the conditional MLE. In case 2 with flxed dimension of flxed efiects, Sinha [30] develops a technique for flnding a Robust Maximum Likelihood (RML) estimate of the model parame- ters in GLMM?s by using Huber?s ? function and the Mahalanobis distance, which appears to be useful in downweighting the in uential data points when estimating the parameter. The asymptotic properties of RMLE are investigated under regu- larity conditions. Sinha [30] also proposed a Robust Monte Carlo Newton-Raphson (RMCNR) algorithm for fltting GLMM?s to avoid the computational problems in- volving high-dimensional integrals. RMCNR can be considered as a modiflcation of the Monte Carlo Newton-Raphson (MCNR) method of McCulloch [21]. Breslow and Clayton [8] considered two closely related approximate methods of inference in GLMM?s: Penalized Quasi-likelihood (PQL), which is based on in- tegrated quasi-likelihood for integral approximation, and Marginal Quasi-likelihood (MQL). The major difierence between those two is that PQL hasE(yjfi) = h(XTfl+ ZTfi) in which fi is a vector of random efiects and MQL only has E(y) = h(XTfl). Lee and Nelder [17] developed a joint likelihood, called h-likelihood, for Hier- archical Generalized Linear Models (HGLMs) which allows extra error components in the linear predictors of GLMM?s. In order to get a marginal likelihood, one has to integrate out the random efiects from the joint likelihood. However, this 25 integration is often quite intractable and at the same time it makes random efiects nonestimable. By contrast the h-likelihood is easily available and avoids the need for burdensome integration. Under appropriate conditions, Lee and Nelder [17] showed that a random efiect MHLE (Maximum h-likelihood estimator) is an asymptotically best unbiased predictor and that a flxed efiect MHLE is asymptotically e?cient as the marginal MLE. With the h-likelihood, the scaled deviance test and test statistics for flxed and random efiects ofier a simple unifled framework of analysis. They also proposed an extended quasi-h-likelihood and several algorithms. 2.4.3 Random intercept model (canonical link) The penalized methods proposed by Jiang [14] are discussed in detail in Section 2.5 and we have the following important case of GLMM. Jiang [14] considers a special case of GLMM in which the responses are clus- tered into groups with each group associated with a single random efiect (possibly vector valued). Suppose that given unobservable random vectors fi1 ;:::; fim satis- fying E(fii) = 0 the responses yij, 1 ? i ? m, 1 ? j ? ni, (ni ? 1) are independent with E(yijjfi) = b0ij(?ij), where bij(?) is difierentiable. Write, ?ij = a+xtijfl +ztifii where a is an unknown intercept, fl = (flk)1?k?s (s is flxed) is an unknown vector of regression coe?cients, and xij = (xijk)1?k?s and zi are known vectors. Such models are useful, for example, in the context of small-area estimation in which 26 fii represents a random efiect associated with the ith selected area. Here we are interested in the estimation of the flxed efiect a, flk, 1 ? k ? s, and the \area- speciflc" random efiects vi = ztifii, 1 ? i ? m. Therefore, we may assume that in the above model ?ij has the following expression: ?ij = a+xtijfl +vi where v1;:::;vm are random variables with E(vi) = 0. Note that here we regard ai = a+vi as a random intercept. Logistic 2?2?m table is an important example in this case and it is modeled as logitP(yij = 1jfii) = fii + xijfl where fii is the random efiect, fl is the common log odds ratio and xij=0 or 1. 2.5 Generalized GLMM (canonical link function case) In order to apply GLMM, one has to know the distribution of random efiects. In fact, in many problems little is known about the distribution of random efiects. Therefore, it is of practical interest to develop methods and models that do not require strong distributional assumptions. In Section 2.5.1 the deflnition of gener- alized GLMM is given. Jiang [14] proposed methods called Penalized Generalized Weighted Least Squares (PGWLS) and Maximum Conditional Likelihood Estimate (MCLE) which are discussed in Sections 2.5.2 and 2.5.3 respectively. Jiang?s [14] consistency results are summarized in Section 2.5.4. 27 2.5.1 Deflnitions Jiang [14] generalized the deflnition of GLMM?s without assuming distribu- tions of random efiects and yi by conditioning on random efiects in the following way. Suppose that, given a vector fi = (fik)1?k?m of unobservable random variables (the random efiects) satisfying E(fi) = 0; (2.14) the responses y1;:::;yN are independent with conditional expectation E(yijfi) = b0i(?i) (2.15) where bi(?) is a difierentiable function. Let suppose ?i = xtifl +ztifi (2.16) where fl = (flj)1?j?p is a vector of unknown constants (the flxed efiects), and xi = (xij)1?j?p, zi = (zik)1?k?m are known vectors, 1 ? i ? N. In vector notation, ? = Xfl +Zfi. 2.5.2 Penalized Generalized Weighted Least Squares in Case 1 The method of maximum likelihood is widely used for analyzing GLMMs. A full maximum likelihood analysis requires numerical integration techniques to calculate the log-likelihood and also the distribution of random efiects needs to be known. Jiang [14] proposed a method of inference which in many ways resembles 28 the method of Least Squares (LS) in linear models and relies on weak distributional assumptions about random efiects. Assume without loss of generality that rank (X) = p and no column of Z is 0. In linear models (LMs), which correspond to (2.15) and (2.16) with bi(?i) = ?2i=2 and m = 0 (i.e., there are no random efiects), a well-known method is weighted least squares (WLS), which deflnes the estimate of fl as the minimizer of NX i=1 wi(yi ??i)2; (2.17) where wi, 1 ? i ? N, are weights, or equivalently, the maximizer of NX i=1 wi yi?i ? ? 2 i 2 ? : (2.18) A straightforward generalization of this method to the case of GLMM would suggest the maximizer of the following function as the estimates of fl and fi: NX i=1 wi(yi?i ?bi(?i)): (2.19) However, conditionally, the individual flxed and random efiects may not be identifl- able. In LM there are two remedies when the identiflability problem arises, namely, reparameterization and constraints. We shall, for now, focus on the latter. A set of linear constraints on fi may be expressed as Pfi = 0 for some matrix P. By La- grange?s method of multipliers, maximizing (2.19) subject to Pfi = 0 is equivalent to maximizing NX i=1 wi(yi?i ?bi(?i))? ?12 kPfik2 (2.20) without constraint, where ?1 is an additional variable. On the other hand, for flxed ?1 the last term in (2.20) may be regarded as a penalizer. The only thing that 29 needs to be specifled is the matrix P. For any matrix M and vector space V, let B(V) = fB : B is a matrix whose columns constitute a base for Vg; N(M) = the null-space of M = fv : Mv = 0g; PM = M(MtM)?Mt, and PM? = I?PM. Let A 2 B(N(PX?Z)) so that PX?ZA = 0. We deflne the penalized generalized WLS (PGWLS) estimate of = (fl;fi) as the maximizer of lP( ) = NX i=1 wi(yi?i ?bi(?i))? ?12 kPAfik2: (2.21) where ?1 is a positive constant. The notation lP is used because (2.21) may also be viewed as a penalized conditional quasi-log-likelihood. Consider the expression (2.21). The reason that one needs a penalizer here is because the flrst term, lC( ) = PNi=1 wi(yi?i ?bi(?i)), depends on = (fl;fi) only through ?. However, can not be identifled by ?, so there may be many vectors for which ? = Xfl +Zfi is the same. The idea is therefore to consider a restricted space S = f : PAfi = 0g such that within this subspace, is uniquely determined by ?. Case 1 Consistency Results. Jiang [14] uses the following notations: Let B = (bij)1?i?k; 1?j?l be a matrix, v = (vi)1?i?k a vector and V a vector space. Deflnekvk = max1?i?k jvij; kBk = ?12max(BtB),kBkR = (tr(BtB))12,kBk1 = max1?i?kPlj=1jbijj; BV = fBv : v 2 Vg, ?min(B)jV = infv2Vnf0g(vtBv=vtv). Jiang [14] presents consistency results for case 1 in the following Theorem 2.5.2 and 2.5.4 and Corollary 2.5.3. Theorem 2.5.1. Let b00i (?) be continuous, let max1?i?Nfw2iEvar(yijfi0)g be bounded 30 and let 1 N ? (max 1?u?p jXuj2)k(XtX)?1XtZk2 +( max 1?k?m jZkj2) ? jPAfi0j2 !p 0: (2.22) Let cN;dN > 0 be any sequences such that limsupkfl0k=cN < 1 and P(kfi0k=dN < 1) ! 1, Mi ? cN Ppu=1jxiuj+dN Pmk=1jzikj, 1 ? i ? N and ^ = (^fl; ^fi) be the maximizer of lP over ?(M) = f : j?ij? Mi;1 ? i ? Ng. Then 1 N ? pX u=1 jXuj2(^flu ?fl0u)2 + mX k=1 jZkj2(^fik ?fi0k)2 ! !p 0 (2.23) provided that p+m N = o(! 2) (2.24) where ! = ?min(W?1HW?1)jWS min 1?i?N fwi inf jhj?Mi b00i (h)g with W = diag(jX1j;:::;jXpj;jZ1j;:::;jZmj): Corollary 2.5.2. Let the conditions of Theorem 2.5.2 [including (2.24)] hold. (i) Suppose p is flxed, and liminf ?min(XtX)=N > 0 (2.25) Then ^fl !p fl0. (ii) Suppose Z = (Z(1)???Z(q)) and correspondingly, fi = (fi1;:::;fiq), where fiu = (fiuv)1?v?mu, and each Z(u) is a standard design matrix in the same sense as for U deflned below Lemma 2.5.1, 1 ? u ? q. Let Zuv be the vth column of Z(u) and nuv = jZuvj2 = the number of appearances of the vth component of fiu. 31 Then ?m uX v=1 nuv !?1 m uX v=1 nuv(^fiuv ?fi0uv)2 !p 0; 1 ? u ? q; (2.26) where ^fiuv and fi0uv, 1 ? v ? mu, 1 ? u ? q are the corresponding components of ^fi and fi0, respectively. For the special case of GLMM described in Section 2.4.3 as a random intercept model, the following theorem presents the consistency results: Theorem 2.5.3. Let b00ij(?) be continuous; let w2ijEvar(yijjv0), jxijj be bounded, let liminf(?N=N) > 0 where ?N = ?min(Pmi=1Pnij=1(xij ? xi)(xij ? xi)t) with xi = n?1i Pnij=1 xij. and let v0 !p 0. Let cN, dN > 0 be such that limsupja0j_jfl0j=cN < 1 and P(kv0k=dN < 1) ! 1, Mij ? cN(1+jxijj)+dN and ^ = (^a; ^fl;^v) be the maxi- mizer of lP over ?(M) = f : j?ijj ? Mij;all i;jg and ?N = minij infjhj?Mij b00(h). Then ^fl !p fl0 and 1 N mX i=1 ni(^ai ?a0i)2 !p 0; (2.27) where ^ai = ^a + ^vi and a0i = a0 + v0i, provided that m=N = o(?2N). If the latter is strengthened to (min1?i?m ni)?1 = o(?2N), then, in addition, ^a !p a0, and 1 N mX i=1 ni(^vi ?v0i)2 !p 0; 1m mX i=1 (^vi ?v0i)2 !p 0: (2.28) Note. It can be shown, by simple example, that ^fi !p fi0 and (2.28) may not hold without min1?i?m ni !1, even if m=N ! 0. 32 2.5.3 Maximum Conditional Likelihood Estimates in Case 2 In case 2 since m=N 90, we do not have enough information to estimate all random efiects. Jiang [14] states that it is often possible to estimate with adequacy a subset of the random efiects, and the ones which are not estimable will be integrated out. As in case 1, conditionally, the individual efiects may not be identiflable. A basic technique here is reparameterization which is a map from (fl;fi) to (~fl; ~fi). Jiang?s [14] reparameterization is stated in the following lemma: Lemma 2.5.4. There is a map fl 7! ~fl, fi 7! ~fi such that (i) Xfl+Zfi = ~X~fl+~Z~fi, where (~X; ~Z) is a known matrix of full column rank. (ii) ~zi = ~z?j, i 2 Sj for some known vector ~z?j, where ~ztj is the ith row of ~Z and where Sj is deflned below. Let U be a standard design matrix in the sense that it consists of 0?s and 1?s and there is exactly one 1 in each row and at least one 1 in each column. The vector uti is the ith row of U and eM;j is the M-dimensional vector whose jth component is 1 and whose other components are 0. Let Sj = f1 ? i ? N : ui = eM;jg, and y(j) = (yi)i2Sj, 1 ? j ? M. Suppose that ?1;:::;?M are independent with common distribution ?(?=?)=?, where ?(?) is a known density function and ? > 0 is an unknown scale parameter. Furthermore, we assume that there are no random efiects nested within ?. In notation, this means that zi = z?j, i 2 Sj, 1 ? j ? M, where z?j = (z?jk)1?k?l. Let ? = (~fl;?), ? = (~fi;?). Then we have f(yijfi;?) = f(yij?i); 1 ? i ? N (2.29) 33 where f(?2j?1) denotes the conditional density of ?2 given ?1. By Lemma 2.5.1, we have ? = ~X~fl + ~Z~fi+U?: Let ? = (~fl;?), ? = (~fi;?). By (2.29) we have f(yj?) = MY j=1 f(y(j)j?) and it is easy to show that f(y(j)j?) = gj(zt?j ~fi; ~fl;?), where gj(s) = E 2 4Y i2Sj f(yijs1 +xtis(2) +sr+2?) 3 5; s(2) = (s2;:::;sr+1) and r is the dimension of ~fl. Note that r ? p. Let n be the dimension of ~fi, hj(s) = loggj(s), lC(?) = logf(yj?) and lC;j(?) = logf(y(j)j?) = hj(zt?j ~fi; ~fl; ?). Then lc(?) = MX j=1 lc;j(?) Let Z? be the matrix whose jth row is zt?j, 1 ? j ? M. Let ?0 and ?0 be the vectors corresponding to the true parameters and realization of random efiects. Case 2 Consistency Results. Under some regularity conditions on derivatives, smoothness, integrability of densities f(y(j)j?), boundness of hj(s)?s second, third derivatives, etc., Jiang [14] proved consistency results for estimates of reparameterized parameters (~fl; ~fi) but not for the original (fl;fi), and one may never recover a consistency result for the original parameter (fl;fi) from the reparameterized (~fl; ~fi): Jiang [14] verifles the general conditions for consistency in three examples: for case 1, the two way crossed logistic model logit(P(yij = 1ja;b)) = ? + ai + bj, the 34 other one is random intercept logistic regression model are analyzed. For case 2 the only example is two stage nested logistical model logit(P(yijk = 1ja;b)) = ?+ai+bij. 35 Chapter 3 Case 1 random intercept model results. This Chapter proposes a new estimator Pmi=1 ^wi^fli and establishes its condi- tional asymptotic normality in the random intercept problem in case 1. Here ^fli is the maximum likelihood estimator based on the ith group and ^wi is a matrix weight proportional to the inverse estimated covariance of ^fli. Also this Chapter extends the consistency results of Jiang [14] to establishing asymptotic normality of the penalized likelihood estimators for flxed intercept in the case 1 random intercept problem when there are no regression coe?cients. Furthermore, this Chapter dis- cusses the di?culties of establishing asymptotic normality of Jiang?s [14] penalized likelihood estimators in general. Throughout this chapter we consider the canonical GLMM with f(yijjvi) = exp[?ijyij ?b(?ij)+c(yij)] and ?ij = a+vi +xtifl = ai +xtifl. Assumptions: (i) vi0 are iid, E(vi0) = 0, Var(vi0) = 2v where 2v is a constant. 9M such that P[jvi0j? M] = 1. (ii) 9K0 such that kxijk? K0 for all i, j. 36 3.1 Case 1 random intercept (combine a and vi) Recalling the Estimating Equation (2.3), for the ith group in the random intercept model we have the loglikelihood function : ln( i) = niX j=1 (yij?ij ?b(?ij)?C(yij)): (3.1) where i = (ai;flti)t. Take Taylor expansion for partial derivative of ln( i) around i0: Ur(^ i) = @ln( ri)@ i (^ i) = Ur( i0)+(rUr( ?i))(^ i ? i0) = Ur( i0)+(?Fni)(^ i ? i0)+[Fni ?Fni( ?)](^ i ? i0) (3.2) where ?i is between ^ i and i0 and ?Fni( ?) = rUr( ?i). Because the random efiects vi0 are independently and identically distributed with mean 0 and bounded and kxijk are uniformly bounded, for all i, j, b0ij is bounded and b00ij is bounded below and above by positive constants bl and bu, Fur- thermore, the consistency of ^ i implies that the b(3)ij (~?ij) are bounded with high probability where ~?ij is between ^?ij and ?ij0 . Deflne (Fflfli )?1 = Wbi = (Wbikl)1?kl?s ; 1?l?s where Wbikl = X j b00ij ? xijk ? P j xijkb 00 ijP j b 00 ij !? xijl ? P j xijlb 00 ijP j b 00 ij ! : (3.3) 37 Lemma 3.1.1. Let liminf(?ni=ni) > 0, where ?ni = ?min(Pj(xij ?xi)(xij ?xi)T) with xi = n?1i Pnij=1xij. Then there exists a positive constant c2 such that for any h > 0, inf kuk=1 uTWbiu ni > 1 c2: (3.4) for all ni su?ciently large. By considering each group separately, we construct a new estimator ^^flw = Pm i=1 ^wi^fli. We can treat each group as a conditional GLM, as in Fahrmeir and Kaufmann [11]. We make the following assumptions: First we assume that (Zi;Xi) is a full-column rank matrix and deflne a sequence Nni(?), ? > 0, of neighborhoods of i0 (the true value of i) by Nni(?) = f i : kFT=2ni ( i ? i0)k? ?g where ni = 1;2;??? and FT=2ni any right square root of the positive deflnite ma- trix Fni; that is, F1=2ni FT=2ni = Fni. Conditions (D), (C) and (N) of Fahrmeir and Kaufmann [11] in our setting become: (1) Divergence: ?minFni !1. (2) Boundedness from below: for all ? > 0, Fni( i)?cFni is positive semideflnite, for all i 2 Nni(?), ni ? n1 with some constants n1 = n1(?), c > 0 independent of ?. (3) Convergence and continuity: for all ? > 0, max i2Nni(?)kVni( i) ? Ik ! 0, where Vni( i) = F?1=2ni Fni( i)F?T=2ni is the normed information matrix. 38 Let (^ai; ^fli) be the solution of the score equation in the ith group. Then, given vi0, (Fni)T=2 0 BB @ ^ai ?ai0 ^fli ?fli0 1 CC A!d N(0;I): (3.5) Informally, given vi0, 0 BB @ ^ai ^fli 1 CC A? AN 0 BB @ 0 BB @ ai0 fl0 1 CC A;F?1ni 1 CC A; (3.6) where F?1ni = 0 BB @ ZTi DiZi ZTi DiXi XTi DiZi XTi DiXi 1 CC A ?1 = 0 BB @ Faai Fafli Fflai Fflfli 1 CC A: (3.7) Here Fflfli = ? XTi DiXi ?XTi DiZi?ZTi DiZi??1ZTi DiXi ??1 and Faai = ? ZTi DiZi ?ZTi DiXi?XTi DiXi??1XTi DiZi ??1 where Zi = 1ni and Di = diag(b00ij)1?j?ni. Wechoose ^wi = (Pi(^Fflfli )?1)?1(^Fflfli )?1 and ^^flw = Pi(Pi(^Fflfli )?1)?1(^Fflfli )?1^fli. Note that a scalar version of ^^flw was proposed by Woolf (1955) to estimate the com- mon log odds ratio in a 2?2?m contingency table. Theorem3.1.2. Suppose that Assumptions (i) and (ii) hold and for each i, liminf[n?1i ?ni] > 0, ni(Fni)?1 !p F0i, P(infi ?min(F0i) > ? > 0) > 1?h, P(supi ?max(F0i) < M < 1) > 1?h, where F0i is a positive deflnite matrix and that the following asymptotic relations are true: 39 (1) There exist positive numbers K1 < K2 such that K1 < min i ?min(Fflfli )=max i ?max(Fflfli ) < K2; (2) m=min i ni = o(1); max i ni=min i ni = O(1); (3) max i ?? max(Fni)?1=?min(Fni)?1 ? = O(1); X i k(Fni)?1k = o(1); (4) max i E h (^fl?i)t^fl?i?((^fl?i)t^fl?i)jvi0 i = Op(1); where ^fl?i = (Fflfli )?12(^fli?E(^flijvi0)) and ?(t) is a positive nondecreasing func- tion on [0;1) such that limt!1?(t) = 1 and t?(t) is a convex function. Then, given v0, ? mX i=1 (Fflfli )?1 !?1 2 mX i=1 (Fflfli )?1(^fli ?fl0) !d N(0;I): (3.8) Note that Fflfli depends on the unknown (ai;fl) through Di. Let ^Di = diag(b00ij(^?ij)1?j?ni), where ^?ij = ^ai +xtij^fli, ^^Di = diag(b00ij(^^?ij)1?j?ni) where ^^?ij = ^ai +xtij^^flw. Deflne ^Fflfli = XTi ^DiXi ?XTi ^DiZi ? ZTi ^DiZi ??1 ZTi ^DiXi ??1 : Then ^Fflfli estimates Fflfli and ^^flw = Pmi=1 ?P m i=1(^F flfl i ) ?1 ??1 (^Fflfli )?1(^fli ?fl0) . 40 Deflne ^^Fflfl i = XTi ^^DiXi ?XTi ^^DiZi ? ZTi ^^DiZi ??1 ZTi ^^DiXi ??1 : Then ^^Fflfli estimates Fflfli . Theorem 3.1.3. Let the hypotheses of Theorem 3.1.2 hold. Then we have (1) ? mX i=1 (Fflfli )?1 !1 2 ? mX i=1 (^Fflfli )?1 !?1 2 ?I = op(1); (2) k(^Fflfli )?1 ?(Fflfli )?1k = Op(pni); (3) ?X i (Fflfli )?1 !?1 2 = Op(1= pN); k(Fflfl i ) 1=2k = Op(1=pni): Furthermore, given v0, we have ? mX i=1 (^Fflfli )?1 !?1 2 mX i=1 (^Fflfli )?1(^fli ?fl0) !d N(0;I): (3.9) Under the same hypothesis with (1), (2) and (3) in Theorem 3.1.4 modifled by re- placing ^Fflfli by ^^Fflfli , given v0, ? mX i=1 (^^Fflfli )?1 !?1 2 mX i=1 (^^Fflfli )?1(^fli ?fl0) !d N(0;I): (3.10) According to Theorem 3.1.2 and 3.1.3, given the random efiects v0, the nor- malized versions of ^flw and ^^flw converge in distribution to N(0;I). Since this conver- gence holds for almost all realizations of v0, we can also claim that the convergence in distribution holds unconditionally. This is formalized in the following Corollary. 41 Corollary 3.1.4. Under the hypotheses of Theorem 3.1.2, the convergence state- ments (3.8), (3.9), and (3.10) hold unconditionally (except on a set of v0-probability zero). 3.2 Penalized likelihood estimator ^a in the case 1 random intercept model when fl=0 From Jiang [14] we can write lP( ) as follows: lP( ) = mX i=1 niX j=1 (yij?ij ?b(?ij))? ?12 mv2 (3.11) where ?ij = a + vi and there are no regression coe?cient. Here the penalty term (?1=2)mv2 is used to take care of the identiflability problem and we can estimate a and vi separately. We will write Bij = b00ij + 12b(3)ij (??ij)(^?ij ??ij0) = b00ij +rij (3.12) where rij = 12b(3)ij (??ij)(^?ij ??ij0) (3.13) and ~B = diag(Bij)1?i?m ; 1?j?n i: (3.14) Taylor expansion of @lP( )=@ around 0 takes the following matrix form: 0 BB @ ^a?a0 ^v?v0 1 CC A = (Q?)?1 0 BB @ 1TN(y??) ZT(y??)??1v01tm 1 CC A (3.15) 42 where Q? = 0 BB @ 1TN ~B1N 1TN ~BZ ZT ~B1N ZT ~BZ+?1=mJ 1 CC A = 0 BB @ Q?11 Q?12 Q?21 Q?22 1 CC A: (3.16) Let Znew = (1N;Z) and note that Q? is a nonsingular matrix since Q? = eAteA+ eHteH: where eA = ~B12Znew eH = (0 p? 1p m1 t m) Here eA is an N?(m+1) matrix with rank m, since Znew is not of full column rank matrix, and eH is a 1 ? (m+1) matrix with rank 1. However, eA and eH are complementary matrices since Sp(eAt) \ Sp(eHt) = f0g, we have Q? is a full-rank matrix with rank m+1. Then (Q?)?1 = 0 BB @ (Q?11:2)?1 ?(Q?11:2)?1Q?12(Q?22)?1 ?(Q?22)?1Q?21(Q?11:2)?1 (Q?22:1)?1 1 CC A: (3.17) Since we?re only interested in the ^a terms, (^a?a0) = (Q?11:2)?1?1TN(y??)?Q?12(Q?22)?1ZT(y??)? +(Q?11:2)?1Q?12(Q?22)?1?1v01m: (3.18) Here we choose pm as normalizer. Then pm(^a?a 0) = pm(Q? 11:2) ?1?1T N(y??)?Q ? 12(Q ? 22) ?1ZT(y??)? +pm(Q?11:2)?1Q?12(Q?22)?1?1v01m: (3.19) 43 Theorem 3.2.1. We assume the following conditions, which are equivalent to the hypotheses of Theorem 2.5.4 (Jiang [14], Theorem 2.2). (J1) E(Var(yijjv0)) and Pk x2ijk are bounded. (J2) Let ?N = ?min(PiPj(xij ?xi)(xij ?xi)t) with xi = n?1i Pnij=1xij and ?N = mini;j infjhj?Mij b00ij(h), liminf(?N=N) > 0 , m=N = o(?2N), (min1?i?m ni)?1 = o(?2N) and v0 !p 0. (J3) Let cN; dN > 0 be such that limsup(ja0j_jfl0j)=cN < 1 and P(kv0k=dN < 1) ! 1, Mij ? cN(1+(Pk x2ijk)12)+dN and let ^ = (^a; ^fl;^v) be the maximizer of lP over ?(M) = f : j?ijj? Mij; all i;jg. Then pm(^a?a 0) !d N(0; 2v): 3.3 Discussion of Jiang [14] penalized likelihood estimator ^fl in case 1 random intercept model (fl 6= 0) According to Jiang [14], for the random intercept model the penalized loglike- lihood is stated lP( ) as equation (??) in Section 2.5.2. Here for simplicity, we let wij = 1 so that lP( ) becomes the following: lP( ) = mX i=1 niX j=1 (yij?ij ?b(?ij))? ?12 mv2: (3.20) By Theorem 2.5.4 in Section 2.5.4 (Jiang [14], Theorem 2.2), ^ is the maximizer of lP( ) over ?(M). The penalizer here takes care of the identiflability problem and 44 when combining a with vi, (X;Z) is a full-column rank matrix we do not need a penalizer term. Based on Jiang?s [14] consistency results (^a !p a0; ^fl !p fl0; (1=m)Pmi=1(^vi? vi0)2 !p; (1=N)Pmi=1 ni(^vi ?vi0)2 !p 0) and (J1), (J2) and (J3) in Section 3.2, we tried to establish conditional asymptotic normality for (^fl ? fl0), given v0, under certain conditions. Taylor expansion was used to solve lp( ) to write the estimators as 0 BB @ ^fl?fl0 ^a?a0 1 CC A = Q?1gN (3.21) where gN = 0 BB @ XT(y??) ZT(y??) 1 CC A and Q = 0 BB @ XT ~BX XT ~BZ ZT ~BX ZT ~BZ 1 CC A = 0 BB @ Q11 Q12 Q21 Q22 1 CC A: (3.22) Let Q0 = 0 BB @ XTDX XTDZ ZTDX ZTDZ 1 CC A = 0 BB @ Q110 Q120 Q210 Q220 1 CC A: (3.23) We will write Bij = b00ij + 12b(3)(??ij)(^?ij ??ij0) (3.24) and ~B = diag(Bij)1?i?m ; 1?j?n i D = diag(b 00 ij(?ij0))1?i?m; 1?j?ni (3.25) Recall that Z is a block diagonal matrix, Z = diag(1ni)1?i?m, and X is (X1;:::;Xs) without the flrst column 1N, after combining flxed intercept a with the random 45 efiects vi to obtain ai = a+vi. Note that Q is a nonsingular matrix since (X;Z) is a full column rank matrix (not all elements of X can be equal). Then Q?1 = 0 BB @ Q?111:2 ?Q?111:2Q12Q?122 ?Q?122 Q21Q?111:2 Q?122:1 1 CC A (3.26) where Q11:2 = Q11 ?Q12Q?122 Q21 Q22:1 = Q22 ?Q21Q?111 Q12: Since we are only interested in the fl terms, we flnd that ^fl?fl0 = Q?111:2'XT(y??)?Q12(Q22)?1ZT(y??)?: (3.27) Let W? = XTDX?(XTDZ)(ZTDZ)?1ZTDX: (3.28) Here we choose (W?)?12Q11:20 as the normalizer. Then (W?)?12Q11:20(^fl?fl0) = (W?)?12Q11:20Q?111:2'XT(y??)?Q12(Q22)?1ZT(y??)? = I +II +III (3.29) where I = (W?)?12 'XT(y??)?Q120(Q220)?1ZT(y??)?; (3.30) II = (W?)?12 ?Q11:20Q?111:2 ?I?XT(y??); (3.31) III = (W?)?12 'Q120Q?1220 ?Q11:20Q?111:2Q12Q?122 ?Zt(y??): (3.32) 46 We can prove that under regularity conditions, given v0, I !d N(0;I) and II !p 0. Consider III, We can write III as III = (W?)?12 'I?Q11:20Q?111:2?Q120Q?1220Zt(y??) (3.33) +(W?)?12Q11:20Q?111:2'Q120Q?1220 ?Q12Q?122 ?Zt(y??): (3.34) we can also prove that k(W?)?12kpN = Op(1), kI?Q11:20Q?111:2k = op(1) and the norm of (3.33) goes to 0. where Q120Q?1220Zt(y??) = ?X i (P j xijkb 00 ij P j(yij ?b 0 ij)P j b 00 ij )! 1?k?s and we have ?Q 120Q?1220 ?Q12Q?122 ?Zt(y??) = ?X i (P j xijkb 00 ijP j b 00 ij ? P j xijkb 00 ij(? ? ij)P j b 00 ij(? ? ij) )X j (yij ?b0ij) ! 1?k?s Consider the norm of (3.34) k(W?)?12Q11:20Q?111:2'Q120Q?1220 ?Q12Q?122 ?Zt(y??)k ?kQ11:20Q?111:2kk(W?)?1=2kk'Q120Q?1220 ?Q12Q?122 kZt(y??)k ? Op(1)pN k'Q120Q?1220 ?Q12Q?122 ?Zt(y??)k = Op(1) 0 @X k ( mX i=1 ni (P j xijkb 00 ijP j b 00 ij ? P j xijkb 00 ij(? ? ij)P j b 00 ij(? ? ij) ) 1 ni X j (yij ?b0ij) )21 A 1 2 (3.35) We conjecture 1=niPj(yij ?b0ij) = Op(1=pni) and that ^?ij ??ij0 = Op(1=pni). If so, inside of square of (3.35) would be Op(m=pN) ? Op(pm=mini ni). However, we are unable to calculate the convergence rate of (^?ij ??ij0) because the number 47 of random efiects goes to inflnity. Li, Lindsay and Waterman [18] examined the asymptotic behavior of MLE?s of a scalar fl as the number of nuisance parameters goes to inflnity. They found that under rectangular asymptotics (m = cn, c flxed, m ! 1) pN(^fl1 ?fl0) !d N(?;I?1(fl)) when ? is possibly nonzero and I?1(fl) is the Fisher information. They proposed an alternative estimator (based on projected scores) which satisfled pN(^fl2 ? fl0) !d N(0;I?1(fl)). In the Chapter 6 and Ap- pendix A simulation studies, we flnd that in Logistic and Poisson random intercept with combined flxed intercept a and random efiects vi, consistency and conditional asymptotic normality results appear to hold. 3.4 Proofs. 3.4.1 Proof of Lemma 3.1.1 Recall the deflnition of Wbi in (3.3). We have uTWbiu = niX j=1 b00ij ? sX k=1 uk ? xijk ? Pni l=1 xilkb 00 ilP ni l=1 b 00 il !2 : Let ~Wi = diag(?i)??i?Ti where ?i = 2 66 66 66 4 b00i1=Pj b00ij ... b00ini=Pj b00ij 3 77 77 77 5 ni?1 : 48 Then 1 niu TWbiu = 1 niu T X j b00ijXTi ~WiXiu = 1n i uT X j b00ij(Xi ?Xi)T ~Wi(Xi ?Xi)u where Xi = (Xi1;:::;Xis); Xik = 2 66 66 66 4 xi1k ... xinik 3 77 77 77 5 ni?1 ; Xi = (xi11ni;:::;xis1ni); and xik = n?1i Pj xijk. Therefore inf kuk=1 uT ?P j b 00 ij(Xi ?Xi) T ~Wi(Xi ?Xi) ? u ni = inf kuk=1 P j b 00 ij ? uT(Xi ?Xi)T ~Wi(Xi ?Xi)u ? ni ? inf kuk=1 P j b 00 ij?2( ~Wi) ?uT(X i ?Xi)T(Xi ?Xi)u ? ni : Theboundabove?2( ~Wi)isthesecondsmallesteigenvalueof ~Wi, because(Xi?Xi)u is orthogonal to 1ni which is the eigenvector corresponding to the unique eigenvalue 0. Here we use the deflnition of the second smallest eigenvalue of Wi: 49 uT(Xi ?Xi)T ~Wi(Xi ?Xi)u uT(Xi ?Xi)T(Xi ?Xi)u ? inf kuk=1 uT(Xi ?Xi)T ~Wi(Xi ?Xi)u uT(Xi ?Xi)T(Xi ?Xi)u = ?2( ~Wi): Now we need to flnd a lower bound of ?2( ~Wi). Stewart?s [31] Theorem 5.1 states the following result: Let fi1 ? fi2 ? ??? ? fin, fl1 ? fl2 ? ??? ? fln and 1 ? 2 ? ??? ? n be the eigenvalues of the real Hermitian matrices A, B and C = A+B. Then fii +fl1 ? i ? fii +fln: In our case we set A = ??i?Ti and B = diag(?i), and by the special structure of A we have fi2 = ??? = fini = 0 and fi1 = ??Ti ?i = ?Pj ?2j. Then min j b00ijP j b 00 ij ? ?2( ~Wi): (3.36) Consider inf kuk=1 uTWbiu ni = inf kuk=1 uTfPj b00ij(?ij0)(Xi ?Xi)T ~Wi(Xi ?Xi)gu ni ? inf kuk=1 uTfPj b00ij(?ij0)(Xi ?Xi)T?2( ~Wi)(Xi ?Xi)gu ni ? inf kuk=1 uTfminj b00ij(?ij0)(Xi ?Xi)T(Xi ?Xi)gu ni > 1 c2: where c2 is a suitably large constant. Equivalently Wbi is a positive deflnite matrix. 50 3.4.2 Proof of Theorem 3.1.2 It is easy to prove that of ^flw?fl0, where ^flw = Pi ?P i(F flfl i ) ?1 ??1 (Fflfli )?1^fli, then ^flw !p fl0. by using the result of Lemma 3.1.1 and conditions of Theorem 3.1.2 because we have k^flw ?fl0k ? X i ?X i (Fflfli )?1 !?1 k(Fflfli )?1=2kk(Fflfli )?1=2(^fli ?fl0)k ? Op(1) K1 mini q ?max(Fflfli )?1 ? Op(1)=K1min i pn i pb l liminf(?ni=ni) = op(1): Here we use the fact that (Fflfli )?1=2(^fli ?fl0) is Op(1), as proved by Fahrmeir and Kaufmann [11]. Since ( X i (Fflfli )?1)?12 X i (Fflfli )?1(^fli ?fl0) = ( X i (Fflfli )?1)?12 X i (Fflfli )?1(^fli ?E(^flijvi0)+E(^flijvi0)?fl0); let u be a unit vector and Ti = ut ?X i (Fflfli )?1 !?1 2 (Fflfli )?1(^fli ?E(^flijvi0)) (3.37) = ut ?X i (Fflfli )?1 !?1 2 (Fflfli )?12(Fflfli )?12(^fli ?E(^flijvi0)): (3.38) Recall ^fl?i = (Fflfli )?12(^fli ?E(^flijvi0)). By the Cauchy-Schwartz inequality we have T2i ? 8< :u t ?X i (Fflfli )?1 !?1 2 (Fflfli )?1 ?X i (Fflfli )?1 !?1 2 u 9= ;( ^fl?i)t^fl?i (3.39) ? ?max ?X i (Fflfli )?1 !?1 ?max(Fflfli )?1(^fl?i)t^fl?i (3.40) 51 By Lemma 3.1.1 we have nik(Wbi)?1k = ni?maxFflfli = O(1), and because of bound- ness of Pk x2ijk and b00ij 1 niu tWbiu = 1 ni X j b00ij (X k uk(xijk ? P j xijkb 00 ijP j b 00 ij ) )2 = O(1): Equivalently we have (?minFflfli )?1 = kWbik = Op(ni) and (1) of Theorem 3.1.3 follows. Now we check the Lindeberg condition: mX i=1 E(T2i I(jTij > ")jvi0) = mX i=1 E(T2i I(T2i > "2)jvi0) ? ?X i ??1maxFflfli min i ?minFflfli !?1 ? mX i=1 E " (^fl?i)t^fl?iI ? (^fl?i)t^fl?i > "2?min(Fflfli )?min( X i (Fflfli )?1) ! jvi0 # ? maxi ?maxF flfl i mmini ?minFflfli mX i=1 E[(^fl?i)t^fl?i?((^fl?i)t^fl?i)jvi0] ? ( ? ? "2?min(Fflfli )( X i ??1maxFflfli ) !)?1 ? Op(1)maxi ?maxF flfl i mini ?minFflfli ( ? ? mmini ?minFflfli maxi ?maxFflfli !)?1 = op(1) since by condition (1) and (4), the argument of ? goes to 1. According to equation (3.2) we have l0n(^ i) = l0n( i0)?Fni(^ i ? i0)+(Fni ?Fni( ?))(^ i ? i0) where ?i is between i0 and ^ i. Taking the expectation on both sides and conditioning on vi0 we have E[(^ i ? i0)jvi0] = (Fni)?1E?(Fni ?Fni( ?))F?T=2ni FT=2ni (^ i ? i0)jvi0? = (Fni)?1E?(1=pni)(Fni ?Fni( ?))pniF?T=2ni FT=2ni (^ i ? i0)jvi0? 52 Let u be a unit vector and consider (1=pni)kFni ?Fni( ?i)k = (1=pni) sup ku=1k flfl flfl flfl X j (b00ij ?b00ij(??ij)) ? u1 + sX k=1 uk+1xijk !2flflfl flfl fl ? (2sK0 +2)(1=ni) X j pn ijb 00 ij ?b 00 ij(? ? ij)j ? (2sK0 +2)(1=2ni) X j pn ijb (3) ij (~?ij)jj?ij ? ^?ijj (3.41) = Op(1): (3.42) Since we have b(3)ij (~?ij) bounded and according to Fahrmeir and Kaufmann [11], we have, given vi0, pn i(^?ij ??ij0) ? AN(0;ni(Faai +F afl i xij +x t ijF fla i +x t ijF flfl i xij)): (3.43) In addition, by the assumptions of Theorem 3.1.3 we can get from (3.41) to (3.42) and pnik(Fni)?T=2k = Op(1). Then kE[(^fli ?fl0)jvi0]k?kE[(^ i ? i0)jvi0]k?k(Fni)?1k?Op(1): Now we consider the bias term ?X i (Fflfli )?1 !?1 2 X i (Fflfli )?1(E(^flijvi0)?fl0): (3.44) 53 Then 0 @ut ?X i (Fflfli )?1 !?1 2 X i (Fflfli )?1(E(^flijvi0)?fl0) 1 A 2 ? m mX i=1 ? ut( X i (Fflfli )?1)?12(Fflfli )?1(E(^flijvi0)?fl0) !2 ? m mX i=1 ut( X i (Fflfli )?1)?12(Fflfli )?1( X i (Fflfli )?1)?12u ?(E(^flijvi0)?fl0)t(Fflfli )?1(E(^flijvi0)?fl0) ? m 8 < :maxi ?max(F flfl i ) ?1?max ?X i (Fflfli )?1 !?19= ; ? mX i=1 (E(^flijvi0)?fl0)t(Fflfli )?1(E(^flijvi0)?fl0) ? maxi ?maxF flfl i mini ?minFflfli mX i=1 (E(^flijvi0)?fl0)t(Fflfli )?1(E(^flijvi0)?fl0) ? Op(1) mX i=1 kE(^flijvi0)?fl0k2k(Fflfli )?1k ? ?2Op(1) mX i=1 k(Fni)?1k2k(Fflfli )?1k (3.45) ? ?2Op(1)max i ?max(Fni)?1 ?min(Fni)?1 mX i=1 k(Fni)?1k (3.46) = op(1) The step from (3.45) to (3.46) follows Schott?s [29] Theorem 3.20: If A is a m?m symmetric matrix and Ak is a leading k?k principal submatrix we have the following inequality: ?m?i+1(A) ? ?k?i+1(Ak) ? ?k?i+1(A) where i = 1;??? ;k and ?1 is the maximum eigenvalue of A. Here we chose A = (Fni)?1 and Ak = Fflfli . 54 By the Central Limit Theorem and Slutsky?s Theorem, we have, given v0, ?X i (Fflfli )?1 !?1 2 X i (Fflfli )?1(^fli ?fl0) !d N(0;I): 3.4.3 Proof of Theorem 3.1.3 If we let V = Pi(Fflfli )?1 and ^V = Pi(^Fflfli )?1, then k^^flw ?fl0k ?k^^flw ? ^flwk+k^flw ?fl0k = mX i=1 ?^ V?1 n [(^Fflfli )?1Fflfli ?I]V+V? ^V o? wi(^fli ?fl0) +op(1) ? X i ?^ V?1 n [(^Fflfli )?1Fflfli ?I]V+V? ^V o? kwi(^fli ?fl0)k+op(1) = op(1) Since we can prove ?^ V?1 n [(^Fflfli )?1Fflfli ?I]V+V? ^V o? ?k^V?1kk[(^Fflfli )?1Fflfli ?I]kkVk+k^V?1kkV? ^Vk (3.47) = op(1) (3.48) Consider mX i=1 2 4 ?X i (^Fflfli )?1 !?1 2 (^Fflfli )?1 ? ?X i (Fflfli )?1 !?1 2 (Fflfli )?1 3 5(^fli ?fl0) = k mX i=1 ?^ V?12(^Fflfli )?1 ?V?12(Fflfli )?1 ? (^fl?fl0)k = k mX i=1 V?12 ? V12 ^V?12(^Fflfli )?1 ?(Fflfli )?1 ? (Fflfli )12(Fflfli )?12(^fli ?fl0)k ? mX i=1 kV?12kkV12 ^V?12(^Fflfli )?1 ?(Fflfli )?1kk(Fflfli )12kk(Fflfli )?12(^fli ?fl0)k ? I1 +I2 55 where I1 = mX i=1 kV?12kkV12 ^V?12 ?Ikk(^Fflfli )?1kk(Fflfli )12kk(Fflfli )?12(^fli ?fl0)k I2 = mX i=1 kV?12kk(^Fflfli )?1 ?(Fflfli )?1kk(Fflfli )12kk(Fflfli )?12(^fli ?fl0)k Consider kV12 ^V?12 ?Ik = k(V12 ? ^V12)^V?12k (3.49) ?kV? ^Vk12k^V?12k (3.50) The step from (3.49) to (3.50) follows from Theorem X.1.1 of Bhatia?s [3]. A function f is said to be matrix monotone of order n if it is monotone with respect to this order n ? n Hermitian matrices, i.e., if A ? B implies f(A) ? f(B). If f is matrix monotone of order n for all n then we say f is matrix monotone or operator monotone. Then the following theorem holds: Let f be an operator monotone function on [0;1] such that f(0) = 0. Then for all positive operators A, B, kf(A)?f(B)k? f(kA?Bk): Here our operator monotone function is the square root function. 56 In addition, we have 1 NkV? ^Vk = 1 Nk X i (Fflfli )?1 ? X i (^Fflfli )?1k ? 1N X i k(Fflfli )?1 ?(^Fflfli )?1k = 1N X i ni 1 ni(F flfl i ) ?1 ? 1 ni( ^Fflfli )?1 : Since here we have the assumption that jvi0j is bounded, the assumption of Theorem 3.1.3 and the result of Lemma 3.1.2, considering condition (3) of Theorem 3.1.4 we have NkV?12k2 = Nk ?X i (Fflfli )?1 !?1 k = N ? ?min( X i (Fflfli )?1) !?1 ? N ?X i ?min(Fflfli )?1 !?1 = N ?X i ni 1n i ??1maxFflfli !?1 = O(1): and 1 ni ? (Fflfli )?1 ?(^Fflfli )?1 ? = 1n i ?X j xijkxijl(b00ij ?b00(^?ij))+ ( P j xijkb 00 ij)( P j xijlb 00 ij)P j b 00 ij ?( P j xijkb 00 ij(^?ij))( P j xijlb 00 ij(^?ij))P j b 00 ij(^?ij) ! 1?k?s;1?l?s Therefore, 1 nik((F flfl i ) ?1 ?(^Fflfl i ) ?1)k? II1 +II2 57 where II1 = 1n i sup kuk=1 flfl flfl fl X j (b00ij(^?ij)?b00ij)( X k ukxijk)2 flfl flfl fl (3.51) II2 = 1n i sup kuk=1 flfl flfl fl ?(P k uk P j xijkb 00 ij) 2 P j b 00 ij ? ( P k uk P j xijkb 00 ij(^?ij)) 2 P j b 00 ij(^?ij) !flfl flfl fl: (3.52) Let rij = b00ij(^?ij)?b00ij(?ij0) and consider flfl flfl fl (Pk ukPj xijkb00ij)2P j b 00 ij ? ( P k uk P j xijkb 00 ij(^?ij)) 2 P j b 00 ij(^?ij) flfl flfl fl = flfl flfl fl P j b 00 ij(^?ij)( P k uk P j xijkb 00 ij) 2 ?P j b 00 ij( P k uk P j xijkb 00 ij(^?ij)) 2 P j b 00 ij P j b 00 ij(^?ij) flfl flfl fl ? flfl flfl fl (Pk ukPj xijkrij)2P j b 00 ij flfl flfl fl+ flfl flfl fl 2(Pk ukPj xijkb00ij(^?ij))(Pk ukPj xijkrij)P j b 00 ij flfl flfl fl + flfl flfl fl P j rij( P k uk P j xijkb 00 ij(^?ij)) 2 P j b 00 ij P j b 00 ij(^?ij) flfl flfl fl; By the Cauchy-Schwartz inequality II1 ? 1n i sup kuk=1 flfl flfl fl X j (b00ij(^?ij)?b00ij) X k u2k X k x2ijk flfl flfl fl ? K 02 ni X j j12rijj; and II2 ? 1n i (PkPj x2ijk)(Pj r2ij)P j b 00 ij + 1n i (Pj jrijj)Pk(Pj xijkb00ij(^?ij))2P j b 00 ij(^?ij) P j b 00 ij + 1n i sup kuk=1 (Pk jukjPj jxijkjb00ij(^?ij))(Pk jukjPj jxijkjjrijj)P j b 00 ij ? K02 +2sK02 4bl ? 1 ni X j j12rijj+ K02 bl ? 1 ni X j 1 2rij ?2 : Since Psk=1 x2ijk < K0 where K0 is a positive constant and according to Fahrmeir and Kaufmann [11], given vi0, we have (3.43). 58 By the delta method and assumptions of Theorem 3.1.3, we obtain k 1n i ? (Fflfli )?1 ?(^Fflfli )?1 ? k = Op(1=pni): BytheassumptionofTheorem3.1.2, wehave, (1=N)kV?^Vk? Op(1)=NPipnici = op(1) and conditions (1) and (2) of Theorem 3.1.3 are satisfled. We can prove (1=N)kV ? ^Vk = op(1) and NkV?1k = O(1) which leads the result Nk^V?1k = Op(1). Also it easy to get (pni=N)kV ? ^Vk = Op(1) and (1=ni)k(^Fflfli )?1k = Op(1), then we can get from (3.47) to (3.48) which leads the conclusion that ^^flw is consistent. By (2) of Theorem 3.1.2, we have I1 !p 0 and I2 !p 0. The result of Theorem 3.1.3 follows. The similar argument follows for replacing ^Fflfli by ^^Fflfli since by the conclusion of Theorem 3.1.3, we have, according to Fahrmeir and Kaufmann [11] and Slutsky?s Theorem, given vi0, pn i(^^?ij ??ij0) ? AN(0;niFaai ): By the delta method and assumptions of Theorem 3.1.2, we obtain k 1n i ? (Fflfli )?1 ?(^^Fflfli )?1 ? k = Op(1=pni): The asymptotic normality result follows. 3.4.4 Proof of Corollary 3.1.4 We prove the unconditional convergence in (3.8); the same method applies to (3.9) and (3.10). Write Zm; N = ?P m i=1fF flfl i g ?1 ??1=2 (^fli ? fl0), Let B be any p-dimensional rectangle with rational coordinates for all vertices. We know from Theorem 3.1.2 59 that P[Zm; N 2 Bjv0] ! P(Z 2 B) where Z ? N(0;I). From properties of conditional expectation, E[P(Zm; N 2 Bjv0)] = P[Zm; N 2 B]. where the expectation is taken over the distribution of v0. Almost surely, P[Zm; N 2 Bjv0] ? 1, so we can use the Dominated Convergence Theorem to write limP[Zm; N 2 B] = limE[P(Zm; N 2 Bjv0)] = E[limP(Zm; N 2 Bjv0)] = E[P(Z 2 B)] = P(Z 2 B): In fact, since the rationals are countable, the above convergence holds simul- taneously over all rectangles whose vertices have rational components, and hence for all Borel sets. 3.4.5 Proof of Theorem 3.2.1 Consider (Q?22)?1: (Q?22)?1 = (ZT~BZ+ ?1mJ)?1 = (ZT~BZ)?12(I+A)?1(ZT~BZ)?12 (3.53) where A = (ZT~BZ)?12 ?1mJ(ZT~BZ)?12: 60 Since ZT~BZ is a symmetric matrix, using the l2 norm of a symmetric matrix we have: k(ZT~BZ)?12k2 = sup kuk=1 juT(ZT~BZ)?12(ZT~BZ)?12uj = k(ZT~BZ)?1k: Then kAk?k(ZT~BZ)?12k2k?1mJk = j?1jmin ijE?ij : Recall that Bij deflned in (3.24) is positive and lies between b00ij and b00ij(^?ij). By Jiang?s [14] condition (J2), we have the following: kAk? j?1jmin ijE?ij ? 1=(n??N) = o(1=pn?): where n? = mini ni. By applying the Neumann series we have (Q?22)?1 = (ZT~BZ)?12(I?A+A2 ?A3 +???)(ZT~BZ)?12 After the calculation we can get a closed form of (Q?22)?1 as the following: (Q?22)?1 = 2 66 66 66 4 (E?1)?1 +(E?1)?2C??1 ??? (E?1)?1(E?m)?1C??1 ... ... ... (E?1)?1(E?m)?1C??1 ??? (E?m)?1 +(E?m)?2C??1 3 77 77 77 5 m?m (3.54) where C??1 = 1X i=1 (?1m)i( mX k=1 (E?k)?1)i?1(?1)i = ? ?1? 1 Pm k=1(E ? k)?1 +m : (3.55) and E?i = Pj Bij. 61 Recall (3.19). We have pm(^a?a 0) = pm(Q? 11:2) ?1?1T N ?Q ? 12(Q ? 22) ?1ZT?(y??) +pm(Q?11:2)?1Q?12(Q?22)?1?1v01m where Q?12 = (E?1; ::: ;E?m) and Q?11:2 = ?m2C??1: Then, jpm(Q?11:2)?1?1TN(y??)?Q?12(Q?22)?1ZT(y??)?j = 'pmj(Q?11:2)?1?1TN ?Q?12(Q?22)?1ZT?(y??)j? = pm flfl flfl fl 1 m X i (E?i )?1 X j (yij ?b0ij) flfl flfl fl ? op(1)pmn ? j X i X j (yij ?b0ij)j = op(1): (3.56) Since by (J1) we have 1=pNjPiPj(yij ?b0ij)j = Op(1). Then pm(^a?a0) and pm(Q?11:2)?1Q?12(Q22)?1?1v01m have the same asymp- totic distribution. Furthermore, pm(Q? 11:2) ?1Q? 12(Q22) ?1?1v01m = pm ?1m2C? ?1 ? mv0?1 +v0?1mC??1 X i (E?i )?1 ! = ? ?1mC? ?1 pmv 0 ? v0?1p m X i (E?i )?1: Since by (J1), condition (2) of Theorem 3.1.3 and v0 !p 0, ? ?1mC? ?1 = (1+ ?1 P i(E ? i ) ?1 m ) !p 1 (3.57) v0?1 P i(E ? i ) ?1 pm !p 0 (3.58) 62 and the fact that the vi0 are iid with mean 0 and variance 2v, by the Central Limit Theorem pmv 0 !D N(0; 2v): By Slutsky?s theorem, asymptotic normality follows. 63 Chapter 4 Logistic 2?2?m table This Chapter illustrates conditional asymptotic normality results of Theorem 3.1.3 and 3.1.4. In order to check the asymptotic results, simulations are performed to explore the asymptotic properties of our estimator. We also apply the estimator to a real data set to compare with Mantel-Haenszel estimator. 4.1 Logistic 2?2?m table We use this example to illustrate Theorem 3.1.2 and 3.1.3. The 2 ? 2 ? m table example is set up as the following: logitP(yij = 1jx) = fii +flxij where xij=1 or 0 and the table is the following: Table 4.1: 2?2?m table y = 0 y = 1 x = 0 a1 b1 x = 1 c1 d1 ... y = 0 y = 1 x = 0 am bm x = 1 cm dm For 2?2?m tables, there are two types of models: Model I : the number of tables m remains flxed but individual cell sizes increase without bound. 64 Model II : the number of tables m increases but the cell sizes remained bounded. Breslow [5] studied the properties of four commonly used estimators of the odds ratio in Model II: Consider a series of m pairs of independent binomial observations (di;bi) with denominators (ni = di + ci; mi = ai + bi) and success probabilities (p1i; p0i) for i = 1; ::: ;m. Its assumed throughout that the odds ratio ? = (p1iq0i)=(p0iq1i) remains constant from table to table. One of the earliest estimators (Woolf, 1955) of the common odds ratio ? is the empirical logit estimator deflned by log(^?G) = " mX i=1 wi log ?(a i +?)(di +?) (ci +?)(bi +?) #,?X i wi ! where the weights are wi = (1=(ai +?)+1=(bi +?)+1=(ci +?)+1=(di +?))?1 and ? > 0 is a constant added to each cell to avoid zero denominators. The choice ? = 1=2 is the most popular because it is thought to reduce the bias in small examples (Anscombe, 1956). There are unconditional MLE and conditional MLE (the details omitted here). The fouth and flnal estimator is given by famous formula of Mantel and Haenszel (1959): ^?MH = P i(aidi=Ni)P i(cibi=Ni) where Ni = ai +bi +ci +di = mi +ni. Due largely to its simplicity, ^?MH has been widely used by practicing statisticians and epidemiologists. Breslow [5] showed that 65 in the Model II setting, the empirical logit estimator does not converge to the true odds ratio. The Mantel-Haenszel estimator is consistent and retains good e?ciency even for moderately large odds ratios in the Model II setting (sparse data). Let Ri = aidi=Ni and Si = cibi=Ni, Breslow [5] proposed an empirical estimate of Var[^?MH] as mdVarE(^?MH) = P i(Ri ? ^?MHSi)=m (Pi Si=m)2 Robins, Breslow and Greenland [25] proposed a new estimator of Var[^?MH] as: mdVarUS(^?MH) = m ?P i PiRi 2R2+ + P i(PiSi +QiRi) 2R+S+ + P i QiSi 2S2+ ? (^?MH)2 where Pi = (ai + di)=Ni, Qi = (ci + bi)=Ni, R+ = Pi Ri and S+ = Pi Si. The corresponding estimators of mVar(log ^?MH) are mVi = mdVari(^?MH)=(^?MH)2 where i 2 US; E means using dVarUS, dVarE respectively. The Robins-Breslow- Greenland [25] estimator is consistent under both Model I and Model II. The estimating equation xti(yi ??i) is 0 = bi +di ?(ai +bi)exp(fii)=(1+exp(fii)) ?(ci +di)exp(fii +fl)=(1+exp(fii +fl)) (4.1) 0 = di ?(ci +di)exp(fii +fl)=(1+exp(fii +fl)) (4.2) and we can get the MLE of fl as log(aidi=bici) = logai +logdi ?logbi ?logci. Let Zi0 = (ai ?E(aijfii0))= p Var(aijfii0) 66 and Zi1 = (ci ?E(cijfii0))= p Var(cijfii0) For simplicity we assume ai + bi = ni and ci + di = ni. Since given fii0, ai belongs to Binomial(ni;1=(1+exp(fii0))) we have ai = ni1+exp(fi i0) + p Var(aijfii0) ? ai ?E(aijfii0)p Var(aijfii0) ! = ni1+exp(fi i0) +Zi0exp(fii0=2) pn i 1+exp(fii0) = ni1+exp(fi i0) 1+Zi0exp(fii0=2)pn i ? ; logai ? log n i 1+exp(fii0) ? +Zi0exp(fii0=2)pn i ?Z2i0exp(fii0)2n i +o( 1n i ): By the same argument we have ?logbi ? ?log n i exp(fii0) 1+exp(fii0) ? +Zi0exp(?fii0=2)pn i +Z2i0exp(?fii0)2n i +o( 1n i ); ?logci ? ?log n i 1+exp(fii0 +fl0) ? ?Zi1exp((fii0 +fl0)=2)pn i ; +Z2i1exp(fii0 +fl0)2n i +o( 1n i ) logdi ? log n i exp(fii0 +fl0) 1+exp(fii0 +fl0) ? ?Zi1exp(?(fii0 +fl0)=2)pn i ?Z2i1exp(?(fii0 +fl0))2n i +o( 1n i ): Then ^fli = fl0 + Zi0p ni (exp(fii0=2)+exp(?fii0=2)) ? Zi1pn i (exp((fii0 +fl0)=2)+exp(?(fii0 +fl0)=2)) ? Z 2 i0 2ni (exp(fii0)?exp(?fii0)) + Z 2 i1 2ni (exp(fii0 +fl0)?exp(?(fii0 +fl0)))+op( 1 ni): 67 The bias of ^fli is the following: E((^fli ?fl0)jfii0) = exp(fii0)(exp(fl0)?1)?exp(?fii0)(exp(?fl0)?1)2n i +o( 1n i ): Let G(x) = (4+4exp(?x)+4exp(x))?1 ; (4.3) in order to check the hypotheseses of Theorem 3.1.2, flrst we consider 1ni(Fflfli )?1 = G(?ij) where ?ij = fii +xijfl and xij = 1 or 0. Since we assume jfii0j is bounded, it is obvious that infi G(?ij0) > ?1 > 0 and supi G(?ij0) < M1 < 1, for some positive constants ?1 and M1. Similar arguments follow for ni(Fni)?1. We need to verify condition (1) of Theorem 3.1.3. In this case we have the log-likelihood function for ith group as the following: l = bifii +di(fii +fl)?ni log(1+exp(fii))?ni log(1+exp(fii +fl)): Then ? 1n i l00(fii0;fl0) = 0 BB @ Ii1 +Ii2 Ii2 Ii2 Ii2 1 CC A (4.4) where Ii1 = exp(fii0)(1+exp(fi i0))2 Ii2 = exp(fii0 +fl0)(1+exp(fi i0 +fl0))2 Since in this special example Fflfli as a scalar instead of a matrix, the Lindeberg condition veriflcation can be simplifled and for condition (1) of Theorem 3.1.2 we 68 only need to verify K1 < mini F flfl i maxi Fflfli < K2 where K1 and K2 are positive constants. Obviously here K2 = 1 since here we have Fflfli = (1=ni)((1+exp(?fl0))exp(?fii0)+(1+exp(fl0))exp(fii0)+4). Consider condition (3) of Theorem 3.1.2: Let ?maxF?1ni ?minF?1ni = supkuk=1 fi(u) infkuk=1 fi(u) where fi(u) = (1+exp(fii0 +fl0)) 2 exp(fii0 +fl0) u 2 2 +(u1 ?u2) 2(1+exp(fii0)) 2 exp(fii0) : For the further calculation we write fi(u) = exp(fii0) ?(exp(?fi i0)+exp(fl0))2 exp(fl0) u 2 2 +(u1 ?u2) 2(1+exp(?fii0))2 ? = exp(?fii0) ?(1+exp(fi i0 +fl0))2 exp(fl0) u 2 2 +(u1 ?u2) 2(1+exp(fii0))2 ? Let B1 = (exp(?fii0)+exp(fl0)) 2 exp(fl0) ; A1 = (1+exp(?fii0)) 2: and B2 = (1+exp(fii0 +fl0)) 2 exp(fl0) ; A2 = (1+exp(fii0)) 2: Here we have u1 = p1?u22 for A1, B1 and A2, B2 as the above, we have (u22)1 = 12 ? 12 q 1?4A21=(4A21 +B21) (u22)2 = 12 ? 12 q 1?4A22=(4A22 +B22) 69 So we have max i supkuk=1 fi(u) infkuk=1 fi(u) = max i A1(1?2pA21=(4A21 +B21))+B1(12 + 12p1?4A21=(4A21 +B21)) A1(1?2pA21=(4A21 +B21))+B1(12 ? 12p1?4A21=(4A21 +B21)) and we can see that when fii0 ! +1we have maxi?supkuk=1 fi(u)=infkuk=1 fi(u)? = O(1). Also max i supkuk=1 fi(u) infku=1kfi(u) = max i A2(1?2pA22=(4A22 +B22))+B2(12 + 12p1?4A22=(4A22 +B22)) A2(1?2pA22=(4A22 +B22))+B2(12 ? 12p1?4A22=(4A22 +B22)) we can see that when fii0 ! ?1 we have maxi?supkuk=1 fi(u)=infkuk=1 fi(u)? = O(1). Consider F?1ni = ? ?l00(fii0;fl0) ??1 = 1=ni 0 BB @ I?1i1 ?I?1i1 ?I?1i1 I?1i1 +I?1i2 1 CC A: (4.5) Here kuk2 = u21 +u22 = 1 so that mX i kF?1ni k = X i 1 ni supkuk=1 (1+exp(fi i0 +fl0))2 exp(fii0 +fl0) u 2 2 +(u1 ?u2) 2(1+exp(fii0)) 2 exp(fii0) ? ? mX i 1 ni (6+(2+exp(?fl0))exp(?fii0)+(2+exp(fl0))exp(fii0)) ? 6mmin i ni + mX i=1 c1 exp(jfii0j) 1n i ? 6mmin i ni + mc1min i ni 1 m mX i=1 exp(jfii0j): (4.6) Ifwehaveconditionthe(1=m)Pmi=1 exp(jfii0j) !a:s: E(exp(jfii0j)), thenbycondition (2) of Theorem 3.1.2, (4.6)= 6m=mini ni + Op(m=mini ni) = op(1), the asymptotic 70 relation k(1=pni)[Fni ?Fni( ?)]k = Op(1) can be verifled following the same argu- ment in the proof of Theorem 3.1.3, we verifled assumptions of Theorem 3.1.2 which gives kpni(Fni)?T=2k = O(1), so condition (3) of Theorem 3.1.3 is satisfled. Consider condition (4) of Theorem 3.1.2 and here we let ?(t) = t: We have: ^fli ?E(^flijfii0) = Zi0pn i (exp(fii0=2)+exp(?fii0=2)) ? Zi1pn i (exp((fii0 +fl0)=2)+exp(?(fii0 +fl0)=2)) ? 1+Z 2 i0 2ni (exp(fii0)?exp(?fii0)) + 1+Z 2 i1 2ni (exp(fii0 +fl0)?exp(?(fii0 +fl0)))+o( 1 ni); and ?K i1Zi0p ni ? Ki3(1+Z2i0) 2ni + Ki4(1+Z2i1) 2ni ? Ki2Zi1p ni +o( 1 ni) ?4 ? ?flfl flflKi1Zi0p ni ? Ki2Zi1p ni flfl flfl+ flfl flflKi4(1+Z2i1) 2ni ? Ki3(1+Z2i0) 2ni +o( 1 ni) flfl flfl ?4 ? 8 "flfl flflKi1Zi0p ni ? Ki2Zi1p ni flfl flfl 4 + flfl flflKi4(1+Z2i1) 2ni ? Ki3(1+Z2i0) 2ni +o( 1 ni) flfl flfl 4# ? 64 "flfl flflKi1Zi0p ni flfl flfl 4 + flfl flflKi2Zi1p ni flfl flfl 4 + flfl flflKi4(1+Z2i1) 2ni flfl flfl 4 + flfl flflo( 1 ni)? Ki3(1+Z2i0) 2ni flfl flfl 4# ? 64 ?K4 i1Z 4 i0 n2i + K4i2Z4i1 n2i + K4i4(1+Z2i1)4 16n4i +o( 8 n4i )+ 8K4i3(1+Z2i0)4 16n4i ? ? 64 ?K4 i1Z 4 i0 n2i + K4i2Z4i1 n2i + K4i4(1+Z8i1) 2n4i +o( 8 n4i )+ 4K4i3(1+Z8i0) n4i ? : 71 Then we have E((^fl?ti ^fl?i )2jfii0) = (Fflfli )?2E (? Ki1Zi0p ni ? Ki3(1+Z2i0) 2ni + Ki4(1+Z2i1) 2ni ? Ki2Zi1p ni +o( 1 ni) ?4 jfii0 ) ? 64(Fflfli )?2E ??K4 i1Z 4 i0 n2i + K4i2Z4i1 n2i + K4i4(1+Z8i1) 2n4i +o( 8 n4i )+ 4K4i3(1+Z8i0) n4i ? jfii0 where Ki1 = exp(fii0=2)+exp(?fii0=2), Ki2 = exp((fii0+fl0)=2)+exp(?(fii0+fl0)=2), Ki3 = exp(fii0)?exp(?fii0), Ki4 = exp(fii0 +fl0)?exp(?(fii0 +fl0)) . Since here Zi0 and Zi1 are normalized binomial random variables for ni with difierentmeansni=(1+exp(fii0)), ni=(1+exp(fii0+fl0))andvariancesni exp(fii0)=(1+ exp(fii0)), ni exp(fi0+fl0)=(1+exp(fii0+fl0)) respectively, which are both in order of ni, by M. Znidaric [38] we have E(Z4i0jfii0) = (n2i +O(n?1i ))Op(n?2i ) = Op(1). By the same argument we have E(Z8i0jfii0) = Op(1), E(Z4i1jfii0) = Op(1) and E(Z8i1jfii0) = Op(1), so it is easy to show that condition (4) of Theorem 3.1.2 satisfled. Now we consider the following: ? ( X i (Fflfli )?1)?12 X i (Fflfli )?1(E(^flijvi0)?fl0) !2 = (X i (Fflfli )?1 K i4 ?Ki3 2ni +op( 1 ni) ?)2 ? (X i ni?I?1i1 +I?1i2 ??1 )?1 ? m X i n2i I i1Ii2 Ii1 +Ii2 ?2 K i4 ?Ki3 2ni +op( 1 ni) ?2(X i ni?I?1i1 +I?1i2 ??1 )?1 (4.7) ? 2m X i ?I?1 i1 +I ?1 i2 ??2?(K2 i4 +K 2 i3)+op(1) ? (X i ni?I?1i1 +I?1i2 ??1 )?1 (4.8) ? Op(1) mmin i ni (4.9) The step from (4.7) to (4.8) follows because of (1) and (2) of Theorem 3.1.2, the fact that jfii0jis bounded and maxi(K2i3 +K2i4)=mini(I?1i1 +I?1i2 ) = Op(1): The result 72 of Theorem 3.1.2 follows. Consider (1) of Theorem 3.1.3. We have 1 Nk ^V?Vk = 1 Nk X i nifG(^?ij)?G(?ij0)gk ? 1N mX i=1 nikG(^?ij)?G(?ij0)k = 1N X i pn iOp(1) = op(1): The above conclusion follows from the delta method since, given fii0, pn i(G(^?ij)?G(?ij0)) ? AN(0;ni(G 0(?ij0))2(Faa i +2F afl i +F flfl i )): Also NjV ?1j = Nj(Pi niG(?ij0)j = O(1) and condition (1) of Theorem 3.1.3 is satisfled. For condition (2) of Theorem 3.1.4, j(^Fflfli )?1 ? (Fflfli )?1j = nijG(^?ij) ? G(?ij0)j = Op(pni), the condition (3) is easy to verifled. The result of Theorem 3.1.3 follows. Similar argument follow if we replace ^flij by ^^?ij since ,given fii0, pn i(G(^^?ij)?G(?ij0)) ? AN(0;ni(G 0(?ij0))2Faa i ): 4.2 Simulation results for 2?2?m table. We consider the 2?2?m table as the following set up: logitP(yij = 1jfii;xij) = fii +xijfl where fii is the random efiect and fl is flxed efiect, xij=0 or 1. The fiis are uniformly distributed between -0.8 and 0.8 since from the Section 4.1 we assumejfii0j bounded. We maximize the following likelihood function to get estimator of ^fli and ^fii for each group according to Table 4.1. L = 1 1+exp(fii) ?ai exp(fi i) 1+exp(fii) ?bi 1 1+exp(fii +fl) ?ci exp(fi i) 1+exp(fl +fii) ?di 73 The weighted sum of the ^fl (using both true and estimated weights) are simulated. It is easy to solve the above equation and get ^fli = log(aidi=cibi) and ^fii = logbi=ai for the ith table. 4.2.1 Unconditional convergence in distribution We simulated both balanced and unbalanced m tables with n observations per table. We generated product binomial data for the flrst and second rows with success probabilitiesexp(fii0)=(1+exp(fii0))andexp(fl0+fii0)=(1+exp(fl0+fii0))respectively. Here ai +bi +ci +di = n and for the balanced case we have ai +bi = ci +di = n=2. For the unbalanced case we either have ai+bi = n=3 or ai+bi = n=4. We simulated with either fl0 = 1 or = 0:5. Various combinations of (m;n) and true values of fl were used for both balanced and unbalanced settings. For each combination, 1000 replications of random efiects fii and m groups of ai, bi, ci and di were generated. Estimated regression coe?cients for various choices of (m;n) and fl0 with fii ? Unif(?0:8;0:8) are summarized in Table 4.2 and Table 4.3. For the balanced setup, the tables display the means and standard errors of the simulated values of (^flw ?fl0)=s:e:, (^^flw ?fl0)=ds:e:, ^flw ?fl0, ^^flw ?fl0, 95% confldence bounds for (^flw?fl0) and (^^flw?fl0) based on Student?s t. From Table 4.2 and Table 4.3 under the balanced setup, we can see that for fl0 = 0:5 in all the combinations ^flw is approximately unbiassed and for all combinations with fl0 = 1 except for (100;50) ^flw is slightly biased. For fl0 = 0:5 and fl0 = 1 ^^flw is approximately biased in all 74 combinations except the combination fl0 = 1 or 0.5, m = 100 and n = 50. For all of the combinations in both tables, Shapiro-Wilk test p values are above 0.24 except for fl0 = 0:5 (20;400) and fl0 = 1 and (40;800) which means in those combinations, ^flw and ^^flw given fi0 are normal. Likewise, Kolmogorov-Smirnov test p values are larger than 0.06. From Table 4.2 and Table 4.3, we can see that bias/se< 0:208 except for ex- treme case (m = 100;n = 50) so the normal inference is not greatly afiected in the cases where consistency results do not hold. For the combination (100;50), in order to avoid 0 observations in the cells we use (Woolf, 1955)?s adjusted method to have ^fli = logf[(ai +1=2)(di +1=2)]/[(ci +1=2)(bi +1=2)]gand ^fii = log(fbi +1=2g/fai +1=2g). We compared our estimators ^flw with true weights and ^^flw with estimated weights to the Mantel-Haenszel estimator ^flMH. To see whether our estimators are more e?cient, we compared the standard deviation of our estimators from 1000 replications with that of the Mantel-Haenszel estimator. From the results of difierent combinations of (m;n) and fl0, our estimators are almost as e?cient as the Mantel- Haenszel estimator. Since ^^flw is constructed by plugging the MLE ^fli for ith group into the weight formula, we can also construct another empirical estimator of fl by plugging in ^^flw. We ran similar simulations of this new empirical estimator, which still introduced more positive bias. But the normality results hold for this new empirical estimator. It suggests that in practice, one can stay with the simple empirical estimator by plugging MLE ^fli from each group. The plots show that ^flw and ^^flw are approximately normal, with departures from normality in the extreme tails in this balanced setup. 75 Table 4.2: Simulated estimates of logistic odds ratio in 2?2?m balanced table. (^flw ?fl0) (^^flw ?fl0) (m, n) mean std 95% CI for (^flw ?fl0) mean std fl0 = 0:5, balanced set up (10,200) 0.006 0.095 (0.000, 0.001) -0.0001 0.093 (10,300) 0.002 0.077 (-0.003, 0.007) -0.002 0.076 (20,400) 0.002 0.067 (-0.002, 0.007) -0.004 0.066 (30,600) 0.001 0.032 (-0.002, 0.001) -0.002 0.032 (40,800) -0.002 0.023 (0.000, 0.003) 0.000 0.023 (100,50) -0.001 0.060 (-0.005, 0.002) -0.028 0.056 fl0 = 1, balanced setup (10,200) 0.013 0.100 (0.006, 0.019) -0.002 0.098 (10,300) 0.012 0.080 (0.008, 0.017) 0.003 0.079 (20,400) 0.007 0.050 (0.004, 0.010) -0.001 0.050 (30,600) 0.004 0.033 (0.002, 0.006) -0.001 0.032 (40,800) 0.005 0.024 (0.003, 0.006) 0.001 0.024 (100,50) 0.000 0.063 (-0.004, 0.004) 0.061 0.057 76 Table 4.3: Simulated standardized estimates of log odds ratio (standardized by true and estimated conditional standardized error) in 2?2?m balanced table . (^flw ?fl0)=s:e: (^^flw ?fl0)=ds:e: (m, n) mean std 95% CI for (^^flw ?fl0) mean std fl0 = 0:5, balanced set up (10,200) 0.064 1.016 (-0.006, 0.006) -0.005 0.995 (10,300) 0.026 1.011 (-0.007, 0.003) -0.029 0.997 (20,400) 0.037 1.017 (-0.008, 0.0003) -0.060 0.993 (30,600) 0.016 1.019 (-0.001, 0.002) -0.054 1.014 (40,800) 0.064 0.990 (-0.002, 0.001) -0.006 0.986 (100,50) -0.023 1.032 (-0.031, -0.024) -0.475 0.958 fl0 = 1, balanced set up (10,200) 0.131 1.029 (-0.008, 0.004) -0.026 1.000 (10,300) 0.156 1.005 (-0.002, 0.008) 0.026 0.985 (20,400) 0.144 1.030 (-0.004, 0.003) -0.013 1.021 (30,600) 0.120 1.002 (-0.003 0.001) -0.037 1.000 (40,800) 0.190 0.979 (-0.001, 0.002) 0.035 0.972 (100,50) -0.001 1.043 (-0.064, -0.057) -0.994 0.933 77 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=30, n1=600, betawhat, beta0=0.5 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=30, n1=600, betawhathat, beta0=0.5 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=40, n1=800, betawhat, beta0=0.5 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=40, n1=800, betawhathat, beta0=0.5 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -4 -2 0 2 4 m=30, n1=600, betawhat, beta0=1 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -4 -2 0 2 m=30, n1=600, betawhathat, beta0=1 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -4 -2 0 2 m=40, n1=800, betawhat, beta0=1 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -4 -2 0 2 m=40, n1=800, betawhathat, beta0=1 Figure 4.1: Q-Q plots for (^flw?fl0) and (^^flw?fl0) standardized by true and estimated conditional standard error, for various values of (m;n) in logistic 2?2?m balanced setup. 78 For each combination, 1000 replications of random efiects fii and m groups of ai, bi, ci and dis were generated. Estimated regression coe?cients for various choices of (m;n), fl0 with fii ? Unif(?0:8;0:8) are summarized in Table 4.4 and Table 4.5. For the unbalanced setup, the tables display the means and standard errors of the simulated values of (^flw?fl0)=s:e:, (^^flw?fl0)=ds:e:, (^flw?fl0), (^^flw?fl0), 95% confldence bounds for (^flw?fl0) and (^^flw?fl0) based on Student?s t. From Table 4.4 and Table 4.5 for fl0 = 0:5 or 1 with unbalanced setup (1=3;2=3) (meaning ai +bi = n=3) and (1=4;3=4) (meaning ai + bi = n=4), we can see that in all the combinations ^flw is approximately unbiased hold except for the combination with fl0 = 1, (1=4;3=4), m = 10, n = 200 or fl0 = 0:5, (1=3;2=3), m = 10, n = 300 and fl0 = 0:5 with m = 30, n = 600 or m = 20, n = 400. But their Shapiro-Wilk test p values are above 0.19 except fl0 = 0:5, (20;400) which means in those combinations, ^flw given fi0 are conditionally normal and Kolmogorov-Smirnov test p values are larger than 0.10. For ^^flw in all the combinations except for fl0 = 1 or 0.5, (1=4;3=4) or (1=3;2=3), m = 100, n = 60 the estimator is approximately unbiased. From Table 4.4 and Table 4.5, we can see that bias/se< 0:109 except for extreme combinations like (m = 100;n = 60), so the normal inference is not greatly afiected in these cases because the bias is small. For the combination (100;60), in order to avoid 0 observations in the cells we use (Woolf, 1955)?s adjusted method to have ^fli = log((ai +1=2)(di +1=2)=(ci +1=2)(bi +1=2)) 79 and ^fii = log(bi +1=2=ai +1=2). In order to compare our estimators ^flw with true weights and ^^flw with es- timated weights to the Mantel-Haenszel estimator ^flMH and to see whether our estimators are more e?cient, we compared the standard deviation of our estimators from 1000 replication with of Mantel-Haenszel estimator. From the results of dif- ferent combinations of (m;n) and fl0, our estimators are almost the same e?cient as Mantel-Haenszel estimator. Since ^^flw is constructed by plugging MLE ^fli for ith group, we can also construct another empirical estimator of fl by plugging in ^^flw, we run the similar simulations under this new empirical estimator which introduced more positive bias. But the normality results hold for this new empirical estimator. It suggests that in practice, one can stay with the simpler empirical estimator by obtained by plugging MLE ^fli into the formula for the weight matrix. The plots show that ^flw and ^^flw are approximately normal, with departures from normality in the extreme tails in logistic 2?2?m unbalanced setup. Since when we introduce 1=2 adjustment into estimator we reduce bias, we tried simulation for (10;200) in balanced setup and unbalanced setup (1=4;3=4) and (10;300) in unbalanced setup (1=3;2=3). Adding the 1=2 adjustment can reduce bias for ^flw ?fl0 and ^^flw ?fl0 a lot, especially for the combination (10;200). 80 Table 4.4: Simulated estimates of log odds ratio (standardized by true and estimated conditional standardized error) in 2?2?m unbalanced table. (^flw ?fl0) (^^flw ?fl0) (m, n) mean std 95% CI for (^flw ?fl0) mean std fl0 = 1, unbalanced setup (1/4,3/4) (10,200) 0.012 0.110 (0.005, 0.019) 0.005 0.107 (20,400) 0.003 0.054 (-0.0003, 0.006) -0.0002 0.053 (100,60) 0.001 0.063 (-0.003 0.004) 0.023 0.057 fl0 = 1, unbalanced setup (1/3,2/3) (10,300) 0.010 0.084 (0.005, 0.015) 0.004 0.083 (30,600) 0.002 0.034 (-0.0003, 0.004) -0.001 0.034 (100,60) -0.001 0.059 (-0.004, 0.003) -0.032 0.055 fl0 = 0:5, unbalanced setup (1/3,2/3) (10,300) 0.005 0.082 (-0.0004, 0.010) 0.002 0.081 (30,600) 0.002 0.033 (0.0004, 0.004) 0.0003 0.032 (100,60) -0.003 0.056 (-0.007, 0.0004) -0.016 0.052 fl0 = 0:5, unbalanced setup (1/4,3/4) (10,200) 0.006 0.109 (-0.0005, 0.013) 0.003 0.106 (20,400) 0.005 0.054 (0.002, 0.008) 0.004 0.054 (100,60) -0.001 0.064 (-0.003, 0.005) -0.009 0.058 81 Table 4.5: Simulated standardized estimates of log odds ratio (standardized by true and estimated conditional standardized error) in 2?2?m unbalanced table . (^flw ?fl0)=s:e: (^^flw ?fl0)=ds:e: (m, n) mean std 95% CI for (^^flw ?fl0) mean std fl0 = 1, unbalanced setup (1/4,3/4) (10,200) 0.110 1.005 (-0.002, 0.012) 0.041 0.969 (20,400) 0.056 0.985 (-0.004, 0.003) -0.006 0.965 (100,60) 0.008 1.015 (-0.027, -0.020) -0.372 0.899 fl0 = 1, unbalanced setup (1/3,2/3) (10,300) 0.122 1.011 (-0.001, 0.009) 0.047 0.992 (30,600) 0.054 1.004 (-0.003, 0.001) -0.035 0.996 (100,60) -0.012 1.025 (-0.035, -0.028) -0.542 0.939 fl0 = 0:5, unbalanced setup (1/3,2/3) (10,300) 0.058 1.018 (-0.003, 0.007) 0.025 1.001 (30,600) -0.049 0.996 (-0.002, 0.002) 0.010 0.986 (100,60) -0.055 1.005 (-0.019, -0.013) -0.283 0.924 fl0 = 0:5, unbalanced setup (1/4,3/4) (10,200) 0.059 1.018 (-0.003, 0.010) 0.029 0.980 (20,400) 0.094 1.020 (0.0002, 0.007) 0.064 1.002 (100,60) 0.017 1.066 (-0.013, -0.006) -0.149 0.947 82 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 m=20, n1=400, betawhat, beta0=1 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 m=20, n1=400, betawhathat, beta0=1 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 0 2 4 m=30, n1=600, betawhat, beta0=1 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -4 -2 0 2 4 m=30, n1=600, betawhathat, beta0=1 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 0 2 4 m=20, n1=400, betawhat, beta0=0.5 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 0 2 4 m=20, n1=400, betawhathat, beta0=0.5 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=30, n1=600, betawhat, beta0=0.5 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=30, n1=600, betawhathat, beta0=0.5 Figure 4.2: Q-Q plots for (^flw?fl0) and (^^flw?fl0) standardized by true and estimated conditionalstandarderror, forvariousvaluesof(m;n)inlogistic2?2?munbalanced setup where n1=600, 400 for (1/3,2/3), (1/4, 3/4) setups respectively . 83 4.2.2 Conditional Convergence in Distribution Recall that Theorem 3.1.2 and Theorem 3.1.3 gave conditional convergence in distribution of the normalized ^flw and ^^flw to the N(0;I) distribution. We performed a limited study to examine this conditional convergence. We repeated the simulations designed for the logistic 2?2?m table, but with the following modiflcation: we generated 10 realizations of i.i.d Unif(-0.8,0.8) of fi0. For each realization of fi0 we generated 1000 replications of y with the same fi0. This was performed only for sample sizes m = 10, n = 200 for fl0=1 or 0.5. From the following tables Table 4.6,Table 4.7, Table 4.8 and Table 4.9, we found that each realization of fi0, the normalized ^flw and ^^flw had Monte Carlo mean zero and variances near 1 in the case m = 10, n1 = 200. The Kolmogorov-Smirmov and Shapiro-Wilk tests all indicated no signiflcant departures from normality. These flndings are very similar to those which describe the unconditional distribution. 84 Table 4.6: Simulated estimates of logistic odds ratio for flxed realizations of uni- formly distributed random efiects (m = 10;n1 = 200) and fl0 = 0:5. (^flw ?fl0) ^^flw ?fl0 Realization mean std 95% CI for ^flw ?fl0 mean std 1 0.007 0.097 (0.001, 0.013) 0.000 0.096 2 0.002 0.091 (-0.005, 0.006) -0.005 0.090 3 0.004 0.100 (-0.002, 0.010) -0.001 0.100 4 0.003 0.093 (-0.003, 0.009) -0.004 0.092 5 0.011 0.095 (0.005, 0.017) 0.004 0.093 6 0.002 0.092 (-0.004, 0.008) -0.003 0.090 7 0.003 0.092 (-0.003, 0.009) -0.003 0.091 8 0.006 0.098 (0.000, 0.012) -0.001 0.097 9 0.006 0.092 (0.000, 0.012) 0.005 0.092 10 0.003 0.095 (-0.003, 0.009) -0.003 0.093 85 Table 4.7: Simulated standardized estimate of logistic odds ratio with flxed realiza- tions of uniformly distributed random efiects (m = 10; n1 = 200) and fl0 = 0:5. (^flw ?fl0)=s:e (^^flw ?fl0)=ds:e: Realization mean std 95% CI for (^^flw ?fl0) mean std 1 0.071 1.030 (-0.005, 0.006) -0.003 1.000 2 0.002 0.980 (-0.011, 0.000) -0.062 0.964 3 0.045 1.089 (-0.008, 0.005) -0.019 1.069 4 0.030 1.000 (-0.009, 0.002) -0.041 0.970 5 0.116 1.000 (-0.002, 0.010) 0.037 0.975 6 0.024 1.000 (-0.009, 0.002) -0.039 0.980 7 0.032 0.991 (-0.008, 0.003) -0.033 0.974 8 0.066 1.040 (-0.006, 0.006) -0.009 1.016 9 0.068 1.007 (-0.005, 0.006) 0.002 0.993 10 0.033 1.018 (-0.009, 0.003) -0.035 0.993 86 Table 4.8: Simulated estimates of logistic odds ratio with flxed realizations of uni- formly distributed random efiects (m = 10;n = 200) and fl0 = 1. (^flw ?fl0) ^^flw ?fl0 Realization mean std 95% CI for ^flw ?fl0 mean std 1 0.015 0.097 (0.009, 0.021) 0.002 0.095 2 0.013 0.102 (0.007, 0.020) -0.003 0.099 3 0.010 0.100 (0.003, 0.016) -0.004 0.099 4 0.013 0.103 (0.008, 0.020) 0.000 0.102 5 0.014 0.094 (0.009, 0.020) 0.000 0.092 6 0.011 0.094 (0.005, 0.017) 0.000 0.093 7 0.013 0.099 (0.007, 0.020) -0.001 0.097 8 0.011 0.097 (0.005, 0.017) -0.003 0.095 9 0.009 0.099 (0.003, 0.015) -0.004 0.098 10 0.011 0.098 (0.005, 0.017) -0.005 0.096 87 Table 4.9: Simulated standardized estimate of logistic odds ratio with flxed realiza- tions of uniformly distributed random efiects (m = 10; n1 = 200) and fl0 = 1. (^flw ?fl0)=s:e (^^flw ?fl0)=ds:e: Realization mean std 95% CI for (^^flw ?fl0) mean std 1 0.153 1.021 (-0.003, 0.008) 0.019 0.996 2 0.138 1.022 (-0.009, 0.003) -0.039 0.987 3 0.099 1.033 (-0.010, 0.002) -0.050 1.008 4 0.144 1.064 (-0.007, 0.006) -0.011 1.038 5 0.149 0.968 (-0.005, 0.006) -0.003 0.940 6 0.119 1.001 (-0.006, 0.006) -0.008 0.982 7 0.139 1.025 (-0.007, 0.006) -0.013 0.999 8 0.109 0.994 (-0.009, 0.003) -0.039 0.969 9 0.095 1.031 (-0.010, 0.002) -0.051 1.011 10 0.111 1.003 (-0.010, 0.001) -0.053 0.977 88 4.3 Analysis of real data for a 2?2?22 table We used the data from Yusuf et al. [36] which has 22 clinical trials of beta- blockers for reducing mortality after myocardial infarction. The data structure is the following: Clinical trial j, where 1 ? j ? 22 (in the series to be considered for meta-analysis), involves the use of n0j subjects in the control group and n1j in the treatment group, giving rise to bj and dj deaths in the control and treatment groups, respectively. Then the usual sampling models involve two independent binomial distributions with probability of death p0j and p1j, respectively. We concentrate on estimating the common log odds ratio which we label fl. Here we assume the model logit(yij = 1jxij;fii) = fii + xijfl where fii is a random efiect which may represent variation between clinical trials. We use this real data to calculate our estimator, the Mantel-Haenszel estimator and their variance estimators, respectively. Here we need to point out that the difierence between our asymptotic setting and that Woolf (1955) is that we allow m ! 1, but in Woolf?s setting, m is flxed. The following table summarizes the result. 89 Table 4.10: Summary of simulation of our estimator and Mantel-Haenszel estimator of log odds ratio. ^^fl w ^flMH 95% CI for ^^fl w 95% CI for ^flMH Var( ^^fl w) Var(^flMH) -0.260 -0.261 (-0.359, -0.161 ) (-0.372, -0.150) 0.00253 0.00249 First, we use Woolf?s test in Splus, Breslow-Day test and Likelihood test in SAS to test homogeneity of odds ratios whose p-values are 0.3595, 0.3149 and 0.3118 respectively. They all suggest that the common odds ratio model is appropriate for combining the 22 clinical trials. The above empirical variance estimator of Var[^flMH] is based on Breslow, Greenland and Robins [25]. We bootstrapped the data as follows: for each i = 1; ::: ;22 and j = 0; 1, we generated Y ?ij ? Binomial(nij;pij), where pij = Yij=nij is the sample proportion of successes in table i with x = j. Based on this bootstrap sample, new estimators ^^fl?w and ^fl?MH were calculated. This process were repeated 1000 times, and the sample variance of these bootstrap replicates was used to esti- mate Var[^^flw] and Var[^flMH]. We obtained 0:002476 for ^^flw and 0:002603 for ^flMH. These results agree with Table 4.10 and we can see that our estimator ^^flw is very close to the Mantel-Haenszel estimator. The Mantel-Haenszel estimator is consistent in the sparse data case, unlike our estimator. The logic of our estimator can be extended to other types of GLMM where m ! 1 and m=mini ni ! 0. No corresponding extension for Mantel-Haenszel seems available in the literature. 90 Chapter 5 Case 2 Results. This Chapter establishes asymptotic normality results for parameter estimates in certain versions of Case 2, when m=n90. In Section 5.1 we review Jiang?s [14] results on consistency. In Section 5.2 we state and prove our asymptotic normality theorem, and in Section 5.3 we focus on a random efiect logistic regression model. OurresultsfocusontheMaximumConditionalLikelihoodEstimates(MCLEs). These estimates are based on maximizing the likelihood function conditional on the estimable random efiects after integrating out the unestimable random efiects. The estimates derived from this likelihood function were called MCLEs by Jiang [14]. A basic technique here is reparameterization, because, conditionally, the indi- vidual efiects may not be identiflable. To illustrate this method, a special case is considered. The analysis of the more general setting will be similar. In case 2 the dimension of fl is flxed. 5.1 Case 2 consistency results of Jiang [14]. This section reviews the results on MCLE?s obtained by Jiang [14]. A key tool in this analysis is to reparameterize the model to address identiflability. Lemma 5.1.1. There is a map fl 7! ~fl, fi 7! ~fi such that (i) Xfl+Zfi = ~X~fl+Z~fi, where (~X ~Z) is a known matrix of full column rank. 91 (ii) ~zi = ~z?j, i 2 Sj for some known vector ~z?j, where ~ztj is the ith row of ~Z and Sj is deflned as below. Vector uti is the ith row of U where U is standard in the sense that consists of 0?s and 1?s and there is at least one 1 in each column and exactly one 1 in each row. Vector eM;j is the M-dimensional vector whose jth component is 1 and other components are 0. Let Sj = f1 ? i ? N : ui = eM;jg, and y(j) = (yi)i2Sj, 1 ? j ? M. Suppose that ?1;:::;?M are independent with common distribution ?(?=?)=?, where ?(?) is a known density function and ? > 0 is an unknown scale parameter. Furthermore, we assume that there are no random efiects nested within ?. In notation, this means that zi = z?j, i 2 Sj, 1 ? j ? M, where z?j = (z?jk)1?k?l. By Jiang?s [14] Lemma 4.1.1, we have ? = ~X~fl + ~Z~fi+U? (5.1) Let ? = (~fl;?), = (~fi;?). By (2.23) distribution of y given only fi is f(yj ) = MY j=1 f(y(j)j ) (5.2) and it is easy to show that f(y(j)j ) = gj(zt?j ~fi; ~fl;?), where gj(s) = E 2 4Y i2Sj f(yijs1 +xtis(2) +sr+2?) 3 5 with s(2) = (s2;:::;sr+1) and r is the dimension of ~fl. Note that r ? p. Let n be the dimension of ~fi and the ecpectation is with respect to the distribution of ?, hj(s) = loggj(s), lC( ) = logf(yj ) and lC;j( ) = logf(y(j)j ) = hj(zt?j ~fi; ~fl; ?). 92 Then lc( ) = MX j=1 lc;j( ): (5.3) Let Z? be the matrix whose jth row is zt?j, 1 ? j ? M. Let ?0 and 0 be the vectors corresponding to the true parameters and realization of random efiects. Deflne s(l)M; k = PMj=1j~z?jkjl, l = 1;2;:::, tM;k = PMj=1Pl6=k j~z?jk~z?jlj, Hj = (@2hj=@s2) flfl fls1=~zt ?j~fi; s(2)=~fl; sr+2=? ; (5.4) and A2 = MX j=1 0 BB @ ~z?j 0 0 Ir+1 1 CC A(Hj( 0)?E(Hj( 0)j 0)) 0 BB @ ~zt?j 0 0 Ir+1 1 CC A (5.5) where Il represents the l-dimensional identity matrix. Deflne G = 0 BB @ ~Zt?~Z? 0 0 MIr+1 1 CC A = MX j=1 0 BB @ ~z?j~zt?j 0 0 Ir+1 1 CC A; (5.6) ?M( ) = min 1?j?M ?min ? Var ? (@hj=@s)j(~zt ?j~fi; ~fl; ?) j ?? ; (5.7) and ?M = ?M( 0). Let ?(l)j ( ) = (@lhj=@ls1)(~zt?j~fi; ~fl; ?), l = 1; 2, V (1)k (") = max 1?j?M E ? j?(1)j ( 0)j1(~z ?jk? (1) j ( 0)j>1=2") j 0 ? ; V (2)k (") = max 1?j?M E ? j?(2)j ( 0)?E(?(2)j ( 0)j 0)j1(~z2?jkj???????j>1=2")j?0 ? : Theorem 5.1.2. Suppose: (i) the conditional densities f(y(j)j ), 1 ? j ? M, are with respect to a com- mon measure ? and have common support, and the flrst and second partial deriva- tives of R f(yjj )d? with respect to components of exist and can be taken under the integral sign. 93 (ii) hj(s), 1 ? j ? M, are three times difierentiable and there exist ?, B > 0 such that max 1?j?M ? max 1?u?r+2 flfl(@2h j=@s1@su) flfl?_ max 1?u; v; w?r+2 flfl(@3h j=@su@sv@sw) flfl? ? B (5.8) for all such that k???0k < ?. (iii) ~Z?k 6= 0, 1 ? k ? n, where ~Z?k is the kth column of ~Z?, and the following are bounded: k~Z?k1; max 1?k?n ? s(1)M; k s(2)M; k ! ; max 1?k?n ? s(2)M; k s(4)M; k ! ; max 1?k?n ? s(4)M; k s(2)M; k ! and max 1?j?M ? j~z?jj2E @h j @s1 js0 ?2! _ ? max 2?u?r+2 E @h j @su js0 ?2! ; (5.9) (iv) ?M > 0, and there is a sequence ?M such that 0 < ?M ? ?M ^1, and the following ! 0 in probability: ?max(G?1=2A2G?1=2)=?M; max 1?k?n ? tM; k s(2)M; k ! =?M; ( nM)=?4M and max l=1; 2 (logn=?2lM min 1?k?n s(6?2l)M; k )_ n max 1?k?n V (3?l)k (?lM)=?lM ? : Then, with probability approaching 1, there is a sequence ^ satisfying (@lC=@ )(^ ) = 0 and maxij^ i ? i0j = op(?M). Note. In fact, it is seen from the proof of the theorem that maxij^?i ??i0j = op(?2M). 94 Consider a special case in which there is only one random efiect factor. In such a case, one may integrate out all the random efiects, if necessary. The resulting MCL estimates are the maximum likelihood estimates for the flxed parameters. We have the following. Corollary 5.1.3. Suppose that in (5.1) fi = 0 (i.e., there are no random efiects besides ?), and that: (1) Part (i) of Theorem 4.1.2 holds with replaced by ?. (2) hj(?), 1 ? j ? M are three times difierentiable and there exists ?, B > 0 such that max 1?j?M sup k???0k?? j(any third derivative of hj)(?)j? B: (3) ?M = min1?j?M ?min(Var((@hj=@?)(?0))j?0) > 0 and 1 M2(?M ^1)2 MX j=1 E(kHj(?0)?EHj(?0)k2R) ! 0; where Hj(?) = @2hj=@?2. Then, with probability approaching 1, there is a sequence 5.2 The Simple case (fi = 0). In this case, we have the classical likelihood function after integrating out the unestimable random efiects and do not need reparameterization, since we only have flxed efiects and the dispersion parameter of the random efiect distribution. 95 Recall the likelihood function is deflned between (5.1) and (5.4), Then 0 BB @ (@lC(?)=@fl)j^? (@lC(?)=@?)j^? 1 CC A = 0 BB @ (@2lC(?)=@fl@flt)j?? (@2lC(?)=@fl@?)j?? (@2lC(?)=@flt@?)j?? (@2lC(?)=@?2j?? 1 CC A 0 BB @ ^fl?fl0 ^? ??0 1 CC A + 0 BB @ (@lC(?)=@fl)flfl?0 (@lC(?)=@?)flfl?0 1 CC A where ?? is between ^? and ?0. Then (?H) 0 BB @ ^fl?fl0 ^? ??0 1 CC A = 0 BB @ (@lC(?)=@fl)flfl?0 (@lC(?)=@?)flfl?0 1 CC A where H = 0 BB @ (@2lC(?)=@fl@flt)j?? (@2lC(?)=@fl@?)j?? (@2lC(?)=@flt@?)j?? (@2lC(?)=@?2)j?? 1 CC A: (5.10) Let C = Cov 0 BB @ (@lC(?)=@fl)flfl?0 (@lC(?)=@?)flfl?0 1 CC A (5.11) Then C?12(?H)C?12C12 0 BB @ ^fl?fl0 ^? ??0 1 CC A = C?12 0 BB @ (@lC(?)=@fl)flfl?0 (@lC(?)=@?)flfl?0 1 CC A: (5.12) 96 Theorem 5.2.1. Suppose we have the following conditions: (1) For any ? > 0, assume C?12(?H)C?12 is a positive deflnite matrix and P(kC?12(?H)C?12 ?Ik < ?) ! 1 (2) MkC?1k = O(1) (3) EfkGjk?(kGjk)g? K2 where K2 is a positive constant, ?(t) is a positive nondecreasing function mapping from [0;infty) to [0;1) such that limt!1?(t) = 1 and t?(t) is a convex function, and Gj = 0 BB @ ?(@h j=@fl)j?0 ??(@h j=@fl)j?0 ?T ?(@h j=@fl)j?0 ??(@h j=@?)j?0 ? ?(@h j=@?)j?0 ??(@h j=@fl)j?0 ?T ?(@h j=@?)j?0 ?2 1 CC A: (5.13) Then C12 0 BB @ ^fl?fl0 ^? ??0 1 CC A!D N(0;I): (5.14) Proof: Recall (5.11), (5.12). By condition (1), we have C12 0 BB @ ^fl?fl0 ^? ??0 1 CC A = (C?12(?H)C?12)?1C?12 0 BB @ (@lC(?)=@fl)flfl?0 (@lC(?)=@?)flfl?0 1 CC A 97 Then C?12 0 BB @ (@lC(?)=@fl)flfl?0 (@lC(?)=@?)flfl?0 1 CC A?C 1 2 0 BB @ ^fl?fl0 ^? ??0 1 CC A (5.15) ? k(C?12(?H)C?12)?1 ?Ik C?12 0 BB @ (@lC(?)=@fl)flfl?0 (@lC(?)@?)flfl?0 1 CC A (5.16) = k(C?12(?H)C?12)?1 ?IkOp(1): (5.17) The second factor of (5.16) is Op(1) from the following argument: P 0 BB @ C?12 0 BB @ (@lC(?)=@fl)flfl?0 (@lC(?)@?)flfl?0 1 CC A < c4 1 CC A ? 1?(1=c24)E 0 BB @ C?12 0 BB @ (@lC(?)=@fl)flfl?0 (@lC(?)@?)flfl?0 1 CC A 1 CC A 2 = 1?(1=c24)tr 0 BB @C?1Cov 0 BB @ (@lC(?)=@fl)flfl?0 (@lC(?)@?)flfl?0 1 CC A 1 CC A = 1?(p+1)=c24 By condition (1) of Theorem 5.2.1, We have k(C?12(?H)C?12)?1 ?Ik ? k(C?12(?H)C?12)?1kkC?12(?H)C?12 ?Ik = op(1): So C?12 0 BB @ (@lC(?)=@fl)flfl?0 (@lC(?)=@?)flfl?0 1 CC A and C 1 2 0 BB @ ^fl?fl0 ^? ??0 1 CC A have the same asymptotic dis- tribution. 98 Let u be a unit vector and Tj = uTC?12 0 BB @ (@hj=@fl)j?0 (@hj=@?)j?0: 1 CC A Recall (5.13). Then we have T2j = uTC?12GjC?12u: (5.18) By conditions (1), (2) and (3) MX j=1 E(T2j I(jTjj > ")) = MX j=1 E(T2j I(T2j > "2)) ? kC?1k MX j=1 E ? kGjkI kGjk > " 2 kC?1k ? ? kC?1k MX j=1 E(kGjk?fkGjkg) ? ? "2 kC?1k ? ?1 ? MkC?1kK2 ? ? "2 kC?1k ? ?1 = op(1): The Lindeberg condition is satisfled and by Slutsky?s theorem, the conclusion of Theorem 5.2.1 follows. 5.3 The logistic model logitP(yijk = 1jbij) = ?+bij: We use the logistic model logitP(yijk = 1jbij) = ? + bij example to illustrate Theorem 5.2.1. 99 Suppose ai = 0 and bij are iid normal with 1 ? i ? m1, 1 ? j ? n. The binary responses yijk are conditionally independent with logitP(yijk = 1jb) = ?+bij; 1 ? k ? r (5.19) where r is flxed. Recall hj(s) above (5.3). Then hij(s) = logEexpf(s1 +s2?ij) rX k=1 yijk ?rlog(1+exp(s1 +s2?ij))g; where the ?ij are independently identically distributed with a standard normal dis- tribution. Recall (5.11). In this example we have C = X i X j Var(@hij@? j?0): (5.20) By condition (iii) of Corollary 5.1.3, C is a positive deflnite matrix. We need to verify mn1kC?1k = O(1), equivalently as inf kuk=1 (utCu)=mn1 > 1=cmn1 where cmn1 is a positive constant and u is a unit vector. By condition (iii) of Corollary 5.1.3 we have inf kuk=1 (utCu)=mn1 > ?M where ?M > 0. Then condition (2) of Theorem 5.2.1 is verifled. Recall (5.10), (5.11) and consider condition (1) of Theorem 5.2.1. In this 100 example we have ?H = 0 BB @ ?(@2lC(?)=@?2)j?? ?(@2lC(?)=@?@?)j?? ?(@2lC(?)=@?@?)j?? ?(@2lC(?)=@?2)j?? 1 CC A = C?C?H: (5.21) Then kC?12(?H)C?12 ?Ik = kC?12 (C?H?C)C?12 ?Ik = kC?12 (?H?C)C?12k: Let ?C = 0 BB @ C11 C12 C21 C22 1 CC A where C11 = ? X i X j E(@hij@? j?0)2; C12 = C21 = ? X i X j E(@hij@? j?0)(@hij@? j?0); C22 = ? X i X j E(@hij@? j?0)2: Let wij(?0;?ij) = (?0 +?0?ij) X k yijk ?rlog(1+exp(?0 +?0?ij)) (5.22) and vij(?0;?ij) = X k yijk ? rexp(?0 +?0?ij)1+exp(? 0 +?0?ij) : (5.23) 101 Since we have P(yijk = 1j?;??ij) = exp(?+??ij)=(1+exp(?+??ij)), then X k yijkj?0;?0?ij ? Binomial r; exp(?0 +?0?ij)1+exp(? 0 +?0?ij) ? : Thus @hij @? j?0 = E(exp(wij(?0;?ij)))vij(?0;?ij) E(exp(wij(?0;?ij))) (5.24) @hij @? j?0 = Ef?ij(exp(wij(?0;?ij)))vij(?0;?ij)g E(exp(wij(?0;?ij))) (5.25) ?@ 2hij @?2 j?0 = ? 1 E(exp(wij(?0;?ij))) 'E?exp(w ij(?0;?ij))(v2ij(?0;?ij)) ? Var( X k yijkj?0;?0?ij) !) + ?E(exp(w ij(?0;?ij)))vij(?0;?ij) E(exp(wij(?0;?ij))) 2 ?@ 2hij @?2 j?0 = ? 1 E(exp(wij(?0;?ij))) 'E??2 ij exp(wij(?0;?ij))v 2 ij(?0;?ij) ? Var( X k yijkj?0;?0?ij) !) + ?Ef? ij(exp(wij(?0;?ij)))vij(?0;?ij)g E(exp(wij(?0;?ij))) 2 ?@ 2hij @?@?j?0 = ? 1 E(exp(wij(?0;?ij))) 'E?? ij(exp(wij(?0;?ij)))v2ij(?0;?ij) ? Var( X k yijkj?0;?0?ij) !) + ? E(exp(wij(?0;?ij)))vij(?0;?ij) fE(exp(wij(?0;?ij)))g2 ! ? (Ef?ij(exp(wij(?0;?ij)))vij(?0;?ij)g): Since we have the classical likelihood function in this example, then ?E(@hij@? j?0)2 ?E(@ 2hij @?2 j?0) = 0; ?E( @hij @? j?0)( @hij @? j?0)?E( @2hij @?@?j?0) = 0; ?E(@hij@? j?0)2 ?E(@ 2hij @?2 j?0) = 0: 102 So now we have ?C = 0 BB @ A?11 A?12 A?21 A?22 1 CC A where A?11 = X i X j E(@ 2hij @?2 j?0); A ? 12 = A ? 21 = X i X j E(@ 2hij @?@?j?0); A?22 = X i X j E(@ 2hij @?2 j?0): Let ?C?H = 0 BB @ H?11 H?12 H?21 H?22 1 CC A where H?11 = X i X j ? E(@ 2hij @?2 j?0 )? @2hij @?2 j?0 ? @2hij @?2 j?? + @2hij @?2 j?0 ; H?21 = H?12 = X i X j ? E(@ 2hij @?@?j?0)? @2hij @?@?j?0 ? @2hij @?@?j?? + @2hij @?@?j?0 ; H?22 = X i X j ? E(@ 2hij @?2 j?0)? @2hij @?2 j?0 ? @2hij @?2 j?? + @2hij @?2 j?0 : By Jiang?s [14] Corollary 4.1.3 (ii) and Taylor expansion max ij flfl flfl@2hij @?2 j?? ? @2hij @?2 j?0 flfl flfl?j^???0jmax ij flfl flfl@3hij @?3 j~? flfl flfl? Bj^???0j (5.26) max ij flfl flfl@2hij @?2 j?? ? @2hij @?2 j?0 flfl flfl?j^???0jmax ij flfl flfl@3hij @?3 j~? flfl flfl? Bj^???0j (5.27) max ij flfl flfl@2hij @?@?j?? ? @2hij @?@?j?0 flfl flfl?j^???0jmax ij flfl flfl@3hij @?3 j~? flfl flfl? Bj^???0j (5.28) where ~? is between ?? and ?0 and Bj^???0j!p 0. 103 Since ?H?C is a symmetric matrix, then k?H?Ck = sup kuk=1 jut(?H?C)uj ? 2 (? m1nBj^???0j+ flfl flfl fl X i X j (E@ 2hij @?2 j?0 ? @2hij @?2 j?0) flfl flfl fl ! _ ? m1nBj^???0j+ flfl flfl fl X i X j (E@ 2hij @?2 j?0 ? @2hij @?2 j?0) flfl flfl fl ! _ ? m1nBj^???0j+ flfl flfl fl X i X j (E@ 2hij @?@?j?0 ? @2hij @?@?j?0) flfl flfl fl !) Since (@2hij=@?2)j?0 are iid, (@2hij=@?2)j?0 are iid, and (@2hij=@?@?)j?0 are iid. Suppose we have 0 < ?0 < b where b is a positive constant. By Jiang?s [14] Lemma 3.3, the second derivatives of hij are uniformly bounded. Then E flfl flfl@2hij @?2 j?0 flfl flfl < 1 E flfl flfl@2hij @?2 j?0 flfl flfl < 1 E flfl flfl@2hij @?@?j?0 flfl flfl < 1: (5.29) By the Law of Large numbers 1 mn1 flfl flfl fl X i X j E@ 2hij @?2 j?0 ? @2hij @?2 j?0 ?flflfl flfl = op(1); (5.30) 1 mn1 flfl flfl fl X i X j E@ 2hij @?2 j?0 ? @2hij @?2 j?0 ?flflfl flfl = op(1); (5.31) 1 mn1 flfl flfl fl X i X j E@ 2hij @?@?j?0 ? @2hij @?@?j?0 ?flflfl flfl = op(1): (5.32) 104 Then kC?12(?H)C?12 ?Ik = kC?12 (?H?C)C?12k ? kC?1kk?H?Ck = Op(1)k?H?Ckmn 1 = op(1): Consider condition (3) of Theorem 5.2.1 and recall (5.13). We have Gij = 0 BB @ ((@hij=@?)j?0)2 ((@hij=@?)j?0)((@hij=@?)j?0) ((@hij=@?)j?0)((@hij=@?)j?0) ((@hij=@?)j?0)2 1 CC A: Since Gij is a symmetric matrix, we have kGijk? 2max ? @hij @? j?0 ?2 ; flfl flfl @h ij @? j?0 ? @h ij @? j?0 ?flfl flfl; @h ij @? j?0 ?2! : Recall (5.24), (5.23). It is easy to show @h ij @? j?0 ?2 ? 4r2 since ?ij are independent identically distributed standard normal. Suppose we have 0 < ?0 < b where b is a positive constant, By Jiang?s [14] Lemma 3.3 that flrst derivatives of hij are uniformly bounded. We have E(@hij=@?j?0)2 bounded and flfl flfl @h ij @? j?0 ? @h ij @? j?0 ?flfl flfl 2 ? @h ij @? j?0 ?2 @h ij @? j?0 ?2 : Condition (1), (2) and (3) of Theorem 5.2.1 are verifled, so we have C12 0 BB @ ^???0 ^? ??0 1 CC A!D N(0;I) (5.33) 105 Chapter 6 Simulation In this chapter, in order to check the asymptotic results of Chapters 3 and 5, logistic and Poisson random intercept models are simulated under case 1 for both our new estimator (linear combination of weighted MLE) and PGWLE (Penalized GeneralizedWeightedLeast Square Estimate)from Jiang [14]. Forcase 2, one simple model was simulated. The asymptotic behavior of the estimates is investigated in samples generated by Splus 2000 or R 2.6 for various sample sizes and parameter conflgurations. The built-in function nlminb was used to compute the estimates. Both statistical and computational questions were examined in the course of the simulations. In our simulations we compared Monte Carlo average of estimators to the known true values of parameters, and we compared Monte Carlo averages of ap- proximate variance formulas to the Monte Carlo sample variances. We also used both the Kolmogorov-Smirnov and Shapiro-Wilk tests to assess agreement of stan- dardized estimators with the N(0;1) distribution. The Kolmogorov-Smirnov test is based on Dn = supxj^Fn(x) ? 'xj, where ^Fn(x) is the empirical cdf and '(x) is the N(0;1) cdf. The Shapiro-Wilk test [27] is implemented in R using the algorithm of Royston. Intuitively, the Shapiro-Wilk test is based on the observed correlation between an ordered sample and expected 106 values of N(0;1) order statistics. The actual calculation of the statistic and its p-value relies on various approximations of means and variances of normal order statistics. See Royston [26] and references there in for details. 6.1 Case 1 simulations for ^^flw We simulate logistic and Poisson random intercept regression to investigate consistency and asymptotic normality of the estimated regression coe?cients ^flw and ^^flw. The results of Chapter 3 established the conditional and unconditional asymptotic behavior of regression coe?cients, given the random efiects v0. 6.1.1 Case 1 combining a with vi. We consider: E(yijjvi) = b0ij(a+vi +flxij) (6.1) where a and fl are flxed parameters, xij is a scalar valued predictor, and vi are iid Unif (?0:8;0:8) or N(0;0:25). For our simulations we let a0 = 1. We used the nlminb minimization function in Splus to get estimators which minimize ?lni( i) in each cluster deflned as ?lni( i) = X j (?yij?ij +bij(?ij)) where ?ij = a+vi+xijfl and = (a;fl;v1 ; ::: ;vm)t. We do not attempt to estimate a and vi separately but only a+vi. For simplicity, we simulated the balanced model with m clusters and n1 observations per cluster. We generated m ? n1 covariates xij uniformly spaced between ?1 and 1, so that the ranges of fxijg and fxljg, i 6= l, 107 did not overlap. For example if we have m = 3 clusters and n1 = 2 observations per cluster, flrst we divided interval [?1;1] into m = 3 equal sized intervals [?1;?1=3], [?1=3;1=3] and [1=3;1], and then divide each interval into n1 = 2 subintervals to get x11 = ?1, x12 = ?2=3, x21 = ?1=3, x22 = 0, and x31 = 1=3, x32 = 2=3 for clusters 1, 2, 3 respectively. We also generate m independent random efiects vi from the Uniform distribution (?0:8;0:8) or N(0;0:25) according to (xij;vi) with regression coe?cient fl0 = 1, m samples of n1 binary random variables Yij was generated with E(Yijjvi;xij) = b0ij(?ij) where for Poisson model with b0ij(?ij) = exp(?ij) and for logistic model with b0ij(?ij) = exp(?ij)=(1 + exp(?ij)). Various combinations of (m;n1) are used in the simulations. For each combination, 1000 replications of random efiects vi and responses Yij were generated. The covariates xij were the same for all the simulations. Estimated regression coe?cients for various choices of (m;n1) are summarized in Table 6.1 and Table 6.2. The tables display the Monte Carlo means and standard errors of the simulated values of ^flw ? fl0, ^^flw ? fl0, (^flw ? fl0)=s:e:, (^^flw ? fl0)=ds:e: and 95% confldence bounds for (^flw ? fl0) and ^^flw ? fl0 based on Student?s t. From Table 6.1 and Table 6.2, we can see that in combinations (30;120), (40;240) in logistic model for ^flw, ^^flw, some bias is present (based on confldence intervals). But for all the combinations from both tables, their Shapiro-Wilk test p values are above 0.18 and Kolmogorov-Smirnov test p values are larger than 0.1, which means in those combinations, ^flw and ^^flw given v0 seem normal. From Table 6.1 and Table 6.2, we can see that bias/se< 0:165 so the normal inference is not greatly afiected in the cases where bias is present. The plots show that ^flw and ^^flw are approximately 108 normal, with departures from normality in the extreme tails when random efiects are uniformly distributed. 109 Table 6.1: Simulated estimates of logistic and Poisson regression coe?cient combin- ing flxed intercept with uniformly distributed random efiects. (^flw ?fl0) ^^flw ?fl0 (m, n1) mean std 95% CI for ^flw ?fl0 mean std Logistic fl0 = 1 (10,200) 0.020 0.808 (-0.029, 0.071) 0.002 0.794 (20,200) -0.020 1.190 (-0.089, 0.050) -0.033 1.100 (30,120) -0.224 1.432 (-0.313, -0.135) -0.229 1.421 (30,210) -0.055 1.200 (-0.130, 0.019) -0.063 1.191 (40,240) -0.197 1.226 (-0.273, -0.120) -0.202 1.221 Poisson fl0 = 1 (10,200) 0.007 0.355 (-0.015, 0.029) 0.002 0.352 (20,200) 0.028 0.510 (-0.004, 0.059) 0.023 0.504 (30,210) -0.033 0.590 (-0.069, 0.004) -0.036 0.588 (40,240) 0.003 0.648 (-0.038, 0.043) -0.001 0.644 110 Table 6.2: Simulated standardized estimate of logistic and Poisson regression coef- flcient with uniformly distributed random efiects. (^flw ?fl0)=s:e (^^flw ?fl0)=ds:e: (m;n1) mean std 95% CI for (^^flw ?fl0) mean std Logistic fl0 = 1 (10,200) 0.026 1.014 (-0.047, 0.051) 0.001 0.989 (20,200) -0.018 0.995 (-0.101, 0.035) -0.030 0.971 (30,120) 0.126 0.806 (-0.317, -0.140) -0.127 0.792 (30,210) -0.041 0.893 (-0.137, 0.010) -0.047 0.881 (40,240) -0.135 0.845 (-0.278, -0.126) -0.139 0.837 Poisson fl0 = 1 (10,200) 0.014 1.006 (-0.020, 0.024) 0.001 1.000 (20,200) 0.053 0.995 (-0.008, 0.054) 0.044 0.982 (30,210) -0.051 0.963 (-0.073, 0.0002) -0.057 0.959 (40,240) 0.005 0.976 (-0.041, 0.039) -0.0005 0.968 111 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=20, n1=200, betawhat, logistic Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=20, n1=200, betawhathat, logistic Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 m=40, n1=240, betawhat, logistic Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 m=40, n1=240, betawhathat, logistic Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=20, n1=200, betawhat, Poisson Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=20, n1=200, betawhathat, Poisson Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=40, n1=240, betawhat, Poisson Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=40, n1=240, betawhathat, Poisson Figure 6.1: Q-Q plots for (^flw?fl0) and (^^flw?fl0) standardized by true and estimated standard error, for various values of (m;n) in case 1 random intercept model with uniformly distributed random efiects. 112 Our theoretical results assumed bounded random efiects, but we investigated unbounded normal random efiects by simulation studies to those with uniform ran- dom efiects. For each combination (m;n), 1000 replications of random N(0;0:25) efiects vi and responses Yij were generated. The covariates xij were the same for all the simulations. Estimated regression coe?cients for various choices of (m;n1) are summarized in Table 6.3 and Table 6.4. The tables display the means and standard errors of the simulated values of ^flw?fl0, ^^flw?fl0, (^flw?fl0)=s:e:, (^^flw?fl0)=ds:e: and 95% confldence bounds for (^flw?fl0), ^^flw?fl0 based on Student?s t. From Table 6.3 and Table 6.4, we can see that in combinations (30;210), (40;240) in logistic model for ^flw, ^^flw, some bias is present. But for all the combinations from both tables, the Shapiro-Wilk test p values are above 0.463 and the Kolmogorov-Smirnov test p values are larger than 0.1, which means in those combinations, ^flw and ^^flw given v0 seem normal. From Table 6.3 and Table 6.4, we can see that bias/se< 0:177, so the normal inference is not greatly afiected in the cases where bias is present. The plots show that ^flw and ^^flw are approximately normal, with departures from normality in the extreme tails when random efiects are normally distributed. Here we need to point out that we have not proven asymptotic normality of our estimator in case 1 random intercept model for normally distributed random efiects, but the simulation studies show that normality appears to hold. 113 Table 6.3: Simulated estimates of logistic and Poisson regression coe?cient combin- ing flxed intercept with normally distributed random efiects. (^flw ?fl0) ^^flw ?fl0 (m, n1) mean std 95% CI for ^flw ?fl0 mean std Logistic fl0 = 1 (20,200) 0.012 1.132 (-0.058, 0.082) -0.002 1.117 (30,210) -0.206 1.219 (-0.282, -0.130) -0.214 1.209 (40,240) -0.202 1.227 (-0.278, -0.126) -0.203 1.221 Poisson fl0 = 1 (20,200) -0.001 0.512 (-0.033, 0.031) -0.005 0.510 (30,210) 0.011 0.587 (-0.026, 0.047) 0.005 0.585 (40,240) 0.024 0.662 (-0.017, 0.065) 0.021 0.659 114 Table 6.4: Simulated standardized estimate of logistic and Poisson regression coef- flcient with normally distributed random efiects. (^flw ?fl0)=s:e (^^flw ?fl0)=ds:e: (m;n1) mean std 95% CI for (^^flw ?fl0) mean std Logistic fl0 = 1 (20,200) 0.011 1.004 (-0.071, 0.068) -0.002 0.983 (30,210) -0.153 0.904 (-0.289, -0.139) -0.158 0.891 (40,240) -0.139 0.843 (-0.279, -0.127) -0.139 0.834 Poisson fl0 = 1 (20,200) -0.003 1.003 (-0.037, 0.026) -0.011 0.997 (30,210) 0.019 0.959 (-0.031, 0.414) 0.009 0.954 (40,240) 0.037 0.994 (-0.019, 0.062) 0.033 0.987 115 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=20, n1=200, betawhat, logistic Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=20, n1=200, betawhathat, logistic Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 m=40, n1=240, betawhat, logistic Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 m=40, n1=240, betawhathat, logistic Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=20, n1=200, betawhat, Poisson Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 m=20, n1=200, betawhathat, Poisson Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 0 2 m=40, n1=240, betawhat, Poisson Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 0 2 m=40, n1=240, betawhathat, Poisson Figure 6.2: Q-Q plots for (^flw?fl0) and (^^flw?fl0) standardized by true and estimated standard error, for various values of (m;n) in case 1 random intercept model with normally distributed random efiects. 116 6.2 Case 1 logistic random intercept simulation for flxed realizations of random efiects Recall that Theorem 3.1.2 and Theorem 3.1.3 gave conditional convergence in distribution of the normalized ^flw and ^^flw to the N(0;I) distribution. We performed a limited study to examine this conditional convergence. We repeated the simulations designed in Section 6.1.1 for the random intercept problem, but with the following modiflcation: we generated 10 realizations of i.i.d Unif(-0.8,0.8) of v0. For each realization of v0 we generated 1000 replications of y with the same v0 and x?s as described in Section 6.1.1. This was performed only for sample sizes m = 20, n1 = 200 and m = 40, n1 = 240, and for the logistic model (6.1). From the following tables Table 6.5,Table 6.6, Table 6.7 and Table 6.8, we found that each realizations of v0, the normalized ^flw and ^^flw had Monte Carlo mean zero and variances near 1 in the case m = 20, n1 = 200. For the case m = 40, n1 = 240, the Monte Carlo mean and variance show some departures from the desired 0 and 1. The Kolmogorov-Smirmov and Shapiro-Wilk tests all indicated no signiflcant departures from normality. These flndings are very similar to those of Section 6.1.1, which describe the unconditional distribution. 117 Table 6.5: Simulated estimates of logistic regression coe?cient combining flxed efiect and flxed realizations of uniformly distributed random efiects (m = 20;n1 = 200). (^flw ?fl0) ^^flw ?fl0 Realization mean std 95% CI for ^flw ?fl0 mean std 1 0.004 1.063 (-0.062, 0.070) -0.012 1.047 2 -0.208 1.133 (-0.098, 0.043) -0.042 1.117 3 -0.012 1.127 (-0.082, 0.058) -0.028 1.109 4 -0.044 1.105 (-0.113, 0.024) -0.058 1.087 5 0.028 1.066 (-0.038, 0.094) 0.012 1.048 6 0.019 1.141 (-0.052, 0.089) 0.005 1.127 7 -0.024 1.075 (-0.090, 0.043) -0.039 1.059 8 0.026 1.110 (-0.043, 0.095) 0.010 1.095 9 -0.035 1.079 (-0.102, 0.032) -0.048 1.061 10 -0.026 1.110 (-0.095, 0.042) -0.040 1.092 118 Table 6.6: Simulated standardized estimate of logistic regression coe?cient with flxed realizations of uniformly distributed random efiects (m = 20; n1 = 200). (^flw ?fl0)=s:e (^^flw ?fl0)=ds:e: Realization mean std 95% CI for (^^flw ?fl0) mean std 1 0.003 0.948 (-0.077, 0.052) -0.011 0.926 2 -0.025 1.005 (-0.111, 0.027) -0.037 0.982 3 0.010 0.998 (-0.097, 0.041) -0.025 0.975 4 -0.040 0.988 (-0.125, 0.010) -0.052 0.975 5 0.025 0.946 (-0.053, 0.077) 0.011 0.924 6 0.017 1.016 (-0.065, 0.075) 0.004 0.996 7 -0.021 0.959 (-0.105, 0.027) -0.035 0.937 8 0.023 0.991 (-0.058, 0.078) 0.008 0.970 9 -0.031 0.961 (-0.114, 0.018) -0.043 0.938 10 -0.024 0.992 (-0.108, 0.027) -0.036 0.969 119 Table 6.7: Simulated estimates of logistic regression coe?cient combining flxed efiect and flxed realizations of uniformly distributed random efiects (m = 40;n1 = 240). (^flw ?fl0) ^^flw ?fl0 Realization mean std 95% CI for ^flw ?fl0 mean std 1 -0.210 1.207 (-0.285, -0.135) -0.214 1.202 2 -0.225 1.226 (-0.301, -0.149) -0.228 1.223 3 -0.117 1.125 (-0.194, -0.041) -0.120 1.229 4 -0.201 1.261 (-0.279, -0.122) -0.204 1.257 5 -0.164 1.261 (-0.238, -0.089) -0.204 1.225 6 -0.211 1.204 (-0.285, -0.136) -0.214 1.198 7 -0.185 1.211 (-0.260, -0.110) -0.188 1.206 8 -0.206 1.189 (-0.280, -0.132) -0.209 1.184 9 -0.205 1.203 (-0.279, -0.130) -0.207 1.200 10 -0.249 1.223 (-0.325, -0.173) -0.252 1.219 120 Table 6.8: Simulated standardized estimate of logistic regression coe?cient with flxed realizations of uniformly distributed random efiects (m = 40; n1 = 240). (^flw ?fl0)=s:e (^^flw ?fl0)=ds:e: Realization mean std 95% CI for (^^flw ?fl0) mean std 1 -0.145 0.836 (-0.288, -0.139) -0.147 0.829 2 -0.150 0.845 (-0.304, -0.152) -0.157 0.839 3 -0.081 0.853 (-0.200, -0.044) -0.083 0.845 4 -0.138 0.868 (-0.282, -0.126) -0.140 0.861 5 -0.112 0.822 (-0.242, -0.093) -0.114 0.815 6 -0.145 0.831 (-0.288, -0.140) -0.147 0.823 7 -0.128 0.836 (-0.263, -0.113) -0.129 0.829 8 -0.142 0.821 (-0.283, -0.136) -0.144 0.813 9 -0.142 0.833 (-0.281, -0.133) -0.143 0.824 10 -0.172 0.845 (-0.328, -0.176) -0.174 0.839 6.3 Case 1 random intercept simulations for penalized likelihood es- timators We simulate logistic and Poisson random intercept regression to investigate consistency and asymptotic normality of penalized regression coe?cient estimates by using Jiang?s [14] PGWLE (Penalized Generalized Weighted Least Squares) method. 121 6.3.1 Case 1 logistic random intercept combining a with vi. We consider Example 3.2 from Jiang [14]: logitP(yij = 1jvi) = a+vi +fl1xij (6.2) where a and fl1 are flxed parameters, xij is a scalar valued predictor, and vi are iid N(0;Va). For our simulations we let a = 0. We used the PGWLS method by Jiang [14] and the nlminb minimization function in Splus to get estimators which minimize ?lP( ) derived by Jiang [14] as (??), which is deflned as ?lP( ) = X i X j (?yij?ij +log(1+exp(?ij)))+ ?12 m(v)2 where ?ij = a+vi +xijfl1, v = Pmi vi=m, and = (a;fl1;v1 ; ::: ;vm)t. We do not attempt to estimate a and vi separately but only a+vi. For simplicity, we simulated the balanced model with m clusters and n1 observations per cluster. We generated m?n1 covariates xij uniformly spaced between -1 and 1, so that the ranges of fxijg and fxljg, i 6= l, did not overlap. We also generate m independent random efiects vi from the normal distribution with mean 0 and variance Va. Conditionally on (xij;vi) with regression coe?cient fl10 = 1, a sample of mn1 binary random variables Yij was generated with E(Yijjvi;xij) = exp(xij +vi)=(1+exp(vi +xij)). Various combinations of (m;n1), Va, ?1 and initial values of estimated fl1, v1 ; ::: ;vm were used. For each combination, 1000 replications of random efiects vi and responses Yij were generated. The covariates xij were the same for all the simulations. Estimated regression coe?cients for various choices of (m;n1) with Va = 4, initial estimates set 122 to 0, and ?1 = 1, are summarized in Table 6.9. The tables display the means and standard errors of the simulated values of (^fl1 ?fl10)=s:e:, (^fl1 ?fl10)=ds:e: and 95% confldence bounds for ^fl1 ?fl10 based on Student?s t. From Table 6.9, we can see that in the combination (20;20) with Va = 4, bias is present. But its Kolmogorov- Smirnov test p values are larger than 0.1. This can be explained in terms of the consistency condition logm=(logn1)2 ! 0 [Jiang ([14], Ex.3.2]. Among all the pairs of sample sizes for simulation, (20;20) has the highest values of logm=(logn1)2 as 0:334. For all combinations from Table 6.9 whose 95% confldence intervals including 0. In order to check whether the computations of ^fl1 are sensitive to initial values, we ran various combinations of m, n1, Va and ?1 with initial values set to 0. For the same vi and yij, we initialized ^fl1 at 3 and at logit(y::=(1 ? y::)). In each case the ^vi were initialized at 0. This comparison was repeated 1000 times. The results are summarized in Table 6.10. 123 Table 6.9: Simulated standardized estimates of logistic regression coe?cient (stan- dardized by true and estimated conditional standardized error) combining flxed intercept with random efiects. (^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e: (m, n1) mean std 95% CI for ^fl1 ?fl10 mean std Va = 4, initials set to 0, ?1=1 (10,20) -0.017 1.021 (-0.127, 0.293) -0.015 1.010 (20,20) 0.077 1.012 (0.082, 0.665) 0.076 1.011 (20,40) 0.054 0.995 (-0.019, 0.384) 0.053 0.991 Va = 3, initials set to 0, ?1 = 1 (20,20) 0.054 0.938 (-0.008, 0.509) 0.053 0.932 Table 6.10: Simulated standardized estimate of logistic regression coe?cient for various initial values. (^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e: initial ^fl1 mean std 95% CI for ^fl1 ?fl10 mean std Va = 4, (m;n1) = (10;20), ?1=1 0 -0.017 1.021 (-0.127, 0.293) 0.015 1.011 3 -0.014 0.878 (-0.150, 0.277) 0.009 1.022 logit -0.020 0.896 (-0.150, 0.277) 0.015 1.011 124 AswecanseefromTable6.10, theestimatesarebiasedforlargerlogm=(logn1)2. Moreover, the variance of (^fl1?fl10)=s:e: is sensitive to the initial guess of ^fl1. Also in the combination (20;20) with Va = 4 and initial ^fl1 equal to logit, their Kolmogorov- Smirnov test rejects the hypothesis of normality with p values 0.043 and 0.019. For all the other combinations from Table 6.10, their goodness-flt tests suggest normal- ity. Since in reality we do not know how to choose ?1, we tried various values of ?1 to assess the efiect of ?1 on normality and consistency. The Monte Carlo averages and standard deviations of (^fl1 ? fl10)=s:e: are displayed in Table 6.11 for various (m;n1), ?1, initial values and Va. The estimates are computed for ?1 = 0:1, 1 and 5 for the same data, and this comparison was repeated 1000 times. Table 6.11 shows that extremely large values of the penalty parameter (?1 = 5) did afiect the bias. Table 6.11: Simulated standardized estimate of logistic regression coe?cient for various ?1 values. (^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e: ?1 mean std 95% CI for ^fl1 ?fl0 mean std Va = 4, (m;n1) = (20;50), initials set to 0 1 0.048 1.012 (-0.034, 0.330) 0.048 1.009 0.1 0.022 0.858 (-0.034, 0.330) 0.048 1.009 5 0.022 0.858 (0.034, 0.400) 0.070 1.017 125 6.4 Case 2 logistic simple example We consider Example 3.3 from Jiang [14], with fii = 0 for simplicity. We have logitP(yijk = 1jbij) = ?+bij = ?+??ij (6.3) where the ?ij are iid N(0;1). We estimate ? and ? by Jiang?s [14] MCLE method based on integrating ?ij out of the likelihood. This gives us an unconditional log- likelihood function in this special example. We also use the nlminb function in Splus to get estimates which minimize the negative log likelihood function. In Case 2, reparameterization is used to take care of the identiflability problem. Here since we integrate out ?ij, we only have the flxed efiect ? and scale parameter ?. We do not need to worry about the identiflability problem, so the negative log-likelihood function is the following: ?lC(?) = ? X i X j hij = ? X i X j Eexp[(?+??ij)yij+ ?rlog(1+exp(?+??ij))] (6.4) where yij+ = Pk yijk. More precisely, here we have m rows and n columns for each cell we have r observations. We generate m ? n1 random efiects bij from N(0;?20); where ?0 is a scale parameter and set r = 2. Then we can get ?ij1 = ?ij2 = ?+bij. Conditionally on ?ijk with k=1 or 2, a sample of m?n1?2 random variables Yijk is generated from the Bernoulli distribution with E(Yijkj?;??ij) = exp(?+??ij)=(1+exp(?+??ij)). Various choices of (m;n1) were simulated. For each choice, 500 replications of bij, Yijk were generated. We used the Splus nlminb function to calculate estimates of ? and ?. In this simulation, ?0 = 1, initials for ^? and ^? are 0 and 1, respectively. 126 Asymptotic normality results for standardized ^???0 and ^? ??0 are summarized in Table 6.12. For the combinations of Table (6.12), the only pair (m;n1) = (20;20) has Kolmogorov-Smirnov test p values larger than 0.1. For (m;n1) = (10;10), the Kolmogorov-Smirnov test rejects the normality hypothesis with p values 0.01 and 0 for (^???0)=s:e(?) and (^? ??0)=s:e:(?), respectively. For combinations (m;n1) = (10;40) and (m;n1) = (10;30), normality only appear to hold for standardized (^? ? ?0) with Kolmogorov-Smirnov test p values larger than 0.1. Moreover, the Kolmogorov-Smirnov test rejects the normality hypothesis with p value less than 0.008. The 95% confldence intervals for (^???0) for all the pairs (m;n1) include 0, but 95% confldence intervals for (^? ??0) for (10;10) and (10;30) do not include 0. In the combinations (m;n1) = (20;20) and (m;n1) = (10;40), both (^? ? ?0) and (^? ??0) seem unbiased. As a partial check on the joint normality of (^?;^?), we also examined [(^? ? ?0)+(^? ??0)]=p2. Similar results were obtained. 127 Table 6.12: Simulated standardized estimate of ? and ? in case 2 simple example for various (m;n1) values (^???0)=s:e(?) (^? ??0)=s:e:(?) (m;n1) mean std mean std ?0 = 1, ?0 = 1, initials set to (0,1) (10,10) -0.024 0.641 -0.236 1.550 (10,30) 0.079 0.835 -0.025 0.475 (20,20) 0.062 0.816 -0.015 0.523 (10,40) 0.036 0.800 -0.024 0.531 128 Quantiles of Standard Normal Normalized estimate of mu -3 -2 -1 0 1 2 3 -2 -1 0 1 2 m=10, n1=10, mu Quantiles of Standard Normal Normalized estimate of tau -3 -2 -1 0 1 2 3 -4 -2 0 2 4 m=10, n1=10, tau Quantiles of Standard Normal Normalized estimate of mu -3 -2 -1 0 1 2 3 -2 -1 0 1 2 m=10, n1=30, mu Quantiles of Standard Normal Normalized estimate of tau -3 -2 -1 0 1 2 3 -2 -1 0 1 m=10, n1=30, tau Quantiles of Standard Normal Normalized estimate of mu -3 -2 -1 0 1 2 3 -2 -1 0 1 2 3 m=20, n1=20, mu Quantiles of Standard Normal Normalized estimate of tau -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 m=20, n1=20, tau Quantiles of Standard Normal Normalized estimate of mu -3 -2 -1 0 1 2 3 -2 -1 0 1 2 m=10, n1=40, mu Quantiles of Standard Normal Normalized estimate of tau -3 -2 -1 0 1 2 3 -1 0 1 m=10, n1=40, tau Figure 6.3: Q-Q plots for standardized (^???0) and (^? ??0) by true standard error, for various (m;n1) in case 2 logistic simple example. 129 Chapter 7 Conclusions and Future Work. 7.1 Theoretical Conclusions We have proposed a new estimator ^^flw = Pi ^wi^fli and have proven its condi- tional asymptotic normality for the random intercept problem when m, the number of random efiects goes to inflnity with the sample size N but at a slower rate, characterized as m=N ! 0. Given the random efiects, the linear combinations of weighted MLE denoted ^flw and ^^flw are asymptotically normal. The weight matri- ces and asymptotic conditional covariance matrix of ^flw can be estimated, and the standardized regression estimates, standardized by either the true or estimated co- variance, have a limiting N(0;I) distribution. We have proven that in the absence of regression coe?cients, the normalized Jiang?s [14] penalized likelihood estima- tor of flxed intercept pm(^a ? a0) converges to a normal distribution. Di?culties arise in establishing the conditional asymptotic normality of the penalized likelihood estimator ^fl of regression coe?cients for flxed efiects in a general GLMM. For the case m=N 9 0, joint asymptotic normality is proved for regression coe?cient and scale parameter estimates after suitable standardization. A logistic example with logitP(yijkjbij) = ?+bij is used to illustrate how to verify the general conditions in this case. 130 7.2 Conclusions from the simulation studies and real data analysis We focused on investigating asymptotic behavior of our new estimator ^^flw, theoretic estimator ^flw and simulated balanced models for simplicity. In the case m=N ! 0, logistic and Poisson random intercept models were simulated. For both models, we considered normally and uniformly distributed random efiects. When m=n ? 1=7, our estimators had some bias under logistic model. But for Poisson model this problem did not occur except for (30;210) with fl0 = 1. In all the cases, normality appears to hold and the ratio bias/s.e is less than 0.218, which means normal inference is not greatly afiected. In the simulation studies for Jiang?s [14] penalized estimator ^a of flxed intercept, asymptotic consistency and normality does not hold for (30;60) with Va = 4 and (30;60), (40;60), (40;80) with Va = 2. Esti- mates of a0 and standard deviation shows considerable bias. We simulated logistic 2?2?m tables for both balanced and unbalanced setups with uniformly distributed random efiects. In both setups, ^^flw has better consistency results than ^flw. In all the cases, normality holds and the ratio bias/s.e is less than 0.194 except in extreme cases (100;50) and (100;60), which means normal inference is not greatly afiected. In the case m=N 9 0 for the logistic model logitP(yijk = 1jbij) = ? + bij, among the combinations (20;20) with r = 2, unbiasedness and approximate nor- mality hold for the intercept and scale parameter estimates. Scale parameter esti- mates are biased if the number of clusters is too small (< 300). Normality holds for standardized ^? ? ?0 but not for standardized ^? ? ?0 in (10;40) and (10;30) 131 combinations. We analized a real data set of 22 clinical trials of beta-blocker for heart attach treatment (Yusuf et al. (1985)). We flrst test homogeneity of odds ratios by using Woolf?s test, the Breslow-Day test and the likelihood ratio test. None of them reject the null hypothesis, which suggest common odds ratio model is appropriate. Our estimator and Mantel-Haenszel estimator are very close. Mantel-Haenszel estimator is consistent in sparse data case unlike our estimator. The logic of our estimator can be extended to other types of GLMM where m ! 1 and m=mini ni ! 0. No corresponding extension for Mantel-Haenszel is available. 7.3 Future work. We can try to use projected score methods to overcome the di?culties in proving conditional asymptotic normality for penalized likelihood estimates of flxed efiects in the case 1 random intercept problem. Also we can consider conditional logistic regression estimate of random intercept model. We can investigate the asymptotic behavior of Penalized Generalized Weighted Least Square (PGWLS) estimate in a more complicated random efiects models such as two way crossed random efiects model. 132 Chapter A Simulation Results A.1 Case 1 random intercept model for penalized likelihood estima- tor A.1.1 Case 1 logistic random intercept combining a with vi. We consider Example 3.2 from Jiang [14]: logitP(yij = 1jvi) = a+vi +fl1xij (A.1) where a and fl1 are flxed parameters, xij is a scalar valued predictor, and vi are iid N(0;Va). For our simulations we let a = 0. We used the PGWLS method by Jiang [14] and nlminb minimization function in Splus to get estimators which minimize ?lP( ) derived by Jiang [14] as (??), which is deflned as ?lP( ) = X i X j (?yij?ij +log(1+exp(?ij)))+ ?12 m(v)2 where ?ij = a + vi + xijfl1, v = Pmi vi=m, and = (a;fl1;v1 ; ::: ;vm)t. We do not attempt to estimate a and vi separately but only a + vi. For simplicity, we simulated the balanced model with m clusters and n1 observations per cluster. We generated m ? n1 covariates xij uniformly spaced between -1 and 1, so that the ranges of fxijg and fxljg, i 6= l, did not overlap. For example if we have m = 3 clusters and n1 = 2 observations per cluster, flrst we divided interval [?1;1] into 133 m = 3 equal sized intervals [?1;?1=3], [?1=3;1=3] and [1=3;1], and then divide each interval into n1 = 2 subintervals to get x11 = ?1, x12 = ?2=3, x21 = ?1=3, x22 = 0, and x31 = 1=3, x32 = 2=3 for clusters 1, 2, 3 respectively. We also generate m independent random efiects vi from the normal distribution with mean 0 and variance Va. Conditionally on (xij;vi) with regression coe?cient fl10 = 1, a sample of mn1 binary random variables Yij was generated with E(Yijjvi;xij) = exp(xij +vi)=(1+exp(vi +xij)). Various combinations of (m;n1), Va, ?1 and initial values of estimated fl1, v1 ; ::: ;vm were used. For each combination, 1000 replications of random efiects vi and responses Yij were generated. The covariates xij were the same for all the simulations. Estimated regression coe?cients for various choices of (m;n1) with Va = 4, initial estimates set to 0, and ?1 = 1, are summarized in Table A.1. The tables display the means and standard errors of the simulated values of (^fl1 ?fl10)=s:e: and 95% confldence bounds for E(^fl1 ?fl10) based on Student?s t. From Table A.1, we can see that in combinations (20;20), (20;30) with Va = 4 and even for Va = 3 with (20;20), the bias is present. But their Kolmogorov-Smirnov test p values are larger than 0.1. This can be explained in terms of the consistency condition logm=(logn1)2 ! 0 [Jiang ([14], Ex.3.2]. Among all the pairs of sample sizes for simulation, (20;20) and (20;30) have the highest values of logm=(logn1)2, 0:334 and 0:259 respectively. For all combinations from Table A.1 whose 95% confldence intervals including 0. In order to check whether the computations of ^fl1 are sensitive to initial values, we ran various combinations of m, n1, Va and ?1 with initial values set to 0. For 134 the same vi and yij, we initialized ^fl1 at 3 and at logit(y::=(1?y::)). In each case the ^vi were initialized at 0. This comparison was repeated 1000 times. The estimated standardized regression coe?cients, standardized by conditional standard error and estimated conditional standard error, are summarized in Table A.2. 135 Table A.1: Simulated standardized estimates of logistic regression coe?cient (stan- dardized by true and estimated conditional standardized error) combining flxed intercept with random efiects. (^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e: (m, n1) mean std 95% CI for ^fl1 ?fl10 mean std Va = 4, initials set to 0, ?1=1 (10,20) -0.017 1.021 (-0.127, 0.293) -0.015 1.010 (10,30) 0.025 1.009 (-0.075, 0.260) 0.024 1.002 (10,40) 0.021 0.980 (-0.086, 0.195) 0.021 0.977 (20,20) 0.077 1.012 (0.082, 0.665) 0.076 1.011 (20,30) 0.075 0.995 (0.059, 0.523) 0.075 0.990 (20,40) 0.054 0.995 (-0.019, 0.384) 0.053 0.991 (20,50) 0.049 1.012 (-0.034, 0.330) 0.048 1.009 Va = 3, initials set to 0, ?1 = 1 (20,20) 0.054 0.938 (-0.008, 0.509) 0.053 0.932 136 Table A.2: Simulated standardized estimate of logistic regression coe?cient for var- ious initial values. (^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e: initial ^fl1 mean std 95% CI for ^fl1 ?fl10 mean std Va = 4, (m;n1) = (10;20), ?1=1 0 -0.017 1.021 (-0.127, 0.293) 0.015 1.011 3 -0.014 0.878 (-0.150, 0.277) 0.009 1.022 logit -0.020 0.896 (-0.150, 0.277) 0.015 1.011 Va = 4, (m;n1) = (20;50), ?1 = 1 0 0.048 1.011 (-0.034, 0.330) 0.048 1.009 logit 0.022 0.058 (-0.034, 0.330) 0.048 1.009 Va = 4, (m;n1) = (20;20), ?1 = 1 0 0.077 1.012 (0.082, 0.665) 0.076 1.001 logit 0.074 0.839 (0.152, 0.696) 0.087 0.940 Va = 4, (m;n1) = (20;30), ?1 = 1 0 0.075 0.995 (0.059, 0.523) 0.075 0.990 logit 0.062 0.845 (0.114, 0.562) 0.087 0.958 137 AswecanseefromTableA.2, theestimatesarebiasedforlargerlogm=(logn1)2. Moreover, the variance of (^fl1?fl10)=s:e: is sensitive to the initial guess of ^fl1. Also in the combination (20;20) with Va = 4 and initial ^fl1 equal to logit, their Kolmogorov- Smirnov test rejects the hypothesis of normality with p values 0.043 and 0.019. For all the other combinations from Table A.2, their Kolmogorov-Smirnov test p values are larger than 0.1. Since in reality we do not know how to choose ?1, we tried various values of ?1 to assess the efiect of ?1 on normality and consistency. The Monte Carlo averages and standard deviations of (^fl1 ? fl10)=s:e: are displayed in Table A.3 for various (m;n1), ?1, initial values and Va. The estimates are computed for ?1 = 0:1, 1 and 5 for the same data, and this comparison was repeated 1000 times. Table A.3 shows that extremely large values of the penalty parameter (?1 = 5) did afiect the consistency. For Table A.3, the combinations (20;20) with Va = 4 have their the Kolmogorov-Smirnov test rejects the hypothesis of normality with p values between 0.019 and 0.043, except for ?1 = 5 whose p value is bigger than 0.1. For all the other combinations from Table A.3, their Kolmogorov-Smirnov test p values are larger than 0.1. Generally speaking, the plots show that ^fl1 is approximately normal, with departures from normality in the extreme tails. 138 Table A.3: Simulated standardized estimate of logistic regression coe?cient for var- ious ?1 values. (^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e: ?1 mean std 95% CI for ^fl1 ?fl10 mean std Va = 4, (m;n1) = (10;20), initials set to 0 1 -0.017 0.957 (-0.229, 0.164) -0.018 0.947 0.1 -0.020 0.896 (-0.150, 0.277) -0.009 1.002 5 -0.020 0.914 (-0.164, 0.258) 0.003 1.020 Va = 3, (m;n1) = (20;20), initials set to 0 1 0.054 0.938 (-0.008, 0.509) 0.053 0.932 0.1 0.037 0.889 (-0.021, 0.520) 0.052 0.975 5 0.037 0.889 (-0.021, 0.520) 0.052 0.975 Va=4, (m;n1) = (20;50), initials set to 0 1 0.048 1.012 (-0.034, 0.330) 0.048 1.009 0.1 0.022 0.858 (-0.034, 0.330) 0.048 1.009 5 0.022 0.858 (0.034, 0.400) 0.070 1.017 Va = 4, (m;n1) = (20;20), initials set to 0 1 0.077 1.012 (0.082, 0.665) 0.076 1.001 0.1 0.074 0.839 (0.250, 0.800) 0.087 0.940 5 0.078 0.855 (0.250, 0.800) 0.108 0.959 139 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 Va=4, m=10, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 Va=4, m=20, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 Va=3, m=20, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Va=4, m=20, n1=30 Figure A.1: Q-Q plots for (^fl1?fl10) standardized by true conditional standard error combining a with vi, for various (m;n1) in logistic random intercept case 1. 140 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 Va=4, m=10, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 Va=4, m=20, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 Va=3, m=20, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Va=4, m=20, n1=30 Figure A.2: Q-Q plots for (^fl1?fl10) standardized by estimated conditional standard error combining a with vi, for various (m;n1) in logistic random intercept case 1. 141 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 initials(0,rep(0,m)), (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 initials(logit,rep(0,m)), (20,50) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 initials(3,rep(0,m)), (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 initials(0,rep(0,m)), (20,50) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 initials(logit,rep(0,m)), (10,20) Figure A.3: Q-Q plots for (^fl1?fl10) standardized by true conditional standard error combining a with vi, for various initial values of ^fl1 in logistic random intercept case 1. 142 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 initials(0,rep(0,m)), (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 initials(0,rep(0,m)), (20,50) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 initials(3,rep(0,m)), (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 initials(logit,rep(0,m)), (20,50) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 initials(logit,rep(0,m)), (10,20) Figure A.4: Q-Q plots for (^fl1 ? fl10) standardized by estimated standard error combining a with vi, for various initial values of ^fl1 in logistic random intercept case 1. 143 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 lambda1=1, (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 lambda1=1, (20,50) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 lambda1=0.1, (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 lambda1=0.1, (20,50) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 lambda1=5, (10,20) Figure A.5: Q-Q plots for (^fl1?fl10) standardized by true conditional standard error combining a with vi, for various values of ?1 in logistic random intercept case 1. 144 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 lambda1=1, (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 lambda1=1, (20,50) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 lambda1=0.1, (20,50) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 lambda1=0.1, (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 lambda1=5, (10,20) Figure A.6: Q-Q plots for (^fl1?fl10) standardized by estimated conditional standard error combining a with vi, for various values of ?1 in logistic random intercept case 1. 145 A.1.2 Case 1 logistic random intercept when a and vi are estimated separately. We consider the following example from Jiang [14]: logitP(yij = 1jv) = a+vi +xijfl1: (A.2) Since ^a0 and ^fl1 have difierent convergence rates, we do not consider joint asymptotic distribution. We are interested in ^fl1 and use the estimating equation (??) and the Splus function nlminb to investigate the asymptotic behavior of ^fl1. We use a balanced setup with m clusters and n1 observations per cluster. We generate m random efiects vi from the normal distribution with mean 0 and variance Va. Conditionally on (xij;vi) with fl10 = a0 = 1, a sample of mn1 random variables Yij is generated from Bernoulli distribution with E(Yijjvi;xij) = exp(xijfl1 + a + vi)=(1+exp(xijfl1 +a+vi)). Various combinations of (m;n1), Va, ?1 were simulated and all combinations initialized ^a, ^fl1 and ^vi at zero. For each combination, 1000 random replications of random efiects vi and the responses Yij are generated. Estimated standardized regression coe?cients ^fl1 stan- dardized by true conditional standardized error and estimated conditional standard- ized error are summarized in Table A.4. 146 Table A.4: Simulated standardized estimates of logistic regression coe?cient (stan- dardized by true and estimated conditional standardized error) with a and vi esti- mated separately. (^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e: (m;n1) mean std 95% CI for ^fl1 ?fl10 mean std Va = 4, initials set to 0, ?1=1 (10,20) -0.007 0.999 (-0.207, 0.223) -0.009 0.976 (20,20) 0.018 0.903 (-0.344, 0.194) -0.019 0.890 (20,30) 0.011 0.970 (-0.190, 0.280) 0.010 0.963 (20,40) 0.054 1.030 (-0.110, 0.276) 0.053 0.956 Va = 3, initials set to 0, ?1 = 1 (20,20) 0.020 0.924 (-0.160, 0.370) 0.018 0.910 (20,30) 0.020 0.975 (-0.159, 0.290) 0.020 0.967 Va = 2, initials set to 0, ?1 = 1 (20,20) 0.014 0.963 (-0.190, 0.340) 0.012 0.950 For the above Table (A.4), the 95% confldence intervals for (^fl1 ?fl10) include 0. Normality holds for (^fl1 ? fl10) standardized by true conditional standard error given v0. The mean and standard error estimates are close to 0 and 1 respectively, except for the pairs as (20;20) and (20;30) with Va = 4. In these cases, their the Kolmogorov-Smirnov test rejects the hypothesis of normality with p-values of 0.0179 and 0.0171 respectively. Even for Va = 3 , (m;n1) = (20;20), the p-value of 0.06 is 147 almost signiflcant. Normality holds for (^fl1?fl10) standardized by estimated conditional standard error given v0. The mean and standard error estimates are close to 0 and 1 respec- tively, except for the pairs as (20;20), (20;30) with Va = 4 and (20;20) with Va = 3. In these cases, the Kolmogorov-Smirnov test rejects the hypothesis of normality with p-values of 0.014, 0.011, 0.04 respectively. For all the other combinations for Table A.4, their Kolmogorov-Smirnov test p values are larger than 0.1. The 95% confldence intervals for (^a?a0) do not include 0 (the intervals are not listed in the table but all were approximately (0:1;0:4)). The Kolmogorov-Smirnov test applied to Monte Carlo distribution of ^a signiflcantly rejects the hypothesis of normality. From the theoretical results of Chapter 3, we know the convergence rates difier between estimated logistic regression coe?cient and intercept, with rates approximately 1=N and 1=m, respectively. Here we have sample sizes from 200 to 1000, but number of clusters 10 or 20. Even in a small simulation with (m;n1) = (40;60) with 200 replications, there is bias present for estimated logistic regression intercept. The Q-Q plots show that the standardized (^fl1?fl10) is approximately normal, with departures from normality in the extreme tails. 148 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 Va=4, m=10, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 Va=4, m=20, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 Va=3, m=20, n1=30 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 Va=2, m=20, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 Va=4, m=20, n1=30 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 Va=3, m=20, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 3 Va=4, m=20, n1=40 Figure A.7: Q-Q plot for (^fl1?fl10) standardized by true conditional standard error not combining a with vi, for various (m;n1) in logistic random intercept case 1. 149 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 Va=4, m=10, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 Va=4, m=20, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 Va=3, m=20, n1=30 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 Va=2, m=20, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Va=4, m=20, n1=30 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 Va=3, m=20, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 -1 0 1 2 3 Va=4, m=20, n1=40 Figure A.8: Q-Q plot for (^fl1?fl10) standardized by estimated conditional standard error with estimated a with vi separately, for various (m;n1) in logistic random intercept case 1. 150 A.1.3 Case 1 Poisson random intercept combining a and vi. We use the PGWLS method by Jiang [14] and nlminb minimization function in Splus to get estimators which minimize negative lP( ) as (??), Here we have ?lP( ) = X i X j (?yij?ij +exp(?ij))+ ?12 m(v)2 where ?ij = ai + fl1xij, ai = a + vi = vi (in our simulation we let a = 0) and v = Pmi vi=m, = (fl1;v1; ::: ;vm). Here a and fl1 are flxed parameters, xij is a scalar valued predictor, and vi are iid N(0;Va). We do not attempt to estimate a and vi separately. For simplicity we simulated the balanced model with m clusters and n1 observations per cluster. We generate m random efiects vi from normal distribution with mean 0 and variance Va. Covariates xij are the same here as in logistic Case 1. Conditionally on (xij;vi) with fl10 = 1, a sample of mn1 random variables Yij were generated from Poisson distribution with E(Yijjvi;xij) = exp(xij +vi). Various combinations of (m,n1), Va and ?1 values are investigated. For each combination, 1000 replications of random efiects vi and responses Yij were generated. Estimated regression coe?cients, for various choices of (m;n1) with Va = 4, ?1 = 1 and initial estimates set to 0, are summarized in Table A.5. The table displays the means and standard errors of the simulated values of (^fl1 ? fl10)=s:e: and 95% confldence bounds for E(^fl1 ?fl10) based on Student?s t. 151 In order to check sensitivity of ?1 values, the estimates are computed for ?1 = 0:1, 1 and 5 for the same data, and this comparison was repeated 1000 times. From Table A.5 and Table A.6, consistency and normality results hold and are not sensitive to the values of ?1. In these cases, their Kolmogorov-Smirnov test p values are larger than 0.1 except for the combination (20;20) with Va = 4. In this case, the Kolmogorov-Smirnov test almost rejects the normality hypothesis with p value 0.06. The Q-Q plot shows that standardized (^fl1 ?fl10) is approximately normal. Table A.5: Simulated standardized estimate of Poisson regression coe?cient for various (m;n1) values. (^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e: (m;n1) mean std 95% CI for ^fl1 ?fl10 mean std Va = 4, ?1 = 1, initials set to 0 (10,20) 0.018 0.970 (-0.022, 0.064) 0.018 0.969 (20,20) -0.005 1.008 (-0.050, 0.054) 0.005 1.008 (20,30) -0.0006 1.007 (-0.042, 0.039) 0.006 1.007 152 Table A.6: Simulated standardized estimate of logistic regression coe?cient for var- ious ?1 values. (^fl1 ?fl10)=s:e: (^fl1 ?fl10)=ds:e: ?1 mean std 95% CI for ^fl1 ?fl10 mean std Va = 4, (m;n1) = (10;20), initials set to 0 1 0.018 0.970 (-0.022, 0.064) 0.018 0.969 0.1 0.015 1.100 (-0.026, 0.064) 0.013 1.024 5 0.028 1.021 (-0.018, 0.064) 0.022 0.965 Va = 4, (m;n1) = (20;30), initials set to 0 1 0.006 1.007 (-0.042, 0.039) 0.006 1.007 0.1 0.001 1.025 (-0.044, 0.033) -0.002 1.008 5 0.009 1.012 (-0.038, 0.043) 0.006 1.003 153 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Va=4, m=10, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Va=4, m=20, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Va=4, m=20, n1=30 Figure A.9: Q-Q plots for (^fl1?fl10) standardized by true conditional standard error with a and vi estimated separately in Poisson random intercept case 1. 154 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Va=4, m=10, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Va=4, m=20, n1=20 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Va=4, m=20, n1=30 Figure A.10: Q-Q plots for(^fl1?fl10) standardized by estimated conditional standard error with a and vi estimated separately in Poisson random intercept case 1. 155 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Lamda1=1, (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Lamda1=1, (20,30) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -4 -2 0 2 lambda1=5, (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 lambda1=5, (20,30) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -4 -2 0 2 4 lambda1=0.1, (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -4 -2 0 2 4 lambda1=0.1, (20,30) Figure A.11: Q-Q plots for (^fl1?fl10) for standardized by true conditional standard error with a and vi estimated separately, for various values of ?1 in Poisson random intercept case 1. 156 Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Lamda1=1, (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 Lamda1=1, (20,30) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 0 2 lambda1=0.1, (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -2 0 2 lambda1=0.1, (20,30) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 lambda1=5, (10,20) Quantiles of Standard Normal Normalized estimate of beta -2 0 2 -3 -2 -1 0 1 2 3 lambda1=5, (20,30) Figure A.12: Q-Q plots for (^fl1?fl10) standardized by estimated conditional standard error with a and vi estimated separately, for various values of ?1 in Poisson random intercept case 1. 157 A.1.4 Case 1 Poisson random intercept with a and vi estimated sep- arately We use the PGWLS method by Jiang [14] and nlminb minimization function in Splus to get estimators which minimize negative lP( ) as (??). Here we have ?lP( ) = X i X j (?yij?ij +exp(?ij))+ ?12 m(v)2 where ?ij = a+vi +fl1xij and v = Pmi vi=m, = (fl1; a; v1; ::: ;vm). Here a and fl1 are flxed parameters, xij is a scalar valued predictor, and vi are iid N(0;Va). We attempt to estimate a and vi separately. For simplicity we simulated the balanced model with m clusters and n1 observations per cluster. We generate m random efiects vi from normal distribution with mean 0 and variance Va. Covariates xij are the same here as in logistic Case 1. Conditionally on (xij;vi) with fl10 = a0 = 1, a sample of mn1 random variables Yij were generated from Poisson distribution with E(Yijjvi;xij) = exp(fl1xij +vi +a). The same choices of (m;n1) and Va values as in the previous subsection were simulated. Unlike the results when only vi + a were estimated, as in the previous sub- section, when we attempted to estimate a and vi separately, none of the desired asymptotic results for ^fl1 were observed. The estimate ^fl1 was biased and its Monte Carlo distribution failed tests of normality. 158 Bibliography [1] E. B. Andersen, \Asymptotic properties of conditional maximum-Likelihood estimators", J. Roy. Statist. Soc. Ser. B 32, 283-301 (1970). [2] R. H. Berk, \Consistency and asymptotic normality of MLE?s for exponential models," The Annals of Mathematical Statistics, 43, 193-204 (1972). [3] R. Bhatia, Matrix Analysis, Springer (1991). [4] O. Barndorfi-Nielsen, \On a formula for the distribution of the maximum like- lihood estimate," Biometrika, 70, 343-365 (1983). [5] N. Breslow \Odds ratio estimators when the data are sparse," Biometrika, 68, 73-84 (1981). [6] E. Cantoni, \Robust inference for generalized linear models," J. Amer. Statist. Assoc 96, 1022-1030 (2001). [7] J. M. Chiou and H. G. Muller, \Estimated estimating equations: semiparamet- ric inference for clustered and longitudinal data," J. Roy. Statist. Soc. Ser.B 67, 531-553 (2005). [8] N. E. Breslow and D. G. Clayton, \Approximate inference in generalized linear mixed models," J. Amer. Statist. Assoc. 88, 9-25 (1993). [9] A. C. Davison, \Approximate conditional inference in generalized linear mod- els," J. Roy. Statist. Soc. Ser. B, 50, 445-461 (1988). [10] I. Domowitz, \Misspecifled models with dependent observations," J. of Econo- metrics 20, 35-38 (1982). [11] L. Fahrmeir and H. Kaufmann, \Consistency and asymptotic normality of the maximum likelihood estimators in generalized linear models," The Annals of Statistics 13, 342-368 (1985). [12] L. Fahrmeir, \Maximum likelihood estimation in misspecifled generlized linear models," Akademie-Veriag Berlin statistics 21 4, 487-502 (1990). [13] S. J. Haberman, \Maximum likelihood estimates in exponential response mod- els," Annals of Statistics 5, 815-841 (1977). 159 [14] J. Jiang, \Conditional inference about generalized linear mixed models," The Annals of Statistics 27, 1974-2007 (1999). [15] J. Jiang, H. Jia and H. Chen, \Maximum posterior estimates of random efiects in generalized linear mixed models," Statistica Sinica 11, 97-120 (2001). [16] X. P. Jiang, Ph.D. thesis, \Nonparametric quasi-likelihood in longitudinal data analysis," University of Maryland, College Park, (2004). [17] Y. Lee and J. A. Nelder, \Hierarchical generlized linear models," J. Roy. Statist. Soc. Ser. B 58, 619-678 (1996). [18] H. Li, B. G. Lindsay and R. P. Waterman, \E?ciency of projected score meth- ods in rectangular array asymptotics," J. Roy. Statist. Soc. Ser. B, 65, 191-208 (2003) [19] P. McCullagh and J. A. Nelder, Generalized Linear Models, (2nd ed. Chapman and Hall, New York, 1989). [20] P. McCullagh, \Quasi-likelihood functions," Annals of Statistics 11, 59-67 (1983). [21] C. E. McCulloch, \Maximum likelihood algorithems for generalized linear mixed models," J. Amer. Statist. Assoc. 92, 162-170 (1997). [22] C. E. McCulloch and S. R. Searle, Generalized, Linear, and Mixed Models (Wiley-Interscience Publication, 2001). [23] J. A. Nelder and R. W. M. Wedderburn, \Generalized linear models," J. Roy. Statist. Soc. Ser.A 135, 370-384 (1972). [24] J. Neyman and E. L. Scott, \Consistent estimates based on partially consistent observations," Ecomometrika. 16, 1-32 (1948). [25] J. Robins, N. Breslow and S. Greenland, \Estimators of the Mantel-Haenszel variance consistent in both sparse data and large-strata limiting models," Bio- metrics 42, 311-323 (1986). [26] P. Royston \A remark on algorithm AS 181: The W test for normality" Applied Statistics 44 547-551 (1995). [27] S.S.Shapiro and M.B.Wilk \An analysis of variance test for normality (complete samples)" Biometrika 52 591-611 (1965) 160 [28] N. Sartori and T. A. Severini, \Conditional likelihood inference in generalized linear mixed models," Statistica Sinica 14, 349-360 (2004). [29] J. R. Schott, Matrix Analysis for Statistics [30] S. K. Sinha, \Robust analysis of generalized linear mixed models," J. Amer. Statist. Assoc. 99, 451-460 (2004). [31] G. W. Stewart, Introduction to Matrix Computations (Academic Press, 1973). [32] A. W. Van der Vaart, Asymptotic Statistics (Cambridge, 1998). [33] R. W. M. Wedderburn, \On the existence and uniqueness of the maximum likelihoodestimatesforgeneralizedlinearmodels," Biometrika63, 27-32(1976). [34] R. W. M. Wedderburn, \Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method," Biometrika, 61, 439-447 (1974). [35] X. Xie and Y. Yang, \Asymptotics for generalized estimating equations with large cluster sizes," The Annals of Statistics 31, 310-347 (2003). [36] S. Yusuf, R. Peto, J. Lewis, R. Collins and P. Sleight \Beta blockade during and after myocardial infartion: an overiew of the randomized trials" Progress in Cardiovascabr Diseases 27 335-371 (1985) [37] S. L. Zeger and K. -Y. Liang, \Longitudinal data analysis for discrete and continuous outcomes," Biometrics, 42, 121-130 (1986). [38] M. Znidaric, \Asymptotic expansion for inverse moments of Binomial and Pois- son Distributions?. 161