ABSTRACT

Title of dissertation: ON ROBUSTNESS IN SOME EXTENDED REGRESSION MODELS
Gabriela V. Cohen Freue, Doctor of Philosophy, 2004
Dissertation directed by: Professor Paul J. Smith, Department of Mathematics

Generalized Linear Models extend classical regression models to non-normal response variables and allow a non-linear relation between the mean of the responses and the predictors. In addition, when the responses are correlated or show overdispersion, one can add a linear combination of random components to the linear predictor. The resulting models are known as Generalized Linear Mixed Models. Traditional estimation methods in these classes of models rely on distributional assumptions about the random components, as well as the implicit assumption that the explanatory variables are uncorrelated with the error term. In Chapters 2 and 3 we investigate, using the Change-of-Variance Function, the behavior of the asymptotic variance-covariance matrix of the class of M-estimators when the distribution of the random components is slightly contaminated. In Chapter 4 we study a different concept of robustness for classical models that contain explanatory variables correlated with the error term. For these models we propose an instrumental variables estimator and study its robustness by means of its Influence Function.

We extend the definitions of the Change-of-Variance Function to Generalized Linear Models and Generalized Linear Mixed Models, and use them to analyze in detail the sensitivity of the asymptotic variance of the maximum likelihood estimator. For the first class of models we found that, in general, a contamination of the distribution can seriously affect the asymptotic variance of the estimators. For the second class, we focus on the Poisson-Gamma model and two mixed-effects Binomial models, and found that the effect of a contamination in the mixing distribution on the asymptotic variance of the maximum likelihood estimator remains bounded for both models. A simulation study was performed in all cases to illustrate the relevance of our results. Finally, we propose a robust instrumental variables estimator based on high breakdown point S-estimators of location and scatter. The resulting estimator has a bounded Influence Function and satisfies the usual asymptotic properties for suitable choices of the underlying S-estimator. We also derive an estimate of the asymptotic covariance matrix of our estimator which is robust against outliers and leverage points. We illustrate our results using a real data example.

ON ROBUSTNESS IN SOME EXTENDED REGRESSION MODELS

by Gabriela V. Cohen Freue

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2004

Advisory Committee:
Associate Professor Paul J. Smith, Chair/Advisor
Professor Grace L. Yang
Professor Abraham M. Kagan
Professor Benjamin J. Kedem
Professor Francis Alt

© Copyright by Gabriela V. Cohen Freue, 2004

DEDICATION

In memory of my grandparents, Maria and Dante

ACKNOWLEDGMENTS

I am very grateful to my advisor, Professor Paul Smith, for giving me an invaluable opportunity to work on interesting projects over the past four years. He has always made himself available for help and provided good insights. I would also like to thank Professor Grace Yang. She was there from the very beginning, when I was searching for a dissertation topic.
She was always very enthusiastic about my work, encouraged me and gave me invaluable advice throughout these years. Special thanks are extended to the remaining members of my committee, Professor Abraham Kagan and Professor Benjamin Kedem, for their useful comments and suggestions that enhanced the quality of this work. Professor Ruben Zamar deserves a special mention and my sincere appreciation. I would like to thank him for his support and helpful comments. It has been a pleasure to work with and learn from such an extraordinary individual. I would also like to acknowledge the staff members of the Mathematics Department for creating a friendly environment and always being available for help. In particular, I thank Timothy Strobell and Tony Zhang for their help and support with computer software. Many thanks go to my colleague and great friend Ru for being a tremendous source of strength and enriching my graduate life. I thank my friends at Graduate Hills, who have been a crucial factor in my finishing smoothly, especially Maria Jose and Juan Gabriel for their help with computer problems. I owe my deepest thanks to my family, especially my parents, Josefina and Jaime, my sister, Silvina, and my brother, Guillermo, who have always stood by me and given me unwavering support and love. I am very grateful to my in-laws for always being there and for showing support. I would also like to thank Boris for believing in me and always encouraging me to pursue my dreams. Words cannot express the gratitude I owe to my husband, Hernan, who has guided me through my career and has pulled me through against impossible odds at times. I want to thank him for sharing the happiness of my achievements, for his understanding in hard times and his unrelenting love. A special thank you goes to my little son Lucas, for his patience during the last months of work and for bringing so much light to my life. Lastly, thank you all and thank God!

TABLE OF CONTENTS

List of Tables
List of Figures

1 Literature review
  1.1 Introduction
  1.2 Generalized Linear Models
    1.2.1 The model
    1.2.2 Estimation
  1.3 Generalized Linear Mixed Models
    1.3.1 The model
    1.3.2 Estimation
  1.4 Endogenous covariates
    1.4.1 Linear model with endogeneity
    1.4.2 Estimation
  1.5 Robustness
    1.5.1 M-estimates
    1.5.2 Influence Function
    1.5.3 Change-of-variance function
    1.5.4 Extensions to more general models
    1.5.5 Robust Multivariate Location and Scatter Matrix Estimation
2 Change-of-variance function in GLMs
  2.1 Introduction
  2.2 CVF of the M-estimators
    2.2.1 Examples
  2.3 Robust M-estimators
    2.3.1 A class of robust M-estimators
    2.3.2 CVS
  2.4 Simulation
    2.4.1 The Logistic model
    2.4.2 The contaminated model
  2.5 Conclusions

3 Change-of-variance function in GLMMs
  3.1 Introduction
  3.2 The CVF of the M-estimators
    3.2.1 The MLE
  3.3 The Poisson-Gamma Model
    3.3.1 The CVF
    3.3.2 V-Robustness of Parameter Estimates
  3.4 Mixed-Effects Binomial Models
    3.4.1 General Mixed-Effects Binomial Models
    3.4.2 V-Robustness of Parameter Estimates
  3.5 Simulation
    3.5.1 The Poisson-Gamma model
    3.5.2 The contaminated model
  3.6 Conclusions and future research

4 Robust Instrumental Variables Estimator
  4.1 Introduction
  4.2 Robust Instrumental Variables Estimator
  4.3 Properties of the RIV estimator
  4.4 Influence Function and Asymptotic Variance
  4.5 Example
  4.6 Conclusions

LIST OF TABLES

2.1 Summary of maximum likelihood (MLE) and robust (ROB) estimation for uncontaminated data in a Logistic Model.
2.2 Summary of maximum likelihood (MLE) and robust (ROB) estimation using $c = 1.2$, for various levels of contaminated data in a Logistic Model.
2.3 Summary of maximum likelihood (MLE) and robust (ROB) estimation using $c = .8$, for various levels of contaminated data in a Logistic Model.
2.4 Monte Carlo and estimated standard errors of the maximum likelihood (MLE) and the robust (ROB) estimators for various levels of contaminated data in a Logistic Model.
3.1 Means and variances of the MLE of the Poisson-Gamma model.
4.1 Three measures of strength for 62 Alaskan earthquakes (from Fuller, 1987; source Meyers and von Hake, 1976).
4.2 Contaminated datasets.
4.3 Estimates of the regression coefficients, standard errors (in parentheses) and variance-covariance matrix of each estimator.
4.4 Estimates of the regression coefficients, standard errors (in parentheses) and variance-covariance matrix of each estimator.

LIST OF FIGURES

2.1 Bias of the MLE and the robust (ROB) estimator using $c = 1.2$.
2.2 Bias of the MLE and the robust (ROB) estimator using $c = .8$.
2.3 Estimated standard errors of the MLE and the robust (ROB) estimator using $c = 1.2$.
2.4 Estimated standard errors of the MLE and the robust (ROB) estimator using $c = .8$.
2.5 Bias of the estimated standard errors of the MLE and the robust (ROB) estimator using $c = 1.2$.
2.6 Bias of the estimated standard errors of the MLE and the robust (ROB) estimator using $c = .8$.
3.1 Bias in the estimation of $\alpha$ for $\beta_0 = -2$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.2 Variance of the estimate of $\alpha$ for $\beta_0 = -2$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.3 Bias in the estimation of $\alpha$ for $\beta_0 = 0$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.4 Variance of the estimate of $\alpha$ for $\beta_0 = 0$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.5 Bias in the estimation of $\alpha$ for $\beta_0 = 2$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.6 Variance of the estimate of $\alpha$ for $\beta_0 = 2$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.7 Bias in the estimation of $\beta_0$ for $\beta_0 = -2$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.8 Variance of the estimate of $\beta_0$ for $\beta_0 = -2$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.9 Bias in the estimation of $\beta_0$ for $\beta_0 = 0$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.10 Variance of the estimate of $\beta_0$ for $\beta_0 = 0$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.11 Bias in the estimation of $\beta_0$ for $\beta_0 = 2$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.12 Variance of the estimate of $\beta_0$ for $\beta_0 = 2$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.13 Bias in the estimation of $\beta_1$ for $\beta_0 = -2$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.14 Variance of the estimate of $\beta_1$ for $\beta_0 = -2$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.15 Bias in the estimation of $\beta_1$ for $\beta_0 = 0$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.16 Variance of the estimate of $\beta_1$ for $\beta_0 = 0$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.17 Bias in the estimation of $\beta_1$ for $\beta_0 = 2$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.18 Variance of the estimate of $\beta_1$ for $\beta_0 = 2$, $\beta_1 = 1$, and different values of $\alpha$ under different levels of a lognormal contamination.
3.19 Bias in the estimation of $\alpha$ for $\beta_0 = 0$, $\beta_1 = 1$, and $\alpha = 2$ under different levels of a scaled $F_{6,6}$ contamination.
3.20 Variance of the estimate of $\alpha$ for $\beta_0 = 0$, $\beta_1 = 1$, and $\alpha = 2$ under different levels of a scaled $F_{6,6}$ contamination.
3.21 Bias in the estimation of $\beta_0$ for $\beta_0 = 0$, $\beta_1 = 1$, and $\alpha = 2$ under different levels of a scaled $F_{6,6}$ contamination.
3.22 Variance of the estimate of $\beta_0$ for $\beta_0 = 0$, $\beta_1 = 1$, and $\alpha = 2$ under different levels of a scaled $F_{6,6}$ contamination.
3.23 Bias in the estimation of $\beta_1$ for $\beta_0 = 0$, $\beta_1 = 1$, and $\alpha = 2$ under different levels of a scaled $F_{6,6}$ contamination.
3.24 Variance of the estimate of $\beta_1$ for $\beta_0 = 0$, $\beta_1 = 1$, and $\alpha = 2$ under different levels of a scaled $F_{6,6}$ contamination.
4.1 Measures of strength for 62 Alaskan earthquakes with the OIV (dashed line) and the RIV (solid line) fit.
4.2 Mahalanobis distances for the original dataset.
4.3 Weights assigned by the RIV estimator to each point of the original dataset.
4.4 Clean and contaminated datasets with the OIV (dashed line) and the RIV (solid line) fits. Solid points identify those that have been artificially contaminated.

Chapter 1
Literature review

1.1 Introduction

This thesis is focused on the theory of robust estimation in Generalized Linear Models (GLMs), Generalized Linear Mixed Models (GLMMs) and linear models with endogeneity. Traditional estimation methods in GLMs and GLMMs rely on distributional assumptions about the random components, as well as the implicit assumption that the explanatory variables are uncorrelated with the error term.
In most real-world applications these distributional assumptions are only approximately valid, and some covariates may be endogenous. During the past decades researchers have become increasingly aware that some statistical procedures can be extremely sensitive to small deviations from the model assumptions, which calls their empirical usefulness into question. Although there is extensive work on robust inference in the context of linear regression models, its extension to GLMs, GLMMs and linear models with endogeneity remains limited. In the following paragraphs I briefly describe these models and address the main focus of this thesis.

Many regression problems involve response variables that have a distribution other than the Normal distribution. A variety of models are commonly used in these cases. Logistic and Probit regressions are used to model binary response variables, Poisson regression is often used to model count data, and proportional hazards and accelerated failure time models are well-known models for survival times. Nelder and Wedderburn (1972) demonstrated the unity of these and other methods using the idea of Generalized Linear Models (GLMs). Generalized Linear Models extend classical linear regression in two main directions. First, GLMs accommodate non-normally distributed responses. Second, they allow a non-linear relation (the link function) between the mean of the response variable and a linear function of the predictors. These models are described in detail in Section 1.2.

A natural extension of GLMs is to add a linear combination of random components to the linear predictor. The resulting models are known as Generalized Linear Mixed Models (GLMMs). These models are widely used when the responses are correlated, such as those coming from longitudinal data studies, or when the data show overdispersion, as in the Poisson-Gamma model. A description of them is given in Section 1.3.

Most of the methods of estimation considered for both GLMs and GLMMs rely on strong distributional assumptions (e.g., Nelder and Wedderburn, 1972; McCullagh and Nelder, 1989; McGilchrist and Yau, 1995; Lee and Nelder, 1996). However, their sensitivity to small deviations from these assumptions has not been extensively studied. By the 1960s statisticians were concerned by the fact that the performance of some estimators was very unstable under small deviations from idealized distributional assumptions. Various measures of robustness were introduced for the location estimation problem, such as the Influence Function (Hampel, 1968, 1974) and the Change-of-Variance Function (Rousseeuw, 1981); Section 1.5 briefly describes these concepts. Their extensions to estimators of multiple linear regression parameters have been developed (Hampel, 1973; Huber, 1973; Ronchetti and Rousseeuw, 1985). However, many concepts have not yet been explored for GLMs and GLMMs, for example the Change-of-Variance Function (CVF).

In Chapters 2 and 3 we investigate, using the CVF, the behavior of the asymptotic variance-covariance matrix of the class of M-estimators of GLMs and GLMMs, respectively, when the distribution of the random components is slightly contaminated. Given that the notion of the CVF had not been extended to these models before, we extend its definition and analyze its behavior for some existing M-estimators in the literature.

Another generally implicit assumption in GLMs is that the covariates are uncorrelated with the error term. In practice, however, this may not be true.
That is, some explanatory variables may be endogenous. Neglecting this endogeneity may cause a severe bias in the parameter estimates. Linear models with endogeneity problems, in particular, have been widely studied; Section 1.4 includes a brief review of these models. However, most of the existing estimation methods for these models are extremely sensitive to outliers and influential points. In Chapter 4 we propose robust instrumental variables estimators for linear models with endogenous covariates. These estimators are constructed using high breakdown point S-estimators of multivariate location and scatter matrix. Their robustness is investigated by means of the Influence Function (IF). Moreover, diagnostic techniques to identify outliers and influential points in the sample are developed.

The remainder of this Chapter consists of a brief survey of the literature on GLMs (Section 1.2), GLMMs (Section 1.3) and linear models with endogeneity (Section 1.4), together with a description of some elements of robustness theory that are used throughout this thesis (Section 1.5).

1.2 Generalized Linear Models

Nelder and Wedderburn (1972) introduced Generalized Linear Models (GLMs) as a unified framework for models that had previously been studied in the literature, such as linear regression, introduced over two hundred years ago by Gauss and Legendre (e.g., Stigler, 1981), logit models (e.g., Berkson, 1944) and probit models (e.g., Bliss, 1934). Even though all the pieces had already existed, these authors were the first to show the similarities between seemingly disparate methods. Using a methodology analogous to that developed for linear models, GLMs can be used to model response variables having a distribution belonging to an exponential family of distributions. Furthermore, the relationship between the response and the explanatory variables does not need to be linear. Section 1.2.1 presents the general setup of GLMs and Section 1.2.2 some of their methods of estimation.

1.2.1 The model

GLMs are built under the following set of assumptions:

- Consider an $n$-dimensional vector of independent random responses $Y$. Conditionally, given that the vector of explanatory variables $X_i = x_i$, each response variable $Y_i$ has a distribution in the exponential family, taking the form
$$f_{Y_i|X_i}(y_i|x_i,\theta_i,\phi) = \exp\left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + h(y_i,\phi)\right\}, \qquad i = 1,\dots,n, \qquad (1.1)$$
for some specific functions $a(\cdot)$, $b(\cdot)$ and $h(\cdot)$. If the dispersion parameter $\phi$ is known, this density belongs to an exponential family with canonical parameter $\theta_i$.

- An arbitrary function of the conditional mean of the response is modelled as linear in the predictors, i.e.,
$$g(E[Y_i|x_i]) = \eta_i = x_i\beta, \qquad (1.2)$$
where $\beta$ is the vector of unknown parameters that we want to estimate. The function $g(\cdot)$ is called the link function and $\eta_i$ is called the linear predictor. If $\theta_i = \eta_i$, $g(\cdot)$ is called the canonical link.

- Assume that $X_1,X_2,\dots,X_n$ are iid random vectors with a marginal density $u(x)$ which does not depend on the unknown vector of parameters $\beta$.

It can be easily shown that
$$E[Y_i|x_i] = \mu_i = b'(\theta_i) \quad\text{and}\quad V[Y_i|x_i] = b''(\theta_i)\,a(\phi), \qquad (1.3)$$
where primes denote differentiation with respect to $\theta_i$. Thus the variance of $Y_i$ is the product of two functions: the variance function $b''(\cdot)$, depending only on $\theta_i$ (and hence on the mean $\mu_i$), and another function depending on the dispersion parameter $\phi$. Moreover, from (1.1), (1.2) and (1.3) one can derive the score statistic
$$S_j = \sum_{i=1}^{n}\left[\frac{\partial l(\theta_i;y_i)}{\partial\theta_i}\,\frac{d\theta_i}{d\mu_i}\,\frac{d\mu_i}{d\eta_i}\,\frac{\partial\eta_i}{\partial\beta_j}\right] = \sum_{i=1}^{n}\left[\frac{(y_i-\mu_i)}{a(\phi)}\,\frac{1}{V}\,\frac{d\mu_i}{d\eta_i}\,x_{ij}\right], \qquad (1.4)$$
where $l(\theta_i;y_i)$ is the log-likelihood function of each component of $Y$.
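As a concrete illustration of (1.1)-(1.3) (a standard worked example, not part of the original exposition), consider a Poisson response with mean $\mu_i$. Its probability function can be written as
$$f(y_i|\theta_i) = \exp\left\{y_i\theta_i - e^{\theta_i} - \log y_i!\right\},$$
so that $\theta_i = \log\mu_i$, $b(\theta_i) = e^{\theta_i}$, $a(\phi) = 1$ and $h(y_i,\phi) = -\log y_i!$. Then (1.3) gives $E[Y_i|x_i] = b'(\theta_i) = e^{\theta_i} = \mu_i$ and $V[Y_i|x_i] = b''(\theta_i)\,a(\phi) = \mu_i$, and the canonical link is $g(\mu) = \log\mu$.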
1.2.2 Estimation

Estimates of parameters for GLMs can be obtained using methods based on maximum likelihood (Nelder and Wedderburn, 1972). These estimates are generally computed not as global maximizers of the log-likelihood function, but as the roots of the score statistics (1.4), which correspond to local maxima. For many important models, however, global and local maxima coincide. Explicit mathematical expressions for the estimators can be found only in some special cases (such as the Normal or the exponential distribution); in general, numerical methods such as Newton-Raphson or Fisher's scoring method are needed. It can be shown that the latter is equivalent to an iteratively reweighted least squares algorithm. GLMs were incorporated in the GENSTAT statistics package and in the GLIM software. Now, most major statistical packages, such as SAS, S-Plus and R, have facilities for GLMs.

Conditions for the existence and uniqueness of the MLE have been studied for various models based on concavity of the log-likelihood (Haberman, 1974; Wedderburn, 1976; Silvapulle, 1981; Kaufmann, 1988). However, these conditions are difficult to check in practice. Moreover, under regularity conditions, asymptotic existence and uniqueness, consistency and asymptotic normality of the MLE have been proved (e.g., Haberman, 1977; Fahrmeir and Kaufmann, 1985).

An important extension of GLMs is the approach known as quasi-likelihood models, introduced by Wedderburn (1974). Noting that the score function defined in (1.4) depends on the parameters only through the mean $\mu_i$ and the variance $b''(\theta_i)a(\phi)$, the full distributional assumption about the random component was replaced by a weaker assumption in which only the first and second moments have to be specified. The estimators derived from the score equations in this manner are called maximum quasi-likelihood estimators (MQLE). When the distribution of $Y_i$ belongs to an exponential family, the MQLE are the MLE. Under general conditions, Fahrmeir (1990) proved the consistency and asymptotic normality of the MQLE.

In the past decade, Bayesian methods have been developed for analyzing GLMs (Dey et al., 1999; Fahrmeir and Kaufmann, 1985). Bayesian models assume that $\beta$ is a random vector with prior density $p(\beta)$. It is well known that an optimal estimator for $\beta$ under quadratic loss is the posterior mean. However, its computation requires solving integrals having the dimension of $\beta$, which in general is not feasible. Some methods based on numerical or Monte Carlo integration have been proposed (e.g., Naylor and Smith, 1982; Smith et al., 1985). In general, application of these methods is limited to models having a low-dimensional parameter vector $\beta$. For higher dimensions, MCMC simulation techniques are commonly used (e.g., Dellaportas and Smith, 1993; Clayton, 1996). Based on samples drawn from the posterior density, posterior means and variances can be approximated by their sample analogs. Moreover, posterior mode estimation is an alternative to full posterior mean estimation (e.g., Laird, 1978; Duffy and Santner, 1989). As the posterior mode estimator maximizes the posterior density, no integration problem needs to be solved.
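To make the Fisher scoring / iteratively reweighted least squares equivalence concrete, the following minimal R sketch (mine, with simulated data; not from the thesis) fits a Poisson GLM with the canonical log link by IRLS and checks the result against glm:

set.seed(1)
n <- 200
x <- cbind(1, runif(n))                  # design matrix with an intercept
beta_true <- c(0.5, 1)
y <- rpois(n, exp(x %*% beta_true))

beta <- c(0, 0)                          # starting value
for (it in 1:25) {
  eta <- drop(x %*% beta)                # linear predictor
  mu  <- exp(eta)                        # inverse canonical link
  w   <- mu                              # working weights: (dmu/deta)^2 / V(mu) = mu
  z   <- eta + (y - mu) / mu             # working response
  beta_new <- drop(solve(t(x) %*% (w * x), t(x) %*% (w * z)))
  if (max(abs(beta_new - beta)) < 1e-10) break
  beta <- beta_new
}
beta
coef(glm(y ~ x[, 2], family = poisson))  # agrees with the IRLS solution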
Kedem and Fokianos (2001) extend the generalized linear models methodology to time series where the data and the covariates are time dependent. For more details on GLMs and their extensions see McCullagh and Nelder (1989), Fahrmeir and Tutz (1997) and Dobson (2002).

1.3 Generalized Linear Mixed Models

If the linear predictor of a Generalized Linear Model includes one or more random components in addition to the usual fixed effects, the resulting model is known as a Generalized Linear Mixed Model (GLMM). Examples include the Poisson-Gamma model, used to account for the overdispersion often observed in count data, and the Binomial-Beta model for binary data with the correlated responses inherent in longitudinal or repeated measures designs (Lee and Nelder, 1996; Breslow and Clayton, 1993). In Section 1.3.1 we present the general setup of GLMMs and in Section 1.3.2 we describe some methods of estimation proposed for these models.

1.3.1 The model

Let $Y$ be the vector of $n$ observations and $U$ a vector of random effects. Assume that

- Conditionally, given the vector of random effects $U = u$ and $X = x$, the variables $Y_1,Y_2,\dots,Y_n$ are independent and each one has a distribution belonging to the exponential family, taking the form
$$f_{Y_i|X_i,u}(y_i|X_i,u,\theta_i,\phi) = \exp\left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + h(y_i,\phi)\right\}$$
for some specific functions $a(\cdot)$, $b(\cdot)$ and $h(\cdot)$.

- $E[Y_i|X_i,u] = \mu_i = b'(\theta_i)$ and $g(\mu_i) = \eta_i = X_i\beta + Z_iu$, where $\beta \in \mathbb{R}^p$ is a vector of unknown parameters, $u \in \mathbb{R}^q$ is a vector of random effects, $X_i$ is a row vector of explanatory variables and $Z_i$ is the model vector for the random effects.

- $U \sim F_u(u|D)$, where $D$ is the covariance matrix. We may assume later that $D = D(\alpha)$ for some unknown vector $\alpha$. The mixing distribution $F_u$ is often assumed to be normal (McGilchrist, 1994; Breslow and Clayton, 1993), but we are not going to make that assumption in this thesis.

- Let $\theta = (\beta^T,\alpha^T)^T \in \mathbb{R}^{p+q}$ be the vector of unknown parameters that we want to estimate.

1.3.2 Estimation

A major difficulty in making inference about GLMMs is computational. Provided that the model is correctly specified and that the usual regularity conditions hold, the (marginal) maximum likelihood estimator of $\theta$ is consistent and asymptotically normal (White, 1982). However, this estimator requires the evaluation of high-dimensional integrals, as the likelihood function of the model is given by
$$L(\beta,\phi,D\,|\,Y) = \int \prod_i f_{Y_i|u}(y_i|u,X_i,\beta,\phi)\,dF_u(u).$$
The integral has dimension equal to $q$, which makes the problem practically intractable except in some particular cases (Anderson and Aitkin, 1985; Crouch and Spiegelman, 1990).

To overcome this difficulty, alternative methods of estimation have been proposed. One such method was developed by McCulloch (1997), who used EM-like algorithms to obtain full maximum likelihood estimators. Three algorithms were proposed. First, he constructed a Monte Carlo version of the EM algorithm, called MCEM. Second, he proposed a new procedure, called Monte Carlo Newton-Raphson (MCNR). Finally, he adapted simulated maximum likelihood (SML) to this class of models. Simulation studies were performed to investigate the convergence of these procedures.
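To illustrate why the marginal likelihood above is awkward to work with, the following sketch (my own; it assumes a single shared normal random intercept, a much simpler setting than the general model) approximates the integral by plain Monte Carlo for a Poisson GLMM:

## Monte Carlo approximation of L = \int \prod_i f(y_i | u) dF_u(u)
## for a one-dimensional N(0, sigma_u^2) random intercept, log link.
mc_loglik <- function(beta, sigma_u, y, x, n_draws = 5000) {
  u <- rnorm(n_draws, 0, sigma_u)        # draws from the mixing distribution
  lik <- sapply(u, function(ui) {
    mu <- exp(x %*% beta + ui)           # conditional means given u = ui
    prod(dpois(y, mu))                   # conditional likelihood of the data
  })
  log(mean(lik))                         # average over draws, then log
}

In practice the inner products underflow for long response vectors and the draws must grow with the dimension $q$ of the random effects, which is exactly the computational difficulty that MCEM, MCNR and the approximate methods below are designed to avoid.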
Other methods of estimation that avoid integration over the random effects have been proposed (Schall, 1991; McGilchrist, 1994; Kuk, 1995; Lee and Nelder, 1996). Instead of using the marginal density to construct the likelihood function, the idea is to maximize the joint density to obtain approximate maximum likelihood estimates (or REMLE) of the fixed effects and the dispersion components.

Breslow and Clayton (1993) used maximum quasi-likelihood estimation, introduced by Wedderburn (1974) for GLMs (see Section 1.2.2), for GLMs with random effects. They proposed two different methods of estimation for GLMMs: the penalized quasi-likelihood (PQL) method and the marginal quasi-likelihood (MQL) method. The essential difference between the PQL and MQL estimating equations is that the former incorporates the random effect terms $Z_iu$ in the linear predictor, while the latter specifies the model in terms of the marginal mean $h(E[Y_i|X_i]) = X_i\beta$.

An alternative method of estimation in GLMMs is based on maximizing the joint distribution of the observed data and the random effects with respect to the parameters and the random effects (Lee and Nelder, 1996; McGilchrist et al., 1995). The idea is to extend the mixed-model equations of Henderson (1950) to models with more general distributional assumptions. This method is of particular interest when estimation (or prediction) of the random effects is desired. In plant variety trials, for example, it is sometimes realistic to consider the variety as a random effect; the objective of variety trials is generally to find the best variety or to estimate the yield of each variety.

Other methods have recently been proposed by Jiang (1998, 1999, 2001) for estimating the fixed effects and variance components in a GLMM. Jiang (1998) used simulated moments in a method of moments approach to avoid the evaluation of high-dimensional integrals, calling it the "method of simulated moments" (MSM). Under suitable conditions the MSM estimators are consistent as the total number of observations and the number of simulated random vectors go to infinity. However, simulations show that these estimators can be inefficient. Jiang (2001) developed a new method that improves the efficiency of the previous estimators as well as weakening some assumptions about the model. In addition, Jiang (1999) proposed a method of inference for GLMMs which relies on weak distributional assumptions about the random effects. He generalized the well-known method of weighted least squares (WLS) and proposed the penalized generalized WLS (PGWLS) estimates of both the fixed and the random effects. In practice, one may not have sufficient information to estimate the random effects adequately. In such cases we may have to integrate out the random effects and estimate only the fixed effects and the variance components. The author remarked that it might be possible to estimate a subset of the random effects; this requires distributional assumptions only about the random effects that cannot be estimated adequately, and only those are integrated out. He derived the likelihood function conditional on a subset of the random effects and maximized it to obtain maximum conditional likelihood (MCL) estimates.
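For reference, the Breslow-Clayton PQL method described above is available in current R through glmmPQL in the MASS package. A hedged sketch follows; the data frame d with columns y, x and a cluster identifier is an assumption of mine, not an example from the thesis:

library(MASS)
library(nlme)   # glmmPQL represents the random effects via nlme

## random-intercept logistic GLMM fitted by penalized quasi-likelihood
fit_pql <- glmmPQL(y ~ x, random = ~ 1 | cluster,
                   family = binomial, data = d)
summary(fit_pql)   # fixed effects and the random-intercept variance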
1.4 Endogenous covariates

A key condition for the ordinary least squares (OLS) estimator to be consistent is that the error term is uncorrelated with each of the regressors. However, there are many situations in which this assumption is not satisfied, i.e., the model contains "endogenous" covariates. In such a situation, the OLS estimator yields biased and inconsistent parameter estimates, even when not all the covariates are endogenous. Moreover, when some covariates are endogenous, other covariates with theoretical coefficient zero in the regression may appear significant in an ordinary estimation.

The endogeneity problem arises very often, due to three main reasons: omitted variables, measurement error, and simultaneity. Omitted variables appear when we would like to control for one or more additional variables but, usually because of data unavailability, cannot include them in the regression model. In such a case the omitted covariate becomes part of the disturbance term; if it is correlated with any of the covariates included in the regression, then the correlation between the regressors and the error term is different from zero. Measurement errors appear when, instead of observing the true explanatory variables $x_i^*$, one observes $X_i = x_i^* + u_i$. The measurement error $u_i$ then becomes part of the disturbance term, inducing a correlation between the observed covariates and the error term. Simultaneity arises when at least one of the explanatory variables is determined simultaneously along with the response. If one covariate is determined partly as a function of the response, then that covariate and the error term are generally correlated.

An equation can have more than one source of endogeneity. For example, in looking at the effect of alcohol consumption on worker productivity measured by wages, we usually think that alcohol usage is correlated with unobserved factors, such as family background, that also affect wage. In addition, alcohol demand would generally depend on income, which is largely determined by wage. Finally, measurement error in alcohol usage is always a possibility. For a discussion of the three kinds of endogeneity as they arise in a particular field, see Deaton's (1995) survey chapter on econometric issues in development economics. For a classical reference on measurement error models see Fuller (1987). Endogeneity problems are clearly explained in Amemiya (1985), and a complete survey on endogeneity can be found in Anderson (1984).

A common approach to address this problem is to use the additional information contained in instrumental variables: variables that do not belong to the original equation, are correlated with the existing explanatory variables, but are uncorrelated with the error term. These new variables can be used to construct ordinary instrumental variables (OIV) estimators that yield consistent parameter estimates. Although the use of instrumental variables dates to the late twenties, Sargan (1958) provided a general treatment of the IV method and established its asymptotic properties. For a review of Sargan's work see Arellano (2002).

1.4.1 Linear model with endogeneity

Consider the following model in which some covariates are correlated with the error term, i.e., the model contains endogenous covariates:
$$Y = \beta_0 + X\beta_1 + \epsilon \qquad (1.5)$$
with $E(\epsilon) = 0$ and $\mathrm{Cov}(X,\epsilon) \neq 0$, where $Y$ is the $n$-dimensional column vector of observations of the response, $\beta_0$ is the regression intercept, $\beta_1$ is a $p$-dimensional column vector of parameters, $X$ is an $n\times p$ matrix of observable random covariates, and $\epsilon$ is the $n$-dimensional column vector of i.i.d. unobserved disturbances with zero mean. We can also write this model as $Y_i = \beta_0 + X_i\beta_1 + \epsilon_i$ with $E(\epsilon_i) = 0$ and $\mathrm{Cov}(X_i,\epsilon_i) \neq 0$ for $i = 1,\dots,n$, where $Y_i$ and $\epsilon_i$ are the $i$th elements of $Y$ and $\epsilon$, respectively, and $X_i$ is the $i$th row of $X$.
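A quick simulation (mine, not from the thesis) makes the resulting bias visible, anticipating the attenuation formula (1.6) derived next:

set.seed(2)
n      <- 1e5
beta1  <- 2
x_star <- rnorm(n, 0, 1)         # true covariate, sigma_{x*x*} = 1
u      <- rnorm(n, 0, 1)         # measurement error, sigma_{uu} = 1
x_obs  <- x_star + u             # observed, error-prone covariate
y      <- 1 + beta1 * x_star + rnorm(n)

coef(lm(y ~ x_obs))[2]           # near beta1 * 1/(1 + 1) = 1, not 2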
If the covariates are uncorrelated with the disturbances, then $\beta_0$ and $\beta_1$ can be estimated consistently by ordinary least squares (OLS). However, when $\mathrm{Cov}(X,\epsilon) \neq 0$, OLS produces inconsistent estimates. For example, assuming normality in a classical errors-in-variables model with only one explanatory variable and uncorrelated error terms, it can be shown using the properties of the bivariate normal distribution that
$$E[\hat\beta_1] = \beta_1(\sigma_{x^*x^*} + \sigma_{uu})^{-1}\sigma_{x^*x^*}, \qquad (1.6)$$
where $\sigma_{uu}$ and $\sigma_{x^*x^*}$ are the variances of the measurement error and the true covariates, respectively. Dropping the normality assumption, the right-hand side of (1.6) represents the probability limit of $\hat\beta_1$ as $n$ tends to infinity. In both cases the OLS estimator is inconsistent, and it is usually said to have been attenuated by the measurement error in $X$.

1.4.2 Estimation

The method of ordinary instrumental variables provides a general solution to the problem of endogenous covariates. To use this approach, we need a $q$-dimensional row vector of instrumental variables $W_i$, with $q \geq p$. In this thesis, however, we focus on the case where the model is exactly identified, i.e., $p = q$. For the instruments to be valid, $W_i$ needs to be correlated with the endogenous covariates but uncorrelated with the disturbance term. More formally, $W_i$ must satisfy the following two conditions:
$$E(W_i^T\epsilon_i) = 0 \qquad (1.7)$$
$$\mathrm{rank}\; E(W_i^TX_i) = p. \qquad (1.8)$$
Note that the rank condition (1.8) means that $W_i$ is sufficiently linearly related to $X_i$ that $E(W_i^TX_i)$ has full rank. If all covariates are endogenous, then $W_i$ is a list of $p$ variables not contained in the original equation. When the model contains $s$ exogenous and $r$ endogenous variables (with $s + r = p$), each exogenous variable is already uncorrelated with the disturbance term and thus serves as an instrument for itself. In this case $W_i = (X_1,\dots,X_s,I_1,\dots,I_r)$, where $I_1,\dots,I_r$ are also uncorrelated with the disturbance but are not included in the original equation.

The ordinary instrumental variables (OIV) estimator is defined as
$$\hat\beta_{OIV} = (\hat\beta_0,\hat\beta_1) = \left(\bar Y - \bar X\hat\beta_1,\ (W^TX)^{-1}W^TY\right), \qquad (1.9)$$
where $\bar Y = n^{-1}\sum_{i=1}^n Y_i$, $\bar X = n^{-1}\sum_{i=1}^n X_i$, and $W$ is the $n\times p$ matrix of observations on the $p$ instruments. This estimator is consistent provided that the data contain no extreme observations. However, it is well known that it has an unbounded Influence Function (Krasker and Welsch, 1985) and that a single aberrant observation can break it down (i.e., it has zero breakdown point). Thus, in Chapter 4 we present a robust version of this estimator.
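A small sketch (hypothetical data, not the thesis example) of the OIV estimator (1.9) in the exactly identified case $p = q = 1$, with the intercept handled by centering:

set.seed(3)
n  <- 1e4
w  <- rnorm(n)                  # instrument
xs <- 0.8 * w + rnorm(n)        # true covariate, correlated with w
x  <- xs + rnorm(n)             # observed covariate with measurement error
y  <- 1 + 2 * xs + rnorm(n)     # response; x is endogenous, w is not

xc <- x - mean(x); yc <- y - mean(y); wc <- w - mean(w)
b1 <- drop(solve(crossprod(wc, xc), crossprod(wc, yc)))  # (W'X)^{-1} W'Y
b0 <- mean(y) - mean(x) * b1
c(b0, b1)                       # close to (1, 2)
coef(lm(y ~ x))                 # for comparison: OLS is attenuated toward zero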
1.5 Robustness

The problem of robustness was addressed by a number of eminent statisticians many years before a mathematical theory of robust estimation was developed. By the 1960s statisticians were concerned by the fact that the performance of some estimators was very unstable under small deviations from idealized distributional assumptions. This motivated the search for "robust" procedures which still behave fairly well under deviations from the assumed model. There have been several approaches to robust estimation of population parameters, including minimax asymptotic variance (Huber, 1964), qualitative robustness (Hampel, 1971), the Influence Function (Hampel, 1974), and the Change-of-Variance Function (Hampel et al., 1981), among others. In this Section we review the definitions of M-estimates, the Influence Function and the Change-of-Variance Function for one-dimensional estimators, and their extensions to classical linear models.

1.5.1 M-estimates

Let $X_1,\dots,X_n$ be a set of independent and identically distributed observations belonging to some sample space $\mathcal{X}$. Consider the parametric model $F_\theta$, where the unknown parameter $\theta$ belongs to some parameter space $\Theta$. Huber (1964) proposed to estimate the parameter $\theta$ using $T_n$, defined by solving
$$\sum_i \rho(X_i;T_n) = \min_{T_n}! \qquad (1.10)$$
When $\rho(x;\theta) = -\ln f(x;\theta)$, the estimator $T_n$ is the maximum likelihood estimator. Estimators satisfying equation (1.10) are therefore called "M-estimators", a name that comes from "generalized maximum likelihood estimators". When $\rho$ has derivative $\psi(x,\theta) = \partial\rho(x,\theta)/\partial\theta$, the estimate $T_n$ satisfies the implicit equation
$$\sum_i \psi(X_i;T_n) = 0.$$
We will often identify $\psi$ with the M-estimator it defines. If $F_n$ is the empirical distribution function of $X$, the M-estimator is also defined as $T_n = T(F_n)$, where $T$ is the functional given by
$$\int \psi(x;T(F))\,F(dx) = 0. \qquad (1.11)$$

1.5.2 Influence Function

The Influence Function (IF) was first introduced by Hampel (1974) in order to investigate the behavior of the asymptotic value of a one-dimensional estimator under small perturbations of the underlying distribution. More precisely, let $F_\epsilon = (1-\epsilon)F + \epsilon\delta_x$ denote a neighborhood of the nominal distribution of the observations $F$, contaminated by $\delta_x$, the point mass at $x$. Then,

Definition 1.5.1. The Influence Function of the estimator defined by the functional $T$ at a distribution $F$ is given by
$$IF(x;T,F) = \lim_{\epsilon\to 0}\frac{T(F_\epsilon)-T(F)}{\epsilon}$$
for those $x$ where the limit exists.

Remark 1.5.1. Note that if the limit exists, then $IF(x;T,F) = (\partial/\partial\epsilon)\,T(F_\epsilon)|_{\epsilon=0}$, which is the directional (Gâteaux) derivative of $T$ at $F$ in the direction of $\delta_x$. See Hampel et al. (1986) for further discussion.

Heuristically, the IF describes the effect of an infinitesimal contamination at the point $x$ on the estimate, standardized by the mass of the contamination. Replacing $F$ by $F_\epsilon$ in equation (1.11), differentiating with respect to $\epsilon$ and assuming that integration and differentiation may be interchanged, one can derive the IF of the M-estimator defined by $\psi$; that is,
$$IF(x;\psi,F) = \frac{\partial}{\partial\epsilon}T(F_\epsilon)\Big|_{\epsilon=0} = \frac{\psi(x;T(F))}{-\int (\partial/\partial\theta)[\psi(y;\theta)]_{T(F)}\,dF(y)}. \qquad (1.12)$$

An important summary of the IF is the gross-error sensitivity of $T$ at $F$, which can be thought of as a measure of the worst influence that an infinitesimal contamination can have on the estimate.

Definition 1.5.2. The gross-error sensitivity of $T$ at $F$ is given by
$$\gamma^* = \sup_x |IF(x;T,F)|,$$
where the supremum is taken over all $x$ where $IF(x;T,F)$ exists. Moreover, we say that $T$ is B-robust at $F$ if $\gamma^*$ is finite.

Therefore, an M-estimator defined by a function $\psi$ in (1.11) is B-robust at $F$ if and only if $\psi(\cdot,T(F))$ is bounded.
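Before turning to linear models, a small numerical illustration (code mine, not from the thesis) of Definitions 1.5.1 and 1.5.2 for the Huber location M-estimator, using the finite-sample sensitivity curve as a stand-in for the IF:

huber_psi <- function(e, c = 1.345) pmin(pmax(e, -c), c)   # bounded psi

m_location <- function(x, c = 1.345, s = mad(x), tol = 1e-10) {
  t <- median(x)                                  # robust starting value
  repeat {                                        # Newton steps; psi' = 1{|e| < c}
    e  <- (x - t) / s
    dt <- s * mean(huber_psi(e, c)) / mean(abs(e) < c)
    t  <- t + dt
    if (abs(dt) < tol) return(t)
  }
}

set.seed(4)
x  <- rnorm(100)
x0 <- seq(-6, 6, by = 0.5)                        # contamination points
sc <- sapply(x0, function(p)                      # sensitivity curve ~ IF
  (length(x) + 1) * (m_location(c(x, p)) - m_location(x)))
sc   # levels off for large |x0|: psi is bounded, so the estimator is B-robust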
The IF in Linear Models

Huber (1973) extended his results on robust estimation of a location parameter to the case of linear regression. In the framework of M-estimation, he proposed using $T_n$ defined by
$$T_n = \arg\min_{\theta\in\Theta}\sum_{i=1}^n \rho\big((y_i - x_i\theta)/\sigma\big), \qquad (1.13)$$
for some function $\rho:\mathbb{R}\to\mathbb{R}^+$ and for a fixed $\sigma$. If $\rho$ has derivative $\psi$, then $T_n$ satisfies the system of equations
$$\sum_{i=1}^n \psi\big((y_i - x_iT_n)/\sigma\big)\,x_i = 0. \qquad (1.14)$$
The functional $T(F)$ corresponding to the M-estimator defined by (1.14) is the solution of
$$\int \psi\big((y - xT(F))/\sigma\big)\,x\,dF(y,x) = 0.$$
Using (1.12) it can be shown that $IF((x,y);T,F)$ is unbounded in the $x$-space; thus these estimators are sensitive to high leverage points. Other estimators addressing this problem have been proposed.

Definition 1.5.3 (Maronna and Yohai, 1981). A GM-estimator $T_n$ for linear models is defined implicitly by
$$\sum_{i=1}^n \eta\big(x_i,(y_i - x_iT_n)/\sigma\big)\,x_i = 0, \qquad (1.15)$$
where the function $\eta:\mathbb{R}^p\times\mathbb{R}\to\mathbb{R}$ is continuous up to a finite set $C(x;\eta)$, odd in the second argument, and positive. Moreover, it is assumed that the set of points where $\eta$ is continuous but $(\partial/\partial r)\,\eta(x,r)$ is not defined or not continuous, denoted by $D(x;\eta)$, is finite for all $x$.

All known proposals for $\eta$ may be written in the form $\eta(x,r) = w(x)\,\psi(r\,v(x))$. Note that Huber's proposal, defined in (1.13), uses $w(x) = v(x) = 1$. Mallows' and Schweppe's proposals use $v(x) = 1$ and $v(x) = 1/w(x)$, respectively (see Hill, 1977, and Merrill and Schweppe, 1971).

Writing (1.15) as a functional equation, replacing the joint distribution $H$ of the responses and the carriers by $H_\epsilon = (1-\epsilon)H + \epsilon\delta_{(x,y)}$, and following Definition 1.5.1, it is easy to show that the IF of $T$ at $H$ is a $p\times 1$ vector given by
$$IF((x,y);T,H) = \eta\big(x,(y - xT(H))/\sigma\big)\,M^{-1}(\eta,H)\,x,$$
where $M(\eta,H) = \int (\partial/\partial r)\,\eta\big(x,(y - xT(H))/\sigma\big)\,xx^T\,dH(x,y)$ (Hampel et al., 1986). Two different measures of sensitivity were introduced to describe the worst possible influence of contamination by outliers on the asymptotic value of $T$.

Definition 1.5.4. The unstandardized gross-error sensitivity of $T$ is defined as
$$\gamma_u^*(T,H) = \sup\{\|IF((x,y);T,H)\|:\ x\in\mathbb{R}^p,\ y\in\mathbb{R}\setminus C(x,\eta)\},$$
and the (self-)standardized gross-error sensitivity is defined as
$$\gamma_s^*(T,H) = \sup\{IF^T((x,y);T,H)\,V^{-1}(T,H)\,IF((x,y);T,H):\ x\in\mathbb{R}^p,\ y\in\mathbb{R}\setminus C(x,\eta)\},$$
where $V(T,H)$ is the asymptotic variance of $T$ under the model $H$. Moreover, an estimator $T$ is $B_u$- ($B_s$-) robust when $\gamma_u^*$ ($\gamma_s^*$) is finite.

Krasker and Welsch (1982) noted that the unstandardized gross-error sensitivity is not invariant with respect to linear parameter transformations; they introduced the (self-)standardized gross-error sensitivity to overcome this lack of invariance. For appropriate choices of the function $\eta(\cdot)$, the GM-estimators are $B_u$- ($B_s$-) robust (Hampel et al., 1986). Other estimators, not covered in this review, have been proposed for classical linear models, such as MM-estimators, $\tau$-estimators and S-estimators. For a survey of robust regression estimation see Maronna et al. (1993) and Rousseeuw and Leroy (1987).

1.5.3 Change-of-variance function

Other important asymptotic concepts that are also interesting to study include the asymptotic variance and the asymptotic efficiency. Rousseeuw (1981) first defined the Change-of-Variance Function (CVF) of an M-estimator of a location parameter to investigate the infinitesimal stability of its asymptotic variance in the presence of contamination of the nominal distribution, assumed to be symmetric. In this Section we briefly describe the CVF of an M-estimator of a one-dimensional location parameter, together with its extensions to linear regression models.

Let $F_\epsilon = (1-\epsilon)F + \epsilon\big(\tfrac{1}{2}\delta_x + \tfrac{1}{2}\delta_{-x}\big)$. Consider the M-estimator defined by
$$\sum_{i=1}^n \psi(x_i - T_n) = 0,$$
which corresponds to the functional $T$ defined by
$$\int \psi(x - T(F))\,dF(x) = 0.$$
Assume that $\psi$ is a continuous, odd and positive function, with continuous derivative $\psi'$ up to a finite set $D(\psi)$. [...]

[...] using the Huber function
$$\psi_c(e) = \begin{cases} e, & |e|\le c,\\ c\,\mathrm{sign}(e), & |e| > c, \end{cases} \qquad (2.30)$$
where $c$ is a tuning constant. For this particular case, the last two terms of (2.29) are bounded, as $\psi$ is bounded. However, the function $t(X,Y;\beta)$ defined in (2.27) contains a quadratic term $e^2$ which is not downweighted by $w(x)$. The first term in the supremum in (2.29) is therefore still unbounded with respect to $x$. Thus, for these choices of weighting functions, the resulting estimators are not V-robust.
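In the simpler location setting of Section 1.5.3, the stability of the asymptotic variance can be checked numerically. The sketch below (mine, based on the standard location-model formula $V(\psi,F) = E_F\psi^2/(E_F\psi')^2$, not on the thesis's GLM expressions (2.27)-(2.29)) differentiates $V(\psi,F_\epsilon)$ numerically under the symmetric point-mass contamination used above, for Huber's $\psi_c$:

V_eps <- function(x0, eps, c = 1.345) {
  psi <- function(z) pmin(pmax(z, -c), c)
  ## moments under (1 - eps) N(0,1) + eps (0.5 d_{x0} + 0.5 d_{-x0});
  ## psi^2 is even and psi'(z) = 1{|z| < c}, so both atoms contribute equally
  num <- (1 - eps) * integrate(function(z) psi(z)^2 * dnorm(z), -Inf, Inf)$value +
         eps * psi(x0)^2
  den <- (1 - eps) * (2 * pnorm(c) - 1) + eps * (abs(x0) < c)
  num / den^2
}

x0  <- seq(0, 6, by = 0.5)
cvf <- sapply(x0, function(p) (V_eps(p, 1e-4) - V_eps(p, 0)) / 1e-4)
cvf   # flattens out as x0 grows: a bounded CVF, i.e. a V-robust estimator

The contrast with the GLM case discussed above is the point of Chapter 2: boundedness of $\psi$ alone is enough here, but not once the variance expression picks up terms quadratic in the residuals.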
2.4 Simulation

We carried out a set of simulations to compare the sensitivity of the estimated variance of the MLE with that of a robust estimator (ROB) under different levels of contamination in a Logistic model with one covariate. For each level of contamination, we generated 500 Monte Carlo replications of the response variable using S-Plus Version 6.2.1. The values of the covariate are fixed in all the simulations and range from 1.52 to 2.36 on an equally spaced grid. The MLE of the regression coefficients and the estimates of its variance matrix were obtained using the glm function available in S-Plus. The robust estimator is in the class of M-estimators defined in (2.26), using the Huber function defined in (2.30) for $\psi(\cdot)$ and $w(x) = 1$. For the Huber function we used two different values of the tuning constant, $c = 1.2$ and $c = .8$. The estimates of the coefficients and the variance matrix were obtained using an algorithm written by Cantoni and Ronchetti (2001). We also report the asymptotic efficiency of the robust estimator relative to the MLE; this quantity corresponds to the ratio of the traces of the variance matrices.

2.4.1 The Logistic model

We generated clustered binary data according to the model
$$P(Y_{ij} = 1) = \frac{\exp(\beta_0+\beta_1x_i)}{1+\exp(\beta_0+\beta_1x_i)}, \qquad i = 1,\dots,29;\ j = 1,\dots,20, \qquad (2.31)$$
with regression coefficients $\beta_0 = 2$ and $\beta_1 = -2$. The binary data were grouped by covariate class, so that the responses represent the number of successes in each cluster.

Table 2.1 summarizes the parameter estimates obtained by the MLE and the robust estimator (ROB) for the generated data. The last two rows correspond to the Monte Carlo standard errors and the mean of the standard errors obtained using the algorithms, respectively. When there is no contamination, the biases of both the MLE and the ROB estimators of the beta coefficients are not significantly different from zero. The bias of the robust estimator is reduced when its tuning constant $c$ is smaller; however, the efficiency of the estimator relative to the MLE decreases as well, from .9467 to .8656. The estimated standard errors are slightly overestimated by both the glm and the Cantoni-Ronchetti algorithms.

Table 2.1: Summary of maximum likelihood (MLE) and robust (ROB) estimation for uncontaminated data in a Logistic Model.

                    c = 1.2                               c = .8
            MLE              ROB               MLE              ROB
            β0       β1      β0       β1       β0       β1      β0       β1
Min.     -0.7548 -3.3021  -1.1359 -3.3392   -0.8777 -3.3640  -0.7365 -3.7034
1st Qu.   1.3687 -2.3304   1.3228 -2.3717    1.2806 -2.3166   1.2526 -2.3580
Median    2.0231 -2.0128   2.0788 -2.0332    2.0154 -2.0234   2.0320 -2.0108
Mean      2.0195 -2.0121   2.0126 -2.0078    1.9753 -1.9898   1.9841 -1.9936
3rd Qu.   2.6667 -1.6652   2.6686 -1.6166    2.5894 -1.5970   2.7161 -1.6099
Max.      4.4567 -0.6145   4.6807 -0.4784    4.4945 -0.5340   5.0911 -0.6397
MC.sd     0.9266  0.4974   0.9519  0.5112    0.9196  0.4915   0.9615  0.5130
est.sd    0.9467  0.5068   0.9824  0.5262    0.9466  0.5065   1.0174  0.5445
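A sketch of this simulation design in current R follows (the thesis used S-Plus with Cantoni and Ronchetti's own code; glmrob in the robustbase package implements their Mallows quasi-likelihood estimator, with the default weights.on.x = "none" matching $w(x) = 1$ and tcc playing the role of the Huber constant $c$; exact argument names may differ across package versions). The cluster-level contamination step anticipates Section 2.4.2 below, and its flip probability is a simplifying assumption of mine:

library(robustbase)

x  <- seq(1.52, 2.36, length.out = 29)   # fixed, equally spaced covariate grid
b0 <- 2; b1 <- -2
p  <- plogis(b0 + b1 * x)

set.seed(5)
y  <- rbinom(29 * 20, size = 1, prob = rep(p, each = 20))
cl <- rep(seq_len(29), each = 20)

## contamination: one randomly chosen cluster has its zeros flipped to ones
i <- sample(29, 1)
y[cl == i & y == 0] <- 1

succ <- tapply(y, cl, sum)               # group by covariate class
fit_mle <- glm(cbind(succ, 20 - succ) ~ x, family = binomial)
fit_rob <- glmrob(cbind(succ, 20 - succ) ~ x, family = binomial,
                  method = "Mqle", control = glmrobMqle.control(tcc = 1.2))
rbind(MLE = coef(fit_mle), ROB = coef(fit_rob))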
2.4.2 The contaminated model

In this Section we examine the effect of different levels of contamination on the maximum likelihood and the robust estimation. In each generated dataset, one cluster $i$ is randomly chosen and, with probability $(1-\epsilon)$, the observations $y_{ij}$ in the cluster that are 0 are turned into 1. The responses are again grouped by covariate class and we fit the Logistic model described in (2.31).

Tables 2.2-2.4 present the results of the simulation for the data generated under the contaminated model, using tuning constants $c = 1.2$ and $c = .8$ to construct the robust estimator. Figure 2.1 and Figure 2.2 show that the bias of the MLE increases as the level of contamination increases, while that of the robust estimator (ROB) remains almost constant for both values of $c$. However, the bias of these estimators is not significantly different from zero considering the estimated standard errors presented in the last rows of these tables.

Table 2.2: Summary of maximum likelihood (MLE) and robust (ROB) estimation using c = 1.2, for various levels of contaminated data in a Logistic Model.

               5% contamination                   10% contamination
            MLE              ROB               MLE              ROB
            β0       β1      β0       β1       β0       β1      β0       β1
Min.     -1.7595 -3.5773  -1.8903 -3.6289   -1.1987 -3.6296  -1.1567 -3.8509
1st Qu.   1.3807 -2.3772   1.3773 -2.3863    1.3098 -2.3147   1.2777 -2.3691
Median    2.0590 -2.0280   2.0892 -2.0247    1.9862 -1.9673   2.0122 -1.9989
Mean      2.0466 -2.0254   2.0290 -2.0148    1.9453 -1.9673   1.9806 -1.9875
3rd Qu.   2.6799 -1.6664   2.7376 -1.6564    2.6289 -1.6297   2.6854 -1.6202
Max.      4.9967 -0.1254   5.1118 -0.0503    4.8443 -0.2771   5.1597 -0.2814

               20% contamination                  30% contamination
Min.     -0.8327 -3.6770  -0.8941 -3.4581   -0.9000 -3.5435  -0.5828 -3.4148
1st Qu.   1.2577 -2.3001   1.2968 -2.3304    1.0483 -2.3046   1.3224 -2.3494
Median    1.8884 -1.9313   1.9647 -1.9666    1.8508 -1.8810   1.9884 -1.9789
Mean      1.9190 -1.9366   1.9634 -1.9690    1.8661 -1.8954   2.0171 -1.9951
3rd Qu.   2.5661 -1.5923   2.6672 -1.5954    2.6417 -1.4769   2.6916 -1.6227
Max.      5.1027 -0.4582   4.6842 -0.4088    4.8628 -0.4323   4.7456 -0.6242

               40% contamination                  50% contamination
Min.     -1.4049 -3.6251  -0.9642 -3.5806   -1.9302 -3.6233  -1.2113 -3.5559
1st Qu.   0.9911 -2.2364   1.2731 -2.3223    0.9138 -2.2813   1.2958 -2.4215
Median    1.9086 -1.8786   2.0225 -1.9736    1.7519 -1.7966   1.9846 -1.9757
Mean      1.7828 -1.8389   1.9740 -1.9722    1.7683 -1.8164   1.9957 -1.9805
3rd Qu.   2.5389 -1.4124   2.6264 -1.6031    2.6694 -1.3668   2.8081 -1.5955
Max.      5.1939 -0.1428   5.0892 -0.5024    5.1219  0.0796   4.9889 -0.3342

Table 2.3: Summary of maximum likelihood (MLE) and robust (ROB) estimation using c = .8, for various levels of contaminated data in a Logistic Model.

               5% contamination                   10% contamination
            MLE              ROB               MLE              ROB
            β0       β1      β0       β1       β0       β1      β0       β1
Min.     -0.6247 -3.5177  -1.1913 -3.5637   -1.0991 -3.5059  -1.2039 -3.6807
1st Qu.   1.3611 -2.3937   1.3877 -2.3985    1.2671 -2.2925   1.2936 -2.3186
Median    2.0484 -2.0152   2.0777 -2.0305    1.9197 -1.9419   1.9657 -1.9531
Mean      2.0594 -2.0334   2.0590 -2.0328    1.9232 -1.9507   1.9839 -1.9836
3rd Qu.   2.7384 -1.6762   2.7768 -1.6682    2.5642 -1.5879   2.6048 -1.5993
Max.      4.7088 -0.6826   4.8789 -0.5246    4.9747 -0.3288   4.9895 -0.2965

               20% contamination                  30% contamination
Min.     -0.9245 -3.8372  -1.3915 -3.9301   -0.8242 -3.9357  -1.0934 -3.7273
1st Qu.   1.2288 -2.2896   1.3691 -2.3765    1.1221 -2.2363   1.3032 -2.3186
Median    1.9212 -1.8963   2.0182 -1.9693    1.8223 -1.8636   2.0180 -1.9884
Mean      1.9135 -1.9306   2.0646 -2.0188    1.8454 -1.8855   2.0010 -1.9898
3rd Qu.   2.5893 -1.5614   2.8056 -1.6373    2.4620 -1.5201   2.6552 -1.6164
Max.      5.3126 -0.4582   5.5562 -0.2790    5.5506 -0.5593   5.1225 -0.4277

               40% contamination                  50% contamination
Min.     -1.2236 -4.1892  -0.7640 -3.8083   -1.6370 -4.3537  -0.9848 -4.0250
1st Qu.   1.0760 -2.2464   1.3246 -2.3371    0.8108 -2.3223   1.1498 -2.3348
Median    1.8088 -1.8266   2.0358 -1.9940    1.7012 -1.7711   1.9199 -1.9424
Mean      1.8166 -1.8535   2.0224 -1.9970    1.7622 -1.8147   1.9511 -1.9591
3rd Qu.   2.6091 -1.4548   2.6604 -1.6330    2.7166 -1.3026   2.6534 -1.5339
Max.      6.2623 -0.2642   5.5913 -0.6453    6.1654  0.0290   5.7108 -0.4373
Table 2.4: Monte Carlo and estimated standard errors of the maximum likelihood (MLE) and the robust (ROB) estimators for various levels of contaminated data in a Logistic Model.

c = 1.2
               5% contamination                   10% contamination
            MLE              ROB               MLE              ROB
            β0       β1      β0       β1       β0       β1      β0       β1
MC.sd     0.9788  0.5233   0.9932  0.5307    0.9969  0.5325   1.0276  0.5495
est.sd    0.9465  0.5068   0.9816  0.5258    0.9425  0.5040   0.9801  0.5246
               20% contamination                  30% contamination
MC.sd     1.0044  0.5436   1.0052  0.5412    1.0844  0.5819   1.0010  0.5349
est.sd    0.9318  0.4978   0.9735  0.5207    0.9233  0.4926   0.9726  0.5205
               40% contamination                  50% contamination
MC.sd     1.1692  0.6240   1.0343  0.5531    1.3121  0.7014   1.0621  0.5657
est.sd    0.9151  0.4874   0.9719  0.5198    0.9070  0.4826   0.9703  0.5190

c = .8
               5% contamination                   10% contamination
MC.sd     0.9421  0.5075   1.0084  0.5418    0.9636  0.5157   1.0008  0.5379
est.sd    0.9474  0.5074   1.0187  0.5457    0.9390  0.5019   1.0113  0.5410
               20% contamination                  30% contamination
MC.sd     0.9813  0.5282   1.0366  0.5584    1.0334  0.5499   1.0231  0.5447
est.sd    0.9296  0.4964   1.0076  0.5393    0.9236  0.4926   1.0097  0.5401
               40% contamination                  50% contamination
MC.sd     1.1270  0.6027   1.0440  0.5534    1.3252  0.7108   1.1097  0.5999
est.sd    0.9136  0.4867   1.0074  0.5389    0.9081  0.4833   1.0071  0.5384

Figure 2.3 and Figure 2.4 illustrate that the standard errors of the MLE increase slightly with the level of contamination, while those of the ROB estimator are almost constant. Moreover, Figure 2.5 and Figure 2.6 present the estimated bias of the standard errors of both estimators for the two values of $c$. Again, the difference between the Monte Carlo and the estimated standard errors of the MLE becomes more important with more contamination, whereas for both values of $c$ the bias of the estimated standard error of the ROB estimator remains almost unchanged.

Figure 2.1: Bias of the MLE and the robust (ROB) estimator using c = 1.2.
Figure 2.2: Bias of the MLE and the robust (ROB) estimator using c = .8.
Figure 2.3: Estimated standard errors of the MLE and the robust (ROB) estimator using c = 1.2.
Figure 2.4: Estimated standard errors of the MLE and the robust (ROB) estimator using c = .8.
Figure 2.5: Bias of the estimated standard errors of the MLE and the robust (ROB) estimator using c = 1.2.
Figure 2.6: Bias of the estimated standard errors of the MLE and the robust (ROB) estimator using c = .8.

2.5 Conclusions

In this Chapter we extend the definitions of the CVF and the CVS to GLMs and study how an $\epsilon$-contamination of the distribution perturbs the asymptotic variance of the estimators. We derive the CVF and the CVS for the class of M-estimators, and we analyze in detail the MLE of GLMs with canonical links. In particular we derive the CVS for three commonly used GLMs: the Logistic, Poisson and Gamma models. We found that, in general, the MLE is not V-robust, so a contamination of the distribution can seriously affect its asymptotic variance. Moreover, we obtain the CVF for the class of M-estimators in the subclass of linear models, which was previously analyzed by Ronchetti and Rousseeuw (1985).
2.5 Conclusions

In this Chapter we extend the definitions of CVF and CVS to GLMs and study how an ε-contamination of the distribution perturbs the asymptotic variance of the estimators. We derive the CVF and the CVS for the class of M-estimators and analyze in detail the MLE of GLMs with canonical links. In particular, we derive the CVS for three commonly used GLMs: the Logistic, Poisson and Gamma models. We find that, in general, the MLE is not V-robust, so a contamination of the distribution can seriously affect its asymptotic variance. Moreover, we recover the CVF for the class of M-estimators in the subclass of linear models, which was previously analyzed by Ronchetti and Rousseeuw (1985).

We also study the CVS of a class of Mallows-type estimators and conclude that, for GLMs, V-robustness does not in general imply B-robustness, as it does for linear models (Ronchetti and Rousseeuw, 1985). We perform a simulation study to compare the performance of the MLE with that of a robust estimator in a Logistic model. In all simulated cases, the variance of the robust estimator remains almost constant, and its estimate almost unbiased, under all levels of contamination, whereas both the variance of the MLE and the bias of its estimated standard error increase with the level of contamination. The bias of the MLE itself also increases, while that of the robust estimator does not change significantly.

Chapter 3

Change-of-variance function in GLMMs

3.1 Introduction

This Chapter investigates how the asymptotic variance of the estimators of Generalized Linear Mixed Models (GLMMs) is affected when the conditional distribution of the responses is correctly specified but the mixing distribution of the random effects is slightly contaminated. To study the infinitesimal stability of the asymptotic variance of the estimators, we extend the notions of Change-of-Variance Function (CVF) and Change-of-Variance Sensitivity (CVS) to GLMMs.

Generalized Linear Mixed Models are used to model the relationship between a function of the mean of the responses and a linear predictor that includes a linear combination of random components. In addition, GLMMs can accommodate non-normally distributed responses such as Gamma or Poisson random variables. The random effects included in the linear predictor allow us to account for correlation between observations and overdispersion, or to make subject-specific inference. A commonly used estimator for these models is the marginal MLE. Provided that the model is correctly specified and the usual regularity conditions hold, this estimator is consistent and asymptotically normal (White, 1982). Some authors use the joint maximum likelihood estimator to overcome computational difficulties (see Section 1.3.2); however, a contamination in the mixing distribution does not affect that estimator's estimates of the model coefficients, so we do not examine it here. For further details see Section 1.3.

In most practical applications one rarely knows the true model, and a natural question is what happens to the estimation if one does not assume the correct model. In particular, in this Chapter we examine the case where the conditional distribution of the responses is correctly specified but the mixing distribution of the random effects U is not. Gustafson (1996) studied the inconsistency of maximum likelihood estimators for certain conjugate mixture models under misspecification of the mixing distribution, investigating the magnitude of the asymptotic bias using an influence function approach. Smith and Weems (2004) extended Gustafson's approach to include a regression structure in the mean; they proved that the maximum likelihood estimators are robust under perturbations of the mixing distribution for Poisson-lognormal models. Neuhaus et al. (1992) examined the performance of the mixed-effects logistic regression MLE when the mixing distribution is misspecified, and also studied, by simulation, the effect of the misspecification on the estimated standard errors of the estimators. However, to the best of our knowledge, there is no previous analytical work on the local effect of a mixing distribution misspecification on the asymptotic variance of the estimators of GLMMs.
The remainder of the Chapter is organized as follows. In Section 3.2 we extend the notions of CVF and CVS to GLMMs; in particular, we derive the CVF of the (marginal) MLE. The CVS of this estimator is analyzed in detail for the Poisson-Gamma model in Section 3.3 and for two mixed-effects Binomial models in Section 3.4. A simulation study is performed for the Poisson-Gamma model and the results are summarized in Section 3.5. We end with some conclusions and future research directions in Section 3.6.

3.2 The CVF of the M-estimators

A misspecification of the mixing distribution may affect not only the behavior of the estimator itself but also its asymptotic variance. One can investigate the infinitesimal effects of a contamination of the type (3.1) on the asymptotic variance of the estimator by studying the CVF and the CVS. Although Definitions 1.5.5 and 1.5.6 were made in the framework of M-estimation of a one-dimensional parameter, they can be extended to the case of multivariate parameters. Ronchetti and Rousseeuw (1985) defined the CVF and the CVS for estimators of classical linear regression coefficients. In this Section we extend these definitions to M-estimators of GLMM parameters under a contamination in the mixing distribution.

Consider the model introduced in Section 1.3.1. Let $f_F$ be the marginal density of $Y_i$ given $X_i = x_i$ when the random effects are distributed according to the mixing distribution F. That is,

\[ f_F(y_i) = f_{Y_i|x_i}(y_i) = \int \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + h(y_i,\phi) \right\} f(u)\,du, \qquad i = 1,\dots,n. \]

We are interested in estimating the vector of unknown parameters $\theta = (\beta^T,\tau^T)^T \in R^{p+q}$, where $\beta \in R^p$ is the vector of regression coefficients and $\tau \in R^q$ is the vector of unknown parameters of the mixing distribution.

Suppose that the mixing distribution F is slightly contaminated by a distribution G, so that the random variables U are actually generated from an ε-contamination of the nominal distribution, denoted

\[ F_\varepsilon = (1-\varepsilon)F + \varepsilon G. \tag{3.1} \]

We will assume that G is any distribution having the same first two moments as F. This restriction on G ensures that θ is interpretable as the true parameter vector, no matter how much the true model deviates from the nominal model (Gustafson, 1996). Let $\mathcal{G}$ be the class of all such distributions G.

Notation 3.2.1. Let $E_F$ be the expected value taken with respect to the density $f_F$; similarly define $E_G$ and $E_\varepsilon$. Let $H_F$ be the joint distribution of the response, the random effects and the covariates assuming the correct distribution F for the random effects, and let $H_G$ be the corresponding distribution when the contaminating distribution G is assumed.

Note that a contamination of type (3.1) in the mixing distribution induces the same kind of contamination in the joint distribution, i.e. $H_\varepsilon = (1-\varepsilon)H_F + \varepsilon H_G$.

The M-estimator $\hat\theta$ is the solution of

\[ \sum_{i=1}^n \psi(x_i,y_i;\theta) = 0 \tag{3.2} \]

for suitably chosen functions ψ from $R^p \times R \times R^{p+q}$ to $R^{p+q}$ such that $E_H[\psi(X,Y;\theta)] = 0$. Note that $\hat\theta$ can also be defined as $\hat\theta = T(H_n)$, where the functional T is implicitly defined by

\[ \int \psi(x,y;T(H))\,dH(x,y) = 0. \tag{3.3} \]

Under regularity conditions, by the law of large numbers,

\[ \frac{1}{n}\sum_{i=1}^n \psi(x_i,y_i;\theta) \to E_\varepsilon[\psi(X_i,Y_i;\theta)], \]

where the expected value is taken under the true model (3.1). Let $\theta_\varepsilon$ be the solution of the equation

\[ E_\varepsilon[\psi(X_i,Y_i;\theta)] = 0. \tag{3.4} \]

Then the zeros of (3.2) and those of (3.4) should become close as n goes to infinity. In other words, under regularity conditions (Huber, 1967), we expect $\hat\theta$ to converge to $\theta_\varepsilon$.
In particular, note that when $\psi(x,y;\theta) = \nabla_\theta \log f_F(y;x,\theta)$, then $\hat\theta$ is the MLE.

Definition 3.2.1. The Change-of-Variance Function (CVF) of T at $H_F$ under a contamination G in the mixing distribution is defined as

\[ CVF(T,H_F,G) = \frac{\partial}{\partial\varepsilon}\left[V(T,H_\varepsilon)\right]\Big|_{\varepsilon=0}. \]

Definition 3.2.2. The unstandardized Change-of-Variance Sensitivity of T at $H_F$ is

\[ \kappa^*(T,H_F) = \sup_{G\in\mathcal{G}} \left\{ \operatorname{tr} CVF(T,H_F,G) \,/\, \operatorname{tr} V(T,H_F) \right\}. \]

The estimator is called V-robust when its CVS, $\kappa^*$, is finite.

In order to derive the CVF of Definition 3.2.1, we need the asymptotic variance of the M-estimators of GLMM parameters. Huber (1967) proved asymptotic normality of M-estimators under weaker conditions than usual for a general class of models. Using a Taylor expansion approach, one can derive the asymptotic variance presented in the next theorem.

Theorem 3.2.1. Under regularity conditions, the asymptotic variance of the M-estimator $\hat\theta$ defined by the functional T in (3.3) is given by

\[ V(T,H_\varepsilon) = M_\varepsilon^{-1} Q_\varepsilon \left( M_\varepsilon^{-1} \right)^T, \tag{3.5} \]

where

\[ M_\varepsilon = E_\varepsilon\left[ -\nabla_\theta \psi(X,Y;\theta)\big|_{\theta=\theta_\varepsilon} \right], \tag{3.6} \]

\[ Q_\varepsilon = E_\varepsilon\left[ \psi(X,Y;\theta)\,\psi(X,Y;\theta)^T\big|_{\theta=\theta_\varepsilon} \right]. \tag{3.7} \]

Applying Definition 3.2.1 to the asymptotic variance defined in (3.5), we can derive the CVF of the M-estimators of GLMMs. As the MLE is a commonly used estimator in GLMMs (Anderson and Aitkin, 1985; Crouch and Spiegelman, 1990; McCulloch, 1997), we derive the CVF of the MLE in particular; in Sections 3.3 and 3.4 we examine its CVF and CVS for the Poisson-Gamma model and for two mixed-effects Binomial models, respectively.

3.2.1 The MLE

We introduce some notation that will be used throughout this Chapter. For simplicity, the subindex i corresponding to the ith observation is omitted in the following results.

Notation 3.2.2. Let $l = \log f_F(y;x,\theta)$ be the log-likelihood function of Y given x under the nominal model.

Notation 3.2.3. Let $l_r = (\partial/\partial\theta_r)\log f_F(y;x,\theta)|_{\theta=\theta_0}$; $l_{rj} = (\partial^2/\partial\theta_r\partial\theta_j)\log f_F(y;x,\theta)|_{\theta=\theta_0}$; and $l_{rjk} = (\partial^3/\partial\theta_r\partial\theta_j\partial\theta_k)\log f_F(y;x,\theta)|_{\theta=\theta_0}$, where $\theta_r = \beta_r$ for $r = 1,\dots,p$ and $\theta_r = \tau_{r-p}$ for $r = p+1,\dots,p+q$.

Notation 3.2.4. Let $I^{ks} = [I^{-1}(\theta)]_{ks}$, where $I(\theta)$ is the Fisher information matrix.

Notation 3.2.5. $E_{G-F}(\cdot) = E_G(\cdot) - E_F(\cdot)$.

Notation 3.2.6. Define $J_{rjk} = E_F[l_{rjk}]$, and let $J_{rj}$ be the vector obtained by fixing the first two indices of the three-way array.

We start by evaluating (3.6) and (3.7) at ε = 0:

\[ M_\varepsilon|_{\varepsilon=0} = E_F\left[ -\nabla^2_{\theta\theta^T} \log f_F(Y;x,\theta)\big|_{\theta=\theta_0} \right] = I(\theta_0), \]

\[ Q_\varepsilon|_{\varepsilon=0} = E_F\left[ \left(\nabla_\theta \log f_F(Y;x,\theta)\right)\left(\nabla_\theta \log f_F(Y;x,\theta)\right)^T\big|_{\theta=\theta_0} \right] = I(\theta_0). \]

Assuming that interchange of expectation and differentiation is allowed, and after some straightforward calculations, one obtains

\[ CVF(MLE,H_F,G) = \frac{\partial V(\varepsilon)}{\partial\varepsilon}\Big|_{\varepsilon=0} = I^{-1}(\theta_0)\left[ -2\frac{\partial M(\varepsilon)}{\partial\varepsilon}\Big|_{\varepsilon=0} + \frac{\partial Q(\varepsilon)}{\partial\varepsilon}\Big|_{\varepsilon=0} \right] I^{-1}(\theta_0), \tag{3.8} \]

where

\[ \left( \frac{\partial M(\varepsilon)}{\partial\varepsilon}\Big|_{\varepsilon=0} \right)_{ij} = \sum_k \left[ \sum_s I^{ks} E_G(l_s) \right] E_F(l_{ijk}) + E_{G-F}(l_{ij}), \tag{3.9} \]

\[ \left( \frac{\partial Q(\varepsilon)}{\partial\varepsilon}\Big|_{\varepsilon=0} \right)_{ij} = \sum_k \left[ \sum_s I^{ks} E_G(l_s) \right] E_F(l_{ik}l_j + l_i l_{jk}) + E_{G-F}(l_i l_j). \tag{3.10} \]
Therefore, the V-robustness of the MLE depends on the contaminating distribution G through the first- and second-order derivatives of the log-likelihood function.

3.3 The Poisson-Gamma Model

Poisson models are widely used in areas of application such as biology, reliability and environmental statistics, where the observed responses consist of the number of times an event occurs. Examples include the Gaver and O'Muircheartaigh (1987) data, which consist of the number of failures of 10 pumps (Lee and Nelder, 1996), or the number of colonies produced in the spleen of a recipient animal (Frome et al., 1973). A common practical complication of these models is overdispersion: in most cases, count data display substantial extra variation relative to the Poisson distribution, which is completely determined by its mean. Some authors have studied the effect of overdispersion on inferences made under the Poisson model (Paul and Plackett, 1978; Cox, 1983). Many models have been proposed to accommodate overdispersion in statistical analysis, including GLMs with random effects (Lee and Nelder, 1996). If the distribution of a multiplicative random effect applied to the mean of a Poisson model is assumed to be Gamma, then the marginal distribution of the response is Negative Binomial. This mixture of Poisson distributions is called Poisson-Gamma. In this Section we derive the CVF when the Gamma mixing distribution is contaminated by another distribution G. For simplicity, we examine the model with only one fixed and one random effect. That is,

- Let $Y_i|x_i,u_i$ be a Poisson random variable with mean $u_i\lambda_i$, where $\lambda_i = \exp(\beta_0 + \beta_1 x_i)$.
- Let $X_i$ and $U_i$ be independent random variables.
- Assume further that $U_i \sim \Gamma(1/\tau,\tau)$, so that $E(U_i) = 1$ and $V(U_i) = \tau$.

Therefore the conditional distribution of $Y_i|x_i$ is Negative Binomial, so that the log-likelihood function is given by

\[ l = K + \log\Gamma\left(y_i + \frac{1}{\tau}\right) - \log\Gamma\left(\frac{1}{\tau}\right) - \left(y_i + \frac{1}{\tau}\right)\log(1+\lambda_i\tau) + y_i\log(\lambda_i) + y_i\log(\tau), \tag{3.11} \]

and

\[ E[Y_i|x_i] = \lambda_i, \qquad E[Y_i^2|x_i] = \lambda_i^2(1+\tau) + \lambda_i. \tag{3.12} \]

For simplicity, the subindex i is omitted in the following results.

Notation 3.3.1. Let $\kappa = \lambda/(1+\tau\lambda)$.

Notation 3.3.2. Let $\psi^{(n)}(u) = (d^{n+1}/du^{n+1})\log\Gamma(u)$. These functions are known as polygamma functions; for example, the digamma function is $\psi(u) = (d/du)\log\Gamma(u)$ and the trigamma function is $\psi'(u) = (d^2/du^2)\log\Gamma(u)$.

Notation 3.3.3. For any $z > 0$, let $\Delta\psi(y;z) = \psi(y+z) - \psi(z)$, and similarly define $\Delta\psi'(y;z)$ and $\Delta\psi''(y;z)$.

It is easy to show that for any positive integer n

\[ 0 \le \Delta\psi(n;z) = \sum_{j=0}^{n-1}\frac{1}{z+j} \le \frac{n}{z} \tag{3.13} \]

and

\[ -\frac{n}{z^2} \le \Delta\psi'(n;z) = -\sum_{j=0}^{n-1}\frac{1}{(z+j)^2} \le 0. \tag{3.14} \]
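Since base R exposes the digamma and trigamma functions, the identities and bounds in (3.13) and (3.14) can be verified numerically. A minimal check, with n and z = 1/τ chosen arbitrarily for illustration:

```r
## Numerical check of (3.13)-(3.14): for a positive integer n and z > 0,
## Delta-psi(n; z)  = digamma(n + z) - digamma(z)  = sum 1/(z + j)    <= n/z,
## Delta-psi'(n; z) = trigamma(n + z) - trigamma(z) = -sum 1/(z + j)^2 >= -n/z^2.
n <- 7; z <- 1 / 0.5                  # e.g. tau = 0.5, so z = 1/tau
j <- 0:(n - 1)

c(dpsi   = digamma(n + z) - digamma(z),
  as.sum = sum(1 / (z + j)),
  upper  = n / z)

c(dpsi1  = trigamma(n + z) - trigamma(z),
  as.sum = -sum(1 / (z + j)^2),
  lower  = -n / z^2)
```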
3.3.1 The CVF

Using (3.8)-(3.11), we can derive the CVF for $\hat\theta$, the (marginal) MLE of $\theta = (\beta_0,\beta_1,\tau)^T$, when the mixing distribution F is contaminated by a distribution G. For simplicity of notation we write $\Delta\psi(Y)$ instead of $\Delta\psi(Y;1/\tau)$ throughout this Chapter, and similarly $\Delta\psi'(Y)$ and $\Delta\psi''(Y)$. Tedious calculations give

\[ I(\theta_0) = \begin{pmatrix} B & 0 \\ 0 & -e^F_{33} \end{pmatrix}, \qquad\text{where}\qquad B = \begin{pmatrix} E_X(\kappa) & E_X(\kappa X) \\ E_X(\kappa X) & E_X(\kappa X^2) \end{pmatrix} \]

and $e^F_{33} = E_F(l_{33}) = \tau^{-2}E_X(\kappa) + \tau^{-4}E_F[\Delta\psi'(Y)]$. Moreover,

\[ \frac{\partial M(\varepsilon)}{\partial\varepsilon}\Big|_{\varepsilon=0} = -\frac{1}{e^F_{33}} \begin{pmatrix} e^G_3\,A & 0 \\ 0 & e^G_3\,e^F_{333} - a^G_{33}\,e^F_{33} \end{pmatrix}, \tag{3.15} \]

where

\[ A = \begin{pmatrix} E_X(\kappa^2) & E_X(\kappa^2 X) \\ E_X(\kappa^2 X) & E_X(\kappa^2 X^2) \end{pmatrix}, \]

\[ e^G_3 = E_G(l_3) = \frac{1}{\tau^2}E_X[\log(1+\lambda\tau)] - \frac{1}{\tau^2}E_G[\Delta\psi(Y)], \]

\[ a^G_{33} = E_{G-F}(l_{33}) = -\frac{2}{\tau^3}E_X[\log(1+\lambda\tau)] + \frac{2}{\tau^3}E_G[\Delta\psi(Y)] + \frac{1}{\tau^4}E_{G-F}[\Delta\psi'(Y)], \]

\[ e^F_{333} = E_F(l_{333}) = \frac{1}{\tau^3}E_X\left[ -\frac{\lambda(4+5\lambda\tau)}{(1+\lambda\tau)^2} \right] - \frac{6}{\tau^5}E_F[\Delta\psi'(Y)] - \frac{1}{\tau^6}E_F[\Delta\psi''(Y)]. \]

Similarly,

\[ \frac{\partial Q(\varepsilon)}{\partial\varepsilon}\Big|_{\varepsilon=0} = \frac{1}{e^F_{33}} \begin{pmatrix} 2e^G_3\,A & w \\ w^T & b^G_{3,3}\,e^F_{33} - e^G_3\,c^F_{33} \end{pmatrix}, \]

where

\[ w = \begin{pmatrix} b^G_{1,3}\,e^F_{33} - e^G_3\,c^F_{13} \\ b^G_{2,3}\,e^F_{33} - e^G_3\,c^F_{23} \end{pmatrix}, \]

\[ b^G_{1,3} = -\frac{1}{\tau^2}E_{G-F}\left[ \frac{Y-\lambda}{1+\lambda\tau}\,\Delta\psi(Y) \right], \qquad b^G_{2,3} = -\frac{1}{\tau^2}E_{G-F}\left[ \frac{X(Y-\lambda)}{1+\lambda\tau}\,\Delta\psi(Y) \right], \]

\[ b^G_{3,3} = E_{G-F}\left[ -\frac{2Y}{\tau^3(1+\lambda\tau)}\Delta\psi(Y) + \left( \frac{2\lambda}{\tau^3(1+\lambda\tau)} - \frac{2\log(1+\lambda\tau)}{\tau^4} \right)\Delta\psi(Y) + \frac{1}{\tau^4}\Delta\psi(Y)^2 \right], \]

and $c^F_{k3} = E_F[l_{k3}l_3 + l_k l_{33}]$ for $k = 1,2,3$. As the expectations $c^F_{k3}$ depend only on the nominal distribution F, which is fixed in our analysis, we do not present their detailed forms. Finally, following (3.8) we get the CVF for the Poisson-Gamma model:

\[ CVF = \frac{\partial V(\varepsilon)}{\partial\varepsilon}\Big|_{\varepsilon=0} = \frac{1}{(e^F_{33})^3} \begin{pmatrix} 4e^G_3 (e^F_{33})^2 B^{-1}AB^{-1} & -e^F_{33}B^{-1}w \\ -e^F_{33}w^T B^{-1} & e^G_3(2e^F_{333} - c^F_{33}) + (b^G_{3,3} - 2a^G_{33})e^F_{33} \end{pmatrix}. \tag{3.16} \]

3.3.2 V-Robustness of Parameter Estimates

According to Definition 3.2.2, an estimator is V-robust if its Change-of-Variance Sensitivity (CVS) is finite. In this Section we show that the diagonal entries of the CVF in (3.16) are bounded uniformly in G, which suffices to prove that the MLE of the Poisson-Gamma model is V-robust.

Lemma 3.3.1. For any $G\in\mathcal{G}$, the quantities (i) $e^G_3$, (ii) $a^G_{33}$, and (iii) $b^G_{3,3}$ are all bounded.

Proof. (i) Recall from (3.15) that

\[ e^G_3 = \frac{1}{\tau^2}E_X[\log(1+\lambda\tau)] - \frac{1}{\tau^2}E_G[\Delta\psi(Y)]. \]

Using $z = 1/\tau$ in (3.13), the law of iterated expectations and (3.12), we get

\[ 0 \le E_G[\Delta\psi(Y)] \le \tau E_G(Y) = \tau E_X(\lambda), \quad\text{for all } G\in\mathcal{G}. \tag{3.17} \]

Then

\[ \frac{1}{\tau^2}E_X[\log(1+\lambda\tau) - \tau\lambda] \le e^G_3 \le \frac{1}{\tau^2}E_X[\log(1+\lambda\tau)]. \]

(ii) From (3.15),

\[ a^G_{33} = -\frac{2}{\tau^3}E_X[\log(1+\lambda\tau)] + \frac{2}{\tau^3}E_G[\Delta\psi(Y)] + \frac{1}{\tau^4}E_{G-F}[\Delta\psi'(Y)]. \tag{3.18} \]

Again, using $z = 1/\tau$ in (3.14), the law of iterated expectations and (3.12), we get

\[ -\tau^2 E_G[Y] \le E_G[\Delta\psi'(Y)] \le 0. \tag{3.19} \]

Thus, using inequalities (3.17) and (3.19) in (3.18), we obtain, for all $G\in\mathcal{G}$,

\[ a^G_{33} \le -\frac{2}{\tau^3}E_X[\log(1+\lambda\tau)] + \frac{2}{\tau^2}E_X(\lambda) - \frac{1}{\tau^4}E_F[\Delta\psi'(Y)], \]

\[ a^G_{33} \ge -\frac{2}{\tau^3}E_X[\log(1+\lambda\tau)] - \frac{1}{\tau^4}\left( \tau^2 E_X(\lambda) + E_F[\Delta\psi'(Y)] \right). \]

(iii) Finally,

\[ b^G_{3,3} = E_{G-F}\left[ -\frac{2Y}{\tau^3(1+\lambda\tau)}\Delta\psi(Y) + \frac{2\lambda\tau - 2(1+\lambda\tau)\log(1+\lambda\tau)}{\tau^4(1+\lambda\tau)}\,\Delta\psi(Y) + \frac{1}{\tau^4}\Delta\psi(Y)^2 \right]. \]

Write

\[ D_F = E_F\left[ \frac{2Y\Delta\psi(Y)}{\tau^3(1+\lambda\tau)} - \left( \frac{2\lambda}{\tau^3(1+\lambda\tau)} - \frac{2\log(1+\lambda\tau)}{\tau^4} \right)\Delta\psi(Y) - \frac{1}{\tau^4}\Delta\psi(Y)^2 \right]. \]

If $(1+\lambda\tau)\log(1+\lambda\tau) \le \lambda\tau$, using (3.12), (3.13) and (3.17) we can show that

\[ b^G_{3,3} \le D_F + E_X\left[ \frac{2\lambda^2}{\tau^2(1+\lambda\tau)} - \frac{2\lambda\log(1+\lambda\tau)}{\tau^3} \right] + \frac{1}{\tau^2}E_X\left[\lambda^2(1+\tau)+\lambda\right], \]

\[ b^G_{3,3} \ge D_F - \frac{2}{\tau^2}E_X\left[ \frac{\lambda+(1+\tau)\lambda^2}{1+\lambda\tau} \right]. \]

Similarly, if $\lambda\tau < (1+\lambda\tau)\log(1+\lambda\tau)$, then

\[ b^G_{3,3} < D_F + \frac{1}{\tau^2}E_X\left[\lambda^2(1+\tau)+\lambda\right], \]

\[ b^G_{3,3} > D_F + E_X\left[ \frac{2\lambda^2}{\tau^2(1+\lambda\tau)} - \frac{2\lambda\log(1+\lambda\tau)}{\tau^3} \right] - \frac{2}{\tau^2}E_X\left[ \frac{\lambda+(1+\tau)\lambda^2}{1+\lambda\tau} \right]. \]
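The bound in part (i) of the lemma can be illustrated by Monte Carlo. The sketch below is an illustration under assumed parameter values: it takes G to be a lognormal distribution matched to the Gamma's first two moments (mean 1, variance τ, as in the simulation study of Section 3.5.2) and checks that $e^G_3$ falls between the stated limits.

```r
## Monte Carlo illustration of Lemma 3.3.1 (i) for a moment-matched
## lognormal contaminant G (mean 1, variance tau).
set.seed(4)
n <- 1e5; tau <- 1; beta <- c(0, 1)
s2  <- log(1 + tau)                         # lognormal: sigma^2 = log(1 + tau)
x   <- rnorm(n)
lam <- exp(beta[1] + beta[2] * x)
u   <- rlnorm(n, meanlog = -s2 / 2, sdlog = sqrt(s2))   # E(U)=1, Var(U)=tau
y   <- rpois(n, u * lam)

dpsi <- digamma(y + 1 / tau) - digamma(1 / tau)         # Delta-psi(Y; 1/tau)
e3G  <- (mean(log(1 + lam * tau)) - mean(dpsi)) / tau^2
c(lower = mean(log(1 + lam * tau) - tau * lam) / tau^2,
  e3G   = e3G,
  upper = mean(log(1 + lam * tau)) / tau^2)   # lower <= e3G <= upper
```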
Proposition 3.3.1. The MLE of $\theta = (\beta_0,\beta_1,\tau)^T$ for the Poisson-Gamma model is V-robust.

Proof. Note that the diagonal entries of the CVF (3.16) depend on the contaminating distribution G only through the quantities bounded in Lemma 3.3.1. Moreover, $V(T,H_F)$ is the asymptotic variance under the nominal model and therefore does not depend on G. Thus, using the results of Lemma 3.3.1 and according to Definition 3.2.2, the MLE of the Poisson-Gamma parameters is V-robust.

3.4 Mixed-Effects Binomial Models

In many applications one needs to study the relationship between binomial responses and several explanatory variables. The response may be a vector of binary responses per experimental unit or cluster; if the data are grouped as frequencies for each cluster, the response variable can be modelled by the binomial distribution. For example, in teratologic applications, pregnant animals are exposed to a pharmaceutical substance and are sacrificed prior to the birth of the litter (Heagerty and Zeger, 2000). The fetuses of each litter are then examined to determine the presence or absence of a malformation. The response variable records this information for each fetus within a litter (binary responses) or the number of fetuses per litter affected by the drug (binomial response).

As in the case of Poisson models, binary or binomial data often exhibit overdispersion with respect to the nominal variance. One possible explanation for the overdispersion is intracluster dependence: observations from the same individual or cluster tend to be more similar than observations from different subjects. Many models have been proposed for clustered binary data, including GLMs with random effects (e.g., Stiratelli et al., 1984; Neuhaus et al., 1992; Prentice, 1988; Heagerty and Zeger, 2000). The Beta-Binomial distribution is sometimes used to model binomial data with extra variation (e.g., Crowder, 1978; Williams, 1982; McCullagh and Nelder, 1989).

In this Section we review alternative models that have been proposed in the literature for binomial data. In particular, we examine two simple models that have an attractive closed-form marginal density, so that maximum likelihood procedures can be used to estimate the model parameters. Finally, the effect of a contamination in the mixing distribution on the asymptotic variance-covariance matrix is investigated using the CVF.

3.4.1 General Mixed-Effects Binomial Models

- For $i = 1,\dots,n$, let $X_i$ be a p-dimensional vector of covariates independent of $U_i$, a random intercept.
- Assume further that the $X_i$ are independent and identically distributed for $i = 1,\dots,n$.
- Similarly, the random effects $U_i$, $i = 1,\dots,n$, are independent and identically distributed.
- Conditionally, given that $X_i = x_i$ and $U_i = u_i$, for each cluster i we observe $n_i$ binary responses $Y_{ij}$, $j = 1,\dots,n_i$. Let $Y_i = \sum_{j=1}^{n_i} Y_{ij}$. Then $Y_i|x_i,u_i$ is a binomial random variable with mean $n_i p_i$, where $g(p_i) = \alpha_i + x_i^T\beta$ and $\alpha_i = \alpha(u_i)$.

Several link functions g are commonly used in the literature:

- the logit link function $g(\mu) = \log(\mu/(1-\mu))$;
- the identity link function $g(\mu) = \mu$;
- the probit link function $g(\mu) = \Phi^{-1}(\mu)$, where Φ is the cumulative distribution function of a standard normal random variable.

A common approach to estimating the parameters of a mixed-effects binomial model is maximum likelihood (Neuhaus et al., 1992; Heagerty and Zeger, 2000; Neuhaus, 2001).
The main difficulty of this method is that, to obtain the likelihood, one must evaluate a set of integrals that are typically intractable. The marginal likelihood function is given by

\[ \prod_{i=1}^n \int f(y_i|x_i,u_i)\,dG(u_i) = \prod_{i=1}^n \int \binom{n_i}{y_i} p_i^{y_i}(1-p_i)^{n_i-y_i}\,dG(u_i). \tag{3.20} \]

In general, there is no closed form for the marginal likelihood (3.20); a numerical illustration is sketched at the end of this subsection. Approaches to overcome this difficulty include numerical integration (Neuhaus, 2001), approximate solutions (Stiratelli et al., 1984; Neuhaus et al., 1992) and Monte Carlo EM algorithms (McCulloch, 1997). Two exceptions that we examine later are:

(1) a model with $n_i = 1$ for $i = 1,\dots,n$, g the identity link function, and a random intercept $U_i \sim Beta(\alpha_1,\alpha_2)$;

(2) a model with only a random intercept having a Beta distribution, i.e. $U_i \sim Beta(\alpha_1,\alpha_2)$ and $\beta = 0$.

The model described in item (2) is known as the Beta-Binomial model and has been used extensively to model overdispersed binomial data (e.g., Crowder, 1978; Williams, 1982; McCullagh and Nelder, 1989). A disadvantage of this model is that it does not incorporate other explanatory variables or fixed effects. Lee and Nelder (1996) consider an alternative approach to include fixed effects in this model: they maximize the hierarchical likelihood (the logarithm of the joint density function) to obtain the Maximum Hierarchical Likelihood Estimates (MHLEs). Note, however, that a contamination in the mixing distribution will not affect these estimators, so we do not study this approach.

Another commonly used model assumes that, conditionally on the random effects $U_i = u_i$, the response variable $Y_i$ has a binomial distribution with mean $n_i u_i$, that the random effects have a Beta distribution, and that a p-dimensional vector of covariates $X_i$ is incorporated into the model through the relation $g(E(U_i)) = X_i^T\beta$ (Williams, 1982; Kuppert et al., 1986). However, this is not a mixed-effects model, so we do not cover its analysis in this Chapter.

A major disadvantage of maximum likelihood estimation is that it requires full specification of the mixing distribution. Neuhaus et al. (1992) studied the effect of a misspecification of the mixing distribution on the parameter estimates; the effects on the estimated standard errors were examined by a simulation study. We derive the CVF of the MLE for Models (1) and (2) to study the stability of the estimated standard errors under a slight contamination of the mixing distribution.
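As an aside, the integral in (3.20) is easy to approximate numerically for a single cluster. The sketch below is an assumption-laden illustration: it takes a logit link and the particular transformation α(u) = logit(u) of a Beta-distributed random intercept, neither of which is prescribed above, and evaluates the marginal density with base R's integrate().

```r
## Illustrative numerical evaluation of one factor of the marginal
## likelihood (3.20): Y_i | x_i, u_i ~ Binomial(n_i, p_i) with
## g(p_i) = alpha(u_i) + x_i * beta, logit link, alpha(u) = qlogis(u),
## and U_i ~ Beta(a1, a2).
marg.dens <- function(y, m, x, beta, a1, a2) {
  integrand <- function(u) {
    p <- plogis(qlogis(u) + x * beta)    # inverse logit of alpha(u) + x*beta
    dbinom(y, m, p) * dbeta(u, a1, a2)
  }
  integrate(integrand, 0, 1)$value
}

## e.g. one cluster with m = 10 trials and y = 4 successes:
marg.dens(y = 4, m = 10, x = 0.3, beta = 0.5, a1 = 2, a2 = 3)
```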
3.4.2 V-Robustness of Parameter Estimates

Model (1)

Let us first consider a linear probability model with one observation per cluster, i.e., $n_i = 1$ and $g(\mu) = \mu$. The main disadvantage of this model is that the success probability is restricted to the interval [0,1], thus imposing a restriction on the parameters β. It is easy to show that this model has marginal density given by

\[ f(y_i|x_i) = \left( \frac{\alpha_1}{\alpha_1+\alpha_2} + x_i\beta \right)^{y_i} \left( \frac{\alpha_2}{\alpha_1+\alpha_2} - x_i\beta \right)^{1-y_i}, \qquad y_i = 0,1. \]

Then the log-likelihood is a linear function of the response $y_i$:

\[ l = l(\theta;x_i,y_i) = \log\left( \frac{\alpha_2}{\alpha_1+\alpha_2} - x_i\beta \right) + y_i \log\left( \frac{\alpha_1 + (\alpha_1+\alpha_2)x_i\beta}{\alpha_2 - (\alpha_1+\alpha_2)x_i\beta} \right), \]

where $\theta = (\alpha_1,\alpha_2,\beta^T)^T \in R^{p+2}$.

Remark 3.4.1. Note that for any quadratic function $q(\cdot)$, $E_F[q(Y)] = E_G[q(Y)]$ for any $G\in\mathcal{G}$. Thus $E_{G-F}[l_j] = E_{G-F}[l_j l_k] = E_{G-F}[l_{jk}] = 0$, where $l_j = \partial l/\partial\theta_j$, $l_{jk} = \partial^2 l/\partial\theta_j\partial\theta_k$ and $j,k = 1,\dots,p+2$.

Gustafson (1996) defined the local effect of the mixing distribution misspecification on the estimators $\hat\theta$ by

\[ \gamma'_j(0) = \int \left\{ I^{-1}(\theta_0)\,s(\theta_0;u) \right\}_j \,d(G-F)(u), \]

where $I^{-1}(\theta_0)$ is the inverse of the Fisher information matrix and $s_j(\theta_0;u) = E[l_j|u]$ is the conditional score, $j = 1,\dots,p+2$. An immediate consequence of Remark 3.4.1 is that $\gamma'_j(0) = 0$ for all $G\in\mathcal{G}$; Gustafson (1996) called such estimators first-order consistent.

The results of Remark 3.4.1 can also be used to analyze the CVF of the MLE for Model (1). Following (3.8)-(3.10), we can derive the CVF and note that $CVF(MLE,H_F,G)$ does not depend on G, for any $G\in\mathcal{G}$. Therefore, the (marginal) maximum likelihood estimators are V-robust according to Definition 3.2.2.

Model (2)

The Beta-Binomial model is more useful from a practical point of view. Its marginal density is given by

\[ f(y_i) = \binom{n_i}{y_i} \frac{\Gamma(\alpha_1+\alpha_2)\,\Gamma(y_i+\alpha_1)\,\Gamma(n_i+\alpha_2-y_i)}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)\,\Gamma(\alpha_1+\alpha_2+n_i)}, \]

the conditional first two moments of the response are

\[ E[Y_i|u_i] = n_i u_i, \qquad V[Y_i|u_i] = n_i u_i(1-u_i), \]

and the unconditional moments are

\[ E[Y_i] = n_i p_i, \qquad V[Y_i] = n_i p_i(1-p_i)[1+\phi(n_i-1)], \tag{3.21} \]

where $p_i = \alpha_1/(\alpha_1+\alpha_2)$ and $\phi = 1/(\alpha_1+\alpha_2+1)$. From equation (3.21) we can see how the extra variation is added to the model. For simplicity, we assume that $n_i = m$ for $i = 1,\dots,n$, and the subindex i is omitted in the following results.

Notation 3.4.1. For $k = 1,2$, let $\Delta\psi_k(u) = \Delta\psi(u;\alpha_k)$ as defined in Notation 3.3.3, and similarly define $\Delta\psi'_k(u)$.

The log-likelihood function, except for a constant that does not depend on the unknown parameters, is given by

\[ l = \log\Gamma(\alpha_1+\alpha_2) + \log\Gamma(y+\alpha_1) + \log\Gamma(m+\alpha_2-y) - \log\Gamma(\alpha_1) - \log\Gamma(\alpha_2) - \log\Gamma(\alpha_1+\alpha_2+m). \tag{3.22} \]
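Because the Beta-Binomial marginal density has closed form, its MLE can be computed by direct numerical maximization. A minimal sketch (sample size and true parameters are arbitrary illustrations):

```r
## Sketch: MLE of (alpha1, alpha2) in the Beta-Binomial Model (2),
## maximizing the closed-form marginal log-likelihood (3.22) with optim().
set.seed(2)
n <- 500; m <- 8
a <- c(2, 5)                                  # true (alpha1, alpha2)
u <- rbeta(n, a[1], a[2])                     # Beta mixing distribution
y <- rbinom(n, m, u)                          # Y_i | u_i ~ Binomial(m, u_i)

negll <- function(par) {                      # minus the log-likelihood
  a1 <- par[1]; a2 <- par[2]
  -sum(lchoose(m, y) + lbeta(y + a1, m - y + a2) - lbeta(a1, a2))
}
fit <- optim(c(1, 1), negll, method = "L-BFGS-B", lower = c(1e-6, 1e-6))
fit$par                                       # MLE of (alpha1, alpha2)
```

The Beta-function form of the density used in negll is algebraically identical to the Gamma-function form displayed above.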
Using (3.8)-(3.10), we can derive the CVF for $\hat\theta$, the (marginal) MLE of $\theta = (\alpha_1,\alpha_2)^T$, when the mixing distribution F is contaminated by a distribution G. This matrix depends on the contaminating distribution G only through $E_G[l_r]$, $E_G[l_{rj}]$ and $E_G[l_r l_j]$, for $r,j = 1,2$. Moreover, the partial derivatives $l_r$ and $l_{rj}$ are, up to additive constants, equal to $\Delta\psi_k(y)$ or $\Delta\psi_k(m-y)$ and to $\Delta\psi'_k(y)$ or $\Delta\psi'_k(m-y)$, respectively. Thus the CVF depends on G only through the expectations analyzed in the following lemma.

Lemma 3.4.1. Let U be a random variable having a Beta distribution with parameters $(\alpha_1,\alpha_2)$, and given $U = u$ let Y be a binomial random variable with mean $mu$. Let $\Delta\psi_k(\cdot)$ and $\Delta\psi'_k(\cdot)$ be as defined in Notation 3.4.1. Then, for any $G\in\mathcal{G}$, the expectations

(i) $E_G[\Delta\psi_1(Y)]$, (ii) $E_G[\Delta\psi_2(m-Y)]$, (iii) $E_G[(\Delta\psi_1(Y))^2]$, (iv) $E_G[(\Delta\psi_2(m-Y))^2]$, (v) $E_G[\Delta\psi_1(Y)\Delta\psi_2(m-Y)]$, (vi) $E_G[\Delta\psi'_1(Y)]$ and (vii) $E_G[\Delta\psi'_2(m-Y)]$

are all bounded.

Proof. We first compute the first two moments of Y, which are used throughout this proof. Using conditional expectations,

\[ E_G[Y] = E_G[E[Y|U]] = E_G[mU] = \frac{m\alpha_1}{\alpha_1+\alpha_2}, \]

\[ E_G[Y^2] = E_G[E[Y^2|U]] = E_G[mU(1-U)+m^2U^2] = m(m-1)E_G[U^2] + mE_G[U] = \frac{m(m-1)\alpha_1(\alpha_1+1)}{(\alpha_1+\alpha_2)(\alpha_1+\alpha_2+1)} + \frac{m\alpha_1}{\alpha_1+\alpha_2}. \tag{3.23} \]

(i) Replacing $z = \alpha_1$ in (3.13), taking iterated expectations and using (3.23), we obtain

\[ 0 \le E_G[\Delta\psi_1(Y)] \le \frac{1}{\alpha_1}E_G[Y] = \frac{m}{\alpha_1+\alpha_2}. \tag{3.24} \]

Hence $E_G[\Delta\psi_1(Y)]$ is bounded for all $G\in\mathcal{G}$.

(ii) The proof is almost identical to that in (i) once we note that, conditionally on $U = u$, the random variable $W = m-Y$ is a binomial random variable with mean $m(1-u)$. Then, replacing $\alpha_1$ with $\alpha_2$ and Y with W in (3.24), we get $0 \le E_G[\Delta\psi_2(W)] \le m/(\alpha_1+\alpha_2)$ for all $G\in\mathcal{G}$.

(iii) From (3.13) we can prove that

\[ 0 \le E_G[\Delta\psi_1^2(Y)] \le \frac{1}{\alpha_1^2}E_G[Y^2] \tag{3.25} \]

for all $G\in\mathcal{G}$. Then, using (3.23), we obtain

\[ 0 \le E_G[\Delta\psi_1^2(Y)] \le \frac{m}{\alpha_1}\left[ \frac{(m-1)(\alpha_1+1)}{(\alpha_1+\alpha_2)(\alpha_1+\alpha_2+1)} + \frac{1}{\alpha_1+\alpha_2} \right]. \]

(iv) Again replacing Y with $W = m-Y$ in (3.25) and $\alpha_1$ with $\alpha_2$, we obtain

\[ 0 \le E_G[\Delta\psi_2^2(W)] \le \frac{m}{\alpha_2^2}\left[ \frac{(m-1)\alpha_1(\alpha_1+1)}{(\alpha_1+\alpha_2)(\alpha_1+\alpha_2+1)} + \frac{(1-2m)\alpha_1}{\alpha_1+\alpha_2} + m \right] \]

for all $G\in\mathcal{G}$.

(v) As $\Delta\psi_1(Y)$ and $\Delta\psi_2(m-Y)$ are both nonnegative functions, the Cauchy-Schwarz inequality gives

\[ 0 \le E_G[\Delta\psi_1(Y)\Delta\psi_2(m-Y)] \le E_G^{1/2}[\Delta\psi_1^2(Y)]\,E_G^{1/2}[\Delta\psi_2^2(m-Y)]. \]

Then, by (iii) and (iv), $E_G[\Delta\psi_1(Y)\Delta\psi_2(m-Y)]$ is also bounded for all $G\in\mathcal{G}$.

(vi) Replacing z with $\alpha_1$ in (3.14), taking iterated expectations and using (3.23), we can show that

\[ 0 \ge E_G[\Delta\psi'_1(Y)] \ge -\frac{1}{\alpha_1^2}E_G[Y] = -\frac{m}{\alpha_1(\alpha_1+\alpha_2)}. \tag{3.26} \]

Hence $E_G[\Delta\psi'_1(Y)]$ is bounded for all $G\in\mathcal{G}$.

(vii) Similarly, replacing $\alpha_1$ with $\alpha_2$ and Y with $W = m-Y$ in (3.26), we obtain $0 \ge E_G[\Delta\psi'_2(W)] \ge -m/(\alpha_2(\alpha_1+\alpha_2))$ for all $G\in\mathcal{G}$.

Proposition 3.4.1. The MLE of $\theta = (\alpha_1,\alpha_2)^T$ for the Beta-Binomial model is V-robust.

Proof. Note that the entries of the CVF (3.8) depend on the contaminating distribution G only through the expectations analyzed in Lemma 3.4.1. Thus, using the results of this lemma and according to Definition 3.2.2, the (marginal) MLE of the Beta-Binomial parameters is V-robust.

Remark 3.4.2. Gustafson (1996) found the exact minimum and maximum first-order bias of the estimators, but only for m ≤ 5. Using results (i) and (ii) in Lemma 3.4.1, we can get bounds for any value of m; as these bounds are finite, we can conclude that the estimators are B-robust for any value of m.

3.5 Simulation

A simulation study was performed in order to assess the magnitude of the change in the variance of the estimators when the mixing distribution of the Poisson-Gamma model is contaminated. The performance of the estimators was investigated in samples generated with S-Plus Version 6.2.1, for different choices of population parameters and different types of contamination. The MLE of the parameters was obtained using a modified version of the glm.nb function available in the MASS library (Venables and Ripley, 1999); the modification corresponds to a reparametrization of the log-likelihood so that the variance τ of the Gamma distribution is estimated directly. A sketch of the uncontaminated design, using the unmodified glm.nb, is given below.

3.5.1 The Poisson-Gamma model

We start by examining the Poisson-Gamma model without any contamination of the mixing distribution. More precisely, we generate 1000 covariates $X_i$ from a standard normal distribution and 1000 random effects $U_i$ from a Gamma distribution with $E(U_i) = 1$ and $V(U_i) = \tau$, for $i = 1,\dots,1000$. Conditionally on $(x_i,u_i)$, a sample of 1000 random variables $Y_i$ is generated from a Poisson distribution with $E(Y_i|x_i,u_i) = u_i\exp\{\beta_0+\beta_1 x_i\}$ (see Section 3.3.1 for details of the model).

Different choices of $\theta = (\beta_0,\beta_1,\tau)^T$ are considered in order to analyze the effect of the contamination on distributions with different characteristics. The vector of parameters $(\beta_0,\beta_1)^T$ determines the shape and location of $\lambda = \exp(\beta_0+\beta_1 x)$; three choices, describing three general positions of the curve λ, are used in the simulation study: {(0,1), (-2,1), (2,1)}. The Gamma distribution depends on τ: for 0 < τ < 1 the density has a mode at $u = 1-\tau$ and is positively skewed; for τ > 1 it decreases monotonically; and for τ = 1 the exponential distribution is obtained as a special case. Therefore, we consider the following set of parameter values for τ: {.25, .5, 1, 1.5, 2}.

For each choice of parameters, 1000 Monte Carlo replications of the random effects {U_i} and the responses {Y_i} were generated; the same sample of {X_i} was used in all replications.
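The uncontaminated design can be sketched with the unmodified MASS::glm.nb, which parametrizes the Negative Binomial by θ with $V(Y) = \mu + \mu^2/\theta$; under the Poisson-Gamma model this corresponds to τ = 1/θ, so the variance of the Gamma distribution can be recovered by inverting the estimated θ (the dissertation's modified version instead estimates τ directly).

```r
## Sketch of one uncontaminated replication of the Poisson-Gamma design.
library(MASS)   # glm.nb()

set.seed(3)
n <- 1000; beta <- c(0, 1); tau <- 0.5
x <- rnorm(n)
u <- rgamma(n, shape = 1 / tau, scale = tau)   # E(U) = 1, Var(U) = tau
y <- rpois(n, u * exp(beta[1] + beta[2] * x))  # Y | x, u ~ Poisson(u * lambda)

fit <- glm.nb(y ~ x)
c(coef(fit), tau.hat = 1 / fit$theta)          # tau = 1/theta
```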
The MLE of θ was computed for each of these random samples, and the results were used to obtain estimates of the mean and variance of the estimators, which are summarized in Table 3.1 below. When the model is correctly specified there is a small bias in the estimates of the population parameters: in almost all cases the bias is of order $10^{-3}$, with the exception of the bias of $\hat\tau$ when the true population parameter is $\theta = (-2,1,\tau)^T$, where the bias is of order $10^{-2}$ for all choices of τ. The estimated variance of the estimate of τ increases in all cases as the true parameter τ increases.

Table 3.1: Means and variances of the MLE of the Poisson-Gamma model.

                            MC mean                          MC variance
    β0 = 0, β1 = 1     β0         β1         τ          β0        β1        τ
      τ = .25      -0.001778  1.000549  0.249692   0.001527  0.001263  0.001769
      τ = .5       -0.003705  1.001110  0.500010   0.001915  0.001785  0.003416
      τ = 1        -0.003886  0.997852  0.997655   0.002604  0.002574  0.008107
      τ = 1.5      -0.002655  0.998346  1.491551   0.003081  0.003264  0.015580
      τ = 2        -0.003281  1.000541  1.999453   0.003572  0.003799  0.028664
    β0 = 2, β1 = 1
      τ = .25       1.998179  1.000251  0.249354   0.000427  0.000449  0.000301
      τ = .5        1.999330  0.998404  0.499168   0.000709  0.000784  0.000839
      τ = 1         1.997411  1.001111  0.998728   0.001295  0.001311  0.002738
      τ = 1.5       1.996011  0.999635  1.496918   0.001638  0.001849  0.006091
      τ = 2         1.998376  1.001075  1.993784   0.002238  0.002227  0.010421
    β0 = -2, β1 = 1
      τ = .25      -2.005116  0.999984  0.239731   0.009631  0.005439  0.026945
      τ = .5       -1.999489  0.996300  0.477674   0.010382  0.006407  0.047147
      τ = 1        -2.015171  1.005591  0.988291   0.010992  0.008519  0.083042
      τ = 1.5      -2.005543  0.998886  1.480759   0.012495  0.009616  0.133988
      τ = 2        -2.008588  0.999845  1.973542   0.013128  0.009952  0.198262

3.5.2 The contaminated model

In this Section we examine the behavior of the estimators and their estimated variances under two contaminated models with different tail behaviors. The contaminating distributions considered are the lognormal and a scaled F-distribution (cF, where c is a positive constant and F is a random variable having an F-distribution). The random effects are now generated from the mixture $(1-\varepsilon)F + \varepsilon L$, where F represents the nominal Gamma distribution and L the contaminating distribution, both with expectation equal to 1 and variance equal to τ.

Various choices of the parameter τ are considered for the lognormal contaminating distribution, and the results are illustrated in Figures 3.1-3.18. In the case of the scaled F, the restriction on the first two moments imposes a constraint on the degrees of freedom and the constant c. Because smaller denominator degrees of freedom mean heavier tails of the resulting distribution, we choose this parameter to be 6. Both c and the numerator degrees of freedom are then completely determined by τ; since we want the degrees of freedom to be an integer, the admissible choices of τ are restricted, and for this reason only one set of parameters is used in the simulation for this contaminating distribution.
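Matching the contaminant's first two moments to the Gamma's is straightforward for the lognormal case: if $L = \exp(N(\mu,\sigma^2))$ with $E(L) = 1$ and $V(L) = \tau$, then $\sigma^2 = \log(1+\tau)$ and $\mu = -\sigma^2/2$. A sketch of the resulting mixture generator:

```r
## Generate random effects from (1 - eps) * Gamma + eps * lognormal,
## with both components matched to mean 1 and variance tau.
r.mixed <- function(n, tau, eps) {
  s2 <- log(1 + tau)                    # lognormal: sigma^2 = log(1 + tau)
  u  <- rgamma(n, shape = 1 / tau, scale = tau)        # nominal F
  from.L <- runif(n) < eps
  u[from.L] <- rlnorm(sum(from.L), meanlog = -s2 / 2, sdlog = sqrt(s2))
  u
}

u <- r.mixed(1e5, tau = 1, eps = 0.2)
c(mean(u), var(u))    # close to (1, tau) by construction
```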
For the case of a lognormal contamination, Figures 3.1, 3.3 and 3.5 show that the bias of $\hat\tau$ increases as the level of contamination increases. For true parameters β0 = 0 and β0 = 2, this bias also increases, at each level of contamination, with the value of the true variance (see Figures 3.3 and 3.5). Figures 3.2, 3.4 and 3.6 show that the estimated variance of this estimate remains almost constant throughout all levels of contamination for all choices of the true τ. Moreover, for all levels of contamination, the magnitude of the estimated variance is larger for larger values of the true variance. A similar behavior is found in the estimated bias and variance of the parameter β0, as can be seen in Figures 3.7-3.12; note, however, that the values of the bias of the estimate of β0 are much smaller than those of τ. The behavior of the bias of $\hat\beta_1$ is not monotonic, as can be seen in Figures 3.13, 3.15 and 3.17; considering that the values of this bias are of order $10^{-3}$, we interpret them as pure noise from the simulation. The estimated variance follows the same pattern as that of the previous parameters (see Figures 3.14, 3.16 and 3.18).

[Figure 3.1: Bias in the estimation of τ for β0 = -2, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.2: Variance of the estimate of τ for β0 = -2, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.3: Bias in the estimation of τ for β0 = 0, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.4: Variance of the estimate of τ for β0 = 0, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.5: Bias in the estimation of τ for β0 = 2, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.6: Variance of the estimate of τ for β0 = 2, β1 = 1, and different values of τ under different levels of a lognormal contamination.]

It is also interesting to note that, for each choice of τ, the variances of all estimates decrease as β0 moves from -2 (for which the curve λ is flatter around 0) to 2 (for which the curve λ is steeper around 0). Similar results are found for the estimates of the parameters and their variances under a scaled F contamination. Figures 3.19, 3.21 and 3.23 show the bias of the estimates under different levels of contamination. As before, the bias of $\hat\beta_0$ and $\hat\tau$ increases with ε, while that of $\hat\beta_1$ does not follow a monotonic behavior. Moreover, under this contamination, the magnitude of the bias of $\hat\beta_0$ and $\hat\tau$ is larger than for the lognormal contamination. It is important to note that the variances of all the estimators, under all parameter choices and under both contaminating distributions, remain bounded, as was proved theoretically in Section 3.3.

[Figure 3.7: Bias in the estimation of β0 for β0 = -2, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.8: Variance of the estimate of β0 for β0 = -2, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.9: Bias in the estimation of β0 for β0 = 0, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.10: Variance of the estimate of β0 for β0 = 0, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.11: Bias in the estimation of β0 for β0 = 2, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.12: Variance of the estimate of β0 for β0 = 2, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.13: Bias in the estimation of β1 for β0 = -2, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.14: Variance of the estimate of β1 for β0 = -2, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.15: Bias in the estimation of β1 for β0 = 0, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.16: Variance of the estimate of β1 for β0 = 0, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.17: Bias in the estimation of β1 for β0 = 2, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.18: Variance of the estimate of β1 for β0 = 2, β1 = 1, and different values of τ under different levels of a lognormal contamination.]
[Figure 3.19: Bias in the estimation of τ for β0 = 0, β1 = 1, and τ = 2 under different levels of a scaled F6,6 contamination.]
[Figure 3.20: Variance of the estimate of τ for β0 = 0, β1 = 1, and τ = 2 under different levels of a scaled F6,6 contamination.]
[Figure 3.21: Bias in the estimation of β0 for β0 = 0, β1 = 1, and τ = 2 under different levels of a scaled F6,6 contamination.]
[Figure 3.22: Variance of the estimate of β0 for β0 = 0, β1 = 1, and τ = 2 under different levels of a scaled F6,6 contamination.]
[Figure 3.23: Bias in the estimation of β1 for β0 = 0, β1 = 1, and τ = 2 under different levels of a scaled F6,6 contamination.]
[Figure 3.24: Variance of the estimate of β1 for β0 = 0, β1 = 1, and τ = 2 under different levels of a scaled F6,6 contamination.]

3.6 Conclusions and future research

In this Chapter we analyzed the sensitivity of the asymptotic variance of the M-estimators for GLMMs under a slight contamination of the mixing distribution. This Chapter adds to previous work by presenting the CVF for this general class of estimators in GLMMs and by analyzing the CVS in detail for the MLE of the Poisson-Gamma model and of two mixed-effects Binomial models. In all cases, the MLE was found to be V-robust when the distribution of the random effects is contaminated by any other distribution sharing its first two moments with the nominal distribution. A simulation study was performed to illustrate the relevance of this result for the Poisson-Gamma model; in all simulated cases, the variance of the estimators remained almost constant under all levels of contamination.

While the Poisson-Gamma model is attractive for its closed-form marginal distribution and its applicability, it would be interesting to examine other Poisson mixed models, such as the Poisson-inverse Gaussian or the Poisson-lognormal. Other estimators suggested in the literature could also be studied; for example, one could derive the CVF of quasi-likelihood estimators for GLMMs or, in particular, for Poisson mixed models. Finally, since the simulation study was designed to illustrate our asymptotic results, it would be interesting to repeat it with small samples to see whether those results still hold in the finite-sample case.

Chapter 4

Robust Instrumental Variables Estimator

4.1 Introduction

A classical problem in linear regression arises when some of the covariates are "endogenous", that is, correlated with the error term in the equation to be estimated. In such a situation the ordinary least squares (OLS) estimator yields biased and inconsistent parameter estimates. A common approach to address this problem is to use additional information contained in variables that do not belong to the original equation but are correlated with the endogenous covariates. Under certain conditions, such "instruments" can be used to construct ordinary instrumental variables (OIV) estimators that yield consistent parameter estimates. However, despite its widespread use, the OIV estimator is highly sensitive to outliers in the response, the covariates, and even the instruments.

In this Chapter we propose a robust instrumental variables (RIV) estimator based on a robust multivariate location and scatter matrix estimator. Instead of estimating the regression parameters directly as the solution of a robust estimating equation, we robustify the solution of the ordinary estimating equations using high breakdown point S-estimators. We show that, when an appropriate S-estimator is chosen, our RIV estimator is bounded-influence (i.e., resistant to extreme observations), consistent, and asymptotically normal.

Since the ordinary instrumental variables estimator (OIV) and its most efficient version, known as the two-stage least squares (2SLS) estimator, are extremely sensitive to aberrant observations, several robust instrumental variables estimators have been developed.
Amemiya (1982) extended the least absolute deviation (LAD) estimator as an alternative to the 2SLS estimator, and Powell (1983) showed the asymptotic normality of Amemiya's estimator under weak conditions; however, like LAD, this estimator is not bounded-influence. Krasker and Welsch (1985) proposed an instrumental variables version of the earlier Krasker-Welsch estimator (Krasker and Welsch, 1982). Their estimator is a bounded-influence weighted instrumental variables estimator that downweights an observation only if its influence would otherwise exceed the maximum allowable influence; however, the estimator is complex and hard to implement. More recently, Flavin (1999) derived an instrumental variables version of the Huber (1973) estimator. Although the author claims that this estimator is easier to implement than the Krasker-Welsch instrumental variables estimator, it is not bounded-influence. Wagenvoort and Waldmann (2002) developed two bounded-influence instrumental variables estimators, robust versions of the 2SLS and generalized method of moments (GMM) estimators respectively; these estimators are also complicated to implement and compute.

We add to previous work by providing a robust instrumental variables estimator that is less computationally expensive than those currently available. As our estimator is a natural extension of the ordinary instrumental variables estimator, it is also easy to implement and interpret. In addition, it is in the class of weighted instrumental variables estimators, which gives a simple way to flag outliers and influential points. These properties are extremely useful, especially when using high-dimensional datasets such as those used in data mining, where it is infeasible to identify one aberrant point at a time or to use computationally demanding algorithms. We also provide an S-Plus/R algorithm to compute both the regression coefficients and the asymptotic covariance matrix estimates (available from the author).

We also propose a diagnostic technique based on our robust covariance-based estimator. Our RIV estimator is a weighted instrumental variables estimator which downweights points with high Mahalanobis distances. We therefore propose to detect outliers in any of the variables by comparing the Mahalanobis distance of each data point to the quantiles of the chi-squared distribution with degrees of freedom given by the number of variables in the dataset. Equivalently, we can inspect the weights to flag outliers and leverage points.

The remainder of this Chapter is organized as follows. In Section 4.2, we introduce our RIV estimator. In Section 4.3, we discuss some of its properties. In Section 4.4, we compute the corresponding Influence Function, show that it is bounded, and use it to derive a covariance matrix estimator for our RIV estimator. In Section 4.5, we use a real data example with measurement errors, artificially contaminated, to compare the performance of our RIV estimator with that of the OIV estimator, and we illustrate the use of our diagnostic techniques. Section 4.6 concludes.

4.2 Robust Instrumental Variables Estimator

In this Section we propose a robust instrumental variables estimator. Instead of estimating the regression parameters directly as solutions of robust estimating equations, we robustify the solution of the ordinary estimating equations. We note that the OIV estimator, defined in (1.9), is a function of the sample mean and the sample covariance matrix; however, these are not robust estimators of multivariate location and scatter, so the OIV estimator is extremely sensitive to outliers and influential points. Thus, we propose using a robust multivariate location and scatter estimator to construct a robust instrumental variables estimator (RIV) analogous to the OIV estimator.

Let $Z_i = (X_i,W_i,Y_i)^T$, for $i = 1,\dots,n$, be the (2p+1)-dimensional vector of observations. Let $(M,S) \in R^{2p+1} \times PDS(2p+1)$ be a robust multivariate location and scatter matrix estimator, where PDS(2p+1) is the set of all positive definite symmetric matrices of order 2p+1. We can split up the vector M and the matrix S accordingly. That is, $M = (M_X,M_W,M_Y)^T$ and

\[ S = \begin{pmatrix} S_{XX} & S_{XW} & S_{XY} \\ S_{WX} & S_{WW} & S_{WY} \\ S_{YX} & S_{YW} & S_{YY} \end{pmatrix}. \tag{4.1} \]

Consider the model described in Section 1.4.1. We define the RIV estimator of the regression coefficients β as

\[ \hat\beta_{RIV} = (\hat\beta_0,\hat\beta_1) = g(M,S) = \left( M_Y - M_X^T\hat\beta_1,\; S_{WX}^{-1}S_{WY} \right). \tag{4.2} \]

Note that the RIV estimator defined in (4.2) reduces to the OIV estimator defined in (1.9) when (M,S) is the sample location and scatter estimator. Maronna and Morgenthaler (1986) and Croux et al. (2003) proposed analogous estimators for classical regression models without endogeneity.
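The estimator in (4.2) is straightforward to compute once a robust (M, S) pair is available. The sketch below is an illustration rather than the dissertation's own algorithm: it plugs in robustbase::covMcd, an implementation of the Minimum Covariance Determinant estimator discussed in the next paragraph, in place of an S-estimator; as noted below, any robust multivariate location and scatter estimator can be used.

```r
## Sketch of the RIV estimator (4.2) with the MCD as the robust
## location/scatter estimator.  y: response; X: endogenous covariates;
## W: instruments (same dimension p as X).
library(robustbase)   # covMcd()

riv <- function(y, X, W) {
  X <- as.matrix(X); W <- as.matrix(W)
  p  <- ncol(X)
  Z  <- cbind(X, W, y)                    # Z_i = (X_i, W_i, Y_i)^T
  rc <- covMcd(Z)                         # robust (M, S)
  M  <- rc$center; S <- rc$cov
  iX <- 1:p; iW <- p + 1:p; iY <- 2 * p + 1
  beta1 <- solve(S[iW, iX, drop = FALSE], S[iW, iY])   # S_WX^{-1} S_WY
  beta0 <- M[iY] - sum(M[iX] * beta1)                  # M_Y - M_X^T beta1
  c(beta0 = unname(beta0), beta1)
}
```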
We need a robust location and covariance matrix estimator to replace the sample versions. Many such multivariate estimators are available: M-estimators (Maronna, 1976), Stahel-Donoho estimators (Stahel, 1981; Donoho, 1982), the Minimum Volume Ellipsoid and Minimum Covariance Determinant estimators (Rousseeuw, 1984), S-estimators (Davies, 1987; Lopuhaä, 1989), and componentwise and pairwise estimators (Gnanadesikan and Kettenring, 1972; Maronna and Zamar, 2002). It can be proved that S-estimators are consistent and asymptotically normal, affine equivariant, positive definite and bounded-influence, and that they achieve the maximal breakdown point (asymptotically 1/2) regardless of the dimension of the data, for an appropriate choice of ρ (Davies, 1987; Lopuhaä, 1989). Thus, we use this family of estimators to summarize the data and construct our estimator. However, our estimator can be constructed using any other choice of multivariate location and scatter matrix estimator (see Section 1.5.5 for a further description of these estimators).

For a finite sample $Z_1,\dots,Z_n \in R^{2p+1}$, the S-estimator is defined as the solution (M,S) to the problem of minimizing det(S) subject to

\[ \frac{1}{n}\sum_{i=1}^n \rho\left( \sqrt{(Z_i-M)^T S^{-1}(Z_i-M)} \right) = b_0 \tag{4.3} \]

among all $(M,S) \in R^{2p+1} \times PDS(2p+1)$ (Lopuhaä, 1989). In order to obtain robust estimates and preserve asymptotic normality, the function ρ must satisfy the following conditions:

(R1) ρ is symmetric, has a continuous derivative ψ, and ρ(0) = 0.

(R2) There exists a finite constant $c_0 > 0$ such that ρ is strictly increasing on $[0,c_0]$ and constant on $[c_0,+\infty)$.

(R3) $\psi'(y)$ and $u(y) = \psi(y)/y$ are bounded and continuous.

The constant 0