ABSTRACT 
 
Title of Dissertation:                     FIRST ORDER AUTOREGRESSIVE MIXED 
EFFECTS ZERO INFLATED POISSON MODEL 
FOR LONGITUDINAL DATA ? A BAYESIAN 
APPROACH 
 
 
                                                       Chin-Fang Weng, Doctor of Philosophy, 2014                                                    
 
 
Dissertation Directed by:              Professor Robert J. Mislevy 
                                                      Department of Human Development and Quantitative 
                                                      Methodology 
 
 
 
The First Order Autoregressive (AR(1)) Mixed Effects Zero Inflated Poisson (ZIP) 
Model was developed to analyze longitudinal zero inflated Poisson data through the 
Bayesian Approach.  The model describes the effect of covariates via regression and time 
varying correlations within subject.  Subjects are classified into a ?perfect? state with 
response equal to zero and a Poisson state with response following a Poisson regression 
model.  The probability of belonging to the perfect state or Poisson state is governed by a 
logistic regression model.  Both models include autocorrelated random effects, and there 
is correlation between random effects in the logistic and Poisson regressions.   
Parameter estimation is investigated using simulation studies and analyses (both 
frequentist and Bayesian) of simpler mixed effect models.  In the large sample setting we 
investigate the Fisher information of the model.  The Fisher information matrix is then 
used to determine an adequate sample size for the AR(1) ZIP model.  Simulation studies 
demonstrate the capability of Bayesian methods to estimate the parameters of the AR(1) 
ZIP model for longitudinal zero inflated Poisson data.  However, a tremendous 
computation time and a huge sample size are required by the full AR(1) ZIP model.   
Simpler models were fitted to simulated AR(1) ZIP data to investigate whether 
simplifying the assumed random structure could permit accurate estimates of fixed effect 
parameters.  However, simulations showed that the bias of two nested models, ZIP model 
and mixed effects ZIP model, are too large to be acceptable.  The AR(1) ZIP model was 
fitted to data on numbers of cigarettes smoked, collected in the National Longitudinal 
Study of Youth.  It was found that decisions on whether to smoke and on the number of 
cigarettes to smoke were significantly related to age, sex, race and smoking behavior by 
peers.  The random effect variances, autocorrelation coefficients and correlation between 
logistic and Poisson random effect were all significant.               
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
FIRST ORDER AUTOREGRESSIVE MIXED EFFECTS ZERO INFLATED POISSON 
MODEL ON LONGITUDINAL DATA ? A BAYESIAN APPROACH 
 
 
 
by 
 
 
Chin-Fang Weng 
 
 
 
 
Dissertation submitted to the Faculty of the Graduate School of the 
University of Maryland, College Park, in partial fulfillment 
of the requirements for the degree of 
Doctor of Philosophy 
2014  
 
 
 
 
 
 
 
 
 
 
 
 
 
Advisory committee: 
 
  Dr.     Robert J. Mislevy, Chair 
  Dr.     Eric V. Slud 
  Dr.     George Macready 
  Dr.     Hong Jiao 
  Dr.     Jeffrey R. Harring 
 
 
 
 
 
 
 
 
 
 
 
 
 
?Copyright by 
Chin-Fang Weng 
2014 
 
 
 
 
 
 
 
 
 
 
  
 
DEDICATION 
To Paul 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ii 
 
 ACKNOWLEDGMENTS 
First and foremost, I must thank my advisor, Professor Robert Mislevy, for all his help 
and kindness while I was working on my dissertation.  Dr. Mislevy awakened my interest 
in Bayesian statistics, allowed me to choose my own research path, let me be an 
independent researcher, and stood by me throughout my career as a doctoral student.  I 
am grateful to him beyond what words can express. 
I owe a debt to Professor Eric Slud, who served on my Ph.D. committee and was my 
M.A. advisor in the STAT Program.  His ideas and suggestions greatly improved my 
dissertation, and he was always available for advice. 
Professors Jeffrey Harring, Hong Jiao and George Macready were kind enough to serve 
on my doctoral committee.   I appreciate their careful reading of this dissertation and their 
helpful and valuable suggestions.    
Professor James Yorke of Mathematics was my Advanced Calculus teacher and a 
constant source of encouragement.  He inspired me not only in my Master?s studies in 
STAT but also during my Ph.D. program in EDMS. 
I thank my colleague Mark Bauder for useful discussions that helped me to sharpen my 
thinking and understand my problem more deeply. 
Min Min gave me invaluable support and encouragement.  She is the perfect definition of 
a good friend. 
 
 
 
 
 
 
iii 
 
Table of Contents 
List of Tables?????????????????????????????.v 
List of Figures?????????????????????????????vii   
Chapter I: Introduction?????????????????...?.?????.1 
1.1    Motivation???????????????????...??..??????.1    
1.2    Theme of the Dissertation?????????????..??..??????..1 
1.3    Contributions?????????????????..??????????3 
Chapter II: Literature Review???????????????????..??..4       
2.1    Development of Analyses on Zero Inflated Count Data??????????.....4 
2.2    Poisson Model and Zero Inflated Poisson Models?????????????.6 
2.2.1    Generalized Linear Models: Poisson Model and Logistic Regression 
            Model??????????????.????????????..6 
2.2.2    Zero Inflated Poisson Model???????????...??????.7 
2.3    Mixed Effect Zero Inflated Poisson Models???????????????...8 
2.3.1    Generalized Linear Mixed Models???????????????10   
            2.3.2    Random Intercept Zero Inflated Poisson Model?????????....11 
2.4    AR(1) Generalized Linear Mixed Model???????????????..?12 
2.5    Bayesian Analysis?????????????????????????.14 
            2.5.1   Maximum Likelihood Estimation??????..?????????.14 
            2.5.2   Bayesian Inference..?????????????????????16 
                       2.5.2.1   Prior Distribution??????????????????..19 
                       2.5.2.2   MCMC Estimation??????????????????19 
                       2.5.2.3   Program Language for Bayesian Inference Using Gibbs Sampling 
iv 
 
                                     ?????????????????????????...25      
Chapter III: AR(1) Mixed Effects ZIP Model?..??????????????.32 
3.1    Data Example???????????????????????????32 
3.2    AR(1) Mixed Effects ZIP Model???????????????????..34 
3.2.1    Fixed Effect ZIP Model???????????????????.34 
3.2.2    Mixed Effect ZIP Model for Repeat Measure Data????????. .36 
3.2.3    AR(1) Mixed Effects ZIP Model for Longitudinal Data??.????..37 
Chapter IV: Research Questions and Pilot Study??????????????..41 
4.1    Research Questions????????????????????????..41 
4.2   Pilot Study????????????????????????????.42 
             4.2.1   ZIP, MIXED, and AR(1) Model Features????????????.42 
             4.2.2   Generating Simulated Longitudinal Zero Inflated Poisson Data???...44 
                         4.2.2.1   Data Structure of NLSY97???????.???????44 
                         4.2.2.2   Generating Simulated Data??????????????.49 
            4.2.3   Program Language for Bayesian Inference Using Gibbs Sampling (BUGS) 
                       ?????????????????????????????.52 
            4.2.4   Use of Fisher Information for Determining Sample Size??????...55 
                         4.2.4.1   Ideas and Approach?????????????????.56 
                         4.2.4.2.   Analyses of Fisher Information of GLMM on MLE and MCMC 
                                       ?????????????????????????.60 
                         4.2.4.3   MCMC Analyses of Fisher Information of AR(1) Model and 
                                       Determining the Sample Size?????????????..67 
                         4.2.4.4   The Most Difficult Parameters to Estimate in the AR(1) Model 
v 
 
                                       ?????????????????????????.69 
Chapter V: Data Analyses???????????????????????72 
5.1   Model Comparison?????????????????????????.72 
5.2   NLSY97 Data Analysis???????????????????????..80 
Chapter VI: Conclusion and Future Research????..???????????.88 
6.1   Summary??????????????????????...??????.88 
6.2   Topics for Future Research??????????????????????88 
Reference???????????????????????? ??????..90 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
vi 
 
List of Tables 
1.1   Hierarchy of ZIP Models ??..?..????????????????...??3 
3.1   Description of Smoking Data in the NLSY97 at Time 1?...?????..??.?.34 
 
4.1   ZIP, MIXED, and AR(1) Model Structures????..???????????.42 
4.2   ZIP, MIXED, AR(1) Model Computation Requirement at Each MCMC Iteration 
              ??..?????????????????????????..???..43 
4.3   Number of Cigarettes Smoked Per Day from Year 1997 to Year 2010 ???.?..45 
4.4   Descriptive Statistics of NLSY97 from Year 1997 to Year 2001????.??....46 
4.5   Number of Cigarettes Smoked Per Day in Year 1997???????????..47 
4.6   Correlations of Smoking Status, Non-smoker or Smoker, from 1997 to 2001??.48 
4.7   Correlations of Number of Cigarettes Smoked, from 1997 to 2001??????.48 
4.8   Descriptive Statistics of Simulated Data???????????????...?50 
4.9   Correlations of Zero Indicator, from Year 1 to Year 5??????????.?51 
4.10   Correlations of Poisson Counts, from Year 1 to Year 5????.?..???.?..51 
4.11   Parameter Values of Simulated Data??..??????????.?????52 
4.12   Parameter, Prior, and Initial Values of Simulated Data??????????...54  
4.13   Logistic Regression, ? = 300 , ? = 5???...???????.??????62 
4.14   Logistic Regression, ? = 1200 , ? = 5???...???????.????.?.63 
4.15   Logistic Regression, ? = 10000 , ? = 5???????????.???.?..64 
4.16   Poisson Regression, ? = 300 , ? = 5?????????????.??...?.65 
4.17   Poisson Regression, ? = 1200 , ? = 5????...????????.???..65 
4.18   Poisson Regression, ? = 10000 , ? = 5??????????????..?..66 
4.19   AR(1) Model, ? = 2000 , ? = 5??????????????????....67 
4.20   AR(1) Model, ? = 5000 , ? = 5?????????????????...?.68 
4.21   The Ten Eigenvalues of AR(1) Model??..??????????????..69 
vii 
 
4.22   Parameters and Eigenvectors of AR(1) Model???..??????.????71 
5.1   Estimates of Fixed Effect of ZIP, MIXED, and AR(1) Models???..????.77 
5.2   NLSY97 Data Coding Example?.?????????????????.?.81 
5.3   Numbers of Cigarettes Smoked Per Day from Year 1997 to Year 2001?..??....83 
5.4   Numbers of Cigarettes Smoked Per Day in Year 1997??..???????.?.83 
5.5   Correlations of Number of Cigarettes Smoked, from 1997 to 2001????..?...83 
5.6   NLSY 1997 ~ 2001 Data Analysis Result (AR(1) Model)....?..??????.?85 
5.7   NLSY 1997 ~ 2001 Data Analysis Result (MIXED Model).?..??????.?86 
5.8   NLSY 1997 ~ 2001 Data Analysis Result (ZIP Model)???..??????.?86 
5.9   Comparison of AR(1), MIX, and ZIP Model Estimates of Means?.....??..??89  
 
  
viii 
 
List of Figures 
5.1       95% Credible Interval of Beta_0 Fixed Effects?...???????????..78 
5.2       95% Credible Interval of Beta_1 Fixed Effects?...?????????..??78 
5.3       95% Credible Interval Beta_2 Fixed Effects????...???????..??79 
5.4       95% Credible Interval Gamma_0 Fixed Effects?...??????????.....79 
5.5       95% Credible Interval Gamma_1 Fixed Effects????...????????.80 
  
ix 
 
Chapter I 
Introduction 
1.1   Motivation 
It is commonly seen that a set of count data contains an excess of zeros relative to 
standard distributions for such data, so that data analyses encounter an over-dispersion 
problem.  Such zero inflated data appear in many fields, such as social science, medical 
research, and industrial processes.  Many researchers have studied this problem and 
developed various zero-inflated models in response.  This dissertation concerns 
extensions of the zero-inflated Poisson (ZIP) model.   
   
1.2    Theme of the Dissertation 
The ZIP model originally was introduced by Lambert (1992).  The ZIP model 
treated the zero inflated count data as coming from a mixture distribution: one component 
is a Poisson distribution, where y = 0, 1, 2, ?, and the other component is a zero state, 
which generates zero counts only.  Note that, when a count is zero, it may have come 
from either the Poisson state or from the zero state.  This is why the ordinary Poisson 
model has an overdispersion problem when applied to ZIP data.  Sometimes, the zeros 
can be 60%, 70%, or higher in the data set, which introduces bias into an analysis using a 
standard model for counts, such as the Poisson.   
In the ZIP model, there are two Generalized Linear Models (GLMs): a logistic 
model and a Poisson model.  The logistic model yields a binary outcome and filters the 
count data into two classes, either from the Poisson class or from the zero class.  The 
Poisson model component works as an ordinary Poisson regression; that is, the log of the 
1 
 
Poisson rate parameter can be modeled as a linear function of covariates.  Note that these 
two models work simultaneously.  
Later the ZIP model was extended to the mixed effect ZIP model (Neelon 
O?Malley, & Normand,  2010) by introducing random effects in to the model; that is, a 
random intercept logistic model and a random intercept Poisson model (a General Linear 
Mixed Model, or GLMM).  With the random effects, the mixed effect ZIP model can 
address the correlation within subjects encountered in analyses of repeated measurements 
data, clustered data, or meta-analysis.   
In this study, the mixed effect ZIP model is extended to a first-order regressive or 
AR(1) mixed effect ZIP model.  That is, the ZIP model additionally has an AR(1) random 
intercept logistic model and an AR(1) random intercept Poisson model with respect to the 
time-ordered aspect of observations.  The AR(1) structure of the random effects thus 
serves to model the correlation arising from time waves in longitudinal data.  Moreover, 
the AR(1) random effects in the logistic and Poisson models can be correlated.  The 
AR(1) mixed effect ZIP model is an extremely general model such that previously 
published models are special cases of AR(1) mixed effect ZIP model.  A thorough 
literature review has shown no prior publication of work on an AR(1) mixed effect ZIP 
model. 
Our investigations of this new model have shown estimation within a Bayesian 
framework to be possible but arduous.  The focus of the dissertation research therefore is 
to understand the importance of including all random components, in the sense of 
robustness.  That is, if data have been generated by a process described by the full model, 
what are the qualities of estimates of fixed-effects parameters in the full model (which are 
2 
 
typically the effects of primary interest in applied work) in comparison with successively 
simpler sub-models? 
Table 1.1 presents a hierarchy of models that we will examine.  The Poisson 
Model is the simplest one, and the model structural complexity is increased from left to 
the right, to the AR(1) Mixed Effect ZIP model.  We want to investigate the robustness of 
the AR(1) model in terms of regression coefficient estimations.    
Table 1.1   Hierarchy of ZIP Models 
Components  Poisson Model ZIP Model Mixed Effect 
ZIP Model 
AR(1) Mixed 
Effect ZIP 
Model 
GLM (Poisson)  X X   
GLM (Logistic)  X   
GLMM(Poisson)    X X 
GLMM(Logistic)   X X 
Corr of GLMMs   X X 
AR(1)    X 
 
1.3    Contributions 
 This dissertation contains the following contributions: 
a) We develop a new model ? the AR(1) ZIP model ? which can better fit longitudinal 
count data. 
b) The AR(1) model is implemented by a Bayesian estimation method through BUGS 
software. 
c) The AR(1) ZIP model requires large samples.  We show how to approximate the 
Fisher Information of the model and use it to determine the needed sample size.   
d) For simulated data generated according to the AR(1) ZIP model, simulation studies 
show that simpler models (ZIP with no random effects or ZIP with random intercepts) 
do not yield valid estimates of regression parameters. 
3 
 
Chapter II  
Literature Review 
2.1   Development of Analyses of Zero Inflated Count Data  
Poisson regression models provide a standard framework for the analysis of count 
data.  However, in practice, count data are often overdispersed compared to the Poisson 
distribution with the same mean.  One cause of overdispersion is that the zero counts 
exceed the expected frequency in the Poisson distribution.  Examples of zero-inflated 
data appear in many fields, such as road safety (Miaou, 1994), sexual behavior (Heilbron, 
1994), and reading tests (Jansen, 1986).  The excessive zero counts happen when the data 
contains both structural zeros and sampling zeros (Ridout, Demetrio, & Hide, 1998).  
Suppose that the underlying population is a mixture of two groups, say perfect and 
imperfect.  Members of the perfect class will produce zero errors only.  That is, one 
mechanism generates only zeros, while the other process generates both zero and nonzero 
counts.  When this situation happens, the standard Poisson model will not describe this 
data well and will have an overdispersion problem.  In other words, there are too many 
counts at higher values in relation to the Poisson mean, and thus to the best-fitting 
Poisson.  An example comes from a manufacturing production process with the following 
characteristics: the production process is in a near perfect state so that zero defects are 
observed with high probability.  However when the environment changes to an imperfect 
state, defective units may be produced.  The environmental changes are unobservable and 
random.  In this practical case, the proportion of observed zeros exceeds what a Poisson 
model would predict.     
4 
 
Several classes of discrete mixture models have been proposed for zero-modified 
data. These include hurdle models (Mullahy, 1986; Heilbron, 1994) and zero-inflated 
models (Lambert, 1992).  The hurdle model is a two-stage model for count data. One part 
is a binary model for whether the response outcome is zero or positive. Conditional on a 
positive outcome, the second part uses a truncated count model that modifies an ordinary 
distribution by conditioning on a positive outcome. For instance, this might be a 
truncated Poisson distribution or a truncated negative binomial.  The hurdle model can 
handle zero inflation or deflation.  The zero-inflated model (Lambert, 1992) only handles 
zero inflation. Two types of zeros can occur: one comes from the zero state and the other 
from the ordinary count distribution state. That is, the distribution is a mixture of an 
ordinary count model, such as the Poisson or negative binomial, with one that is 
degenerate at zero.  Min and Agresti (2005) argue that zero-inflated count models are 
more natural than a hurdle model and that one should think of the population as a 
mixture. 
Lambert (1992) proposed the Zero Inflated Poisson (ZIP) model, which contains 
logistic regression and Poisson regression components, and additional applications 
appeared.  Based on the features of data, extended ZIP models were developed.  Ghosh, 
Mukhopadhyay, & Lu  (2006) extend the ZIP models to a broad class of distributions 
(e.g., power series distributions) to fit zero-inflated data with a relatively small to 
moderate sample size.  Some data observations are correlated, such as repeated 
measurements, spatially correlated data, or longitudinal data, so random effects were 
introduced into models.  Hall (2000) incorporated one random intercept effect into the 
Poisson regression only, but not in the logistic regression, to account for correlated 
5 
 
responses in a repeated measures designed experiment.  Neelon, O?Malley, & Normand  
(2010) included correlated random effects in both components of the model.   
Many authors took frequentist approaches to estimate the parameters in ZIP 
models.  In the case of a mixed effect ZIP model the computational techniques are 
complicated and require approximations of high dimensional integrals (Hall, 2000; Min 
& Agresti, 2005).  For applications in meta-analysis (as encountered in the author?s work; 
Weng, 2008), it was not until 2008 that Bates created the logistic random intercept 
computation function, glmer ( ), provided in R software (R, 2008).  Recent versions of 
SAS software can fit random effect GLM models using PROC GLMMIX.  Certain more 
general mixed effect regression models, but neither the ZIP model nor the mixed effect 
ZIP model can be estimated using SAS PROC NLMIXED.   
 
2.2   Poisson Model and Zero Inflated Poisson Model 
2.2.1   Generalized Linear Models: Poisson Model and Logistic Regression Model 
The unity of many statistical methods was demonstrated by Nelder and 
Wedderburn (1972) using the idea of the Generalized Linear Model (GLM).  GLMs 
extend ordinary regression models to encompass nonnormal response distributions and 
modeling functions of the mean (Agresti, 2002).  That is, response variables may have 
distributions other than the normal distribution ? they may even be categorical rather than 
continuous.  Also, the relationship between the response and explanatory variables need 
not be of the linear form.  
Consider the Poisson regression model: in the Poisson distribution, the probability 
function for the discrete random variable Y is ?(?,?) =
?????
?!
.  where ? takes the values 
6 
 
0, 1, 2, ?.  This can be rewritten as ?(?,?) = exp(? log? ? ? ? log(?!)).  In a sample 
(?? ,??), ? = 1, ? ,?, the ?? are independent Poisson random variables with expected 
value ?[??] = ?? = exp(??
??),  where ?? is a covariate vector and ? is an unknown 
parameter vector.  The link function, ?? = ?(??) = ??
??, relates the mean response ?? 
to ??. Consider the logistic regression model: the ?? are independent Bernoulli random 
variables with expected values ?[??] = ?? =
exp (??
??)
1+exp (??
??)
.  There, the logit transformation, 
? = log ?
?
1??
?, serves to link the linear predictor, ?? = ??
??, to the mean response: 
?(??) = ?(??) = log ?
??
1???
? = ??
??.  Note that ?? lies in the interval [0, 1] and ?? lies in 
(??, +?).     
 
2.2.2   Zero Inflated Poisson Model 
One often assumes that a set of count data has a Poisson distribution.  In many 
practical cases, however, the proportion of observed zeros exceeds what a Poisson model 
would predict.  The zero inflated Poisson (ZIP)  model accounts for the excess zeros by 
assuming the data are drawn from a mixture of a Poisson population and a population 
producing zeros with probability one. 
In a ZIP model, there are two generalized linear models (GLM): a Poisson 
regression model and a logistic regression model. The ZIP model accounts for the excess 
zeros by assuming the data are drawn from a mixture of a Poisson population and a 
degenerate distribution at zero. That is, the random variable equals zero with probability 
one; there is no variation.  In ZIP regression, the responses Y are independent and can be 
written as: 
7 
 
                               Pr(? = ?) = ??1 (?) +  (1?  ?)?2(?; ?),                                          (2.1)   
where ?1  (?) =  ?
1 ?? ? = 0,
0 ?? ? ? 0,
 
          ?2(?;  ?) = ?
????/?!, 
          log(?) = ??,  
          logit(?) = log ?
P
1? ?
? =  ??.  
The function ?2(?; ?) is a standard Poisson distribution for modeling counts with mean ? 
and support {0, 1, 2,?}, X and Z are covariate matrices, and ?  is a mixture proportion 
with 0 ? ? ? 1.   Note that the covariate matrix X, that affects the Poisson mean, may or 
may not be the same as the covariate matrix Z, that affects the probability of perfect state 
membership.  
 
2.3.   Mixed Effect Zero Inflated Poisson Model  
Fixed Effect vs. Random Effect   
In a study, variable effects are either fixed or random depending on how the levels 
of the variables that appear in the study are determined.  If one imagines that an 
experiment is repeated and the levels of an effect are identical, that effect would be 
considered as a fixed effect.  If the levels of an effect change in an uncontrollable way 
when the experiment is repeated, the effect would be considered as a random effect.  The 
levels of a random effect are regarded as a sample from some distribution. 
An example of fixed effect occurs in a clinical trial analyzing the effectiveness of 
three drugs. Assume that the three drugs are the only candidates for the clinical trial, that 
the subjects are a representative sample from a population of interest and that the 
conclusions of the clinical trial are restricted to just those three drugs.  Then the effect of 
8 
 
the variable drug is a fixed effect, since the drug factor in the study represents all levels in 
which inferences are to be made.   
Suppose instead that the clinical trial was performed in ten clinics, that the 
subjects in a clinic are drawn from the population served by that clinic and that the ten 
clinics are a sample from a larger population of clinics.  A replication of the trial would 
use a different random sample of clinics.  The factor of interest is still drug, and clinic is a 
nuisance factor. The conclusions of the clinical trials are not restricted only to the patients 
served by the ten clinics but rather to the population of patients served by all clinics. The 
ten clinics are a random sample of the clinic population.  The variable clinic can be 
treated as a random effect variable and inferences are valid for the population of patients 
served by all clinics. 
In real world problems, there may be random subject-to-subject variation not 
explained by covariates.  So a random subject effect is introduced into models.  In a case 
of clinical meta-analysis data, the individual subject parameter is not of research interest 
and there may be many subjects; to eliminate nuisance subject parameters, a random 
subject effect is introduced into models.  In a clustered or repeated measures design, 
observations nested within subjects are correlated with each other while observations on 
different subjects are independent of each other.  So again a random subject effect is 
introduced into such models.  The resulting model is a mixed effect model; that is, the 
model includes both fixed effects for the covariates and the random effect for subjects 
typically with the assumption that the conditional responses are normally distributed.     
 
 
9 
 
2.3.1   Generalized Linear Mixed Model 
A random effect vector u is incorporated into the GLM as follows:  Assume 
???|? ~????|????? ????  are conditionally independent, given u, 
???????? = ???, 
?????? =  ??? = ???
? ? +???
? ??, 
? ~ ??(?), 
where Wij represents a vector of covariates.  The so-called dispersion parameter ? is often 
known to be 1, as in binomial or Poisson models.  The resulting model is a generalized 
linear mixed model (GLMM).   
Note that the random effect in a GLMM can be a random intercept effect, a 
random slope effect, or both.  A growth model, which is often used in agricultural 
research studies, is an example of a GLMM having random intercept and random slope 
effects.  An example of a GLMM having random intercept effect only is the Rasch model 
in item response theory (IRT) (Rasch, 1960).  In the Rasch model, item responses 
(? = 1,2, ? , ?) are nested within subject(? = 1,2, ? ,?).  The probability of a correct 
response to the dichotomous item ? (??? = 1) conditional on the random effect or 
?ability? of subject ? (??) in terms of the logistic cdf is ????? = 1???? = ???? ? ???, 
where ?(?) is the cumulative logistic function, and ?? is the item parameter.  In the Rasch 
model, subjects are modeled as a random effect and items are included as covariates.  
Note that in IRT literature the subject ability usually is denoted as ? instead of ? 
(Hedeker & Gibbons, 2006).  A special case of GLMM is no random effect at all.  That 
is, one drops the u term, so that the model reduces to a GLM. 
10 
 
The following is an example of a random intercept effect GLMM in a repeated 
measurement data setting.  Assume there are ? = 1, ? ,? subjects and ? = 1, ? , ? repeated 
observations nested within each subject.  A random-intercept model augments the linear 
predictor with a single random effect for subject i, 
??? = ???
? ?+ ?? 
where ?? is the random effect (one for each subject).  These random effects represent the 
influence of subject ? on his/her repeated observations that is not captured by the 
covariates.  These are treated as random effects because the sampled subjects are thought 
to represent a population, and their corresponding effect terms are assumed to be 
distributed as ?(0,??
2).  The parameter ??
2 indicates the variance in the population 
distribution, and therefore the degree of heterogeneity of subjects.  The conditional mean 
of ??? , denoted as ???, is specified as ????????,????.  Note that the assumption of normal 
distributions for the random effects is not necessary; one can have some distribution other 
than normal.  However some prefer to use the normal distribution for ease of 
interpretation (Weng, 2008).       
 
2.3.2   Random Intercept Zero Inflated Poisson Model 
In the previous section we presented a basic fixed-effects ZIP model.  Now we 
expand the basic ZIP model by incorporating subject random effects.  For a random 
effect ZIP model, the two generalized linear models are extended to two generalized 
linear mixed models (GLMM): mixed effect Poisson regression model and mixed effect 
logistic regression model:   
                                           ?????????? = log ?
?ij
1? ???
? =  ???? +  ??                                      (2.2)  
11 
 
                                        log????? = ???? + ??                                                                      (2.3) 
where ?  and ? are vectors of fixed effect coefficients and ??  and  ??  denote random 
effects with  
                    ??  ~ ?(0,??
2 ), 
                    ??  ~ ?(0,??
2 ).  
The normal random intercepts (??  and  ??) capture the heterogeneity of subjects.  The 
variances ??
2 and ??
2 measure the differences among subjects.  Note that here the subject 
effects do not change over time.  Neelon, O?Malley, & Mormand (2010) allowed ?? and 
 ?? to be correlated. 
 
2.4    AR(1) Mixed Effects Model 
There are designs in which data are collected in such a way that data points are 
correlated, such as cluster sampling data, matched pair data, repeated measures data, time 
series data, and longitudinal data.  For repeated measures data, observations taken on the 
same subject tend to be more similar than observations taken on different subjects.  The 
assumptions of the repeated measures models used in psychological research are that the 
measurements have equal variances at all time points and that the correlation between 
measurements on the same subject is the same regardless of the time lag between 
measurements.  For time series data, observations taken close in time tend to be more 
similar than measurements taken far apart in time, so correlations decrease as the time lag 
increases.  Longitudinal data typically exhibits the features of time series data.   
In mixed models, the variance-covariance matrix of the observations involves the 
covariance structure of the random effects and the covariance structure of the random 
12 
 
errors.  We need to select an appropriate covariance structure for the random errors.  If 
one chooses a structure that is too simple, one risks increasing the Type I error rate; if too 
complex, one sacrifices power and efficiency. 
The first-order autoregressive covariance structure takes into account a common 
trend in longitudinal data.  The covariance pattern model with first order autoregressive 
[AR(1)] structure is commonly used in time series analysis to describe the correlation 
structure.  The AR(1) process only depends on two parameters.  The covariance for time 
points j and ?? is 
    ???? = = ?
2?|???
?| , 
where ? is the AR(1) parameter and ?2 is the error variance.  The basic idea is that 
random effects are autocorrelated over time and the correlations decrease as observations 
are further apart in time.  The assumption in the AR(1) model is that the longitudinal data 
are equally spaced. This means that the distance between time 1 and 2 is the same as time 
2 and 3, time 3 and 4, and so on.  
In the previous section, a GLMM model was described to handle repeated 
measures data.  For longitudinal data, AR(1) random effects are incorporated into GLMM 
to yield models that allow for subject heterogeneity and some form of time dependence of 
the errors (Sun, Speckman, & Tsutakawa , 2000).  Other correlation structures for time 
series data might have been chosen (e.g., AR(p) with p > 1 or moving average, MA(q), q 
> 0), but AR(1) is simple, is easy to simulate and captures the time series correlation 
structure.  See Chatfield (2009) for more details on time series models. 
 
 
13 
 
 2.5   Bayesian Data Analyses 
2.5.1   Maximum Likelihood Estimation 
For a parametric statistical model ?, a common estimator is the maximum 
likelihood estimator.  Let ?1,? ,?? be i.i.d. with PDF ?(?;?) where ? ? ?, and  ? is 
the parameter space.  The likelihood function is 
??(?) = ?(?1,? ,??;?) =??(??;?)
?
?=1
. 
The log likelihood function is ??(?) = log [??(?)].  The maximum likelihood estimator 
(MLE) ????  is a point in the parameter space such that ???????? = sup??? ??(?).  
The MLE has several nice properties (Bickel & Doksum, 2007, pp. 331-332).  
They are stated here for a scalar parameter, but similar results are true in the 
multiparameter case. 
a) Consistency ? When ? is identifiable, under general conditions, the MLE 
????
?
? ??, where ?? is the true value of the parameter ?. 
b) Asymptotic Normality ? Given n i.i.d. observations, the Fisher information 
(Bickel & Doksum, 2007, p. 180) is defined as ??(?) = ??? ??
?
??
????(?;??
2
? , 
and under regularity conditions this is also equal to ???? ?
?2
??2
????(?;?)?.  
Let ??? = ?????(????).  Under appropriate regularity conditions, ??? =
?1/??(?), and 
?????
???
?
? ?(0,1), as ? ? ?.  Furthermore, let ???? = ?1/??(????).  
Then 
?????
????
?
? ?(0,1).   
14 
 
c) Cramer-Rao Lower Bound ? Let ??? be any unbiased estimator of ?.  Its variance 
is bounded below by the inverse Fisher information as ? ? ?: ????????? ?
1
??(?)
.  
The limiting variance of the maximum likelihood estimator is equal to this lower 
bound under suitable regularity conditions (Bickel & Doksum, 2007, p. 331). 
The Fisher information plays a central role in maximum likelihood estimation.  
Let?s take a closer look at the Information Matrix.  The asymptotic variance-covariance 
matrix of the ML estimator of a vector valued parameter ?, is calculated as the inverse of 
the Fisher information matrix, [?(?)]?1 (Bickel & Doksum, 2007, pp. 331-332, pp. 386-
387; Shao, 2003, pp. 290-292) where 
           ??(?) = ?[{(
?
??
) log ?(?)}{(
?
??
) log ?(?)}?] 
                                                  = ??[?
?2
?????
? log ?(?)].                                      (2.4) 
Thus, the asymptotic variance-covariance matrix of ???? is: 
a-var-cov?????? =  [??(?)]
?1                                     
                                                                 = ??? ?
?2log?(?)
?????
??
?1
                          (2.5) 
From above, we see that the inverse of the information matrix is exactly the same as the 
Cramer-Rao lower bound.     
Asymptotic Efficiency 
Suppose that ?? is a consistent, asymptotically unbiased estimator of ?.  Under 
regularity conditions the asymptotic variance of a suitably normalized version of ?? is 
bounded below by the inverse Fisher information.  A multiparameter version of this 
bound is also available.  (This is an asymptotic version of the Cramer-Rao bound.)  An 
15 
 
estimator whose asymptotic variance achieves this lower bound is said to be 
asymptotically efficient. 
Eigenvalues and Eigenvectors of the Fisher Information Matrix 
Suppose that one has a sample of independent and identical observations 
(?1, ? ,??)  whose distribution depends on a multidimensional parameter ? ? ?
?.   
Let ?1= Fisher information about ? in one observation, and let  
??(?) = ?? = ??1 = ??? ?
?2
?????
????(?)? = Fisher information in N observations.   
Under regularity conditions, asymptotically 
??????? ? ??
?
? ?(?, ?1
?1), 
? ???? ? ?(?,?
?1?1
?1). 
where ??1?1
?1 is the Cramer-Rao lower bound for an unbiased estimator. 
Furthermore, ?1  can be written 
?1 = ???
? 
where ? = diag??1, ?2,? , ???,    ?1 ? ?2 ? ? ? ?? are eigenvalues, the matrix 
? = ??1,?2,? ,???,  and its columns are eigenvectors.  Furthermore, ??
? = ??? = ?, so 
that ??
? ?? = ?
1   if ? = ?
0   if ? ? ?
.  Since ??????
? ??? ?
1
?
?
1
??
?, for all k =1,?  ,p,  the ?hardest? 
estimation problem in terms of variance is to estimate ??
? ?? (Shao, 2003, pp. 290-292). 
 
2.5.2   Bayesian Inference  
The frequentist statistical methods have the following features: 
a) Probability refers to limiting relative frequency. 
b) Data are random. 
16 
 
c) Estimators are functions of data, so they are random. 
d) Parameters are fixed, unknown constants.  They are not subject to probabilistic 
interpretation. 
e) Procedures are subject to probabilistic statements.  For example, a 95% 
confidence interval traps the true parameter value in 95% of all possible samples. 
An alternative is the Bayesian approach: 
a) Probability refers to degree of belief.  Therefore parameters are random and 
described probabilistically. 
b) Inference about a parameter ? is carried out by computing its probability 
distribution given the observed data.  One starts with a prior distribution ?(?) and 
also chooses a likelihood function ?(?|?) ? note this is a function of ?, not ?.  
After observing data ?, one applies Bayes? Theorem to obtain the posterior 
distribution ?(?|?): 
?(?|?) =
?(?)?(?|?)
??(? ?)?(?|? ?)?? ?
? ?(?)?(?|?), 
where ? ?  ??(? ?)?(??? ?)?? ?is known as the normalizing constant.  
c) For large n and under regularity conditions, posterior distributions are 
approximately normal.  Specifically ?(?|??) ? ? ????, ??
?2
??2
 log????????
?1
? 
(Bickel & Doksum, 2007, p. 339) where  ???is the Bayesian estimate.  That is, the 
posterior distribution converges to this limiting normal distribution.  This is a 
consequence of the Bernstein-von Mises theorem stated in the next section. 
 
 
17 
 
Asymptotic Equivalence of Bayes and M.L. Estimators 
A precise statement of the large sample behavior of the posterior distribution is 
given in the Bernstein-von Mises theorem.  
Bernstein ? von Mises (BvM) Theorem: Under suitable regularity conditions, the 
posterior distribution is asymptotically normal with asymptotic mean equal to the 
maximum likelihood estimator (MLE) and asymptotic variance equal to the inverse of the 
total Fisher information matrix, as for the MLE.  See Bickel & Doksum (2007, p. 339). 
This theorem has the following important consequences.  Let ??? be the MLE 
based on n observations, let ??? be the posterior mean based on n observations, and let ?
? 
be the true value of the parameter ?.  Under general conditions that apply in exponential 
families: 
a) ???? ? ??? decreases to 0 in probability at rate 
1
??
 , and ???? ? ?
?? decreases to 0 in 
probability at rate 
1
??
.  Thus, the two estimators converge to the truth at the same 
rate.  
b) ???? ? ???? decreases to 0 in probability at rate 
1
?
.  Therefore, for large n, the 
difference between the posterior mean and the MLE is of smaller order of 
magnitude than the error of estimation due to sampling as n goes to infinity. 
c) The asymptotic variance for the MLE is the inverse of the expected information:  
?(?)?1 = ???? ?
?2
??2
log ??(??|?)??
?1
.  This is the same as the variance of the 
asymptotic posterior distribution.  The variance of the approximate posterior 
distribution is the observed information:  
 ?????? = ?
?2
??2
ln ??(??|?) |?=?? .   
18 
 
 2.5.2.1   Prior Distribution 
A standard maximum likelihood inferential model is often formally identical to a 
Bayesian model in which the prior probability distribution is an appropriate uniform 
distribution function, although often the prior distribution is improper: its integral is 
infinite rather than equal to one.   
Under suitable regularity conditions, Bayes estimates and maximum likelihood 
estimates are asymptotically identical for any proper prior distribution (Chao, 1970) with 
support in a neighborhood of the true parameter values.  Therefore with sufficiently large 
sample size the choice of the prior has a negligible effect on the inference.   
A common frequentist criticism of the Bayesian approach is that ?subjective? 
priors have great impact on the posterior distribution for problems with small sample 
sizes.  There is a developing literature on robust Bayesian inference that seeks to mitigate 
this problem by developing estimators that are relatively insensitive to a wide range of 
prior distributions (Berger, 1984).  The strongest argument for inclusion of priors is that 
there often exists scientific evidence at hand before the statistical model is developed and 
it would be foolish to ignore such previous knowledge.  Furthermore, a formal statement 
of the prior distribution is an overt, nonambiguous assertion within the model 
specification that the reader can accept or dismiss (Gelman et al. 2003, p. 14).  
 
2.5.2.2   MCMC Estimation 
The integral in the denominator of the formula for the posterior distribution  
?(?|?) =  
?(?)?(?|?)
? ?(?)?(?|?)??
 
?
 
19 
 
is often impossible to evaluate analytically even if  the integrand can be expressed in a 
simple form.  In such a case, one can not calculate the posterior distribution explicitly and 
therefore one can not calculate expressions such as the posterior mean in closed form.  
Moreover, in many cases, one can not draw samples directly from the posterior to 
approximate such quantities by simulation.  In such problems, Markov chain Monte Carlo 
(MCMC) enables one to draw samples which are approximately distributed according to 
?(?|?) and thereby one can approximate quantities related to ?(?|?) (Gelman et al., 
2003, pp. 285-287).  
Standard Monte Carlo simulation would generate independent random variables 
?1, ? ,?? distributed exactly according to ?(?|?).  These values would be used to 
approximate the posterior expectation of some function ?(?), ?[?(?)|?],  as an average: 
?[?(?)|?] = ??(?)?(?|?)?? ?
1
?
??(??)
?
?=1
. 
?[?(?)|?] might be a posterior mean or higher order moment of the parameter ?.  
MCMC methods approximate integral quantities by drawing a very long sequence 
of values ?1, ? , ?? from a Markov chain whose limiting probability distribution exists 
and is equal to ?(?|?).  The primary distinction between standard Monte Carlo 
simulation methods and MCMC methods is the dependence structure of consecutive 
simulated values.  Standard Monte Carlo methods produce a set of independent simulated 
values according to some desired probability distribution.  MCMC methods produce 
chains in which each of the simulated values is dependent on the preceding value.  The 
basic principle is that if a properly constructed chain has run long enough, it will produce 
an almost stationary time sequence of approximately identically distributed values whose 
20 
 
marginal distributions will be extremely close to the posterior distribution of interest.    
One can summarize this distribution by letting the chain wander around to generate a 
large approximately representative sample from the limiting distribution and then 
producing summary statistics from the simulated values (Gelman et al., 2003, pp. 285-
287). 
 
The Gibbs Sampler 
The Gibbs sampler is a widely used MCMC technique for producing useful chain 
values.  It requires specific knowledge about the nature of the conditional relationships 
among the variables of interest.  The basic idea is to generate a new realization of each 
variable from its so-called full conditional, which is defined as its conditional 
distribution, given the current values of all the other variables and the observed data.  
Then by cycling repeatedly through these conditional random draws, one parameter at a 
time, we can eventually approach the joint distribution of interest (Gelman et al., 2003, 
pp. 287-289).  
The Gibbs sampler generates a sequence of k dimensional random vectors 
?? = [??1? ,???], ? = 1,2, ? using a transition mechanism based on sampling from the 
conditional distribution?(???|??1? ,??,??1,???1,?+1? ,???1,?, ?).   This is a Markovian 
updating scheme based on sampling from the conditional probability distributions.  The 
limiting distribution of interest is the posterior distribution, p(?|Y). Regularity conditions 
are needed to assure the convergence of the Gibbs sampler to the desired limiting 
distribution (Shao, 2003, pp. 247-248). 
21 
 
There should be an analytically definable full conditional distribution for each 
component of the ? vector and these probability distributions need to be written explicitly 
so that it is possible to draw samples from the described distribution.  However, Gibbs 
samplers can work also when the conditionals themselves are obtained by some 
simulation technique like a rejection method.  This requirement facilitates the iterative 
nature of the Gibbs sampling algorithm, which cycles through these full conditionals 
drawing parameter values based on the most recent version of all the previous parameters 
in the list.  The order does not matter, but it is essential that the most recent draws from 
the other samples be used.  The following is the procedure: 
a)  Choose a starting value ?0
 = [?01, ? ,?0?]. 
b) At time ?, starting at j = 1, complete the single cycle by drawing values from the 
k conditional distributions given by: 
??1~?(?1|???1,2, ? ,???1,?,?)    
??2~?(?2|??1,???1,3, ? ,???1,?,?) 
? 
??,??1~?(???1|??1, ? ,??,??2,???1,?,?) 
???~???????1, ? ,??,??1,??. 
c) Increment t and repeat until a convergence criterion is satisfied (noting that 
convergence is to stationarity rather than to a point, as it would be for iteratively 
calculated randomization-based estimators). 
The Metropolis Algorithm 
Another Markov chain Monte Carlo tool is the Metropolis algorithm (Gelman et 
al., 2003, pp. 289-292; Shao, 2003, pp. 249-250 and references therein).  It works with 
22 
 
the joint distribution rather than the conditional distributions for the parameters in the 
model.  The idea behind the Metropolis algorithm is that, when we cannot easily generate 
values from the joint (posterior) distribution of interest, we can often find a ?similar? 
distribution that is easy to sample from.  We need to make sure that this alternate 
distribution is defined over the same support as the target distribution and that it does not 
favor areas of low density of the target.  Once a candidate point in multivariate space has 
been produced by this candidate generating distribution we will accept or reject it based 
on the target distribution.  The resulting Markov chain wanders around and favors higher 
density regions.  It will also explore other lower density regions, but with lower 
probability as we would want.  Note that unlike the Gibbs sampler, the Metropolis 
algorithm does not necessarily have to move to a new position at each iteration; a 
candidate sample from the proposal distribution is accepted as the next value in the chain 
as a function of its relative density in the proposal distribution and the target distribution.  
It was noted in the preceding discussion of the Gibbs sampler that when it is not 
possible to draw directly from a parameter?s full conditional, a simulation method can be 
employed.  The family of related MCMC programs that are collectively called BUGS 
(Ntzoufras, 2009; Spiegelhalter, et al., 2003) uses Metropolis sampling to approximate 
draws for full conditionals when it is not possible to draw directly from full-conditionals.  
This is called Metropolis-within-Gibbs sampling. 
It is important, for the Gibbs samplers and the Metropolis algorithm, in real 
applications to run the Markov chain for some initial period to let it settle into the 
distribution of interest before recording values. This is called ?burn in.? 
23 
 
Assessing the Convergence of Markov Chains 
The empirical results from a given MCMC analysis are not reliable until the chain 
has come close to its stationary distribution and has sufficiently mixed throughout.  Some 
convergence problems may come directly from badly chosen priors, which present 
difficult algorithmic challenges especially in fully-developed specifications with large 
numbers of parameters, near-collinearity, or parameters for which little information is 
available in the data. One common alternative is to use highly diffuse but proper priors; 
however, this alternative can sometimes lead to slow convergence of the Markov chain. 
There are basically three approaches to determining convergence for Markov 
chains: (1) assessing the theoretical and mathematical properties of particular chains, (2) 
diagnosing summary statistics from in-progress models, and (3) perfect sampling (Gill, 
2008).  In this chapter, only the second approach is discussed.  It is essential to remember 
that the convergence diagnostics described below are actually indicators of ?non-
convergence.?  That is, failing to find evidence of non-convergence with these procedures 
does not imply convergence.  Also the information each diagnostic provides has 
limitations; thus, multiple diagnostics should be used. 
A simple way to monitor convergence is monitor the Monte Carlo error (MC 
error), which measures the variability of each estimate due to the simulation.  The MC 
error must be low in order to calculate the parameter of interest with increased precision 
(Ntzoufras, 2009).  Monitoring autocorrelations is useful and low values indicate fast 
convergence (although parameters that are not well determined from the data can have 
high autocorrelations even when the chains have converged to stationarity).  Another way 
is to monitor the trace plots, the plots of the generated values versus iteration number.  If 
24 
 
all values are within a zone without strong periodicities and trends, then convergence is 
more likely.  Another way is to run multiple chains with different starting points (Brooks 
& Gelman, 1998).  This strategy supports both visual checks and statistical checks.  
When we observe that the different chains mix, the convergence is very likely.  Finally, 
several statistical tests have been developed and used as convergence diagnostics.  
CODA (Best et al., 1996) and BOA (Smith, 2005) software have been developed to 
implement diagnostics to the output of BUGS and WinBUGS software.  We may 
improve the mixing of the chain, reducing the time to reach convergence, by 
implementing transformation of the parameters of the parameters of interest (Gilk & 
Roberts, 1996).    
 
2.5.2.3   Program Language of Bayesian Inference Using Gibbs Sampling (BUGS) 
Modeling for Nonstandard Distributions 
The zero inflated models introduce an extra probability parameter to capture an 
excess of zero values.  The zero inflated version of A random variable Y ~ ZID(?0 ,?), 
(where ZID denotes zero inflated distribution),  has a probability function of the form 
????(?) = ?0?(? = 0) + (1?  ?0)??(?;  ?)                                          
where ??(?;  ?) is the probability function of distribution D with parameters ?.  The 
distribution D could be Poisson, gamma, binomial, negative binomial, generalized 
Poisson, bivariate Poisson, or multivariate Poisson.  From the equation above, we can see 
that ZID is a nonstandard distribution which is not listed among BUGS? prespecified 
distributions.  There is an approach to specify nonstandard prior and likelihood in BUGS: 
the zeros-ones trick (Spiegelhalter et al., 2003; Ntzoufras, 2009).  Thus the programs for 
25 
 
our three models all involve the zero-ones trick.  This trick allows arbitrary sampling 
distributions to be used.  However it has been shown that this method can be very 
inefficient and give a very high MC error. (Spiegelhalter et al., 2003).   
The idea of the zeros-ones trick is to use the Poisson distribution to indirectly 
specify an arbitrary model likelihood (Ntzoufras, 2009).  Assume a model with log-
likelihood ? ??
?
?=1 = ? log ?(??|?)
?
?=1 .  Then the log-likelihood can be re-written as 
?(?|?) =  ???? 
?
?=1
=?
??(???)(???)
0
0!
?
?=1
 =  ???(0;???) 
?
?=1
                       
where ??(?; ?) denotes a Poisson probability function with mean  ?.  Hence, ??(0;???) =
?[?? = 0], where the new random variables ?? , ? = 1, ? , ? follow the Poisson 
distribution with means equal to ??? , and all observed values are set equal to zero.  To 
ensure the positivity of the mean of each new random variable, a positive constant term C 
is added to the mean.  This is equivalent to multiplying each likelihood factor by ???.  
This action does not affect the likelihood since it is equivalent to multiplying the resulting 
(unnormalized) posterior distribution by a constant term, ????.  The likelihood is equal to 
?(?|?) =  
??(???+ ?)(??? +  ?)
0
0!
=  ???(0;??? +  ?) 
?
?=1
                        
The constant C must be selected in such a way that ??? +  ? > 0 for all i = 1, 2, ?, n.  
For example, the normal model can be fitted in WinBUGS using the following code 
(Ntzoufras, 2009, p. 276 ): 
C <- 10000 
for (i in 1:n) { 
      zeros[i] <- 0 
26 
 
      zeros[i] ~ dpois(zeros.mean[i]) 
      zeros.mean[i] <- -l[i] + C 
      l[i] <- -0.5*log(2*3.14) ? 0.5log(s2) ? 0.5*pow( y[i] ? mu[i], 2)/(s2)       }  
Convergence Checking By Using Bugs Sample Monitor Tool 
To check convergence when running several MCMC chains, several built-in 
diagnostic / monitor tools are used while running BUGS.  As noted in the previous 
section on diagnosing convergence, these are diagnostic tools rather than certain evidence 
of convergence.  They are only capable of identifying nonconvergence. 
a) Trace plot: this is a plot of generated values against each iteration number.  Shown 
below are examples with two independent chains. 
a. This is an example where possible non-convergence evidence was found: 
The chains have not mixed well.  (The chains may be in fact moving 
around the same stationary distribution ? i.e., may have converged ? but 
are mixing very slowly.  Looking at a longer interval of the chain would 
be the next step to see if mixing occurs, but at a slower rate than could be 
discerned in this interval of about 200 cycles.)  
iteration
549505490054850
 
b. This is an example where ?evidence of non-convergence? was not found: 
the chains have mixed well. 
27 
 
iteration
549505490054850
 
b) Posterior density: According to the Bernstein ? von Mises theorem, the posterior 
distribution for continuous parameters is asymptotically normal under broadly 
satisfied conditions (although problems can be constructed in which the true 
posterior for a parameter is in fact bimodal).  BUGS produces an approximate 
visual kernel estimate of the posterior density or probability function. 
a. This is an example where possible ?non-convergence evidence? was 
found. 
phi sample: 16000
phi
0.825 0.85 0.875 0.9 0.925
 
b. This is an example where evidence of ?non-convergence evidence? was 
not found. 
gamma[2] sample: 16000
gamma[2]
5.5 5.75 6.0 6.25 6.5
 
c) History:  This is a full trace plot of all stored values.  Values within a zone without 
strong trends or periodicities are suggestive of convergence of the chain.   
28 
 
a. This is an Example where non-convergence evidence was found.  (Again, 
examining a longer run is important, since slow mixing can also produce 
the graph shown immediately below.  Viewed from a further distance in 
this sense the chain shown below might look essentially like the graph 
below it.) 
iteration
46000 47500 50000 52500
 
b. This is an example where ?non-convergence evidence? was not found 
iteration
46000 47500 50000 52500
 
d) The plot of Gelman-Rubin diagnostic:  This is an ANOVA type diagnostic 
(Gelman & Rubin, 1992); multiple chains converging to the same stationary 
distribution should each show similar within-chain variation.  This common within-
chain variance should be equal to the pooled variance.  Values of R close to one 
(shown as the red line in the plots below) suggest convergence.  
a. This is an example where evidence of non-convergence was found. 
29 
 
sigma.u chains 1 : 2
start-iteration
15051 16000 17000 18000
 
b. This is an example where evidence of non-convergence was not found. 
gamma[2] chains 1 : 2
start-iteration
15051 16000 17000 18000
 
Monte Carlo Error and the Posterior Standard Deviation 
The BUGS output produces two error measures: the Monte Carlo (MC) error and 
the posterior standard deviation.  The MC error measures the variability of each 
variable?s average across the Monte Carlo chain.  This variability is due to running the 
chain for a finite number of cycles and can be made as small as desired by increasing the 
number of draws. On the other hand, the posterior standard deviation, the analog of the 
standard error in conventional statistical inference, represents genuine uncertainty due to 
having a finite set of data and cannot be reduced other than by obtaining additional real 
data.  Ideally, we would like to run the program until there is both no evidence of non-
convergence and negligible MC error, and then calculate the posterior means.  In BUGS, 
the batch mean method (Roberts, 1996, p.50) is used to estimate the MC error.  The 
30 
 
formula for the posterior standard deviation can be found in Carlin and Louis (2000, p. 
172) and in Ntzoufras (2009, pp. 39-40).  
Autocorrelation 
This is a plot of all autocorrelations using lags from 1 to 50.  If autocorrelations 
are low, then convergence tends to be obtained in a relatively lower number of iterations.  
Thus autocorrelation indicates how fast the convergence will be.  If autocorrelations are 
large, the sampler will need many iterations to get close to the target distribution.  Once it 
gets there, we will need to draw a very large number of additional iterations in order to 
get accurate estimation of features of the posterior.  The Monte Carlo sampling error will 
be high.   It might appear at first blush that there is a way to reduce the autocorrelation 
among repeated draws by using the ?thin? tool.  For example, when thin = 10, then 
BUGS will generate 10 iterations but will store only the last one in every sequence of 10 
generated values.  The autocorrelation among the saved values will in fact be lower.  
However, the chain lengths needed to reach stationarity or to achieve a satisfactory 
approximation of posteriors are the same.  There is no real advantage in thinning except 
to reduce storage requirements and the cost of handling the simulations when very long 
runs are being carried out and subsequently analyzed.  Examples of autocorrelation plots 
are presented below: 
beta[1]
lag
0 50
 
31 
 
phi
lag
0 50
 
psi
lag
0 50
 
32 
 
Chapter III 
AR(1) Mixed Effects ZIP Model 
3.1    Data Example 
Before discussing the model, we would like to describe a data set which 
motivated the new model development.  The National Longitudinal Survey of Youth 
(NLSY97, 1997) data set consists of a national sample of youths who were 12 to 17 years 
old as of December 31, 1996.  Round 1 of the survey took place in 1997 and subsequent 
surveys were conducted annually.  Cigarette use is our target to measure.  Three 
questions were asked: 
a) Have you ever smoked a cigarette? 
b) During the past 30 days, on how many days did you smoke a cigarette? 
c) When you smoked a cigarette during the past 30 days, how many cigarettes did you 
    usually smoke each day?  
In the NLSY97 dataset, there are approximately 9,000 youths observed at 13 time 
points.  The observation ?number of cigarettes smoked per day in the last 30 days? is 
denoted as ???, for subject ? = 1, ? ,?, at time point ? = 1, ? , ?.  Assume the responses 
are independent distributed across subjects.  That is, Subject 1?s response doesn?t affect 
Subject 2?s response.  Since the individual subject parameters are not of research interest, 
the subject is treated as a random effect.  That is, subjects with same covariates may have 
different responses because of the randomness.  Within each individual, there is a random 
fluctuation at the 13 time points too.  It is very likely that a subject?s smoking status 
given the covariates will not remain a constant over the 13 time points.  So we can say 
that subject is a multidimensional random effect.  The fluctuations of each individual are 
33 
 
independent of the others.  The distributions of the randomness will be described in 
mathematical expressions in the following section.   
There are 13 time points in the data set, and a first-order autoregressive model is 
used to describe the autocorrelations across the time points (note that these 
autocorrelations are unrelated to the autocorrelations of draws in MCMC chains 
discussed earlier).  There are four covariates: sex (two levels), race (four levels), age (six 
levels), and peers (what percentage of your friends are smokers?; five levels), the effects 
associated with which are also treated as fixed effects.  Although there are many 
parameters in the model, the data set is big enough to estimate them.  In the population, 
we expect that there is an increasing tendency to smoke, which is a function of time point 
or age; that is, a positive Poisson regression coefficient on the age variable is expected.   
The survey question (a) ?Have you ever smoked a cigarette?? can be a covariate 
too.  Question (a) presumably provides very good information about the latent zero-class 
indicator that the ZIP model must posit.   
The survey question (c) ?When you smoked a cigarette during the past 30 days, 
how many cigarettes did you usually smoke each day?? yields the response variable, ???.  
The ???, the number of cigarettes a subject smokes, is a non-negative integer.  We will 
model ??? with various forms of Poisson distributions.  The average number of cigarettes 
a subject smokes at time j is a quantity of interest.   
We looked the descriptive statistics of this data set before attempting any serious 
model building or analysis.  Take the responses at Time 1 as an example:  This data set 
shows excessive numbers of zeros so that a ZIP model is proposed instead of Poisson 
model.  From Table 3.1.1 the total number of data points is 5436 (non-smokers) + 3517 
34 
 
(smokers) = 8953.  We can think of the 146 smokers (smoking not observed) as sampling 
zeros; and think of the 5436 (non-smokers) as structural zeros.  If the whole data set is 
treated as a sample from a Poisson distribution without covariates, the mean of sample is 
0.9 [8106 (total number of cigarettes smoked) / 8953 = 0.9], we would expect 8953??0.9 
= 3640 zeros.  However there are 5582 (146 + 5436 = 5582) zeros (note that we did not 
count zeros from missing values) in the data set.  A ZIP model would better fit the data 
by having 5436 zeros as structural zeros or 5436 observations in the perfect class.      
Table 3.1   Description of Smoking Data in the NLSY97 at Time 1 
Smokers: 3517 Non-smokers: 5436 
Observed smokers: 1618 Missing values: 1899 Non-smokers:5436 
Zero cigarette 
smokers: 146 
Non-zero 
cigarette 
smokers: 1472 
 
3.2   AR(1) Mixed Effects ZIP Model  
In this section the proposed model, AR(1) mixed effect ZIP model, is presented in 
three steps.  In Subsection 3.2.1, a basic fixed-effects ZIP model is described with a 
rectangular data structure.  In Subsection 3.2.2, the basic ZIP model is extended to 
incorporate two random subject effects: one in the logistic model and the other in the 
Poisson model.  Also, in this mixed effect ZIP model correlation between the two random 
subject effects is incorporated.  In Subsection 3.2.3, the AR(1) mixed effect ZIP model, 
in which two time-varying random subject effects are incorporated in the ZIP model, is 
described.   
3.2.1   The Fixed Effect ZIP Model 
In a rectangular data set, let the discrete random variable ??? be the j-th observed 
count on the i-th subject, ? = 1, ? ,?, ? = 1, ? , ?.  Assume that ??? is distributed as a 
35 
 
mixture of two components: (1) responses are zero with probability one (perfect state); 
(2) responses follow a Poisson distribution (Poisson state).  Assume that an unobserved 
random variable, ???, indicates the state membership of the subject, either the perfect state 
or the Poisson state.  Note that if ??? > 0, the subject was in the Poisson state, but if ??? = 
0, the subject may have been in either of the two states.  This key feature makes the ZIP 
model different from the Poisson model.  The ??? are assumed to be from a Bernoulli 
distribution with parameter ???, such that ????? = 1? =  ??? and  ????? = 0? =  1 ?  ???.   
If ??? = 1  then ??? = 0, coming from the perfect state  and if ??? = 0 then  ??? = ?, y = 
0, 1, 2, ?, coming from a Poisson distribution.  Therefore, ??? has the zero-inflated 
Poisson (ZIP) distribution: 
                          ??? ~ ?
0,                                  with probability ???                
Poisson?????,          with probability ?1?  ????,
           (3.1) 
where Poisson (??? ) is defined as P(??? = ???) = exp (????)???
??? ???!? . 
Covariates can enter into the model in two places: in the logistic regression model 
and in the Poisson regression model.    
a) The logistic regression model is for predicting the status, perfect state or Poisson 
state. The probability ??? is expressed as  
                      logit????? = log ?
?ij
1 ?  ???
? =  ????? =  ?0 + ?1???
1 +?+ ?????
? .            (3.2) 
b) The log-linear  regression for the Poisson mean is expressed as  
                                           log????? = ????? = ?0 + ?1???
1 +?+ ?????
? .                           (3.3) 
where ??? and ??? are  covariate vectors and ? and ? are vectors of regression coefficients 
for the logistic regression model and Poisson regression model, respectively.  The 
36 
 
components of ??? and ??? could be the same or different from each other.  This model 
was essentially proposed by Lambert (1992).  
 
3.2.2   The Mixed Effect ZIP Model 
In the previous paragraphs a basic ZIP model was presented.  In a repeated 
measures study, observations within a subject are correlated, and one might model this 
situation by adding a subject parameter to a regression model.  Now we extend the ZIP 
model by incorporating a subject random effect (or random intercept effect).  
For the mixed effect ZIP model, two random subject effects are introduced in the 
model so that 
       logit????? = log?
?ij
1?  ???
? =  ????? +  ??                                           (3.4) 
         log????? = ????? +  ??                                                                              (3.5) 
where ?  and ? are vectors of fixed effect coefficients and ??  and  ??  denote random 
random effects with  
                    ??  ~ ?(0,??
2 ), 
                    ??  ~ ?(0,??
2 ).  
The normal random intercepts (??  and  ??) capture the heterogeneity of subjects.  The 
variances ??
2 and ??
2 indicate the differences among subjects.  Note that here the subject 
effects do not change over time.  Time-varying random effects will be described later.        
Furthermore, assume that ??  and  ?? are correlated because the same subject is 
modeled in the two regressions.  The random effects follow a bivariate normal 
distribution with zero means and covariance matrix 
37 
 
          ?
??
??
?  ~ ?(0, ?),         ? =  ?
??
2 ?????
????? ??
2 ?                                        (3.6) 
where ? denotes corr(?? ,  ??).  This is a mixed effect ZIP model for repeated measures 
data without involving time.  Finally, assume that ???|{?? , ??: ? = 1, ? ,?} are 
conditionally independent and conditionally follow a ZIP distribution.  This model is 
presented by Neelon et al. (2010). 
 
3.2.3   AR(1) Mixed Effects ZIP Model 
It is assumed that a subject can move from the perfect state to the Poisson state 
and back again across time points j.  For example, a non-smoker at Time 1 might begin 
smoking at Time 2; and a smoker at Time 1, Time 2, and Time 3 might decide to quit at 
Time 4.  This is saying that any one cell in the person-by-occasions matrix can be a 
structural zero.  This is different from some latent class models, where a person who 
begins as a zero-class member necessarily remains there at all time future points.  To 
capture the variation across time points within a subject, first-order autoregressive 
(AR(1)) processes ??? , ??? are substituted for ??, ?? in equations (3.4) and (3.5).  This is 
our proposed model.   
Time-varying random effects are incorporated into the model as follows.  Let 
???  and  ??? denote the random effects, where i indexes subject and j indexes time.  The 
other covariates remain as fixed effects.  Then write  
                               ??? =  ???,??1 +  ???,                                                                                (3.7) 
where ? = 2, ? , ?, and ???
???
~  ?(0,??
2) , and ???  are independent of ???~?(0,??
2).  The 
parameter ? is a first order autocorrelation coefficient with ?1 < ? < 1.   
38 
 
Furthermore, 
                   ???????? =  ?
2??????,??1?+  ??
2.                                                     (3.8) 
To make ???????? independent of the time index j, assume   
??
2 =  ??
2?2 + ??
2       ?      ??
2 =  ??
2(1?  ?2) 
Then the variance-covariance matrix of ???  is 
          ??? ??? ?
??1
?
???
? = ??
2  
?
?
?
?
?
1 ? ?2 ? ???1
? ? ? ?
?
?2
?
???1 ? 1 ?
?
?
?
?
,                                   (3.9)     
    ??????,??1,???? =  ????,??1????,??1 +  ????? =  ??????
2 ?+  0 = ???
2,            (3.10) 
           ???????? , ??,??1  ? =  
???2
??
2 =  ? = first order autocorrelation.                          (3.11) 
Similarly, an AR(1) model is also assumed for the ??? with variance ??
2 and first order 
autocorrelation ??.  We also assume that ? = ???????? , ???? ? 0.  Let ? be the 
autoregressive coefficient matrix such that ? =  ?
? 0
0 ?? where ? is an autocorrelation 
coefficient for ??? and ? is an autocorrelation coefficient for ???, that is, 
??,? =  ???,??1 +  ????.   
The autoregressive model is thus 
                    ?
???
???
? =  ? ?
??,??1
??,??1
?+  ?
???
????
?                                                             (3.12) 
where the terms on the right hand side are independent.  Then  
                                         ? = ??? ??? ?
???
???
?                                            
                                                                 =  ?
??
2 ???????
??????? ??
2 ? 
39 
 
                                                           = ?V?+ ?
??
2 ?????????
????????? ???
2 ?             (3.13) 
where        ??????? =  ????????? + ????????? 
                                 =   ????????? +  ?????????(1? ?2)(1? ?2)                (3.14)     
              ???? =   (1?  ???)??? ?(1? ?2)(1? ?2)?                                           (3.15)    
            ??
2 =  ??
2(1? ?2)                                                                                            (3.16)   
                  ???
2 = ??
2(1? ??2)                                                                                            (3.17)   
Interpretation of Mixed Effect Logistic Regression 
As in section 3.2.2, the ??? are conditionally independent ZIP variables, given 
{??? , ???: ? = 1, ? ,?; ? = 1, ? , ?}.  Because the AR(1) model is conditioned on the 
random effects, the interpretation of results is not immediate.  The following is an 
example (Agresti, 2002; pp. 496-500) for a single covariate X1. 
In fixed effect logistic regression:  
?[nonsmoker|X1] =
exp[?0 + ?1?1]
1 + exp[?0 + ?1?1]
. 
In mixed effect logistic regression the subject specific probability is: 
?[nonsmoker|U, X1] =
exp[?0 + ?1?1 + ?]
1 + exp[?0 + ?1?1 + ?]
. 
The population average probability is: 
?[nonsmoker|X1] = ?
1
???2?
???
2/(2??
2 )
?
??
exp [?0 + ?1?1 + ?]
1 + exp [?0 + ?1?1 + ?]
?? 
where  ?~?(0,??
2).   
In this random effect model the link function is no longer the logit.  If no random 
40 
 
Effects are present, the odds in favor of success increase by a factor exp(?1) when ?1 in 
creases by one unit. This statement is true for all subjects, regardless of the values of ?1, 
and it is true when averaging over the population.  In the presence of random effects, the 
same statement is true of the conditional odds of success, given the u and ?1. However, 
the average change in odds is no longer exp(?1), but instead it is a nonconstant function 
of ?1.  The link function no longer has the logistic shape and the effect of ?1 is reduced 
(Agresti 2002, p.500).  Note also that the integral above can not be expressed 
analytically.  Analysis of the model will therefore be more computational.  
41 
 
Chapter IV 
Research Questions and Pilot Study 
4.1   Research Questions  
Sample Size For AR(1) Mixed Effect Zip Model 
In general, the more parameters a model has, the more data points are needed.  
Also, some parameters are harder than the other parameters to estimate; that is, some 
parameters need more data points to provide enough information for estimation.  The 
question arises:  How big a sample size is needed to estimate the AR(1) Mixed effect ZIP 
model?   
Importance of Random Effect Parameters  
The AR (1) model contains fixed effect parameters and extra five random effect 
parameters to describe a zero inflated Poisson longitudinal data set.  The five random 
effect parameters greatly increase the complexity of the model structure and hence the 
estimation difficulty (due in large part because the five parameters of the random effects 
and autoregression structure require in MCMC estimation the generation of correlated 
effect values for each sampled individual at each time point).  A practical concern arises: 
when the fixed effects parameters are the main interest, how do simpler submodels of the 
AR(1) model perform, in terms of magnitude of bias and estimation error?  In other 
words, when the full model is misspecified by ignoring some random effects, is 
estimation of fixed effects robust against this misspecification?  
Research Questions  
       The following research questions were studied in the pilot study in connection 
with the AR(1) mixed effect zip model. 
42 
 
a) Can we estimate this model using MCMC methods in a computationally 
efficient way? 
b) How do we determine the sample size to obtain sufficiently accurate estimates? 
c) Can a simpler model be fitted to actual AR(1) zip data with only minor bias and 
large computational savings? 
d) Which parameters in the AR(1) zip model can be estimated most accurately?  
Which are estimated least accurately? 
 
4.2   Pilot Study 
4.2.1   ZIP, MIXED, and AR(1) Model Features 
Full Model and Nested Models 
The ZIP model and MIXED model are nested in the AR(1) model, i.e., the full 
model.  We can interpret these two nested models in terms of the AR(1) model structure.  
Table 4.1   ZIP, MIXED, and AR(1) Model Structures 
Random Effect Parameter ZIP Model Mixed Model AR(1) Model 
? 0 1 -1 < ? < 1 
?? 0 1 -1 < ?? < 1 
?? 0 
[implies ??? ? 0]  
> 0 
[and ??? ? ??] 
> 0 
?? 0 
[implies ??? ? 0] 
      > 0 
[and ??? ? ??]   
      > 0  
??? --- -1 ? ??? ? 1 -1 < ??? < 1 
 
Recall that the AR(1) model states ??? = ???,??1 + ?1 ? ?2???, where the random 
fluctuation term ??? is independent of ??,??1 and satisfies ?????? = 0 and ???????? =
??
2.  If  ? = 1  then ??? = ??,??1 for all j.  If the AR(1) model is the true model, fitting the 
Mixed Model to a data set could cause biased estimates of fixed effect parameters 
43 
 
because of the misspecification of ? and ??.  The ZIP model is only correct for 
independent data, and could cause bias when analyzing either longitudinal data or cluster 
data.      
Computation Requirements 
The complexity of these three models is different, so the requirements for their 
computation are different too.  The numbers of random quantities which must be 
generated at each MCMC iteration for the case of five time points are tabulated below. 
Table 4.2   ZIP, MIXED, AR(1) Model Computation Requirement at Each MCMC 
Iteration 
Data  ???, ? = 1,? ,5.  ? = 1,? ,?.  
Model Fixed effect parameters, random effect 
parameters, random effects in a model. 
Total computation 
requirement per MCMC 
iteration  
ZIP 5 Fixed effect parameters: ?0,?1,?2,?0, ?1. 5 
Mixed 5 Fixed effect parameters: ?0,?1,?2,?0, ?1. 
3 Random effect parameters: ??,??,???. 
2 Random effects per subject: ?? , ?? . 
5 + 3 + 2*N 
AR(1) 5 Fixed effect parameters: ?0,?1,?2,?0, ?1. 
5 Random effect parameters: ??,??,???,?,??. 
10 Random effects per subject: 
??1,? ,??5.      ??1,? , ??5.  
5 + 5 + 10*N 
 
From the table we see the AR(1) model requires much more computational work than 
ZIP and the Mixed model.  Thus if interest lies mainly in the fixed effects, it might be a 
better strategy to fit a simpler model that is a ?correct model? or a ?nearly correct model? 
for the data at the cost of some bias.   
Slowness and High Autocorrelation    
The ZIP, Mixed, and AR(1) models all suffer from the slowness of 
WinBUGS/OpenBUGS, especially the AR(1) model.  It is not surprising for it to take 60 
44 
 
hours to for the AR(1) model to converge with sample size N = 2000 and one predictor 
variable only.  Here are possible reasons: 
a) Model Complexity 
As shown in the previous paragraph, the AR(1) model requires computational 
work in proportion to  ?5 + 5 + 10*N? for each MCMC iteration. 
b) Non-standard distributions 
All three models involve a ZIP distribution, which is a mixture of a Poisson 
distribution and a degenerate distribution at zero.  There is an approach to specify 
nonstandard prior and likelihoods in BUGS: the zeros-ones trick (Spiegelhalter et 
al., 2003d; Ntzoufras, 2009).  Thus our three models programs all employ the 
zero-ones trick.  This trick allows arbitrary sampling distributions to be used.  
However, it has been shown that this method can be very inefficient and give a 
very high MC error (Spiegelhalter et al., 2003). 
 
4.2.2   Generating Simulated Longitudinal Zero Inflated Poisson Data 
There are three subsections presented here.  First, we talk about the structure of 
NLSY97 data; second, the generation of simulated data motivated by the NLSY97 is 
presented; third, we discuss the sensitivity of parameter choices for the data. 
  
4.2.2.1   Data Structure of NLSY97 
In NLSY97, there are 8,953 subjects, who are aged from 12 to 16 in year 1997.  
The total number of subjects decreases every year, for example, from 8,953 (in year 
1997) to 7,851 (in year 2001).  Each AGE group is about 20% of the data.  For the SEX 
45 
 
variable, the males and females are equal proportions of the data.  For the RACE 
variable, there are four levels: black (26%), Hispanic (21%), mixed race (1%), and non 
black/Hispanic (non_b_h) (52%).  The PEER variable (What percent of your friends are 
smokers?) is in ordinal scale.   The levels are: "1" ="LE 10%", "2" ="ABT 25%", "3" 
="ABT 50%", "4" ="ABT 75%", and "5" ="GT 90%?.  Each subject reports whether he 
or she ever smoked.  Those who respond ?yes? are the smoker or Poisson class.  Those 
who respond ?no? are the nonsmoker or perfect class.  This is a longitudinal survey with 
14 annual waves starting in the year 1997.  The number of cigarettes smoked per day is 
the research interest in this dissertation.   
The following descriptive statistics were taken from smoker class (Poisson class) 
members only to present the important features of the NLSY97 data.  The descriptive 
statistics of the number of cigarettes smoked per day from year 1997 to year 2010 are 
presented in Table 4.3.  There is an increasing trend of number of cigarettes smoked per 
day, but it is not linear.    
Table 4.3   Number of Cigarette Smoked Per Day from Year 1997 to Year 2010 
Year Number of Cigarettes Smoked Per 
Day 
Year Number of Cigarettes Smoked Per 
Day 
1997 5.01 2004 9.05 
1998 6.24 2005 9.48 
1999 7.28 2006 9.16 
2000 8.01 2007 9.16 
2001 8.47 2008 9.35 
2002 8.63 2009 9.31 
2003 8.79 2010 9.36 
 
We will look more closely at data from Year 1997 to Year 2001 in Table 4.4.  In 
the NLSY97 data, the survey question ?Have you ever smoked cigarettes before?? 
classified subjects into ?Self-Indicated Smoker? and ?Self-Indicated Non-Smoker? 
46 
 
groups.  We use this survey question as a proxy variable for the latent class-membership 
indicator variable, smoker (Poisson class) or non-smoker (perfect class).  Although they 
are not really Poisson class and perfect class, this breakdown is useful in designing our 
simulation study.  Based on this definition, we see the Poisson Class (the smokers) 
slightly increased from 39% up to 42%.  The perfect class (the nonsmokers, the structural 
zeros) remained at around 60% of the total population. Thus, despite the binning, the 
NLSY97 can plausibly be modeled as zero inflated Poisson data.  The Poisson zeros (the 
sampling zeros) are 4.1% of the Poisson class in 1997 and 3.2% of the Poisson class in 
2001.  The observed numbers of (indicated smoker) Poisson zeros from 1997 to 2001 are: 
146, 144, 103, 111, and 104.  However, the expected numbers of Poisson zeros from 
Year 1997 to Year 2001 are: 23, 6, 2, 1, and 0.7.  These expected numbers of Poisson 
zeros are obtained by using the observed means from Table 4.3 as ? in this equation: 
Expected Poisson Zeros = (size of Poisson class) * (probability of Poisson zero) 
                                       = ? ?
????0
0!
? = ????.   
As we can see, there are big discrepancies between the numbers of observed Poisson 
zeros and expected Poisson zeros.  This is evidence of an over dispersion problem.  
Therefore, we will incorporate this real data feature into our simulated data.     
Table 4.4   Descriptive Statistics of NLSY97 from Year 1997 to Year 2001 
 
             
Year  1997 1998 1999 2000 2001 
Total subjects 8,953 8,358 8,148 8,027 7,851 
Mean of Poisson class Cigarettes  
Smoked Per Day 
5.01 6.24 7.28 8.01 8.47 
Perfect class (Self-Indicated Non-Smoker)  5,436 5,317 5,082 4,825 4,591 
Percentage of perfect class  62% 65% 64% 61% 60% 
Poisson class (Self-Indicated Smoker)  3,517 3,041 3,066 3,202 3,260 
Observed Zeros from Poisson Class 146 144 103 111 104 
Expected  Zeros from Poisson Class 23 6 2 1                   0.7 
47 
 
Furthermore, the distribution of the number of cigarettes smoked per day is not 
quite Poisson either.  In Table 4.5 we see some spikes at cigarette numbers 10, 15, 20, 
and 30.  We suspect that there are two latent classes of subjects: light smokers and heavy 
smokers.  The light smokers answered this question in units of numbers of cigarettes, 
while some heavy smokers answered this question in units of packages of cigarettes.   
Table 4.5   Number of Cigarettes Smoked Per Day in Year 1997  
Numbers of 
Cig. Smoked 
Per Day 
Numbers 
of smokers 
Numbers of 
Cig. Smoked 
Per Day 
Numbers 
of smokers 
Numbers of 
Cig. Smoked 
Per Day 
Numbers 
of smokers 
0 29 9 2 18 4 
1 94 10 50 19 1 
2 57 11 3 20 32 
3 50 12 4 22 0 
4 32 13 3 23 1 
5 36 14 0 24 2 
6 12 15 29 25 3 
7 16 16 5 26 1 
8 12 17 5 30 7 
 
As stated above, the distribution of observed numbers of counts does not seem to 
be Poisson and there is evidence of over dispersion.  Random effect models can deal with 
unmeasured predictors; that is, the random effects part would be treated as a fixed effects 
part if those explanatory variables had been observable.  Random effects also sometimes 
represent random measurement error in the explanatory variables (Agresti, 2002).  Thus, 
the random effect parameters, ?? and ??, are probably important for analyzing NLSY97 
data.      
Next, the correlation matrix of smoking status, non-smoker or smoker, is 
presented in Table 4.6.  These correlations provide a good idea what magnitude the 
correlation should be in logistic regression part when we generate simulated data.  Based 
48 
 
on Table 4.6, the first order autocorrelations are around 0.6.  We found by simulation 
experiments that such values corresponded to an AR(1) parameter ? of approximately 
0.85 to 0.90 in the logistic regression.          
Table 4.6   Correlations of Smoking Status, Non-smoker or Smoker, from 1997 to 2001  
Year  1997 1998 1999 2000 2001 
1997 1.00   .5   .43  .39  .35 
1998  .50 1.00   .61  .52  .47 
1999  .43  .61 1.00   .64  .55 
2000  .39  .52  .64 1.00   .65 
2001  .35  .47  .55 .65 1.00  
   
Next, the correlations of numbers of cigarettes smoked are presented in Table 4.7.  
The observed first order autocorrelation is about 0.5.  We found by simulation 
experiments that such values corresponded to an AR(1) parameter ?? of approximately 
0.85 or 0.90 in the Poisson regression.          
Table 4.7   Correlations of Number of Cigarettes Smoked, from 1997 to 2001  
Year  1997 1998 1999 2000 2001 
1997 1.00  .45 .34 .35 .30 
1998 .45 1.00 .51 .44 .35 
1999 .34 .51 1.00  .55 .40 
2000 .35 .44 .55 1.00  .51 
2001 .3 .35 .40 .51 1.00  
   
The two autocorrelation matrices above showed that the autocorrelations (? and 
??) are important in the structure of the NLSY97 data.  Analyzing the NLSY97 data but 
ignoring the AR(1) correlations and autocorrelation would be a mistake.  The practical 
question, and a key question addressed in this dissertation, is what impact omitting the 
autocorrelations (? and ??) has on inferences about fixed effect parameters.    
 
 
49 
 
4.2.2.2   Generating Simulated Data 
This simulated data were generated based on a random effect AR(1) ZIP model.  
An R program for generating the data was written according the formulas in Chapter III:   
????? = 0? = ??? + ?1? ?????1? exp??????? 
????? = ?:? > 0? = ?1? ????????????????
?/?! 
where 
                       logit ??? = ?0 + ?1?1 + ?2?2 + ??? 
           log ??? = ?0 + ?1?2 + ??? 
           ???~?(0,??
2) 
           ???~?(0,??
2) 
           ??????? , ???? = ??????? 
                        ??,?+1 = ???,? + ??,?+1 
  ??,?+1 = ????,? + ???,?+1 
  ???~?(0, (1? ?
2)??
2) 
  ????~?(0, (1? ??
2)??
2) 
  ???????? , ?
?
??? = ???(1? ???)/?(1? ?2)(1? ??2) 
The subscript ?, ? = 1,? ,?,  indexes subjects.  The subscript ?, ? = 1,? ,? = 5, indexes 
time points, which are equally spaced in this study.  There are two types of response 
variables: one is Bernoulli (in the logistic regression for class membership), and the other 
is counts (in the Poisson regression for members of the Poisson class only).  A binary 
covariate (?1) with equiprobable values 1 and 0 was created to represent SEX.  A 
quadratic function of time was the second covariate (?2).  The quadratic term is meant to 
50 
 
capture the nonlinear trend in the mean of number cigarettes smoked per day.  The two 
components in the simulated data are:      
Logistic regression:  ?0 +  ?1?1 + ?2?2 + ??? . 
Poisson regression:  ?0 + ?1?2 + ???. 
The parameter values are: 
Fixed Effect  True value Random Effect  True value 
?0 .00 ?? 3.50 
?1 .50 ?? .80 
?2 -3.00 ? .85 
?0 .50 ?? .85 
?1 6.00 ??? .10 
 
By manipulating the ten parameter values, the simulated data presents all the 
features of NLSY97 data such as excessive zeros, excessive Poisson zeros, AR(1) 
correlations, and auto correlations.  Only the bumps in the histograms of the numbers of 
cigarettes smoked are not simulated.   
There are two sample sizes:  N = 2000 and N = 5000.  The N = 2000 and N = 5000 
are used in the section dealing with sample size determination.  In the section dealing 
with model comparison (ZIP, MIXED, and AR(1) models), N = 2000 is used. 
Table 4.8   Descriptive Statistics of Simulated Data 
Year  1 2 3 4 5 
Total subjects 5000 5000 5000 5000 5000 
Mean of Poisson class Cigarettes  
Smoked Per Day 
1.845 2.879 4.053 4.989 5.217 
Perfect class (Self-Indicated Non-Smoker) 2502 2414 2324 2278 2270 
Percentage of perfect class 50% 48% 46% 45% 45% 
Poisson class (Self-Indicated Smoker) 2498 2586 2676 2722 2730 
Observed Poisson Zeros 314 179 103 70              65 
 
 
51 
 
Table 4.9   Correlations of Zero Indicator, from Year 1 to Year 5  
Year  1 2 3 4 5 
1 1.00 .47 .39 .32 .27 
2 .47 1.00 .47 .39 .33 
3 .39 .47 1.00 .47 .40 
4 .32 .39 .47 1.00 .49 
5 .27 .33 .40 .49 1.00 
  
Table 4.10   Correlations of Poisson Counts, from Year 1 to Year 5 
Year  1 2 3 4 5 
1 1.00 .65 .56 .49 .41 
2 .65 1.00 .70 .58 .48 
3 .56 .70 1.00 .70 .59 
4 .49 .58 .70 1.00 .73 
5 .41 .48 .59 .73 1.00 
  
Choices of Parameter Values for Generating Simulated Data 
Recall that there is big discrepancy in the NLSY97 data set between the observed 
Poisson zeros and expected Poisson zeros ? there are far more observed Poisson zeros in 
the data set than theoretical expected Poisson zeros.  From the first set of parameter 
values (see Table 4.11), we generated very few Poisson zeros.  This essentially means the 
logistic regression part is not necessary for ZIP model because there is no need to 
distinguish Poisson zeros and perfect zeros.    In other words, with such a large separation 
between the data for the two classes, we have nearly distinct data subsets for the two 
classes.  This is less interesting as a mixture problem.  By adjusting the parameter values 
we were able to generate data with nontrivial zeros for the Poisson class, and hence a 
nontrivial mixture problem. 
 
 
 
 
 
 
52 
 
Table 4.11   Parameter Values of Simulated Data                  
Fixed 
Effect  
1st set 
value 
Final set 
value 
Random 
Effect  
1st set 
value 
Final set 
value 
?0 1.0 0 ?? 3.5 3.5 
?1 0.5 0.5 ?? 0.4 0.8 
?2 - 4.0 -3.0 ? 0.95 0.85 
?0 2.0 0.5 ?? 0.95 0.85 
?1 1.0 6.0 ??,? 0.7 0.1 
 
4.2.3   Program Language for Bayesian Inference Using Gibbs Sampling (BUGS) 
Estimation 
With the Bayesian approach, the distinction between fixed and random effects is 
less stark, as every effect has a probability distribution.  From a Bayesian perspective all 
parameters are seen as random quantities arising from proper probability distributions, 
thus all effects are random.  The key concept to distinguish between random and fixed 
terms is the implicit versus explicit prior distribution.   
Consider the OLS regression model ? = ?? + ?.  From a classical statistical 
point of view, X is considered a fixed variable matrix, and ? is considered as a fixed 
regression parameter vector, that is, a vector of fixed effects.  The only random quantity 
in this model is the vector e, which is the source of the randomness of Y.  However, from 
a Bayesian view, ? is considered as a vector of random variables, because it has an 
explicit prior in the formulation.  From a Bayesian point of view, the classical approach 
implicitly assumes an improper uniform prior distribution on ?.  For more examples, see 
Lynch (2007).  For a Bayesian a variable with hyperpriors may be regarded as a random 
effect; that is, the effects are modeled as exchangeable before data are observed; others 
are treated as fixed effects.    
53 
 
Regarding the values of hyperpriors, there are Empirical Bayes (EB) and Full 
Bayes (FB) approaches.  EB estimates the highest-level hyperpriors, and then treats the 
resulting point estimates as known true values to estimate lower-level parameters.  This 
process controverts Bayesian philosophy by ?using the data twice? (Lindley, 1969, p. 
421).   
If EB approach is used to estimate the hyperparameters, this can lead to 
overoptimistic estimates of precision of the estimated parameters of interest.  For 
example, in an analysis of traffic safety data, Carriquiry and Pawlovich (2004) concluded 
that ?often EB analysts obtain unrealistically low standard errors.?  The Full Bayes (FB) 
approach is used in this pilot study.   
Choices of Priors and Initial Values  
In the simulation study we attempted to use reasonably informative priors for the 
unknown parameters, because we found that the BUGS program is very sensitive to the 
choice of the priors and initial values in the ZIP model, the MIXED model, especially in 
the AR(1) model.  For example in a Normal distribution, replacing ? =
1
????????
= 0.1 
with ? = 0.01 will cause the failure of the BUGS program.  We believe that if the prior is 
too vague, extremely large parameter values may be drawn by the MCMC process and 
that these extreme values lead to numerical instability which cause BUGS to crash.  
Unfortunately, BUGS does not provide useful diagnostics to identify the cause of a crash. 
Since we have sample sizes of 2000 and 5000, we expect the prior effect to wash 
out because of the large sample size.  However, the BUGS program does not always 
behave in this way unless we use highly informative priors.  In simpler models, such as 
the GLMM logistic model, we arbitrarily chose prior values and the estimates were very 
54 
 
good.  For details, readers are referred to Section 4.2.4 dealing with Fisher information 
and sample size.  We present the parameter values, prior choices and initial values in 
Table 4.12.  Initial values of the random effects were selected by bugs from the assumed 
random effect distributions, given the initial values of the parameters listed in Table 4.12. 
Table 4.12   Parameter, Prior, and Initial Values of Simulated Data  
Fixed Effect True 
value 
Prior choices initial 
values(1) 
initial 
values(2) 
?0 .00 ~dnorm(0, 0.1) .00 0.25 
?1 .50 ~dnorm(0.5, 0.1) 0.50 0.25 
?2 -3.00 ~dnorm(-3, 0.1) -3.00 -2.00 
?0 .50 ~dnorm(0.5, 0.1) 0.50 0.20 
?1 6.00 ~dnorm(6, 0.1) 6.00 5.00 
Random Effect      
?? 3.50 Tau.u ~gamma(0.1, 1.2) Tau.u =0.082 Tau.u =0.07 
?? .80 Tau.v ~gamma(0.1, 0.006) Tau.v =1.6 Tau.v =1 
? .85 ~dbeta(2.4, 0.6) .85 .80 
?? .85 ~dbeta(2.4, 0.6) .85 .80 
??,? .10 Rho.z~ dbeta(.3, 2.7)   
 
A Constraint In Bugs Program For AR(1) Mixed Zip Model 
There is a parameter constraint in the AR(1) Mixed ZIP model.  It is not obvious 
to see this constraint when writing the model and corresponding BUGS program, so we 
mention it here:   
??? = ???? ?  ? 
where ???? =   (1?  ???)??? ?(1? ?2)(1? ?2)?  , and 
            ? =
?(1??2)(1??2)
1????
. 
The term c has a maximum of 1 when ? =  ??, and is smaller than 1 otherwise.  It is 
never negative if ? and ?? are in (-1,1).  Since ???? has to be less than 1, then given 
particular values of ? and ??, ??? will be bounded usually by something less than 1.  
55 
 
Thus, if ??? gets too big, the ???? is forced to be bigger than 1, which is mathematically 
invalid and will cause problems for the algorithm.  It is better to put the prior on ???? 
rather than  ???.   
 
4.2.4   Use of Fisher Information for Determining Sample Size 
The AR(1) Mixed effect ZIP model was developed in an attempt to incorporate all 
sources of variability in a longitudinal ZIP data set described in Chapter III; hence the 
model structure is quite complicated.  In general, the more parameters a model has, the 
more data points are needed.  Also, some parameters are harder than others to estimate; 
that is, some parameters need more data points to provide enough information for 
estimation.   
We are concerned about how many observations are needed to ensure the data 
contains enough information to carry out the estimation with desirable standard errors of 
estimates.  We approach this problem by generating empirical approximations of  Fisher 
information for submodels of the AR(1) model in order  to obtain lower bounds for 
sample sizes for the parameters of primary interest, namely the fixed-effect parameters in 
the regression models.  We then extend these results to obtain the full model sample size.   
In section 4.2.4.1, we discuss the ideas of using Fisher Information for 
determining sample size and the approach to this problem.  We start from GLMM: 
random intercept logistic regression and random intercept Poisson regression in Section 
4.2.4.2.  After that, we investigate the AR(1) mixed effect ZIP model in Section 4.2.4.3.  
The most difficult parameters to estimate in the AR(1) Model will be discussed in Section 
4.2.4.4. 
56 
 
4.2.4.1   Ideas and Approach for Determining Sample Size 
Ideas 
When observations are i.i.d., the diagonal elements of the inverse of the Fisher 
information matrix of the parameters in a model, divided by the sample size, are 
asymptotic sampling variances of the maximum likelihood estimators of the respective 
parameters.  Therefore they provide a gauge for determining how a large a sample is 
needed for the distribution of estimates of a given parameter to be concentrated near the 
true value.  A smaller sample would produce estimates with large dispersion and which 
would be indistinguishable from false values.  Thus correct parameter values would be 
difficult to identify.   
Recall that the expected negative second derivative of the log likelihood is the 
Fisher information matrix (Bickel & Doksum, 2007, pp. 179-188) in N observations: 
                    ? ??
?2
??????
????(?)?  = ??  = Fisher information in N observations. 
According to large sample theory, if the observations are i.i.d., then  ?? ? ?  but   
1
?
?? ? ?1, a non-random matrix that depends on the parameters but is not dependent on 
sample size.  In large samples the distribution of the normalized maximum likelihood 
estimator is approximately ?? (??? ? ?) ? ?(?,?1
?1)  where ??? is the MLE of ?.  
Therefore in large samples the MLE has approximate covariance matrix  
?????? ( ???) ?  
1
?
 ?1
?1Note that ?? = ??1 and (??)
?1 = ??
?1 =
1
?
?1
?1.   
In Bayesian estimation, if   ??? is the Bayes estimator of ?, then  
 ??(??? ? ???)
?
? 0      (by the Bernstein ? von Mises theorem).  Therefore, by 
combining these facts we can say that ?????? ( ???) ?  
1
?
 ?1
?1  when N, is large but 
57 
 
finite.  Markov Chain Monte Carlo methods produce an estimator of the covariance 
matrix of ??.  Therefore we can estimate ?1
?1 by ? ???????? ? ?????.  The approximate 
relationship ?1
?1 ? ? ???????? ? ????? improves as N increases.  Furthermore, if ?1 and 
?2are both large enough, then  
?1 ???????? ? ?????  ?  ?2 ???????? ? ????? ? ?1
?1. 
In a practical problem, we don?t know how big N has to be to get an accurate estimate of 
?1
?1.  In a simulation one can generate and analyze samples of two different (large) sizes. 
As a measure of the closeness of two estimated inverse Fisher information matrices ?1,?1
?1   
and ?1,?2
?1 , estimated from sample sizes N1 and N2, one can either look at  
??1??1,?1
?1 ? ?1,?2
?1 ??, 
where ?1(?) is the maximum eigenvalue of a matrix A. 
or look at  
??? |??1,?1(?, ?) ? ??1,?2(?, ?)|
2
??
 
where ?1,?1
?1 (?, ?) is the (?, ?) element of ?1,?1
?1 .  If ?1,?1
?1 ? ?1,?2
?1  we can assume that we  
have estimated  ?1
?1  with reasonable accuracy.  Thus, by monitoring the approximate  
Fisher information for different sample sizes we can determine the ideal sample size for  
the particular model and level of accuracy that we are interested in. 
 
Approach 
It is known that calculating the observed Fisher information can be 
computationally challenging, especially in a complicated model setting. At this time there 
58 
 
is no convenient way to get the observed Fisher Information in general for models as 
complex as the AR(1) Mixed Effect ZIP model.  Essentially we will use available 
empirical methods to approximate Fisher information of relevant submodels of the full 
model.  The GLMM model is simpler than the ZIP model with random effects (Mixed 
ZIP Model).  Examining the GLMM model (either Poisson or logistic) might give some 
insight about the Mixed ZIP Model.  The SAS PROC GLIMMIX software computes 
MLE's for GLMMs.   
First, we use the Hessian matrix (matrix of second derivatives of log likelihood, 
which estimates the negative Fisher information matrix) in the SAS output to 
approximate the Fisher information for the GLMM logistic model and GLMM Poisson 
model, respectively.  Secondly, we will get posterior samples from OpenBUGS software 
using the CODA output.  These posterior samples yield estimates of the Fisher 
Information and its eigenvalues using R software, for the GLMM Logistic model and 
GLMM Poisson model, respectively.  
Although the Fisher information plays a central role in MLE, we don?t know of 
any paper which computes the observed Fisher information in a Bayesian estimation 
setting.  According to theory, under regularity conditions the covariance matrices of the 
MLE and a Bayesian estimate are asymptotically equivalent.  However, we aren't sure 
how fast the Bayes estimates will converge to the MLE's because we don't know how 
strongly the prior information affects the Bayes estimates for finite sample sizes.   
We expect to see discrepancies between the Bayes estimates and MLE when the 
sample sizes are small.  The discrepancies are due to the assumed prior information, 
MCMC error, sampling error, plus the differences between the two methods.  Thus, in the 
59 
 
following step, we compare the Fisher Information estimates between MLE and Bayesian 
estimation.   
We will start from a smaller sample size and move to a larger sample size.  Using 
SAS helps to check on how close the Bayes computations are to the MLE computations 
when N is large.  In this way, we will approximate the theoretical quantity in a numerical 
example.  Also, we can make sure our approach to Fisher Information in Bayesian 
analysis is on the right track.  However, after examining estimates of Fisher information 
in smaller samples, using both SAS (MLE) and BUGS (MCMC), we will only use SAS 
in larger sample sizes because of the BUGS computational burden.  While SAS might 
need only a couple of seconds to obtain the Hessian matrix, BUGS might take several 
hours to reach model convergence with a sample size of 10,000.          
After our study of GLMM, we will investigate the sample size for the AR(1) 
Mixed effect model, which is our main interest.  Here we will use BUGS (MCMC) only 
but not MLE.  The reason is that we can?t compute the MLE in large samples because the 
MLE requires integrals which can not be calculated analytically: 
? =???(??,??)??????  
?
?=1
? ????????
? ?+ ????????? = 0?
?
?=1
+ ?1 ? ????? + ????? ????????????
? ?+ ????? ?
???????
? ?+ ???????
???!
?? 
where 
60 
 
               ?????
? ? + ???? =
???????
? ?+????
1+????
? ?+????
  and 
               ? depends on ??,??,???,?, and ??. 
As in the previous step, we will start from a smaller sample size and move to 
larger sample sizes until the change of estimated Fisher information per observation 
becomes small.  For such a sample size, we conclude that the covariance matrix of the 
Bayes estimates are close to those of the MLE.  Such an N would appear to guarantee that 
the sample Fisher information and the theoretical information are nearly equal.          
 
4.2.4.2.   Analyses of Fisher Information of GLMMs on MLE and MCMC 
Generate Simulated Data  
We create repeated measure data with three covariates (?1,?2,?3) and ? 
responses (?1~??), where ? is the number of time points.  Each subject i shares its unique 
random effect, ??, with each response and forms a cluster.  The response variable for 
logistic regression is a Bernoulli variable, and the response variable for Poisson 
regression is a count variable.  The two models are: 
Logistic regression:  ?????[?(? = 1)] = ?1 + ?2??1 + ?3?2 + ?4?4 + ??. 
Poisson regression:  log[E(Y)] =?1 +  ?2?1 + ?3?2 + ?4?4 + ??. 
The parameter values for these regressions are: ?1 = ?.5,    ?2 = .1,    ?3 = .2,    
 ?4 = ?.3,    ?? = .5,    ?? = .5.    
Estimation and Sample Size of Simulated Data 
Two estimation methods are used.  One is MLE, in which SAS PROC GLIMMIX 
is used.  The Hessian matrix will be obtained from PROC GLIMMIX.  Then the matrix 
will be used to get the Fisher information and eigenvalues through R software.  The other 
61 
 
method is MCMC, in which OpenBUGS is used. The coda output in BUGS contains 
samples that MCMC draws from posterior distribution.  The sample covariance matrix of 
these estimates will be calculated using R software and inverted to approximate the 
Fisher information.  There are 3 sample sizes for both regressions:  
(1) ? = 300 , ? = 5.     (2) ? = 1200 , ? = 5.     (3) ? = 10000 , ? = 5. 
Results from Logistic Regression 
In this subsection three tables are presented with sample sizes 300 by 5, 1200 by 
5, and 10000 by 5, respectively.  Parameter estimates and their associated standard errors, 
Fisher Information matrices, and eigenvalues are provided in each table.  For sample 
sizes 300 and 1200, results from MLE and MCMC method are both presented; for sample 
size 10000, only results from MLE are presented.  
From Table 4.13, we see the MLE and MCMC estimations are in the right 
direction but not very accurate.  There are some discrepancies among the estimations 
from the two methods.  The small sample size (N=300) also makes these two methods 
unstable.  From Table 4.14, we see the MLE and MCMC estimations are getting more 
accurate and the discrepancies between the estimates from the two methods are getting 
smaller.  The bigger sample size (N=1200) makes the estimation better and brings MLE 
and MCMC results closer together.      
Next we compare the Fisher Information matrix per cluster (with fixed t) and 
eigenvalues from   MLE between Tables 4.14 and 4.15.  We consider the estimates from 
Table 4.13 unstable because of the small sample size and exclude them from the 
comparison.  The average Fisher Information matrix seems to tend to a limiting matrix as 
N gets large, as we move from the second to the third Binomial example (Tables 4.14 and 
62 
 
Table 4.15).  The Binomial-case limit is reached around N=1200 since we see there is not 
much change in the average information as N increases to N=10000.      
Table 4.13   Logistic Regression, ? = 300 , ? = 5 
Parameter True value Estimate (MLE) se (MLE) Estimate(MCMC) Se(MCMC) 
Intercept ?1  -.5 -.4494 .1107 -.5328 .1185 
x1 ?2    .1 .0165 .1259 .0945 .1324 
x2 ?3   .2 .2916 .0687 .3587 .0724 
x3 ?4   -.3 -.3269 .0761 -.2957 .0801 
u ??    .5 .4302 .0531 .5422 .0833 
Estimated Fisher information per subject/cluster from MLE 
 ?1 ?2 ?3 ?4 ?? 
??  0.8667 0.3815 0.1392 0.8144 0.8144 
??  0.3815 -0.0124 0.3694 0.0561 
??   0.7480 0.0915 -0.0252 
??    1.3475 0.1588 
??         0.3183 
Eigenvalues from MLE 
? 2.1552 0.7431 0.3218 0.2910 0.1508 
Estimated Fisher information per subject/cluster from MCMC 
 ?1 ?2 ?3 ?4 ?? 
??  0.7343 0.3125 0.1072 0.6931 0.1069 
??  0.3333 -0.0088 0.2959 0.0264 
??   0.6943 0.0548 -0.0734 
??    1.1730 0.1366 
??         0.5189 
Eigenvalues from MCMC 
? 1.8342 0.7243 0.4663 0.2917 0.1374 
 
 
Table 4.14:   Logistic Regression, N = 1200 , t = 5 
Parameter True value Estimate (MLE) S e (MLE) Estimate(MCMC) Se(MCMC) 
Intercept ?1   -.5 -.4352 .0593 -.4800 .0631 
x1 ?2    .1 .0662 .0642 .0805 .0679 
x2 ?3    .2 .1755 .0322 .1884 .0344 
x3 ?4   -.3 -.3156 .0417 -.3156 .0433 
u ?u    .5 .5202 .0293 .6061 .0502 
 
Estimated Fisher information per subject/cluster from MLE 
 ?1 ?2 ?3 ?4 ?? 
?1  .8092 .3846 .0085 .7620 .1107 
?2  .3846 -.0041 .3624 .0509 
63 
 
?3   .8049 .0354 -.0227 
?4    1.2003 .1317 
??     .2603 
 
Eigenvalues from MLE 
? 1.9837 .8063 .2917 .2408 .1367 
 
Estimated Fisher information per subject/cluster from MCMC 
 ?1 ?2 ?3 ?4 ?? 
?1  .7247 .3468 .0211 .7006 .1124 
?2  .3463 -.0099 .3360 .0455 
?3   .7034 .0381 -.0108 
?4    1.1230 .1308 
??     .3485 
 
Eigenvalues from MCMC 
? 1.8315 .7019 .3279 .2633 .1214 
 
 
 
 
 
 
 
 
 
 
Table 4.15   Logistic Regression, ? = 10000 , ? = 5 
Parameter True value Estimate (MLE) Se (MLE) Estimate(MCMC) Se(MCMC) 
Intercept ?1   -.5 -.5435 .0207   
x1 ?2    .1 .1407 .0222   
x2 ?3    .2 .2154 .0112   
x3 ?4   -.3 -.2944 .0139   
u ??    .5 -.5435 .0207   
 
Estimated Fisher information per subject/cluster from MLE 
 ?1 ?2 ?3 ?4 ?? 
64 
 
?1  .8061 .4062 .0421 .7691 .1176 
?2  .4062 -.0106 .3872 .0551 
?3   .7965 .0503 -.0259 
?4    1.2510 .1408 
??     .2628 
 
Eigenvalues from MLE 
? 2.0434 .7955 .3089 .2412 .1335 
 
Results from Poisson Regression 
 We make comparisons for Poisson regression similar to those for Logistic 
regression.  Again, we compare the Fisher information matrix per cluster (with fixed t) 
and eigenvalues from MLE between Tables 4.17 and 4.18 only.  The convergence of the 
estimated average Fisher information matrices seems slower in the Poisson case.  As we 
move from the second to the third Poisson example (Tables 4.17 and 4.18), we still find 
noticeable differences between N=1200 and N=10000.   
 
 
 
 
 
 
Table 4.16   Poisson Regression, ? = 300 , ? = 5 
Parameter True value Estimate (MLE) S e (MLE)   
Intercept ?1   -.5 -.4632 .0786   
x1 ?2    .1 .1086 .0896   
x2 ?3    .2 .0467 .0484   
x3 ?4   -.3 -.3148 .0539   
V ??    .5 .4465 .0236   
 
Estimated Fisher information per subject/cluster from MLE 
 ?1 ?2 ?3 ?4 ?v 
65 
 
?1  1.7019 .7764 .2115 1.4731 .7197 
?2  .7764 .0026 .7058 .3212 
?3   1.4803 .0794 .0559 
?4    2.4284 .6550 
??     1.8087 
 
Eigenvalues from MLE 
? 4.3453 1.4824 1.4227 .6476 .2978 
 
Table 4.17   Poisson Regression, N = 1200 , t = 5 
Parameter True value Estimate (MLE) S e (MLE) Estimate(MCMC) Se(MCMC) 
Intercept ?1   -.5 -.5508 .0441   
x1 ?2    .1 .1353 .0476   
x2 ?3    .2 .2171 .0240   
x3 ?4   -.3 -.3033 .0307   
V ??    .5 .5167 .0137   
 
Estimated Fisher information per subject/cluster from MLE 
 ?1 ?2 ?3 ?4 ?? 
?1  1.4717 .7205 .1323 1.2998 .5818 
?2  .7205 .0488 .6419 .2706 
?3   1.4601 .1612 .0142 
?4    2.0309 .5485 
??     1.332 
 
Eigenvalues from MLE 
? 3.7325 1.4502 1.0407 .5444 .2474 
 
 
 
Table 4.18   Poisson Regression, N = 10000 , t = 5 
Parameter True value Estimate (MLE) S e (MLE)   
Intercept ?1   -.5 -.5023 .0149   
x1 ?2    .1 .1182 .0160   
x2 ?3    .2 .1886 .0080   
x3 ?4   -.3 -.2977 .0099   
V ??    .5 .4994 .0044   
 
Estimated Fisher information per subject/cluster from MLE 
 ?1 ?2 ?3 ?4 ?? 
66 
 
?1  1.5553 .7962 .1662 1.3838 .6209 
?2  .7962 .0653 .7066 .3139 
?3   1.5551 .1551 .0351 
?4    2.2331 .5832 
??     1.5224 
 
Eigenvalues from MLE 
? 4.0452 1.5400 1.2081 .6115 .2572 
 
Conclusions 
a) In both logistic regression and Poisson regression cases, the PROC GLIMMIX 
results look reasonably satisfactory and show that the logistic regression 
parameters can be well estimated once N gets as large as 1000 or 2000. 
b)  We see that Bayesian computations lead to reasonably good approximations of 
      the per observation Fisher Information matrix.  Furthermore, the Bayesian 
computations give good estimates of ML variance.  We expect that when we 
move to the larger samples, the posterior covariance matrix and MLE covariance 
matrix will be reasonably similar. 
 
 
 
 
4.2.4.3   MCMC Analyses of Fisher Information of AR(1) Model and Determining             
the Sample Size   
Simulated samples of sizes of 2000 and 5000 were analyzed under the AR(1) 
model for purpose of comparison.   The AR(1) model with sample size of 5000 was run 
with 50000 iterations.  A sample of 10000 sets of 10 estimates drawn from the 
67 
 
approximate posterior distribution (there are 10 parameters in the model) was obtained 
using BUGS?s coda function, which provides values drawn in the MCMC chains.  These 
values were used to compute an estimated variance-covariance matrix.  The inverse of 
this estimated covariance matrix is used to estimate the Fisher information, its 
eigenvalues, and its eigenvectors.    Results are shown in Tables 4.19 and 4.20.   
Table 4.19   AR(1) Model, ? = 2000 , ? = 5 
Parameter  True value Estimate  Sd MC_error val2.5pc val97.5pc 
?1    0 -.0693 .1814 .0072 -.4252 .2914 
?2     .50 .3985 .1481 .0061 .1059 .6855 
?3     -3.00 -2.5690 .7912 .0320 -4.1960 -1.0170 
?1    .50 .4994 .0560 .0034 .3919 .6113 
?2 6.00 6.0580 .2108 .0089 5.6600 6.4760 
? .85 .8338 .0169 .0014 .7992 .8664 
?? .85 .8504 .0086 5.339E-4 .8326 .8664 
??? .10 .0995 .0650 .0058 6.694E-5 .2529 
??     3.50 3.2140 .2313 .0204 2.7800 3.6800 
?? .80 .8048 .0166 .0012 .7713 .8369 
Estimated Fisher information per subject/cluster from MCMC 
 ?1    ?2    ?3 ?1    ?2    ? ?? ??? ??     ??     
?1    .033 -.01 -.116 .001 -.007 .000 .000 -.001 .000 .000 
?2      .022 -.004 .001 -.002 .000 .000 .001 .003 .000 
?3       .626 -.009 .037 .002 .000 -.001 -.032 .000 
?1       .003 -.01 .000 .000 .002 .002 .000 
?2     .044 .000 .000 -.001 -.002 .000 
?      .000 .000 .000 -.002 .000 
??       .000 .000 .000 .000 
???        .004 .004 .000 
??             .054 .000 
??          .000 
Eigenvalues from MCMC 
? 8.813 2.926 2.073 1.418 .124 .109 .018 .011 .009 .000 
Table 4.20   AR(1) Model, ? = 5000 , ? = 5 
Parameter  True value Estimate  Sd MC_error val2.5pc val97.5pc 
?1    0 -.0377 .1227 .0065 -.2856 .1963 
?2     .50 .5484 .1015 .0053 .3547 .7515 
?3     -3.00 -2.7250 .5148 .0265 -3.7500 -1.7150 
?1    .50 .4864 .0347 .0023 .4146 .5516 
?2 6.00 6.0970 .1282 .0061 5.8470 6.3590 
? .85 .8560 .0101 9.519E-4 .8352 .8746 
?? .85 .8545 .0054 4.395E-4 .8435 .8643 
68 
 
??? .10 .1204 .0424 .0040 .0217 .1942 
??     3.50 3.5110 .1512 .0141 3.2250 3.7820 
?? .80 .8094 .0097 8.988E-4 .7920 .8276 
 
The Fisher information per subject/cluster from MCMC 
 ?1    ?2    ?3 ?1    ?2    ? ?? ??? ??     ??     
?1    .072 .037 .014 -.055 -.012 .028 -.012 .023 .007 .046 
?2      .040 .007 -.029 -.007 .020 .002 .018 .001 .007 
?3       .004 -.008 -.002 .002 .000 .002 .002 .005 
?1       1.318 .263 .016 -.184 -.607 -.008 .349 
?2     .065 -.005 -.032 -.120 -.002 .059 
?      3.498 .022 -.051 .157 -.050 
??       7.659 .160 -.015 -1.606 
???        .400 -.005 -.256 
??             .018 -.007 
??          2.658 
 
Eigenvalues from MCMC 
? 8.150 3.508 2.357 1.487 .097 .095 .017 .012 .008 .001 
 
Discussion of Results 
a) Although ? = 2000 is sufficient for GLMM, it is obviously insufficient for the 
AR(1) model.  For example, some diagonal elements in the Fisher information 
matrix are smaller than 10?5 (coded as 0), which means almost no information 
for that particular parameter.  We see that the Fisher Information is much 
improved when ? = 5000.   
b) For this particular data, the Fisher information indicates some parameters are 
hard to estimate, in terms of variance.  These parameters are  ?3, ??, ?2, ?2, 
?1(see elements in the Fisher information matrix, N = 5000).    
 
4.2.4.4   The Most Difficult Parameters to Estimate in the AR(1) Model 
69 
 
In this section we use information from eigenvalues and eigenvectors of matrices 
related to the Fisher information matrix to obtain insight into the AR(1) model.  In 
principal component analysis, the eigenvector with the largest eigenvalue is the direction 
along which the data set has the maximum variance.  In our problem we use the 
eigenvalues and eigenvectors of the estimated Fisher information matrix in a similar way.  
The largest eigenvalue corresponds to a linear function of the parameters with maximum 
information, or equivalently the function estimated with the smallest asymptotic variance.  
Similarly, the smallest eigenvalue corresponds to a linear function estimated with the 
largest asymptotic variance.   
In Table 4.21 the ten eigenvalues of the estimated information matrix (based on 
N=5000) are presented.  Based on Table 4.21 we would say there are four large 
eigenvalues.  The other six eigenvalues are quite small, especially the last two 
eigenvalues. 
Table 4.21   The Ten Eigenvalues of AR(1) Model 
8.150428495 3.507783275 2.356704440 1.487195778 0.097264029 
0.094725558 0.016639069 0.012097777 0.008334560 0.000724701 
  
The eigenvectors are presented in Table 4.22.  This is a 10 by 10 matrix.  The 
rows correspond to the ten parameters and the columns present the ten eigenvectors 
corresponding to the eigenvalues from the largest to the smallest.  Here the smaller 
eigenvalues and eigenvectors are more important to investigate.  The elements/loadings 
in each eigenvector are normalized such that sum of squares of the loadings in each 
eigenvector is equal to one.  The regression coefficients and predictors in the two 
regression models are: 
70 
 
Logistic regression: ?1 + ?2 ? ??? + ?3 ? ?
????
5
? (1?
????
5
) + ???  
Poisson regression:  ?1 + ?2 ? ?
????
5
? (1?
????
5
) + ???  
First, we highlight in yellow the large loadings in the first four eigenvectors.  We 
interpret these eigenvectors as the linear functions of parameters that are easiest to 
estimate.  These involve the parameters ?1, ?2, ?, ??, ???, ??, ??.  The second 
eigenvector essentially identifies the parameter ?.  Note that none of the three logistic 
regression coefficients have high loadings in the first four eigenvectors.   
Next, we highlight in blue the large loadings in the smaller six eigenvectors.  We 
conclude that linear functions of the three logistic regression coefficients were the hardest 
parameters to estimate, especially ?3.  Note that the tenth eigenvector, which has an 
eigenvalue near zero, essentially identifies ?3. In general, parameters related to logistic 
regression are harder to estimate.  We also notice that regression coefficients involving 
the quadratic predictor, ?3  and ?2, are harder to estimate.  These results are consistent 
with the discussion in Section 4.2.4.3.   
Let?s take ?3 as an example.  In simulation, AR(1) model with N = 5000, the 
estimated s.e. was 0.5148, and  ?10 was 0.000724701.  If we want the estimated s.e. to be 
less than 0.5, the N has to be 5524: 
????10
? ??? =
1
???10
< 0.5  
? ?
1
(0.5)2(.000724)
 = 5524. 
 
Table 4.22   Parameters and Eigenvectors of AR(1) Model  
? 8.150428495 3.5077832750 2.3567044400 1.487195778 0.097264029 
?1 0.002627000     0.0078315629   0.0042732218   0.053230019    0.066761348 
71 
 
?2 0.000215518  0.0055340834   0.0034163521 0.025205583   0.027109138 
?3 0.000087200   0.0004505660   0.0002479095   0.006935629    0.020035937 
?1 0.043540820   0.0088705441   0.3796528605    0.797113660   0.412493910 
?2 0.007753001 0.0004909037    -0.0726045401    0.163163534    -0.099486900 
? -0.006987224   -0.9983729333 -0.0179533414 -0.024233796 -0.016183366 
?? -0.957105300    0.0135031324 -0.2784509125 -0.078807726 0.002222653 
???   -0.032675910 0.0167047758 0.2116909238 -0.372488909 -0.898067529 
?? 0.001366294 -0.0449497463 0.0038878937 -0.003488855 0.081163939 
?? 0.284373300 0.0244649188   0.8531570177  -0.434640681 -0.031272926 
      
? 0.0947255580 0.0166390690 0.0120977770 0.008334560 0.0007247010 
?1 0.8158251911 0.4707308382 -0.0025058171 0.268145184 -0.1831846257 
?2 0.5466244745 -0.7718167730 0.0682756898 -0.315034824 -0.0133967069 
?3 0.1643590362 0.0975165962 -0.0570951533 -0.008923131 0.9796363244 
?1 0.0688304168   0.0100104426 -0.2075381777 0.018100629 -0.0104995302 
?2 0.0004384083 0.1549497234 0.9486398717 -0.180315659 0.0413194493 
? -0.0127023489 -0.0153659508 0.0113242856 0.038553616 0.0043682125 
?? -0.0029583784 0.0004382848 -0.0002940427 -0.002751346 -0.0001719452 
???   0.0218496526   0.0733196724 -0.0333442856 -0.039944026   0.0025231199 
?? 0.0506938926   0.3787757157 -0.2187146712 -0.890368592 -0.0687715847 
?? -0.0257264521 -0.0030187629 0.0025136791 -0.009193641   0.0020120047 
 
  
72 
 
Chapter V 
Data Analyses 
In Chapter V We present the results of two data analyses.  In Section 5.1 we 
compare three models using simulated data generated from the ZIP AR(1) MODEL.  In 
Section 5.2 we use the most promising model chosen from Section 5.1 to analyze the real 
data ? NLSY97.      
 
5.1   Model Comparison 
We created simulated data from the AR(1) MODEL (see details in Section 4.2.4) 
with two sample sizes,  ? =  2000 and ? =  5000, and fitted the ZIP, MIXED, and 
AR(1) models.  Recall that the ZIP model has no random effects and all observations are 
assumed to be independent.  The MIXED model has three random effect parameters, 
which describe within cluster random effects.  Subjects are independent but observations 
within subject are dependent.  The AR(1) model has five random parameters, which 
model longitudinal data with autocorrelated random effects.  The subjects are still 
independent. 
Since the data is simulated from an AR(1) model, we know all the true parameter 
values.  Thus, the DIC is not necessarily needed for the model comparison.  We expected 
that the AR(1) model would perform the best.  However, we would like to see if the two 
simpler models, ZIP and MIXED, can produce acceptable bias level on the fixed effects.  
In other words, we want to investigate how important the random effect parameters are 
for the estimation of the fixed effects.   
 
73 
 
 Results and Discussion 
The results are presented in Table 5.1.  For sample size N = 2000, the results are 
summarized as follows:  in terms of bias, the AR(1) model is the best and the ZIP model 
is the worst one; in terms of variance, the ZIP model is the best and the AR(1) is the 
worst one; in terms of MSE, the MIXED model is the best.  We also present the 95% 
credible intervals (CI95) for these models (Figure 5.1.1--Figure 5.1.5).  For the ZIP 
model, four parameters were out of the CI95, and one parameter is on the boundary.  For 
MIXED model, three parameters were in the CI95, one parameter is on the boundary, and 
one parameter is outside the boundary.  For the AR(1) model, all five parameters were 
inside of the CI95.   
Since the AR(1) model is the true model, consequently its bias is smaller than the 
other two models.  Note that, although AR(1) is the true model, there is some bias in its 
parameter estimates because of the priors.  We expect these biases will decrease as the 
sample size increases.  These three models have the same amount of information (data) 
so that the ZIP model performs the best in variance estimation because there are fewer 
parameters to estimate.   
In the following, we present the posterior density of AR(1) model parameters 
(simulated data sample size N = 5000).  Evidence of nonconvergence was not found for 
fixed effect parameters; however, the random effect parameters need more time to reach 
their stationary distribution.   
 
 
74 
 
 a) Fixed effect parameter: Beta (1) 
beta[1] sample: 9000
beta[1]
-0.75 -0.5 -0.25 0.0 0.25 0.5
 
b) Fixed effect parameter: Beta (2) 
beta[2] sample: 9000
beta[2]
0.0 0.25 0.5 0.75 1.0
 
c) Fixed effect parameter: Beta (3) 
beta[3] sample: 9000
beta[3]
-5.0 -4.0 -3.0 -2.0 -1.0 0.0
 
d) Fixed effect parameter: Gamma (1) 
gamma[1] sample: 9000
gamma[1]
0.3 0.4 0.5 0.6 0.7
 
e) Fixed effect parameter: Gamma (2) 
gamma[2] sample: 9000
gamma[2]
5.5 5.75 6.0 6.25 6.5
 
75 
 
f) Random effect parameter: Phi 
phi sample: 9000
phi
0.82 0.84 0.86 0.88 0.9
 
g) Random effect parameter: Psi 
psi sample: 9000
psi
0.83 0.84 0.85 0.86 0.87
 
h) Random effect parameter: Rho.uv 
rho.uv sample: 9000
rho.uv
-0.1 0.0 0.1 0.2 0.3
 
i) Random effect parameter: Sigma U 
  
sigma.u sample: 9000
sigma.u
3.0 3.25 3.5 3.75 4.0
 
j) Random effect parameter: Sigma V 
sigma.v sample: 9000
sigma.v
0.76 0.78 0.8 0.82 0.84
 
76 
 
For sample size N = 5000, the results are summarized as follows:  in terms of 
bias, the AR(1) model is better; in terms of variance, the MIXED model is the better; in 
terms of MSE, the AR(1) model is better.  We also present the 95% credible intervals 
(CI95) for these two models (Figure 5.1.1--Figure 5.1.5).  For the MIXED model, two 
parameters were in the CI95, two parameters are on the boundary, and one parameter is 
outside the boundary.  For AR(1) model, all five parameters were inside of the CI95.   
When N increases from 2000 to 5000, we see that the posterior standard deviation 
decreases by one third.  Recall the large sample theory for MLE, which states that the 
variance and sample size are related by: 
????2???? =
?1
?2
????1????. 
Thus, we expected the posterior standard deviation of sample size 5000 to be (2/5)1/2 = 
0.63 times that of the standard deviation when N=2000.  Our results appear consistent 
with the large sample theory. 
Is it possible that AR(1) model is the winner of MSE for all the five fixed effect 
parameters?  We think the answer is YES when the sample size N is large.  However we 
are unable to compute the sample size analytically.  Recall that, 
MSE =(????)2 + ???????? = [?????? ? ?]
2 + ???(???) 
Under regularity conditions and when the model is correctly specified, 
?????? = ? +
?1
?
+ ?(1/?) 
???????? =
?1
?
+ ?(1/?)  
where ?1 and ?1 are some functions of the parameters 
If the model is misspecified, 
77 
 
E????
???? = ? + ?(?) +
?2
?
+ ?(1/?) 
???????
???? = ???? + ?(1/?) 
where ?(?) is effect of misspecification: ?(?) ? 0 
We don?t know whether ???? <  ?1,  
Then                ???????? ? (
?1
?
)2 +
?1
?
  
                        ???????
???? ? (?(?) +
?2
?
)2 +
????
?
 
As ? ? ?,    ???????
???? ? ???????? ? ?(?)
2 +
2?2?(?)
?
+
????
?
?
?1
?
?
?1
?
? ?(?)2 
 
Table 5.1   Estimates of Fixed Effect of ZIP, MIXED, and AR(1) Models 
    
Node  
True  
value 
ZIP (N=2000) MIXED (N=2000) AR(1) (N=2000)  
    
mean sd mse mean sd mse mean sd mse mse 
winner 
?0 .00 .32 .07 .11  .07 .11 .02  -.07 .18 .04  MIX 
?1 .50 .18 .04 .11  .27 .10 .06  .40 .15 .03  AR(1) 
?2 -3.00 -2.33 .34 .57  -2.08 .46 1.06  -2.57 .79 .81  ZIP 
?0 .50 
.92 .02 .18  .54 .03 . 00 .50 .06 . 00 
MIX& 
AR(1) 
?1 6.00 5.53 .10 .24  6.02 .11 .01  6.17 .20 .05  MIX 
Node True  
value 
 MIXED (N=5000) AR(1) (N=5000)  
   
mean sd mse mean sd mse mse 
winner 
?0 .00    0.089 0.08 0.014 -0.038 0.123 0.016 MIX 
?1 .50    0.356 0.071 0.026 0.548 0.102 0.013 AR(1) 
?2 -3.00    -2.132 0.313 0.852 -2.725 0.515 0.341 AR(1) 
?0 .50    0.543 0.02 0.002 0.486 0.035 0.001 AR(1) 
?1 6.00    5.953 0.072 0.007 6.097 0.128 0.026 MIX 
 
78 
 
 
Figure 5.1       95% Credible Interval of Beta_0 Fixed Effects  
 
 
 
Figure 5.2       95% Credible Interval of Beta_1 Fixed Effects 
 
79 
 
  
Figure 5.3       95% Credible Interval Beta_2 Fixed Effects  
 
 
Figure 5.4       95% Credible Interval Gamma_0 Fixed Effects  
 
80 
 
 
Figure 5.5       95% Credible Interval Gamma_1 Fixed Effects  
 
5.2   NLSY97 Data Analysis 
Imputation of Missing Values 
We choose the NLSY97 data to illustrate the models.  Note that the NLSY97 
survey personnel do not, in general, impute missing values or perform internal 
consistency checks across waves.  The following missing value conventions are used 
throughout the data: Noninterview, Valid Skip (respondent was not asked a question 
because it did not apply to him or her), Invalid Skip, Don?t Know, and Refusal.  Item 
nonresponse due to refusals, don?t knows, or invalid skips is usually quite small, so the 
degree to which the weights are incorrect is probably quite small (NLSY, 1997).  The 
weights are not considered here.  Even though the sample size is almost 9000 subjects, 
we have a serious missing value problem.  Only 600 self-reported smoking subjects 
81 
 
report cigarette consumption at all five observation times.  Since the ZIP models require 
large sample size, an imputation is needed.   
The following table is the NLSY97 coding for variables NUMBER OF 
CIGARETTES SMOKED 1997 and HAVE YOU EVER SMOKED 1997.  The entries 
are frequencies of responses to each combination of values. Blanks represent zeros. 
Table 5.2   NLSY97 Data Coding Example 
 Have you ever smoked in 1997 
D I R N V 0 (No) 1(Yes) 
Number of cigarettes  
smoked in 1997 
Don?t Know (D)       6 
Invalid skip (I)        
Refuse (R)        4 
Non interview (N)        
Valid skip (V) 5 2 24   5436 1899 
Cigarettes smoked (Y)      0 1608 
Valid total       8943 
 
Only the ?Valid Skip? in ?Number of Cigarettes Smoked? cell requires 
imputation. The other four types of missing values are deleted from the analysis.  Records 
with missing values in the covariates, e.g., ?Have You Ever Smoked,? are also deleted 
from the analysis.   
The 5436 subjects who responded ?NO? to ?Have you ever smoked? are non-
smokers and their numbers of cigarettes smoked are simply imputed as Y = 0.  The 1899 
subjects who responded ?YES? are smokers and we impute their numbers of cigarettes 
smoked as integers Y with Poisson distributions, given the covariates.  Note that the Y 
values of these 3057 subjects (1899 + 1608) do not follow a standard Poisson distribution 
but an overdispersed Poisson.  Since we don?t know how to impute overdispersed 
Poisson data, the use of the Poisson distribution for Y will yield smaller variances 
compared to the original unobserved data.     
 
82 
 
Because the missing pattern is ?Swiss cheese? (e.g., a subject may respond in year 1, 
refuse  to respond in year 2, and then respond in year 3) and subjects do not always 
remain in the same class from year to year, we could not impute all five years at one time.  
Instead, we did the imputation year by year.  As a result, our imputation yields smaller 
time correlations compared to original unobserved data.   
For 1997 we used only the 1899 + 1608 = 3507 subjects who responded ?YES? to 
?Have you ever smoked? to impute the missing values, using Poisson regression in 
WinBUGS.  The covariates are: AGE (ordinal scale), SEX, PEER (ordinal scale), and 
RACE (nominal scale).  If there were missing covariates, we deleted that case.  The 
priors for the regression coefficients were normal with mean and variance estimated from 
observed data and initial values of the regression coefficients were drawn from their 
priors.  No random effects were used in the imputation.  BUGS imputed the missing 
counts ?? as random draws from their predictive distribution.  This analysis treats the 
missing data mechanism as ignorable, given the covariates (Gelman et al. 2003, pp. 517-
518).  For each of the later years we used exactly the same methods as described above.  
The imputations for each of the five years were performed independently. 
From the descriptive statistics in Tables 5.3 ~ 5.5, the imputation has some degree 
of bias: We observe slightly smaller means, smaller standard deviations, and lower 
correlations.  If we could provide information in the longitudinal zero inflated Poisson 
data setting--that is, AR(1) model imputation--the imputation might do a better job.  
However, this is beyond the scope of this research.     
 
 
 
 
83 
 
Table 5.3   Numbers of Cigarettes Smoked Per Day from Year 1997 to Year 2001 
Before imputation  After imputation  
Variable  N Mean  SD  Variable N Mean  SD 
cig97 1608 5.0 6.0  cig_97 3485 4.6 4.5 
cig98 2153 6.2 7.6  cig_98 2993 6.1 6.6 
cig99 2418 7.3 8.3  cig_99 3026 7.2 7.6 
cig00 2603 8.0 8.5  cig_00 3146 8.0 7.9 
cig01 2749 8.5 8.8  cig_01 3226 8.4 8.3 
 
Table 5.4   Correlations of Number of Cigarettes Smoked, from 1997 to 2001  
Before imputation After imputation 
Year  1997 1998 1999 2000 2001 Year  1997 1998 1999 2000 2001 
1997 1.00  .45 .34 .35 .30 1997 1.00 .38 .30 .27 .26 
1998  1.00 .51 .44 .35 1998  1.00 .47 .42 .31 
1999   1.00  .55 .40 1999   1.00 .50 .38 
2000    1.00  .51 2000    1.00 .47 
2001     1.00  2001     1.00 
 
Table 5.5   Numbers of Cigarettes Smoked Per Day in Year 1997 
Before imputation After imputation  Before imputation After imputation  
cig97 Frequency cig_97 Frequency cig97 Frequency cig_97 Frequency 
0 146 0 232 16 8 16 8 
1 450 1 609 17 8 17 9 
2 232 2 531 18 7 18 7 
3 139 3 467 19 1 19 1 
4 94 4 370 20 68 20 67 
5 95 5 301 21 0 21 0 
6 48 6 233 22 1 22 1 
7 30 7 146 23 1 23 1 
8 32 8 127 24 5 24 5 
9 8 9 71 25 7 25 7 
10 124 10 152 26 1 26 1 
11 5 11 18 27 0 27 0 
12 16 12 26 28 0 28 0 
13 5 13 14 29 0 29 0 
14 2 14 5 30 14 30 14 
15 61 15 62     
 
  
84 
 
NLSY97 Data Analysis 
The imputed NLSY97 data has N = 6932 complete cases, five time waves, and 
four covariates.  The RACE variable was coded using three dummy variables, the 
indicators of  ?BLACK,? ?HISPANIC,? and ?MIXED.?  Now we have 19 parameters to 
estimate.  The SEX variable was coded using one dummy variable, the indicator of SEX 
= ?female?.  We explored the unimputed NLSY data using a GLMM Poisson model and 
GLMM logistic model to get some idea about the data.  Then we used this knowledge to 
assign some of the priors (fixed effect, ??, and ?? priors) and initial values (fixed effect, 
??, and ??).  We also used the knowledge from our previous simulation studies to assign 
the priors and some of the initial values.  The remaining initial values were generated by 
BUGS from the prior distributions.  All the prior distributions and model structure are the 
same as the model used in the simulation study.  Only one chain is used in this analysis, 
but we still have some other tools to check convergence.  The model runs smoothly but 
extremely slowly.  Because of high autocorrelation, the iterations were thinned by a 
factor of 20.  In each 24 hours OpenBUGS could only run 5000 iterations, which with the 
thinning factor produced 250 saved values in the chains.  The AR(1) model ran for more 
than 15 days, performing 250 X 20 X 15 = 75000 (iterations) with a burn-in of 
60000 iterations.  The ratios (MC_error/Sd) are around 16%.  Ideally, we would like 
these ratios to be no more than 1%.   
Table 5.6 displays the results.  Based on the results, we highlight important 
random effect parameters in yellow and the most important covariate parameters in green 
in terms of statistical significance.  The random effect parameters, ?, ??, ??, ??, and ??? 
are overwhelmingly significant.  We conclude that the NLSY data are strongly influenced 
85 
 
by the random effects, which are highly correlated.  Evidently a simpler random effect 
structure would not adequately fit these data, so the ZIP model may produce serious bias 
for this NLSY data.  However, MIXED model might produce milder bias than we saw in 
the simulation study of Chapter 5.1, in which ? = .85 and ?? =  .85.  The MIXED model 
will cause negligible bias and smaller variance when ? = 1 and ?? =  1.     
Most of the covariates have a significant effect in the model.  Based on the 
logistic regression, which estimates the probability that a youth is a nonsmoker, Hispanic 
and mixed race youths are more likely than whites to smoke because their coefficients 
have negative signs; the older youths are more likely than younger youths to be smokers; 
the youth having more peers who are smokers is more likely be a smoker.    Based on the 
Poisson regression, which estimates the numbers of cigarettes that a youth smoked, black 
smokers smoke more cigarettes than white smokers; Hispanic and mixed smokers smoke 
fewer cigarettes than white smokers; the older youths are more likely than younger 
youths to smoke more cigarettes; the youth having more peers as smokers are more likely 
to smoke more cigarettes. 
Table 5.6   NLSY 1997 ~ 2001 Data Analysis Results (AR(1) Model) 
Logistic  Mean Sd MC_error Poisson  Mean Sd MC_error 
beta 5.282 .652 .115 gamma -.514 .063 .011 
age -.251 .021 .004 age_p .041 .002 .000 
peer -.438 .057 .008 peer_p .248 .015 .003 
sex .912 .209 .028 sex_p -.056 .042 .007 
black .860 .502 .088 black_p 1.949 .043 .007 
hisp -2.572 .219 .032 hisp_p -.741 .059 .011 
mixed -12.58 .236 .029 mixed_p -3.397 .030 .005 
? .949 .000 .000 ?? .868 .000 .000 
?? 7.223 .054 .009 ?? 3.635 .006 .001 
        
??? .886 .002 .000     
 
86 
 
Since we don?t know the true parameters of the NLSY97 data, we would like to 
use the deviance information criterion (DIC) in BUGS to make model fit comparisons.  
However the DIC assumes that the posterior mean can be used as a ?good? summary of 
central location for description of the posterior distribution (Ntzoufras, 2009, pp 140-
141).  Since we did not run the model long enough to meet the model convergence 
assumption, we did not use DIC for these analyses.  We present the other two model 
results in the following.     
Table 5.7   NLSY 1997 ~ 2001 Data Analysis Results (MIXED Model) 
Logistic  Mean Sd MC_error Poisson  Mean Sd MC_error 
beta 4.027 2.174 0.339 gamma -0.909 0.338 0.052 
age -0.232 0.031 0.003 age_p 0.035 0.010 0.001 
peer -0.532 0.046 0.006 peer_p 0.199 0.022 0.003 
sex 0.155 0.069 0.004 sex_p -0.244 0.031 0.003 
black 2.708 2.236 0.351 black_p 0.873 0.359 0.055 
hisp -0.820 0.122 0.009 hisp_p 0.114 0.057 0.006 
mixed -0.754 0.108 0.009 mixed_p 0.540 0.032 0.001 
?? 2.464 0.255 0.038 ?? 0.706 0.089 0.013 
??? -0.472 0.081 0.012     
   
Table 5.8   NLSY 1997 ~ 2001 Data Analysis Results (ZIP Model) 
Logistic  Mean Sd MC_error Poisson  Mean Sd MC_error 
beta 3.578 .038 .008 gamma -.609 .018 .003 
age -.116 .008 .002 age_p .039 .001 .000 
peer -.308 .009 .001 peer_p .144 .003 .000 
sex .103 .023 .001 sex_p -.204 .007 .000 
black .108 .147 .032 black_p 1.299 .017 .004 
hisp -.472 .037 .004 hisp_p .049 .013 .002 
mixed -.491 .030 .003 mixed_p .427 .010 .001 
 
These three models all have ratio (MC_error / SD) around 16%.  Some parameters 
are not quite convergent yet; while some are not mixing well.  Running a longer time 
might be helpful.     
87 
 
Even the ZIP model needs an extra one or two weeks of running time.   Based on 
observing the trace plots and hist plots, we found that the estimation of Poisson 
regression is much worse than logistic regression in terms of the stability of the Markov 
chain.  This is opposite to the results we saw in the simulation study.  A possible reason is 
the bumps of numbers of cigarettes smoked, that the frequency of 5, 10, 15, or 20 
cigarettes smoked are especially higher than other numbers.  As for the estimation of 
variances, the ZIP model has smallest variances and the AR(1) model has largest 
variances.  This finding agrees with our simulation study.          
Although none of three models is the true model, we think AR(1) is closer to the 
truth.  None of the parameter estimates are close together values in the three models.  
Some of parameters have the same sign and the magnitudes of the values occur in 
decreasing order of AR(1), MIX, and ZIP.  For example, the estimated means of ??? are 
5.282 (AR(1)), 4.027 (MIX),  and 3.578 (ZIP).  However, some parameters do even not 
meet this level of agreement (highlighted in Yellow).  We also found that more than half 
of the MIX model estimates (highlighted in red) have big discrepancies from those of the 
AR(1) model.  The prior in MIX model is one possible reason.     
Table 5.9   Comparison of AR(1), MIX, and ZIP Model Estimates of Means   
LOGISTIC  AR(1) MIX ZIP POISSON  AR(1) MIX ZIP 
beta 5.282 4.027 3.578 gamma -.514 -0.909 -.609 
age -.251 -0.232 -.116 age_p .041 0.035 .039 
peer -.438 -0.532 -.308 peer_p .248 0.199 .144 
sex .912 0.155 .103 sex_p -.056 -0.244 -.204 
black .860 2.708 .108 black_p 1.949 0.873 1.299 
hisp -2.572 -0.820 -.472 hisp_p -.741 0.114 .049 
mixed -12.58 -0.754 -.491 mixed_p -3.397 0.540 .427 
?? 7.223 2.464  ?? 3.635 0.706  
??? .886 -0.472      
  
88 
 
Chapter VI 
Conclusions and Future Research 
6.1   Summary 
In this study, we developed the AR(1) model to handle longitudinal zero inflated 
Poisson data.  We conducted simulation studies of the model and fitted the model to a 
real world data set, the National Longitudinal Survey of Youth.  We used the theory of 
maximum likelihood estimation and Fisher information to develop estimates of the 
necessary sample size for our simulations.  We compared the performance of the AR(1) 
model to simpler models to assess bias and sampling error.  
For the simulated AR(1) data, which had features of the NLSY97 data, the ZIP 
model and MIXED model produced seriously biased estimates of the fixed effect 
parameters. However, these simpler models had less sampling variability than the AR(1) 
model, because fewer parameters had to be estimated.  
The AR(1) model requires a large sample size to obtain adequate accuracy.  Also, 
the large sample size, complicated model structure, and slowness of BUGS software 
require a tremendous length of time to analyze data using MCMC methods.  The AR(1) 
model as implemented in BUGS, imposes a heavy computational burden.   
 
6.2   Topics for Future Research 
The AR(1) ZIP model entails a heavy computational burden.  A more 
sophisticated MCMC algorithm might potentially reduce the computational requirements 
and make the model more useful to practitioners.  Such algorithms might not be possible 
in BUGS and may require programming in more flexible languages, such as R or C++. 
89 
 
Maximum likelihood estimation does not seem like a practical way to analyze the 
model because of the need to compute high dimensional integrals for each subject in 
order to evaluate the likelihood.   
It is possible that a Generalized Estimating Equation (GEE) approach might be 
devised.  The GEE approach requires one to model the mean response accurately, but a 
correct model of the variance-covariance structure is not required (Agresti, 2002, Ch. 11).  
However, the mean response still involves integrals that can?t be evaluated in closed 
form.  Therefore GEE may also lead to computational difficulties. 
Fitting the GLMM model instead of the AR(1) model is another alternative 
method.  Our simulations suggest that estimates of regression coefficients may be biased 
under this model.  One can not be sure whether the smaller standard errors and 
computational savings would offset the bias in estimating regression coefficients.   
 
  
90 
 
References 
Agresti, A. (2002) Categorical Data Analysis 2nd ed. New York: John Wiley & Sons.  
Berger, J. O. (1984). The Robust Bayesian Viewpoint.  In Robustness of Bayesian 
      Analysis, Joseph B. Kandane (Ed.) Amsterdam: North Holland, 63 ? 144. 
Best, N., Cowles, M. & Vines, K. (1996).  CODA: Convergence Diagnostics and  
      Output Analysis Software for Gibbs Sampling Output, Version 0.30, MRC  
      Biostatistics Unit, Institute of Public Health, Cambridge, UK. 
Bickel, P. & Doksum, K. A. (2007). Mathematical Statistics: Basic Ideas and 
      Selected Topics. New Jersey: Pearson Prentice Hall. 
Brooks, S. P., & Gelman, A. (1998). General Methods for Monitoring Convergence of 
      Iterative Simulations. Journal of Computational and Graphical Statistics, 7(4), 434- 
      455. 
Carlin, B. & Louis, T. (2000). Bayesian and Empirical Bayes Methods for Data 
      Analysis. NY: Chapman & Hall/CRC.     
Carriquiry, A. & Pawlovich, M. (2004). From Empirical Bayes to Full Bayes: 
      Methods for Analyzing Traffic Safety Data.  
     http://publications.iowa.gov/13274/1/eb_fb_comparison_whitepaper_october2004.pdf     
Chao, M. T. (1970). The Asymptotic Behavior of Bayes? Estimators.  Annals of 
      Mathematical Statistics 41, 601 ? 608.     
Chatfield, C. (2009).  The Analysis of Time Series: An Introduction (6th ed.).  NY: 
       Chapman & Hall/CRC. 
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2003).  Bayesian Data 
      Analysis. Second Edition. New York: Chapman & Hall. 
91 
 
Gelman, A., & Rubin, D. (1992). Inference from Iterative Simulation Using Multiple  
        Sequences. Statistical Science. 7, 457-511. 
Ghosh, S.K., Mukhopadhyay P, & Lu, JC. (2006). Bayesian Analysis of Zero- 
        Inflated Regression Models.  Journal of Statistical Planning and Inference, 136, 
        1360?75. 
Gilk, W., & Roberts, G. (1996).  Strategies for Improving MCMC, in Markov Chain 
       Monte Carlo in Practice. Gilk, W. R., Richardson, S., & Spiegelhalter, D. (Eds.) 
       UK: Chapman & Wall. 89 -110. 
Gill, J. (2008). Bayesian Methods: A Social and Behavioral Sciences Approach. 
        Second Edition. FL: Chapman & Hall/CRC. 
Hall, D. B. (2000). Zero-Inflated Poisson and Binomial Regression with Random 
        Effects: A Case Study.  Biometrics. 56, 1030-1039. 
Hedeker, D., & Gibbons, R. (2006). Longitudinal Data Analysis. New Jersey: Wiley. 
Heibron, D. (1994). Zero-altered and Other Regression Models for Count Data with 
        Added Zeros.  Biometrical Journal, 36, 531-547.   
Jansen, M. G. H. (1986). A Bayesian Version of Rasch?s Multiplicative Poisson 
        Model for the Number of Errors of an Achievement Test. Journal of Educational 
        Statistics, 11, 2, 147-160.  
Lambert, D. (1992). Zero-Inflated Poisson Regression Models with an Application 
       to Defects in Manufacturing. Technometrics, 34, 1-14. 
Lindley, D. V. (1969).  Discussion of Compound Decisions and Empirical Bayes, J. 
        B. Copas. Journal of the Royal Statistical Society, Series B 31, 397-425.  
Lynch, S. M. (2007) Introduction to Applied Bayesian Statistics and Estimation for 
92 
 
         Social Scientists. NY: Springer. 
Miaou, S.-P. (1994). The Relationship Between Truck Accidents and Geometric 
         Design of Road Sections. Poisson versus Negative Binomial Regressions.  Accident 
         Analysis & Prevention, 26, 471-482.  
Min, Y., & Agresti, A. (2005). Random Effect Models for Repeated Measures of Zero- 
        Inflated Count Data. Statistical Modelling, 5, 1?19. 
Mullahy, J. (1986) Specification and Testing of Some Modified Count Data Models. 
       Journal of Econometrics, 33, 341?65. 
Neelon, B. H,  O?Malley, A. J., & Normand, S. T. (2010). A Bayesian Model for  
       Repeated Measures Zero-Inflated Count Data with Application to Outpatient  
       Psychiatric Service Use. Stat Modelling, 10(4), 421?439. 
Nelder, J. A., & Wedderburn, R. W. M. (1972) Generalized Linear Models. 
       Journal of the Royal Statistical Society, Series A, 135, 370-384. 
NLSY97. (1997). National Longitudinal Survey of Youth 1997.  
       http://www.bls.gov/nls/nlsy97.htm  
Ntzoufras, I. (2009). Bayesian Modeling Using WinBUGS. USA: Wiley.  
R Development Core Team (2008), R: A language and environment for statistical 
      Computing. Vienna: R Foundation for Stat. Comp., http://WWW.R-project.org. 
Rasch, G. (1980). Probabilistic Models for Some Intelligence and Attainment Tests. 
        Chicago: The University of Chicago Press. 
Ridout, M., Demetrio, C. G. B., & Hide, J. (1998). Models for Count Data with 
        Many Zeros. Invited paper presented at the Nineteenth International Biometric 
       Conference, Capetown, South Africa. 
93 
 
Roberts, G. (1996). Markov Chain Concepts Related to Sampling Algorithms.  In 
        Gilks, W., Richardson, S., & Spiegelhalter, D. (Eds.) Markov Chain Monte Carlo in 
        Practice. Uk: Chapman & Hall. pp. 45-58.  
Shao, J. (2003). Mathematical Statistics. NY: Springer. 
Smith, B. (2005). (B)ayseian (O)utput (A)nalysis Program (BOA) Version 1.1.5  
        User?s Manual, Technical Report, Department of Health.  The University of Iowa, 
        available at http://www.public-health.uiowa.edu/boa  
Spiegelhalter, D. Thomas, A. Best, N. & Lunn, D. (2003) WinBUGS User 
        Manual, Version 1.4. MRC Biostatistics Unit, Institute of Public Health and 
        Department of Epidemiology and Public Health, Imperial College School of  
        Medicine, UK available at http://www.mrc-bsu.cam.ac.uk/bugs. 
Sun, D., Speckman, P. L., & Tsutakawa, R. K. (2000). Random Effects in 
       Generalized Linear Mixed Models. Generalized Linear Models: A Bayesian 
       Perspective. New York: Marcel Dekker.   
Weng, C. F. (2008).  Fixed versus Mixed Parameterization in Logistic Regression  
        Models: Application to Meta-Analysis.  Unpublished Master Thesis. Department of  
        Mathematics at University of Maryland, College Park.   
 
 
 
 
 
94 
 
 95