ABSTRACT
Title of dissertation: HANDLING OF MISSING
DATA WITH GROWTH MIXTURE MODELS
Daniel Y. Lee, Doctor of Philosophy, 2019
Dissertation directed by: Professor Jeffrey R. Harring
University of Maryland
Department of Human Development and
Quantitative Methodology
The recent growth of applications of growth mixture models for inference with
longitudinal data has introduced a wide range of research dedicated to testing the
different aspects of the model. One area of research that has not drawn much at-
tention, however, is the performance of growth mixture models with missing data
and when using the various methods for dealing with them. Missing data are usu-
ally an inconvenience that must be addressed in any data analysis scenario, and the
use of growth mixture models is no less an exception to this. While the literature
on various other aspects of growth mixture models has grown, not much research
has been conducted on the consequences of mishandling missing data. Although the
literature on missing data has generally accepted the use of modern missing data
handling techniques, these techniques are not free of problems nor have they been
comprehensively tested in the context of growth mixture models. The purpose of
this dissertation is to incorporate the various missing data handling techniques on
growth mixture models and, by using Monte Carlo simulation techniques, to pro-
vide guidance on specific conditions in which certain missing data handling methods
will produce accurate and precise parameter estimates typically compromised when
using simple, ad hoc, missing data handling approaches, or incorrect techniques.
HANDLING OF MISSING DATA
WITH GROWTH MIXTURE MODELS
by
Daniel Y. Lee
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2019
Advisory Committee:
Professor Jeffrey R. Harring, Chair/Advisor
Professor Xin He
Professor Laura M. Stapleton
Professor Tracy M. Sweet
Professor Ji Seung Yang
©c Copyright by
Daniel Y. Lee
2019
Dedication
This dissertation is dedicated to my mom, dad, and Yehyun. They have been a
constant reminder that no matter what happens, at the end of the day, everything
will be okay.
ii
Acknowledgments
An overwhelming number of people have been a part of this accomplishment.
Among them, there are a select group of individuals I would especially like to ac-
knowledge. Without their support, finishing this dissertation would have been im-
possible.
First, I would like to thank the EDMS family, especially my advisor, Jeffery
R. Harring, who has gone above and beyond simply being my advisor. Throughout
the years he has become a close mentor, colleague, and friend. I would also like
to thank the members of my dissertation committee, Ji Seung Yang, Tracy Sweet,
Laura Stapleton and Xin He, as well as other EDMS professors I’ve had the pleasure
to collaborate with and learn from during my time in Maryland: Gregory Hancock,
Hong Jiao and Robert Lissitz. I would also like to acknowledge the HDQM staff,
especially Jannitta, Charm, and Cornelia, for their amazing administrative support.
Some of the best memories I made during my time in the program were with
classmates I used to share the office space in Cole with. I would like to thank
them for all the memories collaborating on assignments, talking smack about the
undergrads, and introducing me to the Wednesdays Lamb shank special over at
Stamp. I would especially like to thank my brother in Christ and colleague Kaiwen
for his unwavering friendship since the beginning.
My last few years in the program would not have been possible if not for CAL.
In particular, I would like to thank Dorry Kenyon, Shu Jing Yen, and the PQR team
for making my job at CAL stress free so that I could keep my mind on finishing this
dissertation. I am also very thankful for the group of colleagues who were constantly
checking up on my progress.
Miriam, Andrew, Charlotte, Eugene, Ed, Joe, Chunky, and other family and
friends in Maryland, New Jersey, New York, and Korea have also been a major
influence in my life during this time. I would like to thank them for reminding me
that there are more important things in life than work.
Lastly, above all, I would like to thank Jesus Christ, my lord and my savior,
for blessing me with the health, motivation, and intellect to reach this milestone.
iii
Table of Contents
Dedication ii
Acknowledgements iii
List of Tables vii
List of Figures x
1 Introduction 1
1.1 Growth Mixture Models and Missing Data . . . . . . . . . . . . . . . 1
1.2 The Current Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Literature Review 8
2.1 Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1 Missing Data Taxonomy . . . . . . . . . . . . . . . . . . . . . 9
2.1.2 Methods for Addressing Missing Data . . . . . . . . . . . . . . 12
2.1.2.1 Original Methods . . . . . . . . . . . . . . . . . . . . 13
2.1.2.2 Maximum Likelihood . . . . . . . . . . . . . . . . . . 17
2.1.2.3 Expectation Maximization Algorithm . . . . . . . . . 20
2.1.2.4 Multiple Imputation . . . . . . . . . . . . . . . . . . 21
2.1.2.5 Bayesian Inference Behind Multiple Imputation . . . 23
2.1.2.6 Joint Multivariate MI . . . . . . . . . . . . . . . . . 24
2.1.2.7 Fully Conditional Specification Multiple Imputation . 26
2.2 Growth Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Growth Models Extension to Finite Mixtures . . . . . . . . . . . . . . 31
2.3.1 Estimation of Growth Mixture Models . . . . . . . . . . . . . 33
2.3.1.1 Maximum Likelihood Estimation with the EM Al-
gorithm . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.1.2 Fully Bayesian Estimation . . . . . . . . . . . . . . . 35
2.4 Handling Missing Data with Growth Mixture Models . . . . . . . . . 41
2.4.1 Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . 41
2.4.2 Bayesian Methods . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.4.3 Single-Stage Multiple Imputation . . . . . . . . . . . . . . . . 44
iv
2.4.4 Two-Stage Multiple Imputation . . . . . . . . . . . . . . . . . 45
2.4.5 Summary: Comparison of Methods . . . . . . . . . . . . . . . 51
2.5 Research Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3 Methods 54
3.1 Step 1: Data Generation . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.1.1 Class Separation . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.1.2 Sample Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2 Step 2: Data Amputation . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2.1 MNAR Missingness (CMAR+) . . . . . . . . . . . . . . . . . . 61
3.2.2 MCAR Missingness . . . . . . . . . . . . . . . . . . . . . . . . 62
3.3 Step 3: Missing Data Handling Approaches . . . . . . . . . . . . . . . 63
3.3.1 Label Switching . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3.2 Convergence Issues . . . . . . . . . . . . . . . . . . . . . . . . 65
3.3.3 Number of Imputations . . . . . . . . . . . . . . . . . . . . . . 66
3.3.4 Choice of Priors . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.4 Summary of Manipulated Conditions . . . . . . . . . . . . . . . . . . 69
3.5 Step 4: Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . . . . 72
3.5.1 Convergence Rates . . . . . . . . . . . . . . . . . . . . . . . . 72
3.5.2 Accuracy Rates . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.5.3 Relative Bias and Absolute Bias . . . . . . . . . . . . . . . . . 73
3.5.4 Standard Error Bias . . . . . . . . . . . . . . . . . . . . . . . 74
3.5.5 Factorial ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.6 Expectations from the Study . . . . . . . . . . . . . . . . . . . . . . . 76
4 Results 78
4.1 Complete Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.1.1 Convergence Rates . . . . . . . . . . . . . . . . . . . . . . . . 81
4.1.2 Classification Accuracy Rates . . . . . . . . . . . . . . . . . . 82
4.1.3 Relative Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.1.4 Standard Error Bias . . . . . . . . . . . . . . . . . . . . . . . 92
4.2 Missing Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.2.1 Convergence Rates . . . . . . . . . . . . . . . . . . . . . . . . 98
4.2.2 Classification Accuracy Rates . . . . . . . . . . . . . . . . . . 99
4.2.3 Relative Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.2.3.1 Results of the Main Effects . . . . . . . . . . . . . . 106
4.2.3.2 Results of the Interaction Effects . . . . . . . . . . . 110
4.2.4 Standard Error Bias . . . . . . . . . . . . . . . . . . . . . . . 124
4.2.4.1 Results of the Main Effects . . . . . . . . . . . . . . 129
4.2.4.2 Results of the Interaction Effects . . . . . . . . . . . 130
4.3 Summary of Main Findings . . . . . . . . . . . . . . . . . . . . . . . 134
4.3.1 Full Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . 134
4.3.2 Missing Data Analysis . . . . . . . . . . . . . . . . . . . . . . 135
v
5 Discussion 137
5.1 Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.2 Limitations and Future Studies . . . . . . . . . . . . . . . . . . . . . 143
A Complete Data Methods Relative Bias and SE/SD Ratio Tables 149
B Missing Data Methods SE/SD Ratios and Pairwise Comparison Tables 152
C Code to Implement the Two-Stage Imputation Method in Mplus 162
Bibliography 166
vi
List of Tables
3.1 Data Generating Parameters for Varying MD Conditions . . . . . . . 59
3.2 Coefficients for Creating Missing Data . . . . . . . . . . . . . . . . . 62
3.3 Mean Relative Bias Values by Analysis Method (Complete Data Meth-
ods) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.4 Overview of Simulation Conditions . . . . . . . . . . . . . . . . . . . 70
4.1 Abbreviations and Symbols Used for Summary of Results . . . . . . . 80
4.2 Mean Convergence Rates for Complete Data Methods Grouped by
Separation and Sample Size . . . . . . . . . . . . . . . . . . . . . . . 82
4.3 Mean Classification Accuracy Rates for Complete Data Methods Grouped
by Separation and Sample Size . . . . . . . . . . . . . . . . . . . . . 84
4.4 Means and Medians of the Overall Absolute Relative Bias Grouped
by Method, Sample Size (top), and Class Separation (bottom) . . . . 85
4.5 Factorial ANOVA Results for Relative Bias of the Estimates of Pa-
rameter Means (Complete Data Methods) . . . . . . . . . . . . . . . 88
4.6 Factorial ANOVA Results for Relative Bias of the Estimates of Pa-
rameter Variances (Complete Data Methods) . . . . . . . . . . . . . . 89
4.7 Mean Relative Bias Grouped by Method and Sample Size for Esti-
mates of µint , Ψslop, Θ, and πc1 (Complete Data Methods) . . . . . 90C2
4.8 Mean Relative Bias Grouped by Method and Separation for Estimate
of πc1(Complete Data Methods) . . . . . . . . . . . . . . . . . . . . . 91
4.9 Mean Relative Bias Values Grouped by Analysis Method (Complete
Data Methods) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.10 Means and Medians of the SE/SD Ratios Grouped by Method, Sam-
ple Size (top), and Separation (bottom) for Complete Data Methods . 93
4.11 Factorial ANOVA Results for SE/SD Ratios of the Estimates of the
Mean Parameters (Complete Data Methods) . . . . . . . . . . . . . . 94
4.12 Factorial ANOVA Results for SE/SD Ratios of the Estimates of the
Variance Parameters (Complete Data Methods) . . . . . . . . . . . . 94
4.13 Mean SE/SD Ratios Grouped by Method and Sample Size (Complete
Data Methods) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.14 Mean Convergence Rates for Missing Data Methods Grouped by
Method and Sample Size . . . . . . . . . . . . . . . . . . . . . . . . . 99
vii
4.15 Mean Classification Accuracy Rates for Missing Data Methods Grouped
by Method and Separation . . . . . . . . . . . . . . . . . . . . . . . . 100
4.16 Means and Medians of the Overall Absolute Relative Bias Grouped
by Method, Sample Size (Top), and Separation (Bottom) (Missing
Data Methods) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.17 Means and Medians of the Overall Absolute Relative Bias Grouped
by Method and Missing Rate . . . . . . . . . . . . . . . . . . . . . . 104
4.18 Means and Medians of the Overall Absolute Relative Bias Grouped
by Method and Missing Mechanism . . . . . . . . . . . . . . . . . . . 105
4.19 Factorial ANOVA Results for Relative Bias of Estimates of Parameter
Means (Missing Data Methods) . . . . . . . . . . . . . . . . . . . . . 106
4.20 Factorial ANOVA Results for Relative Bias of Estimates of Parameter
Variances (Missing Data Methods) . . . . . . . . . . . . . . . . . . . 106
4.21 Mean Relative Bias Grouped by Levels of Separation (SP) . . . . . . 108
4.22 Mean Relative Bias Grouped by Analysis Methods (ME) . . . . . . . 109
4.23 Mean Relative Bias Grouped by Sample Size (SS) . . . . . . . . . . . 109
4.24 Main Effect Pairwise Comparisons Corresponding to ME × SS Inter-
action of Intercept Variances for Relative Bias . . . . . . . . . . . . . 113
4.25 Main Effect Pairwise Comparisons Corresponding to ME × SS Inter-
action of Slope Variances for Relative Bias . . . . . . . . . . . . . . . 116
4.26 Main Effect Pairwise Comparisons Corresponding to ME × SS Inter-
action of Intercept Slope Covariances for Relative Bias . . . . . . . . 118
4.27 Main Effect Pairwise Comparisons Corresponding to ME × SP Inter-
action of Proportion Parameter for Relative Bias . . . . . . . . . . . . 120
4.28 Mean Relative Bias of the Residual Variance Estimate Grouped by
Method, Missing Data Rate, and Sample Size . . . . . . . . . . . . . 122
4.29 Overall Mean SE/SD Ratios for Missing Data Methods Grouped by
Method, Sample Size (top), and Class Separation (bottom) . . . . . . 125
4.30 Factorial ANOVA Results for SE/SD Ratios of Estimates of Means
(Missing Data Methods) . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.31 Factorial ANOVA Results for SE/SD Ratios of Estimates of Variances
(Missing Data Methods) . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.32 Mean SE/SD Ratios Grouped by Analysis Method (ME) . . . . . . . 129
A.1 Relative Bias for All Parameters by All Conditions Using Full Data
Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
A.2 SE/SD Ratios for All Parameters by All Conditions Using Full Data
Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
B.1 SE/SD Ratios For Mean Estimates Grouped by ME and SS (Missing
Data Methods) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
B.2 SE/SD Ratios For Variance Estimates Grouped ME and SS (Missing
Data Methods) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
B.3 Main Effect Pairwise Comparisons Corresponding to ME × SS Inter-
action of Intercept Variances for SE/SD Ratios . . . . . . . . . . . . . 155
viii
B.4 Main Effect Pairwise Comparisons Corresponding to ME × SS Inter-
action of Slope Variances for SE/SD Ratios . . . . . . . . . . . . . . . 156
B.5 Main Effect Pairwise Comparisons Corresponding to ME × SS Inter-
action of Intercept Slope Covariances for SE/SD Ratios . . . . . . . . 157
B.6 Main Effect Pairwise Comparisons Corresponding to ME × SS Inter-
action of Residual Variances for SE/SD Ratios . . . . . . . . . . . . . 158
B.7 Main Effect Pairwise Comparisons Corresponding to ME × MR In-
teraction of Intercept Variances for SE/SD Ratios . . . . . . . . . . . 159
B.8 Main Effect Pairwise Comparisons Corresponding to ME × MR In-
teraction of Slope Variances for SE/SD Ratios . . . . . . . . . . . . . 160
B.9 Main Effect Pairwise Comparisons Corresponding to ME × MR In-
teraction of Residual Variances for SE/SD Ratios . . . . . . . . . . . 161
ix
List of Figures
3.1 True population class 1 and class 2 growth curves . . . . . . . . . . . 59
3.2 Running means of parameter estimates . . . . . . . . . . . . . . . . . 71
4.1 Overall means and medians of the absolute relative bias by ME and SS 86
4.2 Overall means and medians of the absolute relative bias by ME and SP 87
4.3 Overall means and medians of the SE/SD ratios by ME and SS . . . 93
4.4 SE/SD ratios for the intercept mean, slope mean, intercept variance,
slope variance and intercept slope covariance by method . . . . . . . 96
4.5 Overall means and medians of the absolute relative bias by ME and
SS (Missing data methods) . . . . . . . . . . . . . . . . . . . . . . . . 102
4.6 Overall means and medians of the absolute relative bias by ME and
SP (Missing data methods) . . . . . . . . . . . . . . . . . . . . . . . 103
4.7 Relative bias of the estimates of intercept variance, slope variance,
and intercept slope covariance by ME and SS, and estimates of pro-
portion by ME and SP . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.8 Three-way ME × SS × MR interaction plot of the relative bias of the
residual variances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.9 Overall means and medians of the SE/SD ratios by ME and SS (Miss-
ing data methods) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.10 Overall means and medians of the SE/SD ratios by ME and MR
(Missing data methods) . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.11 Two-way ME × SS interaction plot of the SE/SD ratios for inter-
cept variance, slope variance, intercept slope covariance, and residual
variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.12 Two-way ME ×MR interaction plot of the SE/SD ratios for intercept
variance, slope variance, and residual variance . . . . . . . . . . . . . 132
x
Chapter 1: Introduction
1.1 Growth Mixture Models and Missing Data
Finite mixture models are statistical models that allow the possibility of la-
tent heterogeneous subpopulations. For example, subpopulations can exist based on
observed variables such as gender, age, or ethnicity, but finite mixture models hy-
pothesize the existence of unknown groups beyond these observed categorizations.
Finite mixture models have been applied to a wide range of statistical models typ-
ically used when the interest lies in testing the assumption of latent subgroups.
Finite mixture models have also been used to account for non-normal distributions
and are otherwise known to be indirect applications of the model (Bauer, 2007;
Titterington, Smith, & Makovm, 1985). Of interest to applied researchers is a par-
ticular finite mixture model in the context of longitudinal studies known as a growth
mixture model (GMM; B. O. Muthén & Shedden, 1999). The GMM extends finite
mixture models to repeated measures or panel data. The assumption is that there
exists heterogeneous latent groups that become apparent through varying patterns
of change over time. For example, Colder, Campbell, Ruel, Richardson, and Flay
(2002) applied a GMM to identify classes of growth trajectories of adolescent alco-
hol use and found five distinct groups that exhibited different patterns of growth
1
distinguished by levels of emotional distress and risk taking. Another more recent
example is a study done by Musu-Gillette, Wigfield, Harring, and Eccles (2015),
who applied a GMM to identify groups of students showing different trajectories of
self-concept in math, and how likely these groups were to choose a math-intensive
major in college (a distal outcome). The use of GMMs has also been used in other
disciplines such as psychiatry (Lin, Narayan, Drevets, Ye, & Li, 2017), medicine
(Hesser, Hedman, Lindfors, Andersson, & Ljótsson, 2017), and public health (Kon-
ing et al., 2016). The increased popularity of GMMs in recent years kindled a number
of methodological studies that focused on issues pertaining to using the model un-
der varying conditions. Much of the methodological research has dealt with testing
indices for data-model fit (Henson, Reise, & Kim, 2007; Nylund, Asparouhov, &
Muthén, 2007), evaluating methods that help decide on the number of subgroups
(Bauer & Curran, 2003; McNeish & Harring, 2017; Nylund et al., 2007; Tofighi &
Enders, 2007), examining the classification accuracy of models (Enders & Tofighi,
2008; Peugh & Fan, 2012), comparing estimation methods for finite mixtures (Hipp
& Bauer, 2006; McLachlan & Krishnan, 2007), or showing how different settings
other than the default settings in popular software for estimating these models can
impact parameter recovery and precision (Li, Harring, & Macready, 2014).
Despite the numerous research studies devoted to refining the use of GMMs,
not much attention has been devoted to testing GMMs in the presence of missing
data. In the social and behavioral sciences, missing data analysis merits special
attention because repeated measures data used for fitting GMMs is typically rife
with missing data due to various issues like data collection error, participant non-
2
response to specific items, drop-out, or failure to participate in at least one wave of
data collection.
According to seminal work by Rubin (1976), missing data can be categorized
into three distinct types: Missing Completely at Random (MCAR), Missing at Ran-
dom (MAR), and Missing Not at Random (MNAR). MCAR missingness is estab-
lished when the missingness is a result of unsystematic omission. MAR missingness
is established when the missingness is related to other variables in the dataset but
not the variable containing the missing itself. MNAR missingness is established
when the missingness is related to the variable that has the missing values or some
unobserved variable. Rubin as well as others (Enders, 2010; Schafer & Graham,
2002) have shown that while MCAR missingness is benign when the percentage of
missing is not substantial, MAR and MNAR missingness and incorrect handling of
such missing data can lead to misleading results in certain situations. To add to
the difficulties of handling missing data that is MAR or MNAR, it is impossible
to establish whether missingness is MAR or MNAR. That is, there is no formal
test of MAR or MNAR. Fortunately, researchers have established reliable methods
to address missing data when the missingness is MAR. These methods allow for
more accurate parameter estimation than some traditional ad hoc methods, such as
listwise deletion, that have been naively used in the past (Schafer & Graham, 2002).
The result of studies focused on dealing with missing data, together with
the advancement of computer software readily available for applying sophisticated
missing data handling techniques, has allowed two methods to become very popular:
full-information maximum likelihood via the Expectation-Maximization algorithm
3
(FIML-EM) (Arbuckle, 1996) and multiple imputation (MI) (Rubin, 1987). These
two archetypal approaches have been shown to produce comparable results when
used to handle missing data under the assumptions of MAR and MCAR (Collins,
Schafer, & Kam, 2001; Enders, 2010; Graham, 2003, 2009). FIML is an estimation
technique that takes into consideration any of the missingness in the data. The
usefulness of FIML lies in the EM algorithm, an iterative process that, without
having to discard any information, alternates between an ”expectation step” and a
”maximization step” to arrive at optimized parameter estimates. Alternatively, the
MI approach fills in the missing data multiple times prior to fitting a model of interest
to the data. The filling-in process is also an iterative process, but unlike the EM
algorithm, is rooted in Bayesian methods. The complete, multiply-imputed, datasets
are then estimated using the main model of interest in a separate step. The resulting
parameters from the separate analyses are then aggregated using special combination
rules proposed by Rubin (2003). More details on these techniques to handle missing
data, their advantages and disadvantages, will be explained in Chapter 2.
Studies on handling missing data in the context of growth mixture models
recommend using FIML over MI (Enders & Gottschall, 2011; McLachlan & Peel,
2000; Sterba, 2016) because MI requires the grouping information to be known
a priori in order to correctly impute data (Enders, 2010) for each group. Since
the grouping information for mixture models is latent, the use of MI with mixture
models is problematic. Failure to specify a grouping variable when using MI has
been shown to produce biased parameter estimates and incorrect identification of
classes (Enders & Gottschall, 2011; Sterba, 2016).
4
Despite the clear advantages of using FIML over MI with mixture models, MI
can still be an attractive alternative over FIML, particularly for situations in which
covariates are part of the model. For example, practitioners prefer using MI because
of the convenience of handling the missing data in a separate step using covariates
that may not pertain to the main research question (Collins et al., 2001). Further-
more, standard conditional likelihood FIML estimation methods typically used in
commercial statistical software such as Mplus (L. K. Muthén & Muthén, 1998) will
remove cases that have incomplete covariate information because these likelihood
models assume covariates are fixed and do not contain missing data (Sterba, 2014).
As a result, software programs that are capable of estimating these conditional mod-
els require that covariate information be complete or any cases containing missing
covariates be completely removed 1. This type of deletion has been shown to result
in biased parameter estimates and decreased power, especially if the missingness
is MAR or MNAR. For situations with a considerable amount of missing covari-
ate data, this can cause significant sample size reductions that will lead to reduced
power (potential increase in Type II error rates) and erroneous statistical inferences
(Schafer & Graham, 2002).
To circumvent the complexities of handling missing data in the context of
GMMs, more recent studies have turned to Bayesian methods to estimate GMMs
with missing data. These studies consider situations in which the missingness is
related to the latent class, constituting an MNAR missingness mechanism. For ex-
1Sterba (2014) has shown a workaround to this default method and is discussed in a subsequent
section
5
ample, studies like Song and Lee (2002), Cai, Song, and Hser (2010), and Lu, Zhang,
and Lubke (2011) proposed using Bayesian methods to address missing data that
were MAR when using GMMs, finding favorable outcomes when using such models.
However, these methods are not as readily accessible and require additional analy-
ses and specifications special to Bayesian statistics, such as specifying proper priors,
that can complicate the process. In addition, running some models using Bayesian
methods can take longer periods of time to converge depending on the complexity
of the model and the scales of the variables containing missing values.
As an alternative, Harel (2007, 2009) suggested using a two-stage imputation
approach to remedy the problems of using MI with mixture models. This idea, orig-
inally proposed by Shen (2000) as a way to impute data of different types (Rubin,
2003; Schafer & Graham, 2002), was extended to the context of mixture regression
models. According to Harel (2009), the two-stage imputation method allows for
unbiased parameter estimates that are similar to those obtained from FIML estima-
tion. However, this method has not yet been tested in the context of GMMs and is
not readily available in standard software like Mplus. In addition, it is unclear as to
the types of situations in which this method will work.
1.2 The Current Study
A number of studies have used some of these missing data handling methods
but using different types of models. In addition, none of these studies have investi-
gated the different conditions where these different techniques may be useful. The
6
purpose of this study is to compare the different methods—those that have been
suggested for handling missing data in the context of growth mixture models as
well as those not yet extended in this longitudinal context—using a Monte Carlo
simulation. A variety of simulation conditions that are pertinent to practitioners
interested in using GMMs will be tested to provide insight on the performance of
these missing data handling techniques in comparison to each other.
7
Chapter 2: Literature Review
It is helpful to begin with an introduction to the array of topics that will
be discussed in this study and that will comprise the literature review. The first
section will be an introduction to missing data terminology. This section will then
be followed by a brief introduction to growth models and growth models in the
context of finite mixtures. To conclude, the existing literature on handling missing
data vis-à-vis growth mixture models will be discussed in more detail.
2.1 Missing Data
The existence of missing data in educational, social and behavioral science
datasets is ubiquitous and presents an issue mandating action. The problems that
missing data can cause and the ramifications of ignoring these problems or improp-
erly handling them have been extensively documented since the seminal work by
Rubin (1976), who laid the foundation in this area by establishing a classification
system for the different types of missing data mechanisms and providing sugges-
tions on how to deal with them. Since then, various methods have been proposed
to address missing data based on an array of data features as well as the proposed
analytic model, including
8
1. type of missing data,
2. type of variables that are missing,
3. types of models,
4. software availability and,
5. assumptions that should be made about the missingness.
The last 30 years alone have produced substantial literature in this area. As a result,
the methodology on handling missing data now spans a diverse number of disciplines.
Some of the concepts that have been carefully studied and widely adopted as a result
are discussed next.
2.1.1 Missing Data Taxonomy
Rubin (1976) regarded missing data as random variables with a probability
distribution. Missing data were classified as having one of three mechanisms:
1. Missing completely at random (MCAR)
2. Missing at random (MAR)
3. Missing not at random (MNAR).
Thorough explanations of these mechanisms can be found in Rubin (1976), Schafer
and Graham (2002), and Enders (2010). A summary of these mechanisms and their
differences are presented next.
9
For data that is MAR, the missingness is independent of the missing variable,
but dependent on other observed variables. If I represents the set of indicators of
the missing data in a dataset, then MAR is formally defined as the probability of I
when it is related to observed variables, Yobs, but not the missing variable, or Ymis.
That is,
Pr(I|Ycomp = {Yobs,Ymiss},φ) = Pr(I|Yobs,φ), (2.1)
where Ycomp is the complete data consisting of observed and unobserved data (Yobs
and Ymiss respectively), and Pr represents a probability distribution. The φ term
represents a vector of parameters that captures the relationship between I and the
observed data. To illustrate, a simple example of missing data under a MAR mech-
anism is when an individual’s measure of income is missing not due to an individual
embarrassment to report a low income, but because of another variable related to
income, such as employment or unemployment status that has been observed. In
this scenario, an individual could potentially have a missing income variable if the
individual did not have a job or any other means of attaining income.
If the missingness were also independent of observed variables, then this would
constitute MCAR. MCAR is the mechanism in which the missingness is independent
of both the missing and observed variables. This type of missing can be regarded
as unsystematic missingness because the missing values present themselves in a
completely random manner. More formally, this means that
Pr(I|Ycomp = {Yobs,Ymiss},φ) = Pr(I|φ). (2.2)
10
MCAR situations arise, for example, through accidental omission or miscoding. The
literature defines MCAR and MAR missingness to be ignorable because likelihood-
based maximization methods can produce unbiased estimates even if the model for
missing is ignored (Little & Rubin, 2002) due to how the likelihood of the miss-
ing data can be differentiated out of the overall likelihood. This concept will be
expounded on in Section 2.1.2.2.
Finally, an MNAR mechanism is thought to be operating when the missingness
is dependent on the variable with missing data. In other words, the probability of
missing data, I, is influenced by Ymis, above and beyond influences through Yobs.
That is,
Pr(I|Ycomp = {Yobs,Ymiss},φ) = Pr(I|Yobs,Ymiss,φ). (2.3)
In the context of the income example, an individual’s missing income variable may
be suspected of being MNAR if an income measures was missing because income
was low or high, or some other unmeasured variable that caused an individual to
leave the income field blank. MNAR is often referred to as a non-ignorable process
because a missing data model is needed in conjunction with the analytic model to
produce unbiased model estimates. According to the aforementioned definitions of
missingness mechanisms, missing observable data caused by latent variables consti-
tute a MNAR mechanism because latent variables are never observed. However, as
will be discussed in Section 2.4.4, missingness caused by latent variables can be con-
stituted as ”ignorable” by a concept called conditional extended ignorability (Harel,
2003; Harel & Schafer, 2002, 2009).
11
Unless the missingness was planned, which may be the case in planned miss-
ing designs such as cohort-sequential or accelerated longitudinal designs, MAR or
MNAR are untestable assumptions and the model causing the missing data is un-
known to the researcher. Even though these are untestable assumptions, in most
applied situations, missing data can be considered MAR or MNAR (Glynn, Laird,
& Rubin, 1986). Using deletion or single imputation methods under assumptions of
MAR or MNAR can produce biased estimates (Enders, 2010; Schafer & Graham,
2002). Studies have shown that using maximum likelihood (ML) or multiple im-
putation (MI) is more appropriate for these situations. When the missing data are
assumed to be MNAR, studies have recommended using selection models (Diggle
& Kenward, 1994) or pattern mixture models (Little, 1993) as part of a sensitivity
analysis, although even with these advancements there are no guarantees that the
methods will work to reduce bias (Enders, 2010). The next sections will further
discuss the use of MI and ML for addressing missing data under MAR.
2.1.2 Methods for Addressing Missing Data
A plethora of strategies have been proposed to address missing data, especially
in cases when the missing mechanisms are MAR and MNAR. Ignoring missing data
in these cases has been shown to create biased parameter estimates and inflated
standard errors (de Leeuw, Hox, & Huisman, 2003) which result in lower statistical
power. This section will present some of the methods that have been used in the
last several decades.
12
2.1.2.1 Original Methods
Prior to the development of the more sophisticated methods for addressing
missing data, listwise and pairwise deletion were the standard practice. Because the
missingness is assumed to be haphazard under MCAR, deletion methods may be
appropriate, although they will more often lead to biased estimates and reduction
of statistical power when the percentage of missing is high (Enders, 2010). Despite
the consequences of using these methods, many studies still continue to use them
(Peugh & Enders, 2004) and deletion methods continue to be the default method in
many statistical software programs (i.e., SPSS).
Listwise deletion removes cases that have at least one missing value on variables
used in an analysis, essentially leaving a complete dataset but at the cost of wasted
information. In addition to lowering statistical power from the loss of data, if the
missingness cannot be assumed MCAR, then listwise deletion will produce biased
parameter estimates (Enders, 2010).
As a way to preserve data and power, pairwise deletion only removes cases on
a variable-by-variable basis. That is, the deletion method removes only the cases
that contain missing data for the variables that are being used for the analysis.
In this scenario, individual cases with missing data are retained as long as they
have data for the variables for which an analysis is being conducted. Pairwise dele-
tion also requires the assumption of MCAR missingness or will otherwise produce
distorted estimates, which has been shown through empirical research (Arbuckle,
1996; B. O. Muthén, Kaplan, & Hollis, 1987; Wothke, 2000). For example, pairwise
13
deletion is often the cause of nonpositive definite matrices (Little, 1992; Wothke,
1993) and difficulty in computation of standard errors due to indefinite sample sizes
which present themselves when variable information is available for some but not for
others (Little, 1992). Although deletion methods are generally never recommended,
they were the only available methods in the advent of statistical computing and
consequently became the default method.
A popular non-parametric way to correct for the discrepancies caused by dele-
tion methods that is often used in survey methodology is the use of an inverse
probability weighting approach. In typical survey studies, sampling is done in a
probabilistic manner, allowing analyses conducted on these samples to be general-
ized to the larger population. Sampling weights are used in order for samples to
represent the population and provide unbiased estimates of population parameters
(Kalton, 1983; Kish, 1965). In the presence of missing data, sampling weights can be
adjusted by multiplying an inverse of a response rate to the weights and applying
them to any individuals without missing data. Individuals with missing data are
listwise deleted, and those remaining are given a weight that accounts for the miss-
ingness as well. These inverse weights are constructed using variables that correlate
with both response propensity and the outcome variable. However, it has been shown
that leaving out some of these variables from the adjustments can cause some biased
variance estimates, especially if variables that correlate highly are omitted from the
adjustments (D. Y. Lee, Harring, & Stapleton, in press). In addition, because the
weighting approach relies on listwise deletion, this will also inevitably cause a loss
of power if the rate of missing data is substantial.
14
Several imputation methods were introduced to address the issue of loss of
power and precision that resulted from methods such as listwise or pairwise dele-
tion. Imputation methods try to replace the missing data with plausible values to
preserve power and improve precision by allowing the inclusion of variables that
are related to the missingness into the analysis (Schafer & Graham, 2002). Several
well-known single imputation methods have been popular mainly due to convenience
and availability in software. Popular single imputation methods include arithmetic
mean imputation (Wilks, 1932), hot-deck imputation (Scheuren, 2005), regression
imputation (Buck, 1960), and stochastic regression imputation.
Arithmetic mean imputation replaces the missing values with the arithmetic
mean of the variable with missing data using cases with values on the variable.
Regression imputation replaces the missing values with a predicted value from a re-
gression model using the complete data. Arithmetic mean and regression imputation
are no longer recommended because although these imputation techniques preserve
the mean structure of the data, they fail to preserve the variance and other higher
moments of the data (Schafer & Graham, 2002), which leads to distorted measures
of association and incorrect inferences. In addition, studies have demonstrated that
the results from these two methods produce attenuated covariances and biased pa-
rameter estimates as the rate of missing data increases, even under the assumption
of MCAR (Beale & Little, 1975).
Hot-deck imputation is popular in survey applications that was originally de-
veloped by the Census Bureau to deal with missing survey data. Hot-deck impu-
tation is a technique where missing values are replaced by values from individuals
15
that are similar on other observed variables with the individual. Although several
versions of the hot-deck imputation method exist, the general idea is to impute for
any missing variables by randomly choosing from a sample of individuals with sim-
ilar characteristics on other observed variables. Hot-deck imputation is known to
preserve univariate distributions of the data and the variability of the filled-in data
(Enders, 2010). Research has shown, however, that this imputation method does
not perform well for estimation of measures of association such as correlations and
regression coefficients (Brown, 1994; Schafer & Graham, 2002).
For the purpose of preserving variability, the stochastic regression imputation
approach augments each predicted value from a regression imputation with a random
normally distributed residual term. Simulation studies have shown that under as-
sumptions of MAR, stochastic regression imputation can produce similar estimates
as those from maximum likelihood and multiple imputation (Gold & Bentler, 2000).
However, while estimates from a single imputation may produce unbiased estimates,
standard errors are underestimated because the predicted values are treated as real
data, ignoring the sampling error associated with missing data (Enders, 2010).
Another imputation approach that has gained popularity is predictive mean
matching (PMM), which was introduced by Rubin (1986). PMM uses a regression
imputation approach, but instead of using imputation values straight from the re-
gression function, it uses the function to help match the observed cases to those with
missing data. The values from these cases then serve as the imputations for the miss-
ing data. While Rubin’s original suggestion was to use a single case for matching,
later studies have shown that in order to obtain proper standard errors the number
16
of cases needs exceed five, essentially turning the method into a multiple imputation
approach (Morris, White, & Royston, 2014). The method is especially convenient
for non-normal data due to its semi-parametric approach (Horton & Lipsitz, 2001).
However, the precise number of imputations needed to produce unbiased estimates
and correctly sized standard errors is unknown and can vary based on sample size.
This method can become especially problematic if sample sizes are small because
the number of potential matching cases is limited (McNeish, 2017). In addition, the
use of PMM has not been researched enough to compare its performance relative
to other missing data methods. In addition, some software packages like SPSS im-
plement the method using Rubin’s traditional method of using a single imputation,
which is not a recommended practice.
In light of all the advancement in the area of missing data methodology, re-
searchers and practitioners in recent years have turned to methods that have been
extensively tested, as studies have shown that ignoring MAR missing mechanisms
can produce biased parameters if not addressed properly. These methods are full
information maximum likelihood and multiple imputation, which are discussed in
detail next.
2.1.2.2 Maximum Likelihood
Most statistical methods rely on maximum likelihood estimation for obtaining
parameter estimates of a model. This involves identifying a likelihood probability
function appropriate for the analysis and estimating the parameters that maximize
17
the probability given the observed data. In situations with missing data, estimation
requires the use of intricate methods that strategically audition parameters in an
iterative fashion until an optimal solution is derived. Several iterative methods have
been developed for this purpose and will be discussed in subsequent sections. Com-
plexities arise when a dataset contains missing data. In such situations, maximum
likelihood estimation can still be utilized. This method of dealing with missing data
is sometimes referred to as full-information maximum likelihood (FIML; Anderson,
1957; Dempster, Laird, & Rubin, 1977; Lord, 1955) estimation because it uses all
available data without having to discard any cases that have missing data. FIML
is currently regarded as the state-of-the-art for handling missing data, particularly
when the preponderance of missingness occurs on an outcome variable, under the
assumption of MAR (Schafer & Graham, 2002).
In the general maximum likelihood estimation framework, the likelihood of
producing a pattern of data for p variables from N individuals can be expressed as
∏N
L(θ|y) = fi(y(i,comp)|θ), (2.4)
i=1
where
y(i,comp) = {y(i,obs),y(i,mis)} = {yi1, yi2, ..., yip}, (2.5)
L is the overall likelihood function value which is scalar, θ are the parameters to be
estimated, and fi is a probability density function for individual i. Any continuous
variables with missing data, for example yig, can be integrated out of the probability
18
distribution to obtain the marginal probability of the individual’s observed data
given model parameters. That is,
∫
fi(y(i,obs)|θ) = fi(y(i,comp|θ)dyg, (2.6)
yg
given the assumption that the missingness is MAR. Any discrete missing variables
are summed over. Then, for example, for M individuals with completely observed
data and (N −M) individuals with missing data, the missing data likelihood can
be expressed as
∏N ∏M
L(θ|y) = fi(y(i,comp)|θ) fi(y(i,obs)|θ), (2.7)
i=1 N+1
which is the product of the likelihoods for all the observations. In other words,
given the varying patterns of observed and unobserved data in yi, the essence of
full-information maximum likelihood estimation is to use each individual’s complete
data and to find the parameters across individuals that maximize the likelihood
function.
Several methods have been proposed to obtain the values of θ that maximize
this likelihood function. Typically, parameters for models with closed solution forms
and no missing data can be obtained using first derivatives. However, iterative pro-
cedures are required for complex models with large θ vectors, missing data, and/or
latent variables. The expectation maximization (EM) algorithm, which is discussed
next, is one such iterative method that is especially useful for maximizing likelihood
19
functions when the dataset contains missing data.
2.1.2.3 Expectation Maximization Algorithm
The expectation maximization (EM; Dempster et al., 1977) optimization algo-
rithm is a two-step procedure that iterates between an expectation-step (E-step) and
maximization-step (M-step) to find optimal parameter estimates in the likelihood
sense. In the most general sense, the E-step calculates a conditional expectation
of the log-likelihood function1. More specifically, if θ(m) is the mth iteration value
of θ and L(θ|y) is the complete-data log likelihood function, then the (m + 1)th
iteration of the algorithm begins with the E-step, which requires the calculation of
the expectation of L(θ|y) conditional on θ(m) and y. That is,
[ ]
Q(θ|θ(m)) = E logL(θ|y);y,θ(m) . (2.8)
The M-step then maximizesQ(θ|θm) by choosing θ(m+1) such thatQ(θ(m+1);θ(m)) ≥
Q(θ;θ(m)). The E and M steps continue until the difference between L(θ(m+1)) and
L(θ(m)) is arbitrarily small.
Analysts rely heavily on maximum likelihood with the EM algorithm to es-
timate models using data that contain missing data, more so now than before due
to it being the default estimation algorithm in several statistical packages. Several
alternative methods for optimization are available, such as Newton-Raphson and
Fisher scoring methods, yet none are as robust as the EM algorithm (Arbuckle,
1The log of the likelihood is used for the computational convenience of working with sums rather
than products of likelihood functions
20
1996). It should be noted that these iterative optimization algorithms carry the risk
of non-convergence to an optimum solution, and often times require unwieldy com-
putations that may be difficult to carry out for more complex models. Typically, the
class of models that fall under finite mixtures are maximized via the EM algorithm,
because the latent classes to be estimated are essentially treated as another missing
data variable. An alternative method to deal with missing data that is comparable
to the ML method using EM that is rooted in Bayesian methodology is presented
next.
2.1.2.4 Multiple Imputation
As a way to retain the advantages of single imputation methods and model
sources of uncertainty mentioned before, Rubin (1987) recommended using multiple
imputation (MI). MI fills in values for the missing data in three general steps:
1. creation of multiple complete datasets by imputation of plausible values in the
cells where missing data are present,
2. analysis of the multiple complete datasets, and
3. aggregation of the results from the multiple analyses.
The underrepresented uncertainty of parameter estimates is addressed by creating
several versions of complete data that produce multiple parameter estimates. These
estimates are then averaged to produce a final parameter estimate. Special rules
derived by Rubin (1987) are applied in the aggregation stage of standard errors to
account for between imputation variance and within imputation variance.
21
The first step of MI consists of creating M datasets, with each dataset con-
taining varying values for the cells with missing data. Once the data are filled in,
the second stage consists of analyzing the M complete datasets and obtaining the
estimates of the model parameters of interest. The resulting M point estimates are
then averaged to obtain the final parameter estimates, θ̄. That is,
M
1 ∑
θ̄ = θ̂m, (2.9)
M
m=1
where θ̂m is one of the M point estimates. Special rules proposed by Rubin (1987)
are applied to obtain the final standard errors. These special rules take into account
the within-imputation variability, Vw, and the between-imputation variability, Vb.
Specifically, the standard errors are combined such that the resulting standard error,
SET , is √
Vb
SET = Vw + Vb + , (2.10)
M
where
1 ∑M
Vw = s
2
m (2.11)M
m=1
∑M1
Vb = (θ̂m − θ̄)2, (2.12)
M − 1
m=1
and the s2m term in Equation (2.11) represents the squared standard error from the
analysis of dataset m.
Filling the missing data cells in step 1 is typically the most crucial step of
the process and warrants further discussion. Several versions of the imputation step
22
have been introduced, but in general, these methods can be classified into two frame-
works: joint-modeling (JM; Rubin, 1987) and fully conditional specification (FCS;
van Buuren, Brand, Groothuis-Oudshoorn, & Rubin, 2006). While these two ap-
proaches are different in how they impute the missing data, they are both grounded
in Bayesian methods. The Bayesian method behind multiple imputation is discussed
next.
2.1.2.5 Bayesian Inference Behind Multiple Imputation
The imputation step of the MI procedure, whether by JM or FCS, requires
the use of Bayesian inference. Bayesian inference considers parameters to be random
variables with distributions that are inferred using the available data. The Bayesian
method amounts to sampling these random variables from a series of probability
distributions. Specifically, this consists of:
1. specifying probability distribution functions (or PDFs) for the parameters of
interest [typically referred to as the prior distributions, or simply prior(s)],
2. specifying a likelihood function—a conditional PDF of the data given the
parameters—that use the data to provide evidence about the parameters, and
3. creating a posterior distribution using the prior(s) and likelihood function to
describe the distribution of the parameters in light of the data.
According to Bayes’ theorem, the relationship among these three is expressed by
| Pr(θ) Pr(Y |θ)Pr(θ Y ) = , (2.13)
Pr(Y )
23
where Y is the data, θ is a parameter of interest, Pr(θ|Y ) is the posterior distribution,
Pr(θ) is the prior distribution function for the parameter of interest and Pr(Y |θ)
is the likelihood function of the data given the parameters of interest, and Pr(Y )
is the marginal distribution of Y . The marginal distribution Y on the right-hand
side of the equation is simply a scaling constant that does not affect the shape of
the posterior distribution, so this relation is typically expressed as a proportional
relationship between the posterior distribution and the product of the prior and the
likelihood distribution. That is,
Pr(θ|Y ) ∝ Pr(θ) Pr(Y |θ), (2.14)
which is the fundamental relationship in Bayesian inference.
The manner in which the parameters for the missing data posteriors are ob-
tained is what sets the JM approach and FCS approach apart. These two approaches
are discussed next.
2.1.2.6 Joint Multivariate MI
The joint multivariate modeling imputation approach consists of a data aug-
mentation process. During the I-step, for each missing data pattern (or unique com-
binations of observed and missing variables), missing values are drawn from a pos-
terior distribution of the missing data that is conditioned on the observed data,
Yobs, and mean vector and covariance matrix (contained in θ) of all the observed
and missing variables (assuming multivariate normality). In other words, the filled-
24
in data at the tth step in the particular I-step are random draws from a posterior
predictive distribution that is conditional on the observed data and the mean vector
and covariance matrix of the data drawn from the previous P-step of the missing
data model. That is,
Y ∗t ∼ Pr(Ymis|Yobs,θ∗t−1). (2.15)
At the P-step, using a Monte Carlo simulation, new values for θ are drawn from
respective posterior distributions of the means and covariances that are formulated
using appropriate priors and likelihoods based on the filled-in data from the preced-
ing I-step. That is,
θ∗t ∼ Pr(θ|Yobs,Y ∗t ). (2.16)
These newly drawn mean and covariance parameters contained in θ are then used to
construct the equations used to obtain the new sets of imputations in the subsequent
I-step. This back and forth computation continues until the distributions for all the
parameters in θ are stationary, or have converged2. The imputations created at
specified steps of the entire data augmentation process are saved to make up the
final set of imputations. A major assumption of using the joint multivariate approach
is that the data are normally distributed, which may not always be the case.
2The meaning of convergence in Bayesian methods is discussed in detail in Gelman and Rubin
(1992)
25
2.1.2.7 Fully Conditional Specification Multiple Imputation
The fully conditional specification (FCS) method is often referred to as se-
quential regression or chained equations. Like JM, FCS is grounded in Bayesian
inference, yet the manner in which missing data is imputed is different. Mainly,
while the JM-MI approach uses a joint multivariate distribution to impute the miss-
ing data, FCS uses separate univariate distributions of the variables with missing
data, essentially eliminating the need for all variables to be normally distributed. In
this regard, FCS has the flexibility of allowing the multiple imputation of a combi-
nation of normally distributed and non-normally distributed missing data. Variables
with missing data are imputed one at a time, starting with the variable with the
lowest rate of missingness. Subsequently imputed variables condition on all previ-
ously imputed variables until all the missing data has been imputed. Like in the
JM approach, MCMC simulation is used to draw new sets of parameters for the
conditional posterior distributions that were used to create the imputations, using
the filled in values as part of the likelihood function making up the posterior from
where the new parameters for the imputations are drawn from. This process con-
tinues until a specified number of times. For multivariate normal data, the FCS has
been shown to be equivalent to JM (Hughes et al., 2014; Mistler & Enders, 2017).
2.2 Growth Models
Several options are now available for analyzing data from longitudinal studies.
In the social and behavior sciences, these studies typically use repeated measures or
26
panel designs to collect data from subjects at various time points, although intensive
longitudinal designs are becoming more prevalent. The use of latent growth models
(LGM), also known as latent curve models in the SEM literature (McArdle, 1986;
Meredith & Tisak, 1990), have represent a class of models commonly used to ana-
lyze such data. Although the use of the name latent growth models and latent curve
models are in some ways synonymous, the more general term latent growth modeling
represent a more generalized approach and will be used from here on out. The idea
behind LGMs is that subjects share the same functional form (e.g., linear, quadratic,
etc.), but have distinct shapes stemming from idiosyncratic parameterizations. For
example, with linear LGMs, inference will involve the estimation of individual start-
ing points (intercepts) and growth rates (slopes). Generally, the functional form of
the model is determined on theoretical grounds or derived from an exploration of
the data and will dictate the number of parameters per individual.
LGMs can be regarded as two-level regression models where observations at
each time within individuals are the first level, and the individuals are the sec-
ond level (see, e.g. Singer & Willett, 2003). The observations at each timepoint are
regarded as indicators of factors representing patterns of change. For example, to
establish a linear LGM, the factors represent the intercept and slope, and the load-
ings are specified constants. Something to note is that in the latent curve modeling
literature, loadings can also be estimated to represent other patterns (Meredith &
Tisak, 1990).
For LGMs, a model for a single data point for individual i at time t, or yit, is
27
typically specified as follows:
yit = η0i + η1iTIMEt + it, (2.17)
where η0i and η1i are the individual’s intercept and slope growth factors, respectively,
and it represents the residual at time t allowing an individual’s fitted function to
deviate from their data. Also, the TIMEt term represents the t
th timepoint for
individual i. The intercept and slope growth factors, η0i and η1i respectively, can be
further expanded as
η0i = αη0 + ζ0i, (2.18)
and
η1i = αη1 + ζ1i, (2.19)
respectively. Although unspecified in Equations (2.18) and (2.19), these growth fac-
tors can easily incorporate time-invariant covariates (i.e., individual attributes). In-
dividuals’ random effects, or deviations that exist from the mean for each growth
factor, are represented by ζ0i for the intercept, and ζ1i for the slope. These random
effect terms are assumed to be multivariate normally distributed with a mean vector
of zero. Combined, using matrix notation commonly used in SEM, Equations 2.17,
2.18, and 2.19 become
yi = Λiηi + i, (2.20)
where
ηi = α+ ζi. (2.21)
28
For T repeated measures or time points per individual where t = 1, 2, ..., T , yi is
a T × 1 vector of measures for individual i and ηi is a p × 1 vector of the growth
factors. This notation of T implies that each individual is assumed to have the same
number of timepoints and no missing data. For a linear model with two growth
factors, p is equal to 2. The Λi term is a T × p matrix of factor loadings that is
specified to model the desired functional form for the data for each individual. It
should be mentioned that the Λi term is specific to each individual, for example,
when applying this model to time-unstructured data (McNeish & Matta, 2018). By
way of example, assuming that every individual follows the same functional form
of growth, Λi for a linear growth model with measurements at three equally-spaced
time-points would be  
1 1
Λ = 1 2i . (2.22)
1 3
The α term represents a p×1 vector containing the intercepts for the growth factor
(e.g., αη0 and αη1 in Equations 2.18 and 2.19), and ζi is a p × 1 vector of random
effects for each growth factor. The i term is a T × 1 individual vector of residuals
across the T timepoints. Both ζi and i are assumed to be mutually independent and
multivariate normally distributed with means equal to zero and covariance matrices
represented by Ψ and Θi, respectively. That is,
ζi ∼MVN(0,Ψ), (2.23)
i ∼MVN(0,Θi), (2.24)
29
where Ψ is a p × p matrix of the variances and covariances of the growth factors
and Θi is the T × T covariance matrix of the time-specific residuals. In addition,
the residuals and random effects are assumed to be uncorrelated.
The yi terms are a linear combination of normally distributed growth factors
and residuals, making yi also multivariate normally distributed. That is,
yi ∼MVN(µi,Σi), (2.25)
where the model implied mean µi and covariance Σi are
µi = Λiα, (2.26)
and
Σi = Λ
′
iΨΛi + Θi. (2.27)
In standard latent growth modeling, it is assumed that samples come from a homo-
geneous population. In some instances, however, multiple heterogeneous groups may
exist within a population, in which case the LGM can be modified as a multiple-
group model that allows groups to have unique growth patterns. Groups are typically
identified by discrete covariates that are manifested in the population (e.g. gender,
age group). Groups may also be theorized latent groups. Models that allow the pos-
sibility of such latent groups are called growth mixture models and will be discussed
next.
30
2.3 Growth Models Extension to Finite Mixtures
The growth mixture model (GMM) allows for the assumption that there exists
an unobserved heterogeneous population underlying the data (B. O. Muthén, 2001,
2004; B. O. Muthén & Muthén, 2000; B. O. Muthén & Shedden, 1999; Nagin, 1999).
In other words, the GMM is a multiple group model but with latent heterogeneous
classes in the population. This model falls under a more general category of models
called finite mixture models (McLachlan & Peel, 2000; B. O. Muthén & Shedden,
1999).
Assuming multivariate normality, the combined probability density of a vector
of continuous outcome variables for each individual can be expressed as
∑K
f(yi|π,θ) = πkfk(yi|θk), (2.28)
k=1
where K represents the number of latent classes, θk is a parameter vector (e.g.,
means and covariances) for each latent class, fk represents the marginal density for
class k, and the πk term represents the proportion of the sample belonging to each
class k, or oth∑erwise known as the mixing proportion for class k, where 0 < πk < 1,
and π = 1− K−1k j=1 πj. In other words, for each class there exists a specific density
function that is represented by a statistical model with unique class parameters
(McLachlan & Peel, 2000).
For each class k, a linear GMM can be expressed by Equations 2.20 and 2.21
31
with a subscript k. That is,
yi = Λiηi + i (2.29)
and
ηi = αk + ζi. (2.30)
where the model-implied mean vector and covariance matrix for class k, analogous
to the previously presented Equations 2.26 and 2.27 are expressed as
µi|k = Λiαk, (2.31)
and
Σi|k = ΛiΨkΛ′i + Θik, (2.32)
respectively. The residual terms are also class specific such that
ζi|k ∼ N(0,Ψk) (2.33)
i|k ∼ N(0,Θik), (2.34)
although it is common to constrain the Ψk and Θik terms to be equal across classes,
as is the case with Mplus. Class-specific covariates can also explain variation in
y and can also serve as predictors of the mixing proportion parameter through a
multinomial logistic link function (see, e.g., Maddala, 1983; Nagin, 1999).
32
2.3.1 Estimation of Growth Mixture Models
Although there are several approaches to estimate the parameters for the
GMM, this study will focus on two approaches that are currently very popular
in the GMM literature and which are also well regarded for handling missing data:
Maximum Likelihood using the Expectation Maximization algorithm and Bayesian
estimation methods, specifically using a Gibbs sampler. This section is devoted to
introducing these two approaches.
2.3.1.1 Maximum Likelihood Estimation with the EM Algorithm
Estimation of the parameters in Equations 2.29 and 2.30 requires maximization
of the log-likelihood in Equation 2.28. This can be accomplished using an iterative
procedure like Newton-Raphson (Liu, Hancock, & Harring, 2011) for direct maxi-
mization, quasi-Newton or the EM algorithm (Dempster et al., 1977; McLachlan &
Krishnan, 2007; McLachlan & Peel, 2000). The focus of this section will be on the
EM algorithm, specifically for finite mixtures.
Using the ideas introduced in Sections 2.1.2.2 and 2.1.2.3, the log-likelihood
of the mixture model to be maximized can be expressed as follows
∑ [ ]N ∑N ∑K
logL(θ|y) = f(yi|π,θ) = log πkfk(yi|θk) , (2.35)
i=1 i=1 k=1
for N individuals with parameters defined the same as Equation 2.28. The chal-
lenge of working with this likelihood is that, in addition to model parameters, the
33
classification parameter, πk, is latent and therefore completely missing. Dempster
et al. (1977) demonstrated how this likelihood could be assumed to be similar to
other missing data problems. Essentially, Equation 2.35 can be re-written with the
assumption that the class parameter is observed. That is, for every individual in a
sample of size N , let zi be a vector containing K zero-one binary variables, zik, in-
dicating individual i’s disassociation/association to class k, or zi = {ci1, ci2, ..., cik}
where {
1, if subject i belongs to class k
cik = 0, otherwise
Then, the log-likelihood with the “observed” class variable is:
∑N ∑K { }
logL(θ|y, z) = zik log(πk) + log[fk(yi|θk)] . (2.36)
i=1 k=1
∑
When zik is assumed to be observed, then π =
N
k i=1 zik/N , and the maximization
of the mixture model becomes less cumbersome. Missing data are also handled as
explained in Section 2.1.2.2.
Maximization of the log-likelihood then becomes conditional only on the ob-
served y and candidates of θ, and the same ideas presented in Section 2.1.2.3. The
E-step involves updating the values of θ by using the conditional, “complete-data”
expectation of the likelihood function presented in Equation 2.36. That is,
[ ]
Q(θ|θ(m)) = E logL(θ, z) | y,θ(m)
∑N ∑K
= E(cik|y,θ(m)){log(πk) + log[fk(yi|θk)]}
i=1 k=1
34
where
∑ ∣π f (y |θ(m)E | (m) k k i )(cik y,θ ) = K ∣∣∣π (m)k=1 kfk(yi|θ ) ∣θm (2.37)
= Pr(θ(m)).
Essentially, the E-step amounts to computing a posterior probability for each indi-
vidual evaluated at the current parameter values at the mth iteration.
The posterior probabilities are then carried over to the M-step, where they
are used for maximizing the conditional expectation by replacing the unknown class
indicators cik with πk and computing other model parameters in θ. These new
parameters are then carried over to iteration (m+ 1) starting with the E-step. The
E-step and M-steps are iterated until the parameters and class proportions that
maximize the likelihood satisfy the convergence criterion (Harring, 2012).
2.3.1.2 Fully Bayesian Estimation
Bayesian estimation using a MCMC simulation method is another possible
approach for obtaining parameter estimates for the GMM (Asparouhov & Muthén,
2010; Gelman, Carlin, Stern, & Rubin, 1995), which can also be implemented to
handle missing data. Although there are a number of nuanced MCMC methods
that have been developed, the Gibbs sampler is one of the more widely used meth-
ods (Diebolt & Robert, 1994; Geman & Geman, 1984; B. O. Muthén, Muthén, &
Asparouhov, 2010). This section will discuss the specifics of the Gibbs sampler.
During the Gibbs sampler, a sequence of values for unknown parameters, la-
35
tent variables, and missing observations are iteratively obtained in order to create
posterior distributions for the unknown parameters and missing data variables. As
was discussed in Section 2.1.2.5 (presented by Equation 2.13), this is done by con-
ditioning on the observed data and prior information and iteratively updating one
parameter and missing value at a time. Parameters are continuously updated by
conditioning on the priors and observed data, but also on the values from each
previous iteration.
In other words, if θi is a vector of all unknown parameters and missing
data at iteration i, where i = 1, 2, ..., I, then the parameters to be estimated, in-
cluding the missing data, are separated into D groups (d = 1, 2, ..., D such that
θi = {θ1i, ...,θDI}) of unknown variables that share the same prior distributions
(Asparouhov & Muthén, 2010; S.-Y. Lee, 2007). At each iteration, the terms within
θi are updated by conditioning on the previous instance of all other elements of
θi, the observed data (Yobs) and the specified prior distribution. For example, for
iteration i, each element in θi is updated following the sequence:
[θ1i|θ2(i−1), ...,θ(D)(i−1),Yobs,Pr(θ)]
[θ2i|θ1i,θ3(i−1), ...,θ(D)(i−1),Yobs,Pr(θ)]
(2.38)
...
[θDi|θ1I , ...,θ(D−1)(i−1),Yobs,Pr(θ)],
where Pr(θ) is the prior distribution.
Specifically when addressing for any missing data, missing data values are
36
updated one at a time. Given the joint posterior distribution:
Pr(Y ∗mis,Ω,Z,π,θ |Yobs)
where θ∗ contains mean and variance parameters, Ymis is the collection of missing
data, Ω is the matrix of latent variables, Z = (z1, ...,zi) is the matrix of associ-
ation/disassociation variables, π is the vector of proportion variables, at iteration
(j)
(j + 1) with current observations of Ω(j),Z(j),π(j),θ∗(j),Ymis, the sampler iterates
the following steps:
(j+1)
1. Step a: Generate (Y ,Ω(j+1),Z(j+1)mis ) from Pr(Ymis,Ω,Z|Yobs,π,θ∗)
(j+1)
2. Step b: Generate (θ∗(j+1),π(j+1)) from Pr(θ∗,π|Y ,Y ,Ω(j+1),Z(j+1)obs mis )
(j+1)
Step (a) can be further broken down. Specifically, Ymiss is generated from
Pr(Y ∗(j) (j)miss|θ ,π ,Ω(j+1),Z(j+1),Yobs),
where:
∏n
Pr(Y |θ∗(j),π(j)miss ,Ω(j+1),Z(j+1),Y ) = Pr(y ∗obs missi|θ ,ω, wi)
i=1
and
Pr(ymissi|θ∗
D
,ω, wi) = N(µmiss,(ik) + Λmiss,(ik)ηik,Ψmiss,(ik)).
Here, µmiss,(ik) is a p× 1 mean sub-vector of µk with any elements corresponding to
observed components deleted, and Λmiss,(ik) and Ψmis,(ik) are p× q sub-matrices of
37
Λk and Ψk, respectively, with observed components deleted (S.-Y. Lee, 2007, page
373). Similar to ML estimation, if the missingness is under the assumption of MAR,
then the missingness mechanism is ignorable and should not be responsible for the
bias of the parameter estimates (Little & Rubin, 2002).
Typically, several replications, or chains of the sampling are created in order
to assess convergence. This is easily done by varying starting values and seeds. The
separate chains and iterations continue until convergence can be determined which
is assessed using the potential scale reduction factor (PSR; Gelman et al., 1995),
sometimes together with graphical analysis of the chains. For multiple chains, C,
where C is an integer such that C > 1, the PSR is a function of the within-chain
variation, or
1 ∑C 1 ∑I
W = (θ − θ̄ )2ij .j , (2.39)
C I
j=1 i=1
relative to between-chain variation, or
1 ∑C
B = (θ̄.j − θ̄..), (2.40)
C − 1
j=1
∑
wh∑ere θij is the value of θ at iteration i in chain j, θ̄ =
1 I
.j i=1 θij and θ̄.. =I
1 I θ̄
C j=1 .j
. Then the PSR is obtained by the equation
√
W +B
PSR = , (2.41)
W
where convergence is determined by PSR values close to 1 or below 1.1 (Gelman et
al., 1995). When this condition of convergence has been satisfied, the final posterior
38
distributions for all unknown variables are created from a segment of the chains,
typically towards the end of all the generated θis. The characteristics of the posterior
distributions for each parameter, such as the mean, are then used as estimates.
The prior distribution for each parameter is usually specified by the researcher
in order to add a level of certainty about the parameter. Priors can be specified to
be noninformative when there is great amount of uncertainty and little information
about a certain parameter. Alternatively, they can be specified to be informative
when there is a high degree of certainty and information. For example, a researcher
might use information from previous studies in order to incorporate a certain amount
of information into the prior.
The choice of priors is an important aspect of estimating GMM parameters,
especially when sample sizes per class are small or when classes are not well separated
(Depaoli, 2013; Tueller & Lubke, 2010). The use of conjugate priors, or priors that
mathematically lead to the same distributional family as the posterior distribution,
is common (Asparouhov & Muthén, 2010; S.-Y. Lee, 2007). The Gibbs sampler in the
context of GMMs involves sampling values for the class proportions, growth factor
means, growth factor variances, and residual variances. In the presence of missing
data, additional sampling is done for each missing data value (Distribution ??). The
appropriate priors for these unknown parameters are described next.
For the class proportion parameters, a Dirichlet distribution (Dir) is an ap-
propriate conjugate prior distribution. That is,
πk ∼ Dir(δ1, ..., δK) (2.42)
39
where the δ1....δK are the hyperparameters, or the parameters that describe the prob-
ability distribution. For this particular distribution, the level of confidence about the
class proportion parameters is specified through the δ terms, which convey the level
of information and confidence about the class proportions.
A normal distribution is an appropriate conjugate prior for the growth factor
means. That is,
αk ∼ N(µα ,Ψ) (2.43)k
where the hyperparameters µα and Ψ represent the expectation for the factork
means and the matrix of the factor variance/covariances, respectively. An appropri-
ate prior for the Ψ term here is the inverse Wishart (IW) distribution such that
Ψ ∼ IW (Ω, ν), (2.44)
where hyperparameter Ω is a p× p matrix of values greater than zero and hyperpa-
rameter ν is an integer value such that ν > p− 1.
For the residual variances, the appropriate conjugate prior is the inverse gamma
(IG) distribution such that if ωtt represents a single cell in the T ×T square residual
variance matrix, Θ, then
ωtt ∼ IG(awtt , bwtt), (2.45)
with hyperparameters awtt and bwtt controlling the shape and scale of the distribution
to manipulate the level of informativeness about the each element of the residual
variance matrix. In the context of GMMs, Bayesian methods with varying prior
40
specifications have been shown to recover parameter estimates quite well under
specific conditions such as certain sample sizes, class proportions, and specifications
of priors.
Next, a brief discussion on how all these different missing data handling meth-
ods that were presented have fared with GMMs is presented.
2.4 Handling Missing Data with Growth Mixture Models
2.4.1 Maximum Likelihood
Missing data in the growth mixture modeling context can be problematic if
not handled correctly, even when the missing data are assumed to be ignorable.
Using a multiple group growth model, Enders and Gottschall (2011) found that
parameter estimates attenuated group differences when the grouping variable was
not considered as part of the imputation, even when the missingness was MCAR
because any group differences in the means or variances were ignored during the
data imputation. Their suggestion when it came to finite mixture models was to use
FIML-EM instead, especially since the grouping variable is always unobserved for
such models.
As Sterba (2016) pointed out, the use of FIML-EM can be problematic when
covariate information is missing. Models that incorporate covariate information re-
quire the use of conditional models. These conditional models are typically fit using a
conditional likelihood that require complete covariate information. Otherwise, stan-
dard statistical software such as Mplus that have the capability to run such models
41
will listwise delete data with missing covariates by default. The purpose of the study
by Sterba (2016) was precisely to show how to use Mplus to prevent listwise deletion
of data by using a joint likelihood approach, whereby in addition to estimating the
parameters of the conditional model, the parameters of the covariates’ marginal dis-
tribution are also estimated. Without this workaround, however, if the missingness
is based on any of the covariates (MAR mechanism), then the listwise deletion of
these cases could potentially bias parameter estimates.
Although studies seem to indicate that FIML-EM is a viable solution for the
missing data problem with GMMs, some still consider MI as a possible solution for
addressing missing data in the context of finite mixtures—particularly when miss-
ingness occurs on covariate information. While the literature on this topic is very
limited, a considerable amount of research has been put forth in the past 20 years
by a select few who have suggested using a two-stage multiple imputation approach
(Harel, 2007; Harel & Schafer, 2009; Rubin, 2003; Shen, 2000). The two-stage multi-
ple imputation approach, which will be discussed in Section 2.4.4, considers the class
variable to be a second stage of the imputation process. The studies that consider
the two-stage imputation vouch for the method under conditions of MAR. However,
not much research has been conducted in this particular area and none of the sim-
ulation studies on this approach considered different conditions that are typically
investigated in the context of finite mixture models.
In addition to using FIML-EM and two-stage imputation, Bayesian methods
have been explored in the area of growth mixture modeling, and some studies have
also considered methods for addressing missing data in these contexts. These studies
42
and their findings are presented next.
2.4.2 Bayesian Methods
In the GMM context, missing data can be handled using a Gibbs sampler,
as was explained in Section 2.3.1.2. Using a Bayesian approach to growth mixture
models has been shown to produce favorable results. In a simulation study by Depaoli
(2013), parameter recovery for a three class growth mixture model was investigated
under varying degrees of class separation, sample size, class proportion, and method
of analysis (FIML vs. fully Bayesian approach using different priors). The study
showed that a Bayesian framework with informative priors produced less biased
estimates of the means (slopes and intercepts) than when using maximum likelihood
estimation, particularly when class separations were larger (Mahalanobis distances
greater than 1.0) and samples consisted of as little as 150 individuals. Estimates of
the variance parameters, however, were not well recovered under all conditions. One
limitation of the study, however, was that missing data was not considered.
Investigations of GMMs under varying missing data conditions have also been
conducted. In a simulation study, Lu et al. (2011) investigated the use of a fully
Bayesian approach for estimating GMMs when different types of missingness were
introduced in the analysis. Considering sample size, class probability, rate of miss-
ing, and missing data mechanisms MNAR and MCAR, the study concluded that
Bayesian estimation generally performed well but only under sample sizes conditions
of 1000 or more. In addition, their missing rates were in the range of 5% to 20%,
43
and sample sizes below 500 were not considered, although they made the claim that
a Bayesian method is useful as long as missing data rates stay below 5% for small
sample sizes. This claim however, was not substantiated by any past research nor
as part of their simulation study.
Other studies have also investigated the use of Bayesian methods with missing
data, but none have considered different missing data mechanisms in the context of
GMMs with various conditions like the two aforementioned studies. For example, S.-
Y. Lee and Tang (2006) investigated the use Gibbs sampling for parameter recovery
of a factor analytic model with non-ignorable missing data, which was essentially a
replication of work by S.-Y. Lee and Song (2003), with the exception that MNAR
missingness was considered. In particular, their analysis focused on the choice of
priors for these models, concluding that the choice of priors did not make much
difference. The conditions they manipulated, however, were very narrow in scope,
and more importantly did not consider the existence of multiple latent classes.
In general, while there exist studies which have considered the use of Bayesian
methods for GMMs, none have conducted a systematic investigation on the use of
Bayesian methods with missing data under varying conditions that are pertinent to
the study of both GMMs and missing data.
2.4.3 Single-Stage Multiple Imputation
Aside from the study by Enders and Gottschall (2011), which cautioned against
using MI for general mixture models, the use of single-stage MI has not been thor-
44
oughly investigated. A few studies have, however, investigated the use of MI on lon-
gitudinal models. For example, Huque, Carlin, Simpson, and Lee (2018) investigated
the use of different JM and FCS methods specifically designed to accommodate for
missing time-dependent covariates measured at irregular time intervals. Their study
found that the standard JM and FCS methods performed adequately for longitudi-
nal studies, and that they were only really useful when data were irregularly spaced.
Schafer and Yucel (2002) and Asparouhov and Muthén (2010) also introduced nu-
anced versions of the JM imputation method, specifically designed for mixed effects
multilevel models. van Buuren (2011) later introduced an analogous method using
FCS and Mistler and Enders (2017) also compared nuanced versions of the JM and
FCS methods specifically for multilevel modeling. Although such studies have con-
sidered these missing data handling methods in the context of growth models, none
conducted a systematic investigation of the performance of MI in the context of
growth mixture models.
2.4.4 Two-Stage Multiple Imputation
Another suggested method for addressing the missing data problem in GMMs
has been brought to attention by Harel (2003) and Harel and Schafer (2009), who
proposed using a two-stage imputation method. The two-stage imputation method is
an extension of the nested MI approach proposed by Shen (2000) and then applied in
survey data by Rubin (2003). This method essentially groups the missing data, Ymis,
into qualitatively different types, allowing for a separate assessment of contribution
45
of each type of missing data to the overall uncertainty. The crux of the two-stage
imputation is in the assumption of extended ignorability as described in detail by
Harel (2003) and Harel and Schafer (2009). Recalling from the ignorability conditions
presented by Equation 2.1, if the missing data, Ymis, is regarded as the decomposition
of two types of missing data, or Y A Bmis = (Ymis, Ymis), and the missing data indicator
I+ represents the missingness indicator matrix for the two types of missing data,
then the missingness is regarded to be MAR+ if
Pr(I+|Y ,Y A ,Y B ,φ+ +obs miss miss ) = Pr(I |Yobs,φ+). (2.46)
According to Rubin (1987), under assumptions of traditional MAR, imputations
for any missing data can be drawn from Pr(Ymiss|Yobs), meaning that I can be
ignored. Analogously, Harel (2003) shows that I+ can be ignored and the following
relationships become true:
Pr(Y Amiss|Yobs, I+) = Pr(Y Amiss|Yobs), (2.47)
and
Pr(Y B |Y , I+) = Pr(Y B Amiss obs miss|Yobs,Ymiss). (2.48)
Proofs of these relationships are available in (Harel, 2003, p. 37). A weaker condi-
tion based on these relationships that are pertinent to mixture models, where the
missingness in the manifest variables might be conditioned on the completely unob-
served latent class variable (e.g., missing manifest variable, Y Bmiss, is dependent on
46
completely unobserved clatent class variable, Y Amiss), is the conditional extended ig-
norability condition. Harel defines a conditional extended ignorability assumption,
or CMAR+, where I+ is conditional on the observed data, the second group of
missing data, and the parameter relating these variables. That is,
Pr(I+|Yobs,Y A Bmiss,Ymiss,φ+) = Pr(I+|Yobs,Y A +miss,φ ). (2.49)
If the CMAR+ condition can be assumed, then the following relationship is also
true:
Pr(Y B A + B Amiss|Yobs,Ymiss, I ) = Pr(Ymiss|Yobs,Ymiss), (2.50)
which, under Rubin’s original assumptions of MAR, by which imputations can be
drawn from Pr(Ymiss|Yobs), imputations for a second set of missing data can be drawn
from Pr(Y Bmiss|Yobs,Y Amiss) as long as the mechanism that made Y Bmiss is not related
to Y Bmiss. For example, if Y
A
mis represents the missing class indicators (completely
unobserved) and Y Bmis represents the missing values in the manifest data, if the
mechanism that created Y Bmis is not dependant on Y
A B
mis or Ymis, then it is considered
MAR+. If the mechanism that created Y Bmis depends only on Y
A
mis and possibly
other observed manifest variables, a possibility in latent class models where the
missingness can depend on the latent class, then this situation constitutes CMAR+
according to Harel (2003, p. 51).
In the context of a model with latent classes, Harel and Schafer (2009) regard
the latent indicators of class as MCAR, and require that missing data in the manifest
47
variables be at worst MAR. To reiterate, however, although the manifest variables
may be conditional on the completely unobserved latent class, which by traditional
definition would constitute MNAR, it is considered to be CMAR+, and therefore
ignorable. According to this, MAR can be assumed and MI should work at both
stages, first when latent classes are imputed, and then after when the manifest
variables are imputed for each imputed latent class. If the manifest variables for
some reason were considered to be MNAR, then the MNAR mechanism would need
to be addressed as well.
In the two-stage MI method, m sets of Y Amis, or class indicators, are first gen-
erated and retained from an MCMC chain in the I-step. That is,
A(j)
Ymis ∼ Pr(Y Amis|Yobs), (2.51)
where j = 1, 2, ...,m. Then, for each m sets of the latent classes, assuming that the
latent classes are fixed as known, n− 1 sets of Y Bmis are generated. That is,
B(j,k)
Ymis ∼ Pr(Y Bmis|
A(j)
Yobs, Ymis ), (2.52)
where k = 1, 2, ..., n. Here, there n − 1 sets because the first set is generated from
first stage of imputations. This will result in a total of mn complete datasets that
are to be analyzed.
After analysis, the resulting mn parameter estimates are averaged to obtain
48
the final estimates. That is,
∑m ∑n1
θ̄.. = θ̂(j,k), (2.53)
mn
j=1 k=1
where θ̂(j,k) is the estimated parameter for stage 1 imputed data set j, for stage 2
imputed data set k.
Standard errors are combined using a nuanced version of Rubin’s rules derived
by Shen (2000). Analogous to Rubin’s rules, Shen’s rules formulate the resulting
standard errors, SE2stgT , to be a function of the within-imputation variability, V
2stg
w ,
the between-imputation variability, V 2stgb , and a third component: the complete-
data variance, Û (j,k)... Specifically, if U is the estimated squared standard error of
the parameter estimate, then
√
SE2stg − 1 2stg 1= Û + (1 )V + (1 + )V 2stgT .. w b , (2.54)n m
where the complete-data variance is
1 ∑m ∑n
Û.. = U
(j,k), (2.55)
mn
j=1 k=1
the between-block imputation variance is
∑m
V 2stg
1
b = (θ̄j. − θ̄
2
− ..
) , (2.56)
m 1
j=1
49
and the within-block imputation variance is
∑m ∑n
V 2stg
1 1
= (θ̂ − θ̄ )2w jk j. . (2.57)m n− 1
j=1 k=1
The two-stage imputation method has been mainly applied to empirical data
in situations involving missing data. For example, it has been applied when the
missing data can be thought of as being of distinct types of missingness, such as
surveys containing planned and unplanned missing data (Graham, Taylor, & Cum-
sille, 2001), longitudinal models with ignorable intermittent missing data points
(Hedeker & Gibbons, 1997) and non-ignorable dropout missing (Harel, 2003). The
method has also been applied with latent variable models that involve analysis with
missing data and unobserved latent variables such as latent class regression models
(Harel, Chung, & Miglioretti, 2013) and latent class contingency tables Winship,
Mare, and Warren (2002).
Despite finding studies that used the two-stage imputation method to address
missing data, there were no studies that tested the method through simulation on
different conditions of missingness or GMMs. Thus, while the method has been
theoretically justified and applied in several different scenarios, the extent to which
the method will work and how well it will perform relative to other missing data
handling methods such as FIML, single imputation, or fully Bayesian methods is
unclear.
50
2.4.5 Summary: Comparison of Methods
As was mentioned previously, Enders (2010) and Enders and Gottschall (2011)
have strongly cautioned against using MI and have suggested using FIML-EM for
finite mixtures in the presence of missing data. While this is the general recommen-
dation, aside from the study by Enders and Gottschall (2011) and Depaoli (2013),
there were no other studies that systematically investigated the performance for
FIML-EM on GMMs. Lu et al. (2011) demonstrated evidence through simulation
that fully Bayesian methods are another viable option for handling missing data in
the context of GMMs. The research was limited however, because the conditions
tested were limited. Depaoli (2013) also investigated the estimation of GMMs under
a Bayesian framework with varying conditions, but did not investigate how missing
data would effect the results. The two-stage imputation method seems to be a viable
solution to the single multiple imputation problem and has been used for several
missing data problems involving latent classes, but none of the studies investigated
the method on GMMs with missing data, whether by simulation or empirical data
analysis.
In light of of these studies, there is an evident gap in the literature regarding
GMMs and missing data. Therefore, the purpose of the current study is to fill this
gap by testing how the different methods for addressing missing data, as suggested
by the current available research studies, perform on a growth mixture model.
51
2.5 Research Goals
The main purpose of this study is to extend the use of these proposed methods:
FIML-EM, fully Bayesian using Gibbs sampling, and a two-stage MI approach,
to handle missing data in view of modeling conditions that are common in the
growth mixture modeling and missing data literature. In many GMM simulation
studies, these conditions include sample size, number of classes, and latent class
separation. In many missing data studies, these conditions include rate of missing
and missingness mechanism. Other important considerations that have been the
focus of studies on GMMs such as class enumeration and fit evaluation criteria (see,
e.g., Bauer, 2007; Bauer & Curran, 2003) will not be considered, which are considered
one of many limitation of the study. These factors will not be explored because the
intention of the current study is to focus on the effects of missingness on parameter
estimation. Specifically, the current research will compare the performance of list-
wise deletion, FIML, single-stage MI, two-stage MI, and a fully Bayesian approach
for estimating growth mixture models when there is missing data at various occasions
under varying conditions commonly tested in the GMM literature as well as different
missing data conditions that may have an influence on how these methods perform.
The specific research questions that this study would like to address are as
follows:
1. For what rate of missing data and class separations will the GMM produce
biased parameter estimates when utilizing the different missing data handling
methods under conditions of MAR (or CMAR+) missingness and MCAR miss-
52
ingness?
2. When the goal is to classify individuals into groups, for which method will
the growth model provide the highest percentage of classification accuracy for
varying rates of missing data, sample size, class separation and missingness
mechanism?
3. Are there any advantages to using any of the missing data handling methods
in the context of GMMs? Particularly, is using a more complicated method
worth the extra effort for producing unbiased parameter estimates?
4. In addition to bias, which missing data handling methods produce standard
error estimates that are most accurate?
In order to derive at answers to these questions, a Monte Carlo simulation study
will be conducted, details of which will be laid out in Chapter 3. Chapter 4 will then
focus on the results from the simulation study outlined in Chapter 3, and Chapter
5 will conclude the study with a discussion on the results, limitations of the study,
and possible future extensions to the current study.
53
Chapter 3: Methods
A Monte Carlo simulation method will be used to compare the differences
between the varying methods available for handling missing data in the context of
growth mixture models. Monte Carlo simulation involves the comparison of differ-
ent methods of data analysis using known population parameters as the basis of
comparison. Conditions that are known to impact the parameter recovery of the
population parameters include: missingness mechanism, missing rate, sample size,
class proportion, and class separation. The simulation will involving the following
steps:
Step 1 : A series of complete datasets will be generated using a set of known pa-
rameters. The desired sample size, class proportion, and class separation
conditions will be achieved at this step.
Step 2 : Each dataset will be made incomplete, a process called data amputation, at
various percentages of missingness, following rules of MNAR (or CMAR+)
and MCAR missing data mechanisms.
Step 3 : The incomplete datasets will be analyzed using six missing data handling
approaches: Listwise deletion, FIML-EM, MI, two-stage MI, and Bayesian
54
via Gibbs sampling.
Step 4 : Parameter estimates produced from each approach for all conditions will be
compared with the parameters from a complete data analysis (analysis on
the data prior to incorporating any missing data conditions) and evaluated
on the relative bias of parameter estimates, accuracy of the standard errors
of the estimates, classification accuracy, and convergence rates.
Each section in this chapter will provide details on each of these four steps. The
chapter will conclude with a summary of the expected outcomes from the simulation.
3.1 Step 1: Data Generation
The first part of the data simulation will involve generating data from a two-
class growth model with random intercepts and slopes. This will be done by param-
eterizing a two-group factor analytic model such that the loadings will be fixed to
model linear growth, as was described in Section 2.2, and varying the means of the
latent growth factors for the two classes to achieve varying degrees of class separa-
tion. Latent growth factor variances and covariances will be invariant for the two
classes. The degree of separation between the two classes will be further discussed
in Section 3.1.1. Although simulation studies in the past have investigated other is-
sues with more than two classes, this number was chosen in order to keep the study
focused on the missing data handling issue.
The number of timepoints or repeated measures will also be based on previous
research that have investigated growth modeling and GMMs through simulation. A
55
general recommendation by Willett, Singer, and Martin (1998) and Vickers (2003)
is to use at least three timepoints for linear growth models, although B. O. Muthén
and Curran (1997) showed how the required sample size to achieve the same amount
of power sharply dropped when going from three to four timepoints, with a mini-
mal drop observed after adding more than four timepoints. Other simulation studies
exploring the use of GMMs by Monte Carlo simulation also used at least four time-
points (see e.g., Depaoli, 2013; Lu et al., 2011; McNeish & Harring, 2017; Nylund
et al., 2007; Tofighi & Enders, 2007). For this reason, the current study will use a
population model with four timepoints.
Sample sizes for each class and separation between classes will be manipulated
during the data generation. Details on how these conditions will be chosen and
manipulated are presented next.
3.1.1 Class Separation
Latent class separation has been shown to have a large effect on the estimation
accuracy of GMMs. For GMMs, class separation is often regarded as the amount
of overlap between latent class growth trajectories, where the greater the overlap
between classes, the closer the distance between classes, and the more difficult it is
to differentiate between latent classes. The specific conditions to manipulate class
separation will be based on previous studies that have investigated realistic measures
of separation typically found in empirical data.
Class separation can be defined in several different ways. For example, Tofighi
56
and Enders (2007) achieved varying levels of separation by specifying different
within-class variance parameters across latent classes. Other studies have defined
class separation through differences in the latent growth factors (Depaoli, 2013; Li,
Chen, Cui, & Liu, 2017; Lubke & Neale, 2006; Tueller & Lubke, 2010), although
Depaoli (2013) showed that this is equivalent to differences in the manifest repeated
measure variables. Following the suggestion of these studies, the current study will
generate data from subpopulations with different latent growth factors to achieve
varying degrees of class separation as defined by the Mahalanobis distance.
Assuming that the class variance/covariance matrices are equal, the Maha-
lanobis distance (MD; Mahalanobis, 1936) is a measure used to quantify class sep-
aration [although measures of distance when covariance matrices are not assumed
equal are also available (Anderson & Bahadur, 1962)]. For the current study, the
MD will be used as a measure of class separation, and as a result, it will be assumed
that the covariances of class growth parameters between classes are equal. The MD
is defined as follows:
[ ]
MD = (µ(1) − µ(2))′Σ−1 1/2(µ(1) − µ(2)) , (3.1)
where the µ terms represent the means of the growth factors (intercept means and
slope means of the two classes) and the Σ−1 represents the inverse of the common
variance/covariance matrix of the growth parameters for all classes. The MD dis-
tance can therefore be manipulated by varying the values of the population latent
growth intercept and slope means for each class. Previous studies have used MDs
57
as low as 0.5 to represent very poor separation, MDs as high as 2.0 to represent
high separation, and MDs of 1.0 and 1.5 to represent poor and moderate separation
(see,e.g. Depaoli, 2013; Lubke & Muthén, 2007; Lubke & Neale, 2006; Tueller &
Lubke, 2010). These same distances will be used for the current study.
In order to achieve the four MD conditions, the class 1 parameters will re-
main fixed while the class 2 parameters will be changed to achieve the desired MD.
The class 1 parameters will be fixed based on parameters that were used in Kaplan
(2002), which investigated trajectories of reading development among kindergartners
and first graders. These parameters were also used in a simulation study by Depaoli
(2013). Growth parameter variances and covariances, as well as time-specific resid-
uals will remain fixed. Population values that will be used for the data generation
for each distance condition are presented in Table 3.1. In addition to population
parameter values, the population class 1 and class 2 growth curves can be seen in
plotted in Figure 3.1.
58
Table 3.1
Data Generating Parameters for Varying MD Conditions
High Moderate Poor Very poor
Parameter separation separation separation separation
Means (MD ≈ 2.0) (MD ≈ 1.5) (MD ≈ 1.0) (MD ≈ 0.5)
Class 1 48.00 48.00 48.00 48.00
Intercept
Class 1 3.00 3.00 3.00 3.00
Slope
Class 2 40.82 43.10 45.66 47.34
Intercept
Class 2 4.00 4.00 4.00 3.60
Slope
[ ]
18.00
Growth parameter variances (Ψ) =
1.20 2.00
Time-specific residuals (Θ) = 15I4
Class 1 curve
MD = 0.5 (Very poor separation)
MD = 1.0 (Poor separation)
MD = 1.5 (Moderate separation)
MD = 2.0 (High separation)
0.0 0.5 1.0 1.5 2.0
x
Figure 3.1. True population class 1 and class 2 growth curves for varying
degrees of separation.
59
y
35 40 45 50 55
3.1.2 Sample Size
The sample sizes that will be investigated for the current study will come
from previous simulation studies. Simulation studies involving growth modeling have
demonstrated that sample size interacts closely with class separation and class pro-
portion to achieve varying degrees of successful parameter recovery (see, e.g. As-
parouhov & Muthén, 2010; Depaoli, 2013; Lubke & Neale, 2006; Tueller & Lubke,
2010). Generally, the findings from these studies suggest that lower sample sizes
require larger distances and closer to equal class proportions in order to properly
identify classes and achieve acceptable levels of bias. Most of these studies, however,
have not considered the impact of their results when there is missing data.
Depaoli (2013) used sample sizes of 150 and 800 with four levels of class sepa-
ration for linear growth mixture models and generally found better estimates with
sample sizes of 800 when separation was poor. The study by Tueller and Lubke
(2010) found that convergence and parameter estimates were consistently accept-
able when sample sizes were greater than 300 together with class separation greater
than MD = 1.5 and generally performed well with sample sizes of 100 but sepa-
ration greater than MD = 2.0 or sample sizes of 1000 with separation at MD =
0.5. Another similar study by Lubke and Neale (2006) investigated small sample
sizes, finding that sample sizes as small as 75 per class were needed even when the
distance between class was small (MD = 1.5) and as small as 25 per class were
needed for correct model detection when separation was large (MD = 3). In light of
these studies, given that similar MD distances will be used to generate data for the
60
different classes, total sample sizes of 100, 200, 500, and 1000 will be used for the
current study, and class proportions will be kept equal.
3.2 Step 2: Data Amputation
The second step following data generation will be to systematically delete
data, a process otherwise known as data amputation. The amputation process will
be carried out so that the desired missing rate and strength of correlation with
variables that cause the missingness can be achieved all whilst adhering to the
desired missingness mechanism.
3.2.1 MNAR Missingness (CMAR+)
To simulate a MNAR missingness mechanism, or the CMAR+ according to
Harel (2003), the probability that individual i will have data point yit at time t
missing will be calculated based on the following logistic function:
{ } exp(m)Pr Rit = 1 = . (3.2)
1 + exp(m)
Rit is the missingness indicator, where if Rit = 1 then the individual i’s observation
at time t is missing. Also, m = βZi + c, where Zi represents the unobserved class
variable, β is the coefficient used to control the relation between class and the
propensity to be missing at timepoint 1 and c will be used to control the rate of
missing. Presented in Table 3.2 are the coefficients that will be used to create the
desired rates of missing and correlation from the class variable.
61
Table 3.2
Coefficients for Creating Missing Data
miss. rate ρ β c
5% 0.25 4.0 2.0
10% 0.34 4.0 1.3
20% 0.36 2.0 0.5
30% 0.37 1.7 -0.1
40% 0.36 1.5 -0.6
Note. The correlation term ρ here is an approximation
of the point-biserial correlation typically used to quantify
the relationship between a continuous variable (missing-
ness at timepoint 1) and a dichotomous variable (class).
3.2.2 MCAR Missingness
For each outcome variable (outcome at each timepoint), a randomly drawn
value from 0 to 1 from a uniform distribution will be compared against a specified
threshold that will create an overall rate of missing. For example, to achieve a 20%
rate of missing, the threshold will be 0.20, where if the randomly generated value
is below 0.20, the value will be amputated and otherwise left as is (not missing).
In this way, the missingness is completely haphazard and adheres to the rules of
MCAR missingness.
Finally, the missing rate for each sample needs to be specified. Since the pur-
pose of this study is to find the tolerable amount of missing data before it is no
longer possible to use GMMs, low percentages of missing data from 5% up to 40%
will be evaluated. Specifically, percentages of 5, 10, 20, 30, and up to 40% will be
investigated.
62
3.3 Step 3: Missing Data Handling Approaches
After the data has been amputated, the six different missing data handling
approaches described in Chapter 2 will be implemented to analyze the data. To
summarize, the methods that will be tested are:
1. Listwise deletion (LD)
2. Full-Information Maximum likelihood (FI)
3. Multiple imputation (MI)
4. Two-stage multiple imputation (2M)
5. Fully Bayesian approach (FB)
There are several potential issues that may arise during the simulation study
that must be addressed. For example, label switching is a common problem when
conducting MCMC simulations dealing with GMMs. This issue is especially perti-
nent to any methods using Bayesian methods. Convergence is another issue that
must be addressed when dealing with GMMs. These issues will be the focus of the
following sub-sections.
3.3.1 Label Switching
Previous research indicates that label-switching can be problematic for finite
mixture models (Tueller, Drotar, & Lubke, 2011). Label switching occurs when
63
the order of the latent classes arbitrarily ”switch” during estimation. When us-
ing Bayesian methods, including multiple imputation, there are two different places
where the label switching can occur. First, label switching can occur arbitrarily dur-
ing the MCMC chain (i.e., within-chain label switching). Second, it can occur across
chains (i.e., between-chain label switching). In both these situations, label switching
can distort the final estimates and complicate the assessment of convergence.
The issue of label switching has been studied by Celeux, Hurn, and Robert
(2000), and several ways to prevent it have been suggested. A common approach
that is used to avoid within-chain label switching, which will be used in this study,
is to constrain the prior for the intercept of one class to be greater than the intercept
of the other. This will then prevent the Gibbs sampler from switching mid-chain.
Between-chain label switching will be avoided by using a single chain. It should be
noted that when a single chain is used, convergence can be assessed by taking the
PSR of the values towards the end of the chain (Asparouhov & Muthén, 2010).
For non-Bayesian methods such as FIML-EM, label switching is possible be-
tween replications. To avoid this issue, means of the growth factors will be con-
strained so that the mean for class 1 is always greater than the mean for class
2. This method is commonly used in simulation studies involving GMMs to avoid
between replication label switching.
64
3.3.2 Convergence Issues
Simulation studies involving finite mixture models are often susceptible to con-
vergence issues, whether by FIML-EM or MCMC methods. For example, Tueller and
Lubke (2010) explained that nonconvergence can be a cause of several factors such as
model mispecification, model nonidentification, insufficient starting values, or char-
acteristics of the data such as multivariate outliers. Local maxima solutions also
pose a threat to solutions produced by FIML-EM methods. For Bayesian MCMC
methods, label switching within and between chains will also cause problems in the
assessment of convergence using the PSR as well as problems with the estimation
of parameters.
For FIML-EM methods, to avoid local maxima solutions, Tueller and Lubke
(2010) recommend replicating the final likelihood by varying the starting values and
setting a maximum number of replicated likelihoods. In order to implement this
suggestion, 100 replications with perturbed starting values will be used, and a min-
imum criteria of 10 optimized likelihoods will be used as a way to show convergence
to a global maximum.
For the MCMC methods, convergence will be assessed using the PSR factor,
where a PSR of 1 with a tolerance of ±0.05 will be considered converged. As was
discussed in the previous section, label switching will be accounted for by specifying
certain constraints within a chain and creating one single chain to avoid any between
chain label switching.
Low convergence rates can compromise the results of the simulation study
65
because a low number of simulation replications will not represent a true sampling
distribution of the population parameters. It is expected that some conditions, such
as ones with low sample sizes, high percentages of missing data, and low separation
will experience higher rates of non-convergence. Such issues will be identified so that
any replications with convergence problems will be skipped until enough replications
are collected to satisfy the desired number of replications. Also, the total number of
replications needed to satisfy the minimum number of replications will be recorded in
order to report on an overall convergence rate. These convergence rates will provide
evidence of the findings from the simulation study.
3.3.3 Number of Imputations
For the multiple imputation approach, there are several specifications that
must be made. One of these is the choice of number of imputations. It has been
suggested that the number imputations for single-stage MI be set at M = 40 in order
to avoid any power issues (Graham, 2009). McGinniss and Harel (2016) showed that
under MAR conditions and smaller rates of missing information (less than 40%),
increasing the number of imputations for multiple-stage MI models did not produce
significantly different bias or coverage rates among varying combinations of number
of imputations at each stage, especially for problems involving estimation of point
estimates and variances. Therefore, they recommended using as few as N = 10
imputations in the first stage and M = 2 imputations in the second stage. To be
conservative, for the two-stage MI approach, N = 20 imputations will be used for
66
the first stage and M = 4 imputations will be used for the second stage for a total
of 80 imputations per dataset.
3.3.4 Choice of Priors
Using a Bayesian approach in the context of GMMs requires a careful consider-
ation of the types of priors that will be used, especially when sample sizes are small
since the influence of priors becomes greater as sample size decreases. In certain
situations, variances of growth factors are especially susceptible to prior misspecifi-
cation (Depaoli, 2013; McNeish, 2016). To investigate the kind of priors that should
be used for the growth model in the current simulation study, a preliminary simula-
tion will be conducted on the full data (without any of the missing data conditions)
to see how different priors will affect variance parameter estimation. The simulation
will involve comparing maximum likelihood estimation using the EM algorithm, and
a Bayesian MCMC method using prior specifications for the variance parameters as
suggested by McNeish (2016) for latent growth models. The prior specifications that
will be compared are as follows:
1. The default Mplus improper inverse Wishart distribution prior for the variance
parameters, or IW (0, d = −p − 1), where the first argument represents a
scale matrix, the second argument represents the degrees of freedom, and p
represents the number of elements in the diagonal of the factor covariance
matrix. For the current study, the prior(s [for th]e ele)ments of the variance0 0
covariance matrix will be specified as IW ,−3 .
0 0
67
2. An inverse Wishart distribution for the variance covariance matrix, where
the first argument is the identity matrix, I, and the second argument is p.
This is called the proper inverse prior, which was previously recommended
by Chen (2011), Lunn, Jackson, Best, Thomas, and Spiegelhalter (2013), and
Asparouhov and Muthén (2010). In other words, a possible proper prior for the
variance covariance matrix is IW (pI, p)1. For this particular(s[tudy,]the)prior2 0
for the variance covariance structure will be specified as IW , 2 .
0 2
3. A data driven prior that substitutes the variance covariance elements of the
IW distribution with those estimated from a standard FIML-EM run. In other
words, the elements of the IW distr(ibutio[n wil]l dep)end on the estimates of thea b
variance covariance matrix, or IW 2 × , 2 , where a, b, c are the esti-
b c
mates of the intercept variance, intercept slope covariance, and slope variance,
respectively.
The results from the complete data study (no missing data conditions) are
summarized in Table 3.3, where the aggregated relative bias values of all the pa-
rameters for each method are presented. The Table shows that the noninformative,
inverse Wishart priors (IW) for the variances showed the lowest values of relative
bias. None of the estimates of the variance parameters were extremely biased. Al-
though these findings were inconsistent with suggestions from past studies, for this
particular situation, the improper inverse priors seemed to be the best choice to
use for the missing data handling methods that would be compared in the main
1Per Lunn et al. (2013), who advise scaling I by p
68
simulation study.
Table 3.3
Mean Relative Bias Values by Analysis Method (Complete Data
Methods)
Variable MLEM IW PW DD
Class 1 Intercept 0.04 0.00 0.02 0.00
Class 2 Intercept -0.01 0.00 -0.01 -0.11
Class 1 Slope -0.25 -0.04 -0.07 -0.06
Class 2 Slope 0.14 0.03 0.06 0.05
Intercept Variance -0.05 0.08 -0.19 0.01
Slope Variance -0.19 -0.04 -0.19 -0.14
Int. Slope Covariance -0.20 -0.06 0.64 0.06
Residual Variance -0.01 0.00 0.01 0.00
Proportion -0.11 -0.02 -0.02 0.09
Note. MLEM = full data ML, IW = full data Bayesian with improper IW
priors, PW = full data Bayesian with proper IW priors, DD = full data
Bayesian with data-driven priors. Values in bold indicate problematic values
according to several studies (Curran, West, & Finch, 1996; Flora & Curran,
2004; Kaplan, 1989).
3.4 Summary of Manipulated Conditions
Table 3.4 shows a summary of the conditions that will be fixed and manipu-
lated. Fully crossed, a total of 800 combinations of manipulated conditions will be
evaluated. For each sample size and degree of separation condition, an additional 32
conditions will be evaluated using the complete data using the ML-EM method or
a fully Bayesian approach with improper IW priors. The results from the complete
data analysis will serve as a basis for comparison for the missing data methods.
69
Table 3.4
Overview of Simulation Conditions
Factors Levels
Repeated measures 4 timepoints
Latent classes 2 classes
Sample size 100, 200, 500, 1000
Degree of separation (MD) 0.5, 1.0, 1.5, 2.0
Missing rate 5, 10, 20, 30, 40
Missing mechanism MCAR, CMAR+
Missing data handling method LD, FI, MI, FB, 2M
Note. LD = listwise deletion, FI = full-information maximum likelihood (EM),
MI = single-stage multiple imputation, FB = fully Bayesian, 2M = two-stage
multiple imputation, MCAR = missing completely at random, MNAR = miss-
ing not at random, CMAR+ = conditional extended missing at random.
For each sample size and class separation condition, datasets will be created
in order to achieve a sufficient number of replications that will achieve stable av-
erage bias values. During a preliminary simulation, a running mean of the most
egregious condition (40% missing, listwise deletion, with a sample size of 100 and
very poor separation) was examined, with the idea that the parameter estimates
from this condition would perhaps take the longest to stabilize. Both the mean and
variance estimates were monitored. As can be seen in Figure 3.2, 100 replications
was sufficient for the means and variance estimates to become stable. Therefore, it
was determined that 100 replications would be used for every cell in the simulation.
70
class 1
class 2
0 20 40 60 80 100
Iteration
0 20 40 60 80 100
Iteration
Figure 3.2. Running means of the slope mean (top) and slope variance
(bottom) parameter estimates for the listwise deletion method. Horizon-
tal lines are provided to show the true parameter values.
Class proportions, or the number of cases in each latent class, is another im-
portant manipulable factor that can have an effect on the estimation of GMMs. For
example, Tueller and Lubke (2010) and Tofighi and Enders (2007) found that classes
with relatively small sample sizes decreased the accuracy of identifying the classes.
Although unrealistic, the proportion of each latent class will be kept the same in
order to keep the study focused on the missing data handling techniques and other
conditions.
All data simulation and manipulation will be conducted using Mplus version
71
Variance (slope) Mean (slope)
2 3 4 5 2 3 4 5
8.1 (L. K. Muthén & Muthén, 1998) and R (Team, 2008). To facilitate the manipu-
lation of all the conditions, R will be used to call Mplus, manipulate the codes, and
collect the necessary output that will be used to evaluate the methods. The criteria
that will be used to make the comparisons are discussed next.
3.5 Step 4: Evaluation Criteria
Several criteria will be considered when evaluating the results of the follow-
ing simulation study. Convergence rates, accuracy rates, parameter estimate relative
bias, and parameter estimate standard errors will be considered. Because the main
interest of the study is to investigate how the different missing data handling meth-
ods perform under certain missing data conditions, the overall performance of the
methods will be evaluated first. This will be done by comparing the means and
medians of the absolute relative bias and standard error bias rates of all the param-
eter estimates combined, grouped by method of analysis. Then, to pinpoint specific
differences across parameters, a factorial ANOVA will be conducted to find any ma-
nipulated conditions or interactions between conditions that show any meaningful
differences across specific parameter estimates that account for differences in relative
bias and/or standard error ratios.
3.5.1 Convergence Rates
It is expected that some conditions, such as ones with low sample sizes, high
percentages of missing data, and low separation will experience higher rates of non-
72
convergence. Some replications may converge but produce nonsensical negative vari-
ance estimates. Solutions from such replications can taint the overall results of a
particular cell, so they will not be included as a valid replication. The final conver-
gence rate will be calculated as the total number of fully-converged datasets divided
by the total number of replications that were needed to get to the desired number
of 100 replications. The susceptibility of non-converge will be higher for the two-
stage imputation because any of the imputed datasets at each stage may experience
convergence issues. To address this, extra imputations will be created at each stage
and filtered through to obtain imputed sets that have converged and have produced
sensical solutions (i.e., positive estimates of the variance).
3.5.2 Accuracy Rates
Classification accuracy will be evaluated based on the percentage of correct and
incorrect classifications. High classification accuracy will mean that a high percent-
age of individuals were classified into the true class. Most likely class memberships
for each individual can be obtained from the posterior probability after estimation.
These class assignments will then be compared to the individual’s true class.
3.5.3 Relative Bias and Absolute Bias
For each manipulated condition, parameter recovery of the fixed and random
effects for the set number of replications will be assessed using relative bias. Relative
bias is computed as the difference between the mean of the parameter estimates and
73
the true value of the parameter divided by the true value of the parameter. That is,
(∑reps )
b=1 θ̂b − θ
reps 0
θ̂RB = ,
θ0
where θ̂b is the parameter estimate, θ0 is the population value for θ and reps is the
number of replications. Relative bias is a percentage, and ideally will be close to
zero. In many studies, a cut-off of 10% relative bias is considered substantial (see,
e.g. Curran et al., 1996; Flora & Curran, 2004; Kaplan, 1989). The same criteria
will be used to flag unacceptable levels of bias.
In addition to relative bias, absolute bias will be considered. Absolute bias
will be calculated as the absolute value of the relative bias, which will be used to
assess the overall bias observed across parameter estimates for each data analysis
method. These absolute bias measures will then be aggregated using means in order
to observe the overall bias for each analysis method. Medians will also be computed
in order to assess the bias without any outliers.
3.5.4 Standard Error Bias
Bias of the standard errors of the estimates, or the analogous posterior stan-
dard deviations of the estimates for the Bayesian method, will be evaluated by the
ratio between the square root of the mean variance of θ̂ and the standard deviation
of the 100 parameter estimates. That is,
SE(θ̂)
SE Bias = ,
SD1(θ̂)
74
where √√√
SE(θ̂) = √ ∑100100−1 [SEb(θ̂)]2,
√ b=1
and SD1(θ̂) = SD(θ̂) · 99/100, which is the corrected sample standard deviation
of the 100 parameter estimates. Standard errors are considered to be accurate when
this ratio is close to 1. Standard error bias values greater than 1 indicate overesti-
mated standard errors, which lead to Type II errors, and values less than 1 indicate
underestimated errors, which lead to Type I errors.
3.5.5 Factorial ANOVA
In order to pinpoint any effects of the manipulated conditions on the parameter
estimates, a split-plot design factorial ANOVA will be conducted. In this particu-
lar design, the missing data handling method will be regarded as a within-subject
factor, and all other conditions will be considered between-subject factors. All main
effects (sample size, class separation, missing mechanism, missing rate, and missing
data handling method) and up to three-way interactions will be considered on the
relative bias and standard error bias of the estimates of the means, variances and
class proportion parameters (9 parameters in total: two intercept means, two slope
means, intercept and slope variances, intercept and slope covariances, residual vari-
ances, and two class proportions). Only factors and interactions that are identified
as statistically significant (p-value ≤ 0.05) and with effect sizes of η2 ≥ 0.06 (Cohen,
1988) will be presented and discussed with additional detail.
75
3.6 Expectations from the Study
Given the past studies and the review of literature that was presented in
Chapter 2, the expectations of the simulation study are as follows:
1. Previous studies have suggested using FIML-EM in finite mixture contexts, so
the expectation is that it will perform well in terms of classification accuracy,
parameter recovery and standard error estimation across both missingness
mechanisms, large sample sizes, and moderate to large class separation. Its
performance with smaller sample sizes and poorer class separation conditions
should be not as optimal as using a Bayesian estimation approach.
2. For samples less than 500, the fully Bayesian method is expected to perform
better than FIML-EM-based methods in terms of classification rates and pa-
rameter recovery. Differences in performance will not be as apparent for larger
samples.
3. The two-stage MI approach should outperform single-stage MI, regardless of
missingness mechanism, because previous studies indicated that MI will bias
parameter estimates when the grouping structure is not accounted for (Enders
& Gottschall, 2011).
4. Larger rates of missing will severely impact the performance of most methods
and an interaction with sample size and class separation with missing data
handling method should be significant. In other words, the extent of severity
of convergence, classification accuracy, parameter estimate bias, and parameter
76
estimate standard error bias will mainly depend on the sample size and class
separation.
77
Chapter 4: Results
This chapter will present the results from the simulation study that was de-
scribed in Chapter 3. A few aspects of the simulation that were not planned a-priori
will be discussed and considered in the organization of this chapter. The original
analysis plan did not include running a separate analysis on the full dataset (dataset
without any missing data) using different methods. Initially, the plan was to run the
full data analysis using only maximum likelihood via EM (MLEM), and to compare
the results from this analysis with those from the missing data methods when miss-
ing data conditions (missing rate and missingness mechanism) were factored into the
simulation design. This would not have been a proper way to compare the different
methods because the Bayesian-based methods for dealing with missing data would
not have been comparable. An initial simulation comparing results from using ML-
EM and Bayesian inference using the default Mplus noninformative priors revealed
contrasting convergence rates, accuracy rates, and model parameter estimates. As a
result of this preliminary investigation, the simulation design was slightly modified
so that the full, unaltered data (without missing data), was analyzed using both
ML-EM and the Bayesian approach using different priors first. The details of the
different priors that were tested is discussed in Section 3.3.4 of Chapter 2. A thor-
78
ough review of the results from this comparison is presented in Section 4.1 of the
current chapter.
Based on the results from the full data analysis, it made more sense to compare
parameter estimates with estimates from the full data rather than the true estimates
because the estimates from the full data analysis represented the best-case scenario.
In other words, estimates from any missing data methods would be expected to
produce results no better than the full data analysis, so it would only make sense
that the estimates from the missing data methods be compared to the estimates
from the full data methods.
With these changes in mind, Section 4.2 will discuss the results from the anal-
ysis conducted on the data with the missing data conditions. First, the convergence
rates from each missing data handling method for each level of the manipulated con-
ditions will be presented. Then, a summary of the accuracy rates will be provided.
Finally, the relative bias and standard error bias of the parameter estimates will
be presented accordingly based from a factorial ANOVA of the manipulated con-
ditions. To conclude the chapter, the final section, Section 4.3, will be a summary
of the main findings. To aid in the discussion of the chapter, the commonly used
abbreviations, symbols, and terms used in Chapter 3 and beyond are provided in
Table 4.1.
79
Table 4.1
Abbreviations and Symbols Used for Summary of Results
Condition/Variable Term Levels
Sample Size SS 100, 200, 500, 1000
Separation SP 0.5 (vpoor), 1.0 (poor), 1.5 (mod), 2.0 (high)
Missing Rate (%) MR 5, 10, 20, 30, 40
Missing Mechanism MM MCAR,CMAR+
Method ME MLEM, IW, PW, DD, LD, FI, MI, FB, 2M
Intercept and Slope Means µint
Slope Mean µslope
Intercept Variance Ψint
Slope Variance Ψslope
Intercept Slope Covariance Ψ12
Residual Variance Θ
Class Proportion π
Note. MLEM = full data ML, IW = full data Bayesian with improper IW priors, PW
= full data Bayesian with proper IW priors, DD = full data Bayesian with data-driven
priors, LD = listwise deletion, FI = full-information maximum likelihood (EM), MI
= single-stage multiple imputation, FB = fully Bayesian, 2M = two-stage multiple
imputation.
80
4.1 Complete Data Analysis
The full data analysis involved comparing convergence rates, classification ac-
curacy rates, relative bias of the parameter estimates and accuracy of the standard
errors from GMM modeling on data without any missing data using two methods:
ML estimation via EM (MLEM) and Bayesian estimation via Gibbs sampling. As
was discussed in Section 3.3.4, the Bayesian method involved using three different
sets of priors on the estimates of the variance parameters in order to identify a
good set of priors for the missing data analysis. The results from this preliminary
simulation study are presented next.
4.1.1 Convergence Rates
As was discussed in Section 3.3.2, convergence was identified when the repli-
cation satisfied all convergence criteria for the method that was being tested. The
convergence rates for the methods used to analyze the complete data are presented in
Table 4.2. The table shows that the Bayesian methods produced higher convergence
rates than MLEM across different sample size and class separation conditions, with
the exception of when the data-driven (DD) priors were used. This was an expected
result because in order to obtain the priors for the DD condition, the MLEM esti-
mation from where these priors came from would need to have converged. In other
words, if the initial ML estimation did not converge, then the DD prior method
would also have counted as non-converged.
81
Table 4.2
Mean Convergence Rates for Complete Data Methods Grouped by Separa-
tion and Sample Size
Method SP N = 100 N = 200 N = 500 N = 1000
MLEM vpoor 0.42 0.57 0.78 0.78
IW 0.85 0.89 0.88 0.88
PW 0.93 0.91 0.97 0.93
DD 0.51 0.60 0.68 0.76
MLEM poor 0.43 0.65 0.79 0.83
IW 0.86 0.87 0.89 0.93
PW 0.91 0.94 0.98 0.97
DD 0.53 0.61 0.68 0.74
MLEM mod 0.50 0.65 0.78 0.78
IW 0.88 0.90 0.88 0.92
PW 0.94 0.94 0.94 0.96
DD 0.56 0.61 0.77 0.76
MLEM high 0.55 0.68 0.79 0.69
IW 0.70 0.95 0.94 0.91
PW 0.94 0.93 0.96 0.96
DD 0.59 0.70 0.85 0.86
The convergence rates for the Bayesian approach using inverse-Wishart (IW)
and proper inverse-Wishart (PW) priors remained relatively stable at 85% or more,
with the exception of one condition at N= 100 and high separation for the IW priors.
The PW priors showed the highest rates of convergence across all conditions near
90%. The IW priors also produced relatively high convergence rates and matched
the convergence rates from using the PW priors. As sample size increased, the rate
of convergence increased for the MLEM method, increasing from 55% to 70% in the
high separation condition, although never reaching the same rate of above 90% that
the Bayesian method reached when using PW priors.
4.1.2 Classification Accuracy Rates
The accuracy rates for the full data analyses are presented in Table 4.3. The
accuracy rates for the MLEM and Bayesian approach using different prior specifica-
82
tions were quite poor when the separation was very poor regardless of sample size,
dwindling at around 50% to 55%, although the Bayesian inference in general was
producing classification accuracy rates that were slightly higher (no more than 2%
greater). As separation increased, however, accuracy rates increased for all methods.
Accuracy rates also increased as the sample size increased for the high separation
condition, although the magnitude of difference was different for different methods.
For example, for the MLEM method, the classification accuracy rates reached up
to 11% higher when the separation was high. With Bayesian inference, the accu-
racy rates went from as low as 53% to as high as 70% (using PW priors) when the
separation was high.
Another notable observation is that the Bayesian approach across different
prior specifications produced similar classification accuracy rates across sample size
and separation conditions. The DD priors always produced slightly lower (around
1% to 2% less) classification accuracy rates, which may have been due to the fact
that the priors were constructed based on the results from the MLEM method.
These results indicate that class separation contributed to how accurately the mod-
els were able to classify individuals, which is consistent with the literature. Similar
to the observations made with the convergence rates, the Bayesian method consis-
tently produced higher accuracy rates than MLEM across all cells, albeit not as as
noticeably as the convergence rates.
83
Table 4.3
Mean Classification Accuracy Rates for Complete Data Methods Grouped
by Separation and Sample Size
Method SP N = 100 N = 200 N = 500 N = 1000
MLEM vpoor 0.51 0.51 0.51 0.52
IW 0.53 0.54 0.54 0.54
PW 0.53 0.53 0.53 0.54
DD 0.53 0.53 0.53 0.53
MLEM poor 0.53 0.54 0.53 0.54
IW 0.59 0.59 0.58 0.58
PW 0.59 0.59 0.58 0.59
DD 0.59 0.59 0.57 0.56
MLEM mod 0.57 0.56 0.57 0.57
IW 0.64 0.64 0.64 0.63
PW 0.64 0.65 0.64 0.64
DD 0.65 0.63 0.62 0.60
MLEM high 0.59 0.61 0.63 0.63
IW 0.69 0.69 0.68 0.68
PW 0.70 0.70 0.70 0.70
DD 0.68 0.68 0.68 0.68
4.1.3 Relative Bias
The means and medians of the overall absolute relative bias measures across
ME and SS, and ME and SP are provided in Table 4.4. The absolute bias values
were calculated by taking the absolute bias of the relative bias of each parameter
and aggregating the values across all the parameters. This was done in order to
get a general overview of bias for each method at varying conditions of sample
size and class separation. Medians are provided alongside the means in order to
more accurately assess the effects of any parameters that had extreme outlying
relative bias measures. Values in bold indicate relative bias values exceeding the
10% acceptable threshold according to Curran et al. (1996) and others. A summary
of this information is also provided by plots in Figure 4.1 for each sample size
condition, and the plots in Figure 4.2 for each separation condition.
84
Focusing on medians across sample sizes, the tables show that the Bayesian
methods using the IW priors and DD priors produced less biased parameter esti-
mates than those from MLEM. Table 4.4 shows that overall, parameters estimated
using MLEM were biased for all sample size conditions and the bias increased as
sample size decreased. The MLEM estimates for smaller samples sizes, on average,
produced relative bias measures of approximately 6.5% higher than the acceptable
threshold of 10%. The mean was consistently higher for MLEM method, a reflection
of a few parameters being extremely biased. The IW method was consistently below
the 10% threshold, but the mean at high separation produced a relative bias value
of 11.2% although the median bias remained at 2.7%, which indicates that a few
parameters were extremely biased. The PW priors produced medians of relative bias
below the acceptable threshold but means consistently above, again indicating that
certain parameters were quite biased well beyond the 10% acceptable cut-off.
Table 4.4
Means and Medians of the Overall Absolute Relative Bias Grouped by Method,
Sample Size (top), and Class Separation (bottom)
N = 100 N = 200 N = 500 N = 1000
Method Mean Median Mean Median Mean Median Mean Median
MLEM 0.167 0.145 0.161 0.166 0.125 0.092 0.125 0.100
IW 0.079 0.023 0.079 0.029 0.067 0.021 0.076 0.028
PW 0.151 0.059 0.148 0.040 0.127 0.051 0.114 0.033
DD 0.090 0.053 0.085 0.070 0.098 0.093 0.123 0.120
vpoor poor mod high
Method Mean Median Mean Median Mean Median Mean Median
MLEM 0.161 0.166 0.131 0.121 0.137 0.103 0.149 0.108
IW 0.082 0.034 0.037 0.014 0.069 0.025 0.112 0.027
PW 0.165 0.059 0.152 0.036 0.119 0.035 0.105 0.031
DD 0.129 0.108 0.094 0.085 0.087 0.070 0.086 0.057
85
N = 100 N = 200
30 30
20 20
10 10
0 0
MLEM IW PW DD MLEM IW PW DD
N = 500 N = 1000
30 30
20 20
10 10
0 0
MLEM IW PW DD MLEM IW PW DD
mean median
Figure 4.1. Overall means and medians of the absolute relative bias by
ME and SS.
The bar plots across class separation (SP) and method (ME) in Figure 4.2
show that the means and medians of the relative bias measures decreased with larger
separation. For example, the relative bias measures decreased when the separation
went from very poor to high. The MLEM medians of the relative bias values went
from 16% to 10%, the IW method medians of the relative bias went from 3.4% to
2.7%, the PW method medians of the relative bias went from 3.4% to 2.7% and the
PW method medians of the relative bias went from 5.9% to 3.1%. Interestingly, the
mean relative bias increased when the class separation became high, for example,
86
Absolute Relative Bias
for the IW method, although the medians generally decreased. Again, this was an
indication that certain parameters did not change much even with the higher sample
size or when class separation was higher.
Very Poor Separation Poor Separation
30 30
20 20
10 10
0 0
MLEM IW PW DD MLEM IW PW DD
Moderate Separation High Separation
30 30
20 20
10 10
0 0
MLEM IW PW DD MLEM IW PW DD
mean median
Figure 4.2. Overall means and medians of the absolute relative bias by
ME and SP.
To pinpoint any meaningful differences in relative bias of each parameter esti-
mate across conditions, a split-plot design factorial ANOVA was used. Method (ME)
was regarded as a within subject factor while class separation (SP) and sample size
(SS) were used as between subject factors. Presented in Table 4.5 and Table 4.6
are the percentages of variance explained (η2 values) by the manipulated conditions
87
Absolute Relative Bias
for the relative bias of each parameter estimate. Only effects that were significant
according to the criteria specified in Section 3.5.5 were investigated. The table shows
that there was a significant SP×ME interaction for relative bias of the estimates of
the class proportion parameter (η2 = 13). In addition, a significant SS×ME inter-
action was flagged for the relative bias of the the estimates of the class 2 intercept
(η2 = 15.9), proportion (η2 = 11.8), slope variance (η2 = 10.4), and residual variance
(η2 = 31.3). Among parameters with relative bias not flagged for any interaction
effects, significant main effects for SP was observed for the relative bias of the es-
timates of the class 1 intercept mean (η2 = 50), class 2 slope mean (η2 = 20.4),
intercept variance (η2 = 66.7), intercept slope covariance (η2 = 34.5), and residual
variance (η2 = 6.5) parameter estimates. Significant main effects for ME were also
observed for the relative bias of all the estimates.
Table 4.5
Factorial ANOVA Results for Relative Bias of the Estimates of Parameter Means
(Complete Data Methods)
Factor µint µC1 int µC2 slop µC1 slop πC2 c1
F p-val. η2 F p-val. η2 F p-val. η2 F p-val. η2 F p-val. η2
SP×ME - - - - - - - - - - - - 13.9 < 0.01 13
SS×ME - - - 10.4 < 0.01 15.9 - - - - - - 12.7 < 0.01 11.8
SP 373.7 < 0.01 50 47.3 < 0.01 20.4 - - - 35.2 < 0.01 10.6 - - -
SS 47.5 < 0.01 6.3 - - - 108 < 0.01 8 - - - 18.3 < 0.01 7.6
ME 822.4 < 0.01 35.5 99.4 < 0.01 50.5 551.5 < 0.01 77.9 125.3 < 0.01 74.4 207.2 < 0.01 64.4
Note. Cells containing a “-” indicate non-significance (p-value > 0.05) and/or effect sizes less than
6%. SP: class separation; SS: sample size; ME: analysis method.
Table 4.7 is provided to examine the ME×SS interaction of the relative bias
for the estimates of the class 2 intercept, slope variance, residual variance and class
proportion parameter. First, while the ANOVA pointed to differences in the rela-
88
Table 4.6
Factorial ANOVA Results for Relative Bias of the Estimates of Parameter Vari-
ances (Complete Data Methods)
Factor Ψint Ψslop Ψ12 Θ
F p-val. η2 F p-val. η2 F p-val. η2 F p-val. η2
SS×ME - - - 8.6 < 0.01 10.4 - - - 35.2 < 0.01 31.3
SP 445 < 0.01 66.7 - - - 113.5 < 0.01 34.5 4.8 0.028 6.5
ME 602.8 < 0.01 24.5 192.5 < 0.01 77.5 444.1 < 0.01 58.1 187.2 < 0.01 55.4
Note. Cells containing a “-” indicate non-significance (p-value > 0.05) and/or effect sizes less
than 6%.
tive bias measures produced among the methods for each level of sample size, the
majority of these relative bias measures, with the exception of the slope variance
parameter, were below the 10% threshold, meaning that the methods produced ac-
ceptable levels of bias. Focusing on the slope variance parameter, all the methods,
with the exception of the IW method, produced relative bias measures above 10%.
The relative bias for the MLEM method and PW methods both increased in magni-
tude from -17% to -20% as sample sizes increased, while the DD method decreased
from -18% to -12% in magnitude as sample sizes increased. The IW method fol-
lowed a similar trend as the DD method, increasing in relative bias from 2% to 7%
as the sample size increased. However, the IW method was the only method that was
able to keep the relative bias of the slope variance parameter under the acceptable
threshold.
The DD method and MLEM method also produced severely biased parameter
estimates of the intercept means and proportion parameters at sample sizes of 500
and 1000 (-12% and -19% relative bias, respectively). The MLEM method also
produced severely biased parameter estimates of the proportion parameter at sample
89
sizes of 100 and 200 (-12% and -18%, respectively).
Table 4.7
Mean Relative Bias Grouped by Method and Sample Size for
Estimates of µint , ΨC2 slop, Θ, and πc1 (Complete Data Methods)
Method Variable N = 100 N = 200 N = 500 N = 1000
MLEM µint -0.02 -0.01 -0.01 -0.01C2
IW 0.00 0.00 0.01 0.00
PW -0.02 -0.02 -0.01 -0.01
DD -0.04 -0.08 -0.12 -0.19
MLEM Ψslop -0.17 -0.17 -0.20 -0.20
IW 0.02 -0.03 -0.05 -0.07
PW -0.17 -0.19 -0.19 -0.20
DD -0.18 -0.13 -0.12 -0.12
MLEM Θ -0.02 -0.01 0.00 0.00
IW 0.00 0.00 0.00 0.00
PW 0.02 0.01 0.01 0.01
DD 0.00 0.00 0.00 0.00
MLEM πc1 -0.12 -0.18 -0.06 -0.08
IW -0.01 -0.02 -0.02 -0.02
PW -0.02 -0.03 -0.03 -0.02
DD 0.03 0.05 0.10 0.19
Note. Values in bold indicate severely biased estimates (greater than 10%
relative bias in either direction).
Table 4.8 shows the significant ME×SP interaction for relative bias values
of the proportion parameter. In addition to confirming that the MLEM method
produced biased estimates of the proportion parameter, it confirms that the class
separation condition was an important factor of how well the proportion parameter
was recovered when using the MLEM and DD methods. Specifically, very poor sep-
aration conditions produced severely negatively biased estimates of the proportion
parameter. For the MLEM method, the relative bias was -18% when separation was
very poor, 12% when separation was poor, and -11% when separation was moderate,
all which are considered severely biased. The DD method produced positively biased
estimates of 13% for very poor separation and 11% for poor separation. The IW and
PW method were unaffected by separation and produced unbiased estimates for all
90
separation conditions (between 2% and 1% negative bias).
Table 4.8
Mean Relative Bias Grouped by Method and Separation
for Estimate of πc1(Complete Data Methods)
Method Variable vpoor poor mod high
MLEM πc1 -0.18 -0.12 -0.11 -0.03
IW -0.02 -0.01 -0.02 -0.01
PW -0.02 -0.02 -0.02 -0.02
DD 0.13 0.11 0.09 0.03
Table 4.5 shows that the majority of the variance for the differences in relative
bias for certain parameters was explained by the main effects of the method. For
example, for the estimates of the class 1 and class 2 slope parameters, 78% and 74%
variance, respectively, was explained by the differences in the relative bias observed
by method of analysis. The aggregated relative bias measures for any parameters
that were not involved in any significant two-way interactions are provided in Ta-
ble 4.9. The table shows that the MLEM produced severely biased estimates of the
class 1 slope means (-25%) and class 2 slope means (14%), as well as the intercept
slope covariance (-20%). The PW method produced severely biased estimates of the
intercept variance (-19%) and the intercept slope covariance (64%).
Table 4.9
Mean Relative Bias Values Grouped by Analysis
Method (Complete Data Methods)
Parameter MLEM IW PW DD
µint 0.04 0.00 0.02 0.00c1
µslopc1 -0.25 -0.04 -0.07 -0.06
µslopc2 0.14 0.03 0.06 0.05
Ψint -0.05 0.08 -0.19 0.01
Ψ12 -0.20 -0.06 0.64 0.06
91
4.1.4 Standard Error Bias
To get a general idea of how well each of the methods estimated the stan-
dard errors, the standard error ratios for all the parameters were aggregated using
means and medians. These means and medians were grouped by sample size, and
the resulting values are presented in Table 4.10. The values in this table are also
graphically provided by Figure 4.3. The values of the medians are compared against
the means in order to minimize the influence of any extreme SE/SD ratios. The
Figure shows that the MLEM method produced underestimated standard errors re-
gardless of sample size, and the standard errors decreased even further as sample
size increased, from a median SE/SD ratio of 0.68 to 0.62. The IW priors produced
overestimated standard errors for sample sizes less than 1000, but the ratios drawing
closer to 1 as sample sizes increased was indicative that larger sample sizes corrected
the standard error closer to the sampling variability. The PW priors produced the
most accurate standard errors at sample size 500 (SE/SD ratio of 0.97 median val-
ues), but overestimated standard errors for sample size 100 (SE/SD ratio of 1.06)
and 200 (SE/SD ratio of 1.10). The data-driven (DD) priors consistently produced
underestimated standard errors and became more underestimated as sample size
increased, from a mean of 0.91 ratio for N = 100 to 0.81 ratio for N = 1000. The
overall mean and median of the standard error bias rates for each method across
separation showed similar patterns as the plots across sample sizes so they are not
presented, but the bottom of Table 4.10 shows similar patterns as those that were
presented across sample sizes.
92
Table 4.10
Means and Medians of the SE/SD Ratios Grouped by Method, Sample Size (top),
and Separation (bottom) for Complete Data Methods
N = 100 N = 200 N = 500 N = 1000
Method Mean Median Mean Median Mean Median Mean Median
MLEM 0.73 0.68 0.74 0.66 0.70 0.66 0.67 0.62
IW 1.30 1.26 1.22 1.20 1.13 1.11 1.03 1.03
PW 1.06 1.06 1.09 1.10 0.98 0.97 0.91 0.91
DD 0.91 0.99 0.87 0.97 0.87 0.96 0.81 0.81
vpoor poor mod high
me Mean Median Mean Median Mean Median Mean Median
MLEM 0.75 0.71 0.71 0.64 0.69 0.63 0.70 0.61
IW 1.15 1.11 1.20 1.18 1.19 1.14 1.14 1.11
PW 1.04 1.05 1.03 1.00 1.02 1.01 0.95 0.94
DD 0.90 0.98 0.86 0.95 0.85 0.91 0.85 0.94
N = 100 N = 200
1.5 1.5
1.0 1.0
0.5 0.5
0.0 0.0
MLEM IW PW DD MLEM IW PW DD
N = 500 N = 1000
1.5 1.5
1.0 1.0
0.5 0.5
0.0 0.0
MLEM IW PW DD MLEM IW PW DD
mean median
Figure 4.3. Overall means and medians of the SE/SD ratios by ME and
SS.
93
SE/SD Ratio
The results from the ANOVA for the differences in SE/SD ratios presented
in Table 4.11 indicate that there was a significant ME by SS interaction for the
differences in the SE/SD ratios of the intercept means and residual variances. All
other parameters were flagged as having significant differences in the SE/SD ratios
for varying conditions of the main effects for sample sizes (SS), class separations
(SP), and/or methods (ME).
Table 4.11
Factorial ANOVA Results for SE/SD Ratios of the Estimates of the Mean Pa-
rameters (Complete Data Methods)
Factor µint µC1 int µC2 slop µC1 slopC2
F p-val. η2 F p-val. η2 F p-val. η2 F p-val. η2
SS×ME 10.1 < 0.01 12.6 - - - - - - - - -
SP - - - - - - - - - - - -
SS - - - - - - - - - - - -
ME 188.1 < 0.01 78.3 575 < 0.01 91.7 233.6 < 0.01 88.9 134.2 < 0.01 82.6
Note. Cells containing a “-” indicate non-significance (p-value > 0.05) and/or effect sizes less
than 6%. SP: class separation; SS: sample size; ME: analysis method.
Table 4.12
Factorial ANOVA Results for SE/SD Ratios of the Estimates of the Variance
Parameters (Complete Data Methods)
Factor Ψint Ψslop Ψ12 Θ
F p-val. η2 F p-val. η2 F p-val. η2 F p-val. η2
SS×ME - - - - - - - - - 3.2 < 0.01 21.8
SP - - - - - - 14.1 < 0.01 15.8 - - -
SS 6.7 0.012 13.4 69.9 < 0.01 22.6 42.2 < 0.01 47.2 19.7 < 0.01 46.7
ME 123 < 0.01 73.4 75.9 < 0.01 63.1 15.1 < 0.01 17.6 - - -
Note. Cells containing a “-” indicate non-significance (p-value > 0.05) and/or effect sizes
less than 6%.
The SE/SD ratios produced by each method by sample size for the intercept
means and residual variances are presented in Table 4.13. Focusing on the intercept
means, the SE/SD ratios ranged from 0.47 to 1.26. The standard errors were more
94
accurate (closest to 1) for the Bayesian approaches (IW, PW, DD) than the standard
errors from MLEM, given that the ratios produced by the MLEM method were at
around 50%, which indicates that the MLEM method underestimated the standard
errors. Among the Bayesian methods, the PW prior produced the closest SE ratios
to 1 for the intercept means when the sample size was 100 and 200 (0.98 and 0.93,
respectively), while the IW prior produced the most accurate, albeit overestimated,
standard errors at sample sizes of 500 and 1000 (1.07 and 0.97, respectively). The DD
prior produced overestimated standard errors and the magnitude of them became
greater as sample size increased, from a SE ratio of 1.12 at sample size 100 to 1.47
at sample size 1000. The variability of these ratios were not as great for the residual
variances, although the differences by method and sample were flagged as significant
by the ANOVA.
Table 4.13
Mean SE/SD Ratios Grouped by Method and Sample Size
(Complete Data Methods)
Method Variable N = 100 N = 200 N = 500 N = 1000
MLEM µint 0.49 0.49 0.52 0.47C1
IW 1.43 1.18 1.07 0.97
PW 0.98 0.93 0.85 0.85
DD 1.12 1.05 1.26 1.47
MLEM Θ 1.05 1.05 1.01 1.01
IW 0.97 1.07 0.97 1.02
PW 0.96 1.08 0.97 1.02
DD 0.99 1.03 0.99 0.99
Finally, for the rest of the parameters that were only flagged for SE/SD ratio
differences for across each method, plots were produced and can be seen in Fig-
ure 4.4. The standard errors from MLEM were consistently underestimated while
the standard errors for the Bayesian methods varied, but we generally closer to 1.
95
The intercept and slope variance measures for the data driven priors method were
severely underestimated for the class 2 intercept, while the IW prior method was
consistently producing overestimated standard errors. According to the plot, the
proper inverse prior method produced the most accurate standard errors, with most
of the SE/SD ratio of the parameters around 1.
●
   Param.
● intc2
slopc1
slopc2
● vint
vslop
vsi
●
●
MLEM IW PW DD
Method
Figure 4.4. SE/SD ratios for class 2 intercept mean (intc2), class 1 and
class 2 slope mean (slopc1, slopc2), intercept variance (vint), slope vari-
ance (vslop) and intercept slope covariance (vsi) for each method.
Based on the results observed from this section, all the Bayesian-based methods
for handling missing data (FB and 2M) used the improper inverse Wishart (IW)
prior because the convergence rates, accuracy rates, relative bias and standard error
bias results indicated that the the Bayesian approach with the IW priors performed
relatively better than the other approaches. Moving forward, it became important to
96
SE/SD Ratio
0.2 0.4 0.6 0.8 1.0 1.2 1.4
compare results from the ML-based methods (LD, FI, and MI) with the results from
the full-data analysis using MLEM estimation, and the results from the Bayesian-
based methods (FB and 2M) with the results from the full-data analysis using
Bayesian inference with IW priors.
97
4.2 Missing Data Analysis
The results from the missing data analysis will be presented next in a similar
format as the previous section, which discussed the results from the full data analysis
using ML and the Bayesian method using three different prior specifications. Like in
the previous section, a series of factorial ANOVAs will be used to narrow down the
discussion to some key findings. First, the convergence rates across the missing data
handling methods will be compared, focusing on the methods. Next, the accuracy
rates will be compared. In the last two sections, the results from the comparison of
relative bias and standard error bias will be presented.
4.2.1 Convergence Rates
Table 4.14 shows clear differences in the convergence rates across methods
and sample sizes. As expected, the LD method produced the lowest convergence
rates (28%). The FI method produced relatively low convergence rates compared to
using MI, FB, and 2M, with a convergence rate as low as 38% when the sample size
was 100. The Bayesian approach produced the highest rates of convergence, even
when sample sizes were as small as 100, never going below 80%. The two-stage MI
also produced relatively high rates of convergence, although not as high as the fully
Bayesian approach, which maintained convergence rates between 81% and 89% .
Compared to the complete methods, the missing data methods produced lower
rates of convergence when the missing data were introduced. One unexpected result
occurred at sample sizes of 100, where the FB and 2M methods produced higher rates
98
of convergence (90% and 89%, respectively) than when the full data was analyzed
using Bayesian inference with IW priors (82%).
Table 4.14
Mean Convergence Rates for Missing Data Methods Grouped by
Method and Sample Size
Method N = 100 N = 200 N = 500 N = 1000
MLEM 0.48 0.63 0.79 0.77
IW 0.82 0.90 0.90 0.91
LD 0.28 0.40 0.57 0.67
FI 0.39 0.52 0.66 0.75
MI 0.67 0.79 0.85 0.89
FB 0.90 0.90 0.84 0.82
2M 0.89 0.89 0.84 0.81
4.2.2 Classification Accuracy Rates
Presented in Table 4.15 are the accuracy rates for all the methods by separa-
tion condition. It is clear from this table that these rates improved as the separation
increased from very poor to high for every method, with the largest improvement for
the FB method, which went from 53% accuracy when the separation was very poor,
to 66% when the separation was high. At the very poor classification condition,
all the methods produced the same poor results of around 51% to 53% accuracy
rates. The classification accuracy rates were also slightly better for the Bayesian
method and the imputation methods across all separations, with the largest differ-
ences observed between FB and FI when the class separation was moderate. When
the separation was high, FI, MI, and 2M produced similar classification accuracy
rates, but the FB method was around 6% higher than all other methods. As ex-
99
pected, the missing data conditions dropped the classification accuracy rates across
all methods.
Table 4.15
Mean Classification Accuracy Rates for Missing
Data Methods Grouped by Method and Separation
Method vpoor poor mod high
MLEM 0.51 0.54 0.57 0.62
IW 0.54 0.58 0.64 0.68
LD 0.51 0.53 0.57 0.60
FI 0.51 0.53 0.56 0.60
MI 0.52 0.54 0.57 0.61
FB 0.53 0.57 0.62 0.66
2M 0.52 0.54 0.56 0.60
4.2.3 Relative Bias
Tables of the absolute relative bias grouped by sample sizes and analysis meth-
ods are presented in Table 4.16. The absolute relative bias measures for all the
parameters were aggregated by method and by sample size and class separation to
observe the general bias that was produced by each method. The large discrepancies
between the means and medians of the absolute relative bias values indicates that
the overall bias may be a result of a few parameter estimates that were extremely
biased. For example, for sample sizes of 100, the medians for all the methods, with
the exception of the MLEM and LD methods, were below the 10% threshold while
many of the means were above the threshold. In general, the LD, FI, and MI meth-
ods showed the most problematic relative bias values for all samples sizes (values in
bold). The FB and 2M methods produced the closest relative bias measures to the
100
full data method (IW), while the LD, FI, and MI methods were generally the most
biased and close to values around the MLEM method.
Table 4.16
Means and Medians of the Overall Absolute Relative Bias Grouped by Method,
Sample Size (Top), and Separation (Bottom) (Missing Data Methods)
N = 100 N = 200 N = 500 N = 1000
Method Mean Median Mean Median Mean Median Mean Median
MLEM 0.17 0.14 0.16 0.17 0.12 0.09 0.13 0.10
IW 0.08 0.02 0.08 0.03 0.07 0.02 0.08 0.03
LD 0.19 0.10 0.16 0.10 0.13 0.09 0.13 0.10
FI 0.15 0.09 0.13 0.09 0.12 0.08 0.13 0.10
MI 0.13 0.08 0.13 0.08 0.12 0.07 0.12 0.09
FB 0.10 0.03 0.08 0.02 0.07 0.02 0.07 0.03
2M 0.08 0.03 0.07 0.03 0.08 0.03 0.08 0.03
vpoor poor mod high
Method Mean Median Mean Median Mean Median Mean Median
MLEM 0.16 0.17 0.13 0.12 0.14 0.10 0.15 0.11
IW 0.08 0.03 0.04 0.01 0.07 0.03 0.11 0.03
LD 0.15 0.10 0.15 0.10 0.15 0.10 0.16 0.10
FI 0.15 0.10 0.12 0.08 0.12 0.07 0.14 0.11
MI 0.17 0.09 0.13 0.10 0.10 0.06 0.11 0.08
FB 0.07 0.04 0.05 0.02 0.08 0.02 0.12 0.03
2M 0.08 0.04 0.05 0.02 0.06 0.02 0.10 0.04
101
N = 100 N = 200
20 20
15 15
10 10
5 5
0 0
MLEM IW LD FI MI FB 2M MLEM IW LD FI MI FB 2M
N = 500 N = 1000
20 20
15 15
10 10
5 5
0 0
MLEM IW LD FI MI FB 2M MLEM IW LD FI MI FB 2M
mean median
Figure 4.5. Overall means and medians of the absolute relative bias by
ME and SS. The results from the full data analysis methods are included
in the plots as a basis for comparison.
The bottom portion of Table 4.16 presents similar information grouped by
the foud class separation conditions. Overall, the very poor separation condition
produced the estimates that were very biased, but the high separation condition
also produced large values of relative bias. The mean of the relative bias values
were nearly all above 10%. Again, the large discrepancies between the means and
medians is an indication that the bias may have been attributed to a few parameters
estimates that produced very large values of relative bias.
102
Absolute Relative Bias
Very Poor Separation Poor Separation
20 20
15 15
10 10
5 5
0 0
MLEM IW LD FI MI FB 2M MLEM IW LD FI MI FB 2M
Moderate Separation High Separation
20 20
15 15
10 10
5 5
0 0
MLEM IW LD FI MI FB 2M MLEM IW LD FI MI FB 2M
mean median
Figure 4.6. Overall means and medians of the absolute relative bias by
ME and SP. The results from the full data analysis methods are included
in the plots as a basis for comparison.
Table 4.17 shows the absolute relative bias for each method aggregated by
each missing data rate. In general, larger percentages of missing data produced
larger overall measures of relative bias, and these may be attributed to a few select
variables because the medians were largely unaffected. For example, using the LD
method, the median at 5% missing rate was at 9% relative bias and at 40% missing
rate it was at 10% relative bias, but the mean relative bias went from 12.5% at 5%
missing to almost 20% at 40% missing. The same observation can be made for FI,
103
Absolute Relative Bias
MI, FB, and 2M methods, where the rate of change between the mean and median
are different as missing data rate increased. The table also shows that the FB and
2M methods seem to have mitigated the effects of the missing data much better
than the LD, FI, and MI, given that although missing rate increased, median bias
also remained at around the same rates as the IW method (between 2.2 and 3.4%
absolute relative bias). The discrepancies between the mean and median for these
methods was also an indication that there may have been a few parameters that
increased the values of relative bias as the rate of missing increased, as it can be
seen that the mean relative bias at 40% was near 10% for both FB and 2M (9% and
8%, respectively).
Table 4.17
Means and Medians of the Overall Absolute Relative Bias Grouped by Method and
Missing Rate
5% miss. 10% miss. 20% miss. 30% miss. 40% miss.
Method Mean Median Mean Median Mean Median Mean Median Mean Median
MLEM† 0.145 0.136 - - - - - - -
IW† 0.075 0.026 - - - - - - -
LD 0.126 0.099 0.130 0.101 0.142 0.095 0.166 0.099 0.199 0.100
FI 0.124 0.082 0.125 0.092 0.130 0.093 0.138 0.086 0.153 0.097
MI 0.136 0.091 0.129 0.084 0.125 0.086 0.124 0.079 0.119 0.071
FB 0.067 0.022 0.069 0.025 0.078 0.026 0.084 0.025 0.097 0.028
2M 0.073 0.030 0.074 0.030 0.077 0.032 0.077 0.034 0.080 0.034
†These methods were not applied on any missing data conditions. Results reported here are from 0%
missing rate condition.
Finally, Table 4.18 shows side-by-side tabulations of the overall absolute rela-
tive bias measures aggregated by missingness mechanisms and methods. The differ-
ences in the overall absolute relative bias measures were almost identical.
104
Table 4.18
Means and Medians of the Overall Absolute Rela-
tive Bias Grouped by Method and Missing Mech-
anism
MCAR CMAR+
me Mean Median Mean Median
MLEM† 0.14 0.14 - -
IW† 0.08 0.03 0.08 0.03
LD 0.16 0.10 0.14 0.10
FI 0.14 0.09 0.13 0.09
MI 0.13 0.08 0.13 0.08
FB 0.08 0.02 0.08 0.03
2M 0.08 0.03 0.08 0.03
†These methods were not applied on any missing data
conditions. Results reported here are from 0% missing rate
condition.
To pinpoint meaningful differences in relative bias of each parameter estimate
between the conditions of the various manipulated conditions, a split-plot design
factorial ANOVA was conducted, where method (ME) was considered a within sub-
ject factor and other manipulated factors (class separation (SP), sample size (SS),
missing data rate (MR), missingness mechanism (MM)) were considered between
subject factors. Presented in Table 4.19 and Table 4.20 are the percentages of vari-
ance explained (η2 values) by the differences in the manipulated conditions for the
relative bias of each parameter estimate. Only main effects and interaction effects
that were significant according to the criteria presented in Section 3.5.5 were tab-
ulated. A three-way interaction among missing rate (MR), sample size (SS), and
method of analysis (ME) for the relative bias of the residual variance was flagged as
significant. In addition, two-way interactions were flagged between SP×ME (class
proportion parameter), SS×ME (intercept variance, slope variance, and intercept-
slope covariance parameters), and ME×SS (intercept variance, slope variance, and
105
residual variance). All three-way interactions, which subsume any two-way inter-
actions and main effects, as well as any two-way interactions, which subsume any
significant main effects, will be discussed in more detail next.
Table 4.19
Factorial ANOVA Results for Relative Bias of Estimates of Parameter Means
(Missing Data Methods)
µint µC1 int µC2 slop µC1 slop πC2 c1
Factor F η2 F η2 F η2 F η2 F η2
SP×ME - - - - - - - - 29.1 12.5
SP 4633.9 39.8 4187.6 57.8 268.4 10.3 203.7 10.7 156.5 17.4
SS 759 6.5 - - - - - - 69.7 7.7
MM - - - - - - - - 177 6.6
ME 3309.2 42.4 891.6 24 1311.3 61.9 1097.5 64 - -
Note. Cells containing a “-” indicate non-significance (p-value > 0.05) and/or effect sizes
less than 6%. Values in bold are discussed in further detail. MM: missing mechanism; MR:
missing rate; SP: class separation; SS: sample size; ME: analysis method.
Table 4.20
Factorial ANOVA Results for Relative Bias of Estimates of Parameter
Variances (Missing Data Methods)
Ψint Ψslop Ψ12 Θ
Factor F η2 F η2 F η2 F η2
MR×SS×ME - - - - - - 85.8 15.2
SS×ME 282.4 6.4 104.8 9.5 176.3 8.3 552.6 24.4
MR×ME - - - - - - 343.1 20.2
SP 6676.8 56 - - 2214.8 35.7 - -
SS - - 752.2 20.7 1016.2 16.4 - -
ME 3119.3 23.7 1559.4 47.1 1106.7 17.3 1960.1 28.9
Note. Cells containing a “-” indicate non-significance (p-value > 0.05) and/or
effect sizes less than 6%. Values in bold are discussed in further detail. MM:
missing mechanism; MR: missing rate; SP: class separation; SS: sample size; ME:
analysis method.
4.2.3.1 Results of the Main Effects
To compare pairs of means of the relative bias values of the main effects, the
Tukey’s HSD procedure was used. Tables 4.21 through 4.23 show the means of the
106
relative bias measures for any parameters for which the manipulated conditions were
not flagged for interactions. Table 4.21 shows that while increasing class separation
decreased the relative bias of the estimates of the intercept and slope mean param-
eters, they increased the relative bias of the estimates of the intercept variance and
intercept slope covariance parameter. For example, the relative bias of the class 1
intercept mean when the separation was very poor was positively biased at 6%, but
it gradually decreased to 0% as the class separation increased to high. Similarly, the
relative bias of the class 2 slope means decreased from -22% when the separation
was very poor to -13% when the separation was high. This similar pattern was not
observed for the estimates of the variance, however. For the intercept variance, the
very poor and high separation conditions produced severely biased parameter esti-
mates in opposite directions, where the mean of the relative bias for the very poor
condition was negatively biased at -21% and the mean of the relative bias for the
high condition was positively biased at 21%. This similar pattern can be observed
for the intercept slope covariance (Ψ12), where the relative bias went from 21% when
separation was very poor to -48% when the separation was high. Another observa-
tion to point out is that the most problematic parameter estimates were observed for
the slope mean, intercept variance, and intercept slope covariance, as the majority
of the aggregated relative bias values were outside of the ±10% acceptable range.
107
Table 4.21
Mean Relative Bias Grouped by Levels of Separation (SP)
SP µint µC1 int µC2 slop µC1 slop Ψ ΨC2 int 12
vpoor 0.06 -0.04 -0.22 0.16 -0.21 0.21
poor 0.04 -0.02 -0.18* 0.10* -0.16 -0.04
mod 0.02 0.00 -0.16* 0.10*† 0.02 -0.36
high 0.00 0.02 -0.13 0.10† 0.21 -0.48
Note. Means are averaged over levels of ME, SS, MR, and MM. All pairwise comparisons
were significant at the α = 0.05 level unless otherwise noted. *Differences between poor
and moderate separation were not significant. †Differences between moderate and high
separation were not significant. All p-values were adjusted using the Tukey method.
Values in bold indicate severe relative bias.
In regards to the ME main effect, Table 4.22 shows significant differences be-
tween all the relative bias values produced by the methods for most of the parameter
estimates except for the intercept and slope means. The LD, FI and MI methods
showed severely biased estimates of the slope mean (greater than 20%), while the
FB and 2M methods were within the acceptable threshold of ±10%. The differences
were also significantly different, with the exception of the relative bias of the inter-
cept and slope estimate using the FB and 2M methods. The means of the relative
bias from the full data analysis are also provided as a way to examine how the
missing data conditions affected the estimation of some of the parameters for the
different methods. This comparison indicates that the relative bias was not affected
when missing data were introduced because the relative bias for the MLEM-based
methods remained at around -25% and the Bayesian methods remained at around
3-4% for the bias of these particular parameter estimates.
The SS main effect for the intercept mean also showed significant differences
for different sample sizes, and Table 4.23 showed a decrease in the relative bias of
108
the parameter estimates as sample size increased, from a relative bias of 3.9% at
sample size 100 to a relative bias of 1.7% at sample size 1000. While these pairwise
differences were flagged as significant, they were within the 10% acceptable bias
threshold.
Table 4.22
Mean Relative Bias Grouped by Analysis Methods (ME)
ME µint µC1 int µC2 slop µC1 slopC2
MLEM 0.04 -0.01 -0.25 0.14
IW 0.00 0.00 -0.04 0.03
LD 0.06 -0.02 -0.25 0.14†
FI 0.04 -0.03 -0.26 0.21
MI 0.03 -0.02 -0.21 0.14†
FB 0.00* 0.00* -0.06 0.03
2M 0.00* 0.00* -0.09 0.05
Note. Means are averaged over levels of SP, SS, MR, and MM.
All pairwise comparisons were significant at the α = 0.05 level
unless otherwise noted. *Differences between FB and 2M were not
significant. †Differences between LD and MI were not significant.
All p-values were adjusted using the Tukey method. Values in bold
indicate severe relative bias.
Table 4.23
Mean Relative Bias Grouped by
Sample Size (SS)
SS µintC1
100 0.039
200 0.035
500 0.023
1000 0.017
Note. Means are averaged over levels
of SP, ME, MR, and MM. All pair-
wise comparisons were significant at
the α = 0.05 level unless otherwise
noted. All p-values were adjusted us-
ing the Tukey method.
109
4.2.3.2 Results of the Interaction Effects
To investigate the two-way interaction effects (ME×SS and ME×SP), main ef-
fect pairwise comparisons between the methods were conducted for the different lev-
els of the second factor. These comparisons are provided in Tables 4.24 through 4.27.
The flagged three-way interaction (MR×SS×ME) was further investigated using the
relative bias measures presented in Table 4.28 and a plot of these values provided
by Figure 4.8.
Focusing on the estimates of the intercept variance, Table 4.24 shows signifi-
cant differences in the relative bias for almost all comparisons between the methods
except for FI and MI at sample size 500 and 1000, and FB and 2M at sample size
1000. The differences between methods decreased as the sample size increased. For
example, at sample size 100, a near 50% difference in the relative bias value was
observed between LD and FB, whereas when the sample size is 200, this difference
decreased to 34% and continued to decrease at sample size 500 to 21%, and to 14% at
sample size 1000. For sample size 100, the largest differences were observed between
LD and FB or 2M (49% and 45%, respectively), FI and FB and 2M (29% and 25%,
respectively), and MI and FB and 2M (36% and 32%, respectively). The smallest
differences were observed between FI and MI (7%), and FB and 2M (4%). For sam-
ple size 200 to 1000, similar patterns emerged, where LD and FB or 2M, FI and FB
or 2M, and MI and FB or 2M produced the largest differences, whereas FI and MI,
and FB and 2M produced the smallest differences in measures of relative bias. The
top left panel of Figure 4.7 shows these differences graphically. Another observation
110
to note that is made more evident by observing the Figure is that for sample size
100, all the conditions produced relative bias values outside of the acceptable 10%
threshold. After sample size 100, only the FB and 2M method produced intercept
variance estimates within the 10% threshold whereas for the other methods, it took
a sample size of 1000 to produce intercept variance estimates that were acceptable.
111
Intercept Variance Slope Variance
● ●
●    SS    SS
●
● 100 ● 100
200 ● 200
500 500
1000 1000
●
●
●
● ●
●
●
●
●
MLEM LD FI MI IW FB 2M MLEM LD FI MI IW FB 2M
Method Method
Intercept Slope Covariance Proportion
   SS    SP
●
● 100 ● vpoor
200 poor
500 mod
1000 high
●
● ●
● ●
●
●
●
● ●
●
● ●
MLEM LD FI MI IW FB 2M MLEM LD FI MI IW FB 2M
Method Method
Figure 4.7. Interaction plots for ME × SS of the relative bias measure for the
intercept variance (top left), slope variance (top right), intercept slope covariance
(bottom left) and interaction plot for the ME × SP of relative bias of the estimate
of the class proportion parameter (bottom right).
112
Relative Bias Relative Bias
−1.0 −0.8 −0.6 −0.4 −0.2 0.0 0.2 −0.3 −0.2 −0.1 0.0 0.1
Relative Bias Relative Bias
−0.15 −0.10 −0.05 0.00 0.05 0.10 −0.2 −0.1 0.0 0.1
Table 4.24
Main Effect Pairwise Comparisons Corresponding to ME × SS In-
teraction of Intercept Variances for Relative Bias
Contrast (ME) SS X̄ME=j,SS=k X̄ME=j′,SS=k′ Difference
LD - FI 100 -0.34 -0.14 -0.20
LD - MI -0.34 -0.20 -0.13
LD - FB -0.34 0.16 -0.49
LD - 2M -0.34 0.12 -0.45
FI - MI -0.14 -0.20 0.07
FI - FB -0.14 0.16 -0.29
FI - 2M -0.14 0.12 -0.25
MI - FB -0.20 0.16 -0.36
MI - 2M -0.20 0.12 -0.32
FB - 2M 0.16 0.12 0.04
LD - FI 200 -0.23 -0.07 -0.16
LD - MI -0.23 -0.13 -0.10
LD - FB -0.23 0.11 -0.34
LD - 2M -0.23 0.07 -0.30
FI - MI -0.07 -0.13 0.06
FI - FB -0.07 0.11 -0.18
FI - 2M -0.07 0.07 -0.14
MI - FB -0.13 0.11 -0.24
MI - 2M -0.13 0.07 -0.20
FB - 2M 0.11 0.07 0.04
LD - FI 500 -0.14 -0.04 -0.10
LD - MI -0.14 -0.05 -0.09
LD - FB -0.14 0.07 -0.21
LD - 2M -0.14 0.03 -0.17
FI - MI -0.04 -0.05 0.02*
FI - FB -0.04 0.07 -0.10
FI - 2M -0.04 0.03 -0.07
MI - FB -0.05 0.07 -0.12
MI - 2M -0.05 0.03 -0.08
FB - 2M 0.07 0.03 0.04
LD - FI 1000 -0.07 0.02 -0.09
LD - MI -0.07 0.03 -0.09
LD - FB -0.07 0.07 -0.14
LD - 2M -0.07 0.07 -0.14
FI - MI 0.02 0.03 -0.01*
FI - FB 0.02 0.07 -0.05
FI - 2M 0.02 0.07 -0.05
MI - FB 0.03 0.07 -0.04
MI - 2M 0.03 0.07 -0.04
FB - 2M 0.07 0.07 0.00*
Note. Means are averaged over levels of SP, MR, and MM. All pairwise com-
parisons were significant at the α = 0.05 level unless otherwise noted by a *. All
p-values were adjusted using the Tukey method.
113
Focusing on the estimates of the slope variance, Table 4.25 shows some counter
intuitive results. For example, for some of the methods, smaller sample sizes actually
produced relative bias measures closer to 0. This was the case for the values produced
by the LD, FI and MI methods, where for sample size 100, the relative bias for these
methods was 3%, 10%, and 25%, respectively, for sample size 200, the relative bias
increased to 9%, 14%, and 27%, for sample size 500, it increased further to 18% and
19% for the LD and MI methods, and for sample size 1000, these values increased
even further to 20% and 21% for the LD and MI methods. The Bayesian-based
methods (FB and 2M) showed a different pattern where the sample sizes of 100
and 1000 produced the greatest degrees of relative bias (i.e., FB at sample size 100
produced 15% relative bias and FB at sample size 1000 produced 11% relative bias)
and sample sizes of 200 and 500 produced relative bias below 10% (i.e., FB at sample
size 200 produced 3% relative bias and FB at sample size 500 produced 6% relative
bias).
The top right panel of Figure 4.7 shows the relative bias values of the slope
variance together with the relative bias values produced by the complete data anal-
ysis methods (MLEM and IW). The figure shows that the MLEM method produced
extremely negatively biased estimates of the slope variance (around -20% relative
bias), which makes the results observed from the sample size of 100 for the LD and
MI methods seem questionable. The results observed for sample sizes of 500 and
1000 for the ML-based methods were actually consistent with the complete data
MLEM method. The IW method was actually negatively biased, and the Figure
shows that when missing data were added, the variability of the relative bias values
114
increased for both the FB and 2M method, with relative bias measures deviating
the most away from the IW method for sample sizes of 100 and 200.
115
Table 4.25
Main Effect Pairwise Comparisons Corresponding to ME × SS In-
teraction of Slope Variances for Relative Bias
Contrast (ME) SS X̄ME=j,SS=k X̄ME=j′,SS=k′ Difference
LD - FI 100 -0.03 -0.10 0.07
LD - MI -0.03 -0.25 0.21
LD - FB -0.03 0.15 -0.18
LD - 2M -0.03 0.07 -0.11
FI - MI -0.10 -0.25 0.15
FI - FB -0.10 0.15 -0.25
FI - 2M -0.10 0.07 -0.17
MI - FB -0.25 0.15 -0.39
MI - 2M -0.25 0.07 -0.32
FB - 2M 0.15 0.07 0.07
LD - FI 200 -0.09 -0.14 0.05
LD - MI -0.09 -0.27 0.18
LD - FB -0.09 0.03 -0.13
LD - 2M -0.09 -0.04 -0.05
FI - MI -0.14 -0.27 0.13
FI - FB -0.14 0.03 -0.17
FI - 2M -0.14 -0.04 -0.10
MI - FB -0.27 0.03 -0.30
MI - 2M -0.27 -0.04 -0.23
FB - 2M 0.03 -0.04 0.07
LD - FI 500 -0.18 -0.19 0.02*
LD - MI -0.18 -0.26 0.08
LD - FB -0.18 -0.06 -0.11
LD - 2M -0.18 -0.14 -0.04
FI - MI -0.19 -0.26 0.07
FI - FB -0.19 -0.06 -0.13
FI - 2M -0.19 -0.14 -0.05
MI - FB -0.26 -0.06 -0.20
MI - 2M -0.26 -0.14 -0.12
FB - 2M -0.06 -0.14 0.08
LD - FI 1000 -0.20 -0.21 0.01*
LD - MI -0.20 -0.22 0.02
LD - FB -0.20 -0.11 -0.09
LD - 2M -0.20 -0.13 -0.07
FI - MI -0.21 -0.22 0.01*
FI - FB -0.21 -0.11 -0.10
FI - 2M -0.21 -0.13 -0.08
MI - FB -0.22 -0.11 -0.11
MI - 2M -0.22 -0.13 -0.09
FB - 2M -0.11 -0.13 0.02
Note. Means are averaged over levels of SP, MR, and MM. All pairwise com-
parisons were significant at the α = 0.05 level unless otherwise noted by a *. All
p-values were adjusted using the Tukey method.
116
Focusing on the estimates of the intercept slope covariance, Table 4.26 shows
both that the extreme relative bias values are corrected by larger sample sizes and
that the differences between the method create smaller differences. The sample size
effect can be seen by the extreme change in relative bias when using FI, for example,
which goes from -59% for sample size 100, to -7% for sample size 1000. In addition,
the greatest differences were between LD and FB or 2M, FI and 2M, and MI and
FI and 2M. For sample size 100, the different between LD and FB is -55%, the
difference between FI and 2M is -28%, and the difference between MI and FB is 55%.
For sample size 1000, these differences become -9%, -14%, and -14%. Examining the
bottom left plot of Figure 4.7, it is clear that sample size 100 created the most
biased estimates of intercept slope covariance. The estimates using the MI method
produced covariances with relative bias values that deviated from all other methods
(positively biased). The FB and 2M methods produced relative bias values closest
to the IW method, even when sample sizes were at 100 and 200, albeit extremely
downwards biased.
117
Table 4.26
Main Effect Pairwise Comparisons Corresponding to ME × SS In-
teraction of Intercept Slope Covariances for Relative Bias
Contrast (ME) SS X̄ME=j,SS=k X̄ME=j′,SS=k′ Difference
LD - FI 100 -1.00 -0.59 -0.41
LD - MI -1.00 0.10 -1.10
LD - FB -1.00 -0.45 -0.55
LD - 2M -1.00 -0.31 -0.69
FI - MI -0.59 0.10 -0.69
FI - FB -0.59 -0.45 -0.14
FI - 2M -0.59 -0.31 -0.28
MI - FB 0.10 -0.45 0.55
MI - 2M 0.10 -0.31 0.42
FB - 2M -0.45 -0.31 -0.14
LD - FI 200 -0.56 -0.33 -0.24
LD - MI -0.56 0.19 -0.75
LD - FB -0.56 -0.27 -0.30
LD - 2M -0.56 -0.05 -0.51
FI - MI -0.33 0.19 -0.52
FI - FB -0.33 -0.27 -0.06
FI - 2M -0.33 -0.05 -0.27
MI - FB 0.19 -0.27 0.46
MI - 2M 0.19 -0.05 0.25
FB - 2M -0.27 -0.05 -0.21
LD - FI 500 -0.15 -0.04 -0.11
LD - MI -0.15 0.15 -0.30
LD - FB -0.15 -0.08 -0.07
LD - 2M -0.15 0.14 -0.29
FI - MI -0.04 0.15 -0.19
FI - FB -0.04 -0.08 0.04*
FI - 2M -0.04 0.14 -0.18
MI - FB 0.15 -0.08 0.23
MI - 2M 0.15 0.14 0.01*
FB - 2M -0.08 0.14 -0.22
LD - FI 1000 -0.07 -0.07 0.00*
LD - MI -0.07 -0.01 -0.06
LD - FB -0.07 0.01 -0.09
LD - 2M -0.07 0.07 -0.14
FI - MI -0.07 -0.01 -0.06
FI - FB -0.07 0.01 -0.08
FI - 2M -0.07 0.07 -0.14
MI - FB -0.01 0.01 -0.02*
MI - 2M -0.01 0.07 -0.08
FB - 2M 0.01 0.07 -0.06
Note. Means are averaged over levels of SP, MR, and MM. All pairwise com-
parisons were significant at the α = 0.05 level unless otherwise noted by a *. All
p-values were adjusted using the Tukey method.
118
Focusing on the relative bias of the proportion parameter, Table 4.27 shows
smaller values of relative bias overall and several comparisons among the methods
showing non-significant differences between the values produced among the differ-
ent methods as the class separation increases. At moderate and high class separa-
tion conditions, these differences become only significant between LD and all other
methods. The ME×SP interaction plot for the proportion parameter presented in
the bottom right panel of Figure 4.7 shows clear differences for vpoor and poor
conditions. Although almost all estimates were within the ±10% acceptable bias
range and many of the differences between MLEM-based methods and Bayesian-
based methods, the Bayesian-based methods (IW, FB, and 2M) still produced the
smallest values of relative bias regardless of sample size. For MLEM-based methods,
the pattern was mostly unclear, and surprisingly, the complete data MLEM method
produced some severely underestimated parameters when the separation was less
than high.
119
Table 4.27
Main Effect Pairwise Comparisons Corresponding to ME × SP In-
teraction of Proportion Parameter for Relative Bias
Contrast (ME) SP X̄ME=j,SS=k X̄ME=j′,SS=k′ Difference
LD - FI vpoor 0.07 0.07 0.00*
LD - MI 0.07 0.06 0.01*
LD - FB 0.07 0.01 0.07
LD - 2M 0.07 0.01 0.07
FI - MI 0.07 0.06 0.01*
FI - FB 0.07 0.01 0.07
FI - 2M 0.07 0.01 0.07
MI - FB 0.06 0.01 0.06
MI - 2M 0.06 0.01 0.06
FB - 2M 0.01 0.01 0.00*
LD - FI poor 0.08 0.04 0.05
LD - MI 0.08 0.05 0.04
LD - FB 0.08 0.00 0.08
LD - 2M 0.08 0.00 0.08
FI - MI 0.04 0.05 -0.01*
FI - FB 0.04 0.00 0.03
FI - 2M 0.04 0.00 0.03
MI - FB 0.05 0.00 0.04
MI - 2M 0.05 0.00 0.04
FB - 2M 0.00 0.00 0.00*
LD - FI mod 0.04 -0.01 0.04
LD - MI 0.04 0.02 0.02*
LD - FB 0.04 0.00 0.04
LD - 2M 0.04 0.00 0.04
FI - MI -0.01 0.02 -0.02
FI - FB -0.01 0.00 -0.01*
FI - 2M -0.01 0.00 -0.01*
MI - FB 0.02 0.00 0.02*
MI - 2M 0.02 0.00 0.02*
FB - 2M 0.00 0.00 0.00*
LD - FI high -0.09 -0.03 -0.07
LD - MI -0.09 -0.03 -0.07
LD - FB -0.09 -0.01 -0.09
LD - 2M -0.09 -0.01 -0.08
FI - MI -0.03 -0.03 0.00*
FI - FB -0.03 -0.01 -0.02*
FI - 2M -0.03 -0.01 -0.02*
MI - FB -0.03 -0.01 -0.02*
MI - 2M -0.03 -0.01 -0.02*
FB - 2M -0.01 -0.01 0.00*
Note. Means are averaged over levels of SS, MR, and MM. All pairwise compar-
isons were significant at the α = 0.05 level unless otherwise noted by a *. All
p-values were adjusted using the Tukey method.
120
The three-way interaction among ME, SS, and MR conditions for the estimates
of the residual variance is presented in Figure 4.8 with values that were used to
produced the plots provided by Table 4.28. To make fair comparisons, the values
and plots of the relative bias for the full data analysis using ML and IW priors
are also provided. The first glaring observation is that larger missing rates produced
more biased estimates across all sample sizes and methods. As expected, the greatest
bias was observed for the estimates from the LD method, where the bias increased to
unacceptable ranges for sample size of 100 at missing data rate of 30% (-10% relative
bias) and 40% (-19% relative bias). Another notable observation is that bias of the
residual variances generally decreased as sample size increased. For example, for the
FI method, at a 40% missing rate, the residual variance went from -5.7% for sample
size 100 to .2% for sample size 1000. the This is made clear by the fact that the
scale of the plots decrease as the sample sizes increase. Overall, once the sample size
reached 200, none of the methods produced estimates outside of the 10% relative
bias criteria except for when the missing data rate was at 30 and 40% and the LD
method was used.
121
Table 4.28
Mean Relative Bias of the Residual Variance Estimate Grouped
by Method, Missing Data Rate, and Sample Size
Method MR N = 100 N = 200 N = 500 N = 1000
IW 0 -0.001 -0.004 -0.001 0.002
MLEM 0 -0.025 -0.011 -0.003 0.003
LD 5 -0.028 -0.013 -0.002 0.001
FI 5 -0.027 -0.014 -0.002 0.001
MI 5 -0.008 -0.008 0.001 0.003
FB 5 -0.003 -0.005 -0.001 0.003
2M 5 0.003 -0.002 0.000 -0.003
LD 10 -0.031 -0.013 -0.005 0.001
FI 10 -0.029 -0.014 -0.003 0.002
MI 10 -0.006 -0.006 0.003 0.004
FB 10 -0.004 -0.006 -0.003 0.002
2M 10 0.007 -0.002 -0.002 -0.001
LD 20 -0.060 -0.023 -0.006 -0.001
FI 20 -0.031 -0.019 -0.005 0.000
MI 20 0.004 0.003 0.004 0.004
FB 20 -0.012 -0.009 -0.005 0.000
2M 20 0.030 0.007 0.004 -0.001
LD 30 -0.100 -0.042 -0.015 -0.007
FI 30 -0.038 -0.025 -0.009 0.000
MI 30 0.014 0.011 0.008 0.005
FB 30 -0.016 -0.015 -0.010 -0.003
2M 30 0.056 0.020 0.007 0.000
LD 40 -0.193 -0.092 -0.035 -0.010
FI 40 -0.057 -0.041 -0.012 -0.002
MI 40 0.035 0.017 0.011 0.008
FB 40 -0.026 -0.024 -0.016 -0.003
2M 40 0.096 0.034 0.009 0.004
122
100 200
   MR    MR
0 0
● 5 ● 5
10 10
20 20
● 30 30
● ●
● 40 ● ● 40
● ●
● ●
MLEM LD FI MI IW FB 2M MLEM LD FI MI IW FB 2M
500 1000
   MR    MR
0 0
● 5 ● 5
10 10
20 20
● 30 ● ● ● ● 30
● ● ● ● ●40 40
MLEM LD FI MI IW FB 2M MLEM LD FI MI IW FB 2M
Figure 4.8. Three-way interaction ME× SS×MR for the relative bias of the residual
variances.
123
Relative Bias Relative Bias
−0.20 −0.15 −0.10 −0.05 0.00 0.05 0.10 −0.20 −0.15 −0.10 −0.05 0.00 0.05 0.10
Relative Bias Relative Bias
−0.20 −0.15 −0.10 −0.05 0.00 0.05 0.10 −0.20 −0.15 −0.10 −0.05 0.00 0.05 0.10
4.2.4 Standard Error Bias
The overall standard errors across method and varying sample sizes can be
seen in the top portion of Table 4.29. These values are also graphically presented
by Figure 4.9. Graphically, the trends look very clear. The MLEM method and all
other methods based on ML were consistently underestimated. The Bayesian-based
methods (IW, FB, and 2M) were consistently overestimated. Although difficult to
tell from the bar plots, the LD method produced even more underestimated standard
errors as sample size increased, producing a mean SE/SD ratio of 0.78 for sample
size 100 and a SE/SD ratio of 0.72 for sample size 1000. This ratio also decreased
when using the FI method, from 0.75 to 0.69. These results were actually consistent
with the complete data MLEM method, which produced a ratio of 0.73 for sample
size 100 and a ratio of 0.67 for sample size 1000. The MI and FB methods actually
approached 1 as sample size increased (from 0.89 to 0.96 and from 1.30 to 1.04,
respectively). The 2M method produced the most overestimated standard errors of
all, and the larger sample size actually increased the standard errors (from 1.50 to
2.1). The overall standard errors across class separation conditions are also provided
on the bottom portion of Table 4.29 and showed a similar trend as those observed
from different sample size conditions.
For each missing data rate, the MI method produced accurate standard errors
as the rate of missing increased, producing SE/SD ratios from 0.72 to 1.123 when
missing rate went from 5% to 40%. The FB method kept standard errors consistent
with SE/SD ratios near 1.15, which were closest to SE/SD ratios produced by the full
124
data analysis using IW, while the LD and FI methods maintained underestimated
standard errors, similar to those produced by the full data MLEM method (at around
0.7 for the means).
Table 4.29
Overall Mean SE/SD Ratios for Missing Data Methods Grouped by Method, Sample
Size (top), and Class Separation (bottom)
N = 100 N = 200 N = 500 N = 1000
Method Mean Median Mean Median Mean Median Mean Median
MLEM 0.733 0.684 0.736 0.656 0.701 0.660 0.669 0.617
IW 1.295 1.255 1.220 1.197 1.130 1.107 1.031 1.030
LD 0.789 0.668 0.772 0.677 0.734 0.665 0.726 0.674
FI 0.751 0.672 0.747 0.639 0.714 0.657 0.694 0.650
MI 0.896 0.889 0.936 0.918 0.946 0.954 0.962 0.958
FB 1.296 1.290 1.249 1.233 1.108 1.082 1.036 1.028
2M 1.503 1.625 1.704 1.821 1.799 1.947 1.946 2.084
vpoor poor mod high
Method Mean Median Mean Median Mean Median Mean Median
MLEM 0.746 0.715 0.709 0.640 0.687 0.625 0.697 0.614
IW 1.149 1.106 1.201 1.177 1.187 1.137 1.138 1.106
LD 0.762 0.677 0.750 0.658 0.756 0.666 0.753 0.686
FI 0.742 0.656 0.719 0.645 0.724 0.651 0.720 0.647
MI 0.956 0.960 0.923 0.910 0.926 0.928 0.935 0.927
FB 1.184 1.165 1.169 1.150 1.192 1.148 1.144 1.105
2M 1.828 1.952 1.765 1.980 1.720 1.906 1.639 1.719
125
N = 100 N = 200
2.5 2.5
2.0 2.0
1.5 1.5
1.0 1.0
0.5 0.5
0.0 0.0
MLEM IW LD FI MI FB 2M MLEM IW LD FI MI FB 2M
N = 500 N = 1000
2.5 2.5
2.0 2.0
1.5 1.5
1.0 1.0
0.5 0.5
0.0 0.0
MLEM IW LD FI MI FB 2M MLEM IW LD FI MI FB 2M
mean median
Figure 4.9. Overall means and medians of the SE/SD ratios by ME
and SS (Missing data methods). The results from the full data analysis
methods are included in the plots as a basis for comparison.
126
SE/SD Ratio
10% Missing 20% Missing
2.0 2.0
1.5 1.5
1.0 1.0
0.5 0.5
0.0 0.0
MLEM IW LD FI MI FB 2M MLEM IW LD FI MI FB 2M
30% Missing 40% Missing
2.0 2.0
1.5 1.5
1.0 1.0
0.5 0.5
0.0 0.0
MLEM IW LD FI MI FB 2M MLEM IW LD FI MI FB 2M
mean median
Figure 4.10. Overall means and medians of the SE/SD ratios by ME
and MR (Missing data methods). The results from the full data analysis
methods are included in the plots as a basis for comparison.
127
SE/SD Ratio
A factorial ANOVA of the standard error bias measures revealed a signifi-
cant two-way ME×MR interaction for the intercept variances, slope variances, and
residual variances, as well as a significant two-way ME×SS interaction for all the
variance measures. With the exception of the residual variances, the majority of the
variability (η2 values between 85% and 93%) was explained by SE/SD differences
in the method that was used, so these will be the focus of discussion next, followed
by a discussion of the interaction effects that were flagged.
Table 4.30
Factorial ANOVA Results for SE/SD Ratios of Estimates of
Means (Missing Data Methods)
µint µC1 int µC2 slop µC1 slopC2
Factor F η2 F η2 F η2 F η2
ME 6319 92.4 2713.4 85.8 4515 91.3 6648.7 92.8
Note. Values in bold are discussed in further detail. MM: missing mech-
anism; MR: missing rate; SP: class separation; SS: sample size; ME:
analysis method.
Table 4.31
Factorial ANOVA Results for SE/SD Ratios of Estimates of
Variances (Missing Data Methods)
Ψint Ψslop Ψ12 Θ
Factor F η2 F η2 F η2 F η2
ME×SS 119.9 9.0 297.3 18.4 408.1 21.2 327.7 53.8
ME×MR 66.3 6.7 94.4 7.8 - - 76 16.6
ME 2382.3 59.8 3130.5 64.7 3542 61.2 - -
Note. Cells containing a “-” indicate non-significance (p-value > 0.05)
and/or effect sizes less than 6%. Values in bold are discussed in further
detail. MM: missing mechanism; MR: missing rate; SP: class separation; SS:
sample size; ME: analysis method.
128
4.2.4.1 Results of the Main Effects
The SE/SD ratios for each method averaged over all conditions for intercept
means and slope means is presented in Table 4.32. With the exception of differences
in SE/SD ratios between the LD and FI methods, all other comparisons between
methods produced significant differences. Similar to what was observed in the overall
summary of the ratios, MLEM-based methods (LD, FI, and MI) produced under-
estimated standard errors (SE/SD ratios between 0.5 and 0.6) while the FB and
2M methods produced overestimated standard errors (SE/SD ratios between 1.23
and 2.46). Main effects of ME observed for the SE estimates of the variance are not
discussed because significant interactions with other conditions were found, which
are discussed next.
Table 4.32
Mean SE/SD Ratios Grouped by Analysis Method
(ME)
ME µint µC1 int µ µC2 slopC1 slopC2
LD† 0.53 0.59 0.59 0.53
FI† 0.51 0.59 0.59 0.50
MI 0.78 0.91 0.96 0.88
FB 1.29 1.28 1.24 1.23
2M 2.00 2.05 2.46 2.33
Note. Means are averaged over levels of SP, SS, MR, and MM.
All pairwise comparisons were significant at the α = 0.05 level
unless otherwise noted. †Differences between LD and FI were
not significant for all parameters. All p-values were adjusted
using the Tukey method.
129
4.2.4.2 Results of the Interaction Effects
To investigate the two-way interaction effects (ME×SS and ME×MR), main
effect pairwise comparisons between the methods were conducted for the differ-
ent levels of the second factor. The results of these comparisons are provided in
Appendix B, Tables B.3 through B.9. No three-way interactions were flagged. In
general, no clear patterns emerged from the pairwise comparisons, although most
comparisons were flagged as significantly different.
To make some comparison visible, the SE/SD ratios for any differences that
were flagged for significant interaction effects were plotted and are presented in
Figure 4.11 to show the ME×SS interactions and Figure 4.12 to show the ME×MR
interactions. In these plots, the SE/SD ratios of the complete data methods (MLEM
and IW) are also provided as a basis of comparison.
Although most comparisons showed significant differences, overall, no clear
patterns emerged and the results corroborated what was previously observed. Mainly,
MLEM-based method (MLEM, LD, FI, MI) tended to underestimate the standard
errors, and Bayesian-based methods (IW, FB, 2M) tended to overestimates stan-
dard errors. This can be observed in Figure 4.11, where the intercept variance, slope
variance and the intercept slope covariance show the points for MLEM, LD, FI, and
MI below the dotted line, and the points for IW, FB, and 2M (with the exception
of the intercept variance and residual variance) above the dotted line.
The 2M method did not display any noticeable pattern across parameters. Al-
though the overall summary revealed that the 2M method generally overestimated
130
Intercept Variance Slope Variance
●
●
   SS    SS
● 100 ● 100
200 200
500 500
1000 1000
●
●
● ●
● ●
●
● ●
● ●
●
MLEM LD FI MI IW FB 2M MLEM LD FI MI IW FB 2M
Intercept Slope Covariance Residual Variance
   SS    SS
● 100 ● 100
200 200
500 500
1000 1000
●
●
●
●
●
●
●
●
●
●
●
● ●
●
MLEM LD FI MI IW FB 2M MLEM LD FI MI IW FB 2M
Figure 4.11. Interaction plots for ME × SS of SE/SD ratios for intercept variance,
slope variance, intercept slope covariance, and residual variance.
131
SE/SD Ratio SE/SD Ratio
0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 0.6 0.8 1.0 1.2
SE/SD Ratio SE/SD Ratio
0.7 0.8 0.9 1.0 1.1 1.2 1.3 0.8 1.0 1.2 1.4 1.6 1.8 2.0
Intercept Variance Slope Variance
●
●    MR    MR
0 0
● 5 ● 5
10 10
20 20
30 30
40 40
●
●
● ● ●
●
● ●
MLEM LD FI MI IW FB 2M MLEM LD FI MI IW FB 2M
Residual Variance
●
   MR
0
● 5
10
20
30
40
● ●
●
●
MLEM LD FI MI IW FB 2M
Figure 4.12. Interaction plots for ME × MR of SE/SD ratios for intercept variance,
slope variance, and residual variance.
132
SE/SD Ratio
0.6 0.7 0.8 0.9 1.0 1.1
SE/SD Ratio
0.9 1.0 1.1 1.2
SE/SD Ratio
1.0 1.2 1.4 1.6 1.8
the estimates of the SEs, the plot of the intercept variance and residual variance for
each sample size revealed the opposite; SEs for the 2M method were consistently
underestimated for the intercept variance and were over or underestimated for the
residual variance depending on the sample size. For example, for the residual vari-
ances, at sample sizes of 100 and 200, the SE/SD ratios for the 2M method were
0.72 and 0.93 (underestimated), respectively, but at sample sizes of 500 and 1000,
these ratios were 1.13 and 1.34 (overestimated), respectively. A similar lack of pat-
tern was observed in Figure 4.12, where the intercept variance for the 2M method
was consistently underestimated, and the residual variance was over or underesti-
mated depending the rate of missing data. To say the least, the estimates of the
standard error for the 2M method were quite difficult to make sense of compared
to other methods because the direction and severity of the under or over estimation
was dependant on the parameter as well as the level of the secondary factor being
compared.
An observation that was questionable was that the lower sample sizes (100
and 200) for the MLEM-based methods seemed to correct for the underestimated
standard errors, because the complete data method (MLEM) was producing un-
derestimated standard errors to begin with, so this correction may have been an
artifact of larger sample sizes inflating the standard errors rather than the methods
producing more accurate estimates of the standard error for smaller sample sizes.
The FB method produced the closest estimates of the standard errors compared to
the complete data method for all parameters.
133
4.3 Summary of Main Findings
This final section of chapter 4 is provided as a way to summarize the results
by enumerating the main findings. The main findings from the simulation study are
as follows:
4.3.1 Full Data Analysis
1. Convergence and classification accuracy rates: Using the full data, Bayesian
inference methods produced higher convergence rates and classification rates.
The inverse Wishart and data-driven prior methods produced the highest rates
of convergence among the Bayesian methods, while the classification accuracy
was similar among the Bayesian methods.
2. Relative bias: Among the different priors that were tested for the Bayesian
method, the improper inverse Wishart priors for the variance parameters pro-
duced overall less biased parameters. In general, the variance parameters like
the slope variance and intercept slope covariance parameters were the most
severely biased when using the MLEM method and Bayesian method using
PW and DD priors. The MLEM method also produced severely biased esti-
mates of the slope means and proportion parameters when the class separation
not high. Based on these observations, the Bayesian method using IW priors
outperformed all other methods for the complete data methods.
3. Standard errors: Overall, standard errors were underestimated for the MLEM
134
and Bayesian inference using the data-driven MLEM-based priors methods.
The improper inverse Wishart and proper inverse Wishart priors produced
overestimated standard errors that decreased and reached near-perfect accu-
racy at sample sizes of 500 and 1000.
4.3.2 Missing Data Analysis
1. Convergence and classification accuracy rates: The convergence rates were
generally smaller across the missing data conditions compared to the conver-
gence rates that were observed from the complete data analysis. However, the
fully Bayesian and two-stage imputation method maintained convergence rates
close to the rates from the full data analysis using IW priors. The listwise dele-
tion and full-information maximum likelihood methods produced even lower
convergence rates than when the full data was used. Classification accuracy
decreased in general as separation decreased. The fully Bayesian approach
maintained the closest accuracy rates to the full data method. Accuracy rates
decreased when the two-stage imputation method was used. Using listwise
deletion, FIML, and single-stage MI had a very minimal impact on the classi-
fication accuracy rates compared the rates produced when using MLEM.
2. Relative bias: Overall, the relative bias of the FB and 2M methods were closest
to the relative bias that was observed when missing data was not a factor. The
LD, FI and single stage MI method produced greater over bias than MLEM,
but this bias decreased as sample size increased. The large relative bias could
135
be attributed to severely biased slope means and severely biased variances
at specific sample sizes and class separation conditions, similar to what was
observed in the full data analysis. The estimates of the variance parameters
created the most bias for most methods, but the the MLEM-based methods
were the most susceptible to the addition of missing data conditions.
3. Standard errors: In general, the MLEM-based methods produced underesti-
mated standard errors by about a factor of 0.5. The FB method produced ac-
curate standard errors with larger samples and slightly overestimated standard
errors when sample sizes decreased and varying missing data rate conditions.
The 2M method produced overestimated standard errors by almost a factor
of 2 across all conditions.
136
Chapter 5: Discussion
The results from the analysis without any of the missing data conditions was as
expected in that the ML method was more susceptible to non-convergence than the
Bayesian method with different priors. The use of data-driven priors produced lower
convergence rates than when noninformative improper priors and proper priors were
used, but this was expected because the priors had to be obtained from an initial
ML run. The data-driven priors were still slightly higher than the results from the
ML method because the convergence criteria for obtaining the priors from the ML
estimates were more relaxed. For example, it was not of concern if the standard
errors from the ML estimation were very large. These types of results were omitted
from the ML results. The higher convergence rates observed from the noninformative
and proper prior methods were expected because ranges were not being restricted
on the posterior distributions of the parameters by means of any specific prior.
The classification accuracy rates did improve with increased sample sizes and
increased class separation, and the Bayesian methods did improve the accuracy
rates slightly over ML-EM (approximately 2-12% increase). These accuracy rates
are related to the estimates of the proportion and variance parameters, which were
shown to be biased for the ML-EM method. If the class proportion was incorrect,
137
and variance parameters were biased, then the susceptibility of individuals being
placed into the wrong class would be greater.
In general, bias across parameters was found to be greatest when using the
MLEM method, especially for the variance estimates, which were generally under-
estimated. This was actually consistent with the growth modeling literature, which
has pointed to the issue that even with complete data, some maximum likelihood
methods under a growth modeling SEM framework will underestimate factor vari-
ances and standard errors (Browne & Draper, 2006), even though large sample sizes
should correct for these discrepancies. In this regard, Browne and Draper (2006) and
also McNeish and Harring (2017), who conducted a similar study with a focus on
samples of less than 100, recommend using a restricted version of the ML method to
obtain unbiased parameter estimates. This finding was also partially supported by
the results presented in Depaoli (2013), which show that slope variances and some-
times the intercept and intercept slope variances were consistently underestimated
when using the standard ML-EM method (the default in Mplus) was used.
The discrepancies found among the Bayesian method using different prior spec-
ifications was different than what was found in the literature. In general, the litera-
ture on choosing appropriate priors in the context of GMMs is still mainly unclear,
and the results that were observed from the full data analysis were in some way
a testament to this. In general, the relative bias of the parameters were the least
biased when noninformative (improper inverse Wishart) priors were used, which was
partially contrary to what Depaoli (2013) and McNeish (2016)1 found. Actually, the
1Prior specifications tested were for small sample sizes (less than 100) in the context of latent
138
superior performance of the noninformative inverse Wishart (diffuse) priors were
more in line with suggestions by Browne and Draper (2006), although the priors
tested in that particular study were in the context of longitudinal multilevel mod-
els. In any case, the purpose of the current study was not to find the priors that
produced the least biased parameter estimates, so using the noninformative default
Mplus priors was justifiably good enough for the purposes of the current study. In
hindsight, however, although unrealistic, a better approach would have been to use
the population values to inform the hyperparameters so that any effects of missing
data on GMM estimation could be isolated.
Overall standard errors were underestimated for the MLEM method, gener-
ally overestimated for the noninformative (improper inverse Wishart) and proper
identity inverse Wishart prior method, and generally underestimated for the data-
driven prior method. These results were actually consistent with what was observed
in McNeish (2016), where the Bayesian MCMC methods using noninformative pri-
ors produced overly wide coverage intervals (overestimated standard errors) while
FIML consistently had coverage intervals that were too short (underestimated stan-
dard errors). The reverse impact that the priors had on the standard errors of the
intercept means was interesting (standard error ratios converged to 1 as sample sizes
increased using noninformative priors, while they deviated upwards from 1 as sample
sizes increased using data driven priors). In addition, standard errors were severely
underestimated for the intercept means when using data driven priors, which may
growth models, which may not have been appropriate for larger samples and in mixture settings,
where the number of parameters to account for is greater.
139
have been an artifact of the priors specified for the variance parameters since priors
were not specified for any of the means.
The purpose of analyzing the full-data methods using different prior specifica-
tions was to find an appropriate basis of comparison for the missing data methods,
which were grounded on MLEM (LD, FI, ML) or Bayesian inference (FB, 2M). Un-
fortunately, this investigation was not exhaustive or perfect in its execution, and a
more thorough investigation of different priors would have been ideal. However, since
this was not the main focus of the study, the analysis of the missing data methods
was conducted in light of the results that were obtained from testing these suggested
prior specifications, which in the end, prompted the use of the noninformative priors
for the Bayesian methods.
The accuracy rates across methods when missing data conditions were intro-
duced were mainly indistinguishable for very poor class separations. The FB method,
however, was clearly producing the highest accuracy rates with poor, moderate, and
high class separation conditions while the other methods produced similar accuracy
rates. The lower accuracy rates that were observed for the 2M method were some-
what surprising given that the parameter estimates were quite comparable to the FB
method. In retrospect, this may have been an artifact of how these accuracy rates
were collected for the 2M method during the simulation, which was different from
how accuracy rates were collected for non-imputation methods. This observation
warrants a separate investigation.
With the introduction of missing data conditions, there was a downward bias
of the parameter estimates across methods with increased missing data rates. This
140
bias was only attributed to a few parameters, however, as could be seen in the
large discrepancies between the means and medians of the relative bias measures.
The variable-by-variable comparison indicated severe problems when estimating the
intercept variances and slope variances at sample sizes of 100 and 200. While the
FB and 2M methods were able to keep the bias controlled with the introduction of
some missing data conditions, LD, FI, and MI methods further biased the estimates.
Furthermore, the standard errors were consistently overestimated for the 2M
method, while in general, the FB method produced SE to SD ratios closest to 1,
albeit overestimated standard errors were observed when sample sizes were at 100
and 200. The LD, MI, and FI methods, which were grounded on FIML-EM, produced
underestimated standard errors.
LD and MI methods performed relatively poorly compared to FI, FB and 2M,
which was consistent with Enders and Gottschall (2011), who suggested avoiding MI
for mixture models altogether. Based on research by Harel et al. (2013), however,
2M seemed to be a viable alternative to MI.
Another important point that should not go ignored is that the differences
when imposing an MCAR as supposed to a CMAR+ missingness mechanism was
minimal, and whether the missing data approaches did anything to improve estima-
tion when missingness was the more severe CMAR+ missingness was not very clear
from the results. With multiple group models, Enders and Gottschall (2011) as well
as Sterba (2016) suggested that the missingness mechanism was inconsequential to
how poorly the MI method would perform. In this regard, it made sense that there
was very little difference observed between the estimates from the two missingness
141
mechanisms. However, these results are far from conclusive because the MLEM esti-
mates were already producing poor estimates to begin with, making the MI method
incomparable to the 2M method.
5.1 Recommendations
Based on the findings from the simulation study, a few recommendations can
be made. First, the Bayesian inference and two-stage imputation methods may serve
as viable alternatives to MLEM and single-stage MI in terms of producing less bi-
ased parameter estimates for GMMs. However, this comparison warrants further
investigation given that the full-data analysis step of the single-stage MI method
was conducted using MLEM, which produced relatively more biased estimates com-
pared to the Bayesian method. Although choosing priors can be an intricate step,
the advantages in terms of producing higher convergence rates, higher classification
accuracy rates, less biased parameter estimates, and more accurate standard errors
(for the Bayesian approach) may be worth the additional analysis. One disadvantage
of using the two-stage imputation method is that although the tools are available in
Mplus, the setup is more involved and requires the researcher to take additional steps
outside of Mplus. The steps that were taken to conduct the two-stage imputation
method for this study are outlined in Appendix C.
Even with the use of noninformative flat priors, the fully Bayesian approach
was superior in regards to the evaluation criteria that were considered, especially
with the estimation of the standard errors. While the two-stage imputation was
142
relatively better than single-stage imputation and FIML in terms of recovering pa-
rameter estimates, the standard errors were consistently overestimated, which in
practical applications would have led to inflated Type II error rates.
Nonetheless, these recommendations should be taken with caution because
several problems arose during the simulation study, which could have compromised
some of the results. In addition, there were several factors of the study that kept
the results from this study from being generalized to all kinds of situations. These
issues and other related limitations are discussed next.
5.2 Limitations and Future Studies
The first part of the simulation study was an attempt to find a useful, non-
biasing, and realistic set of parameter priors. This precursory analysis was far from
a comprehensive sensitivity analysis that may have been useful. Defining priors
through a sensitivity analysis is a popular topic of interest in Bayesian literature
because there are several ways to define priors. Three prior specifications found
in the literature were used in the current study, one of them being the use of the
MLEM estimates for the data-driven method, which a popular way of defining priors.
Although unrealistic, it may have been best to forgo this step and use the population
values as priors in order to isolate the effects of missing data conditions alone.
However, this also poses the problem that the results from the Bayesian methods
would then produce overly favorable results, perhaps enough to damper any effects
of the missing data conditions. In this regard, using the most naive approaches for
143
both (using the IW priors, which was the default Mplus setting) may have been
appropriate. Nevertheless, a useful future study would be to conduct a sensitivity
analysis using various prior specification methods that are currently available in the
context GMMs with missing data.
The method of choosing priors becomes even murkier when implementing a
multiple imputation approach. For example, during the study it became evident
that the choice existed between using MLEM and Bayesian inference for analyzing
the multiply imputed data. This choice of analysis presented itself at both stages of
the imputation for two-stage imputation, and at the first stage for the single-stage
imputation method. After imputation, there was the additional choice of analyzing
the datasets with imputed data using MLEM or Bayesian inference. For the current
study, the default ML method was used for single-stage MI after the imputation step.
Unfortunately, this will have produced incomparable results between the single-stage
and two-stage imputation. This was an obvious limitation and would be a topic to
investigate in the near future.
Using Mplus for the simulation presented problems due to convergence issues.
For example, for numerous replications, the estimates of the variances needed to be
monitored because Mplus would terminate successfully without flagging any nega-
tive variances. For some replications, Mplus even allowed extremely large standard
errors to go unnoticed, so results from such replications needed to be filtered out
of the results. Convergence issues were even more problematic when running the
multiple imputation methods, where it became unclear as to when and how many
of the estimations from the imputed datasets produced sensical results. To ensure
144
that all estimations for imputations converged and produced good estimated values,
extra imputations were created and estimated until the desired number of imputa-
tions per replication were obtained. The same procedure was used for the two-stage
imputation method, where additional imputations at both stages were created and
estimated until the total number of desired imputations were collected.
The type of model that was tested was also a limitation in itself. Some other
conditions of the model that are typically manipulated but were fixed for the current
study are: total number of classes, varying class proportions, total number of data
collection points (timepoints), structure of the intervals between timepoints, struc-
ture of the variance components of the model, the inclusion of time-invarying/time-
varying covariates, and the overall shape of the model. For example, simulation
studies involving GMMs have regularly tested assumptions and data conditions with
more complex models such as higher order models like quadratic growth models or
piece-wise models (see, e.g., Ning & Luo, 2017; Wu, Zumbo, & Siegel, 2011), oth-
erwise known as knot models (see, e.g., Kohli, 2011). These models are often more
complex and the effect of missing data on these types of GMMs has gone largely
unexplored.
The GMM model that was tested in this study was also highly restrictive in the
sense that between class random effects were kept invariant, which has been noted
to be highly untenable in practice (Bauer & Curran, 2003). Setting random effects
to be invariant across classes is actually an Mplus default, which has been set as so
in order to lessen the computational burdens of estimation, given that any specific
invariances across class variance parameters may not be of interest to most applied
145
researchers. In contrast, more restrictive models, like the latent class growth model
(Nagin, 1999), which does not impose a variance covariance structure of the growth
parameters, is another popular approach for when individual growth trajectories
are not of particular interest. An investigation on how these models would perform
under missing data conditions could be another future study.
The number of timepoints in the current study was also fixed at 4 and these
timepoints were also fixed to have the same distance across timepoints. This type of
data collection is rarely the norm, as the number of measurements and the intervals
when these measurements are collected at each timepoint often vary. This data
design often leads to different time-residual variance covariance structures that need
to be addressed. Simulation studies involving longitudinal models, however, have
shown that imputation methods differ when the times between timepoints vary.
Several variations of the JM approach and FCS approach have been developed for
this purpose and investigating these methods under GMMs would be a useful study.
Another crucial step of general finite mixture modeling that was completely
overlooked in the study was the class enumeration process, which is in itself an
intricate part of the methodology. This process involves comparing the fit indices of
models with a varying number of classes, in an effort to identify the correct number
of subpopulations. This is an important step because as Bauer and Curran (2003)
demonstrated, even data from a homogeneous population may exhibit the presence
of multiple classes when the population is non-normal. Bauer (2007) also enumerated
several violations of the model that can adversely affect class enumeration, including
the presence of missing data that is non-ignorable. This research was mainly using
146
MLEM and fit indices available for MLEM, but fit indices for Bayesian methods
are different (see, e.g., Ning & Luo, 2017) and the literature is unclear on how
these indices perform when different missing data conditions plague the data. In
addition, there are no established procedures for evaluating fit when using multiple
imputation, making the class enumeration process even more unclear for imputation
methods.
The number of imputations for the two-stage imputation method is also un-
clear. McGinniss and Harel (2016) conducted a simulation for a three stage model,
but only on the means. How the estimates of the variances are affected by the num-
ber of first and second stage imputations is still unclear. This hole in the literature
leaves the question of whether or not estimates of the variances would have been
better if more imputations were done at each stage. In addition, it is unclear if these
suggested number of imputations are affected by different class separations.
Finally, several useful missing data handling methods like pattern mixture
models and predictive mean matching were not considered in the current study. For
example, Bauer and Curran (2003) suggested using a pattern mixture model for
GMMs when any missing data are assumed to be MAR. There is also a growing lit-
erature on the use of non-parametric tree imputation techniques using classification
and regression trees (Breiman, 2017) or random forest models (Ho, 1995). These
alternative methods for handling missing data in the growth mixture context could
be useful but the extent of their usefulness would need to be further investigated.
Although the popularity of GMMs has grown in the last decade, the same
cannot be said about methods to handle missing data, despite the reality that most
147
realistic datasets are rife of missing data. This study was nowhere exhaustive of
all the possible scenarios that one might encounter when dealing with missing data
with GMMs. However, the main goal was not to be exhaustive, but rather, to inves-
tigate a few conditions that were previously considered in similar studies, and more
importantly, to begin a discussion on better ways one might be able to deal with
missing data in the context of GMMs.
148
Appendix A: Complete Data Methods Relative Bias and SE/SD Ra-
tio Tables
149
150
Table A.1
Relative Bias for All Parameters by All Conditions Using Full Data Methods
N = 100 N = 200 N = 500 N = 1000
Method Variable vpoor poor mod high vpoor poor mod high vpoor poor mod high vpoor poor mod high
MLEM µint 0.10 0.07 0.05 0.03 0.10 0.08 0.06 0.02 0.06 0.04 0.02 -0.01 0.06 0.04 0.01 -0.01C1
IW 0.03 0.01 -0.01 -0.03 0.03 0.01 -0.01 -0.02 0.02 0.01 -0.01 -0.02 0.03 0.01 -0.01 -0.02
PW 0.05 0.03 0.01 0.00 0.05 0.03 0.01 0.00 0.04 0.02 0.00 -0.01 0.03 0.02 0.00 -0.01
DD 0.03 0.02 0.00 -0.02 0.03 0.02 0.00 -0.01 0.02 0.01 -0.01 -0.02 0.02 0.00 -0.02 -0.02
MLEM µint -0.05 -0.04 -0.01 0.01 -0.04 -0.03 0.00 0.01 -0.03 -0.02 0.00 0.02 -0.03 -0.03 0.00 0.03C2
IW -0.03 -0.01 0.01 0.04 -0.02 -0.01 0.02 0.03 -0.02 0.00 0.02 0.03 -0.02 0.00 0.02 0.02
PW -0.04 -0.03 -0.01 0.01 -0.04 -0.03 -0.01 0.01 -0.03 -0.02 0.00 0.01 -0.03 -0.02 0.01 0.01
DD -0.09 -0.05 -0.03 0.00 -0.17 -0.07 -0.08 -0.02 -0.14 -0.20 -0.11 -0.02 -0.26 -0.25 -0.20 -0.05
MLEM µslop -0.38 -0.33 -0.34 -0.32 -0.32 -0.27 -0.23 -0.15 -0.30 -0.23 -0.16 -0.13 -0.28 -0.22 -0.20 -0.15C1
IW -0.11 -0.05 -0.05 -0.06 -0.07 -0.02 -0.03 -0.02 -0.05 -0.01 -0.02 -0.02 -0.07 -0.01 0.00 -0.01
PW -0.11 -0.07 -0.07 -0.07 -0.08 -0.04 -0.07 -0.07 -0.08 -0.07 -0.06 -0.07 -0.09 -0.06 -0.07 -0.06
DD -0.14 -0.09 -0.11 -0.11 -0.08 -0.07 -0.06 -0.06 -0.07 -0.01 -0.05 -0.04 -0.05 0.00 -0.02 -0.02
MLEM µslop 0.19 0.24 0.29 0.35 0.15 0.26 0.24 0.24 0.22 0.30 0.25 0.23 0.12 0.24 0.24 0.25C2
IW 0.10 0.17 0.16 0.15 0.07 0.14 0.14 0.14 0.05 0.11 0.13 0.13 0.05 0.11 0.12 0.13
PW 0.08 0.17 0.17 0.17 0.08 0.15 0.17 0.17 0.07 0.16 0.16 0.16 0.08 0.16 0.18 0.16
DD 0.11 0.17 0.19 0.19 0.07 0.16 0.15 0.16 0.06 0.13 0.15 0.14 0.07 0.13 0.16 0.14
MLEM Ψint -0.26 -0.23 -0.09 0.15 -0.25 -0.21 -0.02 0.17 -0.18 -0.14 0.02 0.20 -0.15 -0.11 0.09 0.28
IW -0.06 -0.02 0.17 0.47 -0.14 -0.07 0.14 0.37 -0.12 -0.08 0.12 0.32 -0.15 -0.06 0.14 0.29
PW -0.40 -0.36 -0.27 -0.08 -0.38 -0.36 -0.21 -0.03 -0.29 -0.26 -0.09 0.06 -0.25 -0.18 0.00 0.13
DD -0.19 -0.16 0.04 0.23 -0.20 -0.15 0.01 0.21 -0.16 -0.10 0.07 0.24 -0.12 -0.05 0.17 0.26
MLEM Ψslop -0.16 -0.13 -0.16 -0.22 -0.16 -0.19 -0.16 -0.18 -0.25 -0.16 -0.17 -0.21 -0.23 -0.16 -0.19 -0.23
IW 0.01 0.03 0.04 0.01 -0.06 0.00 -0.04 -0.04 -0.07 -0.02 -0.07 -0.06 -0.11 -0.05 -0.06 -0.07
PW -0.13 -0.16 -0.17 -0.21 -0.15 -0.14 -0.22 -0.24 -0.18 -0.19 -0.19 -0.19 -0.20 -0.19 -0.21 -0.19
DD -0.18 -0.15 -0.21 -0.19 -0.10 -0.12 -0.13 -0.16 -0.12 -0.09 -0.15 -0.12 -0.14 -0.08 -0.13 -0.12
MLEM Ψ12 -0.09 -0.31 -0.48 -0.57 0.18 0.08 -0.45 -0.70 0.32 0.07 -0.36 -0.40 0.29 0.00 -0.32 -0.42
IW 0.29 -0.01 -0.34 -0.62 0.46 0.12 -0.25 -0.51 0.37 0.10 -0.20 -0.37 0.49 0.13 -0.28 -0.38
PW 0.85 0.78 0.67 0.64 0.87 0.83 0.66 0.53 0.84 0.73 0.43 0.41 0.88 0.59 0.35 0.24
DD 0.31 0.22 0.01 -0.14 0.37 0.16 -0.07 -0.13 0.46 0.13 -0.12 -0.20 0.38 0.06 -0.27 -0.20
MLEM Θ -0.04 -0.02 -0.02 -0.02 -0.01 -0.01 -0.01 -0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01
IW -0.01 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
PW 0.01 0.02 0.03 0.03 0.01 0.01 0.02 0.02 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
DD -0.01 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MLEM πc1 -0.19 -0.14 -0.11 -0.02 -0.24 -0.20 -0.21 -0.08 -0.10 -0.08 -0.04 -0.01 -0.17 -0.07 -0.06 -0.01
IW -0.01 0.00 -0.01 -0.01 -0.02 -0.01 -0.03 -0.01 -0.02 -0.02 -0.02 -0.01 -0.03 -0.03 -0.02 -0.01
PW -0.02 -0.02 -0.02 -0.01 -0.02 -0.03 -0.04 -0.02 -0.03 -0.02 -0.03 -0.02 -0.01 -0.02 -0.01 -0.02
DD 0.05 0.03 0.02 0.00 0.11 0.02 0.05 0.01 0.11 0.18 0.09 0.02 0.23 0.23 0.21 0.07
MLEM πc2 0.19 0.14 0.11 0.02 0.24 0.20 0.21 0.08 0.10 0.08 0.04 0.01 0.17 0.07 0.06 0.01
IW 0.01 0.00 0.01 0.01 0.02 0.01 0.03 0.01 0.02 0.02 0.02 0.01 0.03 0.03 0.02 0.01
PW 0.02 0.02 0.02 0.01 0.02 0.03 0.04 0.02 0.03 0.02 0.03 0.02 0.01 0.02 0.01 0.02
DD -0.05 -0.03 -0.02 0.00 -0.11 -0.02 -0.05 -0.01 -0.11 -0.18 -0.09 -0.02 -0.23 -0.23 -0.21 -0.07
Note. MLEM = full data ML, IW = full data Bayesian with improper IW priors, PW = full data Bayesian with proper IW priors, DD = full data Bayesian with data-driven
priors. vpoor is MD = 0.5, poor is MD = 1.0, mod is MD = 0.5, high is MD = 0.5.
151
Table A.2
SE/SD Ratios for All Parameters by All Conditions Using Full Data Methods
N = 100 N = 200 N = 500 N = 1000
Method Variable vpoor poor mod high vpoor poor mod high vpoor poor mod high vpoor poor mod high
MLEM µintc1 0.50 0.47 0.47 0.51 0.52 0.48 0.48 0.48 0.51 0.52 0.50 0.56 0.45 0.49 0.46 0.47
IW 1.36 1.55 1.41 1.41 1.05 1.20 1.17 1.30 1.02 1.09 1.14 1.02 0.90 1.22 0.99 0.78
PW 0.96 0.96 1.05 0.93 0.83 0.95 1.01 0.92 0.93 0.83 0.83 0.83 0.85 0.99 0.84 0.74
DD 1.07 1.13 1.15 1.12 1.12 1.02 1.07 1.00 1.34 1.51 1.14 1.05 1.62 1.59 1.70 0.98
MLEM µintc2 0.68 0.55 0.58 0.78 0.73 0.70 0.65 0.61 0.74 0.66 0.64 0.67 0.67 0.55 0.58 0.71
IW 1.54 1.56 1.66 1.78 1.47 1.40 1.49 1.55 1.33 1.21 1.33 1.13 1.05 1.38 1.10 0.82
PW 1.13 1.15 1.17 1.21 1.13 1.21 1.14 1.00 1.02 0.97 0.92 0.84 1.08 0.87 0.94 0.78
DD 0.13 0.18 0.20 0.24 0.10 0.12 0.13 0.17 0.09 0.09 0.10 0.15 0.09 0.09 0.10 0.12
MLEM µslop 0.49 0.50 0.51 0.54 0.62 0.59 0.59 0.59 0.59 0.62 0.57 0.55 0.54 0.54 0.58 0.61c1
IW 1.31 1.52 1.35 1.26 1.28 1.49 1.13 1.24 1.27 1.12 1.13 1.26 1.07 1.26 1.17 1.14
PW 1.07 1.00 1.03 1.09 1.01 1.10 0.98 0.97 1.07 1.00 1.06 0.88 1.07 0.93 1.09 0.77
DD 1.13 0.97 1.05 0.91 0.96 0.92 0.91 0.94 0.95 0.79 0.79 0.92 0.82 0.82 0.77 0.81
MLEM µslop 0.53 0.56 0.52 0.57 0.67 0.59 0.52 0.50 0.51 0.53 0.48 0.49 0.73 0.59 0.61 0.56c2
IW 1.15 1.25 1.46 1.43 1.32 1.21 1.55 1.25 1.29 1.47 1.40 1.12 1.06 1.23 0.94 1.07
PW 1.09 0.97 1.17 0.97 1.26 1.21 1.25 1.23 1.20 1.07 1.12 0.94 0.94 0.99 1.00 0.83
DD 1.01 1.02 0.90 1.21 1.04 1.05 1.11 1.15 1.37 0.99 0.99 1.16 1.06 1.07 0.93 0.76
MLEM Ψint 0.75 0.71 0.69 0.60 0.66 0.60 0.64 0.59 0.70 0.66 0.59 0.54 0.62 0.59 0.59 0.49
IW 1.12 1.21 1.40 1.49 1.00 1.13 1.24 1.26 1.10 0.99 1.08 0.95 1.05 1.06 0.99 0.72
PW 0.98 1.06 0.94 0.99 0.84 0.94 0.96 0.87 0.85 0.82 0.79 0.73 0.88 0.80 0.79 0.68
DD 0.84 0.79 0.92 1.00 0.70 0.78 0.90 0.84 0.90 0.87 0.74 0.87 0.79 0.69 0.83 0.67
MLEM Ψslop 1.01 0.92 1.03 1.09 0.95 0.89 0.85 0.90 0.86 0.91 0.73 0.75 0.73 0.73 0.69 0.75
IW 1.24 1.15 1.23 1.16 1.23 1.07 1.14 1.08 1.21 1.07 1.23 1.22 0.97 1.15 1.01 0.98
PW 1.20 1.29 1.12 1.16 1.15 1.29 1.22 1.31 1.14 1.19 1.25 1.11 1.02 0.89 0.99 0.76
DD 0.88 0.84 0.93 0.93 0.85 0.77 0.83 0.82 0.99 0.71 0.67 0.87 0.70 0.66 0.59 0.62
MLEM Ψ12 1.04 0.99 0.87 0.81 1.07 0.99 1.00 0.88 1.06 0.88 0.77 0.81 0.86 0.78 0.71 0.67
IW 1.17 1.21 1.12 1.09 1.19 1.15 1.08 1.07 1.08 1.01 1.06 0.91 0.90 1.05 0.98 0.88
PW 1.12 1.06 1.10 1.09 1.13 1.43 1.27 1.00 1.14 1.11 0.87 0.88 1.06 0.79 0.83 0.71
DD 1.25 1.25 1.08 1.05 1.15 1.18 1.05 1.06 1.07 0.98 0.77 0.98 0.85 0.76 0.80 0.72
MLEM Θ 1.11 1.04 1.02 1.02 1.04 1.02 1.07 1.09 0.96 0.99 1.00 1.07 0.99 1.05 0.98 1.04
IW 0.97 1.00 0.94 0.95 1.08 1.08 1.04 1.10 0.98 0.97 0.97 0.97 1.00 1.00 1.02 1.04
PW 0.96 0.97 0.95 0.95 1.05 1.13 1.09 1.06 0.99 0.97 0.95 0.97 0.99 1.02 1.02 1.05
DD 0.99 0.98 1.01 0.99 1.06 1.06 1.00 0.99 0.99 1.01 0.99 0.99 0.96 0.98 0.98 1.04
Note. MLEM = full data ML, IW = full data Bayesian with improper IW priors, PW = full data Bayesian with proper IW priors, DD = full data Bayesian with data-driven
priors. vpoor is MD = 0.5, poor is MD = 1.0, mod is MD = 0.5, high is MD = 0.5.
Appendix B: Missing Data Methods SE/SD Ratios and Pairwise Com-
parison Tables
152
Table B.1
SE/SD Ratios For Mean Estimates Grouped by ME
and SS (Missing Data Methods)
Method Variable N = 100 N = 200 N = 500 N = 100
MLEM µint 0.488 0.490 0.522 0.488C1
IW 1.432 1.181 1.067 1.432
LD 0.514 0.537 0.559 0.514
FI 0.507 0.517 0.582 0.507
MI 0.684 0.732 0.841 0.684
FB 1.425 1.359 1.239 1.425
2M 1.852 1.947 2.099 1.852
MLEM µint 0.648 0.674 0.677 0.648C2
IW 1.636 1.476 1.252 1.636
LD 0.532 0.607 0.580 0.532
FI 0.592 0.581 0.593 0.592
MI 0.872 0.881 0.954 0.872
FB 1.475 1.377 1.163 1.475
2M 1.939 2.210 1.973 1.939
MLEM µslop 0.509 0.595 0.584 0.509C1
IW 1.360 1.285 1.196 1.360
LD 0.554 0.577 0.624 0.554
FI 0.519 0.617 0.654 0.519
MI 0.779 0.945 1.049 0.779
FB 1.398 1.198 1.231 1.398
2M 2.269 2.335 2.551 2.269
MLEM µslop 0.546 0.568 0.501 0.546C1
IW 1.321 1.333 1.321 1.321
LD 0.557 0.541 0.513 0.557
FI 0.520 0.492 0.476 0.520
MI 0.816 0.897 0.877 0.816
FB 1.376 1.413 1.108 1.376
2M 2.157 2.437 2.230 2.157
153
Table B.2
SE/SD Ratios For Variance Estimates Grouped ME
and SS (Missing Data Methods)
Method Variable N = 100 N = 200 N = 500 N = 100
MLEM Ψint 0.685 0.625 0.624 0.685
IW 1.303 1.158 1.029 1.303
LD 0.890 0.733 0.646 0.890
FI 0.700 0.652 0.608 0.700
MI 0.819 0.802 0.809 0.819
FB 1.277 1.205 1.048 1.277
2M 0.527 0.595 0.623 0.527
MLEM Ψslop 1.014 0.897 0.813 1.014
IW 1.196 1.131 1.183 1.196
LD 1.077 1.010 0.914 1.077
FI 1.020 0.983 0.860 1.020
MI 1.184 1.118 1.011 1.184
FB 1.235 1.151 1.007 1.235
2M 1.347 1.599 1.821 1.347
MLEM Ψ12 0.926 0.986 0.883 0.926
IW 1.145 1.124 1.016 1.145
LD 1.153 1.107 1.002 1.153
FI 1.105 1.077 0.920 1.105
MI 0.934 1.017 0.991 0.934
FB 1.194 1.223 1.062 1.194
2M 1.213 1.580 1.961 1.213
MLEM Θ 1.048 1.053 1.006 1.048
IW 0.967 1.075 0.974 0.967
LD 1.034 1.066 1.035 1.034
FI 1.043 1.058 1.022 1.043
MI 1.083 1.092 1.035 1.083
FB 0.986 1.064 1.009 0.986
2M 0.719 0.932 1.131 0.719
154
Table B.3
Main Effect Pairwise Comparisons Corresponding to ME × SS In-
teraction of Intercept Variances for SE/SD Ratios
Contrast (ME) SS X̄ME=j,SS=k X̄ME=j′,SS=k′ Difference
LD - FI 100 0.97 0.71 0.26
LD - MI 0.97 0.82 0.15
LD - FB 0.97 1.28 -0.31
LD - 2M 0.97 0.53 0.44
FI - MI 0.71 0.82 -0.11
FI - FB 0.71 1.28 -0.56
FI - 2M 0.71 0.53 0.19
MI - FB 0.82 1.28 -0.46
MI - 2M 0.82 0.53 0.29
FB - 2M 1.28 0.53 0.75
LD - FI 200 0.76 0.66 0.11
LD - MI 0.76 0.80 -0.04
LD - FB 0.76 1.20 -0.44
LD - 2M 0.76 0.59 0.17
FI - MI 0.66 0.80 -0.14
FI - FB 0.66 1.20 -0.55
FI - 2M 0.66 0.59 0.06
MI - FB 0.80 1.20 -0.40
MI - 2M 0.80 0.59 0.21
FB - 2M 1.20 0.59 0.61
LD - FI 500 0.66 0.61 0.05
LD - MI 0.66 0.81 -0.15
LD - FB 0.66 1.05 -0.39
LD - 2M 0.66 0.62 0.03
FI - MI 0.61 0.81 -0.20
FI - FB 0.61 1.05 -0.44
FI - 2M* 0.61 0.62 -0.01
MI - FB 0.81 1.05 -0.24
MI - 2M 0.81 0.62 0.19
FB - 2M 1.05 0.62 0.43
LD - FI 1000 0.63 0.59 0.04
LD - MI 0.63 0.85 -0.22
LD - FB 0.63 0.97 -0.34
LD - 2M* 0.63 0.66 -0.02
FI - MI 0.59 0.85 -0.26
FI - FB 0.59 0.97 -0.38
FI - 2M 0.59 0.66 -0.07
MI - FB 0.85 0.97 -0.11
MI - 2M 0.85 0.66 0.20
FB - 2M 0.97 0.66 0.31
Note. Means are averaged over levels of SP, MR, and MM. All pairwise com-
parisons were significant at the α = 0.05 level unless otherwise noted by a *. All
p-values were adjusted using the Tukey method.
155
Table B.4
Main Effect Pairwise Comparisons Corresponding to ME × SS In-
teraction of Slope Variances for SE/SD Ratios
Contrast (ME) SS X̄ME=j,SS=k X̄ME=j′,SS=k′ Difference
LD - FI 100 1.12 1.05 0.07
LD - MI 1.12 1.18 -0.06
LD - FB 1.12 1.23 -0.11
LD - 2M 1.12 1.35 -0.22
FI - MI 1.05 1.18 -0.13
FI - FB 1.05 1.23 -0.19
FI - 2M 1.05 1.35 -0.30
MI - FB 1.18 1.23 -0.05
MI - 2M 1.18 1.35 -0.16
FB - 2M 1.23 1.35 -0.11
LD - FI* 200 1.04 1.00 0.03
LD - MI 1.04 1.12 -0.08
LD - FB 1.04 1.15 -0.11
LD - 2M 1.04 1.60 -0.56
FI - MI 1.00 1.12 -0.11
FI - FB 1.00 1.15 -0.15
FI - 2M 1.00 1.60 -0.59
MI - FB* 1.12 1.15 -0.03
MI - 2M 1.12 1.60 -0.48
FB - 2M 1.15 1.60 -0.45
LD - FI 500 0.92 0.86 0.06
LD - MI 0.92 1.01 -0.10
LD - FB 0.92 1.01 -0.09
LD - 2M 0.92 1.82 -0.91
FI - MI 0.86 1.01 -0.15
FI - FB 0.86 1.01 -0.15
FI - 2M 0.86 1.82 -0.96
MI - FB* 1.01 1.01 0.00
MI - 2M 1.01 1.82 -0.81
FB - 2M 1.01 1.82 -0.81
LD - FI 1000 0.82 0.77 0.05
LD - MI 0.82 0.99 -0.17
LD - FB 0.82 0.96 -0.13
LD - 2M 0.82 2.11 -1.28
FI - MI 0.77 0.99 -0.22
FI - FB 0.77 0.96 -0.19
FI - 2M 0.77 2.11 -1.34
MI - FB* 0.99 0.96 0.03
MI - 2M 0.99 2.11 -1.12
FB - 2M 0.96 2.11 -1.15
Note. Means are averaged over levels of SP, MR, and MM. All pairwise com-
parisons were significant at the α = 0.05 level unless otherwise noted by a *. All
p-values were adjusted using the Tukey method.
156
Table B.5
Main Effect Pairwise Comparisons Corresponding to ME × SS In-
teraction of Intercept Slope Covariances for SE/SD Ratios
Contrast (ME) SS X̄ME=j,SS=k X̄ME=j′,SS=k′ Difference
LD - FI 100 1.09 1.02 0.07
LD - MI 1.09 0.93 0.16
LD - FB 1.09 1.19 -0.10
LD - 2M 1.09 1.21 -0.12
FI - MI 1.02 0.93 0.08
FI - FB 1.02 1.19 -0.18
FI - 2M 1.02 1.21 -0.20
MI - FB 0.93 1.19 -0.26
MI - 2M 0.93 1.21 -0.28
FB - 2M 1.19 1.21 -0.02
LD - FI 200 1.07 1.04 0.03
LD - MI 1.07 1.02 0.05
LD - FB 1.07 1.22 -0.15
LD - 2M 1.07 1.58 -0.51
FI - MI 1.04 1.02 0.02
FI - FB 1.04 1.22 -0.19
FI - 2M 1.04 1.58 -0.54
MI - FB 1.02 1.22 -0.21
MI - 2M 1.02 1.58 -0.56
FB - 2M 1.22 1.58 -0.36
LD - FI 500 0.98 0.92 0.06
LD - MI 0.98 0.99 -0.01
LD - FB 0.98 1.06 -0.08
LD - 2M 0.98 1.96 -0.98
FI - MI 0.92 0.99 -0.07
FI - FB 0.92 1.06 -0.14
FI - 2M 0.92 1.96 -1.04
MI - FB 0.99 1.06 -0.07
MI - 2M 0.99 1.96 -0.97
FB - 2M 1.06 1.96 -0.90
LD - FI 1000 0.93 0.87 0.06
LD - MI 0.93 1.02 -0.09
LD - FB 0.93 0.98 -0.06
LD - 2M 0.93 2.17 -1.24
FI - MI 0.87 1.02 -0.15
FI - FB 0.87 0.98 -0.12
FI - 2M 0.87 2.17 -1.30
MI - FB 1.02 0.98 0.03
MI - 2M 1.02 2.17 -1.15
FB - 2M 0.98 2.17 -1.18
Note. Means are averaged over levels of SP, MR, and MM. All pairwise com-
parisons were significant at the α = 0.05 level unless otherwise noted by a *. All
p-values were adjusted using the Tukey method.
157
Table B.6
Main Effect Pairwise Comparisons Corresponding to ME × SS In-
teraction of Residual Variances for SE/SD Ratios
Contrast (ME) SS X̄ME=j,SS=k X̄ME=j′,SS=k′ Difference
LD - FI* 100 1.07 1.06 0.01
LD - MI* 1.07 1.08 -0.01
LD - FB* 1.07 0.99 0.08
LD - 2M 1.07 0.72 0.35
FI - MI* 1.06 1.08 -0.02
FI - FB 1.06 0.99 0.07
FI - 2M 1.06 0.72 0.34
MI - FB 1.08 0.99 0.10
MI - 2M 1.08 0.72 0.36
FB - 2M 0.99 0.72 0.27
LD - FI* 200 1.09 1.08 0.01
LD - MI* 1.09 1.09 -0.01
LD - FB* 1.09 1.06 0.02
LD - 2M 1.09 0.93 0.15
FI - MI* 1.08 1.09 -0.02
FI - FB* 1.08 1.06 0.01
FI - 2M 1.08 0.93 0.14
MI - FB* 1.09 1.06 0.03
MI - 2M 1.09 0.93 0.16
FB - 2M 1.06 0.93 0.13
LD - FI* 500 1.03 1.02 0.01
LD - MI* 1.03 1.03 0.00
LD - FB* 1.03 1.01 0.02
LD - 2M 1.03 1.13 -0.10
FI - MI* 1.02 1.03 -0.01
FI - FB* 1.02 1.01 0.02
FI - 2M 1.02 1.13 -0.11
MI - FB* 1.03 1.01 0.03
MI - 2M 1.03 1.13 -0.10
FB - 2M 1.01 1.13 -0.12
LD - FI* 1000 1.01 1.01 0.00
LD - MI* 1.01 1.01 0.00
LD - FB* 1.01 0.99 0.02
LD - 2M 1.01 1.34 -0.33
FI - MI* 1.01 1.01 0.00
FI - FB 1.01 0.99 0.02
FI - 2M 1.01 1.34 -0.33
MI - FB* 1.01 0.99 0.02
MI - 2M 1.01 1.34 -0.33
FB - 2M 0.99 1.34 -0.35
Note. Means are averaged over levels of SP, MR, and MM. All pairwise com-
parisons were significant at the α = 0.05 level unless otherwise noted by a *. All
p-values were adjusted using the Tukey method.
158
Table B.7
Main Effect Pairwise Comparisons Corresponding to ME × MR In-
teraction of Intercept Variances for SE/SD Ratios
Contrast (ME) MR X̄ME=j,MR=k X̄ME=j′,MR=k′ Difference
LD - FI* 5 0.63 0.63 0.00
LD - MI* 0.63 0.67 -0.04
LD - FB 0.63 1.11 -0.48
LD - 2M* 0.63 0.63 0.00
FI - MI* 0.63 0.67 -0.04
FI - FB 0.63 1.11 -0.48
FI - 2M* 0.63 0.63 0.00
MI - FB 0.67 1.11 -0.45
MI - 2M* 0.67 0.63 0.04
FB - 2M 1.11 0.63 0.48
LD - FI* 10 0.65 0.63 0.03
LD - MI 0.65 0.71 -0.06
LD - FB 0.65 1.09 -0.44
LD - 2M 0.65 0.61 0.05
FI - MI 0.63 0.71 -0.09
FI - FB 0.63 1.09 -0.47
FI - 2M* 0.63 0.61 0.02
MI - FB 0.71 1.09 -0.38
MI - 2M 0.71 0.61 0.10
FB - 2M 1.09 0.61 0.48
LD - FI 20 0.71 0.63 0.08
LD - MI 0.71 0.81 -0.10
LD - FB 0.71 1.15 -0.44
LD - 2M 0.71 0.60 0.11
FI - MI 0.63 0.81 -0.18
FI - FB 0.63 1.15 -0.52
FI - 2M* 0.63 0.60 0.03
MI - FB 0.81 1.15 -0.34
MI - 2M 0.81 0.60 0.21
FB - 2M 1.15 0.60 0.55
LD - FI 30 0.81 0.65 0.15
LD - MI 0.81 0.90 -0.09
LD - FB 0.81 1.13 -0.33
LD - 2M 0.81 0.59 0.22
FI - MI 0.65 0.90 -0.25
FI - FB 0.65 1.13 -0.48
FI - 2M 0.65 0.59 0.06
MI - FB 0.90 1.13 -0.23
MI - 2M 0.90 0.59 0.31
FB - 2M 1.13 0.59 0.54
LD - FI 40 0.98 0.67 0.30
LD - MI 0.98 1.02 -0.04
LD - FB 0.98 1.14 -0.16
LD - 2M 0.98 0.57 0.40
FI - MI 0.67 1.02 -0.34
FI - FB 0.67 1.14 -0.46
FI - 2M 0.67 0.57 0.10
MI - FB 1.02 1.14 -0.12
MI - 2M 1.02 0.57 0.45
FB - 2M 1.14 0.57 0.57
Note. Means are averaged over levels of SP, SS, and MM. All pairwise comparisons
were significant at the α = 0.05 level unless otherwise noted by a *. All p-values
were adjusted using the Tukey method.
159
Table B.8
Main Effect Pairwise Comparisons Corresponding to ME × MR In-
teraction of Slope Variances for SE/SD Ratios
Contrast (ME) MR X̄ME=j,MR=k X̄ME=j′,MR=k′ Difference
LD - FI* 5 0.87 0.88 -0.01
LD - MI* 0.87 0.89 -0.02
LD - FB 0.87 1.07 -0.20
LD - 2M 0.87 1.91 -1.04
FI - MI* 0.88 0.89 -0.02
FI - FB 0.88 1.07 -0.20
FI - 2M 0.88 1.91 -1.03
MI - FB 0.89 1.07 -0.18
MI - 2M 0.89 1.91 -1.01
FB - 2M 1.07 1.91 -0.83
LD - FI* 10 0.88 0.88 0.00
LD - MI 0.88 0.96 -0.08
LD - FB 0.88 1.09 -0.21
LD - 2M 0.88 1.84 -0.96
FI - MI 0.88 0.96 -0.08
FI - FB 0.88 1.09 -0.21
FI - 2M 0.88 1.84 -0.97
MI - FB 0.96 1.09 -0.13
MI - 2M 0.96 1.84 -0.89
FB - 2M 1.09 1.84 -0.76
LD - FI* 20 0.95 0.91 0.04
LD - MI 0.95 1.09 -0.14
LD - FB 0.95 1.08 -0.13
LD - 2M 0.95 1.74 -0.79
FI - MI 0.91 1.09 -0.18
FI - FB 0.91 1.08 -0.16
FI - 2M 0.91 1.74 -0.83
MI - FB 1.09 1.08 0.02
MI - 2M 1.09 1.74 -0.65
FB - 2M 1.08 1.74 -0.67
LD - FI 30 1.04 0.95 0.10
LD - MI 1.04 1.17 -0.13
LD - FB* 1.04 1.08 -0.04
LD - 2M 1.04 1.61 -0.57
FI - MI 0.95 1.17 -0.23
FI - FB 0.95 1.08 -0.14
FI - 2M 0.95 1.61 -0.66
MI - FB 1.17 1.08 0.09
MI - 2M 1.17 1.61 -0.43
FB - 2M 1.08 1.61 -0.53
LD - FI 40 1.14 1.00 0.14
LD - MI 1.14 1.26 -0.12
LD - FB* 1.14 1.12 0.02
LD - 2M 1.14 1.49 -0.35
FI - MI 1.00 1.26 -0.26
FI - FB 1.00 1.12 -0.13
FI - 2M 1.00 1.49 -0.50
MI - FB 1.26 1.12 0.14
MI - 2M 1.26 1.49 -0.23
FB - 2M 1.12 1.49 -0.37
Note. Means are averaged over levels of SP, SS, and MM. All pairwise comparisons
were significant at the α = 0.05 level unless otherwise noted by a *. All p-values
were adjusted using the Tukey method.
160
Table B.9
Main Effect Pairwise Comparisons Corresponding to ME × MR In-
teraction of Residual Variances for SE/SD Ratios
Contrast (ME) MR X̄ME=j,MR=k X̄ME=j′,MR=k′ Difference
LD - FI* 5 1.04 1.03 0.01
LD - MI* 1.04 1.04 0.00
LD - FB* 1.04 1.01 0.03
LD - 2M 1.04 1.19 -0.15
FI - MI* 1.03 1.04 -0.01
FI - FB* 1.03 1.01 0.02
FI - 2M 1.03 1.19 -0.16
MI - FB* 1.04 1.01 0.03
MI - 2M 1.04 1.19 -0.16
FB - 2M 1.01 1.19 -0.18
LD - FI* 10 1.03 1.04 -0.01
LD - MI* 1.03 1.04 -0.01
LD - FB* 1.03 1.01 0.02
LD - 2M 1.03 1.14 -0.11
FI - MI* 1.04 1.04 0.00
FI - FB* 1.04 1.01 0.03
FI - 2M 1.04 1.14 -0.10
MI - FB* 1.04 1.01 0.03
MI - 2M 1.04 1.14 -0.10
FB - 2M 1.01 1.14 -0.13
LD - FI* 20 1.06 1.05 0.01
LD - MI* 1.06 1.05 0.01
LD - FB 1.06 1.01 0.05
LD - 2M* 1.06 1.04 0.02
FI - MI* 1.05 1.05 0.00
FI - FB 1.05 1.01 0.03
FI - 2M* 1.05 1.04 0.00
MI - FB 1.05 1.01 0.04
MI - 2M* 1.05 1.04 0.01
FB - 2M* 1.01 1.04 -0.03
LD - FI* 30 1.06 1.04 0.01
LD - MI* 1.06 1.07 -0.01
LD - FB 1.06 1.01 0.05
LD - 2M 1.06 0.94 0.12
FI - MI* 1.04 1.07 -0.03
FI - FB 1.04 1.01 0.03
FI - 2M 1.04 0.94 0.10
MI - FB 1.07 1.01 0.06
MI - 2M 1.07 0.94 0.13
FB - 2M 1.01 0.94 0.07
LD - FI* 40 1.06 1.05 0.01
LD - MI* 1.06 1.07 -0.02
LD - FB 1.06 1.01 0.04
LD - 2M 1.06 0.83 0.23
FI - MI* 1.05 1.07 -0.03
FI - FB 1.05 1.01 0.04
FI - 2M 1.05 0.83 0.22
MI - FB 1.07 1.01 0.06
MI - 2M 1.07 0.83 0.25
FB - 2M 1.01 0.83 0.18
Note. Means are averaged over levels of SP, SS, and MM. All pairwise comparisons
were significant at the α = 0.05 level unless otherwise noted by a *. All p-values
were adjusted using the Tukey method.
161
Appendix C: Code to Implement the Two-Stage Imputation Method
in Mplus
The following steps are provided as guidance to implement the two-stage im-
putation method for GMMs using Mplus. Note that this procedure is illustrated
using a two class model GMM, but the procedure can be generalized to more than
two classes as well as other types of mixture models. Assuming M first stage im-
putations (classes) and N second stage imputations (manifest variables), the steps
are:
1. Run the first stage imputation code:
TITLE: GMM Stage 1 MI for 2-stg MI (2M)
DATA:
FILE=data.dat;
VARIABLE:
NAMES = id y1-y4;
USEV = y1-y4;
MISSING = ALL (-99);
CLASSES = c(2);
IDVARIABLE = id;
ANALYSIS:
TYPE = MIXTURE;
ESTIMATOR = BAYES;
PROCESSORS = 1;
CHAINS = 1;
STVALUES = ML;
MODEL:
%overall%
[y1-y4@0];
i s | y1@0 y2@1 y3@2 y4@3;
y1-y4;
i(vi);
s(vs);
i WITH s;
%c#1%
[i](i1);
[s](s1);
162
%c#2%
[i](i2);
[s](s2);
MODEL CONSTRAINT:
i1 > i2;
s2 > s1;
SAVEDATA:
FILE = stg1probs.dat;
SAVE = fscores(30 1000);
OUTPUT:
TECH8;
In this code, M = 30 and can modified by changing the first value of the line
SAVE = fscores(30 1000);
2. Extract the latent class plausible values from the stg1probs.dat file. This file
should contain M sets of imputed class variables.
3. Take one set of plausible class values (m) out of the M sets of imputed class
values and create a new data file that includes the original data with missing
values and a single column for the plausible class values.
4. Separate the classes to different data files so that the next set of imputations
are conditional on the same class. In other words, for each dataset m, create
separate datasets for each class. For each group dataset, use the following
Mplus code to multiply impute N sets of the missing values:
TITLE: Stage 2 GROUP 1 or 2 2-stg MI (2M)
DATA:
FILE = stg2_group1_data.dat;
VARIABLE:
NAMES = id y1-y4 g;
USEV = y1-y4;
MISSING = ALL (-99);
IDVARIABLE = id;
AUXILIARY = g;
DATA IMPUTATION:
IMPUTE = y1-y4;
Ndatasets = 10;
SAVE = stg2_g1_imp*.dat;
THIN = 1000;
ANALYSIS:
TYPE = BASIC;
163
This should create N × 2 fully imputed datasets. In this code, N = 10 and
can be modified by changing the value of
Ndatasets = 10;
Combine the corresponding datasets back together (group 1 imputed data and
group 2 imputed data), resulting in a total of N fully imputed datasets for
dataset m.
5. Run a multiple-group (two class model with known classes) growth mixed
effects model for each dataset in N using the following code:
TITLE:
Stage 2 combined MI with complete data (2M)
DATA:
FILE = stg2_g1g2Combined_data.dat;
VARIABLE:
NAMES = y1-y4 g id;
USEV = y1-y4 g;
CLASSES = c(2);
KNOWNCLASS = c(g = 1 g = 2);
IDVARIABLE = id;
ANALYSIS:
TYPE = MIXTURE;
ESTIMATOR = BAYES;
PROCESSORS = 1;
CHAINS = 1;
MODEL:
%overall%
[y1-y4@0];
i s | y1@0 y2@1 y3@2 y4@3;
y1-y4(yerr);
i;
s;
i WITH s(v);
%c#1%
[i](i1);
[s](s1);
%c#2%
[i](i2);
[s](s2);
MODEL CONSTRAINT:
i1 > i2;
s2 > s1;
6. Repeat steps 3 to 5 for each imputed class variable m and for each imputed
dataset n.
164
7. Combine parameter estimates by averaging across all M by N results.
8. Combine standard errors by following rules outlined in Section 2.4.4.
165
Bibliography
Anderson, T. W. (1957). Maximum likelihood estimates for a multivariate normal
distribution when some observations are missing. Journal of the American
Statistical Association, 52 (278), 200–203.
Anderson, T. W., & Bahadur, R. R. (1962). Classification into two multivariate normal
distributions with different covariance matrices. The Annals of Mathematical
Statistics , 33 (2), 420–431. doi: 10.1214/aoms/1177704568
Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data.
In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation
modeling (pp. 243–278). Hillsdale, NJ: Lawrence Erlbaum.
Asparouhov, T., & Muthén, B. O. (2010). Bayesian analysis using Mplus: Technical
implementation. Manuscript submitted for publication.
Bauer, D. J. (2007). Observations on the use of growth mixture models in
psychological research. Multivariate Behavioral Research, 42 (4), 757–786.
Bauer, D. J., & Curran, P. J. (2003). Distributional assumptions of growth mixture
models: implications for overextraction of latent trajectory classes. Psychological
Methods , 8 (3), 338–363.
Beale, E. M. L., & Little, R. J. A. (1975). Missing values in multivariate analysis.
Journal of the Royal Statistical Society. Series B (Methodological), 37 (1),
129–145.
Breiman, L. (2017). Classification and regression trees. New York: Routledge.
166
Brown, R. L. (1994). Efficacy of the indirect approach for estimating structural
equation models with missing data: A comparison of five methods. Structural
Equation Modeling: A Multidisciplinary Journal , 1 (4), 287–316. doi:
10.1080/10705519409539983
Browne, W. J., & Draper, D. (2006). A comparison of Bayesian and likelihood-based
methods for fitting multilevel models. Bayesian Analysis , 1 (3), 473–514. doi:
10.1214/06-BA117
Buck, S. F. (1960). A method of estimation of missing values in multivariate data
suitable for use with an electronic computer. Journal of the Royal Statistical
Society , Series B(22), 302–307.
Cai, J.-H., Song, X.-Y., & Hser, Y.-I. (2010). A Bayesian analysis of mixture structural
equation models with non-ignorable missing responses and covariates. Statistics
in Medicine, 29 (18), 1861–1874.
Celeux, G., Hurn, M., & Robert, C. P. (2000). Computational and inferential
difficulties with mixture posterior distributions. Journal of the American
Statistical Association, 95 (451), 957–970.
Chen, F. (2011). The RANDOM statement and more: Moving on with PROC MCMC.
In SAS Global Forum 2011 Conference. Cary, NC: SAS Institute, Inc.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ:
Erlbaum.
Colder, C. R., Campbell, R. T., Ruel, E., Richardson, J. L., & Flay, B. R. (2002). A
finite mixture model of growth trajectories of adolescent alcohol use: Predictors
167
and consequences. Journal of Consulting and Clinical Psychology , 70 (4),
976–985.
Collins, L. M., Schafer, J. L., & Kam, C. M. (2001). A comparison of inclusive and
restrictive strategies in modern missing data procedures. Psychological Methods ,
6 (4), 330–351.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to
nonnormality and specification error in confirmatory factor analysis.
Psychological Methods , 16–29.
de Leeuw, E. D., Hox, J. J., & Huisman, M. (2003). Prevention and treatment of item
nonresponse. Journal of Official Statistics , 19 (2), 153–176.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from
incomplete data via the EM algorithm. Journal of the Royal Statistical Society.
Series B (Methodological), 39 (1), 1–38.
Depaoli, S. (2013). Mixture class recovery in GMM under varying degrees of class
separation: Frequentist versus Bayesian estimation. Psychological Methods , 18 (2),
186–219.
Diebolt, J., & Robert, C. P. (1994). Estimation of finite mixture distributions through
Bayesian sampling. Journal of the Royal Statistical Society. Series B
(Methodological), 56 (2), 363–375.
Diggle, P., & Kenward, M. G. (1994). Informative drop-out in longitudinal data
analysis. Journal of the Royal Statistical Society. Series C (Applied Statistics),
43 (1), 49–93.
168
Enders, C. K. (2010). Applied missing data analysis. New York, NY: The Guilford
Press.
Enders, C. K., & Gottschall, A. C. (2011). Multiple imputation strategies for multiple
group structural equation models. Structural Equation Modeling: A
Multidisciplinary Journal , 18 (1), 35–54.
Enders, C. K., & Tofighi, D. (2008). The impact of misspecifying class-specific residual
variances in growth mixture models. Structural Equation Modeling: A
Multidisciplinary Journal , 15 (1), 75–95.
Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of
estimation for confirmatory factor analysis with ordinal data. Psychological
Methods , 9 (4), 466–491. doi: 10.1037/1082-989X.9.4.466
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995). Bayesian data analysis.
Chapman and Hall.
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple
sequences. Statistical Science, 7 (4), 457–472.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the
Bayesian restoration of images. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 6 , 721–741. doi: 10.1109/TPAMI.1984.4767596
Glynn, R. J., Laird, N. M., & Rubin, D. B. (1986). Selection modeling versus mixture
modeling with nonignorable nonresponse. In Drawing inferences from self-selected
samples (pp. 115–142). New York, NY: Springer.
Gold, M. S., & Bentler, P. M. (2000). Treatments of missing data: A Monte Carlo
169
comparison of RBHDI, iterative stochastic regression imputation, and
expectation-maximization. Structural Equation Modeling: A Multidisciplinary
Journal , 7 (3), 319–355.
Graham, J. W. (2003). Adding missing-data-relevant variables to FIML-based
structural equation models. Structural Equation Modeling: A Multidisciplinary
Journal , 10 (1), 80–100.
Graham, J. W. (2009). Missing data analysis: Making it work in the real world.
Annual Review of Psychology , 60 (1), 549–576.
Graham, J. W., Taylor, B. J., & Cumsille, P. E. (2001). Planned missing-data designs
in analysis of change. In New methods for the analysis of change (pp. 335–353).
Washington, DC, US: American Psychological Association. doi:
10.1037/10409-011
Harel, O. (2003). Strategies for data analysis with two types of missing values
(Unpublished doctoral dissertation). Pennsylvania State University, State
College, PA.
Harel, O. (2007). Inferences on missing information under multiple imputation and
two-stage multiple imputation. Statistical Methodology , 4 (1), 75–89.
Harel, O. (2009). The estimation of R2 and adjusted R2 in incomplete data sets using
multiple imputation. Journal of Applied Statistics , 36 (10), 1109–1118.
Harel, O., Chung, H., & Miglioretti, D. (2013). Latent class regression: Inference and
estimation with two-stage multiple imputation: Two-stage MI for LCR.
Biometrical Journal , 55 (4), 541–553.
170
Harel, O., & Schafer, J. L. (2002). Multiple imputation in two stages. In ASA
Proceedings of the Joint Statistical Meetings. New York, NY.
Harel, O., & Schafer, J. L. (2009). Partial and latent ignorability in missing-data
problems. Biometrika, 96 (1), 37–50.
Harring, J. R. (2012). Finite mixtures of nonlinear mixed effects models. In
J. R. Harring & G. R. Hancock (Eds.), Advances in longitudinal methods in the
social and behavioral sciences. Charlotte, NC: Information Age Publishing, Inc.
Hedeker, D., & Gibbons, R. D. (1997). Application of random-effects pattern-mixture
models for missing data in longitudinal studies. Psychological Methods , 2 (1),
64–78. doi: 10.1037/1082-989X.2.1.64
Henson, J. M., Reise, S. P., & Kim, K. H. (2007). Detecting mixtures from structural
model differences using latent variable mixture modeling: A comparison of
relative model fit statistics. Structural Equation Modeling: A Multidisciplinary
Journal , 14 (2), 202–226.
Hesser, H., Hedman, E., Lindfors, P., Andersson, E., & Ljótsson, B. (2017). The
specific effect of systematic exposure in irritable bowel syndrome: complier
average causal effect analysis using growth mixture modeling. Psychological
Medicine, 47 (15), 2653–2662.
Hipp, J. R., & Bauer, D. J. (2006). Local solutions in the estimation of growth mixture
models. Psychological Methods , 11 (1), 36–53.
Ho, T. K. (1995). Random decision forests. In Proceedings of the third international
conference on document analysis and recognition (Vol. Volume 1, pp. 278–282).
171
Washington, DC, USA: IEEE Computer Society.
Horton, N. J., & Lipsitz, S. R. (2001). Multiple Imputation in Practice. The American
Statistician, 55 (3), 244–254. doi: 10.1198/000313001317098266
Hughes, R. A., White, I. R., Seaman, S. R., Carpenter, J. R., Tilling, K., & Sterne,
J. A. C. (2014). Joint modelling rationale for chained equations. BMC Medical
Research Methodology , 14 , 28. doi: 10.1186/1471-2288-14-28
Huque, M. H., Carlin, J. B., Simpson, J. A., & Lee, K. J. (2018). A comparison of
multiple imputation methods for missing data in longitudinal studies. BMC
Medical Research Methodology , 18 (1), 168. doi: 10.1186/s12874-018-0615-6
Kalton, G. (1983). Introduction to survey sampling. Beverly Hills, CA: Sage
Publications.
Kaplan, D. (1989). A study of the sampling variability and z-values of parameter
estimates from misspecified structural equation models. Multivariate Behavioral
Research, 24 (1), 41–57. doi: 10.1207/s15327906mbr2401 3
Kaplan, D. (2002). Methodological advances in the analysis of individual growth with
relevance to education policy. Peabody Journal of Education, 77 (4), 189–215.
Kish, L. (1965). Sampling organizations and groups of unequal sizes. American
sociological review , 30 (4), 564–572.
Kohli, N. (2011). Estimating unknown knots in piecewise linear-linear latent growth
mixture models (Unpublished doctoral dissertation). University of Maryland,
College Park, MD.
Koning, M., Hoekstra, T., de Jong, E., Visscher, T. L. S., Seidell, J. C., & Renders,
172
C. M. (2016). Identifying developmental trajectories of body mass index in
childhood using latent class growth (mixture) modelling: Associations with
dietary, sedentary and physical activity behaviors: a longitudinal study. BMC
Public Health, 16 (1), 1128.
Lee, D. Y., Harring, J. R., & Stapleton, L. M. (in press). Accounting for respondent
attrition in the longitudinal modeling of panel data. Journal of Experimental
Education.
Lee, S.-Y. (2007). Structural equation modeling: A Bayesian approach. Chichester,
England: Wiley.
Lee, S.-Y., & Song, X.-Y. (2003). Model comparison of nonlinear structural equation
models with fixed covariates. Psychometrika, 68 (1), 27–47. doi:
10.1007/BF02296651
Lee, S.-Y., & Tang, N.-S. (2006). Analysis of nonlinear structural equation models with
nonignorable missing covariates and ordered categorical data. Statistica Sinica,
16 (4), 1117–1141.
Li, M., Chen, N., Cui, Y., & Liu, H. (2017). Comparison of different LGM-based
methods with MAR and MNAR dropout data. Frontiers in Psychology , 8 . doi:
10.3389/fpsyg.2017.00722
Li, M., Harring, J. R., & Macready, G. B. (2014). Investigating the feasibility of using
Mplus in the estimation of growth mixture models. Journal of Modern Applied
Statistical Methods , 13 (1), 31.
Lin, M., Narayan, V., Drevets, W. C., Ye, J., & Li, Q. (2017). Application of growth
173
mixture modeling in antidepressant treatment response studies. Biological
Psychiatry , 81 (10), 224.
Little, R. J. A. (1992). Regression with missing X’s: A review. Journal of the
American Statistical Association, 87 (420), 1227–1237.
Little, R. J. A. (1993). Pattern-mixture models for multivariate incomplete data.
Journal of the American Statistical Association, 88 (421), 125–134.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data. John
Wiley & Sons.
Liu, M., Hancock, G. R., & Harring, J. R. (2011). Using finite mixture modeling to
deal with systematic measurement error: A case study. Journal of Modern
Applied Statistical Methods , 10 (1), 249–261.
Lord, F. M. (1955). Estimation of parameters from incomplete data. Journal of the
American Statistical Association, 50 (271), 870–876.
Lu, Z. L., Zhang, Z., & Lubke, G. (2011). Bayesian inference for growth mixture
models with latent class dependent missing data. Multivariate Behavioral
Research, 46 (4), 567–597.
Lubke, G., & Muthén, B. O. (2007). Performance of factor mixture models as a
function of model size, covariate effects, and class-specific parameters. Structural
Equation Modeling: A Multidisciplinary Journal , 14 (1), 26–47.
Lubke, G., & Neale, M. C. (2006). Distinguishing between latent classes and
continuous factors: Resolution by maximum likelihood? Multivariate Behavioral
Research, 41 (4), 499–532.
174
Lunn, D., Jackson, C., Best, N., Thomas, A., & Spiegelhalter, D. (2013). The BUGS
book. Boca Raton, FL: CRC Press.
Maddala, G. S. (1983). Limited dependent and qualitative variables in econometrics.
New York: Cambridge University Press.
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the
National Institute of Science of India, 12 , 49–55.
McArdle, J. J. (1986). Latent variable growth within behavior genetic models.
Behavior Genetics , 16 (1), 163–200.
McGinniss, J., & Harel, O. (2016). Multiple imputation in three or more stages.
Journal of Statistical Planning and Inference, 176 , 33–51. doi:
10.1016/j.jspi.2016.04.001
McLachlan, G., & Krishnan, T. (2007). The EM algorithm and extensions (Vol. 382).
John Wiley & Sons.
McLachlan, G., & Peel, D. (2000). Finite mixture models. Hoboken, NJ: Wiley.
McNeish, D. (2016). Using data-dependent priors to mitigate small sample bias in
latent growth models: A discussion and illustration using Mplus. Journal of
Educational and Behavioral Statistics , 41 (1), 27–56. doi:
10.3102/1076998615621299
McNeish, D. (2017). Missing data methods for arbitrary missingness with small
samples. Journal of Applied Statistics , 44 (1), 24–39. doi:
10.1080/02664763.2016.1158246
McNeish, D., & Harring, J. R. (2017). The effect of model misspecification on growth
175
mixture model class enumeration. Journal of Classification, 34 (2), 223–248.
McNeish, D., & Matta, T. (2018). Differentiating between mixed-effects and
latent-curve approaches to growth modeling. Behavior Research Methods , 50 (4),
1398–1414. doi: 10.3758/s13428-017-0976-5
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55 (1),
107–122.
Mistler, S. A., & Enders, C. K. (2017). A comparison of joint model and fully
conditional specification imputation for multilevel missing data. Journal of
Educational and Behavioral Statistics , 42 (4), 432–466. doi:
10.3102/1076998617690869
Morris, T. P., White, I. R., & Royston, P. (2014). Tuning multiple imputation by
predictive mean matching and local residual draws. BMC Medical Research
Methodology , 14 (1), 75. doi: 10.1186/1471-2288-14-75
Musu-Gillette, L. E., Wigfield, A., Harring, J. R., & Eccles, J. S. (2015). Trajectories
of change in students’ self-concepts of ability and values in math and college
major choice. Educational Research and Evaluation, 21 (4), 343–370.
Muthén, B. O. (2001). Latent variable mixture modeling. In G. A. Marcoulides &
R. E. Schumacker (Eds.), New developments and techniques in structural equation
modeling (pp. 1–33). Hillsdale, NJ: Lawrence Erlbaum Associates.
Muthén, B. O. (2004). Latent variable analysis: Growth mixture modeling and related
techniques for longitudinal data. The SAGE Handbook of Quantitative
Methodology for the Social Sciences , 345 , 368.
176
Muthén, B. O., & Curran, P. J. (1997). General longitudinal modeling of individual
differences in experimental designs: A latent variable framework for analysis and
power estimation. Psychological Methods , 2 (4), 371–402.
Muthén, B. O., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with
data that are not missing completely at random. Psychometrika, 52 , 431–462.
Muthén, B. O., & Muthén, L. K. (2000). Integrating person-centered and
variable-centered analyses: Growth mixture modeling with latent trajectory
classes. Alcoholism: Clinical and Experimental Research, 24 (6), 882–891.
Muthén, B. O., Muthén, L. K., & Asparouhov, T. (2010). Bayesian analysis using
Mplus.
Muthén, B. O., & Shedden, K. (1999). Finite mixture modeling with mixture outcomes
using the EM algorithm. Biometrics , 55 (2), 463–469.
Muthén, L. K., & Muthén, B. O. (1998). Mplus user’s guide (Eigth ed.). Los Angeles,
CA: Muthén & Muthén.
Nagin, D. S. (1999). Analyzing developmental trajectories: a semiparametric,
group-based approach. Psychological Methods , 4 (2), 139.
Ning, L., & Luo, W. (2017). Class identification efficacy in piecewise GMM with
unknown turning points. The Journal of Experimental Education, 86 (2),
282–307. doi: 10.1080/00220973.2017.1301354
Nylund, K. L., Asparouhov, T., & Muthén, B. (2007). Deciding on the number of
classes in latent class analysis and growth mixture modeling: A Monte Carlo
simulation study. Structural Equation Modeling: A Multidisciplinary Journal ,
177
14 (4), 535–569.
Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: A review
of reporting practices and suggestions for improvement. Review of Educational
Research, 74 (4), 525–556.
Peugh, J. L., & Fan, X. (2012). How well does growth mixture modeling identify
heterogeneous growth trajectories? A simulation study examining GMM’s
performance characteristics. Structural Equation Modeling: A Multidisciplinary
Journal , 19 (2), 204–226.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63 (3), 581.
Rubin, D. B. (1986). Statistical matching using file concatenation with adjusted
weights and multiple imputations. Journal of Business and Economic Statistics ,
4 (1), 87–94. doi: 10.2307/1391390
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY:
Wiley.
Rubin, D. B. (2003). Nested multiple imputation of NMES via partially incompatible
MCMC. Statistica Neerlandica, 57 (1), 3–18.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art.
Psychological Methods , 7 (2), 147–177.
Schafer, J. L., & Yucel, R. M. (2002). Computational strategies for multivariate linear
mixed-effects models with missing values. Journal of Computational and
Graphical Statistics , 11 (2), 437–457. doi: 10.1198/106186002760180608
Scheuren, F. (2005). Multiple Imputation: How it began and continues. The American
178
Statistician, 59 (4), 315–319.
Shen, Z. (2000). Nested multiple imputation (Unpublished doctoral dissertation).
Harvard University, Department of Statistics, Cambridge, MA.
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling
change and event occurrence. New York: Oxford University Press.
Song, X.-Y., & Lee, S.-Y. (2002). Analysis of structural equation model with ignorable
missing continuous and polytomous data. Psychometrika, 67 (2), 261–288. doi:
10.1007/BF02294846
Sterba, S. K. (2014). Handling missing covariates in conditional mixture models under
missing at random assumptions. Multivariate Behavioral Research, 49 (6),
614–632.
Sterba, S. K. (2016). Cautions on the use of multiple imputation when selecting
between latent categorical versus continuous models for psychological constructs.
Journal of Clinical Child and Adolescent Psychology , 45 (2), 167–175.
Team, R. D. C. (2008). R: A language and environment for statistical computing
(Tech. Rep.). Vienna, Austria: R Foundation for Statistical Computing.
Retrieved from http://www.R-project.org
Titterington, D. M., Smith, A. F. M., & Makovm, U. E. (1985). Statistical analysis of
finite mixture models. New York: Wiley.
Tofighi, D., & Enders, C. K. (2007). Identifying the correct number of classes in
growth mixture models. In G. R. Hancock (Ed.), Advances in Latent Variable
Mixture Models (pp. 317–341). Greenwhich: Information Age Publishing, Inc.
179
Tueller, S. J., Drotar, S., & Lubke, G. H. (2011). Addressing the problem of switched
class labels in latent variable mixture model simulation studies. Structural
Equation Modeling: A Multidisciplinary Journal , 18 (1), 110–131. doi:
10.1080/10705511.2011.534695
Tueller, S. J., & Lubke, G. H. (2010). Evaluation of structural equation mixture
models: Parameter estimates and correct class assignment. Structural Equation
Modeling: A Multidisciplinary Journal , 17 (2), 165–192. doi:
10.1080/10705511003659318
van Buuren, S. (2011). Multiple imputation of multilevel data. In J. J. Hox &
J. K. Roberts (Eds.), Handbook of Advanced Multilevel Analysis (pp. 173–196).
New York, NY: Routledge.
van Buuren, S., Brand, J. P., Groothuis-Oudshoorn, C. G., & Rubin, D. B. (2006).
Fully conditional specification in multivariate imputation. Journal of Statistical
Computation and Simulation, 76 (12), 1049–1064. doi:
10.1080/10629360600810434
Vickers, A. J. (2003). How many repeated measures in repeated measures designs?
Statistical issues for comparative trials. BMC Medical Research Methodology , 3 ,
22. doi: 10.1186/1471-2288-3-22
Wilks, S. S. (1932). Moments and distributions of estimates of population parameters
from fragmentary samples. Annals of Mathematical Statistics , 3 (2), 163–195.
Willett, J. B., Singer, J. D., & Martin, N. C. (1998). The design and analysis of
longitudinal studies of development and psychopathology in context: Statistical
180
models and methodological recommendations. Development and Psychopathology ,
10 (2), 395–426.
Winship, C., Mare, R. D., & Warren, J. R. (2002). Latent class models for contingency
tables with missing data. In J. A. Hagenaars & A. L. McCutcheon (Eds.),
Applied Latent Class Analysis. Cambridge University Press.
Wothke, W. (1993). Nonpositive definite matrices in structural modeling. In
K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp.
256–293). Newbury Park, CA: Sage.
Wothke, W. (2000). Longitudinal and multi-group modeling with missing data. In
T. D. Little, K. U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and
multiple group data: Practical issues, applied approaches and specific examples
(pp. 219–240). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Wu, A. D., Zumbo, B. D., & Siegel, L. S. (2011). General piecewise growth mixture
model: Word recognition development for different learners in different phases.
Journal of Modern Applied Statistical Methods , 10 (1), 226–248. doi:
10.22237/jmasm/1304223600
181