ABSTRACT
 Title of thesis: SELECTION OF FIXED AND RANDOM EFFECTS
 IN LINEAR MIXED EFFECTS MODELS WITH
 APPLICATIONS TO THE TRIAL OF ACTIVITY IN
 ADOLESCENT GIRLS
 Edward Grant, Master of Public Health, 2013
 Directed by: Professor Tong Tong Wu
 Department of Epidemiology and Biostatistics
 Linear mixed effect (LME) models have become popular in modeling data in a
 wide variety of fields, particularly in public health. These models are beneficial
 because they are able to account for both the means and the covariance structure
 of clustered or longitudinal data. However, as studies are able to collect an
 increasing amount of data for large numbers of predictors, a major challenge has been
 the selection of only important variables to create a more interpretable, parsimonious
 model. Previous methods for LME models have been inefficient in variable selection,
 but three new methods attempt to select and estimate both important fixed and
 important random effects simultaneously. The methods are compared through
 analysis of simulated longitudinal data. Additionally, as an example of the important
 applications to public health, the methods are applied to the Trial of Activity in
 Adolescent Girls (TAAG) study to determine important predictors for Moderate to
 Vigorous Physical Activity (MVPA).
SELECTION OF FIXED AND RANDOM EFFECTS
 IN LINEAR MIXED EFFECTS MODELS WITH
 APPLICATIONS TO THE TRIAL OF ACTIVITY IN
 ADOLESCENT GIRLS
 by
 Edward Grant
 Thesis submitted to the Faculty of the Graduate School of the
 University of Maryland, College Park in partial fulfillment
 of the requirements for the degree of
 Master of Public Health
 2013
 Advisory Committee:
 Professor Tong Tong Wu, Chair, Advisor
 Professor Shuo Chen
 Professor Brit Saksvig
Table of Contents
 List of Tables
 List of Figures
 List of Abbreviations
 1 Introduction
 2 Background
   2.1 Linear Mixed-Effects Models for Longitudinal Data
   2.2 Penalization Methods for Selection of Fixed Effects
     2.2.1 Lasso Penalty
     2.2.2 Smoothly Clipped Absolute Deviation (SCAD) Penalty
     2.2.3 Review of Other Penalization Methods
   2.3 Review of Methods for Selection of Random Effects
   2.4 Information Criteria for Model Selection
   2.5 Summary of Previous Model Selection Methods
   2.6 Trial of Activity in Adolescent Girls (TAAG)
 3 Methods
   3.1 Method 1 - Double Penalization
   3.2 Method 2 - Joint Penalization
   3.3 Method 3 - Independent Selection with Proxy Matrix
 4 Analysis of Simulated Data
   4.1 Simulation 1
   4.2 Simulation 2
   4.3 Simulation 3
   4.4 Simulation 4
   4.5 Simulation 5
   4.6 Simulation 6
   4.7 Simulation 7
   4.8 Summary of Simulation Studies
 5 Real Data Analysis
   5.1 Data Description and Model Formulation
   5.2 Results of Real Data Analysis
 6 Conclusion
 Appendix
 References
List of Tables
 1 Example of Model Selection Using Lasso
 2 Simulation 1 Results - Parameter Estimates
 3 Simulation 1 Results - Summary
 4 Simulation 2 Results - Parameter Estimates
 5 Simulation 2 Results - Summary
 6 Simulation 3 Results - Parameter Estimates
 7 Simulation 3 Results - Summary
 8 Simulation 4 Results - Parameter Estimates
 9 Simulation 4 Results - Summary
 10 Simulation 5 Results - Parameter Estimates
 11 Simulation 5 Results - Summary
 12 Simulation 6 Results - Parameter Estimates
 13 Simulation 6 Results - Summary
 14 Simulation 7 Results - Parameter Estimates
 15 Simulation 7 Results - Summary
 16 TAAG 2 Data - Fixed Effects
 17 TAAG 2 Data - Random Effects
 18 Reference Table for Fixed and Random Effects Predictors
List of Figures
 2.1 Geometric Interpretation of the Lasso Penalization
List of Abbreviations
 AIC Akaike Information Criterion
 BIC Bayes Information Criterion
 EM Expectation-Maximization
 LARS Least Angle Regression
 LASSO Least Absolute Shrinkage and Selection Operator
 LME Linear Mixed Effects
 ML Maximum Likelihood
 MVPA Moderate to Vigorous Physical Activity
 PA Physical Activity
 PE Physical Education
 OLS Ordinary Least Squares
 OSCAR Octagonal Shrinkage and Clustering Algorithm for Regression
 REML Restricted Maximum Likelihood
 SCAD Smoothly Clipped Absolute Deviation
 TAAG Trial of Activity in Adolescent Girls
Chapter 1
 Introduction
 Linear mixed-effects (LME) models (Laird and Ware 1982) are statistical models
 that are used in the analysis of clustered or longitudinal data. LME models
 estimate the relationship between the dependent variable and the predictors included
 in the model, accounting for both the fixed effects and the random effects of the
 independent variables. Compared with linear regression models that do not consider
 clustering or temporal effects, LME models are able to estimate the fixed effects
 more accurately by estimating the covariance structure through the inclusion of
 individual-specific random effects. Ignoring the covariance structure has been shown
 to lead to biased estimates (Lange and Laird 1998).
 Improvements in technology have enabled researchers to collect and store data
 on an increasing number of predictors. However, inferences and predictions of an
 LME model that includes all predictors become too complex or infeasible as the
 number of predictors, all of which include fixed and random components, increases.
 One challenge in LME models is choosing a parsimonious model that selects only
 the significant covariates, while excluding variables that have no true effect on the
 outcome. Many methods have been published on model selection, but three new
 methods have been introduced which, unlike many previous approaches, can select
 and estimate both fixed and random effects simultaneously. First, a method developed
 by Li et al. (2012) optimizes a regularization problem with two separate penalization
 methods for fixed and random effects. Next, Bondell et al. (2010) select and
 estimate fixed and random effects simultaneously by optimizing a jointly penalized
 regularization problem. Finally, Fan and Li (2012) use a proxy matrix to account
 for covariance structure in maximizing a penalized profile likelihood for fixed and
 random effects separately. These methods present more practical ways of selecting
 important fixed and random effects of LME models compared to previous methods.
 The goal of this thesis will be to compare the efficacy and accuracy of these new
 variable selection methods through analysis of data simulation studies.
 Additionally, a comparison of these methods will be performed through a real
 public health data set. As an example of the power of these methods, we consider
 the study of the Trial of Activity in Adolescent Girls 2 (TAAG 2), which determined
 the predictors of physical activity among adolescent girls from 6 schools in Maryland
 (Young et al. 2013). Data for 65 multilevel variables from 551 girls were collected
 at two time points, 2006 and 2009, when the girls were enrolled in the 8th and 11th
 grade, respectively. Using traditional methods, building a parsimonious model for
 these data would be tedious and could introduce bias. However, the data can be
 analyzed efficiently using these methods to determine which variables truly have a
 relationship with the outcome variable of interest, moderate to vigorous physical
 activity (MVPA). This data analysis will demonstrate the important applications
 that these methods can have in the public health field.
 The rest of this thesis will proceed as follows. Section 2 will discuss previous
 methods used for variable selection and give an introduction to the TAAG trial.
 Section 3 will introduce the three methods used for variable selection in LME models.
 In Section 4, simulation studies will be performed to compare the effectiveness of
 the three methods. Section 5 will carry out the analysis of the TAAG data set.
 Finally, in Section 6, the strengths and weaknesses of the methods will be discussed,
 along with brief implications of their use on high-dimensional data.
Chapter 2
 Background
 2.1 Linear Mixed-Effects Models for Longitudinal Data
 Consider a longitudinal study with $n$ subjects. Each subject $i$ $(i = 1, \ldots, n)$
 has $m$ observations. The number of observations can be generalized to $m_i$ so that it
 can vary across subjects, for a total of $\sum_{i=1}^{n} m_i = N$ observations.
 Suppose there are $p$ covariates associated with the fixed effects, $X_1, \ldots, X_p$,
 and $q$ covariates associated with the random effects, $Z_1, \ldots, Z_q$. For subject $i$
 at observation $t$, let $x_{it}$ be the vector of the $p$ predictors for the fixed effects
 and $z_{it}$ be the vector of the $q$ predictors for the random effects. For the outcome
 $Y$ of subject $i$ at observation $t$, the linear mixed-effects model can then be written as:
 $$Y_{it} = x_{it}^T \beta + z_{it}^T b_i + \varepsilon_{it},$$
 where $\beta$ is the $p \times 1$ parameter vector for fixed effects, $b_i$ is the
 $q \times 1$ vector of random effects, and $\varepsilon_{it}$ represents the error term,
 which is independently and identically distributed as $N(0, \sigma^2)$. The random effects
 for each subject are independently and identically distributed as multivariate normal,
 $MVN(0, \sigma^2 \Psi)$, where $\Psi$ is the $q \times q$ covariance matrix. By combining
 $Y_i = (Y_{i1}, \ldots, Y_{im})^T$, $X_i^T = (x_{i1}, \ldots, x_{im})$, and
 $Z_i^T = (z_{i1}, \ldots, z_{im})$, the LME model can be simplified to:
 $$Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \tag{2.1}$$
 where $X_i$ is the $m \times p$ design matrix for the fixed effects and $Z_i$ is the
 $m \times q$ design matrix for the random effects for subject $i$. The error term
 $\varepsilon_i$ is independently and identically distributed as $N(0, \sigma^2 I_m)$. The
 subject-specific random effects $b_i$ are independent of the population-specific fixed
 effects $\beta$. It can be seen that $\beta$ is associated with the fixed effects and is
 used for predicting the mean, while $b$ is associated with the random effects and
 accounts for much of the variance in the model.
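 To make the notation of model (2.1) concrete, the following Python sketch simulates
 data from it. It is an illustration only and is not code used in this thesis: the
 function name `simulate_lme`, the standard normal covariates, and the choice of a
 random intercept as the first random-effect covariate are assumptions made for the
 example. The random effects are drawn through a lower-triangular factor of their
 covariance matrix.

```python
import random


def simulate_lme(n, m, beta, chol, sigma, seed=0):
    """Simulate n subjects with m observations each from Y_i = X_i beta + Z_i b_i + eps_i.

    beta  : list of p fixed-effect coefficients
    chol  : q x q lower-triangular factor C, so that Cov(b_i) = sigma^2 * C C^T
    sigma : standard deviation of the error term
    """
    rng = random.Random(seed)
    p, q = len(beta), len(chol)
    subjects = []
    for _ in range(n):
        # Subject-specific random effects: b_i = sigma * C u with u ~ N(0, I_q).
        u = [rng.gauss(0.0, 1.0) for _ in range(q)]
        b = [sigma * sum(chol[k][l] * u[l] for l in range(k + 1)) for k in range(q)]
        X, Z, Y = [], [], []
        for _ in range(m):
            x = [rng.gauss(0.0, 1.0) for _ in range(p)]
            # Assumption: the first random-effect covariate is a random intercept.
            z = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(q - 1)]
            mean_part = sum(xj * bj for xj, bj in zip(x, beta))      # x_it' beta
            cluster_part = sum(zk * bk for zk, bk in zip(z, b))      # z_it' b_i
            Y.append(mean_part + cluster_part + rng.gauss(0.0, sigma))
            X.append(x)
            Z.append(z)
        subjects.append({"Y": Y, "X": X, "Z": Z, "b": b})
    return subjects


if __name__ == "__main__":
    data = simulate_lme(n=4, m=3, beta=[1.0, 0.0, -0.5],
                        chol=[[1.0, 0.0], [0.3, 0.8]], sigma=1.0)
    print(len(data), "subjects,", len(data[0]["Y"]), "observations each")
```

 Observations within a subject share the same draw of the random effects, which is what
 induces the within-subject correlation that an ordinary regression would ignore.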
 2.2 Penalization Methods for Selection of Fixed Effects
 Other methods have become widely used for performing model selection more
 efficiently, notably penalized regression algorithms. Many different penalization
 methods have been used to estimate parameters in a regression model with outcome
 $y_i$, design matrix $X_i$, and coefficients $\beta$. The penalization methods reduce the
 number of dimensions of the model by setting the coefficients of unimportant predictors
 to 0, leading to a more parsimonious and interpretable model. For parameter vector
 $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$, penalization methods can be summarized as
 $$\min_{\beta} f(\beta) = g(\beta) + P_\lambda(\beta), \tag{2.2}$$
 where $g(\beta)$ is a loss function and $P_\lambda(\beta)$ is a penalty function on $\beta$
 with tuning parameter $\lambda$.
 2.2.1 Lasso Penalty
 When there are a large number of parameters to be estimated, the "least
 absolute shrinkage and selection operator," or lasso (Tibshirani 1996), is a commonly
 used regularization method to reduce the number of parameters. A simple example
 is lasso penalized linear regression, in which the penalized objective function (2.2)
 can be written as:
 $$\sum_{i=1}^{N} \Big( y_i - \sum_{j} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j} |\beta_j|, \tag{2.3}$$
 where $\lambda > 0$. The lasso minimizes the residual sum of squares of the Ordinary
 Least Squares (OLS) problem subject to a constraint on the coefficients. Placing this
 constraint is equivalent to adding the penalty term $\lambda \sum_{j=1}^{p} |\beta_j|$ to
 the OLS objective. The regularization problem (2.3) can then be rewritten, in its dual
 formulation, with the sum of the absolute values of the coefficients constrained to be
 no larger than a tuning parameter $t \ge 0$:
 $$\min_{\beta} \Big\{ \sum_{i=1}^{N} \Big( y_i - \sum_{j} \beta_j x_{ij} \Big)^2 \Big\}, \quad \text{subject to } \sum_{j} |\beta_j| \le t.$$
 When $t$ is large, the magnitude of the constraint placed on the estimates is minimal,
 resulting in solutions close to the OLS estimates. When $t$ is small, the constraint
 placed on the solution causes shrinkage in the OLS estimates towards zero.
 Equivalently, a large $\lambda$ corresponds to a small $t$, resulting in a larger penalty
 on the parameters. As the value of $\lambda$ increases, a greater magnitude of shrinkage
 will be placed on the coefficient estimates. An advantage of the lasso is its ability to
 not only shrink these coefficient estimates towards the origin, but also to perform model
 selection of the important variables by setting unimportant coefficients to exactly
 zero.

 [Figure 2.1: Geometric Interpretation of the Lasso Penalization (Hastie et al. 2009).
 The figure on the left is the Lasso Penalty, while the figure on the right is the Ridge
 Penalty. The shaded region represents the constraint region for the coefficients. The
 contours represent the values of the loss function over the coefficients. In the lasso
 panel the contours touch the square constraint region at a corner, which gives a zero
 coefficient; in the ridge panel there are no corners to hit, so no coefficient is
 exactly zero.]
 Figure 2.1 represents a simple problem with two coefficients, $\beta_1$ and $\beta_2$,
 showing a geometric interpretation of the lasso (left) compared with the ridge penalty
 (right). The ridge penalty, where $P_\lambda(\beta) = \lambda \sum_{j=1}^{p} \beta_j^2$, is
 a method that is known to only shrink coefficients without setting any to zero. Similar
 to the lasso, this penalty is equivalent to a constraint region for the estimation of
 the parameters, given by:
 $$\hat{\beta}^{ridge} = \arg\min_{\beta} \Big\{ \sum_{i=1}^{N} \Big( y_i - \sum_{j} \beta_j x_{ij} \Big)^2 \Big\}, \quad \text{subject to } \sum_{j} \beta_j^2 \le t.$$
 However, the shaded regions in the figure show that the penalties result in different
 shapes for their constraint regions. The region for the ridge penalty
 $P_\lambda(\beta) = \lambda(\beta_1^2 + \beta_2^2)$ is circular, while the region for the
 lasso penalty $P_\lambda(\beta) = \lambda(|\beta_1| + |\beta_2|)$ is a diamond whose
 corners lie on the axes, where one coefficient equals 0. The elliptical contours
 correspond to the quadratic form of the loss function
 $\sum_{i=1}^{N} (y_i - \sum_{j} \beta_j x_{ij})^2$, where the center of the contours is
 the non-penalized OLS solution for $\beta_1$ and $\beta_2$. The minimum of
 $g(\beta) + P_\lambda(\beta)$ produces the optimal values of $\beta_1$ and $\beta_2$, and
 this minimum occurs at the point where the contours first touch the shaded region.
 In the figure, the solution on the left shows that $\beta_1$ will be set to 0. In
 contrast, in the ridge plot, it can be seen that this will almost never happen due to
 the rounded shape of both the loss contours and the constraint region. This demonstrates
 the important benefit of the lasso penalty: due to the shape of its constraint region,
 it is more likely to eliminate unimportant predictors and perform model selection
 (Hastie et al. 2009).
 A simple example of the model selection attribute of the lasso can be seen in
 Table 1, which compares estimates of six predictors using ordinary least squares,
 estimates after ridge penalization, and estimates after lasso penalization with the
 true model of three predictors. For sample size $n = 30$ the true model is:
 $$y = \beta_0 + 0.5 X_1 + 1 X_3 + 0.5 X_4 + \varepsilon,$$
 where $\beta_0 = 1.5$, $X \sim N(0, 1)$, and $\varepsilon \sim N(0, 1)$. The true predictors
 in the model are $X_1$, $X_3$, and $X_4$, while $X_2$, $X_5$, and $X_6$ are noise variables.
 While least squares offers estimates of the coefficients, the unimportant predictors, as
 expected, remain in the model with small, nonzero coefficients. The ridge penalty shrinks
 the estimates compared to the least squares estimates, but fails to eliminate the noise
 variables. Conversely, the lasso is able to eliminate the false predictors $X_2$, $X_5$,
 and $X_6$, while retaining the true predictors $X_1$, $X_3$, and $X_4$. This example shows
 that using the lasso provides a more efficient way to perform model selection. However,
 it is important to note that the lasso penalty often produces biased estimates, since
 all coefficients, even important ones, shrink towards the origin. The magnitude of
 selected lasso coefficients will be underestimated due to this shrinkage.
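 The behavior described in this example can be explored with a small coordinate descent
 solver for the lasso objective (2.3). The sketch below is an illustration under stated
 assumptions rather than the code behind Table 1: the function names, the random seed,
 and the values of the tuning parameter are all hypothetical, and because the data are
 generated at random the printed estimates will not match the table.

```python
import random


def soft_threshold(z, g):
    """Shrink z toward zero by g, returning exactly 0 when |z| <= g."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize sum_i (y_i - b0 - sum_j b_j x_ij)^2 + lam * sum_j |b_j| by coordinate descent.

    The intercept b0 is left unpenalized by centering y and the columns of X.
    """
    n, p = len(X), len(X[0])
    y_bar = sum(y) / n
    x_bar = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - x_bar[j] for j in range(p)] for row in X]
    resid = [yi - y_bar for yi in y]              # residuals at beta = 0
    col_ss = [sum(row[j] ** 2 for row in Xc) for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # rho = x_j' (partial residual that leaves out predictor j)
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n))
            new_bj = soft_threshold(rho, lam / 2.0) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
    intercept = y_bar - sum(bj * xb for bj, xb in zip(beta, x_bar))
    return intercept, beta


def make_toy_data(n=30, seed=1):
    """Draw data from y = 1.5 + 0.5*X1 + 1*X3 + 0.5*X4 + eps with six N(0, 1) predictors."""
    rng = random.Random(seed)
    true_beta = [0.5, 0.0, 1.0, 0.5, 0.0, 0.0]
    X = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(n)]
    y = [1.5 + sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0.0, 1.0)
         for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for lam in (0.0, 10.0, 30.0):     # lam = 0 reproduces ordinary least squares
        b0, beta = lasso_cd(X, y, lam)
        print(lam, round(b0, 3), [round(b, 3) for b in beta])
```

 With the tuning parameter set to 0 the solver converges to the least squares fit, and
 increasing it drives more coefficients to exactly zero through the soft-thresholding
 step.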
 An extension of the lasso, the adaptive lasso (Zou 2006), seeks to reduce
 this bias. The adaptive lasso applies a weight $w_j$ to each term of the lasso
 penalty, seeking to minimize:
 $$\sum_{i=1}^{N} \Big( y_i - \sum_{j} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j} w_j |\beta_j|,$$
 where $w_j = 1/|\hat{\beta}_j|$, $\hat{\beta}$ is usually the ordinary least squares
 estimate, and $\lambda > 0$. It can be seen that, with this weight, $\beta_j$'s with small
 estimated values will be penalized more heavily towards 0, further reducing the number
 of parameters in the model. Conversely, large and important coefficients will be
 minimally penalized. The adaptive lasso has been shown to have the oracle properties
 defined by Fan and Li (2001): it consistently selects the true variables, and its
 estimates of the nonzero coefficients are asymptotically normal.
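 The effect of the adaptive weights is easiest to see in the special case of an
 orthonormal design, where both estimators reduce to soft-thresholding of the least
 squares estimates. The following sketch assumes that special case (it is not the
 general algorithm), and the function names are hypothetical.

```python
def soft_threshold(z, g):
    """Shrink z toward zero by g, returning exactly 0 when |z| <= g."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_orthonormal(beta_ols, lam):
    """Lasso solution when X'X = I: every coefficient is shrunk by the same amount lam/2."""
    return [soft_threshold(b, lam / 2.0) for b in beta_ols]


def adaptive_lasso_orthonormal(beta_ols, lam):
    """Adaptive lasso when X'X = I: coefficient j is shrunk by lam * w_j / 2, w_j = 1/|b_j|."""
    result = []
    for b in beta_ols:
        w = 1.0 / abs(b)              # assumes the least squares estimate is nonzero
        result.append(soft_threshold(b, lam * w / 2.0))
    return result


if __name__ == "__main__":
    beta_ols = [3.0, 0.2]             # one large and one small least squares estimate
    print(lasso_orthonormal(beta_ols, 1.0))                                     # [2.5, 0.0]
    print([round(b, 4) for b in adaptive_lasso_orthonormal(beta_ols, 1.0)])     # [2.8333, 0.0]
```

 Both penalties remove the small coefficient, but the adaptive version shrinks the large
 coefficient by about 0.17 instead of 0.5, which is the reduction in bias described above.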
2.2.2 Smoothly Clipped Absolute Deviation (SCAD) Penalty
 Consider the penalized regression problem (2.2). Due to the biased results
 of the lasso, Fan and Li (2001) sought to create a penalty function that gave unbiased
 and sparse results and was continuous. The Smoothly Clipped Absolute Deviation
 (SCAD) penalty for a coefficient $\beta$ is defined by its derivative
 $$P'_\lambda(\beta) = \begin{cases}
 \operatorname{sgn}(\beta)\,\lambda, & \text{if } |\beta| \le \lambda \\
 \operatorname{sgn}(\beta)\,\dfrac{a\lambda - |\beta|}{a - 1}, & \text{if } \lambda < |\beta| \le a\lambda \\
 0, & \text{if } |\beta| > a\lambda
 \end{cases}$$
 for $a > 2$ and $\lambda > 0$. The resulting penalty function is:
 $$P_\lambda(\beta) = \begin{cases}
 \lambda |\beta|, & \text{if } |\beta| \le \lambda \\
 -\dfrac{\beta^2 - 2a\lambda|\beta| + \lambda^2}{2(a - 1)}, & \text{if } \lambda < |\beta| \le a\lambda \\
 \dfrac{(a + 1)\lambda^2}{2}, & \text{if } |\beta| > a\lambda
 \end{cases}$$
 The penalty function is a quadratic spline function, dependent on two tuning
 parameters $a$ and $\lambda$. The results were found to be relatively insensitive to the
 parameter $a$. Through cross-validation, $a = 3.7$ was found to give satisfactory
 results consistently. The second tuning parameter $\lambda$ can also be found through
 cross-validation for given data.
 The advantages of the SCAD penalty are that it is differentiable at all points
 except 0 and that it produces results with low bias. Similar to the adaptive lasso,
 when $|\beta|$ is large, there is little penalization compared to when $|\beta|$ is small.
 This ensures that unimportant variables will be penalized heavily while leaving important
 variables relatively unpenalized. The SCAD penalty particularly outperforms the
 lasso penalty in selecting important variables while eliminating unimportant
 variables when the variance of the data is large. Additionally, the SCAD penalty was
 also shown to have the oracle properties.
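 The piecewise form of the SCAD penalty of Fan and Li (2001) can be checked numerically.
 The sketch below writes out the standard penalty and its derivative for a single
 coefficient; the function names are hypothetical, and the default a = 3.7 follows the
 value quoted above.

```python
def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty for one coefficient: linear, then quadratic, then constant."""
    b = abs(beta)
    if b <= lam:
        return lam * b                                   # behaves like the lasso near 0
    if b <= a * lam:
        return -(b * b - 2.0 * a * lam * b + lam * lam) / (2.0 * (a - 1.0))
    return (a + 1.0) * lam * lam / 2.0                   # flat: no further penalization


def scad_derivative(beta, lam, a=3.7):
    """Derivative of the SCAD penalty for beta != 0 (returns 0 at beta = 0 by convention)."""
    b = abs(beta)
    sign = (beta > 0) - (beta < 0)
    if b <= lam:
        return sign * lam
    if b <= a * lam:
        return sign * (a * lam - b) / (a - 1.0)
    return 0.0


if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0, 3.7, 10.0):
        print(beta, round(scad_penalty(beta, 1.0), 4), round(scad_derivative(beta, 1.0), 4))
```

 The pieces join continuously at the two knots, and the derivative falls to zero beyond
 the second knot, which is why large coefficients are left essentially unpenalized.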
 2.2.3 Review of Other Penalization Methods
 Similar to the lasso and SCAD regression, there have been many penalized
 regularization methods to select and estimate fixed effects. Zou and Hastie (2005)
 created the elastic net, which combines the lasso and ridge penalization methods, and
 proposed an algorithm to solve the elastic net efficiently. Bondell and Reich (2008)
 created the octagonal shrinkage and clustering algorithm for regression (OSCAR).
 This method has the ability to select important variables among a set of highly
 correlated predictors. However, these methods for selection of fixed effects do not take
 the covariance structure of the random effects into account. Ignoring or
 underestimating covariance structure can lead to biased variance estimates for the
 ordinary least squares estimates of the fixed effects (Lange and Laird 1998).
2.3 Review of Methods for Selection of Random Effects
 In order to select important random effects in a model, Stram and Lee (1994)
 discuss the use of likelihood ratio tests in testing for nonzero variance components
 in linear mixed effects models. Lin (1997) proposed a global score test of the null
 hypothesis that all variance components are equal to 0; individual score tests are
 then carried out for each random effect, and the variance components can be
 estimated. Hall and Praestgaard (2001) place constraints on the score tests
 to select and estimate important random effects. Chen and Dunson (2003) use a
 Cholesky decomposition of the covariance matrix and Bayesian methods to select
 and estimate random effects variances. Foster et al. (2009) develop a lasso-based
 method for selection of random effects. These methods, however, only consider the
 random effects and do not have the ability to select or estimate the fixed effects.
 2.4 Information Criteria for Model Selection
 Information criteria are among the most popular methods of model
 selection, as they have the ability to select both fixed and random effects in a
 linear mixed effects model. Two of these methods are the Akaike Information Criterion
 (AIC, Akaike 1973) and the Bayes Information Criterion (BIC, Schwarz 1978). For these
 methods, the likelihood $L$ is found for every combination of parameters in the model.
 From there, a penalty for the number of parameters in the model is added to
 $-2 \ln L$. The goal is to find the minimum of the following:
 $$AIC = -2 \ln L + 2k$$
 $$BIC = -2 \ln L + k \ln(n)$$
 where $k$ is the number of parameters and $n$ is the number of observations. The BIC
 places a larger penalty on the number of parameters than the AIC whenever
 $\ln(n) > 2$, that is, for $n \ge 8$. The information criteria seek to find a balance
 between creating a model with good fit for the data and having a small set of parameters.
 While these methods are effective in choosing a parsimonious set of fixed and
 random effects that give the best fit, they can be burdensome as the number of
 possible parameters increases. For $p$ fixed effect parameters and $q$ random effect
 parameters, the number of models that need to be compared for information criteria
 is $2^{p+q}$. As $p$ and $q$ increase, the number of possible models increases
 exponentially. Thus, for a large number of predictors, AIC and BIC are inefficient
 methods of variable selection.
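 The two criteria and the size of the search space are simple to compute. The helper
 names in the sketch below are hypothetical and the log-likelihood values are made up
 purely for illustration.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayes information criterion: -2 ln L + k ln(n)."""
    return -2.0 * loglik + k * math.log(n)


def n_candidate_models(p, q):
    """Number of subsets of p fixed effects and q random effects: 2^(p+q)."""
    return 2 ** (p + q)


if __name__ == "__main__":
    # Made-up fits: the larger model gains little likelihood for three extra parameters.
    small, large = (-120.0, 4), (-118.5, 7)
    for name, (ll, k) in (("small", small), ("large", large)):
        print(name, round(aic(ll, k), 2), round(bic(ll, k, n=200), 2))
    print(n_candidate_models(10, 10))   # 1048576 candidate models
```

 Even ten candidate fixed effects and ten candidate random effects already imply more
 than a million models to fit, which is the computational burden described above.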
 2.5 Summary of Previous Model Selection Methods
 While these methods are effective in performing model selection, they are not
 ideal for use in LME models. The penalization methods for fixed effects do not
 take the random effects into account, and can lead to inaccurate estimates. The
 random effects methods do not consider the fixed effects. Information criteria can
 find models with fixed and random effects, but are extremely inefficient for problems
 with a large number of predictors. Recently, there have been other methods used to
 select both fixed and random effects more efficiently. Jiang et al. (2008) consider a
 method to select models with important predictors that does not rely on minimizing
 a criterion function. In this method, a statistical "fence" is created to eliminate
 incorrect models. From there, an optimal model is selected from the remaining
 models on the right side of this fence. This method eases the burden that is common
 with the information criterion methods.
 Three new methods have been created in recent years to simultaneously select
 fixed and random effects. The latter sections of this thesis will describe and evaluate
 these three methods.
 2.6 Trial of Activity in Adolescent Girls (TAAG)
 Physical inactivity has been identified as a risk factor for obesity, or a high
 percentage of body fat, especially in adolescents (Pietiläinen et al. 2008). The
 prevalence of childhood overweight has been increasing drastically in the United
 States (Ogden et al. 2002). The resulting health problems that can occur from
 physical inactivity and obesity, such as type 2 diabetes, high blood pressure, and
 sleep disorders, have been on the rise in children and adolescents in recent years
 (Daniels et al. 2005). The Council on Sports Medicine and Fitness and Council on
 School Health (2006) has recommended increasing physical activity in children and
 adolescents as an effective way to reduce the prevalence of obesity and the resulting
 health problems later in life.
 Among black and white girls, physical activity declines as a child ages through
 adolescence (Kimm et al. 2002). This decline in physical activity is more prevalent
 in girls than in boys (Sallis et al. 1996). Previous school-based interventions targeted
 at boys and girls have not been extremely successful. The Trial of Activity in
 Adolescent Girls (TAAG) was a school and community-based, multisite, interventional
 trial targeted at girls in order to lessen the typical declines in physical activity. The
 study was conducted at 26 sites across six geographically diverse areas in the United
 States, consisting of California, Minnesota, Maryland, Louisiana, South Carolina,
 and Arizona. Data were collected at two time points: in the spring of 2003, when the
 girls were in the 6th grade, and in the spring of 2005, when the girls were in the 8th
 grade. The intervention and control groups were assigned randomly in 2003. The program
 was designed to create environments in schools and the surrounding community that
 encouraged physical activity and to give cues or messages that incentivize physical
 activity (Webber et al. 2008). The purpose of the intervention was to reduce the
 declines in Moderate to Vigorous Physical Activity (MVPA) that normally occur
 in adolescent girls.
 As a part of the TAAG study, data were collected to assess the sustainability
 of the program in the spring of 2006 in a new group of 8th grade girls. In the
 spring of 2009, follow-up data were collected from only the girls at the six Maryland
 TAAG sites for the Trial of Activity in Adolescent Girls 2 (TAAG 2). These data
 were collected when the 2006 8th grade group was in 11th grade. The purpose of
 the TAAG 2 study was to determine factors at the individual, social, school, and
 neighborhood levels that may influence levels of MVPA in adolescent girls (Young
 et al. 2013). The analysis in this thesis will use data from only these 2006 and 2009
 time points. The main outcome of interest will be average MVPA
 minutes per day in the TAAG 2 study. The goal will be to select important fixed
 and random effects of interest from the TAAG data. The methods will be discussed
 in the latter sections of this thesis.
Chapter 3
 Methods
 3.1 Method 1 - Double Penalization
 This section describes the method created by Li et al. (2012). It selects and
 estimates the parameters of an LME model through a regularization problem with two
 penalization functions: one for the fixed effects and one for the random effects.
 Model. Consider the LME given by equation (2.1), with variables standardized to have
 a mean equal to 0 and a Euclidean norm equal to 1. The fixed effect intercept is
 removed from the model, but a random intercept $b_{i0}$ remains in the model. The
 mean and variance of $Y_i$ are $E(Y_i) = X_i \beta$ and
 $Var(Y_i) = \sigma^2 (Z_i \Psi Z_i^T + I_m)$.
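 Maximum Likelihood Estimation. For $N > p$, a modified log-likelihood incorporates
 the restricted log-likelihood (Harville 1974). To maximize for $\beta$, $\Psi$, and
 $\sigma^2$,
 $$\ell_{nM}(\beta, \Psi, \sigma^2) = -\frac{1}{2} \sum_{i=1}^{n} \log |\sigma^2 V_i| - \frac{1}{2} \log \Big| \sigma^{-2} \sum_{i=1}^{n} X_i^T V_i^{-1} X_i \Big| - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - X_i \beta)^T V_i^{-1} (Y_i - X_i \beta), \tag{3.1}$$
 where $V_i = I_m + Z_i \Psi Z_i^T$, the covariance structure of $Y_i$. When $N \le p$, the
 restricted term in (3.1) becomes singular. Therefore, when $N \le p$, the following full
 log-likelihood is maximized:
 $$\ell_{nF}(\beta, \Psi, \sigma^2) = -\frac{1}{2} \sum_{i=1}^{n} \log |\sigma^2 V_i| - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - X_i \beta)^T V_i^{-1} (Y_i - X_i \beta) \tag{3.2}$$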
 Objective Function and Penalization. A general formula for the objective function
 can be written:
 $$Q_n(\beta, \Psi, \sigma^2) = \ell_n(\beta, \Psi, \sigma^2) - \lambda_1 P_1(\beta) - \lambda_2 P_2(\Psi)$$
 where $P_1(\beta)$ is the penalization for the fixed effects, $P_2(\Psi)$ is the
 penalization for the random effects, and $\lambda_1$ and $\lambda_2$ are their
 non-negative tuning parameters, respectively. The log-likelihood
 $\ell_n(\beta, \Psi, \sigma^2)$ will be (3.1) or (3.2) when $N > p$ and $N \le p$,
 respectively.
 For the fixed effects, an adaptive $L_1$-norm, or adaptive lasso, penalty $P_1(\beta)$ is
 applied (Zou 2006), where
 $$P_1(\beta) = \sum_{j=1}^{p} w_j |\beta_j|,$$
 and $w_j = 1/|\hat{\beta}_j|$ is a weight given by the reciprocal of the magnitude of the
 estimated coefficient.
 For the random effects, first, a Cholesky decomposition is used to factor $\Psi$
 into $\Psi = L L^T$. The Cholesky factor of $\Psi$, $L$, is a unique lower triangular
 matrix with positive diagonals. Penalization will then be performed on $L$. For any given
 $k \in \{1, \ldots, q\}$, finding a nonzero row $k$ in $L$, denoted $L_{(k)}$ (and therefore
 a nonzero diagonal element $\psi_{kk}$), will select the corresponding random effect
 $b_k$. Conversely, if $L_{(k)}$ is equal to 0, then the corresponding $\psi_{kk}$ will
 equal 0, effectively removing the $k$th random effect from the model. An adaptive weight
 is added to an $L_2$-norm penalty and this penalty is applied to the rows of $L$,
 shrinking them toward zero:
 $$P_2(L) = \sum_{k=2}^{q} w_k \sqrt{L_{k1}^2 + \cdots + L_{kq}^2},$$
 where $w_k = 1/\|\hat{L}_{(k)}\|$ is the weight given by the reciprocal of the norm of the
 estimated row. Again, this adaptive weight will help shrink small coefficients further
 towards zero, while leaving the important predictors relatively unpenalized.
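 The link between a zero row of the Cholesky factor and a removed random effect can be
 verified directly. The sketch below is illustrative only: the function names are
 hypothetical, the matrix entries are made up, and the penalty follows the row-wise
 form given above with the first row (the random intercept) left unpenalized.

```python
import math


def gram(L):
    """Return L L^T for a square lower-triangular matrix L given as a list of rows."""
    q = len(L)
    return [[sum(L[i][l] * L[j][l] for l in range(q)) for j in range(q)]
            for i in range(q)]


def group_penalty(L, weights):
    """Sum over rows k = 2..q of w_k * ||L_(k)||_2; weights[0] is ignored."""
    total = 0.0
    for k in range(1, len(L)):
        total += weights[k] * math.sqrt(sum(v * v for v in L[k]))
    return total


if __name__ == "__main__":
    # Made-up factor in which the second row has been shrunk to zero.
    L = [[1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0],
         [0.5, 0.2, 1.0]]
    for row in gram(L):
        print(row)        # second row and second column are entirely zero
    print(round(group_penalty(L, [1.0, 1.0, 1.0]), 4))
```

 Because the penalty acts on the norm of a whole row, a row is either kept or set to
 zero as a group, which removes the variance and all covariances of that random effect
 at once.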
 Algorithm. First, $\sigma^2$ can be estimated (Lindstrom and Bates 1988) by:
 $$\hat{\sigma}^2 = \frac{1}{N - p} \sum_{i=1}^{n} (Y_i - X_i \beta)^T V_i^{-1} (Y_i - X_i \beta),$$
 for $N > p$ and
 $$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{n} (Y_i - X_i \beta)^T V_i^{-1} (Y_i - X_i \beta),$$
 for $N \le p$. By inserting the estimated $\hat{\sigma}^2$ into (3.1) and (3.2),
 respectively, the objective function can then be solved for one fewer parameter.
 The estimation of $\beta$ and $L$ is done in iterations, through maximizing
 the simplified objective function:
 $$Q_n(\beta, L) = P_R(\beta, L) - \lambda_1 \sum_{j=1}^{p} |\beta_j| - \lambda_2 \sum_{k=2}^{q} \sqrt{L_{k1}^2 + \cdots + L_{kq}^2}$$
 where $P_R(\beta, L)$ is the updated log-likelihood function (3.1) or (3.2) with
 $\hat{\sigma}^2$ substituted into the equation.
 The algorithm alternates between two quadratic components until
 convergence. First, $L$ is fixed and $\beta$ is estimated, then $\beta$ is fixed and $L$
 is updated. The step when $L$ is fixed is similar to a lasso problem. However, when
 $\beta$ is fixed, the random component must be split into two problems, estimating $L$
 and a new parameter $\tau$, which is updated from the $L$ estimates.
 The algorithm is completed as follows:
 1. Initialize the parameters $\beta^{(0)}$, $L^{(0)}$, $\tau^{(0)}$.
 2. Update $L_{kj}$ for iteration $r$ by finding the maximum of the first quadratic
 component:
 $$L^{(r)}_{kj} = \arg\max_{L_{kj}} \Big\{ P_R(\beta^{(r-1)}, L) - \frac{\lambda_2^2}{4} \sum_{k=1}^{q} \frac{1}{(\tau_k^{(r-1)})^2} \sum_{j=1}^{k} L_{kj}^2 \Big\}$$
 3. Update $\tau_k$:
 $$\tau_k^{(r)} = \sqrt{\frac{\lambda_2}{2} \, \|L^{(r)}_{k}\|_2}$$
 4. Update $\beta$ using the LARS algorithm (Efron et al. 2004), which solves the second
 quadratic component.
 5. If the differences between $L^{(r)}_{kj}$ and $L^{(r-1)}_{kj}$ and between
 $\beta^{(r)}_j$ and $\beta^{(r-1)}_j$ are smaller than a specified amount, usually
 $10^{-5}$, then the algorithm ends and the estimates are obtained. If not, the process
 is repeated from step 2 for iteration $r + 1$.
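 One way to read steps 2 and 3 is through a standard identity that replaces a norm
 penalty by a quadratic one with an auxiliary variable. The identity below is offered as
 an interpretation under stated assumptions, not as a derivation taken from Li et al.
 (2012); the auxiliary variable is written $\tau_k$ here, and the row norm is assumed to
 be strictly positive.

```latex
\lambda_2 \,\lVert L_{(k)} \rVert_2
  \;=\; \min_{\tau_k > 0}
  \left\{ \frac{\lambda_2^2}{4\,\tau_k^2}\,\lVert L_{(k)} \rVert_2^2 \;+\; \tau_k^2 \right\},
\qquad
\text{with the minimum attained at }\;
\tau_k = \sqrt{\tfrac{\lambda_2}{2}\,\lVert L_{(k)} \rVert_2}.
```

 For a fixed auxiliary variable the expression in braces is quadratic in the row of $L$,
 which matches the quadratic update in step 2, and for a fixed row the minimizing value
 has the closed form used in step 3.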
3.2 Method 2 - Joint Penalization
 This section describes the method introduced by Bondell et al. (2010). This
 method simultaneously selects and estimates fixed and random effects in an LME
 model using one joint penalization function for fixed and random effects.
 Model. Consider the LME model in equation (2.1). Using a modified Cholesky
 decomposition (Chen and Dunson 2003), the covariance matrix $\Psi$ is factorized as
 $$\Psi = D \Gamma \Gamma^T D$$
 where $\Gamma$ is a $q \times q$ lower triangular matrix with 1's on the diagonal and
 whose $(l, r)$th element is given by $\gamma_{lr}$, and $D = diag(d_1, d_2, \ldots, d_q)$
 is a diagonal matrix. After this decomposition, the LME model can be written
 $$Y_i = X_i \beta + Z_i D \Gamma b_i + \varepsilon_i \tag{3.3}$$
 where it is now assumed that $Y_i$ has been centered and the predictors standardized so
 that $X_i^T X_i$ and $Z_i^T Z_i$ represent correlation matrices, and $b_i$ is independently
 and identically distributed as $MVN(0, \sigma^2 I_q)$. The covariance matrix of the random
 effects is now expressed in terms of the vector $d = (d_1, d_2, \ldots, d_q)^T$ and of the
 free elements of $\Gamma$, denoted by the vector
 $\gamma = (\gamma_{lr} : l = 1, \ldots, q;\ r = l + 1, \ldots, q)^T$. Setting any $d_l = 0$
 will set the corresponding $l$th row and column of the covariance matrix $\Psi$ to 0 and
 therefore remove the $l$th random effect from the model.
 21
In the new model in (3.3), Yi follows a normal distribution with mean Xi 
and variance Vi =  2(ZiD  TDZi + Im).
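The way a zero in $d$ removes a random effect can be seen in a small numerical sketch. The code below builds $D \Gamma \Gamma^T D$ for a hypothetical $3 \times 3$ case using plain Python lists; the helper names and the numerical values are illustrative assumptions, not part of the original method's software.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def covariance_from_factors(d, gamma):
    """Return D * Gamma * Gamma^T * D with D = diag(d)."""
    q = len(d)
    big_d = [[d[i] if i == j else 0.0 for j in range(q)] for i in range(q)]
    return matmul(matmul(matmul(big_d, gamma), transpose(gamma)), big_d)


# Hypothetical unit lower triangular Gamma; d[1] = 0 drops the second effect.
gamma = [[1.0, 0.0, 0.0],
         [0.5, 1.0, 0.0],
         [0.3, 0.2, 1.0]]
psi = covariance_from_factors([1.0, 0.0, 2.0], gamma)

for row in psi:
    print(row)
```

In the printed matrix the second row and second column are entirely zero, while the remaining entries still form a valid covariance matrix for the first and third random effects.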
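 Maximum Likelihood Estimation Given $Y$ and treating $b$ as observed, the log-likelihood function for the LME model is:
 $$\ell_F(\beta, d, \gamma \mid Y, b) = -\frac{N + nq}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \left( \|Y - X\beta - Z \tilde{D} \tilde{\Gamma} b\|^2 + b^T b \right) \quad (3.4)$$
 where $Z$ is a block diagonal matrix of the $Z_i$, $\tilde{D} = I_n \otimes D$, $\tilde{\Gamma} = I_n \otimes \Gamma$, and $\otimes$ is the Kronecker product.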
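 Objective Function and Penalization Minimizing the $\|Y - X\beta - Z \tilde{D} \tilde{\Gamma} b\|^2$ term in (3.4) maximizes the log-likelihood. Therefore the objective function is:
 $$Q_n(\beta, d, \gamma \mid Y, b) = \|Y - X\beta - Z \tilde{D} \tilde{\Gamma} b\|^2 + P_{\lambda_1}(\beta, d)$$
 where $P_{\lambda_1}(\beta, d)$ is chosen to be an adaptive lasso penalty function with tuning parameter $\lambda_1$ such that:
 $$P_{\lambda_1}(\beta, d) = \lambda_1 \left( \sum_{j=1}^{p} \frac{|\beta_j|}{|\hat{\beta}_j|} + \sum_{k=1}^{q} \frac{|d_k|}{|\hat{d}_k|} \right)$$
 where $\hat{\beta}$ and $\hat{d}$ are the ordinary least squares estimates. Rearranging the terms, the joint penalized objective function to be minimized in the algorithm is:
 $$Q_F(\beta, d, \gamma \mid Y, b) = \|Y - X\beta - Z\,\mathrm{Diag}(\tilde{\Gamma} b)(1_n \otimes I_q)\, d\|^2 + \lambda_1 \left( \sum_{j=1}^{p} \frac{|\beta_j|}{|\hat{\beta}_j|} + \sum_{k=1}^{q} \frac{|d_k|}{|\hat{d}_k|} \right)$$
 where $1_n$ is an $n \times 1$ column vector of 1's, so that $(1_n \otimes I_q)\, d$ stacks $n$ copies of $d$.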
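As a concrete illustration of the adaptive lasso penalty described above, the following sketch evaluates the penalty for given coefficient vectors and initial estimates. The function name and the numerical values are hypothetical; the initial estimates are assumed to be nonzero so that the weights are finite.

```python
def adaptive_lasso_penalty(lam, beta, d, beta_init, d_init):
    """lam * (sum_j |beta_j| / |beta_init_j| + sum_k |d_k| / |d_init_k|).

    beta_init and d_init are the unpenalized initial estimates used as
    weights; they are assumed to be nonzero.
    """
    fixed_part = sum(abs(b) / abs(b0) for b, b0 in zip(beta, beta_init))
    random_part = sum(abs(v) / abs(v0) for v, v0 in zip(d, d_init))
    return lam * (fixed_part + random_part)


# Hypothetical values: a small initial estimate (0.1) yields a large weight,
# so that coefficient is penalized heavily if it moves away from zero.
penalty = adaptive_lasso_penalty(
    lam=2.0,
    beta=[3.0, 0.0], beta_init=[2.0, 0.1],
    d=[1.0], d_init=[0.5],
)
print(penalty)
```

The weighting means that coefficients with large initial estimates are shrunk less than coefficients whose initial estimates are close to zero.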
Algorithm To solve for $\beta$, $d$, and $\gamma$, the expectation-maximization (EM) algorithm (Laird and Ware 1982) is used. The algorithm consists of two steps. First, the conditional expectation of $Q_F(\beta, d, \gamma \mid Y, b)$ is taken (E-step); then the resulting objective function is minimized (M-step) with respect to $(\beta^T, d^T, \gamma^T)^T$. The overall process is as follows:
 1. Let $\phi = (\beta^T, d^T, \gamma^T)^T$ be the vector of parameters and $\phi^{(r)}$ be the estimate of the parameters at the $r$th step. For $r = 0$, the REML estimates are chosen for the parameters.
 2. For the $r$th step, first take the E-step, that is, find the conditional expectation of the objective function, treating the random effects as unobserved:
 $$g(\phi \mid \phi^{(r)}) = E_{b \mid y, \phi^{(r)}} \left\{ \|Y - X\beta - Z\,\mathrm{Diag}(\tilde{\Gamma} b)(1_n \otimes I_q)\, d\|^2 \right\} + \lambda_1 \left( \sum_{j=1}^{p} \frac{|\beta_j|}{|\hat{\beta}_j|} + \sum_{k=1}^{q} \frac{|d_k|}{|\hat{d}_k|} \right)$$
 3. Complete the M-step by minimizing $g(\phi \mid \phi^{(r)})$ with respect to $\phi$. This is done by iterating between $\gamma$ and $(\beta, d)$.
 4. Repeat from step 2 for step $r + 1$, unless convergence has occurred.
3.3 Method 3 - Independent Selection with Proxy Matrix
 This section describes the method created by Fan and Li (2012). This method solves for the fixed effects $\beta$ and the random effects $b$ separately. A proxy matrix is substituted for the unknown true covariance structure during the selection and estimation of fixed and random effects.
 Model Consider the model in (2.1). Stacking the $Y_i$, $X_i$, $b_i$, and $\epsilon_i$ gives $Y$, $X$, $b$, and $\epsilon$. Let $Z = \mathrm{diag}\{Z_1, \ldots, Z_n\}$ and $\tilde{\Psi} = \mathrm{diag}\{\Psi, \ldots, \Psi\}$ be block diagonal matrices. The fixed effect predictors $X$ are standardized so that each column has norm $\sqrt{n}$. The LME model becomes:
 $$Y = X\beta + Zb + \epsilon$$
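The column standardization mentioned above can be sketched in a few lines. This is a minimal illustration with a hypothetical helper name and made-up data; it only rescales a column to a target Euclidean norm and does not center it.

```python
import math


def scale_to_norm(column, target_norm):
    """Rescale a column so that its Euclidean norm equals target_norm."""
    norm = math.sqrt(sum(v * v for v in column))
    if norm == 0.0:
        raise ValueError("cannot rescale a column of zeros")
    return [v * target_norm / norm for v in column]


# Hypothetical predictor column with n = 4 entries, rescaled to norm sqrt(n).
column = [1.0, -2.0, 0.5, 3.0]
scaled = scale_to_norm(column, math.sqrt(len(column)))
print(scaled)
```

Putting every column on the same scale means a single tuning parameter penalizes all coefficients comparably.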
Maximum Likelihood Estimation of Fixed Effects The MLE for the fixed effects can be found by maximizing the joint density function of $Y$ and $b$:
 $$f(y, b) = (2\pi\sigma)^{-(n+qm)/2} \, |\tilde{\Psi}|^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2} (y - X\beta - Zb)^T (y - X\beta - Zb) - \frac{1}{2} b^T \tilde{\Psi}^{-1} b \right\}$$
For a given $\beta$, the MLE for $b$ is $\hat{b}(\beta) = B_z (Y - X\beta)$, where $B_z = (Z^T Z + \sigma^2 \tilde{\Psi}^{-1})^{-1} Z^T$. Inserting $\hat{b}(\beta)$, the likelihood function for the fixed effects $\beta$ can be expressed as:
 $$\ell_n(\beta, \hat{b}(\beta)) = \exp\left\{ -\frac{1}{2\sigma^2} (Y - X\beta)^T P_z (Y - X\beta) \right\} \quad (3.5)$$
where $P_z = (I - Z B_z)^T (I - Z B_z) + \sigma^2 B_z^T \tilde{\Psi}^{-1} B_z$. The $\beta$ that maximizes (3.5) gives the fixed effects solution.
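A quick scalar sanity check helps in reading the expression for $P_z$. The sketch below assumes the simplest possible case, one observation and one random effect, so that $Z$, $\tilde{\Psi}$, and $P_z$ are all scalars. In that case the definition of $P_z$ collapses to $1 / (1 + z^2 \psi / \sigma^2)$, the scalar analogue of $(I + \sigma^{-2} Z \tilde{\Psi} Z^T)^{-1}$. The function names and test values are illustrative assumptions.

```python
def p_z_from_definition(z, psi, sigma2):
    """Scalar P_z = (1 - z*B_z)^2 + sigma2 * B_z^2 / psi."""
    b_z = z / (z * z + sigma2 / psi)
    return (1.0 - z * b_z) ** 2 + sigma2 * b_z * b_z / psi


def p_z_closed_form(z, psi, sigma2):
    """Scalar analogue of (I + Z Psi Z^T / sigma2)^(-1)."""
    return 1.0 / (1.0 + z * z * psi / sigma2)


# Hypothetical values for the design entry, random effect variance, and
# error variance.
for z, psi, sigma2 in [(1.0, 1.0, 1.0), (2.0, 0.5, 0.64), (0.3, 4.0, 2.0)]:
    print(p_z_from_definition(z, psi, sigma2), p_z_closed_form(z, psi, sigma2))
```

Both columns of the printed output agree, which shows that $P_z$ depends on the unknown quantities only through the ratio $\psi / \sigma^2$.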
 Objective Function and Penalization for Fixed Effects A general formula for the objective function of the fixed effects is written:
 $$Q_n(\beta) = \frac{1}{2} (Y - X\beta)^T P_z (Y - X\beta) + n \sum_{j=1}^{p} P_{\lambda_1}(|\beta_j|) \quad (3.6)$$
 where the goal is to minimize $Q_n(\beta)$. The penalty function $P_{\lambda_1}(|\beta_j|)$ is required to be concave and increasing, so a smoothly clipped absolute deviation (SCAD, Fan and Li 2001) penalty function is chosen with tuning parameter $\lambda_1 > 0$.
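For reference, the SCAD penalty of Fan and Li (2001) can be written as a short piecewise function. The sketch below uses the shape parameter $a = 3.7$ suggested in that paper as a default; the function name is a hypothetical choice for illustration.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty evaluated at |t| for tuning parameter lam > 0 and a > 2."""
    t = abs(t)
    if t <= lam:
        # Lasso-like linear region near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region.
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    # Constant region: large coefficients receive no additional penalty.
    return lam * lam * (a + 1.0) / 2.0


for value in (0.5, 1.0, 2.0, 3.7, 10.0):
    print(value, scad_penalty(value, lam=1.0))
```

The penalty behaves like the lasso for small coefficients but levels off for large ones, which is why it is concave and increasing on the positive half-line as the method requires.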
 Proxy Matrix for Fixed Effects Because $P_z$ depends on the unknown covariance matrix $\tilde{\Psi}$ and the unknown variance $\sigma^2$, a proxy matrix $\tilde{P}_z = (I + Z M Z^T)^{-1}$ is substituted for $P_z$, with $M = \log(N) I$. With this $M$, the proxy matrix $\tilde{P}_z$ satisfies a condition on the decay of the minimal signal strength as the sample size increases. It also satisfies the constraints placed on the proxy matrix $\tilde{P}_z$ to ensure that the model selection has the oracle property. With this proxy matrix substituted into (3.6), the optimization problem becomes a quadratic problem, which can be solved using the LARS algorithm (Efron et al. 2004).
 Objective Function and Penalization for Random Effects The number of random effects $q$ is allowed to increase with sample size $n$. For $P_x = I - X(X^T X)^{-1} X^T$, the objective function for the random effects is
 $$Q_n(b) = (y - Zb)^T P_x (y - Zb) + \sigma^2 b^T \tilde{\Psi}^{+} b,$$
 where $\tilde{\Psi}^{+}$ is the Moore-Penrose generalized inverse of $\tilde{\Psi}$. Adding a penalty gives the regularization problem:
 $$Q_n(b) = \frac{1}{2} (y - Zb)^T P_x (y - Zb) + \frac{1}{2} \sigma^2 b^T \tilde{\Psi}^{+} b + n \sum_{k=1}^{q_n} P_{\lambda_2}(b_k) \quad (3.7)$$
 where $P_{\lambda_2}(b_k)$ is the SCAD penalty function with parameter $\lambda_2$. In practice, the covariance matrix $\tilde{\Psi}$ and the variance $\sigma^2$ are unknown, so again a proxy matrix $M$ is substituted for $\sigma^{-2} \tilde{\Psi}$, and the regularization problem becomes:
 $$Q_n(b) = \frac{1}{2} (y - Zb)^T P_x (y - Zb) + \frac{1}{2} b^T M^{-1} b + n \sum_{k=1}^{q_n} P_{\lambda_2}(b_k)$$
 Minimizing this equation gives an estimate $\hat{b}$ of the random effects parameter vector. Note that once the proxy matrix is substituted into the objective function, this method does not require knowledge of the fixed effect parameter $\beta$.
 Proxy Matrix for Random Effects Again, $M = (\log n) I$ is chosen to satisfy the constraints placed on the proxy matrix. Substituting this proxy matrix into (3.7) creates a quadratic optimization problem similar to the adaptive elastic net (Zou and Hastie 2005). This allows the problem to be solved using existing quadratic algorithms.
 It should be noted that using $(\log n) I$ ignores correlations among the random effects, which could introduce bias into the estimation of the covariance matrix. However, although the covariance matrix estimate may be biased, the proxy avoids errors caused by estimating a large number of parameters. The authors argue that the overall error caused by the accumulation of these errors from each parameter estimate would give poorer results than using the proxy matrix.
Chapter 4
 Analysis of Simulated Data
 Experiments on seven simulated data settings are conducted to compare the effectiveness and accuracy of the three methods. All of the simulations represent data sets where the number of observations $N$ is greater than the number of predictors $p$. For all methods, tuning parameters are chosen through a grid search to find the $\lambda$ values that result in the lowest BIC. Each simulation consisted of 50 replicates.
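The grid search over tuning parameters can be sketched as follows. This is an illustration only: it assumes the standard form $\mathrm{BIC} = -2 \log L + k \log N$, a single tuning parameter, and a hypothetical `fit` callback that returns the value of $-2 \log L$ and the number of nonzero parameters for a given $\lambda$. The toy fit function stands in for a penalized LME fit and is not one of the three methods.

```python
import math


def bic(neg2_loglik, n_params, n_obs):
    """Standard BIC: -2 log-likelihood plus n_params * log(n_obs)."""
    return neg2_loglik + n_params * math.log(n_obs)


def select_lambda(grid, fit, n_obs):
    """Return (best_lambda, best_bic) over the grid.

    fit(lam) is a hypothetical callback returning (neg2_loglik, n_params).
    """
    best = None
    for lam in grid:
        neg2_loglik, n_params = fit(lam)
        score = bic(neg2_loglik, n_params, n_obs)
        if best is None or score < best[1]:
            best = (lam, score)
    return best


def toy_fit(lam):
    """Toy stand-in: a larger lam keeps fewer parameters but fits worse."""
    n_params = max(0, 10 - int(lam * 10))
    return 100.0 + 50.0 * lam * lam, n_params


# n_obs = 150 mirrors n = 30 clusters with m = 5 observations each.
best_lam, best_score = select_lambda([0.0, 0.25, 0.5, 0.75, 1.0], toy_fit, 150)
print(best_lam, round(best_score, 2))
```

For a method with two tuning parameters, the same loop would run over pairs of values rather than a single grid.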
 4.1 Simulation 1
 Setting This simulation generates a small study population of $n = 30$ clusters with $m = 5$ observations within each cluster, for observation $l \in \{1, \ldots, m\}$. There are 10 predictors under consideration, of which four are important fixed effects and three are important random effects. The random effects are selected from the same 10 predictors as the fixed effects, so $p = 10$ and $q = 10$. The true model is given by:
 $$y_{il} = (1 + b_{i0}) + (3 + b_{i1l}) x_{i1l} + (1.5 + b_{i2l}) x_{i2l} + (2 + 0) x_{i5l} + (2 + b_{i10l}) x_{i10l} \quad (4.1)$$
with $x_{ijl} \sim N(0, 1)$ and $\mathrm{Corr}(x_{ijl}, x_{ijl'}) = 0.5^{|l - l'|}$. The random effects $(b_{i0l}, b_{i1l}, b_{i2l}, b_{i10l})$ are generated from $MVN(0, \sigma^2 R)$, with $\sigma = 0.8$ and
 $$R = \begin{pmatrix} 1.0 & 0.5 & 0.3 & 0.2 \\ 0.5 & 1.0 & 0.5 & 0.3 \\ 0.3 & 0.5 & 1.0 & 0.5 \\ 0.2 & 0.3 & 0.5 & 1.0 \end{pmatrix} \quad (4.2)$$
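One standard way to generate standard normal predictors whose correlation decays as $0.5^{|l - l'|}$ across observations is a first-order autoregressive recursion. The sketch below is an assumption about how such data could be generated, not a description of the code used for the thesis simulations; the function names and the seed are hypothetical.

```python
import math
import random


def ar1_correlation(rho, m):
    """Target correlation matrix with entries rho ** |l - l'|."""
    return [[rho ** abs(l - k) for k in range(m)] for l in range(m)]


def ar1_series(rho, m, rng):
    """Draw m standard normal values with Corr(x_l, x_l') = rho ** |l - l'|."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(m - 1):
        # Each new value mixes the previous one with fresh noise so that
        # the marginal variance stays equal to 1.
        x.append(rho * x[-1] + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
    return x


rng = random.Random(2013)  # hypothetical seed, for reproducibility only
series = ar1_series(0.5, 5, rng)  # one predictor over m = 5 observations
print(ar1_correlation(0.5, 5)[0])
print(series)
```

The first printed row shows the target correlations between the first observation and each later one; repeating `ar1_series` for every predictor and cluster gives a full simulated design.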
 Result Simulation 1 results are in Tables 2 and 3. All methods correctly select all true fixed effects 100 percent of the time, with the exception of $\beta_2$ in Method 3, which was correctly selected 98 percent of the time. Only Method 1 selects all true random effects 100 percent of the time. Methods 2 and 3 still perform well, selecting the true random effects 92.67 percent and 70.67 percent of the time, respectively. Method 3 eliminates the most predictors, resulting in the smallest average model sizes for both fixed and random effects. In fact, Method 3's average model size is less than the true model size for both fixed and random effects, so when using Method 3, it is probable that some true random effects are not selected.
 4.2 Simulation 2
 Setting Simulation 2 has the same true LME equation as Simulation 1, seen in (4.1). However, this is a larger data set, with $m = 8$ observations within each of $n = 200$ clusters, for $p = 100$ and $q = 50$ predictors. There are still only four important fixed effects and three important random effects, as in equation (4.1). All $\beta_j = 0$ for $j > 10$. The random effects are chosen from the first 50 fixed effects $x_{ij}$, where $j \in \{1, \ldots, 50\}$, so $p = 100$ and $q = 50$.
 Results The results of simulation 2 can be found in Tables 4 and 5. Method 2 was unable to complete the analysis due to lack of memory, resulting in the error "Error: cannot allocate vector of size 793.8 Mb." Method 3 was unable to complete the analysis for the random effects, running for hours and then force closing MATLAB without results. Method 1 was able to select the true fixed and random effects in 100 percent of the simulations, while Method 3 selected the true fixed effects 100 percent of the time. Method 1 performed well at eliminating random effects, including false predictors in the model only 0.469 percent of the time. Both methods were able to eliminate noise variables well, selecting false fixed effects less than one percent of the time.
 4.3 Simulation 3
 Setting This simulation generates a small study population of $n = 60$ clusters with $m = 3$ observations taken within each cluster, with observation $l \in \{1, \ldots, m\}$. $X_{ijl}$ and $Z_{ikl}$ are generated from $N(0, 1)$ with $\mathrm{Corr}(X_{ijl}, X_{ijl'}) = 0.5^{|l - l'|}$ and $\mathrm{Corr}(Z_{ikl}, Z_{ikl'}) = 0.8^{|l - l'|}$. Random effects $b_{ikl}$ are generated from $MVN(0, \sigma^2 R)$, where $\sigma = 0.8$ and
 $$R = \begin{pmatrix} 1.0 & 0.5 & 0.3 \\ 0.5 & 1.0 & 0.5 \\ 0.3 & 0.5 & 1.0 \end{pmatrix} \quad (4.3)$$
There are $p = 10$ predictors for fixed and $q = 5$ predictors for random effects to choose from, with four true fixed effects and two true random effects. The true model is given by:
 $$y_{il} = [1 + 3x_{i1l} + 1.5x_{i2l} + 2x_{i5l} + 2x_{i10l}] + [b_{i0l} + b_{i1l} z_{i1l} + b_{i5l} z_{i5l}] \quad (4.4)$$
 Results For fixed and random effects selected from different sets of predictors, the simulation 3 results are found in Tables 6 and 7. Again, all methods perform well when selecting fixed effects, correctly keeping true fixed effects 100 percent of the time. The methods do not perform as well selecting random effects as in simulation 1, but still correctly select true variables, on average, more than 70 percent of the time. Method 1 performs the best in this regard at 83 percent. Methods 2 and 3 both eliminate random predictors more heavily, resulting in average random model sizes that are less than the true model size.
 4.4 Simulation 4
 Setting Simulation 4 has the same true LME equation as Simulation 3, seen in (4.4). However, this is a larger data set, with $n = 600$ clusters and $m = 3$ observations within each cluster, for $p = 50$ and $q = 10$ predictors. There are still only four important fixed effects and two important random effects, as in equation (4.4). All $\beta_i = 0$ for $i > 10$. All $b_i = 0$ for $i > 5$.
Results For larger data sets, simulation 4 results are found in Tables 8 and 9. Again, due to computational limitations, Method 2 was unable to complete the analysis for either fixed or random effects, while Method 3 was unable to complete the analysis for the random effects. Methods 1 and 3 selected the true fixed effects 100 percent of the time. Method 3 was able to eliminate false fixed effects 100 percent of the time. Method 1 selected the true random effects 100 percent of the time on average, while eliminating false random effects almost 94 percent of the time.
 4.5 Simulation 5
 Setting This simulation generates a small study population of $n = 60$ clusters with observations taken at $m = 3$ points within the clusters (observation $l \in \{1, \ldots, m\}$). However, this simulates a multilevel study, where the fixed effects are selected at the individual level and the random effects are selected from group level predictors. At the individual level, $X_{ijl}$ is generated from $N(0, 1)$ with $\mathrm{Corr}(X_{ijl}, X_{ijl'}) = 0.5^{|l - l'|}$. At the group level, the predictors $Z_{ikl}$ are nested within $g = 6$ groups, so all members of group $g$ have the same set of responses $Z_{gl}$. $Z_{gkl}$ is generated from $N(0, 1)$ with $\mathrm{Corr}(Z_{gkl}, Z_{gkl'}) = 0.8^{|l - l'|}$. Random effects $b_{igkl}$ are generated from $MVN(0, \sigma^2 R)$, where $\sigma = 0.8$ and $R$ is given in (4.3). The random effects have a subject-specific intercept $b_{i0l}$ and predictor-associated $b_{gk}$, for $k \in \{1, \ldots, q\}$. There are $p = 10$ predictors for fixed and $q = 5$ predictors for random effects to choose from, with four true fixed effects and two true random effects. The true model is given by:
 $$y_{igl} = [1 + 3x_{i1l} + 1.5x_{i2l} + 2x_{i5l} + 2x_{i10l}] + [b_{i0l} + b_{g1l} z_{g1l} + b_{g5l} z_{g5l}] \quad (4.5)$$
 Results For results from nested clustered designs, Simulation 5 results can be found in Tables 10 and 11. All methods again perform well in the identification of fixed effects, selecting true fixed effects 100 percent of the time. The methods do not perform as well selecting random effects as in simulations 1 and 3. Methods 1 and 2 perform the best at selecting random effects, and of the two, Method 1 is also better at eliminating noise random effects. Method 3 again eliminates the most random effects from the model. Method 3's models average just over one random effect each, about half the true size.
 4.6 Simulation 6
 Setting Simulation 6 has the same true LME equation as Simulation 5, seen in (4.5). However, this is a larger data set, with $n = 600$ clusters and $m = 3$ observations within each cluster, for $p = 50$ and $q = 20$ predictors. There are still only four important fixed effects and two important random effects, as in equation (4.5). All $\beta_i = 0$ for $i > 10$. All $b_i = 0$ for $i > 5$.
 Results Simulation 6 results are listed in Tables 12 and 13. Again Methods 2 and 3 were limited by computational power. Methods 1 and 3 selected the true fixed effects 100 percent of the time, and both were able to eliminate noise fixed effects 100 percent of the time. Method 1 selected the true random effects 100 percent of the time on average, while eliminating false random effects more than 95 percent of the time.
 4.7 Simulation 7
 Setting This simulation generates a small study population of $n = 60$ individuals with observations taken at $m = 3$ time points. It simulates a multilevel study, where the fixed effects are selected at the individual level and the random effects are selected from group level predictors. It also simulates a longitudinal study, with time $t = (1, 2, 3)$.
 At the individual level, for individual $i$ and predictor $j$, $X_1 = t$ and $X_{ij}(t)$ for $j \in (2, \ldots, p)$ are generated from $N(0, 1)$ with $\mathrm{Corr}(X_{ij}(t), X_{ij}(t')) = 0.5^{|t - t'|}$. At the group level, the predictors $Z_{gk}$ are nested within $g = 6$ groups, so all members of group $g$ have the same set of responses $Z_{gk}(t)$. $Z_{gk}(t)$ is generated from $N(0, 1)$ with $\mathrm{Corr}(Z_{gk}(t), Z_{gk}(t')) = 0.8^{|t - t'|}$. Random effects $b$ are generated from $MVN(0, \sigma^2 R)$, where $\sigma = 0.8$ and $R$ is given in (4.3). The random effects have a subject-specific intercept $b_{i0}(t)$ and predictor-associated $b_{igk}$, for $k \in \{1, \ldots, q\}$. There are $p = 10$ predictors for fixed and $q = 5$ predictors for random effects to choose from, with four true fixed effects and two true random effects. The true model is given by:
 $$y_{ig}(t) = 1 + \beta X_{ij}(t) + b_{i0}(t) + b_{igk} Z_{gk}(t)$$
 $$= [1 + 3x_{1}(t) + 1.5x_{i2}(t) + 2x_{i5}(t) + 2x_{i10}(t)] + [b_{i0}(t) + b_{ig1} z_{g1}(t) + b_{ig5} z_{g5}(t)]$$
 Results For results from multilevel longitudinal designs, Simulation 7 results can be found in Tables 14 and 15. All methods again perform well in the identification of fixed effects, selecting true fixed effects 100 percent of the time. Methods 2 and 3 overestimate $\beta_1$, the coefficient for the time variable. Method 1 performs the best at selecting random effects. Method 2 often selects one of the true random effects while often eliminating the other. Both are selected more often than the noise random effects, however. Method 3 again eliminates the most random effects from the model. Method 3's models are on average smaller than the true model size. Again, Method 1 performs the best overall.
 4.8 Summary of Simulation Studies
 Overall, Method 1 was the most effective across data sets of different structures and sizes. It tends to underestimate the true parameter values of fixed and random effects, but this can be expected from penalized optimization problems. This can be remedied by re-estimating the selected model without penalization. Method 3 performed very well in the selection of fixed effects in all settings, but could not perform selection of random effects well for large data sets. Method 2 performed the worst. While able to select fixed effects accurately, it performed poorly for random effects in nested settings. It was also unable to complete any analysis for large data sets.
Chapter 5
 Real Data Analysis
 5.1 Data Description and Model Formulation
 The data used in this analysis include only the 2006 and 2009 time points of the TAAG 2 Maryland data set. The outcome variable of interest is the average minutes of MVPA per day in the adolescent girls at each time point. In total, there were 66 variables of interest for 551 subjects at two time points. A reference table of the variables can be found in Table 18.
 Data were collected at the individual, social, school, and neighborhood levels. Examples of predictors at the individual level include BMI, percent body fat, self esteem measures, enjoyment of physical activity, and depression, among others. At the social level, predictors include measures of peer and family support, such as amount of encouragement received from members in the household or time spent home alone. At the school level, variables include policies for items such as physical education and transportation, as well as metrics regarding the schools' academic performance. At the neighborhood level, variables include proximity to the girls' school, parks, and physical activity facilities, as well as measures of safety in the neighborhood (Young et al. 2013).
 Because the variable selection methods require that no data be missing, subjects are required to have measures at both time points. Those who were not present at both time points were removed from the set. Of the remaining data, missing values are imputed using the Sequential Regression Imputation Method (Raghunathan et al. 2001) through IVEware (Raghunathan et al. 2002) where possible. It is not possible to impute factors at the neighborhood level that used GIS data, so most neighborhood level variables were not included in the selection procedures. Questions regarding the subjects' perceptions of their neighborhood, however, are included.
 For the $i$th girl, the fixed effects variables $X_{ij}$ to be considered are from the individual, social, and neighborhood levels, where $j \in \{1, \ldots, p\}$. Time is also included for longitudinal consideration in the model. The predictors associated with the random effects are used to generalize the variance components of the model. For girls in school $g$ ($1 \le g \le 6$), the random effects are selected from the school-level variables $Z_{gk}$, where $k \in \{1, \ldots, q\}$. For the school-level variables, only the data from the 8th grade middle school time point are considered. Therefore, these predictors are not time-dependent.
 For the outcome $y_{ig}(t)$, the average daily MVPA at time $t \in \{0, 1\}$ for girl $i$ in school $g$, the LME model is represented by
 $$y_{ig}(t) = \beta_0 + \beta_{\text{time}}\, t + \beta X_i + b_{i0} + b_{g0} + b_i Z_g + \epsilon_{ig},$$
 where $\beta_0$ is the fixed effects intercept, $\beta_{\text{time}}$ is the parameter associated with time, $b_i$ are the individual-specific random effects with intercept $b_{i0}$, and $b_{g0}$ is the school-specific random intercept. The individual-specific random effects $b_i$ are distributed as $N(0, \sigma^2 \Psi_i)$. The school-specific random intercept $b_{g0}$ is distributed as $N(0, \sigma^2 \Psi_g)$. The error term is distributed as $\epsilon \sim N(0, 1)$.
 5.2 Results of Real Data Analysis
 Method 1 and Method 3 were able to complete the analysis, while Method 2 was not, because the $Z$ matrix was not of full rank. The results of Methods 1 and 3 can be found in Tables 16 and 17, together with, for comparison, the fixed and random effects selected at the individual time points in the analysis by Young et al. (2013).
 For Method 3, the fixed effects $X$ were scaled so each column had a norm of $\sqrt{n}$. Following this, the method selected 40 fixed effects and 1 random effect. Only 6 fixed effects were eliminated from the model, while 18 random effects were eliminated. In the simulations, the random effects model sizes were generally smaller than the true model size. Therefore, it is likely that true random effects were eliminated from this model.
 Method 1 was more successful at creating a parsimonious model. First, the data were standardized so that the columns of $X$ and $Z$ have zero mean and unit Euclidean norm. Of the original 47 fixed effects and 19 random effects, Method 1 selected three fixed effects and two random effects. The fixed effects selected were (1) self-management strategies (MSQBOD F), (2) perceived barriers (MSQBOD I), and (3) support from friends (MSQBOD OB). Because of the tendency of Method 1 to underestimate the parameter values, the selected model was refit as a non-penalized LME model using the lme4 package in R, with corresponding p-values for the fixed effects. Although not selected, time was included in the re-estimated model to determine the longitudinal effects of time on MVPA.
 After re-estimation, the results suggest that there is a positive association of self-management strategies and friend support with MVPA ($\beta_{\text{MSQBOD F}} = 0.12$ and $\beta_{\text{MSQBOD OB}} = 0.43$). There was also a negative association between perceived barriers and MVPA ($\beta_{\text{MSQBOD I}} = -0.21$). Perceived barriers ($p < .001$) and friend support ($p < .001$) were both highly significant, and self-management strategies was significant at the $\alpha = 0.05$ significance level ($p = .05$). There was a negative, but nonsignificant, effect of time on MVPA ($\beta_{\text{time}} = -0.52$, $p = .35$). All three fixed effects were selected in the previous study, and the results for perceived barriers and friend support show the same direction of association. Based on the results, it would be suggested to offer programs or interventions for improving self-management strategies, for reducing barriers to physical activity, and for encouraging peers to give each other support in participating in physical activities.
 One random effect was selected out of the 18 original variables. The random effect selected was PPIC19, indicating whether the schools offered interscholastic and intramural physical activity programs. This was selected for the 2009 11th grade time point in Young et al. (2013), but not in the 2006 8th grade model. Following re-estimation, the results from this analysis suggest that there is substantial variation in MVPA from girl to girl associated with whether their middle school offered interscholastic or intramural physical activity programs ($\hat{\sigma} = 6.68$).
Chapter 6
 Conclusion
 This thesis has presented three new methods for variable selection in LME models. The method proposed by Bondell et al. (2010) was one of the first to simultaneously select fixed and random effects in LME models. While it can effectively select fixed effects, it performs less accurately in selecting random effects when the data become nested. Further, it cannot perform analysis when the data are time independent, as was the case for the random effects in the TAAG data. Also, the EM algorithm that Method 2 uses is an inefficient way to solve optimization problems. As data sets get larger through increases in sample size or number of predictors, the slow rate of convergence of the EM algorithm makes the method inefficient and even infeasible with limited computing resources. An option for high-dimensional data, where $N \le p$, would be to reduce the number of fixed effect parameters using previous methods, such as the lasso, while ignoring the random effects. Following this, the method could be applied to the random effects and the selected fixed effects. However, due to its slow rate of convergence, this method and its use of the EM algorithm would not be ideal for use on high-dimensional data.
 Method 3, proposed by Fan and Li (2012), can accurately and quickly select fixed effects in LME models. In simulations it performed excellently not only in selecting true predictors but also in removing noise fixed effects variables from the model. However, the performance with the TAAG data was inconsistent with the sparse results displayed in the simulations. The use of the proxy matrix requires certain conditions to be satisfied. Notably, for fixed effects $X$ and random effects $Z$, the signal and noise variables must not be highly correlated. By using a proxy matrix, the correlation between variables is ignored. In cases of highly correlated signal and noise predictors, the use of the proxy matrix could introduce bias that can hinder the model selection oracle property. There are potentially many correlated variables in the TAAG data set that could violate this condition for using the proxy matrix, which may have caused the poor results. Additionally, the performance of Method 3 in selecting random effects can be troublesome, as it tends to under-select true models. This can lead to models that are missing important random effects.
 For high-dimensional data, it is necessary to first reduce the number of fixed effects parameters while ignoring the random effects through previous regularization methods. Next the random effects can be selected using the chosen fixed effects from the previous step. Finally, these fixed effects can be selected and re-estimated using the selected random effects from the second step.
 Based on the results of the simulations, Method 1 by Li et al. (2012) is clearly the best of the three methods. It selects the true model consistently while eliminating noise variables effectively. Additionally, its new algorithm for solving the optimization problem is much more efficient than previous methods, such as the EM algorithm. By splitting the optimization problem into two penalized quadratic subproblems, convergence can be reached much more quickly than with previous methods. Additionally, this method can be used with high-dimensional data. All that is needed is to use the maximum likelihood approach in equation (3.2), instead of the REML-modified equation (3.1).
 The benefits of these methods can prove invaluable to researchers. This is especially true in the field of public health, where longitudinal data are often used and are vital for understanding temporal trends of health outcomes. The temporal trends can provide a deeper understanding of biological, social, or environmental processes that can lead to progress in the discovery and improvement of health risks. With the methods introduced in this thesis, it is possible to efficiently select important fixed and random effects from large, complex sets of predictors. This can aid and advance the field of public health data greatly in the future, especially as technology and data collection methods improve.
Table 1: Example of Model Selection Using Lasso
 Variable True Value Least Squares Ridge Lasso
 Intercept 1 1.04 0.96 0.90
 X1 0.5 0.69 0.58 0.52
 X2 0 0.13 0.06 0.00
 X3 1.5 1.44 1.36 1.26
 X4 0.5 0.39 0.43 0.41
 X5 0 0.11 0.04 0.00
 X6 0 -0.16 -0.003 0.00
Table 2: Simulation 1 Results - Parameter Estimates
 Fixed Effects Random Effects
 Covariate Selected (%) Estimate Error Selected (%) Estimate Error
 Method 1
 1 100.00 2.37 0.32 100.00 0.53 0.42
 2 100.00 0.89 0.26 100.00 0.49 0.37
 3 26.00 0.09 0.05 28.00 0.09 0.10
 4 22.00 0.05 0.03 18.00 0.07 0.06
 5 100.00 1.68 0.18 56.00 0.17 0.18
 6 14.00 0.02 0.01 18.00 0.06 0.06
 7 6.00 0.01 0.00 18.00 0.02 0.02
 8 14.00 0.03 0.02 20.00 0.03 0.03
 9 14.00 0.03 0.02 16.00 0.05 0.04
 10 100.00 1.32 0.30 100.00 0.48 0.38
 Method 2
 1 100.00 3.02 0.18 100.00 0.90 0.42
 2 100.00 1.50 0.21 100.00 0.90 0.46
 3 62.00 0.03 0.13 56.00 0.60 0.47
 4 70.00 -0.03 0.11 50.00 0.63 0.46
 5 100.00 2.00 0.14 40.00 0.64 0.49
 6 72.00 -0.02 0.12 36.00 0.59 0.44
 7 66.00 0.01 0.10 52.00 0.58 0.44
 8 66.00 -0.03 0.10 28.00 0.58 0.40
 9 54.00 -0.00 0.07 42.00 0.65 0.48
 10 100.00 2.02 0.19 78.00 0.87 0.60
 Method 3
 1 100.00 3.03 0.25 74.00 - -
 2 98.00 1.52 0.30 72.00 - -
 3 0.00 0.00 0.00 2.00 - -
 4 0.00 0.00 0.00 10.00 - -
 5 100.00 1.94 0.18 10.00 - -
 6 0.00 0.00 0.00 2.00 - -
 7 0.00 0.00 0.00 2.00 - -
 8 0.00 0.00 0.00 4.00 - -
 9 0.00 0.00 0.00 14.00 - -
 10 100.00 1.99 0.21 66.00 - -
Table 3: Simulation 1 Results - Summary
 Method 1 Method 2 Method 3
 Double Joint Independent
 Penalty Penalty Selection
 Avg Model Size Fixed (True = 4) 4.96 7.88 3.98
 Avg Model Size Random (True = 3) 4.74 5.8 2.56
 Percent True $\beta$ Included 100.00 100.00 99.33
 Percent True D Included 100.00 92.67 70.67
 Percent False $\beta$ Included 16.00 64.58 0.00
 Percent False D Included 21.75 35.00 6.29
Table 4: Simulation 2 Results - Parameter Estimates
 Fixed Effects Random Effects
 Covariate Selected (%) Estimate Error Selected (%) Estimate Error
 Method 1
 1 100.00 2.52 0.15 100.00 0.53 0.06
 2 100.00 0.97 0.14 100.00 0.51 0.06
 3 4.00 0.02 0.00 8.00 0.01 0.00
 4 10.00 0.01 0.00 0.00 0.00 0.00
 5 100.00 1.89 0.05 32.00 0.04 0.00
 6 0.00 0.00 0.00 0.00 0.00 0.00
 7 2.00 0.01 0.00 0.00 0.00 0.00
 8 0.00 0.00 0.00 0.00 0.00 0.00
 9 0.00 0.00 0.00 2.00 0.01 0.00
 10 100.00 1.47 0.14 100.00 0.54 0.09
 11 2.00 0.01 0.00 4.00 0.01 0.00
 12-50 0.00 0.00 0.00 0.00 0.00 0.00
 51-100 0.00 0.00 0.00 - - -
 Method 2
 1-50 * * * * * *
 51-100 * * * - - -
 Method 3
 1 100.00 3.01 0.08 * * *
 2 100.00 1.48 0.07 * * *
 3 2.00 0.14 0.02 * * *
 5 100.00 2.00 0.05 * * *
 8 2.00 -0.17 0.02 * * *
 9 4.00 0.01 0.04 * * *
 10 100.00 1.99 0.07 * * *
 15 2.00 -0.16 0.02 * * *
 22 4.00 0.03 0.03 * * *
 28 2.00 -0.13 0.02 * * *
 29 2.00 0.15 0.02 * * *
 36 4.00 -0.01 0.03 * * *
 40 2.00 0.12 0.02 * * *
 45 2.00 0.14 0.02 * * *
 49 2.00 0.14 0.02 * * *
 54 2.00 0.14 0.02 - - -
 56 2.00 0.17 0.02 - - -
 66 2.00 0.14 0.02 - - -
 72 2.00 0.12 0.02 - - -
 73 2.00 -0.13 0.02 - - -
 75 2.00 0.13 0.02 - - -
 76 2.00 -0.17 0.02 - - -
 77 2.00 0.13 0.02 - - -
 81 2.00 0.15 0.02 - - -
 88 2.00 -0.13 0.02 - - -
 89 2.00 -0.15 0.02 - - -
 90 2.00 -0.15 0.02 - - -
 98 2.00 -0.15 0.02 - - -
 *Could not complete due to computational limitations
Table 5: Simulation 2 Results - Summary
 Method 1 Method 2 Method 3
 Double Joint Independent
 Penalty Penalty Selection
 Avg Model Size Fixed (True = 4) 4.22 * 4.54
 Avg Model Size Random (True = 3) 3.46 * *
 Percent True $\beta$ Included 100.00 * 100.00
 Percent True D Included 100.00 * *
 Percent False $\beta$ Included 0.229 * 0.563
 Percent False D Included 0.469 * *
 *Could not complete due to computational limitations
Table 6: Simulation 3 Results - Parameter Estimates
 Covariate | Fixed Effects: % Selected, Estimate, Error | Random Effects: % Selected, Estimate, Error
 Method 1
 1 100.00 2.52 0.18 90.00 0.33 0.30
 2 100.00 1.03 0.15 46.00 0.20 0.20
 3 32.00 0.00 0.00 54.00 0.19 0.22
 4 36.00 0.00 0.01 36.00 0.18 0.17
 5 100.00 1.36 0.16 86.00 0.35 0.30
 6 4.00 0.00 0.00 - - -
 7 4.00 0.00 0.00 - - -
 8 2.00 0.00 0.00 - - -
 9 14.00 0.00 0.00 - - -
 10 100.00 1.24 0.16 - - -
 Method 2
 1 100.00 3.01 0.14 94.00 0.94 0.37
 2 100.00 1.45 0.18 20.00 0.71 0.33
 3 24.00 -0.02 0.04 2.00 1.18 0.17
 4 20.00 0.03 0.03 14.00 0.72 0.26
 5 100.00 1.97 0.14 56.00 0.90 0.47
 6 14.00 0.08 0.05 - - -
 7 16.00 0.04 0.04 - - -
 8 10.00 0.04 0.04 - - -
 9 18.00 -0.00 0.06 - - -
 10 100.00 1.94 0.13 - - -
 Method 3
 1 100.00 3.04 0.23 66.00 - -
 2 100.00 1.44 0.24 18.00 - -
 3 6.00 0.19 0.12 12.00 - -
 4 6.00 -0.49 0.12 20.00 - -
 5 100.00 2.01 0.20 76.00 - -
 6 8.00 -0.01 0.15 - - -
 7 2.00 0.39 0.05 - - -
 8 4.00 0.03 0.10 - - -
 9 4.00 0.39 0.08 - - -
 10 100.00 1.95 0.16 - - -
Table 7: Simulation 3 Results - Summary
 Measure | Method 1 (Double Penalty) | Method 2 (Joint Penalty) | Method 3 (Independent Selection)
 Avg Model Size Fixed (True = 4) 4.92 5.02 4.30
 Avg Model Size Random (True = 2) 2.44 1.86 1.26
 Percent True β Included 100.00 100.00 100.00
 Percent True D Included 83.00 75.00 71.00
 Percent False β Included 15.33 17.00 5.00
 Percent False D Included 26.00 18.00 25.00
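 The fixed-effects rows of Table 7 appear to be simple averages of the
 per-covariate selection percentages in Table 6. The sketch below illustrates
 that relationship; it is not the thesis's own code. It assumes that the four
 true fixed effects are covariates 1, 2, 5, and 10 (the ones selected in 100% of
 runs), and the names selection_pct, TRUE_FIXED, and summarize_fixed are
 hypothetical. The same averaging does not reproduce the random-effects rows,
 which are presumably computed differently.

```python
# Per-covariate selection percentages for the fixed effects, copied from Table 6.
selection_pct = {
    "Method 1": {1: 100, 2: 100, 3: 32, 4: 36, 5: 100, 6: 4, 7: 4, 8: 2, 9: 14, 10: 100},
    "Method 2": {1: 100, 2: 100, 3: 24, 4: 20, 5: 100, 6: 14, 7: 16, 8: 10, 9: 18, 10: 100},
    "Method 3": {1: 100, 2: 100, 3: 6, 4: 6, 5: 100, 6: 8, 7: 2, 8: 4, 9: 4, 10: 100},
}

# Assumption (inferred from the tables, not stated in this section):
# the truly nonzero fixed effects are covariates 1, 2, 5, and 10.
TRUE_FIXED = {1, 2, 5, 10}


def summarize_fixed(pct_by_covariate, true_set):
    """Return (average model size, % true included, % false included)."""
    true_pcts = [p for j, p in pct_by_covariate.items() if j in true_set]
    false_pcts = [p for j, p in pct_by_covariate.items() if j not in true_set]
    # Each percentage is a selection frequency, so their sum divided by 100
    # is the average number of covariates selected per simulation run.
    avg_size = sum(pct_by_covariate.values()) / 100
    pct_true = sum(true_pcts) / len(true_pcts)
    pct_false = sum(false_pcts) / len(false_pcts)
    return avg_size, pct_true, pct_false


summary = {m: summarize_fixed(p, TRUE_FIXED) for m, p in selection_pct.items()}
for method, (size, pct_true, pct_false) in summary.items():
    print(f"{method}: size={size:.2f} true={pct_true:.2f} false={pct_false:.2f}")
```

 Under these assumptions the printed values agree with the three fixed-effects
 rows of Table 7 for all three methods.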
Table 8: Simulation 4 Results - Parameter Estimates
 Covariate | Fixed Effects: % Selected, Estimate, Error | Random Effects: % Selected, Estimate, Error
 Method 1
 1 100.00 2.93 0.03 100.00 0.69 0.13
 2 100.00 1.44 0.03 2.00 0.04 0.00
 3 12.00 0.00 0.00 8.00 0.01 0.00
 4 4.00 0.00 0.00 8.00 0.02 0.00
 5 100.00 1.91 0.03 100.00 0.63 0.00
 6 0.00 0.00 0.00 10.00 0.02 0.10
 7 0.00 0.00 0.00 4.00 0.02 0.00
 8 4.00 0.01 0.01 4.00 0.02 0.00
 9 10.00 0.00 0.00 2.00 0.03 0.00
 10 100.00 1.89 0.03 12.00 0.01 0.00
 11 4.00 0.00 0.00 - - -
 12-50 0.00 0.00 0.00 - - -
 Method 2
 1-10 * * * * * *
 11-50 * * * - - -
 Method 3
 1 100.00 3.00 0.05 * * *
 2 100.00 1.50 0.05 * * *
 3 0.00 0.00 0.00 * * *
 4 0.00 0.00 0.00 * * *
 5 100.00 2.00 0.05 * * *
 6 0.00 0.00 0.00 * * *
 7 0.00 0.00 0.00 * * *
 8 0.00 0.00 0.00 * * *
 9 0.00 0.00 0.00 * * *
 10 100.00 2.00 0.04 * * *
 11-50 0.00 0.00 0.00 - - -
 * Could not complete due to computational limitations
Table 9: Simulation 4 Results - Summary
 Measure | Method 1 (Double Penalty) | Method 2 (Joint Penalty) | Method 3 (Independent Selection)
 Avg Model Size Fixed (True = 4) 4.34 * 4.00
 Avg Model Size Random (True = 2) 2.50 * *
 Percent True β Included 100.00 * 100.00
 Percent True D Included 100.00 * *
 Percent False β Included 3.09 * 0.00
 Percent False D Included 5.55 * *
 *Could not complete due to computational limitations
Table 10: Simulation 5 Results - Parameter Estimates
 Covariate | Fixed Effects: % Selected, Estimate, Error | Random Effects: % Selected, Estimate, Error
 Method 1
 1 100.00 2.46 0.19 78.00 0.41 0.21
 2 100.00 1.04 0.21 36.00 0.23 0.15
 3 34.00 0.01 0.01 36.00 0.25 0.16
 4 42.00 0.01 0.01 32.00 0.24 0.14
 5 100.00 1.31 0.15 60.00 0.32 0.19
 6 12.00 0.00 0.00 - - -
 7 2.00 0.00 0.00 - - -
 8 0.00 0.00 0.00 - - -
 9 6.00 0.00 0.00 - - -
 10 100.00 1.21 0.15 - - -
 Method 2
 1 100.00 3.00 0.12 94.00 0.91 0.59
 2 100.00 1.47 0.16 50.00 0.71 0.48
 3 44.00 0.01 0.07 44.00 0.81 0.48
 4 42.00 0.03 0.07 26.00 0.73 0.36
 5 100.00 1.99 0.10 42.00 0.79 0.42
 6 32.00 0.04 0.06 - - -
 7 38.00 -0.00 0.07 - - -
 8 48.00 0.00 0.08 - - -
 9 28.00 -0.02 0.05 - - -
 10 100.00 1.98 0.10 - - -
 Method 3
 1 100.00 2.97 0.17 50.00 - -
 2 100.00 1.50 0.18 16.00 - -
 3 2.00 0.41 0.06 20.00 - -
 4 0.00 0.00 0.00 16.00 - -
 5 100.00 2.00 0.14 52.00 - -
 6 2.00 0.39 0.06 - - -
 7 0.00 0.00 0.00 - - -
 8 2.00 -0.39 0.06 - - -
 9 0.00 0.00 0.00 - - -
 10 100.00 1.99 0.13 - - -
Table 11: Simulation 5 Results - Summary
 Measure | Method 1 (Double Penalty) | Method 2 (Joint Penalty) | Method 3 (Independent Selection)
 Avg Model Size Fixed (True = 4) 4.96 6.32 4.06
 Avg Model Size Random (True = 2) 2.38 2.56 1.04
 Percent True β Included 100.00 100.00 100.00
 Percent True D Included 69.00 68.00 51.00
 Percent False β Included 16.00 38.67 1.00
 Percent False D Included 34.67 40.00 17.33
Table 12: Simulation 6 Results - Parameter Estimates
 Covariate | Fixed Effects: % Selected, Estimate, Error | Random Effects: % Selected, Estimate, Error
 Method 1
 1 100.00 2.77 0.04 90.00 0.45 0.32
 2 100.00 1.30 0.04 24.00 0.12 0.08
 3 0.00 0.00 0.00 12.00 0.20 0.12
 4 0.00 0.00 0.00 14.00 0.12 0.06
 5 100.00 1.69 0.05 88.00 0.48 0.34
 6 0.00 0.00 0.00 12.00 0.18 0.13
 7 0.00 0.00 0.00 6.00 0.13 0.05
 8 0.00 0.00 0.00 0.00 0.00 0.00
 9 0.00 0.00 0.00 0.00 0.00 0.00
 10 100.00 1.66 0.05 0.00 0.00 0.00
 12 0.00 0.00 0.00 2.00 0.17 0.04
 14 0.00 0.00 0.00 2.00 0.13 0.04
 15 0.00 0.00 0.00 2.00 0.10 0.04
 19 0.00 0.00 0.00 6.00 0.23 0.13
 20 0.00 0.00 0.00 4.00 0.12 0.05
 21-50 0.00 0.00 0.00 - - -
 Method 2
 1-20 * * * * * *
 21-50 * * * - - -
 Method 3
 1 100.00 3.01 0.05 * * *
 2 100.00 1.49 0.04 * * *
 3 0.00 0.00 0.00 * * *
 4 0.00 0.00 0.00 * * *
 5 100.00 1.99 0.04 * * *
 6 0.00 0.00 0.00 * * *
 7 0.00 0.00 0.00 * * *
 8 0.00 0.00 0.00 * * *
 9 0.00 0.00 0.00 * * *
 10 100.00 2.01 0.04 * * *
 11-20 0.00 0.00 0.00 * * *
 21-50 0.00 0.00 0.00 - - -
 *Could not complete due to computational limitations
Table 13: Simulation 6 Results - Summary
 Measure | Method 1 (Double Penalty) | Method 2 (Joint Penalty) | Method 3 (Independent Selection)
 Avg Model Size Fixed (True = 4) 4.00 * 4.00
 Avg Model Size Random (True = 2) 2.60 * *
 Percent True β Included 100.00 * 100.00
 Percent True D Included 89.00 * *
 Percent False β Included 0.00 * 0.00
 Percent False D Included 4.67 * *
 *Could not complete due to computational limitations
Table 14: Simulation 7 Results - Parameter Estimates
 Covariate | Fixed Effects: % Selected, Estimate, Error | Random Effects: % Selected, Estimate, Error
 Method 1
 1 100.00 2.70 0.14 86.00 0.15 0.14
 2 100.00 1.24 0.13 52.00 0.03 0.04
 3 18.00 0.08 0.05 34.00 0.02 0.01
 4 24.00 0.06 0.04 42.00 0.03 0.03
 5 100.00 1.76 0.12 82.00 0.19 0.17
 6 10.00 0.05 0.02 - - -
 7 2.00 0.15 0.02 - - -
 8 10.00 0.05 0.02 - - -
 9 10.00 0.05 0.02 - - -
 10 100.00 1.72 0.10 - - -
 Method 2
 1 100.00 3.39 0.06 84.00 0.86 0.42
 2 100.00 1.42 0.17 20.00 0.69 0.35
 3 18.00 0.02 0.05 6.00 0.58 0.15
 4 14.00 0.03 0.04 10.00 0.75 0.24
 5 100.00 1.96 0.11 42.00 0.89 0.48
 6 14.00 0.02 0.04 - - -
 7 16.00 -0.02 0.05 - - -
 8 18.00 0.06 0.04 - - -
 9 18.00 0.00 0.03 - - -
 10 100.00 1.94 0.14 - - -
 Method 3
 1 100.00 3.44 0.10 64.00 - -
 2 100.00 1.49 0.19 16.00 - -
 3 8.00 0.21 0.14 14.00 - -
 4 2.00 -0.59 0.08 14.00 - -
 5 100.00 2.03 0.15 56.00 - -
 6 2.00 0.78 0.11 - - -
 7 2.00 -0.59 0.08 - - -
 8 0.00 0.00 0.00 - - -
 9 0.00 0.00 0.00 - - -
 10 100.00 2.02 0.16 - - -
Table 15: Simulation 7 Results - Summary
 Measure | Method 1 (Double Penalty) | Method 2 (Joint Penalty) | Method 3 (Independent Selection)
 Avg Model Size Fixed (True = 4) 4.74 * 4.00
 Avg Model Size Random (True = 2) 2.96 * 1.60
 Percent True β Included 100.00 * 100.00
 Percent True D Included 84.00 * 60.00
 Percent False β Included 12.33 * 0.00
 Percent False D Included 42.67 * 14.67
Table 16: TAAG 2 Data - Fixed Effects
 Variable | Method 1: Estimate | Method 1 Re-estimate: Estimate (p-value) | Method 3: Estimate | Young et al. (2013) 8th Grade: Estimate (p-value) | Young et al. (2013) 11th Grade: Estimate (p-value)
 time - -0.52(.35)* 1.01 - -
 COMBPAREDUC - - -0.23 - -
 MSQBA5A - - 0.71 - -
 MSQBA5B - - -1.22 - -
 MSQBA7 - - 0.43 - -
 MSQBC1 - - -0.78 - -
 MSQBC2 - - - - -
 MSQBC3 - - -0.32 - -
 MSQBM1 - - -0.19 0.39(.27) -0.15(.72)
 MSQBM2 - - -0.45 -0.10(.76) -1.40(<.001)
 MSQBM3 - - -0.23 -0.03(.86) -0.04(.86)
 MSQBM4 - - - 0.78(.05) 0.80(.08)
 MSQBM5 - - 0.58 - -
 MSQBM6 - - 0.21 - -
 MSQBM7 - - -0.74 - -
 MSQBM8 - - 0.74 - -
 MSQBM9 - - -0.14 - -
 MSQBM10 - - -0.77 - -
 MSQBR1 - - 1.51 - -
 MSQBR2 - - -0.43 - -
 r1 - - -1.15 1.15(.36) 0.31(.83)
 r2 - - 1.18 2.25(.12) -0.73(.68)
 r3 - - -0.30 0.02(.99) -0.83(.66)
 BMI - - -2.47
 PFAT3 - - 1.71 -0.09(.08) -0.06(.43)
 MSQBA_DAD_MOM - - - - -
 MSQBOD_B - - 0.51 0.12(.04) 0.05(.43)
 MSQBOD_DA - - 0.89 - -
 MSQBOD_DB - - 0.31 0.31(.20) 0.00(.99)
 MSQBOD_E - - 0.25 - -
 MSQBOD_F 0.05 0.12(.05) - -0.04(.68) 0.01(.95)
 MSQBOD_G - - - 0.09(.34) 0.26(.03)
 MSQBOD_H - - 1.08 -0.00(0.92) -0.31(0.01)
 MSQBOD_I -0.09 -0.21(<.001) -0.84 -0.20(.04) 0.37(<.001)
 MSQBOD_JA - - -0.15 - -
 MSQBOD_JB - - 0.17 0.00(0.92) -0.02(0.06)
 MSQBOD_K - - 0.24 0.57(.14) -0.00(.99)
 MSQBOD_LA - - -0.26 -0.19(.35) -0.11(0.65)
 MSQBOD_LB - - -0.75 -0.13(.39) 0.38(.04)
 MSQBOD_LC - - -0.52 - -
 MSQBOD_N - - -0.43 - -
 MSQBOD_OA - - -0.58 - -
 MSQBOD_OB 0.22 0.43(<.001) 1.31 0.32(.08) 0.28(.22)
 MSQBOD_OC - - -0.39 -0.08(.45) 0.07(.61)
 MSQB80P - - - 0.01(.80) 0.01(.89)
 MSQBQ1 - - 0.51 - -
 MSQBR34SUM - - -0.52 - -
 * Not selected but included in re-estimated model
Table 17: TAAG 2 Data - Random Effects
 Variable | Method 1: Estimate | Method 1 Re-estimate: Estimate | Method 3: Selected | Young et al. (2013) 8th Grade: Estimate (p-value) | Young et al. (2013) 11th Grade: Estimate (p-value)
 Individual Level
 Intercept 3.97 7.61 - - -
 MSMA4 - - - -1.45(<.001) 0.11(.53)
 MSMA5A - - - -0.85(.20) 0.32(.28)
 MSMA5B - - Yes - -
 PPIC1C2 - - - -0.41(.22) -
 PPIC18A - - - -5.63(.19)
 PPIC19 1.47 6.68 - - 7.35(.01)
 PPIC21 - - - 16.76(<.001) 3.08(.15)
 PPIC22 - - - -2.93(0.21) -
 PPIC34 - - - -2.93(0.21) 5.79(.07)
 School Level
 Intercept 0.09 0.93 - - -
Table 18: Reference Table for Fixed and Random Effects Predictors
 Variable Description
 Fixed Effects
 time time = (0,1) for 8th grade and 11th grade respectively
 COMBPAREDUC Parents' education combined
 MSQBA5A Employment status: father
 MSQBA5B Employment status: mother
 MSQBA7 Receive free or low-cost lunches at school
 MSQBC1 Difficulty getting home from school-based activity
 MSQBC2 Difficulty getting to community-based activity
 MSQBC3 Difficulty getting home from community-based activity
 MSQBM1 Perceived places to go within walking distance of home
 MSQBM2 Perceived sidewalks in neighborhood
 MSQBM3 Perceived bike/walking trails in neighborhood
 MSQBM4 Perceived safety to walk/jog in neighborhood
 MSQBM5 Perceived walkers/bikers easily seen in neighborhood
 MSQBM6 Perceived traffic in neighborhood
 MSQBM7 Perceived crime in neighborhood
 MSQBM8 Perceived seeing kids outside playing in neighborhood
 MSQBM9 Perceived interesting things to look at in neighborhood
 MSQBM10 Perceived well-lit neighborhood
 MSQBR1 Grade began current middle school
 MSQBR2 Currently taking PE
 r1 Race: white
 r2 Race: black
 r3 Race: hispanic
 BMI BMI
 BMI85 BMI above 85th percentile
 BMI95 BMI above 95th percentile
 PFAT3 Percent Fat
 MSQBA_DAD_MOM Number of parents living with
 MSQBOD_B Average time alone per week
 MSQBOD_DA Sports team participation at school
 MSQBOD_DB Sports team participation outside school
 MSQBOD_E Enjoyment of PA classes/lessons
 MSQBOD_F Self-management strategies
 MSQBOD_G Self-efficacy
 MSQBOD_H Enjoyment of PA
 MSQBOD_I Perceived barriers
 MSQBOD_JA Outcome expectancy
 MSQBOD_JB Outcome expectancy value
 MSQBOD_K Enjoyment of PE
 MSQBOD_LA Positive PA school climate for teachers
 MSQBOD_LB Positive PA school climate for boys
 MSQBOD_LC PA norms
 MSQBOD_N Access to recreational facilities
 MSQBOD_OA Provides social support
 MSQBOD_OB Friend support
 MSQBOD_OC Family support
 MSQB80P Sum score on depressive scale
 MSQBQ1 Ever tried cigarettes
 MSQBR34SUM Sum of PE class taking
 Random Effects (School Level)
 MSMA3E Percent white
 MSMA4 Percent free/reduced lunch
 MSMA5A Percent passing state math test
 MSMA5B Percent passing state English/reading test
 PDHA1 PE class size
 PPIC1C2 Required weeks of PE per year
 PPIC2 Percent students not meeting requirements
 PPIC17 PA school events this year
 PPIC19 Interscholastic and Intramural PA programs
 PPIC21 School ground changes in past year
 PPIC22 Policy changes that encourage PA
 PPIC24 Budget change positive for PA
 PPIC30 Percent bike/walk to school
 PPIC34 Unstructured free play before school
 PPIC35 Unstructured free play during school
 PPIC36 Unstructured free play after school
 PSB Numprog Number of programs in school
 MVPA MVPA at school
References
 Akaike, H. (1973), "Information Theory and an Extension of the Maximum
 Likelihood Principle," Second International Symposium on Information Theory.
 Bondell, H., Krishna, A., and Ghosh, S. K. (2010), "Joint Variable Selection for
 Fixed and Random Effects in Linear Mixed-Effects Models," Biometrics, 66,
 1069–1077.
 Bondell, H. D. and Reich, B. J. (2008), "Simultaneous regression shrinkage, variable
 selection, and supervised clustering of predictors with OSCAR," Biometrics, 64,
 115–23.
 Chen, Z. and Dunson, D. B. (2003), "Random Effects Selection in Linear Mixed
 Models," Biometrics, 59, 762–769.
 Council on Sports Medicine and Fitness and Council on School Health (2006),
 "Active healthy living: prevention of childhood obesity through increased physical
 activity," Pediatrics, 117, 1834–42.
 Daniels, S. R., Arnett, D. K., Eckel, R. H., Gidding, S. S., Hayman, L. L.,
 Kumanyika, S., Robinson, T. N., Scott, B. J., Jeor, S. S., and Williams, C. L. (2005),
 "Overweight in Children and Adolescents: Pathophysiology, Consequences,
 Prevention, and Treatment," Circulation, 111, 1999–2012.
 Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004), "Least Angle
 Regression," The Annals of Statistics, 32, 407–451.
 Fan, J. and Li, R. (2001), "Variable Selection via Nonconcave Penalized Likelihood
 and Its Oracle Properties," Journal of the American Statistical Association, 96,
 1348–60.
 Fan, Y. and Li, R. (2012), "Variable Selection in Linear Mixed Effects Models,"
 Annals of Statistics, 40, 2043–2068.
 Foster, S. D., Verbyla, A. P., and Pitchford, W. S. (2009), "Estimation, Prediction
 and Inference for the Lasso Random Effects Model," Australian and New Zealand
 Journal of Statistics, 51, 43–61.
 Hall, D. B. and Praestgaard, J. T. (2001), "Order-restricted Score Tests for
 Homogeneity in Generalised Linear and Nonlinear Models," Biometrika, 88, 739–751.
 Harville, D. (1974), "Bayesian Inference for Variance Components Using Only Error
 Contrasts," Biometrika, 61, 383–385.
 Hastie, T., Tibshirani, R., and Friedman, J. (2009), The Elements of Statistical
 Learning: Data Mining, Inference, and Prediction, Springer, 2nd ed.
 Jiang, J. J., Rao, J. S., Gu, Z., and Nguyen, T. (2008), "Fence Methods for Mixed
 Model Selection," The Annals of Statistics, 36, 1669–92.
 Kimm, S. Y., Glynn, N. W., Kriska, A. M., Barton, B. A., Kronsberg, S. S., Daniels,
 S. R., Crawford, P. B., Sabry, Z. I., and Liu, K. (2002), "Decline in physical
 activity in black girls and white girls during adolescence," New England Journal
 of Medicine, 347, 709–15.
 Laird, N. M. and Ware, J. H. (1982), "Random-Effects Models for Longitudinal
 Data," Biometrics, 38, 963–974.
 Lange, N. and Laird, N. M. (1998), "The Effects of Covariance Structure on
 Variance Estimation in Balanced Growth-Curve Models," Journal of the American
 Statistical Association, 84, 241–247.
 Li, Y., Wang, S., Song, P. X.-K., Wang, N., and Zhu, J. (2012), "Doubly Regularized
 Estimation and Selection in Linear Mixed-Effects Models for High-Dimensional
 Longitudinal Data."
 Lin, X. (1997), "Variance Component Testing in Generalised Linear Models With
 Random Effects," Biometrika, 84, 309–326.
 Lindstrom, M. J. and Bates, D. M. (1988), "Newton-Raphson and EM algorithms
 for Linear Mixed-Effects Models for Repeated-Measures Data," Journal of the
 American Statistical Association, 83, 1014–1022.
 Ogden, C., Kuczmarski, R., Flegal, K., Mei, Z., Guo, S., Wei, R., Grummer-Strawn,
 L., Curtin, L., Roche, A., and Johnson, C. (2002), "Centers for Disease Control
 and Prevention 2000 growth charts for the United States: improvements to the
 1977 National Center for Health Statistics Version," Pediatrics, 109.
 Pietiläinen, K. H., Kaprio, J., Borg, P., Plasqui, G., Yki-Järvinen, H., Kujala, U. M.,
 Rose, R. J., Westerterp, K. R., and Rissanen, A. (2008), "Physical inactivity and
 obesity: A vicious circle," Obesity, 16, 409–14.
 Raghunathan, T. E., Lepkowski, J., Solenberger, P. W., and Van Hoewyk, J. (2001),
 "A multivariate technique for multiply imputing missing values using a sequence
 of regression models," Survey Methodology, 27, 85–95.
 Raghunathan, T. E., Solenberger, P. W., and Van Hoewyk, J. (2002), "IVEware:
 Imputation and Variance Estimation Software," Computer Software.
 Sallis, J. F., Zakarian, J. M., Hovell, M. F., and Hofstetter, C. R. (1996), "Ethnic,
 Socioeconomic, and Sex Differences in Physical Activity Among Adolescents,"
 Journal of Clinical Epidemiology, 49, 125–34.
 Schwarz, G. (1978), "Estimating the Dimension of a Model," Annals of Statistics,
 6, 461–4.
 Stram, D. O. and Lee, J. W. (1994), "Variance Components Testing in the
 Longitudinal Mixed Effects Model," Biometrics, 50, 1171–7.
 Tibshirani, R. (1996), "Regression Shrinkage and Selection via the Lasso," Journal
 of the Royal Statistical Society, Series B (Methodological), 58, 267–88.
 Webber, L. S., Catellier, D. J., Lytle, L. A., Murray, D. M., Pratt, C. A., Young,
 D. R., Elder, J. P., Lohman, T. G., Stevens, J., Jobe, J. B., Pate, R. R., and
 TAAG Collaborative Research Group (2008), "Promoting physical activity in
 middle school girls: Trial of Activity for Adolescent Girls," Am J Prev Med,
 34, 173–84.
 Young, D., Saksvig, B., Wu, T., Zook, K., Li, X., Champaloux, S., Grieser, M.,
 Lee, S., and Treuth, M. (2013), "Multilevel Predictors of Physical Activity For
 Early, Mid, and Late Adolescent Girls," Journal of Physical Activity and Health
 (In press).
 Zou, H. (2006), "The Adaptive Lasso and Its Oracle Properties," Journal of the
 American Statistical Association, 101, 1418–1429.
 Zou, H. and Hastie, T. (2005), "Regularization and Variable Selection via the Elastic
 Net," Journal of the Royal Statistical Society, Series B (Methodological), 67,
 301–320.