ABSTRACT

Title of thesis: SELECTION OF FIXED AND RANDOM EFFECTS IN LINEAR MIXED EFFECTS MODELS WITH APPLICATIONS TO THE TRIAL OF ACTIVITY IN ADOLESCENT GIRLS

Edward Grant, Master of Public Health, 2013

Directed by: Professor Tong Tong Wu, Department of Epidemiology and Biostatistics

Linear mixed effects (LME) models have become popular for modeling data in a wide variety of fields, particularly in public health. These models are beneficial because they account for both the mean and the covariance structure of clustered or longitudinal data. However, as studies collect an increasing amount of data on large numbers of predictors, a major challenge has been selecting only the important variables to create a more interpretable, parsimonious model. Previous methods for LME models have been inefficient at variable selection, but three new methods attempt to select and estimate both the important fixed effects and the important random effects simultaneously. The methods are compared through analysis of simulated longitudinal data. Additionally, as an example of their applications to public health, the methods are applied to the Trial of Activity in Adolescent Girls (TAAG) study to determine important predictors of Moderate to Vigorous Physical Activity (MVPA).

SELECTION OF FIXED AND RANDOM EFFECTS IN LINEAR MIXED EFFECTS MODELS WITH APPLICATIONS TO THE TRIAL OF ACTIVITY IN ADOLESCENT GIRLS

by

Edward Grant

Thesis submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Master of Public Health
2013

Advisory Committee:
Professor Tong Tong Wu, Chair, Advisor
Professor Shuo Chen
Professor Brit Saksvig

Table of Contents
List of Tables
List of Figures
List of Abbreviations
1 Introduction
2 Background
2.1 Linear Mixed-Effects Models for Longitudinal Data
2.2 Penalization Methods for Selection of Fixed Effects
2.2.1 Lasso Penalty
2.2.2 Smoothly Clipped Absolute Deviation (SCAD) Penalty
2.2.3 Review of Other Penalization Methods
2.3 Review of Methods for Selection of Random Effects
2.4 Information Criteria for Model Selection
2.5 Summary of Previous Model Selection Methods
2.6 Trial of Activity in Adolescent Girls (TAAG)
3 Methods
3.1 Method 1 - Double Penalization
3.2 Method 2 - Joint Penalization
3.3 Method 3 - Independent Selection with Proxy Matrix
4 Analysis of Simulated Data
4.1 Simulation 1
4.2 Simulation 2
4.3 Simulation 3
4.4 Simulation 4
4.5 Simulation 5
4.6 Simulation 6
4.7 Simulation 7
4.8 Summary of Simulation Studies
5 Real Data Analysis
5.1 Data Description and Model Formulation
5.2 Results of Real Data Analysis
6 Conclusion
Appendix
References

List of Tables
1 Example of Model Selection Using Lasso
2 Simulation 1 Results - Parameter Estimates
3 Simulation 1 Results - Summary
4 Simulation 2 Results - Parameter Estimates
5 Simulation 2 Results - Summary
6 Simulation 3 Results - Parameter Estimates
7 Simulation 3 Results - Summary
8 Simulation 4 Results - Parameter Estimates
9 Simulation 4 Results - Summary
10 Simulation 5 Results - Parameter Estimates
11 Simulation 5 Results - Summary
12 Simulation 6 Results - Parameter Estimates
13 Simulation 6 Results - Summary
14 Simulation 7 Results - Parameter Estimates
15 Simulation 7 Results - Summary
16 TAAG 2 Data - Fixed Effects
17 TAAG 2 Data - Random Effects
18 Reference Table for Fixed and Random Effects Predictors

List of Figures
2.1 Geometric Interpretation of the Lasso Penalization

List of Abbreviations
AIC Akaike Information Criterion
BIC Bayesian Information Criterion
EM Expectation-Maximization
LARS Least Angle Regression
LASSO Least Absolute Shrinkage and Selection Operator
LME Linear Mixed Effects
ML Maximum Likelihood
MVPA Moderate to Vigorous Physical Activity
PA Physical Activity
PE Physical Education
OLS Ordinary Least Squares
OSCAR Octagonal Shrinkage and Clustering Algorithm for Regression
REML Restricted Maximum Likelihood
SCAD Smoothly Clipped Absolute Deviation
TAAG Trial of Activity in Adolescent Girls

Chapter 1 Introduction

Linear mixed-effects (LME) models (Laird and Ware 1982) are statistical models used in the analysis of clustered or longitudinal data.
LME models estimate the relationship between the dependent variable and the predictors included in the model, accounting for both the fixed effects and the random effects of the independent variables. Compared with linear regression models that ignore clustering or temporal effects, LME models estimate the fixed effects more accurately because they model the covariance structure through the inclusion of individual-specific random effects. Ignoring the covariance structure has been shown to lead to biased estimates (Lange and Laird 1998).

Improvements in technology have enabled researchers to collect and store data on an increasing number of predictors. However, inference and prediction from an LME model that includes all predictors become too complex or infeasible as the number of predictors, each of which may have fixed and random components, increases. One challenge in LME models is choosing a parsimonious model that selects only the significant covariates while excluding variables that have no true effect on the outcome. Many methods have been published on model selection, but three new methods have been introduced which, unlike many previous approaches, can select and estimate both fixed and random effects simultaneously. First, the method developed by Li et al. (2012) optimizes a regularization problem with two separate penalties, one for the fixed effects and one for the random effects. Next, Bondell et al. (2010) select and estimate fixed and random effects simultaneously by optimizing a jointly penalized objective function. Finally, Fan and Li (2012) use a proxy matrix in place of the unknown covariance structure and maximize a penalized profile likelihood for the fixed and random effects separately. These methods offer more practical ways of selecting the important fixed and random effects of LME models than previous methods. The goal of this thesis is to compare the efficacy and accuracy of these new variable selection methods through simulation studies.
Additionally, the methods will be compared on a real public health data set. As an example of the power of these methods, we consider the Trial of Activity in Adolescent Girls 2 (TAAG 2), which examined the predictors of physical activity among adolescent girls from 6 schools in Maryland (Young et al. 2013). Data on 65 multilevel variables from 551 girls were collected at two time points, 2006 and 2009, when the girls were in the 8th and 11th grades, respectively. Using traditional methods, building a parsimonious model for these data would be tedious and could introduce bias. However, the data can be analyzed efficiently with these methods to determine which variables truly have a relationship with the outcome of interest, moderate to vigorous physical activity (MVPA). This analysis will demonstrate the important applications these methods can have in public health.

The rest of this thesis proceeds as follows. Section 2 discusses previous methods used for variable selection and introduces the TAAG trial. Section 3 introduces the three methods for variable selection in LME models. In Section 4, simulation studies are performed to compare the effectiveness of the three methods. Section 5 carries out the analysis of the TAAG data set. Finally, Section 6 discusses the strengths and weaknesses of the methods, along with brief implications of their use on high-dimensional data.

Chapter 2 Background

2.1 Linear Mixed-Effects Models for Longitudinal Data

Consider a longitudinal study with n subjects, where each subject i (i = 1, ..., n) has m observations. The number of observations can be generalized to $m_i$ so that it varies across subjects, for a total of $\sum_{i=1}^{n} m_i = N$ observations. Suppose there are p covariates associated with the fixed effects, $X_1, \ldots, X_p$, and q covariates associated with the random effects, $Z_1, \ldots, Z_q$.
For subject i and observation t, let $x_{it}$ be the p × 1 vector of predictors for the fixed effects and $z_{it}$ be the q × 1 vector of predictors for the random effects. The linear mixed-effects model for the outcome $Y_{it}$ can then be written as

$$Y_{it} = x_{it}^T\beta + z_{it}^T b_i + \varepsilon_{it},$$

where β is the p × 1 vector of fixed effects, $b_i$ is the q × 1 vector of random effects, and the error terms $\varepsilon_{it}$ are independently and identically distributed as N(0, σ²). The random effects for each subject are independently and identically distributed as multivariate normal, MVN(0, σ²Ψ), where Ψ is a q × q covariance matrix. Stacking $Y_i = (Y_{i1}, \ldots, Y_{im})^T$, $X_i^T = (x_{i1}, \ldots, x_{im})$, and $Z_i^T = (z_{i1}, \ldots, z_{im})$, the LME model can be written compactly as

$$Y_i = X_i\beta + Z_i b_i + \varepsilon_i, \tag{2.1}$$

where $X_i$ is the m × p design matrix for the fixed effects and $Z_i$ is the m × q design matrix for the random effects of subject i. The error vector $\varepsilon_i$ is distributed as N(0, σ²I_m), and the subject-specific random effects $b_i$ are assumed independent of the errors. The fixed effects β are shared by the whole population and determine the mean, while the random effects $b_i$ capture subject-specific deviations and account for much of the variance and within-subject correlation in the model.

2.2 Penalization Methods for Selection of Fixed Effects

Penalized regression has become a widely used approach for performing model selection efficiently. Many different penalties have been used to estimate the parameters of a regression model with outcome $y_i$, design matrix $X_i$, and coefficients β. Penalization reduces the dimension of the model by setting the coefficients of unimportant predictors to 0, leading to a more parsimonious and interpretable model. For the parameter vector $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$, penalization methods can be summarized as

$$\min_{\beta} f(\beta) = g(\beta) + P_\lambda(\beta), \tag{2.2}$$

where g(β) is a loss function and $P_\lambda(\beta)$ is a penalty function on β with tuning parameter λ.
2.2.1 Lasso Penalty

When there is a large number of parameters to be estimated, the "least absolute shrinkage and selection operator," or lasso (Tibshirani 1996), is a commonly used regularization method for reducing the number of parameters. A simple example is lasso-penalized linear regression, in which the penalized objective function (2.2) can be written as

$$\sum_{i=1}^{N}\Big(y_i - \sum_j \beta_j x_{ij}\Big)^2 + \lambda\sum_j |\beta_j|, \tag{2.3}$$

where λ > 0. The lasso constrains the size of the coefficients while minimizing the residual sum of squares that defines the Ordinary Least Squares (OLS) estimate; placing this constraint is equivalent to adding the penalty term $\lambda\sum_{j=1}^{p}|\beta_j|$ to the OLS criterion. In its dual formulation, the regularization problem (2.3) can be rewritten as minimizing the residual sum of squares subject to the sum of the absolute values of the coefficients being at most a tuning parameter t ≥ 0:

$$\min_{\beta}\ \sum_{i=1}^{N}\Big(y_i - \sum_j \beta_j x_{ij}\Big)^2 \quad \text{subject to} \quad \sum_j |\beta_j| \le t.$$

When t is large, the constraint is barely binding and the solution is close to the OLS estimate. When t is small, the constraint shrinks the OLS estimates toward zero. Equivalently, a large λ corresponds to a small t, resulting in a larger penalty on the parameters. As λ increases, the coefficient estimates are shrunk by a greater amount.

Figure 2.1: Geometric Interpretation of the Lasso Penalization (Hastie et al. 2009). The panel on the left shows the lasso penalty and the panel on the right shows the ridge penalty. The shaded regions represent the constraint regions for the coefficients, and the contours are level sets of the least squares objective function. (Panel annotations: for the lasso, the contours touch the constraint region at a corner, giving a zero coefficient; for ridge, there are no corners to hit, so zero solutions do not occur.)
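The penalized problem (2.3) has no closed-form solution in general. Below is a minimal sketch of one common way to solve it, cyclic coordinate descent with soft-thresholding. This solver is an assumption chosen for brevity, not necessarily the one used elsewhere in this thesis, and the function names `soft_threshold` and `lasso_cd` and the toy data are hypothetical. Each coordinate update exactly minimizes (2.3) in a single $\beta_j$ while the other coefficients are held fixed.

```python
# Hypothetical helpers illustrating the lasso objective (2.3); not a production solver.


def soft_threshold(z, gamma):
    """Return sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize sum_i (y_i - sum_j beta_j x_ij)^2 + lam * sum_j |beta_j|.

    X is a list of rows, there is no intercept, and every column of X
    is assumed to contain at least one nonzero entry.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of column j with the residual that leaves predictor j out.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            # Setting the subgradient to zero gives a soft threshold of lam / 2.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta


# Two orthogonal columns: a strong signal and a weak one (toy data).
X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
y = [2.0, 2.0, 0.2, 0.2]
print(lasso_cd(X, y, lam=0.0))  # [2.0, 0.2], the OLS fit
print(lasso_cd(X, y, lam=1.0))  # [1.75, 0.0], the weak coefficient is set exactly to zero
```

The large coefficient is shrunk (from 2.0 to 1.75) while the small one is removed, which previews both the selection ability and the bias of the lasso discussed below.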
An advantage of the lasso is its ability not only to shrink coefficient estimates toward the origin, but also to perform model selection by setting the coefficients of unimportant variables to exactly zero.

Figure 2.1 represents a simple problem with two coefficients, β1 and β2, giving a geometric interpretation of the lasso (left) compared with the ridge penalty (right). The ridge penalty, $P_\lambda(\beta) = \lambda\sum_{j=1}^{p}\beta_j^2$, is known to shrink coefficients without setting any of them to zero. As with the lasso, this penalty is equivalent to a constraint region for the parameters:

$$\hat\beta^{ridge} = \arg\min_{\beta}\ \sum_{i=1}^{N}\Big(y_i - \sum_j \beta_j x_{ij}\Big)^2 \quad \text{subject to} \quad \sum_j \beta_j^2 \le t.$$

However, the shaded regions in the figure show that the two penalties produce constraint regions with different shapes. The region for the ridge penalty $P_\lambda(\beta) = \lambda(\beta_1^2 + \beta_2^2)$ is a disk, while the region for the lasso penalty $P_\lambda(\beta) = \lambda(|\beta_1| + |\beta_2|)$ is a diamond with its corners on the axes, where one coefficient equals 0. The elliptical contours correspond to the quadratic loss function $\sum_{i=1}^{N}(y_i - \sum_j \beta_j x_{ij})^2$ and are centered at the unpenalized OLS solution for β1 and β2. The minimum of $g(\beta) + P_\lambda(\beta)$ gives the optimal values of β1 and β2, and it occurs where the contours first touch the shaded region. In the figure, the lasso solution on the left lies at a corner of the diamond, so β1 is set to 0. In contrast, in the ridge plot this will almost never happen, because the constraint region is round and has no corners. This demonstrates the important benefit of the lasso penalty: because of the shape of its constraint region, it is able to eliminate unimportant predictors and perform model selection (Hastie et al. 2009).
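The geometric argument can be made concrete in a special case that the thesis itself does not assume: an orthonormal design, $X^TX = I$. There, both penalized problems decouple coordinate by coordinate. If $z_j$ denotes the OLS estimate, the ridge solution is $z_j/(1+\lambda)$, which is never exactly zero unless $z_j$ is, while the lasso solution soft-thresholds $z_j$ at $\lambda/2$. The sketch below assumes this orthonormal setting, and its function names and OLS values are hypothetical.

```python
# Hypothetical helpers: closed-form ridge and lasso solutions when X^T X = I.


def ridge_orthonormal(z, lam):
    """Minimizer of (z_j - b)^2 + lam * b^2 for each OLS estimate z_j: proportional shrinkage."""
    return [zj / (1.0 + lam) for zj in z]


def lasso_orthonormal(z, lam):
    """Minimizer of (z_j - b)^2 + lam * |b| for each z_j: soft-thresholding at lam / 2."""
    out = []
    for zj in z:
        shrunk = max(abs(zj) - lam / 2.0, 0.0)
        out.append(shrunk if zj >= 0 else -shrunk)
    return out


z = [1.2, 0.1, -0.8, 0.05]          # made-up OLS estimates
print(ridge_orthonormal(z, 0.5))    # all four shrunk by the factor 1 / 1.5, none exactly zero
print(lasso_orthonormal(z, 0.5))    # the two small estimates become exactly 0.0
```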
A simple example of the model selection property of the lasso can be seen in Table 1, which compares estimates of six predictors from ordinary least squares, from ridge penalization, and from lasso penalization against a true model with three predictors. For sample size n = 30, the true model is

$$y = \beta_0 + 0.5X_1 + X_3 + 0.5X_4 + \varepsilon,$$

where $\beta_0 = 1.5$, each $X_j \sim N(0, 1)$, and $\varepsilon \sim N(0, 1)$. The true predictors are X1, X3, and X4, while X2, X5, and X6 are noise variables. While least squares provides estimates of all coefficients, the unimportant predictors, as expected, remain in the model with small, nonzero coefficients. The ridge penalty shrinks the estimates relative to least squares but fails to eliminate the noise variables. Conversely, the lasso eliminates the false predictors X2, X5, and X6 while retaining the true predictors X1, X3, and X4. This example shows that the lasso provides a more efficient way to perform model selection.

However, the lasso penalty often produces biased estimates, since all coefficients, even important ones, are shrunk toward the origin, so the magnitudes of the selected lasso coefficients are underestimated. An extension of the lasso, the adaptive lasso (Zou 2006), seeks to reduce this bias by applying a weight $w_j$ to each coefficient's penalty, minimizing

$$\sum_{i=1}^{N}\Big(y_i - \sum_j \beta_j x_{ij}\Big)^2 + \lambda\sum_j w_j|\beta_j|, \qquad w_j = \frac{1}{|\hat\beta_j|^{\gamma}},$$

where $\hat\beta_j$ is usually the ordinary least squares estimate and γ > 0. With these weights, coefficients with small initial estimates are penalized more heavily toward 0, further reducing the number of parameters in the model, while large, important coefficients are only lightly penalized. The adaptive lasso has been shown to have the oracle properties defined by Fan and Li (2001): it selects the true variables consistently, and its estimates of the nonzero coefficients are asymptotically normal.
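The effect of the adaptive weights can be sketched in the same orthonormal setting used above for illustration, an assumption the thesis does not make. With the OLS estimates $z_j$ as pilot estimates, coordinate j is soft-thresholded at $\lambda w_j/2$ with $w_j = 1/|z_j|^{\gamma}$. The function name and numbers are hypothetical.

```python
# Hypothetical helper: adaptive lasso when X^T X = I, with OLS estimates z as pilots.


def adaptive_lasso_orthonormal(z, lam, gamma=1.0):
    """Coordinate j minimizes (z_j - b)^2 + lam * w_j * |b| with w_j = 1 / |z_j|**gamma.

    The resulting threshold is lam * w_j / 2; pilot estimates are assumed nonzero.
    """
    out = []
    for zj in z:
        w = 1.0 / abs(zj) ** gamma           # large estimates get small weights
        shrunk = max(abs(zj) - lam * w / 2.0, 0.0)
        out.append(shrunk if zj >= 0 else -shrunk)
    return out


z = [2.0, 0.3]                               # made-up OLS estimates
print(adaptive_lasso_orthonormal(z, 0.5))    # [1.875, 0.0]
```

For comparison, the plain lasso with the same λ thresholds every coordinate at 0.25, giving 1.75 and about 0.05. It shrinks the large coefficient more and fails to remove the small one, which is the bias and selection gap that the adaptive weights are meant to close.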
2.2.2 Smoothly Clipped Absolute Deviation (SCAD) Penalty

Consider the penalized regression problem (2.2). Because the lasso produces biased estimates, Fan and Li (2001) sought a continuous penalty function that yields sparse and nearly unbiased estimates. The Smoothly Clipped Absolute Deviation (SCAD) penalty for a coefficient θ is defined through its derivative:

$$P'_\lambda(\theta) = \begin{cases} \operatorname{sgn}(\theta)\,\lambda, & |\theta| \le \lambda,\\[4pt] \operatorname{sgn}(\theta)\,\dfrac{a\lambda - |\theta|}{a-1}, & \lambda < |\theta| \le a\lambda,\\[4pt] 0, & |\theta| > a\lambda, \end{cases}$$

for a > 2 and λ > 0. The resulting penalty function is

$$P_\lambda(\theta) = \begin{cases} \lambda|\theta|, & |\theta| \le \lambda,\\[4pt] -\dfrac{\theta^2 - 2a\lambda|\theta| + \lambda^2}{2(a-1)}, & \lambda < |\theta| \le a\lambda,\\[4pt] \dfrac{(a+1)\lambda^2}{2}, & |\theta| > a\lambda. \end{cases}$$

The penalty function is a quadratic spline that depends on two tuning parameters, a and λ. The results were found to be relatively insensitive to a, and through cross-validation a = 3.7 was found to give consistently satisfactory results. The second tuning parameter, λ, can also be chosen by cross-validation for the data at hand.

The advantages of the SCAD penalty are that it is differentiable everywhere except at 0 and that it produces estimates with low bias. As with the adaptive lasso, large coefficients receive little penalization compared with small ones; beyond aλ the derivative is zero, so there is no further shrinkage. This ensures that unimportant variables are penalized heavily while important variables are left relatively unpenalized. The SCAD penalty particularly outperforms the lasso in selecting important variables and eliminating unimportant ones when the variance of the data is large. Additionally, the SCAD penalty was shown to have the oracle properties.

2.2.3 Review of Other Penalization Methods

Besides the lasso and SCAD, many other penalized regularization methods have been proposed to select and estimate fixed effects. Zou and Hastie (2005) created the elastic net, which combines the lasso and ridge penalties, and proposed an algorithm to solve it efficiently.
Bondell and Reich (2008) created the octagonal shrinkage and clustering algorithm for regression (OSCAR), which can select important variables among a set of highly correlated predictors. However, these methods for selecting fixed effects do not take into account the covariance structure induced by the random effects. Ignoring or underestimating the covariance structure can lead to biased variance estimates for the fixed effects (Lange and Laird 1998).

2.3 Review of Methods for Selection of Random Effects

To select important random effects, Stram and Lee (1994) discuss the use of likelihood ratio tests for nonzero variance components in linear mixed effects models. Lin (1997) proposed a global score test of the null hypothesis that all variance components equal 0; individual score tests are then carried out for each random effect, and the variance components can be estimated. Hall and Praestgaard (2001) place constraints on the score tests to select and estimate important random effects. Chen and Dunson (2003) use a Cholesky decomposition of the covariance matrix together with Bayesian methods to select and estimate random effects variances. Foster et al. (2009) develop a lasso-based method for selecting random effects. These methods, however, consider only the random effects and cannot select or estimate the fixed effects.

2.4 Information Criteria for Model Selection

Information criteria are among the most popular tools for model selection, as they can be used to select both fixed and random effects in a linear mixed effects model. Two of the most common are the Akaike Information Criterion (AIC, Akaike 1973) and the Bayesian Information Criterion (BIC, Schwarz 1978). For these criteria, the maximized likelihood L is computed for every candidate combination of parameters, and a penalty for the number of parameters in the model is added to −2 ln L.
The goal is to find the model that minimizes

$$\mathrm{AIC} = -2\ln L + 2k, \qquad \mathrm{BIC} = -2\ln L + k\ln(n),$$

where k is the number of parameters and n is the number of observations. BIC places a larger penalty on the number of parameters than AIC whenever ln(n) > 2, that is, for n ≥ 8. The information criteria seek a balance between a model that fits the data well and one with a small set of parameters. While these criteria are effective in choosing a parsimonious set of fixed and random effects that gives the best fit, they become burdensome as the number of candidate parameters increases. For p fixed effect parameters and q random effect parameters, the number of models that must be compared is $2^{p+q}$, which grows exponentially; with p + q = 20 candidate effects, it already exceeds one million. Thus, for a large number of predictors, AIC and BIC are inefficient methods of variable selection.

2.5 Summary of Previous Model Selection Methods

While these methods are effective for model selection, they are not ideal for LME models. The penalization methods for fixed effects do not take the random effects into account and can lead to inaccurate estimates. The random effects methods do not consider the fixed effects. Information criteria can find models with both fixed and random effects but are extremely inefficient for problems with many predictors. Recently, other methods have been used to select fixed and random effects more efficiently. Jiang et al. (2008) consider a method for selecting models with important predictors that does not rely on minimizing a criterion function. In this method, a statistical "fence" is built to eliminate incorrect models, and an optimal model is then selected from the models that remain within the fence. This method eases the computational burden common to the information criterion methods. Three new methods have been created in recent years to simultaneously select fixed and random effects.
The later sections of this thesis describe and evaluate these three methods.

2.6 Trial of Activity in Adolescent Girls (TAAG)

Physical inactivity has been identified as a risk factor for obesity, or a high percentage of body fat, especially in adolescents (Pietiläinen et al. 2008). The prevalence of childhood overweight has been increasing drastically in the United States (Ogden et al. 2002). Health problems that can result from physical inactivity and obesity, such as type 2 diabetes, high blood pressure, and sleep disorders, have been on the rise in children and adolescents in recent years (Daniels et al. 2005). The Council on Sports Medicine and Fitness and Council on School Health (2006) has recommended increasing physical activity in children and adolescents as an effective way to reduce the prevalence of obesity and of the resulting health problems later in life.

Among black and white girls, physical activity declines as a child ages through adolescence (Kimm et al. 2002), and this decline is greater in girls than in boys (Sallis et al. 1996). Previous school-based interventions targeted at boys and girls have not been very successful. The Trial of Activity in Adolescent Girls (TAAG) was a school- and community-based, multisite, interventional trial targeted at girls in order to lessen the typical decline in physical activity. The study was conducted at 26 sites across six geographically diverse areas of the United States: California, Minnesota, Maryland, Louisiana, South Carolina, and Arizona. Data were collected at two time points: in the spring of 2003, when the girls were in 6th grade, and in the spring of 2005, when the girls were in 8th grade. The intervention and control groups were assigned randomly in 2003.
The program was designed to create environments in schools and the surrounding communities that encouraged physical activity and to provide cues and messages that motivated it (Webber et al. 2008). The purpose of the intervention was to reduce the decline in Moderate to Vigorous Physical Activity (MVPA) that normally occurs in adolescent girls. As part of the TAAG study, data were collected in the spring of 2006 from a new group of 8th grade girls to assess the sustainability of the program. In the spring of 2009, follow-up data were collected only from the girls at the six Maryland TAAG sites for the Trial of Activity in Adolescent Girls 2 (TAAG 2), when the 2006 8th grade group was in 11th grade. The purpose of TAAG 2 was to determine factors at the individual, social, school, and neighborhood levels that may influence MVPA in adolescent girls (Young et al. 2013). The analysis in this thesis uses data from only the 2006 and 2009 time points. The main outcome of interest is average MVPA minutes per day in the TAAG 2 study, and the goal is to select the important fixed and random effects in the TAAG data. The methods are described in the following chapter.

Chapter 3 Methods

3.1 Method 1 - Double Penalization

This section describes the method created by Li et al. (2012). It selects and estimates the parameters of an LME model through a regularization problem with two penalty functions, one for the fixed effects and one for the random effects.

Model. Consider the LME model (2.1), with the covariates standardized to have mean 0 and Euclidean norm 1. The fixed effect intercept is removed from the model, but a random intercept $b_{i0}$ remains. The mean and variance of $Y_i$ are $E(Y_i) = X_i\beta$ and $\mathrm{Var}(Y_i) = \sigma^2(Z_i\Psi Z_i^T + I_m)$.
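To make the variance formula concrete, the sketch below computes $\mathrm{Var}(Y_i) = \sigma^2(Z_i\Psi Z_i^T + I_m)$ for one subject with a random intercept and a random slope on time. The observation times, Ψ, and σ² are made-up values, and the helper names are hypothetical. Plain Python lists are used so the example needs no libraries. The nonzero off-diagonal entries show how the random effects induce correlation among a subject's repeated measurements.

```python
# Hypothetical helpers for illustration; matrices are stored as lists of rows.


def matmul(A, B):
    """Product of two matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def marginal_cov(Z, Psi, sigma2):
    """Var(Y_i) = sigma2 * (Z Psi Z^T + I_m) for an m x q random-effects design Z."""
    m = len(Z)
    ZPsiZt = matmul(matmul(Z, Psi), transpose(Z))
    return [[sigma2 * (ZPsiZt[r][c] + (1.0 if r == c else 0.0)) for c in range(m)]
            for r in range(m)]


# Random intercept and random slope, observed at times 0, 1, 2.
Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Psi = [[1.0, 0.0], [0.0, 0.5]]
V = marginal_cov(Z, Psi, sigma2=2.0)
for row in V:
    print(row)
# [4.0, 2.0, 2.0]
# [2.0, 5.0, 4.0]
# [2.0, 4.0, 8.0]
```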
Maximum Likelihood Estimation. For N > p, a modified log-likelihood that incorporates the restricted log-likelihood (Harville 1974) is maximized over β, Ψ, and σ²:

$$\ell_{nM}(\beta,\Psi,\sigma^2) = -\frac{1}{2}\sum_{i=1}^{n}\log|\sigma^2 V_i| - \frac{1}{2}\log\Big|\sigma^{-2}\sum_{i=1}^{n}X_i^T V_i^{-1}X_i\Big| - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - X_i\beta)^T V_i^{-1}(Y_i - X_i\beta), \tag{3.1}$$

where $V_i = I_m + Z_i\Psi Z_i^T$, so that $\sigma^2 V_i$ is the covariance matrix of $Y_i$. When N ≤ p, the restricted term in (3.1) becomes singular, so the following full log-likelihood is maximized instead:

$$\ell_{nF}(\beta,\Psi,\sigma^2) = -\frac{1}{2}\sum_{i=1}^{n}\log|\sigma^2 V_i| - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - X_i\beta)^T V_i^{-1}(Y_i - X_i\beta). \tag{3.2}$$

Objective Function and Penalization. A general formula for the objective function is

$$Q_n(\beta,\Psi,\sigma^2) = \ell_n(\beta,\Psi,\sigma^2) - \lambda_1 P_1(\beta) - \lambda_2 P_2(\Psi),$$

where $P_1$ is the penalty for the fixed effects, $P_2$ is the penalty for the random effects, and λ1 and λ2 are their nonnegative tuning parameters. The log-likelihood $\ell_n$ is (3.1) when N > p and (3.2) when N ≤ p. For the fixed effects, an adaptive L1-norm (adaptive lasso) penalty is applied (Zou 2006):

$$P_1(\beta) = \sum_{j=1}^{p} w_j|\beta_j|, \qquad w_j = \frac{1}{|\tilde\beta_j|},$$

where $\tilde\beta_j$ is an initial estimate of $\beta_j$, so each coefficient's penalty is weighted by the reciprocal of its estimated magnitude. For the random effects, a Cholesky decomposition $\Psi = LL^T$ is computed first; the Cholesky factor L is the unique lower triangular matrix with positive diagonal entries, and penalization is performed on L. For any k ∈ {1, ..., q}, a nonzero kth row $L_{(k)}$ of L (and therefore a nonzero diagonal element $\Psi_{kk}$) selects the corresponding random effect $b_k$. Conversely, if $L_{(k)} = 0$, then the entire kth row and column of Ψ, including $\Psi_{kk}$, equal 0, effectively removing the kth random effect from the model. An adaptively weighted L2-norm penalty on the rows of L shrinks them toward zero:

$$P_2(\Psi) = \sum_{k=2}^{q} w_k\sqrt{L_{k1}^2 + \cdots + L_{kq}^2} = \sum_{k=2}^{q} w_k\|L_{(k)}\|_2, \qquad w_k = \frac{1}{\|\tilde L_{(k)}\|_2},$$

where $\tilde L_{(k)}$ is an initial estimate of the kth row, so each row's penalty is weighted by the reciprocal of its estimated norm. Because the sum starts at k = 2, the first row, which corresponds to the random intercept, is not penalized.
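The role of the Cholesky factor can be checked numerically. The sketch below includes a bare-bones Cholesky routine for illustration only; in practice a library routine would be used. It then shows why penalizing whole rows of L performs selection: zeroing row k of L zeroes row k and column k of $\Psi = LL^T$. The helper names `cholesky` and `outer_product` and all matrices are hypothetical.

```python
import math

# Hypothetical helpers for illustration; matrices are stored as lists of rows.


def cholesky(A):
    """Lower-triangular L with positive diagonal such that A = L L^T (A positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def outer_product(L):
    """Return L L^T."""
    n = len(L)
    return [[sum(L[r][k] * L[c][k] for k in range(n)) for c in range(n)] for r in range(n)]


Psi = [[4.0, 2.0], [2.0, 3.0]]
print(cholesky(Psi))      # [[2.0, 0.0], [1.0, 1.4142135623730951]]

# A factor whose third row has been shrunk to zero: the third random effect drops out.
L = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(outer_product(L))   # [[1.0, 0.5, 0.0], [0.5, 1.25, 0.0], [0.0, 0.0, 0.0]]
```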
As with the fixed effects, this adaptive weight shrinks small rows further toward zero while leaving the important random effects relatively unpenalized.

Algorithm. First, σ² can be estimated (Lindstrom and Bates 1988) by

$$\hat\sigma^2 = \frac{1}{N-p}\sum_{i=1}^{n}(Y_i - X_i\beta)^T V_i^{-1}(Y_i - X_i\beta) \quad \text{for } N > p, \qquad \hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{n}(Y_i - X_i\beta)^T V_i^{-1}(Y_i - X_i\beta) \quad \text{for } N \le p.$$

Substituting $\hat\sigma^2$ into (3.1) or (3.2), respectively, profiles out σ², so the objective function has one fewer parameter. β and L are then estimated iteratively by maximizing the simplified objective function

$$Q_n(\beta, L) = PR(\beta, L) - \lambda_1\sum_{j=1}^{p} w_j|\beta_j| - \lambda_2\sum_{k=2}^{q} w_k\sqrt{L_{k1}^2 + \cdots + L_{kq}^2},$$

where PR(β, L) is the log-likelihood (3.1) or (3.2) with $\hat\sigma^2$ substituted.

The algorithm alternates between two quadratic components until convergence: with β fixed, L is updated, and with L fixed, β is updated. The step in which L is fixed is similar to a lasso problem. When β is fixed, however, the random-effects step is split into two parts: estimating L and updating a new auxiliary parameter τ, which is computed from the current L. The algorithm proceeds as follows:

1. Initialize the parameters $\beta^{(0)}$, $L^{(0)}$, and $\tau^{(0)}$.
2. At iteration r, update L by maximizing the first quadratic component:
$$L^{(r)} = \arg\max_{L}\Big\{PR(\beta^{(r-1)}, L) - \frac{\lambda_2}{2}\sum_{k=2}^{q}\frac{w_k}{(\tau_k^{(r-1)})^2}\sum_{j=1}^{k}L_{kj}^2\Big\}.$$
3. Update τ: $\tau_k^{(r)} = \sqrt{\|L_{(k)}^{(r)}\|_2}$.
4. Update β using the LARS algorithm (Efron et al. 2004), which solves the second, lasso-type quadratic component.
5. If the changes between $L_{kj}^{(r)}$ and $L_{kj}^{(r-1)}$ and between $\beta_j^{(r)}$ and $\beta_j^{(r-1)}$ are smaller than a specified tolerance, usually $10^{-5}$, stop and report the estimates. Otherwise, return to step 2 for iteration r + 1.

3.2 Method 2 - Joint Penalization

This section describes the method introduced by Bondell et al. (2010), which simultaneously selects and estimates the fixed and random effects of an LME model using a single joint penalty function.
Model. Consider the LME model (2.1). Using a modified Cholesky decomposition (Chen and Dunson 2003), the covariance matrix is factorized as $\Psi = D\Gamma\Gamma^T D$, where Γ is a q × q lower triangular matrix with 1's on the diagonal whose (l, r)th element is $\gamma_{lr}$, and $D = \mathrm{diag}(d_1, d_2, \ldots, d_q)$ is a diagonal matrix. After this decomposition, the LME model can be written as

$$Y_i = X_i\beta + Z_i D\Gamma b_i + \varepsilon_i, \tag{3.3}$$

where $Y_i$ is now assumed to be centered and the predictors standardized so that $X_i^T X_i$ and $Z_i^T Z_i$ represent correlation matrices, and the $b_i$ are independently and identically distributed as MVN(0, σ²I_q). The covariance matrix of the random effects is now expressed in terms of the vector $d = (d_1, d_2, \ldots, d_q)^T$ and the free (below-diagonal) elements of Γ, collected in the vector $\gamma = (\gamma_{lr} : l = 2, \ldots, q;\ r = 1, \ldots, l-1)^T$. Setting $d_l = 0$ sets the lth row and column of the covariance matrix to 0 and therefore removes the lth random effect from the model.

In model (3.3), $Y_i$ follows a normal distribution with mean $X_i\beta$ and variance $V_i = \sigma^2(Z_i D\Gamma\Gamma^T D Z_i^T + I_m)$.

Maximum Likelihood Estimation. Given Y and treating the $b_i$ as observed, the log-likelihood function for the LME model is

$$\ell_F(\beta, d, \gamma \mid Y, b) = -\frac{N + nq}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\Big(\|Y - X\beta - Z\tilde D\tilde\Gamma b\|^2 + b^T b\Big), \tag{3.4}$$

where Z is the block diagonal matrix of the $Z_i$, $\tilde D = I_n \otimes D$, $\tilde\Gamma = I_n \otimes \Gamma$, and ⊗ denotes the Kronecker product.

Objective Function and Penalization. Since $b^T b$ does not depend on (β, d, γ), maximizing (3.4) for fixed σ² is equivalent to minimizing $\|Y - X\beta - Z\tilde D\tilde\Gamma b\|^2$. The penalized objective function is therefore

$$Q_n(\beta, d, \gamma \mid Y, b) = \|Y - X\beta - Z\tilde D\tilde\Gamma b\|^2 + P_{\lambda_1}(\beta, d),$$

where $P_{\lambda_1}(\beta, d)$ is an adaptive lasso penalty with tuning parameter λ1:

$$P_{\lambda_1}(\beta, d) = \lambda_1\Big(\sum_{j=1}^{p}\frac{|\beta_j|}{|\hat\beta_j|} + \sum_{k=1}^{q}\frac{|d_k|}{|\hat d_k|}\Big),$$

and $\hat\beta$ and $\hat d$ are the ordinary least squares estimates. Because $\tilde D\tilde\Gamma b = \mathrm{Diag}(\tilde\Gamma b)(1_n \otimes I_q)d$, the terms can be rearranged so that the joint penalized objective function to be minimized is

$$Q_F(\beta, d, \gamma \mid Y, b) = \|Y - X\beta - Z\,\mathrm{Diag}(\tilde\Gamma b)(1_n \otimes I_q)d\|^2 + \lambda_1\Big(\sum_{j=1}^{p}\frac{|\beta_j|}{|\hat\beta_j|} + \sum_{k=1}^{q}\frac{|d_k|}{|\hat d_k|}\Big),$$

where $1_n$ is an n × 1 column vector of 1's.
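The factorization $\Psi = D\Gamma\Gamma^T D$ is what allows a plain adaptive lasso on d to perform random-effects selection. In the Cholesky approach of Method 1, a whole row of L must be shrunk, whereas here a single scalar $d_l$ switches the lth random effect on or off. The sketch below builds Ψ from made-up values of d and Γ. The helper name `modified_cholesky_cov` is hypothetical.

```python
# Hypothetical helper for illustration: builds Psi from the modified Cholesky factors.


def modified_cholesky_cov(d, Gamma):
    """Return Psi = D Gamma Gamma^T D, with D = diag(d) and Gamma unit lower triangular."""
    q = len(d)
    # (D Gamma Gamma^T D)[r][c] = d_r * (Gamma Gamma^T)[r][c] * d_c
    return [[d[r] * d[c] * sum(Gamma[r][k] * Gamma[c][k] for k in range(q))
             for c in range(q)]
            for r in range(q)]


Gamma = [[1.0, 0.0, 0.0],
         [0.5, 1.0, 0.0],
         [0.2, -0.3, 1.0]]            # made-up gamma_lr values below the diagonal

Psi_full = modified_cholesky_cov([1.0, 2.0, 1.5], Gamma)
Psi_drop = modified_cholesky_cov([1.0, 0.0, 1.5], Gamma)   # d_2 = 0
# The second random effect's row and column vanish.
print(all(v == 0.0 for v in Psi_drop[1]), all(row[1] == 0.0 for row in Psi_drop))  # True True
```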
Algorithm
To solve for β, d, and γ, the expectation-maximization (EM) algorithm (Laird and Ware 1982) is used. The algorithm alternates two steps: first, the conditional expectation of Q_F(β, d, γ | Y, b) is taken (E-step); then the resulting function is minimized (M-step) with respect to (β^T, d^T, γ^T)^T. The overall process is as follows:

1. Let θ = (β^T, d^T, γ^T)^T be the vector of parameters and θ^(r) its estimate at the rth step. For r = 0, the REML estimates are used.
2. At the rth step, take the E-step: find the conditional expectation of the objective function, treating the random effects as unobserved:
$$g(\theta \mid \theta^{(r)}) = E_{b \mid y, \theta^{(r)}}\left\{\|Y - X\beta - Z\,\mathrm{Diag}(\tilde\Gamma b)(1_n \otimes I_q)\,d\|^2\right\} + \lambda_1\left(\sum_{j=1}^{p}\frac{|\beta_j|}{|\hat\beta_j|} + \sum_{k=1}^{q}\frac{|d_k|}{|\hat d_k|}\right)$$
3. Take the M-step by minimizing g(θ | θ^(r)) with respect to θ, iterating between β and (γ, d).
4. Unless convergence has occurred, return to step 2 for step r + 1.

3.3 Method 3 - Independent Selection with Proxy Matrix
This section describes the method of Fan and Li (2012), which solves for the fixed effects β and the random effects b separately. A proxy matrix is substituted for the unknown true covariance structure during the selection and estimation of both the fixed and the random effects.

Model
Consider the model in (2.1). Stacking the Y_i, X_i, b_i, and ε_i gives Y, X, b, and ε. Let Z = diag{Z_1, ..., Z_n} and Ψ̃ = diag{Ψ, ..., Ψ} be block diagonal matrices. The fixed-effect predictors X are standardized so that each column has norm √n. The LME model becomes

$$Y = X\beta + Zb + \varepsilon.$$

Maximum Likelihood Estimation of Fixed Effects
The MLE for the fixed effects can be found by maximizing the joint density function of Y and b:

$$f(y, b) = (2\pi)^{-(N+nq)/2}\,\sigma^{-N}\,|\tilde\Psi|^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}(y - X\beta - Zb)^T(y - X\beta - Zb) - \frac{1}{2}b^T\tilde\Psi^{-1}b\right\}.$$

For a given β, the maximizer over b is b̂(β) = B_z(Y − Xβ), where B_z = (Z^T Z + σ²Ψ̃^{-1})^{-1} Z^T.
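The two matrix expressions in this derivation can be checked numerically. The maximizer b̂(β) = B_z(Y − Xβ) coincides with the familiar BLUP form Ψ̃Z^T(ZΨ̃Z^T + σ²I)^{-1}(Y − Xβ), and substituting it back into the density gives the weight matrix P_z = (I − ZB_z)^T(I − ZB_z) + σ²B_z^TΨ̃^{-1}B_z of the fixed-effects likelihood, which simplifies to (I + σ^{-2}ZΨ̃Z^T)^{-1}. That closed form is why a single proxy M can stand in for σ^{-2}Ψ̃. The sketch below uses small random matrices in place of the real Z, Ψ̃, and residual Y − Xβ (all hypothetical).

```python
import numpy as np

rng = np.random.default_rng(1)
N, nq = 12, 6   # total observations and total random-effect dimension (toy sizes)
sigma2 = 0.64   # hypothetical error variance sigma^2
Z = rng.normal(size=(N, nq))
A = rng.normal(size=(nq, nq))
Psi_t = A @ A.T + nq * np.eye(nq)  # positive definite stand-in for Psi-tilde
r = rng.normal(size=N)             # plays the role of Y - X beta
I_N = np.eye(N)

# b_hat(beta) = B_z (Y - X beta), with B_z = (Z^T Z + sigma^2 Psi_t^{-1})^{-1} Z^T
B_z = np.linalg.solve(Z.T @ Z + sigma2 * np.linalg.inv(Psi_t), Z.T)
b_hat = B_z @ r

# The same vector in BLUP form: Psi_t Z^T (Z Psi_t Z^T + sigma^2 I)^{-1} r
b_blup = Psi_t @ Z.T @ np.linalg.solve(Z @ Psi_t @ Z.T + sigma2 * I_N, r)

# P_z as defined in the text, and its closed form (I + sigma^{-2} Z Psi_t Z^T)^{-1}
P_z = (I_N - Z @ B_z).T @ (I_N - Z @ B_z) + sigma2 * B_z.T @ np.linalg.inv(Psi_t) @ B_z
P_closed = np.linalg.inv(I_N + Z @ Psi_t @ Z.T / sigma2)

print(np.allclose(b_hat, b_blup), np.allclose(P_z, P_closed))
```

Replacing σ^{-2}Ψ̃ in the closed form by M gives exactly the proxy (I + ZMZ^T)^{-1}.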
Substituting b̂(β), the MLE for b in terms of β, into the joint density, the likelihood function for the fixed effects can be expressed, up to a constant, as

$$L_n(\beta, \hat b(\beta)) \propto \exp\left\{-\frac{1}{2\sigma^2}(Y - X\beta)^T P_z (Y - X\beta)\right\}, \qquad (3.5)$$

where P_z = (I − ZB_z)^T(I − ZB_z) + σ²B_z^TΨ̃^{-1}B_z. The β that maximizes (3.5) is the fixed-effects solution.

Objective Function and Penalization for Fixed Effects
The objective function for the fixed effects, to be minimized, is

$$Q_n(\beta) = \frac{1}{2}(Y - X\beta)^T P_z (Y - X\beta) + n\sum_{j=1}^{p}P_{\lambda_1}(|\beta_j|). \qquad (3.6)$$

The penalty function P_λ1(|β|) is required to be concave and increasing, so the smoothly clipped absolute deviation (SCAD) penalty function (Fan and Li 2001) is chosen, with tuning parameter λ_1 > 0.

Proxy Matrix for Fixed Effects
Because P_z depends on the unknown covariance matrix Ψ̃ and the unknown variance σ², the proxy matrix P̃_z = (I + ZMZ^T)^{-1} is substituted for P_z, with M = log(N) I. With this choice of M, the proxy matrix satisfies the condition that allows the minimal signal strength to decay as the sample size increases. It also satisfies the constraints placed on P̃_z to ensure that the model selection has the oracle property. With this proxy matrix substituted into (3.6), the optimization problem becomes a quadratic problem that can be solved using the LARS algorithm (Efron et al. 2004).

Objective Function and Penalization for Random Effects
The number of random effects q is allowed to increase with the sample size n. With P_x = I − X(X^T X)^{-1}X^T, the objective function for the random effects is

$$Q_n(b) = (y - Zb)^T P_x (y - Zb) + \sigma^2 b^T\tilde\Psi^{+} b,$$

where Ψ̃^+ is the Moore-Penrose generalized inverse of Ψ̃. Adding a penalty gives the regularization problem

$$Q_n(b) = \frac{1}{2}(y - Zb)^T P_x (y - Zb) + \frac{1}{2}\sigma^2 b^T\tilde\Psi^{+} b + n\sum_{k=1}^{q_n}P_{\lambda_2}(b_k), \qquad (3.7)$$

where P_λ2(b_k) is the SCAD penalty function with tuning parameter λ_2.
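For reference, the SCAD penalty of Fan and Li (2001) used in (3.6) and (3.7) is linear near zero, quadratic in a transition region, and constant beyond aλ, so large coefficients are not shrunk further. The minimal NumPy sketch below uses a = 3.7, the value suggested by Fan and Li (2001); the value of a used in the simulations here is not stated, so treat it as an assumption.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated elementwise at |theta|."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                          # |theta| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |theta| <= a*lam
    flat = np.full_like(t, (a + 1) * lam**2 / 2)              # |theta| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, flat))


lam = 0.5
print(scad_penalty([0.0, 0.25, 0.5, 1.0, 1.85, 5.0], lam))
```

The three pieces join continuously at λ and aλ, and the penalty is concave and nondecreasing in |θ|, which are the properties the method requires.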
In practice, the covariance matrix Ψ̃ and the variance σ² are unknown, so again a proxy matrix M is substituted for σ^{-2}Ψ̃, and the regularization problem becomes

$$Q_n(b) = \frac{1}{2}(y - Zb)^T P_x (y - Zb) + \frac{1}{2}b^T M^{-1} b + n\sum_{k=1}^{q_n}P_{\lambda_2}(b_k).$$

Minimizing this function gives an estimate b̂ of the random-effects parameter vector. Note that once the proxy matrix is substituted into the objective function, this method does not require knowledge of the fixed-effect parameter β.

Proxy Matrix for Random Effects
Again, M = (log n) I is chosen to satisfy the constraints placed on the proxy matrix. Substituting this proxy matrix into (3.7) creates a quadratic optimization problem similar to the elastic net (Zou and Hastie 2005), so it can be solved using existing quadratic algorithms. Using (log n) I ignores correlations among the random effects, which could bias the estimate of the covariance matrix. However, it avoids the errors incurred by estimating a large number of covariance parameters; the authors argue that the accumulation of these estimation errors across parameters would give poorer results than using the proxy matrix.

Chapter 4 Analysis of Simulated Data
Seven simulated data settings are used to compare the effectiveness and accuracy of the three methods. In all of the simulations, the number of observations N is greater than the number of predictors p. For all methods, the tuning parameters are chosen by a grid search for the λ values that give the lowest BIC. Each simulation consists of 50 replicates.

4.1 Simulation 1
Setting
This simulation generates a small study population of n = 30 clusters with m = 5 observations within each cluster, indexed by l ∈ {1, ..., m}. There are 10 candidate predictors, of which four are important fixed effects and three are important random effects.
The random effects are selected from the same 10 predictors as the fixed effects, so p = 10 and q = 10. The true model is given by

$$y_{il} = (1 + b_{i0}) + (3 + b_{i1l})x_{i1l} + (1.5 + b_{i2l})x_{i2l} + (2 + 0)x_{i5l} + (2 + b_{i10l})x_{i10l}, \qquad (4.1)$$

with x_ijl ~ N(0, 1) and Corr(x_ijl, x_ijl') = 0.5^|l−l'|. The random effects (b_i0l, b_i1l, b_i2l, b_i10l) are generated from MVN(0, σ²R), with σ = 0.8 and

$$R = \begin{pmatrix} 1.0 & 0.5 & 0.3 & 0.2 \\ 0.5 & 1.0 & 0.5 & 0.3 \\ 0.3 & 0.5 & 1.0 & 0.5 \\ 0.2 & 0.3 & 0.5 & 1.0 \end{pmatrix}. \qquad (4.2)$$

Results
Simulation 1 results are in Tables 2 and 3. All methods select all true fixed effects 100 percent of the time, except for β_2 in Method 3, which was selected 98 percent of the time. Only Method 1 selects all true random effects 100 percent of the time. Methods 2 and 3 still perform well, selecting the true random effects 92.67 percent and 70.67 percent of the time, respectively. Method 3 eliminates predictors most aggressively, giving the smallest average model sizes for both fixed and random effects. In fact, Method 3's average model sizes are below the true model sizes (3.98 versus 4 fixed effects and 2.56 versus 3 random effects), so Method 3 is likely to miss some true random effects.

4.2 Simulation 2
Setting
Simulation 2 has the same true LME model as Simulation 1, given in (4.1), but uses a larger data set: n = 200 clusters with m = 8 observations each, and p = 100 and q = 50 candidate predictors. There are still only four important fixed effects and three important random effects, as in equation (4.1). All β_j = 0 for j > 10. The random effects are chosen from the first 50 fixed-effect predictors x_ij, j ∈ {1, ..., 50}, so p = 100 and q = 50.

Results
The results of Simulation 2 are in Tables 4 and 5. Method 2 could not complete the analysis because it ran out of memory, returning the error "Error: cannot allocate vector of size 793.8 Mb." Method 3 could not complete the analysis of the random effects: it ran for hours and then force-closed MATLAB without returning results.
Method 1 selected the true fixed and random effects in 100 percent of the simulations, and Method 3 selected the true fixed effects 100 percent of the time. Method 1 also performed well at eliminating random effects, including false random-effect predictors in the model only 0.469 percent of the time. Both methods eliminated noise variables well, selecting false fixed effects less than one percent of the time.

4.3 Simulation 3
Setting
This simulation generates a small study population of n = 60 clusters with m = 3 observations within each cluster, indexed by l ∈ {1, ..., m}. X_ijl and Z_ikl are generated from N(0, 1) with Corr(X_ijl, X_ijl') = 0.5^|l−l'| and Corr(Z_ikl, Z_ikl') = 0.8^|l−l'|. Random effects b_ikl are generated from MVN(0, σ²R), where σ = 0.8 and

$$R = \begin{pmatrix} 1.0 & 0.5 & 0.3 \\ 0.5 & 1.0 & 0.5 \\ 0.3 & 0.5 & 1.0 \end{pmatrix}. \qquad (4.3)$$

There are p = 10 candidate predictors for the fixed effects and q = 5 for the random effects, with four true fixed effects and two true random effects. The true model is given by

$$y_{il} = [1 + 3x_{i1l} + 1.5x_{i2l} + 2x_{i5l} + 2x_{i10l}] + [b_{i0l} + b_{i1l}z_{i1l} + b_{i5l}z_{i5l}]. \qquad (4.4)$$

Results
With fixed and random effects selected from different sets of predictors, the Simulation 3 results are in Tables 6 and 7. Again, all methods perform well at selecting fixed effects, keeping the true fixed effects 100 percent of the time. The methods do not select random effects as well as in Simulation 1, but they still select the true random effects more than 70 percent of the time on average. Method 1 performs best in this regard, at 83 percent. Methods 2 and 3 both eliminate random predictors more heavily, giving average random-effects model sizes (1.86 and 1.26) below the true size of 2.

4.4 Simulation 4
Setting
Simulation 4 has the same true LME model as Simulation 3, given in (4.4), but uses a larger data set: n = 600 clusters with m = 3 observations each, and p = 50 and q = 10 candidate predictors.
There are still only four important fixed effects and two important random effects, as in equation (4.4). All β_j = 0 for j > 10, and all b_k = 0 for k > 5.

Results
For the larger data sets, the Simulation 4 results are in Tables 8 and 9. Again, because of computational limitations, Method 2 could not complete the analysis of either the fixed or the random effects, and Method 3 could not complete the analysis of the random effects. Methods 1 and 3 selected the true fixed effects 100 percent of the time, and Method 3 eliminated false fixed effects 100 percent of the time. Method 1 selected the true random effects 100 percent of the time, while eliminating false random effects more than 94 percent of the time.

4.5 Simulation 5
Setting
This simulation generates a small study population of n = 60 clusters with m = 3 observations within each cluster (observation l ∈ {1, ..., m}). Unlike the previous settings, it simulates a multilevel study, in which the fixed effects are selected from individual-level predictors and the random effects from group-level predictors. At the individual level, X_ijl is generated from N(0, 1) with Corr(X_ijl, X_ijl') = 0.5^|l−l'|. At the group level, the predictors are nested within six groups, so all members of group g share the same values Z_gkl. Z_gkl is generated from N(0, 1) with Corr(Z_gkl, Z_gkl') = 0.8^|l−l'|. Random effects b_igkl are generated from MVN(0, σ²R), where σ = 0.8 and R is given in (4.3). The random effects consist of a subject-specific intercept b_i0l and predictor-associated effects b_gk, k ∈ {1, ..., q}. There are p = 10 candidate predictors for the fixed effects and q = 5 for the random effects, with four true fixed effects and two true random effects. The true model is given by

$$y_{igl} = [1 + 3x_{i1l} + 1.5x_{i2l} + 2x_{i5l} + 2x_{i10l}] + [b_{i0l} + b_{g1l}z_{g1l} + b_{g5l}z_{g5l}]. \qquad (4.5)$$

Results
For nested clustered designs, the Simulation 5 results are in Tables 10 and 11.
All methods again perform well at identifying fixed effects, selecting the true fixed effects 100 percent of the time. The methods do not select random effects as well as in Simulations 1 and 3. Methods 1 and 2 perform best at selecting random effects, and of the two, Method 1 is also better at eliminating noise random effects. Method 3 again eliminates random effects most heavily: its models average just over one random effect (1.04), about half the true size.

4.6 Simulation 6
Setting
Simulation 6 has the same true LME model as Simulation 5, given in (4.5), but uses a larger data set: n = 600 clusters with m = 3 observations each, and p = 50 and q = 20 candidate predictors. There are still only four important fixed effects and two important random effects, as in equation (4.5). All β_j = 0 for j > 10, and all b_k = 0 for k > 5.

Results
Simulation 6 results are listed in Tables 12 and 13. Again, Methods 2 and 3 were limited by computational power. Methods 1 and 3 selected the true fixed effects 100 percent of the time, and both eliminated noise fixed effects 100 percent of the time. Method 1 selected the true random effects 89 percent of the time, while eliminating false random effects more than 95 percent of the time.

4.7 Simulation 7
Setting
This simulation generates a small study population of n = 60 individuals with observations taken at m = 3 time points. It simulates a multilevel study, in which the fixed effects are selected from individual-level predictors and the random effects from group-level predictors. It also simulates a longitudinal study, with time t = (1, 2, 3). At the individual level, for individual i and predictor j, X_i1(t) = t, and X_ij(t) for j ∈ {2, ..., p} are generated from N(0, 1) with Corr(X_ij(t), X_ij(t')) = 0.5^|t−t'|. At the group level, the predictors Z_gk are nested within six groups, so all members of group g share the same values Z_gk(t).
Z_gk(t) is generated from N(0, 1) with Corr(Z_gk(t), Z_gk(t')) = 0.8^|t−t'|. Random effects b are generated from MVN(0, σ²R), where σ = 0.8 and R is given in (4.3). The random effects consist of a subject-specific intercept b_i0(t) and predictor-associated effects b_igk, k ∈ {1, ..., q}. There are p = 10 candidate predictors for the fixed effects and q = 5 for the random effects, with four true fixed effects and two true random effects. The true model is given by

$$y_{ig}(t) = 1 + \sum_{j}\beta_j x_{ij}(t) + b_{i0}(t) + \sum_{k} b_{igk} z_{gk}(t) = [1 + 3x_{i1}(t) + 1.5x_{i2}(t) + 2x_{i5}(t) + 2x_{i10}(t)] + [b_{i0}(t) + b_{ig1}z_{g1}(t) + b_{ig5}z_{g5}(t)].$$

Results
For multilevel longitudinal designs, the Simulation 7 results are in Tables 14 and 15. All methods again perform well at identifying fixed effects, selecting the true fixed effects 100 percent of the time. Methods 2 and 3 overestimate β_1, the coefficient of the time variable. Method 1 performs best at selecting random effects. Method 2 often selects one of the true random effects but often eliminates the other, although both are selected more often than the noise random effects. Method 3 again eliminates random effects most heavily, and its models are on average smaller than the true model. Again, Method 1 performs best overall.

4.8 Summary of Simulation Studies
Overall, Method 1 was the most effective across data sets of different structures and sizes. It tends to underestimate the true values of the fixed- and random-effect parameters, which is expected from the shrinkage built into penalized optimization; this can be remedied by re-estimating the selected model without penalization. Method 3 performed very well at selecting fixed effects in all settings, but it tended to under-select random effects and could not complete random-effects selection for the large data sets. Method 2 performed the worst: while it selected fixed effects accurately, it performed poorly for random effects in nested settings, and it could not complete any analysis for the large data sets.
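The shrinkage-then-refit idea in the summary can be illustrated outside the mixed-model setting. The sketch below is a stand-in, not Method 1: it fits an ordinary lasso by cyclic coordinate descent to simulated linear-regression data whose nonzero coefficients mirror the fixed effects in (4.1), then refits least squares on the selected columns only. The data, the tuning value lam = 0.3, and the helper lasso_cd are hypothetical; in the real-data analysis the refit is an unpenalized LME model fitted with lme4.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500):
    """Lasso by cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    r = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]          # partial residual without predictor j
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * beta[j]
    return beta


rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3, 1.5, 0, 0, 2, 0, 0, 0, 0, 2], dtype=float)
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_pen = lasso_cd(X, y, lam=0.3)   # penalized fit: selection plus shrinkage
support = np.flatnonzero(beta_pen)   # selected predictors
beta_refit = np.zeros(p)
beta_refit[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]  # unpenalized refit

print(np.round(beta_pen, 2))
print(np.round(beta_refit, 2))
```

The penalized estimates of the large coefficients sit roughly lam below their true values, while the refit on the selected columns removes most of that bias; the selection step decides which predictors enter, and the refit decides how large their effects are.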
Chapter 5 Real Data Analysis
5.1 Data Description and Model Formulation
This analysis uses only the 2006 and 2009 time points of the TAAG 2 Maryland data set. The outcome of interest is the average minutes of MVPA per day for each adolescent girl at each time point. In total, there were 66 variables of interest for 551 subjects at two time points; a reference table of the variables is in Table 18. Data were collected at the individual, social, school, and neighborhood levels. Examples of individual-level predictors include BMI, percent body fat, self-esteem measures, enjoyment of physical activity, and depression. At the social level, predictors include measures of peer and family support, such as the amount of encouragement received from household members or time spent home alone. At the school level, variables include policies on items such as physical education and transportation, as well as measures of the schools' academic performance. At the neighborhood level, variables include proximity to the girl's school, parks, and physical activity facilities, as well as measures of neighborhood safety (Young et al. 2013).

Because the variable selection methods require complete data, subjects had to have measurements at both time points; those who were not present at both time points were removed. For the remaining subjects, missing data were imputed where possible using the Sequential Regression Imputation Method (Raghunathan et al. 2001) through IVEware (Raghunathan et al. 2002). Neighborhood-level factors derived from GIS data could not be imputed, so most neighborhood-level variables were excluded from the selection procedures. Questions about the subjects' perceptions of their neighborhood, however, are included.
For the ith girl, the fixed-effect variables X_ij, j ∈ {1, ..., p}, are drawn from the individual, social, and neighborhood levels. Time is also included to capture longitudinal change. The predictors associated with the random effects are used to model the variance components. For girls in school g (1 ≤ g ≤ 6), the random effects are selected from the school-level variables Z_gk, k ∈ {1, ..., q}. For these school-level variables, only the data from the 8th grade middle school time point are used, so these predictors are not time-dependent. For the outcome y_ig(t), the average daily MVPA at time t ∈ {0, 1} for girl i in school g, the LME model is

$$y_{ig}(t) = \beta_0 + \beta_{time}\,t + X_i\beta + b_{i0} + b_{g0} + b_i Z_g + \varepsilon_{ig},$$

where β_0 is the fixed-effects intercept, β_time is the parameter associated with time, b_i are the individual-specific random effects with intercept b_i0, and b_g0 is the school-specific random intercept. The individual-specific random effects b_i are distributed N(0, σ_i²), the school-specific random intercept b_g0 is distributed N(0, σ_g²), and the error term is distributed N(0, 1).

5.2 Results of Real Data Analysis
Methods 1 and 3 completed the analysis, while Method 2 could not, because the Z matrix was rank deficient. The results of Methods 1 and 3 are in Tables 16 and 17, alongside, for comparison, the fixed and random effects selected at the individual time points by Young et al. (2013).

For Method 3, the fixed effects X were scaled so that each column had norm √n. The method then selected 40 fixed effects and 1 random effect: only 6 fixed effects were eliminated from the model, while 18 random effects were eliminated. In the simulations, Method 3's random-effects models were generally smaller than the true model, so it is likely that true random effects were also eliminated here.

Method 1 was more successful at producing a parsimonious model.
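Two column scalings are used in this analysis: for Method 3, each column of X is rescaled to Euclidean norm √n, and for Method 1, each column of X and Z is centered and scaled to unit Euclidean norm. The minimal NumPy sketch below shows both operations on a simulated matrix that only stands in for the TAAG predictors (551 rows matches the number of subjects; the four columns and their values are hypothetical). Which n the √n refers to is not spelled out, so the function assumes it is the number of rows.

```python
import numpy as np


def scale_to_norm_sqrt_n(X):
    """Rescale each column to Euclidean norm sqrt(n), n = number of rows (Method 3 scaling)."""
    n = X.shape[0]
    return X * (np.sqrt(n) / np.linalg.norm(X, axis=0))


def center_unit_norm(X):
    """Center each column, then scale it to unit Euclidean norm (Method 1 scaling).

    A constant column would have zero norm after centering and must be dropped first.
    """
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)


rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, scale=2.0, size=(551, 4))  # stand-in predictor matrix

print(np.linalg.norm(scale_to_norm_sqrt_n(X), axis=0))  # each close to sqrt(551)
print(np.linalg.norm(center_unit_norm(X), axis=0))      # each close to 1
```

Scaling matters here because both penalties act on coefficient magnitudes, so predictors measured on larger scales would otherwise be penalized less.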
First, the data were standardized so that each column of X and Z has zero mean and unit Euclidean norm. Of the original 47 fixed effects and 19 random effects, Method 1 selected three fixed effects and two random effects. The fixed effects selected were (1) self-management strategies (MSQBOD F), (2) perceived barriers (MSQBOD I), and (3) support from friends (MSQBOD OB). Because Method 1 tends to underestimate parameter values, the selected model was refitted as an unpenalized LME model using the lme4 package in R, with corresponding p-values for the fixed effects. Although it was not selected, time was included in the re-estimated model to assess the longitudinal effect of time on MVPA.

After re-estimation, the results suggest a positive association of self-management strategies and friend support with MVPA (β_MSQBOD F = 0.12 and β_MSQBOD OB = 0.43), and a negative association between perceived barriers and MVPA (β_MSQBOD I = −0.21). Perceived barriers (p < .001) and friend support (p < .001) were both highly significant, and self-management strategies was significant at the α = 0.05 level (p = .05). There was a negative, but nonsignificant, effect of time on MVPA (β_time = −0.52, p = .35).

All three fixed effects were also selected in the previous study, and the associations for perceived barriers and friend support have the same direction. These results suggest offering programs or interventions that improve self-management strategies, reduce barriers to physical activity, and encourage peers to support each other's participation in physical activity.

One random effect was selected out of the 18 original variables: PPIC19, indicating whether the school offered interscholastic and intramural physical activity programs. This variable was selected for the 2009 11th grade time point in Young et al. (2013), but not in the 2006 8th grade model.
Following re-estimation, the results suggest substantial girl-to-girl variation in MVPA associated with whether the girl's middle school offered interscholastic or intramural physical activity programs (σ̂ = 6.68).

Chapter 6 Conclusion
This thesis has presented three new methods for variable selection in LME models. The method proposed by Bondell et al. (2010) was one of the first to select fixed and random effects simultaneously in LME models. While it can select fixed effects effectively, it is less accurate at selecting random effects when the data are nested. Further, it could not be applied to the TAAG data, where the random-effect covariates (the school-level variables) are time-independent and the Z matrix was rank deficient. In addition, the EM algorithm used by Method 2 is an inefficient way to solve the optimization problem: as data sets grow through larger sample sizes or more predictors, its slow convergence makes it impractical, or even infeasible, with limited computing resources. For high-dimensional data, where N < p, one option would be to first reduce the number of fixed-effect parameters with earlier methods such as the lasso while ignoring the random effects, and then apply Method 2 to the random effects and the selected fixed effects. However, because of the slow convergence of the EM algorithm, this method is not well suited to high-dimensional data.

Method 3, proposed by Fan and Li (2012), can select fixed effects in LME models accurately and quickly. In the simulations it was excellent both at selecting true predictors and at removing noise fixed-effect variables from the model. However, its performance on the TAAG data, where it retained 40 fixed effects, was inconsistent with the sparse models it produced in the simulations. The use of the proxy matrix requires certain conditions to be satisfied.
Notably, for fixed effects X and random effects Z, the signal and noise variables must not be highly correlated. The proxy matrix ignores the correlation between variables, so when signal and noise predictors are highly correlated, it can introduce bias that undermines the oracle property of the selection. The TAAG data set potentially contains many correlated variables that could violate this condition, which may explain the poor results. Additionally, Method 3 tends to under-select random effects, which can lead to models that are missing important random effects. For high-dimensional data, the number of fixed-effect parameters must first be reduced with earlier regularization methods while ignoring the random effects. Next, the random effects can be selected using the fixed effects chosen in the first step. Finally, the fixed effects can be selected and re-estimated using the random effects selected in the second step.

Based on the simulation results, Method 1 by Li et al. (2012) clearly performed best of the three. It consistently selects the true model while effectively eliminating noise variables. Additionally, its algorithm for solving the optimization problem is much more efficient than previous approaches such as the EM algorithm: by splitting the optimization problem into two penalized quadratic subproblems, it converges much more quickly. This method can also be used with high-dimensional data; all that is needed is to use the maximum likelihood objective in equation (3.2) instead of the REML-modified equation (3.1). These methods can be of great value to researchers.
This is especially true in public health, where longitudinal data are common and vital for understanding temporal trends in health outcomes. These trends can provide a deeper understanding of the biological, social, or environmental processes behind health risks, and so help identify and reduce them. With the methods introduced in this thesis, it is possible to efficiently select important fixed and random effects from large, complex sets of predictors. This can greatly advance public health research, especially as technology and data collection methods improve.

Table 1: Example of Model Selection Using Lasso
Variable    True Value  Least Squares  Ridge   Lasso
Intercept   1           1.04           0.96    0.90
X1          0.5         0.69           0.58    0.52
X2          0           0.13           0.06    0.00
X3          1.5         1.44           1.36    1.26
X4          0.5         0.39           0.43    0.41
X5          0           0.11           0.04    0.00
X6          0           -0.16          -0.003  0.00

Table 2: Simulation 1 Results - Parameter Estimates
              Fixed Effects              Random Effects
Covariate     %        Estimate Error    %        Estimate Error
Method 1
1             100.00   2.37     0.32     100.00   0.53     0.42
2             100.00   0.89     0.26     100.00   0.49     0.37
3             26.00    0.09     0.05     28.00    0.09     0.10
4             22.00    0.05     0.03     18.00    0.07     0.06
5             100.00   1.68     0.18     56.00    0.17     0.18
6             14.00    0.02     0.01     18.00    0.06     0.06
7             6.00     0.01     0.00     18.00    0.02     0.02
8             14.00    0.03     0.02     20.00    0.03     0.03
9             14.00    0.03     0.02     16.00    0.05     0.04
10            100.00   1.32     0.30     100.00   0.48     0.38
Method 2
1             100.00   3.02     0.18     100.00   0.90     0.42
2             100.00   1.50     0.21     100.00   0.90     0.46
3             62.00    0.03     0.13     56.00    0.60     0.47
4             70.00    -0.03    0.11     50.00    0.63     0.46
5             100.00   2.00     0.14     40.00    0.64     0.49
6             72.00    -0.02    0.12     36.00    0.59     0.44
7             66.00    0.01     0.10     52.00    0.58     0.44
8             66.00    -0.03    0.10     28.00    0.58     0.40
9             54.00    -0.00    0.07     42.00    0.65     0.48
10            100.00   2.02     0.19     78.00    0.87     0.60
Method 3
1             100.00   3.03     0.25     74.00    -        -
2             98.00    1.52     0.30     72.00    -        -
3             0.00     0.00     0.00     2.00     -        -
4             0.00     0.00     0.00     10.00    -        -
5             100.00   1.94     0.18     10.00    -        -
6             0.00     0.00     0.00     2.00     -        -
7             0.00     0.00     0.00     2.00     -        -
8             0.00     0.00     0.00     4.00     -        -
9             0.00     0.00     0.00     14.00    -        -
10            100.00   1.99     0.21     66.00    -        -

Table 3: Simulation 1 Results - Summary
                                   Method 1   Method 2   Method 3
                                   Double     Joint      Independent
                                   Penalty    Penalty    Selection
Avg Model Size Fixed (True = 4)    4.96       7.88       3.98
Avg Model Size Random (True = 3)   4.74       5.8        2.56
Percent True β Included            100.00     100.00     99.33
Percent True D Included            100.00     92.67      70.67
Percent False β Included           16.00      64.58      0.00
Percent False D Included           21.75      35.00      6.29

Table 4: Simulation 2 Results - Parameter Estimates
              Fixed Effects              Random Effects
Covariate     %        Estimate Error    %        Estimate Error
Method 1
1             100.00   2.52     0.15     100.00   0.53     0.06
2             100.00   0.97     0.14     100.00   0.51     0.06
3             4.00     0.02     0.00     8.00     0.01     0.00
4             10.00    0.01     0.00     0.00     0.00     0.00
5             100.00   1.89     0.05     32.00    0.04     0.00
6             0.00     0.00     0.00     0.00     0.00     0.00
7             2.00     0.01     0.00     0.00     0.00     0.00
8             0.00     0.00     0.00     0.00     0.00     0.00
9             0.00     0.00     0.00     2.00     0.01     0.00
10            100.00   1.47     0.14     100.00   0.54     0.09
11            2.00     0.01     0.00     4.00     0.01     0.00
12-50         0.00     0.00     0.00     0.00     0.00     0.00
51-100        0.00     0.00     0.00     -        -        -
Method 2
1-50          *        *        *        *        *        *
51-100        *        *        *        -        -        -
Method 3
1             100.00   3.01     0.08     *        *        *
2             100.00   1.48     0.07     *        *        *
3             2.00     0.14     0.02     *        *        *
5             100.00   2.00     0.05     *        *        *
8             2.00     -0.17    0.02     *        *        *
9             4.00     0.01     0.04     *        *        *
10            100.00   1.99     0.07     *        *        *
15            2.00     -0.16    0.02     *        *        *
22            4.00     0.03     0.03     *        *        *
28            2.00     -0.13    0.02     *        *        *
29            2.00     0.15     0.02     *        *        *
36            4.00     -0.01    0.03     *        *        *
40            2.00     0.12     0.02     *        *        *
45            2.00     0.14     0.02     *        *        *
49            2.00     0.14     0.02     *        *        *
54            2.00     0.14     0.02     -        -        -
56            2.00     0.17     0.02     -        -        -
66            2.00     0.14     0.02     -        -        -
72            2.00     0.12     0.02     -        -        -
73            2.00     -0.13    0.02     -        -        -
75            2.00     0.13     0.02     -        -        -
76            2.00     -0.17    0.02     -        -        -
77            2.00     0.13     0.02     -        -        -
81            2.00     0.15     0.02     -        -        -
88            2.00     -0.13    0.02     -        -        -
89            2.00     -0.15    0.02     -        -        -
90            2.00     -0.15    0.02     -        -        -
98            2.00     -0.15    0.02     -        -        -
*Could not complete due to computational limitations

Table 5: Simulation 2 Results - Summary
                                   Method 1   Method 2   Method 3
                                   Double     Joint      Independent
                                   Penalty    Penalty    Selection
Avg Model Size Fixed (True = 4)    4.22       *          4.54
Avg Model Size Random (True = 3)   3.46       *          *
Percent True β Included            100.00     *          100.00
Percent True D Included            100.00     *          *
Percent False β Included           0.229      *          0.563
Percent False D Included           0.469      *          *
*Could not complete due to computational limitations

Table 6: Simulation 3 Results - Parameter Estimates
              Fixed Effects              Random Effects
Covariate     %        Estimate Error    %        Estimate Error
Method 1
1             100.00   2.52     0.18     90.00    0.33     0.30
2             100.00   1.03     0.15     46.00    0.20     0.20
3             32.00    0.00     0.00     54.00    0.19     0.22
4             36.00    0.00     0.01     36.00    0.18     0.17
5             100.00   1.36     0.16     86.00    0.35     0.30
6             4.00     0.00     0.00     -        -        -
7             4.00     0.00     0.00     -        -        -
8             2.00     0.00     0.00     -        -        -
9             14.00    0.00     0.00     -        -        -
10            100.00   1.24     0.16     -        -        -
Method 2
1             100.00   3.01     0.14     94.00    0.94     0.37
2             100.00   1.45     0.18     20.00    0.71     0.33
3             24.00    -0.02    0.04     2.00     1.18     0.17
4             20.00    0.03     0.03     14.00    0.72     0.26
5             100.00   1.97     0.14     56.00    0.90     0.47
6             14.00    0.08     0.05     -        -        -
7             16.00    0.04     0.04     -        -        -
8             10.00    0.04     0.04     -        -        -
9             18.00    -0.00    0.06     -        -        -
10            100.00   1.94     0.13     -        -        -
Method 3
1             100.00   3.04     0.23     66.00    -        -
2             100.00   1.44     0.24     18.00    -        -
3             6.00     0.19     0.12     12.00    -        -
4             6.00     -0.49    0.12     20.00    -        -
5             100.00   2.01     0.20     76.00    -        -
6             8.00     -0.01    0.15     -        -        -
7             2.00     0.39     0.05     -        -        -
8             4.00     0.03     0.10     -        -        -
9             4.00     0.39     0.08     -        -        -
10            100.00   1.95     0.16     -        -        -

Table 7: Simulation 3 Results - Summary
                                   Method 1   Method 2   Method 3
                                   Double     Joint      Independent
                                   Penalty    Penalty    Selection
Avg Model Size Fixed (True = 4)    4.92       5.02       4.30
Avg Model Size Random (True = 2)   2.44       1.86       1.26
Percent True β Included            100.00     100.00     100.00
Percent True D Included            83.00      75.00      71.00
Percent False β Included           15.33      17.00      5.00
Percent False D Included           26.00      18.00      25.00

Table 8: Simulation 4 Results - Parameter Estimates
              Fixed Effects              Random Effects
Covariate     %        Estimate Error    %        Estimate Error
Method 1
1             100.00   2.93     0.03     100.00   0.69     0.13
2             100.00   1.44     0.03     2.00     0.04     0.00
3             12.00    0.00     0.00     8.00     0.01     0.00
4             4.00     0.00     0.00     8.00     0.02     0.00
5             100.00   1.91     0.03     100.00   0.63     0.00
6             0.00     0.00     0.00     10.00    0.02     0.10
7             0.00     0.00     0.00     4.00     0.02     0.00
8             4.00     0.01     0.01     4.00     0.02     0.00
9             10.00    0.00     0.00     2.00     0.03     0.00
10            100.00   1.89     0.03     12.00    0.01     0.00
11            4.00     0.00     0.00     -        -        -
12-50         0.00     0.00     0.00     -        -        -
Method 2
1-10          *        *        *        *        *        *
11-50         *        *        *        -        -        -
Method 3
1             100.00   3.00     0.05     *        *        *
2             100.00   1.50     0.05     *        *        *
3             0.00     0.00     0.00     *        *        *
4             0.00     0.00     0.00     *        *        *
5             100.00   2.00     0.05     *        *        *
6             0.00     0.00     0.00     *        *        *
7             0.00     0.00     0.00     *        *        *
8             0.00     0.00     0.00     *        *        *
9             0.00     0.00     0.00     *        *        *
10            100.00   2.00     0.04     *        *        *
11-50         0.00     0.00     0.00     -        -        -
*Could not complete due to computational limitations

Table 9: Simulation 4 Results - Summary
                                   Method 1   Method 2   Method 3
                                   Double     Joint      Independent
                                   Penalty    Penalty    Selection
Avg Model Size Fixed (True = 4)    4.34       *          4.00
Avg Model Size Random (True = 2)   2.50       *          *
Percent True β Included            100.00     *          100.00
Percent True D Included            100.00     *          *
Percent False β Included           3.09       *          0.00
Percent False D Included           5.55       *          *
*Could not complete due to computational limitations

Table 10: Simulation 5 Results - Parameter Estimates
              Fixed Effects              Random Effects
Covariate     %        Estimate Error    %        Estimate Error
Method 1
1             100.00   2.46     0.19     78.00    0.41     0.21
2             100.00   1.04     0.21     36.00    0.23     0.15
3             34.00    0.01     0.01     36.00    0.25     0.16
4             42.00    0.01     0.01     32.00    0.24     0.14
5             100.00   1.31     0.15     60.00    0.32     0.19
6             12.00    0.00     0.00     -        -        -
7             2.00     0.00     0.00     -        -        -
8             0.00     0.00     0.00     -        -        -
9             6.00     0.00     0.00     -        -        -
10            100.00   1.21     0.15     -        -        -
Method 2
1             100.00   3.00     0.12     94.00    0.91     0.59
2             100.00   1.47     0.16     50.00    0.71     0.48
3             44.00    0.01     0.07     44.00    0.81     0.48
4             42.00    0.03     0.07     26.00    0.73     0.36
5             100.00   1.99     0.10     42.00    0.79     0.42
6             32.00    0.04     0.06     -        -        -
7             38.00    -0.00    0.07     -        -        -
8             48.00    0.00     0.08     -        -        -
9             28.00    -0.02    0.05     -        -        -
10            100.00   1.98     0.10     -        -        -
Method 3
1             100.00   2.97     0.17     50.00    -        -
2             100.00   1.50     0.18     16.00    -        -
3             2.00     0.41     0.06     20.00    -        -
4             0.00     0.00     0.00     16.00    -        -
5             100.00   2.00     0.14     52.00    -        -
6             2.00     0.39     0.06     -        -        -
7             0.00     0.00     0.00     -        -        -
8             2.00     -0.39    0.06     -        -        -
9             0.00     0.00     0.00     -        -        -
10            100.00   1.99     0.13     -        -        -

Table 11: Simulation 5 Results - Summary
                                   Method 1   Method 2   Method 3
                                   Double     Joint      Independent
                                   Penalty    Penalty    Selection
Avg Model Size Fixed (True = 4)    4.96       6.32       4.06
Avg Model Size Random (True = 2)   2.38       2.56       1.04
Percent True β Included            100.00     100.00     100.00
Percent True D Included            69.00      68.00      51.00
Percent False β Included           16.00      38.67      1.00
Percent False D Included           34.67      40.00      17.33

Table 12: Simulation 6 Results - Parameter Estimates
              Fixed Effects              Random Effects
Covariate     %        Estimate Error    %        Estimate Error
Method 1
1             100.00   2.77     0.04     90.00    0.45     0.32
2             100.00   1.30     0.04     24.00    0.12     0.08
3             0.00     0.00     0.00     12.00    0.20     0.12
4             0.00     0.00     0.00     14.00    0.12     0.06
5             100.00   1.69     0.05     88.00    0.48     0.34
6             0.00     0.00     0.00     12.00    0.18     0.13
7             0.00     0.00     0.00     6.00     0.13     0.05
8             0.00     0.00     0.00     0.00     0.00     0.00
9             0.00     0.00     0.00     0.00     0.00     0.00
10            100.00   1.66     0.05     0.00     0.00     0.00
12            0.00     0.00     0.00     2.00     0.17     0.04
14            0.00     0.00     0.00     2.00     0.13     0.04
15            0.00     0.00     0.00     2.00     0.10     0.04
19            0.00     0.00     0.00     6.00     0.23     0.13
20            0.00     0.00     0.00     4.00     0.12     0.05
21-50         0.00     0.00     0.00     -        -        -
Method 2
1-20          *        *        *        *        *        *
21-50         *        *        *        -        -        -
Method 3
1             100.00   3.01     0.05     *        *        *
2             100.00   1.49     0.04     *        *        *
3             0.00     0.00     0.00     *        *        *
4             0.00     0.00     0.00     *        *        *
5             100.00   1.99     0.04     *        *        *
6             0.00     0.00     0.00     *        *        *
7             0.00     0.00     0.00     *        *        *
8             0.00     0.00     0.00     *        *        *
9             0.00     0.00     0.00     *        *        *
10            100.00   2.01     0.04     *        *        *
11-20         0.00     0.00     0.00     *        *        *
21-50         0.00     0.00     0.00     -        -        -
*Could not complete due to computational limitations

Table 13: Simulation 6 Results - Summary
                                   Method 1   Method 2   Method 3
                                   Double     Joint      Independent
                                   Penalty    Penalty    Selection
Avg Model Size Fixed (True = 4)    4.00       *          4.00
Avg Model Size Random (True = 2)   2.60       *          *
Percent True β Included            100.00     *          100.00
Percent True D Included            89.00      *          *
Percent False β Included           0.00       *          0.00
Percent False D Included           4.67       *          *
*Could not complete due to computational limitations

Table 14: Simulation 7 Results - Parameter Estimates
              Fixed Effects              Random Effects
Covariate     %        Estimate Error    %        Estimate Error
Method 1
1             100.00   2.70     0.14     86.00    0.15     0.14
2             100.00   1.24     0.13     52.00    0.03     0.04
3             18.00    0.08     0.05     34.00    0.02     0.01
4             24.00    0.06     0.04     42.00    0.03     0.03
5             100.00   1.76
0.12 82.00 0.19 0.17 6 10.00 0.05 0.02 - - - 7 2.00 0.15 0.02 - - - 8 10.00 0.05 0.02 - - - 9 10.00 0.05 0.02 - - - 10 100.00 1.72 0.10 - - - Method 2 1 100.00 3.39 0.06 84.00 0.86 0.42 2 100.00 1.42 0.17 20.00 0.69 0.35 3 18.00 0.02 0.05 6.00 0.58 0.15 4 14.00 0.03 0.04 10.00 0.75 0.24 5 100.00 1.96 0.11 42.00 0.89 0.48 6 14.00 0.02 0.04 - - - 7 16.00 -0.02 0.05 - - - 8 18.00 0.06 0.04 - - - 9 18.00 0.00 0.03 - - - 10 100.00 1.94 0.14 - - - Method 3 1 100.00 3.44 0.10 64.00 - - 2 100.00 1.49 0.19 16.00 - - 3 8.00 0.21 0.14 14.00 - - 4 2.00 -0.59 0.08 14.00 - - 5 100.00 2.03 0.15 56.00 - - 6 2.00 0.78 0.11 - - - 7 2.00 -0.59 0.08 - - - 8 0.00 0.00 0.00 - - - 9 0.00 0.00 0.00 - - - 10 100.00 2.02 0.16 - - - 58 Table 15: Simulation 7 Results - Summary Method 1 Method 2 Method 3 Double Joint Independent Penalty Penalty Selection Avg Model Size Fixed (True = 4) 4.74 * 4.00 Avg Model Size Random (True = 2) 2.96 * 1.60 Percent True Included 100.00 * 100.00 Percent True D Included 84.00 * 60.00 Percent False Included 12.33 * 0.00 Percent False D Included 42.67 * 14.67 59 Table 16: TAAG 2 Data - Fixed E ects Young et al. 
(2013) Results Method 1 Method 1 Re-estimate Method 3 8th Grade 11th grade ^ ^ (p-value) ^ ^ (p-value) ^ (p-value) time - -0.52(.35)* 1.01 - - COMBPAREDUC - - -0.23 - - MSQBA5A - - 0.71 - - MSQBA5B - - -1.22 - - MSQBA7 - - 0.43 - - MSQBC1 - - -0.78 - - MSQBC2 - - - - - MSQBC3 - - -0.32 - - MSQBM1 - - -0.19 0.39(.27) -0.15(.72) MSQBM2 - - -0.45 -0.10(.76) -1.40(<.001) MSQBM3 - - -0.23 -0.03(.86) -0.04(.86) MSQBM4 - - - 0.78(.05) 0.80(.08) MSQBM5 - - 0.58 - - MSQBM6 - - 0.21 - - MSQBM7 - - -0.74 - - MSQBM8 - - 0.74 - - MSQBM9 - - -0.14 - - MSQBM10 - - -0.77 - - MSQBR1 - - 1.51 - - MSQBR2 - - -0.43 - - r1 - - -1.15 1.15(.36) 0.31(.83) r2 - - 1.18 2.25(.12) -0.73(.68) r3 - - -0.30 0.02(.99) -0.83(.66) BMI - - -2.47 PFAT3 - - 1.71 -0.09(.08) -0.06(.43) MSQBA DAD MOM - - - - - MSQBOD B - - 0.51 0.12(.04) 0.05(.43) MSQBOD DA - - 0.89 - - MSQBOD DB - - 0.31 0.31(.20) 0.00(.99) MSQBOD E - - 0.25 - - MSQBOD F 0.05 0.12(.05) - -0.04(.68) 0.01(.95) MSQBOD G - - - 0.09(.34) 0.26(.03) MSQBOD H - - 1.08 -0.00(0.92) -0.31(0.01) MSQBOD I -0.09 -0.21(< :001) -0.84 -0.20(.04) 0:37(< :001) MSQBOD_JA - - -0.15 - - MSQBOD JB - - 0.17 0.00(0.92) -0.02(0.06) MSQBOD K - - 0.24 0.57(.14) -0.00(.99) 60 Table 14 { Continued Young et al. (2013) Results Method 1 Method 1 Re-estimate Method 3 8th Grade 11th grade ^ ^ (p-value) ^ ^ (p-value) ^ (p-value) MSQBOD LA - - -0.26 -0.19(.35) -0.11(0.65) MSQBOD LB - - -0.75 -0.13(.39) 0.38(.04) MSQBOD LC - - -0.52 - - MSQBOD N - - -0.43 - - MSQBOD OA - - -0.58 - - MSQBOD OB 0.22 0.43(< :001) 1.31 0.32(.08) 0.28(.22) MSQBOD OC - - -0.39 -0.08(.45) 0.07(.61) MSQB80P - - - 0.01(.80) 0.01(.89) MSQBQ1 - - 0.51 - - MSQBR34SUM - - -0.52 - - * Not selected but included in re-estimated model 61 Table 17: TAAG 2 Data - Random E ects Young et al. 
(2013) Results Method 1 Method 1 Re-estimate Method 3 8th Grade ^ ^ Selected ^ (p-value) ^ (p-value) Individual Level Intercept 3.97 7.61 - - - MSMA4 - - - -1.45(< :001) 0.11(.53) MSMA5A - - - -0.85(.20) 0.32(.28) MSMA5B - - Yes - - PPIC1C2 - - - -0.41(.22) - PPIC18A - - - -5.63(.19) PPIC19 1.47 6.68 - - 7.35(.01) PPIC21 - - - 16.76(< :001) 3.08(.15) PPIC22 - - - -2.93(0.21) - PPIC34 - - - -2.93(0.21) 5.79(.07) School Level Intercept 0.09 0.93 - - - 62 Table 18: Reference Table for Fixed and Random E ects Predictors Variable Description Fixed E ects time time = (0,1) for 8th grade and 11th grade respectively COMBPAREDUC Parents? education combined MSQBA5A Employment status: father MSQBA5B Employment status: mother MSQBA7 Receive free or low-cost lunches at school MSQBC1 Di culty getting home from school-based activity MSQBC2 Di culty getting to community-based activity MSQBC3 Di culty getting home from community-based activity MSQBM1 Perceived places to go within walking distance of home MSQBM2 Perceived sidewalks in neighborhood MSQBM3 Perceived bike/walking trails in neighborhood MSQBM4 Perceived safety to walk/jog in neighborhood MSQBM5 Perceived walkers/bikers easily seen in neighborhood MSQBM6 Perceived tra c in neighborhood MSQBM7 Perceived frame in neighborhood MSQBM8 Perceived seeing kids outside playing in neighborhood MSQBM9 Perceived interesting things to look at in neighborhood MSQBM10 Perceived well-lit neighborhood MSQBR1 Grade began current middle school MSQBR2 Currently taking PE r1 Race: white r2 Race: black r3 Race: hispanic BMI BMI BMI85 BMI above 85th percentile BMI95 BMI above 95th percentile PFAT3 Percent Fat MSQBA DAD MOM Number of parents living with MSQBOD B Average time alone per week MSQBOD DA Sports team participation at school MSQBOD DB Sports team participation outside school MSQBOD E Enjoyment of PA classes/lessons MSQBOD F Self-management strategies MSQBOD G Self-e cacy MSQBOD H Enjoyment of PA MSQBOD I Perceived barriers 63 Table 16 
{ Continued Variable Description MSQBOD JA Outcome expectancy MSQBOD JB Outcome expectancy value MSQBOD K Enjoyment of PE MSQBOD LA Positive PA school climate for teachers MSQBOD LB Positive PA school climate for boys MSQBOD LC PA norms MSQBOD N Access to recreational facilities MSQBOD OA Provides social support MSQBOD OB Friend support MSQBOD OC Family support MSQB80p Sum score on depressive scale MSQBQ1 Ever tried cigarettes MSQBR34 SUM Sum of PE class taking Random E ects (School Level) MSMA3E Percent white MSMA4 Percent free/reduced lunch MSMA5A Percent passing state math test MSMA5B Percent passing state English/reading test PDHA1 PE class size PPIC1C2 Required weeks of PE per year PPIC2 Percent students not meeting requirements PPIC17 PA school events this year PPIC19 Interscholastic and Intramural PA programs PPIC21 School ground changes in past year PPIC22 Policy changes that encourage PA PPIC24 Budget change positive for PA PPIC30 Percent bike/walk to school PPIC34 Unstructured free play before school PPIC35 Unstructured free play during school PPIC36 Unstructured free play after school PSB Numprog Number of programs in school MVPA MVPA at school 64 References Akaike, H. (1973), \Information Theory and an Extension of the Maximum Likeli- hood Principle," Second International Symposium on Information Theory. Bondell, H., Krishna, A., and Ghosh, S. K. (2010), \Joint Variable Selection for Fixed and Random E ects in Linear Mixed-E ects Models," Biometrics, 66, 1069{ 1077. Bondell, H. D. and Reich, B. J. (2008), \Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR," Biometrics, 64, 115{23. Chen, Z. and Dunson, D. B. (2003), \Random E ects Selection in Linear Mixed Models," Biometrics, 59, 762{769. Council on Sports Medicine and Fitness and Council on School Health (2006), \Ac- tive healthy living: prevention of childhood obesity through increased physical activity," Pediatrics, 117, 1834{42. Daniels, S. 
R., Arnett, D. K., Eckel, R. H., Gidding, S. S., Hayman, L. L., Ku- manyika, S., Robinson, T. N., Scott, B. J., Jeor, S. S., and Williams, C. L. (2005), \Overweight in Children and Adolescents : Pathophysiology, Consequences, Pre- vention, and Treatment," Circulation, 111, 1999{2012. Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004), \Least Angle Re- gression," The Annals of Statistics, 32, 407{451. Fan, J. and Li, R. (2001), \Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties," Journal of the American Statistical Association, 96, 1348{60. Fan, Y. and Li, R. (2012), \Variable Selecion in Linear Mixed E ects Models," Annals of Statistics, 40, 2043{2068. Foster, S. D., Verbyla, A. P., and Pitchford, W. S. (2009), \Estimation, Prediction and Inference for the Lasso Random E ects Model," Australian and New Zealand Journal of Statistics, 51, 43{61. Hall, D. B. and Praestgaard, J. T. (2001), \Order-restricted Score Tests for Homo- geneity in Generalised Linear and Nonlinear Models," Biometrika, 88, 739{751. Harville, D. (1974), \Bayesian Inference for Variance Components Using Only Error Contrasts," Biometrika, 61, 383{385. Hastie, T., Tibshirani, R., and Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction., Springer, 2nd ed. Jiang, J. J., Rao, J. S., Gu, Z., and Nguyen, T. (2008), \Fence Models for Mixed Model Selection," The Annals of Statistics, 36, 1669{92. 65 Kimm, S. Y., Glynn, N. W., Kriska, A. M., Barton, B. A., Kronsberg, S. S., Daniels, S. R., Crawford, P. B., Sabry, Z. I., and Liu, K. (2002), \Decline in physical activity in black girls and white girls during adolescence," New England Journal of Medicine, 347, 709{15. Laird, N. M. and Ware, J. H. (1982), \Random-E ects Models for Longitudinal Data," Biometrics, 38, 963{974. Lange, N. and Laird, N. M. 
(1998), \The E ects of Covariance Structure on Vari- ance Estimation in Balanced Growth-Curve Models," Journal of the American Statistical Association, 84, 241{247. Li, Y., Wang, S., Song, P. X.-K., Wang, N., and Zhu, J. (2012), \Doubly Regularized Estimation and Selection in Linear Mixed-E ects Models for High-Dimenstional Longitudinal Data," . Lin, X. (1997), \Variance Component Testing in Generlised Linear Models With Random E ects," Biometrika, 84, 309{326. Lindstrom, M. J. and Bates, D. M. (1988), \Newton-Raphson and EM algorithms for Linear Mixed-E ects Models for Repeated-Measures Data," Journal of the American Statistical Association, 83, 1014{1022. Ogden, C., Kuczmarski, R., Flegal, K., Mei, Z., Guo, S., Wei, R., Grummer-Strawn, L., Curtin, L., Roche, A., and Johnson, C. (2002), \Centers for Disease Control and Prevention 2000 growth charts for the United States: improvements to the 1977 National Center for Health Statistics Centers for Disease Control and Pre- vention 2000 growth charts for the United States: improvements to the 1977 National Center for Health Statistics Version," Pediatrics, 109. Pietil ainen, K. H., Kaprio, J., Borg, P., Plasqui, G., Yki-J arvinen, H., Kujala, U. M., Rose, R. J., Westerterp, K. R., and Rissanen1, A. (2008), \Physical inactivity and obesity: A vicious circle," Obesity, 16, 409{14. Raghunathan, T. E., Lepkowski, J., Solenberger, P. W., and Van Hoewyk, J. (2001), \A multivariate technique for multiply imputing missing values using a sequence of regression models," Survey Methodology, 27, 85{95. Raghunathan, T. E., Solenberger, P. W., and Van Hoewyk, J. (2002), \IVEware: Imputation and Variance Estimation Software," Computer Software. Sallis, J. F., Zakarain, J. M., Howell, M. F., and Hofstetter, C. R. (1996), \Ethnic, Socioeconomic, and Sex Di erences Ethnic, Socioeconomic, and Sex Di erences in Physical Activity Among Adolescents," Journal of Clinical Epidemiology, 49, 125{34. Schwartz, G. 
(1978), \Estimating the Dimension of a Model," Annals of Statistics, 6, 461{4. 66 Stram, D. O. and Lee, J. W. (1994), \Variance Components Testing in the Longi- tudinal Mixed E ects Model," Biometrics, 50, 1171{7. Tibshirani, R. (1996), \Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B (Methodological), 58, 267{88. Webber, L. S., Catellier, D. J., Lytle, L. A., Murray, D. M., Pratt, C. A., Young, D. R., Elder, J. P., Lohman, T. G., Stevens, J., Jobe, J. B., Pate, R. R., and TAAG Collaborative Research Group (2008), \Promoting physical activity in middle school girls: Trial of Activity for Adolescent Girls," Am J Prev Med, 34, 173{84. Young, D., Saksvig, B., Wu, T., Zook, K., Li, X., Champaloux, S., Grieser, M., Lee, S., and Treuth, M. (2013), \Multilevel Predictors of Physical Activity For Early, Mid, and Late Adolescent Girls," Journal of Physical Activity and Health (In press). Zou, H. (2006), \The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, 101, 1418{1429. Zou, H. and Hastie, T. (2005), \Regularization and Variable Selection via the Elastic Net," Journal of the Royal Statistical Society, Series B (Methodological), 67, 301{ 320. 67