ABSTRACT Title of Thesis: EFFECT OF INSTRUCTIONAL CONSULTATION ON ACADEMIC ACHIEVEMENT IN THIRD THROUGH FIFTH GRADE Kristi S. Maslak, Master of Arts, 2011 Thesis directed by: Associate Professor William Strein Department of Counseling and Personnel Services The present study evaluated the effect of Instructional Consultation (Rosenfield, 1995) on the academic achievement of third through fifth grade students. Students whom teachers did (n = 201) and did not (n = 8119) select as the focus of consultation were balanced on their estimated propensity to be selected using logistic regression of observed covariates. Multilevel modeling compared students in the two treatment conditions on teacher assigned grades and standardized measures of reading and math, net of prior achievement. A small, but statistically significant negative effect of the program (d = -.13) was found for standardized measures of math. No significant differences were found on the other outcome measures. Limitations include model misspecification, missing data, and treatment diffusion. EFFECT OF INSTRUCTIONAL CONSULTATION ON ACADEMIC ACHIEVEMENT IN THIRD THROUGH FIFTH GRADE by Kristi S. Maslak Thesis submitted to the Faculty of the Graduate School of the University of Maryland, College park in partial fulfillment of the requirements for the degree of Master of Arts 2011 Advisory Committee: Associate Professor William Strein, Chair Professor Sylvia Rosenfield Assistant Professor Jeffrey Harring ? Copyright by Kristi S. Maslak 2011 ii Table of Contents List of Tables ...................................................................................................................... iii Chapter 1: Rationale and Overview of Literature ............................................................... 1 Instructional Consultation Model .................................................................................... 2 Instructional Consultation Research ................................................................................ 3 Selection Bias in Quasi Experiments ............................................................................... 7 Propensity Score Analysis ............................................................................................... 8 Research Question ........................................................................................................... 9 Chapter 2: Method ............................................................................................................. 10 Participants .................................................................................................................... 10 Measures ........................................................................................................................ 11 Demographics ............................................................................................................ 11 Achievement .............................................................................................................. 13 Teacher Surveys ......................................................................................................... 14 Missing Data .................................................................................................................. 15 Data Analysis ................................................................................................................. 18 Estimating Propensity ................................................................................................ 
18 Validating the Propensity Model ............................................................................... 23 Evaluating Treatment Effects .................................................................................... 23 Chapter 3: Results ............................................................................................................. 26 Treatment Propensity ..................................................................................................... 26 Treatment Effects .......................................................................................................... 29 Chapter 4: Discussion........................................................................................................ 33 Limitations ..................................................................................................................... 34 Future Directions ........................................................................................................... 36 Appendix A: Measures Included in the Imputation Model ............................................... 39 References ......................................................................................................................... 40 iii List of Tables Table 1: Sample Characteristics ....................................................................................... 11 Table 2: Initial Differences on Measures for IC and Not IC Students .............................. 20 Table 3: Initial Differences on Missing Values for IC and Not IC Students ..................... 22 Table 4: Variables Retained Across Imputations when Estimating Propensity ................ 27 Table 5: Classification of Students as the Focus of an IC Case ....................................... 28 Table 6: Student Participants and Treatment Propensity Rages by Strata ....................... 28 Table 7: Intraclass Correlations and Reliabilities for Outcome Measures ...................... 29 Table 8: Effect of Being the Focus of an IC Case on Math Grades .................................. 30 Table 9: Effect of Being the Focus of an IC Case on Math SOL Scores ........................... 31 Table 10: Effect of Being the Focus of an IC Case on Reading Grades ........................... 31 Table 11: Effect of Being the Focus of an IC Case on Reading SOL Scores .................... 32 1 Chapter 1: Rationale and Overview of Literature Current practice goals for the specialty of school psychology are twofold: (a) to improve the academic and social-emotional development of all students, and (b) to build capacity within educational systems to foster development and prevent dysfunction (Yssledyke, et al., 2006). Consultation with classroom teachers, specialists, or administrators, rather than directly intervening with individual students, provides the most efficient means through which school psychologists can achieve both goals (Bradley-Johnson & Dean, 2000; Ehrhardt-Padgett, Hatzichristou, Kitson, & Myers, 2004; Gutkin & Curtis, 1999). Through consultation, school psychologists can help school professionals to apply the knowledge and skills needed to address and prevent academic and social-emotional difficulties among the students with whom they interact. Partly due to variability across the models currently driving the practice of consultation and the research methods used to evaluate its effectiveness, the evidence base for consultation in the schools has been characterized as ?promising but underdeveloped? 
(Erchul & Sheridan, 2008, p. 3). Most evidence in support of consultation in the schools centers on the application of behavior models of consultation that address student behavior problems and use experimental or single-subject designs that minimize the plausibility of threats to causal inference (Sheridan, Welch, & Orme, 1996). However, research on the effect of other models of consultation or their effect on academic achievement is less common, and when conducted, these studies often apply research methods that allow credible, alternative causal explanations to remain. 2 Instructional Consultation Model Instructional Consultation (Rosenfield, 1995) is a consultee-centered model of consultation that aims to improve student academic performance, decrease overall referrals and disproportionate minority referrals to special education, and to enhance teachers? instructional practices through a multi-stage problem solving collaboration between the teacher and a trained instructional consultant. According to Rosenfield (1995; 2005), student learning in the classroom results from an interaction among the student?s prior knowledge, the task demands, and the instruction delivered. When a student fails to meet teacher expectations for learning, Rosenfield?s Instructional Consultation (IC) model assumes an ecological mismatch, namely an incongruous relationship among elements of this three-part instructional triangle (Gravois, Rosenfield, & Gickling, 1999). Therefore, identifying the relational mismatch and creating balance among the student?s knowledge, task demands, and instruction are the focus of consultation within the IC model. The process of IC described by Rosenfield (1995; 2005) includes five stages: contracting, problem identification and analysis, intervention planning, intervention implementation and evaluation, and closure. At contracting, the instructional consultant responds to the teacher?s request for assistance, explains the assumptions of IC, and describes the collaborative, data-based process. During problem identification and analysis, the teacher and instructional consultant operationally define the presenting problem within the context of the instructional triangle (Gravois et al., 1999), use Instructional Assessment (Gravois & Gickling, 2008) to establish a baseline measure of the student?s current level of performance, and clarify performance goals. Throughout 3 the intervention planning and implementation stages, the teacher and instructional consultant pool knowledge about research-based instructional practices to design and implement interventions, regularly collect data to monitor student progress, evaluate intervention effectiveness, and modify operationally defined problems or interventions if needed. During the final stage, closure, the teacher and instructional consultant agree to end their current case because stated goals are successfully attained or because both agree that a referral for additional support services, such as special education, is warranted. Rosenfield and Gravois (1996) developed a multidisciplinary team model (IC Teams) to support and sustain the delivery of IC in schools. Within a school, the IC Team is composed of general educators, special educators, school administrators, school psychologists, and school social workers who are trained in the process of IC. 
According to Rosenfield and Gravois (1996; 1999), IC Teams differ from other problem solving team models in that the relationship between an individual team member and the teacher requesting assistance, rather than the team, operates as the primary forum for problem solving. Therefore, team members assume the role of case managers, and the team functions as a resource for targeted problem-solving and team member training. Through this case management approach to problem solving teams, Rosenfield, Silva, and Gravois (2008) suggest that IC Teams expands the capacity of schools to address the needs of students and staff, thereby diffusing and enhancing IC?s hypothesized treatment effects. Instructional Consultation Research Until recently, the research literature on IC and IC Teams has relied on quasi- experimental methods to evaluate the effect of the program on special education referral patterns, with limited studies of the program?s effect on student academic achievement. 4 Based on data from three separate pre-post studies, Gravois and Rosenfield (2002) conclude that IC Teams reduces the number of special education referrals and placements, and that fewer referrals and placements are made through the IC Teams process than concurrently operating pre-referral teams. In another pre-post study, Gravois and Rosenfield (2006) conclude that IC Teams decreases the risk and odds of minority special education placement compared to non-IC Teams schools. However, this research did not control several threats to internal validity common in pre-post quasi- experimental studies, namely history, maturation, selection, or interactions of selection with other threats. Using an interrupted time-series design, which is more robust in controlling threats to validity from history and maturation, but not selection, Newman (2007) did not find differences in special education referral patterns between IC Teams and non-IC Teams schools. Because these studies did not control for possible systematic differences between groups due to the method of treatment assignment, evidence for an effect of IC Teams on special education referral patterns is inconclusive. Two studies have specifically considered the effect of IC Teams on student academic achievement. Using a pre-post design, school-system-developed criterion- referenced measures of reading achievement, and a small sample size (N = 37), Levinsohn (2000) compared the differential effect on second grade reading achievement when students were served through an IC Team or a Student Support Team (SST). Although the goal of both teams was to facilitate problem-solving and address student reading difficulties, the context of problem-solving differed with IC Teams focusing on case management and SST utilizing team meetings. Levinsohn found that all students made pre-post gains in reading achievement, but gains did not differ between the IC 5 Team and SST conditions after controlling for prior student achievement. However, Levinsohn?s power to detect an effect was limited by the small sample size, and findings are not likely to generalize beyond second grade students due to sample restriction. 
Using a larger sample size (N = 5942) of fourth and fifth grade students in 28 schools and multilevel modeling to account for the nesting of students and classrooms within schools, Silva (2007) compared scores on state-wide, standardized, criterion- referenced measures of reading achievement between students attending IC Teams schools and students attending schools in which IC Teams were not implemented. While Silva did not find an effect for attending an IC Team school on reading scores on students, a significant positive effect was found on average classroom reading levels. This finding suggests that IC Teams may have a positive effect on reading achievement at the aggregated classroom level, rather than the individual level. However, comparing the coefficients and standard errors of the multilevel models that included only level-one predictors with models that included both level-one and level-two predictors suggests the presence of multicolinearity and raises questions about the validity of the inferences that can be made from the findings. Moreover, the treatment and no-treatment schools were grossly nonequivalent on several variables that are likely related to reading achievement, including percent of students receiving free and reduced meals, second language learners, and ethnic minority (non-white, non-Asian) status. Because between-school differences for these variables were not controlled at the school level, threats to validity from selection and interactions of selection with other threats remain plausible. Selection bias and its interactions with other threats to validity are salient in all five of the previously described studies on IC Teams. Therefore, systematic variation 6 between conditions remains a plausible explanation of the findings. A four-year, randomized-control-trial of the effect of IC Teams (Rosenfield & Gottfredson, 2004) has recently come to a close. Analysis of the data from this large scale study in which schools were randomly assigned to conditions has examined effects of the program on students and teachers during the final year of implementation, net of baseline performance, using both intent-to-treat-schools and intent-to-treat-students models. The intent-to-treat-schools model considers the schools that were randomized to treatment and control conditions during the baseline year and the students within those schools during the final year of implementation. Multilevel modeling of the intent-to- treat-schools model did not find any effects of IC Teams on standardized measures of reading or math achievement, and the effect on teacher assigned grades was mixed such that significant positive effects were found for grades in reading and math among third grade students, while significant negative effects were found for teacher assigned grades in reading among fourth grade students (Bruckman, Vu, & Vaganek, 2010). Analysis using the intent-to-treat-students model considers the students in the final year of implementation who attended treatment and control schools during the baseline year in which schools were randomly assigned. This analytical approach found small, but statistically significantly negative effects of IC Teams on standardized measures of reading among third and fourth grade students, and on math among third and fifth grade students (Bruckman et al., 2010). Furthermore, a statistically significant negative effect of IC Teams was found for teacher assigned grades in reading among fourth grade students. 
However, intent-to-treat-students models tend to slightly overestimate effects because differential attrition may introduce bias. 7 Regarding the effects of IC Teams on teachers, multilevel modeling of both intent-to-treat-teachers and intent-to-treat-schools models found significant positive effects of the program on teacher efficacy for general education teachers, and on collaboration for other educators (Experimental Evaluation of Instructional Consultation Teams, 2010b). Additional analysis did not find an interaction between levels of teacher use of consultation and teacher efficacy or collaboration (Experimental Evaluation of Instructional Consultation Teams, 2010a). The average percentage of general education teachers within each school who sought IC Teams support ranged from 19% (SD = 12) during Year 1 Intervention (2006- 07) to 48% (SD = 16) during Year 3 Intervention (2008-09) (Berger et al., 2010). With these low and variable levels of IC Teams use, IC was not likely to diffuse sufficiently within and across schools to yield effects on the population of students that could be measured using the intent-to-treat models. However, this level of use may be sufficient for an evaluation of IC that considers the effect of the program on the students who were the specific focus of the teacher consultation. Because the school, and not the student, was the level of random assignment and unit of comparison in the randomized-control- trial, any analysis of this data that compares students who were and were not the specific focus of teacher consultation must apply quasi-experimental research methods. Selection Bias in Quasi-Experiments The fundamental advantage of randomized experiments over all other research designs resides in the random assignment of units to conditions. When units are randomly assigned to conditions, initial differences between groups are attributed to chance and bias is diminished; therefore, differential outcomes are likely due to treatment 8 effects (Shadish, Cook, & Campbell, 2002). When units have not been randomly assigned to conditions, as is the case with quasi-experiments, selection bias, or systematic differences between conditions resulting from treatment assignment, is a possible threat to the validity of causal inference. Although it is possible, selection bias may not always remain a plausible threat to causal inference. One common method for reducing the plausibility of selection bias in quasi-experiments is matching. Matching involves equally distributing units with similar scores on a matching variable between treatment and control conditions. When scores are balanced across conditions, the matching variable no longer provides a plausible explanation for differential treatment outcomes. However, the number of required unit combinations increases exponentially with each matching variable considered, and simple matching procedures are not useful with a large number of matching variables. Propensity Score Analysis A modern statistical procedure, propensity score analysis, has been applied to quasi-experiments in fields such as medicine and psychiatry (Perkins, Tu, Underhill, Zhou, & Murray, 2000; Vanderweele, 2006), community mental health (Hodges & Grunwald, 2005; Ye & Kastukas, 2009), and education (Condron, 2008; Has-Vaughn, 2006; Hong & Yu, 20-08; Wu, West & Hughes, 2008) as a means to match subjects on a large number of selection and outcome variables. 
Propensity score analysis uses observed covariates to estimate each subject's propensity for treatment. Specifically, a propensity score is the conditional probability that participant i will receive treatment (Z_i = 1) as opposed to not receiving treatment (Z_i = 0) given an observed covariate vector, x_i, such that e(x_i) = Pr(Z_i = 1 | X_i = x_i) (Rosenbaum & Rubin, 1983; 1984). Because treatment assignment, Z, is a dichotomous variable with a binomial distribution, treatment propensity can be estimated using the following equation:

logit(Z) = α + Σ_(k=1..K) β_k X_k        (1)

where k indexes observed covariates from 1 to K, and β_k is the regression parameter for variable X_k. When participants are balanced on estimated propensity and the observed covariates that determine the propensity score covary with selection and outcome variables, conditions can be compared with equivalent expectations on observed covariates and the covariates no longer pose a selection threat to causal inference (Rosenbaum & Rubin, 1983). Indeed, when a randomized experiment and a quasi-experiment were compared with and without the use of propensity score analysis, Luellen, Shadish, and Clark (2005) found that 73-90% of the observed bias between the randomized and quasi-experiments was reduced when propensity score analysis was applied.

Research Question

The purpose of this study is to evaluate the effect of IC on academic achievement using propensity score analysis to reduce the plausibility of threats to validity in the quasi-experiment from selection bias. Specifically, after Year 1 Intervention (2006-07), did students who were the focus of teacher consultation in IC Team schools receive higher end-of-year grades and standardized achievement scores, net of prior achievement, than other students in IC Team schools who were not the focus of consultation, but were balanced on estimated propensity to be the focus?

Chapter 2: Method

Participants

Data were collected from 45 public elementary schools within a suburban county in the mid-Atlantic region of the United States as part of a four-year experimental evaluation of IC Teams (Rosenfield & Gottfredson, 2004) conducted between the 2005-06 and 2008-09 school years. Of the schools, 11 had been implementing IC Teams for one to three years prior to the experiment and were not included in the experimental evaluation. The remaining 34 schools were matched on a risk composite, and schools from each matched pair were randomly assigned to treatment and control conditions such that 17 schools were assigned to each condition. As expected, post-randomization checks found that treatment and control schools were equivalent on expectation for measured variables. The 11 non-experimental schools had higher proportions of students who were ethnic minorities, limited English proficient, and qualified for free and reduced meals (Bruckman et al., 2010; Silva, 2007; Vu et al., 2009).

The sample for this study includes third through fifth grade students (N = 8320) and their teachers (N = 374) within the 28 schools implementing IC Teams during the 2006-07 academic year (see Table 1). Kindergarten through second grade students were not sampled because a different grading rubric was applied and the students did not participate in annual standardized assessments of academic achievement. Classroom teachers self-selected to receive support from the IC Team and selected students (n = 201) to be the focus of consultation.
The teacher-selected students were identified by their unique student identification code recorded on the case tracking forms maintained in each school implementing IC Teams.

Table 1
Sample Characteristics

Teachers (N = 374), %
  Sex: Female 89; Male 11
  Ethnicity: Advantaged 87; Disadvantaged 13
  Education: Bachelor's degree 49; Master's degree 51
  Years Teaching: 1 year or less 7; 2 to 5 years 32; 6 to 10 years 26; 11 to 20 years 19; More than 20 years 16
  Age: 30 years or younger 31; 31 to 40 years 22; 41 to 50 years 17; 51 years or older 30

Students (N = 8320), %
  Sex: Female 48; Male 52
  Ethnicity: Advantaged 44; Disadvantaged 56
  Grade Level: 3rd grade 34; 4th grade 33; 5th grade 33
  Services: Free/Reduced Meal 39; Special Education 12; ESOL 21; IC Teams 2
  Age: Old for grade 18; Young for grade 2

Note. ESOL = English as a Second or Other Language. All percentages were rounded to the nearest integer.

Measures

Data for this study were measured during Pre-intervention Baseline (2005-06) and Year 1 Intervention (2006-07). Measures included school district maintained records for student and teacher demographic information, student grades, and standardized student achievement test scores. Additional measures included two teacher surveys that were administered as part of the experimental evaluation of IC Teams (Vu et al., 2009).

Demographics. School district records provided measures of student and teacher characteristics, student enrollment status, and student services received.

Characteristics of students and teachers were measured during Year 1 Intervention (2006-07) and included gender, ethnicity, and date of birth. Because Caucasian and Asian students are less likely to be referred to interventions, such as special education, than are African American or Hispanic students (Artiles, Klinger, & Tate, 2006; O'Conner & Fernandez, 2006; Reid & Knight, 2006), ethnicity was recoded to provide a dichotomous measure of Caucasian/Asian ethnicity. Measures of student and teacher age were derived by subtracting date of birth from the date of the first day of school. Student measures of old for grade and young for grade were derived by comparing each student's age of entry with grade-level age expectations. District criteria for the allowable age of entry to Kindergarten suggest an age of entry to the third, fourth, and fifth grades of 8, 9, and 10 years, respectively. Students whose age exceeded expectations were identified as old for grade, and those whose age did not reach expectations were identified as young for grade.

Student enrollment status was measured for both Pre-intervention Baseline (2005-06) and Year 1 Intervention (2006-07). Measures included the date of enrollment, grade level in which the student was enrolled, the number of days enrolled, the number of days attended, and whether the student was retained at the end of the year. A measure of being new to the district was derived by identifying students who did not have enrollment data for Pre-intervention Baseline (2005-06). Date of enrollment was recoded to provide a dichotomous measure of students who enrolled with the district after the first quarter grading period. A measure of the proportion of days enrolled was derived by dividing the number of days enrolled by the total number of school days according to the school calendar. Similarly, a measure of the proportion of days absent was derived from the number of days attended and the number of days enrolled.
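As a concrete illustration of these derived measures, the sketch below computes them with pandas. It is a minimal sketch under stated assumptions: the column names, calendar dates, grade-to-age mapping, and the old/young-for-grade cutoffs are hypothetical, and the absence proportion is computed as one minus the attendance rate, which is one plausible reading of the description above rather than the district's documented formula.

import pandas as pd

# Illustrative student records; the column names and values are hypothetical
# stand-ins for the district enrollment fields described above.
df = pd.DataFrame({
    "student_id": [1, 2, 3],
    "grade": [3, 4, 5],
    "birth_date": pd.to_datetime(["1998-01-15", "1996-11-02", "1995-07-30"]),
    "enroll_date": pd.to_datetime(["2006-08-28", "2006-11-20", "2006-08-28"]),
    "days_enrolled": [180, 120, 175],
    "days_attended": [172, 118, 160],
})

FIRST_DAY = pd.Timestamp("2006-08-28")             # assumed first day of school
END_OF_FIRST_QUARTER = pd.Timestamp("2006-11-03")  # assumed end of the first grading period
SCHOOL_DAYS = 180                                  # assumed total days on the school calendar

# Age at the first day of school and the expected age of entry by grade (8, 9, 10).
df["age"] = (FIRST_DAY - df["birth_date"]).dt.days / 365.25
expected_age = df["grade"].map({3: 8, 4: 9, 5: 10})

# Old/young for grade; the one-year tolerance is an assumed cutoff, not the district's exact rule.
df["old_for_grade"] = (df["age"] >= expected_age + 1).astype(int)
df["young_for_grade"] = (df["age"] < expected_age).astype(int)

# Enrollment-derived measures.
df["new_to_district"] = 0  # would be 1 for students with no 2005-06 enrollment record
df["entered_after_q1"] = (df["enroll_date"] > END_OF_FIRST_QUARTER).astype(int)
df["prop_days_enrolled"] = df["days_enrolled"] / SCHOOL_DAYS
# One plausible reading of the absence measure: one minus the attendance rate.
df["prop_days_absent"] = 1 - df["days_attended"] / df["days_enrolled"]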
Student services received were measured for both Pre-intervention Baseline (2005-06) and Year 1 Intervention (2006-07). The school district uses a comprehensive coding system to record a student's limited English proficiency status, qualification for free and reduced meals, and special education codes. This system was simplified to provide four dichotomous measures indicating whether or not a student was considered limited English proficient, received English as a second or other language services, qualified for free and reduced meals, or qualified for special education.

Achievement. Academic achievement was measured during both Pre-intervention Baseline (2005-06) and Year 1 Intervention (2006-07). Measures included quarterly assigned teacher grades and test scores from state-wide, annually administered, standardized achievement tests.

In the academic domains of listening, oral language, art, physical education, music, handwriting, and technology, teachers assigned grades ranging from "N" (not meeting expectations) to "S+" (outstanding). In the academic domains of reading, writing, math, social studies, and science, teachers assigned grades ranging from "F" (failure) to "A" (outstanding). Grades were recoded from nominal to numerical values in the following manner: S+ or A = 4; S, B+, or B = 3; S-, C+, or C = 2; and N, D+, or F = 1. An overall measure of student grade point average (GPA) was derived by averaging the quarterly grades received across all 12 academic domains. Specific domain measures of GPA were derived for reading, writing, math, and listening by averaging the quarterly grades within each domain.

During the spring of each academic year, students in the third through fifth grades were administered standardized, state-wide assessments for reading and math achievement that were aligned with the state's standards of learning (SOL). Students received scale scores ranging from 200-600. The SOLs were developed using Item Response Theory to equate the scales across years of implementation, but not vertically across grade levels (Virginia Department of Education, 2005). Therefore, SOL scores did not provide an absolute measure of academic achievement. Rather, within each academic domain, SOL scores provided a measure of academic achievement relative to grade-level expectations.

Teacher surveys. Teachers completed two surveys that were administered online through the school district's intranet each February as part of the four-year experimental evaluation of IC Teams (Rosenfield & Gottfredson, 2004). Response rates for both surveys exceeded 80% for the 2005-06 and 2006-07 school years included in this study (Bruckman et al., 2010; Vu et al., 2009).

The Teacher Self Report (TSR) was a 100-item survey composed of researcher-developed items as well as items adapted from Tschannen-Moran and Hoy (2001) and Bryk and Schneider (2003). Mean composites were derived from five-point Likert scale measures of a teacher's sense of efficacy when working with students, perception of collaboration among colleagues, job satisfaction, and instructional practices. Reliabilities for the Teacher Efficacy (α = .94 & .92), Collaboration (α = .88 & .82), Job Satisfaction (α = .92 & .92), and Instructional Practices (α = .90 & .91) composites were high for 2005-06 and 2006-07, respectively (Vu et al., 2009).
Additional TSR items asked teachers about their highest level of education attained, teaching licensure status and type of licensure, years working as a teacher, and years working as a teacher in the current school. Two measures were derived to indicate teachers with a Master's degree or higher, and teachers with a provisional or full elementary (pre-K to 6) license. Years working as a teacher and years working in the current school were measured in the following manner: 1 = 1 year or less, 2 = 2 to 5 years, 3 = 6 to 10 years, 4 = 11 to 20 years, and 5 = 20+ years.

For each student, teachers completed the Teacher Report on Student Behavior (TRSB). The TRSB was a 36-item survey that measured student behavior in the classroom and teacher perceptions of the relationship with the student. Two researcher-developed items measured the teacher's overall rating of a student's academic progress and classroom behavior using a five-point Likert scale. Remaining items were adapted from the Teacher Observation of Classroom Adaptation-Revised (TOCA-R; Werthamer-Larsson, Kellam, & Wheeler, 1991) and the Student-Teacher Relationship Scale (STRS; Pianta, 2001), which used four-point and five-point Likert scales, respectively. Mean composites for the TOCA-R and STRS items were derived. Reliabilities for the Internalizing Behavior (α = .85 & .85), Externalizing Behavior (α = .90 & .90), Concentration and Readiness to Work (α = .92 & .92), Closeness (α = .86 & .85), and Conflict (α = .86 & .87) composites were high for 2005-06 and 2006-07, respectively (Bruckman, Vu, & Vaganek, 2010).

Missing Data

A total of 43 variables from the student and teacher measures were identified for the propensity score estimation and treatment outcome analyses. However, of the student sample, 63% (n = 5246) were missing values on one or more variables. Furthermore, with missing data for 62% (n = 124) of the students selected as the focus of IC, there was not a significant relationship between treatment selection and missing data (r = .004). Because most statistical methods and software packages, including those planned for this study, assume complete case data, several approaches for handling the problem of missing data were considered.

First considered was listwise deletion, whereby cases with one or more missing values are excluded from analyses. When the probability of missing data for any given variable is unrelated to the value of that variable and all other variables in the analysis, a condition known as missing completely at random (MCAR), the cases with complete data are assumed to be a random subsample of the full sample and parameter estimates will be unbiased (Allison, 2002; Schafer & Graham, 2002). However, in this study, data are known to be missing on account of at least two variables identified for the propensity analysis: (a) students who were new to the school in 2006-07, and (b) students who entered after the first quarter. Therefore, the data are not MCAR. When data are not MCAR, the cases with complete data no longer represent the full sample, and listwise deletion may introduce bias. Moreover, listwise deletion would have substantially reduced the size of the sample available for analysis, thereby reducing statistical power and inflating standard errors. As such, listwise deletion was not applied. The remaining approaches that were considered for handling missing data required imputation, or the process of using observed data to fill in missing values and build complete case data (Allison, 2002).
Single imputation methods build one complete set of case data through multiple regression, maximum likelihood (ML) estimation, or the expectation maximization (EM) algorithm, for example. Of the single imputation methods, ML estimation yields relatively unbiased parameter estimates when sample sizes are large, but it requires specialized software. Although multiple regression can be implemented with general use software, it requires large sample sizes and data that are MCAR to yield unbiased estimates. The EM algorithm can be implemented with general use software and data need not be MCAR. Instead, the EM algorithm assumes that data are missing at random (MAR), such that the probability of missing data for a given variable is unrelated to the value of that variable after controlling for other variables in the analysis. While it is possible that data in this study are MAR, testing that assumption is not possible because missing values are unknown. Multiple imputation (MI) is another method that can be implemented with general use software and assumes that data are MAR, but unlike the EM algorithm, which may yield standard errors that are biased downward, MI introduces random variance that adjusts standard errors upward and reduces bias (Allison, 2002). Therefore, missing data were imputed using MI, where missing values on x are predicted from known values on other variables such that the equation for imputing missing values on x given known values on y is as follows:

x̃_i = a + b·y_i + s_(x·y) u_i        (2)

where s_(x·y) u_i is a random draw from the residual distribution of x for the ith participant. For a single data file with missing values, different draws of s_(x·y) u_i generate multiple sets of data in which missing values were imputed. When using MI data files, analyses are completed for each set of data, and results are pooled. With moderate amounts of missing values, five imputed sets are sufficient to stabilize p-values and standard errors (Allison, 2009). Although data are missing for the majority of participants in this study, the mean number of missing values per participant was moderate (M = 5.55, SD = 6.22). Therefore, five imputed data sets were generated.

As Allison (2002) recommends, the imputation model was built from the pool of dependent and independent variables for estimating propensity and treatment effects. Measures from Pre-intervention Baseline (2005-06) and Year 1 Intervention (2006-07) were included to improve model fit. Imputations for continuous variables were constrained within allowable maximum and minimum values. Furthermore, multiple category variables were dummy coded. The dummy variable with the largest imputed value was assigned as the missing category value. A summary of measures included in the imputation model can be found in Appendix A.

The problem of missing data was further addressed when estimating treatment propensity. With a sample size of N = 4500, D'Agostino and Rubin (2000) demonstrated that including missing indicator dummy variables to estimate propensity equates participant expectations on patterns of missing data as well as observed covariates. Therefore, a set of dummy variables was derived for each predictor to indicate cases that had imputed values on that predictor. These dummy variables were then included as predictors when building the treatment propensity model.
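The sketch below illustrates this imputation step in Python. It is a minimal sketch under stated assumptions: the file and column names are hypothetical, the variable pool is treated as numeric, scikit-learn's IterativeImputer (with posterior sampling) is used as a stand-in for the software actually used in the study, and the allowable ranges are approximated by each variable's observed range.

import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Assumed to contain the numeric predictor and outcome variables summarized in
# Appendix A; the file name and column set are hypothetical.
df = pd.read_csv("imputation_model_variables.csv")

# Missing-indicator dummies, derived before imputation so they can later be
# entered into the propensity model (one indicator per predictor with gaps).
indicators = df.isna().astype(int).add_suffix("_missing")

# Five imputed data sets, each from a different random draw, mirroring the
# multiple-imputation procedure described above.
imputed_sets = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    completed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    # Constrain imputations to each variable's observed range (an approximation
    # of the allowable maximum and minimum values).
    completed = completed.clip(lower=df.min(), upper=df.max(), axis=1)
    imputed_sets.append(pd.concat([completed, indicators], axis=1))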
Data Analysis

Estimating treatment propensity. Propensity scores are most commonly estimated using logistic regression due to its advantages over other methods, including classification trees and ensemble methods (Luellen, 2007). First, logistic regression is relatively robust against violations of multivariate normality and homogeneity of variance-covariance matrices. Second, logistic regression can model curvilinear relationships between observed covariates and treatment assignment. Furthermore, Luellen demonstrated through a series of Monte Carlo experiments that propensity scores derived through logistic regression were less likely to introduce selection bias and yielded a more precise adjusted estimate of treatment effects than other methods. Therefore, logistic regression was the propensity score estimation method chosen for this study.

When building the regression model, decisions for determining the pool of variables were guided by practical and theoretical relationships with treatment assignment rather than parsimony or statistical significance of a single predictor (Luellen et al., 2005; Rubin & Thomas, 1996). Because characteristics of teachers who did and did not seek IC support may differ, the initial pool of variables included teacher demographics, TSR, and TRSB measures. Furthermore, characteristics of students whom teachers did and did not choose as the focus of consultation may differ because student academic and behavioral difficulties are a focus of IC, and because students have historically been disproportionately referred to interventions, such as special education, according to student demographic characteristics (Artiles, Klinger, & Tate, 2006; O'Conner & Fernandez, 2006; Reid & Knight, 2006). Therefore, student grades, enrollment, support services received, and demographics were included when building the model. Initial differences between students who were and were not the focus of IC on the predictor variables are summarized in Table 2, and for missing value dummies in Table 3. The regression model included a pool of 66 variables, with 38 predictor variables and 28 missing value dummy indicators. Variables with the greatest likelihood of contributing to model fit were identified using the backward stepwise logistic regression procedure of SPSS Statistics 17.0 for Windows (IBM, 2008).
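As an illustration of Equation 1 in code, the sketch below fits the selection model and derives propensity strata. It is a minimal sketch under stated assumptions: it uses Python and scikit-learn rather than SPSS, the file and column names are hypothetical, and the backward stepwise variable-selection step is not reproduced.

import pandas as pd
from sklearn.linear_model import LogisticRegression

# One imputed data set with a 0/1 indicator `ic_focus` for students selected as
# the focus of consultation; the file name and column names are hypothetical.
data = pd.read_csv("imputed_set_1.csv")
predictor_cols = [c for c in data.columns if c not in ("student_id", "ic_focus")]

# Logistic regression of treatment selection on the 38 predictors and 28
# missing-value indicators, as in Equation 1.
selection_model = LogisticRegression(max_iter=1000)
selection_model.fit(data[predictor_cols], data["ic_focus"])

# Estimated propensity: Pr(Z_i = 1 | X_i = x_i).
data["propensity"] = selection_model.predict_proba(data[predictor_cols])[:, 1]

# Stratify propensity scores into quintiles and derive stratum dummies for the
# outcome models described later in this section.
data["stratum"] = pd.qcut(data["propensity"], q=5, labels=[1, 2, 3, 4, 5])
data = pd.concat([data, pd.get_dummies(data["stratum"], prefix="stratum")], axis=1)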
Table 2
Initial Differences on Measures for IC and Not IC Students

Measure                          IC M (SD)      Not IC M (SD)    t (df) or χ2       d
Student
  Gender (female)                .37 (.49)      .48 (.50)        χ2 = 8.59        -.22*
  Advantaged Ethnicity           .43 (.50)      .44 (.50)        χ2 = .14         -.02
  Limited English Proficient     .21 (.41)      .27 (.44)        χ2 = 3.57        -.14†
  Old for Grade                  .23 (.42)      .18 (.38)        χ2 = 3.25         .12†
  Young for Grade                .02 (.16)      .02 (.15)        χ2 = .08          .00
  Free and Reduced Meals         .35 (.48)      .39 (.49)        χ2 = .89         -.08
  Special Education (a)          .13 (.34)      .13 (.34)        χ2 = .02          .00
  English as Second Language     .17 (.38)      .21 (.41)        χ2 = 2.36        -.10
  New to District                .15 (.36)      .14 (.35)        χ2 = .16          .03
  Entered after 1st Quarter      .07 (.26)      .08 (.27)        χ2 = .05         -.04
  Proportion Days Enrolled (a)   .95 (.17)      .95 (.16)        t(7156) = .08     .00
  Proportion Days Absent (a)     .04 (.04)      .03 (.03)        t(7156) = -.74    .06
  Retained at End of Year (a)    .01 (.08)      .01 (.07)        χ2 = .02          .00
  Third Grade                    .25 (.44)      .34 (.47)        χ2 = 6.21        -.20*
  Fourth Grade                   .54 (.50)      .33 (.47)        χ2 = 41.63        .43***
  Fifth Grade                    .20 (.40)      .34 (.47)        χ2 = 15.55       -.32***
  Listening (b)                 2.87 (.69)     3.11 (.59)        t(7713) = 5.71   -.37***
  Math (b)                      2.71 (.98)     3.05 (.82)        t(7688) = 5.71   -.38***
  Reading (b)                   2.61 (.89)     3.03 (.80)        t(7688) = 7.15   -.50***
  Writing (b)                   2.78 (.89)     3.15 (.80)        t(7678) = 5.70   -.44***
  GPA (a)                       2.91 (.36)     3.10 (.35)        t(7149) = 6.86   -.54***
  Global Progress (a)           3.39 (1.15)    3.85 (1.06)       t(5567) = 4.91   -.42***
  Global Behavior (a)           3.85 (1.08)    4.04 (1.02)       t(5535) = 2.07   -.18*
  Concentration (a)             1.71 (.73)     2.04 (.72)        t(5588) = 5.22   -.46***
  Externalizing (a)              .33 (.48)      .29 (.46)        t(5588) = -1.00   .09
  Internalizing (a)              .61 (.48)      .57 (.51)        t(5588) = -.73    .08
  Closeness (a)                 3.14 (.73)     3.19 (.78)        t(5582) = .73    -.07
  Conflict (a)                   .68 (.94)      .53 (.90)        t(5582) = -1.84   .16†
Teacher
  Gender (female)                .81 (.40)      .89 (.31)        χ2 = 14.22       -.22***
  Advantaged Ethnicity           .92 (.28)      .87 (.34)        χ2 = 3.32        -.16†
  Age                          34.54 (9.49)   40.29 (12.40)      t(7044) = 6.24   -.52***
  Years Teaching                2.58 (.82)     3.08 (1.18)       t(7214) = 5.85   -.49***
  Years at School               2.03 (.69)     2.30 (1.07)       t(7214) = 3.59   -.30***
  Elementary Licensure          1.00 (.00)      .98 (.13)        χ2 = 3.32         .22†
  Master's Degree or Higher      .48 (.50)      .51 (.50)        χ2 = .81         -.06
  Efficacy (a)                  4.16 (.35)     4.19 (.45)        t(6030) = .70    -.07
  Collaboration (a)             3.87 (.68)     4.13 (.69)        t(6030) = 4.65   -.38***
  Job Satisfaction (a)          4.29 (.55)     4.39 (.71)        t(6030) = 1.89   -.16†
  Instructional Practices (a)   3.72 (.44)     3.84 (.52)        t(6004) = 2.96   -.25*

Note. Chi square is computed for dichotomous variables, and df = 1. Effect size, d, is calculated as d = (M_IC − M_Not IC) / σ_pooled. (a) Measured during Pre-Intervention Baseline (2005-06). (b) First quarter grades measured during Year 1 Intervention (2006-07). †p < .10. *p < .05. ***p < .001.

Table 3
Initial Differences on Missing Values for IC and Not IC Students

Measure                        IC M (SD)     Not IC M (SD)    χ2        d
Student Measures
  FARM                         .00 (.00)     .00 (.01)        .03      .00
  Special Education            .15 (.36)     .14 (.35)        .16      .03
  Proportion Days Enrolled     .15 (.36)     .14 (.35)        .16      .03
  Proportion Days Absent       .15 (.36)     .14 (.35)        .16      .03
  Retained at End of Year      .16 (.37)     .15 (.36)        .27      .03
  Listening                    .04 (.20)     .07 (.26)       3.31     -.13†
  Math                         .04 (.21)     .08 (.27)       2.82     -.17†
  Reading                      .04 (.21)     .08 (.27)       2.82     -.17†
  Writing                      .04 (.21)     .08 (.27)       3.00     -.17†
  GPA                          .14 (.35)     .14 (.35)        .00      .00
  Global Progress              .34 (.47)     .33 (.47)        .06      .02
  Global Behavior              .34 (.48)     .33 (.47)        .07      .02
  Concentration                .33 (.47)     .33 (.47)        .03      .00
  Externalizing                .33 (.47)     .33 (.47)        .03      .00
  Internalizing                .33 (.47)     .33 (.47)        .03      .00
  Closeness                    .33 (.47)     .33 (.47)        .02      .00
  Conflict                     .33 (.47)     .33 (.47)        .02      .00
Teacher Measures
  Gender                       .00 (.07)     .03 (.17)       4.14     -.23*
  Advantaged Ethnicity         .16 (.37)     .06 (.24)      32.21      .32***
  Age                          .08 (.28)     .15 (.36)       7.46     -.22*
  Years Teaching               .04 (.20)     .13 (.34)      15.44     -.53***
  Years at School              .04 (.20)     .13 (.34)      15.44     -.53***
  Elementary Licensure         .10 (.30)     .20 (.40)      12.33     -.28***
  Level of Education           .05 (.22)     .14 (.35)      14.40     -.31***
  Efficacy                     .15 (.36)     .28 (.45)      15.07     -.32***
  Collaboration                .15 (.36)     .28 (.45)      15.07     -.32***
  Job Satisfaction             .15 (.36)     .28 (.45)      15.07     -.32***
  Instructional Practices      .16 (.37)     .28 (.45)      14.51     -.29***

Note. Participants did not have missing values for gender, advantaged ethnicity, limited English proficient, old for grade, young for grade, English as second language, new to district, entered after 1st quarter, and grade level. Effect size, d, is calculated as d = (M_IC − M_Not IC) / σ_pooled. †p < .10. *p < .05. ***p < .001.
Because Cramer (1999) found that adjusting the logistic regression cut point to match sampling proportions can improve case classification when group sample sizes are grossly unbalanced, as is the case with this study, the default cut point of .50 was changed to .976 to reflect that only 2.4% of participants were the specific focus of IC.

Propensity scores were stratified into five strata, and dummy variables were derived to indicate strata for each participant. Rosenbaum and Rubin (1994) recommend stratifying the propensity score into quintiles, or five strata, so that the propensity score distribution for participants in the treatment and no-treatment groups is similar within strata. Furthermore, Cochran (1968) found that approximately 90% of the bias from a single continuous variable can be reduced with five strata.

Validating the propensity model. Following the procedure described in Luellen et al. (2005), the propensity model was validated to assess whether participants were equated across treatment groups within strata on observed covariates. Each covariate from the pool of predictor variables was subject to analysis as a dependent variable to determine if the covariate differed between groups. An ANCOVA model (2 groups × 5 strata) with all two-way interactions was evaluated, and analyses were re-run after dropping non-significant interaction terms. Statistical significance at p < .05 for both main and interaction effects was considered. Because multiple analyses were performed, it was expected that 5% of the results (n = 3 variables) would be statistically significant by chance alone.

Evaluating treatment effects. Students are nested within classroom teachers, and multilevel modeling with the Hierarchical Linear Modeling program (HLM 6.08; described in Raudenbush & Bryk, 2002) was used to evaluate the effect of IC on academic achievement during Year 1 Intervention (2006-07). Individual students comprise Level I, whereas classroom teachers comprise Level II. Analyses were conducted separately for each dependent measure, which were 4th quarter grades and SOL scores in reading and math. One factor with two levels was whether or not the student was the focus of consultation during Year 1 Intervention (2006-07). A second factor was estimated propensity with five levels to equate participants across groups on observed covariates. Third grade students did not take SOLs during Pre-intervention Baseline (2005-06); therefore, first quarter grades from Year 1 Intervention (2006-07) in the same domain as the dependent measure were used to control for prior achievement when evaluating both grades and SOL scores. Among fourth and fifth grade students, correlations between first quarter grades and prior SOL scores were moderate (r_math = .582; r_read = .487). Dichotomous variables were entered in the model uncentered, and prior achievement was entered group-mean centered. The homogeneity of student-level slopes was tested by entering predictors into the model with their slopes free to vary. Slopes that did not significantly vary between classrooms at p < .10 were fixed.
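The sketch below shows how such a two-level model might be fit. It is a minimal sketch under stated assumptions: it uses statsmodels rather than HLM 6.08, the file and column names are hypothetical, and only a random slope for prior achievement is shown, whereas the study tested every student-level slope.

import pandas as pd
import statsmodels.formula.api as smf

# One row per student; math_sol (outcome), ic_focus (0/1), stratum2..stratum5
# (propensity-stratum dummies), prior_math (first-quarter grade), and
# teacher_id (level-two grouping variable) are hypothetical column names.
df = pd.read_csv("students.csv")

# Group-mean center prior achievement within classrooms, as described above.
df["prior_math_c"] = df["prior_math"] - df.groupby("teacher_id")["prior_math"].transform("mean")

# Students nested within classroom teachers: random intercept for teacher plus
# a random slope for prior achievement; slopes that do not vary significantly
# between classrooms would be dropped from re_formula and left fixed.
model = smf.mixedlm(
    "math_sol ~ ic_focus + stratum2 + stratum3 + stratum4 + stratum5 + prior_math_c",
    data=df,
    groups=df["teacher_id"],
    re_formula="~prior_math_c",
)
result = model.fit()
print(result.summary())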
Equation 3 describes the mixed model used to evaluate the effect of being the focus of IC on academic achievement when all slopes were left free to vary:

Y_ij = β_0j + β_1j(X_1) + β_2j(X_2) + β_3j(X_3) + β_4j(X_4) + β_5j(X_5) + β_6j(X_6) + r_ij + u_0j + u_1j + u_2j + u_3j + u_4j + u_5j + u_6j        (3)

where
  Y_ij was the measure of academic achievement for the ith student in the jth classroom,
  β_0j was the unadjusted mean achievement in classroom j,
  β_1j was the effect of being the focus of an IC case,
  β_2j to β_5j were the effects of strata,
  β_6j was the effect of prior achievement,
  X_1 was student treatment assignment,
  X_2 to X_5 were student indicators for strata 2 through strata 5,
  X_6 was the student measure of prior achievement,
  r_ij was the residual error for student i in the jth classroom, and
  u_0j to u_6j were the residual errors for the jth classroom.

Chapter 3: Results

Treatment Propensity

Of the 66 variables entered into the backward stepwise logistic regression, 29 variables were retained as contributing to model fit in at least one imputed data set (see Table 4). Students in the fourth grade with a close teacher relationship, and who had a teacher with a Master's degree or higher reporting higher than average self-efficacy for teaching and job satisfaction, were more likely to be selected as the focus of consultation. Students who received free and reduced meals or ESOL services, were new to the district, maintained lower than average grades, were rated by teachers as maintaining lower than average concentration, externalizing, or internalizing behaviors, and who had a male or younger than average teacher who was either new to teaching or to teaching at the school and reported lower than average collaboration and good instructional practices, were less likely to be selected as the focus of consultation. Furthermore, students with missing data about their teacher's ethnicity were more likely to be selected as the focus of consultation, while students with missing data about their listening grades, grade point average, teacher's sense of efficacy, and teacher's teaching experience were less likely to be selected. However, these conclusions should be interpreted with caution as the retained model did not predict any students as being the focus of a teacher consultation despite the use of an adjusted cut value (see Table 5).

Although treatment selection was not adequately modeled, the estimated propensity scores could be applied in further analyses to control for the retained variables. Therefore, the propensity scores were stratified into quintiles and validation checks were conducted. Because the model failed to classify any students as being selected for treatment, the distribution of propensity scores and the number of cases within each stratum was highly skewed such that 80% of the participants (n = 6656) had less than a 3.3% chance of being selected for IC (see Table 6).

Table 4
Variables Retained Across Imputations when Estimating Propensity with Backward Stepwise Logistic Regression

Variable                        Minimum p value: β (SE), Wald, Exp(β)   |   Maximum p value: β (SE), Wald, Exp(β)
Student
  Free and Reduced Meals         -.53 (.171), 9.51**, .59    |  -.44 (.172), 6.61**, .64
  English as Second Language     -.69 (.212), 10.46**, .50   |  -.62 (.211), 8.76**, .54
  New to District               -2.45 (.831), 8.70**, .09    |  -.19 (.823), 5.60*, .14
  Proportion Days Absent (a, c) -2.47 (1.509), 2.69, 1.14    |
  Fourth Grade                   1.09 (.156), 48.92***, 2.99 |  1.04 (.154), 45.74***, 2.84
  Listening (b)                  -.35 (.124), 8.05**, .70    |  -.24 (.130), 3.50†, .78
  Math (b)                       -.24 (.113), 4.33*, .79     |  -.20 (.114), 3.09†, .82
  Reading (b)                    -.32 (.108), 9.02**, .72    |  -.23 (.109), 4.27*, .80
  Writing (b)                    -.35 (.114), 9.59**, .70    |  -.29 (.123), 5.42*, .75
  GPA (a)                        -.77 (.279), 7.62**, .46    |  -.52 (.309), 2.88†, .59
  Global Behavior (a, c)          .13 (.073), 3.04†, 1.14    |
  Concentration (a)              -.60 (.164), 13.20***, .55  |  -.30 (.167), 3.11†, .74
  Externalizing (a)              -.55 (.177), 9.59**, .58    |  -.41 (.171), 5.83*, .66
  Internalizing (a)              -.57 (.175), 10.49**, .57   |  -.27 (.157), 2.96†, .76
  Closeness (a)                   .21 (.112), 3.55†, 1.23    |   .19 (.105), 3.35†, 1.21
Teacher
  Gender (female)                -.73 (.211), 11.83***, .48  |  -.57 (.218), 6.83**, .57
  Age                            -.02 (.010), 3.37†, .98     |  -.02 (.010), 2.94†, .98
  Years Teaching                 -.52 (.078), 43.28***, .60  |  -.30 (.107), 7.94**, .74
  Years at School                -.30 (.124), 5.74*, .74     |  -.26 (.124), 4.49*, .77
  Master's Degree or Higher       .57 (.166), 11.58***, 1.76 |   .42 (.165), 6.58*, 1.53
  Efficacy (a)                    .62 (.222), 7.85**, 1.86   |   .43 (.217), 3.96*, 1.54
  Collaboration (a)              -.37 (.125), 8.78**, .69    |  -.23 (.126), 3.31†, .77
  Job Satisfaction (a)            .27 (.127), 4.58*, 1.31    |   .23 (.125), 3.51†, 1.25
  Instructional Practices (a)    -.59 (.175), 11.34***, .56  |  -.33 (.176), 3.48†, .72
Missing Indicator
  Listening (b)                 -1.86 (.455), 16.75***, .16  | -1.07 (.431), 6.22*, .34
  GPA (a)                       -2.15 (.836), 6.63**, .12    | -1.77 (.834), 4.51*, .17
  Teacher Ethnicity              1.59 (.232), 47.09***, 4.92 |  1.49 (.238), 39.37***, 4.44
  Years Teaching                -1.79 (.386), 21.48***, .17  | -1.18 (.393), 9.01**, .31
  Efficacy (a)                  -1.02 (.210), 23.48***, .36  |  -.83 (.217), 14.60***, .44

Note. Parameter estimates differed across imputations. df = 1. (a) Measured during Pre-Intervention Baseline (2005-06). (b) First quarter grades measured during Year 1 Intervention (2006-07). (c) Variable retained in fewer than two imputations. †p < .10. *p < .05. **p < .01. ***p < .001.

Table 5
Classification of Students as the Focus of an IC Case

Observed    Predicted Not IC    Predicted IC    Percent Correct
Not IC      8119                0               100
IC          201                 0               0

Note. Cut value is .976.

Table 6
Student Participants and Treatment Propensity Ranges by Strata

Strata    IC n (%)         Not IC n (%)       Propensity Minimum    Propensity Maximum
1         4.0 (1.99)       1660.0 (20.45)     .000                  .004
2         10.4 (5.17)      1653.6 (20.37)     .004                  .008
3         16.4 (8.16)      1647.6 (20.29)     .008                  .015
4         39.2 (19.50)     1624.8 (20.01)     .015                  .033
5         131.0 (65.17)    1533.0 (18.88)     .033                  .694

Note. Results are pooled across imputations.

When validating the propensity score model, results from five, or 8%, of the validation analyses were statistically significant. Variables that continued to differentiate students who were selected as the focus of consultation from students who were not selected included teacher measures of age, gender, teaching experience, collaboration, and the missing value indicator for advantaged ethnicity. Although the number of significant analyses was greater than was likely to occur through chance alone, the number of measures did not grossly deviate from chance expectations.
Therefore, the propensity model was retained without adjustments.

Treatment Effects

To determine if multilevel modeling was necessary, intraclass correlations (ICCs) were calculated separately for each outcome measure by running an unconditional model without predictors. The ICCs indicated that between-group variance accounted for 13-17% of the total outcome measure variance, and it was determined that multilevel modeling was appropriate (see Table 7).

Table 7
Intraclass Correlations and Reliabilities for Outcome Measures

Measure               τ          σ2         ICC    λ
4th Quarter Math      .11        .69        .13    .74
4th Quarter Reading   .12        .64        .16    .77
Math SOL              979.13     5566.12    .15    .76
Reading SOL           1009.47    5042.13    .17    .78

Note. Some students (n = 134) were missing unique teacher identifiers and analyses were run with N = 8186 students. Tau (τ) is the between-group variance. Sigma squared (σ2) is the within-group variance. The intraclass correlation, or ICC, is the proportion of total variance accounted for by between-group variance and is calculated as τ / (τ + σ2). λ is the lambda reliability.

The effects of being the focus of an IC case on academic achievement are summarized in Tables 8 through 11. With p < .05, the effect of being the focus of an IC case on fourth quarter math grades, fourth quarter reading grades, and reading SOL scores was not statistically significant (p = .27, .49, and .17, respectively). However, a statistically significant negative effect was found on math SOL scores (p = .04). Net of prior achievement and controlling for propensity strata, average math SOL scores were 11.54 points lower for students who were the focus of an IC case. The size of this effect was small (d = -.13).

Table 8
Effect of Being the Focus of an IC Case on Math Grades

Fixed Effect                     γ       SE     t Ratio
Intercept, γ00                   3.09    .09    36.24***
Focus of IC Case, γ01            -.07    .07    -1.11
Strata 2, γ02                    -.11    .04    -3.23*
Strata 3, γ03                    -.25    .06    -4.27*
Strata 4, γ04                    -.41    .06    -6.98***
Strata 5, γ05                    -.57    .06    -5.23*
Prior Math Achievement, γ06      .46     .05    9.92***

Random Effect                    Variance Component    df    χ2
Intercept, u0j                   .14                   26    54.64*
IC Case, u1j                     .06                   26    40.39*
Strata 2, u2j                    .03                   26    39.05*
Strata 3, u3j                    .07                   26    41.58*
Strata 4, u4j                    .10                   26    38.99*
Strata 5, u5j                    .14                   26    37.75†
Prior Math Achievement, u6j      .03                   26    41.58*
Residual Error, rij              .44

†p < .10. *p < .05. ***p < .001.

Table 9
Effect of Being the Focus of an IC Case on Math SOL Scores

Fixed Effect                     γ         SE      t Ratio
Intercept, γ00                   496.03    6.00    82.63***
Focus of IC Case, γ01            -11.54    5.54    -2.08*
Strata 2, γ02                    -5.66     3.76    -1.50
Strata 3, γ03                    -17.61    6.54    -2.69*
Strata 4, γ04                    -27.44    6.85    -4.00*
Strata 5, γ05                    -41.30    9.90    -4.17*
Prior Math Achievement, γ06      44.92     2.78    16.15***

Random Effect                    Variance Component    df     χ2
Intercept, u0j                   1443.98               108    269.72***
Strata 2, u2j                    334.94                108    139.83*
Strata 3, u3j                    711.17                108    163.46*
Strata 4, u4j                    856.04                108    157.70*
Strata 5, u5j                    799.44                108    151.14*
Prior Math Achievement, u6j      159.06                108    169.60***
Residual Error, rij              3544.37

Note. Focus of IC Case did not significantly vary between classrooms. Effect size, d, is calculated as 2t/df. †p < .10. *p < .05. ***p < .001.
Table 10
Effect of Being the Focus of an IC Case on Reading Grades

Fixed Effect                       γ       SE     t Ratio
Intercept, γ00                     3.14    .08    41.86***
Focus of IC Case, γ01              -.05    .07    -.69
Strata 2, γ02                      -.13    .05    -2.63*
Strata 3, γ03                      -.28    .05    -5.37***
Strata 4, γ04                      -.45    .07    -6.51***
Strata 5, γ05                      -.66    .07    -9.19***
Prior Reading Achievement, γ06     .33     .03    12.06***

Random Effect                      Variance Component    df     χ²
Intercept, u0j                     .17                   108    226.29***
Strata 2, u2j                      .05                   108    133.51*
Strata 3, u3j                      .11                   108    139.48*
Strata 4, u4j                      .19                   108    168.45***
Strata 5, u5j                      .18                   108    140.96*
Prior Reading Achievement, u6j     .03                   108    169.13***
Residual Error, rij                .46
Note. Focus of IC Case did not significantly vary between classrooms.
†p < .10. *p < .05. ***p < .001.

Table 11
Effect of Being the Focus of an IC Case on Reading SOL Scores

Fixed Effect                       γ        SE     t Ratio
Intercept, γ00                     478.24   5.17   92.50***
Focus of IC Case, γ01              -7.80    5.63   -1.39
Strata 2, γ02                      -5.28    3.72   -1.42
Strata 3, γ03                      -15.92   4.47   -3.57*
Strata 4, γ04                      -23.18   6.75   -3.44*
Strata 5, γ05                      -35.46   8.08   -4.39*
Prior Reading Achievement, γ06     33.40    2.56   13.03***

Random Effect                      Variance Component    df     χ²
Intercept, u0j                     1553.59               108    267.64***
Strata 2, u2j                      469.14                108    144.70*
Strata 3, u3j                      869.27                108    147.94*
Strata 4, u4j                      1145.03               108    139.03*
Strata 5, u5j                      1290.12               108    139.07*
Prior Reading Achievement, u6j     127.35                108    143.97*
Residual Error, rij                3773.60
Note. Focus of IC Case did not significantly vary between classrooms.
†p < .10. *p < .05. ***p < .001.

Chapter 4: Discussion

Instructional Consultation (Rosenfield, 1995) and its multidisciplinary, team-based delivery model, IC Teams (Rosenfield & Gravois, 1996), aim to improve student academic performance, decrease overall and disproportionate minority referrals to special education, and enhance teachers' instructional practices. Until recently, research on IC and IC Teams used quasi-experimental methods that did not adequately address selection bias when evaluating the effect of the program on special education referral practices (Gravois & Rosenfield, 2002, 2006; Newman, 2007) or on student reading achievement (Levinsohn, 2000; Silva, 2007). A randomized-control study of the effect of IC Teams has recently come to a close; however, levels of program use may not have been sufficient to yield measurable effects on the population of students (Berger et al., 2010). The present study evaluated the effect of IC on the students who were the specific focus of a teacher consultation and is the first quasi-experimental study of IC or IC Teams to reduce selection threats to causal inference by applying propensity score analysis. Specifically, this study evaluated the effect of IC on reading and math achievement in third through fifth grade by comparing students who were and were not selected as the focus of consultation but who were balanced on their estimated propensity to have been selected. The multilevel models showed no statistically significant effects of IC on standardized measures of reading or on teacher-assigned grades in reading or math. However, a small but statistically significant negative effect (d = -.13) was found on standardized measures of math. These findings of no effects or slightly negative effects of the program on academic achievement are consistent with the intent-to-treat-students analyses from the recent randomized-control evaluation of IC Teams (Bruckman et al., 2010).
While the findings do not suggest that IC has a significant positive effect on academic achievement, neither does the negative effect on a single measure of math achievement indicate that IC interfered with student learning. According to a summary program report (Gravois, Nelson, & Sherry, 2007), the majority of IC cases during Year 1 Intervention (2006-07) addressed student reading, writing, organizational, or behavioral concerns; only 25% of the cases addressed student math concerns. Given the small percentage of cases that provided direct support to teachers and indirect support to students in the area of math, measurable positive effects on math achievement were unlikely. Furthermore, the 11-point difference in average math SOL scores between selected and non-selected students represents only 3% of the total possible range of scores. Therefore, this effect, while statistically significant, is not likely to be of practical importance.

Limitations. Several problems limit the validity of inferences that can be drawn from the results of this study. First, the treatment propensity estimation model fit the data poorly. Despite an adjustment to the cut value, the model did not classify any cases as having been selected as the focus of IC, and the distribution of propensity scores was highly positively skewed. Furthermore, after being balanced on estimated propensity, selected and non-selected students differed significantly on slightly more covariates than would have been expected by chance alone. Although participants were balanced on the observed covariates included in the treatment propensity model when evaluating treatment effects, treatment selection was not modeled effectively. Therefore, systematic differences between selected and non-selected students remain a plausible explanation for the findings.

Second, both the independent and dependent measures used to estimate treatment propensity and treatment effects had cases with missing values. Although participants were balanced on patterns of missing values, and the MI procedure used to impute missing data yields parameter estimates and standard errors with less bias than single imputation methods (Allison, 2002), missing data remain a potential problem. Imputation is less reliable for variables with a high proportion of missing values. While 63% (n = 5246) of the student sample was missing values on one or more variables (M = 5.55, SD = 6.22), approximately 30% of participants were missing values for the teacher survey composites. Because of this high proportion of missing values, including the teacher survey composites when estimating treatment propensity may have introduced bias and contributed to the poor model fit. Moreover, it is possible that the data were not MAR, as is assumed for MI, thereby introducing further bias.

Third, the treatment may have diffused to non-selected students: teachers may have extended the knowledge and skills gained through consultation to improve their instructional practices and to address additional student concerns. If treatment diffusion occurred, the classroom may be a more appropriate unit of analysis than the student. With the classroom as the unit of analysis, classrooms whose teachers sought the support of the IC Team would be expected to have higher average achievement, net of prior achievement, than classrooms whose teachers did not. In fact, Silva (2007) did not find effects of attending an IC Team school on individual students, but did find a significantly positive effect of being in an IC Team school on average classroom reading achievement.
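As a purely illustrative sketch of how such a classroom-level analysis might look (hypothetical column names; this is not an analysis conducted in the present study), outcomes could be aggregated to classroom means and regressed on an indicator of whether the teacher sought IC Team support:

import statsmodels.formula.api as smf

# Aggregate student records to classroom means; teacher_used_ic is a hypothetical
# classroom-level indicator of whether the teacher sought IC Team support.
classrooms = (
    students.groupby("teacher_id")
    .agg(
        mean_reading=("reading_sol", "mean"),
        mean_prior=("prior_reading", "mean"),
        teacher_used_ic=("teacher_used_ic", "max"),
    )
    .reset_index()
)

# Classroom-level comparison, net of average prior achievement.
diffusion_fit = smf.ols("mean_reading ~ teacher_used_ic + mean_prior", data=classrooms).fit()
print(diffusion_fit.params["teacher_used_ic"])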
Finally, the measures of achievement used in this study may not have been sufficiently sensitive to detect change, and prior achievement may not have been sufficiently controlled. The SOLs broadly measure reading and math achievement, but a teacher consultation may have focused on only one of the several skills that make up the SOL score. Furthermore, first quarter domain grades were used as the covariate control for prior achievement when evaluating treatment effects because only fourth and fifth grade students had SOL scores from the previous year. However, for the fourth and fifth grade students, first quarter grades and prior SOL scores were only moderately correlated.

Future Directions. When random assignment of students to IC and non-IC conditions is not possible or practical, the utility of applying propensity scores to reduce selection threats depends on effectively modeling the selection process. While it is possible that variables related to selection were not measured in this study, the treatment propensity model considered only main effects and may not have been sufficiently complex to model the student-teacher dynamics that influenced selection. According to Rosenbaum and Rubin (1984), adding interaction or non-linear terms to the propensity model may improve model fit. Replicating the current study with a better specified treatment propensity model would improve the validity of inferences about the effect of IC on student academic achievement. Furthermore, the pursuit of a better specified treatment propensity model is an appropriate avenue for research independent of any evaluation of treatment effects. A brief review of the literature over the past 10 years did not locate any studies that attempted to quantify the dynamic process of referring students to school intervention teams. Instead, most studies simply described the referred sample or focused exclusively on referral odds based on student demographic characteristics.

The problem of missing data is common in large-scale, school-based research. Imputing values for the teacher survey composites and including those measures when estimating treatment propensity may have introduced bias, and the pattern of missing data may not have been MAR, as had been assumed. Future research should evaluate the plausibility of these threats to the validity of the present findings. If treatment effects are consistent when treatment propensity is modeled with and without the teacher survey composites, then including variables with a high percentage of missing values was not a plausible limitation. Furthermore, listwise deletion was not chosen for handling missing data because it would have substantially reduced the effective sample size and, therefore, statistical power. However, listwise deletion is more robust to violations of the MAR assumption than the EM algorithm or MI (Allison, 2002), and comparing outcomes among listwise deletion, EM algorithm, and MI data sets should be considered. If treatment propensity models and treatment effects are consistent across methods, then potential violations of the MAR assumption are a less plausible limitation of this study.
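A minimal sketch of such a comparison, assuming a list of imputed data frames and the same hypothetical column names used in the earlier sketches (none of which are the study's actual variable names):

import numpy as np
import statsmodels.formula.api as smf

formula = "math_sol ~ ic_case + C(stratum) + prior_math"
model_vars = ["math_sol", "ic_case", "stratum", "prior_math", "teacher_id"]

# Listwise deletion: drop any student with a missing value on a model variable.
listwise = students.dropna(subset=model_vars)
listwise_beta = (
    smf.mixedlm(formula, data=listwise, groups=listwise["teacher_id"]).fit().params["ic_case"]
)

# Multiple imputation: fit the same model in each imputed data set and average the
# coefficients (full pooling would also combine standard errors via Rubin's rules).
mi_betas = [
    smf.mixedlm(formula, data=d, groups=d["teacher_id"]).fit().params["ic_case"]
    for d in imputed_datasets  # assumed list of imputed data frames
]
print(listwise_beta, np.mean(mi_betas))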
Finally, future research that makes use of the data set and methods in this study to evaluate the effect of IC on student academic achievement should consider alternative student samples. First, evaluating the effect of the program exclusively among fourth and fifth grade students would allow prior SOL scores to be used as controls for prior achievement. Second, the potential problem of treatment diffusion could be evaluated by replicating the study but sampling the non-selected students from the 17 schools not implementing IC Teams. Evaluating the effect of IC on academic achievement during Year 2 Intervention (2007-08) and Year 3 Intervention (2008-09) may yield further information about the effect of treatment diffusion and levels of use.

Appendix A
Measures Included in the Imputation Model

Student measures (x = included for one of the two years; x x = included for both 2005-06 and 2006-07)
Demographic: Gender (x); Advantaged Ethnicity (x); Limited English Proficient (x); Grade Level (x); Old for Grade (x); Young for Grade (x)
Services: Free and Reduced Meals (x); Special Education (x x); English as Second Language (x); IC Case (x)
Enrollment: New to District in 2006-07 (x); Entered after 1st Quarter (x); Proportion Days Enrolled (x x); Proportion Days Absent (x x); Retained at End of Year (x x)
Achievement: 1st Quarter Listening (x x); 1st Quarter Math (x x); 1st Quarter Reading (x x); 1st Quarter Writing (x x); 4th Quarter Listening (x x); 4th Quarter Math (x x); 4th Quarter Reading (x x); 4th Quarter Writing (x x); Listening GPA (x x); Math GPA (x x); Reading GPA (x x); Writing GPA (x x); Overall GPA (x x); Math SOL (x x); Reading SOL (x x)

Teacher measures
Demographic: Gender (x); Advantaged Ethnicity (x); Age (x)
TSR: Years Teaching (x); Years at School (x); Elementary Licensure (x); Level of Education (x); Efficacy (x x); Collaboration (x x); Job Satisfaction (x x); Instructional Practices (x x)
TRSB: Global Progress (x x); Global Behavior (x x); Concentration (x x); Externalizing (x x); Internalizing (x x); Closeness (x x); Conflict (x x)

Note. Imputed measures are highlighted. Non-highlighted measures either did not have missing values or were included in the imputation model as a highly correlated predictor. Only one participant had a missing value for Free and Reduced Meals.

References

Artiles, A., Klinger, J., & Tate, W. (2006). Representation of minority students in special education: Complicating traditional explanations. Educational Researcher, 35, 3-5.

Berger, J., Vaganek, M., Yiu, H., Nelson, D., Rosenfield, S., Gravois, T., et al. (2010). Exploratory study of teacher utilization of Instructional Consultation Teams. Unpublished manuscript, University of Maryland at College Park.

Bradley-Johnson, S., & Dean, V. (2000). Role change for school psychology: The challenge continues in the new millennium. Psychology in the Schools, 37(1), 1-5.

Bruckman, K., Vu, P., Vaganek, M., Berger, J., Rosenfield, S., & Gottfredson, G. (2010). The effects of Instructional Consultation Teams on student achievement and teacher ratings. Unpublished manuscript, University of Maryland at College Park.

Bryk, A., & Schneider, B. (2003). Trust in schools: A core resource for school reform. Educational Leadership, 60, 40-44.

Condron, D. (2008). An early start: Skill grouping and unequal reading gains in the elementary years. The Sociological Quarterly, 49, 363-394.

D'Agostino, R., & Rubin, D. (2000). Estimating and using propensity scores with partially missing data. Journal of the American Statistical Association, 95, 749-759.

Ehrhardt-Padgett, G., Hatzichristou, C., Kitson, J., & Meyers, J. (2004).
Awakening to a new dawn: Perspectives of the future of school psychology. School Psychology Review, 33(1), 105-114.

Erchul, W., & Sheridan, S. (2008). Overview: The state of scientific research in consultation. In W. P. Erchul & S. M. Sheridan (Eds.), Handbook of research in consultation. New York: Lawrence Erlbaum Associates.

Experimental Evaluation of Instructional Consultation Teams. (2010a). Does level of use make a difference for teacher outcomes? (Research Note 2). College Park, MD: University of Maryland, Department of Counseling and Personnel Services.

Experimental Evaluation of Instructional Consultation Teams. (2010b). Intent-to-treat-school and intent-to-treat-teacher (maximum-opportunity for exposure) perspectives (Research Note 1). College Park, MD: University of Maryland, Department of Counseling and Personnel Services.

Gravois, T., & Gickling, E. (2008). Best practices in curriculum-based assessment. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology V (pp. 885-898). Bethesda, MD: National Association of School Psychologists.

Gravois, T., Nelson, D., & Sherry, E. (2007). Prince William County Instructional Consultation Team Consortium: 2006-07 end of year progress report. Prince William County, VA.

Gravois, T., & Rosenfield, S. (2002). A multi-dimensional framework for evaluation of instructional consultation teams. Journal of Applied School Psychology, 19(1), 5-29.

Gravois, T., & Rosenfield, S. (2006). Impact of instructional consultation teams on the disproportionate referral and placement of minority students in special education. Remedial and Special Education, 27(1), 42-52.

Gravois, T., Rosenfield, S., & Gickling, E. (1999). Instructional consultation teams: Training manual. College Park, MD: University of Maryland, Instructional Consultation Lab.

Gutkin, T., & Curtis, M. (1999). School-based consultation theory and practice: The art and science of indirect service delivery. In C. R. Reynolds & T. B. Gutkin (Eds.), Handbook of school psychology (3rd ed.). New York: John Wiley.

Hahs-Vaughn, D., & Onwuegbuzie, A. (2006). Estimating and using propensity score analysis with complex samples. The Journal of Experimental Education, 75, 31-65.

Hodges, K., & Grunwald, K. (2005). The use of propensity scores to evaluate outcomes for community clinics: Identification of an exceptional home-based program. The Journal of Behavioral Health Services & Research, 32(3), 294-305.

Hong, G., & Yu, B. (2008). Effects of kindergarten retention on children's social-emotional development: An application of propensity score method to multivariate, multilevel data. Developmental Psychology, 44(2), 407-421.

Levinsohn, M. (2000). Evaluating instructional consultation teams for student reading achievement and special education outcomes. Dissertation Abstracts International, 62(01), 128A. (UMI No. 3001440)

Luellen, J. (2007). A comparison of propensity score estimation and adjustment methods on simulated data. Dissertation Abstracts International, 68(5-B), 3433. (UMI No. 3263706)

Luellen, J., Shadish, W., & Clark, M. (2005). Propensity scores: An introduction and experimental test. Evaluation Review, 29, 530-558.

Newman, D. (2007). An investigation of the effect of instructional consultation teams on special education placement rate. Unpublished master's thesis, University of Maryland at College Park.

O'Connor, C., & Fernandez, S. (2006). Race, class, and disproportionality: Reevaluating the relationship between poverty and special education placement.
Educational Researcher, 35, 6-11.

Perkins, S., Tu, W., Underhill, M., Zhou, X., & Murray, M. (2000). The use of propensity scores in pharmacoepidemiologic research. Pharmacoepidemiology and Drug Safety, 9, 93-101.

Pianta, R. (2001). STRS: Student-Teacher Relationship Scale professional manual. Odessa, FL: Psychological Assessment Resources.

Raudenbush, S., & Bryk, A. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.

Reid, K., & Knight, M. (2006). Disability justifies exclusion of minority students: A critical history grounded in disability studies. Educational Researcher, 35, 18-23.

Rosenbaum, P., & Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.

Rosenbaum, P., & Rubin, D. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79, 516-524.

Rosenfield, S. (1995). Instructional consultation: A model for service delivery in the schools. Journal of Educational and Psychological Consultation, 6, 297-316.

Rosenfield, S. (2005). Best practices in instructional consultation. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology IV (pp. 609-623). Bethesda, MD: National Association of School Psychologists.

Rosenfield, S., & Gottfredson, G. (2004). Evaluating the efficacy of Instructional Consultation Teams. Unpublished grant proposal, University of Maryland, Department of Counseling and Personnel Services. Retrieved June 4, 2008, from http://www.icteams.umd.edu/Proposal%20for%20Project.pdf

Rosenfield, S., & Gravois, T. (1996). Instructional consultation teams: Collaborating for change. New York: Guilford.

Rosenfield, S., & Gravois, T. (1999). Working with teams in the school. In C. R. Reynolds & T. B. Gutkin (Eds.), Handbook of school psychology (3rd ed., pp. 1025-1040). New York: John Wiley.

Rosenfield, S., Silva, A., & Gravois, T. (2008). Bringing instructional consultation to scale: Research and development of IC and IC Teams. In W. P. Erchul & S. M. Sheridan (Eds.), Handbook of research in consultation. New York: Lawrence Erlbaum Associates.

Rubin, D., & Thomas, N. (1996). Matching using propensity scores: Relating theory to practice. Biometrics, 52, 249-264.

Schafer, J., & Graham, J. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147-177.

Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and quasi-experimental designs for generalized causal inference. New York: Houghton Mifflin Company.

Sheridan, S., Welch, M., & Orme, S. (1996). Is consultation effective? A review of outcome research. Remedial and Special Education, 17(6), 341-354.

Silva, A. (2007). A quasi-experimental evaluation of reading and special education outcomes for English language learners in instructional consultation schools. Unpublished doctoral dissertation, University of Maryland at College Park.

Tschannen-Moran, M., & Hoy, A. (2001). Teacher efficacy: Capturing an elusive construct. Teaching and Teacher Education, 17, 783-805.

VanderWeele, T. (2006). The use of propensity score methods in psychiatric research. International Journal of Methods in Psychiatric Research, 15(2), 95-103.

Virginia Department of Education. (2005). Virginia Standards of Learning Assessments technical report: 2003-2004 administration. Retrieved December 13, 2009, from http://www.doe.virginia.gov/VDOE/Assessment/home.shtml
Vu, P., Bruckman, K., Koehler, J., Kaiser, L., Rosenfield, S., Nelson, D., et al. (2009). The effect of Instructional Consultation Teams on teacher beliefs and instructional practices. Unpublished manuscript, University of Maryland at College Park.

Werthamer-Larsson, L., Kellam, S., & Wheeler, L. (1991). Effect of first-grade classroom environment on shy behavior, aggressive behavior, and concentration problems. American Journal of Community Psychology, 19, 585-602.

Wu, W., West, S., & Hughes, J. (2008). Effect of retention in first grade on children's achievement trajectories over 4 years: A piecewise growth analysis using propensity score matching. Journal of Educational Psychology, 100(4), 727-740.

Ye, Y., & Kaskutas, L. (2009). Using propensity scores to adjust for selection bias when assessing the effectiveness of Alcoholics Anonymous in observational studies. Drug and Alcohol Dependence, 104, 56-64.

Ysseldyke, J., Burns, M., Dawson, P., Kelley, B., Morrison, D., Ortiz, S., et al. (2006). School psychology: A blueprint for training and practice III. Bethesda, MD: National Association of School Psychologists.