ABSTRACT

Title of dissertation: ERRORS IN HOUSING UNIT LISTING AND THEIR EFFECTS ON SURVEY ESTIMATES
Stephanie Eckman, Doctor of Philosophy, 2010
Dissertation directed by: Dr. Frauke Kreuter, Joint Program in Survey Methodology

In the absence of a national population or housing register, field work organizations in many countries use in-field housing unit listings to create a sampling frame for in-person household surveys. Survey designers select small geographic clusters called segments, and specially trained listers are sent to the segments to record the address and/or description of every housing unit. These frames are then returned to the central office, where statisticians select a sample of units for the survey. The quality of these frames is critically important for overall survey quality: a well-designed and well-executed sample, efforts to reduce nonresponse and measurement error, and high-quality data editing and analysis cannot make up for errors of undercoverage and overcoverage on the survey frame.

Previous work on housing unit frame quality has focused largely on estimating net coverage rates and identifying the types of units and segments that are vulnerable to undercoverage. This dissertation advances our understanding of the listing process, using sociological and psychological theories to derive hypotheses about lister behavior and frame coverage. Two multiple-listing datasets support tests of these hypotheses. Chapter 1 demonstrates that two well-trained and experienced listers produce different housing unit frames in the same segments. Chapter 2 considers listing as a principal-agent interaction, but finds limited support for the ability of this perspective to explain undercoverage in traditional listing. Chapter 3 has more success explaining the mechanisms of error in dependent listing: listers tend not to correct the errors of inclusion and exclusion on the frame they update, leading to undercoverage and overcoverage.
Chapter 4 tests for bias due to the observed undercoverage, but finds little evidence that lister error would lead to substantial changes in survey estimates.

Housing unit listing is a complex task that deserves more research in the survey methods literature. This work fills in some of the gaps in our understanding of the listing process, but also raises many questions. The good news for survey researchers is that the listers' errors appear to be somewhat random with respect to the household and person characteristics, at least for the variables and datasets studied in this work.

ERRORS IN HOUSING UNIT LISTING AND THEIR EFFECTS ON SURVEY ESTIMATES

by Stephanie Eckman

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2010

Advisory Committee:
Dr. Frauke Kreuter, Chair/Adviser
Dr. Katharine G. Abraham
Dr. J. Michael Brick
Dr. Colm A. O'Muircheartaigh
Dr. Melissa A. Milkie

© Copyright by Stephanie Eckman 2010

Dedication

I dedicate this dissertation to my parents, Nancy and Clif Eckman, who let me get my own library card when I was five, taught me to always sit in the front row, and put up yard signs to alert the neighbors of my academic achievements. This degree is the culmination of your years of encouragement.

Acknowledgments

I received help from many people in preparing this dissertation. I will certainly leave out a few, who I hope will accept my apologies.

Funding to support my own time while working on this research, as well as for the data collection, came from many sources: the Census Bureau Dissertation Fellowship, the Centers for Disease Control and Prevention Grants for Public Health Dissertations, the Maryland Population Research Center, the Charles Cannell Fund in Survey Methodology, and the Rensis Likert Fund in Research in Survey Methodology.
This research would not have been possible without the support of these funders.

I was very lucky to spend the fall of 2009 at the Swiss Institute of Technology (ETH) in Zürich, in the Chair of Professor Diekmann, who welcomed me into his research community. He and his students, especially Reto Meyer, were supportive and encouraging as I struggled with address matching and Swiss German.

Thanks to Bob Groves for pestering me to explore this topic as a dissertation and for making the NSFG available to me to carry out the research. He put me in touch with the NSFG team, who were incredibly helpful to a stranger who wanted to get her hands on their most sensitive data. Nicole Kirgis, Shonda Kruger-Ndiaye, and Jim Lepkowski were particularly patient with me. Thanks also to Dr. Mosher at NCHS for permitting me to run my study in his survey and use the data before its public release.

Tommy Wright at the Statistical Research Division of the Census Bureau allowed me to access desk space and data at the Census Bureau. He put me in touch with Jim Liu, Cliff Loudermilk, and Aliza Kwiat, who generously shared their data with me and answered my nearly endless questions.

The Joint Program in Survey Methodology is a wonderful place to work and research. I enjoyed discussions over countless lunches in the conference room and have learned from all of the students and professors. When Roger was recruiting me to join the program, he said his motto was "I love you, now get out of here," and I have benefitted from exactly that environment at JPSM.

Particular thanks to Colm O'Muircheartaigh for getting me involved in coverage research during my first few months at NORC. He showed me one could make a career out of fun survey methods research and, over the course of many nice lunches, convinced me I could do it too.

I am especially thankful to the members of the Kreuter Research Group over the years: Michael Lemay, Carolina Casas-Cordero, and of course Frauke Kreuter herself.
My best education in how to be a researcher occurred in these weekly meetings. Many thanks to each of you for creating such a special environment. Both Frauke and Colm have been wonderful mentors and have also become good friends. Thank you, thank you!

Table of Contents

List of Tables
List of Figures

1 Stochastic Coverage: Inter-lister Agreement in Repeated Housing Unit Listing
  1.1 Background
  1.2 Data & Methods
    1.2.1 Construction of Agreement Indicator
    1.2.2 Correlates of Lister Disagreement
    1.2.3 Models
  1.3 Results
    1.3.1 Multi-Level Models of Lister Agreement
  1.4 Discussion and Conclusion
2 Mechanisms of Undercoverage in Traditional Housing Unit Listing
  2.1 Background & Hypotheses
  2.2 Data
  2.3 Analysis Methods
  2.4 Results & Discussion
    2.4.1 Comparison to Previous Results
    2.4.2 Tests of Hypotheses
  2.5 Conclusions
3 Confirmation Bias in Dependent Housing Unit Listing
  3.1 Background & Hypotheses
  3.2 Data
    3.2.1 Manipulation of Input List
  3.3 Analysis Methods
    3.3.1 Testing for Failure-to-Add Error
    3.3.2 Testing for Failure-to-Delete Error
  3.4 Results
    3.4.1 Failure-to-Add
    3.4.2 Failure-to-Delete
    3.4.3 Summary and Interpretation of Results
  3.5 Discussion & Conclusions
4 Bias Due to Undercoverage in Housing Unit Frames
  4.1 Introduction
  4.2 Data
    4.2.1 Survey Background
    4.2.2 Variable Selection
    4.2.3 Undercoverage in NSFG Listing
  4.3 Methods
    4.3.1 Direct Approach to Bias Estimation
    4.3.2 Indirect Approach to Bias Estimation
      4.3.2.1 Listing Propensity Models
    4.3.3 Variance of Bias Estimates
  4.4 Results of Bias Analyses
  4.5 Discussion & Conclusion
5 Conclusions
A Appendix: Coding of Quality of Listing Maps
B Appendix: Logistic Regression Models of Traditional Listing Propensity
C Appendix: Matching Addresses in NSFG Listing
  C.1 Matching Input List to L3
    C.1.1 Step 1: Match by ID
    C.1.2 Step 2: Automatic Address Match
    C.1.3 Step 3: Manual Address Match
    C.1.4 Quality Checks
  C.2 Matching Three Frames
D Appendix: Interviewer Questionnaire
E Appendix: Development of Housing Unit Level Characteristics in NSFG Dataset
F Appendix: Debriefings with SRC Listers and Interviews
  F.1 Questions to Guide Debriefing Discussions
  F.2 Quotes from Transcripts of Debriefing Discussions
    F.2.1 Quotes from Lister A
    F.2.2 Quotes from Lister B
    F.2.3 Quotes from Lister D
    F.2.4 Quotes from Lister E
    F.2.5 Quotes from Lister F
G Appendix: Census 2010 Address Canvassing Whistleblower Post
H Appendix: Content of NSFG Cycle 7 Female and Male Questionnaires
  H.1 Female Questionnaire
  H.2 Male Questionnaire

List of Tables

1.1 Matches Identified, by Matching Step
1.2 Text Matches in Seven Passes
1.3 Summary Statistics on Variables Available in Agreement Model
1.4 Agreement Rates by Housing Unit Characteristics
1.5 Model of Probability of Agreement Between Two Listers
2.1 Number of Housing Units Listed in First and Second Listing, by Segment
2.2 Comparison of Listing Rates by Housing Unit and Segment Characteristics
2.3 Traditional Listing Linear Probability Models, Selected Cases Only
3.1 Number of Cases Deleted from Input to Third Listing, by Type
3.2 Number of Cases Added to Input to Third Listing, by Type
3.3 Level of Manipulation in Four Segment Sets
3.4 Percent of Unmanipulated and Deleted Units Listed in Dependent Listing
3.5 Failure-to-Add: Difference-in-Differences, in Percentage Points
3.6 Failure-to-Add: Listing Propensity Models on Unmanipulated and Deleted Cases
3.7 Percent of Added Units Listed in Dependent Listing
3.8 Failure-to-Delete: Comparison of Listing Rates between Traditional and Dependent Listing for Added Cases
3.9 Failure-to-Delete: Deletion Propensity Models on Added Cases
3.10 Comparison of Listing Rates of Manipulated Cases in Traditional and Dependent Listing, in Percentage Points
4.1 Sample Performance in Quarter 12 of NSFG, Selected and Matched Segments
4.2 Variables Used in Bias Analysis
4.3 Percent of Cases Listed by Second and Third Listings, by Survey Stage
4.4 Listing Propensity Model
4.5 Bias Methods with 99% Confidence Intervals for All Variables and Methods
B.1 Traditional Listing Propensity Models, Selected Cases Only
C.1 Parsing and Standardizing Street Variable
C.2 Automatic Matches Found, by Pass
C.3 Manual Matches Found
C.4 Number of Housing Units Listed for Each of Three Listings, by Segment

List of Figures

1.1 Distribution of Demographic Characteristics of Blocks in Census Bureau's Double-Listed Dataset
1.2 Distribution of the Distance Between Matched Pairs of Housing Units, by Matching Step
1.3 Block-level Agreement Rates. Horizontal Axis is the 215 Blocks, Sorted by Agreement Rate.
2.1 Ratio of Number of Housing Units Listed by First Lister to Number Listed by Second, by Segment
4.1 Distribution of Predicted Listing Propensities, by Listing Method
4.2 Estimates of Bias in Survey Variables Due to Undercoverage, by Listing Method and Estimation Method
A.1 Example SRC Listing Map

Chapter 1
Stochastic Coverage: Inter-lister Agreement in Repeated Housing Unit Listing

Just as interviewers can introduce variability into survey estimates when they recruit respondents (O'Muircheartaigh and Campanelli, 1998) and administer questionnaires (Schnell and Kreuter, 2005), listers can introduce variance when they create housing unit frames. If the frames created by different listers in the same blocks are not the same over replications of the listing process, then the collected survey data will vary as well. This paper introduces the stochastic view of housing unit listing, which considers every unit in the target population to have a propensity to be covered.
Some housing units will be listed on nearly every frame (i.e., have a listing coverage propensity near 100%) and others will be missing from most frames (a listing coverage propensity near 0%).[1]

This paper uses a dataset collected by the Census Bureau which contains two listings of a sample of areas. Two field representatives listed each block using the same methods. Analysis reveals a good deal of inter-lister disagreement: the two listers do not create the same frame. The extent of disagreement is worrisome for all studies which use listing to create sampling frames.

[1] Coverage propensity is analogous to response propensity (Dalenius, 1983; Oh and Scheuren, 1983; Bethlehem and Kersten, 1985; Groves and Couper, 1992; Lessler and Kalsbeek, 1992), with one difference: cases can be inappropriately covered (that is, overcovered), but the concept of response propensity does not allow for inappropriate response.

1.1 Background

Most studies of housing unit listing have focused on estimating net coverage rates: the number of housing units listed on the frame divided by the number that should have been listed. A net coverage rate less than 100% indicates that undercoverage exceeds overcoverage, and this is usually the case. Estimates of net coverage in listed housing unit frames range from 80% to more than 99% (Manheimer and Hyman, 1949; Kish and Hess, 1958; Hawkes, 1986; Jacobs, 1986; Joncas, 1985; Childers, 1992; Barrett et al., 2002; Pearson, 2003; Thompson and Turmelle, 2004; Turmelle et al., 2005; O'Muircheartaigh et al., 2006, 2007). Units in multi-unit buildings were found to be both undercovered and overcovered, as are renter-occupied units, vacant units, and trailers (Subcommittee on Survey Coverage, 1990; Bureau of the Census, 1993; Childers, 1992; Barrett et al., 2002; Eckman and Kreuter, 2010).
Several studies found low-income areas, rural areas, and oddly-shaped segments to be undercovered (Subcommittee on Survey Coverage, 1990; O'Muircheartaigh et al., 2006, 2007).[2] The contribution of lister characteristics to coverage has rarely been studied. An exception is Pearson (2003), who found only weak support for the hypothesis that senior field staff make fewer errors of undercoverage and overcoverage than others.

A few coverage studies have made use of the stochastic conception of coverage, either explicitly or implicitly. Chang and Kott (2004) and Chhikara et al. (2007) use logistic regression to calculate the probability of a farm being undercovered in the Census of Agriculture. Several studies use logistic regression to model the likelihood that individuals will be undercovered in the Current Population Survey or the decennial census (Fay, 1989; Fein, 1990; Alho et al., 1993). Much of the literature on coverage in the decennial census uses the dual-system technique to estimate the number of individuals or housing units missed in both operations (for dual-system estimates of housing unit undercoverage, see Childers, 1992, 1993; Barrett et al., 2002). The notion of coverage propensity underlies these studies (Wolter, 1986; Panel on Coverage Evaluation and Correlation Bias in the 2010 Census, National Research Council, 2008). However, the focus of these studies is to estimate the number of housing units missed in both the initial and follow-up efforts, not to explore the disagreement between the listers as an object of interest in itself.

[2] Kish and Hess's (1958) findings at the block and segment level are different, but their method of creating a gold-standard frame makes statements about coverage at these levels inappropriate, as they acknowledge.
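The dual-system logic behind these census coverage studies can be illustrated with the textbook Lincoln-Petersen capture-recapture estimator. This is a simplified sketch, not the Census Bureau's actual dual-system estimator, and the counts used below are invented for illustration:

```python
def dual_system_estimate(n1: int, n2: int, m: int) -> float:
    """Lincoln-Petersen dual-system estimate of the total number of units.

    n1: units found by the first operation (e.g., the initial listing)
    n2: units found by the second, independent operation
    m:  units found by both operations (matched cases)
    """
    if m == 0:
        raise ValueError("no matched units; estimate is undefined")
    return n1 * n2 / m

# Invented example: 900 and 950 units listed, 870 matched across the two.
n_hat = dual_system_estimate(900, 950, 870)
# Estimated number of units that BOTH operations missed:
missed_by_both = n_hat - (900 + 950 - 870)
```

The estimator assumes the two operations are independent and that matching is error-free; violations of either assumption (correlation bias, match error) are exactly what the cited literature wrestles with.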
Using frames created by two sets of listers using identical listing methods in the same areas, this paper takes a first step towards understanding listing error by investigating inter-lister disagreement: how different are two frames created by two listers in the same segments using the same methods? I investigate agreement (and disagreement) between the listers without concern for which is accurate. Either of the two listings could have served as a survey frame, and I am interested in how and why they differ. This paper explores the degree of disagreement between two listers' housing unit frames and the correlates of this disagreement. Because this paper is the first contribution to look carefully at inter-lister disagreement, the analyses presented below are largely exploratory and descriptive. The next chapters take a more theoretical approach to exposing the mechanisms of lister error.

1.2 Data & Methods

In 2007, as part of its ongoing effort to evaluate the coverage of the Master Address File (MAF), the Demographic Statistical Methods Division of the Census Bureau listed 5,700 census blocks. This listing effort was called the Frame Assessment for Current Household Surveys (FACHS), and it used the standard Census Bureau listing methods. When the MAF contained any addresses for the selected block, the listers were given these addresses in the listing software and updated this list in the field (this method of listing is called dependent listing). When the MAF contained no addresses, listers traveled around the block and created a list of all housing units (this method is called traditional listing). The in-field listing (traditional or dependent) was assumed to be the gold-standard frame, and the coverage of the un-updated MAF was evaluated against this standard.[3]

As a check on the assumption that the field listing was in fact the gold-standard frame, the Census Bureau repeated the in-field listing in a subsample of 301 of the 5,700 blocks.
A second lister, different from the first, was sent to each of these selected blocks to list it again. If the first lister used traditional listing, so did the second. If the first lister used dependent listing, the second lister did so as well, and the input listing to the second listing was identical to the input to the first. (That is, the second listing was not dependent on the first.) The second listing was completed quickly, always within five months of the first (personal communication with Clifford Loudermilk, Demographic Statistical Methods Division, Census Bureau). All listers who participated in the listings were trained Census Bureau field representatives. This paper uses the data from this subset of the FACHS study to explore what factors lead two listers doing the same task to create different housing unit frames.

[3] For more background on the Master Address File, the FACHS evaluations, and other assessments of the MAF's coverage properties, see Kennel (2007); Liu (2008, 2009); Loudermilk and Li (2009). This paper does not address the important question of MAF coverage.

The subsample for this double-listing exercise is not nationally representative. Blocks in which the United States Postal Service's Delivery Sequence File, a major component of additions to the MAF, showed no growth were excluded from selection. (See Kwiat (2009) for details on the selection of the 301 blocks.) Four of the blocks selected for double listing were not listed a second time due to staffing constraints. Of those fielded twice, not all contained housing units. For these reasons, 215 of the 301 selected blocks are available for my analyses. Thirty-seven blocks used traditional listing because there were no cases on the Master Address File to serve as an input frame, and 178 blocks used dependent listing. These 215 blocks are in 43 states. The number of selected blocks per state ranges from one to 30; in 14 states there is just one block.
The completed blocks are in 156 block groups, meaning the sample is not very clustered. Of the blocks, 124 are individual selections that do not border any other selected blocks. All other block groups that contain a selected block contain just two or three, except one block group where 22 blocks were selected. Figure 1.1 gives the distribution of selected housing unit and demographic characteristics for the blocks.

[Figure 1.1: Distribution of Demographic Characteristics of Blocks in Census Bureau's Double-Listed Dataset. Variables shown: percent of units that are trailers, percent of units added, percent of units in multi-unit buildings, percent of population African-American, percent of population Hispanic.]

Both listings used the Census Bureau's listing software, which provides listers with a map of the blocks they are to list and displays the addresses on the input listing (if any). Listers can add units that are not on the input listing, delete units, move units from one block to another, or simply indicate that the unit is correct on the input listing. Listers can also move housing units by creating new mapspots. As they confirm and add units, they record whether the unit is a trailer and whether it looks to have been constructed after 2000. When adding units, listers parse addresses into several fields, which greatly simplifies the task of matching the two listings to identify which housing units were listed by both listers.

For each of the 215 completed blocks, I have two housing unit frames, created by two listers using the same methodology. I use these frames to identify the housing unit and block characteristics that correlate with agreement and disagreement between the listers.
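Once the two frames for a block have been matched (the matching itself, described next, is the hard part), the agreement comparison reduces to set operations on unit identifiers. A minimal sketch, with hypothetical identifiers and agreement defined as units listed by both over units listed by at least one:

```python
# Hypothetical frames for one block, after matching has assigned a common
# identifier to listed cases that refer to the same housing unit.
frame_lister1 = {"maf_001", "maf_002", "maf_003", "add_101"}
frame_lister2 = {"maf_001", "maf_002", "add_101", "add_102"}

matched = frame_lister1 & frame_lister2        # listed by both listers
unique_units = frame_lister1 | frame_lister2   # listed by at least one lister
agreement_rate = len(matched) / len(unique_units)

# Unit-level agreement indicator: 1 if both listers listed the unit, else 0.
agreement = {u: int(u in matched) for u in unique_units}
```

In this toy block, three of five unique units are listed by both listers, so the agreement rate is 0.6; the unit-level indicator is the dependent variable the chapter later models.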
Table 1.1: Matches Identified, by Matching Step

  Step                                      Matches   Percent
  Step 1: Units from MAF                    44,753    84.3%
  Step 2: Address matches, in SAS           8,249     15.5%
  Step 3: Address matches, manual review    93        0.2%
  Total                                     52,995

1.2.1 Construction of Agreement Indicator

The first step in preparing the data for analysis was matching the two listings of each block to identify agreement and disagreement between the two listers. There were 59,363 housing units listed by the first lister and 60,943 units listed by the second lister.[4] These listed cases represent 67,205 unique housing units. To construct the agreement indicator, I matched these two frames against each other in a multi-step process. This sort of matching work always requires judgments. In this section, I describe my matching protocol in detail. Other researchers might make different judgments and would thus create a slightly different agreement indicator, which would impact the results given below. However, I feel that all of the matching decisions I made are justifiable and defensible.

I matched only housing units that at least one lister included on the frame. Housing units on the MAF that were deleted by both listers as nonexistent or out-of-block were not matched, even though these two deletions are a kind of agreement. All matching was performed within blocks; i.e., if two listers included units with the same address but in different blocks, I do not consider these units a match.[5] I permitted only one-to-one matches.

[4] The division into first and second listers is based on the times at which the listings were performed: the first listing is the one that happened earlier.

[5] Aliza Kwiat of the Demographic Statistical Methods Division of the Census Bureau, working with the same repeated listing dataset, takes a slightly different approach to the matching process, and thus her results do not match my results in this paper (Kwiat, 2009). Her approach is lister-centric: she looks at agreement between the two listers about the appropriate action for each housing unit in the field and on the input list. My approach is frame-centric: I am interested in whether the two frames created by the listers are the same, or not.

The guiding principle in the matching work was whether an interviewer would go to the same housing unit if the two addresses were selected. When I believed that s/he would, I considered the two listed cases a match. For example, unit A and unit 1 at the same address most likely refer to the same unit, and selecting either one would lead to the same unit being approached for an interview.

Step 1: ID Matching of Units on the MAF

In 178 of the blocks, the two listers used dependent listing: they started with a list of units on the Master Address File (MAF) in the assigned block and confirmed or deleted these in the field. Because every housing unit on the MAF has a unique ID, in the first matching step I simply matched units in the two frames by this ID. This step identified housing units on the MAF that both listers agreed were on the housing unit frame. As shown in Table 1.1, these are the majority of the matches.

Step 2: SAS Matching

The next step compared the addresses of the remaining unmatched units, both those on the input list and those added in the field. Here the parsing of the addresses in the listing software was a great help. The first pass required that all of the address fields match exactly, which identified 7,676 matches. Subsequent passes dropped fields from the matching routine. For example, the second pass did not require a match on the direction prefix field, so that 932 E Elm St would match
to 932 Elm St. This pass identified 12 additional matches. These passes identify more accurate matches first, to ensure that a low-quality match could not crowd out a better match. After each pass I reviewed the matches before accepting them and discarded six that seemed inappropriate. All passes required that the house number, street name, and apartment designator match exactly. See Table 1.2 for the match criteria at each pass and the number of matches found.

Table 1.2: Text Matches in Seven Passes (X indicates the field had to match exactly in that pass)

  Address Field         Pass 1  Pass 2  Pass 3  Pass 4  Pass 5  Pass 6  Pass 7
  Block                   X       X       X       X       X       X       X
  House number            X       X       X       X       X       X       X
  House number suffix     X       X       X       X
  Direction prefix        X
  Street type prefix      X       X       X       X       X
  Street name             X       X       X       X       X       X       X
  Street type             X       X       X       X       X       X
  Direction               X       X       X
  Extension               X       X
  Apartment               X       X       X       X       X       X       X
  Matches found          7,676    12      0       6       30      0      531

At every pass, I insisted that the two addresses agree on the block number. Although pass seven would allow 115 1st Ave to match to 115 1st St, it also required that the two listers place these units in the same block. My justification in matching these two addresses was that interviewers assigned to interview at 115 1st Ave and 115 1st St in the same block would likely end up at the same housing unit. Thus I am comfortable matching these two addresses if they were still unmatched in the seventh pass of the second matching step.

Step 3: Manual Matching

Units still unmatched were then output to spreadsheets and matched manually. This step caught many spelling and parsing errors that were not detectable by the programs above,[6] as well as different apartment designators (A, B, C versus 1, 2, 3, or 101, 102, 103 versus 1, 2, 3). When one lister included two units at an address and the other only one, I matched the single unit to the first unit and left the second unit unmatched, because interviewers are trained to interview at the first unit if the single-family home selected for the survey turns out to be a multi-unit home.
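The pass structure of the Step 2 address matching can be sketched in a few lines of code. The field names and pass definitions below are an illustrative subset of the Table 1.2 criteria, not the actual SAS routines, but the two key ideas are the same: stricter passes claim matches first so a loose match cannot crowd out a better one, and removing matched units from the pools enforces one-to-one matching.

```python
# Fields compared in each pass (illustrative subset; block, house number,
# street name, and apartment are required in every pass).
PASSES = [
    ("block", "house_no", "dir_prefix", "street", "street_type", "apt"),  # strict
    ("block", "house_no", "street", "street_type", "apt"),  # drop direction prefix
    ("block", "house_no", "street", "apt"),                 # drop street type too
]

def match_passes(frame1, frame2):
    """Return {unit_id1: unit_id2}; frames map unit_id -> {field: value}."""
    unmatched1, unmatched2 = dict(frame1), dict(frame2)
    matches = {}
    for fields in PASSES:
        # Index the second frame's unmatched units by the pass's key fields.
        key2 = {}
        for uid, addr in unmatched2.items():
            key2.setdefault(tuple(addr[f] for f in fields), []).append(uid)
        for uid, addr in list(unmatched1.items()):
            key = tuple(addr[f] for f in fields)
            if key2.get(key):               # take at most one counterpart (1:1)
                other = key2[key].pop(0)
                matches[uid] = other
                del unmatched1[uid]
                del unmatched2[other]
    return matches

# 932 E Elm St vs. 932 Elm St: fails the strict pass, matches once the
# direction prefix is dropped.
f1 = {"a1": {"block": "1001", "house_no": "932", "dir_prefix": "E",
             "street": "ELM", "street_type": "ST", "apt": ""}}
f2 = {"b1": {"block": "1001", "house_no": "932", "dir_prefix": "",
             "street": "ELM", "street_type": "ST", "apt": ""}}
result = match_passes(f1, f2)
```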
The manual matching identified 93 additional matches.

Use of Mapspots to Review Matching

As they list, Census Bureau listers capture the coordinates (latitude and longitude) of each unit they confirm or add. These data help direct interviewers back to housing units selected for interview. Figure 1.2 shows the distribution of the distance between the mapspots of matched housing units, broken down by matching step. The average distance between matched units is 0.06 kilometers and the median is 0.03. The maximum distance is 3.3 kilometers. There are some extreme outliers among the units matched in the first two steps. However, because these two steps are the most precise of all the matches, these outliers point to the inherent variability in the capturing of the coordinates (Sando et al., 2005), not to improper matching routines. Thus although there is some variability in the distances between the mapspots for units matching in the last two steps, this is likely due to variability in the GPS readings and in listers' mapspotting procedures, rather than to inappropriate matches. I manually reviewed all matches where the distance between the housing units was greater than one kilometer and found no evidence in the address fields or listing notes that these units were not true matches.

[6] In several cases, I fixed these spelling and parsing errors in the datasets and reran the step 2 matching routines. The match counts in Tables 1.1 and 1.2 reflect the results after these cleaning steps were applied.

[Figure 1.2: Distribution of the Distance Between Matched Pairs of Housing Units (0 to 4 km), by Matching Step]

1.2.2 Correlates of Lister Disagreement

To this dataset of listed housing units I added variables that I expect will explain some of the disagreement between the two listers who listed these blocks.
Block Characteristics

Earlier work on housing unit listing coverage has found low-income, rural and oddly shaped segments to be undercovered (Subcommittee on Survey Coverage, 1990; O'Muircheartaigh et al., 2006, 2007). I expand upon this work by exploring other characteristics that correlate with agreement (and disagreement) between the two listers.

In dangerous neighborhoods, listers may be less likely to walk down alleys and gangways, enter multi-unit buildings and talk to residents, and listing quality can suffer. If high crime rates suppress the listing propensity of some units, we should see more disagreement between listers in these areas. The FBI's Uniform Crime Reports (UCR) dataset provides violent crime rates at the county level (United States Department of Justice, Federal Bureau of Investigation, 2009).[7] To separate the effects of crime itself from the demographic characteristics that affect the perception of crime and disorder, I separately control for race, ethnicity, housing unit density, and the percent of households with low income (Sampson and Raudenbush, 2004).

Each lister had a map of the area to be listed. However, these maps are not easy to use and may even be out of date. Comparing the listing map for each block with the same area on Google Maps (both map and satellite views), I coded several features of each block map: whether the block had a non-visible boundary, a water boundary, or was simple and rectangular. (See Appendix A for details on the coding of these variables.) Of the 215 completed blocks, 59 (27%) had non-visible boundaries and 63 (29%) had a water boundary; 15 blocks had both of these characteristics. Thirty-three blocks (15%) were rectangular with no irregularities.

Footnote 7: Violent crime is defined as murder, rape, robbery and aggravated assault.

Housing Unit Characteristics

Because the dataset could contain two observations of each unit, coding housing unit level variables was not straightforward.
The three available housing unit variables are binary indicators of trailer, multi-unit and add (whether a unit was initially on the MAF or was added by the listers).

Each lister is supposed to flag every trailer in the listing software; unless a lister takes this action, a unit is assumed not to be a trailer. I coded a unit as a trailer if either lister indicated it was, because, given the default behavior of the software, false negatives are more likely than false positives. 4.4% of the listed housing units were trailers.

Units with any text in the Apartment field of the address (except those flagged as trailers) are designated as being in multi-unit buildings. In the rare cases where the two listers disagreed about whether a unit should have an apartment designator, the unit was marked as multi-unit. This situation occurred in only 166 cases, and nearly all of these were matched during the first matching step by MAF ID. 58% of the housing units in the dataset are flagged as in multi-unit buildings.

Units that were not on the MAF were added by the listers in the field. Because listers sometimes delete a unit from the MAF and later add that same unit back in to the frame, there are a few cases of units on the two frames that match, but one was added and one was not. In those cases, the unit is not marked as an add in my dataset. All cases in traditionally listed blocks are marked as adds. 9.6% of all units were added by both listers.

Listers are certain to differ in skills related to listing, such as map reading, spatial ability and comfort with the laptop and listing software. Furthermore, listers received different kinds of training and have different work histories: some do more listing than others, some are more used to urban listing, and so on. Ideally I would control for all of these lister-level characteristics in my models. Unfortunately I am not able to do so, due to Census Bureau restrictions on use of data about employees.
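The coding rules above can be written out as a small function. This is a sketch of the stated rules, not the dissertation's code; the record field names are hypothetical.

```python
# Sketch of the unit-level coding rules described above. Each matched
# pair carries the two listers' records for the same housing unit.
def code_unit(rec1, rec2):
    # Trailer: flagged if EITHER lister marked it, because the listing
    # software defaults to "not a trailer" (false negatives are more
    # likely than false positives).
    trailer = rec1.get("trailer", False) or rec2.get("trailer", False)
    # Multi-unit: any text in the Apartment field of either record,
    # unless the unit is a trailer; disagreements resolve to multi-unit.
    multi = (not trailer) and bool(rec1.get("apartment") or rec2.get("apartment"))
    return {"trailer": trailer, "multi_unit": multi}
```

For example, a unit one lister flagged as a trailer is coded as a trailer even if the other lister did not flag it, and a unit with an apartment designator from only one lister is coded multi-unit.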
The Census Bureau also was not able to identify which listers worked in which blocks. Thus my models contain no lister data and no clustering by listers. Table 1.3 gives the means, ranges and standard deviations of the variables in my analyses.

1.2.3 Models

The final dataset is at the housing unit level and contains 67,205 cases. For each observation I have a binary variable indicating whether the two listers agreed that the unit should be included on the frame or whether only one lister thought so. Agreement between the two listers is the dependent variable I attempt to explain in the models below. I run multi-level models that can account both for the clustering of units into blocks and for the inclusion of block-level characteristics as explanatory variables of interest (Snijders and Bosker, 1999; Raudenbush and Bryk, 2002; Gelman and Hill, 2007). The models do not include the selection weights for the block sample, due to the unusual universe from which these blocks were selected: blocks where the Postal Service Delivery Sequence File showed positive growth.

Table 1.3: Summary Statistics on Variables Available in Agreement Model

Variable                                              Mean   Std. Dev.  Min     Max    N
Unit added by listers                                 0.096  0.295      0       1      67,205
Unit is in multi-unit building                        0.583  0.493      0       1      67,205
Unit is a trailer                                     0.044  0.206      0       1      67,205
Proportion of population Hispanic                     0.095  0.127      0       0.973  67,205
Percent of HHs with income less than $45k             0.472  0.186      0.071   0.879  67,205
Proportion of population African-American (only)      0.160  0.251      0       1      67,205
HUs per land square mile, block level (standardized)  0      1          -0.542  3.156  67,205
Block is rural                                        0.191  0.393      0       1      67,205
Map: simple shape, no interior streets                0.048  0.213      0       1      67,205
Map: block has water boundary                         0.258  0.438      0       1      67,205
Map: block has invisible boundary                     0.233  0.423      0       1      67,205
Per capita crime rate, violent crimes (standardized)  0      1          -0.968  5.128  67,205

Because the dependent variable is dichotomous, a logistic model is an obvious choice.
However, I am interested in several interaction effects in these models, and interpretation of interaction effects in nonlinear models is complex (Ai and Norton, 2003). In addition, I cannot include lister effects in my models due to data limitations, and coefficient estimates in nonlinear models are particularly prone to unobserved variables bias.[8] For these reasons I use a linear probability model, as suggested by Wooldridge (2009, pp. 454-457) and Mood (2010). I fit all models with xtreg, the multilevel linear regression command in Stata 11 (StataCorp LP, 2009).

Footnote 8: The unobserved variables effect in logistic regression is due to the fixed error term; see Mood (2010).

1.3 Results

Across all 215 blocks, the listers agreed about the inclusion of 79.0% of the housing units. Two listers using exactly the same methods create frames that are quite different. Table 1.4 breaks these agreement rates down by the housing unit characteristics available in the dataset. The agreement rate is higher among units on the Master Address File, the input to the dependent listing process, than among those that were added. The agreement rate is lower among units in multi-family structures than in single-family structures, and among trailers than non-trailers.

Table 1.4: Agreement Rates by Housing Unit Characteristics

                n       Agreement Rate   F stat
Overall         67,205  79.0%
On MAF          60,746  80.5%            4.26*
Added           6,459   64.9%
Multi-Unit      39,191  76.7%            1.30
Single Unit     28,014  82.2%
Trailer         2,982   65.8%            5.24*
Non-trailer     64,223  79.6%
The F statistic tests the significance of the difference within each pair and reflects the clustering of HUs by block.
* Significant at 5% level

The block-level agreement rates are shown in Figure 1.3. In the upper right corner of this graph are 18 blocks where the two frames are in complete agreement. These are not only small blocks: two-thirds of them have five or fewer listed housing units, but three have more than 30.
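The appeal of the linear probability model is that each coefficient reads directly as a change in the agreement probability. The toy sketch below illustrates this with a plain OLS fit in Python; the data are simulated (with an added-unit effect of 22 points, echoing the magnitude reported below), and the variable names are invented. The dissertation itself fits the random-intercept version with Stata's xtreg, which this sketch does not reproduce.

```python
import numpy as np

# Minimal sketch of a linear probability model: regress a 0/1
# agreement indicator on covariates by ordinary least squares, so the
# slope is directly a change in the agreement probability.
def linear_probability_model(y, X):
    """OLS fit of a binary outcome y on design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(42)
n = 2000
added = rng.integers(0, 2, n)                  # was the unit added in the field?
p_agree = 0.80 - 0.22 * added                  # true agreement probabilities
y = (rng.random(n) < p_agree).astype(float)    # simulated 0/1 agreement
X = np.column_stack([np.ones(n), added])       # intercept + added indicator
beta = linear_probability_model(y, X)
# beta[1] should land near -0.22: added units about 22 points less agreement
```

In a logit fit of the same data, recovering the 22-point probability difference would require an extra marginal-effects step; here it is the coefficient itself.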
In the lower left corner there are 28 blocks where the two frames do not agree at all. In 22 of these blocks, one lister listed zero units and the other included from one to 105 units. (As discussed above, I exclude blocks where the two listers agreed that there were no housing units.) There is a good deal of diversity in these block-level agreement rates. The goal of this paper is to explain this diversity, using characteristics at the housing unit and block levels.

[Figure 1.3: Block-level Agreement Rates. Horizontal axis is the 215 blocks, sorted by agreement rate; vertical axis is percent agreement, from 0 to 1.]

1.3.1 Multi-Level Models of Lister Agreement

Table 1.5 presents the estimates from the multi-level model described above. The dependent variable is whether the two listers agreed about the inclusion of a housing unit (1) or not (0). Positive coefficients indicate characteristics that make agreement more likely.

In the first row, the agreement probability for single-family units that listers added is 22 percentage points lower than the agreement probability for single-family units they did not add, and this result is strongly significant (β̂ = -0.220, z = -31.27).[9] The agreement probability for units in multi-unit buildings that were not added (and in segments with average crime rates) is seven percentage points lower than for single-family units that were not added (β̂ = -0.0773, z = -17.11). There is also a strong and significant interaction effect between these two characteristics in the opposite direction, such that units that are both added and multi-unit (and in segments with an average crime rate) are 12 percentage points less likely to be listed by both listers than those that are neither added nor multi-unit. Trailers are also associated with inter-lister disagreement, though this coefficient is not significant.

The second set of independent variables in Table 1.5 refers to segment characteristics.
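The 12-point figure for units that are both added and multi-unit is the sum of the two main effects and the interaction from Table 1.5, which can be checked directly:

```python
# Recovering the "12 percentage points" figure from the Table 1.5
# coefficients: for a unit that is both added and in a multi-unit
# building (at average crime), the main effects and interaction sum.
b_added = -0.220
b_multi = -0.0773
b_interaction = 0.177

combined = b_added + b_multi + b_interaction
# combined is -0.1203, i.e. about 12 points less likely to be agreed on
```
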
Listers are more likely to agree about the inclusion of units in blocks with Hispanic residents: the agreement probability increases by 45 percentage points when the Hispanic population of a block increases from 0% to 100%, holding all other characteristics constant (β̂ = 0.452, z = 2.61). The larger the share of households earning less than $45,000 per year in a segment,[10] the less likely is agreement between the listers (β̂ = -0.359, z = -2.77). The proportion of the population that is African-American does not have a significant association with agreement. Housing unit density and rural blocks are also not significantly associated with the agreement probability. None of the three codes of the quality of the listing maps are significant predictors of agreement.

Footnote 9: This finding prefigures the discussion in Chapter 3 on confirmation bias.
Footnote 10: $45,000 is approximately 200% of the federal poverty level for a family of four (DeNavas-Walt et al., 2009, pg. 43).

Table 1.5: Model of Probability of Agreement Between Two Listers

                                                       β̂            z
Unit added (1), on MAF (0)                             -0.220***    (-31.27)
Multi-unit (1), single family (0)                      -0.0773***   (-17.11)
Multi-unit x Added                                      0.177***     (18.99)
Unit is trailer (1) or not (0)                         -0.0115       (-1.31)
Proportion pop. Hispanic                                0.452**       (2.61)
Proportion of HHs with income less than $45k           -0.359**      (-2.77)
Proportion pop. African-American (only)                 0.0653        (0.57)
HUs per land square mile, block level (standardized)    0.0169        (0.34)
Block rural (1) or not (0)                              0.0637        (1.27)
Map: simple shape (1) or not (0)                       -0.0722       (-1.03)
Map: block has water boundary (1) or not (0)            0.0236        (0.46)
Map: block has non-visible boundary (1) or not (0)      0.0520        (1.01)
Per capita violent crime rate (standardized)           -0.0811*      (-2.53)
Violent crime rate x Multi-unit                         0.0197***     (3.72)
Constant                                                0.853***     (11.23)
StdDev(Blocks)      0.304
StdDev(Residual)    0.303
rho                 0.502
Observations        67,205
* p < 0.05, ** p < 0.01, *** p < 0.001
In fact these map quality variables are all in the unexpected direction: simple blocks are associated with less agreement, and blocks with water or non-visible boundaries are associated with more agreement.

The last two independent variables concern the effect of county-level per capita violent crime on agreement between the listers. A one standard deviation increase in the crime rate leads to an eight percentage point decrease in the probability that two listers will agree about the inclusion of a single-family housing unit (β̂ = -0.0811, z = -2.53). This variable also interacts significantly with the indicator for units in multi-unit buildings: the effect of crime on the agreement probability is dampened by about two percentage points for multi-units (β̂ = 0.0197, z = 3.72).

1.4 Discussion and Conclusion

Segments with many multi-unit buildings, in high-crime areas, and with low-income households are those where listers are most likely to disagree. Surveys that focus on the poor and on those in high-crime neighborhoods may wish to use a multiple-lister design to capture housing units with low listing propensities and avoid the undercoverage that would likely result from a single listing. Multiple listings capture housing units with low listing propensities, but they place a burden on central office staff, who must deduplicate the frames to control the probabilities of selection. Additional work is needed on how to raise the listing propensities of cases at risk of undercoverage.

Although no previous studies have looked at inter-lister agreement, many of the results above are consistent with previous research on lister error. These studies have shown that listers have trouble correctly covering trailers and multi-unit homes, and that finding is consistent with my result that listers tend to disagree about these units as well.
The failure of the map quality variables to account for inter-lister disagreement suggests that listers do not vary in how they interpret the low-quality listing maps. The coefficients on the three map quality variables are all in the opposite direction from what I expected and have low t statistics. More research into how listers use and interpret these maps is clearly needed. (The dissertation by Rusch (2008) on the spatial abilities of listers and the design of listing software is a step in this direction.)

To reduce data collection costs, some surveys are moving away from housing unit listing to commercially available address databases, or are considering such a move.[11] However, in-field listing is still thought to be the gold standard. In fact, listed frames are often used as a benchmark against which the databases are compared (O'Muircheartaigh et al., 2003; Thompson and Turmelle, 2004; Turmelle et al., 2005; Dohrmann et al., 2006; O'Muircheartaigh et al., 2006; Dohrmann et al., 2007; O'Muircheartaigh et al., 2007). The central finding of this paper is that there is a good deal of variability in the frames different listers create, suggesting that these frames have limitations as gold standards and raising concerns about the estimates of the coverage rates of address databases.

Footnote 11: Those I know of include the General Social Survey, the National Study of Drug Use and Health, the National Health Interview Study, the National Children's Study, and the Survey of Consumer Finance.

Few studies of housing unit listing have explicitly modeled coverage propensities. But the findings of substantial inter-lister disagreement above demonstrate that the stochastic model of the listing process deserves a more central role in our thinking and research on listing error.

While this double-listing dataset is unique and allows for interesting analyses, it does have several drawbacks. Most important, there is no gold standard.
I have no grounds to assert that one lister's frame is more accurate than the other's. However, both of these listings passed the Census Bureau's quality control procedures, and thus each could serve as a frame for the many important household surveys the Bureau carries out.

The second drawback is that I do not have access to data about the listers. The most important differences between the first and second listings in this study are at the lister level. Information about the listers (their experience levels, training, education, etc.) would make for a richer analysis. The dataset used in the remaining chapters does contain lister-level data and allows for experimental manipulation to expose the mechanisms of lister error. Driven by the findings in this chapter, the next chapters make use of the stochastic conception of housing unit coverage.

Chapter 2
Mechanisms of Undercoverage in Traditional Housing Unit Listing

Chapter 1 used a large double-listing dataset to explore the correlates of lister disagreement and showed that inter-lister agreement rates vary quite a bit across segments and housing units. While the first chapter explored the correlates of this variation, the analyses were constrained by several limitations of the dataset. I was not able to manipulate the listing method to explore the different mechanisms of error in traditional and dependent listing. I also had no data about the listers who participated in the study. In this chapter I provide a theoretical basis for errors in traditional listing, derive hypotheses, and test them using a smaller but more appropriate dataset collected for this purpose. I pay particular attention to the incentives listers face and how these can lead to frame error. Just as interviewers'
incentives can lead to errors at other stages of the survey process (nonresponse bias (Manheimer and Hyman, 1949; Kennickell, 2000, 2003), sampling error (Boyd and Westfall, 1955, 1965, 1970; Alt, 1991; Eyerman et al., 2001) and measurement error (Matschinger et al., 2005)), the incentives built into the listing task can encourage listers to make inadvertent or even purposeful mistakes while listing. However, with this dataset, I find limited support for the hypotheses derived from this perspective.

2.1 Background & Hypotheses

There are two methods of in-field housing unit listing. In traditional listing (also called scratch listing), listers travel around each selected block in the segment and record the address or description of every housing unit (Kish, 1965; Survey Research Center, 1969, 1976). In dependent listing (also called update or enhanced listing), listers are provided with a list of addresses and travel around the segment, correcting the list to match what they see in the field. Estimates of the net coverage[1] of listed housing unit frames range from 80% to more than 99% (Manheimer and Hyman, 1949; Kish and Hess, 1958; Hawkes, 1986; Jacobs, 1986; Joncas, 1985; Childers, 1992, 1993; Barrett et al., 2002; Pearson, 2003; Thompson and Turmelle, 2004; Turmelle et al., 2005; O'Muircheartaigh et al., 2006, 2007). At the segment level, low-income areas tend to be undercovered, as do rural areas and oddly shaped segments (Subcommittee on Survey Coverage, 1990; O'Muircheartaigh et al., 2006, 2007).[2] At the housing unit level, units in multi-unit buildings are both undercovered and overcovered, as are renter-occupied units and vacant units (Subcommittee on Survey Coverage, 1990; Childers, 1992; Barrett et al., 2002). Lister characteristics have not been found to be statistically significant predictors of coverage (Pearson, 2003).
None of these coverage studies move beyond the correlates of coverage rates to test theoretically derived hypotheses about the mechanisms of lister error.

Footnote 1: The net coverage rate is the number of housing units listed on the frame divided by the number that should have been listed. A net coverage rate less than 100% means that undercoverage exceeds overcoverage, and this is usually the case. However, this metric obscures important differences between frames: a frame that contains a large amount of overcoverage offset by an equally large amount of undercoverage will appear to be just as accurate as a frame that contains no undercoverage or overcoverage. Unfortunately, most coverage studies do not collect the data that would allow them to calculate undercoverage and overcoverage rates separately.

Footnote 2: Kish and Hess (1958)'s findings at the block and segment level are different, but their method of creating a gold-standard frame makes statements about coverage at these levels inappropriate, as they acknowledge.

My hypotheses are motivated by theories from economic sociology and behavioral economics, which in broad terms hold that individuals act in their own interests and respond to incentives, but are also subject to social norms that constrain self-interested behavior (Coleman, 1994). In this context it is important to understand the conditions in which listers work. I conducted 30- to 60-minute debriefings with seven listers and interviewers from the Survey Research Center at the University of Michigan to gain insight into their work situations. Extended quotations from these debriefings are given in Appendix F.

We send our listers and interviewers to neighborhoods they might never visit otherwise. These include very wealthy neighborhoods with gated homes, very poor neighborhoods with public housing, and neighborhoods where few people speak English.
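The footnote's point that the net coverage rate can hide offsetting errors is easy to see in a small worked example (the counts below are invented for illustration):

```python
# Net coverage = units listed on the frame / units that should have
# been listed. Offsetting errors are invisible to this metric.
def net_coverage(listed, should_be_listed):
    return listed / should_be_listed

truth = 100          # housing units actually in the segment
frame_a = 100        # perfect frame: no under- or overcoverage
frame_b = 90 + 10    # misses 10 real units, adds 10 spurious ones

# Both frames show 100% net coverage, yet frame B has 10% gross error
# in each direction; separate under- and overcoverage rates are needed
# to tell the two frames apart.
assert net_coverage(frame_a, truth) == net_coverage(frame_b, truth) == 1.0
```
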
In my one-on-one debriefings with listers, they mentioned feeling unsafe or uncomfortable in some neighborhoods. One had a gun pulled on her in a remote part of Alaska. Others are nervous in secluded wooded areas. Listers have been pulled over by the police for driving slowly and stopping often while listing. One lister mentioned the importance of knowing what colors the local gangs wore and avoiding them (lister debriefings). But even when people feel physically safe in strange neighborhoods, the novel environment can disturb behaviors and perceptions (Taylor et al., 1984; De Young, 1999).

In these unfamiliar settings, listers usually work entirely on their own, the only member of the project staff within hundreds of miles. When a lister encounters a difficult or unclear listing situation while in the neighborhood, she should call the central office for guidance, and listers report doing this (lister debriefings). However, calling in is not always feasible. Listers are not provided with cell phones, and some segments are out of range. Central office staff may not be available at the time the lister calls. Faced with obstacles to reaching project staff, the lister may decide to make her own choices about the appropriate listing behavior, especially if she is far from home or feels awkward in the neighborhood. But listers are not sampling statisticians or principal investigators and do not know the larger goals of the survey. For these reasons the judgments they make may not be in the best interests of data quality.[3] If listers do behave in the ways suggested by these theories, then they likely weigh what they know about the survey and the goals of the listing task, as well as their own interests in getting home in time to pick up children or avoiding another drive out to a remote area.

The last important piece to understanding lister working conditions is that their frames cannot be thoroughly reviewed.
Sending a second lister to check the work is prohibitively expensive.[4] Particularly in the NSFG design (discussed in more detail below), where the same person lists a segment and later interviews there, no one else associated with the project may ever visit the segment. Listers may well be aware that their work cannot be checked in the field.

Footnote 3: To be fair, the effects of housing unit coverage on overall survey quality are not known to sampling statisticians or principal investigators, either. This dissertation is an effort to fill in this knowledge.

Footnote 4: Only the Census Bureau performs in-field relisting of segments as a check on the accuracy of their listed frames, and they check at most two blocks per interviewer per year (personal communication with Rodrick Marquette, Mathematical Statistician, Decennial Statistical Studies Division, Census Bureau).

The listing task is thus one where listers work on their own, in unfamiliar neighborhoods, knowing that no one will discover any errors they make. These are the hallmarks of the principal-agent model. A principal-agent problem arises whenever a principal hires an agent to perform a task and the agent has information about the work that the principal does not have. The classic principal-agent problem involves a landlord (principal) and a sharecropper (agent). The landlord wants the sharecropper to put forth a high level of effort, but he cannot observe her effort level. He can observe only the amount of final output, which is a function of both the agent's effort level and environmental conditions (soil quality, pests, rainfall, etc.) that are known only to the agent. Knowing that the landlord cannot tell how hard she is working, the sharecropper will not work as much as she could, or as much as the landlord would like. This model has also been applied to insurance markets, industrial regulation, executive compensation, and financial markets (Sappington, 1991; Stiglitz, 2008).

Thinking about listing as a principal-agent problem reveals that a careful consideration of lister incentives is important to understanding error in housing unit frames. In this chapter I develop and test hypotheses, inspired by the principal-agent model, about the mechanisms of undercoverage in traditional listing.

Crime

Interviewers pursue different contact strategies in neighborhoods where they do not feel safe (Martin, 1981; Groves and Couper, 1992). It is likely that listers behave differently in these areas as well. High quality listing involves walking down alleys and gangways, entering multi-unit buildings and often talking to residents. In a dangerous neighborhood, listers may be less likely to do each of these. Thus I suspect that listers will create frames of lower quality in neighborhoods where they do not feel safe.[5] The impact of crime on undercoverage is likely stronger for units in multi-unit dwellings, which often require more investigation to list accurately.

Invisible Boundaries

Some blocks have boundaries that are not obvious on the ground; they may correspond to power lines, underground streams, political boundaries, or natural features that no longer exist. When a lister is assigned such a block, she often has trouble determining which units near the boundary should be included on the frame. Listers report sometimes speaking with local residents and officials or using a GPS device (which is not provided) to find the boundary (lister debriefings), all of which means extra effort for the lister. Errors in determining the boundaries can lead to undercoverage of units near the boundary.

Language Spoken in Segment

Accurate listing often requires reading signs ("For Rent" or "Coming Soon, Luxury Apartments") and asking questions of segment residents. If the residents and the lister do not speak a common language, however, listers cannot ask how many units are in a building or whether anyone lives above a strip of commercial stores.
Thus segments where the primary language is not one the lister speaks are likely to be undercovered, and this undercoverage should show up most strongly in dwellings in multi-unit buildings.

Footnote 5: Listers' perceptions of crime are likely related not only to actual crime rates in the neighborhood but also to the physical state of the area and its demographic makeup (Sampson and Raudenbush, 1999, 2004). I will use listers' own reports of the safety of the segment to test this hypothesis, not arrest or victimization data, to isolate the effect of listers' perceptions of crime.

Driving

Driving while listing certainly makes the task more comfortable: a lister can stay warm (or cool), listen to the radio, and drink a cup of coffee. Many interviewers prefer to drive around their segments while listing, even when instructed not to (personal communication with NSFG Cycle 7 staff, 2008). Listers drive when they are uncomfortable on the street with a laptop or when the sun makes the laptop's screen difficult to see (lister debriefings). But driving can lead to undercoverage of units not visible from the street: a lister in a car cannot check around the back of a structure for a second entrance or multiple gas meters. Here again I expect this behavior to produce more undercoverage of units in multi-unit buildings.

Lister Motivation

Interviewers of course differ in their interests and motivations. Some listers likely take on the interests of the principal investigator as their own, and others likely see the work as a job like any other. I expect that listers who were more motivated by the pay of the interviewing task will commit more errors of undercoverage.

Hypothesis: Listers who find a segment unsafe commit more errors of undercoverage than those who do not find the segment unsafe.

Hypothesis: Units in multi-unit buildings are more likely than single-family units to be undercovered in high-crime segments.
Hypothesis: Housing units on blocks with invisible block boundaries are undercovered.

Hypothesis: Listers undercover housing units in segments where the primary language is not one they themselves speak.

Hypothesis: Units in multi-unit buildings are more likely than single-family units to be undercovered in segments where the lister does not speak the language.

Hypothesis: Driving while listing is associated with undercoverage of multi-units.

Hypothesis: Units in multi-unit buildings are more likely than single-family units to be undercovered when the lister drives.

Hypothesis: Listers more strongly motivated to take the interviewing job by financial concerns will undercover housing units.

2.2 Data

To test these hypotheses I had listers from the National Survey of Family Growth (NSFG) relist a nationally representative sample of segments used in the survey. The NSFG is a national area probability study conducted by the Division of Vital Statistics at the National Center for Health Statistics to study fertility behavior (Groves et al., 2005). Data collection for NSFG Cycle 7 is carried out by the Survey Research Center at the Institute for Social Research, University of Michigan. In the current design, all interviewers are also listers: in every quarter they interview cases in their active segments and list housing units in segments that will be active the next quarter. (For this reason, in this paper I use the terms interviewer and lister interchangeably.) NSFG listers record housing units on a tablet computer while in the field. The software ensures that listers parse addresses into fields (street number, street name, apartment designator) and provides a drop-down menu of known street names to minimize spelling errors and standardize abbreviations. Listers also record segment-level observations on the computer: method of travel around the segment, languages spoken in the segment, safety and accessibility concerns, and type of housing units.
Segment Selection

I randomly selected 13 primary sampling units (PSUs) containing 49 segments from the NSFG quarter 12 sample. The three goals of the selection process were to overrepresent segments that are likely to be more difficult to list according to previous findings in the literature, to ensure diverse representation on the variables involved in my hypotheses, and to select a nationally representative sample. I split the quarter 12 PSUs into two strata. The first stratum contained all segments that were particularly helpful in meeting the first two goals above, as well as segments in the same PSUs as those segments. The second stratum contained the 64 segments in the remaining PSUs. I selected all 10 PSUs (40 segments) from the first stratum and three PSUs (nine segments) from the second. Because each quarter of data collection for NSFG is nationally representative, my random sample from quarter 12 could also represent the entire country.[6]

Footnote 6: As discussed below, however, I do not have segment selection weights for the quarter 12 segments, thus all analyses are unweighted. Without weights, the sample of segments cannot be said to be nationally representative.

The resulting sample is diverse with respect to per capita violent crime rates and the percent of multi-family units, and also with respect to income and percent African-American, as desired.[7] According to the observations collected by the first lister, 17 of the selected segments contain Spanish speakers, 13 contain gated buildings, three contain at least one trailer, and in 16 of the segments the lister reported safety concerns.

The first listing of each segment was conducted by the interviewer assigned by the project. That lister used dependent listing in 38 of the 49 selected segments, and traditional listing in the other 11. SRC central office staff performed several quality checks of the first listing as part of their usual protocol.
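The two-stratum PSU selection described above (take all PSUs in the oversample stratum, plus a simple random sample from the rest) can be sketched as follows; the PSU labels and frame sizes are invented for illustration, and this is not the actual selection program.

```python
import random

# Sketch of the two-stratum PSU selection: certainty selection of the
# oversample stratum plus a simple random sample from the remainder.
def select_psus(stratum1, stratum2, n_from_stratum2, seed=0):
    rng = random.Random(seed)
    return list(stratum1) + rng.sample(list(stratum2), n_from_stratum2)

# Toy frame: 10 PSUs in the oversample stratum, 16 in the remainder.
stratum1 = [f"psu{i:02d}" for i in range(10)]
stratum2 = [f"psu{i:02d}" for i in range(10, 26)]
selected = select_psus(stratum1, stratum2, n_from_stratum2=3)
# 10 certainty PSUs + 3 sampled PSUs = 13, mirroring the design above
```

Because stratum 1 is taken with certainty while stratum 2 PSUs enter with probability 3/16, the two strata carry unequal selection probabilities, which is why unweighted analyses of such a sample cannot be called nationally representative.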
They checked the order of house numbers, checked the ratio of listed counts to the Census housing unit counts, and reviewed the blocks with online mapping software (personal communication with NSFG Cycle 7 staff, 2009). Every segment was then listed a second time using traditional listing. The second lister was also a trained NSFG lister and interviewer who used the listing software and made independent segment observations. The second listing was never done by the same lister who did the first and was not subject to SRC quality checks.

7The crime data are at the county level and come from the Federal Bureau of Investigation's Uniform Crime Reports. I calculated violent crimes per capita at the county level and then flagged those segments in the quarter 12 sample which were above the median. Violent crimes are defined in the UCR as murder, rape, robbery and aggravated assault (United States Department of Justice, Federal Bureau of Investigation, 2009). The multi-family indicator is derived from the first listing: for each segment I calculated the percent of all addresses that were in multi-family buildings (based on a non-empty apartment field) and then flagged those segments above the median. The income and race variables were similarly dichotomized. The data for these come from the Census 2000 SF3 files (U.S. Census Bureau, 2002a).

Listers

NSFG administers a questionnaire to all of their interviewers (see Appendix D). All interviewers who participated in my study completed this questionnaire, though one interviewer chose not to respond to several of the questions. The questions cover interviewer experience, motivation and attitudes, as well as race, ethnicity and language measures.

Eleven interviewers performed the second listing of the segments. The number of segments listed by each interviewer ranges from one to nine. More than half the listers have a bachelor's or master's degree and only one has no college at all.
All listers have at least one year of interviewing experience. Five report holding another job while working for NSFG; three of these have another interviewing job. Three interviewers are African-American (the others are white) and three speak Spanish; one is both African-American and speaks Spanish.

Several items on the interviewer questionnaire attempt to capture the interviewers' motivations in taking the job with NSFG. The questionnaire asks interviewers to use a ten-point scale (where 1 is low and 10 is high) to report the attractiveness of four aspects of the job: flexible working hours, importance of survey research, pay, and interacting with a variety of people (see Q25 in Appendix D). Only on the pay variable did the interviewers use the entire scale (on the other three motivation measures, all selected eight or higher). However, this pay variable is not appropriate for my motivation analyses. First, it has missing data (one lister declined to answer this question). Second, the use of the word "attractiveness" in the question text is unfortunate: interviewers may indicate that the pay is not attractive because they feel the rate is too low, not because they do not need the income. Thus I do not think this variable truly captures interviewers' motivations in the sense I mean. A fifth variable, which has no missing data, captures whether the lister has another job in addition to working as an NSFG interviewer. Because the NSFG job requires 30 hours per week, those listers who hold another job are likely motivated more by financial concerns than those who do not.

Matching

Before testing my hypotheses I matched the frames created by the first and second listing of each segment. I used several techniques of increasing permissiveness to match units that would result in the same household being contacted for screening and interview. The quality of this matching work will impact the quality of all my analyses.
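The idea of matching passes of increasing permissiveness can be sketched roughly as follows. The normalization rules, similarity threshold, and function names are illustrative assumptions on my part, not the actual procedures:

```python
import re
from difflib import SequenceMatcher

def normalize(addr):
    """Collapse case, punctuation, and common abbreviations before comparing.
    The abbreviation table here is an illustrative assumption."""
    addr = re.sub(r"[^\w\s]", "", addr.lower())
    subs = {"street": "st", "avenue": "ave", "apartment": "apt"}
    return " ".join(subs.get(w, w) for w in addr.split())

def match_frames(first, second, fuzzy_threshold=0.9):
    """Match lines from two listings in passes of increasing permissiveness:
    exact, normalized, then fuzzy string similarity.  Returns the matched
    pairs and the second-listing lines left unmatched."""
    matched, unmatched = [], list(second)
    for a in first:
        hit = next((b for b in unmatched
                    if a == b or normalize(a) == normalize(b)), None)
        if hit is None and unmatched:  # fuzzy pass: take the best score
            score, best = max(
                (SequenceMatcher(None, normalize(a), normalize(b)).ratio(), b)
                for b in unmatched)
            if score >= fuzzy_threshold:
                hit = best
        if hit is not None:
            matched.append((a, hit))
            unmatched.remove(hit)
    return matched, unmatched
```

The permissiveness of the final pass is controlled by the threshold: set it too low and distinct units are falsely merged, too high and trivially different renderings of the same unit go unmatched.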
I provide details on the procedures used in matching all three listings of these segments (the third is not involved in this paper) in Appendix C.

Three of the 49 segments posed particular challenges to matching. These are among the most rural segments, where listers used descriptions to identify housing units because there are no addresses. Because the matching in these three segments is not yet complete, most results below refer only to the 46 segments, rather than to all 49.

2.3 Analysis Methods

To test the above hypotheses about the effect of interviewer incentives on frame quality, I fit multilevel listing propensity models. A binary variable indicating whether each unit was listed by the second lister or not is the dependent variable in the listing propensity models.

Explanatory Variables

The independent variables in the models are at three levels: listers, segments, and housing units. See Table 2.3 for a summary of the variables and their means and ranges. The first three sets of variables given in this table are those that are used in the multivariate models discussed below. Those in the fourth set are not used in the models.

Lister At the lister level are the responses to the NSFG interviewer questionnaire. The content of the questionnaire is given in Appendix D.

Segment At the segment level are Census demographic data on linguistic isolation, urban versus rural, income, and percent of the population that is African-American from the 2000 Census (U.S. Census Bureau, 2001a,b, 2002a,b).8 I also have the lister's segment observation and her method of travel around the segment. There are several measures of map quality at this level as well (see Appendix A on the creation of the map quality measures). Also at this level are interactions of the segment- and lister-level characteristics, such as whether the interviewer speaks the language of the segment residents.
Housing Unit At the housing unit level are indicators for a unit in a multi-unit building, for a trailer, and for units with no house number. Details on the creation of housing unit level variables are given in Appendix E.

8The first three of these variables (linguistic isolation, urban versus rural, and income) are available at the block group level. Most segments include blocks from only one block group. For those segments that cross block group boundaries, I averaged the data across the relevant block groups to calculate segment-level statistics. Data on the percent of the population that is African-American are available at the block level and thus this variable refers precisely to the segment.

Case Base

I run these models on the housing units listed by the first lister to find the characteristics at the different levels that are associated with higher and lower listing propensities by the second lister. I do not include those units listed by the second listers and not the first (there were 1,029 of these across the 46 segments) because I have no basis for distinguishing between overcoverage by the second lister and undercoverage by the first. For example, if the second lister included many multi-unit buildings on Jackson Ave that the first lister did not include, is it because those units were missed by the first lister? Or because the units should not have been listed? Perhaps the units on Jackson Ave are professional offices and not residential, or perhaps that block of Jackson is outside of the segment. Ideally I would run one model for the units that all listers should have covered, to find the characteristics that make those units more or less likely to be correctly listed, and another model for the units that no lister should have included. Unfortunately, I do not have a gold standard frame for these segments.

However, for the cases that were selected for the NSFG interview, I have disposition data that indicates whether they were proper listings or not.
NSFG selected 2,114 cases in these segments. (Only cases listed by the first lister were eligible for selection due to cost and timing constraints.) For these cases, I have disposition data from the NSFG screening effort about whether each case was eligible for the screener or not, which allows me to distinguish somewhat between overcoverage by the first lister and undercoverage by the second listers. The fact that the first listing was more thoroughly checked by the SRC central office staff than the second listing also supports my use of the cases selected from the first listing as the universe for my analyses.

Variable                                   Mean   Std. Dev.   Min.   Max.      N
Multi-Unit                                 0.19        0.40      0      1   1970
Vacant                                     0.10        0.30      0      1   1970
Trailer                                    0.01        0.07      0      1   1970
Proportion Pop. Spanish language           0.05        0.10      0   0.39   1970
Proportion Pop. African-American           0.17        0.25      0      1   1970
Map, invisible boundary                    0.40        0.49      0      1   1970
Lister feels unsafe                        0.11        0.32      0      1   1970
Lister drove herself while listing         0.67        0.47      0      1   1970
Lister and segment language match          0.80        0.40      0      1   1970
Proportion HUs rural                       0.06        0.19      0      1   1970
Proportion HHs with income <= $50,000      0.56        0.21   0.12   0.89   1970
Map, blocks have simple shape              0.20        0.40      0      1   1970
Map, segment has external water boundary   0.13        0.33      0      1   1970
Interviewer Hispanic                       0.17        0.38      0      1   1970
Interviewer African-American               0.26        0.44      0      1   1970

Of the 2,114 cases selected from the first listing, just over one percent (25) were dispositioned by the interviewers as improperly listed lines (non-residential or out-of-segment).9 These lines should not have been listed by any of the listers. In the 46 segments where all of the three-way matching is complete, 1,994 lines were selected and 24 of these were improperly listed.

Using the properly-listed lines among the selected cases in the segments where the matching is complete, I can run the models described above on the observations from the second listing of each unit.
(Including the observations from the first listing would skew the results, as all the selected cases were listed by the first lister.)

Models

Because the dependent variable in my models is binary, logistic regression models are an obvious choice. However, given the complexities in interpreting interaction effects in logistic and other nonlinear models (Allison, 1999; Ai and Norton, 2003; Mood, 2010), I present and interpret linear probability models, as suggested by Wooldridge (2009) and Mood (2010). I did run the same models as logistic regressions and they do not change the substantive conclusions. The logistic models are given in Appendix B.

9I cannot tell for which reason the interviewers coded these housing units as inappropriate listings.

Each segment was relisted using traditional listing by only one lister, and each housing unit is in only one segment. Thus my multilevel models could have three nested levels: listers, segments, and housing units. Alternatively, the segments are nested within 13 PSUs and the models could be specified in that way. However, the highest level of clustering (whether lister or PSU) does not have any effect in the models I run below (the intracluster correlation coefficient for listers or PSUs is always very near 0), so the models below include random effects at only the segment level. I include fixed effects for the 11 listers to remove their idiosyncratic influence on listing propensity and isolate the underlying mechanisms (Snijders and Bosker, 1999; Kohler and Kreuter, 2005).

2.4 Results & Discussion

Table 2.1 shows the number of units listed in the first and second listing. The first pair of columns in this table gives the number of cases listed by each lister in each segment and does not rely on the matching work. Figure 2.1 shows the ratio of the size of the two listings graphically (number listed by the second lister divided by the number listed by the first).
Points on the left, below the reference line, correspond to segments where the second lister listed fewer units than the first, and those in the upper right to segments where the second lister included many more. (The labels in the figure refer to the segment numbers in Table 2.1.) Already we can see quite a bit of diversity both between the two listings and across the segments.

Table 2.1: Number of Housing Units Listed in First and Second Listing, by Segment

                 All Cases             Selected Cases
Segment   Listing 1   Listing 2   Listing 1   Listing 2
  1            55          48          19           7
  2           106         110          19          17
  3           155         154          19          17
  4           164         182          19          17
  5            89          79          39          34
  6            87          87          34          34
  7           130         162          34          34
  8           210         204          29          27
  9           108         101          27          26
 10            93         106          27          25
 11           122         119          54          54
 12            98          94          62          52
 13           149         124          62          48
 14            96         101          15           9
 15           159         194          55          28
 16           109         111          54          50
 17            96          93          43          42
 18           108          89          43          33
 19            83          83          43          42
 20            74          81          34          34
 21            80          80          38          38
 22            84          84          38          38
 23           122         139          38          35
 24           584         580          38          38
 25            95          96          38          34
 26            88          84          71          67
 27           626         626          63          61
 28           103         100          63          62
 29           165         198          89          81
 30           271         312          86          64
 31           152         226          86          86
 32            95          97          42          40
 33           233         131          42          17
 34         2,337       2,146          43          36
 35            99          95          55          53
 36            82          82          54          54
 37            95          95          55          55
 38           162         162          55          55
 39           417         403          50          46
 40           236         239          50          50
 41†           94          95          46           *
 42           110         110          47          26
 43†          110         104          46           *
 44           118         122           7           2
 45            82          86           8           7
 46           144         141           8           6
 47            78         119          28          22
 48†          160         158          28           *
 49           110         113          71          63
Total       9,423       9,345       1,994       1,766
† Matching not complete, segment dropped in models

[Figure 2.1: Ratio of Number of Housing Units Listed by First Lister to Number Listed by Second, by Segment]

Looking at only the 1,994 selected lines in the segments where matching is complete, 88.5% were listed by the second listers. (Note that all selected lines were listed by the first lister, because cases were selected for the NSFG survey from the frame created by the first lister.)
Among the 1,970 selected cases that the interviewers found to be residential units inside the segments, the cases which are the universe for my models below, 89.2% were listed by the second lister.

2.4.1 Comparison to Previous Results

As discussed above, previous work has shown that trailers, vacant units and those in multi-unit buildings, as well as housing units in low-income, rural and oddly-shaped blocks, are prone to undercoverage. To ground my results in previous work, the upper half of Table 2.2 gives the percent of the selected units listed by the second lister, broken down by these characteristics.

Housing units in multi-unit buildings, vacant units and trailers are all more likely to be undercovered than single-family buildings, occupied units and non-trailers. Units in complex blocks, rural blocks and low-income blocks are also undercovered. All of these results are in the expected direction. The F statistics in the last column account for the clustering of housing units into 13 PSUs. Only two pairs, trailers versus non-trailers and rural versus non-rural, show significantly different coverage rates. However, power analyses revealed that due to the high degree of clustering in my data, I do not have enough PSUs to detect significant differences on the segment and lister characteristics. Thus the lack of significant findings on these variables is not surprising.

The fact that my dataset confirms previous findings in listing research provides reassurance that the matching work is accurate and that there is nothing unusual about the segments in my study. However, the results of these earlier studies do not connect with a larger theoretical framework. I now turn to testing the hypotheses motivated by consideration of lister incentives in section 2.1.
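A rough way to see why the F tests must account for PSU clustering is to collapse the data to PSU-level rate differences and test those: only between-PSU variation then contributes to the test, which is why so few PSUs leaves so little power. The sketch below, with hypothetical data and function names of my own, illustrates the idea; it is a simplification, not the design-adjusted test actually used for Table 2.2:

```python
from math import sqrt
from statistics import mean, stdev

def clustered_rate_test(records):
    """records: (psu, group, listed) tuples, group and listed each 0 or 1.
    Collapses to PSU-level differences in listing rates and returns the
    mean difference and a t statistic computed from between-PSU variation
    only -- a simplified stand-in for a design-adjusted F test."""
    psus = {}
    for psu, group, listed in records:
        psus.setdefault(psu, {0: [], 1: []})[group].append(listed)
    # Keep only PSUs observed in both groups
    diffs = [mean(g[1]) - mean(g[0]) for g in psus.values() if g[0] and g[1]]
    d_bar = mean(diffs)
    t = d_bar / (stdev(diffs) / sqrt(len(diffs)))
    return d_bar, t
```

With only 13 PSUs, the effective sample size for the test is 13, no matter how many housing units each PSU contains.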
2.4.2 Tests of Hypotheses

I hypothesized that high-crime segments, listing while driving, unclear segment boundaries, and mismatches between lister and segment languages contribute to undercoverage in traditional listing. The lower half of Table 2.2 compares the listing rates by these characteristics and provides an initial test of the hypotheses.

Table 2.2: Comparison of Listing Rates by Housing Unit and Segment Characteristics

                          n   Listing %   F stat
Multi Unit              382       83.0%     3.33
Single Family          1588       90.7%
Vacant                  200       83.0%     3.64
Occupied               1770       89.9%
Trailer                  10       60.0%   14.44*
Non-Trailer            1960       89.4%
Simple shape            386       93.0%     1.50
Complex shape          1584       88.3%
Rural                   315       75.9%    5.49*
Not Rural              1655       91.8%
Low Income              918       88.1%     0.16
High Income            1052       90.2%
Invisible Boundary      797       85.3%     2.36
Visible Boundaries     1173       91.9%
Lang. Matched          1575       89.5%     0.06
No match                395       88.4%
Safety concerns         225       88.9%    0.004
No concerns            1745       89.3%
Drove alone            1318       90.8%     1.30
Walked or was driven    652       86.0%
Other job               970       90.7%     0.54
No other job           1000       87.8%
HH Screened            1602       90.1%     3.84
Not Screened            368       85.3%
* Difference significant at 5%

Units in segments with invisible boundaries are less likely to be covered than those in segments without invisible boundaries. This result is as expected. Segments where the lister speaks the language of the residents are covered somewhat better than those where the lister does not. Units in segments where the traditional lister had safety concerns are covered at a lower rate than those in segments where the lister did not have concerns, but this difference is very small. Driving while listing is associated with higher listing propensities, in contrast to my hypothesized effect.10 Listers who hold other jobs also do better, which contradicts my hypothesis. None of the differences in the lower half of the table are significant, due in part to the lack of power in my highly clustered dataset.
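In the multilevel models that follow, the reported rho is the intraclass correlation implied by the fitted variance components: the between-segment variance as a share of total variance. As an arithmetic check, using the standard deviations reported in Table 2.3 for the baseline model reproduces its reported rho; the helper function here is my own, not part of any fitting routine:

```python
def icc(sd_between, sd_within):
    """Intraclass correlation: between-cluster variance over total variance."""
    var_between = sd_between ** 2
    var_within = sd_within ** 2
    return var_between / (var_between + var_within)

# Standard deviations reported in Table 2.3, baseline model:
# StdDev(segments) = 0.157, StdDev(residual) = 0.276
rho = icc(0.157, 0.276)  # about 0.244, matching the reported rho
```

As segment-level covariates are added across the models, StdDev(segments) shrinks while the residual standard deviation barely moves, which is why rho falls from 0.244 toward 0.161.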
Given the mixed support for my hypotheses in the bivariate analyses of Table 2.2, I turn to the multilevel models described above, which simultaneously control for possibly confounding segment and housing unit characteristics. The models in Table 2.3 show the impact of lister, segment and housing-unit level variables on the listing propensity of the selected cases. In each model the dependent variable is a binary indicator of whether the second lister listed the selected housing unit (1) or did not (0). Positive coefficients indicate characteristics that are associated with a greater likelihood of being listed by the traditional lister. Accepting the premise that all of these housing units should have been listed by all listers, positive coefficients suggest that those characteristics make a unit less likely to be undercovered,

10Note that driving was not randomly assigned in this dataset. Each lister could decide for herself whether to drive or walk while listing. Thus segment characteristics that lead a lister to choose to drive may affect the estimate of this difference.

and negative coefficients suggest those that make a unit more likely to be undercovered. The first model in the table simply provides a baseline for comparison. The intra-segment correlation coefficient (rho) is 24.4%, meaning that almost one-quarter of the variability in listing propensities is between segments and three-quarters is within segments. The second model includes explanatory variables that previous studies have found to be correlated with listing propensity. Units in multi-unit buildings are 17 percentage points less likely to be covered in traditional listing than single-family units (β = -0.173, z = -6.96). Vacant units and trailers also have lower listing propensities, by three and one percentage points respectively, but these effects are not significant. The more rural a segment is, the more likely the units in that segment are to be undercovered.
The positive coefficient on the poverty measure, the proportion of households in the block group with incomes less than $50,000, suggests that the larger the share of households below this income threshold, the higher the listing probability of all units in the segment. This finding is the only one that is not in the expected direction. A likelihood ratio test of this model against the first is statistically significant (p < 0.001). The share of variance between segments (rho) drops to 17.5% controlling for these five variables.

The third model adds segment and lister characteristics that are unique to this listing study. Because the study involves 46 segments and only 11 listers, only a few variables could be added without overwhelming the model. This model adds two demographic variables, percent Spanish speakers and percent African-American, and those variables needed to test the hypotheses developed above.

Table 2.3: Traditional Listing Linear Probability Models, Selected Cases Only
(β coefficients; z statistics in parentheses)

                                              (1)                (2)                (3)                (4)
Multi-Unit                                           -0.173*** (-6.96)  -0.171*** (-6.80)  -0.173*** (-3.44)
Vacant                                                -0.032   (-1.51)   -0.033   (-1.52)   -0.032   (-1.50)
Trailer                                               -0.009   (-0.10)   -0.006   (-0.06)   -0.006   (-0.06)
Proportion HUs rural                                  -0.236*  (-2.00)   -0.243*  (-1.99)   -0.238   (-1.95)
Proportion HHs with income <= $50,000                  0.091    (0.83)    0.046    (0.32)    0.045    (0.31)
Proportion Pop. Spanish language                                          0.001    (0.00)    0.018    (0.04)
Proportion Pop. African-American                                          0.175    (0.84)    0.193    (0.93)
Map, invisible boundary                                                  -0.060   (-1.03)   -0.058   (-0.99)
Lister feels unsafe                                                      -0.165   (-1.30)   -0.149   (-1.16)
Lister drove herself while listing                                        0.011    (0.17)    0.017    (0.27)
Lister and segment language match                                        -0.038   (-0.51)   -0.041   (-0.55)
Multi × Lister feels unsafe                                                                 -0.053   (-0.80)
Multi × Lister drove                                                                        -0.015   (-0.27)
Multi × Language match                                                                       0.024    (0.35)
Constant                            0.877*** (36.20)   0.950*** (9.68)    1.006*** (7.36)    1.001*** (7.31)
StdDev(segments)                    0.157              0.125              0.119              0.119
StdDev(residual)                    0.276              0.272              0.272              0.272
rho                                 0.244              0.175              0.161              0.161
Log Likelihood                      -316.426           -279.600           -277.991           -277.667
Pseudo R2                                              0.116              0.121              0.122
Observations                        1970               1970               1970               1970
* p < 0.05, ** p < 0.01, *** p < 0.001

The coefficients on the controls from the previous model are largely unchanged and maintain their signs and significance patterns. As expected, units in segments with invisible boundaries (β = -0.060) and those where the lister feels unsafe (β = -0.165) have lower listing propensities. After controlling for segment characteristics that might capture listers' decisions to drive while listing, such as percent rural, driving still has an unexpected, positive effect on coverage by traditional listers (β = 0.011). When the lister speaks the language of the segment residents, the listing propensity drops by almost 4 percentage points (β = -0.038). However, none of the variables added in this model are statistically significant. A likelihood ratio test of this model against the second fails to reject, indicating that this model does not explain listing propensity better than the previous.

The fourth model adds interactions of three variables with the multi-unit indicator. Coefficients on the other variables are largely unchanged between models 3 and 4, though the coefficient on percent rural is now just below the threshold for significance at the 5% level. As discussed in the hypothesis section, the effects of safety concerns, driving, and speaking the language of segment residents on undercoverage are expected to be stronger for units in multi-unit buildings. When a lister feels unsafe, the listing propensity of single-family units is reduced by 15 percentage points and that of multi-units by 20 percentage points. Both of these results are in the expected direction.
The effect of driving on listing probabilities is 1.7 percentage points (in the positive direction) for single-family units, but only 0.2 percentage points for multi-units. A lister who speaks the language of the segment reduces the listing propensity of single-family units by four percentage points and of multi-units by 1.7 percentage points. These effects are in the unexpected direction.

The main effects in model 3 and the interaction effects in model 4 test the hypotheses developed above. None of the coefficients on these variables is significant, but several are in the expected direction. Units in blocks with invisible boundaries, and in segments where the lister feels unsafe, have reduced listing propensities, and safety concerns reduce the propensities of units in multi-unit buildings even more than single-family units. However, driving increases listing propensities and speaking the language of the segment decreases them; these effects are not as expected, though both effects do move towards the expected sign when interacted with multi-unit status.

2.5 Conclusions

This paper has used theories from economic sociology, specifically the principal-agent model, to develop hypotheses about how listers' incentives contribute to error in housing unit frames. For this purpose I collected a multiple-listing dataset of 49 segments throughout the United States. While the data replicate earlier findings about the correlates of listing propensity, both bivariate and multivariate tests of my hypotheses lead to limited support for the hypotheses. Given the variables and data available, it does not appear that listers' incentives are the main drivers that lead them to make errors of undercoverage in traditional listing. However, the listing data studied here was quite clustered, involving only 11 listers, which makes tests of the effects of lister characteristics on coverage propensity quite underpowered.
It is possible that this lack of power in part explains the poor support for the hypothesized explanations. One clear finding from this work is that there is a good deal of undercoverage in traditional listing, and we should continue to look for theories that will explain the phenomenon. I am excited by work in environmental psychology about the effects of built environments on individuals' perception, spatial understandings and cognitive skills. Future work may find these theoretical approaches more valuable than those tested here.

I note that one interesting finding in this dataset is that housing units that completed the NSFG screener were four percentage points more likely to be listed by the second lister (p < 0.08, Table 2.2), suggesting a connection between undercoverage and nonresponse that warrants additional exploration.

Chapter 3

Confirmation Bias in Dependent Housing Unit Listing

Chapter 2 explored the mechanisms of error in traditional listing. While the findings in that paper corroborated previous work in the literature, I found only limited support for my hypotheses concerning the effects of listers' incentives on the quality of their listing work. This chapter focuses on dependent listing, where the error mechanisms are likely to be different.

In the summer of 2009, the Census Bureau conducted a very large listing operation to prepare for the 2010 census. More than 100,000 Address Canvassers updated the Master Address File to ensure that enumeration forms would go out to every household. They essentially listed the entire country.1 One canvasser wrote a whistle-blowing post for the blog My2Census.com, a decennial census watchdog website, detailing the problems of the operation in New York City. The post, included as Appendix G, points out the difficulties of urban listing.

There are small tenement buildings in Chinatown and Harlem brownstones; where there are illegal subdivisions.
It is very difficult to gain entry or make contact even if you speak the language.

The list of addresses on the Master Address File was loaded onto the listers' hand-held computers (HHC) and they updated this list in the field. The whistle-blower reports that s/he received pressure to simply confirm the prior listing.

1The Address Canvassing operation did not include remote parts of Alaska and Maine (personal communication with Robin Pennington, Census Bureau, November 6, 2008).

We were told that if we couldn't gain access to a building after two visits we had to accept what was in the HHC as correct. Many of us were tempted to falsify work and accept what was in the HHC... One of the other listers found an entire building with over 200 single illegally divided rooms. The HHC had less than 10 units listed in it. If they accepted [what] was in the HHC as true they would [have] missed over 200 housing units.

Incentives to finish quickly also led listers and supervisors to confirm the existing list rather than carefully check each building for missed, or inappropriate, units.

It was alleged that some of the crew leaders and field operations supervisors told their listers since there was no regard to quality that they could skip making contact even going as far as not conducting field work and enter the units at home. There is no way that listers who were reassigned work magically gained access to buildings people couldn't access for weeks unless they accepted what was in the HHC as true.

The crew leaders and field supervisors who finished first were rewarded with additional work. Those who finished last were sometimes "written up" as unproductive and the office terminated their employment. The type of listing described by this lister is dependent listing. Listers are sent into the field to update an existing list of addresses called the input listing. They delete non-existent or nonresidential units and add units that are missing.
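Mechanically, dependent listing is an update operation on an input list. The sketch below, with action names I invented for illustration, shows how unchallenged input lines pass straight through to the frame, which is exactly how errors on the input list become frame errors:

```python
def apply_updates(input_list, actions):
    """Apply a lister's field decisions to an input listing.
    actions maps an address to 'confirm' or 'delete'; addresses not on
    the input list may be added with 'add'.  Input lines the lister does
    not challenge are kept as-is, so an improper input line survives as
    overcoverage and a missing unit stays missing as undercoverage."""
    frame = []
    for addr in input_list:
        if actions.get(addr, "confirm") != "delete":
            frame.append(addr)  # confirmed or untouched lines survive
    for addr, action in actions.items():
        if action == "add" and addr not in input_list:
            frame.append(addr)  # units the lister found in the field
    return frame
```

The failure-to-delete and failure-to-add errors discussed below correspond to the lister leaving the default path through this update untouched.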
Dependent listing is used to create housing unit frames for many surveys as well. Often the input list is a previous traditional listing of the area or a geocoded address database. This paper provides strong evidence that listers using dependent listing do indeed tend to confirm errors of exclusion and inclusion on the input list, leading to errors of undercoverage and overcoverage on their frames.

3.1 Background & Hypotheses

In housing unit listing, whether for the decennial census or for household surveys, we worry about two types of error: undercoverage and overcoverage.2 Undercoverage occurs when housing units inside the selected area are not listed. Chapter 2 focused only on undercoverage, which can raise concerns about bias in survey estimates if the undercovered units are different than the correctly covered units (Wright and Tsao, 1983; Groves, 1989; Lessler and Kalsbeek, 1992). Overcoverage is the inclusion of elements on the frame that should not have been listed, and it can be further classified into two types. Out-of-scope overcoverage is the inclusion of non-residential or non-existent units. While this kind of overcoverage increases data collection costs slightly, it does not affect survey estimates.3 Multiple-probability overcoverage is the inclusion of elements that are in the target population but whose probability of selection should come from elsewhere. For example, a lister might misread her segment maps and list an additional block along 16th St.

2Kish (1965) lists four kinds of frame error: undercoverage, overcoverage, duplicates and clustered units. Lessler (1980) adds two others: inaccurate auxiliary data (stratification and size variables) and insufficient locating data. In this paper I am only concerned with undercoverage and overcoverage.

3Out-of-scope overcoverage does introduce variability into the sample size and can therefore increase the variance of estimates slightly, but I will not investigate this effect here.
While these housing units are part of the survey's target population, their chance to be selected comes from their own segment; including them twice inappropriately and unknowingly inflates their probability of selection. If selected, these units may well be interviewed and brought into the final sample and thus can affect estimates (Wright and Tsao, 1983; Lessler and Kalsbeek, 1992).

In dependent listing, undercoverage can occur in two ways: either (1) the units were on the list and the lister inappropriately removed them, or (2) the units were not on the input list and the lister failed to add them. The two types of overcoverage can occur because either (3) the lister added inappropriate units, or (4) the units were on the input list and the lister failed to delete them. Situations (1) and (3) are similar: in each case the input list is correct and the lister introduces an error. In situations (2) and (4), it is the list that is incorrect, and the lister fails to correct it. This paper concentrates on these two situations where the lister introduces undercoverage and overcoverage by failing to correct problems with the input list. I call this phenomenon confirmation bias.

Eckman and Kreuter (2010) provided the first study of confirmation bias in dependent listing. In a small listing in Ann Arbor and Ypsilanti, Michigan, they introduced errors of undercoverage and overcoverage into the input list. They found that listers tend to confirm that the input listing is correct and thus to transfer those errors to the housing unit frame. When the input list includes an incorrect unit, the lister has a tendency not to delete it, a failure-to-delete error. Seventeen percent of the units added to the input list were confirmed by the listers. Conversely, when a lister sees units inside the segment that are not on the list, she has a tendency not to add them, a failure-to-add error. Suppressing a housing unit from the input listing decreased its listing propensity by 13.4 percentage points.
Eckman and Kreuter find some support for the hypothesis that both types of confirmation bias are more likely with units in multi-unit buildings. This paper extends these findings by investigating the confirmation bias phenomenon in more depth. First, this work uses a larger geographic sample to explore whether the phenomenon exists on a larger scale. Second, it digs deeper into the mechanisms at work in confirmation bias.

Confirmation bias is not unique to housing unit listing. In the social psychology literature, the term confirmation bias has several meanings (Klayman, 1995). The one closest to the use in the survey field is the tendency to look for corroborating evidence (Hitchcock, 1995, pg. 324) or for "the presence of what you expect" (Klayman, 1995, pg. 386). In dependent interviewing, when a second interviewer can see the result recorded by the first, s/he is more likely to collect the same response (O'Muircheartaigh, 2004; Lynn and Sala, 2006). We see the same result in dependent coding when the second coder sees the code assigned by the first (Biemer and Lyberg, 2003). In both cases the work of the second person is affected by expectations set by prior information. Dependent listing is similar: the second lister has prior information that may create expectations about what she will find in the field.4

[Footnote 4: The psychological confirmation bias literature has also influenced another thread of research in survey methodology on the effect of interviewers' expectations, about incentives, response rates and item sensitivity, on their production (Singer and Kohnke-Aguirre, 1979; Singer et al., 1983, 2000).]

My use of the term confirmation bias is closer in spirit to its use in the coding literature. I suspect that the underlying mechanism in confirmation bias in housing unit listing is an appeal to authority, where the input list serves as the authority.
Several of the listers I spoke to talked about unclear or difficult listing situations, much like those described in the blog post quoted above, where they looked to the list for guidance about the existence and designation of the housing units in the area. They mentioned locked buildings where they could get no information about the number of units, and segments with invisible boundaries where they could not tell what was inside and what outside the segment. In these situations, a few listers said, all they could do was trust the list. Another lister talked about a more general hesitancy to contradict the list in any situation. (To be fair, two listers also mentioned the importance of not relying too much on the list, suggesting some inter-lister variability in the confirmation bias phenomenon.) See Appendix F for details from the interviewer debriefings.

These discussions with listers, as well as the post by the address canvasser above, suggest that confirmation bias will be more likely in difficult listing situations. We know a bit about the kinds of units and segments that are difficult for listers from Chapters 1 and 2. In those chapters I found that listers tend to disagree about the inclusion of units in rural segments, those with complex shapes, those with invisible boundaries, and those with more high income households. They also disagree about the inclusion of units in multi-unit buildings.5 If the confirmation bias phenomenon found by Eckman and Kreuter stems from the input list serving as an authority in difficult listing situations, we should see more confirmation bias in these situations.

[Footnote 5: While Chapter 1 finds strong disagreement about trailers, I do not have enough trailers in the dataset I use in this chapter to analyze them separately.]

I also suspect that the overall level of error in the list affects the listers' tendency to confirm errors in the list.
If the input listing is very inaccurate, confirmation bias may be less likely, as the authority of the list is undermined. The lister may essentially turn to traditional listing if the input list contains too many errors. I suspect that confirmation bias of both types (failure-to-add and failure-to-delete) is most likely when the input list is largely accurate. (Of course, confirmation bias cannot occur when the input list is entirely accurate.)

Hypothesis: Listers tend to fail to add units not on the input list.

Hypothesis: Listers tend to fail to delete units that do not exist from the input list.

Hypothesis: Both types of confirmation bias are more likely in rural segments.

Hypothesis: Both types of confirmation bias are less likely in segments where the blocks all have a simple square shape.

Hypothesis: Both types of confirmation bias are more likely in segments with an invisible external boundary.

Hypothesis: Both types of confirmation bias are more likely in segments with a high percentage of high income households.

Hypothesis: Both types of confirmation bias are more likely in multi-unit buildings.

Hypothesis: Both types of confirmation bias are less likely in segments where the input list contains many errors.

3.2 Data

To test these hypotheses I again use the NSFG repeated listing dataset described in Chapter 2. In the previous chapter, I discussed only two listings in this dataset. The first listing was conducted by the project to support normal sampling and interviewing procedures. The second listing used traditional listing in every segment. Each of the 49 selected segments was also listed a third time, using dependent listing. Like the second listing, the third was not subject to SRC's quality review. The listers who performed the third (dependent) listing of the segments were drawn from the same pool as those who did the other two listings: trained and experienced NSFG listers and interviewers.
However, those who participated in the third listing happen to have somewhat different qualities, as reported in the interviewer questionnaire (the questionnaire is given in Appendix D). There were 11 listers involved in the third listing, and each lister listed between three and nine segments. Only one is African-American, and six speak Spanish. All have completed at least two years of college. These listers are quite experienced, with between four and fourteen years of interviewing work before they began working on NSFG Cycle 7. The dependent listers are as a whole better educated and more experienced than the traditional listers in my study.6

[Footnote 6: These differences in lister attributes are not by design. I was not able to select the listers who participated in my study nor assign them to segments or methods.]

3.2.1 Manipulation of Input List

To test the above hypotheses about confirmation bias, I experimentally manipulated the input to the third (dependent) listing. The foundation of the input was the frame created by the first listers, prior to the quality checks performed by SRC. I added addresses not listed by the first lister and deleted addresses that were listed by the first lister. If confirmation bias exists, the third lister should show a tendency not to correct these errors.7

Units Deleted from Input Listing

I deleted 556 units from the input list, 5.9% of those listed by the first listers. The lowest deletion rate by segment was 1.5% and the highest was 18.3%. The deleted housing units are of four types, as shown in Table 3.1. These deletions were performed quasi-randomly. Every housing unit in the first listing was assigned a probability of being deleted, depending on whether the case was selected8 and the manipulation group of the segment (as discussed below). For each housing unit I generated a number between zero and one from a uniform distribution, and if the number was less than the probability assigned to the case I flagged the unit for deletion.
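The uniform-draw flagging step just described can be sketched as follows. This is an illustrative sketch, not the study's actual code: the function name and the `p_delete` field are invented.

```python
import random

def flag_for_deletion(units, seed=2010):
    """Quasi-random deletion flagging: each housing unit carries an
    assigned deletion probability (based on its selection status and
    its segment's manipulation group); a unit is flagged when a
    uniform draw on [0, 1) falls below that probability."""
    rng = random.Random(seed)
    return [dict(u, flagged=rng.random() < u["p_delete"]) for u in units]

# A unit with probability 0 is never flagged; with probability 1, always.
units = [{"id": 1, "p_delete": 0.0}, {"id": 2, "p_delete": 1.0}]
flagged = flag_for_deletion(units)
```

Seeding the generator makes the flagging reproducible, which matters when the same manipulation must later be undone to score the listers' work.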
The allocation of the flagged cases to deletion type was not entirely random, but based on the overall deletion rate in the segment and plausibility constraints.

[Footnote 7: The 1980 Census used a similar technique to review the work of precanvass enumerators (Fan et al., 1984), but I have not seen any results from this check.]

[Footnote 8: In doing the deletions, I gave preference to deleting units which had been selected for the NSFG screener and interview. My thinking at the time was that deleting selected cases would allow me to control for cases which the dependent lister failed to add because they were not real units. However, so few of the selected cases were found to be improper listings (less than 1%, as discussed in Chapter 2) that this safeguard was unnecessary, and it is not utilized in the analyses below.]

Table 3.1: Number of Cases Deleted from Input to Third Listing, by Type

  Type of Deleted Unit                    Cases   Percent
  Entire multi-unit building                 87     15.7%
  Unit in multi-unit building                50      9.0%
  Single-family                             263     47.3%
  All housing units on street segment       156     28.1%
  Total                                     556

Units Added to Input Listing

To test for failure-to-delete error, I added 421 housing units to the input listing. The lowest addition rate by segment was 0.64% and the highest was 14.2%. The added units are of five types, as given in Table 3.2.

Table 3.2: Number of Cases Added to Input to Third Listing, by Type

  Type of Added Unit                      Cases   Percent
  Units on new street segment                45     10.7%
  Unit in multi-unit building                59     14.0%
  Building in midst of others               167     39.7%
  Single-family turned into multi-unit       88     20.9%
  Units outside of segment                   62     14.7%
  Total                                     421

These manipulations were also randomized, using a method similar to that described above for the deletions. However, it was not always possible to add a unit at the point specified by the randomization, for example, between units five and six in an eight-unit building.
Thus I gave myself some leeway to deviate from the randomly selected spots when adding units. I tried to add units that could seem plausible to the listers. For example, in a building with three units numbered 1, 2 and 3, I might add a unit 4, or a basement unit. In a street with house numbers increasing by four (504, 508, 512) I might add a two-unit building at 510. When adding units across the street, I used online satellite images and real estate websites to find addresses of housing units that very likely were across the street from the segment.

Manipulation Groups

To test whether the overall level of error also affects confirmation bias, I varied the level of manipulation at the segment level. I randomly split the 49 selected segments into four sets and varied the degree of manipulation, as shown in Table 3.3. The addition and deletion rates given in the table are at the unit level, not at the manipulation level: one manipulation could have added or deleted many units.

Table 3.3: Level of Manipulation in Four Segment Sets

  Deletions   Additions   Segments   Deletion rate   Addition rate
  High        High              12            6.3%            5.3%
  High        Low               12            8.6%            2.4%
  Low         High              12            4.6%            7.6%
  Low         Low               13            3.4%            2.6%
  Overall                       49            5.9%            4.7%

These manipulation groups allow me to test the hypothesis that when the input list is quite inaccurate, listers commit fewer errors of confirmation bias. I compare the fourth group (low deletion and addition rates) to the other three groups because this group will likely appear to the lister as obviously different than the other three. This fourth group is the one where the input list is of highest quality; in each of the other three groups, the sum of the addition and deletion rates is greater than 10%.

Matching Frames

To prepare the dataset for analysis, I again had to match the frames together, but the matching steps were more complicated than those described in Chapters 1 and 2. This matching task involved two separate matching operations, each using multiple steps.
I first matched the input list given to the third lister (with the suppressed lines added back in) to the frame created by these listers, to identify which lines were confirmed, deleted and added by the listers. Then I matched the three frames to each other. The quality of this matching work will impact the quality of all my analyses. Details on the procedures used in both rounds of matching are in Appendix C. As discussed in Chapter 2, the matching could not be completed in three very rural segments. The incomplete matching affects some of my analyses in this chapter, but not all.

3.3 Analysis Methods

I use a variety of techniques to test the hypotheses developed above. Testing for failure-to-add and failure-to-delete errors requires slightly different techniques, for reasons described below. Most tests use the traditional listers as a control group not subject to the input list manipulation. These analyses can use only the cases in the 46 segments where the matching work is complete.

3.3.1 Testing for Failure-to-Add Error

I use three techniques to test my hypotheses about failure-to-add confirmation bias. First, I compare listing rates for the unmanipulated and deleted units. Because the deletions to the input list are nearly random, these rates provide evidence for confirmation bias. If the dependent listers show a tendency towards confirmation bias, the deleted units should be listed at a lower rate than the unmanipulated units. Second, I calculate a difference-in-differences estimate of the effect of deleting cases from the input listing. This technique uses the traditional listers, who were not subject to the manipulation, as a control group.9 Let L^dep_unm be the fraction of unmanipulated cases on the input list that were listed by the dependent lister. L^trad_del is the fraction of cases deleted from the input list that were listed by the traditional lister. L^dep_del and L^trad_unm are defined similarly.
Then L^dep_del - L^trad_del captures the difference in the listing rates for the deleted units between those listers who were and were not subject to the manipulation. This difference is part of the effect I am interested in, but it does not take advantage of the experiment by comparing the manipulated to the unmanipulated units. Conversely, L^dep_unm - L^dep_del captures the difference in the listing rates between the unmanipulated and manipulated cases. (This difference is the same as the first analysis method discussed above.) The shortcoming of this approach is that any systematic variation in the deleted and unmanipulated units could bias the estimate. The more appropriate estimator of the treatment effect is the difference-in-differences estimate, which adjusts for anything that is unique about the deleted cases and takes advantage of the experiment:

  D-in-D = (L^trad_unm - L^trad_del) - (L^dep_unm - L^dep_del)    (3.1)

This estimate depends on the accuracy of the match between the traditional and dependent listers. It can use only the 46 segments where the matching between the second and third frames is complete.

[Footnote 9: The difference-in-differences technique is commonly used with panel data to derive treatment effects from non-randomized designs: each case serves as its own control, and the difference in the change from period 1 to period 2 between those who did and did not receive the treatment is the average treatment effect (Angrist and Pischke, 2009, pp. 221-247).]

The third analysis technique expands upon these difference-in-differences results by simultaneously controlling for housing unit and segment level characteristics in estimating the size of the failure-to-add effect. Just as in Chapter 2, I use linear probability models with fixed effects for the eleven listers and random effects for the segments.
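The difference-in-differences estimator of Equation 3.1 amounts to simple arithmetic on four listing rates. As a sketch (the function name is mine), applied to the overall rates reported below in Table 3.5; because those inputs are already rounded to two decimals, the result matches the table's -26.46 only up to rounding:

```python
def diff_in_diff(l_trad_unm, l_trad_del, l_dep_unm, l_dep_del):
    """Equation 3.1: difference-in-differences estimate of the effect
    of deleting a case from the input list, from four listing rates
    (traditional/dependent crossed with unmanipulated/deleted)."""
    return (l_trad_unm - l_trad_del) - (l_dep_unm - l_dep_del)

# Overall rates from Table 3.5, in percentage points.
estimate = diff_in_diff(86.67, 81.68, 96.11, 64.67)
```

The traditional listers' gap (about 5 points) nets out whatever makes the deleted lines intrinsically harder to list, leaving the part of the dependent listers' gap attributable to the manipulation.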
The dataset used in the model is at the housing unit and listing level: each of the unmanipulated and deleted housing units appears in the dataset twice, once for the traditional and once for the dependent listing. The binary dependent variable is whether a given lister listed a housing unit (1) or did not (0). The independent variables of interest are a dummy variable indicating method (traditional listing is the reference category), a dummy variable indicating whether the housing unit was deleted from the input listing, and the interaction of these two, which captures the effect of the deletion on the dependent listers. Including the indicator of deletion at the housing unit level controls for any unobserved attributes that make the suppressed units harder or easier to list. The models also control for many of the other housing unit and segment characteristics used in the models in the previous chapters. Just as in Chapter 2, the models do not account for the selection probabilities of the segments. While the area sample in each quarter of NSFG data collection is nationally representative, the weights to make inference from a single quarter do not exist.10

3.3.2 Testing for Failure-to-Delete Error

All of the analysis techniques discussed in the previous section compared the deleted units to the unmanipulated. For the cases added to the input list, a different logic applies. Here, relying on authority would mean failing to delete the added units, giving them a positive listing propensity. No comparison to the unmanipulated units is necessary. I again use three techniques to explore failure-to-delete errors in dependent listing. The first technique compares the listing rates of the added cases to the null hypothesis that none of these units should have been listed. The second analysis compares the listing rates for the added housing units across the two listing methods.
If the traditional listers also included some of these added units, that suggests the units do in fact exist, and the estimate of confirmation bias should be reduced. In the notation introduced above, the second analysis calculates L^dep_add - L^trad_add. The third analysis, using linear probability models, expands on the comparison between the traditional and dependent listing of the added units by controlling for housing unit and segment characteristics.

[Footnote 10: I plan to soon develop weights for the 49 segments in my sample and rerun the models.]

3.4 Results

Together the results of these analyses give a clear picture of both the failure-to-add and the failure-to-delete effects in dependent listing. The manipulation of the input list provides evidence that listers commit errors of both kinds in dependent listing.

3.4.1 Failure-to-Add

As discussed above, if dependent listers are susceptible to failure-to-add confirmation bias, then they should be less likely to list housing units deleted from the input list than those not deleted. Table 3.4 shows that only 63.8% of the deleted units were included in the dependent-listed frame, versus 95.7% of the units not deleted from the input listing. That is, removing cases from the input listing reduced the likelihood that these units would be listed by almost 32 percentage points. The difference in listing rates is even larger for housing units in multi-unit buildings, and less pronounced in single-family units. All differences in the listing rates between the deleted and unmanipulated lines in this table are highly statistically significant. To test the sensitivity of the difference in overall listing rates to individual lister behavior, I re-estimated this overall result, dropping the segments listed by each of the eleven listers in turn.
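That leave-one-lister-out check can be sketched as follows, on an invented toy dataset; the field names and helper functions are illustrative, not the study's actual code:

```python
def listing_rate(cases):
    """Share of cases the lister included on the frame."""
    return sum(c["listed"] for c in cases) / len(cases)

def leave_one_out_gaps(cases, listers):
    """Recompute the unmanipulated-minus-deleted listing-rate gap,
    dropping each lister's segments in turn."""
    gaps = {}
    for lister in listers:
        kept = [c for c in cases if c["lister"] != lister]
        unm = [c for c in kept if not c["deleted"]]
        dele = [c for c in kept if c["deleted"]]
        gaps[lister] = listing_rate(unm) - listing_rate(dele)
    return gaps

# Toy data: lister A misses her deleted unit, lister B adds hers back.
cases = [
    {"lister": "A", "deleted": False, "listed": 1},
    {"lister": "A", "deleted": True, "listed": 0},
    {"lister": "B", "deleted": False, "listed": 1},
    {"lister": "B", "deleted": True, "listed": 1},
]
gaps = leave_one_out_gaps(cases, ["A", "B"])
```

A stable range of gaps across the leave-one-out replicates, as reported next, indicates the effect is not driven by any single lister.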
The estimates of the difference in the listing rates between the unmanipulated and deleted units ranged from 25.0% to 34.0%, suggesting that the estimate shown in Table 3.4 is not due to unusual behavior by any one lister. Every lister failed to add back one or more deleted units in her segments.

Table 3.4: Percent of Unmanipulated and Deleted Units Listed in Dependent Listing

                         n   Percent Listed   F Statistic
  Overall
    Unmanipulated     8862            95.7%
    Deleted            556            63.8%
    Difference                        31.8%        31.94*
  Multi-Unit
    Unmanipulated     2808            98.7%
    Deleted            152            57.2%
    Difference                        41.5%        21.51*
  Single Family
    Unmanipulated     6054            94.3%
    Deleted            404            66.3%
    Difference                        28.0%        22.56*

  * Significant at the 5% level

The difference-in-differences estimate is given in Table 3.5. In the second column are the listing rates for the unmanipulated and deleted lines in the traditional (second) listing. (While these listers were not subject to the manipulation, by matching the frames together I can determine the share of these lines listed by the traditional listers.) The listing rates for the deleted units are consistently smaller than those for the unmanipulated lines, even among the traditional listers. This suggests that there is something different about the deleted lines that makes them harder to list, as expected due to the way in which lines were selected for deletion. For this reason, the results in Table 3.4 may overstate the failure-to-add effect by not controlling for this fact. Table 3.5 compares the listing rates for the unmanipulated and deleted units across the two listing methods. In every case the dependent listers included a larger share of the unmanipulated units and a smaller share of the deleted units. In the last column are the difference-in-differences estimates of the effect of deleting units from the input to the dependent listing, as defined in Equation 3.1.
In these 46 segments, removing cases from the input listing reduces the likelihood that those cases will be included on the frame by 26.5 percentage points. The results are again larger in magnitude for units in multi-family buildings.

Table 3.5: Failure-to-Add: Difference-in-Differences, in Percentage Points

                      Housing   Traditional   Dependent     D-in-D
                        Units   Pct. Listed   Pct. Listed   Estimate
  Overall
    Unmanipulated        8519        86.67%        96.11%     -26.46
    Deleted               535        81.68%        64.67%
  Multi-unit
    Unmanipulated        2803        84.94%        98.90%     -34.31
    Deleted               152        77.63%        57.23%
  Single family
    Unmanipulated        5716        87.51%        94.77%     -22.93
    Deleted               383        83.29%        67.62%

  Three segments dropped because matching is not complete; data will not match Table 3.4.

The results of the multi-level regression models are given in Table 3.6. The first set of variables are controls which have been found to be correlated with listing propensity in bivariate analyses in previous work (though they are not significant here). In the first model, the first variable in the second set shows that dependent listers are nine percentage points more likely to list the unmanipulated units in high manipulation segments than are traditional listers (β̂ = 0.094, z = 22.92). The next row in the table shows that the deleted units are four percentage points less likely to be listed by the traditional listers than those that were not manipulated (β̂ = -0.041, z = -3.40), which we also saw in Table 3.5. Unmanipulated units in multi-unit buildings are less likely to be listed by traditional listers (β̂ = -0.085, z = -12.68). In the next row, unmanipulated housing units in segments selected to be part of the low manipulation group do not have significantly different listing probabilities in traditional listing, as expected, because assignment to manipulation groups was random. The third set of variables tests the effect of the deletion of cases from the input list on the dependent listers. These variables are all interaction effects.
Table 3.6: Failure-to-Add: Listing Propensity Models on Unmanipulated and Deleted Cases

                                                        Model (1)            Model (2)
                                                      β̂        z           β̂        z
  Map: simple shape                                 0.018    (0.29)       0.018    (0.27)
  Map: invisible boundary                          -0.060   (-1.11)      -0.059   (-1.05)
  Pct. HUs rural                                   -0.072   (-0.62)      -0.073   (-0.60)
  Pct. HHs with income <= $50,000                   0.044    (0.36)       0.044    (0.35)
  Pct. pop. Afr.-Amer.                              0.032    (0.27)       0.030    (0.24)
  Pct. pop. Spanish language                        0.039    (0.12)       0.046    (0.13)
  Dependent method (1), Traditional (0)             0.094*** (22.92)      0.094*** (22.97)
  Unit deleted                                     -0.041*** (-3.40)     -0.041*** (-3.40)
  Multi-unit                                       -0.085*** (-12.68)    -0.086*** (-12.85)
  Low rate of manipulation in segment               0.051    (1.13)       0.052    (1.10)
  Unit deleted * Dependent method                  -0.252*** (-13.32)    -0.444*** (-13.98)
  Unit deleted * Dependent * Low rate of manip.     0.131*** (3.62)       0.097**  (2.66)
  Unit deleted * Dependent * Multi-unit            -0.100*** (-3.79)
  Unit deleted * Dependent * Entire multi-unit
    building deleted                                                      reference
  Unit deleted * Dependent * Unit in multi-unit
    building deleted                                                      0.219*** (4.55)
  Unit deleted * Dependent * Single family
    unit deleted                                                          0.269*** (7.92)
  Unit deleted * Dependent * All units on
    street deleted                                                        0.084*   (2.29)
  Constant                                          0.837*** (9.05)       0.835*** (8.65)
  StdDev(segments)                                  0.116                 0.121
  StdDev(residual)                                  0.269                 0.269
  rho                                               0.156                 0.169
  Observations                                      18108                 18108

  * p < 0.05, ** p < 0.01, *** p < 0.001

The first row in this set is the two-way interaction of the deletion of units from the input list and the dependent listing method. The manipulation of the input list has a strong and negative effect on the listing propensity of single-family units in high manipulation segments in dependent listing, and this effect is strongly significant (β̂ = -0.252, z = -13.32). Deleting these cases from the input list reduces their propensity to be listed by dependent listers by 20 percentage points (0.094 - 0.041 - 0.252 = -0.199) relative to the propensity of unmanipulated single-family cases in high manipulation segments in traditional listing. In the next row of this section of independent variables, the tendency of dependent listers to add back deleted single-family units is stronger in segments in the low manipulation group (that is, where the input list is of high quality) than in segments where the manipulation rate is higher (β̂ = 0.131, z = 3.62). Said the other way around, dependent listers are thirteen percentage points more likely to commit failure-to-add confirmation bias in segments where the list is of low quality. This result contradicts the hypothesis that when the list contains more errors, listers notice the problem and do a better job of fixing the input list, making fewer confirmation errors.

In the last row of this section of independent variables, the interaction of the deletion of units and dependent listing with multi-units is negative and significant (β̂ = -0.100, z = -3.79). That is, when units in multi-unit buildings are deleted from the input list, they are ten percentage points less likely to be added back by dependent listers than are deleted single-family units. The increase in failure-to-add error for multi-units is as expected and as found in the other analyses above.

The second model in Table 3.6 compares the different types of deleted units. Here the deletion of an entire multi-unit building is the reference category. Each of the other types of manipulations has a positive coefficient, meaning that the reference category is the one which experiences the most confirmation bias. When the deleted unit is a single unit in a multi-unit building, it is 22 percentage points more likely to be added back to the frame by dependent listers than are the units in the buildings which were deleted entirely (β̂ = 0.219, z = 4.55). Deleted single-family units are 27 percentage points more likely to be added back (β̂ = 0.269, z = 7.92).
When all units on a street segment were deleted (such as all units on the even side of the 400 block of Baltimore Ave), they are eight percentage points more likely to be added back than are the units in an entirely-suppressed multi-unit building (β̂ = 0.084, z = 2.29).

3.4.2 Failure-to-Delete

Turning attention to the 421 units added to the input list, if dependent listers are susceptible to failure-to-delete confirmation bias, then they should show a tendency not to delete these units. Table 3.7 shows that 24.9% of the added units were confirmed by the listers using dependent listing. A sensitivity analysis showed that this effect is not due to just one lister; dropping each lister in turn yielded a range of estimates from 14.4% to 27.7%. Two listers did not confirm any of the added units in their segments, though they did have among the fewest added cases (20 and 7). A larger share of the added units in multi-unit buildings were confirmed than those in single-family homes (27.5% versus 23.5%), though the difference between these two is not significant.

Table 3.7: Percent of Added Units Listed in Dependent Listing

                     n   Percent Listed
  Overall          421            24.9%
  Multi-Unit       153            27.5%
  Single Family    268            23.5%

The second failure-to-delete analysis, shown in Table 3.8, compares the listing rates for the added cases in each of the two listing methods. In the first row we see that 6.6% of the housing units that were added to the input list were also listed by the traditional lister. This result suggests that in a few cases I fabricated units that really did exist. (Eckman and Kreuter also found that one of the units they added existed in Ann Arbor. Of course, another possibility I must acknowledge for this finding is matching error.) The dependent listers confirmed 25.1% of the added cases. Adding units to the input list raises their listing propensity by 18.5 percentage points. Again the effect is stronger for units in multi-family buildings.
Table 3.8: Failure-to-Delete: Comparison of Listing Rates between Traditional and Dependent Listing for Added Cases

                   Housing   Traditional   Dependent     Difference
                     Units   Pct. Listed   Pct. Listed   (Pct. Points)
  Overall              410          6.6%         25.1%           18.5
  Multi-Unit           149         0.67%         26.8%           26.2
  Single family        261         10.0%         24.1%           14.2

The regression models in Table 3.9 expand upon these results. This linear probability model is run on both the traditional and dependent listings of the 410 added housing units in the 46 segments where the matching is complete. The dependent variable is a binary indicator of whether the lister deleted the added unit (1) or not (0). (Note that the dependent variable in this model, in contrast to the other regression models in Chapters 2 and 3, is not listing but deletion.) Negative coefficient estimates again signify characteristics associated with confirmation bias, just as in Table 3.6. The first set of independent variables in the table are again controls, and again none are significant. The second set of independent variables shows no strong effects for units in the low manipulation segments, as expected. Units in multi-unit buildings have a deletion propensity nine percentage points higher than single-family units (β̂ = 0.092, z = 2.74), meaning that listers of both methods were more likely not to list the 149 added units that were in multi-unit buildings.

Table 3.9: Failure-to-Delete: Deletion Propensity Models on Added Cases

                                                       Model (1)           Model (2)
                                                     β̂        z          β̂        z
  Map: simple shape                                -0.013   (-0.23)     -0.003   (-0.06)
  Map: invisible boundary                           0.023    (0.45)      0.026    (0.48)
  Pct. HUs rural                                   -0.005   (-0.04)      0.015    (0.11)
  Pct. HHs with income <= $50,000                   0.073    (0.64)      0.133    (1.07)
  Pct. pop. Afr.-Amer.                             -0.193   (-1.56)     -0.186   (-1.40)
  Pct. pop. Spanish language                        0.143    (0.44)      0.183    (0.52)
  Low rate of manipulation in segment               0.035    (0.63)      0.036    (0.63)
  Multi-unit                                        0.092**  (2.74)      0.037    (1.23)
  Dependent method (1), Traditional (0)            -0.145*** (-5.03)    -0.018   (-0.35)
  Dependent method * Low manip. rate                0.025    (0.37)      0.015    (0.23)
  Dependent method * Multi-unit                    -0.121**  (-2.62)
  Dependent * Units added on new street                                  reference
  Dependent * Unit added in multi-unit building                         -0.208**  (-2.99)
  Dependent * Building between others                                   -0.224*** (-3.97)
  Dependent * Single family turned into multi-unit                      -0.177**  (-2.70)
  Dependent * Added unit outside segment                                -0.094   (-1.41)
  Constant                                          0.908*** (11.44)     0.878*** (10.02)
  StdDev(segments)                                  0.053                0.069
  StdDev(residual)                                  0.315                0.313
  rho                                               0.028                0.047
  Observations                                        820                  820

  * p < 0.05, ** p < 0.01, *** p < 0.001

The third set of independent variables tests hypotheses about failure-to-delete confirmation bias. The first variable in the third set estimates the difference in the listing rates between the two methods for the added lines, the simple failure-to-delete effect. Listers using the dependent method are 15 percentage points less likely to delete the added single-family cases in high manipulation segments than the traditional listers (β̂ = -0.145, z = -5.03). The failure-to-delete effect does not interact significantly with segments in the low manipulation group. The effect is stronger for units in multi-unit buildings (β̂ = -0.121, z = -2.62): dependent listers are 12 percentage points less likely to delete these units than they are single-family units. The stronger effect for multi-units is as expected and as found in the other failure-to-delete (and failure-to-add) analyses.

The second model adds a test of the strength of the effects among the different kinds of added units. All of the estimated coefficients are negative, and all but the last are significant, meaning that the reference category (units added on a new street) is the type of manipulation that listers were least likely to confirm.
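Models of this general shape, a linear probability model with lister fixed effects and segment random intercepts, could be specified with, for example, statsmodels. The sketch below runs on simulated data; every column name is invented for illustration and none comes from the study's files:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative analysis file: one row per added housing unit and
# listing method. All names are hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "deleted_by_lister": rng.integers(0, 2, n),  # 1 = lister deleted the added unit
    "dependent": rng.integers(0, 2, n),          # 1 = dependent method
    "multi_unit": rng.integers(0, 2, n),
    "lister": rng.integers(0, 8, n),             # fixed effects via dummies
    "segment": rng.integers(0, 20, n),           # random intercepts
})

# Lister fixed effects enter through C(lister); segment random
# intercepts through the mixed model's groups argument.
model = smf.mixedlm(
    "deleted_by_lister ~ dependent * multi_unit + C(lister)",
    data=df,
    groups=df["segment"],
)
result = model.fit()
```

The `dependent:multi_unit` coefficient plays the role of the method-by-multi-unit interaction discussed above; on real data the binary outcome also argues for robust standard errors, which the chapter's linear probability framing implies.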
3.4.3 Summary and Interpretation of Results

The results above are clear evidence that both failure-to-delete and failure-to-add confirmation bias exist in dependent listing. Housing units inside the segment which are not on the input list are at risk of undercoverage. Those on the input list inappropriately are at risk of overcoverage. These results replicate previous findings on confirmation bias and demonstrate that the phenomenon is not localized to student listers or to southeast Michigan, as in the Eckman and Kreuter study. These analyses used a national dataset and quite experienced listers. I find strong support for my first two hypotheses about the existence of the two types of confirmation bias.

This paper expands on previous work not only geographically but also by exploring the housing unit and segment characteristics associated with confirmation bias. I hypothesized that confirmation bias, of both types, is more common in segments where traditional listers also have difficulties with undercoverage: rural segments, complex-shaped blocks, segments with invisible boundaries, and those with a high percent of high income households.

Table 3.10 summarizes the difference-in-differences results for failure-to-add and the single difference results for failure-to-delete by important segment and housing unit characteristics. These bivariate results provide tests of my hypotheses about the effect of segment characteristics on confirmation bias. (While I could have added tests of these hypotheses as additional manipulation effects in the multivariate models, I believe this table is easier to interpret.) Nonrural segments are more susceptible to both kinds of confirmation bias, in contrast to my hypothesis. I suspect this contradictory finding is due to the concentration of multi-unit buildings in nonrural segments.
Table 3.10: Comparison of Listing Rates of Manipulated Cases in Traditional and Dependent Listing, in Percentage Points

                              Failure-to-Add     Failure-to-Delete
                              (Diff-in-Diff)     (Difference)
    Overall                   -26.46             18.54
    Rural1                    -19.23             15.00
    Not Rural                 -27.45             18.92
    Map, simple shape         -14.80             13.69
    Map, complex shape        -28.03             19.58
    Invisible boundary        -26.68             22.84
    No invisible boundary     -25.68             12.92
    Low Income2               -15.34             7.75
    High Income               -34.26             23.49
    Low Manip. Rate3          -18.82             17.31
    High Manip. Rate          -27.19             23.68
    Shading indicates the member of each pair which is greater in absolute value
    1 Segments above median, percent of housing units in rural blocks
    2 Segments above median, percent of households with income < $50,000
    3 Low-low group compared to high-high group, see Table 3.3

Segments made up of complex-shaped blocks, those with nonvisible boundaries, and those with more high income households are more likely to experience confirmation bias, as hypothesized. These findings support my supposition that dependent listers commit confirmation error in situations that also give traditional listers trouble, suggesting that the input list serves as an authority in difficult listing situations.

I do not find strong support for my hypothesis that when the input list contains a good deal of error, listers realize it is no longer an authority and commit fewer confirmation errors. If that were the case, then a high error rate in the input list should lead listers to question the authority of the list and reduce instances of confirmation bias. Instead I find that the more error in the input list, the more likely is confirmation bias. The bivariate differences in Table 3.10 show less confirmation bias in segments where the list has fewer errors (is of high quality), not more.
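The bivariate comparisons in Table 3.10 come from simple cell means of unit-level listing outcomes. As a sketch, assuming a hypothetical record layout of `(method, manipulated, listed)` tuples rather than the actual analysis file, and assuming the failure-to-add quantity is the dependent-method rate drop for manipulated units net of the traditional-method rate drop:

```python
def listing_rate(records, method, manipulated):
    """Share of units a lister included on the frame, within one cell."""
    rows = [r for r in records if r[0] == method and r[1] == manipulated]
    return sum(r[2] for r in rows) / len(rows)

def failure_to_add_did(records):
    """Failure-to-add diff-in-diff, in percentage points: how much more the
    dependent listing rate falls for manipulated (deleted-from-input) units
    than the traditional listing rate does."""
    dep = listing_rate(records, "dependent", True) - listing_rate(records, "dependent", False)
    trad = listing_rate(records, "traditional", True) - listing_rate(records, "traditional", False)
    return 100 * (dep - trad)
```

The failure-to-delete column is a single difference and would be computed analogously from the added units' listing rates.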
The multivariate results also show that listers commit fewer errors, of both types of confirmation bias, when the manipulation rates are low (though the coefficient on this term is significant only in the failure-to-add model). These results suggest that appeal to authority is not the right framework for understanding the confirmation bias phenomena. Another possible explanation is that the error rates in this study (see Table 3.3) were not large enough. Or perhaps listers perceive error rates differently than I measure them here. Additional work is needed to uncover the mechanisms underlying confirmation bias.

3.5 Discussion & Conclusions

These results have important implications for surveys which use dependent listing in whole or in part to create housing unit frames, which includes many Census Bureau household surveys (U.S. Census Bureau, 2006), the National Survey of Family Growth (NSFG Cycle 7 Staff, 2008), the Residential Energy Consumption Survey (personal communication with Krishna Winfrey, National Opinion Research Center, January 14, 2010), and of course the decennial census. Surveys which rely on dependent listing are much more reliant on the quality of their input listing than has been thought. Errors of inclusion and exclusion on the input listing are likely to be transmitted to the final frame due to confirmation bias. If the kinds of units undercovered and overcovered by the input listing are different than those which are properly covered, then confirmation error can introduce coverage bias into survey data. For this reason, survey organizations that use dependent listing should have a good understanding of the determinants of the quality of their input lists.
Unfortunately, our understanding of the error in the listings and databases that serve as input lists is often based on studies which use dependent listing to identify errors (studies which use this technique include O'Muircheartaigh et al., 2003; Thompson and Turmelle, 2004; Turmelle et al., 2005; O'Muircheartaigh et al., 2006, 2007). The results in this paper suggest that those estimates of input frame quality are too high, as they themselves suffer from confirmation bias.

Indeed, the confirmation bias findings in this paper call into question the findings of all listing studies which use dependent listing to create a gold standard frame. The Census Bureau routinely checks listers' work by sending a senior field representative to list the segment, using the original frame as an input (personal communication with Rodrick Marquette, Mathematical Statistician, Decennial Statistical Studies Division, Census Bureau). Other studies which use this technique include Hansen and Steinberg (1956) and Pearson (2003). The assertion in these studies is that experienced listers using dependent listing produce a gold standard frame. All dependent listers in this study, however, had at least four years of experience and all committed confirmation bias. I discuss the implications of these three chapters for gold standard frame construction in Chapter 5.

This paper has largely treated failure-to-add and failure-to-delete confirmation bias as similar phenomena: both can lead to errors on housing unit frames and to bias in survey estimates. However, the two are very different in the minds of many survey practitioners. The listers (and SRC trainers) I spoke to were much more concerned with undercoverage than overcoverage (lister debriefings, personal communication with NSFG staff). I have heard trainers encourage listers to err on the side of inclusion, rather than exclusion: "When in doubt, list it."
This attitude comes from a belief that overcoverage is less of a concern than undercoverage, and that a reasonable amount of out-of-scope overcoverage is not a problem for surveys. While this instruction has good intentions, it may contribute to the failure-to-delete tendency observed in this analysis. During the debriefings conducted as part of this study, listers indicated that they were sometimes hesitant to remove units from the input listing (see Appendix F). Thus it seems that lister training may encourage failure-to-delete confirmation bias, but I have not seen any statements by lister trainers that would encourage failure-to-add confirmation bias.

It is true that it does not take much interviewer time to identify nonresidential or nonexistent units after they are selected. I am concerned, however, with multiple-probability overcoverage and the ways that the two types of overcoverage can blur together. For example, consider a single family home that has been incorrectly listed as two units, and the second unit has been selected. The appropriate behavior for the interviewer is to disposition the case as nonexistent (out-of-scope overcoverage). However, it is conceivable that some interviewers would think that, since the two units had been combined into a single-family home, the selected case now points to that single unit. If the interviewer proceeds to interview, then that single-family unit has two chances of selection and is overcovered in the sense of multiple-probability overcoverage defined above. More research is needed to understand how interviewers really do handle these sorts of situations. Yet the viewpoint of listers and managers that overcoverage is not a problem likely explains some of the tendency towards failure-to-delete bias I find in this study.

These three chapters together paint a picture of the difficulties listers face in creating housing unit frames. Listing is a complex task and no method seems clearly superior.
When the input list is very good, dependent listers can create high quality frames, though their improvements may not be worth the cost of time and travel. When the input list is not good, dependent listers have difficulties in the same areas that traditional listers do.

Clearly frame quality deserves more attention in the literature on household surveys. However, the most important issue for survey data quality is the contribution of undercoverage and overcoverage to bias in survey estimates. If the errors traditional and dependent listers make are not related to variables of interest to survey researchers, then we need not devote resources to quantifying, understanding and reducing them. The next chapter uses NSFG data to estimate the impact of housing unit undercoverage on estimates in that survey.

Chapter 4

Bias Due to Undercoverage in Housing Unit Frames

4.1 Introduction

The previous chapters have made important contributions to our understanding of errors in listed housing unit frames. However, questions still remain about the mechanisms of error in housing unit listing. Future studies of lister error should use larger samples with more segments and listers to test hypotheses derived from alternative theories. But before pursuing additional research into the mechanisms of error in housing unit listing, it is wise to check first whether these errors impact survey estimates.

Coverage bias is a risk whenever a frame contains undercoverage or multiple-probability overcoverage. Undercoverage occurs when listers fail to include on their frame one or more housing units that lie inside the segment. Multiple-probability overcoverage occurs when listers include units from outside the segment. In a survey of residents of households, such as the Current Population Survey, if the people who live in undercovered or overcovered housing units are different than those who live in the correctly covered units, survey data can be biased.
In a study of housing conditions, such as the Residential Energy Consumption Survey, it would be enough that the undercovered or overcovered units themselves were different, regardless of the characteristics of the inhabitants.

The NSFG dataset used in Chapters 2 and 3 supports estimation of bias due to undercoverage in housing unit frames. Using the completed interviews in the segments selected for relisting, I can estimate the bias in important variables if the cases vulnerable to undercoverage in the second and third listings were not covered. The dataset does not permit estimates of bias due to multiple-probability overcoverage because it does not contain indicators of which units, if any, are outside of the segment or otherwise had more than one chance to be selected.

There are few estimates of coverage bias in the literature on area probability surveys. Coverage bias is difficult to estimate: one must know not only which units are undercovered and overcovered, but must also have data about the undercovered cases. There is indirect evidence that implausible estimates of relative school enrollment and labor force participation rates from the Current Population Survey by race and sex (Clogg et al., 1989) and victimization rates from the National Crime Study (Martin, 1981; Cook, 1985) are due to coverage bias. However, these studies do not look specifically at bias due to housing unit listing. To my knowledge this chapter offers the first estimates of bias due to errors in housing unit frames.

4.2 Data

This chapter again uses the multiple listing dataset discussed in Chapters 2 and 3. This dataset contains three listings of a sample of 49 segments from quarter 12 of Cycle 7 of the National Survey of Family Growth (NSFG). See Appendix C for a discussion of how the three frames were matched together. Using the response data collected by NSFG, I can estimate undercoverage bias in means of NSFG variables.
4.2.1 Survey Background

NSFG data produce important estimates on fertility and family formation behavior that are used by demographers to model the size and composition of the population in the future. These data are also a valuable resource to researchers studying marriage, divorce, fertility, adoption, sexually transmitted infections, and more. The survey is conducted for the Vital Statistics Division at the National Center for Health Statistics (NCHS). Data collection is underway for Cycle 7 of NSFG.

Each quarter, interviewers receive an assignment of selected cases in their segments and approach each household to participate in the survey. They first attempt to screen the household to determine if any residents are eligible (15-44 years old), then select an eligible member and continue to the interview. After eight weeks, the remaining nonrespondent cases are subsampled to concentrate the interviewer efforts on fewer cases and thus improve the representativeness of the respondent sample. More details on all stages of the training, sample selection, contact, screening, subsampling and interview procedures are available in Groves et al. (2009).

From the segments in the listing study, 1,994 cases were selected for the NSFG study.1 24 of these cases were found to be improper listings, nonresidential or outside of segment, leaving 1,970 cases approached for the screener. 81.3% of the proper listings completed the screener. 56.1% of the screened households contained one or more eligible persons, and 75.5% of the selected respondents completed the interview (see Table 4.1).

1 Although 49 segments were selected and listed for this study, the counts given here and in the rest of the paper refer only to cases in the 46 segments where matching is complete. See Appendix C.

Table 4.1: Sample Performance in Quarter 12 of NSFG, Selected and Matched Segments

                               Selected    Good HUs    Screened    Eligible    Interviewed
    Cases                      1994        1970        1602        898         678
    Pct. of Selected                       98.8%       80.3%       45.0%       34.0%
    Pct. of Previous Column                98.8%       81.3%       56.1%       75.5%
    Percent unweighted for selection probabilities

Just over half of the completed interviews (56%) were with female respondents. Due to the topic of the survey, the questionnaires are quite different for male and female respondents. See Appendix H for outlines of the questionnaires.

4.2.2 Variable Selection

For the bias analyses I chose variables that are in both the male and female questionnaires to keep the sample size large. The data from Cycle 7 will not be released or reported on until 2011. However, the Vital Statistics Division has allowed me to access response data in advance of their release. These data are confidential and sensitive. As part of my agreement with NCHS, I cannot present any data that could be used to make forecasts about the Cycle 7 NSFG data. All of the variables for which I estimate coverage bias below are disguised. I use uninformative names and center the data. The continuous variables are also standardized (divided by the standard deviation of the variable). These manipulations ensure that readers of this dissertation cannot glean advance information about the Cycle 7 data still being collected. Table 4.2 gives some information about the variables selected for the bias analyses.
Table 4.2: Variables Used in Bias Analysis

    Variable    Topic          Type          n
    M1          Health         Proportion    678
    M20         Health         Proportion    678
    M2          Sexual         Proportion    665
    M7          Sexual         Proportion    677
    M19         Sexual         Proportion    667
    M22         Sexual         Proportion    668
    M31         Sexual         Proportion    658
    M28         Demographic    Proportion    678
    M17         Demographic    Proportion    678
    M27         Financial      Proportion    656
    M4          Demographic    Count         678
    M6          Demographic    Count         678
    M24         Demographic    Count         678
    M15         Financial      Continuous    678
    M32         Financial      Continuous    669

4.2.3 Undercoverage in NSFG Listing

Despite these limitations on the identification of the variables in my dataset, the multiple listing conducted in conjunction with NSFG offers a unique resource for the estimation of undercoverage bias in housing unit frames. The second and third listings of the segments contained a good deal of undercoverage, and response data are available for many of the undercovered cases. The first row of Table 4.3 gives the number of cases in my segments at each stage of the interview process. The next two rows show the percent of cases at each stage that were listed by the second and third listers. 88.6% of the selected cases were on the traditionally listed frame, and slightly more, 93.0%, were covered by the dependent-listed frame. The second column refers to those cases which were found by the interviewer to be appropriate listings: residential and inside the segment. Moving from left to right in the table progresses through the stages of the survey: screening, eligibility and interview. In the last column, the traditional listers undercovered almost ten percent of the cases that completed the interview, and the dependent listers just over five percent. The listing rates in both frames increase across the columns, meaning that the cases that continued through later stages of the interviewing process were easier to list than those that did not progress, an interesting finding I will return to in the discussion.
The last two rows of Table 4.3 separate the cases on the dependent frame into those that were not manipulated and those that were deleted from the input list. 338 of the 1,994 selected cases were deleted from the input listing. (None of the selected cases correspond to housing units added to the input listing.) As found in Chapter 3, dependent listers were much less likely to list the cases that were deleted than those that were not deleted. This finding is reinforced by the coverage rates in the last two rows of Table 4.3.

Table 4.3: Percent of Cases Listed by Second and Third Listings, by Survey Stage

                            Selected    Good HUs    Screened    Eligible    Interviewed
    Total                   1994        1970        1602        898         678
    Traditional Listing     88.6%       89.2%       90.1%       90.2%       90.6%
    Dependent Listing       93.0%       93.5%       93.6%       94.1%       94.8%
    Unmanipulated           96.9%       97.3%       97.3%       97.6%       97.5%
    Deleted                 74.0%       74.8%       75.0%       75.9%       81.1%
    Refers to only the 46 segments where the matching is complete.

4.3 Methods

These coverage rates indicate that there is quite a bit of undercoverage in the second and third listings. However, just as nonresponse rates are not necessarily good predictors of nonresponse bias (Groves, 2006; Groves and Peytcheva, 2008), coverage rates are unlikely to be good predictors of coverage error. This paper uses two methods to estimate hypothetical coverage bias due to undercoverage of housing units by the traditional and dependent listings in NSFG variables. Each approach has strengths and weaknesses, and together they can provide a sense of the risk of bias due to undercoverage.

4.3.1 Direct Approach to Bias Estimation

The first method, which I call the direct method, is simply the difference between the mean calculated on the covered cases and the mean on all the cases. Let Ȳ be the estimate of the mean or proportion of a given variable on the 678 responding cases in my 46 segments.2 This mean will be 0 for all variables, due to the centering as discussed above.
Let Ȳ_trad be the same mean calculated on only those selected and completed cases which were also included in the traditional (second) listing. There are 614 such cases. Let Ȳ_dep be the mean calculated on only those cases covered by the dependent listing, 643 cases. Then the direct estimates of bias are:

    bias^direct_trad(Ȳ) = Ȳ_trad - Ȳ = Ȳ_trad
    bias^direct_dep(Ȳ) = Ȳ_dep - Ȳ = Ȳ_dep

(because Ȳ = 0 for all variables Y).

2 Due to some missing data among the completed cases, as shown in Table 4.2, the exact number of cases may be smaller for some variables.

While this estimation method has intuitive appeal, the bias estimates it produces for the dependent listing method necessarily reflect the manipulation of the input list. Dependent listers were much less likely to include the deleted units, and the manipulation may affect the bias estimates if the deleted cases are different than the unmanipulated cases on the survey variables. Using the direct method, it is not possible to remove the effect of the manipulation of the input list from the bias estimate.

4.3.2 Indirect Approach to Bias Estimation

The second approach to estimating bias, which I call the indirect approach, takes advantage of the listing propensity models used in previous chapters. The indirect estimate of bias is:

    bias^indirect(Ȳ) = S_Yρ / ρ̄        (4.1)

where ρ_i is the coverage propensity of housing unit i, ρ̄ is the average propensity among the covered and undercovered cases, and S_Yρ is the covariance between the propensity and the variable of interest (adapted from Bethlehem, 2002). Bias will be large when the listing propensity is highly correlated with the survey variable, or the average listing propensity is low. If the listing propensity model is correctly specified, the indirect method should give the same estimates as the direct method. The indirect method can produce estimates of bias due to undercoverage by traditional and dependent listing, just as the direct method does.
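Given centered response data, a coverage indicator, and predicted propensities, both estimators reduce to a few lines. A minimal sketch with hypothetical array names (`y`, `covered`, `propensity`):

```python
import numpy as np

def direct_bias(y, covered):
    """Direct estimate: mean over covered cases minus mean over all cases.
    Because the variables are centered, the full-sample mean is zero."""
    y = np.asarray(y, dtype=float)
    covered = np.asarray(covered, dtype=bool)
    return y[covered].mean() - y.mean()

def indirect_bias(y, propensity):
    """Indirect (Bethlehem-style) estimate from equation 4.1:
    covariance of Y and the listing propensity, divided by the
    average propensity."""
    y = np.asarray(y, dtype=float)
    rho = np.asarray(propensity, dtype=float)
    cov = np.mean((y - y.mean()) * (rho - rho.mean()))
    return cov / rho.mean()
```

If the propensity model is correctly specified, the two functions should agree; in practice `indirect_bias` can also be run with counterfactual propensities, which the direct method cannot do.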
I fit two logistic models to estimate the propensity of each completed case to be covered by each of the two listing methods, ρ_trad and ρ_dep. Plugging each of the propensities into the numerator and denominator of equation 4.1 leads to indirect estimates of bias due to the traditional and dependent listing methods.

A strength of the indirect method is that I can use the listing propensity models to simulate bias under different conditions. Specifically, the indirect method allows for the estimation of bias in dependent listing if the input list had not been manipulated. The weakness of this method is that it is quite model dependent. Different models of listing propensity may lead to different estimates of bias.

4.3.2.1 Listing Propensity Models

The models used to predict listing propensity for the indirect bias estimation differ in several ways from the models presented in earlier chapters. Here the goal of the models is not to study the characteristics that make listing more or less likely, but to estimate the probability that each unit will be listed, given its values on the relevant characteristics. I fit separate models for traditional and dependent listing. The dependent variable in the models is again a binary indicator of whether the lister included the housing unit (1) or did not (0). Both models use logistic regression to ensure that all predicted propensities are within the range (0,1). Only the 678 completed cases are included in the models. This reduction in sample size necessitates rethinking the structure of the models, as there are fewer cases within each segment and lister.3 The models in this chapter do not contain any random or fixed effects for segments or listers. In the earlier versions of the models, fixed effects for listers removed the idiosyncratic effects of the particular lister used in each segment to isolate the effect of lister characteristics in the regression coefficients.
However, coefficient estimation is not the goal of the models in this chapter, and thus this precaution is not necessary. The simpler models used here do account for the clustering of housing units into segments in calculating standard errors, but without explicitly modeling the segments with random effects. The benefit of running unclustered models without fixed or random effects is that more post-estimation and diagnostic tools are available.

The estimated odds ratios for the explanatory variables and fit statistics are given in Table 4.4 for the traditional and dependent listing propensity models, though I do not interpret them here.4 I tested several versions of the models with different covariates until I found those that had high AUC values (area under the ROC curve), indicating a strong ability to discriminate between the listed and unlisted cases, and low dbeta statistics, indicating the model fits all the data points rather well (Hosmer and Lemeshow, 2000; Long and Freese, 2005).

3 Two of the 46 segments contain zero completed interviews. Only one respondent was selected in these segments, due to very low eligibility rates, and s/he did not complete the interview.
4 For interpretation of listing propensity in the two listing methods, see Chapters 2 and 3.

Figures 4.1(a) and 4.1(b) show the kernel densities of the predicted propensities from the traditional and dependent listing models. The distributions have very similar shapes and ranges. The majority of the housing units have very high propensities, near 100%, due to the high coverage rates in each method (89.2% in the traditional listing and 93.5% in the dependent listing).

The listing propensities from the dependent listing model incorporate flags for the effect of the cases deleted from the input listing. 111 of the 678 completed cases were deleted from the input list.
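As an illustration of this fitting-and-checking loop (not the actual estimation software used here), a plain Newton-Raphson logistic fit plus a rank-based AUC can be written with numpy alone; the design matrix below stands in for the covariates of Table 4.4:

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Logistic regression via Newton-Raphson; X must include a constant
    column. np.exp(beta) gives odds ratios as reported in Table 4.4."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted propensities
        w = p * (1.0 - p)                     # IRLS weights
        beta += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
    return beta

def auc(y, score):
    """Area under the ROC curve via the Mann-Whitney identity:
    fraction of (listed, unlisted) pairs ranked correctly, ties half."""
    y = np.asarray(y, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y], score[~y]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties
```

Candidate covariate sets would be compared by refitting and recomputing `auc` on the predicted propensities, keeping the specification that discriminates best.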
This manipulation has a strong effect on listing propensity, the failure-to-add effect, as shown in the odds ratio on the last variable in Table 4.4. These deleted cases are disproportionately missing from the frames created by the dependent listers.5 The low listing propensities of these cases could affect the estimates of bias from the direct method. If these cases are different than the unmanipulated cases on any variables chosen for the bias analysis, then the estimates of bias will be affected by the manipulation.

5 Note that the other type of input list manipulation, the addition of units, does not affect the propensities of the selected cases because none of these cases was on the first listing and thus none was eligible for selection for NSFG screening and interviewing. For this reason, the only manipulation I need to worry about when computing propensities for the selected cases is the deletion of units listed by the first lister.

Table 4.4: Listing Propensity Model

                                                Traditional Listing      Dependent Listing
                                                Odds Ratio    z          Odds Ratio    z
    Multi-Unit                                  0.130**      (-2.93)     1.910        (0.77)
    Map, invisible boundary                     0.040*       (-2.53)     0.893        (-0.13)
    Map, simple shape                           1.624        (0.42)
    Segment has external water boundary                                  1.020        (0.02)
    Pct. HUs rural                              0.599        (-0.37)     0.010*       (-2.09)
    Pct. HHs with income ≤ $50,000              0.564        (-0.45)     10.009       (1.06)
    Pct. Pop. Afr.-Amer.                        1.802        (0.32)      0.509        (-0.28)
    Pct. Pop. Spanish language                  0.035        (-0.63)     0.036        (-0.39)
    Gated communities in segment                19.566*      (2.15)      4.598        (0.55)
    Lister drove herself while listing          0.549        (-0.60)     5.567        (1.08)
    Lister feels unsafe                         0.082*       (-2.28)     0.180        (-0.71)
    Lister and segment language match           5.423        (1.64)      13.290       (1.65)
    Years of interviewer experience             1.095        (0.56)      1.403        (1.41)
    Lister African-American                     1.003        (0.00)      0.026        (-1.36)
    Lister speaks Spanish                       0.061*       (-2.32)     0.799        (-0.10)
    Lister reports Spanish speakers             2.199        (0.56)      3.933        (0.56)
    Units in segment predominately trailers     0.017**      (-2.94)     0.045***     (-3.37)
    HU deleted from input listing to L3                                  0.082**      (-3.20)
    Constant                                    269.261***   (3.70)      0.399
    AUC                                         0.853                    0.927
    Pseudo-R2                                   0.252                    0.374
    Observations                                678                      678
    * p < 0.05, ** p < 0.01, *** p < 0.001

The indirect method presents an opportunity to remove the effect of the manipulation of the input list from the bias calculation. Recoding the relevant variable on the manipulated units to reflect no manipulation and re-estimating the dependent listing propensity of these cases produces a third predicted listing propensity for every case, ρ_dep, no manip.6 Using this propensity, the indirect method of bias estimation can estimate the bias in the survey variables had the manipulation of the input list not been done.

[Figure 4.1: Distribution of Predicted Listing Propensities, by Listing Method. Kernel densities of the predicted propensities on (0,1): (a) Traditional Listing; (b) Dependent Listing, With Manipulations; (c) Dependent Listing, Without Manipulations.]

Figure 4.1(c) shows the distribution of the predicted listing propensities for this counterfactual situation, had the input to the dependent listing not been manipulated. The predicted propensities are all larger than 50%, higher than the predicted propensities from the other models.

The outcome of these models is three predicted listing propensities for each of the interviewed cases. The indirect method of bias estimation can approximate bias under each of these listing scenarios.

4.3.3 Variance of Bias Estimates

These bias estimates are useful only when accompanied by a measure of their precision. The Jackknife procedure can put a confidence interval around each of the estimates (Wolter, 2007, Chapter 4). The Jackknife is useful in estimating variances in many complex designs involving clustering and weighting (Lohr, 1999, pp. 304-306).
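The delete-1 Jackknife over PSU groups used in this section can be sketched in a few lines; the argument names are hypothetical, and the estimator passed in below is a simple mean standing in for the bias estimators:

```python
import numpy as np

def jackknife_se(estimator, data, psu_ids):
    """Delete-1 Jackknife over PSUs: drop each PSU in turn, re-estimate,
    and take squared deviations around the full-sample estimate."""
    data = np.asarray(data, dtype=float)
    psu_ids = np.asarray(psu_ids)
    full = estimator(data)
    groups = np.unique(psu_ids)
    R = len(groups)
    replicates = np.array([estimator(data[psu_ids != g]) for g in groups])
    variance = (R - 1) / R * np.sum((replicates - full) ** 2)
    return np.sqrt(variance)
```

In the dissertation's application, `estimator` would recompute the direct or indirect bias estimate on the cases in the 12 retained PSUs, with R = 13.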
The form of the Jackknife used here, the delete-1 procedure, involves dividing the sample into R groups, and repeating the relevant estimation procedures R times, dropping each of the R groups one at a time. The R groups were the 13 PSUs selected into the listing project. (See Chapter 2 for more details on the selection of PSUs and segments for this project.)

6 ρ_dep, no manip = ρ_dep for all unmanipulated cases.

Consider the direct estimates of bias in the mean of variable Y due to undercoverage in the traditional listing, bias^direct_trad(Ȳ). Let bias^direct_trad(Ȳ_(r)) be the estimate of bias when the rth PSU is dropped. This calculation involves re-estimating both the mean on all the cases in the undropped PSUs (Ȳ_(r)) as well as the mean on the cases covered by the second lister (Ȳ_trad,(r)), and then taking the difference. This process yields R different direct estimates of the bias. The variance of these R estimates around the bias estimate on all PSUs7 gives an estimate of the variance of the original estimate. The form of the Jackknife estimator of the variance of the estimate of bias in variable Y, due to traditional listing, is:

    V̂[bias^direct_trad(Ȳ)] = ((R - 1) / R) Σ_{r=1}^{R} [bias^direct_trad(Ȳ_(r)) - bias^direct_trad(Ȳ)]²        (4.2)

The square root of this variance is the standard error of the bias estimate.

7 As originally developed, the squared deviations in the Jackknife formula were taken around the average across the R estimates, not around the full sample mean. However, it is now common to use the full sample mean in this calculation (Wolter, 2007, p. 153).

The Jackknife variance calculation for the indirect estimate is very similar. For each survey variable and each of the three listing propensities (ρ_trad, ρ_dep, and ρ_dep, no manip), drop each PSU in turn, recalculating both the numerator and the denominator of the bias estimator. This procedure leads to R estimates. The sum of the squared differences between each estimate and the full sample estimate, times an adjustment factor, estimates the variance in the full sample estimate.

    V̂[bias^indirect_dep(Ȳ)] = ((R - 1) / R) Σ_{r=1}^{R} [bias^indirect_dep(Ȳ_(r)) - bias^indirect_dep(Ȳ)]²        (4.3)

These estimated variances and standard errors reflect the impact of the experimental design only, not any sampling variance due to the selection of segments or cases.8

8 The variance estimation procedure also does not account for the uncertainty in the predicted listing propensities.

4.4 Results of Bias Analyses

The two methods of bias estimation produce five estimates of bias for each variable: one for each of the two listing methods, from both the direct and indirect methods, plus one additional estimate from the indirect method using the propensity that removes the effect of the listing manipulations. These estimates are presented below in tabular and graphical form to permit comparisons among methods and variables.

Direct estimates of bias in NSFG variables due to the traditional and dependent listings of each segment are shown on the left side of Figure 4.2. The top left panel of Figure 4.2 shows bias in proportions and the bottom panel shows bias in continuous variables. These are estimates not of the bias in official Cycle 7 data, but of the bias that would be introduced had these alternative frames been used instead of the initial listing.

Each row in the top left panel corresponds to one of the ten proportions selected for bias estimation. Along the horizontal axis is the bias scale in percentage points, with 0 representing no bias. For each variable there are two points. The circle shows the bias in the proportion if the completed cases in housing units that were not listed by the traditional lister were dropped from the calculation. The square shows the bias if the cases not covered by the dependent lister were dropped. The rows are sorted by the size of the bias due to traditional listing.
In the first row of Figure 4.2(a), undercoverage in the traditional listing would have led to bias of about -0.35 percentage points in proportion M22. If the proportion calculated on the full sample for this variable were 50%, then the calculation on only the cases covered by the traditional listers would be 49.65%. The bias due to undercoverage in dependent listing is smaller on this variable: calculating the proportion on only the cases listed by the dependent lister would yield an estimate with a very small positive bias (0.05 percentage points), indicated by the square just to the right of the reference line in the first row. (While these are small effects, they could be large in relation to the mean. A change of 0.35 percentage points if the mean of this variable were 2% would be a large relative effect. Because the variables are centered, we cannot see the relative effect here.) The bias due to dependent listing in the second row is the largest in absolute value, just under one percentage point in the negative direction. No proportion would be biased by more than one percentage point in either direction.

The lower left panel of Figure 4.2 shows the bias estimates for the continuous variables.
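The distinction between absolute and relative bias drawn in the parenthetical above can be made concrete (an illustrative Python snippet, not from the dissertation):

```python
def relative_bias(bias_pp, mean_pct):
    """Express a bias in percentage points relative to the variable's mean
    (also in percent). Both arguments are illustrative values."""
    return 100 * bias_pp / mean_pct

# The same -0.35 point bias is tiny against a 50% mean
# but large against a 2% mean:
against_large_mean = relative_bias(-0.35, 50)  # about -0.7% of the mean
against_small_mean = relative_bias(-0.35, 2)   # about -17.5% of the mean
```

The absolute shift is identical in both cases; only the denominator changes.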
The horizontal axis in this graph is in percent of standard deviation units.

[Figure 4.2: Estimates of Bias in Survey Variables Due to Undercoverage, by Listing Method and Estimation Method. Panels: (a) Direct Estimates of Bias, Proportions; (b) Indirect Estimates of Bias, Proportions; (c) Direct Estimates of Bias, Continuous; (d) Indirect Estimates of Bias, Continuous.]

For example, in the first row of Figure 4.2(c), the bias in variable M4 due to undercoverage in traditional listing would be four percent of one standard deviation in the negative direction. The bias in the same variable due to undercoverage in dependent listing would be slightly smaller in absolute value but also in the negative direction. Listers using both methods tended to undercover cases where the value on this variable was greater than the full sample mean. No estimate would move more than six percent of one standard deviation in either direction due to undercoverage.

There is no bias due to either method for M15, a financial variable. M32 is also financial and shows little bias. Results in earlier chapters suggested a correlation between the percent of households in a segment living below the poverty line and the coverage rate in the segment. We might then expect to find bias in these financial variables due to undercoverage, but we do not.

The two graphs on the right side of Figure 4.2 display the indirect bias estimates for the same variables (in the same order).
The horizontal axes in each panel on the right are in the same units as the corresponding panels on the left. Figures 4.2(b) and 4.2(d) each show three estimates, corresponding to the bias due to undercoverage in traditional listing, dependent listing with the manipulations of the input list, and dependent listing without the manipulations in the input list.

In the first row of Figure 4.2(b), traditional and dependent listing would each lead to slight positive bias in variable M22, and dependent listing without the manipulations would lead to a small negative bias. In the second row, traditional listing leads to negative bias of 1.2 percentage points. The two dependent methods would lead to biases that are approximately equal in magnitude but of different signs. Most of the bias estimates in the upper right panel are close to zero, though the overall range of estimates is slightly larger than the direct estimates for these variables, given in Figure 4.2(a).

The graph in the lower panel shows the three bias estimates for each of the five continuous variables. For each variable, the bias associated with both kinds of dependent listing is smaller (in absolute value) than that from traditional listing. The range here is narrower than the direct estimates of bias for the same variables in the lower left panel. The indirect estimates of variance for these variables are in general smaller than the direct estimates. Here we do not see smaller bias effects for the two continuous financial variables, M15 and M32.

Table 4.5 presents the estimates behind these graphs, as well as confidence intervals for each estimate. When assessing the significance of as many estimates as are given in Table 4.5 (30 direct + 45 indirect), it is wise to use a higher probability cutoff than 95%: at the 95% level, we would expect three or four estimates in any set of 75 to appear significant by chance alone. For this reason, the table shows 99% confidence intervals rather than the standard 95%.
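The arithmetic behind the choice of a 99% cutoff can be checked directly (an illustrative Python snippet; the function name is my own):

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of true-zero effects flagged significant by chance,
    assuming independent tests at significance level alpha."""
    return n_tests * alpha

at_95 = expected_false_positives(75, 0.05)  # about 3.75 spurious hits
at_99 = expected_false_positives(75, 0.01)  # about 0.75 spurious hits
```

With 75 estimates, the 95% level would be expected to flag three or four null effects as significant, which is why the stricter 1% level is used in Table 4.5.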
None of the bias estimates are significantly different from zero at the one percent level.

Table 4.5: Bias Estimates with 99% Confidence Intervals for All Variables and Methods

           Direct Bias Estimates                              Indirect Bias Estimates
Variable   Trad (CI)                Dep (CI)                  Trad (CI)                Dep, w/ manip (CI)         Dep, no manip (CI)

Proportions (units: percentage points)
M22        -0.342 (-1.068, 0.384)   0.050 (-0.376, 0.475)     0.202 (-0.865, 1.27)     -0.074 (-0.574, 0.426)     0.076 (-1.495, 1.647)
M7         -0.338 (-1.335, 0.659)   -0.968 (-1.948, 0.011)    -1.265 (-2.871, 0.341)   -0.177 (-3.696, 3.343)     0.181 (-3.319, 3.681)
M1         -0.312 (-1.092, 0.468)   0.240 (-0.366, 0.847)     0.267 (-0.738, 1.272)    0.514 (-0.333, 1.361)      0.004 (-0.346, 0.354)
M17        -0.309 (-2.523, 1.905)   0.214 (-2.39, 2.818)      0.009 (-6.288, 6.306)    -0.513 (-19.077, 18.051)   0.527 (-12.94, 13.994)
M19        -0.135 (-1.045, 0.775)   -0.070 (-1.227, 1.087)    -0.091 (-0.656, 0.474)   -0.153 (-1.558, 1.252)     0.157 (-3.12, 3.434)
M31        -0.030 (-0.996, 0.936)   0.392 (-0.838, 1.621)     0.850 (-1.082, 2.783)    -0.015 (-1.788, 1.758)     0.016 (-0.416, 0.448)
M2         0.033 (-0.971, 1.038)    0.528 (-0.575, 1.631)     -0.564 (-1.741, 0.613)   -0.024 (-2.4, 2.353)       0.024 (-0.501, 0.55)
M20        0.128 (-0.033, 0.289)    -0.080 (-0.556, 0.396)    -0.011 (-0.306, 0.284)   0.015 (-0.174, 0.204)      0.160 (-0.162, 0.481)
M27        0.379 (-0.852, 1.611)    0.166 (-1.631, 1.962)     0.000 (-0.685, 0.684)    -0.051 (-2.132, 2.029)     0.052 (-1.426, 1.531)
M28        0.460 (-0.981, 1.901)    0.602 (-0.888, 2.091)     0.423 (-0.966, 1.813)    0.552 (-0.821, 1.926)      0.207 (-0.68, 1.094)

Continuous (units: percent of standard deviation)
M4         -4.070 (-11.774, 3.634)  -3.158 (-12.107, 5.79)    -2.055 (-5.668, 1.557)   -0.158 (-11.882, 11.566)   0.162 (-2.828, 3.153)
M24        -0.501 (-2.934, 1.932)   -1.193 (-5.003, 2.618)    -0.634 (-3.193, 1.925)   -0.039 (-1.852, 1.775)     0.040 (-1.215, 1.294)
M15        -0.036 (-0.12, 0.049)    -0.025 (-0.12, 0.07)      -0.535 (-4.096, 3.026)   -0.095 (-6.386, 6.196)     0.098 (-2.807, 3.002)
M32        0.902 (-0.2, 2.004)      0.511 (-0.881, 1.903)     0.675 (-1.1, 2.45)       -0.144 (-1.525, 1.237)     0.148 (-2.746, 3.041)
M6         5.310 (-2.657, 13.277)   4.631 (-3.354, 12.616)    2.082 (-2.599, 6.762)    0.556 (-2.526, 3.639)      0.752 (-2.663, 4.166)

No bias estimates are significant at the 1% level.

4.5 Discussion & Conclusion

The bias estimates developed and presented in this chapter reflect the hypothetical risk of bias in several variables collected by NSFG. These estimates are of course survey-specific and variable-specific (and statistic-specific: I have looked only at means, not other statistics that could be calculated from the same variables, such as totals, medians, etc.). None of the 75 estimates of bias are significant at the one percent level. A larger sample of cases or segments would perhaps detect significant differences in these estimates, but the current study is too small and clustered to do so.

To my knowledge, these are the first estimates of coverage bias due specifically to housing unit coverage error in the survey literature. Coverage error is rarely studied because of the difficulties involved in both identifying the undercovered cases and collecting data about them. The unique design of this listing study made these estimates possible. The multiple-listing approach revealed the cases at risk of undercoverage in each listing method, while the survey collected data on these cases.

Despite the unique contribution of this study, several shortcomings should be noted. This chapter can calculate bias due only to undercoverage by the second and third listers. Although the first listing received more quality checks than the others and is likely of higher quality, it too may contain some undercoverage. There were quite a few housing units listed by the second and third listers yet missed by the first. Some of these cases may be good listings, but they were not eligible for selection and no data exist for them. A richer dataset would have data about these cases and support estimates of bias due to undercoverage in the first listing as well.
The adjusted dependent listing propensity (Figure 4.1(c)) represents a listing situation that is not much more realistic than the one which does reflect the manipulations (Figure 4.1(b)). With the manipulations modeled away, the input listing already contains all of the cases to be covered. But the frames that are used as input to dependent listing do contain errors of inclusion and exclusion (O'Muircheartaigh et al., 2006, 2007; Montaquila et al., 2010). A more realistic approximation of dependent listing as it is actually used would involve modeling errors in the input list that mimic the errors these lists really contain. However, this approach is outside the scope of this chapter.

The bias and standard error calculations in this chapter are unweighted for the PSU, segment, housing unit, respondent and subsampling selection probabilities. This is unfortunate and may very well affect my estimates of bias and variance. For example, Chapter 2 showed that housing units in rural areas are less likely to be covered by traditional listers. The rural segments in this study have low probabilities of selection, due to their low housing unit counts, and thus cases selected from these segments should have larger weights than the cases in other segments, where coverage rates are also higher. If the coverage bias in rural segments has a different pattern than in the non-rural segments, including the selection weights will change the overall estimates of bias in favor of the pattern in the heavily-weighted rural segments. I will rerun all bias and variance calculations later this year when the weights are available from the Survey Research Center, the NSFG data collection contractor. At that time I will also incorporate adjustments to the weights for both the selection of quarter 12 from the 16 quarters of Cycle 7 data collection and for the selection of segments for my study.

The bias estimates found in this chapter are rather small in magnitude. Looking back at Equation 4.1, for a given set of listing propensities, the size of the bias is due to the relationship between the propensities and the survey variables. Many of the variables collected in the NSFG survey are related to marriage, reproduction, and sexual histories (see Appendix H). The theories and models tested in Chapters 2 and 3 do not suggest strong correlations between these variables and listing propensities, which should lead to low bias. For the bias analysis, I chose some variables on fertility and sexuality topics and others on more general topics, to speak to bias in other surveys as well. Bias estimates for all the items explored were small. However, bias that is small in absolute terms may be large in relative terms.

One finding in this chapter, also mentioned briefly in Chapter 2, is the relationship between listing propensity and response propensity. The coverage rates on all cases are lower than those on the screened cases, which are lower than coverage rates on the interviewed cases (see Table 4.3). The cases that progress through the survey, those that are most cooperative, are also more likely to be covered. This interesting relationship warrants additional research.

While undercoverage does exist in housing unit frames, it appears in this study to be only weakly related to variables on the survey questionnaire. The findings of low bias due to undercoverage above do not apply directly to the official NSFG frame, the first of the three listings, from which the interviewed cases were selected: data on cases undercovered by that frame were not available. However, because that official frame received more scrutiny and quality checks than the other two frames, we can presume that undercoverage, and bias due to undercoverage, is lower on the official frame than on the two frames examined closely.
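The point that, for a given set of propensities, bias is driven by how coverage relates to the survey variable can be illustrated with a toy population (the values and coverage rules below are invented for illustration; this is not the dissertation's analysis):

```python
def covered_mean_bias(ys, covered):
    """Bias in the mean when only covered cases enter the calculation:
    covered-case mean minus full-population mean."""
    full = sum(ys) / len(ys)
    cov = [y for y, c in zip(ys, covered) if c]
    return sum(cov) / len(cov) - full

ys = list(range(10))                          # population values 0..9
unrelated = [i % 2 == 0 for i in range(10)]   # coverage unrelated to y
selective = [y >= 5 for y in ys]              # coverage rises with y

bias_unrelated = covered_mean_bias(ys, unrelated)  # small: -0.5
bias_selective = covered_mean_bias(ys, selective)  # large: 2.5
```

When coverage is essentially unrelated to the variable, the covered-case mean barely moves; when coverage is correlated with the variable, the same coverage rate produces a much larger bias.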
Overall, the findings here of small bias due to undercoverage are good news for NSFG and a positive sign for all household surveys that use listing to create household frames.

Chapter 5: Conclusions

This dissertation has used two datasets to explore errors in housing unit frames and how these errors can lead to bias in survey data. I find that even well-trained and experienced listers do produce different frames, making errors of undercoverage and overcoverage. I show for the first time that experienced listers tend not to fix the errors in the listing they are given. I call this phenomenon confirmation bias, after a similar finding in research on dependent coding and interviewing. However, none of the errors of overcoverage and undercoverage in this study lead to substantial absolute bias in means derived from the survey data. These findings are good news for surveys which depend on listing to create sampling frames, but they also suggest directions for future research.

Chapter 1 used a repeated listing dataset from the Census Bureau to estimate the degree of disagreement between two experienced listers using the same methodology. The two listers do produce different frames, which would lead to different samples and different final datasets. The overall agreement rate was only 79%, and this rate varied quite a bit among the blocks. While I could not separate undercoverage by one lister from overcoverage by another with this dataset, the agreement rate indicates that listers do make errors. These findings motivate the need for additional work to uncover the mechanisms of lister error.

Unfortunately, analyses of these data were constrained by a lack of information about the interviewers. In the two listings of each segment, most of the characteristics that varied were those due to listers, such as experience, training, etc., yet the dataset did not permit analysis of these characteristics. I was also not able to manipulate the listing process.
To address these shortcomings, I worked with the Survey Research Center at the University of Michigan to collect data specifically for my dissertation. Experienced listers from the National Survey of Family Growth conducted three listings of a nationally representative sample of 49 segments. The dataset contains lister observations of the segments, lister demographics and background information as reported in the interviewer questionnaire, as well as response data for a sample of housing units. I experimentally manipulated the listing task to test hypotheses about lister error.

The analyses of both datasets rest on the quality of the matching work, which is always imperfect. Undoubtedly another researcher would match the Census Bureau and NSFG datasets slightly differently. For this reason, Chapter 1 and Appendix C report the details of the procedures used in matching. I gave the matching task the care and attention it deserved as the foundation of all my analyses. Nevertheless, I was not able to complete the matching work in three very rural NSFG segments where few units had house numbers. These segments were dropped from most analyses in Chapters 2, 3 and 4.

The second chapter used the NSFG dataset to test hypotheses about the mechanisms of error in traditional listing. These hypotheses were motivated by an understanding of the listing task as a principal-agent problem in which monitoring is costly and the agent (lister) has more information than the principal (survey researchers). The overall coverage rate for the traditional listers was 89%. Breaking this overall rate down by housing unit and segment characteristics replicated findings of earlier work: multi-unit and vacant units were undercovered, as were units in poor and rural segments. The hypotheses derived from the principal-agent model found limited support, suggesting that we should look to alternative theoretical approaches for future work on the mechanisms of error in traditional listing.
I suspect this limited support is due in part to the small size of the NSFG listing dataset. While large in terms of housing units, it contained 49 segments and only 11 listers within each method. Testing the principal-agent model involved interacting lister attributes with segment characteristics and also with housing unit characteristics. The dataset was not powerful enough to detect significant contributions at these higher levels. I have some suggestions below on how future studies can avoid these problems.

The third chapter tested hypotheses about the mechanisms of error in dependent listing related to confirmation bias. Analyses revealed that listers do show a tendency to confirm the list that they are given and not to add missing units or remove inappropriate ones. Units in multi-unit buildings are particularly vulnerable. Results are quite strong in both bivariate and multivariate analyses.

These findings indicate that confirmation bias should be a concern in dependent listing. But the results lack external validity: the introduced errors are not the same as the errors listers are likely to encounter in the input lists actually used in listing. Future research into confirmation bias should look more carefully at the kinds of errors that are typical in input lists and whether these are the types that listers tend not to correct.

The implications of more realistic confirmation bias for coverage bias in survey data should also be explored. The errors introduced here were random. Perhaps the errors in the commercial address databases are not random and are related to survey variables, raising the risk of coverage bias due to confirmation bias. For example, new construction is often missing from the commercial address lists, as it takes a while for these units to be picked up by the postal service and then make their way into survey frames.
The families in these units may be younger or less wealthy than those in the older units; undercovering them due to failure-to-add error could lead to bias.

The last substantive chapter looked at bias due to undercoverage in both traditional and dependent listings. Using two methods of bias calculation and 15 NSFG variables, I found that only small bias would result if the alternative frames had been used.

Together these chapters break new ground in listing research and lead me to several suggestions for surveys which use listing. The findings on confirmation bias are the strongest results from my research. The quality of the frame produced via dependent listing is in part a function of the quality of the list provided to the listers. When using dependent listing, I recommend that the threshold for the size of the input list be set quite high, particularly in areas with many multi-unit buildings. For example, dependent listing could be used only when the size of the input list is greater than or equal to the housing unit count from the most recent Census. This requirement would help reduce the chances of failure-to-add error.

Furthermore, I suggest that lister training emphasize the point that input listings do contain errors of both omission and inclusion. Some training practices encourage failure-to-delete error by emphasizing a preference to err on the side of overcoverage. These instructions are based on a belief that overcoverage does not lead to bias and can be cleaned up at low cost during data collection. I have more concerns about overcoverage, and believe we need to understand better how interviewers handle instances of multiple-probability overcoverage in their assignments. Failure-to-delete error, however, is a larger threat to bias and not the intention of any lister training. Training should include a discussion of confirmation error to warn listers against it.
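The input-list threshold recommendation above can be expressed as a simple decision rule. The sketch below is my own illustration in Python (the function name and inputs are hypothetical, not part of any survey organization's software):

```python
def use_dependent_listing(input_list_size, census_hu_count):
    """Illustrative rule: use dependent listing only when the input list
    is at least as long as the segment's most recent Census housing-unit
    count; otherwise fall back to traditional listing."""
    return input_list_size >= census_hu_count

# A segment whose input list covers the Census count qualifies;
# a visibly short list does not, reducing failure-to-add risk.
ok = use_dependent_listing(210, 200)    # True
too_short = use_dependent_listing(150, 200)  # False
```

In practice the threshold could be set above 100% of the Census count in segments with many multi-unit buildings, where failure-to-add errors are most likely.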
Perhaps manipulations like those in Chapter 3 could be used to periodically check how carefully listers examine the input list for errors.

Another implication concerns the coverage of units in multi-unit buildings. That listers undercover and overcover these sorts of units is a robust finding in previous research and in my dissertation. For those surveys where full coverage is critical, or where multi-unit status is believed to correlate strongly with the survey variables, I propose a procedure similar to the missed housing unit procedure. When a selected case is in a multi-unit building, the interviewer could be asked to do additional work to determine the number of units in the building. Interviewers often gain access to buildings and speak to residents and are thus in a better position than listers to get an accurate unit count. If the number of units found by the interviewer is greater than the number on the frame, appropriate adjustments, including selection of the new units, could be made. This procedure would increase coverage in buildings with multiple units.

Of course, the findings in this dissertation raise more questions than they answer, and I have several ideas for ways to expand upon this research in the future. In retrospect, drawing a national sample of segments was not necessary. (In fact, the final matched dataset was not nationally representative due to the absence of weights and the difficulties matching three rural segments.) If I were to redesign this study, I would focus on just three or four purposefully chosen areas with several listing segments each. One would be very urban, with many multi-unit buildings, and another would be very rural (more on rural listing below).

Most importantly, more listings should be done of each segment, by more listers. The repeated listings should use both methods, but also repetitions within method, as in the Census Bureau dataset.
Such a design would permit stronger analyses of inter-lister variation in frame quality and improve the ability to separate the effects of lister characteristics from those of segments. In my study, only 11 listers participated in each method. The models in Chapters 2 and 3 did not contain enough variability at the lister level to test many interesting interactions of lister and segment characteristics, such as race of lister and race of segment residents. Future research should aim to do so. Of course, additional listings would complicate the matching task.

The technique of partnering with a survey already in the field worked out quite well and I would do so again. However, the dataset contained no responses from cases that were undercovered by the official NSFG listing. In the future I would hope to gather data about cases undercovered by each listing.

Looking to the future of housing unit listing, I believe we will see a move toward the use of commercial address lists without in-field updating. NORC used this method of frame construction in its most recent National Frame (Harter et al., 2010). However, there are still many parts of the country, particularly rural areas, where the lists' coverage is quite low. I suspect that in five years or so, these rural areas will be where most listing work is carried out. Yet we know little about how to address the particular challenges posed by rural listing. In these areas, listers often drive because the distances are too great to walk. Housing units often do not have numbers, and streets may not have names. I recommend additional research into rural listing to identify the difficulties listers in these areas face and to develop procedures to address them.

Finally, I want to return to the larger research interests that prompted this dissertation. While I am quite interested in coverage research and have been working in this area for many years now, I am also fascinated by the role of interviewers in survey work.
A dissertation on listing error nicely combined these two interests, exploring how interviewers affect coverage.

Interviewers can contribute to nearly every component of total survey error. We most often think of interviewers' behaviors leading to measurement error through the ways they administer questions, or to nonresponse error through the kinds of respondents they recruit, or fail to recruit. Interviewers can bias samples when allowed to select their own cases (Manheimer and Hyman, 1949; Eyerman et al., 2001). Interviewers as coders can introduce bias and variance (Campanelli et al., 1997; Biemer and Lyberg, 2003). This dissertation has focused quite narrowly on the role of interviewers (listers) in housing unit coverage, but in my future career I hope to study the role of interviewers in other error sources as well. Interviewers are the agents of data collection. Most of the responsibility for the quality of the final data product rests with them. The field needs a better understanding of the influences on their behavior and their decisions, and how these affect survey data.

Appendix A: Coding of Quality of Listing Maps

The maps provided to the listers by both the Census Bureau and the Survey Research Center (SRC) are often out of date and can even be misleading. Nearly all of the listers I spoke to expressed frustration with the quality of the listing maps. Most reported that they purchased commercial maps or downloaded online maps for all their segments. Both the SRC and Census maps are derived from TIGER (Topologically Integrated Geographic Encoding and Referencing) data released by the Geography Division of the Census Bureau. See Figure A.1 for an example SRC listing map.

The most important function of the maps is to let the lister know which block or blocks are selected for listing. Any errors in the maps, such as incorrect street names, unclear boundaries, missing streets, etc., can lead to errors in listing.
(See Roberts (2010) for a striking example of how map errors led to substantial undercoverage in the decennial Address Canvassing effort.) For both my listing datasets, I compared the TIGER-based maps with Google maps (both map and satellite view) of the same area and coded the discrepancies that I noticed.

Map_Simple: Block is simple and rectangular without interior streets, in both TIGER and Google maps.

Map_Interior: Google map indicates additional interior streets not shown on the TIGER map. Listers specifically mentioned these anomalies as making the listing task difficult (lister debriefings).

Map_NVBB: Block appears to have a nonvisible boundary, i.e., at least one of the block boundaries is not obviously a street or a water feature. Nonvisible boundaries are often political boundaries (town, county, etc.) but may also be overhead power lines or the previous path of a stream. One lister said she discovered a nonvisible boundary to correspond to an underground cable. Listers reported difficulties understanding where to start and stop listing when their segments have nonvisible boundaries (lister debriefings).

[Figure A.1: Example SRC Listing Map]

Appendix B: Logistic Regression Models of Traditional Listing Propensity

Table B.1: Traditional Listing Propensity Models, Selected Cases Only

                                       (1)               (2)               (3)               (4)
                                       OR (z)            OR (z)            OR (z)            OR (z)
Multi-Unit                                               0.208*** (-5.49)  0.217*** (-5.28)  0.350* (-2.09)
Vacant                                                   0.683 (-1.52)     0.675 (-1.56)     0.684 (-1.51)
Trailer                                                  0.978 (-0.03)     1.005 (0.01)      1.001 (0.00)
Pct. HUs rural                                           0.170 (-1.54)     0.179 (-1.44)     0.174 (-1.46)
Pct. HHs with income <= 50,000                           1.369 (0.24)      0.556 (-0.33)     0.570 (-0.32)
Pct. Pop. Spanish language                                                 0.289 (-0.24)     0.164 (-0.34)
Pct. Pop. Afr.-Amer.                                                       2.109 (0.32)      1.839 (0.26)
Map, nonvisible boundary                                                   0.308 (-1.72)     0.293 (-1.78)
Lister has safety concerns                                                 0.342 (-0.83)     0.384 (-0.71)
Lister drove herself while listing                                         1.557 (0.57)      1.565 (0.55)
Lister and segment language match                                          2.229 (0.86)      2.741 (1.05)
Multi x Lister has safety concerns                                                           0.990 (-0.01)
Multi x Lister drove                                                                         1.073 (0.10)
Multi x Language match                                                                       0.489 (-0.92)
Constant                               17.721*** (9.42)  1.3e+09 (0.01)    2.4e+10 (0.00)    7.5e+09 (0.01)
StdDev(segments)                       1.753             1.234             1.145             1.141
rho                                    0.483             0.316             0.285             0.284
Log Likelihood                         -549.148          -517.053          -514.690          -514.054
Observations                           1970              1970              1970              1970
* p <= 0.05, ** p <= 0.01, *** p <= 0.001

Appendix C: Matching Addresses in NSFG Listing

The analyses presented in this paper involve two rounds of address matching, comparing listed addresses to identify those which refer to the same housing unit. This sort of matching work always requires judgments. Other researchers might make different judgments and would thus create a slightly different agreement indicator, which would affect the results. However, I feel that all of the matching decisions I made are justifiable and defensible. The quality of this matching work will greatly affect the quality of the results; both false matches and false non-matches would cause errors in my analysis. In this appendix I explain the two rounds of matching in detail.

The first round of matching involved comparing the input listing given to the dependent listers (the listers who performed the third listing of each segment) to the frame created by those listers. In the second round I matched the three frames to each other to determine which units were listed by only one lister, which by two listers, and which by all three. Each round involved several matching steps using both computerized and manual matching and several quality checks.
Although the procedures used in each of the two rounds were quite similar, in this appendix I discuss all steps in both rounds separately.

In all of this matching work, my goal was to match addresses that would lead to the same housing unit being selected. Using this principle, I did allow matches of what at first glance seem to be different addresses, e.g., 1146 Juniper Ln and 1164 Juniper Ln. After discussion with listers and Survey Research Center central office staff, it became clear that interviewers notice and correct these sorts of mistakes in the field. If 1146 Juniper Ln is selected and does not exist, but 1164 Juniper Ln does, the interviewer will often make the judgment about which address was intended herself, or she will call in to the central office for confirmation. I also did not allow any many-to-one matches: each address on a frame could have one and only one match on another frame. All matching was done only within segment.

Although there have been great advancements recently in probabilistic matching algorithms (Herzog et al., 2007; Schnell et al., 2009), these sophisticated techniques are not needed here. The segments in my study are quite small: the average number of housing units per segment is less than 200 (range from 50 to 2300; see Table 2.1). Thus manual review of all lines within each segment was possible. Additionally, the probabilistic matching routines are very good at resolving spelling errors in street names, but due to the drop-down menu of street names in the listing software, spelling differences are uncommon in these listings.

To protect respondent confidentiality, none of the addresses shown in the examples below are true addresses from the quarter 12 NSFG listing.

C.1 Matching Input List to L3

The third listing of each segment used the dependent method of listing.
I derived the input to this dependent listing from the frame created by the first lister (L1), with manipulations (additions and suppressions) as discussed in the text. The manipulations allowed me to test for failure-to-add and failure-to-delete error, but only after matching the two lists together to find which added and suppressed lines were included on the frame after listing. The input listing contained 9283 lines plus the 561 housing units I deleted, and L3 contained 10445 lines.1 Matching these two lists involved three steps.

C.1.1 Step 1: Match by ID

In the first matching step, I took advantage of the unique key that tracks addresses from the input list to the final frame. However, because listers have the ability to edit addresses on the input list as well as delete them and add them back, not all matches by address ID are true matches.2 In the first matching step, I matched only cases that had the same ID as well as the same address (house number, street, and apartment number). Only cases that matched on all of these attributes and were confirmed by the lister were considered matches. These criteria led to 7670 matched pairs. All cases where the address changed in any way between the input list and the final frame, or where the address was not confirmed, were sent on for further matching.

1 The frame created by the third listers contains 9597 units. The software, however, retains the units removed by the dependent listers, and thus the database of units listed and deleted by the third listers is larger, 10445. To capture failure-to-delete bias I had to match to this larger list.
2 Listers tell me that they edit addresses on the input list, and delete them and add them back, when the order of the input list is not correct. They find these techniques easier than reordering the input list using the software's reordering tool.
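The logic of this first step can be sketched in Python. This is only an illustration of the matching rule described above: the record layout (`house_no`, `street`, `apt`, `confirmed`) and the dict-of-records representation are my own assumptions, not the actual NSFG data structures, and the real work was done in SAS.

```python
def match_by_id(input_list, l3_list):
    """Step 1 sketch: pair lines that share an address ID *and* an identical,
    lister-confirmed address; everything else goes on to address matching.

    Each argument maps an address ID to a record dict with (hypothetical)
    keys: house_no, street, apt; L3 records also carry a confirmed flag.
    """
    matched, unmatched_input, unmatched_l3 = [], [], []
    for addr_id, rec in input_list.items():
        partner = l3_list.get(addr_id)
        same_address = (
            partner is not None
            and rec["house_no"] == partner["house_no"]
            and rec["street"] == partner["street"]
            and rec["apt"] == partner["apt"]
        )
        if same_address and partner["confirmed"]:
            matched.append((rec, partner))
        else:
            # Changed or unconfirmed addresses go on for further matching
            unmatched_input.append(rec)
            if partner is not None:
                unmatched_l3.append(partner)
    # L3 lines whose ID never appeared on the input list (lister additions)
    unmatched_l3 += [r for i, r in l3_list.items() if i not in input_list]
    return matched, unmatched_input, unmatched_l3
```

The one-to-one constraint is automatic here because the ID is unique on each list; the later steps have to enforce it explicitly.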
C.1.2 Step 2: Automatic Address Match

There were 2174 remaining unmatched lines from the input list (including the deleted lines) and 2775 remaining unmatched lines from L3. The next step in the matching process involved parsing the addresses into standardized pieces and matching these pieces to identify addresses that referred to the same housing unit. The Survey Research Center listing software collects addresses in three parts: house number, street, and apartment number. I used a SAS macro to parse the street variable on both the input frame and L3 into four fields:

- Pre-direction: street direction, when it precedes the street name (e.g. N, E, NW)
- Street Name: street name (e.g. Main, 37th, Martin Luther King)
- Street Type: type of street (e.g. Ave, St, Dr, Circle)
- Post-direction: street direction, when it follows the street name (e.g. N, E, NW)

The parser also standardizes the address parts to improve matching. Table C.1 gives examples of both the parsing and the standardization.

Table C.1: Parsing and Standardizing Street Variable

Full street            Pre-direction   Street Name   Street Type   Post-direction
Brooklyn Ave                           Brooklyn      Ave
North 49th Av          N               49th          Ave
NW Cherry Hill Drive   NW              Cherry Hill   Dr
Dry Creek Road S                       Dry Creek     Rd            S

Table C.2: Automatic Matches Found, by Pass

Field            Pass 1   Pass 2   Pass 3   Pass 4
Segment            X        X        X        X
House number       X        X        X        X
Pre-direction      X
Street Name        X        X        X        X
Street Type        X        X        X
Post-direction     X        X
Apartment          X        X        X        X
Matches Found     987       0        0       28

With the full address parsed into 6 pieces (the four shown in Table C.1, plus the house and apartment numbers), the cases ran through the matching programs (SAS macros). The first pass required matches on all 6 fields, and identified 987 matches. Subsequent passes relaxed the matching criteria. For example, the second pass would match 1495 Beard Ave to 1495 S Beard Ave.
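The SAS parsing macro is not reproduced here; the following Python sketch illustrates the same parse-and-standardize idea. The lookup tables are deliberately small and illustrative, not the macro's actual (much fuller, USPS-style) standardization dictionary.

```python
# Illustrative lookup tables; the real macro uses a fuller dictionary.
STREET_TYPES = {"avenue": "Ave", "av": "Ave", "ave": "Ave",
                "street": "St", "st": "St",
                "drive": "Dr", "dr": "Dr",
                "road": "Rd", "rd": "Rd",
                "circle": "Cir", "cir": "Cir"}
DIRECTIONS = {"n": "N", "s": "S", "e": "E", "w": "W",
              "ne": "NE", "nw": "NW", "se": "SE", "sw": "SW",
              "north": "N", "south": "S", "east": "E", "west": "W"}

def parse_street(street):
    """Split a street string into (pre-direction, name, type, post-direction),
    standardizing each piece, as in Table C.1."""
    words = street.split()
    pre = post = stype = ""
    # Leading direction word -> pre-direction (always keep a word for the name)
    if len(words) > 1 and words[0].lower() in DIRECTIONS:
        pre = DIRECTIONS[words[0].lower()]
        words = words[1:]
    # Trailing direction word -> post-direction
    if len(words) > 1 and words[-1].lower() in DIRECTIONS:
        post = DIRECTIONS[words[-1].lower()]
        words = words[:-1]
    # Trailing street-type word -> standardized street type
    if len(words) > 1 and words[-1].lower().strip(".") in STREET_TYPES:
        stype = STREET_TYPES[words[-1].lower().strip(".")]
        words = words[:-1]
    return pre, " ".join(words), stype, post
```

For example, `parse_street("NW Cherry Hill Drive")` yields the pre-direction NW, name Cherry Hill, and type Dr, matching the third row of Table C.1.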
The final automatic matching pass would match 659 Wayne Dr to 659 Wayne Rd. Table C.2 shows the matching criteria at each pass as well as the number of matches found. All passes required matching house numbers, street names, and apartment numbers.3 Note that only cases which did not match in step 1 or in pass 1 went on to the later, more permissive, passes. That is, only if both 659 Wayne Rd and 659 Wayne Dr had no better partners would they be matched together. I carefully reviewed all matches from the later passes to ensure that they seemed reasonable.

3 Cases on either list where the house number was missing or was some variant of "No #" were excluded from all matching in step 2 to avoid false matches. These cases with missing house numbers were matched in step 3.

C.1.3 Step 3: Manual Address Match

To find all remaining matches, in the third step I reviewed all unmatched addresses. I created spreadsheets of all addresses in each segment, both matched and unmatched. These spreadsheets contained not only the full address of each case on the input list and on L3 but also the description provided by the lister, if any.4 The addresses on each list were sorted in street, house number, and apartment order. I reviewed these spreadsheets carefully to identify matches that were not picked up in the previous steps. I identified five kinds of matches in this review.

Inconsistent Apartment Numbers  Listers used different designators to refer to apartments in a building: for example A, B, C; 1, 2, 3; Front, Rear; 1st floor, 2nd floor, 3rd floor; 101, 102, 103. Because steps 1 and 2 required perfect matches on apartment numbers, these units were not matched.
When the two lists used different numbering schemes but agreed on the number of units in a building, I matched all the units (in the order implied by the numbering). In those cases where one list contained more units at an address than the other list, I tried to deduce which designators referred to the same units. This set of manual matches also includes cases where one list thought a structure was a single-family home and the other saw more than one unit at the address. In fact, my manipulation of the input list made this quite likely, as I both turned two-unit buildings into single-family buildings and vice versa. Interviewers are trained to approach the first unit when a selected single-family case turns out to be a multi-unit building, so I matched the single-family unit to the first unit and left the other units unmatched.

4 Listers are instructed to provide descriptions of the housing units whenever the address does not include a house number or when it might be unclear which unit is meant.

Typo in Street Name  The automatic matching routines in step 2 always required a perfect match on street name. But in a few situations, the third lister edited the street name to correct misspellings. I matched these addresses during my manual review.

Typo in House Number  When two units appeared to be the same except for small differences in the house number, I matched these units. These matches were made only when there were no or very few other unmatched units with the same street name, because in these situations I felt that the interviewer would probably come to the same conclusion in the field if the case were selected. If an interviewer cannot find 12195 Willow Rd, but does see 12159 Willow Rd, she is likely to interview at the second address.

No House Number  When a house number is not available, listers are taught to write "No #"
in the house number field and include a description that uniquely identifies the housing unit. These kinds of units appear on both the input to the third listing and the output of the third listing. I was able to match these lines when they were the only unmatched cases on a street or when the descriptions made it clear they referred to the same unit.

This process led to 251 additional matches, more than half due to inconsistent apartment numbers (see Table C.3).

Table C.3: Manual Matches Found

Type                             Matches Found
Inconsistent Apartment Numbers   136
Typo in Street Name              56
Typo in House Number             11
No House Number                  45
More Dubious                     3
Total                            251

C.1.4 Quality Checks

After matching I performed several quality assurance steps. I carefully reviewed all computer and manual matches and the match types.

C.2 Matching Three Frames

The process of matching the three final frames was similar to that used in matching the input to the third listing to the output. I first matched addresses using strict matching criteria on all address parts and then relaxed these criteria, allowing divergent street directions and typos. I then performed manual matching that could account for spelling errors and other small differences. The principles used in matching were the same in this round as in the previous round. All matches were within segment and were one-to-one. Once again, my goal in matching addresses was to identify those listed cases that would lead to the same housing unit being approached for an interview.

There are three ways in which this matching round was different from that described in section C.1 of this appendix. First, there is no ID across the three frames, so no ID matching was possible. Second, I matched only listed addresses on the three frames. Any addresses not verified, or added and then deleted, are not part of what the listers consider the housing unit frame for the segment and were not matched.
Third, this round of matching involved three address sources rather than just two, which complicates the process as discussed below.

The first listing contained 9423 listed lines, the second 9345, and the third 9597 (see Table C.4). After parsing and standardizing the addresses as described in section C.1, I used SAS to identify perfect matches across all of the address pieces in all three frames. This process found 6169 triples. I then used the matching macros, the same ones described above, to find identical pairs of addresses across listings that matched to a unit in the third listing at decreasing levels of strictness. This process identified another 419 matched triples. Next I created spreadsheets of all units in each listing (both matched and unmatched) for manual matching, in the same way described above, and found another 1165 three-way matches. I reviewed these spreadsheets several times, ordering the units within segments in listing order and also in street name and number order. The total number of three-way matches is 7751.5 There are also 1322 pair matches without a third.

I recognize that there will likely be some false matches and false nonmatches that I cannot eliminate. If these errors are correlated with any of the independent variables in my models, my results will be biased (Carroll et al., 2006, pp. 345-352).

5 During manual review, I dissolved some of the matches found in the less-strict SAS matching steps. For this reason, the number of final three-way matches does not quite equal the sum of the numbers given above.
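The strict first stage of this three-frame tally can be sketched with simple set operations. This sketch covers only exact agreement on a standardized within-segment address key; the relaxed automatic passes and the manual stage, which found the remaining matches, are not shown.

```python
def tally_exact_matches(l1, l2, l3):
    """Classify units as listed by all three listers, exactly two, or only
    one, using exact agreement on a standardized address key.

    Each argument is a set of (segment, standardized-address) keys; using
    sets keeps every match one-to-one and within segment by construction.
    """
    triples = l1 & l2 & l3
    pairs = ((l1 & l2) | (l1 & l3) | (l2 & l3)) - triples
    singles = (l1 | l2 | l3) - triples - pairs
    return triples, pairs, singles
```

Keys here are assumed already parsed and standardized; in practice near-miss addresses (typos, divergent directions) fall into `pairs` or `singles` at this stage and are resolved later.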
Table C.4: Number of Housing Units Listed for Each of Three Listings, by Segment

Segment   L1 Lines   L2 Lines   L3 Lines
1         55         48         72
2         106        110        106
3         155        154        141
4         164        182        184
5         89         79         83
6         87         87         87
7         130        162        128
8         210        204        211
9         108        101        108
10        93         106        94
11        122        119        123
12        98         94         98
13        149        124        148
14        96         101        155
15        159        194        239
16        109        111        106
17        96         93         95
18        108        89         104
19        83         83         81
20        74         81         76
21        80         80         79
22        84         84         84
23        122        139        115
24        584        580        553
25        95         96         96
26        88         84         87
27        626        626        642
28        103        100        104
29        165        198        165
30        271        312        284
31        152        226        154
32        95         97         95
33        233        131        202
34        2337       2146       2342
35        99         95         98
36        82         82         79
37        95         95         94
38        162        162        163
39        417        403        411
40        236        239        234
41        94         95         94
42        110        110        102
43        110        104        105
44        118        122        130
45        82         86         77
46        144        141        145
47        78         119        158
48        160        158        155
49        110        113        111
Total     9423       9345       9597

Chapter D
Appendix: Interviewer Questionnaire

This interview is voluntary and confidential. Your answers will not be identified with your name, and they will not be shared with your supervisors or human resources department. Your answers to these questions will help researchers do basic research on the survey process and better understand the data in this survey.

Q1. In addition to the University of Michigan, have you ever worked as an interviewer at any other survey or market research organization?
1. Yes
5. No

Q1a. In addition to the National Survey of Family Growth (NSFG) Cycle 7, have you worked as an interviewer on any other University of Michigan survey projects?
1. Yes; have worked on other UM survey projects
5. No; NSFG Cycle 7 is my first UM survey project
[IF Q1=NO AND Q1A=NO, GO TO Q3]

Q1b. Which previous NSFG cycles have you worked on? Please check all that apply.
[ ] Cycle 1 (1973)
[ ] Cycle 2 (1976)
[ ] Cycle 3 (1982)
[ ] Cycle 4 (1988)
[ ] Cycle 5 (1995)
[ ] Cycle 6 (2002)
[ ] I did not work on any previous cycles of NSFG

Q2. Was your previous interviewing experience doing in-person interviews, telephone interviews, or both?
1. In-Person
2. Telephone
3. Both

Q3.
Including working on NSFG Cycle 7, how many months or years have you been an in-person field interviewer? (Enter 0 for less than one month; enter months and years)

Q4. Including all types of interviewing, about how many survey projects have you ever interviewed on?
0. NSFG Cycle 7 is my FIRST survey project [GO TO Q7]
____ # of survey projects

Q5. Have you ever worked on a survey with the following content? (SELECT ALL THAT APPLY)
1. Sexual activity
2. Drug use
3. Criminal activity
4. None of the above

Q6. On how many survey projects have you used a computer to do interviewing?
____ # of survey projects

Q7. Before coming to training had you ever used a "stylus" (the electronic pen used on the tablet) on a computing device (including on a PDA or a Palm Pilot)?
1. Yes: have used stylus
5. No: never used stylus before NSFG training

Q8. Besides being a Field Researcher on the NSFG, do you currently have any other paying jobs?
1. Yes
5. No [GO TO Q9]

Q8a. How many hours per week, on average, do you work on your other job(s)?
____ # of hours per week

Q8b. Is/Are your other paying job(s) as an interviewer or something else?
1. Interviewer
2. Something Else
3. Both

Q9. What is the highest level of school you have completed?
1. High school graduate or GED
2. Some college but no degree
3. 2-year college degree (e.g., Associate's degree)
4. 4-year college graduate (e.g., BA, BS)
5. Graduate or professional school

Q10. Are you currently attending school or college to get a degree or certificate?
1. Yes
5. No

Q11. What is the month and year of your birth?
____ month  ____ year

Q12. Are you Hispanic or Latina, or of Spanish origin?
1. Yes
5. No [GO TO Q13]

Q12a. Are you ...
1. Puerto Rican
2. Cuban
3. Mexican
4. Central or South American
5. Some other Hispanic or Latina group (specify)

Q13.
Which describes your racial background? Please select one or more groups.
1. American Indian or Alaska Native
2. Asian
3. Native Hawaiian or Other Pacific Islander
4. Black or African American
5. White
[IF ONLY ONE RACIAL GROUP SELECTED, GO TO Q14]

Q13a. Which of these groups would you say best describes your racial background?
1. American Indian or Alaska Native
2. Asian
3. Native Hawaiian or Other Pacific Islander
4. Black or African American
5. White

Q14. Do you speak any languages other than English?
1. Yes
5. No [GO TO Q15]

Q14a. What language(s) do you speak?
1. Spanish
7. Other (specify) [GO TO Q15]

Q14b. On a scale from 1 to 5, where 1 means "barely conversational" and 5 means "native Spanish speaker," how proficient do you think you are in Spanish?

Q15. What religion are you now, if any?
1. None
2. Catholic
3. Jewish
4. Baptist or Southern Baptist
5. Methodist, Lutheran, Presbyterian, or Episcopal
6. Other Protestant or Christian religion
7. Hindu, Buddhist, or Muslim
8. Other Religion [specify]

Q16. Currently, how important is religion in your daily life? Would you say ...
1. Very important
2. Somewhat important
3. Not important

Q17. What is your current marital status?
1. Married
2. Not married but living with a partner
3. Widowed
4. Divorced
5. Separated
6. Never been married

Q18. Which category represents the total yearly income for your household during the past 12 months?
1. Under $25,000
2. $25,000-$34,999
3. $35,000-$49,999
4. $50,000-$74,999
5. $75,000 or more

Q19. How many babies, if any, have you ever given birth to?
____ # of babies
[IF Q19=0, GO TO Q20]

Q19a. How old are your children? Please enter all that apply.
0. Under 5 years old
1. 5-12 years old
2. 13-18 years old
3. 19 years old or older

Q20. In what city and state do you live?
City/Town ____________  State ____________

Q21.
What county do you live in?
____________ COUNTY

Q21a. How many years have you lived in that county?
____ # years

Now we have some questions about your opinions and attitudes about the interviewing process. This information will be used for statistical purposes only and has no effect on your employment with the University of Michigan.

Q22. How strongly do you agree or disagree with the following three statements?

Q22a. Most of the time I can/I will be able to figure out what a respondent's real objections are to a survey.
1. Strongly Agree
2. Agree
3. Disagree
4. Strongly Disagree

Q22b. I can/I will be able to persuade people to agree to interviews better than most other interviewers.
1. Strongly Agree
2. Agree
3. Disagree
4. Strongly Disagree

Q22c. No matter what I do, there are/there will be some respondents who will never agree to participate.
1. Strongly Agree
2. Agree
3. Disagree
4. Strongly Disagree

Q23. Interviewers sometimes have difficult decisions to make while doing their jobs. Which one of the following statements comes closest to how you feel as an interviewer?
1. IT'S BETTER TO PERSUADE A RELUCTANT RESPONDENT TO PARTICIPATE THAN TO ACCEPT A REFUSAL, EVEN WHEN YOU FEEL THEY WON'T GIVE VERY ACCURATE ANSWERS.
2. IT'S BETTER TO ACCEPT A REFUSAL FROM A RELUCTANT RESPONDENT THAN TO PERSUADE THEM TO PARTICIPATE WHEN YOU FEEL THEY WON'T GIVE VERY ACCURATE ANSWERS.

Q24. This question uses a scale from 1 to 10, where 1 means "dislike very much" and 10 means "like very much". You can use both of these numbers, plus all of the numbers in between. Please answer with a number from 1 to 10 to describe how much you like or dislike each of the following tasks:
a. Approaching the household/doorstep introduction
b. Gaining cooperation
c. Conducting the interview
d.
Working with a supervisor (field operations coordinator/production manager)
e. Working on a team
f. Completing paperwork associated with an interview
g. Converting reluctant informants and respondents

Q25. This question uses an attractiveness scale, where 1 means "very unattractive" and 10 means "very attractive". You can use both of these numbers, plus all of the numbers in between. Using this attractiveness scale, please signify the attractiveness of each of the following aspects of your job as an interviewer:
a. Flexible work hours
b. Relevance/importance of survey research
c. Pay
d. Interacting with a variety of people

Now we have some questions about your opinions and attitudes on a few topics, including some covered by the NSFG survey. This information will be used for statistical purposes only and will have no effect on your employment.

Q26a. Sexual relations between two adults of the same sex are all right.
1. Strongly Agree
2. Agree
3. Disagree
4. Strongly Disagree

Q26b. Any sexual act between two consenting adults is all right.
1. Strongly Agree
2. Agree
3. Disagree
4. Strongly Disagree

Q27. A young couple should not live together unless they are married.
1. Strongly Agree
2. Agree
3. Disagree
4. Strongly Disagree

Chapter E
Appendix: Development of Housing Unit Level Characteristics in NSFG Dataset

The entire NSFG listing dataset has three observations of every housing unit. Thus deriving housing unit level characteristics is not straightforward. The variables at the housing unit level in my models are multi-unit, multi-unit large, trailer, no number, and several variables related to the interviewer dispositions. Below I provide details on how I created these variables.

Trailer  If any lister indicated that a unit was a trailer, in the apartment or description field, then the housing unit is a trailer on all observations of the unit.
Multi-unit  Units in multi-unit structures can be designated with an apartment number or in the street number field (for example: 1950A, 49 1/2, 1175-1). If the housing unit was listed by only one lister, and the lister included any of these designations, then the unit is flagged as in a multi-unit building in my dataset. If the housing unit was listed by two or three listers and more than half of them designated it as multi-unit, then it is flagged as multi-unit at the housing unit level. If exactly half of the listers indicated the unit was in a multi-unit structure (one of the two listers who included the unit), then I flagged the unit as multi-unit (unless I had manipulated the multi-unit status of the unit in the third listing). Any units coded by this process as both trailers and multi-unit were recoded as single family.

Multi-unit in small building  Housing units flagged as multi-unit were categorized as in small buildings by counting the number of units in the same building within each lister (usually those with the same house number and street name). If the minimum number of units in the building across the three listers was fewer than 19, the unit was coded as in a small building.

No Number  If any lister recorded an address as not having a street number, I flagged the unit as a no number case.

Disposition data  For each case selected for interview, the assigned interviewer, who was also the first lister, assigned a code capturing the outcome of the screening and interviewing process. These codes are, broadly: no contact, refusal, screener complete not eligible, screener complete eligible, out of scope (vacant), and improper listing (nonresidential or out of segment). The codes were assigned to each selected unit by only one lister, and thus no decisions were needed to create housing unit level variables.
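The majority rule for the multi-unit flag can be written compactly. This is a minimal sketch of the rule as described above, with the tie between two listers broken toward multi-unit; the exception for units whose multi-unit status was manipulated in the third listing is omitted.

```python
def multiunit_flag(designations):
    """Resolve one housing unit's multi-unit status from the 1-3 listers
    who recorded it.

    `designations` holds one boolean per lister who listed the unit
    (True = that lister designated it as multi-unit). More than half
    of the listers wins; an exact tie (one of two listers) counts as
    multi-unit, as in the text.
    """
    return bool(designations) and 2 * sum(designations) >= len(designations) \
        and any(designations)
```

Under this rule a unit marked multi-unit by one of its two listers is flagged, but a unit marked by only one of three listers is not.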
Chapter F
Appendix: Debriefings with SRC Listers and Interviewers

In the fall of 2009 I conducted individual debriefings with trained listers and interviewers from the Survey Research Center. These were interviewers who had completed some listing work for my project. The questions I prepared for discussion are given in Section F.1, but the meetings were informal and I encouraged the subjects to talk with me about anything they thought was relevant. Section F.2 gives quotes from the transcripts themselves that support the discussions in Chapters 2 and 3. I promised confidentiality to the listers and thus cannot reveal their names or print the entire transcripts. I am very thankful to the listers, both for their listing work for this project and for their participation in these discussions. The assistance provided by these listers greatly improved my research.

F.1 Questions to Guide Debriefing Discussions

INTRODUCTION:
Thanks for your time
Little bit about who I am:
- Student at Maryland
- Dissertation research on listing
- Working with NSFG folks, but not employee of Michigan
- Happy to answer questions about research, at the end
Calling only as a researcher
- Won't report individual discussions back to Sharon, Nicole or any ISR
- YOU ARE THE EXPERT
May I record so I don't have to take a lot of notes?

What interviewing experience do you have? What listing experience? NSFG only or other studies too? Other companies?

Questions for experienced interviewers:
Some studies do a missed housing unit check, looking for housing units that were not on the frame. Does your study do that?
- Do you understand why we do it?
- Do you think most FIs do this correctly?
Interviewers sometimes notice errors in the listing. Have you noticed listing errors while interviewing?
- If yes, please describe most common error
- What causes these errors?
Another issue I'm concerned with is overcoverage: when HUs outside of the selected segment are listed and selected.
-
Do your maps show you where the segment boundaries are?
- Have you ever been asked to check if the selected housing units are actually inside the segment?
- Does this work well?
- Do you think listers list outside of the segment?
Are there characteristics of a housing unit that make you think "this one is not going to participate"?

Questions for experienced listers:
DEFINE SCRATCH / UPDATE LISTING. Have you done traditional or dependent listing? Have you done listing on paper or just on the computer?
What differences have you noticed in listing procedures if you work for different projects and/or companies?
Follow up method:
- Do you feel that the list you're updating sometimes gets in the way, and it would be easier to list from scratch?
- Or do you think update listing is usually easier?
Follow up paper/computer:
- Do you prefer paper or computer listings?
- Are there some things that are easier on paper?
- Are there some things that are easier on the computer?
I know that the central office reviews all the NSFG listings and sometimes asks questions or has a lister redo part of a listing.
- Did they ask you to redo any of your listings?
- What did they think was wrong?
- Was it?
What kinds of units do you think we might miss in listing?
I'm sure it's sometimes hard to figure out how many units are in a multi-unit building. What do you do in these situations?
Listers have said that some kinds of segments are more difficult to list: those with invisible boundaries or really large rural segments. What do you think?
- PROMPT: What do you think are the kinds of segments that have more listing errors?
- What about urban areas might make them difficult?
Have you listed a segment with a nvbb (invisible boundary)?
- Did you figure out what the boundary was? STREAM, POWER LINE, POLITICAL BOUNDARY
- How did you decide what to list?
Are there other kinds of segments that you think we list poorly?
Do you talk to neighborhood residents when listing?
As informants, or just chit-chat?
Do you speak to other informants, such as post offices, fire stations etc.?
From what I've heard, listers have been questioned by residents or even police when listing. Has something like this happened to you? How did you handle the situation?
Do you drive while listing? Have someone drive for you? Walk?
Do you enjoy listing work? More or less than interviewing work?
Have you heard of listers ever making deliberate listing errors? Why might a lister do that?
ASK ABOUT EXPERIENCES IN LISTING FOR MY PROJECT (WHICH THEY KNOW AS THE ST-TEST PROJECT)

F.2 Quotes from Transcripts of Debriefing Discussions

I conducted seven debriefing sessions and have recordings of six of them. (Due to technical difficulties, one was not recorded, though I do have notes I took during the debriefing.) I promised each respondent confidentiality; thus I cannot print the transcripts in full and cannot reveal the names of the interviewers I spoke to. Below are selected quotes (with some identifying details removed) that support the claims made about the listers' behaviors and concerns in Chapters 2 and 3. Note that the listers use the terms scratch and update to refer to traditional and dependent listing.

F.2.1 Quotes from Lister A

LISTER A: The hardest ones to determine when you're listing are one of the non-visible barrier.
MS. ECKMAN: Right, I've heard about those. What have you seen?
LISTER A: Well, I've only really had those, I think -- did I have one in a scratch listing? I try to look for a big ditch --
MS. ECKMAN: Mm-hmm.
LISTER A: -- which has very often been a barrier, because in many tracts of land, what looks like a simple ditch is some kind of a creek run on a flat map.
MS. ECKMAN: Oh, I see.
LISTER A: Sometimes I'm not real sure that I got the right spot. But in -- there's usually some kind of gap between houses if you're in a subdivision.
MS. ECKMAN: Mm-hmm.
LISTER A: Because it would be wetter than somebody would really want to build on anyway at certain portions of the time probably (inaudible). But I had one that was a county -- not a county line, a township line, and I called the township office and asked them, can you please tell me what the last address would be --
MS. ECKMAN: Oh.
LISTER A: -- in this township? Because I figured out that it was a township line and I called and it was. They verified that and what the last address would be. And this house was their township; that house was not.
...
LISTER A: Well, I'm driving up this path which is rough blacktop, not very wide with the brushy trees not greened out yet on both sides. And I haven't made it up around -- and I'm just starting to go around the bend and I -- it just feels awful.
MS. ECKMAN: Hmm.
LISTER A: And I don't want to go any farther and I worked very hard at backing out of the driveway. Well, I told my team leader, I said, it just felt wrong.
MS. ECKMAN: Hmm.
LISTER A: And it was about 3:30 on an afternoon on a weekday. It was bright, sunshiny, spring, whatever, and I said, there was no reason that it should have felt bad, but it did. And when they had a traveling interviewer come and one of the places they sent her was there and I ran into her in the hotel to trade some info, she said, Martha, there was nothing wrong back there. Well, it was a home that was up for sale. There was nobody living in it. It was very secluded back there.
MS. ECKMAN: Uh-huh.
LISTER A: So, who knows what kids could have been doing what at 3:30 after school got out if they know there's a place where nobody is right now.
MS. ECKMAN: Right, right.
LISTER A: And who knows what was going on the day that I was driving up that driveway.
MS. ECKMAN: Right.
LISTER A: Well, for some reason, I felt awful and I left.
...
LISTER A: But you're driving down streets very slowly, you have flashers on and someone had come out and --
I made a second pass down the road to check on something, and somebody had come out and even like almost knocked on my car, this older lady. And I'm saying older my age now, like 65. She said, what are you doing? I said, okay. And by George, within 15 minutes, there was a policeman on my tail.
MS. ECKMAN: Oh, wow.
LISTER A: And he was saying, what are you doing? And I showed him all my paperwork because I am legit. But it was -- it was one of the newer subdivisions and it was a subdivision that's not in town or near town, it's a bit of country and a bit of subdivision, and those are the hardest, those are the hardest.

F.2.2 Quotes from Lister B

LISTER B: Yeah, I would say the rural and big apartment complexes are difficult because -- yeah, in , they have that invisible boundary line.
MS. ECKMAN: Oh, did they?
LISTER B: Yeah. I could not -- I'd finally think I'd got it, but I don't know for sure. I don't have anything -- I wish I had something to definitely say that was it. I remember Judy, my FOC, saying, somebody there could tell you if that's a -- you know, that that's the invisible boundary line. But I didn't have anybody to ask.
MS. ECKMAN: Oh, I see.
LISTER B: (Inaudible) that myself before I get out there, you know, instead of asking. But, yeah, that's difficult, invisible boundary lines
...
LISTER B: Yeah, I prefer to drive because you don't -- you're not as suspicious-looking for the most part, like in an urban area, where there's apartment buildings, you can just drive and rest, keep going, not draw too much attention to yourself. If you're out listing, it does kind of -- it can rouse a little suspicion, a little more suspicious. So, that's why I would prefer to --
MS. ECKMAN: Yeah.
LISTER B: You know, just a little more incognito by driving.

F.2.3 Quotes from Lister D

LISTER D: Well, in the updating, you have the list and sometimes you think, well, you know, something's there and maybe I'm just not seeing it, you know.
So, you really kind of are tempted sometimes to just, you know, confirm it.

MS. ECKMAN: Mm-hmm.

LISTER D: And if we can't get into a locked area, of course, then that's what we do, we just confirm them. Sometimes there's a gated and guarded community and we can't get in. So, we just confirm all the ones that were listed there. So, it could be right, it could not be. We don't know.

...

LISTER D: Yeah. Well, we -- the only way we can do that is list like the doorbells or whatever and go up there and see what mailboxes are there. Usually it's inside the door or -- it's usually inside the front door there and look in and see what the numbers are or by the mailboxes. And if we find another gas meter or something like that, we can just put down no number, you know, and then when -- if that one is selected for our screen, then we'll be more -- then we're able to go into the building, knock on the doors and find out, well, there's not really another one, so then we can just close that one out as improper listing. But we'd rather have more listed there, you know, as a possibility than less. We don't want to miss any.

...

MS. ECKMAN: And has anyone -- the residents, have they kind of challenged you while you're listing?

LISTER D: Oh, oftentimes, yeah. I had a guy with a gun on his hip, you know (inaudible) in Alaska. What are you doing on this street? This is private property. And I said, well, jeez, there's several houses here, I thought it was just a public street, you know. He says, no, and I want you off here. So, you know, you kind of telling them you're doing -- you're just updating maps and getting information. They usually accept that. You know, you show them your identification from the University of Michigan and you have a little sign in the window that says you're on official business.

...

MS. ECKMAN: Right. Now, the one thing I worry about specifically with rural areas is these non-visible boundaries.

LISTER D: Yeah. I mean, yes.

MS. ECKMAN: So, I guess --
you're giggling. I guess that means that they can be a problem for you.

LISTER D: Oh, yes.

MS. ECKMAN: So, just tell me about those. Do you get out there and then you can figure out what it is or sometimes you can't figure it out or --

LISTER D: Sometimes there's just no indication what it is and the only way you figure it out is by looking at the map and seeing where the curve in the road is and how far it should be -- you know, there's a mileage gauge on the map, and so, you can figure it out that way. And sometimes you -- it might be just a county boundary and there's no sign saying, okay, you're going into another county.

MS. ECKMAN: Mm-hmm.

LISTER D: It's just, you know, invisible.

MS. ECKMAN: Yep.

LISTER D: So, that's the way you figure it out. You just figure out the curves in the road and how far it is from the road you've just come from, a crossroad or something like that.

...

LISTER D: And you can just go down there and check and easily just do it from your car. And that's -- there's no need to walk. Of course, busy road, of course, you have to walk because you can't be slow-moving traffic there. And rural, you cannot walk, of course, because it's way, way, way too far between places.

MS. ECKMAN: Mm-hmm.

LISTER D: If you're going to have an apartment, you need to walk. You need to find out the apartment and also the office to get some information there.

MS. ECKMAN: So, you're saying you prefer to walk areas where there's apartments that --

LISTER D: Oh, yes, definitely, mm-hmm.

MS. ECKMAN: Mm-hmm.

LISTER D: But I prefer to drive if I can. But if it's going to -- if it's too difficult, then, of course, I'll walk. But, you know, in most subdivisions that are pretty evenly spaced, it's easy to just drive along and find the -- if I find that it's difficult finding the house numbers, of course, then I'll get out and walk. But I think in most of the areas, I was able to just drive.
F.2.4 Quotes from Lister E

LISTER E: So, if it looks like there are two apartments, it's better to list them. Then you might go later and they say, oh, no, we've been combined into one now.

...

LISTER E: So, you just go through it and click confirm, you know, pretty much, maybe have to add or delete something here or there, but it can be really helpful. So, yeah, you're clicking. Now, there sometimes you might have a house that has -- you know, it will say apartment one, apartment two, and you don't see apartment two at all.

MS. ECKMAN: Right.

LISTER E: But from what I'm recalling in the training -- and I usually do this -- we're encouraged to leave it in and then later try to see if that apartment two is, in fact, there. Unless I'm just really positive there cannot possibly be an apartment two, then I'll delete it. But sometimes I get into a little decision making there. I think, wow, I just don't see it at all.

MS. ECKMAN: Mm-hmm.

LISTER E: So, it can also -- you know, it can be easier and it can help include things that aren't visible because sometimes there is an apartment and the entrance is in the back. There's no way I would have seen that.

MS. ECKMAN: Right.

LISTER E: And that's -- you know, it's more likely that you will include that. I mean, that's important with a scratch listing not to get kind of just in automatic mode, that you're just going down the street, confirm, confirm, because sometimes there is something that isn't included in that original update list and you have to make sure you get it. There can be a whole block that was missed or a whole little cul-de-sac.

MS. ECKMAN: Really?

LISTER E: So, you have to be really -- not lazy about it, you know, really alert. Sometimes -- I've had it in the past where I've had update listings where the information provided to me was not good.

...

LISTER E: Or an unsafe area where I feel like I shouldn't necessarily get out of my car a whole lot. I should, you know, do it mostly by car.
So, those are hard to -- or if it's a long drive and a no trespassing sign, well, I kind of venture up that drive. But, again, you're thinking about these safety issues and maybe you're holding back a little more than you would otherwise.

...

MS. ECKMAN: Right, right, I can see that. So, my next set of questions here is about segment characteristics. You've already mentioned segments that are dangerous being difficult to list.

LISTER E: Mm-hmm.

MS. ECKMAN: And let me just confirm, I think what you're saying there is that a dangerous neighborhood, you're more likely to drive while listing and, therefore, maybe miss some things. Is that what you were saying?

LISTER E: It's possible. And, again, it depends on the kind of street. If it's not a busy street, if it's just a side street and you can drive along the side of the street and stop and park your car and then maybe get out and look if you have to, that's not a problem. But the problem is if it's not a real good (inaudible) and it's a busy street.

MS. ECKMAN: I see.

LISTER E: And you can't park at all, you can't stop along the street. Well, if need be, then I'll have to park somewhere and get out and walk and then go back to my car. I mean, there's some you just can't look from your car if it's dangerous with traffic.

MS. ECKMAN: I see. So, you're not talking so much about fear of crime, more about fear of traffic.

LISTER E: Well, it's more traffic. Although if there's fear of crime, I will try to stay in my car more.

MS. ECKMAN: Mm-hmm.

LISTER E: I mean, obviously, when I work the segment, I'll have to get out and walk around. But, you know, I don't walk unnecessarily. I will stay in my car where it's feasible. But I'll also try to go at a time of day when it's probably okay, you know, in the morning, Sunday morning maybe.

...

MS. ECKMAN: Okay, yeah. I've seen in some of the training materials these non-visible boundaries, NVBB.

LISTER E: Oh, they can be difficult.

MS. ECKMAN: Yeah.

LISTER E: Can be.
It can be easy. It can be really easy once you get, you know, kind of in the habit of it. You know, it could be a -- like maybe it's indicated on the map that it's water. Well, then there might not be any water, but you see a little dip or a little vegetation line where you think, yeah, some times of the year, there might be a little trickle of water in that. It's not always, you know, a flowing river. Or it can be a power line or you might notice suddenly the road isn't paved or the numbering system on the mailboxes can change.

...

MS. ECKMAN: We talked about driving and walking. It sounds like driving can be more convenient.

LISTER E: It can, uh-huh. You can -- you know, you can get more of an overview maybe and cover an area more quickly certainly. But if you can't pull over, I mean, then it doesn't work. You have to be able to stop your car there and enter. I mean, I could pull ahead three houses and enter those three, or if need be, just quick write the addresses on a piece of paper if I can't stay there very long on the side.

MS. ECKMAN: I see, uh-huh.

LISTER E: Then enter them into the computer. Or sometimes I -- if it's an update, I'm hardly able to stop, you know, but I can go maybe slowly. So, I can't be looking down at my computer and then up at the house, so I write down -- for that block, I write down all the addresses on a pad of paper and I hold that pad up near my steering wheel, and then I take a pen and I kind of check off these addresses as I pass them. Then I might have to go around that block three or four times to do that because of the traffic where you can't stop. So, that can be -- and I guess you could get out and walk in an instance like that. It's kind of a toss-up.

F.2.5 Quotes from Lister F

LISTER F: But I think I'd almost like to do one by scratch because the updating you kind of -- I don't know, it just --
I guess it may be hard to see the mistakes sometimes and then you realize, wait a minute, this is really on the opposite side of the street and it throws you off. So, you got to really think that you really have to do it as if you were doing it on your own and not relying on what's there, because some houses have been missed that have been added and some of them you wonder why they're not included. So, I was just listing in Rhode Island and it was a group house, but there was no way of knowing that that was a group house --

...

LISTER F: Yeah, just, you know, if there's two families living in one unit and that you may not come -- or sometimes going around to the back end of an apartment, you know, if it's an old housing, like a tenement.

MS. ECKMAN: Mm-hmm.

LISTER F: You know, if you go around to the back, you may miss something in the back of the households.

MS. ECKMAN: Right.

LISTER F: I remember doing that in Michigan and Ann Arbor seeing where there may be a household right in the back of the building.

MS. ECKMAN: Right.

LISTER F: That you may not even know from the street. So, you really -- yeah, this really can be on foot when there -- it's less rural, more buildings closer together.

MS. ECKMAN: Mm-hmm.

LISTER F: It really (inaudible) me to -- to be really thorough, you need to get -- do it on foot and really check in the back, too.

...

MS. ECKMAN: Mm-hmm. Well, now, we've talked a lot about the kinds of housing units that are difficult. What about segment-level characteristics? Are there segments that are harder to list and easier to list?

LISTER F: Yes, the ones that have an invisible boundary.

MS. ECKMAN: Mm-hmm.

LISTER F: The town lines. I think the ones -- I think most interviewers stress this over and over again. You know, the quality of the maps that we receive, especially when we're going into an area we haven't been and there's an invisible boundary. We're on the lookout for it, but it's not always visible when we're there if it's a town line.

...

MS. ECKMAN: Mm-hmm. So, you have, yourself, encountered some of these non-visible boundaries, huh?

LISTER F: Yes, mm-hmm.

MS. ECKMAN: Yeah, everyone mentions those as being particularly troublesome, but I'm not -- I can't get my mind around them. I'm not exactly sure what that's about. I think I should go out and --

LISTER F: Well, there could be -- there could be -- I guess mostly it's town boundaries that you don't see or any markers. Sometimes they -- if it's wintertime and they have a stream that you want to start at or something like that --

MS. ECKMAN: Oh, mm-hmm.

LISTER F: -- you cannot see it, or if it's in the fall when it's dry, you can't tell, and overgrown. So, you cannot tell if one house is included or not, where your starting point is.

MS. ECKMAN: Right.

LISTER F: That can be unclear.

Chapter G
Appendix: Census 2010 Address Canvassing Whistleblower Post

Post from My Two Census blog, 10/5/2009
http://www.mytwocensus.com/2009/10/05/feature-real-stories-from-the-census-bureau/

I worked in the New York City area as a lister during address canvassing and was disappointed with how the operation was conducted. One of my colleagues pointed me to this website some time ago and I felt compelled to share my story.

We had alot [sic] of the technology glitches in the hand held computers [HHC] that are widely know by now which included:

- software issues such the [sic] program freezes
- transmission problems such as the Sprint cellular network being down and missing assignments and map spots
- hardware issues such as the fingerprint swipe not working

But New York City has its own problems and is a completely different beast in itself. New York City is the most densely populated city in the United States and each neighborhood has its own unique character.
The Census Bureau tries to monitor productivity but the very nature of the city makes it very hard to monitor. Since all the units of multi unit apartment buildings are listed separately a lister has to key in every entry. Comparing someone who has an assignment with high rise apartment buildings versus someone who has single family homes is like comparing apples with oranges.

During address canvassing we were instructed to find someone who was knowledgeable about where people live or could live. But locating a knowledgeable respondent was easier said than done. There are small tenement buildings in Chinatown and Harlem brownstones; where there are illegal subdivisions. It is very difficult to gain entry or make contact even if you speak the language. There are also a lot of abandoned construction sites where developers tried to take advantage of the real estate boom after September 11th but found themselves out of money in the current recession.

Luckily for the Census Bureau, the current recession produced a talented pool of very intelligent and highly educated workers. My crew leader was knowledgable and a great leader. From the very beginning he was committed to doing things right. He said that he was continuously told a proper address canvassing operation would be the cornerstone of a successful enumeration. He was thorough and all the work was quality checked by one of the other listers or his assistant. When we couldn't gain access to a building, he encouraged us to try again and gave us additional work to keep us productive. In the end we had all these partially complete assignments where we had one or buildings we either couldn't get into or make contact with anyone. However the office was less than empathetic to our thoroughness. Our crew leader told us that Assistant Manager of Field Operations, field operations supervisors (FOS) and crew leaders in other districts would belittle those who were behind.
They would constantly say things like "John's district is 40% complete why aren't you 40% complete?" We were told that if we couldn't gain access to a building after two visits we had to accept what was in the HHC as correct. Many of us were tempted to falsify work and accept what was in the HHC as correct but my crew leader and FOS were adamant about not doing that. One of the other listers found an entire building with over 200 single illegally divided rooms. The HHC had less than 10 units listed in it. If they accepted [what] was in the HHC as true they would of missed over 200 housing units.

At the beginning of the fouth [sic] week, my crew leader and several others were written up for being unproductive because they weren't working fast enough to complete their assignments. They asked the Field Operations Supervisor to approve the writeups. One of the Field Operations Supervisors refused to sign the writeups and they wrote him up also for being insubordinate.

During address canvassing we were to document any additions, or deletes to the address list on an INFO-COMM which is a carbon copy paper. They said that they were hiring clerks to reconcile INFO-COMMs between the production and quality control. The sheer volume of having to go through 2000 pieces of paper is mindboggling. Originally, the plan was to use the INFO-COMMs to help the quality control listers, but they wanted to keep the operation independent so quality control wrote an additional INFO-COMM. All told we wrote out over 2000 INFO-COMMs.

The handheld computer also had glitches. They switched crew leaders in districts that weren't working fast enough and sometimes just reassigned work. When listers saw their timesheets weren't approved they submitted additional timesheets electronically. The new crew leader approved it and then they accused these listers of intentionally trying to milk the government clock. They accused half of an entire crew of listers of clocking overtime.
Nonetheless with all the problems most of the listers worked quickly and breezed through their assignments. By the end of the first week we were about 25% done but they decided to train another 100 listers; by the end of the second week we were halfway done and some crews were almost done but they trained another group of listers. Some of these listers were trained and received no field work because there was none. All told we trained over 100 listers who received less days of work than the four and a half days worth of training they received.

The thing to realize is that this was a poorly planned operation from the very beginning. The Census Bureau will waste money for government contracts on hand held computers that are shoddy and unreliable and training staff for which there is no work. But they will try to cut corners when it comes to their mission of counting each person accurately. In order to try to save money and finish ahead of other regions they used intimidation and the threatening of employees. I'm glad that Field Operations Supervisor stood up to the higher ups because like my crew leader said to me...they're just of [sic] bullies.

When the address canvassing operation finished up it was alleged that some of the crew leaders and field operations supervisors told their listers since there was no regard to quality that they could skip making contact, even going as far as not conducting field work and entering the units at home. There is no way that listers who were reassigned work magically gained access to buildings people couldn't access for weeks unless they accepted what was in the HHC as true. The crew leaders and field supervisors who finished first were rewarded with additional work. Those who finished last were sometimes "written up" as unproductive and the office terminated their employment.

Luckily this story has a happy ending. My crew leader didn't fire any of us for clocking overtime.
What they found was that the payroll system was mistakenly rewarding people overtime if they worked over eight hours during a work day even though they were below forty hours in a week. Someone was able to view the timesheet submissions in the office and prove all these listers weren't clocking overtime. It was rumored that someone who discovered this was the same FOS who refused to sign the writeups. As for the thousands of INFO-COMMs, they are sitting in the office file cabinets gathering dust; maybe someday someone will go through them. I highly doubt it given the sheer magnitude.

I think my crew leader was incredible. And from what I heard from some of the listers that met him their Field Operations Supervisor was even better. I never got the chance to see him but I am honored to have worked with someone who is willing to jeopardize his job for what was morally right. I am surprised I received a phone call the other day to work in the next operation, Group Quarters Validation. But I'm pretty sure that my crew leader or FOS won't be returning anytime soon.

Chapter H
Appendix: Content of NSFG Cycle 7 Female and Male Questionnaires

H.1 Female Questionnaire

Adapted from NSFG Cycle 7 Staff (2008, Figure 6, pp. 13-14).

Section A
- Respondent demographic characteristics (age; DOB; marital/cohabitation status; race and Hispanic origin)
- Household roster (age; sex; relationship to respondent)
- Introduction to Life History Calendar
- Education (degrees; highest grade completed; date last attended)
- Childhood family background and parents

Section B
- Onset of menstruation (menarche)
- Current pregnancy status
- Number of pregnancies
- Detailed pregnancy history (more details if in last 5 years)
- Confirmation of pregnancy history
- Care of nonbiological children (women 18-44)
- Relinquishment of biological children for adoption
- Adoption plans and preferences (women 18-44)

Section C
- Marital history and characteristics of each husband
- Details on current cohabiting partner, if there is one
- Cohabitation history; characteristics of former cohabiting partners
- Whether the respondent has had biological children with each of her husbands and cohabiting partners
- Ever had sexual intercourse (asked if never married, never pregnant and never cohabited): age and date of first intercourse; characteristics of first sexual partner (if not already discussed); date and age of first intercourse after menarche
- Sex education (if 15-24); timing relative to first sex
- Number of sexual partners (in lifetime; in past 12 months; before first marriage)
- Recent (last 12 months) partner history, up to 3 partners (or last partner ever, if none in the past 12 months); more details on current partners

Section D
- Sterilizing operations (respondent and husband/cohabiting partner)
- Desire for sterilization reversal (tubal ligations and vasectomies only)
- Nonsurgical sterility and fertility problems (respondent and husband/cohabiting partner)

Section E
- Ever-use of contraceptive methods, how emergency contraception was obtained, discontinuation of use, and reasons for dissatisfaction with selected methods
- Details on first method ever used (even if before first intercourse)
- Method use at first sexual intercourse
- Months during which she had intercourse for past 3-4 years or since first intercourse (if within the last 3-4 years)
- Contraceptive method history by month, for past 3-4 years or since first method used
- Method used at first and last sex, for up to three partners in last 12 months
- Wantedness of each pregnancy (by respondent and by father of pregnancy)
- Happiness to be pregnant scale
- Further details on circumstances surrounding pregnancies in last 3 years (including wantedness with that partner)
- Current method use, reasons for current nonuse
- Recent pill use (reasons; brand and type, consulting the Pill Chart)
- Consistency of condom use in last 4 weeks
- Frequency of sex in past 4 weeks

Section F
- Use of medical services related to birth control and reproduction in the last 12 months (services include receipt of: birth control method; checkup or medical test related to using birth control; counseling about birth control; sterilization; counseling about getting sterilized; emergency contraception; counseling about emergency contraception; pregnancy test; abortion; Pap smear; pelvic exam; prenatal care; post-pregnancy care; testing or treatment for sexually transmitted disease (STD))
- Provider and payment information for each visit for these services in last 12 months (more detail if specific clinic is cited)
- Activation of clinic lookup if service was received at a clinic
- First service ever received is asked of women 15-24 years of age, including when received, and type of provider
- If clinic is regular source of medical care
- Ever visited a clinic

Section G
- Do you want a/another baby (respondent and partner)
- Intentions to have a/another baby
- Number of additional children (respondent/respondent and husband) intend to have

Section H
- Infertility services (help to get pregnant; help to prevent miscarriage)
- Infertility diagnoses received, if ever pursued medical help to get pregnant
- Vaginal douching
- Health problems related to childbearing (pelvic inflammatory disease; diabetes (gestational & nongestational); ovarian cysts; uterine fibroids; endometriosis; problems with ovulation or menstruation)
- Physical disabilities/limitations
- HIV testing experience (some items limited to last 12 months)
- Where HIV test was received, if within the last 12 months
- HPV vaccine-related knowledge and experience

Section I
- Health insurance coverage in last 12 months
- Current residence and residence as of April 1, 2000
- Place of birth (date came to the United States, if born outside of the United States)
- Religion and attendance at religious services, at age 14 and currently
- Work in past 12 months and current work status (respondent and husband/cohabiting partner)
- Child care arrangements used (if any) in past 4 weeks for children under 13
- Attitudes: relationships, sex, condom use, gender roles, parenthood

Section J (ACASI)
- General health, including height and weight
- Pregnancy history (numbers ending in live birth, abortion, or other outcomes)
- School suspension/expulsion (for respondents 15-24 years old)
- Substance use (cigarettes; alcohol; marijuana; cocaine; crack; crystal meth; IV drugs)
- Sexual experience with males (vaginal intercourse, oral sex, and anal sex; condom use at last occurrence of each type of sex; timing of oral sex relative to vaginal intercourse (if age 15-24 and have had both types of sex); nonvoluntary intercourse with males (asked only for age 18 or older); numbers of male partners in lifetime; numbers of male partners in last 12 months (including numbers by specific type of sex); and other HIV/STD risk behaviors)
- Sex with females, including number of female partners
- Sexual orientation and attraction
- STD experience (some items limited to last 12 months)
- Individual earnings, family income, and public assistance during the previous year

H.2 Male Questionnaire

Adapted from NSFG Cycle 7 Staff (2008, Figure 7, p. 15).

Section A
- Respondent demographic characteristics (age; DOB; marital/cohabitation status; race and Hispanic origin)
- Household roster (age; sex; relationship of each member to respondent)
- Education (degrees; highest grade completed; date last attended)
- Basic information about his childhood family background and parents
- Numbers of marriages and cohabitations

Section B
- Ever had sexual intercourse
- Sex education received (Rs aged 15-24 only)
- Sterilizing operations
- Ever had biological child(ren); how many
- Enumeration of (up to) three most recent female sexual partners, or last partner ever

Section C
- Marital and cohabitation dates for current wife/partner
- Surgical sterilization and infertility (wife/partner)
- Biological children with current wife/partner (more details if born in last 5 years)
- Other children his current wife/partner had from previous relationships (more details if he lived with the child)
- Other nonbiological children he or his current wife/partner ever cared for
- First and last sex: dates and contraceptive use
- Contraceptive use in last 12 months

Section D
- Characteristics of (up to) three sexual partners in the past 12 months or last partner ever, contraceptive use at first and most recent sex, and date of first sex in the last 12 months
- Information on children with these partners (collected as above in C)
- First intercourse ever (if not already discussed): date, contraceptive method use, and characteristics of partner

Section E
- Characteristics of former wives and first cohabiting partner

Section F
- Other biological children (information collected as above in C)
- Other nonbiological children ever raised
- Pregnancies fathered in his lifetime that did not result in live birth (total number and numbers by outcome)
- Exact number of female partners lifetime and in last 12 months

Section G
- Activities with the children living in his household
- Activities with his biological and adopted children living elsewhere
- Financial support of his biological and adopted children living elsewhere

Section H
- Desire for (wanting) a/another baby (respondent & wife/cohabiting partner)
- Intentions to have a/another baby, asked individually or jointly, as appropriate

Section I
- Usual source of health care
- Health insurance coverage in last 12 months
- Health services received in last 12 months (more details if under age 25)
- Infertility services received
- HIV testing experience

Section J
- Current residence and residence as of April 1, 2000
- Place of birth (date came to the United States, if born outside of the United States)
- Religion and attendance at religious services, at age 14 and currently
- Military service
- Work status (respondent and wife/cohabiting partner)
- Attitudes: relationships, sex, condom use, gender roles, parenthood

Section K (ACASI)
- General health questions
- Significant life events
- School suspension/expulsion (for respondents 15-24 years old)
- Pregnancies fathered
- Substance use (alcohol; marijuana; cocaine; crack; crystal meth; IV drugs)
- Sexual experience with females (vaginal intercourse, oral sex, and anal sex; condom use at last occurrence of each type of sex; timing of oral sex relative to vaginal intercourse (for respondents 15-24 who have had both types of sex); nonvoluntary intercourse with females (asked only for respondents 18 or older); numbers of female partners in lifetime; numbers of female partners in last 12 months (including numbers by specific type of sex); and other HIV/STD risk behaviors)
- Sexual experience with other males (condom use at last occurrence of anal or oral sex; nonvoluntary sex with males (asked only for respondents 18 or older); HIV/STD risk behaviors, including number of male partners)
- Sexual orientation and attraction
- STD experience (some items limited to last 12 months)
- Individual earnings, family income, and public assistance during the previous year

Bibliography

Ai, C. and E. C. Norton (2003). Interaction Terms in Logit and Probit Models. Economics Letters 80(1), 123-129.

Alho, J. M., M. H. Mulry, K. Wurdeman, and J. Kim (1993). Estimating Heterogeneity in the Probabilities of Enumeration for Dual-System Estimation. Journal of the American Statistical Association 88(423), 1130-1136.

Allison, P. D. (1999). Comparing Logit and Probit Coefficients Across Groups. Sociological Methods & Research 28(2), 186-208.

Alt, C. (1991). Stichprobe und Repräsentativität. In H. Bertram (Ed.), Die Familie in Westdeutschland. Stabilität und Wandel familialer Lebensformen, pp. 497-531. Opladen: Leske & Budrich.

Angrist, J. and J. Pischke (2009).
Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.

Barrett, D. F., M. Beaghen, D. Smith, and J. Burcham (2002). Census 2000 Housing Unit Coverage Study. In Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 146-151.

Bethlehem, J. (2002). Weighting Nonresponse Adjustments Based on Auxiliary Information. In R. M. Groves, D. A. Dillman, J. L. Eltinge, and R. J. A. Little (Eds.), Survey Nonresponse, Chapter 18. Wiley-Interscience.

Bethlehem, J. and H. Kersten (1985). On the Treatment of Nonresponse in Sample Surveys. Journal of Official Statistics 1(3), 287-300.

Biemer, P. P. and L. E. Lyberg (2003). Introduction to Survey Quality. Wiley-Interscience.

Boyd, H. W. and R. Westfall (1955). Interviewers as a Source of Error in Surveys. Journal of Marketing 19(4), 311-324.

Boyd, H. W. and R. Westfall (1965). Interviewer Bias Revisited. Journal of Marketing Research 2(1), 58-63.

Boyd, H. W. and R. Westfall (1970). Interviewer Bias Once More Revisited. Journal of Marketing Research 7(2), 249-253.

Bureau of the Census (1993). Programs to Improve Coverage in the 1990 Census. Technical report. 1990 CPH-E-3.

Campanelli, P. C., K. Thomson, N. Moon, and T. Staples (1997). The Quality of Occupational Coding in the UK. In L. Lyberg, P. Biemer, M. Collins, E. de Leeuw, and C. Dippo (Eds.), Survey Measurement and Process Quality, pp. 437-457. Wiley-Interscience.

Carroll, R. J., D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu (2006). Measurement Error in Nonlinear Models: A Modern Perspective (Second ed.), Volume 105 of Monographs on Statistics and Applied Probability. Chapman & Hall/CRC.

Chang, T. and P. S. Kott (2004). Modeling NML Using the Area Frame Survey. In Technical Manuscript, National Agricultural Statistics Service, United States Department of Agriculture.

Chhikara, R. S., F. M. Spears, C. R. Perry, and P. S. Kott (2007).
Supplemental Samples for the 2007 Area Frame: A Design for Estimating Numbers of NML Farms for the 2007 Census of Agriculture. Research Report No. RDD-07-01, Research and Development Division, National Agricultural Statistics Service, United States Department of Agriculture.

Childers, D. R. (1992). The 1990 Housing Unit Coverage Study. In Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 506–511.

Childers, D. R. (1993). Coverage of Housing in the 1990 Decennial Census. In Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 635–640.

Clogg, C., M. Massagli, and S. Eliason (1989). Population Undercount and Social Science Research. Social Indicators Research 21(6), 559–598.

Coleman, J. S. (1994). A Rational Choice Perspective on Economic Sociology. In N. J. Smelser and R. Swedberg (Eds.), The Handbook of Economic Sociology, pp. 166–180. Russell Sage Foundation.

Cook, P. (1985). The Case of the Missing Victims: Gunshot Woundings in the National Crime Survey. Journal of Quantitative Criminology 1(1), 91–102.

Dalenius, T. (1983). Some Reflections on the Problem of Missing Data. In W. G. Madow and I. Olkin (Eds.), Incomplete Data in Sample Surveys, pp. 411–413. Academic Press.

De Young, R. (1999). Environmental Psychology. In D. E. Alexander and R. W. Fairbridge (Eds.), Encyclopedia of Environmental Science, pp. 223–224. Kluwer Academic Publishers.

DeNavas-Walt, C., B. D. Proctor, and J. C. Smith (2009). Income, Poverty, and Health Insurance Coverage in the United States: 2008. Current Population Reports, P60-236. Technical report, U.S. Government Printing Office, Washington, DC.

Dohrmann, S., D. Han, and L. Mohadjer (2006). Residential Address Lists vs. Traditional Listing: Enumerating Households and Group Quarters. In Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 2959–2964.

Dohrmann, S., D. Han, and L. Mohadjer (2007).
Improving Coverage of Residential Address Lists in Multistage Area Samples. In Proceedings of the Section on Survey Research Methods, American Statistical Association.

Eckman, S. and F. Kreuter (2010). Confirmation Bias in Housing Unit Listing. Under review.

Eyerman, J., D. Odom, and J. Chromy (2001). Impact of Computerized Screening on Selection Probabilities and Response Rates in the 1999 NHSDA. In Proceedings of the Section on Survey Research Methods, American Statistical Association.

Fan, M. C., M. L. Sutt, and J. H. Thompson (1984). Evaluation of the 1980 Census Precanvass Coverage Improvement Operations. In Proceedings of the Section on Government Statistics, American Statistical Association.

Fay, R. E. (1989). An Analysis of Within Household Undercoverage in the Current Population Survey. In Proceedings of the Bureau of the Census Annual Research Conference, pp. 156–175.

Fein, D. J. (1990). Racial and Ethnic Differences in U.S. Census Omission Rates. Demography 27(2), 285–302.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Groves, R. M., W. D. Mosher, J. Lepkowski, and N. G. Kirgis (2009). Planning and Development of the Continuous National Survey of Family Growth. Vital Health Statistics 1(48).

Groves, R. M. (1989). Survey Errors and Survey Costs. John Wiley and Sons.

Groves, R. M. (2006). Nonresponse Rates and Nonresponse Bias in Household Surveys. Public Opinion Quarterly 70(5), 646–675.

Groves, R. M., G. Benson, and W. D. Mosher (2005). Plan and Operation of Cycle 6 of the National Survey of Family Growth. Vital Health Statistics 1(42).

Groves, R. M. and M. Couper (1992). Nonresponse in Household Surveys. John Wiley and Sons.

Groves, R. M. and E. Peytcheva (2008). The Impact of Nonresponse Rates on Nonresponse Bias: A Meta-Analysis. Public Opinion Quarterly 72(2), 167–189.

Hansen, M. H. and J. Steinberg (1956). Control of Errors in Surveys. Biometrics 12(4), 462–474.
Harter, R., S. Eckman, N. English, and C. O'Muircheartaigh (2010). Applied Sampling for Large-Scale Multi-Stage Area Probability Designs. In P. Marsden and J. Wright (Eds.), Handbook of Survey Research (Second ed.). Emerald.

Hawkes, W. (1986). Census Data Quality: A User's View. Journal of Official Statistics 2(4), 531–544.

Herzog, T. N., F. J. Scheuren, and W. E. Winkler (2007). Data Quality and Record Linkage Techniques. New York: Springer.

Hitchcock, D. (1995). Do the Fallacies Have a Place in the Teaching of Reasoning Skills or Critical Thinking? In H. V. Hansen and R. C. Pinto (Eds.), Fallacies: Classical and Contemporary Readings, pp. 319–327. The Pennsylvania State University Press.

Hosmer, D. W. and S. Lemeshow (2000). Applied Logistic Regression (2nd ed.). New York: Wiley-Interscience.

Jacobs, C. (1986). Interim Evaluation of Listing Process Audit. Unpublished memorandum to Housing Working Group, U.S. Bureau of Labor Statistics. [Cited in Subcommittee on Survey Coverage (1990)].

Joncas, M. (1985). Cluster Listing Check Program for the Redesigned LFS Sample. Unpublished report, Ottawa: Statistics Canada. [Cited in Subcommittee on Survey Coverage (1990)].

Kennel, T. (2007). A Coverage Profile of Area Frame Blocks on the United States Census Bureau's Master Address File. In Proceedings of the Section on Survey Research Methods, American Statistical Association.

Kennickell, A. B. (2000). Asymmetric Information, Interviewer Behavior, and Unit Nonresponse. In Proceedings of the Section on Survey Research Methods, American Statistical Association.

Kennickell, A. B. (2003). Reordering the Darkness: Application of Effort and Unit Nonresponse in the Survey of Consumer Finances. In Proceedings of the Section on Survey Research Methods, American Statistical Association.

Kish, L. (1965). Survey Sampling. John Wiley and Sons.

Kish, L. and I. Hess (1958). On Noncoverage of Sample Dwellings. Journal of the American Statistical Association 53(282), 509–524.

Klayman, J.
(1995). Varieties of Confirmation Bias. Psychology of Learning and Motivation 32, 385–418.

Kohler, U. and F. Kreuter (2005). Data Analysis Using Stata. Stata Press.

Kwiat, A. (2009). Examining Blocks with Lister Error in Area Listing. In Proceedings of the Section on Survey Research Methods, American Statistical Association.

Lessler, J. T. (1980). Errors Associated with the Frame. In Proceedings of the Section on Survey Research Methods, American Statistical Association.

Lessler, J. T. and W. D. Kalsbeek (1992). Nonsampling Error in Surveys. John Wiley and Sons.

Liu, X. (2008). Using a MAF-Based Frame for Demographic Household Surveys. In Proceedings of the Section on Government Statistics, American Statistical Association.

Liu, X. (2009). Impact of MAF-Based Frame Coverage on Survey Estimates. In Proceedings of the Section on Survey Research Methods, American Statistical Association.

Lohr, S. L. (1999). Sampling: Design and Analysis. Duxbury Press.

Long, J. S. and J. Freese (2005). Regression Models for Categorical Dependent Variables Using Stata (2nd ed.). Stata Press.

Loudermilk, C. L. and M. Li (2009). A National Evaluation of Coverage for a Sampling Frame Based on the Master Address File (MAF). In Proceedings of the Section on Survey Research Methods, American Statistical Association.

Lynn, P. and E. Sala (2006). Measuring Change in Employment Characteristics: The Effects of Dependent Interviewing. International Journal of Public Opinion Research 18(4), 500–509.

Manheimer, D. and H. Hyman (1949). Interviewer Performance in Area Sampling. Public Opinion Quarterly 13(1), 83–92.

Martin, E. (1981). A Twist on the Heisenberg Principle: Or, How Crime Affects Its Measurement. Social Indicators Research 9(2), 197–223.

Matschinger, H., S. Bernert, and M. C. Angermeyer (2005). An Analysis of Interviewer Effects on Screening Questions in a Computer Assisted Personal Mental Health Interview. Journal of Official Statistics 21(4), 657–674.

Montaquila, J. M., V.
Hsu, and J. M. Brick (2010). Using a Match Rate Model to Predict Areas Where USPS-Based Address Lists May Be Used in Place of Traditional Listing. Under review.

Mood, C. (2010). Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It. European Sociological Review 26(1), 67–82.

NSFG Cycle 7 Staff (2008). Interviewer Training for NSFG Cycle 7, June 16–20, 2008. Survey Research Center, University of Michigan.

Oh, H. and F. Scheuren (1983). Weighting Adjustment for Unit Nonresponse. Incomplete Data in Sample Surveys 2, 143–184.

O'Muircheartaigh, C. A. (2004). Simple Response Variance: Estimation and Determinants. In P. Biemer, R. M. Groves, L. Lyberg, N. Mathiowetz, and S. Sudman (Eds.), Measurement Errors in Surveys. Wiley-Interscience.

O'Muircheartaigh, C. A., S. A. Eckman, and C. Weiss (2003). Traditional and Enhanced Field Listing for Probability Sampling. In Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 2563–2567.

O'Muircheartaigh, C. A., E. M. English, and S. A. Eckman (2007). Predicting the Relative Quality of Alternative Sampling Frames. In Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 551–574.

O'Muircheartaigh, C. A., E. M. English, S. A. Eckman, H. Upchurch, E. Garcia, and J. Lepkowski (2006). Validating a Sampling Revolution: Benchmarking Address Lists against Traditional Listing. In Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 4189–4196.

O'Muircheartaigh, C. and P. Campanelli (1998). The Relative Impact of Interviewer Effects and Sample Design Effects on Survey Precision. Journal of the Royal Statistical Society, Series A (Statistics in Society) 162(3), 63–77.

Panel on Coverage Evaluation and Correlation Bias in the 2010 Census, National Research Council (2008). Coverage Measurement in the 2010 Census. National Academies Press.

Pearson, J. M. (2003).
Quality and Coverage of Listings for Area Sampling. In Proceedings of the Section on Government Statistics, American Statistical Association.

Raudenbush, S. and A. Bryk (2002). Hierarchical Linear Models (Second ed.). Sage.

Roberts, S. (2010). New York's Nooks Are a Challenge to Census Takers. New York Times, February 23.

Rusch, M. L. (2008). Relationships Between User Performance and Spatial Ability in Using Map-Based Software on Pen-Based Devices. Ph.D. thesis, Iowa State University.

Sampson, R. J. and S. W. Raudenbush (1999). Systematic Social Observation of Public Spaces: A New Look at Disorder in Urban Neighborhoods. American Journal of Sociology 105(3), 603–651.

Sampson, R. J. and S. W. Raudenbush (2004). Seeing Disorder: Neighborhood Stigma and the Social Construction of "Broken Windows". Social Psychology Quarterly 67(4), 319–342.

Sando, T., R. Mussa, J. Sobanjo, and L. Spainhour (2005). Quantification of the Accuracy of Low Priced GPS Receivers for Crash Location. Journal of the Transportation Research Forum 44(2), 19–32.

Sappington, D. (1991). Incentives in Principal-Agent Relationships. Journal of Economic Perspectives 5(2), 45–66.

Schnell, R., T. Bachteler, and J. Reiher (2009). Privacy-Preserving Record Linkage Using Bloom Filters. BMC Medical Informatics and Decision Making 9(1), 41.

Schnell, R. and F. Kreuter (2005). Separating Interviewer and Sampling-Point Effects. Journal of Official Statistics 21(3), 389.

Singer, E., M. R. Frankel, and M. B. Glassman (1983). The Effect of Interviewer Characteristics and Expectations on Response. Public Opinion Quarterly 47(1), 68–83.

Singer, E. and L. Kohnke-Aguirre (1979). Interviewer Expectation Effects: A Replication and Extension. Public Opinion Quarterly 43(2), 245–260.

Singer, E., J. van Hoewyk, and M. P. Maher (2000). Experiments with Incentives in Telephone Surveys. Public Opinion Quarterly 64(2), 171–188.

Snijders, T. A. B. and R. J. Bosker (1999).
Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modelling. Sage.

StataCorp LP (2009). Stata Statistical Software: Release 11. College Station, TX: StataCorp.

Stiglitz, J. E. (2008). Principal and Agent (i). In S. N. Durlauf and L. E. Blume (Eds.), The New Palgrave Dictionary of Economics (Second ed.). Palgrave Macmillan.

Subcommittee on Survey Coverage (1990). Survey Coverage. Technical report, Federal Committee on Statistical Methodology.

Survey Research Center (1969). Interviewer's Manual. Institute for Social Research, The University of Michigan.

Survey Research Center (1976). Interviewer's Manual (Revised ed.). Institute for Social Research, The University of Michigan.

Taylor, R. B., S. D. Gottfredson, and S. Brower (1984). Neighborhood Naming as an Index of Attachment to Place. 7(2), 103–125.

Thompson, G. and C. Turmelle (2004). Classification of Address Register Coverage Rates: A Field Study. In Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 4477–4484.

Turmelle, C., J.-F. Rodrigue, and G. Thompson (2005). Using the Canadian Address Register in the Labour Force Survey: Implementation, Results and Lessons Learned. In Proceedings of the Conference of the Federal Committee on Statistical Methodology.

United States Department of Justice, Federal Bureau of Investigation (2009). Uniform Crime Reporting Program Data [United States]: Arrests by Age, Sex, and Race, 2007 [computer file]. ICPSR25108-v1.

U.S. Census Bureau (2001a). Census 2000 Summary File 1. Technical report.

U.S. Census Bureau (2001b). Census 2000 Summary File 3 Technical Documentation. Technical report.

U.S. Census Bureau (2002a). Census 2000 Summary File 3. Technical report.

U.S. Census Bureau (2002b). Census 2000 Summary File 3 Technical Documentation. Technical report.

U.S. Census Bureau (2006). Technical Paper 66: Design and Methodology, Current Population Survey. Technical report.

Wolter, K. M. (1986).
Some Coverage Error Models for Census Data. Journal of the American Statistical Association 81, 338–346.

Wolter, K. M. (2007). Introduction to Variance Estimation (2nd ed.). Springer-Verlag.

Wooldridge, J. M. (2009). Introductory Econometrics: A Modern Approach (Fourth ed.). South-Western.

Wright, T. and H. J. Tsao (1983). A Frame on Frames: An Annotated Bibliography. In T. Wright (Ed.), Statistical Methods and the Improvement of Data Quality. Academic Press.