ABSTRACT Title of dissertation: CONTRIBUTIONS TO BAYESIAN STATISTICAL MODELING IN PUBLIC POLICY RESEARCH Kevin D. Dayaratna, Doctor of Philosophy, 2014 Dissertation directed by: Professor Benjamin Kedem Department of Mathematics This dissertation improves existing Bayesian statistical methodologies and ap- plies these improvements to a variety of important public policy questions. The manuscript is divided into six chapters. The first chapter provides an overview of the various chapters of the dissertation. The second chapter improves existing Bayesian binary logistic regression methodologies using polynomial expansions as an alternative to existing Markov Chain Monte Carlo (MCMC) methods. Our improve- ments make the estimation technique quite useful for a variety of applications. We also demonstrate the methodology to be considerably faster than existing MCMC methods. These computational gains are quite useful for models analyzing large data sets involving high-dimensional parameter spaces. We apply this methodology to a child poverty data set to analyze the potential causes of child poverty. The next chapter improves upon a well-known technique in semiparametric modeling known as density ratio estimation. This methodology is useful in principle; however, it suffers from one primary limitation - The technique has thus far been incapable of modeling individual-level heterogeneity. Modeling heterogeneity is important as there is often no a priori reason to believe that different individuals (or observations) in a data set will behave in an identical manner. We ameliorate this limitation in the third chapter of this dissertation by adapting density ratio estimation methods to accommodate individual-level heterogeneity. We apply this new methodology to an analysis of the efficacy of medical malpractice reform across the country. In the fourth chapter of this dissertation, we shift our focus toward improving Bayesian credible interval estimation via semiparametric density ratio estimation. We do so by applying an innovative adaptation of the methodology, known as out of sample fusion, to posterior samples from a hierarchical Bayesian linear model looking at the efficacy of the welfare reform of the 1990s. In the fifth chapter, we extend the application of this methodology to credible interval estimation of a hierarchical gen- eralized linear model used for analyzing terrorism data in a number of major conflicts across the globe. We use our results to offer some prescriptive policy suggestions regarding counterterrorism policy. The final chapter concludes the dissertation and offers a number of suggestions for further research. We emphasize that the modeling contributions presented in this dissertation are useful in myriads of other applied problems beyond just the public policy applications presented here. CONTRIBUTIONS TO BAYESIAN STATISTICAL MODELING IN PUBLIC POLICY RESEARCH by Kevin D. Dayaratna Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2014 Advisory Committee: Professor Benjamin Kedem, Chair/Advisor Professor Tobias Von Petersdorff Professor Yuan Liao Professor David Hamilton Professor P.K. Kannan (Dean’s Representative) c© Copyright by Kevin D. Dayaratna 2014 Dedication To Daddy ii Acknowledgments My experience in Graduate School at the University of Maryland is one that will remain with me for the rest of my life. There are so many people I’d like to ac- knowledge that these first few pages cannot accommodate them all. Firstly, I would like to thank both of my parents, Dr. Lama Dayaratna and Mrs. Esme Dayaratna, as well as my two siblings, Arnal and Sandra, for their endless support and encourage- ment over the course of my entire life. My accomplishments would almost surely not have been possible without them. I’d also like to of course thank my wife Sashi for her tremendous support as well. Others I’d also like to thank include Ron Matteotti and my in-laws Mr. Gamini Abeysekara and Mrs. Kalyani Abeysekara. I’d also like to thank many of my other relatives including Asantha Kempitya, Obashini Kempi- tiya, Dharshika Kempitiya Sooriyabandara, Gihan Sooriyabandara, Hemali de Silva, Harsha Rajapaksha, Udeshi Hargett, Nirmala Abeysinghe, and Channa Abeysinghe to name a few. There are also of course plenty of others outside my family whom I would like to thank. Firstly, from the bottom of my heart, I would like to express my utmost gratitude to my adviser Professor Benjamin Kedem. Professor Kedem has not only been the best adviser I could have ever asked for but has also been a great friend and colleague. Much of this dissertation builds upon many of his excellent ideas, and he has provided tremendous input and guidance to improve upon previous drafts of this dissertation. I would also like to thank Professor Paul Smith who admitted me to the iii program and served as my professor for some of my coursework. In addition, I would like to thank a number of other professors in the department including Professors Eric Slud, Abram Kagan, Leonid Koralov, Sandra Cerrai, Tobias von Petersdorff, Yuan Liao, and David Hamilton to name a few. Outside Maryland’s Mathematics department, I would like to thank my colleague Professor P.K. Kannan of the Robert H. Smith School of Business, whom I had the privilege of publishing other research with outside this dissertation [22]. I would also like to thank Professor Jordan Boyd- Graber of the iSchool for introducing me to the concept of non-parametric Bayesian modeling used in Chapter 5 of this dissertation. Another faculty member I would like to thank is Professor Cindy Clement of the Economics department for giving me the opportunity to serve as a Lecturer in her department for each of the last three years. Additionally, there are numerous classmates at Maryland that I would also like to thank including Eddie Kim, Xuan Yao, Xuan Liu, Joyce Hsiao, Kwame Okrah, Hisham Talukder, Arseniy Zakharov, Joseph Paulson, David Shaw, Daniel Han, Jong Jun Lee, Clare Wickman, Jim Greene, William Waldron, Catherine Ochalek, Jeff Tan, Wenqing Hu, Wen Zhou, Lemeng Pan, Yue Tian, Zi Ding, Sean Burke, Joel Witten, Peggy Tseng, Gauri Kulkarni, and Anshuman Sinha among others. I would also like to especially thank Paul Tschirhart of the University’s Writing Fellows Program for his advice about LaTeX coding and help formatting this dissertation. I would also like to thank the Mathematics department’s great staff including Celeste Regalado, Linette Berry, Haydee Hidalgo, and Alverda McCoy. Outside the University of Maryland, there are also many people I would like to iv thank. Firstly, I am indebted to Professor Steven J. Miller of Williams College. Steve has essentially served as my “mathematics mentor” since I was an undergraduate at U.C. Berkeley. Over the years, we have become friends and colleagues, and I’ve had the privilege of co-authoring a number of publications with him [23,24,108]. He has also provided very insightful comments to this dissertation. I would also like to thank Professor Eric Bradlow of the University of Pennsylvania for helpful input on the Chapter 2 of this dissertation. There are many people from my undergraduate days at the University of Cal- ifornia, Berkeley I would also like to thank. Special thanks go to my good friend Gagan Tara Nanda whom I studied mathematics with throughout the course of my studies at Berkeley. I’d like to thank my undergraduate adviser, Professor L. Craig Evans, who was instrumental in getting me interested in continuing to study higher- level mathematics. I’d also like to of course thank a number of professors that I had the privilege of learning from at Berkeley including Professors Richard Borcherds, Maciej Zworski, Michael Hutchings, and John Neu. Studying mathematics from these and other professors interested me greatly in pursuing graduate-level studies. I would also like to thank one of my political science professors, Professor Darren Zook, for getting me interested in terrorism analysis and ethnic conflict, which form a component of this dissertation. There are plenty of friends I’d also like to thank. I’d like to thank Alex Levy who has been my good friend for nearly twenty years and the best man at my wedding. I’d also like to thank Hovannes Abramyan, Mike Taylor, and Richard Nazareth who not only served as groomsmen at my wedding but have also been great v friends since my time at U.C. Berkeley. I would especially like to thank Hovannes and his colleague Mark Crain for providing the data that they used in their work with the Pacific Research Institute for Chapter 3 of this dissertation. Other U.C. Berkeley friends I would like to thank include Carl Densing, Elaine Gin, Yuriy Pasko, Mike McFarlane, Dennis Bedford, Cameron Huey, Laura Russell, Nat Lipanovich, Erica Batres, Erik Johannessen, Steve Berkovich, Michael Klein, Sam Lee, Eddie Kim, Taiwan Udtamadilok, Rosanna Leroe-Munoz, David Timmons, and Rebekah Hsieh among others. I’d also like to thank A.J. Hergenroder, Annie Moskovian, Jocelyn Hsiao, Ben Ouyang, Laurie Kaufman, Rebekka Levy, Chris Luise, and Pamitha Weeresinghe. Special thanks also go to the Rhodes, Pillai, and Pathirana families for their friendship and encouragement over the years. Finally, I would like to say thank you to many of my colleagues at The Her- itage Foundation. In particular, I’d like to thank my colleagues in the Center for Data Analysis - namely, Rea Hederman, Salim Furth, Drew Gonshorowski, Patrick Tyrrell, James Sherk, David Kreutzer, David Muhlhausen, John Ligon, Rachel Gres- zler, and Filip Jolevski. I’d also like to thank Derrick Morgan, Bill Beach, Robert Rector, Leslie Merkle, Nina Owcharenko, Alyene Senger, Jennifer Marshall, Rachel Sheffield, Bob Moffit, David Azerrad, Katie Tubb, Elinor Renner, and Pamela Ouzts. I would especially like to thank Rea Hederman and Robert Rector for providing the data used in Chapters 2 and 4 of this dissertation as well as for insightful discus- sions. I would also like to think Jim Carafano for helpful comments on Chapter 5 of this dissertation on statistical inferences for counterterrorism policy. vi Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2. A Rigorous Examination of the Determinants of Child Poverty in America via Closed-Form Bayesian Inferences with Implications for Welfare Reform 5 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1.1 What are the Causes of Child Poverty? . . . . . . . . . . . . . 5 2.1.2 Improvements to Bayesian Computation . . . . . . . . . . . . 6 2.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2.1 Polynomial Expansions of the Binary Logit Model . . . . . . . 10 2.2.2 Closed-Form Bayesian Inference via Polynomial Expansions as Described In Miller et al (2006) . . . . . . . . . . . . . . . 13 2.2.3 Bayesian Inference via Series Expansions Using a Two-Sided Heterogeneity Distribution . . . . . . . . . . . . . . . . . . . . 16 2.3 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.3.1 Numerical Simulations . . . . . . . . . . . . . . . . . . . . . . 23 2.3.2 Comparisons to MCMC Methods . . . . . . . . . . . . . . . . 24 2.4 A Few Generalizations . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.4.1 Incorporating covariances: Positing Heterogeneity from a Mul- tivariate Normal Distribution . . . . . . . . . . . . . . . . . . 32 2.4.2 Allowing dependence on other factors . . . . . . . . . . . . . . 35 2.4.3 Arbitrary Priors . . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.5 Application to Analyzing Potential Causes of Child Poverty . . . . . 38 2.5.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 2.5.2 Estimation Results . . . . . . . . . . . . . . . . . . . . . . . . 39 2.5.3 Policy Implications . . . . . . . . . . . . . . . . . . . . . . . . 41 2.6 Conclusions and Future Research . . . . . . . . . . . . . . . . . . . . 42 3. Closed-form Bayesian Inferences for Semiparametric Density Ratio Modeling with Applications to Tort Reform . . . . . . . . . . . . . . . . . . . . . . . 45 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.1.1 Tort Reform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.1.2 Bayesian Parametric methods . . . . . . . . . . . . . . . . . . 47 3.1.3 Density Ratio Estimation . . . . . . . . . . . . . . . . . . . . 47 3.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.2.1 Bayesian Density Ratio Estimation . . . . . . . . . . . . . . . 50 3.2.2 Advantages of Marginalization . . . . . . . . . . . . . . . . . . 52 3.2.3 Derivation of Distributions . . . . . . . . . . . . . . . . . . . . 52 3.3 Numerical Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.4 A Few Generalizations . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.4.1 Arbitrary Prior Distributions and Tilt Functions . . . . . . . . 58 3.4.2 Assuming homogeneity in the intercepts . . . . . . . . . . . . 60 3.5 An Application: Tort Reform . . . . . . . . . . . . . . . . . . . . . . 60 3.5.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 3.5.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 3.5.3 Policy Implications . . . . . . . . . . . . . . . . . . . . . . . . 66 3.6 Conclusions and Future Research . . . . . . . . . . . . . . . . . . . . 67 4. Bayesian Inferences of Welfare Reform with Improved Credible Interval Es- timation via Semiparametric Out of Sample Fusion . . . . . . . . . . . . . 69 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 4.1.1 “Ending Welfare as We Know it” . . . . . . . . . . . . . . . . 69 4.2 Statistically Modeling the Impact of Welfare Reform . . . . . . . . . 72 4.2.1 A Hierarchical Bayesian Model Allowing for State-Level Het- erogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.2.2 The Limitations of Bayesian Computational Methods . . . . . 74 4.2.3 Statistical Inferences Based on Density Ratio Estimation . . . 76 4.2.4 Empirical Likelihood Estimation in Semiparametric Density Ratio Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.2.5 Numerical Simulations . . . . . . . . . . . . . . . . . . . . . . 81 4.3 The Impact of Welfare Reform . . . . . . . . . . . . . . . . . . . . . . 91 4.3.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.3.2 Analysis of the PROWA Using Bayesian Lagged Regression . . 94 4.3.3 Statistical Inferences Based on Hierarchical Bayesian Analysis of PROWA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.4 Additional Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 4.4.1 Policy Implications . . . . . . . . . . . . . . . . . . . . . . . . 106 4.5 Conclusions and Future Research . . . . . . . . . . . . . . . . . . . . 107 4.6 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 4.6.1 An Elementary Frequentist Analysis . . . . . . . . . . . . . . 108 5. Generalized Bayesian Inferences for Counterterrorism Policy with Improved Credible Interval Estimation via Semiparametric Out of Sample Fusion . . 111 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 5.1.1 Combating Terrorism . . . . . . . . . . . . . . . . . . . . . . . 111 5.1.2 Dirichlet Process Priors and the Limitations Bayesian Para- metric methods . . . . . . . . . . . . . . . . . . . . . . . . . . 113 5.1.3 A Brief Review of Density Ratio Estimation . . . . . . . . . . 115 5.1.4 Using DRE to Improving on Bayesian Estimation Results . . . 116 viii 5.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 5.2.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 5.2.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 5.3 A Bayesian Analysis of Several Major Conflicts Across the Globe . . 120 5.3.1 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 5.3.2 The War in Afghanistan . . . . . . . . . . . . . . . . . . . . . 120 5.3.3 The War in Iraq . . . . . . . . . . . . . . . . . . . . . . . . . . 123 5.3.4 The Sri Lankan Civil War . . . . . . . . . . . . . . . . . . . . 127 5.3.5 The Troubles . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 5.3.6 The Use of Dirichlet Process Priors . . . . . . . . . . . . . . . 133 5.3.7 Improving Bayesian Credible Interval Estimates . . . . . . . . 135 5.4 Conclusions and Future Research . . . . . . . . . . . . . . . . . . . . 140 5.5 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 5.5.1 Variable Definition . . . . . . . . . . . . . . . . . . . . . . . . 144 6. Conclusions and Future Research - Where do we go from here? . . . . . . . 154 ix List of Figures 3.1 Plot of Hˆ vs. H˜ - DRE, 2004 Per Capita Tort Loss Data . . . . . . . 64 3.2 Plot of Hˆ vs. H˜ - Bayesian DRE, 2004 Per Capita Tort Loss Data . . 64 3.3 Plot of Hˆ vs. H˜ - DRE, 2006 Per Capita Tort Loss Data . . . . . . . 65 3.4 Plot of Hˆ vs. H˜ - Bayesian DRE, 2006 Per Capita Tort Loss Data . . 65 4.1 Histogram of Posterior Sample for Posterior Intercept Mean Coeffi- cient (µ0), TANF regression, Lag 1 . . . . . . . . . . . . . . . . . . . 95 4.2 Histogram of Posterior Slope Mean Coefficient (µ1), TANF regression, Lag 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 4.3 Histogram of Posterior Intercept Variance Coefficient (σ20), TANF regression, Lag 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 4.4 Posterior Slope Variance Coefficient, TANF regression (σ21), Lag 1 . . 97 5.1 Histogram of Posterior Sample for Suicide Attack Coefficient in Afghanistan, Assuming a Normally Distributed Prior . . . . . . . . . . . . . . . . . 136 5.2 Histogram of Posterior Sample for Suicide Attack Coefficient in Afghanistan, Assuming a DP prior . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 5.3 Histogram of Posterior Sample for Bombing/Explosion Attack Coef- ficient in Iraq, Assuming a Normally Distributed Prior . . . . . . . . 138 5.4 Histogram of Posterior Sample for Bombing/Explosion Attack Coef- ficient in Iraq, Assuming a DP prior . . . . . . . . . . . . . . . . . . . 139 5.5 Histogram of Posterior Sample for Facility/Infrastructure Coefficient in Sri Lanka, Assuming a Normally Distributed Prior . . . . . . . . . 140 5.6 Histogram of Posterior Sample for Facility/Infrastructure Coefficient in Sri Lanka, Assuming a DP prior . . . . . . . . . . . . . . . . . . . 141 5.7 Histogram of Posterior Sample for Hijacking Coefficient in North- ern Ireland (before 1998 GF Agreement), Assuming a Normally Dis- tributed Prior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 5.8 Histogram of Posterior Sample for Hijacking Coefficient in Northern Ireland (before 1998 GF Agreement), Assuming a DP prior . . . . . . 143 5.9 Histogram of Posterior Sample for Unarmed Assault Coefficient in Northern Ireland (after 1998 GF Agreement), Assuming a Normally Distributed Prior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 5.10 Histogram of Posterior Sample for Unarmed Assault Coefficient in Northern Ireland (after 1998 GF Agreement), Assuming a DP prior . 145 Chapter 1: Introduction 1.1 Introduction In public policy research, questions abound about the direction of our country. In recent years, rigorous statistical analysis has been a useful tool for answering many of these questions [42]. This dissertation looks at a number of such policy questions improving existing statistical methodologies in the process. For example, in our next chapter, we look at the determinants of child poverty and improve existing Bayesian estimation techniques to do so. Understanding the root causes of poverty is a fundamental question that has consistently plagued policy researchers for decades. With nearly $16 trillion spent on federal welfare programs since President Lyndon Johnson began his War on Poverty and with the Obama Administration expected to spend over another $10 trillion over the course of the next decade, millions of American children continue to live in poverty [131]. Any meaningful policy proposals should be based on addressing the causes of poverty and not just the apparent symptoms. Although a number of studies have attempted to identify these causes, no studies, to our knowledge, have done so including individual-level heterogeneity in the associated statistical models. Incorporating heterogeneity is very important as different families will almost surely respond differently to potential causes. Unfor- tunately, however, for the binary logistic regression models typically called upon to help answer this question, positing a distribution for heterogeneity does not really allow for closed-form inferences. As a result, researchers often have to resort to numerical techniques such as Markov Chain Monte Carlo (MCMC) methods to es- timate the associated models. These methods can be difficult and time consuming to obtain convergence, particularly for large data sets. We address this issue in Chapter 2 by presenting an alternative Bayesian es- timation technique for binary logistic regression using polynomial expansions that allows for closed form inferences, enabling researchers to make direct inferences about the population. Miller, Bradlow, and Dayaratna (2006) also presented a sim- ilar method; however, their result was quite limited as it was restricted to using a single-sided prior distribution [108]. We assuage this limitation by allowing the re- searcher to draw from one of the most flexible and commonly used prior distributions - the normal distribution. After deriving our polynomial expansions, we present a number of numerical simulations to illustrate the usefulness and advantages of our approach. We also estimate our model on a large poverty data set from the Current Population Survey to study the determinants of child poverty. In particular, we find that marital status of parents, parental age, parental education, whether the parents are working full time, and the number of children living in a household all significantly influence whether a child grows up in poverty. We discuss the resulting policy implications and conclude with a discussion of potential avenues for future research. In Chapter 3, we examine the efficacy of medical malpractice reforms instituted throughout the country over the course of the last decade. In the process, we improve on existing semiparametric density ratio estimation methodologies. Density ratio estimation (DRE) is a well-known semiparametric modeling technique that has been around for decades. Although the DRE method has proven to be very useful 2 in statistical modeling, it suffers from one primary limitation. In particular, the method has thus far been incapable of modeling individual-level heterogeneity. We ameliorate this limitation in Chapter 3 to enable DRE methods to model individual-level heterogeneity. We perform a series of numerical simulations, along with goodness of fit computations, to illustrate the efficacy of our approach. We show that this new approach outperforms existing semiparametric density ratio esti- mation methods. After our numerical simulations, we apply our approach to medical malpractice loss data from the previous decade to quantify the probability of ex- treme losses. Our results indicate the success of some recently instituted medical malpractice reforms. Subsequently, in our fourth chapter, we shift our focus toward welfare policy, conducting a rigorous Bayesian analysis of the welfare reform of the 1990s, providing some improvements to Bayesian credible interval estimation techniques in the pro- cess. First popularized by Ronald Reagan in his classic 1964 “A Time for Choosing” speech, the concept of welfare reform has been a hot topic of public policy research for decades [127]. The Personal Responsibility and Work Authorization Act of 1996 was one of the nation’s most comprehensive efforts at welfare reform [118]. The law’s primary aim was to transform one of America’s major welfare programs away from a system fostering dependency and into a program providing temporary assistance to enable people to become contributing members of society. In Chapter 4, we utilize hierarchical Bayesian linear modeling to rigorously quantify the impact of this law. In the process, we improve upon existing Bayesian interval estimation methods by calling upon semiparametric DRE method thus far used only for frequentist statistical modeling. We find that the welfare reform of the 1990s was quite successful in getting people back to work and can be improved upon even further. We conclude by discussing the resulting policy implications. In Chapter 5, we shift our focus to foreign policy, particularly toward combat- 3 ing terrorism. Terrorism has been around for generations and understanding how to fight terrorists has been a question vexing policymakers throughout the world. In Chapter 5, we rigorously analyze terrorist attack data from four major conflicts across the globe - from Afghanistan, Iraq, Sri Lanka, and Northern Ireland - to help policymakers answer this very question. We utilize both parametric and non- parametric (hence generalized) Bayesian logistic regression techniques to do so, and again improve upon Bayesian credible interval estimation in the process by using semiparametric DRE methods. We thus extend the application of the semipara- metric DRE method used in Chapter 4 for linear models to this generalized linear model. Our study helps shed light on the factors that influence the success of terror- ist attacks in each conflict, providing policymakers with advice about how to more effectively combat this very dangerous enemy. Finally, in Chapter 6, we discuss some conclusions as well as potential avenues for future research. We emphasize that the modeling contributions presented here are applicable to myriads of other fields beyond just the public policy applications looked at in this dissertation. 4 Chapter 2: A Rigorous Examination of the Determinants of Child Poverty in America via Closed-Form Bayesian Inferences with Implications for Welfare Reform 2.1 Introduction 2.1.1 What are the Causes of Child Poverty? Ever since President Lyndon Johnson began his War on Poverty in 1964, the U.S. Federal Government has spent an exorbitant amount (estimated to be as much as $15.9 trillion) on welfare programs and on other forms of cash assistance to the poor. Such programs include means-tested welfare programs such as Temporary Assistance for Needy Families (TANF), the Earned Income Tax Credit (EITC), Supplemental Security Income (SSI), food stamps, public housing, and Medicaid among others [131]. As millions of Americans continue to live in poverty today, some researchers have argued that many of these programs have been largely inef- fective and have the potential to trap people in poverty. In his 1984 seminal work Losing Ground, for example, Charles Murray of the American Enterprise Institute suggested that many of these welfare programs were actually encouraging depen- dence and ironically crippling the people they were intended to help [111]. Many such researchers have consequently argued that meaningful public policy proposals on these issues should be grounded on an understanding of the root causes of poverty and not just the mere symptoms [130,136]. Although some studies have attempted to identify these causes, no such re- search, to our knowledge, have incorporated individual-level heterogeneity in its statistical modeling [130,145]. Incorporating heterogeneity is of utmost importance in these models as there is no a priori reason to believe that all families will respond to potential causes in the same manner. From a statistical perspective, assuming homogeneous response coefficients in a model can lead to the researcher ignoring potential variability and can consequently result in misleading statistical inferences. We fill this gap in the extant literature by examining the potential causes of child poverty via Bayesian logistic regression. The contribution of our study is two fold. From a policy perspective, we look at the determinants of child poverty, while incorporating individual-level heterogeneity into our model. Our incorporation of individual-level heterogeneity allows us to make more informative statistical infer- ences compared to methods that simply just assume homogeneity. Methodologically, our approach makes Bayesian estimation involving large data sets, such as the child poverty data set examined here, much more feasible. 2.1.2 Improvements to Bayesian Computation From fields ranging from economics to public policy to medicine to professional sports, logistic regression has become one of the most widely used tools in applied statistical research. For example, logistic regression models have been used to help understand the determinants of depression and suicide, examine consumer choice in marketing research, model medical outcomes pertaining to various illnesses, and shed light on baseball player hitting performance among others [1, 7, 11, 58–60, 133, 164,167]. With improvements in statistical computing power over the course of the past 6 two decades, incorporating heterogeneity in these and other models has become increasingly common in the applied statistical literature. Researchers have incor- porated heterogeneity by choosing amongst a variety of methodologies, including a parametric Bayesian approach, a non-parametric Bayesian approach, or even a frequentist finite mixture approach [49, 72, 74, 100]. Parametric Bayesian models are often used to incorporate individual-level heterogeneity. Generally, with limited data per individual in a data set, assuming a different parametrization for each individual renders a model statistically unidentifiable, making estimation virtually impossible. Researchers will typically assume that these individual response coeffi- cients are all drawn from their own lower-dimensional probability distribution. They can then estimate these models from an empirical Bayesian perspective or from a fully Bayesian perspective by imposing priors on the parameters of the heterogeneity distributions themselves [49,110]. Although “nice” in principle, incorporating individual-level heterogeneity is often concomitant with the drawbacks of the computational complexity associated with numerical computation. For example, numerical methods such as quadrature, simulated maximum likelihood, and Markov Chain Monte Carlo (MCMC) methods can be difficult and time consuming to estimate, especially for large data sets in- volving high-dimensional parameter spaces [50, 138]. Additionally, commonly-used MCMC methods suffer from the drawback of sensitivity to starting values and can consequently result in a significant amount of simulation error. As a result, a number of researchers have approached alternative techniques to make Bayesian computation more feasible. For example, Everson and Bradlow (2002) used polynomial expansions to approximate the posterior distributions of the beta-binomial random variables using a class of prior distributions previously considered non-conjugate [39]. Similarly, Bradlow et al (2002) used polynomial expansions to improve on researchers’ ability to make posterior inferences about the 7 negative binomial distribution [9]. In subsequent research, McShane et al (2008) used similar techniques to improve on Weibull count model estimation [105]. Miller et al (2006) used polynomial expansions to solve the problem looked at in this research, namely for binary logistic regression [108]. Their approach, however, suffered from a serious limitation by requiring that the prior distribution be single- sided. Consequently, their result, although nice in principle, is very limited in scope as it is often difficult to know a priori the signs of the coefficients beforehand. We assuage this limitation by looking at the same problem but instead allowing the researcher to draw from the one of the richest and most commonly used two-sided prior distributions - The normal distribution. In particular, we derive a marginalized likelihood for the binary logit model us- ing polynomial expansions. We begin by reviewing the Miller et al (2006) approach and its subsequent limitations. We then proceed by ameliorating these limitations, assuming that the response coefficients are drawn from independent normal distri- butions. We subsequently generalize this result by allowing correlations amongst the coefficients. Afterwards, we allow for dependence of the prior distributions on various covariates and then subsequently allow for the choice of non-normal prior distributions. Our model can be estimated via the method of maximum marginal likelihood (MML) from which empirical Bayesian inferences can be made, allowing us to make direct inferences about the population [110,147]. We present a number of numerical simulations to illustrate the usefulness of our approach as well as its advantages over existing MCMC methods. We also utilize our approach to answer an impor- tant question in public policy research; particularly, identifying the causes of child poverty. 8 2.2 Problem Formulation Consider a data set obtained from i ∈ {1, . . . , I} individuals (units) having j ∈ {1, . . . , J} categories (e.g. illness, product brand, etc) measured on t ∈ {1, . . . , Ni} occasions (repeated measures). As is standard, we define yijt =    1 if outcome occurs for individual i pertaining to category j at time t 0 otherwise, (2.1) where pijt = Prob(yijt = 1) is the probability of a particular outcome occurring (e.g. living in poverty, purchase of a product, mortality of a patient) for the ith individual pertaining to the jth category on the tth occasion. Additionally, let p = 1, . . . , P represent a set of attributes pertaining to the covariates, with corresponding values xijt,p ≥ 0 such that XTijt = (xijt,1, . . . , xijt,P ). To account for residual effects not manifested in the coefficient estimates for the explanatory variables, we can allow xijt,1 = 1 defining category-level intercepts. Multiplying over all individuals, categories, and occasions, we obtain the stan- dard logit likelihood of the data, Y = (yijt): P (Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi 1 + eX T ijtβi , (2.2) where βi = (βi,1, . . . , βi,P ) is the coefficient vector for the ith individual with pth variable-specific coefficient, βi,p and β = (β1, . . . , βI). Allowing our model to accommodate heterogeneity is quite important in mod- eling real world phenomena, as there is no reason to believe that all individuals will behave in an identical manner. For example, in modeling consumer purchase behav- ior, customers purchasing a product will almost surely differ in how they respond to 9 prices or promotions. In modeling baseball hitting performance, different sluggers will respond differently to different pitching styles. Additionally, in public policy research, looked at in this study, different families will respond differently to various factors that may or may not contribute to them living in poverty. We can model heterogeneity across individuals by allowing each βi,p to be drawn from probability distributions. To start out, we discuss the Miller et al (2006) approach and the limitations of their result. Subsequently, we ameliorate these limitations and make the mode much for useful for applying to real-world problems. 2.2.1 Polynomial Expansions of the Binary Logit Model We are interested in the following marginalized likelihood which we intend to maximize over our parameter space: P (Y |Ω) = ∫ β P (Y |β)N(β|Ω)dβ. (2.3) In the above equation, P (Y |β) is our standard logit likelihood with a prior dis- tribution N(β|Ω) and Ω represents the parameters of our prior distribution. As men- tioned above, we have non-negative explanatory variables XTijt = (xijt,1, . . . , xijt,P ) and binary dependent variables Y = (yijt). We intend to maximize the above marginalized likelihood over our prior distribution’s parameter space. Specifically, our logit likelihood is: P (Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi 1 + eX T ijtβi , (2.4) It is the heterogeneity across i = 1, ..., I individuals in their βi,p coefficients that we are modeling by allowing these parameters to follow N(β|Ω). Due to the fact that the βi appears in both the numerator and denominator of (2.4), performing 10 the integration in (2.3) analytically for most choices of heterogeneity distributions without any numerical approximations is essentially impossible. We can take a series expansion approach to this problem and rewrite P (Y |β) as follows: P (Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi 1 + eX T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi I∏ i=1 J∏ j=1 Ni∏ t=1 1 1 + eX T ijtβi P (Y |β) = P1(Y |β)P2(Y |β). (2.5) We refer to the second factor above as P2(Y |β) although it does not depend on Y. If we assume XTijtβi < 0, we can expand P2(Y |β) via a geometric series expansion as follows [108]: P2(Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1 1 1 + eX T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 ∞∑ kijt=0 (−1)kijtekijtX T ijtβi . (2.6) Putting together the pieces, we therefore have when XTijtβi < 0: 11 P (Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi 1 + eX T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi I∏ i=1 J∏ j=1 Ni∏ t=1 1 1 + eX T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi I∏ i=1 J∏ j=1 Ni∏ t=1 ∞∑ kijt=0 (−1)kijtekijtX T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 ∞∑ kijt=0 (−1)kijteyijtX T ijtβi+kijtX T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 ∞∑ kijt=0 (−1)kijte(yijt+kijt)X T ijtβi P (Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1 ∞∑ kijt=0 (−1)kijte(yijt+kijt) ∑P p=1 xijt,pβi,p . (2.7) If, on the other hand, we assume XTijtβi > 0, we can also use a geometric series expansion: P2(Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1 1 1 + eX T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 e−X T ijtβi 1 + e−X T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 e−X T ijtβi 1 1 + e−X T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 e−X T ijtβi I∏ i=1 J∏ j=1 Ni∏ t=1 ∞∑ kijt=0 (−1)kijte−kijtX T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 ∞∑ kijt=0 (−1)kijte−X T ijtβi−kijtX T ijtβi P2(Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1 ∞∑ kijt=0 (−1)kijte−(1+kijt) ∑P p=1 xijt,pβi,p . (2.8) 12 And again putting together the pieces: P (Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi 1 + eX T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi I∏ i=1 J∏ j=1 Ni∏ t=1 1 1 + eX T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi I∏ i=1 J∏ j=1 Ni∏ t=1 ∞∑ kijt=0 (−1)kijte−(1+kijt)X T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 ∞∑ kijt=0 (−1)kijteyijtX T ijtβi−(1+kijt)X T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 ∞∑ kijt=0 (−1)kijte(yijt−1−kijt)X T ijtβi P (Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1 ∞∑ kijt=0 (−1)kijte(yijt−1−kijt) ∑P p=1 xijt,pβi,p . (2.9) In the next section we utilize these series expansions to derive closed-form ex- pressions from which we can make Bayesian inferences. 2.2.2 Closed-Form Bayesian Inference via Polynomial Expansions as Described In Miller et al (2006) Ideally, one would like to allow each βi,p to follow two sided prior distribu- tion, under such circumstances we would have a combination of both of the above situations, as well as when XTijtβi = 0: 13 P (Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi 1 + eX T ijtβi = I∏ i=1 J∏ j=1 Ni∏ t=1 [ eyijtX T ijtβi 1 + eX T ijtβi (1) ] = I∏ i=1 J∏ j=1 Ni∏ t=1 [ eyijtX T ijtβi 1 + eX T ijtβi [ I(XTijtβi > 0) + I(X T ijtβi < 0) + I(X T ijtβi = 0) ] ] P (Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1 [ eyijtX T ijtβi 1 + eX T ijtβi I(XTijtβi > 0) + eyijtX T ijtβi 1 + eX T ijtβi I(XTijtβi < 0) + eyijtX T ijtβi 1 + eX T ijtβi I(XTijtβi = 0) ] . This can be rewritten as: P (Y |β) = I∏ i=1 J∏ j=1 Ni∏ t=1   ∞∑ kijt=0 (−1)kijte(yijt−1−kijt) ∑P p=1 xijt,pβi,pI( P∑ p=1 xijt,pβi,p > 0) + ∞∑ kijt=0 (−1)kijte(yijt+kijt) ∑P p=1 xijt,pβi,pI( P∑ p=1 xijt,pβi,p < 0) + eyijt ∑P p=1 xijt,pβi,p 1 + e ∑P p=1 xijt,pβi,p I( P∑ p=1 xijt,pβi,p = 0) ] . (2.10) As a result, the marginalized likelihood is: P (Y |Ω) = ∫ β P (Y |β)N(β|Ω)dβ = ∫ β I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi 1 + eX T ijtβi N(β|Ω)dβ = ∫ β I∏ i=1 J∏ j=1 Ni∏ t=1 eyijtX T ijtβi 1 + eX T ijtβi dP βi,p P (Y |Ω) = ∫ β P (Y |β)dP βi,p 14 where Pβi,p is the measure induced by βi,p on measurable space (Si,p, Fi,p). When Miller et al (2006) looked at this problem, the authors attempted to integrate each βi,p individually for every potential value of i and p [108]. For a two- sided heterogeneity distribution, such as a normal heterogeneity distribution, this marginalization would involve: P (Y |Ω) = ∫ β P (Y |β)N(β|Ω)dβ = ∫ β I∏ i=1 J∏ j=1 Ni∏ t=1   ∞∑ kijt=0 (−1)kijte(yijt−1−kijt) ∑P p=1 xijt,pβi,pI( P∑ p=1 xijt,pβi,p > 0) + ∞∑ kijt=0 (−1)kijte(yijt+kijt) ∑P p=1 xijt,pβi,pI( P∑ p=1 xijt,pβi,p < 0) + eyijt ∑P p=1 xijt,pβi,p 1 + e ∑P p=1 xijt,pβi,p I( P∑ p=1 xijt,pβi,p = 0) ] · P∏ p=1 1 √ 2piσp · e −(βi,p−µp) 2 2σ2p dβi,p.(2.11) As the range of βi,p is the entire real line, the limits of the integration space differ for the first and second integrals depending on whether ∑P p=1 xijt,pβi,p < 0 or ∑P p=1 xijt,pβi,p > 0. Miller et al (2006) noted that integrating over both spaces would result in “numerous, complicated subdivisions of the integration space.” These subdivisions, they argued, rendered the integration “untenable” and pre- cluded the derivation of “tractable closed-form expansions.” As a result, the au- thors restricted their model to adhere to only one of the above cases, particularly XTijtβi < 0 1 and required that the density N(β|Ω) being integrated over be a one- sided probability distribution. Making the assumption that N(β|Ω) was composed of independent gamma distributions g(βi,p|bp, np) with parameters bp and np (i.e. N(β|Ω) = ∏P p=1 g(βi,p|bp, np)), they derived the marginalized likelihood as follows: 1 Miller et al (2006) actually parametrized their logit likelihood in a slightly different functional form as P (Y |β) = ∏I i=1 ∏J j=1 ∏Ni t=1 e −yijtX T ijtβi 1+e −XTijtβi and hence equivalently assumed that XTijtβi > 0 and βi,p ≥ 0 ∀ i, p. 15 P (Y |Ω) = ∫ β P (Y |β)N(β|Ω)dβ = ∫ β I∏ i=1 J∏ j=1 Ni∏ t=1   ∞∑ kijt=0 (−1)kijte(yijt+kijt) ∑P p=1 xijt,pβi,p P∏ p=1 g(βi,p|bp, np)dβi,p   = I∏ i=1 ∞∑ ki11=0 · · · ∞∑ kiJNi=0 (−1) ∑J j=1 ∑Ni t=1 kijt P∏ p=1 ∫ ∞ βi,p=0 e− ∑J j=1 ∑Ni t=1(yijt+kijt)xijt,pβi,p · 1 bpΓ(np) ( βi,p bp )np−1 e−βi,p/bpdβi,p = I∏ i=1 ∞∑ ki11=0 · · · ∞∑ kiJNi=0 (−1) ∑J j=1 ∑Ni t=1 kijt P∏ p=1 ( 1 1 + bp ∑J j=1 ∑Ni t=1(yijt + kijt)xijt,p )np Having made assumptions that the explanatory variables Xijt,p were restricted to the set of non-negative integers, Miller et al (2006) borrowed some tools from an- alytic number theory to rewrite the above equation in terms of solutions to a system of Diophantine equations, which made estimating the model significantly more fea- sible from a computational perspective [109]. The interested reader is referred to Miller et al (2006) for a complete discussion of this methodology [108]. 2.2.3 Bayesian Inference via Series Expansions Using a Two-Sided Heterogeneity Distribution Although the Miller et al (2006) result is elegant mathematically, it is not particularly useful to implement in practice as in most applications it is generally unrealistic to a priori assume that the regression coefficients all have the same sign. However, for the case when J = 1 and Ni = 1, a simple transformation of variables leads to very clean and tractable integration, allowing us to integrate within distinct regions along the real line. Restricting J and Ni in this manner is quite reasonable for many applied statistical problems including cross-sectional data analysis with a single category (such as the child poverty application looked at later in this study), longitudinal analysis of a single individual (such as the baseball player hitting streak 16 analysis conducted in Albrght (1993)), or analysis where the heterogeneity can be assumed across all observations of the data set (such as the terrorist attack data analysis conducted in Kyung et al (2012) or the data set used in the analysis of medical outcomes in Wisner (1990)) [1, 90,108,164]. Specifically, if we make the assumption that pi = Prob(yi = 1) is the prob- ability of a particular outcome occurring (e.g. living in poverty, patient mortality, purchase incidence, etc) for the ith individual and again let p = 1, . . . , P represent a set of attributes describing the covariates, with corresponding values xi,p such that XTi = (xi,1, . . . , xi,P ) and take the product across all individuals i, the likelihood function is: P (Y |β) = I∏ i=1 eyiX T i βi 1 + eX T i βi , (2.12) where βi = (βi,1, . . . , βi,P ) and β are defined as before. Upon making these assump- tions, we can recall the marginalization presented in (2.11): P (Y |Ω) = ∫ β P (Y |β)N(β|Ω)dβ = ∫ β I∏ i=1 [ ∞∑ ki=0 (−1)kie(yi−1−ki) ∑P p=1 xi,pβi,pI( P∑ p=1 xi,pβi,p > 0) + ∞∑ ki=0 (−1)kie(yi+ki) ∑P p=1 xi,pβi,pI( P∑ p=1 xi,pβi,p < 0) + eyi ∑P p=1 xi,pβi,p 1 + e ∑P p=1 xi,pβi,p I( P∑ p=1 xi,pβi,p = 0) ] · P∏ p=1 1 √ 2piσp e −(βi,p−µp) 2 2σ2p dβi,p. (2.13) In particular, since we are assuming that the βi,p follow independent nor- mal distributions for p = 1, ..., P (i.e. βi,p ∼ N(µp, σ2p)) it follows that zi = ∑P p=1 xi,pβi,p ∼ N( ∑P p=1 xi,pµp, ∑P p=1 x 2 i,pσ 2 p). Therefore, if we define Pzi as the mea- sure induced by zi on measurable space (Ti, Gi) having density with respect to Lebesgue measure: 17 f(zi) = 1 √ 2pi ∑P p=1 x 2 i,pσ2p e −(zi− ∑P p=1 xi,pβi,p) 2 2 ∑P p=1 x 2 i,pσ 2 p , then: P (Y |Ω) = ∫ β P (Y |β)N(β|Ω)dβ = ∫ β I∏ i=1 eyiX T i βi 1 + eX T i βi N(β|Ω)dβ = ∫ β I∏ i=1 eyiX T i βi 1 + eX T i βi dP βi,p P (Y |Ω) = ∫ β P (Y |β)dP βi,p . Applying our transformation we can see that [149]: P (Y |Ω) = ∫ β P (Y |β)dP βi,p = ∫ TI ... ∫ T1 I∏ i=1 P (yi|zi)dP zi = I∏ i=1 ∫ Ti P (yi|zi)dP zi = I∏ i=1 ∫ zi P (yi|zi) 1 √ 2pi ∑P p=1 x 2 i,pσ2p e −(zi− ∑P p=1 xi,pµp) 2 2 ∑P p=1 x 2 i,pσ 2 p dzi P (Y |Ω) = I∏ i=1 Hi, (2.14) where: 18 Hi = ∫ ∞ −∞ [ ∞∑ ki=0 (−1)kie(yi−1−ki)ziI(zi > 0) + ∞∑ ki=0 (−1)kie(yi+ki)ziI(zi < 0) + eyizi 1 + ezi I(zi = 0) ] · 1 √ 2pi ∑P p=1 x 2 i,pσ2p e −(zi− ∑P p=1 xi,pµp) 2 2 ∑P p=1 x 2 i,pσ 2 p dzi. We can decompose Hi into a sum of three integrals, Hi,1, Hi,2, and Hi,3 where Hi = Hi,1 +Hi,2 +Hi,3. Before we proceed, we present a simple integration lemma for integrating an ex- ponential against a normal distribution with mean µ and variance σ2. Lemma 2.2.1 (Integrating an Exponential Against a Normal Distribution). ∫ ∞ c ekx 1 √ 2piσ e −(x−µ)2 2σ2 dx = ekµ+ k2σ2 2 Φ ( kσ2 − c+ µ σ ) (2.15) ∫ c −∞ ekx 1 √ 2piσ e −(x−µ)2 2σ2 dx = ekµ+ k2σ2 2 Φ ( − kσ2 − c+ µ σ ) (2.16) where Φ(x) is the normal cumulative distribution function. Proof: 19 ∫ ∞ c ekx 1 √ 2piσ e −(x−µ)2 2σ2 dx = ∫ ∞ c 1 √ 2piσ ekx− (x−µ)2 2σ2 dx = ∫ ∞ c 1 √ 2piσ e −1 2σ2 (x−µ)2+kxdx = ∫ ∞ c 1 √ 2piσ e −1 2σ2 [[x−(σ2k+µ)]2−(σ2k+µ)2+µ2]dx = 1 √ 2piσ e −1 2σ2 [−(σ2k+µ)2+µ2] ∫ ∞ c e −1 2σ2 [x−(σ2k+µ)]2dx = e σ2k2 2 +kµ [ 1− Φ ( c− (σ2k + µ) σ )] ∫ ∞ c ekx 1 √ 2piσ e −(x−µ)2 2σ2 dx = ekµ+ k2σ2 2 Φ ( kσ2 − c+ µ σ ) . (2.17) The computation of the second integral is quite similar. As a result of (Lemma 2.2.1), Hi,1 = ∫ ∞ 0 ∞∑ ki=0 (−1)kie(yi−1−ki)zi 1 √ 2pi ∑P p=1 x 2 i,pσ2p e −(zi− ∑P p=1 xi,pµp) 2 2 ∑P p=1 x 2 i,pσ 2 p dzi = 1 √ 2pi ∑P p=1 x 2 i,pσ2p ∞∑ ki=0 (−1)ki ∫ ∞ 0 e(yi−1−ki)zie −(zi− ∑P p=1 xi,pµp) 2 2 ∑P p=1 x 2 i,pσ 2 p dzi Hi,1 = ∞∑ ki=0 (−1)kie (yi−1−ki) 2∑P p=1 x 2 i,pσ 2 p 2 +(yi−1−ki) ∑P p=1 xi,pµp ·Φ   (yi − 1− ki) ∑P p=1 x 2 i,pσ 2 p + ∑P p=1 xi,pµp √∑P p=1 x 2 i,pσ2p   . (2.18) 20 Hi,2 = ∫ 0 −∞ ∞∑ ki=0 (−1)kie(yi+ki)zi 1 √ 2pi ∑P p=1 x 2 i,pσ2p e −(zi− ∑P p=1 xi,pµp) 2 2 ∑P p=1 x 2 i,pσ 2 p dzi = 1 √ 2pi ∑P p=1 x 2 i,pσ2p ∞∑ ki=0 (−1)ki ∫ 0 −∞ e(yi+ki)zie −(zi− ∑P p=1 xi,pµp) 2 2 ∑P p=1 x 2 i,pσ 2 p dzi Hi,2 = ∞∑ ki=0 (−1)kie ((yi+ki)) 2∑P p=1 x 2 i,pσ 2 p 2 +(yi+ki) ∑P p=1 xi,pµp ·Φ  − (yi + ki) ∑P p=1 x 2 i,pσ 2 p + ∑P p=1 xi,pµp √∑P p=1 x 2 i,pσ2p   . (2.19) Hi,3 = 0 as it is an integral against a density on a set of Lebesgue measure zero. As a result, as P (Y |Ω) = ∏I i=1Hi, we estimate our model via maximum likelihood estimation by maximizing logP (Y |Ω) which is equivalent to: logP (Y |Ω) = log I∏ i=1 Hi = I∑ i=1 log(Hi) = I∑ i=1 log(Hi,1 +Hi,2), (2.20) where Hi,1 and Hi,2 are defined as above. We state this result as a theorem as it is the main result of this chapter. Theorem 2.2.2 (Marginalized Logit Likelihood Assuming Independent Normal Prior Distributions). The log marginalized likelihood of (2.12) assuming independent normal heterogeneity distributions, based on a convergent series approximation to (2.3), is provided by: 21 logP (Y |Ω) = I∑ i=1 log(Hi,1 +Hi,2), (2.21) where: Hi,1 = ∞∑ ki=0 (−1)kie (yi−1−ki) 2∑P p=1 x 2 i,pσ 2 p 2 +(yi−1−ki) ∑P p=1 xi,pµp ·Φ   (yi − 1− ki) ∑P p=1 x 2 i,pσ 2 p + ∑P p=1 xi,pµp √∑P p=1 x 2 i,pσ2p   (2.22) Hi,2 = ∞∑ ki=0 (−1)kie ((yi+ki)) 2∑P p=1 x 2 i,pσ 2 p 2 +(yi+ki) ∑P p=1 xi,pµp ·Φ  − (yi + ki) ∑P p=1 x 2 i,pσ 2 p + ∑P p=1 xi,pµp √∑P p=1 x 2 i,pσ2p   (2.23) Theorem 2.2.2 provides the marginalized likelihood, and we can estimate our parameters µp and σ2p for p = 1, ..., P by maximizing the above equation and com- pute associated p-values to determine the statistical significance of the resulting estimates. This marginalization reduces the parameter space from one of IP dimen- sions to 2P dimensions, making model estimation on large data sets considerably easier. 22 2.3 Simulations 2.3.1 Numerical Simulations We illustrate the efficacy of our method by performing a series of numerical simulations. In particular, we conducted a series of simulations for p = 2, 3, and 4 attributes, allowing I = 1000, JNi = 1, and 300 terms in the series expansion (i.e. 1000 households, 1 category, and 1 occasion). These simulations are a subset of a larger number of simulations conducted that is available upon request. For each vector of parameters (µ1,...,µP ,...,σ21,...,σ 2 P ), we performed 25 simulations. For each such simulate, we simulated I = 1000 values of (xi,1, ..., xi,P ), rescaled them by dividing by a constant to allow for enough 0/1 variation of yi, numerically ap- proximated P (Y |Ω) via (2.20), and maximized the resulting marginal likelihood as a function of Ω. These simulations were run using MATLAB on an AMD 2.2 GHz Triple Core Processor with 8 GB of RAM. Our results are summarized in Tables 2.1-2.9, con- sisting of the true values of (µ1,...,µP ,...,σ21,...,σ 2 P ), the mean and standard deviation of each of these values, and the corresponding t-statistics. The simulations corre- sponding to p = 4 attributes is split up into several tables (Tables 2.5-2.9) due to space constraints. In order to determine whether the values were in numerical cor- respondence with the true values, we conducted t-tests for each of the parameters. This resulted in comparing the computed t-statistics to a t-distribution with 24 de- grees of freedom, as a result of performing 25 simulations for each set of parameter estimates. After instituting conservative but commonly accepted Bonferroni correc- tions, we used critical values of 3.791 for p = 2 attributes, 4.051 for p = 3 attributes, and 4.244 for p = 4 attributes to compare the calculated t-statistics against. All of our values are well below these critical values. These significance tests therefore do not suggest any meaningful disparity between the estimated parameter values 23 and the true values. As these runs were based on simulated data from a plethora of normal distributions, these simulations demonstrate the strong accuracy of our poly- nomial approximations as well as the efficacy of our new technique. Additionally, the final few terms of the tails of the truncated series approximations were essen- tially zero (according to MATLAB output) for a variety of choices of truncation levels. This fact suggested that, at least for these simulations, the truncated series expansions based on 300 terms were reasonable approximations to the marginalized likelihood. Tab. 2.1: Numerical Simulations for P = 2 attributes (µ1, µ2) (σ21, σ 2 2) (µ1, µ2) (σ 2 1, σ 2 2) (σµ1 , σµ2) (σσ21 , σσ22) (-8, -3) (3, 8) (-7.970, -2.972) (2.556, 7.658) (1.581, 1.417) (1.195, 1.387) (-4, -14) (5, 6) (-3.728, -14.175) (4.772, 6.511) (1.188, 1.236) (1.273, 1.037) (-7, -5) (5, 3) (-7.300, -5.078) (4.691, 2.710) (1.434, 1.426) (1.582, 1.366) (5, -4) (4, 5) (4.865, -3.842) (4.265, 5.239) (1.001, 1.210) (1.432, 1.150) (-5, 4) (4, 5) (-4.801, 4.123) (4.385, 5.072) (1.408, 1.318) (1.252, 1.252) (5, 4) (4, 5) (5.062, 3.893) (4.429, 5.312) (1.377, 1.279) (1.441, 1.086) (8, -3) (3, 8) (8.069, -3.232) (2.771, 8.290) (1.742, 1.816) (0.969, 0.636) (-8, 3) (3, 8) (-7.601, 2.904) (2.942, 8.389) (1.587, 1.414) (0.745, 0.885) (8, 3) (3, 8) (8.182, 2.970) (2.173, 8.484) (1.329, 1.362) (1.387, 1.873) (4, -14) (5, 6) (3.779, -14.461) (5.259, 5.584) (1.077, 1.353) (1.242, 1.303) (-4, 14) (5, 6) (-3.945, 13.877) (5.167, 5.557) (0.945, 1.135) (1.088, 1.012) (7, -5) (5, 3) (7.061, -4.359) (5.005, 3.416) (1.253, 1.111) (1.686, 1.374) (-7, 5) (5, 3) (-7.666, 5.115) (4.806, 2.929) (1.036, 1.319) (1.344, 1.542) (7, 5) (5, 3) (7.327, 4.565) (4.786, 2.678) (1.410, 1.240) (0.858, 1.323) 2.3.2 Comparisons to MCMC Methods We also assessed the efficacy of our series expansion approach by comparing the speed of the approach to that of existing MCMC methods [50]. For our MCMC baseline comparison, we used the bayesm package in R (Rossi and McCullouch 2006) and to ensure a strict “apples-to-apples” comparison of computing times, we also translated our series expansion approach from MATLAB to R. We ran the 24 Tab. 2.2: t-statistics for P = 2 attributes (µ1, µ2) (σ21, σ 2 2) t-stat (µ1) t-stat (µ2) t-stat (σ 2 1) t-stat (σ 2 2) (-8, -3) (3, 8) 0.095 0.098 1.859 1.232 (-4, -14) (5, 6) 1.147 -0.708 0.897 -2.461 (-7, -5) (5, 3) -1.047 -0.272 0.976 1.063 (5, -4) (4, 5) -0.672 0.652 -0.924 -1.040 (-5, 4) (4, 5) 0.706 0.466 -1.538 -0.289 (5, 4) (4, 5) 0.224 -0.418 -1.490 -1.437 (8, -3) (3, 8) 0.197 -0.640 1.180 -2.275 (-8, 3) (3, 8) 1.257 -0.339 0.392 -2.201 (8, 3) (3, 8) -0.683 0.111 2.981 -1.291 (4, -14) (5, 6) -1.025 -1.704 -1.042 1.597 (-4, 14) (5, 6) 0.292 -0.542 -0.765 2.187 (7, -5) (5, 3) 0.242 2.887 -0.013 -1.514 (-7, 5) (5, 3) -3.212 0.435 0.723 0.230 (7, 5) (5, 3) 1.159 -1.756 1.246 1.216 Tab. 2.3: Numerical Simulations for P = 3 attributes (µ1, µ2, µ3) (σ 2 1 , σ 2 2 , σ 2 3) (µ1, µ2, µ3) (σ 2 1 , σ 2 2 , σ 2 3) (σµ1 , σµ2 , σµ3 ) (σσ1 , σσ2 , σσ3 ) (-5, -6, -7) (3, 4, 3) (-5.010, -6.298, -6.883) (3.268, 3.973, 3.232) (1.591, 1.453, 1.704) (1.037, 1.091, 0.941) (5, -6, -7) (3, 4, 3) (5.030, -5.690, -7.658) (3.028, 4.098, 2.679) (1.788, 1.897, 1.382) (1.397, 1.128, 1.357) (-5, 6, -7) (3, 4, 3) (-5.283, 6.043, -6.691) (2.685, 3.947, 3.189) (1.159, 1.369, 1.617) (1.288, 0.748, 0.923) (-5, -6, 7) (3, 4, 3) (-4.803, -5.962, 7.247) (2.930, 4.096, 2.885) (1.242, 1.541, 1.396) (1.250, 0.882, 1.139) (5, -6, 7) (3, 4, 3) (4.745, -5.889, 7.093) (3.130, 3.925, 3.263) (1.295, 1.600, 1.501) (1.054, 1.038, 0.766) (5, 6, 7) (3, 4, 4) (4.964, 6.290, 6.987) (2.972,4.339, 4.123) (1.542, 1.423, 1.593) (0.970, 1.617, 1.397) (-15, -4, -6) (2, 7, 4) (-15.347, -3.751, -6.367) (2.116, 7.619, 3.472) (2.045, 1.894, 2.162) (1.009, 2.977, 2.961) (15, -4, -6) (2, 7, 4) (14.352, -3.913, -5.930) (2.005, 7.058, 4.202) (1.708, 1.568, 1.579) (0.825, 2.255, 1.830) (-9, -8, 4) (3, 3, 5) (-9.203, -7.712, 3.560) (3.024, 3.637, 4.689) (1.567, 1.737, 1.919) (1.011, 1.888, 1.167) (-15, -4, 6) (2, 7, 4) (-14.975, -3.882, 5.610) (1.888, 6.995, 4.364) (1.890, 1.528, 1.790) (0.981, 2.478, 2.290) (15, -4, 6) (2, 7, 4) (14.601, -3.780, 6.152) (1.761, 6.503, 4.535) (2.174, 1.919, 1.853) (0.940, 2.714, 2.688) (15, 4, 6) (2, 7, 4) (14.818, 4.131, 5.782) (2.169, 6.344, 4.371) (1.891, 2.127, 1.729) (0.986, 2.706, 2.787) (-9, -8, -4) (3, 3, 5) (-9.039, -8.245, -3.967) (2.721, 2.975, 5.219) (1.641, 1.504, 1.533) (1.989, 2.321, 2.394) (-9, 8, -4) (3, 3, 5) (-9.112, 8.124, -4.107) (3.191, 3.364, 4.949) (1.205, 1.686, 1.434) (1.022, 0.762, 0.598) (-9, -8, 4) (3, 3, 5) (-9.203, -7.712, 3.560) (3.024, 3.637, 4.689) (1.567, 1.737, 1.919) (1.011, 1.888, 1.167) (9, -8, 4) (3, 3, 5) (9.699, -7.759, 3.298) (3.104, 2.810, 4.954) (1.591, 1.361, 1.421) (0.557, 0.782, 0.557) (-9, 8, 4) (3, 3, 5) (-9.176, 7.820, 4.318) (2.949, 3.260, 5.068) (1.440, 1.403, 1.431) (1.157, 1.054, 0.993) (9, 8, 4) (3, 3, 5) (9.394, 8.074, 3.907) (2.709, 2.310, 4.487) (1.693, 1.531, 1.629) (2.403, 2.344, 2.304) Tab. 2.4: t-statistics for P = 3 attributes (µ1, µ2, µ3) (σ 2 1 , σ 2 2 , σ 2 3) t-stat (µ1) t-stat (µ2) t-stat (µ3) t-stat (σ 2 1) t-stat (σ 2 2) t-stat (σ 2 3) (-5, -6, -7) (3, 4, 3) -0.030 -1.024 0.342 -1.293 0.125 -1.231 (5, -6, -7) (3, 4, 3) 0.083 0.818 -2.382 -0.101 -0.433 1.185 (-5, 6, -7) (3, 4, 3) -1.221 0.156 0.955 1.222 0.352 -1.023 (-5, -6, 7) (3, 4, 3) 0.791 0.123 0.883 0.279 -0.544 0.507 (5, -6, 7) (3, 4, 3) -0.984 0.347 0.310 -0.614 0.364 -1.717 (5, 6, 7) (3, 4, 4) 0.118 -1.016 0.041 0.147 -1.048 -0.442 (-15, -4, -6) (2, 7, 4) -0.848 0.657 -0.849 -0.572 -1.040 0.892 (15, -4, -6) (2, 7, 4) -1.897 0.279 0.223 -0.031 -0.129 -0.552 (-9, -8, 4) (3, 3, 5) 0.648 -0.829 1.146 0.121 1.686 -1.333 (-15, -4, 6) (2, 7, 4) 0.066 0.385 -1.089 0.570 0.010 -0.794 (15, -4, 6) (2, 7, 4) -0.917 0.573 0.409 1.273 0.916 -0.996 (15, 4, 6) (2, 7, 4) -0.481 0.307 -0.630 -0.856 1.213 -0.666 (-9, -8, -4) (3, 3, 5) 0.118 0.814 -0.107 -0.702 -0.054 0.456 (-9, 8, -4) (3, 3, 5) 0.465 -0.369 0.372 0.932 2.392 -0.423 (-9, -8, 4) (3, 3, 5) 0.648 -0.829 1.146 0.121 1.686 -1.333 (9, -8, 4) (3, 3, 5) -2.197 -0.887 2.472 0.936 -1.215 -0.411 (-9, 8, 4) (3, 3, 5) 0.611 0.640 -1.111 -0.222 1.235 0.342 (9, 8, 4) (3, 3, 5) -1.163 -0.242 0.285 -0.605 -1.472 -1.113 25 Tab. 2.5: Numerical Simulations for P = 4 attributes (µ1, µ2, µ3, µ4) (σ21, σ 2 2, σ 2 3, σ 2 4) (-5, -14, -3, -2) (3,6,4,6) (-10,-12,-5,-5) (3,3,3,4) (-7,-5,-5,-3) (4,6,5,4) (5,-14,-3,-5) (3,6,4,6) (-5,14,-3,-5) (3,6,4,6) (-5,-14,-3,5) (3,6,4,6) (5,14,-3,5) (3,6,4,6) (5,-14,3,-5) (3,6,4,6) (5,14,3,5) (3,6,4,6) (10,-12,-5,-5) (3,4,3,4) (-10,12,-5,-5) (3,4,3,4) (-10,-12,5,-5) (3,4,3,4) (-10,-12,-5,5) (3,4,3,4) (10,-12,5,-5) (3,4,3,4) (-10,12,-5,5) (3,4,3,4) (10,12,5,5) (3,4,3,4) (7,-5,-5,-3) (4,6,5,4) (-7,5,-5,-3) (4,6,5,4) (-7,-5,5,-3) (4,6,5,4) (7,-5,5,-3) (4,6,5,4) (-7,5,-5,3) (4,6,5,4) (7,5,5,3) (4,6,5,4) 26 Tab. 2.6: Numerical Simulations for P = 4 attributes (continued) (µ1, µ2, µ3, µ4) (σ21, σ 2 2, σ 2 3, σ 2 4) (-4.957, -13.727, -3.254, -2.323) (2.368, 6.014, 3.736, 5.074) (-9.651, -11.951, -5.249, -4.794) (2.753, 3.035, 2.806, 3.984) (-7.269, -4.924, -5.028, -3.163) (3.523, 5.720, 4.367, 3.839) (5.088, -14.096, -3.556, -4.840) (2.437, 5.938, 4.067, 6.112) (-5.618, 14.514, -2.732, -4.960) (2.832, 6.048, 3.760, 5.845) (-4.707, -14.606, -2.770, 5.084) (2.694, 5.589, 4.116, 6.014) (-5.120, 14.078, -3.168, 5.304) (3.410, 5.987, 4.348, 5.793) (4.987, -14.345, 3.140, -4.963) (2.412, 5.604, 3.876, 5.818) (5.062, 14.542, 2.360, 5.343) (2.597, 5.537, 3.947, 5.476) (9.624, -11.746, -5.127, -5.109) (2.659, 4.061, 3.088, 3.973) (-10.161, 12.526, -4.934, -5.120) (3.124, 4.026, 3.069, 4.073) (-10.167, -12.214, 4.796, -4.899) (2.685, 3.774, 3.066, 3.636) (-9.792, -11.717, -5.133, 4.834) (3.243, 4.206, 3.034, 3.243) (10.076, -11.867, 5.134, -4.972) (2.969, 4.024, 2.934, 4.088) (-9.833, 12.462, -5.085, 4.123) (2.964, 3.948, 3.008, 3.934) (10.432, 11.902, 5.244, 4.756) (3.054, 4.280, 2.398, 3.640) (7.045, -5.497, -4.710, -3.204) (4.347, 6.055, 4.975, 3.883) (-7.097, 5.174, -5.122, -2.976) (3.875, 5.943, 4.816, 3.799) (-6.715, -5.053, 5.164, -3.247) (4.037, 6.064, 4.898, 4.066) (7.361, -4.920, 4.834, -2.846) (3.907, 5.966, 5.073, 3.997) (-7.234, 4.965, -5.234, 3.456) (4.026, 5.923, 5.091, 4.172) (6.766, 5.419, 5.249, 3.109) (3.757, 5.210, 4.864, 4.174) 27 Tab. 2.7: Numerical Simulations for P = 4 attributes (continued) (σµ1 , σµ2 , σµ3 , σµ4) (σσ21 , σσ22 , σσ23 , σσ24) (1.956, 1.762, 1.672, 1.712) (1.231, 1.878, 1.525, 1.599) (1.938, 1.898, 1.712, 1.827) (2.301, 2.506, 2.298, 2.501) (1.652, 1.479, 1.743, 1.574) (1.845, 2.094, 2.099, 2.007) (1.595, 1.482, 1.471, 1.601) (1.593, 0.967, 0.737, 1.153) (1.277, 1.576, 1.337, 1.517) (1.365, 0.934, 0.749, 0.565) (1.382, 1.612, 1.570, 1.670) (0.914, 1.146, 0.939, 0.492) (1.707, 1.542, 1.451, 1.287) (1.502, 0.906, 1.061, 0.875) (1.440, 1.811, 1.365, 1.349) (1.132, 0.789, 0.786, 0.724) (1.380, 1.726, 1.839, 1.850) (1.822, 2.091, 2.155, 2.075) (1.372, 1.680, 1.645, 1.379) (1.313, 0.848, 1.171, 0.761) (1.588, 1.542, 1.434, 1.666) (0.740, 0.549, 0.887, 0.660) (1.805, 1.776, 1.723, 1.866) (1.928, 2.108, 2.050, 2.163) (1.798, 1.769, 1.545, 1.692) (1.936, 1.835, 1.826, 1.678) (1.656, 1.866, 1.496, 1.623) (0.502, 0.479, 0.235, 0.294) (1.564, 1.680, 1.755, 1.518) (0.780, 0.575, 0.763, 0.775) (1.753, 1.815, 1.588, 1.806) (2.352, 2.435, 2.215, 2.432) (1.138, 1.449, 1.528, 1.654) (1.340, 0.880, 1.124, 1.040) (1.519, 1.793, 1.945, 1.562) (1.009, 0.880, 0.857, 0.854) (1.681, 1.850, 1.458, 1.728) (0.469, 0.898, 0.277, 0.689) (1.294, 1.450, 1.624, 1.433) (0.657, 0.267, 0.215, 0.528) (1.376, 1.826, 1.736, 1.493) (1.185, 0.630, 0.966, 0.527) (1.590, 1.671, 1.557, 1.679) (1.903, 2.122, 2.165, 2.243) 28 Tab. 2.8: Numerical Simulations for P = 4 attributes (continued) t-stat (µ1) t-stat (µ2) t-stat (µ3) t-stat (µ4) 0.111 0.774 -0.758 -0.942 0.899 0.129 -0.727 0.563 -0.813 0.257 -0.079 -0.519 0.276 -0.325 -1.891 0.500 -2.419 1.631 1.003 0.132 1.060 -1.881 0.731 0.250 -0.351 0.254 -0.580 1.180 -0.045 -0.951 0.513 0.138 0.225 1.570 -1.740 0.928 -1.370 0.756 -0.387 -0.396 -0.508 1.705 0.231 -0.360 -0.463 -0.603 -0.591 0.272 0.580 0.799 -0.431 -0.491 0.230 0.356 0.449 0.085 0.534 1.375 -0.241 -2.889 1.233 -0.269 0.768 -0.675 0.199 -1.713 0.949 -0.616 -0.320 0.486 -0.313 0.078 0.848 -0.144 0.562 -0.715 1.394 0.274 -0.512 0.538 -0.849 -0.097 -0.674 1.526 -0.737 1.255 0.801 0.324 29 Tab. 2.9: Numerical Simulations for P = 4 attributes (continued) t-stat (σ21) t-stat (σ 2 2) t-stat (σ 2 3) t-stat (σ 2 4) 2.568 -0.038 0.866 2.897 0.537 -0.070 0.421 0.032 1.294 0.668 1.508 0.400 1.768 0.324 -0.452 -0.485 0.615 -0.255 1.606 1.372 1.675 1.792 -0.616 -0.147 -1.366 0.070 -1.639 1.183 2.597 2.513 0.790 1.255 1.107 1.106 0.124 1.262 1.300 -0.359 -0.376 0.175 -0.841 -0.236 -0.388 -0.555 0.817 0.537 -0.160 0.841 -0.627 -0.560 -0.094 2.255 0.306 -0.248 1.405 -1.490 0.229 0.455 -0.054 0.425 -0.115 -0.576 1.360 0.741 -1.294 -0.310 0.112 0.563 0.622 0.326 1.072 1.177 -0.392 -0.354 1.841 -0.481 0.708 0.628 -1.689 0.028 -0.108 0.608 -0.471 -1.629 0.638 1.860 0.314 -0.388 30 MCMC sampler for 20,000 iterations, after allowing for 5,000 burn-in iterations. In particular, we simulated data and estimated our model on this simulated data for I = 1000 individuals, letting p = 2, 3, or 4 attributes, and JNi = 1. We truncated our series expansions after 300 terms, which was shown to be quite sufficient in the previous section. The computing times in minutes for each method is presented in Table 2.10. Tab. 2.10: Comparisons of Computing Time for closed-form Series Expansion Approach versus MCMC Methods (in minutes) Series Expansions MCMC Methods p = 2 attributes 2.20 69.20 p = 3 attributes 11.29 69.60 p = 4 attributes 10.98 69.30 As Table 2.10 illustrates, our new technique clearly outperforms existing MCMC methods. Autocorrelations of draws from the posterior suggested that the number of iterations performed (20,000 along with 5,000 for burn-in) was necessary in order to begin having a sense of the entire posterior distribution, although running the sam- pler for more iterations (and hence for a longer amount of time) would be advisable in practice. Thus, it is quite clear that our non-iterative series expansion approach outperforms existing MCMC methods in computing time. These computational gains can be quite substantial for large data sets. 31 2.4 A Few Generalizations 2.4.1 Incorporating covariances: Positing Heterogeneity from a Multivariate Normal Distribution Previously, we had assumed that our parameters βi,1, ..., βi,p were drawn from independent normal distributions. It is possible, however, to generalize this assump- tion and allow for correlations amongst the different response coefficients. In some applications, allowing for correlations is an important assumption to make; for ex- ample, in modeling consumer choice of products, a consumer’s price sensitivity may be related to sensitivities to other marketing-related covariates, such as the presence of a promotion or even the time of year. Still retaining our initial assumption that all households are independent, we can allow βi = (βi,1, ..., βi,p) to be drawn from a multivariate normal distribution with mean µ and variance-covariance matrix Σ: P (Y |β) = I∏ i=1 eyiX T i βi 1 + eX T i βi , where (βi,1, ...βi,p) ∼MVN(~µ,Σ). (2.24) The diagonal elements of Σ represent variances σ21, ...σ 2 p and the off-diagonal elements ρm,n for m 6= n allow us to model correlations amongst the various coeffi- cients. The result is a more general integral to evaluate. As we did earlier, we can define zi = ∑P p=1 xi,pβi,p. Now that we have covariances in our model, however, it follows that zi ∼ N( ∑P p=1 xi,pµp, ∑P p=1 x 2 i,pσ 2 p + ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn). Therefore, if we define Pzi as the measure induced by zi on measurable space (Ti, Gi) having density with respect to Lebesgue measure: 32 f(zi) = 1 √ 2pi ∑P p=1 x 2 i,pσ 2 p + ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn e −(zi− ∑P p=1 xi,pβi,p) 2 2 ∑P p=1 x 2 i,pσ 2 p+ ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn , then we can again integrate Hi as defined in (2.20): Hi = ∫ ∞ −∞ [ ∞∑ ki=0 (−1)kie(yi−1−ki)ziI(zi > 0) + ∞∑ ki=0 (−1)kie(yi+ki)ziI(zi < 0) + eyizi 1 + ezi I(zi = 0) ] f(zi)dzi. We can again decompose Hi into a sum of three integrals, Hi,1, Hi,2, and Hi,3 where Hi = Hi,1 +Hi,2 +Hi,3. And once again, as a result of Lemma 2.2.1, Hi,1 = ∫ ∞ 0 ∞∑ ki=0 (−1)kie(yi−1−ki)zi e −(zi− ∑P p=1 xi,pβi,p) 2 2 ∑P p=1 x 2 i,pσ 2 p+ ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn √ 2pi ∑P p=1 x 2 i,pσ2p + ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn dzi = 1 √ 2pi ∑P p=1 x 2 i,pσ2p + ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn ∞∑ ki=0 (−1)ki ∫ ∞ 0 e(yi−1−ki)zi ·e −(zi− ∑P p=1 xi,pβi,p) 2 2 ∑P p=1 x 2 i,pσ 2 p+ ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn dzi Hi,1 = ∞∑ ki=0 (−1)kie (yi−1−ki) 2( ∑P p=1 x 2 i,pσ 2 p+ ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn) 2 +(yi−1−ki) ∑P p=1 xi,pµp ·Φ   (yi − 1− ki)( ∑P p=1 x 2 i,pσ 2 p + ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn) + ∑P p=1 xi,pµp √∑P p=1 x 2 i,pσ2p + ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn   . (2.25) 33 Hi,2 = ∫ 0 −∞ ∞∑ ki=0 (−1)kie(yi+ki)zi e −(zi− ∑P p=1 xi,pβi,p) 2 2 ∑P p=1 x 2 i,pσ 2 p+ ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn √ 2pi ∑P p=1 x 2 i,pσ2p + ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn dzi = 1 √ 2pi ∑P p=1 x 2 i,pσ2p + ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn ∞∑ ki=0 (−1)ki ∫ 0 −∞ e(yi+ki)zi ·e −(zi− ∑P p=1 xi,pβi,p) 2 2 ∑P p=1 x 2 i,pσ 2 p+ ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn Hi,2 = ∞∑ ki=0 (−1)kie (yi+ki) 2( ∑P p=1 x 2 i,pσ 2 p+ ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn) 2 +(yi+ki) ∑P p=1 xi,pµp ·Φ  − (yi + ki)( ∑P p=1 x 2 i,pσ 2 p + ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn) + ∑P p=1 xi,pµp √∑P p=1 x 2 i,pσ2p + ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn   . (2.26) Hi,3 = 0 as it is once again an integral against a density on a set of Lebesgue measure zero. As a result, as P (Y |Ω) = ∏I i=1Hi, we can again estimate our model, now incorporating correlations amongst the coefficients, via the method of marginal maximum likelihood by maximizing logP (Y |Ω) which is equivalent to: logP (Y |Ω) = log I∏ i=1 Hi = I∑ i=1 J∑ j=1 Ni∑ t=1 log(Hi) = I∑ i=1 J∑ j=1 Ni∑ t=1 log(Hi,1 +Hi,2). (2.27) The parameter space is now considerably larger given the need to estimate the off- diagonal coefficients ρm,n but can still be optimized over using standard optimization routines. 34 2.4.2 Allowing dependence on other factors Some researchers may want to allow the mean of βi to depend on certain co- variates (Zi,1, . . . , Zi,k) that may represent K demographic factors for each particular individual [141]. Under such a model, we would have: P (Y |β) = I∏ i=1 eyiX T i βi 1 + eX T i βi , where (βi,1, ...βi,p) ∼MVN(~∆,Σ), (2.28) where ~∆ = ( ∑K k=1 Zi,kµ1,k, ..., ∑K k=1 Zi,kµP,k) and once again, the diagonal elements of Σ represent variances σ21, ...σ 2 p and the off-diagonal elements ρm,n for m 6= n allow us to model correlations amongst the various coefficients. The density for zi = ∑P p=1 xi,pβi,p with which we integrate our series expansions with respect to is now a bit different: zi ∼ N( P∑ p=1 xi,p K∑ k=1 Zi,kµp,k, P∑ p=1 x2i,pσ 2 p + ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn). (2.29) Therefore, we can now redefine Pzi as the measure induced by zi on measurable space (Ti, Gi) having density with respect to Lebesgue measure, f(zi) = 1 √ 2pi ∑P p=1 x 2 i,pσ 2 p + ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn e −(zi− ∑P p=1 xi,p ∑K k=1 Zi,kµp,k) 2 2 ∑P p=1 x 2 i,pσ 2 p+ ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn then again: 35 Hi = ∫ ∞ −∞ [ ∞∑ ki=0 (−1)kie(yi−1−ki)ziI(zi > 0) + ∞∑ ki=0 (−1)kie(yi+ki)ziI(zi < 0) + eyizi 1 + ezi I(zi = 0) ] f(zi)dzi. Once again decomposing Hi into a sum of three integrals, Hi,1, Hi,2, and Hi,3 where Hi = Hi,1 +Hi,2 +Hi,3 and utilizing Lemma 2.2.1, Hi,1 = ∫ ∞ 0 ∞∑ ki=0 (−1)kie(yi−1−ki)zi e −(zi− ∑P p=1 xi,p ∑K k=1 Zi,kµp,k) 2 2 ∑P p=1 x 2 i,pσ 2 p+ ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn √ 2pi ∑P p=1 x 2 i,pσ 2 p + ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn dzi = 1 √ 2pi ∑P p=1 x 2 i,pσ 2 p + ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn ∞∑ ki=0 (−1)ki ∫ ∞ 0 e(yi−1−ki)zi ·e −(zi− ∑P p=1 xi,p ∑K k=1 Zi,kµp,k) 2 2 ∑P p=1 x 2 i,pσ 2 p+ ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn dzi Hi,1 = ∞∑ ki=0 (−1)kie (yi−1−ki) 2( ∑P p=1 x 2 i,pσ 2 p+ ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn) 2 +(yi−1−ki) ∑P p=1 xi,p ∑K k=1 Zi,kµp,k ·Φ   (yi − 1− ki)( ∑P p=1 x 2 i,pσ 2 p + ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn) + ∑P p=1 xi,p ∑K k=1 Zi,kµp,k √∑P p=1 x 2 i,pσ 2 p + ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn   . (2.30) 36 Hi,2 = ∫ 0 −∞ ∞∑ ki=0 (−1)kie(yi+ki)zi e −(zi− ∑P p=1 xi,p ∑K k=1 Zi,kµp,k) 2 2 ∑P p=1 x 2 i,pσ 2 p+ ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn √ 2pi ∑P p=1 x 2 i,pσ 2 p + ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn dzi = 1 √ 2pi ∑P p=1 x 2 i,pσ 2 p + ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn ∞∑ ki=0 (−1)ki ∫ 0 −∞ e(yi+ki)zi ·e −(zi− ∑P p=1 xi,p ∑K k=1 Zi,kµp,k) 2 2 ∑P p=1 x 2 i,pσ 2 p+ ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn Hi,2 = ∞∑ ki=0 (−1)kie (yi+ki) 2( ∑P p=1 x 2 i,pσ 2 p+ ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn) 2 +(yi+ki) ∑P p=1 xi,p ∑K k=1 Zi,kµp,k ·Φ  − (yi + ki)( ∑P p=1 x 2 i,pσ 2 p + ∑ m 6=n,m,n≤P xi,mxi,nρm,nσmσn) + ∑P p=1 xi,p ∑K k=1 Zi,kµp,k √∑P p=1 x 2 i,pσ 2 p + ∑ m6=n,m,n≤P xi,mxi,nρm,nσmσn   . (2.31) As before, Hi,3 = 0. We utilize these equations to maximize (2.20), which again now has a slightly larger parameter space as a result of βi,p’s dependence on Zi,k. 2.4.3 Arbitrary Priors There is no reason to solely restrict ourselves to normal prior distributions. If we look at the pertinent integration, we can nicely generalize our result provided we have an analytic expression for the probability density function representing zi = ∑P p=1 xi,pβi,p: Hi = ∫ ∞ −∞ [ ∞∑ ki=0 (−1)kie(yi−1−ki)ziI(zi > 0) + ∞∑ ki=0 (−1)kie(yi+ki)ziI(zi < 0) + eyizi 1 + ezi I(zi = 0) ] f(zi)dzi. It is easy to notice that this is simply just a linear combination of incomplete mo- ment generating functions corresponding to f(zi). Thus provided that a closed-form density for f(zi) exists: 37 Hi = ∞∑ ki=0 (−1)kiIMGF1(yi − 1− ki) + ∞∑ ki=0 (−1)kiIMGF2(yi + ki), (2.32) where IMGF1 and IMGF2 are the incomplete moment generating functions corre- sponding to f(zi), integrated from 0 to ∞ and from −∞ to 0 respectively. For an arbitrary prior, we can use these incomplete moment generating functions to rewrite our marginalized likelihood and maximize the function accordingly, provided that a closed-form density corresponding to f(zi) exists. 2.5 Application to Analyzing Potential Causes of Child Poverty Understanding the potential causes of child poverty is an important issue in pub- lic policy research. With trillions of dollars being thrown at welfare programs for decades, it is paramount for policy researchers to analyze potential determinants of child poverty to evaluate the efficacy of such programs [131]. We do so in this Chapter. 2.5.1 Data We used 2009 Current Population Survey (CPS) Data, compiled by the United States Census Bureau [156]. The CPS data is used for a variety of purposes including for providing the United States Federal Government’s monthly jobs report. From the CPS Data, we extracted children under the age of 18 and information of whether they were below the poverty level (1 if they were below the poverty level and 0 otherwise), education of the head of the household (hereafter referred to as head parent), marital status of the parents, the head parent’s age, whether the head 38 parent was working full time, and how many people were living in the household. The resulting data set consisted of slightly more than 60,000 observations. In existing policy research, these covariates have been considered significant factors in determining whether a child grows up in poverty [63, 130, 134, 145]. Al- though logistic regression has been used to analyze poverty data [150], no policy research, to our knowledge, has looked at this question using models that allow for individual-level heterogeneity. Allowing for heterogeneity is important as there is no a priori reason to believe that some individuals would not respond differently to these factors than others. Furthermore, ignoring heterogeneity can result in the researcher ignoring potential variability in the model and can therefore lead to mis- leading statistical inferences. We therefore used the closed-form expansion derived in the previous sections to estimate a Bayesian binary logistic regression model with child poverty as a binary categorical dependent variable and all the other variables mentioned above as explanatory variables. In order to simplify our model estimation, we assumed no correlations amongst our coefficients. Data was rescaled and coefficients were multiplied again by a constant to ensure for sufficient terms in the series expansion to allow for reasonable approx- imations. Due to the rapid convergence of the series expansions associated with this data set indicated by looking the final few terms of the truncated series ap- proximations for a variety of choices of truncated terms, only 50 terms in our series expansion were necessary. 2.5.2 Estimation Results Our estimation results are depicted in Table 2.11. The use of our closed-form polynomial expansion approach reduced the dimensionality of our problem from approximately 350,000 parameters to simply just 12 parameters. P-values were 39 determined by performing likelihood ratio tests and utilizing Wilk’s Theorem: Tab. 2.11: Empirical Bayesian Logistic Regression Estimation Results on Child Poverty Data Parameter (µp) p-value (σ2p) p-value Intercept 0.955 < 0.001 0.000025 < 0.001 Parents Married -2.006 < 0.001 2.902 < 0.001 Head Parent College Educated -1.889 < 0.001 1.869 < 0.001 Head Parent Working Full Time -1.711 < 0.001 2.577 < 0.001 Head Parent Age -0.030 < 0.030 0.000025 < 0.001 Number of Children in the Household 0.345 < 0.001 0.380 < 0.001 Our results shed light on potential factors influencing child poverty. All of our coefficient estimates are highly significant. In particular, of the covariates looked at, marital status of the parents is an influential predictor of child poverty, as well as the educational level of the head parent, and whether the head parent is working full time. Parental age and the number of children living in the household are also significant factors influencing whether a child lives in poverty. Our variance esti- mates also indicate there is a substantial amount of heterogeneity in the population in how these factors combine to influence child poverty. The significance of our explanatory variables coincides with common sense - Two working parents have the potential to bring in more income to a household than simply just one parent. Additionally, more educated parents have more job opportunities and therefore have greater potential to comfortably support a family than less educated parents. Older parents are typically more mature and understand the challenges associated with having children and often wait until they are ready to do so. Similarly, more responsible parents will also often wait until they are financially ready to have additional children. 40 2.5.3 Policy Implications Our results are in line with research produced by both the Heritage Foundation and the Brookings Institution [130, 132, 145]. Both think tanks have argued that marriage is a powerful antidote to child poverty. Recent research, for example, has argued that marriage in the American society has declined in recent years while out of wedlock births have steadily increased [130]. In 1964, for example, 93% of children were born to married parents while in 2007 only 59% of children were born to married parents. On the other hand, in the mid 1960s, less than 10% of children were born out of wedlock, while in 2007, this number skyrocketed to 40.6%. As our results here indicate, children born out of wedlock are overwhelmingly far more likely to live in poverty than children born to married parents [130,145]. These results have a number of important policy implications. Our results, along with work from both Heritage and Brookings, suggest that increasing marriage can significantly reduce child poverty [134]. In order to do so, state and local policy- makers could consider establishing a campaign of public advertising and education on teenage abstinence as well as the consequences of child bearing outside of mar- riage. These campaigns could also work toward communicating the practical issues faced by single parents, the importance of delaying having children until one is older and more mature, as well as the importance of waiting until one finds a suitable partner before doing so [67]. These campaigns could set normative expectations for younger generations, encouraging young people to become well-educated, to delay having children until marriage, to work full time to support any children they have, and to limit their family size to what they can afford [145]. Just as policymakers have established anti-smoking, anti-drinking, and staying in school campaigns, the impact of such “pro-marriage campaigns” could be quite significant [130]. Additionally, our results also suggest that welfare programs could benefit from significant reform. Currently, many means-tested welfare programs such as food 41 stamps, public housing, and Temporary Assistance to Needy Families (TANF) are structured in a manner that disincentivizes marriage. In particular, many of these programs have penalties for marriage because welfare benefits decline as a family’s income rises. Thus, for many low-income mothers, marriage signifies a decline in gov- ernmental assistance and consequently an overall reduction in the couple’s combined income. These problems with the welfare system can be ameliorated by reforming the Earned Income Tax Credit for married couples with children to counteract the anti-marriage penalties associated with welfare programs [130]. Furthermore, policymakers could also consider strict work and/or study re- quirements for able-bodied welfare recipients. Such “workfare” requirements not only have the potential to increase personal income but also encourage only those truly in need to apply for welfare [129]. Additionally, “workfare” helps enable wel- fare recipients to acquire valuable training that could be useful in finding subsequent full time work. In the 1990s, for example, AFDC programs were fundamentally re- structured into TANF around these ideas, and the result was a rise in employment as well as a marked decline in child poverty. There are many other American wel- fare programs that could benefit from similar reforms [135]. The right reforms could transform many of these programs from the broken safety nets that they currently are into trampolines that foster growth and success. 2.6 Conclusions and Future Research With millions of Americans remaining in poverty despite the federal govern- ment’s endless spending on welfare programs, it is important that policymakers understand the fundamental causes of poverty. Our study has looked at this issue and offers a number of informative suggestions. In particular, our results suggest 42 that policies that work toward educating young people about marriage and having a family could have a significant capacity to reduce child poverty. Additionally, work requirements for able-bodied welfare recipients could also be particularly helpful. Methodologically, we now also have an approach for obtaining closed-form Bayesian inferences for the binary logit model utilizing polynomial expansions, al- lowing for the use of rich two-sided normal prior distribution. These series expan- sions can be made arbitrarily close depending on how many terms the researcher chooses to use in the truncated approximations. Our simulations demonstrate the efficacy of our technique as well as the fact that our method outperforms existing MCMC methods. The speed of our approach provides an attractive alternative to MCMC methods, particularly for large data sets, such as the child poverty data set used here. Our analysis of this child poverty data set suggests that marriage, parental age, parental education, and parental work status are significant factors influencing child poverty. Our findings are in line with extant policy research, and we suggest a number of policy implications and suggestions based on our results coupled with these studies. Although we have focused our application of this model to public policy re- search, there is no reason that this model cannot be applied to other fields where logistic regression is used including marketing, economics, biostatistics, criminology, and professional sports among others. Methodologically, there are also many poten- tial avenues of future research that this study should encourage. For example, we made the assumption in this study that JNi = 1. Future research should look into weakening this restriction. Additionally, a potential avenue of future research is to explore other methods of polynomial expansions and compare them to the approach using geometric series expansions here. Furthermore, although we primarily con- centrated on one particular model - the binary logit model, and one class of priors, the normal distribution, we hope this study spurs research on closed-form Bayesian 43 inferences for other models as well. In particular, a nice aspect about members of the exponential family is that each member has a certain conjugate prior. It could be useful from a computational perspective to use polynomial expansions to approximate posterior distributions within this family for a choice of priors previ- ously considered non-conjugate. Additionally, the binary logit model discussed here belongs to a larger class of generalized linear models (GLMs). A potential avenue of future research could be to utilize polynomial expansions to allow researchers to make closed-form Bayesian inferences based on other GLMs. Additionally, deriving a polynomial expansion approach for the multinomial logistic regression model, a workhorse model in applied economics research, would also be a worthy endeavor of future research [58,99]. It is always useful to have a variety of methods to draw inferences from sta- tistical models, especially for large data sets. We hope the polynomial expansion approach presented here adds significant value to the applied statistician’s toolbox. 44 Chapter 3: Closed-form Bayesian Inferences for Semiparametric Den- sity Ratio Modeling with Applications to Tort Reform 3.1 Introduction 3.1.1 Tort Reform Medical doctors belong to one of the most highly-respected professions in America. Yet they are at risk of facing unnecessary lawsuits everyday. From sur- geons to obstetricians to general practitioners, virtually all doctors fear the risk of lawsuit abuse. These risks have been considered by policymakers on both sides of the aisle to be an important component in reforming our nation’s health care system [73]. One aspect of the American health care system that is worthy of attention is medical malpractice reform. The potential for fraud and abuse of the medical malpractice system unnecessarily raises health care costs by impacting physician supply and forcing doctors to engage in defensive medicine [19, 85, 151]. Medical malpractice reform falls into a larger class of reforms of the civil justice system, known as tort reform. According to Black’s Law Dictionary, a tort is defined as “... a legal wrong committed upon the person or property independent of contract ...” [46]. In settling these disputes, compensation may be awarded to the victims. America has always been a highly litigious country with many frivolous law- suits and unnecessary abuses of the civil justice system. Such a climate has been shown to impose hidden costs on consumers as well as on business, including, but not limited to, the health care, automotive, agricultural, and retail sectors of the American economy [103]. Tort reform seeks to to reduce tort costs by fundamentally reforming the civil justice system to prevent abuse, thereby reducing unnecessary litigation. Recent research has illustrated that tort reform can significantly improve the business climate in America leading to more jobs, better health care, and a more prosperous economy [104]. Policymakers have sought to pursue tort reform in a number of ways. One approach has been to impose monetary caps, which limit the amount of money that a jury may award a plaintiff. These caps may apply to appeal bonds, non-economic damages, punitive damages, or monetary damage awards. Other approaches have been to direct reforms toward other aspects of the civil justice system such as class action lawsuits, imposing statutes of limitations, and requiring attorney fee limita- tions among others. In this chapter, we quantify the impact of medical malpractice reforms by esti- mating the probabilities of extreme tort losses. Computation of these probabilities, however, is not an elementary statistical problem as different states exhibit different degrees of litigiousness. Alaska and Texas have been shown, for example, to be considerably less litigious than other states such as New York and California [104]. Additionally, different localities within states may also exhibit a certain degree of heterogeneity. For example, New York City and various towns in upstate New York may differ how litigious they are. In order to capture this type of state-level heterogeneity in our model, we improve on existing semiparametric density ratio estimation methodologies. A series of numerical simulations, along with goodness of fit computations, illustrates the efficacy of our approach. 46 3.1.2 Bayesian Parametric methods With the rapid improvements in statistical computing power over the course of the last three decades, incorporating heterogeneity in parametric models has become increasingly common in the statistical literature. These days, researchers can choose from a variety of methodologies such as a parametric Bayesian approach, a non-parametric Bayesian approach, a finite mixture approach, or a combination of the finite mixture modeling coupled with Bayesian modeling [74,90,92,140]. Modeling individual-level heterogeneity is often important as there is gener- ally no a priori reason to assume that all individuals (or observations) in a data set behave in an identical manner. However, as data sets often contain limited in- formation about each individual, it is quite difficult, if not impossible, to estimate models incorporating heterogeneity from a frequentist perspective. The Bayesian approach allows the statistician to assume individual-level parameters adhere to a lower dimensional probability distribution and perform statistical inferences based on the parameters of these lower dimensional distributions either via an empirical Bayesian approach or a fully Bayesian approach [48,110]. By reducing the problem’s dimensionality, estimation of these statistical models becomes quite feasible. In this chapter, we adapt such a Bayesian approach to semiparametric method- ology used thus far only for frequentist statistical modeling. Our approach enables these models to accommodate individual-level heterogeneity with high-dimensional parameter spaces. We discuss this semiparametric methodology in the following section. 3.1.3 Density Ratio Estimation Density ratio estimation (DRE) methods were first suggested over thirty years ago. In their 1997 paper, following Prentice and Pyke (1979) and others, Qin and Zhang suggested instead of making the typically strict parametric assumptions about 47 distributions of data sets, that a researcher only make assumptions about the ratios of probability densities (known as tilts) based on subsamples within the data sets. Qin and Zhang (2005) assumed exponential tilts and recommended estimation of these models via the method of empirical likelihood [116,122,124,125]. Over the years, DRE has had many applications in statistical research. For example, Gilbert et al (1999) improved on DRE’s methodologies and applied these improvements to understanding the efficacy of HIV vaccine trials. Several years later, Kedem et al (2008) applied DRE to time series forecasting [52, 81]. In par- ticular, their study used DRE to develop distributional assumptions necessary to forecast mortality rates for various age groups within the United States. Kedem et al (2009) and Voulgaraki et al (2012) applied DRE to cancer research [80,160]. Both studies were able to utilize DRE to determine the significance of certain risk factors for cancer. In addition to these studies, there have been many other studies that have utilized the semiparametric benefits of the DRE approach [43,44,80,119,122]. Fokianos and Qin (2008) employed importance sampling in connection with DRE, which required the generation of artificial data [45]. Prior to that, however, all research using DRE was based on within-sample data. Specifically, data was always divided into smaller subsets (such as cohorts) and comparisons would be made between these subsets. Recent research proposed a innovative adaptation of density ratio estimation known as “out of sample fusion.” Based on the idea of having one primary data set as a reference, and a secondary artificial (potentially simulated) data set, this research illustrated that more accurate inferences can be made about the primary data set by applying DRE to both samples [78,82,168]. Earlier in this dissertation, we illustrated that that this methodology can be particularly useful for making Bayesian inferences when applied to posterior samples. To date, however, although a few studies have applied Bayesian methods to empirical likelihood problems, no studies have done so uniting the semiparametric 48 DRE method with Bayesian methods to model individual-level heterogeneity [91, 106,146,165]. We ameliorate this limitation in this chapter. In particular, we adapt the Bayesian approach mentioned in the previous section to the semiparametric DRE method to model individual-level heterogeneity. We apply this methodology to a data set used in a tort reform study conducted by the Pacific Research Institute to understand overall distributional properties of tort losses throughout the country [19]. 3.2 Problem Formulation Suppose that we have a data set consisting of i = 1, ..., I samples and define j = 1, ..., ni observations within each ith sample, such that we have the following P - dimensional vectors xi,j = (xi,j,1, xi,j,2, ..., xi,j,P ) with ∑I i=1 ni = N . Let M = I + 1 and define probability density functions gi such that: xi,j ∼ gi. (3.1) Additionally, we can also define gI+1 ≡ g as our reference probability density, de- scribing another sample of size nI+1. Assume the densities gi satisfy the following relationship regarding their ratios: gi(xi,j) g(xi,j) = w(θi,xi,j), (3.2) where θi is a vector-valued statistical parameter to be estimated. Without loss of generality, we assume exponential tilts, defining w(θi,xi,j) ≡ eαi+β ′ ih(xi,j), where we are currently assuming h : RP → RP . We allow αi ∼ N(µα, 1) and βi ∼ N(µβ,Σβ) 49 where β′i = (βi,1, ..., βi,P ) be our heterogeneity distributions. 1 By allowing our model’s coefficients to vary for each sample, we enable our model to capture “sample- level” (or individual-level if each sample represents an individual) heterogeneity.2 We begin by assuming that Σβ is a diagonal matrix and hence that the random variables βi,p are statistically independent of each other. Additionally, we make the assumption that ni = 1 ∀ i = 1, . . . , I (to assume a single observation for every individual in a data set) and nI+1 = nM = N . 3.2.1 Bayesian Density Ratio Estimation Let G(x) = GI + 1(x) be the reference CDF and define pij = dG(xi,j) = dGI+1(xi,j). We can utilize the method of constrained empirical likelihood and estimate gi and θi as follows. We can write the empirical likelihood function, based on our pooled data xij : L(θ, GM) = M∏ i=1 ni∏ j=1 pij I∏ i=1 ni∏ j=1 eαi+β ′ ih(xi,j), (3.3) where θ = (α1, . . . , αI , β1,1, . . . , βI,P ).3 As stated above, we make the assumption that ni = 1 ∀ i = 1, . . . , I and nI+1 = nM = N = I. As a result, the second prod- uct in the exponential terms does not contribute to multiplication of the distortion function. We marginalize the empirical likelihood function to generate a “marginal- ized empirical likelihood” by integrating the above empirical likelihood against the heterogeneity distributions: 1 For example, one potential choice for h is h(x) = x as in Voulgaraki et al (2012) [160]. This definition can be generalized, however, and the dimension of β′i would consequently also need to be altered. 2 We assume the above parameterization for αi (with a constant variance) to ensure statistical identifiability of the model after marginalization. 3 Chapter 4 contains some theoretical discussion regarding empirical likelihood estimation in- volving density ratios. 50 ML(µα,µβ,Σβ, GM ) = ∫ +∞ −∞ ... ∫ +∞ −∞ L(θ, GM ) I∏ i=1 1 √ 2pi e( −(αi−µα) 2 2 )dαi · P∏ p=1 1 √ 2piσ2βp e ( −(βi,p−µβp ) 2 2σ2βp ) dβi,p = ∫ +∞ −∞ ... ∫ +∞ −∞ M∏ i=1 ni∏ j=1 pij I∏ i=1 ni∏ j=1 eαi+β ′ ih(xi,j) 1 √ 2pi e( −(αi−µα) 2 2 )dαi · P∏ p=1 1 √ 2piσ2βp e ( −(βi,p−µβp ) 2 2σ2βp ) dβi,p = M∏ i=1 ni∏ j=1 pij I∏ i=1 ni∏ j=1 ∫ +∞ −∞ . . . ∫ +∞ −∞ eαi+β ′ ih(xi,j) 1 √ 2pi e( −(αi−µα) 2 2 )dαi · P∏ p=1 1 √ 2piσ2βp e ( −(βi,p−µβp ) 2 2σ2βp ) dβi,p ML(µα,µβ,Σβ, GM ) = M∏ i=1 ni∏ j=1 pij I∏ i=1 ni∏ j=1 eµα+ 1 2 eµβ ′h(xi,j)+ 1 2h(xi,j) ′Σβh(xi,j) (3.4) As a result, we have the following theorem: Theorem 3.2.1 (Marginalized Empirical Likelihood for Density Ratio Model As- suming Normal Prior Distributions). The marginalized log-likelihood function of (3.3), assuming normal heterogeneity distributions, is provided by: LL(µα,µβ,Σβ, GM ) = logML(µα,µβ,Σβ, GM ) = M∑ i=1 ni∑ j=1 log pij + I∑ i=1 ni∑ j=1 (µα + 1 2 + µβ ′h(xi,j) + 1 2 h(xi,j) ′Σβh(xi,j)). This result is simply due to taking the logarithm of (3.4). One maximizes the above marginalized likelihood subject to constraints anal- ogous to those used in Voulgaraki et al 2012 [160]: pij ≥ 0, ∑M i=1 ∑ni j=1 pij = 1, and ∑M i=1 ∑ni j=1 pije µα+ 12+µβ ′h(xi,j)+ 1 2h(xi,j) ′Σβh(xi,j) = 1.4 One can perform the opti- 4 This constraint is easy to see after integration of both sides of the constraint imposed in 51 mization of the empirical likelihood function numerically to estimate µα, µβ, and Σβ. 3.2.2 Advantages of Marginalization The marginalization of the empirical likelihood function in equation (3.3) serves a few important purposes. Firstly, in data sets where ni is small, each ob- servation will contain very limited information regarding each αi and βi. As a result, marginalizing the empirical likelihood function by integrating over this high- dimensional parameter space, enables us to markedly reduce the dimensionality of the problem, to a considerably less intricate model involving just µα,µβ, and Σβ. Note that this reduction in dimensionality occurs because the marginalization essentially transforms our model from one density ratio into another. In particular, after starting with I different samples, each of size 1, using a sample of size N = I as a reference, and integrating over the parameter space, the density ratio becomes another exponential with a slightly different functional form. As a result, this new density ratio compares two different distributions, each of sample size N . As the regularity conditions outlined in Fokianos (2004) clearly hold for our marginalized empirical likelihood, our empirical likelihood estimators are statisti- cally unbiased and asymptotically normal [43]. A researcher can test the null hy- pothesis that each coefficient is equal to zero against the alternative that it is non- zero. As illustrated in Kedem et al (2009), the likelihood ratio of each null model to the full model asymptotically follows a chi-squared distribution [80]. 3.2.3 Derivation of Distributions We can use the results from our optimization to estimate the distributions. In particular, we can define γ ≡ λ/2N , where λ is a Lagrange multiplier. We can Voulgaraki et al (2012): ∑M i=1 ∑ni j=1 pije αk+β ′ kh(xi,j) = 1 over the heterogeneity distributions F, providing us with ∫ ∑M i=1 ∑ni j=1 pije αk+β ′ kh(xi,j) dF = ∫ 1 dF ∀ k = 1, . . . , I. 52 subsequently replace γ, µα,µβ, and Σβ by their estimators. As a result, following the derivations in Voulgaraki (2011) [159], estimators of pˆij and Gˆ(x) are provided by: pˆij = 1 2N 1 1 + γˆ[eµˆα+ 1 2 eµˆ ′ βh(xi,j)+ 1 2h(xi,j) ′Σˆβh(xi,j) − 1] (3.5) and: Gˆ(x) = M∑ i=1 ni∑ j=1 pˆijI(xij ≤ x) = 1 2N M∑ i=1 ni∑ j=1 I(xij ≤ x) 1 + γˆ[eµˆα+ 1 2 eµˆ ′ βh(xi,j)+ 1 2h(xi,j) ′Σˆβh(xi,j) − 1] . (3.6) Furthermore, for the “marginalized distribution,” which we will hereafter refer to as Hˆ, we have: Hˆ(x) = M∑ i=1 ni∑ j=1 pˆije µˆα+ 12+µˆ ′ βh(xi,j)+ 1 2h(xi,j) ′Σˆβh(xi,j)I(xij ≤ x) = 1 2N M∑ i=1 ni∑ j=1 eµˆα+ 1 2+µˆ ′ βh(xi,j)+ 1 2h(xi,j) ′Σˆβh(xi,j)I(xij ≤ x) 1 + γˆ[eµˆα+ 1 2 eµˆ ′ βh(xi,j)+ 1 2h(xi,j) ′Σˆβh(xi,j) − 1] . (3.7) For researchers interested in estimating the probability density function of the sample, kernel density estimators can be constructed by smoothing the increments of Hˆ [43, 125, 160]. This research in fact illustrated that one can arrive at more efficient kernel density estimates as a result of combining data. As mentioned earlier in this dissertation, optimal bandwidth selection for the kernel density estimation is discussed in detail in Voulgaraki et al (2012) [160]. 53 3.3 Numerical Simulations To demonstrate the efficacy of our approach, we simulated data sets of size 100 from a variety of distributions (gamma, Weibull, and exponential). The gamma and Weibull distributions were parametrized with shape parameter αi and scale parameter βi while the exponential distribution was parameterized by αi. These parameters were drawn from normal distributions, with αi ∼ N(µα, σ2β) and βi ∼ N(µβ, σ2β). This stochastic nature of our distributions’ parameters enabled us to simulate datasets that, as described earlier, are “heterogeneous” in nature. Assuming the density ratio to follow (3.2) having linear exponential tilts w(αi, βi;xi) = eαi+βixi and only one observation per sample (i.e. ni = 1 ∀ i), we fused this simulated data with a random draw of equal sample size from a spec- ified distribution that served as a reference distribution. Specifically, we took our sample of size 100 and fused it with another sample (a reference distribution) of size 100 from a probability distribution (gamma, uniform, or normal) chosen a priori. When fusing with a gamma distribution, we fit our original sample to a gamma dis- tribution, estimated the gamma distribution’s parameters via standard maximum likelihood techniques, and used a sample of size 100 from this gamma distribution as a reference. When fusing with a uniform distribution, we fused our distribution with a sample over the range of our original sample’s data and used a sample of size 100 from this uniform distribution as a reference. Finally, when fusing with a normal distribution, we fused our simulated data with a sample of size 100 from a normal distribution with sample mean and sample variance equal to that of the original sample. We then estimated the distribution of both samples from the combined data using the method of constrained empirical likelihood. 54 Given the limited information per sample in the simulated data set (as ni = 1), the marginalization of the empirical likelihood discussed in Section 3.2 enables a significant reduction of the dimensionality of the model’s parameter space (from 200 to 3) and is consequently quite useful for understanding distributional properties of the data. To quantify our model’s ability to understand these distributional properties, we used a diagnostic statistic recommended in Voulgaraki et al (2012) [160]: R2α,k = 1− exp [ − ( xα m− xα )k ] . (3.8) In the above equation, m is defined as the number of times the estimated semiparametric cumulative distribution function falls inside the estimated 1 − α confidence interval obtained from the corresponding empirical cumulative distribu- tion function, both functions being evaluated at the sample points. Additionally, k is a prespecified constant, set by the statistician. Since we fused our data with a reference sample from a known distribution, we estimated (3.8) on our reference distribution. If our semiparametric density ratio model were inappropriate, then our estimates would ruin the integrity of this reference distribution, resulting in inaccurate estimates and consequently poor goodness of fit results. We estimated (3.8) for our simulations over choices of k = 1 and k = 2. Following the notation pertaining to (3.2), we refer to our simulated dataset as x1 = (x1,1, x2,1, . . . , xi,1, . . . , xI,1) (that is, ni = 1 and p = 1 ∀ i = 1, . . . , I and p = 1, . . . , P ) and the sample with which we are fusing as x2 = (x1,2, x2,2, . . . , xi,2, . . . , xI,2). In the following tables, αˆMLE and βˆMLE denote the maximum likelihood estimators of x1 when the data are fused with a gamma distribution, x(1,1) and x(I,1) denote the minimum and maximum values of x1 respectively when the data are fused with 55 a uniform distribution, and x¯1 and s2x1 denote the mean and variance of x1 when the data are fused with a normal distribution. Our results are depicted in Tables 3.1-3.3. We notice that our Bayesian ap- proach clearly fits better than, if not as well as, the existing DRE method as R2 under our standard DRE approach is always less than or equal to R2 under our new Bayesian DRE approach. Additionally, we also notice that the choice of x2 with which we fuse our sample with, also affects the goodness of fit results. For example, at least in these simulations, we notice that fusing our data with a uniform distribu- tion results in identical results when comparing the DRE approach to the Bayesian DRE approach. In some cases, such as for our exponentially distributed samples, we notice an identical goodness of fit (and hence the standard DRE approach is just as effective); however, for other cases, such as Weibull samples, fusing with a uniform distribution results in quite poor goodness of fit. These results suggest that when using out of sample fusion to make statistical inferences about a particular data set, the researcher should choose from a variety of different distributions from which to fuse her data with and compare goodness of fit diagnostics to optimally determine the distribution’s best estimate. These results demonstrate a number of important points. Firstly, the den- sity ratios before the marginalization were slightly misspecified to begin with. For example, the proper ratio in equation (3.2) for a gamma distribution (fused with another gamma or uniform distribution) should be of the form w(αi, βi,1, βi,2;xi,1) = eαi+βi,1xi,1+βi,2log(xi,1) whereas we used a density ratio have functional form eαi+βixi instead. The proper ratios for the other distributions examined are also slightly different from what was examined here. In reality, however, when presented with a data set, a researcher generally does not have any detailed specifications regarding the distribution of the data. Making the simple specification as we have here, with an elementary linear tilt function and the marginalization of the resulting empirical 56 Tab. 3.1: First Simulation Set - Out of Sample Fusion with Gamma Distribution, Sample Size I = 100 DRE Bayesian DRE xi,1 µα σ2α µβ σ 2 β xi,2 R 2 0.95,1 R 2 0.95,2 R 2 0.95,1 R 2 0.95,2 Γ(αi, βi) 10 4 10 4 Γ(αˆMLE , βˆMLE) 0.957 0.976 0.964 0.982 Γ(αi, βi) 30 25 10 25 Γ(αˆMLE , βˆMLE) 0.886 0.917 0.902 0.929 Γ(αi, βi) 10 4 30 4 Γ(αˆMLE , βˆMLE) 0.980 0.994 0.980 0.994 Γ(αi, βi) 100 100 30 100 Γ(αˆMLE , βˆMLE) 0.934 0.975 0.950 0.979 Γ(αi, βi) 30 25 100 25 Γ(αˆMLE , βˆMLE) 0.993 1.000 0.993 1.000 Weibull(αi, βi) 10 4 10 4 Γ(αˆMLE , βˆMLE) 0.829 0.871 0.847 0.882 Weibull(αi, βi) 30 25 10 25 Γ(αˆMLE , βˆMLE) 0.837 0.876 0.856 0.893 Weibull(αi, βi) 10 4 30 4 Γ(αˆMLE , βˆMLE) 0.582 0.552 0.598 0.576 Weibull(αi, βi) 100 100 30 100 Γ(αˆMLE , βˆMLE) 0.637 0.624 0.639 0.626 Weibull(αi, βi) 30 25 100 25 Γ(αˆMLE , βˆMLE) 0.475 0.368 0.502 0.410 Exponential(αi) 10 4 NA NA Γ(αˆMLE , βˆMLE) 0.952 0.974 0.952 0.974 Exponential(αi) 30 25 NA NA Γ(αˆMLE , βˆMLE) 0.944 0.965 0.944 0.965 Exponential(αi) 10 4 NA NA Γ(αˆMLE , βˆMLE) 0.948 0.972 0.948 0.972 Exponential(αi) 100 100 NA NA Γ(αˆMLE , βˆMLE) 0.936 0.971 0.936 0.971 Exponential(αi) 30 25 NA NA Γ(αˆMLE , βˆMLE) 0.965 0.987 0.965 0.987 Tab. 3.2: Second Simulation Set - Out of Sample Fusion with Normal Distribution, Sample Size I = 100 DRE Bayesian DRE xi,1 µα σ2α µβ σ 2 β xi,2 R 2 0.95,1 R 2 0.95,2 R 2 0.95,1 R 2 0.95,2 Γ(αi, βi) 10 4 10 4 N(x¯1, s2x1) 0.870 0.905 0.876 0.906 Γ(αi, βi) 30 25 10 25 N(x¯1, s2x1) 0.730 0.738 0.743 0.754 Γ(αi, βi) 10 4 30 4 N(x¯1, s2x1) 0.952 0.984 0.955 0.987 Γ(αi, βi) 100 100 30 100 N(x¯1, s2x1) 0.808 0.827 0.822 0.844 Γ(αi, βi) 30 25 100 25 N(x¯1, s2x1) 0.988 0.999 0.988 0.999 Weibull(αi, βi) 10 4 10 4 N(x¯1, s2x1) 0.881 0.922 0.897 0.936 Weibull(αi, βi) 30 25 10 25 N(x¯1, s2x1) 0.898 0.947 0.922 0.966 Weibull(αi, βi) 10 4 30 4 N(x¯1, s2x1) 0.589 0.550 0.604 0.571 Weibull(αi, βi) 100 100 30 100 N(x¯1, s2x1) 0.664 0.666 0.704 0.723 Weibull(αi, βi) 30 25 100 25 N(x¯1, s2x1) 0.511 0.431 0.514 0.439 Exponential(αi) 10 4 NA NA N(x¯1, s2x1) 0.782 0.824 0.786 0.830 Exponential(αi) 30 25 NA NA N(x¯1, s2x1) 0.853 0.913 0.853 0.913 Exponential(αi) 10 4 NA NA N(x¯1, s2x1) 0.797 0.850 0.801 0.855 Exponential(αi) 100 100 NA NA N(x¯1, s2x1) 0.859 0.910 0.859 0.910 Exponential(αi) 30 25 NA NA N(x¯1, s2x1) 0.832 0.891 0.832 0.891 57 Tab. 3.3: Third Simulation Set - Out of Sample Fusion with Uniform Distribution, Sample Size I = 100 DRE Bayesian DRE xi,1 µα σ2α µβ σ 2 β xi,2 R 2 0.95,1 R 2 0.95,2 R 2 0.95,1 R 2 0.95,2 Γ(αi, βi) 10 4 10 4 Unif(x(1), x(N)) 0.960 0.980 0.960 0.980 Γ(αi, βi) 30 25 10 25 Unif(x(1), x(N)) 0.704 0.699 0.704 0.699 Γ(αi, βi) 10 4 30 4 Unif(x(1), x(N)) 1.000 1.000 1.000 1.000 Γ(αi, βi) 100 100 30 100 Unif(x(1), x(N)) 0.791 0.826 0.791 0.826 Γ(αi, βi) 30 25 100 25 Unif(x(1), x(N)) 1.000 1.000 1.000 1.000 Weibull(αi, βi) 10 4 10 4 Unif(x(1), x(N)) 0.308 0.177 0.308 0.177 Weibull(αi, βi) 30 25 10 25 Unif(x(1), x(N)) 0.303 0.170 0.303 0.170 Weibull(αi, βi) 10 4 30 4 Unif(x(1), x(N)) 0.144 0.037 0.144 0.037 Weibull(αi, βi) 100 100 30 100 Unif(x(1), x(N)) 0.143 0.032 0.143 0.032 Weibull(αi, βi) 30 25 100 25 Unif(x(1), x(N)) 0.107 0.020 0.107 0.020 Exponential(αi) 10 4 NA NA Unif(x(1), x(N)) 1.000 1.000 1.000 1.000 Exponential(αi) 30 25 NA NA Unif(x(1), x(N)) 1.000 1.000 1.000 1.000 Exponential(αi) 10 4 NA NA Unif(x(1), x(N)) 1.000 1.000 1.000 1.000 Exponential(αi) 100 100 NA NA Unif(x(1), x(N)) 0.981 0.999 0.981 0.999 Exponential(αi) 30 25 NA NA Unif(x(1), x(N)) 1.000 1.000 1.000 1.000 likelihood function, still enables us to reasonably estimate our distributions’ data, as our goodness of fit results illustrate. The fact that misspecified tilt functions can still lead to reasonable distributional estimates has been illustrated in Katzoff et al (2013) [78]. In the following section, we discuss a few nice extensions of our theoretical results. 3.4 A Few Generalizations 3.4.1 Arbitrary Prior Distributions and Tilt Functions Although we assumed normal heterogeneity distributions, there is no a priori reason to restrict ourselves to this parametric family. In particular, if we generalize our assumptions and posit an arbitrary prior distribution and tilt function, we notice that that the marginalized empirical likelihood function simply just involves the mo- 58 ment generating function of the prior measure. Mathematically, we can quite easily derive this generalization again under the assumption that ni = 1 ∀ i = 1, . . . , I and nI+1 = nM = N . Following the density ratio in (3.2), and supposing that αi follows a distribution with density P (αi|θα) with respect to Lebesgue measure and that each βi,p follows a distribution, statistically independent of αi and of each other, with density P (βi,p|θβp), we can generalize our marginalized empirical likelihood as follows: ML(θα,θβ1 , . . . ,θβP , GM ) = ∫ +∞ −∞ ... ∫ +∞ −∞ L(θ, GM ) I∏ i=1 P (αi|θα)dαi P∏ p=1 P (βi,p|θβp)dβi,p = ∫ +∞ −∞ . . . ∫ +∞ −∞ M∏ i=1 ni∏ j=1 pij I∏ i=1 ni∏ j=1 eαi+β ′ ih(xi,j)P (αi|θα)dαi · P∏ p=1 P (βi,p|θβp)dβi,p = M∏ i=1 ni∏ j=1 pij I∏ i=1 ni∏ j=1 ∫ +∞ −∞ . . . ∫ +∞ −∞ eαi+β ′ ih(xi,j)P (αi|θα)dαi · P∏ p=1 P (βi,p|θβp)dβi,p ML(θα,θβ1 , . . . ,θβP , GM ) = M∏ i=1 ni∏ j=1 pij I∏ i=1 ni∏ j=1 MGF (θα; 1) P∏ p=1 MGF (θβp ;h(xi,j)), (3.9) where θ = (α1, . . . , αI , β1,1, . . . , βI,P ) as in (3.3). The result is thus in terms of the moment generating functions of our prior distributions (MGF ) and the tilt function h(xi,j). The log-likelihood, subject to the appropriate constraints (similar to those imposed on Theorem 3.2.1), can be optimized to estimate the model. Of course, prior distributions need to be chosen to ensure statistical identifiability. 59 3.4.2 Assuming homogeneity in the intercepts To make estimation easier, the researcher may prefer to restrict assumptions regarding heterogeneity of coefficients corresponding to the explanatory variables only. We can do so by assuming that the intercepts follow a delta mass at a particular constant cα such that αi ∼ δ(cα) : ML(cα,θβ1 , . . . ,θβP , GM ) = ∫ +∞ −∞ ... ∫ +∞ −∞ L(θ, GM ) I∏ i=1 δ(cα)dαi P∏ p=1 P (βi,p|θβp)dβi,p = ∫ +∞ −∞ ... ∫ +∞ −∞ M∏ i=1 ni∏ j=1 pij I∏ i=1 ni∏ j=1 eαi+β ′ ih(xi,j)δ(cα)dαi · P∏ p=1 P (βi,p|θβp)dβi,p = M∏ i=1 ni∏ j=1 pij I∏ i=1 ni∏ j=1 ∫ +∞ −∞ . . . ∫ +∞ −∞ eαi+β ′ ih(xi,j)δ(cα)dαi · P∏ p=1 P (βi,p|θβp)dβi,p ML(cα,θβ1 , . . . ,θβP , GM ) = M∏ i=1 ni∏ j=1 pij I∏ i=1 ni∏ j=1 ecα P∏ p=1 MGF (θβp ;h(xi,j)). (3.10) In the following section, we apply our approach to a fundamental aspect of the American civil justice system with applications to health care reform. 3.5 An Application: Tort Reform In this section, we analyze tort loss data across all fifty states. Recent research has found that tort reform has had a significant impact on reducing insurance premi- ums as well as on tort losses throughout the country. We utilize the Bayesian DRE approach in this study to quantify the probability of tort losses exceeding a specified 60 threshold [19] for 2004 and 2006. Computation of these probabilities enables us to understand on how litigious the country indeed is, thereby shedding light on the efficacy of recently instituted tort reforms. 3.5.1 Data Our dataset was the same data used in the Crain et al (2009) study and was provided to us by two of the paper’s authors [19]. We examined per capita tort losses defined as the ”payments by defendants (or their insurance companies) for judgments, settlements, attorney fees, and administrative expenses in tort lawsuits ...”, in thousands of (real 2006) dollars per capita, of each of the fifty U.S. states in 2004 and 2006 [19]. For this analysis, we adhered to examining medical malpractice tort losses, although analysis of other aspects of the civil justice system (such as automobile insurance, product liability insurance, and homeowners insurance among others) are worthwhile endeavors for future research. Our data is denominated in losses per capita to enable comparable analysis across the different states. The data set consisted of one observation for each of the fifty states (for a total sample size of 50 for each year), similar to what we assumed in the simulation above. As different states and localities throughout the country have the potential to be more litigious than others, it is important to be able to capture this heterogeneity in estimating the probability of tort losses exceeding a particular threshold. This fact, coupled with the fact that our data set contains only one observation per state, makes our semiparametric Bayesian approach particularly useful for reducing the dimensionality of the problem. 3.5.2 Estimation We examined 2004 and 2006 separately, treating each datum within each year as an independent single-observation sample with its own unique parametrization. It is worthwhile to compare the distributions of per capita tort losses between 2004 61 and 2006 to understand the efficacy of tort reforms that had been recently insti- tuted around that time period [19]. In particular, we used the following model specification: gi(x) g(x) = eαi+βix; i = 1, . . . , 50 (3.11) and let αi ∼ N(µα, 1) and βi ∼ N(µβ, σ2β) as we did in Section 3.2. By allowing our model’s coefficients to vary for each state, we enable our model to capture state-level heterogeneity. We estimated the marginalized distribution by fusing the sample with data regarding tort losses from 1996. After estimating our marginalized distribution, we then used the cumulative distribution outlined in (3.7) to compute the probabilities of extreme tort losses. These probabilities better enable us to understand the risks associated with litigation across the country. Additionally, we used a bootstrap approach to develop 95% confidence in- tervals around these point estimates. Specifically, we resampled our dataset (with replacement) 1000 times and re-estimated our probabilities for each sample. We used the resulting set to generate our interval estimates. Our results are are outlined in Tables 3.4-3.7. Tab. 3.4: Coefficient Estimates - 2004, Using Bayesian DRE Approach Coefficient Estimate Lower 95% CI Upper 95% CI µα -1.3514 -2.0632 -0.5848 µβ 0.0419 -0.0300 0.0765 σ2β 0.0003 0.0000 0.0031 Tab. 3.5: Coefficient Estimates - 2006, Using Bayesian DRE Approach Coefficient Estimate Lower 95% CI Upper 95% CI µα 0.4196 -0.2142 1.0331 µβ -0.1102 -0.1896 -0.0368 σ2β 0.0044 0.0066 0.0076 62 Tab. 3.6: Analysis of 2004 Tort Loss Data, Using Bayesian DRE Approach Probability Estimate Lower 95% Limit Upper 95% Limit P(Tort Losses > 35000) 0.100 0.010 0.148 P(Tort Losses > 45000) 0.085 0.005 0.104 P(Tort Losses > 55000) 0.019 0.000 0.060 Tab. 3.7: Analysis of 2006 Tort Loss Data, Using Bayesian DRE Approach Probability Estimate Lower 95% Limit Upper 95% Limit P(Tort Losses > 35000) 0.068 0.010 0.144 P(Tort Losses > 45000) 0.045 0.005 0.102 P(Tort Losses > 55000) 0.019 0.000 0.060 We also estimated the goodness of fit diagnostic statistics outlined in Table 3.8 for k = 1 and k = 2, comparing it to the existing DRE method. These results are outlined in Table 3.8. Additionally, Figures 3.1-3.4 depict plots comparing the estimated distributions of per capita tort losses Hˆ from (3.7) versus the correspond- ing empirical CDFs for 2004 and 2006. The plots suggest that the comparative fit between the two models is quite comparable in 2004 (as there is near perfect agree- ment between the estimated and empirical CDFs) but substantial improvement as a result of the Bayesian approach compared to the standard approach in 2006. Tab. 3.8: Goodness of Fit Diagnostics DRE Bayesian DRE R20.95,1 R 2 0.95,2 R 2 0.95,1 R 2 0.95,2 2004 0.915 0.998 0.911 0.997 2006 0.108 0.129 0.607 0.582 Our results illustrate that the Bayesian DRE approach substantially improved model fit in 2006 suggesting that there was unobserved heterogeneity across the country during that year that the standard DRE approach was unable to properly model. Although there are not really any noticeable differences between the two approaches in 2004 (in fact the standard DRE model performs slightly better), this 63 Fig. 3.1: Plot of Hˆ vs. H˜ - DRE, 2004 Per Capita Tort Loss Data Fig. 3.2: Plot of Hˆ vs. H˜ - Bayesian DRE, 2004 Per Capita Tort Loss Data 64 Fig. 3.3: Plot of Hˆ vs. H˜ - DRE, 2006 Per Capita Tort Loss Data Fig. 3.4: Plot of Hˆ vs. H˜ - Bayesian DRE, 2006 Per Capita Tort Loss Data 65 similarity can be explained by the fact that in 2004, the 50 states may have been considerably more homogeneous in nature regarding litigation. In terms of per capita tort losses, the probabilities of extreme losses clearly declined between 2004 and 2006. In terms of actual tort losses, we notice a reduction in the probabilities of extreme losses in 2006 compared to 2004. These results, in conjunction with the results from Crain et al (2009), illustrate the efficacy of the number of state-based medical malpractice reforms, many of which had been instituted around this time period [19]. Such reforms have included attorney fee limitations, requirements for pre-trial screening, standards regarding expert witnesses, imposition of strict statute of limitations for filing lawsuits, and economic damage caps. These results may also be explained by the fact that existing tort laws, not necessarily recently instituted, may have become more stringently enforced in 2006 compared to 2004. As a side note, although the coefficient estimates of σ2β appear to be low, they should not be taken to imply that there is no heterogeneity in the model. As the units of our data are in thousands of dollars per capita, seemingly small per capita differences between states imply substantial variation in litigiousness amongst the states. 3.5.3 Policy Implications A 2004 study conducted by the non-partisan Congressional Budget Office found that state-based tort reform reduced the number of lawsuits filed, decreased the number of damage awards, and lowered insurance claims [17]. Our results above also illustrate that the risks associated with the civil justice system notably de- clined between 2004 and 2006. These reductions were primarily due to state-based tort reforms that were instituted throughout the country, including economic caps, appeal-bond caps, and standards regarding expert witnesses [19]. Although these results illustrate the efficacy of certain malpractice reforms 66 instituted around this time period, considerably more can be done to reform the civil justice system including passing tort reforms in states where important laws are lacking and more stringently enforcing existing laws [19, 103]. Scholars at the Heritage Foundation, the Cato Institute, the American Enterprise Institute, and the Brookings Institution have discussed the benefits of malpractice reform, noting that such reforms have the potential to improve the environment for physicians to practice, benefiting patients in the process [18,93,139,142]. These benefits are quite well-known. For example, a 2005 study published in the Journal of the American Medical Association found that malpractice reform increased physician supply, particularly with specialties associated with high mal- practice costs [86]. The authors found that without appropriate reforms, the medi- cal malpractice climate had been encouraging early retirements in and discouraging entry to particularly risky specialties such as obstetrics, anesthesiology, radiology, and surgery. Additionally, such an environment results in the practice of defensive medicine, causing physicians to order unneeded tests due to the fear of lawsuits, unnecessarily increasing health care costs [85, 151]. 3.6 Conclusions and Future Research Our study illustrates the efficacy of medical malpractice reform in reducing the probabilities of extreme tort losses. Medical malpractice reform is of course by no means a “silver bullet” for fixing our nation’s broken health care system. Such reforms, however, are a very helpful component of the supply side of health care. On the demand side, policymakers should seek to instill free market competition in the industry, which will notably reduce costs and improve quality of care [13,20,21] Methodologically, our study provides a notable contribution to both Bayesian 67 estimation methodologies as well as to semiparametric modeling. In particular, we have applied empirical Bayesian methods to semiparametric density ratio modeling, allowing statisticians to incorporate individual-level heterogeneity in such models. An interesting aspect of this approach is that our marginalization yields a closed- form expression allowing us to make direct statistical inferences about the population without having to resort to the numerical approaches typically concomitant with Bayesian methods. As discussed in Chapter 2, these numerical approaches such as MCMC methods, can be quite computationally intensive, particularly for large data sets with high-dimensional parameterizations [50]. Additionally, although we focused our application on medical malpractice re- form, there is no reason that this approach cannot be used in other settings where modeling individual-level heterogeneity is important without being forced to adhere to strict parametric assumptions. In particular, the approach outlined here can be useful in many application areas including medical research [160], economics [56], understanding athletic performance [24, 107], and other areas within public policy research [19, 136] among others. Regardless of the application, we believe that the Bayesian approach outlined here can be yet another useful tool for statisticians to use. 68 Chapter 4: Bayesian Inferences of Welfare Reform with Improved Credible Interval Estimation via Semiparametric Out of Sample Fusion 4.1 Introduction 4.1.1 “Ending Welfare as We Know it” Welfare reform is a hot topic of debate amongst policymakers in America [61, 135,152]. The welfare state can be defined as a government system aimed at helping the disadvantaged by providing benefits such health care, pensions, and financial programs aimed to help those in need [14]. The first Chancellor of Germany, Otto von Bismarck, constructed the beginnings of the modern European welfare state in the late 1800s, creating programs such as old age pensions, accident insurance, and medical care. In the early 20th century, the welfare state came to Britain with the introduction of pension programs, free school meals, and unemployment and health benefits. Shortly afterwards, the welfare state began to grow throughout other European countries [14]. Although the United States does not have an overarching welfare state that many European countries do, the country does offer a variety welfare programs [14]. Initiated in the United States in the early 1900s, federal welfare programs were designed with the intention of operating as safety net programs to aid the disad- vantaged. President Theodore Roosevelt, for example, proposed a number of such ideas as part of his ‘New Nationalism’ platform. During that time period, Roosevelt believed it was incumbent upon the federal government to institute programs such as a National Heath Service; social insurance for the unemployed, disabled, and elderly; and to provide workers’ compensation to those with certain work-related injuries [84]. As a component of his New Deal, President Franklin Delano Roosevelt in- stituted a number of federal cash assistance programs to assist the poor. These programs included Social Security, Aid to Dependent Children (ADC), and unem- ployment compensation programs among others. Several decades later, President Lyndon Johnson instituted the War on Poverty to further expand America’s welfare state with the goal of eradicating poverty. In particular, he expanded President Roosevelt’s ADC programs (which by this time were known as Aid to Families with Dependent Children or AFDC programs), began food stamp programs, public hous- ing, Medicare, and Medicaid, all of which were intended to be “safety nets” for those in poverty [14]. In Losing Ground, Charles Murray of the American Enterprise Institute rig- orously evaluated the effects of these types of welfare programs [111]. He concluded that without doubt many of these programs had the potential to foster dependency and cripple those they intend to help. Murray’s work shattered much of the political capital advocates had for open-endedly expanding America’s welfare state. In 1992, Presidential Candidate Bill Clinton campaigned to “end welfare as we know it,” [16]. Two years later, Congressional Republicans made welfare reform a fundamental component of their Contract with America [53]. The goal of these reforms were to transform welfare away from a system that encouraged endless dependence on taxpayer dollars and instead transform it into a system providing 70 temporary assistance to get people back to work and become active contributing members of society. In 1996, Congressional Republicans worked with President Clinton to reform AFDC Programs [53]. Intended to operate as a safety net for those in poverty, AFDC programs were simply a government handout to qualifying single mothers. The government mailed checks to recipients who had virtually no responsibilities in return. In 1996, President Clinton signed into law the Personal Responsibility and Work Authorization Act (PROWA), restructuring AFDC programs into a more re- strictive program known as Temporary Assistance to Needy Families (TANF) [118]. Under TANF, state governments were required to impose federal work standards on welfare recipients. In particular, TANF required recipients to work or study as a condition for receiving welfare. In addition, TANF also limited receipt of welfare benefits to a five year time frame. These requirements, although not particularly rigorous, were intended to transform the welfare program from a system that fos- tered dependency into a springboard that gave the disadvantaged temporary help to become contributing members of society [137]. It is important for policymakers to be able to rigorously examine such funda- mental changes to government programs. Many reforms, including PROWA, have been constructed to provide states a certain degree of flexibility in implementing the law. As a result, different states have the potential to manifest different responses to the law. Although a number of policy studies have made some quantitative evaluations of PROWA, no studies, to our knowledge, have done so capturing this state-level heterogeneity in the associated statistical models [61,132,152]. We perform such an analysis in this study. In particular, we statistically evaluate the success of PROWA in helping to get people back to work. We do so by estimating hierarchical Bayesian linear models to rigorously compare AFDC programs (before welfare reform) to TANF programs (after welfare reform). We find 71 that TANF was considerably more successful than AFDC in getting people back to work, illustrating the success of the PROWA. In the process of our analysis, we improve on existing Bayesian statistical es- timation techniques. In particular, standard Bayesian estimation techniques suffer from a number of significant limitations. Specifically, standard Markov Chain Monte Carlo (MCMC) methods used to estimate Bayesian statistical models can require a significant amount of time to adequately sample a posterior density, especially for large data sets involving high-dimensional parameter spaces [50]. Furthermore, despite diagnostic checks, it is difficult to truly know whether an MCMC sampler has adequately sampled the parameter space. As a result, posterior inferences based on MCMC sampling may or may not necessarily reflect reality. We provide a novel adaptation of a semiparametric estimation technique, used thus far only for frequen- tist statistical models, to more accurately summarize the posterior by providing more precise estimates of the density’s credible intervals [160]. The contribution of this study is therefore twofold. Firstly, we statistically examine the efficacy of the welfare reform of the 1990s. Secondly, we improve on standard Bayesian estimation techniques for hierarchical linear models. As we discuss, our improvements to Bayesian computation are useful for a variety of models beyond just the hierarchical linear model structure studied here. 4.2 Statistically Modeling the Impact of Welfare Reform 4.2.1 A Hierarchical Bayesian Model Allowing for State-Level Heterogeneity We are interested in understanding the efficacy of the Personal Responsibility and Work Opportunity Reconciliation Act of 1996 [118]. This law imposed a five 72 year time limit on welfare benefits and imposed some, albeit not particularly oner- ous requirements, on individuals to either work or study as a condition of receiving welfare. To understand the efficacy of this reform, we looked at the effect of the total number of welfare caseloads in each state on the number of employment re- lated discontinuances of receiving welfare. Mathematically, this relationship can be modeled over i ∈ {1, . . . , 53} territories (50 states as well as Guam, Puerto Rico, and the District of Columbia) (units) over t ∈ {1, . . . , T} years. Consider a data set consisting of i ∈ {1, . . . , I} observations (in our case state/territories) over t ∈ {1, . . . , T} occasions. We define our dependent vector valued variable to be yi = (yi,t)Tt=1, which represents the welfare caseload discontin- uances due to employment for state/territory i at time t. We define our explanatory vector valued variable, xi = (xi,t′)T ′ t′=1, as a vector of dimension equal to yi represent- ing the total number of welfare caseloads at time t′ (for t′ < t) as we wish to relate yi,t to past covariates. Thus, xi represents a lagged number of welfare caseloads. As the PROWA imposed a new five year time limit for TANF recipients, which had previously not been present for AFDC programs, we conducted our analysis for lags of 1,2, 3, 4, and 5 years. 73 yi ∼ MVN(βi,0 + βi,1xi,1,Σ) (4.1) βi,0 ∼ N(µ0, σ 2 0) (4.2) βi,1 ∼ N(µ1, σ 2 1) (4.3) Σ−1 ∼Wishart(Φ0, ν0) (4.4) µ0 ∼ N(0, 10) (4.5) µ1 ∼ N(0, 10) (4.6) σ−20 ∼ Γ(10, 10) (4.7) σ−21 ∼ Γ(10, 10) (4.8) Σ is a T -dimensional variance-covariance matrix that enables us to capture cross-year correlations (for example caseloads reductions in one particular state in one year may have a strong correlation with caseload reductions in that same state the following year). Another way of viewing this model would be to esti- mate a hierarchical Bayesian linear model defined as yi,t = βi,0 + βi,1xi,t + i,t for i = 1, . . . , I, t = 1, . . . , t − 1 with the prior structure discussed above and with non-zero covariances between i,t′ and i,t ∀t, t′ ≤ T . To ensure statistical identifiability of our model, we requred β1,0 ≡ 0.1 In the next section, we discuss estimation of this model, including improvements to estimating the posterior density’s credible intervals. 4.2.2 The Limitations of Bayesian Computational Methods Typically, these types of hierarchical Bayesian models lack an analytic form for their posterior functionals and are therefore estimated numerically using standard 1 This coefficient pertains to the intercept regarding Alabama as our data was organized alpha- betically. 74 Markov Chain Monte Carlo (MCMC) methods [50]. Researchers use the resulting MCMC samples to generate a variety of statistical estimators such as posterior means, posterior standard deviations, and endpoints depicting credible intervals. There are a number of limitations associated with MCMC sampling, however. In particular, MCMC sampling can be computationally intensive, particularly for models involving high-dimensional parameter spaces. Recent research has developed alternative approaches for Bayesian estimation that attempt to address this issue. In particular, there are approaches using variational approximations to posterior densities, polynomial expansion approaches, and alternative sampling approaches among others [9, 10,39,57,108]. Another limitation of MCMC methods is that although there are some eval- uatory tools (e.g. Gelman and Rubin 1992) to assess convergence of the chain, these measures are simply just diagnostics, and it is difficult to determine with certainty if the MCMC chain has sufficiently navigated the posterior distribution’s parameter space [51]. Although MCMC samplers theoretically do converge under reasonable conditions, this convergence is only guaranteed to occur asymptotically over an infinitely large number of draws [154]. In reality, however, the Bayesian MCMC samplers need to eventually be truncated. As a result, researchers are of- ten rightfully concerned with whether their posterior sample based on truncation of the MCMC sampler truly represents the posterior density. A classic pathologi- cal example demonstrating this phenomenon involves MCMC sampling of the so- called “witch’s hat,” where the MCMC sampler can get stuck in a particular area of the posterior distribution [141]. Premature truncation of the MCMC sampler can consequently result in misleading statistical inferences regarding the posterior distribution. We can mitigate this problem considerably by applying a semiparametric methodology, known as density ratio estimation, to posterior samples [123–125,160]. 75 This estimation technique can enable a researcher to more accurately estimate cred- ible intervals. Although this semiparametric approach has been used in a variety of settings, no researchers, to our knowledge, have applied this method to Bayesian estimation. 4.2.3 Statistical Inferences Based on Density Ratio Estimation Density ratio estimation (hereafter referred to as DRE) is a semiparametric statistical estimation technique. This technique has been applied to many settings including AIDS vaccine trials, the analysis of variance, mortality rate prediction, cluster detection, and cancer research among others [43,44,52,80,81,83]. In the one dimensional case, there are i = I + 1 random samples xi with sample size ni such that ∑I+1 i=1 ni = n: xi = (xi,1, . . . , xi,ni), with probability density functions gi, such that: xi,j ∼ gi, i = 1, . . . , I, I + 1, j = 1, . . . , ni. (4.9) In utilizing this method, the statistician a priori assumes that gI+1 ≡ g defines a reference probability density with a particular ratio between gi and g ∀ i = 1, . . . , I. In many cases, this ratio is an exponential involving a vector-valued function h(x), known as a tilt function: gi(x) g(x) = eαi+β ′ ih(x). (4.10) If both gi(x) and g(x) are densities belonging to the exponential family having functional form: 76 g(x,θ) = d(θ)S(x)exp [ J∑ j=1 cj(θ)Tj(x) ] = exp [ J∑ j=1 cj(θ)Tj(x) + log [ d(θ)S(x) ] ] , (4.11) where θ is a parameter to be estimated, then the densities of any two such members will clearly have a ratio satisfying (4.15), as will the ratio of a single truncated such member coupled with the density of a uniformly distributed random variable. Some examples of special cases resulting in commonly used tilt functions are h(x) = (x, x2)′ (appropriate for normally distributed data): gi(x) g(x) = eαi+βix+γix 2 , (4.12) h(x) = (x, log(x))′ (appropriate for data coming from a gamma distribution), and gi(x) g(x) = eαi+βix+γilog(x), (4.13) h(x) = (xτ , log(x))′ (appropriate for potentially more heavily-skewed data coming from a generalized gamma distribution): gi(x) g(x) = eαi+βix τ+γilog(x). (4.14) Parametric assumptions regarding densities, however, are sufficient but not necessary conditions for density ratios, such as those above, to be able to properly model real-world phenomena. Recent research, for example, has shown that the distributions described in (4.13) and (4.14) are useful in modeling situations where the data does not necessarily adhere to the strict parameterizations described by known parametric distributions [78,82,168]. 77 4.2.4 Empirical Likelihood Estimation in Semiparametric Density Ratio Models Suppose we assume the density ratio outlined in (4.15) with tilt function h(x), gi(x) g(x) = eαi+β ′ ih(x). (4.15) If we let G(x) be the reference CDF and define probability masses pij = dG(xi,j) = dGI+1(xi,j), then we can utilize the method of constrained empirical likelihood to estimate gi and the associated parameters as follows. We can write our empirical likelihood function, parametrized by θ = (α1, . . . , αI ,β1, . . . ,βI) (a vector of dimension Id), based on our pooled data: L(θ, G) = I+1∏ i=1 ni∏ j=1 pij I∏ i=1 ni∏ j=1 eαi+β ′ ih(xi,j). (4.16) The resulting log-likelihood is given by: l = log L = I+1∑ i=1 ni∑ j=1 log pij + I∑ i=1 ni∑ j=1 ( αi + β′ih(xi,j) ) , (4.17) subject to the following constraints: pij > 0, I+1∑ i=1 ni∑ j=1 pij = 1, I∑ i=1 ni∑ j=1 pije αk+β′kh(xi,j) = 1, for k = 1, . . . , I. (4.18) The following conditions, as discussed in Fokianos (2004) and Qin and Law- less (1995) as cited by Voulgaraki (2011) ensure the existence of an empirical likelihood estimator [43, 123, 159]. Specifically, if we let f(x, θˆ) = (eαˆ1+βˆ1h(x) − 1, . . . , eαˆI+βˆIh(x) − 1)′, then given the following conditions, there exists, with prob- ability approaching one, an extremum in a ball around the true parameter vector θ0: 78 Lemma 4.2.1. Under the assumptions that: (a) E(f(x,θ0)f ′(x,θ0)) is positive definite, (b) ∂f∂θ is continuous in a ball around the true value of θ0 (c) ||∂f∂θ || and ||f(x,θ)|| 3 are bounded by an integral function with respect to G in this ball, (d) E (∂f ∂θ ) is of rank Id, where E is the expectation with respect to the probability measure corresponding to G, then the coefficients αi and βi, as well as the discrete estimators of Gi(x) ∀ i = 1, . . . , I+1 can be estimated by optimizing (4.17) subject to the constraints outlined in (4.18) via a two step estimation procedure utilizing the method of Lagrange multipliers. Specifically, if we can define µk ≡ λk/n, where λk is the pertinent Lagrange multiplier we find the following estimators, pˆij and Gˆi(x) for i = 1, . . . , I+ 1, as follows: pˆij = 1 n 1 1 + ∑I k=1 µˆk[e αˆi+βˆ′ih(xij) − 1] (4.19) and: GˆI+1(x) = I+1∑ i=1 ni∑ j=1 pˆijI(xij ≤ x) = 1 n I+1∑ i=1 ni∑ j=1 I(xij ≤ x) 1 + ∑I k=1 µˆk[e αˆi+βˆ′ih(x) − 1] . (4.20) We can generalize this result to distributions Gˆi for i = 1, . . . , I to generate the following estimators: 79 Gˆk(x) = I+1∑ i=1 ni∑ j=1 pˆije αˆk+βˆ′kh(x)I(xij ≤ x) = 1 n I+1∑ i=1 ni∑ j=1 eαˆk+βˆ ′ kh(x)I(xij ≤ x) 1 + ∑I k=1 µˆk[e αˆi+βˆ′ih(x) − 1] , for k = 1, . . . , I. (4.21) Furthermore, estimators of the probability densities gi(x) can be obtained via kernel density estimation applied to the jumps of Gˆi ∀ i = 1, . . . , I + 1 . The inter- ested reader should refer to Voulgaraki et al (2012), which provides a methodology for a complete discussion of the approach including techniques for determining the optimal bandwidth needed for accurate kernel density estimation [160]. Additionally, as long as the following properties hold, our estimators are statis- tically unbiased and asymptotically normal as illustrated in the following theorem. In particular, by defining the vector µ ≡ (µ1, . . . , µI) as the Lagrange multipliers for the optimization of the empirical likelihood function subject to the constraints outlined in (4.18) (just as has been done above) and by letting µ0 denote the true value of µ, then, we can state the following theorem: Theorem 4.2.2. Suppose the following conditions hold: (a) ∂ 2f ∂θ∂θ′ is continuous in a ball around the true value of θ0 (b) There exists an integrable function with respect to G bounding || ∂ 2f ∂θ∂θ′ || (c) The four conditions outlined in Lemma 4.2.1 hold. Then: √ n ( θˆ − θ0 µˆ− µ0 ) ⇒ P, (4.22) where ⇒ denotes weak convergence to a probability measure P, induced by a mul- tivariate normal random vector with mean 0 and a particular variance-covariance 80 matrix Σ. The details of the structure of Σ are complicated and derived in detail in Fokianos (2004). Lu (2007) contains a comprehensive proof of Theorem 4.2.2 [43,96]. One of the most appealing aspects about this semiparametric approach is that it can more accurately determine probability distributions from data compared to its more restrictive parametric counterparts. This advantage is due to the fact that the combined sample of larger size is used rather than any particular sample consisting of a smaller number of elements. In Section 4.2.2, we discussed many of the limitations of present day standard Bayesian computational methods. In the following section, we illustrate how this semiparametric density ratio technique can be used to ameliorate these limitations. 4.2.5 Numerical Simulations In this section, we present a series of numerical simulations to illustrate the efficacy of using DRE to generate accurate estimates of percentiles from a variety of different samples. In particular, we applied the DRE method, assuming the ratio described in (4.12) through (4.14) to samples from known parametric distributions (normal, student-t, gamma, and Weibull) to determine the samples’ 2.5th and 97.5th percentiles. Since the supports of these distributions are infinite, fusing a sample from any of these distributions with a uniformly distributed sample signifies that the density ratio model holds approximately. Specifically, we applied the DRE approach outlined in Section 4.2.3 with I = 1 (i.e. 2 samples), taking a sample from one of our known distributions, given a particular choice of parameters, and fused this sample with a random sample drawn from a uniform distribution with support including this sample’s entire range. This uniformly distributed sample served as our reference distribution. This methodology, known as “out of sample fusion,” was first introduced by Zhou (2013) as a tool for 81 estimating small probabilities [168]. We used the DRE approach to estimate the 2.5th and 97.5th percentiles (DREk,2.5 and DREk,97.5) of our original distribution over the course of k = 1, . . . , K simulations. We compared these estimates to the true (2.5th and 97.5th) percentiles of our original population, Qk,2.5 and Qk,97.5, by computing the squared difference between the two. We performedK = 100 such simulations for each choice of parameters, summed these squared differences for each percentile, and computed averages across all sim- ulations. We hereafter refer to these mean squared differences as MSEDRE,2.5 and MSEDRE,97.5: MSEDRE,2.5 = ∑K k=1 (DREk,2.5 −Qk,2.5) 2 K (4.23) and: MSEDRE,97.5 = ∑K k=1 (DREk,97.5 −Qk,97.5) 2 K . (4.24) The use of known parametric distributions enables us to understand the effi- cacy of using DRE for estimating low and high level quantiles of distributions. As a result, these simulations shed light on the usefulness of using the DRE approach for credible interval estimation of Bayesian regression coefficients. Zhou (2013) also developed an approach for quantifying the efficacy of using out of sample fusion for estimating probabilistic thresholds [168]. Zhou’s work found that out of sample fusion results in shorter confidence intervals compared to using empirical distributions solely based on within sample data. These confidence inter- vals, however, are inherently frequentist in nature, based on the assumption that each sample is the single realization of a long-run frequency of an asymptotically large number of samples. This approach, although not completely useless, is not 82 philosophically compatible with the Bayesian philosophy of statistical estimation. As a result, although such an approach sheds light on the efficacy of out of sample fusion, the approach presented here provides further verification as well as a more legitimate basis for applying the DRE method to Bayesian estimation. Our results are outlined in the Tables 4.1-4.10, which we conducted for sam- ples of size 600, 1000, 5000, 10000, 15000, 20000, and 25000. DRE2.5 and DRE97.5 represent the estimates of the 2.5th and 97.5th percentiles using the DRE method, averaged over the 100 simulations, and E2.5 and E97.5 represent these same averages using the samples’ empirical CDFs. For the Weibull quantile estimation, we esti- mated via maximum likelihood estimation an estimator for τ assuming the data were a random sample from a generalized gamma distribution. We then fixed this esti- mate as the value for τ in the density ratio and proceeded with the semiparametric estimation. We notice that for all of the sample sizes and distributions examined, MSEDRE2.5 andMSEDRE97.5 are substantially lower thanMSEE2.5 andMSEE97.5 . Thus, by sim- ply fusing samples from known distributions with uniformly distributed data and estimating the sample’s distribution using the semiparametric DRE approach with the combined data based on a reasonable choice of a tilt function, we are able to more accurately determine the 2.5th and 97.5th percentiles. As our analysis illustrates, for data that appears to be normally distributed (i.e. symmetric and unimodal, the tilt function h(x) = (x, x2)′ is reasonable, whereas for skewed data a tilt function of the form h(x) = (x, log(x))′ or more generally h(x) = (xτ , log(x))′ is acceptable. 83 Tab. 4.1: Simulation Results based on Samples from a Normal Distribution with mean µ and standard deviation σ, fused with uniformly distributed sample over xmin and xmax, assuming tilt function h(x) = (x, x2)′ N µ σ MSEDRE2.5 MSEDRE97.5 MSEE2.5 MSEE97.5 600 0 1 0.006 0.007 0.012 0.011 600 10 2 0.030 0.018 0.041 0.042 600 5 3 0.076 0.084 0.143 0.111 600 -2 5 0.176 0.190 0.234 0.222 1000 0 1 0.004 0.005 0.008 0.008 1000 10 2 0.015 0.020 0.021 0.027 1000 5 3 0.046 0.040 0.070 0.066 1000 -2 5 0.101 0.119 0.164 0.204 5000 0 1 0.001 0.001 0.001 0.002 5000 10 2 0.018 0.015 0.027 0.031 5000 5 3 0.039 0.041 0.056 0.066 5000 -2 5 0.103 0.114 0.159 0.184 10000 0 1 < 0.001 0.001 0.001 0.001 10000 10 2 0.017 0.017 0.034 0.029 10000 5 3 0.038 0.044 0.060 0.068 10000 -2 5 0.118 0.118 0.164 0.184 15000 0 1 < 0.001 < 0.001 0.001 < 0.001 15000 10 2 0.001 0.001 0.002 0.002 15000 5 3 0.003 0.003 0.004 0.005 15000 -2 5 0.008 0.007 0.012 0.010 20000 0 1 < 0.001 < 0.001 < 0.001 < 0.001 20000 10 2 0.017 0.014 0.029 0.029 20000 5 3 0.041 0.031 0.075 0.070 20000 -2 5 0.090 0.116 0.195 0.161 25000 0 1 < 0.001 < 0.001 < 0.001 < 0.001 25000 10 2 0.017 0.017 0.035 0.024 25000 5 3 0.051 0.042 0.066 0.079 25000 -2 5 0.113 0.067 0.234 0.143 84 Tab. 4.2: Further simulations based on Samples from a student-t Distribution with varying degrees of freedom, fused with uniformly distributed sample over xmin and xmax, assuming tilt function h(x) = (x, x2)′ N df MSEDRE2.5 MSEDRE97.5 MSEE2.5 MSEE97.5 600 10 0.0127 0.0139 0.0253 0.0234 600 15 0.0114 0.0094 0.0191 0.0190 600 20 0.0089 0.0110 0.0146 0.0165 600 25 0.0101 0.0107 0.0147 0.0157 600 30 0.0076 0.0113 0.0140 0.0200 600 35 0.0069 0.0070 0.0118 0.0140 600 40 0.0087 0.0079 0.0148 0.0128 1000 10 0.0127 0.0139 0.0253 0.0234 1000 15 0.0114 0.0094 0.0191 0.0190 1000 20 0.0089 0.0110 0.0146 0.0165 1000 25 0.0101 0.0107 0.0147 0.0157 1000 30 0.0076 0.0113 0.0140 0.0200 1000 35 0.0069 0.0070 0.0118 0.0140 1000 40 0.0087 0.0079 0.0148 0.0128 5000 10 0.0020 0.0017 0.0026 0.0027 5000 15 0.0014 0.0015 0.0024 0.0021 5000 20 0.0015 0.0013 0.0024 0.0023 5000 25 0.0012 0.0013 0.0023 0.0022 5000 30 0.0010 0.0010 0.0016 0.0016 5000 35 0.0011 0.0018 0.0019 0.0016 5000 40 0.0010 0.0010 0.0020 0.0017 10000 10 0.0018 0.0019 0.0012 0.0011 10000 15 0.0017 0.0016 0.0012 0.0010 10000 20 0.0016 0.0016 0.0019 0.0011 10000 25 0.0015 0.0014 0.0019 0.0017 10000 30 0.0016 0.0016 0.0017 0.0010 10000 35 0.0015 0.0014 0.0017 0.0017 10000 40 0.0015 0.0015 0.0018 0.0017 85 Tab. 4.3: Further simulations based on Samples from a student-t Distribution with varying degrees of freedom, fused with uniformly distributed sample over xmin and xmax, assuming tilt function h(x) = (x, x2)′ N df MSEDRE2.5 MSEDRE97.5 MSEE2.5 MSEE97.5 15000 10 0.0017 0.0017 0.0019 0.0019 15000 15 0.0015 0.0015 0.0018 0.0017 15000 20 0.0014 0.0014 0.0017 0.0015 15000 25 0.0014 0.0014 0.0018 0.0016 15000 30 0.0014 0.0014 0.0015 0.0015 15000 35 0.0014 0.0013 0.0016 0.0015 15000 40 0.0014 0.0014 0.0016 0.0016 20000 10 0.0018 0.0016 0.0019 0.0017 20000 15 0.0014 0.0013 0.0015 0.0015 20000 20 0.0012 0.0013 0.0015 0.0015 20000 25 0.0012 0.0013 0.0013 0.0014 20000 30 0.0013 0.0013 0.0014 0.0015 20000 35 0.0012 0.0013 0.0013 0.0015 20000 40 0.0012 0.0012 0.0013 0.0013 25000 10 0.0016 0.0015 0.0016 0.0016 25000 15 0.0013 0.0013 0.0014 0.0014 25000 20 0.0012 0.0013 0.0013 0.0014 25000 25 0.0012 0.0012 0.0014 0.0014 25000 30 0.0012 0.0013 0.0013 0.0015 25000 35 0.0012 0.0012 0.0014 0.0014 25000 40 0.0012 0.0013 0.0014 0.0014 86 Tab. 4.4: Simulations based on Samples from a Gamma Distribution with parameters α and β, assuming tilt function h(x) = (x, log(x))′ N α β MSEDRE2.5 MSEDRE97.5 MSEE2.5 MSEE97.5 600 1 2 0.001 1.367 0.001 2.040 600 3 2 0.092 2.170 0.116 2.655 600 5 2 0.235 2.898 0.279 3.644 600 1 4 < 0.001 0.297 < 0.001 0.366 600 3 4 0.030 0.495 0.032 0.601 600 5 4 0.065 0.642 0.078 0.979 600 7 4 0.104 0.898 0.122 1.073 600 1 8 < 0.001 0.083 < 0.001 0.095 600 3 8 0.005 0.149 0.005 0.167 600 5 8 0.021 0.240 0.024 0.254 600 7 8 0.030 0.255 0.037 0.304 1000 1 2 0.001 0.671 0.001 1.006 1000 3 2 0.053 0.930 0.058 1.287 1000 5 2 0.167 1.693 0.248 2.331 1000 1 4 < 0.001 0.174 < 0.001 0.232 1000 3 4 0.011 0.324 0.013 0.427 1000 5 4 0.046 0.339 0.049 0.467 1000 7 4 0.068 0.559 0.083 0.896 1000 1 8 < 0.001 0.066 < 0.001 0.070 1000 3 8 0.003 0.085 0.003 0.112 1000 5 8 0.013 0.116 0.015 0.138 1000 7 8 0.016 0.148 0.021 0.191 87 Tab. 4.5: Further simulations based on Samples from a Gamma Distribution with param- eters α and β, assuming tilt function h(x) = (x, log(x))′ N α β MSEDRE2.5 MSEDRE97.5 MSEE2.5 MSEE97.5 5000 1 2 < 0.001 0.133 < 0.001 0.211 5000 3 2 0.010 0.214 0.012 0.251 5000 5 2 0.035 0.251 0.041 0.423 5000 1 4 < 0.001 0.042 < 0.001 0.053 5000 3 4 0.003 0.070 0.003 0.087 5000 5 4 0.009 0.093 0.011 0.107 5000 7 4 0.014 0.113 0.018 0.148 5000 1 8 < 0.001 0.010 < 0.001 0.013 5000 3 8 0.001 0.021 0.001 0.022 5000 5 8 0.002 0.024 0.002 0.026 5000 7 8 0.004 0.021 0.004 0.034 10000 1 2 < 0.001 0.074 < 0.001 0.098 10000 3 2 0.004 0.110 0.005 0.161 10000 5 2 0.013 0.160 0.016 0.257 10000 1 4 < 0.001 0.019 < 0.001 0.025 10000 3 4 0.002 0.041 0.002 0.052 10000 5 4 0.004 0.037 0.005 0.047 10000 7 4 0.008 0.051 0.009 0.071 10000 1 8 < 0.001 0.005 < 0.001 0.006 10000 3 8 < 0.001 0.008 < 0.001 0.010 10000 5 8 0.001 0.012 0.001 0.014 10000 7 8 0.002 0.014 0.002 0.016 15000 1 2 < 0.001 0.047 < 0.001 0.063 15000 3 2 0.004 0.070 0.005 0.084 15000 5 2 0.011 0.107 0.013 0.161 15000 1 4 < 0.001 0.015 < 0.001 0.017 15000 3 4 0.001 0.019 0.001 0.023 15000 5 4 0.003 0.036 0.004 0.045 15000 7 4 0.005 0.034 0.006 0.048 15000 1 8 < 0.001 0.004 < 0.001 0.005 15000 3 8 < 0.001 0.006 < 0.001 0.007 15000 5 8 0.001 0.008 0.001 0.009 15000 7 8 0.001 0.010 0.001 0.011 88 Tab. 4.6: Further simulations based on Samples from a Gamma Distribution with param- eters α and β, assuming tilt function h(x) = (x, log(x))′ N α β MSEDRE2.5 MSEDRE97.5 MSEE2.5 MSEE97.5 20000 1 2 < 0.001 0.047 < 0.001 0.065 20000 3 2 0.003 0.055 0.003 0.079 20000 5 2 0.007 0.088 0.009 0.123 20000 1 4 < 0.001 0.010 < 0.001 0.011 20000 3 4 0.001 0.023 0.001 0.026 20000 5 4 0.001 0.020 0.002 0.024 20000 7 4 0.004 0.024 0.004 0.033 20000 1 8 < 0.001 0.002 < 0.001 0.002 20000 3 8 < 0.001 0.004 < 0.001 0.005 20000 5 8 0.001 0.006 0.001 0.007 20000 7 8 0.001 0.007 0.001 0.008 25000 1 2 < 0.001 0.027 < 0.001 0.035 25000 3 2 0.002 0.051 0.003 0.068 25000 5 2 0.007 0.082 0.009 0.133 25000 1 4 < 0.001 0.006 < 0.001 0.007 25000 3 4 0.001 0.013 0.001 0.018 25000 5 4 0.001 0.016 0.002 0.026 25000 7 4 0.002 0.018 0.003 0.024 25000 1 8 < 0.001 0.002 < 0.001 0.002 25000 3 8 < 0.001 0.003 < 0.001 0.004 25000 5 8 < 0.001 0.005 0.001 0.005 25000 7 8 0.001 0.007 0.001 0.009 89 Tab. 4.7: Simulations based on Samples from a Weibull Distribution with parameters α and β, assuming tilt function h(x) = (xτ , log(x))′ N α β MSEDRE2.5 MSEDRE97.5 MSEE2.5 MSEE97.5 600 1 1 0.004 6.345 0.005 6.585 600 1.2 1 0.009 2.726 0.011 3.660 600 1.5 1 0.016 0.955 0.021 1.134 600 2.5 1 0.052 0.140 0.063 0.201 600 1 1.2 0.005 6.388 0.005 7.254 600 1.2 1.2 0.015 2.450 0.016 2.679 600 1.5 1.2 0.024 1.150 0.029 1.790 600 2.5 1.2 0.079 0.212 0.099 0.266 600 1 1.5 0.010 10.454 0.011 10.995 600 1.2 1.5 0.022 4.000 0.028 5.585 600 1.5 1.5 0.049 2.727 0.064 2.945 600 2.5 1.5 0.087 0.364 0.118 0.491 600 1 2.5 0.033 29.605 0.035 39.361 600 1.2 2.5 0.068 12.825 0.072 16.565 600 1.5 2.5 0.124 5.034 0.142 8.541 600 2.5 2.5 0.274 1.023 0.370 1.202 1000 1 1 0.003 2.524 0.003 3.272 1000 1.2 1 0.006 1.417 0.008 1.653 1000 1.5 1 0.010 0.551 0.013 0.722 1000 2.5 1 0.021 0.079 0.033 0.119 1000 1 1.2 0.004 3.900 0.005 5.232 1000 1.2 1.2 0.009 1.838 0.009 2.516 1000 1.5 1.2 0.015 0.727 0.019 1.008 1000 2.5 1.2 0.034 0.125 0.041 0.129 1000 1 1.5 0.006 7.696 0.006 9.494 1000 1.2 1.5 0.015 2.571 0.017 3.905 1000 1.5 1.5 0.025 1.546 0.033 1.871 1000 2.5 1.5 0.068 0.181 0.084 0.232 1000 1 2.5 0.012 15.829 0.012 22.411 1000 1.2 2.5 0.041 8.823 0.048 12.399 1000 1.5 2.5 0.081 3.573 0.101 4.808 1000 2.5 2.5 0.198 0.662 0.236 0.755 90 Tab. 4.8: Further Simulations based on Samples from a Weibull Distribution with param- eters α and β, assuming tilt function h(x) = (xτ , log(x))′ N α β MSEDRE2.5 MSEDRE97.5 MSEE2.5 MSEE97.5 5000 1 1 < 0.001 0.459 < 0.001 0.559 5000 1.2 1 0.001 0.256 0.001 0.353 5000 1.5 1 0.003 0.097 0.003 0.122 5000 2.5 1 0.005 0.021 0.007 0.025 5000 1 1.2 0.001 0.931 0.001 1.274 5000 1.2 1.2 0.002 0.452 0.002 0.722 5000 1.5 1.2 0.004 0.184 0.004 0.253 5000 2.5 1.2 0.008 0.029 0.009 0.036 5000 1 1.5 0.002 1.319 0.002 2.066 5000 1.2 1.5 0.002 0.452 0.002 0.722 5000 1.5 1.5 0.004 0.280 0.004 0.351 5000 2.5 1.5 0.011 0.035 0.016 0.037 5000 1 2.5 0.003 0.626 0.003 0.759 5000 1.2 2.5 0.007 1.608 0.008 1.826 5000 1.5 2.5 0.010 0.585 0.016 0.693 5000 2.5 2.5 0.045 0.116 0.054 0.169 4.3 The Impact of Welfare Reform 4.3.1 Data We obtained data of AFDC (1982-1996) monthly caseloads and annual dis- continuances in caseloads due to employment from the Quarterly Public Assistance Statistics [157]. The TANF data (1997-2009) was obtained from the HHS website as well as the each of program’s annual reports to Congress [25–33]. Because the data consisted of thousands of caseloads and discontinuances for each each state, we changed the units of our data to thousands of caseloads to make our estima- tion easier. Additionally, we divided the annual discontinuance by 12 to represent average monthly discontinuances to have both covariates have the same units of measurement. Across all 53 territories, AFDC programs had slightly more than 75,000 caseloads with approximately 7,500 discontinuances due to employment over 91 Tab. 4.9: Further simulations based on Samples from a Weibull Distribution with param- eters α and β, assuming tilt function h(x) = (xτ , log(x))′ N α β MSEDRE2.5 MSEDRE97.5 MSEE2.5 MSEE97.5 10000 1 1 < 0.001 0.319 < 0.001 0.392 10000 1.2 1 < 0.001 0.124 0.001 0.154 10000 1.5 1 0.001 0.059 0.001 0.063 10000 2.5 1 0.003 0.008 0.003 0.009 10000 1 1.2 0.002 3.210 0.002 4.069 10000 1.2 1.2 0.001 0.186 0.001 0.267 10000 1.5 1.2 0.002 0.092 0.002 0.113 10000 2.5 1.2 0.004 0.014 0.005 0.020 10000 1 1.5 0.001 0.634 0.001 0.761 10000 1.2 1.5 0.001 0.339 0.001 0.456 10000 1.5 1.5 0.003 0.095 0.003 0.152 10000 2.5 1.5 0.005 0.024 0.006 0.029 10000 1 2.5 0.002 1.924 0.002 2.693 10000 1.2 2.5 0.003 0.733 0.004 1.172 10000 1.5 2.5 0.006 0.276 0.008 0.361 10000 2.5 2.5 0.016 0.071 0.018 0.090 15000 1 1 < 0.001 0.222 < 0.001 0.284 15000 1.2 1 < 0.001 0.106 < 0.001 0.134 15000 1.5 1 0.001 0.034 0.001 0.043 15000 2.5 1 0.002 0.009 0.003 0.010 15000 1 1.2 < 0.001 0.282 < 0.001 0.476 15000 1.2 1.2 < 0.001 0.108 0.001 0.154 15000 1.5 1.2 0.001 0.049 0.001 0.077 15000 2.5 1.2 0.003 0.009 0.003 0.012 15000 1 1.5 < 0.001 0.377 < 0.001 0.515 15000 1.2 1.5 0.001 0.196 0.001 0.265 15000 1.5 1.5 0.002 0.082 0.002 0.108 15000 2.5 1.5 0.004 0.014 0.005 0.020 15000 1 2.5 0.001 1.260 0.001 1.896 15000 1.2 2.5 0.002 0.432 0.002 0.633 15000 1.5 2.5 0.005 0.197 0.006 0.304 15000 2.5 2.5 0.009 0.039 0.011 0.075 92 Tab. 4.10: Further simulations based on Samples from a Weibull Distribution with pa- rameters α and β, assuming tilt function h(x) = (xτ , log(x))′ N α β MSEDRE2.5 MSEDRE97.5 MSEE2.5 MSEE97.5 20000 1 1 < 0.001 0.175 < 0.001 0.201 20000 1.2 1 < 0.001 0.059 < 0.001 0.075 20000 1.5 1 0.001 0.024 0.001 0.032 20000 2.5 1 0.001 0.006 0.001 0.007 20000 1 1.2 < 0.001 0.307 < 0.001 0.348 20000 1.2 1.2 0.001 0.100 0.001 0.124 20000 1.5 1.2 0.001 0.039 0.001 0.061 20000 2.5 1.2 0.002 0.009 0.003 0.011 20000 1 1.5 < 0.001 0.307 < 0.001 0.442 20000 1.2 1.5 0.001 0.129 0.001 0.186 20000 1.5 1.5 0.001 0.061 0.001 0.072 20000 2.5 1.5 0.003 0.014 0.003 0.015 20000 1 2.5 0.001 0.861 0.001 1.124 20000 1.2 2.5 0.002 0.377 0.002 0.505 20000 1.5 2.5 0.004 0.191 0.004 0.265 20000 2.5 2.5 0.008 0.030 0.009 0.041 25000 1 1 < 0.001 0.125 < 0.001 0.150 25000 1.2 1 < 0.001 0.045 < 0.001 0.065 25000 1.5 1 0.001 0.019 0.001 0.025 25000 2.5 1 0.001 0.004 0.001 0.005 25000 1 1.2 < 0.001 0.146 < 0.001 0.208 25000 1.2 1.2 < 0.001 0.089 < 0.001 0.118 25000 1.5 1.2 0.001 0.030 0.001 0.035 25000 2.5 1.2 0.001 0.007 0.002 0.008 25000 1 1.5 < 0.001 0.170 < 0.001 0.277 25000 1.2 1.5 < 0.001 0.103 < 0.001 0.137 25000 1.5 1.5 0.001 0.045 0.001 0.063 25000 2.5 1.5 0.003 0.009 0.003 0.010 25000 1 2.5 0.001 0.946 0.001 1.090 25000 1.2 2.5 0.001 0.403 0.001 0.482 25000 1.5 2.5 0.003 0.118 0.004 0.181 25000 2.5 2.5 0.007 0.027 0.009 0.044 93 a fifteen year time horizon. TANF, on the other hand, had over 40,000 caseloads with over 7,000 discontinuances due to employment over the course of thirteen years. 4.3.2 Analysis of the PROWA Using Bayesian Lagged Regression Using our data, we regressed our dependent variables, yt discontinuances in caseloads due to employment at year t, against our explanatory variable, total caseloads in a previous year t′, xt′ . We estimated ten instances of the model defined in equation 3. In particular, we estimated separate regressions regarding the impact of each of the five previous year’s caseloads (t − 1, t − 2, t − 3, t − 4, and t − 5) on discontinuances due to employment for [1] AFDC programs and for [2] TANF programs as TANF had instituted a five year time limit.2 We used WinBUGS to run a Bayesian MCMC sampler over the parameter space of the hierarchical model outlined in equations (4.1) to (4.8) to sample the posterior distribution [50, 97]. We ran our MCMC sampler for 30,000 draws, using the first half for “burn-in” to dissipate initial conditions. In the above model, Φ0 and ν0 of equation 6 were randomly generated. We ran our simulations several times over different random choices of Φ0 and ν0 and found similar results. These results suggested that the random generation of Φ0 and ν0 had no impact on the results generated by our MCMC sampler. Additionally, our posterior samples overall had a low degree of autocorrelation, indicating that we had likely not oversampled particular regions within our density. After completion of the MCMC sampler, we estimated posterior means, stan- dard deviations, and 95% credible intervals of all posterior samples.3 Since we had truncated the MCMC sampler after 30,000 draws, however, there is still some con- 2 Concatenating the explanatory variables across multiple lags for a larger multiple regression analysis resulted in multiciollinearity issues due to correlations between lagged caseloads and hence was not performed as part of this analysis. 3 A 95% credible interval for a univariate sample is a Bayesian interval estimator and denotes an interval bounded by the 2.5th and 97.5th percentiles of the posterior sample, serving as an estimate for the interval bounded by the 2.5th and 97.5th quantiles of the distribution. 94 Fig. 4.1: Histogram of Posterior Sample for Posterior Intercept Mean Coefficient (µ0), TANF regression, Lag 1 cern about missing information in the posterior sample. We therefore used the DRE method to improve upon our posterior sample. The posterior densities for µ0 and µ1 were symmetric and unimodal and were therefore appropriate for analysis using the tilt function h(x) = (x, x2)′ described in (4.12). The posterior densities of σ20 and σ21, however, appeared to be slightly skewed and we therefore more appropriate for analysis using the tilt function h(x) = (x, log(x))′ described in (4.13). The plots for a few selected coefficients are depicted in Figures 4.1-4.4 illustrating the reason for our choices of tilt functions: For our posterior samples of µ0 and µ1, we took each sample of the marginal posterior (sample size 15,000), used as a reference density a random sample of equal size from a uniform distribution over the posterior sample’s range, and applied the DRE method to estimate the cumulative distribution function (CDF) of the marginal posterior for this new sample of size 30,000. We then equated the CDF to 95 Fig. 4.2: Histogram of Posterior Slope Mean Coefficient (µ1), TANF regression, Lag 1 Fig. 4.3: Histogram of Posterior Intercept Variance Coefficient (σ20), TANF regression, Lag 1 96 Fig. 4.4: Posterior Slope Variance Coefficient, TANF regression (σ21), Lag 1 0.025 and solved the resulting equation to generate a lower limit for the 95% credible interval. We took a similar approach, equating the CDF to 0.975, to generate an upper estimate for the credible interval. We refer to this newly estimated region, based on semiparametric out of sample fusion, as the posterior density’s 95% credible interval. Our results are outlined in Tables 4.11-4.30. Tab. 4.11: AFDC Regression of Discontinuances as a Function of Caseloads - One Year Time Lag Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit µ0 0.072 0.173 -0.263 0.417 µ1 0.009 0.075 -0.136 0.155 σ20 0.773 0.188 0.486 1.227 σ21 0.290 0.049 0.208 0.400 97 Tab. 4.12: AFDC Regression of Discontinuances as a Function of Caseloads, Interval Es- timates and Refinements - One Year Time Lag 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit µ0 -0.263 0.417 -0.264 0.411 µ1 -0.136 0.155 -0.136 0.156 σ20 0.486 1.227 0.471 1.192 σ21 0.208 0.400 0.204 0.396 Tab. 4.13: TANF Regression of Discontinuances as a Function of Caseloads - One Year Time Lag Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit µ0 0.068 0.151 -0.228 0.365 µ1 0.014 0.075 -0.132 0.163 σ20 0.630 0.142 0.408 0.961 σ21 0.292 0.050 0.209 0.406 Tab. 4.14: TANF Regression of Discontinuances as a Function of Caseloads, Interval Es- timates and Refinements - One Year Time Lag 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit µ0 -0.228 0.365 -0.228 0.364 µ1 -0.132 0.163 -0.135 0.162 σ20 0.408 0.961 0.398 0.947 σ21 0.209 0.406 0.206 0.401 Tab. 4.15: AFDC Regression of Discontinuances as a Function of Caseloads - Two Year Time Lag Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit µ0 0.272 0.183 -0.094 0.624 µ1 0.001 0.074 -0.146 0.149 σ20 0.761 0.186 0.479 1.196 σ21 0.290 0.050 0.208 0.405 98 Tab. 4.16: AFDC Regression of Discontinuances as a Function of Caseloads, Interval Es- timates and Refinements - Two Year Time Lag 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit µ0 -0.094 0.624 -0.088 0.628 µ1 -0.146 0.149 -0.145 0.148 σ20 0.479 1.196 0.462 1.172 σ21 0.208 0.405 0.204 0.399 Tab. 4.17: TANF Regression of Discontinuances as a Function of Caseloads - Two Year Time Lag Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit µ0 0.240 0.151 -0.059 0.535 µ1 0.000 0.076 -0.148 0.150 σ20 0.578 0.125 0.380 0.870 σ21 0.292 0.050 0.210 0.404 Tab. 4.18: TANF Regression of Discontinuances as a Function of Caseloads, Interval Es- timates and Refinements - Two Year Time Lag 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit µ0 -0.059 0.535 -0.059 0.534 µ1 -0.148 0.150 -0.149 0.150 σ20 0.380 0.870 0.372 0.854 σ21 0.210 0.404 0.207 0.399 4.3.3 Statistical Inferences Based on Hierarchical Bayesian Analysis of PROWA Although our overall coefficient estimates are small in magnitude, the reader should be reminded of the fact that the data set analyzed consisted of observations in units of thousands. Therefore, slight differences between the coefficients signify notable differences. For the first year time lag model, the marginal posterior mean for µ1 is quite higher in Table 4.11 than in Table 4.13. A two sample Kolmogrov Smirnov test of the sample of the two posterior samples rejects the null hypothesis of distributional equality at p < 0.001, suggesting that there is a marked difference 99 Tab. 4.19: AFDC Regression of Discontinuances as a Function of Caseloads - Three Year Time Lag Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit µ0 0.196 0.196 -0.184 0.582 µ1 0.000 0.076 -0.147 0.150 σ20 0.765 0.201 0.465 1.237 σ21 0.292 0.050 0.210 0.406 Tab. 4.20: AFDC Regression of Discontinuances as a Function of Caseloads, Interval Es- timates and Refinements - Three Year Time Lag 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit µ0 -0.184 0.582 -0.189 0.583 µ1 -0.147 0.150 -0.148 0.150 σ20 0.465 1.237 0.448 1.212 σ21 0.210 0.406 0.206 0.401 Tab. 4.21: TANF Regression of Discontinuances as a Function of Caseloads - Three Year Time Lag Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit µ0 0.390 0.167 0.069 0.719 µ1 -0.005 0.075 -0.153 0.143 σ20 0.716 0.172 0.449 1.121 σ21 0.291 0.050 0.208 0.405 Tab. 4.22: TANF Regression of Discontinuances as a Function of Caseloads, Interval Es- timates and Refinements - Three Year Time Lag 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit µ0 0.069 0.719 0.065 0.720 µ1 -0.153 0.143 -0.154 0.143 σ20 0.449 1.121 0.437 1.095 σ21 0.208 0.405 0.204 0.399 100 Tab. 4.23: AFDC Regression of Discontinuances as a Function of Caseloads - Four Year Time Lag Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit µ0 0.149 0.212 -0.26 0.563 µ1 0.003 0.076 -0.147 0.153 σ20 0.722 0.175 0.453 1.13 σ21 0.292 0.05 0.211 0.407 Tab. 4.24: AFDC Regression of Discontinuances as a Function of Caseloads, Interval Es- timates and Refinements - Four Year Time Lag 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit µ0 -0.26 0.563 -0.265 0.562 µ1 -0.147 0.153 -0.146 0.152 σ20 0.453 1.13 0.441 1.108 σ21 0.211 0.407 0.206 0.401 Tab. 4.25: TANF Regression of Discontinuances as a Function of Caseloads - Four Year Time Lag Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit µ0 0.44 0.171 0.111 0.788 µ1 -0.008 0.075 -0.156 0.141 σ20 0.821 0.194 0.513 1.266 σ21 0.293 0.051 0.21 0.407 Tab. 4.26: TANF Regression of Discontinuances as a Function of Caseloads, Interval Es- timates and Refinements - Four Year Time Lag 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit µ0 0.111 0.788 0.106 0.782 µ1 -0.156 0.141 -0.156 0.141 σ20 0.513 1.266 0.5 1.246 σ21 0.21 0.407 0.206 0.403 101 Tab. 4.27: AFDC Regression of Discontinuances as a Function of Caseloads - Five Year Time Lag Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit µ0 0.269 0.241 -0.203 0.742 µ1 -0.002 0.076 -0.15 0.145 σ20 0.822 0.213 0.501 1.325 σ21 0.292 0.051 0.21 0.407 Tab. 4.28: AFDC Regression of Discontinuances as a Function of Caseloads, Interval Es- timates and Refinements - Five Year Time Lag 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit µ0 -0.203 0.742 -0.207 0.74 µ1 -0.15 0.145 -0.15 0.145 σ20 0.501 1.325 0.483 1.296 σ21 0.21 0.407 0.206 0.402 Tab. 4.29: TANF Regression of Discontinuances as a Function of Caseloads - Five Year Time Lag Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit µ0 0.541 0.175 0.205 0.889 µ1 -0.013 0.075 -0.159 0.138 σ20 0.744 0.191 0.458 1.19 σ21 0.292 0.05 0.21 0.404 Tab. 4.30: TANF Regression of Discontinuances as a Function of Caseloads, Interval Es- timates and Refinements - Five Year Time Lag 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit µ0 0.205 0.889 0.198 0.886 µ1 -0.159 0.138 -0.161 0.136 σ20 0.458 1.19 0.444 1.171 σ21 0.21 0.404 0.207 0.401 102 between the two posterior distributions.4 Overall, these result suggests that the previous year’s caseloads for TANF had a notably stronger impact in getting people off the welfare rolls and back to work than the previous year’s caseloads for AFDC programs did. The other models with the longer lagged covariates suggest that total caseloads had either a negligible or even a negative effect on employment-based discontinuances. There are also significant differences in using DRE to refine the 95% credible intervals. For example, the interval estimates regarding µ1 are altered by a factor of several thousand discontinuances per caseload, depending on the year. Additionally, the posterior distributions of the variances are also altered substantially. Regardless, however, the posterior estimates of σ20 and σ 2 1 all remain relatively small, suggesting that there is not much uncertainty pertaining to the relationship between caseloads and discontinuances. An additional fully frequentist statistical analysis is included in this chapter’s Appendix. This analysis also finds similar results to those above. 4.4 Additional Analysis In addition to the analysis above, we conducted further analysis to examine the efficacy of transforming AFDC to TANF. As our data of caseloads and discon- tinuances spanned from 1982-2009, we conducted another rigorous Bayesian analysis over the entire time horizon, with categorically-coded coefficients representing time in terms of year. Comparisons of these coefficient estimates from before the PROWA 4 As mentioned earlier, autocorrelations of the sample were quite low, suggesting that indepen- dence of the sample, a sufficient condition for being able to apply the Kolmogorov Smirnov test, was not an unreasonable assumption. 103 to after the PROWA can enable us to understand the efficacy of the reforms them- selves. Again, following the notation corresponding to our previous model, we can again define this new hierarchical Bayesian model over i ∈ {1, . . . , 53} territories (50 states as well as Guam, Puerto Rico, and the District of Columbia) (units) over t ∈ {1982, . . . , 2008} years as follows: yi ∼MVN(βi,0 + βi,1xi,1 + βt,Σ) (4.25) βi,0 ∼ N(µ0, σ 2 0) (4.26) βi,1 ∼ N(µ1, σ 2 1) (4.27) βt ∼ N(0, 10) (4.28) Σ−1 ∼Wishart(Φ0, ν0) (4.29) µ0 ∼ N(0, 10) (4.30) µ1 ∼ N(0, 10) (4.31) σ−20 ∼ Γ(10, 10) (4.32) σ−21 ∼ Γ(10, 10) (4.33) This model is quite similar to the model we examined earlier, but runs across the data set’s entire time horizon and also contains a term βt that quantifies the impact of time t′ on caseload reduction at time t. A posterior examination of this coefficient before and after the introduction of TANF can enable us to understand the impact of the 1996 reform. Just as we did earlier, we defined β1,0 = 0 as well as β1982=0 to ensure statistical identifiability of our model. We ran this Bayesian MCMC sampler for 30,000 iterations, using first half to dissipate our initial con- ditions and the remaining half for statistical inference. Our Bayesian posterior 104 estimates are outlined in Tables 4.31-4.32. As our posterior samples of µ0, µ1, and βt appeared to be symmetric and unimodal we again utilized our DRE approach to improve the estimation of our Bayesian credible intervals. Tab. 4.31: Additional Analysis Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit µ0 0.093 0.167 -0.231 0.428 µ1 0.006 0.074 -0.139 0.150 β1983 0.042 0.067 -0.091 0.174 β1984 -0.008 0.062 -0.131 0.113 β1985 -0.028 0.073 -0.175 0.114 β1986 0.010 0.080 -0.146 0.165 β1987 0.011 0.077 -0.142 0.160 β1988 0.032 0.082 -0.130 0.191 β1989 0.015 0.081 -0.146 0.175 β1990 0.048 0.082 -0.116 0.206 β1991 0.086 0.087 -0.091 0.256 β1992 0.076 0.082 -0.091 0.232 β1993 0.064 0.091 -0.120 0.240 β1994 0.199 0.100 0.005 0.394 β1995 0.189 0.086 0.019 0.356 β1996 0.573 0.183 0.201 0.919 β1997 0.512 0.158 0.191 0.811 β1998 0.290 0.123 0.044 0.531 β1999 0.294 0.111 0.074 0.511 β2000 0.249 0.104 0.040 0.451 β2002 0.272 0.115 0.048 0.498 β2003 0.305 0.116 0.079 0.534 β2004 0.294 0.115 0.069 0.518 β2005 0.344 0.134 0.084 0.613 β2006 0.382 0.162 0.060 0.702 β2007 0.298 0.151 -0.003 0.589 β2008 0.256 0.157 -0.059 0.568 σ20 0.778 0.164 0.508 1.151 σ21 0.288 0.049 0.207 0.397 We see a noticeable increase in our coefficient estimates pertaining to time after 1997, once the law had become fully enacted. In fact, the small posterior estimates of βt from the 1980s and early 1990s suggest that AFDC programs at the 105 Tab. 4.32: Additional Analysis, Interval Estimates and Refinements 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Lower Limit Upper Limit µ0 -0.231 0.428 -0.234 0.427 µ1 -0.139 0.150 -0.138 0.151 β1983 -0.091 0.174 -0.091 0.173 β1984 -0.131 0.113 -0.130 0.113 β1985 -0.175 0.114 -0.173 0.116 β1986 -0.146 0.165 -0.148 0.166 β1987 -0.142 0.160 -0.140 0.161 β1988 -0.130 0.191 -0.129 0.193 β1989 -0.146 0.175 -0.145 0.175 β1990 -0.116 0.206 -0.114 0.208 β1991 -0.091 0.256 -0.089 0.256 β1992 -0.091 0.232 -0.087 0.235 β1993 -0.120 0.240 -0.117 0.241 β1994 0.005 0.394 0.004 0.393 β1995 0.019 0.356 0.019 0.357 β1996 0.201 0.919 0.210 0.926 β1997 0.191 0.811 0.198 0.818 β1998 0.044 0.531 0.046 0.532 β1999 0.074 0.511 0.074 0.513 β2000 0.040 0.451 0.042 0.453 β2002 0.048 0.498 0.047 0.496 β2003 0.079 0.534 0.076 0.535 β2004 0.069 0.518 0.069 0.518 β2005 0.084 0.613 0.082 0.608 β2006 0.060 0.702 0.061 0.700 β2007 -0.003 0.589 0.001 0.590 β2008 -0.059 0.568 -0.053 0.567 σ20 0.508 1.151 0.500 1.134 σ21 0.207 0.397 0.203 0.392 time were essentially just a poverty trap, lulling people into poverty and hindering upward mobility. These posterior estimates increase notably in the 1990s, especially after the law was introduced in 1996. In Table 32, we present our Bayesian 95% credible intervals and subsequent refinements. We find that all 95% credible intervals regarding time before the 1996 welfare reform all had negative probability mass, whereas after the law had become fully enacted, they had no negative mass until β2007 and β2008. After our posterior refinements, however, all were positive except for β2008. This negativity was due to our MCMC sampler not properly summarizing the posterior. The higher point and interval estimates of these “post-welfare reform” coefficients, indicate that the law had indeed been successful in reducing dependence on government assistance. 4.4.1 Policy Implications Our analysis clearly illustrates the great success of welfare reform. Specifically, our Bayesian models suggest that the previous year’s caseloads had a significantly 106 more notable impact on getting people off of the welfare rolls and back to work under the TANF program compared to the AFDC program, particularly within one year. Subsequent year lags had a negligible or even a detrimental impact on caseload reductions due to employment. These general findings are substantiated by research produced by The Her- itage Foundation, the Cato Institute, and the Brookings Institution. In particular, research from these well-known think tanks has clearly illustrated that as a re- sult of welfare reform, caseloads declined, earnings increased, and poverty rates dropped [61, 128, 145, 152]. More rigorous work standards and time limits would almost surely improve on these results. 4.5 Conclusions and Future Research Our study offers a number of significant contributions. Firstly, our work also sheds light on the efficacy of welfare reform. Although safety nets are an impor- tant component of society, they should be constructed in a manner to help provide temporary assistance to get people back to work and become contributing members of society. If these programs are just an open-ended government handout, recipi- ents can become overly dependent on them and can subsequently become incapable of realizing the American dream. As Charles Murray’s Losing Ground illustrated, these programs have the potential to trap generations upon generations of people in poverty [111]. Although the PROWA is a case study in successful welfare reform, the law reformed only one of approximately seventy federal welfare programs [135]. Policymakers should consider similar reforms for these other programs. More fun- damentally, from a policymaker’s perspective, it is always important to understand the efficacy of policy reforms and make effective policy recommendations. Future re- 107 search should similarly examine other programs including Social Security, Medicare, Medicaid, public housing, and the Affordable Care Act among others. Methodologically, we have improved upon Bayesian credible interval estima- tion by applying semiparametric density ratio estimation to truncated posterior samples. Although this semiparametric modeling technique has had a number of applications in frequentist models, we are the first to apply the approach to Bayesian estimation. These methodological improvements of course need not only apply to Bayesian models pertaining to public policy research and are worthwhile tools to be applied in many other applied settings. 4.6 Appendix 4.6.1 An Elementary Frequentist Analysis As an additional test of our work, we also performed a frequentist analysis examining the impact of AFDC/TANF caseloads on discontinuances due to employ- ment. We estimated essentially the same model we estimated earlier, but instead made the (rather restrictive) assumption of homogeneity amongst the coefficients. Specifically, we estimated the simple linear model: yi,t = β0 + β1xi,t + i,t, (4.34) where yi,t is the number of employment related discontinuances for state/territory i at time t and xi,t′ is the number of caseloads at time t′, where t′ < t is a lagged time. We assumed i,t ∼ N(0, 1). Our results are presented in Table 4.33. In all five lagged comparisons, we see that the slope coefficient for TANF programs is higher than the slope coefficient for AFDC programs, suggesting that TANF was more successful than AFDC programs in getting people back to work. 108 Tab. 4.33: Frequentist Statistical Analysis Coefficient Estimate Standard Error t-stat P-value Adjusted R2 AFDC Lag 1 βˆ0 0.167 0.023 7.427 < 0.001 0.524 βˆ1 0.005 0.000 28.807 < 0.001 TANF Lag 1 βˆ0 0.184 0.033 5.653 < 0.001 0.561 βˆ1 0.011 0.000 28.722 < 0.001 AFDC Lag 2 βˆ0 0.172 0.024 7.141 < 0.001 0.513 βˆ1 0.005 0.000 27.147 < 0.001 TANF Lag 2 βˆ0 0.142 0.030 4.785 < 0.001 0.636 βˆ1 0.011 0.000 32.122 < 0.001 AFDC Lag 3 βˆ0 0.177 0.026 6.878 0.144 0.495 βˆ1 0.005 0.000 25.186 < 0.001 TANF Lag 3 βˆ0 0.098 0.028 3.492 0.002 0.690 βˆ1 0.011 0.000 34.544 < 0.001 AFDC Lag 4 βˆ0 0.185 0.028 6.605 < 0.001 0.478 βˆ1 0.005 0.000 23.380 < 0.001 TANF Lag 4 βˆ0 0.086 0.030 2.859 0.004 0.684 βˆ1 0.011 0.000 32.678 < 0.001 AFDC Lag 5 βˆ0 0.197 0.031 6.407 < 0.001 0.457 βˆ1 0.005 0.000 21.314 < 0.001 TANF Lag 5 βˆ0 0.079 0.033 2.373 0.018 0.670 βˆ1 0.011 0.000 29.507 < 0.001 109 To ensure that these differences were not insignificant, we tested the null hypothesis H0 for each lag that the slope coefficient for TANF programs, β1, was equal to the slope coefficient for AFDC programs for that same time lag. For example, for TANF regression results for lag 1, we tested the null hypothesis H0 that β0 = 0.005 against the two-sided alternative Ha that β0 6= 0.005. Our results are outlined in Table 4.34 for all five lags: Tab. 4.34: Statistical Significance of Slope Coefficients Lag t-stat p-value Lag 1 15.890 < 0.001 Lag 2 19.667 < 0.001 Lag 3 19.333 < 0.001 Lag 4 18.667 < 0.001 Lag 5 15.714 < 0.001 These results indicate a statistically significant difference between the slope coefficients of the lagged TANF models compared to the coefficients corresponding to the analogous AFDC models. These results lend further credence to the argument that TANF programs were more successful than AFDC programs in getting people off the welfare rolls and back to work. 110 Chapter 5: Generalized Bayesian Inferences for Counterterrorism Pol- icy with Improved Credible Interval Estimation via Semi- parametric Out of Sample Fusion 5.1 Introduction 5.1.1 Combating Terrorism From the Siccari during the first century siege of Jerusalem, to the Assassins during the 11th century in what is today’s Middle East, to the Thugs in India during the 1400s to 1800s, to the Wall Street bombings of the early 1920s, terrorism has been an issue endangering civilians for generations [69,126]. Today, myriads of questions abound regarding counterterrorsim policy throughout the world including questions pertaining to the U.S. military operations in Iraq and Afghanistan, the Troubles in Northern Ireland, the post-Suharto years in Indonesia, and the twenty-six year long civil war in Sri Lanka among others. Amongst scholars in the field of international relations, terrorism is gener- ally defined as “the deliberate use or threat of force against noncombatants by a non-state actor in pursuit of a political goal” [15]. Understanding how to fight ter- rorists has been an issue that policymakers throughout the world have debated and grappled with for years. Fighting terrorists is inherently different from many of the major conflicts of the twentieth century. During World War I and World War II, for example, both sides had uniformed combatants representing nation-states at war. During the Cold War, both sides understood the concept of mutually assured destruction, and consequently did not want to be annihilated in a retaliatory strike. Terrorists, however, are markedly different from traditional enemy combat- ants, as they generally blend in with civilians and deliberately target innocent men, women, and children, as well as military personnel. Many are typically brainwashed by the belief that a better life awaits them for slaughtering their enemies [38]. As we continue on into the twenty-first century, fighting terrorism remains an issue, and it is important to equip policymakers with the tools to be able to do so. A Google Scholar search of the keyword “terrorism” yields nearly 893,000 studies on the topic. Many of these studies include research statistically examining the risk of terrorist attacks as well as looking at certain types of terrorist attacks in particular localities across the globe [40, 71, 77, 117, 120, 155]. In 1971, Hawkes conducted a cluster analysis of terrorism data [62]. Enders (2007) offers a compre- hensive review of research on measuring terrorism, the efficacy of counterterroism policies, and the causes of terrorism and its various manifestations [35]. In the 1980s, Holden (1986, 1987) examined the “contagion effect” of American aircraft hijack- ings [65, 66]. Enders and Sandler (1995) examine terrorist behavior from game and choice theoretic perspectives [36]. Sandler and Arce (2003) discuss game theoretic analyses of terrorism and their various policy implications [144]. Li and Schaub (2005) conduct a time series analysis and examine the effect of economic global- ization on terrorist attacks [95]. Kaplan et al (2005, 2006) look at the impact on different “counterterror tactics,” on suicide bombings in Israel [75, 76]. Lewis et al (2012) statistically examine the changes over time in civilian deaths in Iraq as a result of terrorism [94]. In recent years, a number of researchers have utilized very sophisticated statistical modeling including negative binomial distribution models along with self-exiting hurdle models to examine the incidence as well as the prob- 112 ability of terrorist attacks [121,162,163] Although these papers are interesting both mathematically and from a policy perspective, these studies have typically been restricted to simply just one conflict or one type of terrorist attack. Additionally, much of the heavily statistical research in this area provides limited advice to policymakers about how to fortify specific security measures to prevent various types of terrorist attacks. In this chapter, we rigorously analyze terrorist attack data from a number of major conflicts throughout the world. In particular, we utilize Bayesian logistic regression techniques to offer counterterrorism strategies in four major conflicts across the globe. Kyung et al (2011) also conducted an analysis of terrorist attack data; however, their statistical models were not constructed in a way to offer prescriptive policy advice [90]. We not only present a model capable of providing such advice but also improve on Bayesian credible interval estimation techniques in the process. Our study helps shed light on the factors that influence the success of terrorist attacks, providing policymakers with advice on how to more strategically target their security and resources to help them battle against this very dangerous enemy. 5.1.2 Dirichlet Process Priors and the Limitations Bayesian Parametric methods With consistent improvements in statistical computing capabilities over the last several decades, the use of Bayesian methods has become increasingly com- mon in applied research. Bayesian methods are an attractive approach for modeling real-world phenomena for a number of reasons. One reason is that the associated sta- tistical inferences from such models are based conditionally on existing data rather than on the distributional properties of estimators or test statistics calculated over a long-run frequency of many imaginary unobserved samples. Additionally, Bayesian methods provide us with “exact-sample” results, rather than being rooted in the typical asymptotic theory that most frequentist statistical estimation methods gen- 113 erally assume. Thirdly, and perhaps most importantly, Bayesian methods enable us to tackle problems with high-dimensional parameter spaces that would generally be impossible to estimate from a purely frequentist perspective. One of the most controversial aspects of Bayesian methods, however, is the formulation of a prior distribution. Typically, prior assumptions about a model’s parameters are made subjectively by the researcher. Often these prior distributions belong to well-known parametric families. Normal and uniform distributions are for instance workhorse examples of priors in the applied Bayesian statistical literature. A common criticism with these types of parametric prior distributions, how- ever, is that since they are subjectively chosen by the researcher their distributional assumptions may therefore not necessarily model reality. The normal distribution, for example, has a limited degree of flexibility as it is unimodal, does not accom- modate skewness, and has relatively thin tails. If a normally distributed prior is a misspecification to begin with, then misleading inferences can result. Other choices of parametric prior distributions have similar types of limitations. Dirichlet Process priors (hereafter referred to as DP priors) do not have these restrictions as they allow researchers to weaken the restrictive assumptions generally concomitant with Bayesian statistical models [4,41]. These prior distributions have been used in a number of settings including applied economics research, health policy research, and examining the incidence of illness among others [3, 72, 88]. DP priors avoid the typical strict parametric assumptions regarding heterogeneity mentioned above and instead utilize an unknown distribution G to model heterogeneity. As G is assumed to be random, a DP Prior can be placed on this distribution. Dirichlet Processes thus enable the researcher to place a probability distribution over a space of probability distributions. Mathematically, we can describe a DP prior G ∼ DP (G|G0, α) consists of two “parameters:” G0, a parametric baseline probability measure and a concentration 114 parameter α. The baseline measure G0 can be considered to be a prior assumption regarding the population distribution. The Dirichlet Process transforms this baseline measure into a discrete probability distribution with the concentration parameter α determining how close the non-parametric distribution G is to the baseline measure. Smaller values of α indicate greater departure from the baseline measure. In the limit, as α → ∞, G ⇒ G0 (i.e. the Dirichlet Process converges in the measure to the baseline measure). A commonly-used choice for the baseline measure is a normal distribution, which the Dirichlet Process discretizes. Since discrete probability measures have non-zero probabilities of observing identical values, they are often used for cluster analysis. For each particular sample from an MCMC simulation, identical posterior estimates of parameters believed to be a priori following a Dirichlet Process, are considered to belong to the same cluster of observations. Thus, a nice aspect of the Dirichlet Process is that it not only alleviates the strict parametric assumptions typically associated with parametric prior distributions, but it also alleviates the similarly restrictive assumptions associated with finite mixture models that a priori assume a particular number of segments [22, 74]. In this chapter, we employ DP priors to weaken the generally restrictive as- sumptions associated with commonly chosen parametric prior distributions for a Bayesian logistic regression model. In addition, we also offer some improvements to the credible interval estimates generated by our Bayesian estimation. We discuss the techniques for doing so in the next section. 5.1.3 A Brief Review of Density Ratio Estimation Density ratio estimation (referred to as DRE throughout this dissertation) is a semiparametric modeling approach. As discussed in the previous chapter of this dissertation, this technique has seen myriads of applications ranging from AIDS 115 research to mortality rate prediction to cluster detection among many others [43,52, 80, 81, 83]. In the univariate case, we typically assume that there are I + 1 random samples (xi1, . . . , xini), having probability density functions gi, such that: xij ∼ gi, i = 1, . . . , I, I + 1, j = 1, . . . , ni. (5.1) This approach assumes that gI+1 ≡ g defines a reference density having a known ra- tio between gi and g. In many applications, this ratio is defined to be an exponential in terms of a vector-valued tilt function h(x): gi(x) g(x) = eαi+β ′ ih(x), i = 1, . . . , I. (5.2) By assuming a particular ratio, the statistician can estimate the parameters αi and βi as well as the distributions of Gi and G (the CDFs of gi and g) ∀ i = 1, . . . , I using the method of empirical likelihood. The methodology has been discussed in detail in the previous chapter of this dissertation. 5.1.4 Using DRE to Improving on Bayesian Estimation Results Typically, one would estimate posterior functionals using standard Markov Chain Monte Carlo (MCMC) methods [50]. Researchers would use the resulting MCMC samples to generate a variety of statistical estimators such as posterior means, standard deviations, and credible interval estimates. As discussed in Chapter 4, however, a common limitation of MCMC meth- ods stems from computational feasibility. Most Bayesian MCMC samplers need to be truncated in “real-time.” As a result, it is difficult to understand if we have properly navigated the posterior distribution. Although we have some diagnostic checks, such as autocorrelation of draws and convergence of chains [51], these tests are merely diagnostics that may shed light on the issue, but we never completely 116 understand whether our MCMC sampler truly manifests our entire posterior distri- bution. Therefore, point and interval estimates based on MCMC samples may not necessarily be sufficiently accurate from which to make statistical inferences. One can ameliorate this limitation by using the DRE Method [123–125, 160]. Although this semiparametric approach has been used in a variety of settings, this dissertation is the first to apply the methodology in Bayesian estimation. In the previous chapter of this dissertation, we applied the methodology to hierarchical Bayesian linear regression. In that chapter, we illustrated, via a series of numerical simulations, that the DRE method has the ability to provide more accurate quantiles of distributions. In this chapter, we extend our application of this methodology to a generalized lienar hierarchical Bayesian model. The interested reader is referred to the previous chapter for simulation results demonstrating the efficacy of this approach. 5.2 Problem Formulation 5.2.1 Model Consider a data set of terrorist attacks recorded over the course of t ∈ {1, . . . , T} discrete time occasions. We define the binary outcome variable: yt =    1 if the terrorist attack is successful at time t 0 otherwise, (5.3) where pt = Prob(yt = 1) is the probability that a terrorist attack is successful. We examined the success of terrorist attacks by estimating the following binary logistic regression model: 117 P (Y |β) = T∏ t=1 eyt(β0+β1Xt,1+β2Xt,2+βt,3Xt,3) 1 + eβ0+β1Xt,1+β2Xt,2+βt,3Xt,3 , (5.4) Xt,1 =    1 if the terrorist attack at time t is a suicide attack 0 otherwise, (5.5) Xt,2 = (xt,2,1, . . . , xt,2,n2) is a categorically coded explanatory variable denoting the type of terrorist attack (assassination, hijacking, bombing, etc.) Armed assaults, defined in the appendix, are used as a benchmark variable to ensure statistical iden- tifiability of the model. Finally, Xt,3 = (xt,3,1, . . . , xt,2,n3) is a categorically coded explanatory variable denoting the target of the terrorist attack (airports/airlines, business, educational institution, food or water supply, government, etc.). For the analysis of the conflicts in Afghanistan, Iraq, Sri Lanka, and Northern Ireland after the 1998 Good Friday Agreement, airports/airlines were the benchmark variable. For analysis of the conflict of Northern Ireland before the 1998 Good Friday Agree- ment, abortion related targets were the benchmark variable. β2 and βt,3 are of course n2 and Tn3 dimensional vectors respectively. We allow β0 ∼ N(µ0, σ20) and β1 ∼ N(µ1, σ 2 1). Additionally, we allow β2 ∼ N(µ2,Σn2×n2) and βt,3 to either follow a normally distributed prior or to follow a DP prior. Mathematically, our two choices for varying βt,3 are either βt,3 ∼ N(µ3,ΣTn3×Tn3) or βt,3 ∼ G, where G ∼ DP (α,N(µ3,ΣTn3×Tn3)). µ0, µ1, as well as each component of the vectors µ2 and µ3 all follow a normal distribution with mean 0 and variance 10. σ20 and σ 2 1 are each specified to follow an inverse gamma distribution with shape and scale parameter each equal to 10. Additionally, for computational simplicity Σn2×n2 and ΣTn3×Tn3 are assumed to be diagonal matrices, with each component also drawn from an inverted gamma distribution 118 with shape and scale parameter each equal to 10. As discussed earlier, βt,3 to follow a DP prior enables us to weaken the strict parametric assumptions associated with a normal prior distributions as well as to cluster our analysis around the types of targets terrorists intend to attack. 5.2.2 Data Our data sets were obtained from the START GTD Database, a database compiled by the University of Maryland, containing detailed information on over 113,000 terrorist attacks [54]. We performed a statistical analysis on four different conflicts - The War in Afghanistan, the War in Iraq, the Civil War in Sri Lanka, and the Civil War in Ireland. The dependent variable in our Bayesian logistic regression was whether or not the attack was successful, and our explanatory variables involved whether the attack was a suicide attack, the type of attack, and the targets of the attack. All of our explanatory variables were categorically coded. For each conflict, a few observations could not definitively provide information regarding the details of the attack (such as whether the attack was a suicide attack or whether the attack was successful) and hence was excluded from our analysis. The START database defined the success of a terrorist attack as whether the type of attack actually took place. For example, a bombing/explosion is considered successful if the bomb involved actually detonated. More details are contained in the appendix to this chapter. 119 5.3 A Bayesian Analysis of Several Major Conflicts Across the Globe 5.3.1 Estimation We rigorously examined four recent conflicts - The War in Afghanistan, the War in Iraq, the Civil War in Sri Lanka, and the Civil War in Northern Ireland. For each conflict, we estimated the two Bayesian logistic regression models outlined in Section 5.2, with one model assuming normally distributed priors for the coefficient corresponding to the target of the terrorist attack and other assuming the less re- strictive DP priors. We estimated both models via MCMC methods over the course 30,000 iterations, using the first half for burn in and the remaining half for statistical inference. For the model assuming normal prior distributions, we ran our MCMC sampler in WinBUGS [97]. For the model assuming DP priors we ran our sampler using the R Package DPPackage [70] that used well-known algorithms for MCMC sampling from non-conjugate priors for DP prior models [37, 98, 112]. Autocorrela- tions of each marginal posterior sample were low, suggesting reasonable navigation around the posterior density. As mentioned earlier, however, autocorrelations are simply just a diagnostic check and cannot truly elicit whether the posterior density has been adequately sampled. As a result, in addition to our standard Bayesian analysis, we also present improvements to the Bayesian interval estimates of a few important posterior coefficients using DRE. 5.3.2 The War in Afghanistan On September 11, 2001, foreign terrorists struck the United States in the most devastating attack on American soil since Pearl Harbor. A group of nineteen terrorists hijacked three U.S. passenger planes, crashing them into the World Trade 120 Center in New York and the Pentagon in Washington, D.C. A fourth hijacked plane, intended for the U.S. Capitol, crashed in rural Pennsylvania that same morning. A total of over 3,000 innocent Americans died in these attacks. Shortly thereafter, overwhelming evidence made it quite apparent that the attacks were perpetrated by the terrorist group al Qaeda, headed at the time by Osama bin Laden. The Taliban in Afghanistan had been harboring bin Laden, was known for aiding and abetting al Qaeda, and was infamous for sponsoring terrorism [79]. After the Taliban refused to hand bin Laden over to American custody, the United States used military force to remove the Taliban from power, began efforts toward finding bin Laden, and helped install a democratic Afghan government that renounces terrorism [153]. Since military operations in Afghanistan began thirteen years ago, the United States military has maintained a consistent presence in Afghanistan, providing stability and support to help the relatively new Afghan government. Unfortunately, however, terrorist attacks in Afghanistan have continued over the course of the last several years. Tables 5.1 and 5.2 present a Bayesian statis- tical analysis of these attacks, using the modeling approach outlined above. The data used spans from slightly after the September 11th attacks (when U.S. mili- tary operations in Afghanistan began) through December 2011 consisting of 2887 observations. Our results are quite informative. In Afghanistan, both models indicate that suicide attacks, with posterior estimates around -1 in both models, were generally unsuccessful as were assassination attempts (with posterior mean coefficient esti- mate -3.147 in the normal model and -2.858 in the DP model), bombings (-0.541 in the normal model and -0.634 in the DP model), attacks on infrastructure (-0.176 in the normal model and -0.418 in the DP model), and hijackings (-1.931 in the normal model and -1.787 in the DP model.) These posterior estimates indicate the 121 Tab. 5.1: Afghanistan - Hierarchical Bayesian Model Using Normally Distributed Priors Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit Intercept 4.605 0.553 3.541 5.645 Suicide -1.281 0.236 -1.741 -0.811 Assassination -3.147 0.345 -3.841 -2.506 Bombing/Explosion -0.541 0.327 -1.191 0.081 Facility/Infrastructure -0.176 0.618 -1.321 1.110 Hijacking -1.931 1.472 -4.556 1.276 Hostage Taking - Barricade Incident 2.035 2.199 -1.543 7.024 Hostage Taking - Kidnapping 1.075 0.815 -0.352 2.855 Unknown 1.787 2.238 -1.889 6.706 Mean - Business 0.917 0.798 -0.454 2.528 Mean - Educational Institution -0.767 0.640 -2.011 0.452 Mean - Food or Water Supply 1.329 2.288 -2.595 6.176 Mean - Government (Diplomatic) -0.454 0.695 -1.757 0.958 Mean - Government (General) -0.610 0.493 -1.500 0.321 Mean - Journalists & Media 2.100 2.116 -1.270 6.734 Mean - Maritime 1.347 2.357 -2.746 6.468 Mean - Military -0.318 0.564 -1.345 0.804 Mean - NGO -0.530 0.777 -1.978 0.985 Mean - Other 0.077 1.204 -2.006 2.704 Mean - Police -0.085 0.516 -1.016 0.957 Mean - Private Citizens & Property 1.674 0.679 0.465 2.989 Mean - Religious Figures/Institutions 1.323 0.974 -0.440 3.259 Mean - Telecommunication 2.743 2.274 -0.837 7.969 Mean - Terrorists -0.381 1.089 -2.448 1.827 Mean - Tourists 0.640 2.851 -4.604 6.452 Mean - Transportation -1.085 0.618 -2.256 0.162 Mean - Unknown -2.469 0.696 -3.808 -1.111 Mean - Utilities -1.660 1.017 -3.502 0.498 Variance - Business 1.122 0.399 0.585 2.136 Variance - Educational Institution 1.148 0.406 0.601 2.148 Variance - Food or Water Supply 1.097 0.381 0.583 2.038 Variance - Government (Diplomatic) 1.110 0.394 0.587 2.068 Variance - Government (General) 1.049 0.334 0.577 1.868 Variance - Journalists & Media 1.097 0.379 0.588 2.010 Variance - Maritime 1.101 0.382 0.587 2.070 Variance - Military 1.228 0.459 0.623 2.374 Variance - NGO 1.111 0.392 0.586 2.098 Variance - Other 1.134 0.409 0.594 2.177 Variance - Police 1.032 0.322 0.577 1.814 Variance - Private Citizens & Property 1.106 0.361 0.596 2.020 Variance - Religious Figures/Institutions 1.149 0.427 0.596 2.212 Variance - Telecommunication 1.094 0.383 0.580 2.021 Variance - Terrorists 1.157 0.430 0.597 2.214 Variance - Tourists 1.109 0.388 0.581 2.054 Variance - Transportation 1.095 0.379 0.588 2.043 Variance - Unknown 1.102 0.396 0.579 2.072 Variance - Utilities 1.152 0.430 0.594 2.233 Variance - Unknown 1.112 0.402 0.581 2.109 Variance - Utilities 1.144 0.417 0.594 2.183 122 Tab. 5.2: Afghanistan - Hierarchical Bayesian Model Using DP priors Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit (Intercept) 3.965 0.348 3.289 4.664 Suicide -1.036 0.208 -1.429 -0.622 Assassination -2.858 0.299 -3.445 -2.277 Bombing/Explosion -0.634 0.314 -1.245 -0.013 Facility/Infrastructure -0.418 0.562 -1.416 0.776 Hijacking -1.787 1.225 -4.007 0.863 Hostage Taking (Barricade Incident) 1.976 2.220 -1.472 7.026 Hostage Taking (Kidnapping) 1.017 0.796 -0.363 2.820 Unknown 2.059 2.350 -1.635 7.147 µBaseline 3.853 0.635 2.569 5.086 ΣBaseline 1.007 1.052 0.214 3.286 Number of clusters 4.691 1.552 2.000 8.000 α 1.054 0.566 0.249 2.419 Business 4.863 0.655 3.539 5.942 Educational Institution 3.644 0.346 2.910 4.312 Food or Water Supply 4.084 0.826 2.694 5.781 Government (Diplomatic) 3.721 0.371 3.028 4.543 Government (General) 3.636 0.287 3.067 4.204 Journalists & Media 4.274 0.838 3.048 5.898 Maritime 4.106 0.831 2.683 5.814 Military 3.734 0.316 3.139 4.384 NGO 3.715 0.435 2.907 4.812 Other 3.948 0.666 2.923 5.556 Police 3.835 0.322 3.228 4.523 Private Citizens & Property 5.255 0.418 4.460 6.120 Religious Figures/Institutions 4.878 0.683 3.475 5.985 Telecommunication 4.424 0.843 3.174 5.965 Terrorists 3.782 0.582 2.679 5.287 Tourists 3.989 0.854 2.301 5.742 Transportation 3.519 0.421 2.526 4.222 Unknown 2.634 0.727 1.321 3.956 Utilities 3.472 0.650 1.913 4.687 success of the U.S. military in preventing terrorist attacks of this nature. Attacks related to hostage taking, on the other hand, had positive coefficients and have therefore generally been much more successful. As a result, it could be useful for the U.S. military to work with Afghan security forces to determine where hostage taking is predominately occurring and improving security around such locations. Additionally, our model using DP priors suggests that there may be some clustering of attacks around potential targets not necessarily observable from the data directly. We will discuss this result in more detail later in this chapter. 5.3.3 The War in Iraq For decades, Iraq has been a country housing three distinct ethnic and religious groups - Sunni Muslims, Shi’ite Muslims, and Kurds. Saddam Hussein, the President of Iraq from 1979 up until his fall in 2003, was infamous for being an egregious violator of human rights, providing preferential treatment for many of his fellow Sunni Muslims, while subjecting many Shi’ite Muslims and Kurds to severe and 123 often deadly persecution. In 1982, for example, Saddam detained nearly 800 men, women, and children from the Shi’ite town of Dujail following a failed assassination attempt. In court, prosecutors and witnesses claimed that they witnessed torture and murder of many of these civilians. In 1988, during the final days of the Iran- Iraq war, Saddam ordered a poison gas attack in the northern Iraqi town of Halabja, killing thousands of innocent civilians and injuring scores of others. Many of the survivors are still suffering from the long term effects today [64,101]. In addition to human rights violations, Saddam invaded two of his neighboring countries, supported terrorists, led the international community to believe that he had intentions to acquire illicit weapons, and attempted to assassinate a former U.S. President [34, 153]. In light of these violations of international law, President Bill Clinton in 1998 signed into law the Iraq Liberation Act, making regime change in Iraq the explicit policy of the United States government. Four years later, Pres- ident George W. Bush signed into law the Iraq War Resolution, allowing the use of American ground troops to carry out the goals outlined in the Iraq Liberation Act [5, 68]. In 2003, a military coalition led by the United States and British governments removed Saddam from power and helped install a democratic regime that espouses the values of freedom and human rights. Although the initial campaign in Iraq to remove Saddam was astonishingly successful, maintaining stability in the immediate post-Saddam Iraq was quite difficult. Terrorist attacks occurred on a regular basis, jeopardizing the lives of coalition forces as well as civilians. Recent research has shown that most of these suicide attacks have been targeted towards civilians [148]. Tables 5.3-5.4 contain an analysis of these attacks from 2003 through 2011, which consisted of 7621 observations, using our Bayesian approach. Our posterior coefficient estimates for suicide attacks illustrate that, just like Afghanistan, suicide attacks were generally unsuccessful (with posterior mean coef- 124 Tab. 5.3: Iraq - Hierarchical Bayesian Model Using Normally Distributed Priors Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit Intercept 4.838 0.502 4.011 5.910 Suicide -0.522 0.199 -0.906 -0.130 Assassination -4.261 0.290 -4.844 -3.721 Bombing/Explosion -1.063 0.282 -1.628 -0.530 Facility/Infrastructure -0.770 0.746 -2.128 0.791 Hijacking 0.522 2.822 -4.605 6.178 Hostage Taking - Barricade Incident 2.077 2.189 -1.384 6.985 Hostage Taking - Kidnapping -0.682 0.559 -1.711 0.493 Unknown 1.265 2.475 -2.872 6.609 Mean - Business 0.677 0.573 -0.528 1.694 Mean - Educational Institution -0.536 0.606 -1.768 0.599 Mean - Food or Water Supply 2.000 2.521 -2.309 7.333 Mean - Government (Diplomatic) -0.420 0.590 -1.644 0.692 Mean - Government (General) -0.076 0.488 -1.131 0.684 Mean - Journalists & Media 1.364 0.971 -0.373 3.532 Mean - Maritime -1.709 1.514 -4.549 1.478 Mean - Military 0.112 0.510 -1.007 0.992 Mean - NGO 0.349 1.284 -1.981 2.995 Mean - Other 1.651 1.086 -0.433 3.839 Mean - Police 0.358 0.459 -0.660 1.071 Mean - Private Citizens & Property 0.998 0.493 -0.167 1.728 Mean - Religious Figures/Institutions 0.543 0.542 -0.664 1.532 Mean - Telecommunication 1.917 2.326 -1.943 7.024 Mean - Terrorists 0.027 0.594 -1.196 1.101 Mean - Tourists 1.214 2.109 -2.321 5.838 Mean - Transportation 0.280 0.614 -0.943 1.483 Mean - Unknown -1.579 0.724 -3.007 -0.143 Mean - Utilities -0.635 0.643 -1.911 0.615 Mean -Violent Political Party 3.536 1.590 0.875 7.189 Variance - Business 1.126 0.374 0.588 2.030 Variance - Educational Institution 1.176 0.419 0.601 2.201 Variance - Food or Water Supply 1.100 0.387 0.584 2.057 Variance - Government (Diplomatic) 1.226 0.459 0.618 2.349 Variance - Government (General) 1.392 0.460 0.707 2.452 Variance - Journalists & Media 1.185 0.469 0.598 2.364 Variance - Maritime 1.122 0.405 0.585 2.113 Variance - Military 1.080 0.348 0.586 1.969 Variance - NGO 1.107 0.383 0.581 2.059 Variance - Other 1.086 0.376 0.576 2.021 Variance - Police 0.831 0.229 0.479 1.323 Variance - Private Citizens & Property 1.163 0.394 0.615 2.172 Variance - Religious Figures/Institutions 1.280 0.519 0.616 2.560 Variance - Telecommunication 1.096 0.381 0.582 2.046 Variance - Terrorists 1.155 0.423 0.602 2.252 Variance - Tourists 1.101 0.384 0.584 2.055 Variance - Transportation 1.102 0.435 0.577 2.219 Variance - Unknown 1.118 0.396 0.595 2.116 Variance - Utilities 1.119 0.374 0.603 2.041 Variance -Violent Political Party 1.103 0.386 0.582 2.057 Variance - Utilities 1.135 0.411 0.587 2.150 Variance - Violent Political Party 1.107 0.393 0.586 2.064 125 Tab. 5.4: Iraq - Hierarchical Bayesian Model Using DP priors Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit (Intercept) 4.543 0.278 4.012 5.109 Suicide -0.449 0.180 -0.785 -0.087 Assassination -3.836 0.264 -4.361 -3.316 Bombing/Explosion -1.049 0.267 -1.593 -0.540 Facility/Infrastructure -1.022 0.639 -2.187 0.274 Hijacking 0.622 2.851 -4.434 6.538 Hostage Taking (Barricade Incident) 2.094 2.149 -1.361 6.976 Hostage Taking (Kidnapping) -0.645 0.517 -1.590 0.404 Unknown 1.390 2.480 -2.778 6.753 µBaseline 4.546 0.434 3.669 5.387 ΣBaseline 0.372 0.389 0.099 1.200 Number of clusters 4.685 1.566 2.000 8.000 α 1.046 0.572 0.242 2.440 Business 4.857 0.375 4.152 5.599 Educational Institution 4.251 0.368 3.473 4.961 Food or Water Supply 4.576 0.486 3.710 5.555 Government (Diplomatic) 4.257 0.347 3.548 4.939 Government (General) 4.241 0.269 3.725 4.785 Journalists & Media 4.823 0.441 3.992 5.685 Maritime 4.405 0.493 3.379 5.357 Military 4.425 0.317 3.822 5.064 NGO 4.506 0.462 3.639 5.467 Other 4.741 0.458 3.918 5.646 Police 4.633 0.281 4.097 5.199 Private Citizens & Property 5.143 0.294 4.579 5.740 Religious Figures/Institutions 4.673 0.363 3.993 5.419 Telecommunication 4.586 0.482 3.717 5.537 Terrorists 4.400 0.345 3.756 5.104 Tourists 4.563 0.482 3.689 5.528 Transportation 4.545 0.371 3.870 5.329 Unknown 4.112 0.491 2.898 4.930 Utilities 4.260 0.381 3.437 4.995 Mean -Violent Political Party 4.822 0.484 3.946 5.784 ficient estimates approximately -0.5 in both models), although not as unsuccessful as Afghanistan. Assassination attempts (-4.261 in the normal model and -3.736 in the DP model), bombings (-1.063 in the normal model and -1.049 in the DP model), and attacks on Facility/Infrastructure (-0.770 in the normal model and -1.022 in the DP model) were also generally unsuccessful. It is interesting to note, however, that both models suggest that hijacking (0.522 in the normal model and 0.622 in the DP model) and hostage taking via barricade incidents (2.077 in the normal model and 2.094 in the DP model), on the other hand, were successful. Hostage taking via kidnapping (-0.682 in the normal model and -0.645 in the DP model), however, was not. These results suggest that Iraqi security forces that have taken respon- sibility for the country after the departure of U.S. troops in 2011 should consider fortifying security to prevent hijackings and hostage takings via barricade incident. The Iraqi government could consider providing additional security personal or offer recommendations regarding private security to do so. 126 5.3.4 The Sri Lankan Civil War The Sri Lankan Civil War was a 26 year conflict from 1983-2009. Fought by the Liberation Tigers of Tamil Eelam (LTTE), the war was waged against the government of Sri Lanka, who were seeking to create a separate Tamil state within the northeastern part of the country. The LTTE engaged in terrorist attacks rang- ing from suicide attacks against civilians, to coordinated attacks against religious figures and important facilities, to assassination attempts against members of the Sri Lankan government. These attacks included the 1993 assassination of President Ranasinghe Premadasa, the 1996 bombings of the Sri Lankan Central Bank, killing over 90 and injuring over 1400, and the 1998 bombing of the revered Temple of the Tooth among others [54]. Shortly after the killing of LTTE leader Velupillai Prabhakaran in 2009, the 26 year-long Civil War ended [161]. Tables 5.5 and 5.6 contain our Bayesian analysis of this data set from over the course of the conflict which consisted of 2924 observations. The results from the civil war in Sri Lanka are quite similar to those in Iraq. In particular, suicide attacks (with posterior mean coefficient estimate -0.421 in the normal model and -0.371 in the DP model), assassination attempts (-1.487 in the normal model and -1.532 in the DP model), bombing attempts (-1.206 in the normal model and -1.169 in the DP model), attacks on facility/infrastructure (essentially zero in the normal model and -0.196 in the DP model), hostage taking via kidnapping (-0.833 in the normal model and -0.877 in the DP model), unarmed assaults (-1.206 in the normal model and -1.201 in the DP model), and terrorist attacks of an unknown nature (-1.454 in the normal model and -1.359 in the DP model) were largely unsuccessful. Hostage taking via barricade incident (1.405 in the normal model and 1.848 in the DP model) and hijackings (1.822 in the normal model and 1.848 in the DP model), on the other hand, were much more successful. Despite the fact that the war ended in 2009, understanding these issues could be 127 Tab. 5.5: Sri Lanka - Hierarchical Bayesian Model Using Normally Distributed Priors Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit Intercept 3.842 0.570 2.867 5.018 Suicide -0.421 0.386 -1.153 0.351 Assassination -1.487 0.330 -2.148 -0.844 Bombing/Explosion -1.206 0.277 -1.752 -0.671 Facility/Infrastructure 0.006 0.628 -1.147 1.326 Hijacking 1.822 2.276 -1.933 6.908 Hostage Taking - Barricade Incident 1.405 2.448 -2.774 6.641 Hostage Taking - Kidnapping -0.833 0.659 -2.039 0.547 Unarmed Assault -1.206 1.455 -3.793 1.898 Unknown -1.454 0.431 -2.282 -0.581 Mean - Business 0.448 0.671 -0.879 1.722 Mean - Educational Institution 3.426 1.919 0.363 7.843 Mean - Food or Water Supply 2.225 2.205 -1.553 6.978 Mean - Government (Diplomatic) -1.061 0.962 -2.926 0.865 Mean - Government (General) -0.021 0.580 -1.224 1.069 Mean - Journalists & Media 0.094 0.867 -1.507 1.883 Mean - Maritime 1.100 1.169 -0.941 3.678 Mean - Military 0.580 0.575 -0.509 1.579 Mean - NGO 3.277 1.884 0.096 7.414 Mean - Other -0.701 0.920 -2.465 1.159 Mean - Police 0.416 0.602 -0.832 1.485 Mean - Private Citizens & Property 2.324 0.674 0.961 3.534 Mean - Religious Figures/Institutions 1.763 1.203 -0.340 4.471 Mean - Telecommunication -0.960 0.939 -2.785 0.895 Mean - Terrorists 3.528 1.781 0.498 7.640 Mean - Tourists 1.671 2.359 -2.406 6.703 Mean - Transportation 0.418 0.631 -0.797 1.649 Mean - Unknown -2.413 0.675 -3.776 -1.128 Mean - Utilities 0.232 0.850 -1.375 2.017 Mean - Violent Political Party 0.466 0.707 -0.876 1.857 Variance - Business 1.102 0.392 0.585 2.108 Variance - Educational Institution 1.092 0.385 0.571 2.028 Variance - Food or Water Supply 1.103 0.388 0.587 2.056 Variance - Government (Diplomatic) 1.132 0.403 0.592 2.145 Variance - Government (General) 1.141 0.430 0.591 2.157 Variance - Journalists & Media 1.117 0.389 0.585 2.099 Variance - Maritime 1.115 0.394 0.590 2.075 Variance - Military 1.149 0.452 0.591 2.313 Variance - NGO 1.096 0.385 0.582 2.050 Variance - Other 1.134 0.413 0.590 2.190 Variance - Police 1.146 0.439 0.600 2.213 Variance - Private Citizens & Property 1.083 0.359 0.574 1.997 Variance - Religious Figures/Institutions 1.094 0.382 0.577 2.070 Variance - Telecommunication 1.126 0.397 0.593 2.101 Variance - Terrorists 1.097 0.385 0.577 2.053 Variance - Tourists 1.109 0.393 0.581 2.090 Variance - Transportation 1.139 0.414 0.592 2.169 Variance - Unknown 1.180 0.437 0.602 2.271 Variance - Utilities 1.133 0.405 0.601 2.131 Variance - Violent Political Party 1.112 0.407 0.576 2.096 128 Tab. 5.6: Sri Lanka - Hierarchical Bayesian Model Using DP priors Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit (Intercept) 3.890 0.354 3.233 4.627 Suicide -0.371 0.325 -0.982 0.282 Assassination -1.532 0.300 -2.128 -0.950 Bombing/Explosion -1.169 0.263 -1.699 -0.658 Facility/Infrastructure -0.196 0.597 -1.262 1.044 Hijacking 1.848 2.151 -1.611 6.598 Hostage Taking (Barricade Incident) 1.345 2.439 -2.720 7.063 Hostage Taking (Kidnapping) -0.870 0.598 -1.948 0.414 Unarmed Assault -1.201 1.341 -3.520 1.832 Unknown -1.359 0.407 -2.121 -0.517 µBaseline 3.506 0.775 1.896 4.967 ΣBaseline 1.944 3.062 0.463 6.355 Number of clusters 4.513 1.349 3.000 8.000 α 0.999 0.529 0.263 2.306 Business 3.763 0.266 3.271 4.305 Educational Institution 4.698 0.979 3.371 6.455 Food or Water Supply 4.188 0.886 3.032 6.147 Government (Diplomatic) 3.262 0.830 1.368 4.230 Government (General) 3.669 0.284 3.053 4.191 Journalists & Media 3.742 0.369 3.027 4.473 Maritime 4.014 0.678 3.223 5.836 Military 3.783 0.241 3.333 4.284 NGO 4.572 0.973 3.336 6.374 Other 3.519 0.609 1.793 4.279 Police 3.759 0.242 3.302 4.253 Private Citizens & Property 5.452 0.465 4.598 6.410 Religious Figures/Institutions 4.456 0.875 3.357 6.159 Telecommunication 3.322 0.785 1.446 4.226 Terrorists 4.992 0.939 3.459 6.571 Tourists 4.060 0.930 2.080 6.074 Transportation 3.753 0.255 3.269 4.267 Unknown 1.604 0.454 0.716 2.499 Utilities 3.752 0.367 3.068 4.479 Violent Political Party 3.764 0.286 3.250 4.339 useful for the Sri Lankan government as post-war reconciliation processes continue. 5.3.5 The Troubles Commonly known as “The Troubles,” the Civil War in Ireland was an ethno- nationalist conflict over the constitutional status of Northern Ireland [102]. The Troubles bears many similarities to the Civil War in Sri Lanka. Irish nationalists, primarily Catholic, wanted an Independent State of Northern Ireland, while Union- ists and loyalists, primarily Protestant, preferred Northern Ireland to remain part of the United Kingdom. The Unionists and loyalists were represented politically by the Ulster Volunteer Force (UVF) and the Ulster Defence Association (UDA), while the Irish nationalists were represented by the Irish Republican Army (IRA). Unlike many political groups, however, the UVF, UDA, and IRA were known for engaging in violent behavior, including engaging in attacks targeting civilians. In fact, these organizations have been deemed by the United States State Department 129 as terrorist groups alongside organizations such as al Qaeda, Hezbollah, Hamas, and LTTE among others [158]. The Good Friday Agreement of 1998, a compromise that Northern Ireland would remain a component of the United Kingdom until the people of northern Ireland and the Republic of Ireland would determine otherwise, contained provisions for the creation of institutions to civilly discuss issues between Northern Ireland and the Republic of Ireland as well as between Britain and Ireland [114]. The Agreement, however, was not a panacea for the country’s issues, and terrorist attacks continued after its passage. Tables 5.7-5.10 contain our analysis of the attacks in Northern Ireland from 1970 up through the Good Friday Agreement of 1998 (consisting of 3517 observations) and then from 1998 through the present (consisting of 457 observations). During these time periods, only one suicide attack occurred in our data set (on December 29, 1998 in Armagh, Northern Ireland) and was consequently excluded from our analysis. Our posterior estimates indicate that hostage taking via kidnapping (posterior mean coefficient estimates 1.548 in the normal model and 1.429 in the DP model), unarmed assaults (2.585 in the normal model and 2.619 in the DP model), and terrorist attacks of an unknown nature (0.156 in the normal model and 2.619 in the DP model) were the most successful types of terrorist attacks in Ireland before the Good Friday Agreement of 1998. Although terrorist attacks continued after the Agreement, the most successful types of attacks were via hijacking (1.146 in the normal model and 2.869 in the DP model) and hostage taking via kidnapping (1.090 in the normal model and 1.042 in the DP model). The success of unarmed assaults declined after the Good Friday Agreement of 1998 as the posterior coefficient estimates became negative (-0.865 in the normal model -1.083 in the DP model). 130 Tab. 5.7: Ireland before the 1998 Good Friday Agreement - Hierarchical Bayesian Model Using Normally Distributed Priors Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit Intercept 2.346 0.639 1.087 3.644 Assassination -0.421 0.236 -0.900 0.028 Bombing/Explosion -0.972 0.233 -1.447 -0.524 Facility/Infrastructure -0.831 0.319 -1.450 -0.196 Hijacking -2.452 1.233 -4.837 0.056 Hostage Taking - Barricade Incident -0.283 1.268 -2.476 2.485 Hostage Taking - Kidnapping 1.548 1.113 -0.313 4.033 Unarmed Assault 2.585 2.088 -0.842 7.188 Unknown 0.156 0.861 -1.349 2.012 Mean - Airports & Airlines -1.152 1.132 -3.339 1.048 Mean - Business 0.774 0.620 -0.488 2.043 Mean - Educational Institution -0.438 0.949 -2.235 1.471 Mean - Government (Diplomatic) 1.520 2.522 -3.090 6.930 Mean - Government (General) -0.491 0.629 -1.783 0.751 Mean - Journalists & Media -0.335 1.495 -3.178 2.703 Mean - Maritime 2.158 2.310 -1.853 7.131 Mean - Military 1.067 0.647 -0.291 2.447 Mean - NGO 1.917 2.223 -2.065 6.544 Mean - Other 1.935 1.165 -0.152 4.434 Mean - Police 0.216 0.623 -1.054 1.492 Mean - Private Citizens & Property 0.898 0.616 -0.415 2.122 Mean - Religious Figures/Institutions -0.029 0.958 -1.781 1.971 Mean - Telecommunications -0.950 1.723 -4.285 2.624 Mean - Terrorists -0.279 0.669 -1.723 1.066 Mean - Tourists 1.464 2.532 -3.218 6.731 Mean - Transportation 0.543 0.756 -0.926 2.037 Mean - Unknown 1.774 0.984 0.043 3.871 Mean - Utilities -2.775 2.285 -7.589 1.418 Variance - Airports & Airlines 1.128 0.403 0.591 2.161 Variance - Business 1.195 0.428 0.619 2.306 Variance - Educational Institution 1.119 0.400 0.585 2.119 Variance - Government (Diplomatic) 1.114 0.397 0.585 2.094 Variance - Government (General) 1.340 0.536 0.654 2.729 Variance - Journalists & Media 1.131 0.408 0.593 2.164 Variance - Maritime 1.103 0.381 0.585 2.041 Variance - Military 1.162 0.444 0.586 2.326 Variance - NGO 1.106 0.391 0.579 2.082 Variance - Other 1.116 0.394 0.589 2.113 Variance - Police 1.210 0.428 0.631 2.324 Variance - Private Citizens & Property 1.001 0.335 0.547 1.866 Variance - Religious Figures/Institutions 1.124 0.406 0.590 2.144 Variance - Telecommunications 1.120 0.405 0.591 2.121 Variance - Terrorists 1.070 0.365 0.576 1.987 Variance - Tourists 1.104 0.386 0.584 2.084 Variance - Transportation 1.134 0.410 0.593 2.148 Variance - Unknown 1.113 0.389 0.582 2.103 Variance - Utilities 1.111 0.394 0.587 2.092 131 Tab. 5.8: Ireland before the 1998 Good Friday Agreement - Hierarchical Bayesian Model Using DP priors Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit (Intercept) 2.336 0.240 1.854 2.802 Assassination -0.393 0.195 -0.768 -0.001 Bombing/Explosion -0.897 0.191 -1.259 -0.530 Facility/Infrastructure -0.847 0.253 -1.314 -0.360 Hijacking -2.282 1.075 -4.222 -0.064 Hostage Taking (Barricade Incident) -0.012 1.174 -2.052 2.263 Hostage Taking (Kidnapping) 1.429 0.997 -0.308 3.603 Unarmed Assault 2.619 2.193 -0.859 7.858 Unknown 0.222 0.817 -1.192 1.955 µBaseline 2.231 0.396 1.453 2.998 ΣBaseline 0.391 0.350 0.113 1.187 Number of clusters 4.453 1.441 2.000 8.000 α 1.006 0.546 0.234 2.320 Airports & Airlines 1.977 0.491 1.162 3.007 Business 2.783 0.203 2.359 3.167 Educational Institution 2.021 0.472 1.262 3.002 Government (Diplomatic) 2.379 0.536 1.365 3.174 Government (General) 1.703 0.250 1.228 2.213 Journalists & Media 2.236 0.542 1.299 3.118 Maritime 2.419 0.526 1.399 3.179 Military 2.831 0.185 2.476 3.195 NGO 2.420 0.530 1.392 3.183 Other 2.689 0.375 1.766 3.240 Police 2.152 0.201 1.757 2.545 Private Citizens & Property 2.821 0.184 2.467 3.182 Religious Figures/Institutions 2.191 0.510 1.330 3.086 Telecommunications 2.194 0.547 1.260 3.099 Terrorists 1.810 0.279 1.291 2.362 Tourists 2.366 0.543 1.359 3.168 Transportation 2.555 0.391 1.728 3.145 Unknown 2.782 0.291 2.088 3.273 Utilities 2.144 0.554 1.214 3.090 Tab. 5.9: Ireland after the 1998 Good Friday Agreement - Hierarchical Bayesian Model Using Normally Distributed Priors Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit Intercept 2.819 1.058 0.910 4.946 Assassination 0.462 1.114 -1.734 2.530 Bombing Explosion -2.260 1.049 -4.355 -0.379 Facility/Infrastructure 3.759 2.077 -0.005 8.227 Hijacking 1.146 2.613 -3.549 6.608 Hostage Taking - Kidnapping 1.090 1.203 -1.318 3.368 Unarmed Assault -0.865 1.648 -3.965 2.529 Mean - Business -0.226 0.591 -1.361 0.962 Mean - Educational Institution 0.223 0.482 -0.716 1.162 Mean - Government (General) 2.191 1.224 0.158 5.149 Mean - Journalists & Media -2.353 2.281 -7.232 1.800 Mean - Military -0.680 1.573 -3.670 2.601 Mean - NGO 1.068 2.654 -3.878 6.628 Mean - Other -1.390 1.829 -4.945 2.298 Mean - Police -0.889 1.135 -3.037 1.455 Mean - Private Citizens & Property -0.209 0.388 -1.013 0.512 Mean - Religious Figures/Institutions 0.668 1.322 -1.743 3.554 Mean - Terrorists 3.080 1.941 -0.166 7.442 Mean - Tourists -1.516 1.224 -3.942 0.885 Mean - Transportation -1.288 0.908 -3.060 0.476 Mean - Unknown -3.902 1.071 -6.157 -1.986 Mean - Utilities -2.514 2.379 -7.595 1.825 Mean - Violent Political Party 2.808 2.037 -0.659 7.306 Variance - Business 1.148 0.402 0.602 2.147 Variance - Educational Institution 1.085 0.367 0.580 1.998 Variance - Government (General) 1.109 0.392 0.581 2.082 Variance - Journalists & Media 1.112 0.398 0.581 2.101 Variance - Military 1.106 0.391 0.585 2.078 Variance - NGO 1.108 0.390 0.584 2.077 Variance - Other 1.114 0.398 0.590 2.084 Variance - Police 1.143 0.406 0.594 2.149 Variance - Private Citizens & Property 1.136 0.410 0.599 2.169 Variance - Religious Figures/Institutions 1.097 0.380 0.585 2.037 Variance - Terrorists 1.100 0.387 0.584 2.077 Variance - Tourists 1.110 0.392 0.590 2.067 Variance - Transportation 1.111 0.391 0.589 2.102 Variance - Unknown 1.087 0.374 0.577 2.021 Variance - Utilities 1.109 0.392 0.585 2.073 Variance - Violent Political Party 1.106 0.387 0.588 2.069 132 Tab. 5.10: Ireland after the 1998 Good Friday Agreement - Hierarchical Bayesian Model Using DP priors Post. Post. 95% Credible Interval 95% Credible Interval Mean Std Dev Lower Limit Upper Limit (Intercept) 2.417 0.474 1.514 3.372 Assassination -0.210 1.268 -2.336 2.821 Bombing/Explosion -2.288 0.414 -3.159 -1.542 Facility/Infrastructure 0.475 0.742 -0.863 2.117 Hijacking 2.869 1.964 -0.496 7.233 Hostage Taking (Kidnapping) 1.042 2.514 -3.330 6.289 Unarmed Assault -1.083 1.368 -3.460 2.115 µBaseline 1.796 0.885 -0.109 3.435 ΣBaseline 1.212 1.564 0.181 4.792 Number of clusters 3.272 1.291 2.000 6.000 α 0.798 0.491 0.154 2.031 Business 2.737 0.422 1.995 3.633 Educational Institution 2.453 0.776 0.423 3.618 Government (General) 2.685 0.426 1.891 3.547 Journalists & Media 2.543 0.806 0.377 3.857 Military 2.909 0.646 2.024 4.669 NGO 2.284 0.945 -0.108 3.580 Other 2.470 0.724 0.546 3.568 Police 2.738 0.417 2.000 3.621 Private Citizens & Property 2.685 0.396 1.981 3.514 Religious Figures/Institutions 2.723 0.521 1.780 3.807 Terrorists 2.839 0.640 1.955 4.504 Tourists 2.238 0.911 0.007 3.507 Transportation 2.278 0.808 0.357 3.468 Unknown 0.529 0.901 -1.337 2.226 Utilities 2.155 1.077 -0.507 3.568 Violent Political Party 2.798 0.624 1.872 4.262 5.3.6 The Use of Dirichlet Process Priors We applied either a normal prior or a DP Prior to the coefficients regarding the targets of terrorist attacks. To compare the DP model for each conflict with its normal counterpart, we estimated pseudo-Bayes factors (PsBF) [47,48]. The PsBF is based on a model’s cross-validation predictive density. As standard Bayes factors are often not computationally feasible to compute, the PsBF is considered a useful surrogate. If we let y be our observed data, yt be the tth terrorist attack at time t across t = 1, . . . , T observations, and y(t) be the data of attacks with observation t deleted, the cross validative predictive density pi(yt|y(t)) is: pi(yt|y(t)) = ∫ pi(yt|β,y(t))pi(β, |y(t))dβ ≈ [ 1 R R∑ r=1 1 f(yt,β (r)) ]−1 , (5.6) where f is the likelihood function and β(r) is the vector of parameter values ob- 133 tained during the rth MCMC iteration [115]. The relation in (5.6) is based on a truncated series approximation of the harmonic mean of the logistic regression func- tion evaluated at each posterior draw, averaged across all R post burn-in MCMC iterations [115]. As a result, the PsBF comparing the normal model (M=1) to the DP model (M=2) can be written as follows: PsBF = T∏ t=1 pi(yt|y(t),M = 2) pi(yt|y(t),M = 1) . (5.7) We can take advantage of the additive properties of logarithms and utilize (5.6) to estimate the logarithms of the numerator as well as the denominator in (5.11). These quantities, approximations to the log-marginal data likelihoods for each model, enable us to estimate PsBFs for each conflict examined. Table 5.11 contains our results. Tab. 5.11: PsBF Computation comparing DP Model to Normal Model Normal Model DP Model Pseudo Bayes Factor Afghanistan -518.066 -510.610 1730.213 Iraq -1183.940 -1183.310 1.878 Sri Lanka -563.147 -556.940 496.310 N. Ireland Before 1998 GF Agreement -1278.738 -1272.590 467.781 N. Ireland After 1998 GF Agreement -230.315 -231.140 0.438 PsBF estimates greater than one indicate stronger support for the DP Model, whereas estimates less than one provide support for the normal model. Furthermore, the larger the value, the stronger the indication of support [47, 48].1 Therefore, these results indicate substantial support for the DP Model for understanding the determinants of successful attacks in Afghanistan, Sri Lanka, and Northern Ireland before the 1998 Good Friday Agreement. The PsBF for Iraq is 1.878, indicated some support for the DP Model over the normal model for that conflict. Thus, for four 1 Pseudo-Bayes Factors have the potential to legitimately take on extremely large values (as large as exp(50)) that would indicate overwhelming evidence in favor of one model over another, as illustrated in Ansari and Mela’s (2003) hierarchical Bayesian probit analysis of online customer clickstream behavior [3]. 134 of the five conflicts examined, the DP model effectively captures clustering around potential targets for these conflicts compared to the normal model. The normal model, on the other hand, is considerably more adequate for modeling the conflict in Northern Ireland after the 1998 Good Friday Agreement. The adequacy of the normal model compared to the DP Model in this instance may be due the lack of clustering of attacks around targets in Northern Ireland in recent years. Table 5.12 contains our average cluster sizes for each conflict, averaged over our MCMC iterations: Tab. 5.12: Average Cluster Size - DP prior Model Average Cluster Size Standard Deviation Afghanistan 4.691 1.552 Iraq 4.685 1.566 Sri Lanka 4.513 1.349 Northern Ireland (before 1998 GF Agreement) 4.453 1.441 Northern Ireland (after 1998 GF Agreement) 3.272 1.291 These results suggest an interesting phenomenon illustrated by our DP prior model. In particular, our models cluster around a small number of targets. This clus- tering manifests the terrorists’ strategy, improving the model’s explanatory power for three of the conflicts described above. Future research could look into determining the exact composition of these clusters and provide military advice accordingly. Re- cent research has performed similar cluster analysis of other real-world phenomena, including the topics discussed by Barack Obama, John McCain, and Mitt Romney in the last two presidential elections [113]. 5.3.7 Improving Bayesian Credible Interval Estimates In the previous chapter of this dissertation, we discussed utilizing the semi- parametric DRE method to improve the credible interval estimation of a hierarchical linear model regarding welfare reform. In this section, we extend this technique to refine the credible interval estimates of the hierarchical generalized linear model 135 Fig. 5.1: Histogram of Posterior Sample for Suicide Attack Coefficient in Afghanistan, Assuming a Normally Distributed Prior of this chapter. Specifically, we took posterior samples of several coefficients that warranted a more rigorous examination of their distributional properties. Upon looking at the histograms of these posterior samples, they all seemed to be reason- ably “normal-like” in nature, appearing to be unimodal and reasonably symmetric with limited skewness. Several of these histograms are in Figures 5.1-5.10 As a result of these marginal posterior samples’ distributional behavior, we used the approach in the previous chapter to estimate the 95% credible interval, again assuming a tilt function of the form h(x) = (x, x2)′. As the simulations of the previous chapter illustrated, applying the DRE method to posterior samples can provide more accurate quantile estimation. Our results are presented in Tables 5.13-5.17: As we saw with our rigorous examination of welfare reform, the semiparametric DRE method enables us to better understand our posterior interval estimates re- 136 Fig. 5.2: Histogram of Posterior Sample for Suicide Attack Coefficient in Afghanistan, Assuming a DP prior Tab. 5.13: Bayesian Credible Interval Refinements for Suicide Attack Posterior Coefficient, Afghanistan 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit Normal Prior -1.741 -0.811 -1.748 -0.817 DP Prior -1.429 -0.622 -1.439 -0.626 Tab. 5.14: Bayesian Credible Interval Refinements for Bombing/Explosion Posterior Co- efficient, Iraq 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit Normal Prior -1.628 -0.530 -1.622 -0.521 DP Prior -1.593 -0.540 -1.571 -0.528 Tab. 5.15: Bayesian Credible Interval Refinements for Facility/Infrastructure Posterior Coefficient, Sri Lanka 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit Normal Prior -1.147 1.326 -1.206 1.275 DP Prior -1.262 1.044 -1.339 1.010 137 Fig. 5.3: Histogram of Posterior Sample for Bombing/Explosion Attack Coefficient in Iraq, Assuming a Normally Distributed Prior Tab. 5.16: Bayesian Credible Interval Refinements for Hijacking Posterior Coefficient, Northern Ireland before 1998 Good Friday Agreement 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit Normal Prior -4.837 0.056 -4.878 0.028 DP Prior -4.222 -0.064 -4.342 -0.132 Tab. 5.17: Bayesian Credible Interval Refinements for Unarmed Assault Posterior Coeffi- cient, Northern Ireland bafterefore 1998 Good Friday Agreement 95% Credible Interval 95% Credible Interval Refined 95% Credible Refined 95% Credible Lower Limit Upper Limit Interval Lower Limit Interval Upper Limit Normal Prior -3.965 2.529 -4.072 2.431 DP Prior -3.460 2.115 -3.672 1.737 138 Fig. 5.4: Histogram of Posterior Sample for Bombing/Explosion Attack Coefficient in Iraq, Assuming a DP prior garding terrorist attacks. This information is useful for policymakers to understand upper and lower limits of response coefficients so they can make more accurate statistical inferences regarding fortification of appropriate security measures. For example, in Northern Ireland after the 1998 Good Friday Agreement, although the success of unarmed assaults remains relatively low, the upper limit of the pertinent marginal posterior distribution still remains positive. This result suggests that this positive upper interval estimate was unlikely due to truncation of the MCMC sam- pler. We can use these results to advise associated policymakers that regardless of the negative Bayesian point estimator generated pertaining to unarmed assaults, they should not neglect security issues regarding attacks of this nature. 139 Fig. 5.5: Histogram of Posterior Sample for Facility/Infrastructure Coefficient in Sri Lanka, Assuming a Normally Distributed Prior 5.4 Conclusions and Future Research Our results provide policymakers with a model offering useful counterterrorism strategies shedding light on which types of attacks are sufficiently defended against and which others warrant further fortification. As a result, policymakers can use our model in conjunction with other statistical models aimed at understanding terror- ist networks as well as theoretical research in political science examining the goals, causes, and onset of terrorism [2, 15, 89, 120, 143, 166]. It is interesting to note that across the conflicts studied here, suicide attacks are generally unsuccessful, despite the tremendous attention they garner from the mainstream media. This result is consistent with the findings of Kyung et al (2011)’s study of the Middle East and Northern Africa [90]. Other considerably more coordinated non-suicide attacks, however, such as hostage taking, are often much more successful. This phenomenon 140 Fig. 5.6: Histogram of Posterior Sample for Facility/Infrastructure Coefficient in Sri Lanka, Assuming a DP prior can be explained by the fact that security forces in these countries may be successful at deterring terrorist attacks, although further measures need to be taken to deter other types of attacks. There are many extensions of this research that we hope will aid policymak- ers. For example, policy researchers could potentially use this type of model on a considerably more local level as different regions within a country will almost surely require different security measures. The Iraq troop surge of 2007, for example, was intended to quell violence in Baghdad as well as the Al-Anbar province [12]. A more micro analysis, perhaps at a provincial level, can help policymakers understand what exactly such security measures may be necessary. Another potential avenue for fu- ture research could be to estimate a similar Bayesian statistical model presented here and instead examine the impact of different types of weapons on the success of terrorist attacks. This information is available in the rich START database and 141 Fig. 5.7: Histogram of Posterior Sample for Hijacking Coefficient in Northern Ireland (be- fore 1998 GF Agreement), Assuming a Normally Distributed Prior could provide valuable information to policymakers. Methodologically, there is also a significant amount of future research that we hope that this study will encourage. In particular, four of the eight models in this study applied standard DP priors to weaken the typically restrictive parametric assumptions associated with Bayesian statistical models. An interesting avenue of future research is to apply distance-dependent DP priors, where the mechanism for clustering is guided based on distance between terrorist attacks [8]. Bayesian infer- ences from models of this nature could provide useful military advice for combating terrorist attacks in the future. There is also of course no a priori reason to restrict this type of modeling to counterterrorism policy as there are myriads of other applications of Bayesian models ranging in applied economics, business, and professional sports among oth- ers [1,3,72]. In this study, we also utilized the semiparametric DRE method to pro- 142 Fig. 5.8: Histogram of Posterior Sample for Hijacking Coefficient in Northern Ireland (be- fore 1998 GF Agreement), Assuming a DP prior vide more accurate estimation of several Bayesian credible intervals for our posterior coefficients. An interesting avenue of future research could be to use this approach to provide detailed estimates for posterior densities corresponding to other parametric and non-parametric models. Additionally, in this study, we only applied the DRE method to marginal posterior distributions. Another potentially interesting avenue for future research could be to apply this approach to develop Bayesian credible regions for multivariate posterior densities. Finally, we applied this approach to posterior samples, based on significantly larger samples generated by MCMC meth- ods. For high-dimensional models involving large data sets, adequately sampling a posterior density can be quite time consuming and computationally burdensome. Generalized direct sampling is a faster alternative to MCMC that generates inde- pendent samples and can more quickly navigate the posterior [10]. Although these independent samples are quite small compared to MCMC samples, associated sta- 143 Fig. 5.9: Histogram of Posterior Sample for Unarmed Assault Coefficient in Northern Ire- land (after 1998 GF Agreement), Assuming a Normally Distributed Prior tistical inferences based on such samples can be significantly improved upon by utilizing the semiparametric DRE method. 5.5 Appendix 5.5.1 Variable Definition Below are descriptions of the variables used in this chapter, as defined in the GTD Codebook [55]. Our dependent variable was whether the terrorist attack was successful. Dependent Variable Used in Model - Successful attack 144 Fig. 5.10: Histogram of Posterior Sample for Unarmed Assault Coefficient in Northern Ireland (after 1998 GF Agreement), Assuming a DP prior Success - “Success of a terrorist strike is defined according to the tangible effects of the attack. Success is not judged in terms of the larger goals of the perpetra- tors. For example, a bomb that exploded in a building would be counted as a success even if it did not succeed in bringing the building down or inducing government repression.” Please see descriptions under each of the individual explanatory variables below about how this definition is applied to each type of terrorist attack. Explanatory Variable Used in Model - Suicide Attack Suicide - “This variable is coded ‘Yes’ in those cases where there is evidence that the perpetrator did not intend to escape from the attack alive.” 145 Explanatory Variable Used in Model - Types of Terrorist Attacks Assassination - “An act whose primary objective is to kill one or more specific, prominent individuals. Usually carried out on persons of some note, such as high-ranking military officers, government officials, celebrities, etc. Not to include attacks on non-specific members of a targeted group. The killing of a police officer would be an armed assault unless there is reason to believe the attackers singled out a particularly prominent officer for assassination ... In order for an assassination to be successful, the target of the assassination must be killed. For example, even if an attack kills numerous people but not the target, it is an unsuccessful assassination.” Armed Assault - “An attack whose primary objective is to cause physical harm or death directly to human beings by use of a firearm, incendiary, or sharp in- strument (knife, etc.). Also included under this attack type would be CBRN (chemical, biological, radiological, nuclear) weapons. Not to include acts of a purely personal or criminal nature, or acts in which people are incidentally harmed in pursuit of another primary objective. Not to include attacks involv- ing the use of fists, rocks, sticks, or other handheld (less-than-lethal) weapons ... An armed assault is determined to be successful if the assault takes place and if a target is hit. Unsuccessful armed assaults are those in which the perpetrators attack and do not hit the target. An armed assault is also un- successful if the perpetrators are apprehended on their way to commit the assault. To make this determination, however, there must be information to indicate that an actual assault was imminent.” Bombing/Explosion - “An attack where the primary effects are caused by an en- ergetically unstable material undergoing rapid decomposition and releasing a pressure wave that causes physical damage to the surrounding environment. 146 Can include either high or low explosives (including a dirty bomb) but does not include a nuclear explosive device that releases energy from fission and/or fusion, or an incendiary device where decomposition takes place at a much slower rate ... A bombing is successful if the bomb or explosive device det- onates. Bombings are considered unsuccessful if they do not detonate. The success or failure of the bombing is not based on whether it hit the intended target.” Hijacking - “An act whose primary objective is to take control of a vehicle such as an aircraft, boat, bus, etc. for the purpose of diverting it to an unprogrammed destination, force the release of prisoners, or some other political objective. Obtaining payment of a ransom should not the sole purpose of a Hijacking, but can be one element of the incident so long as additional objectives have also been stated. Hijackings are distinct from Hostage Taking because the target is a vehicle, regardless of whether there are people/passengers in the vehicle ... A hijacking is successful if the hijackers assume control of the vehicle at any point, whereas a hijacking is unsuccessful if the hijackers fail to assume control of the vehicle. The success or failure of the hijacking is not based on whether the vehicle reached the intended destination of the hijackers.” Hostage Taking (Barricade Incident) - “An act whose primary objective is to take control of hostages for the purpose of achieving a political objective through concessions or through disruption of normal operations. Such attacks are distinguished from kidnapping since the incident occurs and usually plays out at the target location with little or no intention to hold the hostages for an extended period in a separate clandestine location ... A barricade incident is successful if the hostage takers assume control of the individuals at any point, whereas a barricade incident is unsuccessful if the hostage takers fail to assume 147 control of the individuals.” Hostage Taking (Kidnapping) - “An act whose primary objective is to take con- trol of hostages for the purpose of achieving a political objective through concessions or through disruption of normal operations. Kidnappings are dis- tinguished from Barricade Incidents (above) in that they involve moving and holding the hostages in another location ... A kidnapping is successful if the kidnappers assume control of the individuals at any point, whereas a kidnap- ping is unsuccessful if the kidnappers fail to assume control of the individuals.” Facility/Infrastructure - “An act, excluding the use of an explosive, whose primary objective is to cause damage to a non-human target, such as a building, mon- ument, train, pipeline, etc. Such attacks include arson and various forms of sabotage (e.g., sabotaging a train track is a Facility/Infrastructure, even if passengers are killed). Facility/Infrastructures can include acts which aim to harm an installation, yet also cause harm to people incidentally (e.g. an arson attack primarily aimed at damaging a building, but causes injuries or fatalities) ... A facility attack is determined to be successful if the facility is damaged. If the facility has not been damaged, then the attack is unsuccessful.” Unarmed Assault - “An attack whose primary objective is to cause physical harm or death directly to human beings by any means other than explosive, firearm, incendiary, or sharp instrument (knife, etc.) ... An unarmed assault is deter- mined to be successful there is a victim that who has been injured. Unarmed assaults that are unsuccessful are those in which the perpetrators do not in- jure anyone ... An unarmed assault is also unsuccessful if the perpetrators are apprehended when on their way to commit the assault. To make this deter- mination, however, there must be information to indicate that an assault was imminent.” 148 Explanatory Variable Used in Model - Targets of Terrorist Attacks Business - “Businesses are defined as individuals or organizations engaged in com- mercial or mercantile activity as a means of livelihood. Any attack on a busi- ness or private citizens patronizing a business such as a restaurant, gas station, music store, bar, caf, etc. This includes attacks carried out against corporate offices or employees of firms like mining companies, or oil corporations. Fur- thermore, includes attacks conducted on business people or corporate officers. Included in this value as well are hospitals and chambers of commerce and cooperatives. Does not include attacks carried out in public or quasi-public areas such as business district or commercial area, (these attacks are captured under Private Citizens and Property, see below.)” Government (General) - “Any attack on a government building; government mem- ber, former members, including members of political parties in official capac- ities, their convoys, or events sponsored by political parties; political move- ments; or a government sponsored institution where the attack is expressly carried out to harm the government. This value includes attacks on judges, public attorneys (e.g., prosecutors), courts and court systems, politicians, roy- alty, head of state, government employees (unless police or military), election- related attacks, intelligence agencies and spies, or family members of govern- ment officials when the relationship is relevant to the motive of the attack.” Police - “This value includes attacks on members of the police force or police in- stallations; this includes police boxes, patrols headquarters, academies, cars, checkpoints, etc. Includes attacks against jails or prison facilities, or jail or prison staff or guards.” Military - “Includes attacks against army units, patrols, barracks, and convoys, 149 jeeps, etc. Also includes attacks on recruiting sites, and soldiers engaged in internal policing functions such as at checkpoints and in anti-narcotics activi- ties. Excludes attacks against non-state militias and guerrillas, these types of attacks are coded as Terrorist/Non-state Militias see below.” Abortion Related - “Attacks on abortion clinics, employees, patrons, or security personnel stationed at clinics.” Airports and Airlines - “An attack that was carried out either against an airplane or against an airport. Attacks against airline employees while on board are also included in this value. Includes attacks conducted against airport business offices and executives. Attacks where airplanes were used to carry out the attack (such as three of the four 9/11 attacks) are not included.” Government (Diplomatic) - “Attacks carried out against foreign missions, includ- ing embassies, consulates, etc. This value includes cultural centers that have diplomatic functions, and attacks against diplomatic staff and their families (when the relationship is relevant to the motive of the attack) and property. The United Nations is a diplomatic target.” Educational Institution - “Attacks against schools, teachers, or guards protecting school sites. Includes attacks against university professors, teaching staff and school buses. Moreover, includes attacks against religious schools in this value. As noted below in the Private Citizens and Property value, the GTD has sev- eral attacks against students. If attacks involving students are not expressly against a school, university or other educational institution or are carried out in an educational setting, they are coded as private citizens and property. Ex- cludes attacks against military schools (attacks on military schools are coded as Military, see below).” 150 Food or Water Supply - “Attacks on food or water supplies or reserves are included in this value. This generally includes attacks aimed at the infrastructure re- lated to food and water for human consumption.” Journalists and Media - “Includes, attacks on reporters, news assistants, photogra- phers, publishers, as well as attacks on media headquarters and offices. Attacks on transmission facilities such as antennae or transmission towers, or broadcast infrastructure are coded as Telecommunications, see below.” Maritime - “Implies civilian maritime. Includes attacks against fishing ships, oil tankers, ferries, yachts, etc. (Attacks on fishermen are coded as Private Citi- zens and Property, see below).” NGO - “Includes attacks on offices and employees of non-governmental organiza- tions (NGOs). NGOs here include large multinational non-governmental or- ganizations such as the Red Cross and Doctors without Borders. Does not include labor unions, social clubs, student groups, and other non-NGO (such cases are coded as Other, see immediately below).” Other - “This value includes acts of terrorism committed against targets which do not fit into other categories.” Private Citizens and Property - “This value includes attacks on individuals, the public in general or attacks in public areas including markets, commercial streets, busy intersections and pedestrian malls. Also includes ambiguous cases where the target/victim was a named individual, or where the target/victim of an attack could be identified by name, age, occupation, gender or nationality. This value also includes ceremonial events, such as weddings and funerals. The GTD contains a number of attacks against students. If these attacks are not expressly against a school, university or other educational institution or 151 are not carried out in an educational setting, these attacks are coded using this value. Also, includes incidents involving political supporters as private citizens and property, provided that these supporters are not part of a government- sponsored event. Finally, this value includes police informers. Does not include attacks causing civilian casualties in businesses such as restaurants, cafes or movie theaters (these categories are coded as Business see above).” Religious Figures and Institutions - “This value includes attacks on religious lead- ers, (Imams, priests, bishops, etc.), religious institutions (mosques, churches), religious places or objects (shrines, relics, etc.). This value also includes at- tacks on organizations that are affiliated with religious entities that are not NGOs, businesses or schools. Attacks on religious pilgrims are considered Pri- vate Citizens and Property; attacks on missionaries are considered religious figures.” Telecommunication - “This includes attacks on facilities and infrastructure for the transmission of information. More specifically this value includes things like cell phone towers, telephone booths, television transmitters, radio, and mi- crowave towers.” Terrorists/Non-State Militias - “Terrorists or members of identified terrorist groups within the GTD are included in this value. Membership is broadly defined and includes informants for terrorist groups, but excludes former or surrendered terrorists. This value also includes cases involving the targeting of militias and guerillas.” Tourists - “This value includes the targeting of tour buses, tourists, or ‘tours.’ Tourists are persons who travel primarily for the purposes of leisure or amuse- ment. Government tourist offices are included in this value. The attack must 152 clearly target tourists, not just an assault on a business or transportation system used by tourists. Travel agencies are coded as business targets. Transportation (Other than aviation) - “Attacks on public transportation systems are included in this value. This can include efforts to assault public buses, minibuses, trains, metro/subways, highways (if the highway itself is the target of the attack), bridges, roads, etc. The GTD contains a number of attacks on generic terms such as cars or vehicles. These attacks are assumed to be against Private Citizens and Property unless shown to be against public transportation systems. In this regard, buses are assumed to be public transportation unless otherwise noted.” Unknown - “The target type cannot be determined from the available information.” Utilities - “This value pertains to facilities for the transmission or generation of energy. For example, power lines, oil pipelines, electrical transformers, high tension lines, gas and electric substations, are all included in this value. This value also includes lampposts or street lights. Attacks on officers, employees or facilities of utility companies excluding the type of faculties above are coded as business.” Violent Political Parties - “This value pertains to entities that are both political parties (and thus, coded as ’government’ in this coding scheme) and terrorists. It is operationally defined as groups that engage in electoral politics and appear as Perpetrators in the GTD.” 153 Chapter 6: Conclusions and Future Research - Where do we go from here? This dissertation has offered a number of improvements to Bayesian statistical methodologies alongside a number of contributions to public policy research. There are still, however, many other important questions future research should look at. In particular, the tilt functions used in the semiparametric density ratio models called upon in this thesis as well as in other research are chosen a priori by the researcher. Future research could look into a systematic manner, perhaps via variational calcu- lus, of determining the optimal tilt functions. There are also many other statistical methodologies that warrant future research including improvements to other non- MCMC based Bayesian estimation techniques, such as variational Bayesian meth- ods, particle filtering, and generalized direct sampling [6,10,87]. Fast alternatives to MCMC methods are of utmost importance to researchers as the need for analyzing large data sets has grown tremendously in recent years in a number of applied fields. Additionally, there are myriads of other questions beyond the ones looked at here that could benefit from rigorous analysis, including questions regarding edu- cation policy, energy policy, immigration policy, health policy, and macroeconomic modeling. Methodologies similar to those utilized in this dissertation, as well as many of the techniques mentioned above, could be particularly useful in looking at these questions and others. We emphasize, however, that the ideas presented in this dissertation are not applicable just to public policy research. Regardless of the application, it is always important to be able to properly model real-world phenomena. We hope that this dissertation provides a number of useful tools for applied statisticians to use. 155 Bibliography [1] Albright, S. C. A statistical analysis of hitting streaks in baseball. Journal of the American Statistical Association 88, 424 (1993), 1175–1183. [2] Allanach, J., Tu, H., Singh, S., Willett, P., and Pattipati, K. Detecting, tracking, and counteracting terrorist networks via hidden Markov models. In Aerospace Conference, 2004. Proceedings. 2004 IEEE (2004), vol. 5, IEEE. [3] Ansari, A., and Mela, C. F. E-customization. Journal of Marketing Research 40, 2 (2003), 131–145. [4] Antoniak, C. E., et al. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics 2, 6 (1974), 1152–1174. [5] Authorization for Use of Military Force Against Iraq Resolu- tion of 2002. Public Law 107-243, 116 Stat. 1498, 2002. [6] Beal, M. J. Variational algorithms for approximate Bayesian inference. Ph.D. Dissertation, University of London, 2003. [7] Bender, R., and Grouven, U. Using binary logistic regression models for ordinal data with non-proportional odds. Journal of Clinical Epidemiology 51, 10 (1998), 809–816. [8] Blei, D. M., and Frazier, P. I. Distance dependent Chinese restaurant processes. The Journal of Machine Learning Research 12 (2011), 2461–2488. [9] Bradlow, E. T., Hardie, B. G., and Fader, P. S. Bayesian inference for the negative binomial distribution via polynomial expansions. Journal of Computational and Graphical Statistics 11, 1 (2002). [10] Braun, M., and Damien, P. Generalized direct sampling for hierarchical Bayesian models. arXiv preprint arXiv:1108.2245 (2011). [11] Bucklin, R. E., and Gupta, S. Brand choice, purchase incidence, and seg- mentation: An integrated modeling approach. Journal of Marketing Research 29, 2 (1992). [12] Bush, G. W. President’s address to the nation. http: //georgewbush-whitehouse.archives.gov/news/releases/2007/01/ 20070110-7.html, January 2007. [Online; accessed April 17, 2014]. [13] Capretta, J., and Dayaratna, K. D. Compelling evidence makes the case for a market-driven health care system. Heritage Foundation Back- grounder, 2867 (2013). [14] Castles, F. G., and Leibfried, S. The Oxford Handbook of the Welfare State. Oxford Handbooks Online, 2010. [15] Chenoweth, E. Terrorism and democracy. Annual Review of Political Sci- ence 16 (2013), 355–378. [16] Clinton, B. My Life. Random House LLC, 2005. [17] Congressional Budget Office. The effects of tort reform: Evidence from the states. A CBO Paper (June 2004). [18] Cornyn, J., and Meese, E. Health care and medical malpractice reform: The necessity of reform in the current debate. Heritage Foundation Lecture 1142 . [19] Crain, N., Crain, M., McQuillan, L. J., and Abramyan, H. Tort law tally: How state tort reforms affect tort losses and tort insurance premiums. Pacific Research Institute, San Francisco, CA (2009). [20] Dayaratna, K. D. Studies show: Medicaid patients have worse access and outcomes than the privately insured. Heritage Foundation Backgrounder, 2740 (2012). [21] Dayaratna, K. D. Competitive markets in health care: The next revolution. Heritage Foundation Backgrounder, 2833 (2013). [22] Dayaratna, K. D., and Kannan, P. K. A mathematical reformulation of the reference price. Marketing Letters 23, 3 (2012), 839–849. [23] Dayaratna, K. D., and Miller, S. J. First order approximations of the Pythagorean won-loss formula for predicting MLB teams’ winning per- centages. By The Numbers: The Newsletter of the SABR Statistical Analysis Committee 22, 1 (2012), 15–19. [24] Dayaratna, K. D., and Miller, S. J. The Pythagorean won-loss formula and hockey: A statistical justification for using the classic baseball formula as an evaluative tool in hockey. The Hockey Research Journal: A Publication of the Society for International Hockey Research (2013), 193–209. [25] Department of Health and Human Services, Administration for Children and Families, Office of Family Assistance. Temporary Assistance for Needy Families Program: Eighth annual report to Congress. 157 [26] Department of Health and Human Services, Administration for Children and Families, Office of Family Assistance. Temporary Assistance for Needy Families Program: Fifth annual report to Congress. [27] Department of Health and Human Services, Administration for Children and Families, Office of Family Assistance. Temporary Assistance for Needy Families Program: Fourth annual report to Congress. [28] Department of Health and Human Services, Administration for Children and Families, Office of Family Assistance. Temporary Assistance for Needy Families Program: Ninth annual report to Congress. [29] Department of Health and Human Services, Administration for Children and Families, Office of Family Assistance. Temporary Assistance for Needy Families Program: Second annual report to Congress. [30] Department of Health and Human Services, Administration for Children and Families, Office of Family Assistance. Temporary Assistance for Needy Families Program: Seventh annual report to Congress. [31] Department of Health and Human Services, Administration for Children and Families, Office of Family Assistance. Temporary Assistance for Needy Families Program: Sixth annual report to Congress. [32] Department of Health and Human Services, Administration for Children and Families, Office of Family Assistance. Temporary Assistance for Needy Families Program: Third annual report to Congress. [33] Department of Health and Human Services, Administration for Children and Families, Office of Family Assistance. Acf archives. http://archive.acf.hhs.gov/programs/ofa/data-reports/ afdc/rpt3800/3800index.htm, 1994-1997. [Online; accessed April 17, 2014]. [34] Drehle, D. V., and Smith, J. R. U.S. strikes Iraq for plot to kill Bush. The Washington Post (June 1993). [35] Enders, W. Terrorism: An empirical analysis. Handbook of Defense Eco- nomics 2 (2007), 815–866. [36] Enders, W., and Sandler, T. Terrorism: Theory and applications. Hand- book of Defense Economics 1 (1995), 213–249. [37] Escobar, M. D., and West, M. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association 90, 430 (1995), 577–588. [38] Esposito, J. L. Unholy War: Terror in the Name of Islam. Oxford Univer- sity Press, 2003. 158 [39] Everson, P. J., and Bradlow, E. T. Bayesian inference for the beta- binomial distribution via polynomial expansions. Journal of Computational and Graphical Statistics 11, 1 (2002). [40] Ezell, B. C., Bennett, S. P., Von Winterfeldt, D., Sokolowski, J., and Collins, A. J. Probabilistic risk analysis and terrorism risk. Risk Analysis 30, 4 (2010), 575–589. [41] Ferguson, T. S. A Bayesian analysis of some nonparametric problems. The Annals of Statistics (1973), 209–230. [42] Fienberg, S. E. Bayesian models and methods in public policy and govern- ment settings. Statistical Science 26, 2 (2011), 212–226. [43] Fokianos, K. Merging information for semiparametric density estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66, 4 (2004), 941–958. [44] Fokianos, K., Kedem, B., Qin, J., and Short, D. A. A semiparametric approach to the one-way layout. Technometrics 43, 1 (2001). [45] Fokianos, K., and Qin, J. A note on Monte Carlo maximization by the density ratio model. Journal of Statistical Theory and Practice 2, 3 (2008), 355–367. [46] Garner, B. A., and Black, H. C. Black’s Law Dictionary. St. Paul, MN: Thomson/West, 2004. [47] Geisser, S., and Eddy, W. F. A predictive approach to model selection. Journal of the American Statistical Association 74, 365 (1979), 153–160. [48] Gelfand, A. E. Model determination using sampling-based methods. In Markov chain Monte Carlo in practice. Springer, 1996, pp. 145–161. [49] Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association 85, 412 (1990), 972–985. [50] Gelfand, A. E., and Smith, A. F. Sampling-based approaches to calcu- lating marginal densities. Journal of the American Statistical Association 85, 410 (1990), 398–409. [51] Gelman, A., and Rubin, D. B. Inference from iterative simulation using multiple sequences. Statistical Science (1992), 457–472. [52] Gilbert, P. B., Lele, S. R., and Vardi, Y. Maximum likelihood estima- tion in semiparametric selection bias models with application to AIDS vaccine trials. Biometrika 86, 1 (1999), 27–43. 159 [53] Gingrich, N. Real Change: From the World that Fails to the World that Works. Regnery Publishing, 2008. [54] Global Terrorism Database [Data file]. National Consortium for the Study of Terrorism and Responses to Terrorism (START). 2011. [55] Global Terrorism Database [Data file]. START GTD Database Code- book: Inclusion Criteria and Variables, October 2012. [56] Greenspan, A. The Map and the Territory: Risk, Human Nature, and the Future of Forecasting. Penguin, 2013. [57] Grimmer, J. An introduction to Bayesian inference via variational approxi- mations. Political Analysis 19, 1 (2011), 32–47. [58] Guadagni, P. M., and Little, J. D. A logit model of brand choice calibrated on scanner data. Marketing Science 2, 3 (1983), 203–238. [59] Guadagni, P. M., and Little, J. D. When and what to buy: A nested logit model of coffee purchase. Journal of Forecasting 17, 3-4 (1998), 303–326. [60] Gupta, S., and Chintagunta, P. K. On using demographic variables to determine segment membership in logit mixture models. Journal of Marketing Research (1994), 128–136. [61] Haskins, R., and Greenberg, M. Welfare Reform: Success or Failure? Policy and Practice 64, 1 (2006), 10. [62] Hawkes, A. G. Spectra of some self-exciting and mutually exciting point processes. Biometrika 58, 1 (1971), 83–90. [63] Hederman, R. S., and Rector, R. The Role of Parental Work in Child Poverty. Heritage Foundation Center for Data Analysis Report 03, 01 (2003). [64] Hiltermann, J. R. A Poisonous Affair: America, Iraq, and the Gassing of Halabja. Cambridge University Press, 2007. [65] Holden, R. T. The contagiousness of aircraft hijacking. American Journal of Sociology (1986), 874–904. [66] Holden, R. T. Time series analysis of a contagious process. Journal of the American Statistical Association 82, 400 (1987), 1019–1026. [67] Hymowitz, K., Carroll, J., Wilcox, W., and Kaye, K. Knot Yet: The Benefits and Costs of Delayed Marriage in America. http: //twentysomethingmarriage.org/, 2013. [Online; accessed April 17, 2014]. [68] Iraq Liberation Act of 1998. Public Law 105–338, 112 Stat. 3178, 1998. 160 [69] Jackson, O. A. The impact of the 9/11 terrorist attacks on the US economy. Journal of 9/11 Studies 20 (2008), 1–27. [70] Jara, A., Hanson, T. E., Quintana, F. A., Mu¨ller, P., and Rosner, G. L. DPpackage: Bayesian semi and nonparametric modeling in R. Journal of Statistical Software 40, 5 (2011), 1. [71] Jha, M. K. Dynamic Bayesian network for predicting the likelihood of a terrorist attack at critical transportation infrastructure facilities. Journal of Infrastructure Systems 15, 1 (2009), 31–39. [72] Jochmann, M., and Leon-Gonzalez, R. Estimating the demand for health care with panel data: A semiparametric Bayesian approach. Health Economics 13, 10 (2004), 1003–1014. [73] Jones, A. Should tort reform be part of health care reform? A “liberal” thinks so. The Wall Street Journal (May 2010). [74] Kamakura, W. A., and Russell, G. J. A probabilistic choice model for market segmentation and elasticity structure. Journal of Marketing Research (1989), 379–390. [75] Kaplan, E. H., Mintz, A., and Mishal, S. Tactical prevention of suicide bombings in Israel. Interfaces 36, 6 (2006), 553–561. [76] Kaplan, E. H., Mintz, A., Mishal, S., and Samban, C. What happened to suicide bombings in Israel? Insights from a terror stock model. Studies in Conflict & Terrorism 28, 3 (2005), 225–235. [77] Kaplan, S. Applying the general theory of quantitative risk assessment (QRA) to terrorism risk. Risk-Based Decision Making (2002), 77–81. [78] Katzoff, M., Zhou, W., Khan, D., Lu, G., and Kedem, B. Out of sample fusion in risk prediction. Journal of Statistical Theory and Practice Forthcoming (2013). [79] Kean, T. The 9/11 commission report: Final report of the national commis- sion on terrorist attacks upon the United States. Government Printing Office, 2011. [80] Kedem, B., Kim, E.-y., Voulgaraki, A., and Graubard, B. I. Two- dimensional semiparametric density ratio modeling of testicular germ cell data. Statistics in Medicine 28, 16 (2009), 2147–2159. [81] Kedem, B., Lu, G., Wei, R., and Williams, P. D. Forecasting mortality rates via density ratio modeling. Canadian Journal of Statistics 36, 2 (2008), 193–206. 161 [82] Kedem, B., Pan, L., and Zhou, W. Out of sample fusion: A Monte Carlo method. Proceedings of the 2014 IEEE Conference on Computational Science and Computational Intelligence (2014), 364–367. [83] Kedem, B., and Wen, S. Semi-parametric cluster detection. Journal of Statistical Theory and Practice 1, 1 (2007), 49–72. [84] Kennedy, D. M., Cohen, L., and Bailey, T. A. The American Pageant. Houghton Mifflin Harcourt, 2005. [85] Kessler, D., and McClellan, M. Do doctors practice defensive medicine? The Quarterly Journal of Economics 111, 2 (1996), 353–390. [86] Kessler, D. P., Sage, W. M., and Becker, D. J. Impact of malpractice reforms on the supply of physician services. Journal of the American Medical Association 293, 21 (2005), 2618–2625. [87] Khan, Z., Balch, T., and Dellaert, F. MCMC-based particle filtering for tracking a variable number of interacting targets. Pattern Analysis and Machine Intelligence, IEEE Transactions on 27, 11 (2005), 1805–1819. [88] Kottas, A., Duan, J. A., and Gelfand, A. E. Modeling disease inci- dence data with spatial and spatio temporal Dirichlet process mixtures. Bio- metrical Journal 50, 1 (2008), 29–42. [89] Kydd, A. H., and Walter, B. F. The strategies of terrorism. International Security 31, 1 (2006), 49–80. [90] Kyung, M., Gill, J., and Casella, G. New findings from terrorism data: Dirichlet process random-effects models for latent groups. Journal of the Royal Statistical Society: Series C (Applied Statistics) 60, 5 (2011), 701–721. [91] Lazar, N. A. Bayesian empirical likelihood. Biometrika 90, 2 (2003), 319– 326. [92] Lenk, P. J., and DeSarbo, W. S. Bayesian inference for finite mixtures of generalized linear models with random effects. Psychometrika 65, 1 (2000), 93–119. [93] Levy, R. Do’s and dont’s of tort reform. Cato Institute Commentary (May 2005). [94] Lewis, E., Mohler, G., Brantingham, P. J., and Bertozzi, A. L. Self-exciting point process models of civilian deaths in Iraq. Security Journal 25, 3 (2012), 244–264. [95] Li, Q., and Schaub, D. Economic globalization and transnational terrorism: A pooled time-series analysis. Journal of Conflict Resolution 48, 2 (2004), 230–258. 162 [96] Lu, G. Asymptotic theory for multiple-sample semiparametric density ratio models and its application to mortality forecasting. Ph.D. Dissertation, Uni- versity of Maryland, 2007. [97] Lunn, D. J., Thomas, A., Best, N., and Spiegelhalter, D. WinBUGS - A Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing 10, 4 (2000), 325–337. [98] MacEachern, S. N., and Mu¨ller, P. Estimating mixture of Dirichlet process models. Journal of Computational and Graphical Statistics 7, 2 (1998), 223–238. [99] McFadden, D. Conditional logit analysis of qualitative choice behavior. Frontiers in Econometrics (1973), 105–142. [100] McFadden, D., and Train, K. Mixed MNL models for discrete response. Journal of Applied Econometrics 15, 5 (2000), 447–470. [101] McGeough, P. Witness won’t let Saddam intimidate him. The Sydney Morning Herald (December 2005). [102] McKittrick, D., and McVea, D. Making Sense of the Troubles: The Story of the Conflict in Northern Ireland. New Amsterdam Books, 2002. [103] McQuillan, L. J. Jackpot Justice: The True Cost of America’s Tort System. Pacific Research Institute, 2007. [104] McQuillan, L. J., and Abramyan, H. U.S. tort liability index: 2010 report. Pacific Research Institute, San Francisco, CA (2010). [105] McShane, B., Adrian, M., Bradlow, E. T., and Fader, P. S. Count models based on Weibull interarrival times. Journal of Business & Economic Statistics 26, 3 (2008). [106] Mengersen, K. L., Pudlo, P., and Robert, C. P. Bayesian computa- tion via empirical likelihood. Proceedings of the National Academy of Sciences 110, 4 (2013), 1321–1326. [107] Miller, S. J. A derivation of the Pythagorean Won-Loss Formula in baseball. Chance 20, 1 (2007), 40–48. [108] Miller, S. J., Bradlow, E. T., and Dayaratna, K. Closed-form Bayesian inferences for the logit model via polynomial expansions. Quan- titative Marketing and Economics 4, 2 (2006), 173–206. [109] Miller, S. J., and Takloo-Bighash, R. An Invitation to Modern Number Theory, vol. 13. Princeton University Press, 2006. [110] Morris, C. N. Parametric empirical Bayes inference: theory and applica- tions. Journal of the American Statistical Association 78, 381 (1983), 47–55. 163 [111] Murray, C. Losing Ground: American Social Policy, 1950-1980. Basic books, 2008. [112] Neal, R. M. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9, 2 (2000), 249– 265. [113] Nguyen, V.-A., Boyd-Graber, J., Resnik, P., Cai, D. A., Midberry, J. E., and Wang, Y. Modeling topic control to detect influence in conver- sations using nonparametric topic models. Machine Learning (2013), 1–41. [114] Northern Ireland Peace Agreement (The Good Friday Agree- ment). http://peacemaker.un.org/sites/peacemaker.un.org/files/ IE%20GB_980410_Northern%20Ireland%20Agreement.pdf, April 1998. [On- line; accessed April 19, 2014]. [115] Ntzoufras, I. Bayesian Modeling using WinBUGS, vol. 698. John Wiley & Sons, 2011. [116] Owen, A. B. Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 2 (1988), 237–249. [117] Pate´-Cornell, E. Fusion of intelligence information: A Bayesian approach. Risk Analysis 22, 3 (2002), 445–454. [118] Personal Responsibility and Work Opportunity Reconciliation Act of 1996. Public Law 104–193, 110 Stat. 2105, 1996. [119] Phue, J.-N., Kedem, B., Jaluria, P., and Shiloach, J. Evaluating microarrays using a semiparametric approach: Application to the central car- bon metabolism of Escherichia coli BL21 and JM109. Genomics 89, 2 (2007), 300–305. [120] Popp, R. L., and Yen, J. Emergent Information Technologies and Enabling Policies for Counter-terrorism, vol. 6. John Wiley & Sons, 2006. [121] Porter, M. D., White, G., et al. Self-exciting hurdle models for terrorist activity. The Annals of Applied Statistics 6, 1 (2012), 106–124. [122] Prentice, R. L., and Pyke, R. Logistic disease incidence models and case-control studies. Biometrika 66, 3 (1979), 403–411. [123] Qin, J., and Lawless, J. Estimating equations, empirical likelihood and constraints on parameters. Canadian Journal of Statistics 23, 2 (1995), 145– 159. [124] Qin, J., and Zhang, B. A goodness-of-fit test for logistic regression models based on case-control data. Biometrika 84, 3 (1997), 609–618. 164 [125] Qin, J., and Zhang, B. Density estimation under a two-sample semipara- metric model. Nonparametric Statistics 17, 6 (2005), 665–683. [126] Rapoport, D. C. Fear and trembling: Terrorism in three religious traditions. The American Political Science Review (1984), 658–677. [127] Reagan, R. A Time for Choosing: The Speeches of Ronald Reagan, 1961- 1982. Gateway Books, 1983. [128] Rector, R. Undermining true welfare reform. Heritage Foundation Com- mentary 14, 3, 283–298. [129] Rector, R. Welfare reform, dependency reduction, and labor market entry. Journal of Labor Research 14, 3 (1993), 283–297. [130] Rector, R. Marriage: America’s greatest weapon against child poverty. Heritage Foundation Backgrounder, 2465 (2010). [131] Rector, R., Bradley, K., and Sheffield, R. Obama to spent $10.3 trillion on welfare. Heritage Foundation Special Report SR, 67 (2009). [132] Rector, R., and Fagan, P. The continuing good news about welfare reform. Heritage Foundation Backgrounder, 1620 (2003). [133] Rector, R., and Johnson, K. Roles of couples’ relationship skills and fathers’ employment in encouraging marriage. Heritage Foundation Center for Data Analysis Report 04, 14 (2004). [134] Rector, R., Johnson, K., and Noyes, L. R. Teenage sexual abstinence and academic achievement. Heritage Foundation Center for Data Analysis Report 03, 04 (2003). [135] Rector, R., and Marshall, J. The unfinished work of welfare reform. National Affairs (2013). [136] Rector, R., and Sheffield, R. Understanding poverty in the United States: Surprising facts about America’s poor. Heritage Foundation Back- grounder, 2607 (2011). [137] Rector, R., and Sheffield, R. Ending work for welfare: An overview. Heritage Foundation Issue Brief, 3712 (2012). [138] Revelt, D., and Train, K. Mixed logit with repeated choices: households’ choices of appliance efficiency level. Review of Economics and Statistics 80, 4 (1998), 647–657. [139] Rivlin, A. Curing health care: the next president should complete, not abandon, Obama’s reform. Brookings Institution Campaign 2012 Policy Brief (May 2012). 165 [140] Rossi, P. E., and Allenby, G. M. Bayesian Statistics and Marketing. Marketing Science 22, 3 (2003), 304–328. [141] Rossi, P. E., Allenby, G. M., and McCulloch, R. E. Bayesian Statis- tics and Marketing. J. Wiley & Sons, 2005. [142] Rubin, P. H., and Shepherd, J. M. Tort reform and accidental deaths. Journal of Law and Economics 50, 2 (2007), 221–238. [143] Sanchez-Cuenca, I., and De la Calle, L. Domestic terrorism: The hidden side of political violence. Annual Review of Political Science 12 (2009), 31–49. [144] Sandler, T., et al. Terrorism & game theory. Simulation & Gaming 34, 3 (2003), 319–337. [145] Sawhill, I. V., and Haskins, R. Work and Marriage: The Way to End Poverty and Welfare. Welfare Reform & Beyond Initiative, Brookings Insti- tution, 2003. [146] Schennach, S. M. Bayesian exponentially tilted empirical likelihood. Biometrika 92, 1 (2005), 31–46. [147] Schmittlein, D. C., Morrison, D. G., and Colombo, R. Counting your customers: Who-are they and what will they do next? Management Science 33, 1 (1987), 1–24. [148] Seifert, K. R., and McCauley, C. Suicide bombers in Iraq, 2003–2010: disaggregating targets can reveal insurgent motives and priorities. Terrorism and Political Violence Forthcoming . [149] Shiryaev, A. N. Probability. Springer-Verlag, New York, 1996. [150] Smith, K. R., and Zick, C. D. The incidence of poverty among the recently widowed: Mediating factors in the life course. Journal of Marriage and the Family (1986), 619–630. [151] Studdert, D. M., Mello, M. M., Sage, W. M., DesRoches, C. M., Peugh, J., Zapert, K., and Brennan, T. A. Defensive medicine among high-risk specialist physicians in a volatile malpractice environment. Journal of the American Medical Association 293, 21 (2005), 2609–2617. [152] Tanner, M., and DeHaven, T. TANF and federal welfare. Downsizing the Federal Government (September 2010). [153] Tenet, G., and Harlow, B. At the Center of the Storm: My Years at the CIA. HarperPress, 2007. [154] Tierney, L. Markov chains for exploring posterior distributions. The Annals of Statistics (1994), 1701–1728. 166 [155] Tranchita, C., Hadjsaid, N., and Torres, A. Ranking contingency resulting from terrorism by utilization of Bayesian networks. In Electrotechni- cal Conference, 2006. MELECON 2006. IEEE Mediterranean (2006), IEEE, pp. 964–967. [156] Unicon Research Corporation [producer and distributor of CPS Utilities]. Current Population Surveys, March 1962-2009 [machine-readable data files]/conducted by the Bureau of the Census for the Bureau of Labor Statistics. Washington: Bureau of the Census [producer and distributor]. Los Angeles, CA., 2009. [157] US Department of Health and Human Services and others. Quar- terly Public Assistance Statistics. Washington DC: HHS, Administration on Children and Families (1982-1993). [158] US Department of State Office of the Coordinator for Coun- terterrorism. Country Reports on Terrorism 2004. http://www.state. gov/documents/organization/45313.pdf, April 2005. [Online; accessed April 17, 2014]. [159] Voulgaraki, A. Semiparametric Regression and Mortality Rate Prediction. Ph.D. Dissertation, University of Maryland, 2011. [160] Voulgaraki, A., Kedem, B., Graubard, B. I., et al. Semiparametric regression in testicular germ cell data. The Annals of Applied Statistics 6, 3 (2012), 1185–1208. [161] Weaver, M., and Chamberlain, G. Sri Lanka declares end to war with Tamil Tigers. The Guardian 19 (2009). [162] White, G., and Porter, M. D. GPU accelerated MCMC for modeling terrorist activity. Computational Statistics & Data Analysis 71 (2014), 643– 651. [163] White, G., Porter, M. D., and Mazerolle, L. Terrorism Risk, Re- silience and Volatility: A Comparison of Terrorism Patterns in Three South- east Asian Countries. Journal of Quantitative Criminology 29, 2 (2013), 295– 320. [164] Wisner, D. H. A stepwise logistic regression analysis of factors affecting morbidity and mortality after thoracic trauma: effect of epidural analgesia. Journal of Trauma-Injury, Infection, and Critical Care 30, 7 (1990), 799–805. [165] Yang, Y., He, X., et al. Bayesian empirical likelihood for quantile regres- sion. The Annals of Statistics 40, 2 (2012), 1102–1131. [166] Young, J. K., and Findley, M. G. Promise and pitfalls of terrorism research. International Studies Review 13, 3 (2011), 411–431. 167 [167] Zhang, J., and Kai, F. Y. What’s the relative risk?: A method of correcting the odds ratio in cohort studies of common outcomes. Journal of the American Medical Association 280, 19 (1998), 1690–1691. [168] Zhou, W. Out of Sample Fusion. Ph.D. Dissertation, University of Maryland, 2012. 168