ABSTRACT

Title of dissertation: A FRAMEWORK FOR DETECTING AND DIAGNOSING CONFIGURATION FAULTS IN WEB APPLICATIONS

Cyntrica N. Eaton, Doctor of Philosophy, 2007

Dissertation directed by: Professor Atif Memon, Department of Computer Science

Software portability is a key concern when target operational environments are highly configurable; variations in configuration settings can significantly impact software correctness. While portability is key for a wide range of software types, it is a significant challenge in web application development. The client configuration used to navigate and interact with web content is known to be an important factor in the subsequent quality of deployed web applications. With the widespread use of diverse, heterogeneous web client configurations, the results of web application deployment can vary unpredictably among users. Given existing approaches and limited development resources, attempting to develop web applications that are viewable, functional, and portable for the vast web configuration space is a significant undertaking. As a result, faults that only surface in precise configurations, termed configuration faults, have the potential to escape detection until web applications are fielded.

This dissertation presents an automated, model-based framework that uses static analysis to detect and diagnose web configuration faults. This approach overcomes the limitations of current techniques by featuring an extensible model of the configuration space that enables efficient portability analysis across the vast array of client environments. The basic idea behind this approach is that source code fragments (i.e., HTML tags and CSS rules) embedded in web application source code adversely impact portability of web applications when they are unsupported in target client configurations; without proper support, the source code is either processed incorrectly or ignored, resulting in configuration faults. Using static analysis, configuration fault detection is performed by applying a model of the web application source against knowledge of support criteria; any unsupported source code detected is considered an index to potential configuration faults. In the effort to fully exploit this approach, improve practicality, and maximize fault detection efficiency, manual and automated approaches to knowledge acquisition have been implemented, variations of web application and client support knowledge models have been investigated, and visualization of configuration fault detection results has been explored. To optimize the automated acquisition of support knowledge, alternate learning strategies have been empirically investigated and provisions for capturing tag interaction have been integrated into the process.

A FRAMEWORK FOR DETECTING AND DIAGNOSING CONFIGURATION FAULTS IN WEB APPLICATIONS

by

Cyntrica N. Eaton

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2007

Advisory Committee:
Professor Atif Memon, Chair/Advisor
Professor Rance Cleaveland
Professor William Gasarch
Professor Brian Hunt
Professor Vibha Sazawal

Acknowledgments

I have the utmost appreciation, respect, and admiration for my advisor, Dr. Atif Memon, and I am truly grateful that I had the opportunity to work with him. His patience and guidance were key in this experience, and I can never thank him enough for his open door, open ear, and well-placed pep talks.
I would like to take this opportunity to thank my preliminary and final examination committee members, Dr. Atif Memon, Dr. Rance Cleaveland, Dr. William Gasarch, Dr. Brian Hunt, and Dr. Vibha Sazawal, for their feedback and suggestions. I appreciate the time and effort you devoted to reading my drafts and helping me improve my work.

Where would I be without my Mom, Cynthia Eaton, my Dad, Rickey Eaton, and my maternal Grandmother, Blondell Hardie! You guys will never know the extent of my gratitude for raising me in a loving environment, providing me with all I need to thrive, and most importantly, for allowing me the freedom to plot my own path. Thanks for all of your support over the years.

A special thanks goes out to my sister-friends Chakeita Jackson, Tracey Taylor, Erika Thompson, and Irvinia Jackson. I truly love you all and I really appreciate the laughter, heart-to-hearts, and support throughout this journey and beyond! To Dr. Johnetta Davis, Dr. Angela Grant, Hattie Redd, and Tamara Washington, thanks for being excellent mentors and providing me with a blueprint.

To my STAND (Science and Technology: Addressing the Need for Diversity) Family, Joelle Davis Carter, Alice Bishop, and Tamara Singleton, you are an amazing group of women and I feel honored and blessed to work with you. To my remaining Math SPIRAL (Summer Program in Research and Learning) Family, Dr. Marshall Cohen and Dr. Leon Woods, I've enjoyed working with each of you for the last two summers and I appreciate the encouragement.

I have to thank my uber-talented sisters in research, Jaymie Strecker, Penelope Brooks, and Xun Yuan, for reading my drafts, sitting through practice talks, and giving me useful feedback. I've enjoyed working with each of you and I wish you all the best in the future!

In closing, I want to say that there is not enough room to list everyone who has touched my life and impacted me in a positive way. For everyone who prayed for me, encouraged me, and supported my endeavors, please accept my sincerest gratitude.

Table of Contents

List of Tables
List of Figures
List of Abbreviations

1 Introduction
  1.1 Motivation
  1.2 Research Approach
  1.3 Framework Design Considerations
  1.4 Challenges in Attaining and Applying Source Support Knowledge
  1.5 Thesis Contributions
  1.6 Dissertation Structure

2 Background and Related Work
  2.1 Web Applications and the Browser Wars
    2.1.1 HTML
    2.1.2 CSS
    2.1.3 The Browser Wars
    2.1.4 Definitions
  2.2 Related Work
    2.2.1 Web Portability Analysis
      2.2.1.1 Manual and Automated Execution-based Approach
      2.2.1.2 Lookup-based Approach
      2.2.1.3 Source Code Standardization Approach
    2.2.2 Web Testing
    2.2.3 Portability
    2.2.4 Fault Isolation
    2.2.5 Machine Learning in Software Fault Detection

3 General Framework Architecture
  3.1 Framework Overview
  3.2 Manual and Automated Execution-based Approach
  3.3 HTML Lookup Techniques
  3.4 Source Code Standardization

4 Initial Implementation
  4.1 General Framework Instantiation
  4.2 Inductive Model
    4.2.1 Modeling Client Configurations
    4.2.2 Modeling the Association Vector
    4.2.3 Algorithm to Generate/Update the Inductive Model
    4.2.4 Algorithm to Use the Inductive Model
  4.3 Empirical Study
    4.3.1 Infrastructure
    4.3.2 Empirical Method
      4.3.2.1 Research Questions and Evaluation Strategy
      4.3.2.2 Independent and Dependent Variables
      4.3.2.3 Experimental Procedure
      4.3.2.4 Step 1: Client Configuration Selection
      4.3.2.5 Step 2: Training Set Selection
      4.3.2.6 Step 3: Tag Extraction/Abstraction
      4.3.2.7 Step 4: Defining the Gold Standard
      4.3.2.8 Step 5: Tag Classification and Evaluation
    4.3.3 Threats to Experimental Validity
      4.3.3.1 Internal Validity
      4.3.3.2 External Validity
    4.3.4 Results and Discussion
  4.4 Summary

5 Current Framework Implementation
  5.1 Current Framework Design
    5.1.1 Knowledge Base
      5.1.1.1 Support Criterion Structure
      5.1.1.2 Knowledge Consolidation
    5.1.2 updateKB()
      5.1.2.1 Manual Update
      5.1.2.2 Automated Update
      5.1.2.3 Information Solicitation
    5.1.3 processURL()
    5.1.4 query()
    5.1.5 generateReport()
  5.2 Machine Learning Knowledge Base Updates
    5.2.1 Data Retrieval
    5.2.2 Web Application Model
    5.2.3 Learning Strategies
    5.2.4 Data Storage
  5.3 Research Questions and Metrics
    5.3.1 Research Questions
    5.3.2 Configuration Subject and Data
    5.3.3 Evaluation Metrics
      5.3.3.1 Actual Support
      5.3.3.2 Predicted Support
      5.3.3.3 Accuracy
  5.4 Study Design, Results, and Discussion
    5.4.1 Q1 Study: The effect of web application model, strategy, and training set size on learning accuracy
      5.4.1.1 Experimental Procedure
      5.4.1.2 Results
    5.4.2 Q2 Study: How does the web application model affect analysis costs in terms of tags/rules evaluated and the time needed for analysis?
    5.4.3 Q3 Study: The effect of training set imbalance on false positives
    5.4.4 Q4 Study: The impact of CSS inclusion during the learning process
    5.4.5 Q5 Study: The impact of tag interaction during the learning process
    5.4.6 Threats to Experimental Validity
      5.4.6.1 Internal Validity
      5.4.6.2 External Validity

6 Conclusions and Future Work

Bibliography

List of Tables

4.1 ? Values for All Tags in the Example of Figure 4.2
4.2 Configuration Point Details
4.3 Part of the Negative Instance Set of the Initial Web Application Pool
4.4 Evaluation Results
5.1 Contingency table illustrating the four possible states of tag/category co-occurrence

List of Figures

1.1 When rendered in (a) Internet Explorer 6.0 and (b) Netscape 4.8, both on Windows XP, the Scrabble Home Page is significantly different
1.2 A Web Application Created in Word 97 Executed Differently in Different Client Configurations
2.1 Sample HTML/CSS code and the corresponding web page
3.1 General framework architecture for detecting configuration-specific faults in web applications
4.1 An Example of a Client Configuration Space
4.2 Set of Web Applications Classified as Positive or Negative
4.3 The updateVector() Algorithm
4.4 The queryData() Algorithm
4.5 False Positive Rate with Respect to Training Set Size
4.6 Examples of Configuration-Specific Errors Found in Our Study
4.7 Mozilla is More Forgiving than Netscape when Tags are Improperly Placed in Source Documents
5.1 Instantiation of the general framework in the current approach
5.2 A generic representation of the knowledge base
5.3 A practical example of support violation offsets
5.4 Snapshot of the knowledge base after a manual update
5.5 Positive and negative web applications in an arbitrary client configuration
5.6 Snapshot of the knowledge base after an automated update
5.7 Snapshot of the knowledge base after information solicitation
5.8 The retrieval of data, implemented by processURL(), begins once the user submits a URL. From there, the corresponding web page is fetched and, based on the hyperlinks observed, a crawler collects each of the web pages that are a part of the site. Once the source code is retrieved, a vector model of the web application is created.
5.9 An overview of query()
5.10 Visualization of compliance analysis results
5.11 The interaction matrix
5.12 Accuracy Values Defined
5.13 The effect of learning strategy, training set size, and web application model on learning accuracy. The graph shown in (a) corresponds with the L1 learning strategy; (b) corresponds with L2.
5.14 The effect of web application model on the time needed for analysis and the number of tags/rules analyzed. The graph shown in (a) shows the time needed; (b) shows the number of tags analyzed.
5.15 The effect of training set imbalance on false positive rate. The graph shown in (a) shows the results with an extra negative training example; (b) shows the results with an extra positive training example.

Chapter 1

Introduction

1.1 Motivation

Establishing a high level of confidence in the quality of an implementation is essential in software development. Though the process of detecting and correcting faults in an implemented software system is inherently difficult [21], software quality assurance (QA) becomes increasingly complex when faults only surface in precise configurations. In such cases, the number, nature, and interconnection of constituent parts [45] that define the configuration can significantly impact software quality. To adequately reduce the number of faults in the delivered product, developers must evaluate the overall correctness of the implementation in addition to how that correctness is affected by variation in configurations.

The problem of detecting configuration faults has a trivial solution if the space (or set) of target configurations is manageably small: namely, evaluating the implementation in every possible configuration. Yet, as the size and variability of the configuration space grows, developers are faced with a fundamental QA trade-off between comprehensive configuration space coverage and limitations in development resources [51]. Under realistic development conditions, developers are unlikely to have access to every prospective configuration or the time necessary to apply an exhaustive, brute-force assessment strategy. Without an effective technique for assessing software portability across the configuration space, quality could degrade as software is ported, and faults have the potential to remain latent until they are encountered by users in the field. As a result, correcting configuration faults is a crucial step in establishing portability for a highly varied configuration space.

Figure 1.1: When rendered in (a) Internet Explorer 6.0 and (b) Netscape 4.8, both on Windows XP, the Scrabble Home Page is significantly different.
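To make the scale of this trade-off concrete, the following sketch, which is illustrative only and not part of the dissertation's framework, enumerates candidate client configurations as the cross product of a few assumed configuration dimensions; the browser, version, platform, and setting values are a deliberately small, hypothetical subset of the real space.

```python
from itertools import product

# Illustrative configuration dimensions; the values below are an assumed,
# deliberately small subset of the real client configuration space.
browsers = ["IE", "Netscape", "Opera", "Mozilla", "Safari"]
versions = ["4.0", "5.0", "6.0"]
platforms = ["Windows XP", "Mac OS X", "Linux"]
scripting = ["scripts enabled", "scripts disabled"]

# Each client configuration is one point in the cross product of dimensions.
configurations = list(product(browsers, versions, platforms, scripting))

# 5 * 3 * 3 * 2 = 90 configurations from just four small dimensions;
# every added dimension multiplies the cost of exhaustive evaluation.
print(len(configurations))
```

Because the space grows multiplicatively with each new dimension, launching the web application in every configuration quickly becomes impractical, which motivates the static, knowledge-based analysis developed in this dissertation.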
While configuration faults affect portability for a wide range of software types, they are a particular challenge in web application development. Defined as software accessed via a web browser over a network [50], web applications have become one of the most widely used classes of software to date and critical components of the global information infrastructure [18]. Given that there are several different browsers (e.g., Microsoft Internet Explorer (IE), Netscape, AOL Browser, Opera, Mozilla, Safari for Mac OS X, Konqueror for Linux, Amaya, Lynx, Camino, Java-based browsers, WebTV), each with different versions (e.g., IE 4.0, IE 5.0, IE 6.0, Netscape 4.0), a number of operating systems on which to run them (e.g., Windows, Power Macintosh), and dozens of settings (e.g., browser view, security options, script enabling/disabling), client configurations used to launch and interact with web applications are highly varied. Though expanded variation and flexibility in web access options allow for more customized web user experiences, the resulting differences in configurations present a serious challenge for web developers seeking to ensure universal quality. Characterized as the software configuration explosion problem [34], this high degree of flexibility translates into a wide space of potential web client configurations and complicates the QA effort by requiring that web developers not only ensure that the systems they have developed are correct, but also that correctness persists as software is ported. Failure to evaluate web application portability across the configuration space can result in instances where a web page renders correctly in some client configurations and incorrectly in others (Figure 1.1).

In practice, one of the more popular approaches to web application portability analysis involves a qualitative comparison between expected and actual execution. The idea behind this technique is to identify a subspace of popular client configurations and to launch the web application in each. While developers using this strategy get first-hand exposure to configuration faults, this approach is weakened by limited scope (because analysis focuses on a small number of target client environments) and non-diagnostic results (because only the occurrence of an error, not its cause, is detected). In an effort to address the challenges of web configuration fault detection and the weaknesses of existing web portability analysis approaches, the goal of this research is to enable automated detection and diagnosis of web configuration faults across a large configuration space in a manner that is comprehensive, yet efficient. The basic idea behind this approach is that source code fragments (i.e., Hypertext Markup Language (HTML) tags and Cascading Style Sheet (CSS) rules) embedded in web application source code adversely impact portability of web applications when they are unsupported in target client configurations; without proper support, the source code is either processed incorrectly or ignored, resulting in configuration faults. Using static analysis, configuration fault detection is performed by applying a model of the web application source against knowledge of support criteria; any unsupported source code detected is considered an index to potential configuration faults.
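To illustrate this core detection idea, the sketch below is a minimal approximation rather than the dissertation's implementation: it builds a simple tag-set model of a page's source with Python's standard html.parser module and matches it against per-configuration knowledge of unsupported tags. The configuration names and support entries are hypothetical; a real knowledge base would be populated manually or learned automatically.

```python
from html.parser import HTMLParser

# Hypothetical support knowledge: for each client configuration, the set of
# HTML tags assumed to be unsupported. Real entries would come from the
# manually or automatically acquired knowledge base.
UNSUPPORTED = {
    "IE 6.0 / Windows XP": {"blink"},
    "Netscape 4.8 / Windows XP": {"iframe", "button"},
}

class TagCollector(HTMLParser):
    """Builds a simple model of the web application source: the set of tags used."""
    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)

def detect_configuration_faults(html_source):
    """Apply the source model against support knowledge; unsupported tags
    found in the source are indexes to potential configuration faults."""
    collector = TagCollector()
    collector.feed(html_source)
    return {
        config: sorted(collector.tags & unsupported)
        for config, unsupported in UNSUPPORTED.items()
        if collector.tags & unsupported
    }

page = "<html><body><blink>Sale!</blink><iframe src='ad.html'></iframe></body></html>"
for config, tags in detect_configuration_faults(page).items():
    print(f"{config}: tags {tags} flag potential configuration faults")
```

In the full framework, the web application model and the knowledge base are far richer, covering CSS rules, tag interactions, and version-specific support criteria, but the query step follows this same pattern: match source elements against support knowledge without ever executing the page.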
In the effort to fully exploit this approach, improve practicality, and maximize fault detection efficiency, manual and automated approaches to acquisition of source code support knowledge have been implemented, variations of web application and client support knowledge models have been investigated, and visualization of configuration fault detection results has been explored. To optimize the automated acquisition of support knowledge, alternate machine learning strategies have been empirically investigated and provisions for capturing tag interaction have been integrated into the process. In the immediate sections that follow, this chapter continues with an overview of the research approach and insight into design considerations for practical implementation, presents a discussion of research contributions, and closes with an outline of the dissertation structure.

1.2 Research Approach

In web application development, HTML and CSS are the core languages used. As the building blocks of web applications, HTML and CSS directives indicate how an application should be rendered and how users should be able to interact with various web application widgets. When web applications are launched, browsers parse the source code and use it as a basis for rendering and functionality. The ability of a configuration to process these statements correctly provides a critical link between what the web application should be able to do, as outlined in source code, and what it actually does once it has been deployed; a client configuration capable of processing a given tag/rule properly is said to support it. Asymmetric support for source code across the configuration space greatly complicates the development of portable web applications. Given this concept of asymmetry and the perspective that the functional and aesthetic properties of web applications are a function of the underlying source code, it is very difficult for web developers to know which configurations will support their specification, embodied by the source code elements, and which ones will not. In light of these factors, the problem of evaluating the portability of web applications across varied configurations can effectively be recast as identifying known patterns of unsupported source code; this idea lies at the base of the web portability analysis approach utilized in this research. In the example shown in Figure 1.1, for instance, the tag