ABSTRACT Title of Dissertation: EVENT-CODE INTERACTION DIRECTED TEST CASES Ishan Banerjee, Doctor of Philosophy, 2016 Dissertation directed by: Professor Atif M. Memon, Department of Computer Science The Graphical User Interface (GUI) is an integral component of contemporary computer soft- ware. A stable and reliable GUI is necessary for correct functioning of software applications. Comprehensive verification of the GUI is a routine part of most software development life-cycles. The input space of a GUI is typically large, making exhaustive verification difficult. GUI defects are often revealed by exercising parts of the GUI that interact with each other. It is challenging for a verification method to drive the GUI into states that might contain defects. In recent years, model-based methods, that target specific GUI interactions, have been devel- oped. These methods create a formal model of the GUI’s input space from specification of the GUI, visible GUI behaviors and static analysis of the GUI’s program-code. GUIs are typically dynamic in nature, whose user-visible state is guided by underlying program-code and dynamic program-state. This research extends existing model-based GUI testing techniques by modelling interactions between the visible GUI of a GUI-based software and its underlying program-code. The new model is able to, efficiently and effectively, test the GUI in ways that were not possible using existing methods. The thesis is this: Long, useful GUI testcases can be created by examining the interactions between the GUI of a GUI-based application and its program-code. To explore this thesis, a model-based GUI testing approach is formulated and evaluated. In this approach, program-code level interactions between GUI event handlers will be examined, modelled and deployed for constructing long GUI testcases. These testcases are able to drive the GUI into states that were not possible using existing models. Implementation and evaluation has been conducted using GUITAR, a fully-automated, open-source GUI testing framework. EVENT-CODE INTERACTION DIRECTED TEST CASES by Ishan Banerjee Dissertation submitted to the faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2016 Advisory committee: Professor Atif M. Memon, Chair Professor Rance Cleaveland Professor Michel Cukier Professor Mihai Pop Professor James Purtilo c© Copyright by Ishan Banerjee 2016 DEDICATION the mountain top beckons me so, come forth hither, ’tween the clouds, smell the climes, feel the snow, above the land, sprawled there proud. my steps walk thither, at crests forlorn, where are the treasures, promised afore? no gold that shines, like sun at morn, pearls all white, there are no more. those new peaks, hidden from view, jewels myriad, ’twere unseen from below, calls me forth to tread anew, shape a passage for thee to follow. ii ACKNOWLEDGEMENTS I would like to thank Professor Atif Memon for accepting me as a student and guiding me through the long journey. This dissertation is a product of his enthusiasm and encouragement. I would like to thank past and present members of the GUITAR group at the University of Maryland, College Park. This dissertation has its roots in the work of past GUITAR members, namely Adithya Nagarajan, Cyntrica Eaton, Jaymie Strecker, Penelope Brooks, Qing Xie, Scott McMaster and Xun Yuan. Thanks to my colleagues Bao Nguyen, Bryan Robbins, Emily Kowal- czyk, Ethar Elsaka, Leslie Milton and Zebao Gao whose thoughts are reflected in this dissertation. Thanks are due to Professor Rance Cleaveland, Professor Michel Cukier, Professor Mihai Pop and Professor James Purtilo for serving on my thesis committee, providing valuable feedback about the research and reviewing the manuscript. iii Table of Contents 1 Introduction 1 1.1 Background and terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 GUI testing challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Thesis statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.4 Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.5 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.6 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.7 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2 Background 14 2.1 Finite State Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.2 Variable Finite State Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.3 Complete Interaction Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.4 Off-nominal Finite State Machine . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.5 Event-Flow Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.6 Event-Interaction Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.7 Event-Semantic Interaction Graph . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.8 Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.9 Genetic algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.10 Covering arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3 Event-Code Interaction 30 3.1 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.2 Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.2.1 Approach 1: Does one event influence another event? . . . . . . . . . . . . 31 3.2.2 Approach 2: Do multiple events in combination influence another event? . 32 3.2.3 How can a variation in event execution be detected? . . . . . . . . . . . . 32 3.3 Event-Code Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.3.1 Event-Code Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.3.2 Event-Code Interaction Graph . . . . . . . . . . . . . . . . . . . . . . . . 35 3.3.3 Composite Event-Code Interaction . . . . . . . . . . . . . . . . . . . . . . 38 3.4 GUI states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.4.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.4.2 Hierarchical signature . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.4.3 New GUI states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 iv 4 Tools and testbeds 55 4.1 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.1.1 Cobertura . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.1.2 GUITAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.2 Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.2.1 Standard workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.2.2 ECIG extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.2.3 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4.2.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.3 Testbed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 5 Empirical evaluation 67 5.1 Experiment overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 5.1.1 Research questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 5.1.2 Applications under test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5.1.3 Threats to validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5.2 Standard workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 5.3 ECIG extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 6 Future work 88 A Radio Button Demo 91 Bibliography 98 v List of Tables 2.1 Techniques developed to model the GUI of a GUI-based application. Testcases are typically generated based on the model and executed on the application. . . . . . . 15 2.2 Covering ArrayCA(9; 2, 4, 3) for the Exit Confirmationwindow of Radio Button Demo application. Only 9 testcases are required for testing 2-way inter- action of 3 events at all 4 positions in a testcase. Exhaustive testing would require 81 testcases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.1 Partial code from event handler of Whole Word (Edit menu) checkbox event. . 53 3.2 Fault is seeded at line 249. The ! has been removed. . . . . . . . . . . . . . . . . . 54 4.1 Standard workflow path corresponding to Figure 4.1. Each segment of the path is labeled I–III. The workflow is: I −→ II −→ III. . . . . . . . . . . . . . . . . . . . 59 4.2 ECIG extension corresponding to Figure 4.1. The workflow is I −→ II −→ IV −→ II −→ III. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.3 A representative SequenceLength-2 testsuite containing 8 testcase, with corre- sponding code coverage files for each testcase. Code coverage is collected after executing each event of every testcase. . . . . . . . . . . . . . . . . . . . . . . . . 61 5.1 Properties of applications under test, that were used for evaluation. . . . . . . . . . 68 5.2 GUI Ripper reverse engineers the GUI of an applications to create the GUI tree. The GUI tree contains structural information about the GUI. . . . . . . . . . . . . 71 5.3 EFG Graph Converter extracts follows relation from the GUI tree to create the EFG. 72 5.4 Testsuite creation time, execution time and counting time for SequenceLength-n testcase based on the EFG, where n ∈ {2, 3, 4, 5, 6, 7}. Counting time is the time taken to count the possible number of testcases, without actually generating them. . 72 5.5 EIG obtained from the EFG by removing structural events. . . . . . . . . . . . . . 75 5.6 Testsuite creation time, execution time and counting time for SequenceLength-n testcase based on the EIG, where n ∈ {2, 3, 4, 5}. Counting time is the time taken to count the possible number of testcases, without actually generating them. . . . . 76 5.7 EIG-based SequenceLength-2 testcases is the seed testsuite. ECI relations are de- termined from the code coverage. GUI state forms the baseline, to check if another testsuite found new GUI states. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 5.8 Metrics for obtaining the ECI event pairs from coverage data, obtained by execut- ing EIG-based seed testsuite. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 5.9 Count of testcases for ECIG-based SequenceLength-n testcase, where n ∈ {3, ..., 20}. Total testcases for each application is shown in Total column. . . . . . . . . . . . . 79 5.10 ECIG-based SequenceLength-n execution results, where n ∈ {3, ..., 20}. . . . . . 81 vi 5.11 Time taken to compute GUI state signatures, using procedure sigUpdate, for the baseline EIG-based baselineSignature, ECIG-based targetSignature and identifying new ECIG states using procedure newState. . . . . . . . . . . . 83 5.12 States detected during execution of EIG-based seed testsuite and ECIG-based tar- get testsuites. Number of ECIG-based testcases detecting a new state is shown in testcases (E). Number of unique new states detected by the ECIG-based testsuite is shown in unique new states (G). . . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.13 Column (H) shows the percentage of states reached by ECIG-based testsuite that are new. Column (I) shows the percentage of states reached by ECIG-based test- suite that were already reached by the baseline EIG-based testsuite. . . . . . . . . . 84 vii List of Figures 1.1 A Java application, Radio Button Demo. It has one top-level window, Radio Button Demo and one modal window, Exit Confirmation, that is opened by the Exit button. Nine widgets are labeled, w0 − w8. . . . . . . . . . . . . . . 7 2.1 Finite State Machine for the Radio Button Demo application. State is repre- sented with LECS – L create log, E Exit Confirmation window opened, C shape created, S Circle/Square selected. Transitions are labeled as: SRC −→ DST , with input marked at tail. . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2 Variable Finite State Machine for the Radio Button Demo application. State is represented with LES – L create log, E Exit Confirmation window opened, S Circle/Square selected. Transitions are labeled as: (precondition)SRC −→ DST (effect), with input marked at tail. The variable V models the created state. 17 2.3 Event-Flow Graph for the Radio Button Demo application. . . . . . . . . . . 20 2.4 Event-Interaction Graph for the Radio Button Demo application. . . . . . . . . . . 22 2.5 Event-Semantic Interaction Graph for the Radio Button Demo application. . . . . . 24 2.6 ESI relationship. (a) Initial application state (b) square executed (c) create exe- cuted (d) square→ create in sequence. square influences behaviour of create. . . 25 2.7 Plan generation for Radio Button Demo application (a) Initial and goal states (b) Two generated plans (c) Plan operators for GUI. . . . . . . . . . . . . . . . . . . . 26 2.8 Genetic algorithm emulating a novice user’s behavior to generate GUI testcase. . . 27 3.1 Event-Code Interaction Graph for the Radio Button Demo application. . . . . 38 3.2 Event-Code Interaction Graph for length-2 composite events in the Radio Button Demo application. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.3 Signatures computed for the hierarchical GUI structure of RadioButton Demo’s initial state. It contains one GUI window. (a) Visible GUI window with signatures of widgets and containers. (b) Tree showing hierarchical nature of GUI windows with signature of widgets and containers. . . . . . . . . . . . . . . . . . . . . . . . 44 3.4 Signatures computed for the hierarchical GUI structure of RadioButton Demo after executing the Exit button. It contains two GUI window. (a) Visible GUI windows with signatures of widgets and containers. (b) Tree showing hierarchical nature of GUI windows with signature of widgets and containers. . . . . . . . . . . 45 3.5 Procedures to detect if an application reached new GUI states after executing a target testsuite. Procedure sigUpdate computes a signature of the GUI state after the execution of an event. Procedure listNew compares the signatures of a target testsuite with a baseline testsuite. . . . . . . . . . . . . . . . . . . . . . . . 47 viii 3.6 ECIG-based length-3 testcase – Auto Wrap −→ Regular Expressions −→ Whole Word – executed on JEdit. Resulting GUI states are shown – initial state (top- left), after executing Auto Wrap (bottom-left), after executing Regular Expressions (bottom-right), after executing Whole Word (top-right). Four checkboxes, labelled 1 − 4, become checked as a result of the testcase execution. The final state, is a new GUI state, reached by the length-3 testcase. It is not reachable using existing length-2 testcase. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.7 Software defect being manifested as a GUI defect. The checkbox 4 is not checked because of the defect. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.1 Typical GUITAR workflows. The Standard workflow enables common GUI test- ing activities. An extension of the standard workflow is developed as the ECIG extension. I–IV represent 4 distinct segments of the workflows. . . . . . . . . . . . 58 4.2 Procedure ECI computes ECI relations between event pairs of a GUI. Procedure ECIG produces an ECIG-based on the ECI relation. . . . . . . . . . . . . . . . . 62 4.3 Testbed for executing testcases. Controller machine, C, executes 20 concurrent testcases on each worker machine, M1, M2, M3. Results are stored in the con- troller machine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 5.1 Visual representation of the EFG, EIG and ECIG of ArgoUML and Buddi. The EFG represents follows relation between all pairs of events. EIG represents follows relation between non-structural events. It contains fewer vertices and more edges. ECIG is a subset of the EIG, containing edges that interact at the program-code level. 73 5.2 Visual representation of the EFG, EIG and ECIG of JabRef and JEdit. The EFG represents follows relation between all pairs of events. EIG represents follows relation between non-structural events. It contains fewer vertices and more edges. ECIG is a subset of the EIG, containing edges that interact at the program-code level. 74 5.3 Count of testcases where the first new state was found after executing a specific number of events in the testcase. . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 ix Chapter 1 Introduction Contemporary computer software provides different methods for users to interact with the software. For example, a Unix operating system provides a command line shell to administrators, avionics software accepts pilot’s inputs from aircraft control devices and a commercial banking system provides a web-based interface to depositors. A popular method for interacting with computer software is via Graphical User Interfaces (GUI). Graphical User Interfaces are used across a wide spectrum of: 1) devices – enterprise computer systems, personal computers, hand-held computers, vending machine, operating heavy machinery 2) platforms – Android, Linux, Microsoft Windows, Solaris, world-wide web and 3) cost – $100 e-reader, $100M1 supercomputer. The wide-ranging applicability and deployment of Graphical User Interfaces warrants continued investigation into GUIs, their usage, behavioral characteristics, security and reliability. A GUI-based software system (GUI-based software or GUI-based application) is one that pro- vides a GUI as a method of interacting with it. GUI-based software systems are the focus of the research presented in this study. They belong to the larger class of Event-Driven [28] software sys- tems. Concepts presented here may be applicable, in part or whole, to other event-driven systems. 1http://en.wikipedia.org/wiki/Tianhe-2 1 1.1 Background and terminology The GUI of GUI-based software presents a visual interface, by means of a visual display sys- tem. A user of the software can execute actions, using an appropriate input device (such as mouse, keyboard or touch) on the GUI. Terms, that will be used to describe the interaction of the user with the GUI, and testing the GUI are defined as follows: widget: A GUI widget is a primitive element of the GUI. Many GUI widgets together constitute the GUI of a GUI-based application. For example, the ‘Save’ and ‘Cancel’ buttons in the ‘Save As...’ dialog box of Microsoft Notepad are widgets. The layout of GUI widgets are often hierarchical, where a GUI container (such as the ‘Save As...’ dialog box) contains a set of primitive GUI widgets, or other GUI containers. event: A GUI event is an instance of executing an action (e.g.: click, select) by the user on a widget (e.g.: ‘Save’, Checkbox). event sequence: An ordered set of GUI events is an event sequence. An event sequence can be executed on the GUI by the user or by a software agent such as a GUI testing tool. As an example, ‘File’ −→ ‘Save As...’ −→ ‘Save’ is an event sequence that can be executed on Microsoft Notepad immediately after it has been launched. testcase: A GUI event sequence that can be executed on a GUI-based application is a testcase. A testcase is typically executed on the application at a time when the application’s GUI state is known, for example, immediately after the application is launched. length-n testcase: A GUI testcase containing n consecutive GUI events of interest, is a length- n testcase. A length-n testcase may be prefixed with additional GUI events that make it possible to execute the first event in the length-n testcase. For example, ‘Paste’ −→ ‘Edit’ −→ ‘Select All’ is a length-3 testcase in Microsoft Notepad. However, this may be prefixed with the event ‘Edit’ to reach the ‘Paste’ event, after the application is launched. 2 In this study, the terms event and widget may be interchanged, when the context is clear. For example, a statement ‘the Save event was executed’ would expand to ‘the click event was executed on the Save button widget’. 1.2 GUI testing challenges Functional verification of the GUI of GUI-based applications has always been an important component of their development life-cycle. This verification does not focus on the underlying ‘business-logic’ of the application, although defects in that part of the software may also be re- vealed. As with other software components, verification of the GUI has its own unique set of challenges Verification of the GUI typically entails execution of a selected set of testcases on the GUI of the application, and comparing the resulting state of the GUI with an expected state (using an ora- cle [32]). Given a GUI, with hundreds or thousands of events, the possible number of input event sequences can be very large, potentially infinite (for example, a user can execute ‘Edit’−→ ‘Paste’ repeatedly on Microsoft Notepad). Hence, it becomes challenging to 1) identify the complete set of executable testcases 2) prioritize the testcases in a suitable order 3) execute testcases, manually or automatically, within a reasonable time. In manual verification, a human tester typically 1) launches the application 2) identifies a test- case 3) clicks on widgets following the prescribed order of the testcase 4) repeats steps 2 − 3 or 1 − 3 until a testing criteria has been achieved. When this method is employed for testing the application, it becomes challenging for the tester to 1) identify all testcases 2) prioritize and select testcases according to some criteria 3) execute the selected testcases without error or omission. Automated verification also has similar challenges. Capture/Replay (record/replay) [20] is a popular method for automated verification of the GUI. In this method, a human tester first executes a set of testcases, on the GUI, which are observed and captured by the capture tool. At a later time, 3 the replay tool executes the same testcases on the GUI to detect potential regressions. Using this method however, the human tester may still miss out ‘important’ testcases. Besides capture/replay, exhaustive execution of all possible testcases, in an automated manner, is typically impractical, owing to the large number of testcases and limited time. In the last decade, Model-based testing [39, 42, 6, 34, 47, 43, 37, 38] has been adopted as a popular method for GUI testing. Some of the model-based methods have produced practical methods for 1) reverse engineering the GUI of a GUI-based software to extract its structure 2) creating a semantic model of the GUI such as EFG [43], EIG [34] from the extracted structure 3) generating ‘important’ testcases based on the semantic models. For example, an EFG identifies follows relationship between pairs of GUI events. An event ey follows event ex if ey is available for execution immediately after executing event ex. Models such as EFG and EIG have relied on structural information available from the visible run-time state of the GUI, for example, ‘is the Cancel button enabled after editing a textbox’? Such testcases were shown to be effective at revealing defects in the application [34, 43, 47]. A GUI testcase typically drives the GUI of a GUI-based application from a starting state into different user-accessible states. As events are executed on the GUI, the visible state of the GUI changes. Naturally, executing longer event sequences on the GUI could make the GUI reach states that are not possible with a shorter event sequence. Executing a longer event sequence on the GUI increases chances of interactions between participating event handlers. It may execute program-code that depend on each other. Executing long event sequences therefore would be good at exercising interacting program-code and at revealing software defects. For example, in Microsoft Notepad, the event sequence – ‘Edit’ −→ ‘Select All’ −→ ‘File’ −→ ‘New’ −→ ‘Edit’ −→ ‘Paste’ – would be more likely to reveal a software defect than the short event sequence ‘Edit’ −→ ‘Select All‘. Long event sequences are intuitively more likely to reveal software defects. At the same time, it becomes challenging for the human tester to select a long event sequence for executing on the 4 GUI. This is because, as longer event sequences are considered, the number of such executable event sequences grows rapidly. Not all of those event sequences may be equally good at driving the GUI into new states or at revealing defects. In addition, the number of available long event sequences is prohibitively large. Hence, prioritizing or selecting useful event sequences becomes an important part of the verification process. The above discussion motivates the necessity of generating ‘long’ and ‘useful’ GUI testcases. Such testcases which are chosen from the large input space of the GUI, would drive the GUI into states that are not possible using shorter testcases. 1.3 Thesis statement Software defects in the GUI of a GUI-based application may be revealed by executing long sequences of events on the GUI. This study presents a method to create long events sequences that are useful testcases for a GUI-based application. Program-code level interactions between GUI events are used as the basis for selecting the events sequences. Program-code executed in response to a GUI event, called event handler, often share program- code components (such as functions or sub-routines) and state. Therefore, it is possible that the event handler of an event ex influences the subsequent execution of the event handler for event ey, thereby making it behave differently. Defects in the event handlers for such interacting events may be revealed by executing such events together. This study seeks to 1) define program-code interactions between event handlers in GUI-based software 2) establish that such interactions exist in GUI-based software 3) describe a simple, prac- tical method for detecting such interactions 4) create a formal model of the program-code inter- actions 5) generate long testcases based on such interactions 6) evaluate the effectiveness of such testcases at revealing defects. This information is leveraged to execute targeted GUI testcases that can reveal defects in the program code of GUI event handlers. 5 The thesis statement is: Long, useful GUI testcases can be created by examining the interactions between the GUI of a GUI-based application and its program-code. The following terms are defined for usage in the remainder of this study: Definition: A long GUI testcase is defined as a testcase that contains 3 or more GUI events selected from a GUI model.  Definition: A useful GUI testcase is defined as a testcase that can exercise a GUI in a manner that reveals defects in its event handlers.  1.4 Approaches This section lists approaches for modelling interactions between the GUI of a GUI-based appli- cation and its program-code (see Section 3.2). The models can be used for creating long testcases. Radio Button Demo: In this and subsequent sections, a simple Java application, shown in Fig- ure 1.1, called Radio Button Demo will be used as a running example. Different GUI models for this application will be constructed and explained. The application has one top-level window, Radio Button Demo and one modal window, Exit Confirmation, which is invoked by clicking the Exit button. Widgets of interest on the application are labeled w0 through w8. A GUI event can be executed on all these widgets except for w4, which can only display GUI content. The program-code of this application is listed in Appendix A. Approach 1: The following two event sequences are executed on the Radio Button Demo application (each after a fresh application start): 6 w 1 w 2 w 3 w 5 w 0 w 6 w 4 w 7 w 8 Figure 1.1: A Java application, Radio Button Demo. It has one top-level window, Radio Button Demo and one modal window, Exit Confirmation, that is opened by the Exit button. Nine widgets are labeled, w0 − w8. (B) circle −→ create (C) create Columns (B) and (C) of Appendix A, show the lines-of-code executed by the event handler for the event create for the two cases. It can be seen that they are different. In fact, they differ at lines: {36, 37, 133, 137, 138, 141, 194, 195, 196, 200, 201, 220, 221, 222} This intuitively indicates that the event handler for circlemight have influenced, or interacted with, the event handler for create to make it behave differently. A defect in the interacting program-code can lead to incorrect execution. For example, the following defect in line 137: original: shape = new CirclePanel(); defect: shape = new SquarePanel(); 7 would lead to incorrect behavior of the application. This defect could be revealed by executing a testcase containing the event sequence circle −→ create. This approach detects events that interact at the program-code level. The interacting event sequence (circle −→ create) is a good candidate to include in a model for modelling the interac- tions between the GUI of a GUI-based application and its program-code. In the above example, the defect can be revealed by generating event sequences containing interacting events. Approach 2: In the preceding example, the event handler for event circle was shown to interact with the event handler for event create. It is possible that a set of events do not independently interact with another event, However, those events may together interact with the other event. For example, the events handlers for circle and create do not interact with the event handler for exit. However, when the events circle and create are executed in sequence, it may interact with the event exit. Here, the composite event {circle, create} interacts with the event exit, shown as {circle, create} −→ exit. In this approach, the event sequence circle −→ create −→ exit is a good candidate for inclu- sion into the model. This approach may be used for identifying deeper interactions between GUI events at the program-code level. It will be shown later that the GUI models created using the above approaches can be used to create long, useful GUI testcases. This study evaluates long testcases created using Approach 1. 1.5 Related work Capture/replay and model-based approaches have been the most prominent methods of testing the GUI. Other techniques such as specification based, usage profile based, random testing and symbolic execution have also been reported. 8 Capture/replay has been a traditional method for testing the GUI. In this method, a human tester executes GUI events – such as mouse clicks and text-field entries – on the GUI. The capture tool observes and records these actions. At a later time, the recorded GUI events can be automatically executed on the GUI (potentially a newer version) by the replay tool. At that time, the replay tool can check for failure by observing the GUI or checking the application’s execution log. Early capture tools captured and recorded mouse coordinates and keystrokes as test scripts. As a result, the replayer would break if the screen coordinates or resolution changed during the replay phase. Modern capture/replay tools such as Abbot2, Quick Test Pro3 and Selenium4 identify a GUI object using its GUI properties – such as name, color, width and height – and are more tolerant to changes in execution environment. The research community has amply studied capture/replay tools and have augmented it in var- ious ways. Ostrand et al. [36] develop a capture/replay tool to represent the capture actions as a flowchart. The flowchart can be altered by the tester and replayed back on the GUI. Ariss et al. [2] target Java applications where they augment a capture and replay tool with a model and automated test oracles, resulting in improved test coverage. Grechanik et al. [18] use the plat- form’s accessibility technologies to better access GUI objects and their properties from the GUI. This has been used for emulating a human tester and automatically execute testcases on the GUI. Derezinska et al. [14] combine the capture/replay method with extraction of GUI properties from the application’s binary executable. Chang et al. [10] develop a tool, Sikuli, based on the cap- ture/replay method. Sikuli uses computer-vision technology to identify GUI objects during the replay phase and accurately identify GUI objects. Chen et al. [12] have combined capture/re- play and specification based methods in GTT, a tool for testing Java applications. Hellmann et al. [19] use capture/replay to capture low-quality GUI images from hand-sketched prototypes, in a test-driven-development environment, and use replay for verifying the GUI implementation. 2http://sourceforge.net/projects/abbot/ 3http://en.wikipedia.org/wiki/HP QuickTest Professional 4http://docs.seleniumhq.org/ 9 During the last decade, model-based testing of GUI-based applications has received substantial attention [5]. Graph models such as Event-Flow Graph (EFG) [43], Event-Interaction Graph (EIG) [34], Event-Semantic Interaction Graph (ESIG) were created by reverse engineering the GUI using a GUI Ripper [26, 27]. The GUI Ripper was an enabling technology for daily/nightly regression testing [31, 25] of GUI-based applications. The GUI Ripper also enabled creating different lev- els of detail for GUI oracles [32]. Other models such as Covering Arrays, Complete Interaction Sequence and AI Planning models were also developed. Model-based testing of GUI applications are discussed in Chapter 2. In random testing of GUI applications, random GUI events are typically executed on the appli- cation. This often serves as an effective smoke test during the development process. Hu et al. [22] test Android application by executing random events on it. In a similar approach, Dabczi et al. [13] tests MATLAB by generating random inputs to simulate a user. Takala et al. [41] model Android applications as a state machine and then use Monkey5 to randomly choose transitions in this model. Several other techniques have been developed for testing GUI applications. Chen et al. [11] have developed a tool to aid specification based testing of GUI applications. Ganov et al. [16] use symbolic execution to prune the event and data input space. Memon [29] monitors the usage of a GUI application to create testcases which can be replayed on a newer version of the same application. This can be used for automated regression testing during development. Garg et al. [17] generate user’s GUI interaction usage characteristics to simulate a real user. Arlt et al. [3, 4] examine bytecodes of Java applications to infer static data dependencies between event handlers. These dependencies yield a Event Dependency Graph that can be used for directed testing. The techniques discussed above largely rely on the visible GUI of the application under test (AUT) to create a model and generate testcases. There is no existing research that studies the interaction of visible GUI objects and its corresponding program code. While most GUI events directly or indirectly execute event handlers, the interaction between GUI events and event han- 5http://developer.android.com/tools/help/monkey.html 10 dlers and that between different event handlers are not known. This study attempts to understand how program code of an AUT behaves in response to GUI events and leverages this information to generate long testcases that are more likely to reveal GUI defects. 1.6 Challenges To generate long testcases, this study introduces a new paradigm, Event Code Interaction (ECI), that models the effect of executing a GUI event e1, on the execution of a subsequent GUI event e2. This study is restricted to GUI-based applications. However, the concepts developed here may be applicable to other systems as well, especially those that can be considered to be event-driven. When events are executed on the GUI of a GUI-based application, program code is executed within the application. This includes a) Event Handlers, which respond to the GUI event and 2) Business Logic, which performs the tasks expected from the application. The program state of an application is typically read, modified and written by the program code, typically in response to GUI events. Program state may affect the program code being executed in response to GUI events. That is, the execution of a GUI event e1 may affect the program state, that in turn affects the program code p being executed in response to a subsequent event e2. The effect on p can be classified as 1) effect on program state, that is, the program state changed by e2 may be different in the presence of e1 2) program code, that is, the program code being executed by e2 may be different in the presence of e1. If the program-state or program-code behavior of event e2 is affected by the prior execution of event e1, then event e1 is said to interact with event e2 at the code-level. In this study, the interaction is restricted to effects on program code only. It is shown as e1 −→ e2. This interaction is modelled as an Event Code Interaction Graph (ECIG). Vertices, for example e1, e2, in this graph 11 model GUI events and edges represent code interaction, for example, e1 −→ e2. Edges in the ECIG are a subset of the edges from the EFG (or EIG) that have been identified based on program-code level interactions. The ECIG is a sparser graph than the EFG (or EIG). Therefore, using the ECIG, it is possible to generate a targeted testsuite containing longer (albeit fewer) testcases and execute them all in a reasonable time. Code coverage has traditionally been a good metric for evaluating the effectiveness of a test- suite, or a testcase generation method. It has been shown in this study, that long GUI testcases may not exhibit increased code coverage, compared to similar shorter GUI testcases. This necessitates the use of a different metric to determine the usefulness of a testsuite. In this study, GUI state coverage has been used as a metric to determine the usefulness of a testsuite. This study identifies the following challenges in testing GUI-based software and makes the corresponding contributions: Challenge 1: It is challenging to identify interactions between program code corresponding to GUI events. Contribution 1: A new paradigm, called Event Code Interaction, is defined, which models inter- actions between program code corresponding to GUI events (see Section 3.3). Challenge 2: It is challenging to identify events in a GUI whose combined execution can reveal defects. Contribution 2: Based on the ECI paradigm, an Event Code Interaction Graph is created for the GUI of a GUI-based application. This models GUI events which contain interaction at the program code level. Event sequences executed from this graph are likely to test combined execution of events which interact with each other (see Section 4.2.2). Challenge 3: It is challenging to generate long event sequences that are more likely to reveal program defects. 12 Contribution 3: The ECIG is a sparse graph. Long testcases can easily be generated by traversing this graph, targeting specific parts of the GUI event space (see Section 5.3). Challenge 4: It is challenging to identify new GUI states reached by an application when a test- suite is executed on it. Contribution 4: A method is developed to identify new GUI states that are reached by an applica- tion when a testsuite is executed on it. These states are then compared to a baseline testsuite. 1.7 Outline The remainder of this dissertation is structured in five chapters. Chapter 2 surveys existing model-based GUI testing techniques. Chapter 3 describes Event-Code Interaction, related concepts and models. Chapter 4 describes tools and testbeds deployed in this study. Chapter 5 presents results of evaluating ECI directed GUI testing. Chapter 6 identifies directions for future research, based on this work. 13 Chapter 2 Background Verification of the GUI of GUI-based software have been performed using different approaches. Some of the popular techniques that have been attempted are: 1. feedback based [9] 2. specification-based [40] 3. genetic algorithms [45] 4. capture/replay-based [7, 2, 23] 5. model-based [38, 30, 47] 6. random-testing [8, 21] 7. execution-trace driven [1] The methods developed in this study belongs to the model-based family of testing techniques. To provide a context for this research, a survey of model-based GUI testing techniques is presented in this chapter. Model-based GUI testing can be broadly partitioned into two distinct phases. In the first phase, a model of the GUI is created. Creating the model may be automated or manual or a combination of both. In the second phase, testcases are generated (manually or automatically) based on the model. Most model-based GUI testing techniques are concerned with developing modelling tech- niques which detect more defects with a minimum number of testcases. Table 2.1 lists the GUI modelling techniques discussed in this section and the family of modelling techniques it belongs to. 14 Family Model Section State Machine Finite State Machine 2.1 Variable Finite State Machine 2.2 Complete Interaction Sequence 2.3 Faulty Complete Interaction Sequence 2.4 Workflow Event-Flow Graph 2.5 Event-Interaction Graph 2.6 Event-Semantic Interaction Graph 2.7 Event Sequence AI Planning 2.8 Genetic Algorithm 2.9 Combinatorial Coverage Arrays 2.10 Table 2.1: Techniques developed to model the GUI of a GUI-based application. Testcases are typically generated based on the model and executed on the application. SequenceLength-n: A popular (graph) model-based testcase generation method is called SequenceLength- n. In this method a depth-first-walk of the graph is executed starting from an event of interest. The path walked from the starting event is collected and produced as a testcase when depth n is reached. A SequenceLength-n testsuite contains all possible testcases of length-n generated from the graph of interest. The SequenceLength-n method of testcase generation will be used in this study. 2.1 Finite State Machine Esmelioglu et al. [15] have modelled the GUI of a GUI-based application as a Finite State Machine (FSM). An FSM is defined as FSM = (S, I, O, T,Φ), where S is the set of finite GUI states, I is the set of inputs to the GUI, O is the finite set of outputs, T is the transition function 15 010C110C 011C 111C 010S110S 011S 111S 000C stop yes exit no x xyes 100C 001C 101C 000S100S 001S 101S exit no yes x x exit no exit no yes x xyes no exit x x no exit no exit yes resetcreate circle square circle square no exit reset create circle square circle square create A A reset reset start B B create Figure 2.1: Finite State Machine for the Radio Button Demo application. State is represented with LECS – L create log, E Exit Confirmation window opened, C shape created, S Circle/Square selected. Transitions are labeled as: SRC −→ DST , with input marked at tail. S × I → S defining the next state based on the current state and input, Φ is the output function S × I → O defining the output from a transition. An FSM for the Radio Button Demo application is shown in Figure 2.1. In this exam- ple, each state represents 4 GUI elements – L, E, C, S, where L = 0/1 indicates that w6 is (un)checked, E = 0/1 indicates that the Exit Confirmation window is closed/opened, C = 0/1 indicates that a shape is cleared/rendered, S = C/S indicates that a circle/square radio button has been selected. Deploying FSM-based GUI testing suffers from certain practical problems. First, the FSM may require a large number of states to represent the GUI. This is tedious to create, both manually and automatically, and also difficult to maintain for a GUI under development or maintenance. The states and transitions are difficult to intuitively map onto the actual GUI, resulting in greater maintenance effort. 16 01C11C 01S11S 00C stop yes exit no x xyes 10C 00S10S exit no x xyes no exit circle square no exit create (v=1) circle square start create (v=1) (v=1) reset (v=0) create (v=1) (v=1) reset (v=0) create (v=1) (v=1) reset (v=0)(v=1) reset (v=0) Figure 2.2: Variable Finite State Machine for the Radio Button Demo application. State is represented with LES – L create log, E Exit Confirmation window opened, S Cir- cle/Square selected. Transitions are labeled as: (precondition)SRC −→ DST (effect), with input marked at tail. The variable V models the created state. 2.2 Variable Finite State Machine Variable Finite State Machine (VFSM) is a technique employed by Shehady et al. [39] to represent the GUI states of a GUI-based software application. The authors shows that VFSMs require fewer states to represent the GUI, are more intuitive for representing a GUI and can be easily used to generate testcases and detect faults in a GUI-based application. The key difference between FSM and VFSM is that VFSMs use of a set of global variables. The current value of these variables affect state transitions, which in turn may alter the value of the variables. A VFSM is represented as a 7-tuple, V FSM = (S, I, O, T,Φ, V, ζ). The symbols S, I , O posses the same semantics and properties as described for FSM (see Section 2.1). In addition, V is a set of n variables defined as V = {V1, V2, ..., Vn}, where each Vi is a set of values the i − th 17 global variable may assume. T is a state transition function, T = DT → S; Φ is an output function, Φ = DT → O; where DT ∈ S × I × V1 × V2 × ...× Vn. This indicates that the state transitions, T and the output function Φ both are function of the global variables V . In addition, ζ is a state transition function that determines if the state of the global variables are altered as the result of a transition. Figure 2.2 shows the VFSM for the Radio Button Demo application. In this figure, the state component C has been removed and is modelled with the variable V . A transition is labeled as precondition −→ effect, where the transition takes place only if the precondition is true with the postcondition being affected after the transition. VFSMs produce smaller state machine which are more compact than FSMs, while retaining a similar state space. The VFSM is converted into an equivalent FSM in order to generate testcases. This is done by expanding the set of states, S and set of transitions, T using V and Φ. 2.3 Complete Interaction Sequence White et al. [42] use Complete Interaction Sequence (CIS) as a method to prune the state space of a GUI-based application. The authors define a GUI responsibility as a GUI activity consisting of GUI objects that produces an observable effect on the GUI’s environment – memory usage, peripheral device activity, underlying business logic response. A responsibility is identified manually by the tester. For each identified responsibility, the sequence of GUI events that lead to that responsibility is called the Complete Interaction Sequence for that responsibility. A CIS can be tested individually by 1) manually identifying the GUI responsibilities in a GUI- based application 2) identifying the CIS for each responsibility 3) creating an FSM for a CIS 4) converting an FSM to a reduced FSM 5) testing a CIS using the reduced FSM to generate and execute testcases on the GUI-based application. The conversion of the FSM to the reduced FSM is an interesting modelling abstraction. This 18 conversion is done by identifying subFSMs in the FSM for the CIS. A set of states S = {S1, ..., Sn} in an FSM form a subFSM if there exists a directed path from Si to Sj , where Si, Sj ∈ S. In addition, a subFSM S possesses structural symmetry if 1) in S, there exists states S1 with only one incoming transition and S2 with only one outgoing transition and there exists multiple paths from S1 to S2 2) states outside S do not affect transitions within S 3) choice of paths taken between S1 and S2 do not affect states outside S. A subFSM with structural symmetry can be replaced with a single superstate in the FSM. The subFSM can now be tested in isolation. Identification and replacement of subFSMs reduce the overall complexity of the FSM for a CIS, resulting in a reduced FSM. The reduced FSM can also be tested by using any one path through a subFSM when its superstate is encountered in the reduced FSM. Using the reduced FSM, the authors generate design tests – which assume that the FSM was im- plemented as designed and implementation tests – which attempt to execute transitions not present in the design of the FSM. 2.4 Off-nominal Finite State Machine While most model-based GUI testing methods attempt to generate and execute testcases that invoke valid event sequences in the GUI of a GUI-based application, testing the GUI for invalid event sequences is also important. These testcases form a negative testsuite that verifies that the GUI does not permit a user to execute disallowed actions. Intuitively, generating invalid event sequences from a graph model of a GUI is straightforward. One needs to select an event-pair which does not share an edge and generate a testcase which contains this pair as an event sequence. Belli et al. [6] generate testcases for event sequences that are invalid. They argue that these sequences should be tested in addition to valid sequences. A CIS (see Section 2.3) is used for creating a Faulty Complete Interaction Sequence (FCIS). Given a GUI, a CIS and its corresponding 19 reset create square exit circle checkbox noyes Radio Button Demo Exit Confirmation terminal event follows relation initial event Figure 2.3: Event-Flow Graph for the Radio Button Demo application. FSM with valid transitions is constructed. Missing edges in the FSM, termed Faulty Interaction Pairs (FIPs), are identified. A testcase containing an FIP can be easily generated by first generating a testcase leading to the first event in the FIP. This testcase is then prefixed to the FIP, creating a testcase with an invalid event sequence. 2.5 Event-Flow Graph The Event-Flow Graph [43] is a GUI model, of a GUI-based application, that represents GUI events, and their sequences, that can be executed on the GUI. The Event-Flow Graph is a directed graph, where a vertex represents an event executable on the GUI. An edge ei → ej from vertex i 20 to j indicates that the event j is executable immediately after executing event i. Event j is said to follow event i. Intuitively, the Event-Flow Graph models the possible execution paths on the GUI. Formally, an Event-Flow Graph is defined as a triple < V,E,B >, where V is a set of vertices representing events on the GUI of a GUI-based application; E ∈ V × V is a set of directed edges representing the follows relation; B ∈ V is a set of vertices representing initial events, events that are available for execution immediately after the application is launched. The Event-Flow Graph for the Radio Button Demo application is shown in Figure 2.3. In this figure, events are shown as ovals, shaded ovals represent initial events, directed edges represents the follows relation. From this figure, it can be seen that event yes can be executed after exit. However, yes cannot be executed after create. These corresponds to the GUI itself. Valid testcases can be easily generated by traversing the Event-Flow Graph, for example square → create→ circle→ exit→ yes. The Event-Flow Graph of a GUI-based application can be extracted from the run-time state of the GUI, by a process of reverse engineering [26]. In this process, a monitor application, called the GUI Ripper launches the application, identifies its top-level windows and GUI state, executes all visible events and continues identifying new windows which may be created. The GUI Ripper con- tinues the process of executing events and extracting the GUI state of windows until as much of the GUI as possible has been traversed. This Event-Flow Graph is an approximation to the complete Event-Flow Graph, since the GUI Ripper may miss out some GUI windows and widgets [26]. The Event-Flow Graph represents valid executable event sequences on the GUI. Testcases can be easily generated from the Event-Flow Graph using different graph traversal algorithms starting from the initial events. Examples of graph traversal algorithms are goal-directed [33], random- walk [34]. Graph-pruning techniques based on the GUI’s behavior such as EIG [34] and ESIG [47], may also be used to reduce the state space of the Event-Flow Graph. 21 reset create square circle checkbox yes Radio Button Demo Exit Confirmation terminal event interacts relation initial event no exit Figure 2.4: Event-Interaction Graph for the Radio Button Demo application. 2.6 Event-Interaction Graph The Event-Flow Graph models all events and all possible event-sequences on the GUI. For typical GUI’s the number of event sequences of a given length can grow exponentially even with small event sequence lengths. This leads to large testsuites with impractical execution times. Xie et al. [43] empirically developed a method to prune the state space of the EFG. This in- creased the feasibility of generating testsuites with longer testcases and practical suite sizes. In their study the authors empirically concluded that structural GUI events – open/close menu item, open/close modeless windows – typically do not reveal defects in the application. This is likely be- cause these events are typically executed by popular libraries which are well tested and defect-free. 22 On the other hand, termination events – where model windows are closed – and system interaction events – where the GUI interacts with the underlying business logic – are more likely to reveal defects. The Event-Interaction Graph was developed to model this logic. Intuitively, an Event-Interaction Graph contains only termination and system interaction events. An edge, x→ y, between two events, x and y, indicate that event y is executable (not necessarily immediately) after event x. The edges in the Event-Interaction Graph models the interacts with relation. An Event-Interaction Graph can be easily derived from an Event-Flow Graph based on GUI properties of each event [44]. The Event-Interaction Graph for the Radio Button Demo ap- plication is shown in Figure 2.4. This was obtained from the Event-Flow Graph by removing the exit event, since this event opens a modal window and is classified as a structural event. Testcases are generated from the Event-Interaction Graph using graph traversal algorithms. Testcases generated directly from the Event-Interaction Graph may need to be augmented with missing structural events in order to make the testcase executable on the GUI. 2.7 Event-Semantic Interaction Graph The GUI model represented using an Event-Interaction Graph can be further pruned based on semantic relationship between GUI widgets. Yuan et al. [47] created a sparse graph, where an edge was present between two GUI events only if executing one influenced the execution result of the other. The resulting representation called Event Semantic Interaction Graph greatly reduced the state space and was useful for generating longer testcases with a practical testsuite size. Intuitively, if executing an event x affects the visible result of executing event y, then they are likely to share common program code or program state. In the Radio Button Demo ap- plication, the event handlers for the events square and create have common program state in the variables created and currentShape. As a result, executing the event sequences – 1) square 23 reset create square exit circle checkbox noyes Radio Button Demo Exit Confirmation terminal event initial event interacts relation Figure 2.5: Event-Semantic Interaction Graph for the Radio Button Demo application. in isolation 2) create in isolation 3) square → create – show different resulting visible states on the GUI (Figure 2.6). In this figure, (a) is the initial state of the application. The resulting visible states in (b) and (c) are different from (d). The event square, when executed before the event create makes the latter behave differently. The event square is said to interact with the event create. Such interacting events, in a GUI can be identified and modelled as an Event Semantic Interac- tion Graph (ESIG). The ESIG for the Radio Button Demo application is shown in Figure 2.5. As expected, there is an edge from the event square to the event create. The ESIG typically has fewer edges than the Event-Flow Graph and Event-Interaction Graph. Using this representation it is possible to generate longer testcases which specifically test interacting events – for example, (create, square) and (create, reset). 24 (a) (b) (c) (d) Figure 2.6: ESI relationship. (a) Initial application state (b) square executed (c) create executed (d) square→ create in sequence. square influences behaviour of create. 2.8 Planning Plan Generation [33] has been used by Memon et al. to generate testcases for GUI-based applications. This is a goal-driven approach where the tester creates testcases by specifying the intended GUI task to be performed. The test generator produces the sequence of events that brings the application from the given initial state to the goal state. The intuition in developing goal-driven testcase generation is that testers often find it easier to specify what needs to be done rather than how it needs to be done. Often, a GUI might present more than one, often convoluted, path for performing a task, which a tester may fail to test. Plan generation takes as input the starting GUI state and produces all possible paths for reaching the goal GUI state. Generating testcases using Plan Generation works in two phases. In the first phase, the tester uses domain knowledge of the GUI to create pre-condition and effect Plan Operators. Plan Op- erators are building blocks for defining state transitions in the GUI. An operator models the state 25 Initial Goal (a) Initial→ Square→ Create→ Goal Initial→ Create→ Square→ Goal (b) action pre-condition effect Square isAccessible(Radio Button Demo) focus(Square) if created→ render(square) Create isAccessible(Radio Button Demo) created← true if focus(square)→ render(square) if focus(circle)→ render(circle) (c) Figure 2.7: Plan generation for Radio Button Demo application (a) Initial and goal states (b) Two generated plans (c) Plan operators for GUI. of GUI entities that may trigger the operator. It also defines the effect of the operator on the state of GUI entities. In the second phase, the tester identifies tasks that need to be performed on the GUI by specifying the initial and goal state of the GUI. Thereafter, a plan generator, such as an AI planner, uses the plan operators and tasks to produce a set of testcases for each task. Figure 2.7 (a) shows an initial and goal state of the Radio Button Demo application. There are many paths, in theory infinite, of transitioning the GUI from this initial state to goal state. The tester can specify a testing goal with this pair of states. The Planner will generate a testcase shown in (b). Figure (c) shows an example of how plan operators for this GUI may be defined. 26 initial goal novice expert deviate operator Figure 2.8: Genetic algorithm emulating a novice user’s behavior to generate GUI testcase. 2.9 Genetic algorithm Testcases on a GUI are sequences of events. It is possible to model a testcase as a gene, where each event is an allele (or chromosome). The goal of the genetic algorithm is to produce a set of event sequences that satisfy a good testing or coverage criteria. Kasik et al. [24] have applied genetic algorithms to generate GUI testcases. In this work, the authors have attempted to emulate the event sequence a novice user would execute on the GUI. They argue, that a novice user would typically execute more events on the GUI in order to achieve a task than an expert user. The novice user’s event sequence would execute unpredictable paths which the developer would not have predicted. This could thus trigger untested execution sequences and hence reveal defects in the GUI. The algorithm begins creating a set of initial alleles, generating a reward score for each allele, replicating good alleles to the next generation of genes, applying mutation and crossover operators to enable better exploration of the search space. The goal of the genetic algorithm is to promote testcases that best resemble novice users. In this work, the authors claim that devising the best reward strategy for the alleles was a challenge. The reward system implemented a deviate strategy from the expert user’s path to emulate novice users. Figure 2.8 shows a typical path traversed by an expert user and by a novice user, to complete the same task. The genetic algorithm attempts to emulate the novice user by rewarding events that make the testcase deviate from the expert user’s path. 27 2.10 Covering arrays Yuan et al. [46] use covering arrays to generate long testcases. This approach can be used for testing an event in the context of other events. This is because an event, e1, may behave differently when executed after another event e2. In addition, a set of events, e1, e2 may behave differently when another event e3 executed before either of them. A covering array, CA(N ; t, k, v), is an N × k array on a set of v symbols such that every N × t sub-array contains all ordered subsets of all v symbols, at least once. This means that any t-columns of the array will contain all t-combinations of the v symbols. Using covering arrays, testcases are generated for testing the execution of a set of v events in t-way interaction such that each event occurs at each position in length-k testcases. The testcase generation workflow operates by partitioning a GUI of a GUI-based application. Typically, a GUI windows forms a partition. Each event in a partition is considered as a symbol in v. Thereafter, testcases of length k are generated by creating a covering array of dimension N ×k. Each row of this array is a testcase. N , the number generated testcases, is minimized using optimization techniques. The final output is a set of N , length-k testcases. Table 2.2 shows all 2-way interaction of events in the Radio Button Demo application’s Exit Conformation GUI window (partition) with three events – yes, no, (un)check. Length-4 testcase are generated to test the effect of placing each event at sequence location 1 through 4. To test the placement of 3 events in all positions of a length-4 testcase, P 43 = 81 testcases are required. However, using a covering array, CA(9; 2, 4, 3), only 9 testcases are required. 2.11 Summary Model-based testing of the GUI of GUI-based applications have received considerable attention over the past decade [5]. Most techniques create a representation of the GUI based on automated or 28 yes yes yes no yes (un)check no no no (un)check no yes (un)check (un)check (un)check yes (un)check no 2-way interaction yes yes yes yes yes no no (un)check yes (un)check (un)check no no no yes no no (un)check no yes no yes (un)check (un)check (un)check (un)check yes (un)check (un)check yes no no (un)check no (un)check yes CA(9; 2, 4, 3) testcases Table 2.2: Covering Array CA(9; 2, 4, 3) for the Exit Confirmation window of Radio Button Demo application. Only 9 testcases are required for testing 2-way interaction of 3 events at all 4 positions in a testcase. Exhaustive testing would require 81 testcases. manual analysis of the GUI. Subsequently, testcases for testing the GUI are generated based on this representation. Researchers have focused on 1) obtaining a concise and accurate representation of the GUI 2) developing state space reduction techniques and 3) study the generation of effective testcases based on the model. To the best of my knowledge, there has been no study that examines the interaction between GUI events and their interaction with program code of the GUI application being tested. The work presented in this study sets the stage for gaining a better understanding of this interaction. 29 Chapter 3 Event-Code Interaction This chapter discusses different approaches that were considered for generating long GUI test- cases. It also describes two models that are used for generating the testcases and evaluating their characteristics. The first is the Event-Code Interaction (ECI) model. It is used for modelling the GUI of an application. The second is hierarchical signatures. It is a method for enumerating the visible GUI states of an application. 3.1 Goals The goal of this study is to develop a method for generating long testcases that can detect faults in event handlers of GUI-based applications. Testcases are generated based on the Event Code Interaction model created by observing code coverage metrics, from a seed testsuite, in in the program code of the application. The methods presented in this study have the following characteristics. Model: This study seeks to find a relationship between events executed on the GUI of a GUI- based application and the program code executed on its behalf. Specifically, interactions, called Event-Code Interactions, between the program code for different event handlers are examined. These interactions are modelled as an Event-Code Interaction Graph (ECIG). Effectiveness: Testcases generated based on the ECIG must be able to detect faults that are not detectable by other comparable techniques. The nature of generated testcases will target 30 the application, with better testcases, where other techniques may be inadequate. In other words, the model will generate long, useful testcases. Efficiency: The steps required to analyze the application, create the ECIG model, use the model for testcase generation and execution must complete within a reasonable time. In addition, the generated testsuite must not contain duplicate or non-executable testcases. Automation: The goal of this study is to develop a technique that is applicable to real-world, large-scale applications. Such applications may need to be tested daily (or more frequently) in a regression testing environment. The workflow must be fully automated to make this technique amenable for integration into third-party test harnesses. The above goals aim to enhance existing model-based testcase generation techniques. In addi- tion, it seeks to establish a new area of investigation for studying the relationship between events and program code that respond to the events. 3.2 Approaches This work studies the relationship between events and corresponding program code of an ap- plication by first executing a seed testsuite on the application. The seed testsuite is an EIG-based SequenceLength-2 testsuite, that can be easily generated and executed. Code coverage metrics reported from execution of the seed testsuite are analyzed to infer ECI relations between events. The following sections list questions that were asked before arriving at the above strategy. 3.2.1 Approach 1: Does one event influence another event? During the execution of an event e1, its event handler might read, modify and write program variables. Some of these program variable may also be accessible by the event handler of another event e2. 31 It would be interesting to study if the event handler for e1 affects the event handler for e2. First, an event e2 is executed immediately after launching an application and its behavior is recorded. In a subsequent execution, after launching the application, event e1 is executed followed by e2. Will e2 behave differently in the second execution instance? If there is a difference, then execution of event e1 influences the execution of event e2 and executing these two events in a testcase may reveal faults that are not triggered when either event is executed in isolation. This approach forms the basis for the Event-Code Interaction method presented in this study. 3.2.2 Approach 2: Do multiple events in combination influence another event? Given a set of events E = {e1, ..., en}, each individual event ei, 0 < i < n, may not posses any interaction with another event e. However, when the events in E are executed together, in that order, they may influence the behavior of event e. In this instance, the combination of a set of events interact with another event. Hence, a testcase that contains all events in E, in that order, followed by the event e, exercises the interaction between E and e. This approach is an alternative to the Event-Code Interaction method and seeks to find com- plex interactions between events at the program-code level. It is not evaluated in this study (see Chapter 6). 3.2.3 How can a variation in event execution be detected? This section lists four approaches using which an event e1 can be detected as interacting with another event e2. An event, e1, is said to interact with another event, e2, if the execution of e1 im- mediately before e2 makes e2’s behavior different than when e2 is executed in isolation. Different metrics may be employed to signal the difference in behavior of e2 in these two instances. 32 1. Approach L1 (unique-lines-of-code): In this method, the unique lines of program-code executed, as a result of invoking an event, is recorded. A distinction between single and multiple-executions of the same line is not made. In addition, the order of execution of the lines is not recorded. Consider the following code fragment, taken from an event handler: 1 2 3 sub a(N) for (i = 0; i < N; i++) s += i 4 5 6 sub b(N) for (i = 0; i < N; i++) s += i In this code, a and b are two functions. The lines of code executed by an event handler in three different invocations A, B, C by calling the functions a, b, with different input param- eters, are given below. The metric of code coverage, L1, will not distinguish between the three different executions, since the set of unique lines executed is the same, {1, 2, 3, 4, 5, 6}, in all cases. A B C a(1); b(2) a(2); b(2) b(2); a(1) 1,2,3, 4,5,6,5,6 1,2,3,2,3, 4,5,6,5,6 4,5,6,5,6, 1,2,3 2. Approach L2 (line-hit-count): In this method, the number of times a program line is ex- ecuted, is recorded. The sequences in which lines are executed are not recorded. In the example above, A, C are considered identical and distinct from B. 3. Approach L3 (line-sequence): In this method, the order of execution of program lines is recorded. Two executions are considered distinct if there is a difference in the recorded execution order. In the example above, A, B, C are all considered distinct. 4. Approach S (program-state): In this method, program state is recorded after invoking an event handler. The program state may include global data structures, heap and static storage. In the example above, the variable s can be considered as the only program state. By record- ing the value of s after each event handler terminates, A, C (s= 3) are considered identical and distinct from B (s= 4). 33 3.3 Event-Code Interaction An event in a GUI-based application is typically associated with an event handler. An event handler is a program function that is executed by the application or operating system when an event is executed on the application. This function may in turn execute other functions, spawn threads and trigger timer-based code execution. Hence, an application-level event causes a set of lines of program code to be executed in the application and possibly in the platform. This study is restricted to program code that executes at the application level. Different events will trigger their own event handlers to be executed. The program code executed in response to different events may contain shared pieces of code. They may also share application state such as objects and data structures, which are accessed by the event handler when an event is executed. Shared program code and program state between different event handlers create a possibility of interactions between them. For example: Program code : An event handler may acquire a lock which is also required by another event handler sharing code. Hence, execution of one event handler may delay the execution of the other event handler. Program state : An event handler may modify an object required by another event handler. This modification may affect the execution of the other event handler. The Radio Button Demo application shown in Figure 1.1 will be used as a running exam- ple in this chapter. Appendix A lists the program code for the Radio Button Demo applica- tion. The Radio Button Demo application is written using the Java programming language. It consists of one file, RadioButtonDemo.java. This file contains one main class, RadioButton- Demo. The main class contains eight sub-classes. Of these eight sub-classes, five sub-classes de- fine events handlers – W0Listener, W1Listener, W2Listener, W3Listener, W5Listener. The event handlers correspond to the widgets w0, w1, w2, w3, w5, shown in Figure 1.1. Three sub-classes define GUI widgets – CirclePanel, SquarePanel, EmptyPanel. The file RadioButtonDemo.java has 34 239 uncommented lines of source code. Appendix A also shows lines of code executed in response to certain events. For example, when the Radio Button Demo application is launched and the event sequence square −→ create is executed, the lines of code in the column A are executed. An ×, on a given row indicates that that line was executed at least once. In this study, code coverage measurements follow the method Approach L1 (unique-lines-of-code) described in Section 3.2. 3.3.1 Event-Code Interaction An event e1, belonging to an application under test, A, is said to interact with another event e2, belonging to A, at a program code level, if the execution of event e1 alters the lines of code executed by event e2. This interaction is represented as an edge e1 −→ e2 from event e1 to event e2. Altered lines of code execution is determined using the Approach L1 (unique-lines-of-code). 3.3.2 Event-Code Interaction Graph An ECIG is a graph representing ECI relations between the events of an application under test, A. In this graph, vertices represent events belonging to the GUI of the application. An edge is present from an event e1 to another event e2 if there exists an ECI relation from e1 to e2. Definition: An Event-Code Interaction Graph, ECIG, for an application under test is defined as a triple, ECIG = (V,E, I), where: 1. V = {v1, ..., vn} is a set of n vertices such that each vertex represents an event in the appli- cation under test. 2. E = {e1, ..., em} and E ∈ V × V is a set of m directed edges. A directed edge e from vertex v1 to v2, represented by v1 −→ v2 represents an ECI relation between events represented by the vertices v1 and v2. 35 3. I ⊆ V is a set of vertices representing initial events that are executable immediately after the application is launched.  Figure 3.1 shows the ECIG for the Radio Button Demo application. This ECIG was em- pirically determined. In this figure, V = {square, circle, create, exit, reset, yes, no, checkbox}, E = {square −→ create, circle −→ create, create −→ square, create −→ circle}, I = {square, circle, create, exit, reset}. Algorithm: Given an application on which events can be executed, a simple algorithm can be followed to determine if one of its event, e1, interacts with another of its event, e2: e1 −→ e2: Execute event e1 as the first event immediately after launching the application. This might require initial events to be executed in order to reach the event e1. Follow e1 imme- diately with event e2. Record the lines of code executed by event handler for e2 – call it set X . e2: Execute event e2 as the first event immediately after launching the application. This might require initial events to be executed in order to reach the event e2. Record the lines of code executed by the event handler of e2 – call it set Y . ECI predicates: The following three predicates indicate whether the execution of event e1 interacts with the execution of event e2. 1. X − Y 6= Φ. This indicates that certain lines of code were executed when the sequence e1 −→ e2 was executed (X), but were not executed when e2 was executed in isolation (Y). 2. Y −X 6= Φ. This indicates that certain lines of code were executed when e2 was executed in isolation (Y), but were not executed when the sequence e1 −→ e2 was executed (X). 3. X 6= Y . This condition indicates that the lines of code executed by e2 in isolation and by e1 −→ e2 are identical. However, there exists a set of lines, for which the hit count between X and Y are different. 36 In this study, ECI predicates 1 and 2 will be used to detect interaction. Hence, if ECI predicates 1 or 2 are true then the event e1 is said to interact with the event e2 at the program code level.  Example: As an illustration, consider the program code for the Radio Button Demo ap- plication. Columns (B) and (C) from Appendix A shows the following two code coverage infor- mation: B : Shows the lines of code executed by the event handler for create when the event sequence circle −→ create is executed after launching the application. C : Shows the lines of code executed by the event handler for create when the event create is executed after launching the application. It can be seen that the lines of code executed by the event create in isolation (C) are different from the lines of code executed by the same event handler when the event sequence circle −→ create (B). The state variable currentShape affects the execution of create in (B) at lines 133, 137− 8, 141. Additional lines of code – such as the method CirclePanel.paintComponent() – are executed in (B). The program state currentShape is set to Shape.CIRCLE by the event circle influencing the event create to execute different lines of code. This example shows that the event circle interacts with the event create. Hence an edge circle −→ create will be added to the ECIG for the Radio Button Demo application in Figure 3.1. The other 3 remaining edges are added based on a similar observation.  Some event pairs do not have any ECI relations. For example, the event square does not interact with the event circle. In Appendix A, column (D) shows the lines of code executed by the event handler for circle when the event sequence square −→ circle is executed. This is identical to column (E) when circle is executed in isolation. Since (D) and (E) are identical, the event square has no interaction with the event circle. Contribution 1: The ECI paradigm defined in this section models interactions between program- code (event handler) corresponding to GUI events. 37 reset create square exit circle checkbox noyes Radio Button Demo Exit Confirmation terminal event initial event interacts relation Figure 3.1: Event-Code Interaction Graph for the Radio Button Demo application. 3.3.3 Composite Event-Code Interaction The concept of Event-Code Interaction, where an event e1 is said to interact with another event e2, can be extended to composite events. Definition: A composite event, CE, in an application under test, is defined as an ordered set of events, {E1, E2, ..., En} where each constituent event Ei, for 1 ≤ i ≤ n, is executed in sequence without executing any other event in between two successive constituent event.  Composite ECI predicates: A composite event CE1 = {E1, E2, ..., En} is said to interact with another event e2 of the application under test, at the program code level, if both the following predicates are true: • The execution of the composite event E1 alters the lines of code executed by the event e2. 38 reset create square exit circle checkbox noyes Radio Button Demo Exit Confirmation terminal event initial event interacts relation Figure 3.2: Event-Code Interaction Graph for length-2 composite events in the Radio Button Demo application. • There are no ECI relations for any ordered subset of CE1 (E1 −→ e2, ..., En −→ e2, {E1, E2} −→ e2 and so on). Intuitively, this means that a set of constituent events independently do not interact with another event e2. However, when the constituent events are executed as a composite event, their combined execution does interact with the event e2.  Example: Consider the Radio Button Demo application example. Composite events in this application of different lengths are like – {circle, create} of length-2, {create, exit, no} of length-3, {create, circle, reset, exit} of length-4. For simplicity of description, consider composite events of length-2. Figure 3.2 shows the interacts relation of the Radio Button 39 Demo application with length-2 composite events. The following three sets of interacts relations were empirically determined: 1. {create, circle} −→ exit 2. {create, square} −→ exit 3. {square, create} −→ exit; {circle, create} −→ exit In this figure, each set from the list above is shown in a distinct color. An interacts relation is shown as a set of two edges with the same color. The first edge is undirected and connects the two vertices of the composite event. The second edge is directed and connect the last event of the composite event and the interacting event. For example, the composite event {circle, create} interacts with exit. It is shown as circle — create −→ exit. In this example, none of the constituent events – create, circle, square – of the composite events interact with the event exit. As an illustration of a composite event interacting with another event, consider the ECI {circle, create} −→ exit. In Appendix A, column (F) shows the lines executed by the event handler for the event exit when the event sequence circle −→ create −→ exit is executed. Column (G) shows the lines of code executed by the event handler for exit when executed in isolation. It can be seen that (F) and (G) are different. In (F) the line 163 is executed, in addition to the lines from (G). This indicates that the composite event {circle, create} interacts with the event exit. From Figure 3.1, it can be seen that the constituent events, circle and create do not interact with the event exit.  Composite ECI relations described in this section are not evaluated in this study. It is a concept that can be investigated in future (see Chapter 6). 40 3.4 GUI states A visible GUI state (simply, GUI state) consists of all visible GUI windows and their widgets. A GUI moves from one visible state to another as events are executed on it. The usefulness of a testsuite, under study, is typically measured by comparing its execution results with that of another testsuite, obtained by a known method. Typical metrics that help assess the usefulness are 1) code coverage 2) fault detection with mutation testing and 3) performance metrics such as execution time and testsuite size. Execution of long GUI testcases can, intuitively, lead the application to reach GUI states that are not reachable using short testcases. In this study, ECIG based testcases will be considered useful, if they can drive the GUI, of a GUI-based appli- cation, into states that were not possible using other comparable methods. This section describes a fast method to detect if new GUI states were reached, while executing a target testcase. In this method, a baseline testsuite is first executed on the application under test and all GUI states reached by the application are recorded. Thereafter, a target testsuite is executed. If GUI states are reached, using the target testsuite, that were not seen while executing the baseline testsuite, then those states are considered as new GUI states. 3.4.1 Challenges Comparing two sets of GUI states, to determine if one set contains states not present in the other set, poses the following practical challenges: Resource: The GUI state reached by the GUI of an application under test is recorded after each event of a testcase is executed. For example, a testsuite with 1000 testcases where each testcase contains 2 events will require 2 ∗ 1000 = 2000 GUI states to be recorded. When a target test- case with 2 events is executed, the GUI state after executing each event must be compared with 2000 recorded states to determine if the target testcase drove the GUI into a state that was not en- 41 countered by the baseline testsuite. For a target testsuite with 1000 testcase of length-2 each, this translates to 1000 ∗ 2 ∗ 1000 ∗ 2 = 4 million comparison of GUI states. This is an O(n2) algorithm and is computationally resource intensive. Platform: Comparing two specific GUI states to determine if they are different (or identical) poses certain challenges. Since GUIs are hierarchical in nature, each state may contain GUI containers that may contain sub-containers that may finally contain GUI widgets. Let us consider two hypo- thetical and visually identical GUI states – A and B – that contain only widgets – a, b, c, d – as follows: A = {a, b, c, d} B = {a, b, d, c} The two GUI states contain the same widgets and are visually identical. However, the software layer that renders and extracts the GUI components may report a different ordering. The algo- rithmic complexity to determine if the two states are different (or identical) is non-trivial. This is because a real-life GUI may contain many levels of hierarchy and ordering differences at any level. For example, if A and B were given as: A = { {a, b}, {c, d}, { {e, f}, {g, h} } } B = { {c, d}, { {g, h}, {e, f} }, {a, b} } A and B could be visually identical. However, a software layer may change the ordering of the components when queried. The above two challenges motivate the development of a fast and simple method to compare a GUI state with a set of known GUI states. 42 3.4.2 Hierarchical signature A GUI widget is associated with a set of attributes and corresponding values. Consider a button widget called Exit that has been extracted from the GUI state of an application. The widget has the attributes width, height, color, text with corresponding values 100, 30, 0xfff, Exit. It is possible to create a signature from these values as follows: φ = {width, 100}, {height, 30}, {color, 0xfff}, {text, Exit} (3.1) where φ is the set of attributes and corresponding values of the attributes, of the widget Exit. φ′ = {i , j}, {k , l}, {m, n}, {o, p} (3.2) where φ′ is a set containing a signature, computed from the set φ, such that i = hash(width), j = hash(100) and so on. Here, hash is a suitable hash function to convert the attribute and value into an integer. h = i+ j + k + l +m+ n+ o+ p (3.3) where h is a signature computed from the attributes and values of the widget Exit. Using the above three steps, a signature can be computed for any widget of a GUI application. As a result of the commutative operator +, the signature h is insensitive to the order in which the attributes appear. Extending the above concept, it is possible to compute a signature for each component of the hierarchical GUI structure of a GUI state. Computing the signature begins at the leaf widget and continues upwards until the GUI windows are reached. Figure 3.3 (a) shows the set of signatures computed for the GUI component hierarchy of RadioButton Demo’s initial GUI window. Figure 3.3 (b) shows the hierarchical nature of 43 h 1 h 6 h 7 h 4 h 5 h 2 h 3 h 8 h 10 h 9 H (a) Create Reset Exit Circle Square shape Container Container Container Title h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 Visible GUI H (b) Figure 3.3: Signatures computed for the hierarchical GUI structure of RadioButton Demo’s initial state. It contains one GUI window. (a) Visible GUI window with signatures of widgets and containers. (b) Tree showing hierarchical nature of GUI windows with signature of widgets and containers. 44 h 1 h 6 h 7 h 4 h 5 h 2 h 3 h 8 h 10 h 9 h 11 H (a) Container h 10 Visible GUI H Container h 11 (b) Figure 3.4: Signatures computed for the hierarchical GUI structure of RadioButton Demo after executing the Exit button. It contains two GUI window. (a) Visible GUI windows with signatures of widgets and containers. (b) Tree showing hierarchical nature of GUI windows with signature of widgets and containers. 45 this GUI window, with the corresponding signatures. In these figures, h1 – h7 are signatures com- puted for the leaf-level widgets. Intermediate GUI containers are shown with signatures h8 and h9. The sole GUI window has the signature h10. The final signature of the complete (with only one GUI window) GUI state is H . Figure 3.4 (a) and (b) shows the computation of signature for RadioButton Demowhen the Exit dialog box has been created. In this state, the GUI has two windows. Figure 3.4 (a) shows the two windows with corresponding signatures. In this figure, h10 and h11 are the signatures of the GUI windows. The final signature of the GUI state is H , obtained by combining h10 and h11. This section shows a method to compute signatures for a hierarchical GUI that is visible during the execution of a testcase. The next section shows how the signatures can be used to determine the presence of a new GUI state, during the execution of a testcase. 3.4.3 New GUI states Hierarchical signatures provide a fast and simple method to identify the presence of new GUI states reached by the GUI, when comparing the executions of a baseline and target testsuite. The following three steps are followed: Step 1: First, the baseline testsuite is executed. After executing an event from a testcase in this testsuite, the procedure sigUpdate from Figure 3.5, is executed as: sigUpdate(baselineSignature, G) (3.4) where baselineSignature is a table for storing all signatures obtained from the baseline testsuite and G is the complete state of the GUI after executing the event. The procedure sigUpdate accepts as input 1) G, a set of GUI windows of the application, that are visible after executing an event from the testcase and 2) signature, a reference to the 46 PROCEDURE sigUpdate IN: signature = { signatures table } IN: G = { GUI windows} H = 0 foreach g ∈ G 1 H+ = hash(g) 2 if (!exists(signature, H)) 3 insert(signature, H) 4 PROCEDURE hash IN: C: widget or container OUT: hash value of C h = 0 if isWidget(C) 1 A = attributeList(C) 2 foreach a ∈ A 3 v = value(a) 4 h+ = CRC32(a) 5 h+ = CRC32(v) 6 else G = componentList(C) 7 foreach g ∈ G 8 h+ = hash(g) 9 return h PROCEDURE listNew IN: B: signature table of baseline IN: T : signature table of target foreach t ∈ T 1 if (!exists(B, t)) 2 print new state t 3 Algorithm run-times: O(log m) - exists, insert O(n log m) - sigUpdate O(1) - hash where: n - # testcases in a testsuite m - # unique signatures in signature table Algorithm run-times: O(n log m) - listNew where: n - # testcases in target testsuite m - # testcases in baseline testsuite Figure 3.5: Procedures to detect if an application reached new GUI states after executing a target testsuite. Procedure sigUpdate computes a signature of the GUI state after the execution of an event. Procedure listNew compares the signatures of a target testsuite with a baseline testsuite. table where the computed signatures will be stored by the procedure. In lines 1 − 2, the hash procedure is invoked for each GUI window from the set G. In line 3 − 4, the computed signature of the complete visible GUI state is inserted into the table, if the signature was not already present. The procedure hash accepts as input C, a GUI container (such as a GUI window) or leaf-level widget. It returns the computed signature of that container or widget. The variable h stores the running value of the signature for C. Lines 1 − 6 process C if it is a leaf-level widget. In line 2, a list of attributes of the widget is obtained. For each attribute a, lines 4 − 6 obtain the value associated with the attribute and update h with the signature of the attribute and value. The CRC32 algorithm is used for computing the signature. Lines 7 − 9 process C if it were a GUI container 47 such as a window. For every sub-container, g in C, its signature is computed and updated into h. During execution of the baseline testsuite, the procedure sigUpdate computes the signature of the complete visible state of the GUI after execution of each event in a testcase. It is then stored in a table called baselineSignature. When execution of the baseline testsuite completes, the table baselineSignature contains signatures of the GUI state visible after execution of each event of every testcase. Step 2: Second, the target testsuite is executed in a similar manner as the baseline testsuite. In this case, the procedure sigUpdate is executed as: sigUpdate(targetSignature, G) (3.5) where targetSignature is a table for storing signatures of the visible GUI state after execut- ing each event of every testcase from the target testsuite. Step 3: Finally, after both baseline and target testsuites have completed execution, an analysis is made to determine if new GUI states were reached during the execution of the target testsuite. This is done by invoking the procedure listNew in Figure 3.5. The procedure listNew accepts as input, two tables, the first containing a set of all signatures from the baseline testsuite and the second containing a list of all signatures from the target testsuite. It is invoked as: listNew(baselineSignature, targetSignature) (3.6) In lines 1 − 3, if a signature from the target table is not found in the baseline table, then it is determined to be a new GUI state. The procedures sigUpdate and listNew provide a fast and simple method to enumer- ate GUI states that were encountered using a target testsuite, but were not encountered using the baseline testsuite. These new states are ones that the target testsuite drove the GUI into, but the baseline testsuite was not able to. Procedure sigUpdate has complexity O(nlogm), where n is 48 the number of events in the given testsuite and m is the number of unique GUI signatures that are encountered by the testsuite. The procedure listNew similarly has a complexity of O(nlogm), where n and m are the number of unique GUI signatures encountered in the baseline and target testsuite respectively. This section introduces hierarchical signatures, to identify new GUI states that are encountered while executing ECIG-based testsuites. Identification of new GUI states will show that the ECIG- based testsuites are able to drive the application into GUI states that was not possible using a comparable method. 3.5 Example This section shows a simple example of an ECIG model-based long testcase that is useful in detecting a software defect. The example: 1. finds a new GUI state 2. shows why the ECIG model chose to create this testcase 3. shows how a software defect in the event handler could manifest as a defect in the GUI state. 1) Find new GUI state: A length-3 testcase based on ECIG model was created for the Java ap- plication JEdit (see Chapter 5). The testcase has the following event sequence: Search −→ Find −→ Auto Wrap −→ Regular Expressions −→ Search −→ Whole Word The underlined events form the length-3 event sequence generated by the model, asAutoWrap −→ Regular Expression −→ Whole Word. The intermediate events are filled in so that underlined events are reachable from the previous underlined event. This is considered as a long testcase, 49 since typical EFG (or EIG) based testcases are of length-2. It was executed immediately after launching JEdit, that is, from a clean known state of the application. Figure 3.6 shows visible states of JEdit taken after executing events Auto Wrap (as e1, at bottom-left), Regular Expressions (as e2, at bottom-right) and Whole Word (as e3, at top-right). After executing the first event, the checkbox Auto Wrap becomes checked. It is labelled with a large 1. After executing the second event, the checkbox Regular Expressions, labelled with a 2, becomes checked. After executing the third event, two checkboxes Whole Word, labelled with 3 and 4, become checked. Altogether 4 checkboxes, 1 − 4, are checked. The final GUI state, achieved after executing the third event, is a GUI state that cannot be reached by executing any length-2 event. For example, executing the length-2 testcase, Auto Wrap −→ Regular Expressions would result in only two checkboxes, 1 and 2, being checked. Executing Auto Wrap −→ Whole World would result in only checkboxes 1, 3 and 4 being checked. Hence, in this example, the ECIG-based length-3 testcase is able to drive the application to a GUI state that is not possible with a length-2 testcase. 50 new state. not created by: e1 -> e2 or e1 -> e3 or e2 -> e3 init e2: regular expressionse1: auto wrap e3: whole word 1 2 3 4 Figure 3.6: ECIG-based length-3 testcase – Auto Wrap −→ Regular Expressions −→Whole Word – executed on JEdit. Resulting GUI states are shown – initial state (top-left), after executing Auto Wrap (bottom-left), after executing Regular Expressions (bottom-right), after executing Whole Word (top-right). Four checkboxes, labelled 1 − 4, become checked as a result of the testcase execution. The final state, is a new GUI state, reached by the length-3 testcase. It is not reachable using existing length-2 testcase. 51 2) Why this testcase?: The ECIG model selected the two event sequences Auto Wrap−→ Reg- ular Expressions and Regular Expressions −→Whole Word. These two event sequences are edges in the ECIG. Q: Why were these two edges (or event sequences) present in the ECIG? A: Because the preceding event influences the succeeding event at the program-code level. As an example, consider the event sequence (or ECIG edge) Regular Expressions −→ Whole Word. Table 3.1 shows part of the event handler for the event Whole Word, in JEdit’s code- base. During the construction of the ECIG for JEdit, the event sequences (a) Regular Expressions −→ Whole Word and (b) Whole Word were independently executed on the initial state of JEdit, and the code coverage was recorded after executing each event, in each instance. The figure shows part of the lines of code that were executed by the event handler for Whole Word. The highlighted (with ∗) lines of code were executed in instance (a) but not in instance (b). This is because execution of the event Regular Expressions before the execution of the event Whole Word influences the latter. In fact, the function handleSearchSettingsChanged is invoked by Whole Word, only if the dialog box Search and Replace is created before its execution. Since Regular Expressions first creates the dialog box, the execution of Whole Word is affected. Hence, Regular Expressions influences (or interacts with) Whole Word at the program-code level. This merits inclusion of the edge Regular Expressions −→ Whole Word in the ECIG for JEdit. 3) Detect software defect: The testcase in this example can detect a defect. Consider the seeded fault shown in Table 3.2. This is a modified version of the code from Table 3.1. The fault is seeded in line 249, where the ! is removed in the defective code. This line of code will be executed by any testcase containing the event sequence Regular Expressions −→ Whole Word. When executed by the testcase in this example, the resulting final GUI state is shown in Figure 3.7. In this figure, the checkbox labelled 4 is not checked, as it should be (from Figure 3.6, top-right). 52 JEdit: SearchDialog.java 247 248 249* 250* 251* 924 925 926* 927* ...* 984* 985* 986* public void handleSearchSettingsChanged(EBMessage msg) { if (!saving) load(); } private void load() { wholeWord.setSelected(SearchAndReplace.getWholeWord()); ignoreCase.setSelected(SearchAndReplace.getIgnoreCase()); ... keepDialog.setSelected(jEdit.getBooleanProperty( "search.keepDialog.toggle")); } Table 3.1: Partial code from event handler of Whole Word (Edit menu) checkbox event. This defective GUI state can be detected by comparing the defective state with an expected (oracle) state using the hierarchical signature method described in the previous section. This section shows the example of an ECIG-based long testcase that drives the GUI of the application into a new state and proves useful in detecting a fault. 53 JEdit: SearchDialog.java 247 248 249* 250 251 924 925 926 927 ... 984 985 986 public void handleSearchSettingsChanged(EBMessage msg) { if ( saving) load(); } private void load() { wholeWord.setSelected(SearchAndReplace.getWholeWord()); ignoreCase.setSelected(SearchAndReplace.getIgnoreCase()); ... keepDialog.setSelected(jEdit.getBooleanProperty( "search.keepDialog.toggle")); } Table 3.2: Fault is seeded at line 249. The ! has been removed.code fault appears in GUI state SearchDialog.java 245 //{{{ handleSearchSettingsChanged() method 246 @EBHandler 247 public void handleSearchSettingsChanged(EBMessage msg) 248 { 249 if( ! saving ) 250 load(); 251 } //}}} code fault incorrect GUI state 3 4 Figure 3.7: Software defect being manifested as a GUI defect. The checkbox 4 is not checked because of the defect. 54 Chapter 4 Tools and testbeds This chapter describes components required for an empirical evaluation of ECIG-based test- cases. The evaluation requires software components, a workflow that joins these components and a testbed where the software is executed. Open source software components were used for the evaluation. They were integrated into a production quality, fully automated testbed. 4.1 Tools Two open source tools, Cobertura and GUITAR, were used in the evaluation. Gephi1, an open source graph management tool was used for rendering the EFG, EIG and ECIG for visualization. 4.1.1 Cobertura A code-coverage tool, Cobertura2, is used for recording the lines of code executed in response to events. For each uncommented line of program code, Cobertura records the number of times the line was executed. 1gephi.github.io 2cobertura.sourceforge.net 55 4.1.2 GUITAR GUITAR3, a GUI Testing frAmewoRk, is a tool for modelling and testing GUI-based applica- tions. The flexible plugin-based architecture [35] of GUITAR 1) enables developers to implement their own new GUI models and 2) generate testcases based on the new models. The ECIG has been implemented as a GUITAR plugin. It has been used in the evaluation in this study. At a high level GUITAR contains four component tools – GUI Ripper, Graph Converter, Test Case Generator and Replayer. The GUI Ripper and Replayer are components that contain platform-dependent hooks that interact with the platform’s GUI APIs. The Graph Converter and Test Case Generator are platform-independent components that work with abstract GUI models and do not need to interact with the platform. All components of GUITAR operate in a fully au- tomated manner and integrate well with software testing harnesses. GUITAR works with 6 GUI platforms – Java (JFC, SWT), Web, iOS, Android and UNO (Open Office). The GUI Ripper is a tool that extracts structural and semantic GUI information from the run- time execution state of a GUI-based application under test. The GUI Ripper 1) automatically launches the application, 2) identifies its top-level GUI windows, 3) extracts structural and seman- tic information about each window and its constituent widgets, 4) invokes executable events on widgets, 5) detects new GUI windows that may be created as a result of the executed events and repeats steps 3− 5 on any newly created GUI windows. After repeating these steps, the GUI Rip- per produces a GUI Tree, modelling structural information about the GUI of the application under test. The GUI Ripper contains platform-dependent components that invoke platform’s GUI APIs to extract GUI information from the runtime application state and also execute events on the GUI. The Graph Converter extracts semantic information about GUI widgets and events from a GUI Tree and models it as a graph, such as an EFG (see Section 2.5). It works on GUI models and does not require any interaction with the platform. Other semantic models, such as EIG, can be 3guitar.sourceforge.net 56 extracted and modelled by implementing an appropriate plugin for the Graph Converter. The Test Case Generator is a platform-independent component that generates executable test- cases based on a GUI model such as EFG. The Test Case Generator traverses the GUI model and produces testcases which are sequences of events executable along the traversal path. Plugins for the Test Case Generator can be implemented by the tester to realize different testcase generation logic such as a random walk and all-event coverage. The Replayer is a platform-dependent component that executes testcases on the application under test. It invokes platform-specific GUI APIs to identify widgets on which GUI events from the testcase needs to be executed and executes the event. It records whether the application en- countered an unhandled exception or was simply terminated as a result of an exception, while the testcase was being executed. It also records the resulting GUI states after executing each event in the testcase. The GUI state recorded by the Replayer shows how the GUI was explored by the testcase while it was being executed. These GUI state are provided as input to the procedure sigUpdate, as described in Section 3.4. In this study, the usefulness of a testcase is measured by its ability to explore new GUI states. The four GUITAR components described above can be integrated into test harnesses, such as Jenkins4, to produce automated workflows for testing the GUI of GUI-based applications. 4.2 Workflow GUITAR is designed using independent components that can be linked to tailor a suitable workflow. In this study, two GUITAR workflows have been used. The standard workflow was used to execute the seed testsuite. The ECIG extension was used to execute ECIG-based long testcases. 4jenkins-ci.org/ 57 ECIG System Under Test GUI Tree GUI Replayer Test cases Test Cases Coverage Analysis Graph Converter (EIG) GUI Ripper EIG Test Case Generator Coverage ECI ECIG Graph Converter (ECIG) Standard workflow F e e d b a c k StopStart GUI State Logs ECIG extension BA I II III IV Figure 4.1: Typical GUITAR workflows. The Standard workflow enables common GUI testing ac- tivities. An extension of the standard workflow is developed as the ECIG extension. I–IV represent 4 distinct segments of the workflows. 4.2.1 Standard workflow Figure 4.1 shows a schematic diagram of typical workflows that use GUITAR to test the GUI of a GUI-based application. This figure shows two workflows – standard workflow and ECIG extension. The standard workflow follows the segments I–III shown in Table 4.1, which corresponds to Figure 4.1.. It is a basic testing strategy that can be used for testing the GUI of GUI-based ap- plications. In this workflow, the four GUITAR components – GUI Ripper, Graph Converter (with EIGConverter plugin), Test Case Generator (with SequenceLength Generator plugin), Replayer – are shown in oval, with the testing artifacts they consume and produce, in rectangles. In Segment I, the GUI Ripper takes the application under test and reverse engineers it to pro- duce a GUI model, the GUI Tree. A Graph Converter plugin consumes the GUI Tree to produces a graph. The EIGConverter is used, hence an EIG is produced. The EIG models interactions between events that are not structural GUI events. 58 I Start −→ Application Under Test −→ GUI Ripper −→ GUI Tree −→ EIG Graph Converter −→ EIG −→ II SequenceLength Test Case Generator −→ Testcases −→ Replayer −→ III Logs −→ Stop Table 4.1: Standard workflow path corresponding to Figure 4.1. Each segment of the path is labeled I–III. The workflow is: I −→ II −→ III. I −→ II −→ IV Coverage −→ Coverage Analysis −→ ECI −→ ECIG Graph Converter −→ ECIG −→ II −→ III Table 4.2: ECIG extension corresponding to Figure 4.1. The workflow is I −→ II −→ IV −→ II −→ III. In Segment II, a Test Case Generator plugin, the SequenceLength Generator, traverses the EIG to generate a testsuite that contains all possible testcases of a specified length. For example, when the specified length is 2, the testsuite will contain testcases that together cover all event pairs interactions from the EIG. The Replayer executes all testcases on the application under test. It records the resulting GUI state of the application, after executing each event. In Segment III, logs produced by the Replayer are automatically analyzed, shown with the edge A, to detect exceptions and crashes. The GUI state recorded by the replayer in the stan- dard workflow can be used as the baseline against which the ECIG-based testsuite is compared. The GUI states can be provided as input to the procedure sigUpdate and stored in the table baselineSignatures. The entire workflow is production quality and is fully automated, requiring no human inter- vention. 4.2.2 ECIG extension This section describes the steps to reverse engineer a GUI model of a GUI-based application. These include identifying ECI relations, creating an ECIG and testing the application based on its 59 ECIG relations. The steps build upon the Standard Workflow described in the previous section. The ECIG extension workflow, in Figure 4.1, consumes test artifacts from the standard workflow. In this workflow segments I and II of the standard workflow are first executed (see Table 4.2). As a result, the GUI Tree and EIG of the application under test is produced. The SequenceLength Generator produces an EIG-based SequenceLength-2 testsuite that contains all possible length-2 testcases. This testsuite serves as a seed testsuite for determining ECI relations between events. The seed testsuite is executed by the replayer, producing code coverage information. Segment IV contains the core logic for ECI detection and ECI-based testing. It consumes the code coverage report from the seed testsuite. This segment consists of Coverage Analysis and ECIG Graph Con- verter (see Table 4.2). Coverage analysis: The SequenceLength-2 seed testsuite and its corresponding code coverage are analyzed to determined ECI relations amongst the events of the GUI. The code coverage obtained from the code coverage tool, Cobertura, is converted to use the Approach L1 (unique-lines-of- code) method of code coverage. Table 4.3 shows a representative example of a length-2 EIG-based testsuite and its code cov- erage files. Each testcase is named with the two event names of the testcase - first, second - as efirst esecond. The code coverage files are named in a similar manner. The folder for the coverage files is named as efirst esecond. The coverage files placed in it for the two events, first and second, are named first.xml and second.xml respectively. Procedure ECI in Figure 4.2 is then applied to the seed testsuite and its code coverage. Inputs to the procedure are the list of testcase names T and the corresponding list of code coverage records C, which is a 2-dimensional record accessed by the (first, second) event names of a testcase. The ECI relation is returned in the output table ECI . In order to determine if an event e interacts with another event the native code coverage for e, when it is executed in isolation immediately after launching the application, is required. This 60 # Testcase events Testcase name (T ) Per-event code coverage (C) first, second (efirst esecond) (first.xml, second.xml) 1 a, b ea eb a.xml, b.xml 2 a, c ea ec a.xml, c.xml 3 b,c eb ec b.xml, c.xml 4 b,d eb ed b.xml, d.xml 5 c,d ec ed c.xml, d.xml 6 c,a ec ea c.xml, a.xml 7 d,a ed ea d.xml, a.xml 8 d,b ed eb d.xml, b.xml Table 4.3: A representative SequenceLength-2 testsuite containing 8 testcase, with corresponding code coverage files for each testcase. Code coverage is collected after executing each event of every testcase. native code coverage for all events are stored in a lookup table CC in lines 2 − 4. Here, each code coverage record is checked to see if its first event has been considered for its native code coverage. If not, then its code coverage is stored in CC as the native code coverage for the event first. At the end of this step CC contains the native code coverage for every event in the GUI. Lines 5−8 selects a testcase, t, and determines if its first event interacts with its second event. This is done by comparing the code coverage of second for t with the native code coverage for second. If the two code coverages do not match then it indicates that first might have interacted with second to alter its code coverage in t. This means that event first interacts with event second and an edge first −→ second is added to the set ECI . For example, in Table 4.3, to determine if the interaction eb −→ ed exists, two coverage files for the event ed needs to be compared. The native coverage for ed, d.xml, is stored in CC[ed] from testcase 7. To determine the interaction, the coverage for ed, d.xml, from testcase 4 is compared with the native coverage. 61 PROCEDURE ECI IN: T = {t | t ∈ EIG based SequenceLength-2 testsuite; t is like first second } IN: C = {c(first, second) | c = code coverage record for testcase first second)} OUT: ECI = Φ CC = Φ 1 # Compute native code coverage for each event in the EIG foreach c ∈ C 2 if (CC[c.first] == Φ) 3 CC[c.first] = c 4 # Determine ECI interactions foreach t ∈ T 5 (first, second) = (t.first, t.second) 6 if (C[first, second].second.coverage 6= CC[second].first.coverage) 7 ECI = ECI ∪ (first, second) 8 PROCEDURE ECIG IN: EIG =< V1, E1, I1 > IN: ECI ={e | e ∈ ECI} OUT: ECIG =< V2, E2, I2 > V2 = V1; E2 = Φ; I2 = I1 1 foreach e ∈ E1 2 E2 = E2 ∪ e 3 Figure 4.2: Procedure ECI computes ECI relations between event pairs of a GUI. Procedure ECIG produces an ECIG-based on the ECI relation. ECIG Graph Converter The ECI information produced by the Coverage Analysis step produces a list of event pairs which exhibit the ECI relation. This step generates the ECIG-based on the ECI relations. The procedure ECIG in Figure 4.2 is executed to generate the Event-Code Interaction Graph. This procedure takes the EIG for the application and the ECI relation from the procedure ECI as inputs. It produces a graph ECIG as the output. This graph has the same set of vertices as the EIG (V2 = V1) and the same set of initial events (I2 = I1) (Line 1). Its edges are the set ECI (Lines 2-3). The procedures ECI and ECIG produce the ECIG as the output of Segment IV, in the ECIG extension of Figure 4.1 and Table 4.2. Thereafter, Segment II and III of the standard workflow are again executed. In Segment II, the Test Case Generator consumes the ECIG from Segment 62 IV and generates SequenceLength-n testcases based on the ECIG. These testcases are executed by the Replayer, which records the GUI states of the application after executing each event of every testcase. In Segment III, logs recorded by the Replayer are analyzed to determine if the application crashed or an exception was encountered during replay. The procedure sigUpdate is executed on the GUI state recorded by the Replayer, to create the table targetSignatures. This table stores all GUI states visited by the target ECIG-based testsuite. The GUI states reached by the target ECIG-based testsuite is then compared with the GUI states reached by the baseline EIG-based length-2 testsuite. This is done by invoking the procedure listNew with: listNew(baselineSignatures, targetSignatures) (4.1) to determine if the ECIG-based testsuite visited new GUI states. 4.2.3 Integration It was shown by Nguyen [35], that GUITAR may be integrated into an automated software testing workflows. In a similar manner, the Standard workflow and ECIG extension described in the previous sections may readily be integrated into generic software testing workflows. Regression testing: During the process of software development, program-code changes are fre- quently made to the software. It is necessary to execute testcases on the changed software to determine if the behaviour of the new version of the software has regressed from the previous ver- sion. The Standard workflow and ECIG extension may be integrated into such regression testing workflows. This is done by executing the ECIG-based testcases during each regression testing 63 cycle. Incremental model updates: During the process of developing a software, the behaviour, charac- teristics and appearance of the software may gradually change. These changes necessitate updates to the EIG and ECIG models, so that testcases may be re-created, based on the new behaviour of the application. Using the standard workflow, updates to the EIG requires executing the GUI Rip- per. Using the ECIG extension, updates to the ECIG require 1) re-creating the EIG-based baseline testcases 2) coverage analysis and 3) creation of the ECIG. These steps being resource intensive, it is possible to make incremental updates to the EIG and ECIG models instead. Based on known updates to the GUI of the application. The GUI Ripper may be modified to detect updates to the GUI compared to the previous version. The changes may be used to 1) make incremental updates the EIG 2) re-create EIG-based baseline testcase corresponding to the updated GUI 3) execute the re-created baseline testcases and 4) analyse coverage data for the re-created testcases to determine new ECI relations and incrementally update the ECIG. These steps outline a method for incremental updates to the ECIG model. They are not further explored in this study. 4.2.4 Contributions The standard workflow and ECIG-extension stitch together the GUITAR tools to evaluate the usefulness of ECIG-based testcases. The workflows determine ECI relations, produces the ECIG, ECIG-based testcases and new GUI states. These artifacts are contributions 2 and 3 identified in Chapter 1. Contribution 2: The ECIG is the second contribution of this study. Code-coverage analysis of the seed EIG-based SequenceLength-2 testcases identifies interacting events. The interacting events are combined into a model, the ECIG. Since the ECIG contains events that are known to interact at the program-code level, combined execution of events from this graph is likely to test interacting program-code and reveal defects therein. 64 Contribution 3: Comparing the EIG and ECIG, it can be seen that edges of the ECIG are selected from the edges of the EIG. In fact, the ECIG is essentially obtained from the EIG by re- moving those edges that do not exhibit interactions at the program-code level. Hence, the ECIG is sparse compared to the EIG. The number of event-sequences of length n that can be generated from the ECIG is fewer, compared to the EIG. The testsuite size is smaller and is typically executable in a reasonable amount of time. ECIG-based SequenceLength-n testsuite is a method to generate long testcases that can target specific parts of the GUI. 4.3 Testbed Experiments were executed on the testbed shown in Figure 4.3. In this figure, host machines – M1, M2, M3 – are configured with 24-2.4GHz Intel CPU, 48GB RAM, Ubuntu OS with 3.13.0- 58 kernel and Java 1.7.0 45-b18. The controller machine, C, is configured with 64-2.3GHz AMD Opteron CPU, 256GB RAM, Linux OS with 3.10.0-229.7.2 kernel. The controller machine is configured to execute 20 concurrent testcases on each host machine for a total of 60 concurrent testcases. To execute a testcase on the host machine (control flow), the controller machine 1) sets up the environment on the host 2) launches the application 3) executes a testcase 4) records cover- age reports, test logs and GUI states 5) archives results 6) terminates the application and cleans up its persistent states. These steps ensure that all testcase are executed in an identical environment with identical initial states. Test results and artifacts are stored in the controller machine (data flow). 65 M1 M2 M3 C 1…20 1…201…20 control flow data flow Figure 4.3: Testbed for executing testcases. Controller machine, C, executes 20 concurrent test- cases on each worker machine, M1, M2, M3. Results are stored in the controller machine. 66 Chapter 5 Empirical evaluation The Event-Code Interaction (ECI) model described in Chapter 3 is empirically evaluated in this study. Results of the empirical evaluation are presented in this chapter. The evaluation focuses on determining the existence of ECI relations in GUI applications and their usefulness in creating testcases for testing the applications. These results show that 1) ECI relations indeed exist in GUI- based applications 2) the methods developed in this study, make it is easy to identify such relations in real-world software testing environments 3) it is easy to create a model of the application, based on these relations 4) the model can generate long testcases that exercise the application in new ways and 5) these testcases are useful in finding defects in the application. Creation of long testcases for GUI applications has been challenging, using existing GUI models. The ECI model is shown to address this challenge with the ability to create long GUI testcases. In this chapter, the term state indicates the complete visible GUI state of an application at a given instant. Typically, the state is referred to after an event of a testcase has been executed. In this evaluation, there are no references to non-GUI states, such as, memory states. 5.1 Experiment overview Experiments were conducted on a set of 4 subject applications. Each application under test was subjected to the evaluation cycle described in Standard Workflow (see Section 4.2.1) and ECIG Extension (see Section 4.2.2). In all, four evaluation cycles were conducted, with one cycle being executed for each application under test. The cycles were executed in sequence, such that the data 67 Application Abbreviation Version Year LOC Branch Method Class ArgoUML AU 0.34 1999 70, 430 32, 254 16, 034 1, 854 Buddi BD 3.4.1.11 2006 155, 960 43, 670 25, 684 2, 751 JabRef JA 2.10 2003 61, 714 28, 440 8, 722 1, 493 JEdit JE 5.1.0 1998 67, 761 38, 845 9, 881 1, 288 Table 5.1: Properties of applications under test, that were used for evaluation. for one cycle was collected and analyzed completely before the next cycle was started. Section 5.2 and Section 5.3 present results from the four cycles, consisting of the Standard Workflow and ECIG extension respectively. In these sections, results for all applications under test are collected and presented together. 5.1.1 Research questions To guide the empirical evaluation of the ECIG-based testcases, the following research ques- tions are designed. RQ1: Do ECI relations exist in a GUI-based application? RQ2: Can ECI relations be used to generate long testcases for a GUI-based application? RQ3: Can ECIG-based testcases exercise the GUI of a GUI-based application in useful ways? RQ4: Are ECIG-based testcases good for detecting software defects? These questions guide the evaluation of the ECIG model and ECIG-based testcases. Results of the evaluation are analyzed to answer these questions in Section 5.3. 68 5.1.2 Applications under test Four Java-based GUI applications were selected as applications-under-test, for this study. The applications are the following: 1. ArgoUML: An open-source1 software design and engineering tool. 2. Buddi: An open-source2 personal finance and budgeting software. 3. JabRef: An open-source3 software for managing bibliography references. 4. JEdit: An open-source4 text editor for computer programmers. Characteristics of the applications are shown in Table 5.1. In this table, the column Version shows the version number of the application that was selected for evaluation. The latest version available at the time of evaluation was selected. The column Year shows the year when the ap- plication was first made available. All the applications have been present for a decade or more, indicating that an appreciable amount of code maturity would have been reached. The column LOC shows the number of lines of code in the application, column Branch shows the number of code branches, columns Method and Class show the number of Java methods and classes in the application. It can be seen that the applications are non-trivial in terms of code size and complexity. 5.1.3 Threats to validity External validity: Threats to external validity are factors that hinder generalization of results from this study to other applications, systems and environment. Results presented in this study are based on the empirical evaluation of four Java-based GUI-centric application. Results collected from each application are consistent with the results from the other applications. However, owing to the nature of GUI applications, evaluation results with new applications may vary. In addition, 1www.tigris.org 2buddi.digitalcave.ca 3jabref.sourceforge.net 4www.jedit.org 69 results may vary if non-GUI centric applications are chosen for evaluation. In this study four Java- based, GUI-centric applications were evaluated in the Linux platform. Evaluation results may vary if applications from other GUI platforms such as Windows, Android or MacOS are used for evaluation. Internal validity: Threats to internal validity are factors that may affect or bias results of eval- uation. For example, the design and implementation of tools and techniques may suffer from shortcomings. During the process of reverse engineering the GUI of an application under test, event identifiers are automatically generated, for every identified event of the application. The event identifier for an event is an integer, that is created by hashing together properties of the cor- responding widget. The hash function may suffer from hash collision. A hash collision will result in two distinct events of an application being assigned the same event identifier. This may result in incorrect execution of events from a testcase. The process of extracting events from the GUI of an application may be incomplete. This happens when the GUI Ripper fails to identify and extract a widget from the visible GUI of the application. In addition, the GUI Ripper may fail to access and extract some GUI properties of a widget. As a result, the EFG, EIG and ECIG models may contain partial information about the structure of the GUI. This may result in some events and testcases being abset from the corresponding EIG or ECIG testsuites. The hierarchical signature that is used for identification of GUI states may suffer from hash collision. A hash collision will result in two distinct GUI states being considered to be identical. This may result in inaccuracies in the count of GUI states reached by an application, when a testsuite is executed on it. 5.2 Standard workflow This section reports evaluation results following the steps of the Standard Workflow outlined in Section 4.2.1. The Standard Workflow is used to create a seed testsuite, from which ECI rela- tions are determined and the ECIG model is constructed. States visited by the applications during 70 Application rip time windows widgets code coverage AU 4 min 32 sec 26 1, 723 29.18% BD 3 min 48 sec 20 1, 077 5.49% JA 6 min 45 sec 44 1, 899 21.90% JE 8 min 48 sec 18 910 23.38% Table 5.2: GUI Ripper reverse engineers the GUI of an applications to create the GUI tree. The GUI tree contains structural information about the GUI. execution of the seed testsuite form the baseline for comparing with the new states visited during the execution of the ECIG-based testsuite. GUI Ripper: The GUI Ripper reverse engineers the GUI structure of the application under test from its visible run-time state. It launches the application, identifies its top-level windows, ex- tracts widgets using platform-specific APIs, executes events on widgets to open new windows and continues the process until the complete application has been observed. The GUI Ripper produces the GUI tree, that represents structural information about the GUI of the application. Table 5.2 shows the result from executing the GUI Ripper on each of the four applications under test. The table shows time to reverse engineer the GUI and create the GUI-tree (rip time), metrics of the application’s GUI, that is, number of GUI windows that were discovered (windows), number of widgets that were extracted (widgets) and the code coverage achieved as a result of exercising the application (code coverage). It can be observed that a sizeable part of ArgoUML, JabRef and JEdit’s code was exercised by the simple process of reverse engineering. The GUI tree is next used to create the EFG and EIG. EFG Graph Converter: In this step semantic information about GUI events is extracted from the GUI tree obtained from the GUI Ripper. The semantic information is used to create an EFG model representing events and their relations. The EFG Graph Converter extracts all follows relations 71 Application create time # vertices # edges AU 2 sec 498 15, 846 BD 1.6 sec 245 3, 621 JA 2 sec 367 12, 981 JE 1.6 sec 428 13, 687 Table 5.3: EFG Graph Converter extracts follows relation from the GUI tree to create the EFG. Length AU BD JA JE n = 2 # testcases 15, 846 3, 621 12, 981 13, 687 creation time 7 min 3 sec 1 min 56 sec 6 min 21 sec 6 min 43 sec execution time 13 hours 5 hours 25 min 20 hours 9 hours 2 min n = 3 # testcases 678, 628 92, 548 521, 445 447, 464 counting time 1 sec 2 sec 2.4 sec 2.2 sec n = 4 # testcases 30, 421, 680 3, 166, 492 21, 251, 647 14, 602, 995 counting time 3 sec 3 sec 6.4 sec 4.4 sec n = 5 # testcases 1, 377, 291, 937 121, 486, 655 865, 333, 263 474, 185, 546 counting time 3 min 23 sec 21 sec 2 min 23 sec 1 min 52 sec n = 6 # testcases 62, 464, 993, 710 4, 812, 611, 517 35, 185, 275, 646 15, 326, 111, 843 counting time 2 hours 26 min 12 min 35 sec 1 hour 35 min 37 min 35 min n = 7 # testcases 2, 833, 714, 733, 116 192, 229, 380, 427 1, 429, 529, 616, 612 493, 309, 467, 198 counting time 149 hours 16 min 9 hours 7 min 68 hours 49 min 22 hours 16 min Table 5.4: Testsuite creation time, execution time and counting time for SequenceLength-n test- case based on the EFG, where n ∈ {2, 3, 4, 5, 6, 7}. Counting time is the time taken to count the possible number of testcases, without actually generating them. from the GUI tree. It creates a directed graph whose vertices are events from the GUI. Edges are added to the graph, between two events, when one event can be executed immediately after the other. Table 5.3 shows properties of the EFG obtained for each of the four applications – time taken to create the EFG (create time) and number of vertices and edges in the EFG (# vertices, # edges). It can be seen, from the table, that the EFG is create very quickly. Figures 5.1 and 5.2 show the graph models – EFG, EIG, ECIG – for all the applications. The EFG, shown in the left column, models 72 A rg oU M L B ud di EFG EIG ECIG Figure 5.1: Visual representation of the EFG, EIG and ECIG of ArgoUML and Buddi. The EFG represents follows relation between all pairs of events. EIG represents follows relation between non-structural events. It contains fewer vertices and more edges. ECIG is a subset of the EIG, containing edges that interact at the program-code level. the follows relation between an event e2 that can be executed immediately after e1. Clusters can be visually identified in the EFG that belong to GUI windows of the application. EFG-based testcases: In model-based GUI testing methods proposed in earlier work [30], the EFG was used for generating testcases. Table 5.4 shows properties of SequenceLength-n testcases, for n ∈ 2, 3, 4, 5, 6, 7, based on the EFG. In this table, Length is the length of the testcase, in terms of number of events, # testcases is the number of testcases in a testsuite, creation time is the time taken to create a testsuite, execution time is the time to execute a testsuite, counting time is the time taken to count the possible number of testcases in a testsuite, without actually writing the testsuite 73 Ja bR ef JE di t EFG EIG ECIG Figure 5.2: Visual representation of the EFG, EIG and ECIG of JabRef and JEdit. The EFG represents follows relation between all pairs of events. EIG represents follows relation between non-structural events. It contains fewer vertices and more edges. ECIG is a subset of the EIG, containing edges that interact at the program-code level. into a data file. This table shows that the number of testcases grows rapidly with increasing length of the testcase. Execution of n = 2 testsuites consumed between 9 and 20 hours. In fact, beyond a length of 2 (or 3), the number of testcases is too large to create or execute, in a reasonable time. Hence, it is impractical to generated and execute testsuites containing long EFG-based testcases. EIG Graph Converter: The EIG Graph Converter converts the EFG, obtained in the previous step, into an EIG. Table 5.5 shows properties of the EIG for each application. In this table, create time is the time taken by the Graph Converter to convert the EFG into an EIG, # vertices and # edges 74 Application create time # vertices # edges AU 2.2 sec 313 67, 445 BD 1.6 sec 144 12, 576 JA 2.1 sec 276 57, 873 JE 3.4 sec 378 133, 734 Table 5.5: EIG obtained from the EFG by removing structural events. are the number of vertices and edges in the EIG. Vertices representing structural GUI events are not included in the EIG. An edge is present between two events (vertices), e1 and e2, if the events are not structural events and e2 is executable along some path after executing e1. As can be seen from this table, creation of the EIG is a fast process. In Figures 5.1 and 5.2, the middle column shows EIGs for the applications under test. Compared to the EFG, more edges are present in the EIG. This is because edges in the EIG include event pairs that do not necessarily follow each other immediately. EIG-based testcases: Testcases created from the EIG, termed abstract testcases, are not imme- diately executable. This is because intermediate events need to be inserted so that one event is reachable from the previous. Intermediate events are inserted to convert the abstract testcases into executable testcases. Section 3.5 shows the example of an abstract testcase Auto Wrap −→ Regu- lar Expressions −→ Whole Word that has been converted into an executable testcase by inserting intermediate events. Table 5.6 shows properties of EIG-based testcases. Similar to EFG-based testcases, this table shows that the number of EIG-based testcases, in a testsuite, grows rapidly with increasing test- case length. Beyond a length of 2, the number of testcases is too large to create or execute in a reasonable time. Hence, similar to EFG-based testcases, it is practically impossible to create and execute testsuites containing long EIG-based testcases. Seed EIG-based testsuite In this study, EIG-based SequenceLength-2 testsuites, were used as the 75 Length AU BD JA JE n = 2 # testcases 67, 445 12, 576 57, 873 133, 734 creation time 1 hour 44 min 17 min 31 sec 1 hour 18 min 3 hour 24 min (seed) execution time 4 days 16 hours 1 day 2 hours 2 days 9 hours 9 days 23 hours n = 3 # testcases 13, 520, 515 1, 026, 461 11, 859, 294 47, 223, 464 counting time 5.22 sec 3 sec 4.7 sec 7.7 sec n = 4 # testcases 2, 696, 435, 575 82, 069, 590 2, 450, 183, 082 16, 673, 935, 504 counting time 8 min 58 sec 15 sec 9 min 10 sec 37 min 51 sec n = 5 # testcases 537, 280, 797, 376 6, 512, 775, 666 506, 215, 408, 542 5, 887, 308, 434, 240 counting time 22 hour 22 sec 16 min 16 sec 20 hour 49 min 218 hour 29 min Table 5.6: Testsuite creation time, execution time and counting time for SequenceLength-n test- case based on the EIG, where n ∈ {2, 3, 4, 5}. Counting time is the time taken to count the possible number of testcases, without actually generating them. App # testcases execution execution GUI state coverage execution log code clock time work size (GB) size (GB) size (GB) coverage AU 67, 445 4 days 16 hours 224 days 91 608 1.4 30.60% BD 12, 576 1 day 2 hours 31 days 4.8 1, 208 0.26 7.83% JA 57, 873 2 days 9 hours 135 days 143 2, 789 0.73 25.55% JE 133, 734 9 days 23 hours 657 days 861 8, 714 1.5 25.70% Table 5.7: EIG-based SequenceLength-2 testcases is the seed testsuite. ECI relations are deter- mined from the code coverage. GUI state forms the baseline, to check if another testsuite found new GUI states. seed testsuite. This corresponds to the row n = 2 in Table 5.6. Execution of the seed testsuite by the Replayer is described in the next step. Replayer: The Replayer executes the seed testsuite. The seed testsuite is the same testsuite from the n = 2 rows in Table 5.6. The Replayer executes each testcase after launching the application, that is, from a known state. Cobertura is used to record the lines of code executed after invoking each event of the testcase. The GUI state that is visible after executing each event is also recorded. Table 5.7 shows execution results from executing the seed testsuite for each application. The 76 number of testcases in the seed testsuite (column # testcases) is equal to the number of edges in the EIG. Execution of the seed testsuite takes between 1 and 10 days, shown in column execution clock time. The column execution work shows the time taken to execute the testsuite multiplied by the number of concurrent execution threads. It estimates the time it would have taken to execute the testsuite using one execution thread. The collected artifacts for each testcase are: 1) GUI state 2) per-event code coverage data and 3) execution logs. The size of the collected artifacts are shown in the columns GUI state size, coverage size and execution log size respectively. The column code coverage shows the percentage of the application’s source code exercised by the testsuite. The per-event code coverage data is used to determine ECI relations in the application. The recorded GUI state is used as the baselineSignature (see Section 3.4.3) to determine if ECIG-based testcases drive the GUI into new states. This section describes the Standard Workflow. This workflow creates and executes EIG-based SequenceLength-2 testsuites (seed testsuites), on the applications under test. Artifacts from the execution of the seed testsuite are used in the ECIG extension, described in the next section. 5.3 ECIG extension This section describes the ECIG Extension workflow outlined in Section 4.2.2. This workflow was used to 1) identify ECI relations in the applications under test 2) create the ECIG using the ECI relations 3) generate long, ECIG-based testcases 4) execute ECIG-based testcases and 5) ver- ify that the testcases visited new GUI states in the applications. Coverage Analysis: Coverage data obtained from executing the EIG-based seed testcases, from the Standard Workflow, were analyzed using the procedure ECI described in Section 4.2.2. The coverage data was first normalized, that is, all per-line positive coverage hit counts were reduced 77 Application hit count ECI analysis # ECI # ECI normalization time vertices edges AU 32 hours 9 hours 22 min 205 799 BD 10 hours 38 min 1 hour 49 min 45 157 JA 25 hours 57 min 1 hour 31 min 168 355 JE 90 hours 10 min 14 hours 58 min 224 517 Table 5.8: Metrics for obtaining the ECI event pairs from coverage data, obtained by executing EIG-based seed testsuite. to 1. This is in accordances with the Approach L1 (unique-lines-of-code). The procedure ECI, from Figure 4.2, then produced a list of ECI event pairs from the coverage data. Table 5.8 shows metrics for producing the list of ECI event pairs, for all the applications. In this table, hit count normalization shows the time taken to normalize the hit counts, in the coverage data, to 1 and ECI analysis time shows the time taken to execute procedure ECI on the coverage data. The column # ECI vertices shows the number of events that participated in the ECI relations. The column # ECI edges shows the number of ECI relations that were discovered, by the proce- dure ECI, in the coverage data. The number of ECI edges that were found for ArgoUML, Buddi, JabRef, JEdit are 799, 157, 355 and 517 respectively. RQ1: Do ECI relations exist in a GUI-based application? A: Column #ECI edges from Table 5.8 shows that ECI relations exist in a GUI-based applica- tion. Each ECI pair, represented in this column, denoted as e1 −→ e2, indicates that execution of event e1 before e2 influences subsequent execution of event e2 at the program code level. Identifi- cation of these ECI relations in GUI applications is the crux of this study.  ECIG Graph Converter: The Graph Converter with ECIG plugin takes as input the EIG from the Standard Workflow and produces an ECIG. Edges included in the ECIG were determined by the 78 n −→ 3 4 5 6 7 8 9 10 11 AU 319 797 1, 074 1, 801 2, 225 2, 247 2, 546 2, 522 2, 789 BD 65 183 368 711 1, 230 2, 014 2, 966 4, 347 5, 799 JA 125 242 421 698 1, 123 1, 694 2, 338 2, 475 2, 298 JE 234 442 717 1209 1, 676 2, 302 3, 250 3, 990 4, 264 n −→ 12 13 14 15 16 17 18 19 20 Total AU 2, 660 2, 585 2, 089 1, 601 2, 850 - - - - 28, 105 BD 5, 753 5, 466 3, 086 - - - - - - 31, 988 JA 1, 604 1, 196 759 478 326 - - - - 15, 777 JE 3, 530 2, 667 1, 764 1, 264 800 436 184 51 7 28, 787 Table 5.9: Count of testcases for ECIG-based SequenceLength-n testcase, where n ∈ {3, ..., 20}. Total testcases for each application is shown in Total column. procedure ECI in the previous step. The number of vertices and edges in the ECIG are shown in #ECI vertices and #ECI edges in Table 5.8. In Figures 5.1 and 5.2, the right column shows a visual representation of the ECIG for the applications. Comparing the columns #ECI edges (Table 5.8) and # edges (Table 5.5), it can be seen that the ECIG is sparse, compared to the EIG. As a result, the number of testcases produced from the ECIG is fewer than that from the ECIG. This results in a smaller testsuite that can be executed within a reasonable time. Testcase Generator: The ECIG can now be used to generate ECIG-based testcases. These testcase, like EIG-based testcases, are abstract in nature. Successive events in these testcase might not be directly reachable from the preceding event. Hence, intermediate events may need to be inserted, to convert the abstract testcases into executable testcases. Table 5.9 shows the number of SequenceLength-n ECIG-based testcases of different lengths, from 3 to 20. In this table, a column shows the number of testcases of a given length for dif- ferent applications. It can be seen that the ECIG generates testcases with as many as 20 events. The maximum length of testcases generated using the ECIG is a property of the application under 79 test. Comparing the number of testcases of a given length, in this table, with EIG-based testcases in Table 5.6, shows that the number of ECIG-based testcases is drastically less. For example, for JEdit, at length 5, there are ‘5, 887, 308, 434, 240’ EIG-based testcases, whereas there are 717 ECIG-based testcases. Similarly, comparing Table 5.9 with EFG-based testcases in Table 5.4, the ECIG-based testsuite at a given length is smaller. The reduced size of the ECIG-based testsuite makes is possible to execute it within a reasonable time. RQ2: Can ECI relations be used to generate long testcases for a GUI-based application? A: From Table 5.9, it can be seen that using the ECIG, it is possible to generate SequenceLength- n testcase where n is as large as 20. These testcases are longer than SequenceLength-2 EFG or EIG-based testcases. The size of the testsuite is such that it can be executed in a reasonable time.  Replayer: The ECIG-based testcases, created in the previous step, were executed on the applica- tions. Table 5.10 shows the results observed after executing the ECIG-based testcases. Since the number of testcases, of a specific length, were reasonably small, all the testcases for an application were executed together as a single testsuite. In this table #testcases corresponds to the Total column in Table 5.9. Executing the testsuites took between 1 and 12 days for each application. The wall-clock time for execution is shown in the column execution clock time. The number of concurrent execution of testcases varied between the application. The column execution work shows the clock time multiplied by the number of concurrent executions. It estimates the total time that would be consumed if the testcases were executed without any concurrency. The GUI state of the application was collected after executing each event of every testcase. The size of the collected state is shown in GUI state size. Code coverage data and execution logs were also collected. Their size is shown in coverage size and execution log size. Since these testcase are long testcases, they are expected to exercise the application in ways that is not possible with n = 2, EFG or EIG-based testcases. 80 App # testcases execution execution GUI state coverage size execution log code clock time work size (GB) (compressed GB) size (GB) coverage AU 28, 105 9 days 5 hours 267 days 45 154 0.86 27.52% BD 31, 988 12 days 588 days 131 431 0.79 6.28% JA 15, 777 1 day 18 hours 115 days 49 78 0.37 21.33% JE 28, 787 4 days 2 hours 269 days 585 151 0.82 24.36% Table 5.10: ECIG-based SequenceLength-n execution results, where n ∈ {3, ..., 20}. Column code coverage in Table 5.10 shows the code covered by all testcases for an application. The code covered by the ECIG-based testsuite is less than the code covered by its seed testsuite. For example, for JEdit, the code coverage for ECIG-based testsuite is 24.36%, whereas the code covered by its seed testsuite is 25.70% (Table 5.7). This is expected, since the ECIG is a sub- set of the EIG. In addition, analysis of the code coverage from the ECIG-based testsuite showed that few new lines of code were executed, compared to the seed EIG-based testsuite. It indicates that executing the same events in different contexts does not necessarily result in increased code coverage. Intuitively, event handlers interact with other event handlers at the program-code and program-state level. Executing them in different contexts is likely to exercise their interaction with each other. Hence, code coverage is not sufficient to quantify the coverage delivered by a testsuite. GUI state analysis: The GUI state of the application was extracted and stored after executing each event of every testcase, from ECIG-based testsuite. These GUI states were analysed to detect if new GUI states of the application was exercised. Contribution 4: GUI state coverage is an alternate metric to measure the effectiveness of a test- suite. Long GUI testcases may not result in new code being exercised. However, they may result in the code being exercised in new ways. This may manifest itself as new GUI states being reached by the application. Measuring the number of GUI states reached (or created) by a testsuite may be used as a metric for measuring the effectiveness of a testsuite. 81 The GUI state recorded while executing the EIG-based seed SequenceLength-2 was com- pared with the ECIG-based long testcases. Each application was processed as follows. First, the procedure sigUpdate was executed on the GUI state of the seed testsuite to produce the baselineSignature as described in Section 3.4.3. Second, sigUpdate was executed on the GUI state collected from the ECIG-based testsuite. The resulting GUI state signatures were stored in targetSignature. Third, the procedure newState was executed using both the GUI state signatures. This produced a list of ECIG-based testcases that exercised new GUI states in the application. Table 5.11 shows the time taken to execute the procedure sigUpdate on the baseline EIG-based testsuites, the target ECIG-based testsuites. It also shows the time taken to compare the two sets of signatures using the procedure newState. The state of an application was recorded after each event of every testcase was executed. For a testcase with an executable event sequence of length n, n states were recorded. For a testsuite, whose testcases together contain m events, m states were recorded. Since different testcases of a testsuite may cause the application to reach the same state, the number of unique states reached by an application may be less than m, for a given testsuite. Table 5.12 shows information about GUI states of the applications that were reached, by the seed EIG-based and ECIG-based testsuites. Column (A) records the total number of states that were reached when the EIG-based testsuite was executed. This counts a state after the execution of each event or every testcase. For example, for JEdit, the testsuite caused the execution of 589, 632 successful events, recording as many states. Column (B) shows the number of states that were unique. It is possible that events from different testcase caused the application to reach the same state. For example, for JEdit, of all the states reached only 2, 708 states were unique. This shows that different testcases of the testsuite with different event sequences may cause the application to move within a small set of states. The columns (C) and (D) shows the states that were reached by the ECIG-based testsuite, and the unique states within them. For example, for JEdit, 543, 548 events were successfully exe- 82 EIG ECIG EIG vs ECIG App sigUpdate time sigUpdate time newState time AU 5 hours 17 min 13 hours 42 min 14 min 50 sec BD 1 hour 40 min 37 hours 33 min 12 hours 29 min JA 5 hours 16 min 11 hours 52 min 15 min 21 sec JE 24 hours 45 min 18 hours 42 min 5 hours 18 min Table 5.11: Time taken to compute GUI state signatures, using procedure sigUpdate, for the baseline EIG-based baselineSignature, ECIG-based targetSignature and identifying new ECIG states using procedure newState. . EIG ECIG EIG vs ECIG App total states unique states total states unique states testcases new states unique new states (A) (B) (C) (D) (E) (F) (G) AU 157, 291 4, 734 461, 621 6, 916 24, 133 247, 777 6, 461 BD 38, 951 1, 160 943, 935 8, 611 31, 827 637, 291 8, 378 JA 196, 469 10, 037 362, 511 6, 920 15, 509 268, 512 6, 578 JE 589, 632 2, 708 543.548 2, 038 26, 414 261, 245 1, 744 Table 5.12: States detected during execution of EIG-based seed testsuite and ECIG-based target testsuites. Number of ECIG-based testcases detecting a new state is shown in testcases (E). Num- ber of unique new states detected by the ECIG-based testsuite is shown in unique new states (G). cuted, collecting as many states. Of these, 2, 038 unique states were reached. Column (E) shows the number of ECIG-based testcases that found one or more new states during its execution. For example, for JEdit 26, 414 testcases, out of 28, 787 ECIG-based testcases (see Table 5.9), found one or more new states. Observation 1: Although the ECIG for JEdit has only 517 edges, compared to 133, 734 for the EIG, the number of unique states reached by the ECIG-based testsuite (column D) is compa- rable to that reached by the EIG-based testsuite (column B). This is true for all the applications. Column (F) shows the number of states that are new in the ECIG-based testsuite. These new states were reached by the ECIG-based testsuites, but not by the EIG-based testsuites. For example, 83 App (H) (I) AU 93.5% 6.5% BD 97.2% 2.8% JA 95.1% 4.9% JE 85.6% 14.4% Table 5.13: Column (H) shows the percentage of states reached by ECIG-based testsuite that are new. Column (I) shows the percentage of states reached by ECIG-based testsuite that were already reached by the baseline EIG-based testsuite. for JEdit, 261, 245 new states were discovered during execution of the ECIG-based testsuite. Column (G) show that, of these, 1, 744 unique states were discovered. By comparing columns (D) and (G), it can be seen that the number of unique new states com- prise a significant part of the number of unique states, reached by the ECIG-based testsuite. This indicates that most states reached by the testsuite were new, compared to the baseline testsuite. To quantify this, Table 5.13 shows statistics derived from Table 5.12. H = 100 ∗G/D (5.1) I = 100 ∗ (D −G)/D (5.2) Column (H) shows the percentage of states reached by the ECIG testsuite that are new – that is, they were not reached by the baseline testsuite. Column (I) shows the percentage of states reached by the ECIG testsuite, that were already reached by the baseline EIG-based testsuite. For all the applications column (H) shows that a large percentage of states reached by the ECIG testsuite are new states. Observation 2: Although the ECIG contained fewer edges, it was able to drive the application to reach a large number of new states and unique new states. 84 01,000 2,000 3,000 4,000 5,000 6,000 2 4 6 8 10 12 14 t e s t c a s e c o u n t event index in testcase ArgoUML 0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 2 4 6 8 10 12 14 t e s t c a s e c o u n t event index in testcase Buddi 0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 2 4 6 8 10 12 14 t e s t c a s e c o u n t event index in testcase JabRef 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 2 4 6 8 10 12 14 t e s t c a s e c o u n t event index in testcase JEdit Figure 5.3: Count of testcases where the first new state was found after executing a specific number of events in the testcase. RQ3: Can ECIG-based testcases exercise the GUI of a GUI-based application in useful ways? A: In Table 5.12, column (G) shows the number of states that the application was able to reach, with ECIG-based testsuite, but not with EIG-based testsuite. Presence of the new states indicate that the application was exercised in new ways. The ECIG-based testsuite is able to test the appli- cation in ways that was not possible with EIG-based testcases. In addition, each unique new state could be reached using multiple testcases.  85 When an ECIG-based testcase is executed, it may find a new state in the application. The new state may be reached by any of the events in the testcase. It would be informative to know at what stage of the execution of the testcase is the first new state typically found. Figure 5.3 shows a count of testcases where the first new state was found after executing a specific number of events from the testcase. A plot is shown for each application. In each plot, the X-axis is the event index in the testcase and the Y-axis is the count of testcases, where the first new state was found at a specific event index. From the figure, it can be seen that the first new state is discovered after executing only a few events. For example, for JEdit, 2.338 testcases found the first new state after executing 3 events; 8 testcases found the first new state after executing 14 events. This shows that some new states are found after executing as many as 14 events. Hence, ECIG-based testcases of different lengths are all useful in finding new states. RQ4: Is ECIG-based testcases good for detecting software defects? A: A GUI state reached during the execution of events in a testcase represents the cumulative inter- action of program-code until that event. Each GUI state intuitively represents one or more chains of interaction, that leads to that state. Software defects embedded in a chain of interaction may be exercised when that state is reached by the GUI. ECIG-based testcases are shown to discover new states. A newly discovered GUI state represents one or more new chains of interaction. These new interactions may contain defects that are not exercised by the baseline EIG-based testcases. When an ECIG-based testcase discovers a new state, it indicates that a new chain of interaction was ex- ercised. Hence new states discovered by ECIG-based testcases may be deemed to correspond to software defects embedded in one or more new chains of interaction. Since ECIG-based testcases are shown to discover new GUI states (Table 5.12), they may be deemed good for exercising new software defects. 86 The empirical evaluation demonstrates the existence of ECI relations in GUI-based applica- tions. It demonstrates a practical method to identify ECI relations in an application, model it as an Event-Code Interaction Graph and generate long testcases based on the ECIG. These long test- cases are shown to exercise and test the application in ways that was not possible using existing GUI models. 5.4 Discussion Existing model-based GUI testing methods examine the characteristics of the visible state of the GUI. These characteristics have been modelled as the EFG, EIG, ESIG and state machines. A significant contribution of this study is that it connects the characteristics of the visible GUI of a GUI-based application to its program-code. The ECIG models the new relationship. Long GUI testcases generated from the ECIG model are shown to exercise the GUI in new ways. This study has used the EIG as a model for creating the baseline testsuite. This model and its testsuites were used to detect ECI relations in an application. It is possible to use another suitable model to generate the baseline testsuite. For example, the EFG, ESIG or a state machine model may be appropriately used. The baseline testsuite from these models may help identify ECI relations that are not identified using the EIG as the baseline model. The discovery of the new ECI relations are caused by the difference between the EIG and the other baseline models. It is not owing to any inherent characteristic of the ECI concept. The ECI model is therefore sensitive to the choice of a baseline model. The ECI relations determined from the baseline SequenceLength-2 testsuites in this study were based on Approach L1 (unique-lines-of-code) method of code coverage (see Section 3.2). Using a different method to characterize code coverage may alter the nature of ECI relations. This can result in the identification of a different set of event-pairs as ECI event pairs. Hence, the ECI relations are sensitive to the choice of a code coverage metric. 87 Chapter 6 Future work Event-Code Interaction is the first study that examines the relationships between GUI events and program-code that respond to GUI events. It leverages these relationships to generate long GUI testcases that are able to drive the GUI into new states. This study extends existing research on GUI models such as Event-Flow Graph, Event-Interaction Graph and Event-Semantic Interaction Graph, which work with the visible, run-time GUI state of an application. This study demonstrates the following: 1. That there exists program-code level interactions, defined as Event-Code Interaction (ECI), between GUI event handlers. 2. That these interactions can be systematically identified, using practical methods. 3. That these interactions can be modelled as an Event-Code Interaction Graph (ECIG). 4. That the ECIG can be used to generate long GUI testcases, contained in a small testsuite. 5. That the ECIG-based testcases drive the GUI to reach new GUI states that can detect defects in the application. In addition, this study develops a simple and fast method to uniquely identify GUI states. This method, defined as Hierarchical signatures, can be used to enumerate GUI states, as well as com- pare two sets of GUI states (see Section 3.4.2). The design, development and execution of this study leaves certain questions unanswered and opens the way for further investigation. 88 Q1: Do GUI applications and their program-code interact in deeper ways? Composite ECI: The ECI relation examined in this study attempts to find program-code level interaction between two GUI events. In this interaction, an event e1 is said to interact with another event e2 at the program-code level, if the execution of e1 before e2 influences the execution of e2. The ECI concept may be extended to 3 or more events. Consider 3 events, e1, e2 and e3, where there is no ECI relation between e1 and e3 or e2 and e3. However, the combined execution of e1 and e2 may affect the subsequent execution of e3. This is represented as {e1, e2} −→ e3 and is called Composite Event-Code Interaction. Section 3.3.3 describes composite ECI with an example. A study of composite ECI relations could give a better understanding of GUI applications and event- driven systems. Alternate seeds: ECI relations and the ECIG are created from a seed testsuite for an application. In this study, EIG-based SequenceLength-2 testsuites were used as seed testsuites. It is possible to use seed testsuites generated using different methods. For example, an EFG-based or ESIG- based testsuite may be used. The EFG, EIG and ESIG posses different properties. Seed testsuites generated from these models may yield different ECI relations. A study of EFG or ESIG-based ECI relations could provide alternative methods for creating and using an ECIG. Alternate coverage: ECI relations were identified in this study by analyzing the code coverage of event handlers in response to GUI events. The code coverage method that was adopted is Approach (L1) unique-lines-of-code (see Section 3.2). In this method, the execution status of a line of program-code was considered to be a Boolean value, TRUE or FALSE. Multiple executions of the same line of code was considered as TRUE. Using a different method for quantifying code coverage, and hence code coverage differences, is likely to yield deeper information about interactions between the GUI and its event handlers. Alternative code coverage methods have been proposed in Section 3.2, where Approach L2 (line- hit-count) distinguishes between the number of times a line was executed and Approach L3 89 (line-sequence) records the sequence of execution of each line. A different method for quantify- ing code coverage could yield different ECI relations. Q2: Does the ECI concept developed in this study, apply to other event-driven systems? The ECI concept developed in this study uses Java-based GUI applications, as examples of event-driven system, for evaluation. The visible run-time state of applications were used to de- termine if a combined execution of events affect the execution of an event. This concept may be generalized by: 1) expanding the nature of event-driven system and 2) considering different observable metrics of a system to detect changes in expected state. A few such systems are: Internet-of-things: Connected devices, for example in the domain of internet-of-things, interact with each other using a variety of signals, messages and data. These inputs are likely to invoke the execution of program-code within a connected device. A specific combination of inputs from other devices may cause a device to behave differently. The observed behaviour may be in terms of the messages it sends to other devices. Examination of such behaviour would help in the development of better hardware, software and verification methods for such connected devices. non-GUI systems: Event-driven systems - such as computer hardware components, mobile de- vice components, Java enabled devices1 - that are event-driven may also exhibit ECI behaviour. Applying the ECI concept to such systems may help in developing better testing strategies. Security: Security of computer software and the hardware they control is an integral compo- nent of most software systems. The response of software to unexpected or crafted inputs is of particular interest in the security domain. Examination of ECI relations may help understand the security characteristics of a software and develop better testing strategies. The above questions are a guidance for further investigation, based on the Event-Code Interac- tion concept, developed in this study. 1http://www.java.com/en/about/ 90 Appendix A Radio Button Demo 91 A B C D E F G L Radio Button Demo - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - X X X - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - X - - - - - - - - - - - - - X - - - - - - - - - - - - - X - X - X - - X - X - X - - - - - - - - - - - - - - - - - - - - - - - X - - - - - - X - - - - - - - - - - - - - 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 import java.awt.BorderLayout; import java.awt.Color; import java.awt.Graphics; import java.awt.event.ActionEvent; import java.awt.event.ActionListener; . import javax.swing.BorderFactory; import javax.swing.Box; import javax.swing.BoxLayout; import javax.swing.ButtonGroup; import javax.swing.JButton; import javax.swing.JCheckBox; import javax.swing.JFrame; import javax.swing.JOptionPane; import javax.swing.JPanel; import javax.swing.JRadioButton; . public class RadioButtonDemo extends JFrame { private static final long serialVersionUID = 1L; . JButton w0; // Exit JRadioButton w1; // Circle JRadioButton w2; // Square . JButton w3; // Create JPanel w4; // Rendered shape JButton w5; // Reset . JCheckBox w6; // Log exit time . JPanel contentPane; . int exitCircle = 0, exitSquare = 0; Shape currentShape; Boolean created = false; . enum Shape { CIRCLE, SQUARE } . public RadioButtonDemo() { super("Radio Button Demo"); contentPane = new JPanel(new BorderLayout()); . 92 A B C D E F G L Radio Button Demo X - - - - - - X - - - - - - X - - - - - - - - - - - - - X - - - - - - X - - - - - - X - - - - - - - - - - - - - X - - - - - - X - - - - - - X - - - - - - - - - - - - - X - - - - - - X - - - - - - X - - - - - - - - - - - - - X - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - X - - - - - - - - - - - - - X - - - - - - X - - - - - - - - - - - - - X - - - - - - X - - - - - - X - - - - - - - - - - - - - X - - - - - - X - - - - - - - - - - - - - X - - - - - - X - - - - - - X - - - - - - X - - - - - - - - - - - - - X - - - - - - - - - - - - - X - - - - - - X - - - - - - - - - - - - - X - - - - - - 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 w1 = new JRadioButton("Circle"); w1.setSelected(false); w1.addActionListener(new W1Listener()); . w2 = new JRadioButton("Square"); w2.setSelected(false); w2.addActionListener(new W2Listener()); . ButtonGroup selectShapeGroup = new ButtonGroup(); selectShapeGroup.add(w1); selectShapeGroup.add(w2); . Box selectShapePanel = new Box(BoxLayout.Y_AXIS); selectShapePanel.add(w1); selectShapePanel.add(w2); . selectShapePanel.setBorder( BorderFactory.createTitledBorder( BorderFactory.createLineBorder(Color.GRAY), "Select")); . draw(new EmptyPanel()); . w3 = new JButton("Create"); w3.addActionListener(new W3Listener()); . w5 = new JButton("Reset"); w5.setEnabled(false); w5.addActionListener(new W5Listener()); . w0 = new JButton("Exit"); w0.addActionListener(new W0Listener()); . JPanel buttonPanel = new JPanel(); buttonPanel.add(w3); buttonPanel.add(w5); buttonPanel.add(w0); . contentPane.add(selectShapePanel, BorderLayout.WEST); contentPane.add(w4, BorderLayout.CENTER); contentPane.add(buttonPanel, BorderLayout.SOUTH); . w6 = new JCheckBox("Log exit time."); 93 A B C D E F G L Radio Button Demo - - - - - - - X - - - - - - X - - - - - - X - - - - - - - - - - - - - - - - - - - - X X X - - - - X X X - - - - - - - - - - - X X X - - - - X X X - - - - - - - - - - - - - - - - - - X X X - - - - X X X - - - - X X X - - - - X X X - - - - - - - - - - - X - - - - - - - - - - - - - - - - - - - - - - - X X - - - - - X X - - - - - - - - - - - - - - - - - - - X X - - - - - - - - - - - - - - - - X - - - - - - - - - - - - - - - - - - - - - - - - - - - X - - - - - - X - - - - - - - - - - - - - - - - - - - - X - - - - - - - - - - - - - - - - - - - - X - - - - - - - - - - - - - - - - - - - - - - - - - - - X X X - - - - 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 . setContentPane(contentPane); setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); } . private void draw(JPanel shape) { if (w4 != null) { contentPane.remove(w4); } w4 = shape; w4.setBorder(BorderFactory.createTitledBorder( BorderFactory.createLineBorder(Color.GRAY), "Rendered Shape")); contentPane.add(w4); repaint(); pack(); } . class W1Listener implements ActionListener { @Override public void actionPerformed(ActionEvent arg0) { currentShape = Shape.CIRCLE; if (created) { draw(new CirclePanel()); } } } . class W2Listener implements ActionListener { . @Override public void actionPerformed(ActionEvent arg0) { currentShape = Shape.SQUARE; if (created) { draw(new SquarePanel()); } } } . class W3Listener implements ActionListener { @Override public void actionPerformed(ActionEvent e) { JPanel shape; if (currentShape == Shape.CIRCLE || 94 A B C D E F G L Radio Button Demo - - - - - - - X X - - - - - - - - - - - - X X X - - - - X X X - - - - - X - - - - - X - X - - - - X - - - - - - - - - - - - - - - X - - - - X X X - - - - X X X - - - - - - - - - - - - - - - - - - X - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - X - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - X X - - - - - - - - - - - - X X - - - - - - - - - - - - X - - - - - - - - - - - - - X X - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - X X - - - - - - - - - - - - X X - - - - - - - - - - - - - - - - - - - X X 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 currentShape == Shape.SQUARE) { w5.setEnabled(true); } created = true; if (currentShape == Shape.CIRCLE) shape = new CirclePanel(); else if (currentShape == Shape.SQUARE) shape = new SquarePanel(); else shape = new EmptyPanel(); draw(shape); } } . class W5Listener implements ActionListener { @Override public void actionPerformed(ActionEvent e) { w5.setEnabled(false); created = false; draw(new EmptyPanel()); } } . class W0Listener implements ActionListener { @Override public void actionPerformed(ActionEvent e) { . String message = "Are you sure?"; . if (created == true && currentShape == Shape.CIRCLE) { exitCircle++; } if (created == true && currentShape == Shape.SQUARE) { exitSquare++; } . Object[] params = { message, w6 }; . if (exitCircle == 10 || exitSquare == 10) { System.exit(0); } int exit = 95 A B C D E F G L Radio Button Demo - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - X - - - - - - X - - - - - - X - - - - - - - - - - - - - - - - - - - - - - - - - - - X - - - - - - X - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - X - - - - - - X - - - - - - X - - - - - - - - - - - - - - - - - - - - - - - - - - - X - - - - - - X - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 JOptionPane.showConfirmDialog(null, params, "Exit Confirmation", JOptionPane.YES_NO_OPTION); if (exit == 0) { if (w6.isSelected()) { writeTimeStamp(); } System.exit(0); } } } . private void writeTimeStamp() { } . class CirclePanel extends JPanel { private static final long serialVersionUID = 1L; . public CirclePanel() { super(); } . @Override public void paintComponent(Graphics g) { g.drawOval(60, 30, 45, 45); } } . class SquarePanel extends JPanel { private static final long serialVersionUID = 1L; . public SquarePanel() { super(); } . @Override public void paintComponent(Graphics g) { g.drawRect(60, 30, 45, 45); } } . class EmptyPanel extends JPanel { private static final long serialVersionUID = 1L; . 96 A B C D E F G L Radio Button Demo X - X - - - - X - X - - - - X - X - - - - - - - - - - - - - - - - - - - - - - - - - X - - - - - - X - - - - - - X - - - - - - X - - - - - - - - - - - - - - - - - - - - X - - - - - - X - - - - - - - - - - - - - X - - - - - - X - - - - - - - - - - - - - X - - - - - - - - - - - - - 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 public EmptyPanel() { super(); } } . private static void createAndShowGUI() { RadioButtonDemo frame = new RadioButtonDemo(); frame.setVisible(true); frame.pack(); } . public static void main(String[] args) { javax.swing.SwingUtilities.invokeLater( new Runnable() { public void run() { createAndShowGUI(); } }); } } 97 Bibliography [1] AMALFITANO, D., FASOLINO, A. R., AND TRAMONTANA, P. Rich Internet Application Testing Using Execu- tion Trace Data. In Conference on Software Testing, Verification, and Validation Workshops (2010), pp. 274–283. [2] ARISS, O. E., XU, D., DANDEY, S., VENDER, B., MCCLEAN, P., AND SLATOR, B. A Systematic Capture and Replay Strategy for Testing Complex GUI Based Java Applications. In Conference on Information Technology (2010), pp. 1038–1043. [3] ARLT, S., BANERJEE, I., BERTOLINI, C., MEMON, A. M., AND SCHAF, M. Grey-box GUI Testing: Efficient Generation of Event Sequences. CoRR abs/1205.4928 (2012). [4] ARLT, S., PODELSKI, A., BERTOLINI, C., SCHAF, M., BANERJEE, I., AND MEMON, A. Lightweight Static Analysis for GUI Testing. In ISSRE’12 Proceedings of the 23rd IEEE International Symposium on Software Reliability Engineering (Washington, DC, USA, 2012), IEEE Computer Society. [5] BANERJEE, I., NGUYEN, B., GAROUSI, V., AND MEMON, A. Graphical User Interface (GUI) Testing: Sys- tematic Mapping and Repository. Information and Software Technology (2013). [6] BELLI, F. Finite-State Testing and Analysis of Graphical User Interfaces. In Symposium on Software Reliability Engineering (2001), p. 34. [7] BERTOLINI, C., AND MOTA, A. A Framework for GUI Testing based on Use Case Design. In Conference on Software Testing, Verification, and Validation Workshops (2010), pp. 252–259. [8] BERTOLINI, C., PERES, G., AMORIM, M., AND MOTA, A. An Empirical Evaluation of Automated Black-Box Testing Techniques for Crashing GUIs. In Software Testing Verification and Validation (2009), pp. 21–30. [9] CAI, K.-Y., ZHAO, L., AND WANG, F. A Dynamic Partitioning Approach for GUI Testing. In Computer Software and Applications (2006), pp. 223–228. [10] CHANG, T.-H., YEH, T., AND MILLER, R. C. GUI Testing Using Computer Vision. In Conference on Human factors in computing systems (2010), pp. 1535–1544. [11] CHEN, J., AND SUBRAMANIAM, S. Specification-based Testing for GUI-based Applications. Software Quality Journal 10, 2 (2002), 205–224. 98 [12] CHEN, W.-K., TSAI, T.-H., AND CHAO, H.-H. Integration of Specification-Based and CR-Based Approaches for GUI Testing. In Conference on Advanced Information Networking and Applications (2005), pp. 967–972. [13] DABOCZI, T., KOLLAR, I., SIMON, G., AND MEGYERI, T. Automatic Testing of Graphical User Interfaces. In Instrumentation and Measurement Technology Conference (2003), pp. 441–445. [14] DEREZINSKA, A., AND MALEK, T. Experiences in Testing Automation of a Family of Functional-and GUI- similar Programs. Journal of Computer Science & Applications 4, 1 (2007), 13–26. [15] ESMELIOGLU, S., AND APFELBAUM, L. Automated test generation, execution, and reporting. In In Proceed- ings of Pacific Northwest Software Quality Conference (Oct 1997), IEEE Press. [16] GANOV, S., KILLMAR, C., KHURSHID, S., AND PERRY, D. E. Event Listener Analysis and Symbolic Execu- tion for Testing GUI Applications. Formal methods and software engineering 5885, 1 (2009), 69–87. [17] GARG, A., VIDYARAMAN, S., UPADHYAYA, S., AND KWIAT, K. USim: a User Behavior Simulation Frame- work for Training and Testing IDSes in GUI Based Systems. In Symposium on Simulation (2006), pp. 196–203. [18] GRECHANIK, M., XIE, Q., AND FU, C. Creating GUI Testing Tools Using Accessibility Technologies. In Conference on Software Testing, Verification, and Validation (2009), pp. 243–250. [19] HELLMANN, T. D., HOSSEINI-KHAYAT, A., AND MAURER, F. Supporting Test-Driven Development of Graphical User Interfaces Using Agile Interaction Design. In Proceedings of the 2010 Third International Conference on Software Testing, Verification, and Validation Workshops (2010), pp. 444–447. [20] HICINBOTHOM, J. H., AND ZACHARY, W. W. A Tool for Automatically Generating Transcripts of Human- Computer Interaction, vol. 2. 1993, p. 1042. [21] HU, C., AND NEAMTIU, I. A GUI bug finding framework for Android applications. In Proceedings of the 2011 ACM Symposium on Applied Computing (2011), SAC ’11, ACM, pp. 1490–1491. [22] HU, C., AND NEAMTIU, I. Automating GUI testing for Android applications. In Proceedings of the 6th International Workshop on Automation of Software Test (2011), pp. 77–83. [23] JAASKELAINEN, A., KATARA, M., KERVINEN, A., MAUNUMAA, M., PAAKKONEN, T., TAKALA, T., AND VIRTANEN, H. Automatic GUI test generation for smartphone applications - an evaluation. In Conference on Software Engineering (2009), pp. 112–122. [24] KASIK, D. J., AND GEORGE, H. G. Toward automatic generation of novice user test scripts. In Proceedings of the SIGCHI conference on Human factors in computing systems (1996), ACM, pp. 244–251. 99 [25] MEMON, A., BANERJEE, I., HASHMI, N., AND NAGARAJAN, A. DART: A Framework for Regression Test- ing. In Conference on Software Maintenance (2003), pp. 410–419. [26] MEMON, A., BANERJEE, I., AND NAGARAJAN, A. GUI ripping: Reverse engineering of graphical user interfaces for testing. In Conference on Reverse Engineering (2003), pp. 260–269. [27] MEMON, A., BANERJEE, I., NGUYEN, B., AND ROBBINS, B. The First Decade of GUI Ripping: Extensions, Applications, and Broader Impacts. In Proceedings of the 20th Working Conference on Reverse Engineering (WCRE) (2013), IEEE Press. [28] MEMON, A. M. Developing testing techniques for event-driven pervasive computing applications. In Proceed- ings of The OOPSLA 2004 workshop on Building Software for Pervasive Computing (BSPC 2004) (2004). [29] MEMON, A. M. Employing User Profiles to Test a New Version of a GUI Component in its Context of Use. Software Quality Control 14, 4 (2006), 359–377. [30] MEMON, A. M. An Event-flow Model of GUI-based Applications for Testing. Software Testing, Verification & Reliability 17, 3 (2007), 137–157. [31] MEMON, A. M., BANERJEE, I., AND NAGARAJAN, A. Automatically testing “nightly/daily builds” of GUI applications. In Proceedings of the 2003 International Conference on Dependable Systems and Networks (June 2003). [32] MEMON, A. M., BANERJEE, I., AND NAGARAJAN, A. What Test Oracle Should I Use for Effective GUI Testing? In Conference on Automated Software Engineering (2003), pp. 164–173. [33] MEMON, A. M., POLLACK, M. E., AND SOFFA, M. L. Hierarchical GUI Test Case Generation Using Auto- mated Planning. IEEE Transactions on Software Engineering 27, 2 (2001), 144–155. [34] MEMON, A. M., AND XIE, Q. Studying the Fault-Detection Effectiveness of GUI Test Cases for Rapidly Evolving Software. IEEE Trans. Softw. Eng. (2005), 884–896. [35] NGUYEN, B. N., ROBBINS, B., BANERJEE, I., AND MEMON, A. GUITAR: An innovative tool for automated testing of GUI-driven software. Automated Software Engineering (2013), 1–41. [36] OSTRAND, T., ANODIDE, A., FOSTER, H., AND GORADIA, T. A visual test development environment for GUI systems. ACM SIGSOFT Software Engineering Notes 23, 2 (1998), 82–92. [37] PAIVAA, A. C., FARIAA, J. C., AND VIDAL, R. F. Towards the Integration of Visual and Formal Models for GUI Testing. Electronic Notes in Theoretical Computer Science 190, 2 (2007), 99–111. 100 [38] RISOLDI, M., AND BUCHS, D. A domain specific language and methodology for control systems GUI specifica- tion, verification and prototyping. 2007 IEEE Symposium on Visual Languages and Human-Centric Computing. IEEE, 2007, pp. 179–182. [39] SHEHADY, R. K., AND SIEWIOREK, D. P. A Method to Automate User Interface Testing Using Variable Finite State Machines. In Symposium on Fault-Tolerant Computing (1997), p. 80. [40] SUN, Y., AND JONES, E. L. Specification-driven Automated Testing of GUI-based Java Programs. In ACM Southeast Regional Conference (2004), pp. 140–145. [41] TAKALA, T., KATARA, M., AND HARTY, J. Experiences of system-level model-based GUI testing of an Android application. In Software Testing, Verification and Validation (ICST), 2011 IEEE Fourth International Conference on (2011), IEEE, pp. 377–386. [42] WHITE, L., AND ALMEZEN, H. Generating Test Cases for GUI Responsibilities Using Complete Interaction Sequences. In Symposium on Software Reliability Engineering (2000), p. 110. [43] XIE, Q., AND MEMON, A. M. Designing and Comparing Automated Test Oracles for GUI-based Software Applications. ACM Transactions on Software Engineering and Methodology 16, 1 (2007), 1–36. [44] XIE, Q., AND MEMON, A. M. Using a Pilot Study to Derive a GUI Model for Automated Testing. ACM Transactions on Software Engineering and Methodology 18, 2 (2008), 1–33. [45] YUAN, X., COHEN, M. B., AND MEMON, A. M. Towards Dynamic Adaptive Automated Test Generation for Graphical User Interfaces. In Conference on Software Testing, Verification, and Validation Workshops (2009), pp. 263–266. [46] YUAN, X., COHEN, M. B., AND MEMON, A. M. GUI Interaction Testing: Incorporating Event Context. In IEEE Transactions on Software Engineering (2010). [47] YUAN, X., AND MEMON, A. M. Generating Event Sequence-Based Test Cases Using GUI Runtime State Feedback. IEEE Trans. Softw. Eng. 36 (January 2010), 81–95. 101