ABSTRACT

Title of dissertation: INTERACTIVE EXPLORATION OF TEMPORAL EVENT SEQUENCES
Krist Wongsuphasawat, Doctor of Philosophy, 2012
Dissertation directed by: Professor Ben Shneiderman, Department of Computer Science

Life can often be described as a series of events. These events contain rich information that, when put together, can reveal history, expose facts, or lead to discoveries. Therefore, many leading organizations are increasingly collecting databases of event sequences: Electronic Medical Records (EMRs), transportation incident logs, student progress reports, web logs, sports logs, etc. Heavy investments were made in data collection and storage, but difficulties still arise when it comes to making use of the collected data. Analyzing millions of event sequences is a non-trivial task that is gaining more attention and requires better support due to its complex nature. Therefore, I aimed to use information visualization techniques to support exploratory data analysis, an approach to analyzing data to formulate hypotheses worth testing, for event sequences. By working with the domain experts who were analyzing event sequences, I identified two important scenarios that guided my dissertation. First, I explored how to provide an overview of multiple event sequences. Lengthy reports often have an executive summary to provide an overview of the report; unfortunately, there was no equivalent of an executive summary for event sequences. Therefore, I designed LifeFlow, a compact overview visualization that summarizes multiple event sequences, and interaction techniques that support users' exploration. Second, I examined how to support users in querying for event sequences when they are uncertain about what they are looking for.
To support this task, I developed similarity measures (the M&M measure 1-2) and user interfaces (Similan 1-2) for querying event sequences based on similarity, allowing users to search for event sequences that are similar to the query. After that, I ran a controlled experiment comparing exact match and similarity search interfaces, and learned the advantages and disadvantages of both interfaces. These lessons inspired me to develop Flexible Temporal Search (FTS), which combines the benefits of both interfaces: FTS gives confident and countable results, and also ranks results by similarity. I continued to work with domain experts as partners, getting them involved in the iterative design and constantly using their feedback to guide my research directions. As the research progressed, several short-term user studies were conducted to evaluate particular features of the user interfaces, and both quantitative and qualitative results were reported. To address the limitations of short-term evaluations, I included several multi-dimensional in-depth long-term case studies with domain experts in various fields to evaluate deeper benefits, validate the generalizability of the ideas, and demonstrate the practicability of this research in non-laboratory environments. The experience from these long-term studies was combined into a set of design guidelines for temporal event sequence exploration. My contributions from this research are: LifeFlow, a visualization that compactly displays summaries of multiple event sequences, along with interaction techniques for users' exploration; similarity measures (the M&M measure 1-2) and similarity search interfaces (Similan 1-2) for querying event sequences; Flexible Temporal Search (FTS), a hybrid query approach that combines the benefits of exact match and similarity search; and case study evaluations that result in a process model and a set of design guidelines for temporal event sequence exploration.
Finally, this research has revealed new directions for exploring event sequences.

INTERACTIVE EXPLORATION OF TEMPORAL EVENT SEQUENCES

by

Krist Wongsuphasawat

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2012

Advisory Committee:
Dr. Ben Shneiderman, Chair/Advisor
Dr. Catherine Plaisant
Dr. Amol Deshpande
Dr. Lise Getoor
Dr. David Gotz
Dr. Jeffrey Herrmann

© Copyright by Krist Wongsuphasawat 2012

Dedication

To the Wongsuphasawat Family

Acknowledgments

Doing a PhD is a long, complicated and devoted journey. Of course, there were good, fun, inspiring and exciting moments; otherwise, I would not have survived long enough to be sitting here writing this dissertation. However, there were also many challenges and obstacles, which I probably would not have been able to get through without the advice, inspiration and support from these people, whom I deeply appreciate.

First and foremost, I would like to thank my advisor, Ben Shneiderman. Ben introduced me to the world of information visualization and inspired me to continue for a PhD after completing my Master's degree. He is a truly devoted professor who constantly inspires people around him with his passion and energy. His tremendous work in HCI is nothing short of amazing. His warmth, encouragement, creativity, depth and breadth of knowledge, and great vision helped me greatly throughout my PhD training. He taught me how to be a good researcher and also showed me how to be a good colleague and a good person. He always supported me when I was in doubt and celebrated my success with his signature "Bravo!" I am grateful and feel very fortunate to have had the chance to work with and learn from him.

The second person I would like to thank is Catherine Plaisant. By working with a great researcher like her, I have learned many design principles and valuable lessons about how to work with users.
In addition, her enthusiasm and energy were like triple-shot espressos that could cheer me up even on a depressing rainy day. I cannot remember how many times I walked to her room (which was conveniently located right in front of my cubicle) and asked for help or suggestions. She always helped me with smiles, no matter how big or small the problem was. The work in this dissertation could not have been completed without her invaluable feedback and suggestions. As with Ben, it was my honor and great pleasure to work with and learn from Catherine.

I am thankful to Lise Getoor, Amol Deshpande, Jeffrey Herrmann and David Gotz for agreeing to serve on my dissertation committee, for sparing their invaluable time reviewing the manuscript, and for sharing their thoughtful suggestions that helped me improve this dissertation. Additionally, I would like to thank David Gotz for offering me an internship at IBM, which was a great experience that led to the spin-off Outflow project. I am also thankful for the guidance from Samir Khuller when I started working on Similan.

I owe my thanks to Taowei David Wang, who helped me start my PhD student life and set many good examples for me to follow. Taking care of his LifeLines2 while he was away at a summer internship also led me to many ideas that later became LifeFlow. I appreciate his generous help and suggestions during the time I was finding my dissertation topic and preparing my proposal. I appreciate John Alexis Guerra Gómez's help in the development of LifeFlow and thank him for taking care of LifeFlow while I was away at a summer internship. Many new datasets were made available for analysis in the late stage of my dissertation thanks to Hsueh-Chien Cheng, who developed DataKitchen and made data preprocessing a much more pleasurable experience.

My colleagues in the Human-Computer Interaction Lab have made my PhD life such a great memory.
I have enjoyed every moment of our discussion, collaboration (and procrastination). I will always remember the brown bag lunches (including the free pizzas), normal lunches (at which people often found me with a bag of Chick-fil-A), annual symposiums (and the green t-shirts), helpful practice talks, fruitful brainstorming sessions (a.k.a. productive procrastination), yummy international cuisine tasting, relaxing late-night hangouts in the lab, entertaining discussions in the cubicles, and all other memorable activities.

I appreciate the support from Mark Smith and his team from the Washington Hospital, MedStar Health and MedStar Institute for Innovation, especially Phuong Ho and A. Zach Hettinger. Without the inspiration from their medical problems, none of the projects discussed herein would have materialized. In addition, I would like to acknowledge financial support from the National Institutes of Health (NIH) Grant RC1CA147489 and the Center for Integrated Transportation Systems Management (CITSM), a tier 1 transportation center at the University of Maryland.

Many thanks to the user study participants: Phuong Ho, Nikola Ivanov, Michael VanDaniker, Michael L. Pack, Sigfried Gold, A. Zach Hettinger, Jae-wook Ahn, Daniel Lertpratchya, Chanin Chanma, Sorawish Dhanaphanichakul and Anne Rose, as well as the anonymized participants in other studies. Their precious feedback has contributed greatly to my research.

Studying abroad and living away from the place where I had lived my entire life was never easy. I owe my thanks to the Thai communities in the Maryland and DC area for making this place feel like a home away from home. Thank you for inviting me to dinners and fun activities on weekends. I am also thankful to my friends, no matter where you are, who chatted with me when I was bored, frustrated, or just sleepy on Monday morning.
Thank you for keeping my sanity in check and reminding me every once in a while that the dissertation is only a part of my life, not my entire life. Words cannot express the gratitude I owe my family. I would not be the person I am today without the nurture of my parents, grandparents and aunt. My family always stood by me when I questioned myself on my quest to earn a PhD. Their support and guidance gave me the strengths to overcome all challenges and obstacles. They celebrated my achievements, made fun of my photos on Facebook at certain times, and embraced me with love every time I went home. I also have to thank my lovely Ben for supporting me to follow my dreams, having faith in me and adding unforgettable memories every time I returned. Remembering everybody is a challenging task. I have tried to acknowledge as many people as I could remember, but if I have inadvertently left anyone out, I sincerely apologize. While I am about to nish writing this dissertation in the next paragraph, I would like to thank the person who created this LATEX template for making my life easier, and PhDComics for making my life funnier. Lastly, my gratitude is described in Algorithm 1. Algorithm 1 In nite Gratitude Require: a lot of water 1: for int i=0; i<555; i=i+0 do 2: say("Thank you."); 3: end for vi Table of Contents List of Figures xiv List of Abbreviations xxv 1 Introduction 1 1.1 Overview of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . 3 1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3 Dissertation Organization . . . . . . . . . . . . . . . . . . . . . . . . 7 2 Background and Related Work 11 2.1 Information Visualization . . . . . . . . . . . . . . . . . . . . . . . . 11 2.1.1 Temporal Data Visualization . . . . . . . . . . . . . . . . . . . 11 2.1.1.1 Single-Record Visualization . . . . . . . . . . . . . . 11 2.1.1.2 Visualization of multiple records in parallel . . . . . 
14 2.1.1.3 Visualization that aggregates multiple records . . . . 16 2.1.2 Hierarchy Visualization . . . . . . . . . . . . . . . . . . . . . . 17 2.1.3 State Transition Visualization . . . . . . . . . . . . . . . . . . 21 2.1.4 Flow Visualization . . . . . . . . . . . . . . . . . . . . . . . . 23 2.2 Query Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.2.1 Query Languages . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.2.2 Query-by-Example Languages . . . . . . . . . . . . . . . . . . 24 2.2.3 Query by Graphical User Interfaces (GUIs) . . . . . . . . . . . 27 2.2.3.1 Exact Match Approach . . . . . . . . . . . . . . . . 27 2.2.3.2 Similarity Search Approach . . . . . . . . . . . . . . 28 2.2.4 Similarity Measure . . . . . . . . . . . . . . . . . . . . . . . . 29 2.2.4.1 Numerical Time Series . . . . . . . . . . . . . . . . . 29 2.2.4.2 String and Biological Sequences . . . . . . . . . . . . 30 2.2.4.3 Event Sequences . . . . . . . . . . . . . . . . . . . . 31 2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3 Providing an Overview of Temporal Event Sequences to Spark Exploration: LifeFlow Visualization 35 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.2 Motivating Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.2.1 Event De nitions . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.2.2 Example question . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.3 Data Aggregation: Tree of Sequences . . . . . . . . . . . . . . . . . . 38 3.4 Visual Representation: LifeFlow Visualization . . . . . . . . . . . . . 42 3.5 Basic Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.6 User Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.6.1 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.6.2 Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
52 3.6.2.1 Tasks 1-9: Simple Features . . . . . . . . . . . . . . 52 vii 3.6.2.2 Task 10-14: Advanced Features . . . . . . . . . . . . 52 3.6.2.3 Task 15: Overall analysis and nding anomalies . . . 52 3.6.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.6.3.1 Tasks 1-14 . . . . . . . . . . . . . . . . . . . . . . . . 53 3.6.3.2 Task 15 . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.6.3.3 Debrie ng . . . . . . . . . . . . . . . . . . . . . . . . 55 3.6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.7 Advanced Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4 Querying Event Sequences by Similarity Search 67 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 4.1.1 Example of Event Sequence Query . . . . . . . . . . . . . . . 67 4.1.2 Motivation for Similarity Search . . . . . . . . . . . . . . . . . 68 4.1.3 Chapter Organization . . . . . . . . . . . . . . . . . . . . . . . 70 4.2 Similan and the M&M Measure: The First Version . . . . . . . . . . 71 4.2.1 Introduction to the Match & Mismatch (M&M) Measure . . . 71 4.2.2 Description of the User Interface: Similan . . . . . . . . . . . 72 4.2.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . 72 4.2.2.2 Events and Timeline . . . . . . . . . . . . . . . . . . 74 4.2.2.3 Alignment . . . . . . . . . . . . . . . . . . . . . . . . 75 4.2.2.4 Rank-by-feature . . . . . . . . . . . . . . . . . . . . 75 4.2.2.5 Scatterplot . . . . . . . . . . . . . . . . . . . . . . . 77 4.2.2.6 Comparison . . . . . . . . . . . . . . . . . . . . . . . 78 4.2.3 The Match&Mismatch (M&M) Measure . . . . . . . . . . . . 78 4.2.3.1 Matching . . . . . . . . . . . . . . . . . . . . . . . . 79 4.2.3.2 Scoring . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.2.3.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . 84 4.3 User Study . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 4.3.1 Usability Study Procedure and Tasks . . . . . . . . . . . . . . 85 4.3.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.3.3 Pilot Study of a New Prototype . . . . . . . . . . . . . . . . . 89 4.4 Similan and the M&M Measure: The Second Version . . . . . . . . . 90 4.4.1 Description of the User Interface: Similan2 . . . . . . . . . . . 90 4.4.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . 90 4.4.1.2 Query . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.4.1.3 Comparison . . . . . . . . . . . . . . . . . . . . . . . 93 4.4.1.4 Weights . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.4.2 The Match and Mismatch (M&M) Measure v.2 . . . . . . . . 95 4.4.2.1 Matching . . . . . . . . . . . . . . . . . . . . . . . . 96 4.4.2.2 Scoring . . . . . . . . . . . . . . . . . . . . . . . . . 100 4.5 User Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 4.5.1 Motivation for a Controlled Experiment . . . . . . . . . . . . 103 4.5.2 Description of the User Interface: LifeLines2 . . . . . . . . . . 105 4.5.3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 viii 4.5.3.1 Research questions . . . . . . . . . . . . . . . . . . . 106 4.5.3.2 Participants . . . . . . . . . . . . . . . . . . . . . . . 107 4.5.3.3 Apparatus . . . . . . . . . . . . . . . . . . . . . . . . 107 4.5.3.4 Design . . . . . . . . . . . . . . . . . . . . . . . . . . 110 4.5.3.5 Procedure . . . . . . . . . . . . . . . . . . . . . . . . 111 4.5.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 4.5.4.1 Performance Time . . . . . . . . . . . . . . . . . . . 112 4.5.4.2 Error Rates . . . . . . . . . . . . . . . . . . . . . . . 114 4.5.4.3 Subjective Ratings . . . . . . . . . . . . . . . . . . . 114 4.5.4.4 Debrie ng . . . . . . . . . . . . . . . . . . . . . . . . 115 4.6 Lessons Learned and Ideas for Hybrid Interface . . . . . . . 
. . . . . 117 4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 5 Combining Exact Match and Similarity Search for Querying Event Sequences: Flexible Temporal Search 121 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 5.2 Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 5.2.1 How to de ne a cut-o point? Mandatory & Optional Flags . 122 5.2.2 Speci cation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 5.2.3 Grade & Similarity Score . . . . . . . . . . . . . . . . . . . . . 127 5.3 FTS Similarity Measure . . . . . . . . . . . . . . . . . . . . . . . . . 128 5.3.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 5.3.2 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 5.3.3 Grading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 5.3.4 Scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 5.3.5 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 5.3.6 Di erence from the M&M Measure . . . . . . . . . . . . . . . 135 5.4 User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 5.4.1 Query Representation . . . . . . . . . . . . . . . . . . . . . . 137 5.4.2 Query Speci cation . . . . . . . . . . . . . . . . . . . . . . . . 140 5.4.3 Search Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 5.5 Use case scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 6 Multi-dimensional In-depth Long-term Case Studies 148 6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 6.2 Monitoring Hospital Patient Transfers . . . . . . . . . . . . . . . . . . 150 6.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 6.2.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
150 6.2.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 6.2.4 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 6.2.4.1 First impression . . . . . . . . . . . . . . . . . . . . 151 6.2.4.2 Understanding the big picture . . . . . . . . . . . . . 153 6.2.4.3 Measuring the transfer time . . . . . . . . . . . . . . 155 6.2.4.4 Comparison . . . . . . . . . . . . . . . . . . . . . . . 156 ix 6.2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 6.3 Comparing Tra c Agencies . . . . . . . . . . . . . . . . . . . . . . . 157 6.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 6.3.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 6.3.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 6.3.4 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 6.3.4.1 Quantifying data quality issues . . . . . . . . . . . . 159 6.3.4.2 Ranking the agencies? performance . . . . . . . . . . 161 6.3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 6.4 Analyzing Drug Utilization . . . . . . . . . . . . . . . . . . . . . . . . 165 6.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 6.4.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 6.4.3 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 6.4.3.1 Drug prescribing patterns . . . . . . . . . . . . . . . 167 6.4.3.2 Drug switching . . . . . . . . . . . . . . . . . . . . . 167 6.4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 6.5 Hospital Readmissions . . . . . . . . . . . . . . . . . . . . . . . . . . 169 6.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 6.5.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 6.5.3 Before the Study . . . . . . . . . . . . . . . . . . . . . . . . . 170 6.5.4 Early Progress . 
. . . . . . . . . . . . . . . . . . . . . . . . . . 171 6.5.5 Analysis: Personal Exploration . . . . . . . . . . . . . . . . . 172 6.5.6 Analysis: Administration . . . . . . . . . . . . . . . . . . . . . 173 6.5.6.1 Revisit . . . . . . . . . . . . . . . . . . . . . . . . . . 173 6.5.6.2 Revisit by recurring patients . . . . . . . . . . . . . . 175 6.5.6.3 Admission . . . . . . . . . . . . . . . . . . . . . . . . 177 6.5.6.4 Mortality . . . . . . . . . . . . . . . . . . . . . . . . 178 6.5.6.5 Revived patients and John Doe . . . . . . . . . . . . 181 6.5.6.6 Identify an interesting pattern and then search for it 181 6.5.6.7 Diagnoses . . . . . . . . . . . . . . . . . . . . . . . . 184 6.5.6.8 Frequent visitors . . . . . . . . . . . . . . . . . . . . 185 6.5.7 Conclusions and Discussion . . . . . . . . . . . . . . . . . . . 187 6.6 How do people read children books online? . . . . . . . . . . . . . . . 190 6.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 6.6.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 6.6.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 6.6.4 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 6.6.4.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 192 6.6.4.2 First observation . . . . . . . . . . . . . . . . . . . . 194 6.6.4.3 How do people read from the second page? . . . . . . 194 6.6.4.4 Reading backwards . . . . . . . . . . . . . . . . . . . 194 6.6.5 Conclusions and Discussion . . . . . . . . . . . . . . . . . . . 197 6.7 Studying User Activities in Adaptive Exploratory Search Systems . . 199 6.7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 6.7.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 x 6.7.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 6.7.4 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
204 6.7.4.1 Switch of activities . . . . . . . . . . . . . . . . . . . 204 6.7.4.2 User activity patterns Part I: Frequent patterns . . . 207 6.7.4.3 User activity patterns Part II: User model exploration212 6.7.5 Conclusions and Discussion . . . . . . . . . . . . . . . . . . . 213 6.8 Tracking Cement Trucks . . . . . . . . . . . . . . . . . . . . . . . . . 214 6.8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 6.8.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215 6.8.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215 6.8.4 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 6.8.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . 216 6.8.4.2 Anomalies . . . . . . . . . . . . . . . . . . . . . . . . 217 6.8.4.3 Monitoring plants? performance . . . . . . . . . . . . 218 6.8.4.4 Classifying customers from delay at sites . . . . . . . 220 6.8.4.5 Search . . . . . . . . . . . . . . . . . . . . . . . . . . 221 6.8.5 Conclusions and Discussion . . . . . . . . . . . . . . . . . . . 222 6.9 Soccer Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 6.9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 6.9.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224 6.9.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224 6.9.4 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 6.9.4.1 Finding entertaining matches . . . . . . . . . . . . . 225 6.9.4.2 Predicting chances of winning . . . . . . . . . . . . . 228 6.9.4.3 Explore statistics . . . . . . . . . . . . . . . . . . . . 228 6.9.4.4 Search for speci c situations . . . . . . . . . . . . . . 243 6.9.5 Conclusions and Discussion . . . . . . . . . . . . . . . . . . . 245 6.10 Usage Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 6.11 A Process Model for Exploring Event Sequences . . . . . . . 
. . . . . 250 6.11.1 De ning goals . . . . . . . . . . . . . . . . . . . . . . . . . . . 252 6.11.2 Gathering information . . . . . . . . . . . . . . . . . . . . . . 252 6.11.2.1 Preprocessing data . . . . . . . . . . . . . . . . . . . 252 6.11.2.2 Cleaning data . . . . . . . . . . . . . . . . . . . . . . 254 6.11.3 Re-representing the information to aid analysis . . . . . . . . 255 6.11.4 Manipulating the representation to gain insight . . . . . . . . 257 6.11.4.1 Manipulating the representation . . . . . . . . . . . . 257 6.11.4.2 Exploring results of manipulation . . . . . . . . . . . 258 6.11.4.3 Searching for patterns . . . . . . . . . . . . . . . . . 259 6.11.4.4 Exploring search results . . . . . . . . . . . . . . . . 260 6.11.4.5 Handling ndings . . . . . . . . . . . . . . . . . . . . 261 6.11.5 Producing and disseminating results . . . . . . . . . . . . . . 263 6.11.5.1 Recording ndings . . . . . . . . . . . . . . . . . . . 263 6.11.5.2 Producing results . . . . . . . . . . . . . . . . . . . . 264 6.11.5.3 Disseminating results . . . . . . . . . . . . . . . . . . 264 6.11.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264 xi 6.12 Design Recommendations . . . . . . . . . . . . . . . . . . . . . . . . 265 6.13 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268 7 Conclusions and Future Directions 270 7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270 7.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273 7.2.1 Improving the overview visualization . . . . . . . . . . . . . . 273 7.2.2 Improving the search . . . . . . . . . . . . . . . . . . . . . . . 275 7.2.3 Supporting more complex data . . . . . . . . . . . . . . . . . 276 7.2.4 Supporting new tasks . . . . . . . . . . . . . . . . . . . . . . . 278 7.2.5 Scalability and Performance . . . . . . . . . . . . . . . . . . . 280 7.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
281 A Examples of Temporal Event Sequences 282 B LifeFlow Software Implementation 288 B.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288 B.2 Input Data Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288 B.2.1 Event Data File . . . . . . . . . . . . . . . . . . . . . . . . . . 289 B.2.2 Attribute Data File (optional) . . . . . . . . . . . . . . . . . . 289 B.2.3 Con g File (optional) . . . . . . . . . . . . . . . . . . . . . . . 290 B.3 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290 B.3.1 Software Architecture . . . . . . . . . . . . . . . . . . . . . . . 290 B.3.2 Code Organization . . . . . . . . . . . . . . . . . . . . . . . . 291 B.4 Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 B.4.1 Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 B.4.2 Fundamental Data Structures . . . . . . . . . . . . . . . . . . 293 B.4.3 Handling Dataset . . . . . . . . . . . . . . . . . . . . . . . . . 295 B.5 Main Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296 B.5.1 Main class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296 B.5.2 LifeFlow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297 B.5.2.1 Tree of Sequences . . . . . . . . . . . . . . . . . . . . 297 B.5.2.2 Flexible Geometry . . . . . . . . . . . . . . . . . . . 297 B.5.2.3 LifeFlow Component . . . . . . . . . . . . . . . . . . 298 B.5.3 LifeLines2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300 B.5.4 Flexible Temporal Search (FTS) . . . . . . . . . . . . . . . . . 301 B.5.4.1 Similarity search . . . . . . . . . . . . . . . . . . . . 301 B.5.4.2 User Interface . . . . . . . . . . . . . . . . . . . . . . 302 C Spin-o : Out ow 303 C.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303 C.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
C.2.1 Congestive Heart Failure (CHF)
C.2.2 Soccer Result Analysis
C.3 Description of the Visualization
C.3.1 Data Aggregation
C.3.2 Visual Encoding
C.3.3 Rendering
C.3.3.1 Bézier Curve
C.3.3.2 Sugiyama's Heuristics
C.3.3.3 Force-directed Layout
C.3.3.4 Edge Routing
C.3.4 Basic Interactions
C.3.5 Simplification
C.3.6 Factors
C.4 Preliminary Analysis
C.5 User Study
C.5.1 Design
C.5.1.1 Procedure
C.5.1.2 Tasks and Questionnaire
C.5.2 Results
C.5.2.1 Accuracy
C.5.2.2 Speed
C.5.2.3 Unrestricted Exploration
C.5.2.4 Questionnaire and Debriefing
C.6 Summary

Bibliography

List of Figures

1.1 LifeFlow: An overview visualization for event sequences
1.2 Similan 2: Query event sequences by similarity
1.3 Flexible Temporal Search (FTS): A hybrid approach that combines the benefits of exact match and similarity search for event sequences
2.1 LifeLines [97] displays a medical history of a single patient on a timeline.
2.2 The Spiral visualization approach of Weber et al. [137] applied to the power usage dataset
2.3 Continuum's overview panel (top) displays a histogram as an overview of the data set. [10]
2.4 LifeLines2 [131, 132] displays a temporal summary at the bottom as an overview of the data set.
2.5 Icicle tree in ProtoVis. [21] The visualization is called an icicle tree because it resembles a row of icicles hanging from the eaves. [66]
2.6 Blaas et al.'s state transition visualization: A smooth graph representation of a labeled biological time-series. Each ring represents a state, and the edges between states visualize the state transitions. This graph uses smooth curves to explicitly visualize third order transitions, so that each curved edge represents a unique sequence of four successive states. The orange node is part of a selection set, and all transitions matching the current selection are highlighted in orange. [17]
2.7 Charles Minard's Map of Napoleon's Russian Campaign of 1812
2.8 An example of query-by-filters: PatternFinder [37] allows users to specify the attributes of events and time spans to produce pattern queries. The parameters in the controls are converted directly into constraints that can be used to retrieve the records.
2.9 An example of query-by-example: QueryMarvel [54] allows users to draw comic strips to construct queries. With its exact result back-end, the comic strips are converted into rules, as seen in the status bar (4).
3.1 This diagram explains how a LifeFlow visualization can be constructed to summarize four records of event sequences. Raw data are represented as colored triangles on a horizontal timeline (using the traditional approach also used in LifeLines2). Each row represents one record. The records are aggregated by sequence into a data structure called a tree of sequences. The tree of sequences is then converted into a LifeFlow visualization. Each tree node is represented with an event bar. The height of the bar is proportional to the number of records while its horizontal position is determined by the average time between events.
3.2 Tree of sequences data structure of the event sequences in Figure 3.1: Each node in the tree contains an event type and number of records. Each edge contains time gap information and number of records.
3.3 Two trees of sequences created from a dataset with an alignment point at B. A positive tree in forward direction and a negative tree in backward direction.
3.4 This screenshot of LifeFlow shows a random sample of patient transfer data based on real de-identified data. The way to read sequences in LifeFlow is to read the colors (using the legend). For example, the sequence (A) in the figure is Arrival (blue), Emergency (purple), ICU (red), Floor (green) and Discharge-Alive (light blue). The horizontal gap between colored bars represents the average time between events. The height of the bars is proportional to the number of records, therefore showing the relative frequency of that sequence. The bars (e.g. Floor and Die) with the same parent (Arrival → Emergency → ICU) are ordered by frequency (tallest bar on top), as you can see that Floor (green bar) is placed above Die (black bar). The most frequent pattern is the tallest bar at the end.
Here it shows that the most common sequence is Arrival, Emergency then Discharge-Alive. (Please note that this is only a sample dataset and does not reflect the real performance of any hospital.)
3.5 When users move the cursor over an event bar or gap between event bars, LifeFlow highlights the sequence and shows the distribution of time gaps. Labels help the users read the sequences more easily. A tooltip also appears on the left, showing more information about the time gap. In this figure, the distribution shows that most patients were discharged within 6 hours and the rest were mostly discharged exactly after 12 hours. (Please note that this is only a sample dataset and does not reflect the real performance of any hospital.)
3.6 Here LifeFlow is used side-by-side with LifeLines2 so that individual records can be reviewed by scrolling. When a user clicks on a sequence in LifeFlow, the sequence is highlighted and all corresponding records are also highlighted and moved to the top in LifeLines2, allowing the user to examine them in more detail. In this example, a user noticed an uncommon pattern of frequent transfers back and forth between ICU (red) and Floor (green), so he selected those patients to see more detail.
3.7 The same data as Figure 3.4, aligned by ICU. The user can see that the patients were more likely to die after a transfer to the ICU than after any other sequence because the black bar is the tallest bar at the end. Also, surprisingly, two patients were reported dead (see arrow) before being transferred to the ICU, which is impossible. This indicates a data entry problem. (Please note that this is only a sample dataset and does not reflect the real performance of any hospital.)
3.8 Using the same data as Figure 3.4, the user excluded all event types except Arrival, Discharge-Alive and Die, i.e. the beginning and end of hospital visits. All other events are ignored, allowing rapid comparisons between the patients who died and survived in terms of number of patients and average time to discharge. Patients who survived were discharged after 7.5 days on average while patients who died did so after 8.5 days on average. (Please note that this is only a sample dataset and does not reflect the real performance of any hospital.)
3.9 LifeFlow with traffic incidents data: The incidents are separated by agencies (A-G). Only Incident Notification and Return to normal (aggregated) events are shown. Other events are hidden. The agencies are sorted by a simple measure of agency performance (average time from the beginning to the end). Agency C seems to be the fastest to clear its incidents, followed by E, A, H, D, F, B and finally G.
3.10 The user right-clicks at the second Arrival event and selects "Show attributes summary..." to bring up the summary table, which summarizes the common diagnoses. (Please note that this is only a sample dataset and does not reflect the real performance of any hospital.)
3.11 The user assigned attribute "Status" as "Alive" and "Dead" to the patients who had pattern Arrival → Discharge-Alive and Arrival → Die in Figure 3.8, respectively. After that, the user included other event types that were excluded earlier and chose to group records by attribute "Status" to show patterns of "Alive" and "Dead" patients. Notice that the majority of the dead patients were transferred to Floor first and later transferred to the ICU before they died. (Please note that this is only a sample dataset and does not reflect the real performance of any hospital.)
3.12 The user used the measurement tool to measure the time from ICU (red) to Discharge-Alive (light blue). The tooltip shows that it took about ten days. The distribution of the time gap is also displayed.
3.13 The sequences of medical records are split into episodes using an event type Arrival to define the beginning of an episode. Dotted lines show separation between episodes.
3.14 The sequences of medical records are broken into episodes using an event type Arrival to define the beginning of an episode. Dotted lines show separation between episodes.
4.1 (top) The M&M measure (bottom-left) High time difference (low match score) but no mismatch (high mismatch score) (bottom-right) Low time difference (high match score) but high mismatches (low mismatch score)
4.2 A screenshot of Similan, the predecessor of Similan2. Users can start by double-clicking to select a target record from the main panel. Similan will calculate a score that indicates how similar each record is to the target record and show scores in the color-coded grid on the left. The score color-coding bars on the right show how the scores are color-coded. The users then can sort the records according to these scores. The main panel also allows users to visually compare a target with a set of records. The timeline is binned (by year, in this screenshot). If the users want to make a more detailed comparison, they can click on a record to show the relationship between that record and the target record in the comparison panel on the top. The plot panel at the bottom shows the distribution of records. In this example, the user is searching for students who are similar to Student 01. The user sets Student 01 as the target and sorts all records by total score. Student 18 has the highest total score of 0.92, so this suggests that Student 18 is the most similar student. Although Student 41 and Student 18 both have one missing paper submission, Student 41 has a lower match score; therefore, Student 18 has a higher total score.
4.3 Relative Timeline: Time scale is now relative to sentinel events (blue). Time zero is highlighted in dark gray.
4.4 Control Panel: (left) Legend of event types (categories) (middle-top) Users can choose to align events by selecting a sentinel event type. (middle-bottom) Weight for calculating the total score can be adjusted using slider and textboxes. (right) Links in the comparison panel can be filtered using these parameters.
4.5 Similan 1.5 prototype: The timeline is continuous and events are split into rows by event type.
4.6 Similarity search interface (Similan2) with the same query as in Figure 4.11. Users specify the query by placing events on the query panel. To set the time range of interest and focus on events within this range, users draw a red box. After clicking on "Search", all records are sorted by their similarity to the query. The similarity score is represented by a number that is the total score and a bar with four sections. A longer bar means a higher similarity score. Each section of the rectangle corresponds to one decision criterion, e.g. the top two records have longer leftmost sections than the third record because they have lower time difference so the Avoid Time Difference Score (AT) is high, resulting in longer bars. Figure 4.7 shows how users can adjust the weight.
4.7 Similan2's control panel has 2 tabs. The first tab is "search" as shown in Figure 4.6. Another tab is "weight and detailed weight", in which users can adjust the weight of the four decision criteria using the four sliders in the left figure. For more advanced customization, they can even set the weight for each event type within each decision criterion by clicking on "more details" (right figure).
4.8 (left) M&M Matching v.1 (right) M&M Matching v.2: Events in each event type are matched separately.
4.9 M&M Matching v.2: Dynamic programming table
4.10 Four types of difference: time difference, missing events, extra events and swaps
4.11 Exact match interface (LifeLines2) showing the results of a query for patients who were admitted to the hospital then transferred to the Intensive Care Unit (ICU) within a day, then to an Intermediate ICU room on the fourth day. The user has specified the sequence filter on the right, selecting Admit, ICU and Intermediate in the menus, and aligned the results by the time of admission. The distribution panel at the bottom of the screen shows the distribution of Intermediate, which gives an overview of the distribution and has allowed users to select the time range of interest (e.g. on the fourth day) by drawing a selection box on the distribution bar chart.
4.12 Performance time as a function of the interface type and the tasks (1-5). Vertical bars denote 0.95 confidence intervals.
5.1 How to decide whether a constraint is mandatory or optional
5.2 Flexible Temporal Search (FTS) implemented in the LifeFlow software: Users can draw a query and retrieve results that are split into two bins: exact match and other results. In this example, a user is querying for bounce back patients, which are patients who arrived (blue), were later moved to the ICU (red), transferred to Floor (green) and transferred back to the ICU (red) within two days. The results panel displays all bounce back patients in the exact match results while showing the rest in the other results, sorted by their similarity to the query. The top patient in the other results has a pattern very similar to bounce back but the return time to the ICU was 4 days. The physician then can notice this patient and use his/her judgment to decide whether to consider this case as a bounce back patient or not.
5.3 Query Representation: (A) Triangles are mandatory events. (B) Mandatory negation (red) is a triangle with a strike mark through the center placed in a balloon. (C) An event that has a time constraint (blue) has a path drawn from its glyph in the query strip to the position on the timeline that represents the beginning of its time constraint. The duration of the time constraint is rendered as a rectangle on the timeline. A solid line and filled rectangle represent a mandatory constraint. (D) Query C with tooltip (E) A circle is an optional event (green). (F) An optional negation is a mandatory negation that uses a circle instead of a triangle. (G) An optional version of the time constraint in Query C. A dashed line and hollow rectangle are used instead. (H) Query G with tooltip
5.4 Gap Representation: The gaps on the left (I,J,K,L) are mandatory while the gaps on the right are optional (M,N,O,P). Mandatory gaps are filled and use solid lines while optional gaps are not filled and use dashed lines.
5.5 Users can click on an empty location on the query strip to add an event or a gap to a query. In this example, the user is going to add something between Floor (green) and Discharge-Alive (light blue). The location is marked with a cross and an "Add..." popup dialog appears. The user then chooses to add an event or a gap, which will open another dialog.
5.6 Weights: Users can adjust the importance of each kind of difference.
5.7 Creating a query from a record: In this example, a user chooses the record "10010010" as a template.
(left) In "use specific time" mode, each event uses its original time from the selected record as an optional time constraint. (right) In "use gap between events" mode, all gaps between events in the selected records are converted into optional gaps in the query.
5.8 Creating a query from an event sequence in LifeFlow: In this example, a user chooses the sequence Arrival (blue), Emergency (purple), ICU (red), Floor (green) and Discharge-Alive (light blue) as a template.
5.9 Comparison: Four events in the query are matched while one is missing (light blue) and another one is extra (black).
6.1 Patients who visited the Emergency Room in January 2010.
6.2 Six patients were transferred from ICU (red) to Floor (green) and back to ICU (bounce backs). The average transfer time back to the ICU was six days. The distribution shows that one patient was transferred back in less than a day.
6.3 This figure shows 203,214 traffic incidents in LifeFlow. There is a long pattern (more than 100 years long) at the bottom that stands out. If an incident could really last more than a hundred years, we probably should not be driving any more. Investigating further, the Incident Arrival time of all those incidents was January 1st, 1900, a common initial date in computer systems. This suggested that the system might have used this default date when no date was specified for an incident.
6.4 LifeFlow with traffic incidents data: The incidents are separated by agencies (A-G). Only Incident Notification and Return to normal (aggregated) events are shown. Other events are hidden. The agencies are sorted by a simple measure of agency performance (average time from the beginning to the end). Agency C seems to be the fastest to clear its incidents, followed by E, A, H, D, F, B and finally G.
6.5 LifeFlow with traffic incidents data from Agency C and Agency G: Only Incident Notification and Return to normal (aggregated) events are shown. The incidents are also grouped by incident types. Most of the incidents that Agency C reported are Disabled Vehicles, which had about 1 minute clearance time on average.
6.6 Patient records aligned by the first ED Reg date (light blue): The bar's height represents the number of patients. The visualization shows the proportion of revisits.
6.7 Tooltip uncovers more information: Place the cursor over a sequence in Figure 6.6 to see more information from the tooltip. For this sequence with four visits, there were 1501 patients, which is 2.5% of all patients.
6.8 Patient records aligned by all ED Reg dates (light blue): The bar's height represents the number of visits. The first bar after the alignment point represents the number of visits by all patients who had at least one visit. The second bar represents the number of visits by all patients who had at least two visits.
6.9 Access more useful information using the tooltip: Patients with at least five visits, which is 3.1% of patients, accounted for 6.6% of visits.
6.10 Patient records aligned by all ED Reg dates (light blue): The total height represents the total number of visits. The ED Reg date bar (light blue) on the right of the alignment represents all visits. The Admit date (blue) bar on its right shows 19,576 admissions out of a total of 92,616 visits.
6.11 Patients who died: Patient records are aligned by all ED Reg dates (light blue). The total height represents the total number of visits. From the total of 92,616 visits to the ED, there were 788 deaths (red).
6.12 Patient records aligned by the first ED Reg date (light blue): The total height represents the number of patients.
6.13 More of the patients died (red) after admission (blue) than died while in the ED (light blue).
6.14 (left) Search for patients who visited, were discharged, came back and died. No record was returned in the exact match results. However, records in other results look like the pattern. This was because the ED Reg date and Discharge date in each record were exactly at the same time. (right) Refine the query to search for patients who visited, came back and died. 237 patients were identified.
6.15 Summary of Complaints (left) and Diagnoses (right) of the Patients' First Visits
6.16 Patients with more than 30 visits: (a) Patients with many visits distributed regularly throughout the year, for example, 1212249 and 1332883 (b) Patients with frequent visits followed by a long gap (marked by red horizontal lines) then another series of frequent visits, for example, 1011068 and 2355683
6.17 International Children's Digital Library www.childrenslibrary.org
6.18 The page numbers are color-coded using a color gradient from blue to red. We found that people started their reading on different pages. Some started from the first page, while others jumped into the later pages, probably skipping the empty pages in the beginning. The height of the bars shows that people started on the earlier pages more than the later pages.
6.19 After aligning by the second page: People read in order (from blue to red). Some flipped back one page and continued reading (small lines). There are also some long patterns before the second page.
6.20 The selection shows book sessions in which readers accessed the pages in the backward direction.
6.21 TaskSieve
6.22 Adaptive VIBE
6.23 Before (above) and after (below) filling gaps with colors of previous events
6.24 Overview of User Behaviors in TaskSieve (above) and Adaptive VIBE (below): In the visualization-based system (Adaptive VIBE), the distribution of different actions was denser than in the text-based system (TaskSieve). The participants switched more frequently, spending less time (smaller-width colored blocks) per action, with the two additional event types (POI activities (light green) and Find subset (green)) mixed in the sequences.
6.25 Bigrams of user activities visualized in LifeFlow
6.26 Bigrams of user activities after aligning by Manipulate UM: The three most frequent actions before Manipulate UM were still the most frequent actions after Manipulate UM. It seems that the user model manipulations were repeatedly in the chain of the three actions, and the user model manipulation task needs to be considered as a set with those companion actions.
6.27 LifeFlow showing 821 trips: The majority of the trips had normal sequences. However, there were some anomalies at the bottom.
6.28 Anomalies: 1) Some trucks arrived at the sites but did not fill cement. 2) Some trips were reported to have begun filling cement before arriving at the sites. 3) Some trips were reported to have loaded cement before entering plants.
6.29 Trips are grouped by Plant ID.
Three event types (Enter plant, Leave site and Return to plant) were excluded to eliminate the overnight time and provide a more accurate comparison. Trips from plant "C313" took on average 45 minutes from leaving plants (green) to arrival at sites (red), which was much longer than other plants. This is because of its wide area coverage and regular heavy traffic near its location.
6.30 All events except Start fill cement (blue) and End fill cement (light blue) are hidden. Next, we aligned by Start fill cement, grouped by attribute Site ID, and ranked by "Average time to the end". We can see cement filling duration at each site ranging from two minutes to four hours.
6.31 Matches in which Man U scored the first three goals
6.32 Matches in which both teams scored in "an eye for an eye" fashion
6.33 Matches in which Man U conceded early goals and came back to win the game
6.34 Matches grouped by Score and Result
6.35 Summary of all scorelines
6.36 A distribution of the winning goals in all 1-0 matches: Matches are split into two groups: matches with early goals and late goals.
6.37 Matches grouped by Opponent: Man U competed against Chelsea in this season more often than against any other team.
6.38 Matches grouped by Competition: Man U scored (green) the first goal in the English Premier League (EPL) faster than in the UEFA Champions League (UCL).
6.39 Matches grouped by Competition and Result: Man U scored (green) the first goal in all matches that it won in the UCL.
6.40 Matches grouped by Venue and Result: Venues played an important role in the team's performance. Man U had a great performance at home but was not as strong on the road.
6.41 Matches grouped by Result and Venue: The most common score in a draw match is 0-0. Most of these matches are away matches.
6.42 Matches grouped by Venue, Score and Opponent Score: For away matches, the number of matches decreased with an increasing number of goals scored, but the trend is not the same for home matches.
6.43 All matches that Man U played in season 2010-2011
6.44 Three fastest goals occurred in the first minute: Using a selection from the distribution, we could select matches in which Man U scored very early.
6.45 Javier Hernandez and first goals: We right-clicked on the first green bar (first goal) and brought up a summary of event attribute "Player". Javier Hernandez was the most frequent scorer of the first goal. He also scored the first goal against Stoke City in the 27th minute twice.
6.46 George Elokobi, Wolverhampton Wanderers' left defender, scored the first goal against Man U twice. Those two goals were the only two goals in his total 76 appearances for Wolves from 2008 to 2011.
6.47 Display only yellow cards and group matches by Result: Man U received fewer bookings in winning matches.
6.48 Display only red cards for both sides and group matches by Result: When opponents received red cards (blue), Man U always won. There was one match that Man U won with 10 players.
6.49 (left) Missed a penalty then conceded a goal: A disappointment for the fans as the match against Fulham ended with a draw at 2-2. Nani had a chance to make it 3-1, but he missed the penalty kick. (right) Missed a penalty but nothing happened after that: Man U was awarded a penalty kick while the team was leading 1-0, but Wayne Rooney could not convert it. However, the opponent could not score an equalizer.
6.50 Received a red card then scored a goal: A user searched for a situation when Man U received a red card then conceded a goal. However, he could not find any, but found a match against Bolton when Man U scored after receiving a red card instead.
6.51 Features of LifeFlow used in 238 sessions
6.52 Analyzing LifeFlow's tooltip usage with LifeFlow: LIFEFLOW TOOLTIP (green), a tooltip for a LifeFlow sequence, was often used and followed by LIFELINES2 INSTANCE TOOLTIP (blue), a tooltip for each record. After that, users often used LIFEFLOW TOOLTIP (green) again, or LIFELINES2 EVENT TOOLTIP (light blue), which is a tooltip for each event.
6.53 A Process Model for Exploring Event Sequences
A.1 Ragnarok job tree
C.1 Outflow processes temporal event data and visualizes aggregate event progression pathways together with associated statistics (e.g. outcome, duration, and cardinality). Users can interactively explore the paths via which entities arrive and depart various states. This screenshot shows a visualization of Manchester United's 2010-2011 soccer season. Green shows pathways with good outcomes (i.e., wins) while red shows pathways with bad outcomes (i.e., losses).
C.2 Multiple temporal event sequences are aggregated into a representation called an Outflow graph.
This structure is a directed acyclic graph (DAG) that captures the various event sequences that led to the alignment point and all the sequences that occurred after the alignment point. Aggregate statistics are then anchored to the graph to describe specific subsets of the data.
C.3 Outflow visually encodes nodes in the Outflow graph using vertical rectangles. Edges are represented using two distinct visual marks: time edges and link edges. Color is used to encode average outcome.
C.4 Link edges are rendered using quadratic Bézier curves. Control point placement is selected to ensure horizontal starting and ending edge slopes.
C.5 A multi-stage rendering process improves legibility. (a) Initial layout after sorting edges by outcome. (b) After applying Sugiyama's heuristics to reduce crossings. (c) The final visualization after both Sugiyama's heuristics and Outflow's force-directed layout algorithm to obtain straighter edges.
C.6 A spring-based optimization algorithm is used to obtain straighter (and easier to read) edges. Nodes and edges are simulated as particles and springs, respectively. Spring repulsions are inserted between nodes. During the optimization, nodes are gradually moved along a vertical axis to reduce the spring tension in their edges.
C.7 Edge routing prevents overlaps between time and link edges. (a) A link edge is seen passing "behind" the time edge above it. Outflow's edge routing algorithm extends the link edge horizontally beyond the occluding time edge. (b) The new route avoids the overlap and makes the time edge fully visible.
C.8 Interactive brushing allows users to highlight paths emanating from specific nodes or edges in the visualization. This allows users to quickly see alternative progression paths taken by entities passing through a given state.
C.9 Using the same dataset as illustrated in Figure C.1, a user has adjusted the simplification slider to group states with similar outcomes. Clustered states are represented with gray nodes. This simplified view shows that as more events occur (i.e., as more goals are scored), the paths diverge into two distinct sets of clustered states. The simplified states more clearly separate winning scorelines from losing scorelines. As more goals are scored, the probable outcomes of the games become more obvious.
C.10 Outflow highlights factors that are strongly correlated with specific event pathways. In this screenshot, a physician has focused on a group of patients transitioning from the current state due to the onset of the "NYHA" symptom. This transition seems to be deadly, as seen from the color-coding in red. The right sidebar displays medications (factors) with high correlations to this transition. The factor with the highest correlation in this example is prescribing antiarrhythmic agents. This correlation, which may or may not be causal, can help clinicians generate hypotheses about how best to treat a patient.
C.11 Outflow aggregates temporal event data from a cohort of patients and visualizes alternative clinical pathways using color-coded edges that map to patient outcome. Interactive capabilities allow users to explore the data and uncover insights.
C.12 The progression from green to red when moving left to right in this figure shows that patients with more symptoms exhibit worse outcomes.
C.13 Questionnaire results for each of the eight questions (Q1-Q8) answered on a 7-point scale by study participants.

List of Abbreviations

ED Emergency Department
EHR Electronic Health Record
EMR Electronic Medical Record
EPL English Premier League
ER Emergency Room
FTS Flexible Temporal Search
HCIL Human-Computer Interaction Lab
IBM International Business Machines
ICU Intensive Care Unit
ICDL International Children's Digital Library
IMC Intermediate Medical Care
Man U Manchester United Football Club
M&M Match and Mismatch
MILCs Multi-dimensional In-depth Long-term Case Studies
MVC Model-View-Controller
NIH National Institutes of Health
RPG Role-Playing Game
TC Temporal Categorical
UCL UEFA Champions League
UM User Model
WHC Washington Hospital Center

Chapter 1
Introduction

"Every moment and every event of every man's life on earth plants something in his soul." – Thomas Merton

Our lives are defined by events, those tiny yet powerful pieces of information. From the day that I started my graduate program (August 2007, enter graduate school) to the day that I received my doctoral degree (May 2012, graduation), many events had occurred in between. Every day, I woke up (8:00 a.m., wake up), had breakfast (8:15 a.m., breakfast), took a shower (9:00 a.m., shower), then I went to campus (10:00 a.m., work). Some days I had a meeting (1:00 p.m., meeting) or an important soccer game that I needed to watch (2:45 p.m., soccer). While I was working in the lab, some patients might visit the hospital (August 10, 2011 6:11 a.m., arrival), get diagnosed by the physicians (August 10, 2011 6:30 a.m., emergency room) and have to stay in the ICU (August 10, 2011 9:05 a.m., ICU) for several days (August 15, 2011 3:00 p.m., discharged). Each event, or temporal event, contains a timestamp (8:00 a.m.), its event type (wake up) and additional information that makes each event unique and gives it a story of its own. However, that is only a small part of its potential.
The greater potential of these events is revealed when they are grouped and connected into event sequences, which connect the dots between the scattered events and transform them into a more complete story. Here are some examples of event sequences:

1) Student record: (August '07, enter PhD program) → (April '10, propose) → (May '12, graduate)
2) Human activity: (8:00 a.m., wake up) → (8:15 a.m., breakfast) → (9:00 a.m., shower) → ...
3) Medical record: (6:11 a.m., arrival) → (6:30 a.m., emergency room) → (9:05 a.m., ICU) → ...
4) Soccer match: (1st min, team A goal) → (89th min, team B goal) → (90th min, team B goal)

An increasing number of leading organizations are collecting event sequence data. Health organizations have Electronic Medical Record (EMR) databases containing millions of records of patient histories. Each patient history contains hospital admissions, patients' symptoms, treatments and other medical events. Transportation systems generate logs of incidents and the timing of their management (first report, notifications and arrivals of each unit on the scene). Academic institutions keep detailed records of the educational advancement of their students (classes, milestones reached, graduation, etc.). Web logs, financial histories, market baskets and data in many other domains can also be viewed as event sequences. (More examples of event sequences in various domains are listed in Appendix A.) However, these vast collections of data are not being fully utilized due to the limited approaches available for analyzing event sequences. Much effort has been put into designing data storage and information retrieval, but exploratory data analysis [124], an approach to analyzing data for the purpose of formulating hypotheses worth testing, is still insufficiently supported.
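The examples above can be modeled minimally as time-ordered lists of (timestamp, event type) pairs. The following Python sketch illustrates that representation; the names are illustrative and are not taken from the software described in this dissertation:

```python
# A minimal sketch: a temporal event is a (timestamp, event type) pair,
# and an event sequence is a time-ordered list of such events.
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    event_type: str


def make_sequence(events: List[Event]) -> List[Event]:
    """Order events by timestamp to form an event sequence."""
    return sorted(events, key=lambda e: e.timestamp)


# Example 3 from the text, a medical record, entered out of order:
record = make_sequence([
    Event(datetime(2011, 8, 10, 6, 11), "arrival"),
    Event(datetime(2011, 8, 10, 9, 5), "ICU"),
    Event(datetime(2011, 8, 10, 6, 30), "emergency room"),
])
print([e.event_type for e in record])
# ['arrival', 'emergency room', 'ICU']
```

The categorical event types plus non-uniform timestamps are what distinguish this data type from numerical time series, a distinction that matters for the similarity measures discussed in Chapter 2.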
1.1 Overview of the Dissertation

To help users harvest these diligently collected, rich, informative, but gigantic databases, my dissertation aims to use information visualization techniques to support exploratory data analysis of event sequences. The fundamental concept of information visualization is that "visual representations and interaction techniques take advantage of the human eye's broad bandwidth pathway into the mind to allow users to see, explore, and understand large amounts of information at once" [123]. Therefore, I believe that by designing suitable visual representations and interaction techniques for exploring event sequences, I can help users understand and gain new insight from the data.

This dissertation work was initially inspired by working closely with physicians at the Washington Hospital Center. While helping them analyze patient transfer sequences from Electronic Medical Records, I came across two important tasks that are not supported by current systems and remain open problems in the research community. These two tasks are significant not only for analyzing patient transfers, but also for other event sequences. Seeing the opportunities to make an impact both theoretically and practically, I defined these two research questions to drive my research directions.

1. How to provide an overview of multiple event sequences?

Although many systems have been developed to visualize event sequences [98, 8, 28, 97, 37, 131, 129], how to visualize an overview of multiple event sequences remains a challenging problem. Previous systems can answer questions regarding the number of records that include a specific event sequence (e.g., "How many patients went from the Emergency Room to the Intensive Care Unit (ICU)?"), but questions requiring an overview are not adequately supported. For example, a question such as "What are the most common transfer patterns between services within the hospital?" requires examination of all records one by one. Being unable to see all records on the screen at once makes it difficult to spot any pattern. Providing an overview is significant because it gives users a big picture of the underlying data, in the same way that an abstract does for a scholarly paper and an executive summary does for a marketing report. Squeezing a billion records into a million pixels [113] is also a great challenge in information visualization.

Therefore, I have developed a novel interactive visual overview of event sequences called LifeFlow. This approach aggregates and compresses information from multiple event sequences into a data structure called a tree of sequences, whose size depends on the length and total number of patterns, thus greatly reducing the amount of information. This data structure is later converted into a LifeFlow visualization that can display all possible sequences of events and the temporal spacing of the events within sequences.

2. How to support users in querying for event sequences when they are uncertain about what they are looking for?

Many traditional tools use an exact match approach. Once the query is submitted, it is transformed into constraints and the tools return only the records that match the constraints. For example, users may want to find the records that have event "A" followed by event "B" within exactly 2 days, so they specify the query as A → 2 days → B. This approach is effective when users know exactly what they want. However, sometimes users are uncertain about what they are looking for. For example, a physician may want to find the records that have event Surgery followed by event Die within approximately 2 days. The value "2 days" is just an approximation. Specifying narrow queries (e.g., Surgery → 2 days → Die) could miss records that are "just off" (e.g., a patient who died 2 days and 1 minute after surgery).
Using broad queries (e.g., Surgery → Die) could return too many results that are not relevant. Query methods that search for similar records, rather than finding exact matches, could better support these exploratory searches. Therefore, I have developed methods for searching event sequences based on similarity. Since then, I have learned from the user studies that using similarity alone also has its limitations, so I have revised my approach and developed Flexible Temporal Search, a hybrid between exact match and similarity search for event sequences.

This dissertation describes how I approached these two problems in detail and includes several user studies, ranging from a usability study and controlled experiments to multi-dimensional in-depth long-term case studies (MILCs) [114], that demonstrate the benefits of my approaches. Although this research was initially inspired by applications in the medical domain, I have designed generalizable solutions that are not domain-specific and are therefore applicable to event sequences outside of the medical domain, as shown in the case studies.

1.2 Contributions

My contributions from this research are:

1. A tree of sequences, a data structure that aggregates multiple event sequences while preserving temporal and sequential aspects of the data, and a visual representation, called LifeFlow, which compactly displays summaries of multiple event sequences, along with interaction techniques that facilitate users' explorations. (Figure 1.1)

2. Similarity measures (the M&M measure 1-2) and similarity search interfaces (Similan 1-2) for querying event sequences. (Figure 1.2)

3. A similarity measure and user interface for Flexible Temporal Search (FTS), a hybrid query approach for querying event sequences that combines the benefits of exact match and similarity search. (Figure 1.3)

4. Case study evaluations to refine the concept and user interfaces, resulting in a process model and a set of design recommendations for temporal event sequence exploration.

A summary of all software developed in this dissertation and the evaluation results are shown in Tables 1.1 and 1.2, respectively.

Concept     Language     Lines of Code   Evaluation
Similan 1   C#           8,138           Usability Study
Similan 2   Adobe Flex   11,629          Controlled Experiment
LifeFlow    Java         61,188          Usability Study, MILCs
FTS         Java                         MILCs

Table 1.1: Summary of all software developed in this dissertation

1.3 Dissertation Organization

The remainder of this dissertation is organized as follows: Chapter 2 provides a literature review of background and related work; Chapter 3 describes LifeFlow, an overview visualization for event sequences; Chapter 4 explains Similan and the M&M measure, a user interface and similarity measure for querying event sequences by similarity; Chapter 5 then discusses Flexible Temporal Search, a hybrid interface for querying event sequences that combines the benefits of exact match and similarity search interfaces; Chapter 6 reports on several multi-dimensional in-depth long-term case studies that demonstrate the applications of my research in several domains; finally, I give concluding remarks and discuss future work in Chapter 7.

Similan 1 (Usability Study): A study with eight participants was conducted. The participants believed that Similan could help them find students who were similar to the target student. (Section 4.3)

Similan 2 (Controlled Experiment): A controlled experiment that compared exact match and similarity search interfaces showed that the exact match interface had advantages in finding exact results and also gave users more confidence in tasks that involve counting. On the other hand, the similarity search interface had advantages in the flexibility and intuitiveness of specifying queries for tasks with time constraints or uncertainty, or tasks that ask for records similar to a given record. (Section 4.5)

LifeFlow (Usability Study): A study with ten participants confirmed that even novice users with 15 minutes of training were able to learn to use LifeFlow, rapidly answer questions about the prevalence of interesting sequences, find anomalies, and gain significant insight from the data. (Section 3.6)

LifeFlow (MILCs): Eight long-term case studies in six application domains demonstrated the benefits of LifeFlow in exploring event sequences and identifying interesting patterns. Many interesting use cases and findings were reported. (Chapter 6)

FTS (MILCs): Two long-term case studies in two application domains were conducted. FTS was used to search for particular situations when needed. Having similar results gives users more confidence when the exact match result set is empty. (Sections 6.5 and 6.9)

Table 1.2: Summary of all evaluations conducted in this dissertation

Figure 1.1: LifeFlow: An overview visualization for event sequences

Figure 1.2: Similan 2: Query event sequences by similarity

Figure 1.3: Flexible Temporal Search (FTS): A hybrid approach that combines the benefits of exact match and similarity search for event sequences

Chapter 2
Background and Related Work

Time is one of the oldest data types that mankind has ever collected. Artifacts from the Palaeolithic suggest that the moon was used to reckon time as early as 6,000 years ago [105]. It would be interesting to review the literature of temporal data back that far, but for the sake of brevity, I select only a number of topics that are relevant to my research.
In this chapter, I review the important techniques used for visualizing and querying temporal data, especially event sequences, and some other related areas, to provide readers with the context of my dissertation work.

2.1 Information Visualization

2.1.1 Temporal Data Visualization

There has been a long history of visualizing temporal data [6]. I first review tools that support analysis of a single record, then tools designed to handle collections of records.

2.1.1.1 Single-Record Visualization

Many systems were designed for analyzing a single record [32, 46, 59, 97, 13, 7]. The most common approach is to use a timeline-based representation: events are placed on a horizontal timeline according to their time, and one record consists of multiple rows, one for each category of events.

Cousins et al. [32, 33] developed the Timeline Browser for visualizing diabetes data of a single patient. The Timeline Browser has one special row in which the vertical position is used to indicate the value of blood glucose concentration readings. Other information, such as insulin doses, meals, and clinical studies, is placed on different rows. It also provides zooming, filtering and details-on-demand. Harrison et al.'s Timelines [46] is an interactive system for collection and visualization of video annotation data. The visualization plots the categories (events and intervals) on the y-axis and time along the x-axis. Zooming, scrolling and reordering events are supported. Karam [59] introduced xtg (Timeline Display Generator for X-windows) for visualizing activities in a client-server application, a video conference system. Xtg supports multiple views and zooming with details-on-demand. It also allows annotation and searching for events in the timeline. LifeLines (Figure 2.1) was developed by Plaisant et al. [97] to provide a general visualization environment for personal histories, which was applied to clinical patient records.
Problems, diagnoses, test results or medications are presented as dots or horizontal lines. Colors can be used to indicate severity or type. Bade et al. [13] presented MIDGAARD, a timeline-based visualization that enables users to reveal the data at several levels of detail and abstraction, ranging from a broad overview to the fine structure. Resizing the height of the visualization adjusts the abstraction levels. PlanningLines [7], a visualization for representing temporal uncertainties, was developed to support project management. The glyph consists of two encapsulated bars, representing minimum and maximum duration, that are bounded by two caps that represent start and end intervals. TimeZoom [34] allows zooming into regions of the timeline, permitting arbitrary granularity of time units. It supports multiple focus regions with various levels of detail, which allows comparison between multiple regions while preserving the overall context.

Figure 2.1: LifeLines [97] displays a medical history of a single patient on a timeline.

Spiral timelines, in which angle represents a time interval (time of day, day of week, month, or year), were inspired by the cyclic nature of how we organize time and are used to reveal periodic patterns in time series [25, 49, 137]. Carlis et al. [25] proposed a visualization on a spiral timeline in both 2D and 3D. Visualizations, such as bar charts or size-coded circles, are placed along the spiral to represent the values of the events. Hewagamage et al. [49] used multiple 3D spirals on a geographical map to visualize spatio-temporal patterns. Weber et al.'s work [137], as shown in Figure 2.2, allowed users to adjust the length of the cycle to find periodic patterns. Suntinger et al. [120] visualized event sequence streams by plotting circles on a radial layout, with outer rings representing more recent time.

Figure 2.2: The Spiral visualization approach of Weber et al. [137] applied to the power usage dataset
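The spiral layout these systems share can be captured in a few lines: the angle encodes position within the chosen cycle and the radius grows with each completed cycle, so events that recur with the cycle's period line up along a ray. A minimal sketch, with illustrative parameter names (not taken from any of the cited systems):

```python
# Sketch of a spiral-timeline coordinate mapping: angle = position within
# the cycle, radius = base radius plus growth per completed cycle.
import math


def spiral_position(t: float, cycle: float, r0: float = 10.0, dr: float = 5.0):
    """Map a time value t (in the same unit as cycle) to (x, y) on a spiral.

    t      -- time elapsed since the start of the data
    cycle  -- cycle length (e.g., 24 for hours grouped by day)
    r0, dr -- base radius and radial growth per completed cycle (assumed)
    """
    turns = t / cycle                       # fractional number of cycles
    angle = 2 * math.pi * (turns % 1.0)     # angular position within the cycle
    radius = r0 + dr * turns                # radius grows continuously
    return radius * math.cos(angle), radius * math.sin(angle)


# Events 24 hours apart land at the same angle on successive rings:
x1, y1 = spiral_position(6.0, cycle=24)     # 6 a.m., day 1
x2, y2 = spiral_position(30.0, cycle=24)    # 6 a.m., day 2
```

Weber et al.'s interactive cycle-length adjustment corresponds to re-rendering with a different `cycle` argument: when `cycle` matches the true period of the data, periodic events align radially.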
A tree-based representation was used by VizTree [72, 73, 74] to detect patterns in numerical time series. The numerical values are binned into categorical values. VizTree then displays a prefix tree of patterns and uses edge thickness to represent frequency. This representation can easily reveal frequent and rare patterns in the data. The authors also presented Diff Tree as another view that supports comparison between time series by showing differences between trees.

2.1.1.2 Visualization of multiple records in parallel

To support the analysis of multiple records, a common technique is to stack instances of single-record visualizations [37, 127], and many compact and space-efficient single-record visualizations were proposed. In PatternFinder [37], each record is displayed with a ball-and-chain visualization. LifeLines2 [131] uses a single-record visualization based on the LifeLines idea (Figure 2.4). Vrotsou et al. [127] visualized social science diary data using the vertical axis to represent time of day and the horizontal axis to represent records. The visualization is compact: each record is represented by one vertical line with many color-coded sections that represent events. Weber et al.'s spiral-based visualization [137] also supports multiple records by showing each record as one ring. CloudLines [65] aggregates overlapping events on the timeline using a decay function to better display high- and low-density areas.

Those tools typically provide searching and filtering mechanisms to enhance the analysis [37, 131, 129]. LifeLines2 introduces the Align-Rank-Filter (ARF) framework to support browsing and searching. The idea is to rearrange the timeline by important events, rearrange records by event occurrence frequency, and filter by data characteristics. ActiviTree [129] combines the visualization in [127] with a framework for finding interesting sequential patterns.
The query interface uses a tree-based visualization, showing all the events that occur before and after the current query pattern, with their scores. In the context of my inspirational case study, users of those tools could find patients who were transferred with a specific known sequence, or find patients who were admitted to the ICU at least once, but could not find out what the common sequences are or spot anomalous sequences.

Some tools allow users to organize records into a hierarchy [23] or groups [92]. Timeline Tree [23] is a tree with timelines at the leaf nodes. The tree groups the timelines into a hierarchy based on their attributes. Timeline Tree also has time bars as an alternative view, in which the time or order of transactions is encoded using color coding and the measure is represented by the width of the boxes instead of their height. Phan et al. [91, 92] proposed a visualization called progressive multiples that allows users to create folders to organize timelines into different groups. By default, all records are placed sequentially in the initial folder. Users can create a new folder and drag records into it.

However, unlike LifeFlow, which provides one visual abstraction that represents multiple records, these systems do not provide any abstraction. Without the overview abstraction, users lose the big picture when the number of records exceeds the maximum number of single-record visualizations that can be displayed on the screen. For example, the visualization in Figure 2.4 can display only 8 of 377 records on the screen. It also does not provide a summary or make common patterns within records stand out.

2.1.1.3 Visualization that aggregates multiple records

Some systems provide an overview of multiple records [48, 10, 132].
Continuum [10] shows a histogram of the frequency of events over time, while LifeLines2 [132] has a temporal summary, a stacked bar chart that shows the distribution of event types within each period of time.

Figure 2.3: Continuum's overview panel (top) displays a histogram as an overview of the data set. [10]

These methods can provide the distribution of events by time, which answers questions related to the distribution, such as: which type of event occurred most frequently in January 2007, or which type of event usually occurred within the first week after patients arrived at the hospital? However, the event sequences within the records are obscured, so these methods cannot answer questions related to sequences, such as: where did the patient usually go directly after arrival, or what is the most common transfer sequence?

2.1.2 Hierarchy Visualization

To visualize an overview of multiple event sequences, I group all records into a hierarchical structure called a Tree of Sequences. The overview visualization is then created from this hierarchical structure. According to Stasko and Zhang [118], many visualizations for displaying hierarchical structures have been developed.

Figure 2.4: LifeLines2 [131, 132] displays a temporal summary at the bottom as an overview of the data set.

Figure 2.5: Icicle tree in Protovis [21]. The visualization is called an icicle tree because it resembles a row of icicles hanging from the eaves. [66]

The most common way is to display a node-link layout in 2D [35, 138, 101], 3D [104] or hyperbolic space [68, 82]. Some visualizations add interaction techniques to enhance the node-link tree visualizations. For example, SpaceTree [96] allows node expansion on demand. TreeViewer [61] visualizes trees in a form that closely resembles botanical trees. A dendrogram [117, 47, 107] is a tree for visual classification of similarity, commonly used in biology for grouping species.
A dendrogram can show the relative sequence similarity between many different proteins or genes. Generally, the horizontal or vertical axis indicates the degree of difference between sequences, and the other axis is used for clarity, to separate the branches.

Space-filling techniques use implicit containment and geometry features to present a hierarchy [55, 66, 38, 21, 118]. Treemaps [55, 15] display hierarchical data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. A leaf node's rectangle has an area proportional to a specified dimension of the data. The icicle tree [66, 38, 21], also called an icicle plot, displays hierarchical data as stacked rectangles, usually ordered from top to bottom. This visualization directly inspired the LifeFlow design (Figure 2.5). The root takes the entire width. Each child node is placed under its parent with a width proportional to the percentage it consumes relative to its siblings. Sunburst [118] is a radial space-filling technique, essentially the polar form of the icicle tree. At the core is the root of the tree; each concentric ring represents the child nodes and is partitioned to represent the percentage a node consumes relative to its siblings.

Using these methods, the sequences of events can be represented. For example, users can see what types of events usually occur after A. However, the length of time between events, which is important in many analyses, is not represented: users can see that B usually occurred after A, but not how long after A it occurred. Therefore, LifeFlow was designed to also display the gap between events.

A phylogenetic tree [69] is a branching diagram showing the inferred evolutionary relationships among various biological species.
The edge lengths in some phylogenetic trees may be interpreted as time estimates, but each node in the tree represents only one species, while each node in LifeFlow represents multiple records of event sequences.

Figure 2.6: Blaas et al.'s state transition visualization: a smooth graph representation of a labeled biological time series. Each ring represents a state, and the edges between states visualize the state transitions. This graph uses smooth curves to explicitly visualize third-order transitions, so that each curved edge represents a unique sequence of four successive states. The orange node is part of a selection set, and all transitions matching the current selection are highlighted in orange. [17]

2.1.3 State Transition Visualization

An alternative approach to aggregating multiple event sequences is to consider each event sequence as a path through different states. In the simplest case, a state can be an event type. For example, an event sequence A → B → A represents a path from state A to state B and back to state A again. A state transition graph can then be created from multiple event sequences. This idea was explored in a spin-off project, Outflow (Appendix C).

Most state transition visualizations were inspired by the state diagram [20], or state transition graph. A state diagram is used in computer science and related fields to represent a system of states and state changes, e.g., finite state machines. State diagrams are generally displayed as simple node-link diagrams, where each node represents a state and each directed link represents a state transition [17]. Many visualizations based on state diagrams have been developed [125, 136, 99, 100, 17]. These approaches typically focus on displaying multivariate graphs in which a number of attributes are associated with every node. There are also extensions of a state diagram.
For example, a Petri net (also known as a place/transition net or P/T net) [83] allows transitions from any number of states to any number of states. This property was designed to support systems that can be in multiple states at once. Petri nets offer a graphical notation for stepwise processes that include choice, iteration, and concurrent execution. The formalism was invented in 1939 by Carl Adam Petri, at the age of 13, for the purpose of describing chemical processes [90]. Petri nets are used to display the flow of distributed systems, where many concurrent processes are executed at the same time. Van der Aalst [83] adapted Petri nets for workflow management.

However, an approach based on state transition visualizations also has some limitations, since these visualizations were designed to emphasize the transitions between states (event types, in this case). It is hard to recognize sequences longer than two events and, therefore, harder to find common sequences or outliers in the records. It is also difficult to incorporate temporal information into graph visualizations, which are often complex already when there are many nodes and edges.

Figure 2.7: Charles Minard's Map of Napoleon's Russian Campaign of 1812

2.1.4 Flow Visualization

Cartographers have long used flow maps to show the movement of objects from one location to another, such as the number of people in a migration, the amount of goods being traded, or the number of packets in a network [93, 102, 22]. The most famous one is Charles Minard's Map of Napoleon's Russian Campaign of 1812 (Figure 2.7) [40]. However, these flow maps focus only on displaying the proportions of a flow that splits in different ways, without showing temporal information or steps in the process.

2.2 Query Methods

With respect to my work in overview visualization, related research in visualization was highlighted and reviewed in the previous section.
In this section, I collect query methods for temporal data, including important work in related areas, and summarize them to provide background knowledge for querying event sequences.

2.2.1 Query Languages

A traditional approach to querying temporal data is to use database query languages. According to Chomicki [30] and Tansel and Tin [122], many research projects were conducted on designing temporal databases and extending standard query languages into temporal query languages. Some of the well-known languages were TQuel [115], TSQL2 [116] and the Historical Relational Data Model (HRDM) [31]. However, these temporal query languages were built on top of their specific data models, and users had difficulty learning their unique syntaxes, concepts, and limitations. They also support only exact match.

2.2.2 Query-by-Example Languages

To provide a high-level language that offered a more convenient way to query a relational database (RDB), query-by-example languages were introduced. According to Özsoyoglu and Wang [87], the early idea of query-by-example was a language in which users entered what they expected to see in a database result table into a form that looked like a result table, instead of writing lengthy queries, making it simpler for users to specify a query. The first was Zloof's Query-by-Example [145], which was refined by others [26, 64, 146, 53, 86, 121].

Time-by-Example [121] followed the Query-by-Example idea and adopted subquery concepts from Aggregates-by-Example [64] and Summary-Table-by-Example [86] to serve a historical relational database (HRDB). HRDB is an extension of RDB that stores the changes of attribute values over time. For example, a patient entity has an attribute called room, which changes every time the patient is moved to a new room. HRDB can keep track of the room values, and Time-by-Example provides a way to query those values. For instance, users can ask queries such as: list all rooms a patient was in, or which patient was in an ICU room between January and March? However, Time-by-Example supports only exact match, operates only on top of HRDM, and still requires users to learn its language for specifying conditions in complex queries, e.g., "($sal.T overlaps $dept.T overlaps $msal.T overlaps $m.T) and $sal.v > $msal.v".

Figure 2.8: An example of query-by-filters: PatternFinder [37] allows users to specify the attributes of events and time spans to produce pattern queries. The parameters in the controls are converted directly into constraints that can be used to retrieve the records.

Figure 2.9: An example of query-by-example: QueryMarvel [54] allows users to draw comic strips to construct queries. With its exact result back-end, the comic strips are converted into rules, as seen in the status bar (4).

2.2.3 Query by Graphical User Interfaces (GUIs)

2.2.3.1 Exact Match Approach

As graphical user interfaces (GUIs) were becoming more common, many GUIs were developed for temporal data [5, 109, 62, 63]. Several GUIs used the exact match approach, in which users specify exact constraints to construct the queries. These constraints are often specified via controls, such as sliders or drop-down lists. The tool then returns only the records that satisfy every constraint in the query. Karam [59] presented a visualization called xtg, which allows users to explore temporal data and do simple searches for events. Hibino and Rundensteiner [50, 51] proposed a visual query language and user interface for exploring temporal relationships, using slider filters with results displayed in a graph-like visualization. PatternFinder [37] allows users to specify the attributes of events and time spans to produce pattern queries that are difficult to express with other formalisms. LifeLines2 [131, 132, 130] uses an alignment, ranking and filtering (ARF) framework to query for event sequences.
ActiviTree [129] provides a tree-like user interface with suggestions about interesting patterns to query for sequences of events. QueryMarvel [54] utilizes and extends the semantic elements and rules of comic strips to construct queries. Instead of following the exact match approach, Similan follows the similarity search approach (Section 2.2.3.2) and applies the concept to querying event sequences.

2.2.3.2 Similarity Search Approach

Many GUIs follow the similarity search approach, in which users can draw an example of what they expect to see as a result of a query. The result of a query is a list of records, sorted by similarity to the given example. Kato et al. [60] presented QVE, which accepts a sketch drawn by users to retrieve similar images or time series from the database. IFQ (In Frame Query) [71] is a visual user interface that supports direct manipulation [111], allowing users to combine semantic expressions, conceptual definitions, sketches, and image examples to pose queries. Spatial-Query-by-Sketch allows users to formulate a spatial query by drawing on a touch screen, and translates this sketch into a symbolic representation that can be processed against a geographic database. Bonhomme et al. [19, 18] discussed the limitations of previous query-by-sketch approaches and extended the Lvis language, which was developed for spatial data, to temporal data. The new language uses visual metaphors, such as balloons and anchors, to express spatial and temporal criteria. QuerySketch [135] allows users to sketch a graph freehand, then view stocks whose price histories match the sketch. Watai et al. [134] proposed a web page retrieval system that enables a user to search web pages using the user's freehand sketch. WireVis [27] introduces techniques to extract bank accounts that show similar transaction patterns. To the best of my knowledge, existing event sequence query tools have used an exact match approach.
These systems demonstrated the similarity search concept on other types of data and inspired me to develop a similarity search tool for event sequences. TimeSearcher [52] visualizes multiple timelines as line charts on the same plane, using the horizontal and vertical axes to represent time and value, respectively. Users draw timeboxes, rectangular widgets that can be used to specify query constraints, on the timeline to query for all time series that pass through those timeboxes. In TimeSearcher, users can draw an example (timeboxes) to specify the query, but the timeboxes are converted into exact rules, e.g. January < time < March and 100 < value < 200, when processing the query in the background. Similan2 allows users to draw an example, but does not convert the example into any exact rule. Instead, it compares the example with each record directly and sorts the results by similarity to the example.

2.2.4 Similarity Measure

Pattern matching computes a boolean result indicating whether or not an event sequence matches the specified pattern. In contrast, a similarity measure calculates a real number that expresses how similar an event sequence is to the specified pattern.

2.2.4.1 Numerical Time Series

Many similarity measures have been proposed for comparing series of numerical values measured over time, such as stock prices. Event sequences, in contrast, are series of categorical values measured over time. Hence, these approaches are not directly applicable to event sequences, because they were designed to capture the difference between numerical values, not categorical ones. Nevertheless, some common concepts are worth mentioning here. The first concept is the lock-step measure, which compares the i-th point of one time series to the i-th point of another, such as the well-known Euclidean distance.
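As a concrete illustration of the lock-step idea (a generic sketch, not any cited system's code), the Euclidean distance between two equal-length series compares points strictly by index:

```python
import math

def euclidean_distance(a, b):
    """Lock-step measure: the i-th point of one series is always
    compared to the i-th point of the other."""
    assert len(a) == len(b), "lock-step measures require equal-length series"
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 2.0
```

Because the pairing is fixed by index, shifting one series by even a single step can change the distance drastically, which is exactly the sensitivity to noise and misalignment noted next.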
However, since the mapping between the points of two time series is fixed, these measures are sensitive to noise and misalignments in time. The M&M measure is different from lock-step measures because it does not fix the mapping of i-th events together. The second concept, the elastic measure, allows one-to-many comparison of points (e.g., Dynamic Time Warping (DTW) [16]) and one-to-many / one-to-none comparison (e.g., Longest Common Subsequence (LCSS)). The sequences are stretched or compressed non-linearly in the time dimension to provide a better match with another time series. Unlike elastic measures, the M&M measure does not allow one-to-many mapping.

2.2.4.2 String and Biological Sequences

Edit distance is the number of operations required to transform one string into another. The lower the number, the more similar the strings. Hamming distance [44], Levenshtein distance [70] and Jaro-Winkler distance [140] are some examples. The best known such distance is the LCSS distance [11]. A more complete survey can be found in [84]. One neighboring area is biological sequence searching. There exist many algorithms for comparing biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. BLAST [9], FASTA [88] and the TEIRESIAS algorithm [103] are some examples. Mongeau and Sankoff [81] defined a similarity measure specifically for comparing musical pieces, based on the number of transformations required to transform one piece into another. Their measure allows a one-to-many mapping called consolidation/fragmentation, which is similar to time warping. Gómez-Alonso and Valls [41] proposed a similarity measure for sequences of categorical data (without time) based on edit distance. These approaches consider differences in ordering and existence, but do not consider the times at which events occurred. Events in a sequence may occur at non-uniform intervals, which makes their timing important.
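To make the edit-distance family concrete, here is the textbook dynamic-programming computation of Levenshtein distance applied to event-type symbols (a generic sketch, not any cited author's implementation); note that, as discussed, it counts insertions, deletions and substitutions but is blind to timestamps:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b. Timestamps are ignored."""
    # prev[j] holds the distance between a[:i-1] and b[:j]; curr is the row for a[:i].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

# Two transfer sequences that differ by one substituted event type:
print(levenshtein(["ER", "ICU", "Floor"], ["ER", "IMC", "Floor"]))  # 1
```

The two sequences above would receive the same distance no matter whether the ICU/IMC stays lasted an hour or a week, which is why such measures need extension before they can serve event sequences.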
Also, more than one event can occur at the same time, whereas two characters or amino acids cannot occupy the same position in a string or biological sequence.

2.2.4.3 Event Sequences

Mannila and Ronkainen [77] introduced a similarity measure for event sequences based on three edit operations: insert, delete and move. The move operation was included to incorporate the occurrence times of the events. This approach allows only monotonic mapping, which means that the matched events in the target and candidate sequences must be in similar order, and it does not offer a user interface. Sherkat and Rafiei [110] binned the timeline into intervals and compared events within each interval. The Match & Mismatch (M&M) measure v.1 [144] calculates a similarity score from two types of difference: the time difference of matched events and the number of mismatches. It supports matching that may not preserve the order of the event sequence (non-monotonic). Timed String Edit Distance [36] inserts timed null symbols into event sequences before matching. It allows matching between events with different event types and measures two types of difference: time difference and event type difference (symbol dissimilarity). Vrotsou et al. [126, 128] identified nine measures to cover several aspects of similarity. This approach also considers multiple occurrences of the target sequence in the candidate sequence. Obweger et al. [85] defined single-event similarity by comparing event attributes. Their event sequence similarity then combines single-event similarities, the order of events and the times at which the events occurred, with weights and more options. However, their computation time to find the best match is exponential, while the others are polynomial. Some methods extract "fingerprints" from event sequences and compare the fingerprints instead of comparing the event sequences directly. Mannila and Moen [76] detected similar event types by comparing their contexts.
They converted each context (the event sequence around the selected event type) into feature vectors and developed methods for comparing these vectors. Mannila and Seppänen [78] mapped event sequences into points in k-dimensional Euclidean space using a random function and searched for similar event sequences via their k-dimensional projections. The growth of measures for event sequence similarity demonstrates the importance of these search capabilities in many domains beyond medical interests, such as human activity event streams, business transactions, and legal actions.

2.3 Summary

Decades of innovations in temporal data visualization and query methods are surveyed and reviewed in this chapter. On one end, researchers have developed many visualization techniques aiming to present information in a meaningful way and reveal underlying patterns. These techniques range from displaying a single record to displaying multiple records in parallel. However, as data collections grow, it is becoming more difficult to detect any pattern among records when users can see only a small portion of a huge dataset. An overview is needed. A few histogram-based techniques were developed, but there are still many scenarios that they cannot support. Hierarchical, graph and flow visualization techniques seem applicable to some extent, but they were not designed to represent time in the display. New techniques that can aggregate and provide an overview of multiple records are needed. On the other end, many query languages and GUIs have been invented for querying temporal data. Specialized query languages are very expressive, but have been found complex and difficult for typical end users. Graphical user interfaces (GUIs) were then used to bridge the gap. The majority of these GUIs are exact match interfaces, which are precise but, at the same time, inflexible for queries with uncertainty.
A more flexible alternative, the similarity search interface, has been used for querying numerical time series and other non-temporal data. Designing a similarity search interface for querying event sequences remains an open problem. One part of the puzzle is how to define a similarity measure that can capture different definitions of similarity. A further question is how to design an interface that can provide both the precision of exact match and the flexibility of similarity search. These unsolved problems present exciting research opportunities. The following chapters describe how I approached these problems and explain my solutions in more detail.

Chapter 3

Providing an Overview of Temporal Event Sequences to Spark Exploration: LifeFlow Visualization

3.1 Introduction

Previous work on temporal data visualization can support many types of event sequence analysis, ranging from examining a single record in detail to various ways of filtering and searching multiple records. These tools can answer questions regarding the number of records that include a specific event sequence, but questions requiring an overview need innovative strategies. For example, a question such as "What are the most common transfer patterns between services within the hospital?" requires examination of all records one by one. Being able to see all records on the screen at once would greatly facilitate pattern discovery. Squeezing a billion records into a million pixels [113] is a great challenge in information visualization. On one hand, researchers want to be able to display millions of event sequences within limited space, such as a computer monitor. That means the data must somehow be aggregated to create a scalable visualization. On the other hand, we want to preserve as much important information as possible. So, a trade-off between what information needs to be preserved and what information can be sacrificed has to be made.
To address this challenge, I aggregate event sequences into a data structure called a tree of sequences and introduce a novel interactive visualization called LifeFlow, which can summarize all possible sequences and the temporal spacing of the events within sequences. In this chapter, I describe a motivating example from the medical domain, explain the tree of sequences data structure, introduce the LifeFlow visualization, describe the user interface and basic features, present the results of a user study, and finally describe advanced features that I developed while working with domain experts in long-term case studies. An earlier version of this chapter has been published in [142].

3.2 Motivating Case Study

The use of information visualization to support patient care and clinical research is gaining momentum [5, 63, 29]. This section describes a particular case study that motivated the original design of LifeFlow. It was conducted with Dr. Phuong Ho, a practicing physician in an emergency department, who was interested in analyzing sequences of patient transfers between departments for quality assurance.

3.2.1 Event Definitions

The following terms are used when describing the case study.

1. ER: Emergency Room, the section of a hospital intended to provide treatment for victims of sudden illness or trauma, also called Emergency Department (ED)

2. ICU: Intensive Care Unit, a hospital unit in which patients requiring close monitoring and intensive care are kept

3. IMC: Intermediate Medical Care, a level of medical care in a hospital that is intermediate between ICU and Floor

4. Floor: a hospital ward where patients receive normal care

3.2.2 Example Question

One of Dr. Ho's particular interests was the monitoring of bounce backs, which occur when a patient's level of care is decreased and then increased again urgently.
For example, a patient's condition might have improved enough to have him transferred from the ICU to the Floor, but his condition worsened again and he had to be sent back to intensive care within 48 hours, suggesting he might have left the ICU too early. This pattern corresponds to a hospital quality metric that is very rarely monitored. Dr. Ho had been using an MS Excel spreadsheet to find these patients. In an interview he described the complex and time-consuming effort of creating the formulas and viewing the data. This is due in part to the fact that there are many room types and special conditions for using those rooms. My research group had previously worked with Dr. Ho using LifeLines2 [131] and Similan [144] to locate patients with specific known event sequences such as the one described above. Once it had become easy to search for specific known sequences, we identified other questions that could not be answered as easily, e.g.: what typically happens to patients after they leave the ER, or the ICU; what are the most common transfer patterns; what is the percentage of patients transferred from ICU to Floor and how long does it take; are there any unexpected sequences? All these new questions require a summary of all the transfer sequences and their temporal attributes. To this end, I propose LifeFlow to provide an overview of all transfer sequences.

3.3 Data Aggregation: Tree of Sequences

The first step of creating LifeFlow is to aggregate the data into a tree of sequences. The tree of sequences data structure was designed to preserve two important aspects of event sequences:

1. Sequence of events, e.g. Arrival → Intensive Care Unit (ICU) → Floor

2. Time gaps between events, e.g. Emergency Room → (after 5 hours) → Transfer to ICU

Figure 3.1 illustrates the conversion from four records of event sequences to a tree of sequences and then a LifeFlow visualization.
Raw data are displayed on a horizontal timeline with colored triangles representing events (in the same manner as LifeLines2 [131]). Each row represents one record. All records are aggregated into a tree of sequences based on the prefixes of their event sequences. For example, a record that contains the event sequence Arrival → ER → ICU and a record that contains the event sequence Arrival → ER → Floor share the same prefix sequence Arrival → ER.

Figure 3.1: This diagram explains how a LifeFlow visualization can be constructed to summarize four records of event sequences. Raw data are represented as colored triangles on a horizontal timeline (using the traditional approach also used in LifeLines2). Each row represents one record. The records are aggregated by sequence into a data structure called a tree of sequences. The tree of sequences is then converted into a LifeFlow visualization. Each tree node is represented with an event bar. The height of the bar is proportional to the number of records, while its horizontal position is determined by the average time between events.

By default, the records are grouped event-by-event from the beginning of the event sequences to the end. In Figure 3.1, all records start with the blue event, so they are grouped together (indicated by a dashed rectangle) into a blue tree node. They all also have the purple event, so they remain grouped together into a purple node. In the next step, two of them have red events while the other two have green events, so they are split into red and green nodes. The same process continues for the rest of the event sequences. Figure 3.2 shows a tree of sequences in more detail. Each node in the tree contains an event type and a number of records, while each edge contains time gap information and a number of records. In some situations, users may choose to group consecutive events of the same type together when building the tree of sequences. For example, two consecutive transfers to Floor can be treated as one transfer, with the second transfer ignored. A record with sequence Arrival → ER → Floor → Floor → ICU is treated as Arrival → ER → Floor → ICU. This aggregation option can be turned on or off as needed via the user interface. Inspired by LifeLines2, LifeFlow allows users to choose any event type to be an alignment point. This supports tasks such as "what happened to the patients before and after they went to the ICU?", which users can answer by selecting ICU as the alignment point. By default, no alignment point is specified, so all records are aligned by the first event in the record (Figure 3.2). An invisible root node is used as a starting point because the event sequences may not all start with the same event type, which can result in multiple trees. The root node connects these trees together into one tree.

Figure 3.2: Tree of sequences data structure of the event sequences in Figure 3.1: Each node in the tree contains an event type and number of records.
Each edge contains time gap information and a number of records.

Figure 3.3: Two trees of sequences created from a dataset with an alignment point at B: a positive tree in the forward direction and a negative tree in the backward direction.

When an alignment point is specified, two trees are built separately from the alignment point: one tree for the sequences before the alignment (from right to left) and another tree for the sequences after the alignment (from left to right). Because the alignment point is time zero, the tree on the left is called the negative tree, because it occurs in negative time, while the tree on the right is called the positive tree. Figure 3.3 shows two trees of sequences built from another dataset with the alignment point set at event type B. The tree of sequences data structure can be created in O(n) time, where n is the total number of events in the dataset. The tree-building algorithm iterates through each record in the dataset and adds information to existing nodes in the tree, or grows new nodes if necessary. This approach reduces the information from multiple event sequences into a data structure whose size depends on the number of patterns rather than the number of records, which makes it easier to visualize. I then designed the LifeFlow visualization to display the tree of sequences. Inspired by the tree of sequences and LifeFlow, Lins et al. [75] developed an alternative visualization for the tree of sequences.

3.4 Visual Representation: LifeFlow Visualization

Once a tree of sequences is created, it can be encoded into a LifeFlow visualization (Figure 3.1). Each node of the tree is represented with a color-coded event bar, matching the color of the event type. The height of a bar is determined by the number of records in that node, proportionally to the total number of records.
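The prefix aggregation described above can be sketched as a single pass over the records, growing a trie whose nodes accumulate a record count and the time gaps from their parent (an illustrative Python sketch based on the description in this chapter, not the actual Java implementation; the class and field names are my own):

```python
class Node:
    def __init__(self, event_type):
        self.event_type = event_type
        self.count = 0        # number of records passing through this node
        self.gaps = []        # time gaps (in hours) from the parent node's event
        self.children = {}    # event_type -> Node

def build_tree(records):
    """records: list of [(event_type, time_in_hours), ...], each sorted by time.
    One pass over all events, so the build runs in O(n) total."""
    root = Node(None)  # invisible root joins sequences with different first events
    for record in records:
        node, prev_time = root, None
        for event_type, time in record:
            node = node.children.setdefault(event_type, Node(event_type))
            node.count += 1
            if prev_time is not None:
                node.gaps.append(time - prev_time)
            prev_time = time
    return root

records = [
    [("Arrival", 0), ("ER", 1), ("ICU", 9)],
    [("Arrival", 0), ("ER", 2), ("ICU", 5)],
    [("Arrival", 0), ("ER", 1), ("Floor", 7)],
]
root = build_tree(records)
er = root.children["Arrival"].children["ER"]
print(er.count)  # 3: all records share the Arrival → ER prefix
print(sum(er.gaps) / len(er.gaps))  # mean Arrival → ER gap; the bar offset uses this
```

In the visualization, a node's bar height is proportional to `count` divided by the total number of records, and its horizontal offset from the parent bar is the mean (or median) of `gaps`.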
For example, the red node contains two out of four records, so the height of its corresponding event bar is 50% of the total height. The horizontal gap between a bar (e.g. the purple bar) and its parent (the blue bar on its left) is proportional to the mean time between the two events (blue → purple). By default, the representative time gap is the mean, but users can select other metrics, such as the median. Each sequence ends with a trapezoidal end node. The LifeFlow visualization scales with the number of records, but is limited by the number of patterns in the dataset and the number of events in each record. One dataset with four records and another with four million records may have the same number of patterns and can be displayed using a comparable amount of screen space. Recommended datasets should have fewer than 100 patterns, with each record having fewer than 40 events. The number of records should not be an issue in theory, but in its current implementation, LifeFlow is recommended for analyzing datasets with fewer than 250,000 records.

3.5 Basic Features

I implemented a software prototype of LifeFlow as a Java desktop application. (Please refer to Appendix B for more implementation detail.) In addition to the compact visual representation, LifeFlow (Figure 3.4) includes the following interactions to support exploration:

Figure 3.4: This screenshot of LifeFlow shows a random sample of patient transfer data based on real de-identified data. The way to read sequences in LifeFlow is to read the colors (using the legend). For example, the sequence (A) in the figure is Arrival (blue), Emergency (purple), ICU (red), Floor (green) and Discharge-Alive (light blue). The horizontal gap between colored bars represents the average time between events. The height of the bars is proportional to the number of records, therefore showing the relative frequency of that sequence. Bars (e.g. Floor and Die) with the same parent (Arrival → Emergency → ICU) are ordered by frequency (tallest bar on top), as you can see from Floor (green bar) being placed above Die (black bar). The most frequent pattern is the tallest bar at the end. Here it shows that the most common sequence is Arrival, Emergency, then Discharge-Alive. (Please note that this is only a sample dataset and does not reflect the real performance of any hospital.)

1. Semantic Zooming: The horizontal zoom changes the time granularity, while the vertical zoom allows users to see rare sequences in more detail. Users can also right-click any sequence and select "Zoom to this sequence". The visualization will animate and zoom to the selected sequence.

2. Tooltip: When users move the mouse cursor over an event bar (Figure 3.5), a tooltip displays the full sequence of events and some statistical information, such as the mean time between events, standard deviation, etc.

3. Overlay distribution of gaps between events: Hovering the cursor over a bar displays the distribution of time gaps overlaid on the LifeFlow visualization. Figure 3.5 shows the distribution of length of stay in the ER before the patients were discharged alive.

4. Sort: Users can sort the sequences with the same parent in different ways: by the number of records that the bars represent (tallest bar on top) (Figure 3.4) or by the average time to the end (longest time on top) (Figure 3.8). The default is to sort by number of records.

5. Integration with LifeLines2: LifeFlow can function as a standalone tool, but combining it with LifeLines2 facilitates exploration by allowing users to review individual records as details-on-demand [112]. By clicking on any event bar, users select all records that are included in that bar (Figure 3.6). Selected records are highlighted in the LifeLines2 view. Users can then choose to keep only the selection and remove everything else, or vice versa.
In a symmetrical fashion, selecting a record in the LifeLines2 view highlights the pattern contained in that record in the LifeFlow view, allowing users to find other records that contain the same sequence.

Figure 3.5: When users move the cursor over an event bar or a gap between event bars, LifeFlow highlights the sequence and shows the distribution of time gaps. Labels help users read the sequences more easily. A tooltip also appears on the left, showing more information about the time gap. In this figure, the distribution shows that most patients were discharged within 6 hours and the rest were mostly discharged exactly after 12 hours. (Please note that this is only a sample dataset and does not reflect the real performance of any hospital.)

Figure 3.6: Here LifeFlow is used side-by-side with LifeLines2 so that individual records can be reviewed by scrolling. When a user clicks on a sequence in LifeFlow, the sequence is highlighted and all corresponding records are also highlighted and moved to the top in LifeLines2, allowing the user to examine them in more detail. In this example, a user noticed an uncommon pattern of frequent transfers back and forth between ICU (red) and Floor (green), so he selected those patients to see more detail.

6. Align: As mentioned earlier, LifeFlow allows users to choose any event type to be the alignment point. Figure 3.7 shows LifeFlow with alignment. The vertical dashed line marks the aligned event. The left and right sides show what happened before and after the alignment point, respectively. Now one can see that in this dataset, patients most often came to the ICU from the Floor. After that, they often died.

7. Include/Exclude event types: Using the legend on the left side of the screen, users can check or uncheck event types to include or exclude them from the sequences. This simple functionality allows powerful transformations of the display to answer questions.
For example, in Figure 3.8, the user unchecked all event types except Arrival, Discharge-Alive and Die, i.e. the beginning and end of hospital visits. All other events that could occur during a visit are ignored, and LifeFlow is regenerated to show only those three event types, allowing rapid comparisons between the patients who died and those who survived, in terms of the number of patients and the average time to discharge.

Figure 3.7: The same data as in Figure 3.4, aligned by ICU. The user can see that the patients were more likely to die after a transfer to the ICU than after any other sequence, because the black bar is the tallest bar at the end. Also, surprisingly, two patients were reported dead (see arrow) before being transferred to the ICU, which is impossible. This indicates a data entry problem. (Please note that this is only a sample dataset and does not reflect the real performance of any hospital.)

Figure 3.8: Using the same data as in Figure 3.4, the user excluded all event types except Arrival, Discharge-Alive and Die, i.e. the beginning and end of hospital visits. All other events are ignored, allowing rapid comparisons between the patients who died and those who survived, in terms of the number of patients and the average time to discharge. Patients who survived were discharged after 7.5 days on average, while patients who died were discharged after 8.5 days on average. (Please note that this is only a sample dataset and does not reflect the real performance of any hospital.)

3.6 User Study

I conducted a user study to investigate whether LifeFlow was easy to learn, and whether users could use the interface efficiently to answer representative questions. Another goal was to observe what strategies users chose and what problems they encountered, and to gather feedback and suggestions for further improvement. The dataset in this study included 91 records of hospital patient transfers (a subset of real de-identified data, which included known anomalies, to see if participants could find them).
Because medical professionals have very little availability, they are difficult to recruit for a user study. The data used in the study are simple enough to be understood by students; therefore, the participants in this study were graduate students (5 male and 5 female) from various departments of the University of Maryland. None of them were members of the LifeFlow development team.

3.6.1 Procedure

Training consisted of a 12-minute video and five training questions. When the participants could answer the questions correctly, they could start the study tasks. The order of the tasks was randomly permuted across participants. The tasks were representative of the questions proposed by domain experts during the case study, and were designed to test the usability of the main interaction techniques of LifeFlow. Participants were encouraged to think aloud while performing the tasks. For the first 14 tasks, observers recorded completion time and errors, if any. Because the participants needed time to understand the tasks, they were instructed to read the task description, indicate when they were ready, and then start the timer.

3.6.2 Tasks

3.6.2.1 Tasks 1-9: Simple Features

The first 9 tasks required understanding the LifeFlow visualization and using simple interactive features such as tooltips or zooming. The tasks included questions about frequent sequences (e.g. "Where did the patients usually go after they arrived?", "What is the most common pattern?"), the number of records in specified sequences (e.g. "How many patients went from arrival directly into the ICU?"), time between events (e.g. "How long is the average time from ER to ICU?") and comparisons between sequences (e.g. "After arriving at the ER, patients might be transferred to the Floor or the ICU. Were the patients transferred from the ER to the ICU faster than from the ER to the Floor?").
3.6.2.2 Tasks 10-14: Advanced Features

These tasks required using advanced features, such as alignment or using LifeFlow and LifeLines2 in combination (e.g. "Usually, where were the patients before they were admitted to the ICU for the first time?", "Retrieve the IDs of all patients with this transfer pattern." or "How many patients had the same transfer pattern as patient no. 10010010?").

3.6.2.3 Task 15: Overall Analysis and Finding Anomalies

In the last task, I asked the participants to imagine themselves as a manager who was trying to evaluate the performance of the hospital. I gave them 10 minutes to find any surprising, exceptional or impossible sequences that might indicate a problem in data entry or in hospital procedures, and to explain why they thought it was a problem. I told them to report as many insights as they could in 10 minutes. I had planted three (realistic) data anomalies:

1. A few patients died before being transferred to the ICU.

2. Patients bounced back and forth between the ICU and Floor several times.

3. Patients stayed in the Floor for 3 months.

3.6.3 Results

3.6.3.1 Tasks 1-14

The participants were able to perform the simple and advanced tasks quickly. They were able to use the interactions to adjust the visualization and retrieve information that was not presented in the initial view. The average ± SD completion times for the simple and advanced tasks were 14.9 ± 12.7 seconds and 15.8 ± 12.5 seconds, respectively. Please note that the participants were also narrating their actions while performing the tasks, which might have slowed them down. Only one participant made one mistake, in a complex task: while retrieving the IDs of all patients who were transferred with a particularly long sequence, she misread a color and could not find the correct sequence. However, she knew what she needed to do to retrieve the IDs after the sequence was found.

3.6.3.2 Task 15

Eight out of ten participants were able to detect all three anomalies.
Two participants reported only the first two anomalies, but when I directed their attention towards the third anomaly, they explained that they had noticed it but did not think it was abnormal, because some patients (e.g., cancer patients) might stay in the hospital for a long time. In addition, they also provided insights about other sequences that were possible but undesirable from a manager's perspective, such as instances where patients died in the Floor unit. They also reported surprising patterns, such as many patients being discharged alive directly from the ICU. I also observed the following common strategies:

1. Examine things that catch the eye first.

2. Scan all sequences systematically from top to bottom: when participants saw a normal sequence, e.g., ICU to Floor, they noted that it was good and moved on to the next sequence.

3. Align by each type of event.

4. Hover over bars with the mouse to explore the distribution and detect outliers in the distribution.

5. See more detail in LifeLines2: although I did not display LifeLines2 at the beginning of this task, several participants opened it to use the combined view.

All participants used strategies 1-3 consecutively. Three participants also followed with strategy 4. Four participants followed with strategy 5.

3.6.3.3 Debriefing

During the debriefing, typical comments included: "The tool is easy to understand and easy to use.", "very easy to find common trends and uncommon sequences", "The alignment is very useful.", and "In LifeFlow, it is easier to see the high-level picture. With LifeLines2, you can check individuals. LifeFlow provides a great summary of the big picture." Common suggestions for improvement included increasing the bar width to make it easier to select, and reorganizing the tooltip to make it more readable. Two participants also asked to analyze the data from their own research with LifeFlow.
3.6.4 Summary

Results suggest that users can learn to use LifeFlow in a short period of time and that LifeFlow's overview of the data allows them to understand patterns and find anomalies. There were several common strategies used when performing data analysis, but not every participant used all strategies, which indicated the need for a framework to support data analysis.

3.7 Advanced Features

By collaborating with the domain experts in several case studies, I have developed a better understanding of users' needs and the limitations of LifeFlow. Their feedback guided me to invent several advanced features along the way to enhance LifeFlow's ability to support more complicated tasks. These features are also generalizable and were found to be useful for tasks in different domains.

1. Including Non-Temporal Attributes: Records also usually contain non-temporal attributes, e.g., a patient's gender or the category of a traffic incident. These attributes can be categorized into record attributes and event attributes.

(a) A record attribute is attached to each record of event sequences, for example, the gender of the patient or the county in which the patient lives. While LifeFlow does not focus on displaying these attributes, it allows users to select record attributes and group records by the selected attribute before the sequences are aggregated. This feature was motivated by a case study with the Center for Advanced Transportation Technology (CATT) Lab, where the users have 200,000 traffic incident logs from several traffic agencies and want to compare their performance, so we wanted to be able to group the records by agencies and other record attributes. LifeFlow in Figure 3.9 groups traffic incident records by agency before aggregating by sequence, therefore allowing simple comparison between agencies. Several attributes can also be used in combination. Please see the case study in Section 6.3 for further analysis of Figure 3.9 and more examples of attributes.
(b) An event attribute is attached to each event. For example, the name of the attending physician is a property of the event Arrival in a patient record because the physician may be different for each visit.

Figure 3.9: LifeFlow with traffic incident data: The incidents are separated by agencies (A-G). Only Incident Notification and Return to normal (aggregated) events are shown. Other events are hidden. The agencies are sorted by a simple measure of agency performance (average time from the beginning to the end). Agency C seems to be the fastest to clear its incidents, followed by E, A, H, D, F, B and finally G.

I received a request from my physician partners that they were also interested in the event attributes. They wanted to know the common diagnoses of the patients for each visit and hoped to see a summary of them. Therefore, I implemented a way to quickly show the summary by adding a "Show attributes summary..." feature to the right-click menu. Users can right-click on any event bar and open the summary table, which summarizes all event attributes and also record attributes (Figure 3.10). Selecting any row in the summary table will also select the records in the LifeFlow and LifeLines2 views, allowing users to drill down to the details.

2. Custom Attributes: Users can assign custom attributes to selected records. This was found useful when users detect records with a pattern of interest and want to separate them from the rest. They can assign a custom attribute and then use the attribute for grouping. For example, a user assigned an attribute "Status" as "Alive" and "Dead" to the patients who had the patterns Arrival → Discharge-Alive and Arrival → Die in Figure 3.8, respectively. After that, the user included other event types that were excluded earlier and chose to group records by the attribute "Status" to show patterns of "Alive" and "Dead" patients (Figure 3.11).

3.
Selection from distribution: Users sometimes find interesting patterns in the distribution and want to select only part of the sequence to see more detail. For example, users can see from the distribution in Figure 3.5 that many patients were released exactly after twelve hours. Users can select only those patients to investigate more. This feature also enhances the functionality of LifeFlow as a query tool. For example, to find patients who were transferred from the ICU to the Floor within two days, users can show the distribution of patients from ICU to Floor and draw a selection of patients whose transfer gaps were within two days.

Figure 3.10: A user right-clicks on the second Arrival event and selects "Show attributes summary..." to bring up the summary table, which summarizes the common diagnoses. (Please note that this is only a sample dataset and does not reflect the real performance of any hospital.)

Figure 3.11: A user assigned the attribute "Status" as "Alive" and "Dead" to the patients who had the patterns Arrival → Discharge-Alive and Arrival → Die in Figure 3.8, respectively. After that, the user included other event types that were excluded earlier and chose to group records by the attribute "Status" to show patterns of "Alive" and "Dead" patients. Notice that the majority of the dead patients were transferred to the Floor first and later transferred to the ICU before they died. (Please note that this is only a sample dataset and does not reflect the real performance of any hospital.)

4. Measurement: One limitation of LifeFlow is that it displays only time gaps between consecutive events. For example, for a sequence A → B → C, the horizontal gaps in the visualization only represent the mean/median time gaps for A→B and B→C, but the time gap between non-consecutive events (A→C) is not shown. Users cannot sum the two gaps together because a sum of the mean/median time gaps between consecutive events (A→B + B→C) is not always equal to the mean/median time gap A→C.
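This pitfall can be checked numerically. The sketch below uses four hypothetical sequences of (event type, day) pairs, matching the counterexample that the text walks through:

```python
def mean_gap(records, src, dst):
    """Mean time gap from src to dst over the records containing both.
    Each record is a list of (event_type, time) pairs; event types are
    assumed to occur at most once per record in this sketch."""
    gaps = [
        dict(rec)[dst] - dict(rec)[src]
        for rec in records
        if src in dict(rec) and dst in dict(rec)
    ]
    return sum(gaps) / len(gaps)

# Hypothetical (event type, day) sequences:
records = [
    [("A", 1), ("B", 2), ("C", 3)],
    [("A", 1), ("B", 3), ("C", 5)],
    [("A", 1), ("B", 6), ("D", 9)],
    [("A", 1), ("B", 7), ("D", 8)],
]

print(mean_gap(records, "A", "B"))  # 3.5 (gaps 1, 2, 5, 6)
print(mean_gap(records, "B", "C"))  # 1.5 (gaps 1, 2; only two records reach C)
print(mean_gap(records, "A", "C"))  # 3.0 (gaps 2, 4)
```

Summing the per-hop means (3.5 + 1.5 = 5.0) overestimates the true A→C mean of 3.0, because only a subset of the records ever reaches C; this is why the measurement tool computes the gap between the two chosen events directly.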
See this counterexample, writing each event as (event type, time):

    (A, 1) → (B, 2) → (C, 3)
    (A, 1) → (B, 3) → (C, 5)
    (A, 1) → (B, 6) → (D, 9)
    (A, 1) → (B, 7) → (D, 8)

Here mean(A→B) = 3.5 and mean(B→C) = 1.5, but mean(A→C) = 3.

To overcome this limitation, the measurement tool was developed to help users retrieve an accurate time gap between any two non-consecutive events. Users can select any event bar as the start and end points, and LifeFlow will display the mean, median and SD in the tooltip and overlay the distribution at the users' request. Users can also make a selection from this distribution (Figure 3.12).

5. Displaying all event bars with equal height: When the data includes a large number of sequences, it can be difficult to review the rare sequences because they are represented with very thin bars. This option displays all leaf nodes using equal height, regardless of the number of records, making it easier to review and select rare sequences.

6. Episode: Some event sequences can be split into episodes. For example, a patient's medical history may consist of multiple visits. My physician collaborator wants to be able to see each visit easily. Therefore, I revised the visual representation to display a clear separation between episodes using dotted lines and allow users to select an event type that defines the beginning of an episode from the user interface. For example, in Figure 3.13, the physician selected the Arrival event to separate hospital visits.

7. Rank-by-feature: The gap between events indicates how long it took to change from the previous event to the next event. With many sequences and gaps displayed on the screen, finding the longest time gap to locate a bottleneck in the process (for example, the longest transfer time in the hospital) can be difficult, especially when some sequences are rare and displayed as very small bars. To support this task, I adopted the rank-by-feature technique and included a list of all gaps between events, sorted by a chosen criterion (Figure 3.14).
Users can choose to sort by the mean, median, minimum, maximum and standard deviation of the gaps. Once a user selects any gap from the table, the visualization will animate and zoom to the selected gap.

Figure 3.12: A user used the measurement tool to measure the time from ICU (red) to Discharge-Alive (light blue). The tooltip shows that it took about ten days. The distribution of the time gap is also displayed.

Figure 3.13: The sequences of medical records are split into episodes using the event type Arrival to define the beginning of an episode. Dotted lines show the separation between episodes.

3.8 Summary

Analyzing large numbers of event sequences is an important and challenging task. The inability to see all records on the screen at once makes it difficult to discover patterns. I introduce a new scalable visualization called LifeFlow that provides an overview of event sequences to support users' exploration and analysis. The LifeFlow visualization is based on aggregating data into a data structure called a tree of sequences, which preserves all sequences and temporal spacing within sequences. I report on a short-term user study with ten participants which confirmed that even novice users with 15 minutes of training were able to learn to use LifeFlow and rapidly answer questions about the prevalence of interesting sequences, find anomalies, and gain significant insight from the data. After the short-term study, I worked with several domain experts in long-term case studies, which led to a deeper understanding of users' needs and guided the improvement and extension of LifeFlow to overcome its limitations and support more complex tasks. Although it was inspired by a case study in the medical domain, LifeFlow can be applied to many other fields where event sequences are the main focus, such as student progress analysis, usability studies, web log analysis, and human activity log analysis in general.
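The aggregation into a tree of sequences can be illustrated as a prefix tree in which each node stores the records sharing that sequence prefix, plus the gaps used for mean/median spacing. This is only a sketch of the idea (the node fields here are invented), not LifeFlow's actual implementation:

```python
def build_tree(records):
    """Aggregate event sequences into a prefix tree ("tree of sequences").
    Each node maps an event type to record ids, time gaps from the previous
    event, and children. A sketch only; the real tree also drives layout."""
    root = {}
    for rec_id, seq in records.items():
        node = root
        prev_time = None
        for event, t in seq:
            entry = node.setdefault(event, {"ids": [], "gaps": [], "children": {}})
            entry["ids"].append(rec_id)
            if prev_time is not None:
                entry["gaps"].append(t - prev_time)  # spacing for this step
            prev_time = t
            node = entry["children"]
    return root

records = {
    "p1": [("Arrival", 0), ("ICU", 2), ("Floor", 5)],
    "p2": [("Arrival", 0), ("ICU", 1), ("Floor", 6)],
    "p3": [("Arrival", 0), ("Floor", 3)],
}
tree = build_tree(records)
# All three records share the Arrival prefix...
print(tree["Arrival"]["ids"])  # ['p1', 'p2', 'p3']
# ...two of them continue to the ICU; their mean Arrival->ICU gap is (2 + 1) / 2
icu = tree["Arrival"]["children"]["ICU"]
print(len(icu["ids"]), sum(icu["gaps"]) / len(icu["gaps"]))  # 2 1.5
```

Because every record contributes to exactly one leaf, the tree preserves all distinct sequences while collapsing identical prefixes into single bars, which is what makes the overview scale with the number of distinct patterns rather than the number of records.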
I have conducted several case studies that demonstrate LifeFlow's applications and report them in Chapter 6.

Figure 3.14: The sequences of medical records are broken into episodes using the event type Arrival to define the beginning of an episode. Dotted lines show the separation between episodes.

Chapter 4

Querying Event Sequences by Similarity Search

4.1 Introduction

Querying event sequences to answer specific questions or look for patterns is an important activity. Such activities can be utilized in, for example, finding patients who were transferred from an emergency room to the ICU (Intensive Care Unit) and died, incidents in which the police arrived 2 hours after the incident notification, or a PhD student who proposed a dissertation topic twice before graduating.

4.1.1 Example of Event Sequence Query

My physician partners in the Emergency Department at the Washington Hospital Center are analyzing sequences of patient transfers for quality assurance. One of their interests is the monitoring of bounce backs, which occur when a patient's level of care was decreased and then increased again urgently, such as:

1. Patients who were transferred from the ICU to the Floor (normal bed) and then back to the ICU.
2. Patients who arrived at the emergency room, then were transferred to the Floor, and then back to the ICU.

Time constraints are also associated with these sequences (e.g., the bounce backs should occur within a certain number of hours). The bounce back patients correspond to a quality metric for the hospital and are difficult to monitor. The physicians have been using MS Excel to find these bounce back patients. They exported data from the database and wrote formulas to express the queries. An interview with the physician who performed these tasks revealed frustration with the approach because of its complexity and time-consuming aspects (it took too many hours to create the formulas). I also asked about the possibility of performing these queries using SQL.
He explained that SQL was even harder for him and he was not quite sure how to start (even though he had earned a computer science undergraduate degree in addition to his medical degree).

4.1.2 Motivation for Similarity Search

Specifying temporal queries in SQL is difficult even for computer professionals specializing in such queries. I gave 6 computing students who had completed a database course the schema of a simple dataset and asked them to write a SQL query to find patients who were admitted to the hospital, transferred to the Floor, and then to the ICU (no time constraints were to be specified). Even with this simplified query, only one participant succeeded after 30 minutes, adding evidence that SQL strategies are challenging to use for temporal event sequences. Researchers have made progress in representing temporal abstractions and executing complex temporal queries [115, 116, 31], but there is little research that focuses on making it easy for end users such as medical researchers, traffic engineers, or educators to specify the queries and examine results interactively and visually. To the best of my knowledge, existing event sequence query interfaces have used an exact match approach, in which each query is interpreted as "every record in the result MUST follow these constraints". As a result, the tool returns only the records that strictly follow every constraint in the query. This approach works well when the users are fairly certain about their query (e.g., "find all patients admitted to the emergency room within a week after leaving the hospital."). However, exploratory search [124, 139], in which users are uncertain about what they are looking for, is gaining more attention. When using exact match, broad queries return too many results that are not relevant. Narrow queries miss records that may be "just off" (e.g., 7.5 days instead of 7 days as specified in the query). A more flexible query method could help the exploratory searchers.
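To make the exact match semantics concrete, the ICU → Floor → ICU bounce-back query with a time window can be written as a simple scan. This is only a sketch (it requires the three events to be consecutive in the record, which a real query tool would relax), not any of the tools discussed here:

```python
def bounce_backs(records, window_hours=48):
    """Return ids of records containing consecutive ICU -> Floor -> ICU
    events, with the return to the ICU within `window_hours` of leaving it.
    Events are (event_type, time_in_hours) pairs, sorted by time.
    Exact-match semantics: a record either satisfies every constraint
    or is excluded entirely."""
    hits = []
    for rec_id, seq in records.items():
        for i in range(len(seq) - 2):
            (e1, t1), (e2, _), (e3, t3) = seq[i], seq[i + 1], seq[i + 2]
            if (e1, e2, e3) == ("ICU", "Floor", "ICU") and t3 - t1 <= window_hours:
                hits.append(rec_id)
                break
    return hits

records = {
    "p1": [("Arrival", 0), ("ICU", 5), ("Floor", 20), ("ICU", 40)],  # bounce back
    "p2": [("Arrival", 0), ("ICU", 5), ("Floor", 20), ("ICU", 90)],  # "just off"
    "p3": [("Arrival", 0), ("Floor", 10), ("Discharge-Alive", 30)],
}
print(bounce_backs(records))  # ['p1']
```

Note how p2, whose bounce back misses the 48-hour window by a wide margin here but could just as easily miss it by half an hour, is silently excluded; this is exactly the "just off" problem that motivates similarity search.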
A similarity search interface has been used to query other types of data, such as images or text. In this approach, users can sketch an example of what they are seeking and get similar results. The users then receive a ranked list of results, sorted by similarity to the query. The ranked results can provide more flexibility than exact match results, allowing users to capture the "just off" cases. The key behind similarity search is the similarity measure, which is used to calculate the similarity score between the query and every record, so all records can then be sorted by similarity to the query.

4.1.3 Chapter Organization

In this chapter, I describe how I developed a new similarity search interface, Similan, and a similarity measure for event sequences, the Match & Mismatch (M&M) measure. Some parts of this chapter have been published in [144] and [143]. Section 4.2 introduces the first version of the M&M measure and the Similan user interface. The M&M measure defines similarity as a combination of the time differences between pairs of events and the number of mismatches. Similan allows the users to select an existing record from the database as a target record (query) and provides search result visualization. An evaluation of the first version is reported in Section 4.3. Section 4.4 introduces the second version. To address the limitations of the first version, Similan2 allows the users to draw an example of an event sequence by placing events on a blank timeline and search for records that are similar to their example, and improves how events are visualized on the timeline. The M&M measure v.2 supports richer and more flexible definitions of similarity and improves performance using dynamic programming. Section 4.5 reports a controlled experiment that compared exact match (LifeLines2 [131]) and similarity search (Similan2) interfaces. I summarize the advantages and disadvantages of each interface and suggest a hybrid interface combining the best of both in Section 4.6.
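The scoring loop behind this approach, namely computing a similarity score between the query and every record and then sorting, can be sketched in a few lines. The `toy_similarity` below is a deliberately naive placeholder, not the M&M measure:

```python
def rank_by_similarity(query, records, similarity):
    """Score every record against the query, then return (score, id)
    pairs sorted from most to least similar. Every record gets a rank,
    so 'just off' records appear near the top instead of vanishing."""
    scored = [(similarity(query, seq), rec_id) for rec_id, seq in records.items()]
    return sorted(scored, reverse=True)

def toy_similarity(query, seq):
    # Placeholder measure: fraction of query event types present in the record.
    q_types = {e for e, _ in query}
    s_types = {e for e, _ in seq}
    return len(q_types & s_types) / len(q_types)

query = [("ICU", 0), ("Floor", 24)]
records = {
    "p1": [("ICU", 0), ("Floor", 30)],
    "p2": [("ICU", 0), ("Discharge-Alive", 10)],
    "p3": [("Arrival", 0)],
}
print(rank_by_similarity(query, records, toy_similarity))
# [(1.0, 'p1'), (0.5, 'p2'), (0.0, 'p3')]
```

The interesting design work lies entirely in the `similarity` function, which is the role the M&M measure plays in the sections that follow.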
Figure 4.1: (top) The M&M measure. (bottom-left) High time difference (low match score) but no mismatches (high mismatch score). (bottom-right) Low time difference (high match score) but many mismatches (low mismatch score).

4.2 Similan and the M&M Measure: The First Version

4.2.1 Introduction to the Match & Mismatch (M&M) Measure

The M&M measure is based on aligning temporal data by sentinel events [131], then matching events in the query record with events in the compared records. Since there can be many possible ways to match the events between the two records, I define an event matching process for the M&M measure, which will be explained in Section 4.2.3. After the matching is done, the M&M measure is a combination of two measures: The first measure, the match score, is for the matched events, events which occur in both the target record and the compared record. It captures the time difference between events in the target record and the compared record. The second measure, the mismatch score, is for missing or extra events, events which occur in the target record but do not occur in the compared record, or vice versa. It is based on the difference in the number of events of each event type between the two records. The match and mismatch scores are combined into a total score, ranging from 0.01 to 1.00. For all three scores, a higher score represents higher similarity.

4.2.2 Description of the User Interface: Similan

Given two event sequences, the M&M measure returns a score which represents the similarity between that pair of records. However, the score alone does not help the users understand why records are similar or dissimilar.
Also, the M&M measure can be adjusted by several parameters. A tool to assist users in understanding the results and customizing the parameters is needed. To address these issues, Similan was developed to provide a visualization of the search results to help users understand the results, and an interface that facilitates the search and parameter customization. Similan was written in C# .NET using the Piccolo.NET [14] visualization toolkit. The design of Similan follows the Information Visualization Mantra: overview first, zoom and filter, details on demand [112].

4.2.2.1 Overview

Similan consists of 4 panels: main, comparison, plot and control, as shown in Figure 4.2. Users can start from selecting a target record from the main panel.

Figure 4.2: A screenshot of Similan, the predecessor of Similan2. Users can start by double-clicking to select a target record from the main panel. Similan will calculate a score that indicates how similar each record is to the target record and show the scores in the color-coded grid on the left. The score color-coding bars on the right show how the scores are color-coded. The users then can sort the records according to these scores. The main panel also allows users to visually compare a target with a set of records. The timeline is binned (by year, in this screenshot). If the users want to make a more detailed comparison, they can click on a record to show the relationship between that record and the target record in the comparison panel on the top. The plot panel at the bottom shows the distribution of records. In this example, the user is searching for students who are similar to Student 01. The user sets Student 01 as the target and sorts all records by total score. Student 18 has the highest total score of 0.92, so this suggests that Student 18 is the most similar student. Although Student 41 and Student 18 both have one missing paper submission, Student 41 has a lower match score; therefore, Student 18 has a higher total score.
Figure 4.3: Relative Timeline: The time scale is now relative to sentinel events (blue). Time zero is highlighted in dark gray.

After that, the main and plot panels give an overview of the similarity search result. Filtering and ranking mechanisms help users narrow down the search result. Users then can focus on fewer records. By clicking on a particular record, the comparison panel shows relationships between that record and the target record on demand. Moreover, mouse hovering actions on various objects provide details on demand in the form of tooltips.

4.2.2.2 Events and Timeline

Colored squares are used to represent events. Each color represents an event type (category). Users can customize the colors and check the checkboxes in the control panel (Figure 4.4) to select interesting event types. The number of events in each event type is displayed behind the event type name. Similan's timeline is not a continuous timeline but is divided into bins. The bin interval is automatically calculated from the application window size and the total time range of the data. As shown in Figure 4.2, the timeline is divided into years (05, 06, 07, 08). In each bin, events are grouped by event type and placed in the same order. Maintaining the same order allows for visual comparison between records. Each record is vertically stacked on alternating background colors and identified by its name on the left (see Figure 4.2). Ranking scores (more details in Section 4.2.2.4) appear on the left hand side before the name. Events appear as colored squares on the timeline. By default, all records are presented using the same absolute time scale (with the corresponding year or month labels displayed at the top) and the display is sized so that the entire date range fits in the screen. A double-click on any record marks that record as a target record. A target mark will be placed in front of the target record instead of a ranking score.
Clicking on any record selects that record as a compared record. Both the target record and the compared record will be highlighted. Users can move the cursor over colored squares to see details on demand in the form of tooltips. Also, zooming on the horizontal axis and panning are possible using a range slider provided at the bottom of the main panel.

4.2.2.3 Alignment

Users can select a sentinel event type from a drop-down list as shown in Figure 4.4. By default, the sentinel event type is set to none. When the sentinel event type is selected, the time scale will change from an absolute time, i.e., real time, into a relative time. The sentinel event becomes time zero and is highlighted (Figure 4.3).

4.2.2.4 Rank-by-feature

Similan is inspired by the idea of rank-by-feature from the Hierarchical Clustering Explorer (HCE) [108]. The following ranking criteria are derived from the M&M measure proposed in this chapter.

Figure 4.4: Control Panel: (left) Legend of event types (categories). (middle-top) Users can choose to align events by selecting a sentinel event type. (middle-bottom) The weight for calculating the total score can be adjusted using the slider and textboxes. (right) Links in the comparison panel can be filtered using these parameters.

1. Total Score, ranging from 0.01 to 1.00: The total score is the final output of the M&M measure. It is a weighted sum of the match and mismatch scores. The weight can be adjusted and users can see the result in real-time (Figure 4.4).

2. Match Score, ranging from 0.01 to 1.00: This is a score derived from the distance (time difference) between matched events. I choose to display the match score instead of the distance because the distance is a large number, so it can be difficult to tell the difference between two large numbers and understand the distribution.

3. Number of Mismatches (#Mismatch), ranging from 0 to n: This is the total number of missing and extra events compared to the target record. The #mismatch is shown instead of the mismatch score because it is
The #mismatch is shown instead of the mismatch score because it is 76 more meaningful to the user. Furthermore, I break down the #mismatch into event types. Positive and negative values correspond to the number of extra and missing events, respectively. Users can click on the ranking criteria on the top of the main panel to sort the records. By clicking on the same criteria once more, the order is reversed. A triangle under the header shows current ranking criterion. Legends in the control panel show the range of each ranking score and how they are color-coded. (See Figure 4.2.) 4.2.2.5 Scatterplot In addition to displaying results as a list in the main panel, Similan also visualizes the results as a scatterplot in the plot panel (Figure 4.2). Each record is represented by a \+" icon. Horizontal axis is the match score, while vertical axis is the number of mismatches (#mismatch). Records in the bottom-left area are records with high match score and low number of mismatches, which should be considered most similar according to the M&M measure. Moving the cursor over the + icon will trigger a tooltip to be displayed. Click- ing on a + will set that record to be the compared record and scroll the main panel to that record. Users can also draw a region on the scatterplot to lter records. The main panel will show only records in the region. Clicking on the plot panel again will clear the region and hence clear the lter. 77 4.2.2.6 Comparison The comparison panel is designed to show similarity and di erence between the target record and the compared record. Lines are drawn between pairs of events matched by the M&M measure. Line style is used to show the distance value. Strong links, or links with short distance, are shown as solid lines. Weak links, or links with large distance, are shown as dashed lines. Events without any links connected to them are missing or extra events. Users can adjust the distance threshold for strong links in the control panel. 
(See Figure 4.4.) Moving the cursor over a link will display a tooltip showing the event type, the time of both events, and the distance. Furthermore, users can filter the links by using the filters (Figure 4.4). Users can filter by setting the minimum and/or maximum distance. By selecting link types, only the selected types are displayed. Strong links are links with a distance in the range specified by the slider. Forward links are links which are not strong links and in which the event in the target record occurs before the event in the compared record, whereas backward links are the opposite.

4.2.3 The Match & Mismatch (M&M) Measure

This section explains the M&M measure in more detail. The base idea is that similar records should have the same events and the same events should occur at almost the same time. Therefore, the M&M measure uses the time difference and the number of missing and extra events as the definition of similarity. The notation below is used to describe a record of an event sequence, which is a series of events (t, c). The i-th event in the record is denoted by x_i or (t_i, c_i).

    X = {(t, c) | t ∈ Time and c ∈ EventTypes}

4.2.3.1 Matching

The first step is to match the events in the target record with events in the compared record. There can be many possible ways to match the events into pairs. Therefore, I define a distance function based on a sum of time differences to guide the matching. The matching which produces the minimum distance (time difference) will be selected. Note that the distance from the M&M distance function is not the final result of the M&M measure, but only part of it. This distance is later converted to a match score.

M&M Distance Function. I first define a distance function between each pair of events, as follows:

    d((t, c), (u, d)) = |t − u|   if c = d
                        ∞         if c ≠ d                        (4.1)

The distance is computed from the time difference if both events have the same type. The granularity of the time difference (years, months, days, etc.) can be set.
Matching between different event types is not supported, so I set the distance between every pair of events that have different event types to infinity.

A distance function between the target record X and the compared record Y,

    X = {(t_1, c_1), (t_2, c_2), ..., (t_m, c_m)}
    Y = {(u_1, d_1), (u_2, d_2), ..., (u_n, d_n)}

is described as the following:

    D(X, Y) = min Σ d(x_i, y_j)   over i ∈ [1, m], j ∈ [1, n],
    where each value of i and j is used exactly once.              (4.2)

A distance function between two records is calculated by matching events from the two records into event pairs and summing up the distances d(x_i, y_j) between each pair. However, this distance function works only when the numbers of events in both records are equal because it requires a perfect match between the two records. Also, even when the numbers of events are equal, this case can occur:

    X = {(t_1, "A"), (t_2, "A"), (t_3, "B")}
    Y = {(u_1, "A"), (u_2, "B"), (u_3, "B")}      "A", "B" ∈ EventTypes

This will certainly create at least one pair of different-type events, which is not preferred. Hence, the distance function fills in some null events (null, null) to equalize the numbers of events between the two records in each event type. The two lists above become:

    X = {(t_1, "A"), (t_2, "A"), (t_3, "B"), (null, null)}
    Y = {(u_1, "A"), (null, null), (u_2, "B"), (u_3, "B")}

The distance function between each pair of events is revised:

    d′((t, c), (u, d)) = ∞                   if c = null and d = null
                         0                   if c = null and d ≠ null
                         0                   if c ≠ null and d = null
                         d((t, c), (u, d))   if c ≠ null and d ≠ null    (4.3)

The null events should not be paired together, so the distance is infinity. The pairs that have one null event indicate missing or extra events. The distance function does not include an extra penalty for missing or extra events; the penalty for missing and extra events will be handled separately by the mismatch score (Section 4.2.3.2). Therefore, the distance is zero in these cases. Last, if the pair does not contain any null events, the original distance function is used.
Finally, the distance function between a target record X and a compared record Y becomes:

    D′(X, Y) = min Σ d′(x_i, y_j)   over i ∈ [1, m], j ∈ [1, n],
    where each value of i and j is used exactly once.              (4.4)

Minimum Distance Perfect Matching. The problem is how to match every event in X to an event in Y to yield the minimum distance. This problem can be converted into an assignment problem [67]: "There are a number of agents and a number of tasks. Any agent can be assigned to perform any task, incurring some cost that may vary depending on the agent-task assignment. It is required that all tasks are performed by assigning exactly one agent to each task in such a way that the total cost of the assignment is minimized." Let events from X (x_i = (t_i, c_i)) become agents and events from Y (y_j = (u_j, d_j)) become tasks. The cost of the assignment is d′(x_i, y_j). Then use the Hungarian Algorithm to solve the problem. The time complexity of the Hungarian Algorithm is O(n^3), where n is the number of events in each record. If there are m records in the database, the time to perform a matching between the target record and all records, assuming that each record has approximately n events, is O(mn^3).

4.2.3.2 Scoring

Once the matching is completed, the match, mismatch and total scores can be derived from the matching.

Match Score. The distance from the M&M distance function captures the time difference between the two records. However, the distance can be a large number, which users find difficult to compare. Therefore, I normalize the distance into a match score, ranging from 0.01 to 1.00. A higher score represents higher similarity. Only records with zero distance will yield a score of 1.00. Otherwise, the highest possible match score for a non-zero distance is bounded to 0.99. The lowest score is bounded to 0.01 because a zero score may mislead the users to think that the target and compared record are not similar at all. Let n be the total number of records in the dataset.
X and Y_i are the target and compared records, respectively. The match score M(X, Y_i) is calculated from the following equations:

    D′_max = max_{j ∈ [1, n]} D′(X, Y_j)                           (4.5)

    M(X, Y_i) = 1.00                                               if D′(X, Y_i) = 0
                ((D′_max − D′(X, Y_i)) / D′_max) × 0.98 + 0.01     otherwise       (4.6)

Mismatch Score. When the numbers of events in two records are not equal, there are missing or extra events. A missing event is an event that occurs in the target record but does not occur in the compared record. An extra event is an event that does not occur in the target record but occurs in the compared record. For example, imagine a target record for a patient who has chest pain, followed by an elevated pulse rate, followed by a heart attack diagnosis. If the compared record has only chest pain and a heart attack diagnosis, it has one missing event. I count the number of mismatches N(X, Y), a sum of the missing or extra events in each event type, and normalize it into a mismatch score MM(X, Y), ranging from 0.01 to 1.00. Only records with no mismatched events will yield a score of 1.00. Other records will score within the 0.01 to 0.99 range.

    N_max = max_{j ∈ [1, n]} N(X, Y_j)                             (4.7)

    MM(X, Y_i) = 1.00                                              if N(X, Y_i) = 0
                 ((N_max − N(X, Y_i)) / N_max) × 0.98 + 0.01       otherwise       (4.8)

Total Score. The match score and mismatch score are combined into the total score T(X, Y_i) using a weighted sum:

    T(X, Y_i) = w · M(X, Y_i) + (1 − w) · MM(X, Y_i),   w ∈ [0, 1]    (4.9)

Increasing the weight w gives the match score more significance, while decreasing w gives the mismatch score more significance. The default value for the weight is 0.5 (both are equally significant). For example, the users may not care whether there is any missing or extra event, in which case the weight should be set to 1. The Similan user interface allows users to manually adjust this weight and see the results in real-time. (See Section 4.2.2.4.)
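Putting the matching step together, here is a self-contained sketch of the null-padded distance d′ and the minimum-distance matching (Eqs. 4.1-4.4), along with the mismatch count used by Eq. 4.8. It substitutes exhaustive search over permutations for the Hungarian Algorithm, which is only practical for the tiny records used here, and is an illustration rather than the dissertation's implementation:

```python
from itertools import permutations

INF = float("inf")

def pair_distance(a, b):
    """d' from Eq. 4.3: time difference for same-type pairs, 0 when exactly
    one side is a null padding event, infinity for disallowed pairings."""
    (t, c), (u, d) = a, b
    if c is None and d is None:
        return INF          # null events must not be paired together
    if c is None or d is None:
        return 0.0          # missing/extra event; penalized by the mismatch score
    return float(abs(t - u)) if c == d else INF

def mm_distance(target, compared):
    """D' from Eq. 4.4 via brute-force minimum-cost perfect matching,
    plus the mismatch count N. Events are (time, event_type) pairs."""
    counts = {}                       # per-type event count difference
    for _, c in target:
        counts[c] = counts.get(c, 0) + 1
    for _, c in compared:
        counts[c] = counts.get(c, 0) - 1
    # Pad each side with null events so the per-type counts are equalized.
    x = list(target) + [(None, None)] * sum(max(-n, 0) for n in counts.values())
    y = list(compared) + [(None, None)] * sum(max(n, 0) for n in counts.values())
    mismatches = sum(abs(n) for n in counts.values())
    best = min(
        sum(pair_distance(xi, yj) for xi, yj in zip(x, perm))
        for perm in permutations(y)
    )
    return best, mismatches

target = [(0, "A"), (5, "A"), (8, "B")]
compared = [(1, "A"), (7, "B"), (9, "B")]
dist, mismatches = mm_distance(target, compared)
print(dist, mismatches)  # 2.0 2 -- one A is missing and one B is extra
```

The returned distance and mismatch count would then be normalized against the dataset-wide maxima as in Eqs. 4.5-4.9 to obtain the match, mismatch, and total scores.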
4.2.3.3 Discussion

The concept that similar records should have the same events (a low number of mismatches) and that the same events should occur at almost the same time (a low time difference) is transformed into the M&M measure. Time difference and number of mismatches are the two important aspects of similarity captured by the M&M measure. Records with a high match score are records with a low time difference, while records with a high mismatch score are records with a low number of mismatches. The M&M measure can be adjusted to give more significance to the match or the mismatch score. By default, the match score and mismatch score are assigned equal weights, so the most similar record should be the record with both a low time difference and a low number of mismatches.

4.3 User Study

A usability study of Similan was conducted with 8 participants. The goals of this study were to examine the learnability of Similan, assess the benefits of the scatterplot, learn how the number of events and event types affect user performance, and determine whether users could understand the M&M measure in the context of its use. I also observed the strategies the users chose and what problems they encountered while using the tool. Synthetic data based on graduate school academic events, such as admission, successful dissertation proposal, and graduation, was used. This choice of data was intended to make the tasks more comprehensible and meaningful to the participants, who were technically oriented graduate students.

4.3.1 Usability Study Procedure and Tasks

Two versions of Similan were used in this usability study: one with full features (S-Full) and another without the scatterplot (S-NoPlot). All usability sessions were conducted on an Apple laptop (15-inch widescreen, 2.2 GHz CPU, 2 GB RAM, Windows XP Professional) using an optical mouse. The study had two parts. In the first part, participants had an introduction to the M&M measure and training with the Similan interface without a scatterplot (S-NoPlot).
Then, the participants were asked to perform this task with different parameters: Given a target student and a dataset of 50 students, where each student record has x event types and a total of between y and z events, find the 5 students that are most similar to the target student using S-NoPlot. Task 1: x = 2, y = 4 and z = 6; Task 2: x = 4, y = 6 and z = 10; Task 3: x = 6, y = 8 and z = 16. In the second part, participants were introduced to the scatterplot and asked to perform tasks 4, 5 and 6, which were the same as tasks 1, 2 and 3, respectively, but using S-Full instead of S-NoPlot. The datasets used in tasks 1-3 and 4-6 were the same, but the students were renamed and the initial orderings were different. Tasks 1 and 4 were used only for training. The results were collected from tasks 2, 3, 5 and 6. In addition to observing the participants' behaviors and comments during the sessions, I provided them with a short questionnaire, which asked specific questions about the Similan interface. Answers were recorded using a seven-option Likert scale and free response sections for criticisms or comments.

4.3.2 Results

For the first part of this 30-minute study, all participants were observed to use the following strategy: first select the target student, and then use the ranking mechanisms to rank students by the total score. In their first usage, some participants also selected the student who had the highest total score to see more detail in the comparison panel. Afterwards, they simply studied the visualization and reported the students with high total scores as the answer. For the second part of the study, which focused on the scatterplot, most of the participants were observed to use the following strategy: first select the target student, draw a selection in the plot panel, and then use the main panel's ranking mechanisms to rank students by the total score. However, a few participants did not use the scatterplot to do the task at all.
They used the same strategy as in the first part. There was no difference in performance times between tasks 2 and 3 or between tasks 5 and 6, even though there were more events in tasks 3 and 6. This is understandable, since participants reported that they trusted the ranking provided by the interface. However, users spent more time on the tasks while using the scatterplot. All of the participants trusted the total score ranking criterion and used it as the main source for their decisions. They explained that the visualization in the main panel convinced them that the ranking gave them the correct answers. Therefore, in the later tasks, after ranking by total score and glancing at the visualization, they simply answered that the top five were the most similar. All of them agreed that the main panel is useful for its ranking features and that the comparison panel is useful in showing the similarity between the target and a compared student. However, they had different opinions about the scatterplot. Some of the participants mentioned that it was useful when they wanted to find similar students. They explained that similar students can easily be found at the bottom left of the scatterplot. One participant said that she had to choose two parameters (#mismatch and match score) when she used the scatterplot, whereas while using the main panel she had to choose only one parameter (total score), which she preferred. A few of them even mentioned that it was not necessary to use the scatterplot to find similar students. Although they had different opinions about its usefulness in finding similar students, they all agreed that the scatterplot gives a good overview of the distribution of students. It can show clusters of students, which could not be discovered from the other panels.
Also, one participant pointed out that the main and comparison panels are helpful in showing how students are similar, while the plot is more helpful in explaining how students are dissimilar. Participants had positive comments on Similan's simple, intuitive and easy-to-learn interface. Most of the participants got started without assistance from the experimenter. Nevertheless, some user interface design concerns were noted. Some participants noticed that the binned timeline could be misleading in some situations. Overall, participants liked Similan's simple yet attractive interface and strongly believed that Similan could help them find students who are similar to a target student. Ranking in the main panel appears to be useful. By contrast, participants had difficulties in learning the M&M measure, since it combines two kinds of scores. The scatterplot did not benefit the tasks in this study, but it may prove useful for more complex databases.

Figure 4.5: Similan 1.5 prototype: The timeline is continuous and events are split into rows by event type.

4.3.3 Pilot Study of a New Prototype

According to the user feedback, using a binned timeline can be misleading in some situations. A pair of events in the same bin can have a longer distance than a pair of events in different bins. Also, the order of events within the same bin is hidden. Therefore, I developed a new prototype that adopts the continuous timeline used in LifeLines2 [131] and includes several improvements (Figure 4.5). I ran a pilot study with 5 participants to compare the binned timeline in the original version with the continuous timeline in the new version, and received these comments: The continuous timeline requires more space for each record and looks more complicated. The binned timeline is more compact, simpler and therefore more readable, although it gives users less detail to interpret. However, the continuous timeline does not mislead users when comparing distances or ordering.
Both types of timeline have advantages and disadvantages depending on the task. The binned timeline is suitable for general tasks that do not require fine-grained information, while the continuous timeline is more suitable when fine-grained information is required.

4.4 Similan and the M&M Measure: The Second Version

The usefulness of the first Similan is limited for several reasons: it only allows users to select an existing record from the database as a query (not to specify an example of their choice), the binned timeline visualization can be misleading and frustrating to users in some situations (Figure 4.2), and the similarity measure is not flexible enough to support different definitions of similarity for different tasks. Therefore, to address these limitations, a new version of the similarity measure and the user interface were developed. Similan2 becomes a query interface that allows users to draw an example of an event sequence by placing events on a blank timeline and search for records that are similar to their example using the M&M measure v.2. The M&M measure v.2 is designed to be faster than the first version and customizable via four decision criteria, responding to users' need for richer and more flexible definitions of similarity. Similan2 allows users to customize the parameters of the M&M measure v.2 via the user interface and also changes how events are visualized on the timeline.

4.4.1 Description of the User Interface: Similan2

4.4.1.1 Overview

Similan2 (Figure 4.6) is an Adobe AIR application using the Adobe Flex 3 framework. The designs of LifeLines2 and Similan2 have evolved in parallel.

Figure 4.6: Similarity search interface (Similan2) with the same query as in Figure 4.11. Users specify the query by placing events on the query panel. To set the time range of interest and focus on events within this range, users draw a red box. After clicking on "Search", all records are sorted by their similarity to the query. The similarity score is represented by a number (the total score) and a bar with four sections. A longer bar means a higher similarity score. Each section of the rectangle corresponds to one decision criterion; e.g., the top two records have longer leftmost sections than the third record because they have a lower time difference, so the Avoid Time Difference score (AT) is high, resulting in longer bars. Figure 4.7 shows how users can adjust the weights.

Similan2 adopted the basic display of records from LifeLines2 to solve the binned timeline issue: each record is stacked on the main panel, events are colored triangle icons, and users can customize the visibility and colors of each event type (category). Users can also align all the records by a selected event type (e.g. align by admission to the hospital in Figure 4.6). Similan2 also employs an improved similarity measure (M&M measure v.2), which will be explained in Section 4.4.2. In Similan2, the panel on the top is called the query panel, where users can specify their queries. On the right side is the control panel, which provides controls for users to customize the search parameters. The largest area on the screen is the main panel, where all records in the data are listed.

4.4.1.2 Query

To perform a query, users first create a query or select an existing record. For example, to find patients who were admitted, transferred to the ICU room on the first day and then to the intermediate room on the fourth day, users can start by aligning all records by Admit. Then users click on the edit button on the query panel to open a popup window, and drag and drop events onto the empty timeline (i.e., users can select Admit from the list of event types shown in the popup and click on Add; the cursor changes into a magic wand and they can drop the event on the line). Figure 4.6 shows the pattern they created: Admit, ICU and Intermediate at time 0, on the first day and on the fourth day, respectively.
The only type of time constraint currently supported by Similan2 is specifying when each event occurred. Users can also select any existing record as a query by dragging that record from the main panel and dropping it into the query panel. This is useful for finding patients who exhibit a pattern of events similar to that of a particular known patient. A time scope can be drawn on top of the timeline (see the red line in Figure 4.6). In the example query, drawing a scope from time zero to the end of the fourth day will exclude all other events outside the scope from the search. If no scope is specified, the entire timeline is selected by default. The unit for time differences (e.g. hours or days) can be selected from a drop-down list. Event types that should be excluded from the search can be unchecked in the control panel. After clicking on Search, the records are sorted by their similarity score (placing records with the highest scores on the top). Each record has a score indicator, a rectangle with four sections of different colors (see Figure 4.6), inspired by ValueCharts [24], a visualization to support decision-makers in inspecting linear models. The length of a score indicator represents the total score. It is divided into four colored parts, which represent the four decision criteria. The length of each part corresponds to the weighted score. Placing the cursor over a score indicator brings up an explanatory tooltip.

4.4.1.3 Comparison

Users can see a detailed comparison of the query and any other record by dragging that record into the comparison panel at the bottom. Lines are drawn between pairs of events matched by the M&M measure v.2. Hovering over a link brings up a tooltip showing the event type, the time of both events and the time difference.

Figure 4.7: Similan2's control panel has 2 tabs. The first tab is "Search", as shown in Figure 4.6.
The other tab covers the weights and detailed weights: users can adjust the weights of the four decision criteria using the four sliders in the left figure. For more advanced customization, they can even set the weight for each event type within each decision criterion by clicking on "more details" (right figure).

4.4.1.4 Weights

By default, the search uses default weights, which means that all criteria are equally important. However, users may have different notions of similarity in mind. Similan2 allows users to adjust the weight of each criterion in the "Weight" tab of the control panel (see Figure 4.7). The weight for each decision criterion can be adjusted with the slider controls, as can the weight of each event type within each decision criterion. A click on "Apply Weight" refreshes the similarity measures and the order of the records on the display. For example, if the value of time intervals is not important in a task (e.g. finding patients who were admitted, transferred to the special room and exited), the user can set a low weight for "Avoid Time Difference" to reduce its importance. Because the definition of weights can be complex, Similan2 includes sets of preset weight combinations for users to choose from. For instance, one preset is called "Sequence", which uses a low weight for "Avoid Time Difference" and a high weight for "Avoid Missing Events".

4.4.2 The Match and Mismatch (M&M) Measure v.2

The M&M measure v.2 improves on the original version in two ways. First, the matching problem is reduced to a simpler problem than the assignment problem, so the matching algorithm can be improved by using dynamic programming instead of the Hungarian Algorithm. Second, the M&M measure v.2 considers more types of differences: it splits the number of mismatches into the number of missing events and the number of extra events, and also includes the number of swaps.
Figure 4.8: (left) M&M Matching v.1; (right) M&M Matching v.2: events in each event type are matched separately.

Moreover, it increases flexibility by adding more customizable parameters. The M&M measure v.2 still consists of two steps: matching and scoring.

4.4.2.1 Matching

The M&M measure does not allow matching between events in different categories, and allows only one-to-one matching. For example, an event of type A can only match an event of type A, and cannot match an event of type B or C (see Figure 4.8). Therefore, the matching can be reduced to a simpler problem by separating the matching for each event type. The notation below is used to describe an event sequence record, which is a list of timestamped events (t, c). The i-th event in the record is denoted by x_i or (t_i, c_i).

X = {(t, c) | t ∈ Time and c ∈ Categories} (4.10)

The M&M measure v.2 splits each record into several lists, one for each event type. For example, the two records X and Y,

X = {(t_1, "A"), (t_2, "A"), (t_3, "B")}
Y = {(u_1, "A"), (u_2, "B"), (u_3, "B")}    with "A", "B" ∈ Categories,

are split into X_A, X_B and Y_A, Y_B, respectively:

X_A = {(t_1, "A"), (t_2, "A")} ; X_B = {(t_3, "B")}
Y_A = {(u_1, "A")} ; Y_B = {(u_2, "B"), (u_3, "B")} (4.11)

The problem of "matching events between two records" is then reduced to multiple instances of the simpler problem of "matching events between two lists that contain only events of the same event type" (see Figure 4.8). For example, matching X and Y is reduced to matching X_A with Y_A, and X_B with Y_B. A faster algorithm based on dynamic programming can be used instead of the Hungarian Algorithm to find the match between X_A and Y_A that produces the minimum time difference.

Figure 4.9: M&M Matching v.2: dynamic programming table.

Dynamic Programming Matching — Figure 4.9 shows the dynamic programming table.
The value in each cell, cell(i, j), is the minimum cost of matching the subsequences X[1..i] and Y[1..j]. X must be longer than or of equal length to Y. Cross symbols mark the cells that cannot be used because those matches would yield non-perfect matchings for Y; for example, matching y_2 with x_1 would leave y_1 with no match. The M&M matching v.2 algorithm (Algorithm 2) starts from the top-left cell and fills the cells from left to right, row by row. For each cell, the cell value is the minimum of:

1. The cost of matching x_i to y_j (d(x_i, y_j) = |x_i.time − y_j.time|) plus the minimum cost of matching the prefixes (upper-left cell: cell(i−1, j−1));

2. The minimum cost of matching y_j to some x before x_i (left cell: cell(i−1, j));

which can be represented by this formula:

cell(i, j) = min( d(x_i, y_j) + cell(i−1, j−1), cell(i−1, j) ) (4.12)

If choice 1 is selected, the cell maintains a link to its upper-left cell. If choice 2 is selected, the cell maintains a link to its left cell. After filling the entire table, the minimum matching cost is the value of the bottom-right cell. The matching that produces the minimum cost can be retrieved by backtracking the links, beginning from the bottom-right cell.
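The per-category splitting (Eq. 4.11) and the dynamic programming recurrence (Eq. 4.12) can be sketched together as follows. This is a simplified illustration, not the Similan2 implementation: function names are invented, only matched-pair time differences are summed (missing and extra events are left for the scoring step), and backtracking of the links is omitted.

```python
def min_time_diff(xs, ys):
    # Order-preserving minimum-cost matching of every event in ys to a
    # distinct event in xs (requires len(xs) >= len(ys)), following
    # Eq. 4.12: either match xs[i-1] with ys[j-1], or leave xs[i-1]
    # unmatched. Runs in O(len(xs) * len(ys)).
    n, m = len(xs), len(ys)
    INF = float("inf")
    cost = [[0] + [INF] * m for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, min(i, m) + 1):
            cost[i][j] = min(cost[i - 1][j],
                             cost[i - 1][j - 1] + abs(xs[i - 1] - ys[j - 1]))
    return cost[n][m]

def split_by_type(record):
    # Eq. 4.11: split [(time, category), ...] into one time list per type.
    by_type = {}
    for t, c in record:
        by_type.setdefault(c, []).append(t)
    return by_type

# X = {(1,"A"), (4,"A"), (6,"B")} and Y = {(2,"A"), (5,"B"), (7,"B")}
X = split_by_type([(1, "A"), (4, "A"), (6, "B")])
Y = split_by_type([(2, "A"), (5, "B"), (7, "B")])
total = 0
for c in set(X) | set(Y):
    xs, ys = X.get(c, []), Y.get(c, [])
    if len(xs) < len(ys):          # always pass the longer list first
        xs, ys = ys, xs
    total += min_time_diff(xs, ys)
print(total)   # 2: |1-2| for type A plus |6-5| (or |6-7|) for type B
```

Matching each category independently is what makes the reduction pay off: each small table costs O(n_c · m_c) instead of one cubic-cost assignment over the whole record.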
Time Complexity — If the numbers of events in X_A and Y_A are n_A and m_A, with n_A > m_A, the time to match the events between X_A and Y_A with dynamic programming is

O((n_A − m_A) · m_A) (4.13)

Algorithm 2 M&M Matching v.2
1: n ← length(X)
2: m ← length(Y)
3: diff ← n − m
4: c ← array[diff + 1][m]
5: for j := 0 to m − 1 do
6:   for i := 0 to diff do
7:     cost ← d(x_{j+i}, y_j)
8:     if j > 0 then
9:       cost ← cost + c[i][j − 1]
10:    end if
11:    if i > 0 then
12:      c[i][j] ← min(cost, c[i − 1][j])
13:    else
14:      c[i][j] ← cost
15:    end if
16:  end for
17: end for

Using the matching v.1 based on the Hungarian Algorithm, the time complexity of matching events between X and Y is

O((max(n_A, m_A) + max(n_B, m_B) + max(n_C, m_C) + ...)^3) (4.14)

Using the matching v.2, the time complexity is reduced to:

O((n_A − m_A) · m_A + (n_B − m_B) · m_B + (n_C − m_C) · m_C + ...) (4.15)

Figure 4.10: Four types of difference: time difference, missing events, extra events and swaps.

4.4.2.2 Scoring

Once the matching is completed, the scores can be derived from the matching. The first version of the M&M measure considers only two types of difference: time difference and number of mismatches (missing or extra events). In this second version, I decided to split the number of mismatches into the number of missing events and the number of extra events, because these two numbers can have different significance. For example, users may not care about extra events but want to avoid missing events, or vice versa. I also included the number of swaps, because sometimes users want the events in order and sometimes the order is not significant. Therefore, the M&M measure v.2 considers four types of difference and allows users to customize each type in more detail for each event type. The four types of difference are listed as follows:

1. A match event is an event that occurs in both the query and the compared record.
The time difference (TD) is the sum of the time differences within each pair of matched events. The time difference is kept separately for each event type. Users can also specify what time unit they want to use for the time difference.

2. A missing event is an event that occurs in the query record but does not occur in the compared record. The number of missing events (NM) is counted for each event type.

3. An extra event is an event that does not occur in the query record but occurs in the compared record. The number of extra events (NE) is counted for each event type.

4. A swap occurs when the order of two events is reversed. The number of swaps (NS) is counted for each pair of event categories. For example, in Figure 4.10, the query has A followed by B then C, but record #5 has A followed by C then B. If you draw a line from the query's C to record #5's C and do the same for B, it creates one crossing, so the number of swaps between B and C (NS_{B,C}) is 1, while NS_{A,B} and NS_{A,C} are both 0.

Since the time difference may not be equally important for all categories, the total time difference (ΣTD) is a weighted sum of the time differences from each event type. Users can adjust what is important by setting these weights (Σ w^TD = 1).

ΣTD = w^TD_A · TD_A + w^TD_B · TD_B + ... (4.16)

Likewise, the total number of missing events (ΣNM), the total number of extra events (ΣNE) and the total number of swaps (ΣNS) are calculated from weighted sums:

ΣNE = w^NE_A · NE_A + w^NE_B · NE_B + ... (4.17)

ΣNM = w^NM_A · NM_A + w^NM_B · NM_B + ... (4.18)

ΣNS = w^NS_{A,C} · NS_{A,C} + w^NS_{B,C} · NS_{B,C} + ... (4.19)

Four Decision Criteria — The four types of difference are normalized into values ranging from 0.01 to 0.99, called penalties. The total time difference (ΣTD), total number of missing events (ΣNM), total number of extra events (ΣNE) and total number of swaps (ΣNS) are normalized into the TD penalty, NM penalty, NE penalty and NS penalty, respectively. The four penalties are converted into these four decision criteria: 1.
Avoid Time Difference (AT) = 1 − TD penalty; 2. Avoid Missing Events (AM) = 1 − NM penalty; 3. Avoid Extra Events (AE) = 1 − NE penalty; 4. Avoid Swaps (AS) = 1 − NS penalty.

Total Score — The total score is a weighted sum of the four decision criteria. Users can adjust the weights (w_AT, w_AM, w_AE, w_AS) to set the significance of each decision criterion (Σw = 1).

T = w_AT · AT + w_AM · AM + w_AE · AE + w_AS · AS (4.20)

The total score T ranges from 0.01 to 0.99; a higher score represents higher similarity. The weighted sum model was chosen because of its simplicity and ease of presentation to the users.

4.5 User Study

4.5.1 Motivation for a Controlled Experiment

Using the Multi-dimensional In-depth Long-term Case Study methodology [114], I worked with a physician, assisting him through the analysis of his data using two query tools: LifeLines2 [131, 132] (an exact match user interface from my research group) and Similan2 (similarity search). The physician reported that he was able to specify his queries easily in much less time than with a spreadsheet, and that he discovered additional patients whom he had missed in his earlier work with Excel. He clearly stated that visualizing the results gave him a better understanding of the data, which could not have been achieved with his spreadsheet or an SQL query. He also hinted at advantages and disadvantages of both visual approaches. For example, he felt that similarity search made it easier to specify the pattern, but that looking at the results ranked by similarity was difficult and sometimes frustrating, as he was not always confident that the similarity measure was adequately computed to fit his needs. (The computation is explained in detail in Section 4.4.2.) Those contrasting benefits led me to design a controlled experiment to see if I could confirm those impressions and better understand which query method is better suited for different tasks.

Figure 4.11: Exact match interface (LifeLines2) showing the results of a query for patients who were admitted to the hospital, then transferred to the Intensive Care Unit (ICU) within a day, then to an Intermediate ICU room on the fourth day. The user has specified the sequence filter on the right by selecting Admit, ICU and Intermediate in the menus, and aligned the results by the time of admission. The distribution panel at the bottom of the screen shows the distribution of Intermediate, which gives an overview of the distribution and has allowed the user to select the time range of interest (e.g. on the fourth day) by drawing a selection box on the distribution bar chart.

4.5.2 Description of the User Interface: LifeLines2

LifeLines2 was developed during a previous HCIL project and is described here only for the purpose of the controlled experiment. LifeLines2 (Figure 4.11) is a Java application, utilizing the Piccolo 2D graphics framework [14]. In LifeLines2, each record is vertically stacked on an alternating background color and identified by its ID on the left. Events appear as triangle icons on the timeline, colored by their type (e.g. Admission or Exit). Placing the cursor over an event pops up a tooltip providing more details. The control panel on the right side includes filters and other controls. The visibility and color of each event type (category) can be set in the control panel. Users can select an event type to align all the records. For example, Figure 4.11 shows records aligned by the Admit event. When the alignment is performed, time is recomputed to be relative to the alignment event. Users can apply the sequence filter to query records that contain a particular sequence, e.g. finding patients who were admitted, then transferred to a special room, and exited. The first step is to select a sequence filter from the "filter by" drop-down list; several drop-down lists that contain event types will then appear.
Users then set the values of the 1st, 2nd and 3rd drop-down lists to Admit, Special and Exit, respectively. The records that pass this filter are selected and highlighted in yellow. A click on "Keep selected" removes the other records. To query for records that have events occurring in particular intervals, users first have to display the distribution of selected events (with the distribution control) and then select intervals on the distribution display. For example, to find patients who were admitted, then transferred to the ICU room on the first day of their stay and transferred to the intermediate ICU room on the fourth day, users have to align all the records by Admit, show the distribution of ICU using the "Show Distribution of" control, then select the first day on the distribution of ICU events at the bottom of the screen and click on "Keep selected", then show the distribution of Intermediate, draw a selection box from the first to the fourth day and click "Keep selected" (see Figure 4.11). A similar process can be used for consecutive interval specification using different alignments and filtering.

4.5.3 Method

I conducted a controlled experiment comparing two interfaces: LifeLines2, an exact match interface, and Similan2, a similarity search interface. My goal was not to determine which tool was superior (they are clearly at different stages of refinement and represent different design concepts), but to understand which query method is best suited for different tasks. Another goal was to observe the difficulties that users encountered while using the interfaces to perform the given tasks. Both interfaces were simplified by hiding certain controls to focus on the query features I wanted to compare.

4.5.3.1 Research questions

The evaluation was conducted to answer these research questions:

1) Are there statistically significant differences in performance time and performance accuracy between the two interfaces while performing different tasks?
2) Are there statistically significant differences in time and accuracy between the performance of different tasks while using each interface?

3) Is there a statistically significant difference between the subjective ratings given by the users for the two interfaces?

4.5.3.2 Participants

Eighteen graduate and senior undergraduate students participated in the study. I recruited computer science students, who were assumed to have a high level of comfort with computers but no knowledge of either interface. The participants included 13 men and 5 women, 20 to 30 years of age. Participants received $20 for their 90-minute participation. To provide motivation to perform the tasks quickly and accurately, an additional nominal sum was promised to the fastest user with the fewest errors on each interface.

4.5.3.3 Apparatus

The participants were required to perform the given tasks with the two interfaces: LifeLines2 and Similan2. The two software interfaces were run on an Apple MacBook Pro 15" with the Windows XP operating system. The participants controlled the computer using a standard mouse.

Tasks — The tasks were designed based on real scenarios provided by physicians and simplified to make them suitable for the time limit and for participants who had never used the interfaces before. Participants were requested to find patients in the database who satisfied a given description. To avoid the effect of alignment choice, all tasks contained an obvious sentinel event (e.g. Admit). I considered these factors when designing the tasks:

1. Query type: either a sequence description was provided or an existing record was used as the query.

2. Time constraint: present or not.

3. Uncertainty: yes or no; e.g. the number of events may be precise or not, and the time constraint may be flexible or not.
The tasks used in the experiment are listed below:

Task type 1: Description without time constraint, no uncertainty. 1: "Find at least one patient who was admitted, transferred to Floor and then to ICU." 1.2: "Count all patients who fit the task 1 description." Task 1 was designed to observe how quickly the participants could use the interface to specify the query, while task 1.2 focused on result interpretation and counting.

Task type 2: Description with time constraints, no uncertainty. 2: "Find at least one patient who was admitted and transferred to Intermediate on the second day, then to ICU on the third day." 2.2: "Count all patients who fit the task 2 description."

Task type 3: Description with uncertainty, without time constraint. 3: "Find a patient who best matches the following conditions: admitted, then transferred to a special room approximately 2 times, and transferred to the ICU room after that. If you cannot find any patient with exactly 2 transfers to the special room, 1-3 transfers are acceptable." 3.2: "Count all patients who fit the task 3 description."

Task type 4: Description with uncertainty and time constraint. "Find a patient who best matches the following conditions: admitted, transferred to Floor on the first day, and to ICU approximately at the end of the third day. The best answer is the patient who was transferred to ICU as close to the given time as possible."

Task type 5: Existing record provided as the query. "Find a patient who was transferred with the most similar pattern to patient no. xx during the first 72 hours after being admitted. Having everything the same is best, but extra events are acceptable."

Data — I used a modified version of the deidentified patient transfer data provided by my partners. The data contained information about when patients were admitted (Admit), transferred to the Intensive Care Unit (ICU), transferred to the Intermediate Care Unit (Intermediate), transferred to a normal room (Floor), and exited (Exit).
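The exact-match condition underlying tasks of type 1 (a sequence description with no time constraint) reduces to a subsequence test over such transfer data. The sketch below is illustrative only, assuming records are represented as ordered lists of event-type names; it is not the LifeLines2 sequence filter implementation:

```python
def contains_sequence(record, pattern):
    # True if the record's event types contain `pattern` as a
    # subsequence: the pattern events appear in order, with any other
    # events allowed in between (as in a sequence filter).
    events = iter(record)
    return all(any(ev == want for ev in events) for want in pattern)

# Hypothetical records: event types in time order, keyed by patient ID.
records = {
    "P1": ["Admit", "Floor", "ICU", "Exit"],
    "P2": ["Admit", "ICU", "Floor", "Exit"],
    "P3": ["Admit", "Floor", "Intermediate", "ICU", "Exit"],
}
hits = [pid for pid, evs in records.items()
        if contains_sequence(evs, ["Admit", "Floor", "ICU"])]
print(hits)   # ['P1', 'P3']
```

P2 is excluded because its Floor transfer happens after the ICU transfer, which is exactly the order sensitivity that the exact-match interface enforces and that the similarity interface relaxes into a swap penalty.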
Questionnaire — A 7-item, 7-point Likert-scale questionnaire was devised by the experimenter to measure the learnability and ease or difficulty of using the interfaces while performing the different tasks, and the level of confidence in the answers the participants provided for the different tasks. The highest (positive, such as "very easy" or "very confident") score that could be attained on the measure was 7; the lowest (negative, such as "very hard" or "not confident") score was 1. Thus, higher scores reflected more positive attitudes toward the interfaces.

Q1: Is it easy or hard to learn how to use?
Q2: Is it easy or hard to specify the query with sequence only?
Q3: Is it easy or hard to specify the query with a time constraint?
Q4: Is it easy or hard to specify the query with uncertainty?
Q5: Is it easy or hard to specify the query in the last task?
Q6: How confident is your answer for the "find at least one / best answer" tasks?
Q7: How confident is your answer for the counting tasks?

4.5.3.4 Design
The independent variables were: interface type (2 treatments: exact match and similarity search) and task (8 treatments). The dependent variables were: the time to complete each task, the error rate for each task, and subjective ratings on a 7-point Likert scale. The controlled variables were: computer, mouse and window size. I used equivalent datasets for each interface. To control learning effects, the presentation order of the LifeLines2 and Similan2 interfaces was counterbalanced. To avoid situations in which users would always find repeating the tasks on the second system easier, since they already knew the answers, I also used two sets of questions and datasets, one for each interface. The questions in the two sets were different but had the same difficulty level, for example: "Find at least one patient who was admitted, transferred to Floor then to ICU." and "Find at least one patient who was admitted, transferred to Floor then to IMC."
Half of the participants started with LifeLines2 while the other half started with Similan2.

4.5.3.5 Procedure
Participants were given training which included a brief description of the data and a ten-minute tutorial on how to use the first interface. Then, the participants had to complete two training tasks. When the participants could answer the training questions correctly, they were considered ready to perform the study tasks. Next, the participants were asked to perform eight tasks using the first interface. After that, the experimenter followed the same procedure (tutorial, training tasks, study tasks) for the second interface. Upon completion of the tasks, the participants were asked to complete the 7-point Likert scale questionnaire. At the end of the experiment, I debriefed the participants to learn about their experience while using the interfaces for the different tasks and their suggestions for improving the interfaces.

Figure 4.12: Performance time as a function of the interface type and the tasks (1-5). Vertical bars denote 0.95 confidence intervals.

4.5.4 Results
4.5.4.1 Performance Time
To examine the effects of the type of interface and the task on the time to perform tasks 1-5, I conducted a two-way ANOVA with repeated measures. The time to perform the task was the dependent variable, and the type of interface and the task were within-participants independent variables. The results of the analysis showed that the main effect of the task was significant (F(4, 68) = 15.15, p < .001). The two-way interaction (interface × task) was also significant (F(4, 68) = 6.63, p < .001). The main effect of the interface was not found to be significant (F(1, 17) = 1.60, p = .22). Figure 4.12 shows the performance time as a function of the interface and the task.
It can be seen that for tasks 1-3, the performance times using the two interfaces are very similar and increase for the tasks with a time constraint (2) and uncertainty (3) (M ± SD of 26.83 ± 10.90 s, 39.58 ± 23.92 s and 58.67 ± 33.60 s, respectively). However, the average performance times for tasks 4 and 5 are shorter using the similarity search interface (M ± SD of 51.73 ± 13.21 s and 37.74 ± 18.63 s, respectively) than while using the exact match interface (M ± SD of 68.33 ± 31.18 s and 72.05 ± 34.41 s, respectively). It can also be observed that the variances in the performance time of tasks 2-5 are larger while using the exact match. A post-hoc Duncan test showed that the performance times of tasks 4 and 5 are significantly shorter while using the similarity search interface (p < .05). When using the exact match, there were significant differences in performance time between two homogeneous groups: tasks 1-2 versus tasks 3-5 (p < .001). When using the similarity search, the main significant difference in performance time was between task 3 and tasks 1 and 5 (p < .05). Similar analytic procedures were followed in the analysis of the effects of the interface type and the task on the time to perform the counting tasks (1.2, 2.2 and 3.2). The results of the analysis showed that the only effect found to be significant was the main effect of the interface (F(1, 17) = 23.65, p < .001). The average performance time while interacting with the exact match was significantly shorter than with the similarity search (M ± SD of 2.32 ± 3.75 s and 15.20 ± 25.10 s, respectively). The main effect of the task (F(2, 34) = 2.05, p = .14) and the interaction effect (F(2, 34) = 2.03, p = .15) were not found to be significant.

Table 4.1: Results of the analysis of subjective ratings given by the participants to the two interfaces while performing the different tasks. "X" denotes exact match while "S" denotes similarity search. "*" indicates the preferred interface.
4.5.4.2 Error Rates
To compare the error rates between the two interfaces while performing the different tasks, I performed a McNemar's test, which is a non-parametric test used to compare two related or correlated population proportions. Since the error rates of tasks 1-3 were zero for both interfaces, I conducted this analysis only for tasks 4 and 5 (4 and 2 incorrect answers using the exact match, respectively, and no errors while using the similarity search). The results of the analysis showed that there was no significant difference between the two interfaces in the error rates of task 4 (χ²(1) = 3.06, p = .08) and task 5 (χ²(1) = 1.13, p = .29).

4.5.4.3 Subjective Ratings
To compare the difference between the subjective ratings given by the participants to the two interfaces, I conducted a paired-sample t-test for each question. The results of the analysis are presented in Table 4.1. The results showed that there was no significant difference in the ease of learning how to use the two interfaces (Q1). The participants reported the exact match to be significantly easier to use than the similarity search for the task with sequence only (task 1) (Q2). However, for the tasks with only a time constraint (task 2) (Q3) or only an uncertainty constraint (task 3) (Q4), they reported the similarity search to be significantly easier to use than the exact match. They also reported the similarity search to be significantly easier to use than the exact match in the task that required them to find the patient most similar to a given patient (Q5). There was no significant difference between the confidence levels of the answers for the tasks which required finding at least one, best answer (tasks 1-5) (Q6). However, the participants were significantly more confident while using the exact match than the similarity search to find the answers for the counting tasks (tasks 1.2, 2.2 and 3.2) (Q7).
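For readers who want to reproduce a test of this kind, McNemar's statistic needs only the discordant-pair counts. The sketch below uses the continuity-corrected variant with standard-library Python; the function name and the variant are my choices, and the chi-square values reported above may come from a different software implementation, so the numbers need not match exactly.

```python
from math import erfc, sqrt

def mcnemar(b, c):
    """McNemar's chi-square with continuity correction for paired binary
    outcomes. b and c are the discordant-pair counts (e.g., participants
    who erred on one interface but not the other). Returns (chi2, p)."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 df:
    # P(X > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, erfc(sqrt(chi2 / 2))

# Task 4: 4 errors with exact match, none with similarity search.
chi2, p = mcnemar(4, 0)   # this variant also yields p > .05
```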
4.5.4.4 Debriefing
When asked what they liked about LifeLines2, the participants said that it is easy for finding a sequence ("Easy to find sequence", "Very easy to query with sequence", "Very intuitive to specify sequence") and counting ("Show only matched records make it easy to count", "It gives confidence.") However, when asked what they did not like about LifeLines2, they explained that it is difficult for uncertain and more complex tasks, because they had to change the query and, sometimes, more than one filter was needed. ("It doesn't find the similar case when it can't find the case I want", "Difficult for complex tasks or tasks with uncertainty", "Hard to find approximate best match", "Harder in LifeLines2 because I had to change the query [for the uncertain task]", "In order to find a patient, sometimes more than one filter is needed.") When asked what they liked about Similan2, the participants said that it is more flexible and easier for finding similar patients. ("Very easy to find the similar pattern.", "Similan is more flexible.", "The similarity measure makes it FAR easier to find the best matches.", "Excellent in finding 'at least one' type results when formulating the query is harder [in LifeLines2] due to ambiguity.") They also said that it is easier to specify the time constraints in a query and that specifying what the answers should look like makes the search process more transparent. ("Query with time constraint is very easy.", "Time constraint searches are easier to input.", "the search process is more transparent.", "Drag and drop triangles gave me better control of how the specific sequences should look.") However, when asked what they did not like about Similan2, the participants expressed difficulty in using it for the counting tasks, because it is difficult to distinguish between the exact match results and the similar results.
("No easy way to count", "not sure [whether] the top rows are the only answers") Also, sometimes it is unclear where to place the events on the timeline. ("In Similan2, it is not immediately obvious where to place the icon for the 'second day'.") Two participants also mentioned that similarity search responded slightly more slowly than exact match. Common suggestions for improvement included: "LifeLines2 should have a list of previous actions and undo button", "A counter in Similan for all patients that had a match of X score or better could be helpful.", "Have individual weight for events in the query in Similan, so the users can specify if some events are more important than others.", and "Have more weight presets to choose from."

4.6 Lessons Learned and Ideas for a Hybrid Interface
The experiment showed that the exact match interface had advantages in finding exact results. Users preferred it over the similarity search for finding a simple sequence without a time constraint or uncertainty. The exact match interface also gave users more confidence in tasks that involved counting. However, users felt that it was more complex to use for tasks with time constraints or uncertainty (probably because it required many steps to add each constraint to construct a query). On the other hand, the similarity search interface had advantages in the flexibility and intuitiveness of specifying the query for tasks with time constraints or uncertainty, or tasks that ask for records similar to a given record. Users felt that it is easier to specify the time constraints in a query and that specifying how the answers should look makes the search process more transparent, because they could see the big picture of their query. However, the similarity search interface was more difficult for tasks that involve counting. The participants requested a better way to support counting tasks. The exact match and similarity search interfaces each have their advantages.
How can I combine the best features from these two interfaces to create a new interface that can support tasks with uncertainty and time constraints as well as simpler tasks and counting tasks? Based on the results of the experiment and my observations during the longitudinal study with my partners, I list several ideas for hybrid query interfaces that should be explored in the future:

1. Draw an example. Specifying the query by placing event glyphs on a timeline seems closer to users' natural problem-solving strategy, and the visual representation of the query also helps users compare results with the query to notice and correct errors.

2. Sort results by similarity to the query, but do not return all records; allow users to see more if necessary. Showing all records, even those that do not fit the query, confuses users and reduces confidence in the results. However, users may want to see more results at certain times. One possible strategy is to show only exact results first (i.e., like exact match) and have a "more" button to show the rest or the next n records. Another strategy is to add a borderline that separates the exact results from the near matches. This may increase confidence and still be useful for exploratory search.

3. Allow users to specify what is flexible and what is not. Even in a query with uncertainty, users may have some parts of the query that they are certain about, e.g., patients must be admitted to the ICU (i.e., do not even bother showing records with no ICU event). These certain rules can be applied strictly to narrow down the result set without sacrificing the flexibility of the other parts of the query.

4. Weights. Whether users would be able to translate more complex data analysis goals into proper weight settings remains an open issue. One idea to avoid manual weight adjustment is to provide presets of weights that capture common definitions of similarity.

5. Avoid too many alternative ways to perform the same task.
This can lead to confusion. In the experiment, I found that many users used more filters than necessary.

4.7 Summary
Event sequence data are continuously being gathered by various organizations. Querying for time-stamped event sequences in those data sets to answer questions or look for patterns is a very common and important activity. Most existing temporal query GUIs are exact match interfaces, which return only records that match the query. However, in exploratory search, users are often uncertain about what they are looking for. Queries that are too narrow may eliminate results that are on the borderline of the constraints. On the other hand, the similarity search interface allows users to sketch an example of what they are seeking and find similar results, which provides more flexibility. In the beginning, I introduced the M&M measure, a similarity measure for event sequences. Briefly, the M&M measure is a combination of time differences between events and the number of missing and extra events. The M&M measure was developed alongside Similan, an interactive tool that allows users to search for event sequence records that are similar to a specified target record and provides a search results visualization. A user study showed a promising direction but also identified room for improvement. To address the limitations of the first version, I developed Similan2, an interface that allows users to draw an event sequence example and search for records that are similar to the example, with an improved search results visualization. The M&M measure was also refined. The M&M measure v.2 is faster and can be customized by four decision criteria, increasing its flexibility. Finally, I conducted a controlled experiment that assessed the benefits of the exact match and similarity search interfaces for five tasks, leading to future directions for improving event sequence query interfaces that combine the benefits of both interfaces.
These hybrid ideas were explored and led to the Flexible Temporal Search (FTS) in Chapter 5.

Chapter 5
Combining Exact Match and Similarity Search for Querying Event Sequences: Flexible Temporal Search

5.1 Introduction
My research aims to support users in searching for event sequences when they are uncertain about what they are looking for. For example, a physician wants to find the records that have the event Surgery followed by the event Die within approximately 2 days. The value 2 days is just an approximation. Many methods are exact match, which does not leave much room for flexibility. The exact match creates a clear cut-off point for records that pass the query and excludes records that do not pass from the result set. A query that is too narrow (e.g., Surgery → 2 days → Die) can inadvertently exclude records that might be slightly beyond the cut-off point (2.5 days instead of 2 days). In a situation where the users are not 100% confident about the query, the exact match prevents them from seeing, beyond the cut-off point, what they might have missed. Trying to overcome the limitations of the exact match, many researchers have developed similarity search for event sequences. These methods calculate the similarity score between each record and the query and return all records sorted by their similarity scores. The result set can then tell the users which records are more or less similar to the query. However, I learned from the user study in the previous chapter that similarity search also has its limitations. Without a clear cut-off point, users could not easily or confidently count how many records are acceptable results. It is more suitable for finding the records most similar to the query. These lessons guided me to set design goals for a new hybrid user interface.

1. Users must be able to see a clear cut-off point between records that pass the query and records that do not.

2.
Users must be able to see records that are beyond the cut-off point, sorted by similarity to the query.

Following these goals, I designed a new hybrid interface called Flexible Temporal Search (FTS), which combines the precision of the exact match and the flexibility of the similarity search. In this chapter, I will explain FTS in more detail, starting from an explanation of the design of the query in Section 5.2, an algorithm for similarity score computation in Section 5.3, and a user interface in Section 5.4.

5.2 Query
5.2.1 How to define a cut-off point? Mandatory & Optional Flags
The FTS result set is split into two bins: exact match results and other results. Records that are within the cut-off point are put in the exact match results, while records that are beyond the cut-off point are put in the other results. The big question here is how the interface could know where the cut-off points should be. The answer lies in the queries. Even in a query with uncertainty, users may have some parts of the query that they are certain about, e.g., patients must be in the ICU. If users can specify which parts of the query they are certain about, the cut-off points can be derived from the certain parts of the query. Using this concept, I introduce the mandatory and optional flags that let users define which parts of the query are flexible and which are not.

1. Mandatory: When users are certain about some constraints, they specify them as mandatory, so any record that violates any of these constraints will be excluded from the exact match results and put in the other results.

2. Optional: When users are flexible about some constraints, they can specify them as optional. Records that pass these constraints will receive bonus similarity scores, but records that violate these constraints are not excluded from the exact match results. Records with higher similarity scores are displayed higher in the result set.
Therefore, optional constraints allow users to specify what they prefer to see at the top of the list. Compared to the existing interfaces: in exact match (such as LifeLines2), all events are mandatory, while in similarity search (such as Similan), all events are optional. Figure 5.1 guides how to decide whether a constraint is mandatory or optional. In summary, users should use mandatory as much as they can, and use optional to specify what they prefer to see at the top of the list.

Figure 5.1: How to decide whether a constraint is mandatory or optional. (Will you accept a record that does not pass this constraint? No: mandatory. Yes, but show records that pass the constraint higher in the results: optional.)

5.2.2 Specification
Users can specify queries in terms of events, negations and gaps between consecutive events. For simplicity and performance, FTS does not support a query that has events which occurred at the same time.

1. Event: Users add an event that they want and indicate whether it is mandatory or optional.

(a) Mandatory (X): Event X must occur.
A → B → C
Find records with pattern A followed by B and C. For example, find all patients who arrived (A), were transferred to the ED (B) and were discharged (C).

(b) Optional (x): Event X should occur, but I can tolerate it if it does not.
A → b → C
Find records with pattern A followed by b and C, where b is preferred but not required. For example, find all patients who were transferred from the ICU (A) to Floor (C). Being transferred to the IMC (b) in between is preferred, but it is acceptable if the patients were transferred to Floor without passing through the IMC first.

For each event, users can also specify when they want it to occur. If the time constraint is not specified, the query will only look for the existence of the event and ignore its time. The time constraint can be mandatory or optional as well.

(a) Mandatory (X_t): Event X must occur within time t.
A_{May 2011} →
B
Find records with pattern A followed by B, where A must occur in May 2011. For example, find all patients who had surgeries (A) in May 2011 and died (B).

(b) Optional (X_~t): Event X should occur near time t.
A_{~Dec 25, 1642} → B_{~Mar 20, 1727}
Find records with pattern A followed by B. A should occur near Dec 25, 1642, while B should occur near Mar 20, 1727. For example, a history student wants to find famous scientists who were born (A) and died (B) closest to Sir Isaac Newton (Dec 25, 1642 - Mar 20, 1727). In this case, the user is certain that the two events (born and died) must occur, so the events are mandatory. However, the user is not certain about the timing. He only wants them to be as close to Newton's as possible and does not have any time limit, so he sets the timing as optional.

2. Negation or Not-occur: Users add a negation for an event that they do not want and indicate whether it is mandatory or optional.

(a) Mandatory (¬X): Event X must not occur.
A → ¬B → C
Find records with pattern A followed by C without B in between. For example, find all patients who arrived (A) and were discharged (C) without being in the ICU (B).

(b) Optional (¬x): Event X should not occur, but I can tolerate it if it does.
A → ¬b → C
Find records with pattern A followed by C. It is preferable not to have b between A and C, but it is not prohibited. For example, find all patients who were on the Floor (A) and were discharged (C). It is preferable to avoid any additional transfer to another floor (b), i.e., I want to see patients who exited without being transferred to another floor at the top, and if there were any patients who were transferred to another floor, they can be included but shown at the bottom of the list.

3. Gap: Users can specify the gap between consecutive events. Each gap consists of a minimum duration, a maximum duration, or both. The gap can also be mandatory or optional.

(a) Mandatory: The gap must be within the specified range.
A → (0, 10 mins] →
B
Find records with pattern A followed by B within ten minutes. For example, a coach wants to know how many times this season his team conceded a goal (B) within ten minutes after scoring (A).

(b) Optional: The gap should be close to the specified range.
A → [10 mins] → B → [10 mins] → C
Find records with pattern A followed by B in approximately ten minutes, followed by C in approximately ten minutes. For example, a commentator is narrating a soccer match in which the home team has scored (A), the opponent has retaliated by scoring (B) within ten minutes, and the home team has just scored again (C) after ten minutes. He then wants to find another match in the past that is most similar to his current match. The time gaps in this example are not mandatory because the user does not have a clear range of what he expects in the results and will accept all records no matter how long the gaps are. He just wants to use these gaps to find the most similar matches. This is unlike the previous example, where the coach knows exactly what he is looking for.

5.2.3 Grade & Similarity Score
When the search is executed, FTS compares each record to the query and calculates its grade and similarity score. The grades are either pass or fail, allowing the interface to group records into the exact match and other results, respectively. Records that violate any of the mandatory constraints fail; the rest pass. The similarity score is computed from 4 types of difference.

1. Missing Events: Events that are in the query but do not exist in the record.

2. Extra Events: Events that are not specified in the query but appear in the record. This can be broken down into three sub-criteria if the users choose to:
(a) Extra events before the first match
(b) Extra events between the first match and the last match, in other words, interrupting the query pattern
(c) Extra events after the last match

3. Negation Violations: Events that are listed as negations but appear in the record.

4.
Time Difference: Deviation from the time constraints.

Users can set the weight for each type of difference. The total similarity score is computed from a weighted sum of these criteria (see Section 5.3.4).

5.3 FTS Similarity Measure
5.3.1 Preprocessing
During preprocessing, a query is cut into blocks ([X]) that can be matched with the events in each record.

1. Events and negations are automatically cut into blocks, e.g., A → ¬B → C is converted into [A] → [¬B] → [C].

2. Each gap is merged with the following event, e.g., A → B → (0, 2 days) → C is converted into [A] → [B] → [(0, 2 days) → C].

3. Consecutive negations are merged into special blocks when their order is not taken into account, e.g., A → ¬B → ¬C is converted into [A] → [¬B|¬C]#1 → [¬B|¬C]#2. When B is already matched in the first block, the algorithm will remember this and not match B in the second block.

After the preprocessing is completed, the computation consists of two steps: matching and scoring.

5.3.2 Matching
I use a typical dynamic programming approach [4] to find the optimal matching between the blocks in the query Q = b1, b2, b3, ..., bm and the events in each record R = e1, e2, ..., en. Given the query Q and record R, I use s(i, j) to denote the maximum similarity vector of matching the first i blocks of the query Q and the first j events of the record R. With this definition, the maximum similarity vector of matching query Q and record R is s(m, n). The base conditions and recurrence relation for the value s(i, j) are

s(0, 0) = empty vector
s(i, 0) = s(i-1, 0) + skip(bi)
s(0, j) = s(0, j-1) + skip(ej)
s(i, j) = max { s(i-1, j) + skip(bi),
                s(i, j-1) + skip(ej),
                s(i-1, j-1) + match(bi, ej) }          (5.1)

where skip(bi) and skip(ej) are the costs of skipping block bi and event ej, and match(bi, ej) is the cost of matching block bi and event ej. A match is possible only when the block and the event have the same event type.
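To make the recurrence concrete, here is a minimal runnable sketch, not the dissertation's implementation: it reduces the similarity vector to a pair (matched events, negated extra events) and lets Python's lexicographic tuple comparison play the role of the ordered vector comparison; negations, time constraints and optional flags are omitted.

```python
def match_query(blocks, events):
    """Simplified FTS matching. `blocks` and `events` are lists of
    event-type strings. Returns the best vector (matched, -extra),
    maximized lexicographically like the full similarity vector."""
    m, n = len(blocks), len(events)
    # s[i][j]: best vector for the first i blocks vs the first j events
    s = [[(0, 0)] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        s[0][j] = (0, -j)              # skipped record events are extras
    for i in range(1, m + 1):
        s[i][0] = s[i - 1][0]          # skipping a block adds no change
        for j in range(1, n + 1):
            skip_block = s[i - 1][j]
            skip_event = (s[i][j - 1][0], s[i][j - 1][1] - 1)
            best = max(skip_block, skip_event)
            if blocks[i - 1] == events[j - 1]:   # same event type: match
                matched, extra = s[i - 1][j - 1]
                best = max(best, (matched + 1, extra))
            s[i][j] = best
    return s[m][n]

# Query A -> B -> C against record A, X, B, C: 3 matches, 1 extra event.
print(match_query(["A", "B", "C"], ["A", "X", "B", "C"]))  # (3, -1)
```

Because the alignment is monotonic, a swapped record such as B, A can match only one of the two query events A → B; the other is counted as extra, which is the no-swap behavior noted in Section 5.3.6.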
To compare two similarity vectors s(i, j) for matching, the values within both vectors are compared in the following order. If the values are equal, the next values are used. Otherwise, the rest of the values are ignored.

1. Number of matched events (mandatory): The higher the number, the more similar the vectors are.

2. Number of matched events (optional): The higher the number, the more similar the vectors are.

3. Number of negations violated (optional): In reality, a lower number should mean more similar. However, if it were like that, the algorithm would always skip the negations, so I reverse the comparison just for matching to trick the algorithm.

4. Number of negations violated (mandatory): The lower the number, the more similar the vectors are, but the comparison is reversed for the same reason as above.

5. Number of time constraints (gap & time of occurrence) violated (mandatory): The lower the number, the more similar the vectors are.

6. Time difference: The smaller the difference, the more similar the vectors are.

7. Number of extra events: The lower the number, the more similar the vectors are. If the users select to treat extra events separately, these three values will be used instead, sorted by the weights that the users set for these sub-criteria:
(a) number of extra events before the first match
(b) number of extra events between the first and last match
(c) number of extra events after the last match

Skipping a block (skip(bi)) in the query does not add any change to the vector, while skipping events in the record (skip(ej)) increases the number of extra events. Matching (match(bi, ej)) may increase the number of matched events (if the block is an event) or negations violated (if the block is a negation), incur a time difference, or violate time constraints (if the block has any time constraint). For a block with a gap, if the previous block is not matched, the gap is ignored. Algorithm 3 describes the FTS matching in more detail.
Algorithm 3 FTS Matching
1: s(0, 0) = new SimilarityVector();
2: for i = 1 to m do
3:   s(i, 0) = s(i-1, 0).clone();
4: end for
5: for j = 1 to n do
6:   s(0, j) = s(0, j-1).clone().incrementExtraEvent();
7: end for
8: for i = 1 to m do
9:   for j = 1 to n do
10:    {// Compare between skipping bi and skipping ej.}
11:    s(i, j) = max( s(i-1, j).clone(), s(i, j-1).clone().incrementExtraEvent() );
12:    if bi.eventType == ej.eventType then
13:      sm = s(i-1, j-1).clone(); {// similarity vector for matching bi and ej.}
14:      if bi.isNegation() then
15:        if bi.isMandatory() then
16:          sm.incrementViolatedNegation();
17:        else
18:          sm.incrementViolatedOptionalNegation();
19:        end if
20:      else
21:        if bi.isMandatory() then
22:          sm.incrementMatchedEvent();
23:        else
24:          sm.incrementMatchedOptionalEvent();
25:        end if
26:        if bi.hasTime() then
27:          diff = bi.getTime().diff(ej);
28:          sm.addTimeDifference(diff);
29:          if diff > 0 and bi.getTime().isMandatory() then
30:            sm.isTimeViolated = true;
31:          end if
32:        end if
33:        if i > 1 and bi.hasTimeGap() then
34:          if s(i-1, j-1).hasMatched(b(i-1)) then
35:            gap = ej.time - previousMatchedEvent.time;
36:            diff = bi.getTimeGap().diff(gap);
37:            sm.addTimeDifference(diff);
38:            if diff > 0 and bi.getTimeGap().isMandatory() then
39:              sm.isTimeViolated = true;
40:            end if
41:          end if
42:        end if
43:      end if
44:      s(i, j) = max( s(i, j), sm );
45:    end if
46:  end for
47: end for

Algorithm 4 FTS Grading
1: if s.mandatoryEventMatchedCount() < totalMandatoryEventsCount then
2:   return false;
3: else if s.mandatoryNegationViolatedCount() > 0 then
4:   return false;
5: else if s.isTimeViolated == true then
6:   return false;
7: else
8:   return true;
9: end if

5.3.3 Grading
After the matching is completed, the similarity vector of each record is converted into a boolean flag, passExactMatch, indicating whether the record is included in the exact match or the other results.
If any of the mandatory events is not matched, any of the mandatory negations is matched, or any mandatory time constraint is violated, the flag is set to false. Otherwise, the flag is set to true (Algorithm 4).

5.3.4 Scoring
The similarity vector is also converted into a numerical similarity score. Users can set weights (0-100) for optional events, optional negations, time difference and extra events (and the three sub-criteria of extra events if necessary) to customize the conversion. The weights for mandatory events and mandatory negations are fixed at 100. Before calculating the score, the weights are normalized (w' = 100 · w / Σw) to make the total sum equal to 100 (Σw' = 100). Then the total similarity score can be computed from the equation below (# = number of). To avoid division by zero, I used a custom function called sDivide (Algorithm 5) instead of standard division. If a denominator is zero, this function returns a specified default value (0 or 1). This means that if any type of constraint is not set in the query, each record receives the maximum score (w' × 1) for that type of constraint.

score = w'_mandatory_events × sDivide(# matched mandatory events / # mandatory events in query, 1)
      + w'_optional_events × sDivide(# matched optional events / # optional events in query, 1)
      + w'_mandatory_negations × (1 − sDivide(# mandatory negations violated / # mandatory negations in query, 0))
      + w'_optional_negations × (1 − sDivide(# optional negations violated / # optional negations in query, 0))
      + w'_time_difference × (1 − sDivide(time difference / maximum time difference, 0))
      + w'_extra_events × (1 − sDivide(# extra events / maximum # extra events, 0))        (5.2)

Algorithm 5 function sDivide(dividend/divisor, defaultValue)
1: if divisor == 0 then
2:   return defaultValue;
3: else
4:   return dividend / divisor;
5: end if

If the three sub-criteria of extra events are used, the last line of the equation above is replaced with the three lines of the sub-criteria instead. The final similarity score is within the range 0 to 100.
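The scoring equation and sDivide translate directly into a short sketch. The dictionary keys below (`mand_events`, `matched_mand`, etc.) are illustrative names of my own, not identifiers from the actual software.

```python
def s_divide(dividend, divisor, default):
    """Safe division: returns `default` (0 or 1) when the divisor is
    zero, so an unused constraint type earns its maximum share."""
    return default if divisor == 0 else dividend / divisor

def similarity_score(weights, d):
    """Sketch of the scoring equation. `weights` are raw 0-100 weights
    per criterion (mandatory ones fixed at 100 by the interface); `d`
    holds the counts and time differences measured for one record."""
    total = sum(weights.values())
    w = {k: 100.0 * v / total for k, v in weights.items()}  # normalize
    return (
        w["mand_events"] * s_divide(d["matched_mand"], d["query_mand"], 1)
        + w["opt_events"] * s_divide(d["matched_opt"], d["query_opt"], 1)
        + w["mand_neg"] * (1 - s_divide(d["viol_mand_neg"], d["query_mand_neg"], 0))
        + w["opt_neg"] * (1 - s_divide(d["viol_opt_neg"], d["query_opt_neg"], 0))
        + w["time"] * (1 - s_divide(d["time_diff"], d["max_time_diff"], 0))
        + w["extra"] * (1 - s_divide(d["extra"], d["max_extra"], 0))
    )

weights = {"mand_events": 100, "opt_events": 50, "mand_neg": 100,
           "opt_neg": 50, "time": 50, "extra": 50}
perfect = {"matched_mand": 2, "query_mand": 2, "matched_opt": 0,
           "query_opt": 0, "viol_mand_neg": 0, "query_mand_neg": 0,
           "viol_opt_neg": 0, "query_opt_neg": 0, "time_diff": 0,
           "max_time_diff": 0, "extra": 0, "max_extra": 0}
print(similarity_score(weights, perfect))  # 100.0: every criterion satisfied
```

Note how the query above uses only mandatory events: thanks to sDivide's defaults, the unused optional, negation, time and extra criteria still contribute their full normalized weight, so a record matching both mandatory events scores exactly 100.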
5.3.5 Performance

The time complexity of matching a query against one record is O(mn), where m is the number of blocks in the query and n is the number of events in the compared record. Therefore, the time complexity for a search over the dataset is O(mN), where N is the total number of events in the dataset.

5.3.6 Differences from the M&M Measure

The FTS measure has evolved from the M&M measure (Section 4.4.2). In summary, there are several differences. (1) FTS also provides a grade (pass/fail) while the M&M does not. (2) FTS can support mandatory and optional constraints. (3) FTS handles negations. (4) FTS does not handle swaps. This change is due to the matching: in FTS, all events from all event types are matched at the same time, while in the M&M, events are matched separately for each event type. A matching from dynamic programming must be monotonic and, therefore, cannot include swaps. (5) The FTS score ranges from 0.00 to 100.0 while the M&M score ranges from 0.00 to 0.99.

5.4 User Interface

I have developed a hybrid user interface for querying event sequences with FTS and integrated these features into the LifeFlow software (Figure 5.2). The user interface consists of three main parts: control, query, and result panels. The control panel, inherited from LifeLines2 [131], allows users to align the records by sentinel events, if necessary. Users can specify a query on the query panel and the search results will appear in the result panel. The following sections (5.4.1, 5.4.2 and 5.4.3) describe the visual representation of an FTS query, query specification, and search results.

Figure 5.2: Flexible Temporal Search (FTS) implemented in the LifeFlow software: Users can draw a query and retrieve results that are split into two bins: exact match and other results. In this example, a user is querying for bounce back patients, which are patients who arrived (blue), were later moved to the ICU (red), transferred to Floor (green), and transferred back to the ICU (red) within two days.
The results panel displays all bounce back patients in the exact match results while showing the rest in the other results, sorted by their similarity to the query. The top patient in the other results has a pattern very similar to bounce back, but the return time to the ICU was 4 days. The physician can then notice this patient and use his/her judgment to decide whether to consider this case as a bounce back patient or not.

Figure 5.3: Query Representation: (A) Triangles are mandatory events. (B) A mandatory negation (red) is a triangle with a strike mark through the center placed in a balloon. (C) An event that has a time constraint (blue) has a path drawn from its glyph in the query strip to the position on the timeline that represents the beginning of its time constraint. The duration of the time constraint is rendered as a rectangle on the timeline. A solid line and filled rectangle represent a mandatory constraint. (D) Query C with tooltip. (E) A circle is an optional event (green). (F) An optional negation is a mandatory negation that uses a circle instead of a triangle. (G) An optional version of a time constraint in Query C. A dashed line and hollow rectangle are used instead. (H) Query G with tooltip in more detail.

5.4.1 Query Representation

The FTS query panel is a combination of timeline and comic strip approaches, as inspired by Similan [144] and QueryMarvel [54], respectively. The panel is split into two parts: timeline and query strip. All events and gaps are rendered as glyphs and placed back to back on the query strip (Figure 5.2). The timeline is used to represent time constraints only. This design can eliminate occlusion and help users read the query easily.

Figure 5.4: Gap Representation: The gaps on the left (I, J, K, L) are mandatory while the gaps on the right (M, N, O, P) are optional. Mandatory gaps are filled and use solid lines while optional gaps are not filled and use dashed lines.
Tooltips are also available for all these glyphs to help users interpret them. All specifications in Section 5.2.2 are converted to the following visual representations:

1. Event: All mandatory events are converted into triangles while optional events are converted into circles. Colors of the glyphs represent event types.

2. Time constraint: Each event that has a time constraint will have a path drawn from its glyph in the query strip to the position on the timeline that represents the beginning of its time constraint. Mandatory time constraints are drawn as solid lines while optional time constraints are drawn as dashed lines. The duration of the time constraint is rendered as a rectangle on the timeline. The rectangle is filled when the constraint is mandatory and hollow when it is optional.

3. Negation: Each negation is rendered similar to an event, but with a strike mark through the center. Consecutive negations are grouped and placed in a balloon.

4. Gap: Each gap is represented with a box plot (Figure 5.4). The width of the box plot represents the time using the same scale as the timeline. The box part represents the minimum value and the extended part represents the maximum value. An arrow is added to the end if there is no maximum value: for example, a gap of more than 2 days. Mandatory gaps are filled and use solid lines while optional gaps are not filled and use dashed lines.

Figure 5.5: Users can click on an empty location on the query strip to add an event or a gap to a query. In this example, the user is going to add something between Floor (green) and Discharge-Alive (light blue). The location is marked with a cross and an "Add..." popup dialog appears. Users then choose to add an event or a gap, which will open another dialog.

5.4.2 Query Specification

The user interface supports three ways to specify a query:

1. Create a query from scratch: Users can click on an empty location on the query strip area to add an event or a gap to a query.
Clicking in front of or behind existing events adds an event before or after them, respectively. The clicked location will be marked with a cross and an "Add" dialog will appear (Figure 5.5). At this point, users can choose to add an event or a gap. If it is not possible to add a gap because the clicked location does not have events on both sides, the "Add Gap..." button will be disabled.

(a) Add an event: The "Add Event" dialog contains a few questions about the event. First, do users want this event to occur (event) or not (negation)? Second, is this mandatory or optional? The dialog box provides some explanation to help the user choose. If the users want this event to occur, another question will be asked about timing ("Do you have a time constraint for this event?"). If the answer is no, then it is done. If the answer is yes, users can set the time constraint, which can be a point in time ("occur at") or an interval ("occur within"). Then users have to answer the last question ("Will you accept records that violate this time constraint?"), which will decide whether the time constraint is mandatory ("No") or optional ("Yes"). After answering all questions and clicking on "Add event", the new event is added to the query strip.

(b) Add a gap: The "Add Gap" dialog contains one question that asks whether users will accept records that violate the gap. The answer to this question is used to decide whether the gap is mandatory ("No") or optional ("Yes"). The rest of the dialog contains controls for specifying a time gap. Once users click on "Add gap", the new gap is added to the query strip.

Users can modify or delete an event by clicking on its glyph. This will bring up a "Modify Event" dialog, which is similar to the "Add Event" dialog, but the buttons are changed from "Add event" and "Cancel" to "Apply", "Delete", and "Close".

Figure 5.6: Weights: Users can adjust the importance of each kind of difference.
Users can modify the parameters, click on "Apply" to see the changes take effect and then "Close", or simply click "Delete" to delete the event. The same interactions are used for modifying and editing gaps.

2. Create a query from a record: Instead of creating from scratch, users can right-click on any record in the list and click on "Set as query" to use that record as a template (Figure 5.7). This can be used for searching for similar records as well. All events in the selected record are automatically converted into mandatory events in the query. Users can choose between two options:

(a) Use specific time: Each event uses its original time from the selected record as an optional time constraint.

(b) Use gap between events: All gaps between events in the selected record are converted into optional gaps in the query.

Figure 5.7: Creating a query from a record: In this example, a user chooses the record "10010010" as a template. (left) In "use specific time" mode, each event uses its original time from the selected record as an optional time constraint. (right) In "use gap between events" mode, all gaps between events in the selected record are converted into optional gaps in the query. Users can then modify the query to suit their needs.

3. Create a query from an event sequence in LifeFlow: Users can pick any event sequence in LifeFlow as a starting point by right-clicking and selecting "Set as query" from the context menu (Figure 5.8). All events in the sequence are automatically converted into mandatory events. After that, users can modify the query as needed.

The user interface also includes options to adjust the weights and choose event types to include in the search (Figure 5.6). These weight sliders correspond to the weights in Section 5.3.4.

Figure 5.8: Creating a query from an event sequence in LifeFlow: In this example, a user chooses the sequence Arrival (blue), Emergency (purple), ICU (red), Floor (green) and Discharge-Alive (light blue) as a template.
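The query elements described above (events versus negations, mandatory versus optional constraints, time constraints, and gaps) imply a simple block structure. The following is a hypothetical sketch of that structure; all class and field names are illustrative only, not the actual LifeFlow/FTS implementation.

```python
# A hypothetical sketch of the query-block structure implied by the query
# representation above. Names are illustrative, not the actual implementation.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TimeConstraint:
    start_hours: float            # beginning of the allowed interval
    end_hours: float              # equal to start_hours for "occur at"
    mandatory: bool = True        # solid line / filled rectangle when True

@dataclass
class EventBlock:
    event_type: str               # shown as the glyph's color
    negation: bool = False        # strike mark through the glyph
    mandatory: bool = True        # triangle when True, circle when False
    time: Optional[TimeConstraint] = None

@dataclass
class GapBlock:
    min_hours: float              # box part of the box-plot glyph
    max_hours: Optional[float]    # None = open-ended (arrow at the end)
    mandatory: bool = True        # filled / solid lines when True

# The bounce-back query of Figure 5.2: Arrival (at time 0), ICU, Floor,
# then back to the ICU within two days.
bounce_back = [
    EventBlock('Arrival', time=TimeConstraint(0, 0)),
    EventBlock('ICU'),
    EventBlock('Floor'),
    GapBlock(0, 48),
    EventBlock('ICU'),
]
```

A structure like this maps directly onto the three specification paths: creating from scratch builds the list block by block, while the two template options fill in optional TimeConstraint or GapBlock entries from an existing record.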
5.4.3 Search Results

The results are split into two bins: exact match and other results. Exact match results are comparable to emails in an inbox while other results are comparable to emails in a spam folder. Users can examine the exact match results (inbox) first, then look into the other results (spam folder) when they cannot find something, or want to confirm that they are not missing anything. Each bin is displayed using a list of single-record visualizations, similar to LifeLines2 and Similan. All records within each bin are sorted by their similarity score. The scores are shown in the front section of each row using ValueChart [24]. The length of each section of the colored bars represents the (weighted) value of each of the four criteria in the similarity measure. Users can place the cursor over each record to see a tooltip that explains the similarity score on the left (Figure 5.2). To see a detailed comparison, users can right-click and select "Show comparison" to open a comparison window that explains whether an event is matched, extra, or missing (Figure 5.9).

Figure 5.9: Comparison: Four events in the query are matched while one is missing (light blue) and another one is extra (black).

5.5 Use case scenario

As mentioned earlier in Section 4.1.1, the bounce back study was the original motivation of my interest in event sequence query. Bounce back patients are patients whose level of care was decreased then increased again urgently. The common cases are patients who were transferred from the ICU to the Floor (normal bed) and then back to the ICU within a certain number of hours. Therefore, I revisit this scenario to demonstrate how the FTS interface can support this task and provide the benefits of both the exact match and similarity search interfaces. Figure 5.2 shows a screenshot of a bounce back query: Arrival (at time 0) → ICU → Floor → (within 2 days) → ICU. The search results return 5 patients as exact match results and 95 patients as other results.
The exact match results can be counted quickly and confidently, similar to using an exact match interface and better than when using a similarity search interface. Looking at the other results shows what is right beyond the cut-off point, which is not possible when using the exact match interface. The top patient in the other results has a pattern very similar to bounce back, but the return time to the ICU was 4 days. While examining the results, the physician can easily notice this patient and use his/her judgment to decide whether to consider this case as a bounce back patient or not. If an exact match interface were used to query this dataset, this patient might have been left unnoticed.

5.6 Summary

Exact match and similarity search are the two extremes in querying event sequences. Both have their own advantages and limitations. In this chapter, I explain the design of a hybrid interface, Flexible Temporal Search (FTS), that combines the benefits of both interfaces. FTS allows users to see a clear cut-off point between records that pass the query and records that do not, making it easier to count and giving more confidence. All records are sorted by similarity, allowing users to see the "just-off" cases that are right beyond the cut-off point. In order to do this, I designed a new similarity measure which includes, for each constraint, information on whether it is mandatory or optional. The mandatory constraints are used to decide whether a record is an exact match or not. The optional constraints are used as bonus points for ranking results, but are not used for excluding a record from exact match results. The FTS similarity measure also supports negations, times of occurrence, and time gaps between consecutive events. In addition, I have developed a user interface that allows users to specify an FTS query by placing events on a timeline. Users can start from an empty timeline, or select one record or an event sequence from LifeFlow as a template.
The user interface also supports search result interpretation and provides a detailed comparison between the query and each record. Finally, I have illustrated an example use case scenario to demonstrate the usefulness of FTS. More applications of FTS are included in the case studies in Chapter 6.

Chapter 6
Multi-dimensional In-depth Long-term Case Studies

6.1 Overview

Usability testing and controlled experiments are commonly used to evaluate user interfaces. These approaches have limitations due to their short time limit. It is difficult to learn how users will use a tool to solve more difficult problems, or to test advanced features that require understanding of more complex concepts, with participants who have known the user interface for only 1-2 hours. In this chapter, I describe several case studies that were conducted following the multi-dimensional in-depth long-term case studies (MILCs) approach. I recruited participants who expressed interest in using LifeFlow to analyze their data, learned about their problems, then facilitated the initial phase, such as data cleaning and conversion, to help them start. After that, I provided necessary support and periodically collected feedback at the users' convenience. In some cases, we scheduled meetings (in person or sometimes virtually via Skype) to look at the data together. In other cases, the participants analyzed their data independently and later provided feedback through interviews or emails. The early participants were also involved in an iterative design of the interface, where I revised many designs according to their feedback. Many useful features of the software were added from these early studies.
All stories, analyses, findings, and feedback are reported in this chapter. Table 6.1 highlights results from all case studies.

Table 6.1: Summary of all case studies

Section | Domain | Data Size (records) | Duration (of the study) | Highlighted Results
6.2 | Medical | 7,041 | 7 months | Found patients reported dead before being transferred to the ICU; suggested potential use for long-term monitoring of the ED and quality control.
6.3 | Transportation | 203,214 | 3 months | Found incidents that were reported to be longer than 100 years; comparison between traffic agencies showed that the data were reported in inconsistent ways and could not provide a meaningful comparison.
6.4 | Medical | 20,000 | 6 months | Analyzed drug prescribing patterns and drug switching; detected patients with possible drug holidays; led to a spin-off project to support events with intervals.
6.5 | Medical | 60,041 | 1 year | Explored patients' data from a managerial perspective in several aspects: admission, visit, mortality, diagnoses, etc.; found that most dead patients died after admission instead of dying in the ED.
6.6 | Web logs | 7,022 | 6 weeks | Studied how visitors read children's books online; discovered that many users also accessed the online books in a backward direction.
6.7 | Activity logs | 60 | 5 months | Analyzed activity logs of two user interfaces; highlighted the different usage characteristics between the two user interfaces; found activities that are often used together.
6.8 | Logistics | 821 | 6 weeks | Tracked cement trucks using automatic sensors and analyzed delivery patterns; pointed out inaccurate sensory input; detected a plant that took longer delivery time than others.
6.9 | Sports | 61 | 5 weeks | Picked interesting soccer matches; reported interesting facts; found an unknown defender who scored against the champions twice. Those two goals were his only two goals in three years.
In addition to the case study results, I have derived a process model and a set of design recommendations for exploring event sequences from usage statistics, case study results, and observations of user behaviors. The process model and design guidelines are described in Sections 6.11 and 6.12, respectively.

6.2 Monitoring Hospital Patient Transfers

6.2.1 Introduction

As mentioned earlier in Section 3.2, the design of LifeFlow was motivated by this patient transfer case study with Dr. Phuong Ho, a practicing physician in the emergency department at Washington Hospital Center. Dr. Ho was interested in analyzing sequences of patient transfers between departments for quality assurance. As I developed LifeFlow, I continued to work with Dr. Ho to analyze patient transfer data in more detail.

6.2.2 Procedure

I had 1-2 hour meetings with him tentatively every two weeks for seven months. Sometimes he visited the Human-Computer Interaction Lab (HCIL). Sometimes I visited him at the hospital. Before each meeting he provided me with the data that he wanted to analyze, and a few initial questions. I converted the data, and during the meeting we sat down and looked at the data together. After discussing the questions sent in advance, he would come up with additional questions and give feedback about the user interface, thereby closely guiding the development.

6.2.3 Data

The dataset in this study includes 7,041 patients who came to the ER in January 2010 and their 35,398 hospital events. Each record contains room assignments, the time that the patient was assigned to each room, and how he/she was discharged from the hospital: dead, alive, leave without being seen (LWBS), or absence without leave (AWOL). I preprocessed the data by grouping the room numbers (e.g. ER15-P, 2G06-P) into types of room:

1. ER: Emergency Room, a section of a hospital intended to provide treatment for victims of sudden illness or trauma
2. ICU: Intensive Care Unit, a hospital unit in which patients requiring close monitoring and intensive care are kept
3. IMC: Intermediate Medical Care, a level of medical care in a hospital that is intermediate between ICU and Floor
4. Floor: a hospital ward where patients receive normal care

6.2.4 Analysis

6.2.4.1 First impression

The first time I showed LifeFlow to Dr. Ho using patient transfer data, he was immediately enthusiastic and confident that the tool would be useful for looking at all patients who came to the hospital and, in particular, the emergency room (ER). He knew that many people would want to see the typical flow of the patients and the transfer time between rooms. In another meeting, I received additional feedback from Dr. Mark Smith, the director of the Emergency Department. Finding the bounce back patients visually in the display elicited comments such as "Oh! This is very cool!" and led to a discussion of the possibilities of using this tool to analyze hospital data in the long run.

Figure 6.1: Patients who visited the Emergency Room in January 2010.

6.2.4.2 Understanding the big picture

In a meeting, I loaded the data into LifeFlow and asked Dr. Ho to review the flow of patients in the hospital (Figure 6.1). The first thing that he noticed was the most common pattern, Arrival → ER → Discharge-Alive. 4,591 (65.20%) of the patients were not admitted to the hospital (discharged directly from the ER). This is regular and consistent with what he had expected, because most of the patients who visited the ER were not in severe condition and could leave immediately after they received their treatment, so I removed these 4,591 patients from the visualization to analyze other patterns. The second most common pattern, Arrival → ER → Floor → Discharge-Alive (1,016 patients, 14.43%), now became more obvious. We decided to remove it too because it was also regular.
We followed the same strategy, selecting regular common sequences and removing them from the visualization to detect the uncommon cases that might be irregular. Dr. Ho noticed that 193 patients (2.74%) left without being seen while 38 patients (0.54%) were absent without leave. These two numbers could be compared with the hospital standard for quality control. Then he saw two patterns that he was interested in (Arrival → ER → Floor → IMC and Arrival → ER → Floor → ICU). These patterns correspond to another quality control metric called step ups, which occur when patients are admitted to a lower level of care (Floor) but later transferred to a higher level of care (IMC or ICU). Dr. Ho could quickly see from the visualization that the patients were transferred from Floor to ICU faster than from Floor to IMC on average, so he used the tooltip to see the distribution. He captured screenshots to compare with practices reported in the research literature, but also commented that the average time seemed quite good from his knowledge.

Figure 6.2: Six patients were transferred from ICU (red) to Floor (green) and back to ICU (bounce backs). The average transfer time back to the ICU was six days. The distribution shows that one patient was transferred back in less than a day.

I also demonstrated the alignment and used it to analyze the transfer flow before and after the patients were admitted to the ICU (Figure 6.2). From the total of 181 ICU patients, 85 (46.96%) were transferred from the ER and 119 (66.75%) were transferred to the Floor after that. However, six patients were transferred back from Floor to ICU (bounce backs). We saw from the distribution that one patient was transferred back in less than a day. Dr. Ho requested to see these six patients in more detail, so I clicked on the bar, which highlighted these patients in the LifeLines2 view, and noted down these patients' IDs. In addition, he also noticed some anomalous sequences, e.g.
a few patient records showed a transfer to the ICU after being discharged dead, pointing to obvious errors in data entry, or at least reflecting possible delays in data entry. Although we did not identify other surprising transfers (which, on the other hand, reflected good quality of care), this still showed that the tool is useful for monitoring the patient transfer flow. I also received additional questions from Dr. Ho after the meeting. Some questions included clear requests for alignments ("I want to see this in LifeFlow. Specifically I want to see a view that shows me where all my [ICU] patients are coming from."), indicating that he could understand what the tool is capable of and how to use it for his task.

6.2.4.3 Measuring the transfer time

Because LifeFlow can easily calculate an average time, Dr. Ho formulated many queries asking about average times, such as "Of patients who came to the ICU from the ER, what was the average time it took for transfer of the patient to the ICU? More specifically, if they went to the ICU, how long did it take from the time they had arrived at the ER to get to the ICU? Same question for IMC..." or "For all the quarters, Jan-Mar 09, Apr-Jun 09, Jul-Sep 09, Oct-Dec 09 and Jan-Mar 10, I want an average time from ER to 2G [which is a specific type of ICU room]."
6.2.4.4 Comparison

Another use of LifeFlow was to compare different data sets by inspecting the difference between two side-by-side LifeFlow visualizations. Dr. Ho had a hypothesis that IMC patients were transferred faster during the day (7am-7pm) than during the night. I opened the same dataset in two windows and filtered the records by time-of-day filtering, making the two windows contain only patients who arrived during the day and during the night, respectively. We inspected the difference between the two visualizations but no significant difference was found. In another case, we compared patients who were admitted to the IMC at least once in four quarters: Jan-Mar, Apr-Jun, Jul-Sep and Oct-Dec 2008. We opened the four datasets in four LifeFlow windows and noticed a difference in the patients who were transferred from the ER to the ICU. In the first, third and fourth quarters, these patients were later transferred to the IMC and Floor, with the majority transferred to the IMC. However, in the second quarter, all such patients were later transferred to the IMC, suggesting further investigation into whether this occurred by chance or for a particular reason.

6.2.5 Discussion

I found that the domain expert (Dr. Ho) was able to understand LifeFlow rapidly and that LifeFlow was useful for providing an overview of the entire data set, and for comparing and measuring the transfer times. Once the data was loaded, he could quickly see the big picture and find anomalies. Dr. Ho expressed that being able to formulate queries easily gave him more time to look at the data and formulate new hypotheses or think about other interesting questions. Although he might have been able to answer some of the questions in SQL, it would be very difficult and error prone. He also mentioned that LifeFlow would be very useful for long-term monitoring and quality control because it provides a quick way to inspect the data from the overview. For example, hospital directors can check the waiting time and find bottlenecks in the process [39].

6.3 Comparing Traffic Agencies

6.3.1 Introduction

To illustrate how LifeFlow is not in any way limited to medical applications, this case study was conducted with the Center for Advanced Transportation Technology Lab (CATT Lab) at the University of Maryland with the help of another computer science PhD student, Mr. John Alexis Guerra Gómez [43]. Vehicle crashes remain the leading cause of death for people between the ages of four and thirty-four. In 2008, approximately six million traffic accidents occurred in the United States.
This resulted in nearly 40,000 deaths, 2.5 million injuries, and losses estimated at $237 billion. While traditional safety and incident analysis has mostly focused on incident attribute data, such as the location and time of the incident, there are other aspects of incident response that are temporal in nature and are more difficult to analyze. LifeFlow was used to examine a dataset from the National Cooperative Highway Research Program (NCHRP) that includes 203,214 traffic incidents from 8 agencies.

6.3.2 Procedure

I attended initial meetings with researchers from the CATT Lab and learned about the dataset and problems. Mr. John Alexis Guerra Gómez then facilitated this study while I was away on an internship at Microsoft Research. During that time, I provided feedback and technical support remotely. The total duration of this study was three months.

6.3.3 Data

This dataset has 203,214 records, which contain 801,196 events in total. Each incident record includes a sequence of incident management events:

1. Incident notification: when the agency is first notified of the incident
2. Incident Arrival: when the emergency team arrives on the scene
3. Lane Clearance: when the lanes are opened, but the incident scene may be not completely cleared
4. Incident cleared, Incident clearance, and Return to normal: all denote the end of incidents. For ease of analysis, all three event types are aggregated into the new event type Return to normal (aggregated).

A typical sequence should start with Incident Notification and finish with Return to normal (aggregated), with the possibility of having Incident Arrival and Lane Clearance in between. In addition, the traffic incident data include two attributes: the agency (represented with a letter from A to H for anonymity) and the type of incident (e.g. Disabled Vehicle, Fatal Accident, etc.)

Figure 6.3: This figure shows 203,214 traffic incidents in LifeFlow. There is a long pattern (more than 100 years long) at the bottom that stands out. I was wondering: if there were incidents that could last more than a hundred years, we probably should not be driving any more. Investigating further, the Incident Arrival times of all those incidents were on January 1st, 1900, a common initial date in computer systems. This suggested that the system might have used this default date when no date was specified for an incident.

6.3.4 Analysis

6.3.4.1 Quantifying data quality issues

After loading the dataset in LifeFlow, it was noticed immediately that Agency B contained 6,712 incidents that were more than 100 years long (Figure 6.3). Further analysis showed that Agency B reported the Incident Arrival of those incidents as January 1st, 1900. Since this date is commonly used as the initial date in computer systems, this suggested that the system the agency used to register this event might have used it as a default value when no date was specified. Considering these incidents as corrupted data, all of them were removed from the dataset. While it was easy to spot this problem, such anomalies can often remain undetected, and skew the results of even the simplest of analyses, such as calculating the mean time to clearance. Similarly, 48 incidents from Agency D that were about 10 months long, in which the Incident Arrival occurred before the Incident Notification, were found and removed. The next thing noticed from the data was that there were many incidents that lasted exactly 24 hours, which seemed unlikely. Using the Align, Rank and Filter operators, we found that those 24-hour-long incidents had Incident Arrival events occurring in the first hour of the day (e.g. 12:30 a.m. April 10, 2009) and Incident Notification events happening in the last hour of the same day (e.g. 11:50 p.m. April 10, 2009).
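Once spotted, checks like these can be automated. The following sketch flags the three kinds of anomaly described here; the field names and duration threshold are illustrative assumptions, not the actual NCHRP schema.

```python
# A sketch of the data-quality checks described above: flag incidents whose
# timestamps use the 1900-01-01 placeholder date, whose clearance precedes
# notification, or whose duration is implausibly long. Field names and the
# threshold are illustrative, not the NCHRP schema.
from datetime import date, datetime, timedelta

PLACEHOLDER = date(1900, 1, 1)
MAX_PLAUSIBLE = timedelta(days=7)   # threshold chosen for illustration

def quality_issue(notification: datetime, cleared: datetime):
    """Return a description of the suspected data problem, or None."""
    if notification.date() == PLACEHOLDER or cleared.date() == PLACEHOLDER:
        return 'placeholder 1900-01-01 date'
    if cleared < notification:
        return 'cleared before notification'
    if cleared - notification > MAX_PLAUSIBLE:
        return 'implausibly long incident'
    return None
```

For example, an incident notified at 11:50 p.m. and marked cleared at 12:30 a.m. on the same calendar day would be flagged as 'cleared before notification', matching the day-entry errors described here.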
This observation seemed to suggest that there were data entry problems with those incidents, indicating that the operator failed to|or was not able to|record the correct date of an event (e.g. 12:30 a.m. Apr 11, 2009 as opposed to 12:30 a.m. Apr 10, 2009). Similar errors were discovered for paths that are about 12 hours long, in which case the errors seem to be problems choosing between AM and PM in the date. Those anomalies were found quite easily by Mr. Guerra G omez, the computer scientist developer, who had no experience in transportation data. Finding such errors using traditional tools like SQL or manual analysis can 160 Figure 6.4: LifeFlow with tra c incidents data: The incidents are separated by agen- cies (A-G). Only Incident Notification and Return to normal (aggregated) events are shown. Other events are hidden. The agencies are sorted by a sim- ple measure of agency performance (average time from the beginning to the end). Agency C seems to be the fastest to clear its incidents, followed by E, A, H, D, F, B and nally G. be very di cult and time consuming, and requires experienced analysts who would suspect the existence of such errors. 6.3.4.2 Ranking the agencies? performance After cleaning the data, we used the time from when the agencies were noti ed until the nal clearance of the incidents as a performance measure. The time when the agency was noti ed can be indicated by the Incident Notification event. In order to compare the agencies performance, we rst removed the inconsistent data (incidents that do not start with Incident Notification), which could be 161 performed easily using the equal height overview feature. After the steps above, the visualization of the data can be seen in Figure 6.4. Incidents are grouped by agencies. 
We showed only two event types, Incident Notification (green) and Return to Normal (Aggregated) (blue), so the horizontal length of each agency's path represents the average time from incident notification to final clearance, which reflects the performance measure for that agency. We then sorted the agencies according to the length of their paths, resulting in the fastest agency (shortest path) at the top and the slowest agency (longest path) at the bottom. From Figure 6.4, we could see that Agency C was the fastest agency to clear its incidents, taking about 5 minutes on average, while the slowest was Agency G, with an average of about 2 hours and 27 minutes.

To investigate Agency C's data more deeply, we removed the data from the other agencies and looked into the different incident types reported (Figure 6.5). Most of the incidents that Agency C reported were Disabled Vehicles, which had about 1 minute clearance time on average. Looking at the event distribution, we also found that a large number of the incidents reported Clearance immediately after Incident Notification. This observation made us wonder whether there is any explanation for these immediate clearances, and encouraged further analysis.

In a similar fashion, we investigated Agency G, which seemed to be the slowest agency. Agency G classified its incidents into only two types: Non-ATMS Route Incident and simply Incident. The Incident incidents had an average length of about 38 minutes, which is very fast compared to the other agencies. However, the Non-ATMS Route Incident incidents took on average 5 hours and 14 minutes to clear.

Figure 6.5: LifeFlow with traffic incident data from Agency C and Agency G: Only Incident Notification and Return to normal (aggregated) events are shown. The incidents are also grouped by incident type. Most of the incidents that Agency C reported were Disabled Vehicles, which had about 1 minute clearance time on average.
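The ranking described above boils down to computing, per agency, the mean time from Incident Notification to final clearance and sorting the agencies by that mean. A minimal pandas sketch with made-up timings; the column names are illustrative, not the actual NCHRP schema:

```python
import pandas as pd

# Hypothetical incident log: one row per incident with event timestamps.
incidents = pd.DataFrame({
    "agency": ["C", "C", "G", "G"],
    "notification": pd.to_datetime(
        ["2009-04-10 08:00", "2009-04-10 09:00",
         "2009-04-11 10:00", "2009-04-11 11:00"]),
    "return_to_normal": pd.to_datetime(
        ["2009-04-10 08:05", "2009-04-10 09:05",
         "2009-04-11 12:30", "2009-04-11 13:30"]),
})

# Mean notification-to-clearance time per agency, fastest first --
# the same ordering LifeFlow produces by sorting paths by length.
durations = incidents["return_to_normal"] - incidents["notification"]
ranking = (durations.groupby(incidents["agency"])
           .mean()
           .sort_values())
print(ranking)
```

What LifeFlow adds over this table is that the per-agency averages are visible at a glance, and any agency can be drilled into by incident type without writing a new query.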
This made us realize that, when using the average time of all incidents from Agency G without considering the incident types, Agency G seemed slower than the other agencies, while in fact Agency G performed quite well for the incident type Incident.

This evidence showed that the agencies reported their incident management timing data in inconsistent ways, e.g., Agency C's incidents with 1 minute clearance time. Besides timing information, the same problem occurred with incident types, as the terminology was not consistent across agencies or even local areas. Differences might stem from differences in data entry practices, terminology variations, or missing data. Policies are needed to decide if, and how, to normalize the data, and tools will be needed to manage this normalization process and provide audit trails documenting the transformations. At the time of this analysis, these inconsistencies among agencies prevented further meaningful performance comparison.

6.3.5 Discussion

Although this data analysis was limited and preliminary, domain experts from the CATT Lab were conducting a more formal analysis of the data. They reviewed this work and stated that they wished LifeFlow had been available earlier, when they started their own analysis. They confirmed the existence of the anomalies that were found in the data, and stated that eliminating them was non-trivial using SQL because they had to anticipate the errors in advance and be careful to exclude them from their analysis. Excluding all the possible erroneous sequences in a SQL query would be very difficult. In the end, they needed to review the results of SQL queries to ascertain that there were no longer any errors. Without LifeFlow, this kind of review and identification of unexpected sequences would be almost impossible. Finally, they mentioned that LifeFlow would allow them to ask more questions faster, and probably richer questions about the data.
LifeFlow was also able to reveal unexpected sequences that might otherwise have been overlooked, while also suggesting that their prevalence was limited. I believe that LifeFlow can assist analysts in exploring large datasets, such as the NCHRP traffic incident information, in ways that would be very difficult using traditional tools, and might allow analysts to find richer results in less time.

6.4 Analyzing Drug Utilization

6.4.1 Introduction

The U.S. Army Pharmacovigilance Center (PVC) has assembled a data warehouse of medical records and claims representing the clinical care of 12 million soldiers, dependents and retirees over a period of five years. They use that data to promote patient safety in the military and to serve as a federal partner for the FDA's Sentinel program studying drug safety signals across the U.S. population.

Mr. Sigfried Gold was a computer scientist at Oracle Corporation working with the PVC. He was helping the PVC epidemiologists and pharmacists explore drug exposure and diagnostic events in longitudinal patient records. The dataset consists of 20,000 patients over 7 years, with 5-100 medical events per patient.

His initial goal was to look at causal relationships between drug exposure and adverse events. However, after some preliminary attempts, we decided that causal analysis would require more complex analysis, which might go beyond the capabilities of LifeFlow. We learned that LifeFlow is great for looking at multiple sequences when each sequence is one process, but less effective when each sequence consists of events from multiple independent processes, because the negligible ordering of events from different processes will be used to aggregate the sequences and lead to a large number of infrequent, irrelevant patterns. It also experiences difficulties when there are a large number of event types, which makes color-coding more challenging.
Causal analysis would require handling a large number of event types and possibly events from multiple processes. As a result, we focused on a simpler case to which LifeFlow seemed directly applicable, drug utilization: drug prescribing patterns and drug switching in particular.

6.4.2 Procedure

Since the data in this study were very sensitive and confidential, I was not allowed to look at the data myself. I helped Mr. Gold start his analysis by generating a fake dataset and demonstrating how to use the software. After he was familiar with the software, I provided technical support as needed and let Mr. Gold and the PVC team perform the analysis on their own. We communicated by email every 2-3 weeks and had in-person meetings several times at HCIL and the PVC. The total duration of this study was six months, before an extension project was spun off. In the end, I interviewed Mr. Gold and report the following results.

6.4.3 Analysis

6.4.3.1 Drug prescribing patterns

The PVC team looked at the drug prescribing data, which record when medications were prescribed. For drugs that the PVC team expected to be taken chronically, the LifeFlow visualization pointed out that more than half of the patients had gaps within their exposures. These could potentially be drug holidays, i.e., periods when a patient stops taking a medication, anywhere from a few days to many months or even years, if they feel it is in their best interest.

The following question was immediately raised: were the gaps really representative of holidays, or were they just brief periods between two prescriptions during which the patients had enough pills to keep going? LifeFlow helped the PVC identify the patient cohorts with possible drug holidays, and also showed potential for taking part in a deeper analysis to answer this question.

6.4.3.2 Drug switching

Another study focused on how patients switch from one drug to another.
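Returning to the drug-holiday question for a moment: it reduces to scanning consecutive prescription fills for gaps longer than the days supplied. A toy sketch under assumed field layout (one patient and one drug; a real analysis would first group by both, and the 14-day threshold is an arbitrary illustration):

```python
from datetime import date, timedelta

# Hypothetical fills: (patient_id, fill_date, days_supply).
fills = [
    ("p1", date(2010, 1, 1), 30),
    ("p1", date(2010, 2, 1), 30),   # refilled roughly on time
    ("p1", date(2010, 6, 1), 30),   # ~3-month gap: possible drug holiday
]

def find_gaps(fills, min_gap_days=14):
    """Return gaps where the next fill starts well after the
    previous supply should have run out."""
    gaps = []
    fills = sorted(fills, key=lambda f: f[1])
    for (pid, d1, supply), (_, d2, _) in zip(fills, fills[1:]):
        run_out = d1 + timedelta(days=supply)
        if (d2 - run_out).days >= min_gap_days:
            gaps.append((pid, run_out, d2))
    return gaps

print(find_gaps(fills))
```

This captures the distinction raised above: a short interval between run-out and the next fill is normal refill slack, while a long one is a candidate holiday worth deeper analysis.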
One interesting characteristic of this dataset is that some event types are interval events, not just point events. For example, an event such as Drug A or Drug B has a start and an end time, which indicate the exposure period. Two drugs can overlap, increasing the complexity of the analysis. There were also periods when patients were not taking any drug. LifeFlow was designed for point events, in which each event has only one time. We came up with a workaround that converted the data from interval events (Drug A and Drug B) into point events (Drug A only, Drug A & B, Drug B only, and no drug) before using LifeFlow.

This approach was tolerable in the preliminary phase. However, the conversion increased the overhead in data processing time whenever any change had to be made, and users lost the flexibility of LifeFlow to include or exclude any event type when needed. For example, when analyzing a dataset with drugs A, B and C, including only drugs A and B while excluding drug C requires one round of data preprocessing, and including drug C again requires another. Users ended up spending most of their time preprocessing data, making the analysis more time-consuming and less engaging. There were also several useful features for intervals that Mr. Gold would have liked to have. Therefore, we decided to fork the LifeFlow project to create an extension that supports intervals, which is beyond the scope of this dissertation, to support further analysis.

6.4.4 Discussion

This case study demonstrated the ability of LifeFlow to analyze drug prescribing patterns, which led to the question of whether patients are having drug holidays.

"Statistical techniques for dealing with longitudinal data generally focus on changes in continuous variables over time, and the problem of identifying patterns of sequence and temporal spacing in categorical events is not handled by standard techniques and software.
This problem arises a lot in analysis of health care data, and this tool opens up a kind of study that just hasn't been possible before," said Mr. Gold.

Working with Mr. Gold also helped me identify LifeFlow's limitations in handling many event types, multiple processes and intervals. "LifeFlow is good for complex temporal relationships among a small number of variables (event types)." Understanding these limitations led to a spin-off project, beyond the scope of my dissertation, that extended the LifeFlow idea to support intervals.

6.5 Hospital Readmissions

6.5.1 Introduction

Hospital readmission shortly after discharge is well known to be a factor in rising healthcare costs and is considered a marker of inpatient quality of care. Dr. A. Zach Hettinger, a physician at the MedStar Institute for Innovation (MI2), was interested in studying hospital readmissions. He participated in this LifeFlow case study to visualize emergency department patient visits over time, planning to look for patterns in return visits and across chief complaints, and also to look at high-risk populations.

6.5.2 Procedure

I scheduled tentative biweekly meetings for coordination. Our methods of communication were email, phone, Skype and in-person meetings. In the first few months, I developed an understanding of the medical problems while explaining LifeFlow to Dr. Hettinger. During this period, I learned about new requirements that were not handled sufficiently in LifeFlow and developed new features to support them. After each new feature was developed, I would demonstrate it and ask for feedback in the following meeting. This phase lasted several months. We also had to request access to patient information, which caused some delay. Towards the end of this study, I arranged a few Skype sessions with Dr. Hettinger, in which he performed analyses and shared his screen. I collected results from those sessions and report them in Sections 6.5.5 and 6.5.6. The total duration of this study was one year.
6.5.3 Before the Study

Prior to this study, the hospital was using canned reports involving 72-hour returns. Dr. Hettinger explained that he was looking at the data using these steps:

1. Pull the data from Azyxxi, an EMR system, into a table view that contains the following columns: Patient ID, Diagnosis, Visit Date #1, Attending Physician #1, Visit Date #2, Attending Physician #2, and many more.

2. Sort by the date that he saw the patients (Visit Date #1).

3. See if the patients came back (Visit Date #2 is not blank).

4. Click on each row to read the diagnosis, to learn why they came back, plus more information.

This method can be slow and sometimes unhelpful. It also cannot answer questions such as:

- When they see me (Dr. Hettinger), is this their first visit to the hospital? The table view did not show the data before this visit.
- Azyxxi can show whether a patient came back once or did not come back, but could not show whether they came back for the 2nd, 3rd, ... time.
- How many of them came back after seeing me (Dr. Hettinger)? The proportion cannot be seen easily.
- Filtering by diagnosis type is not easy, for example, showing only patients who were diagnosed with heartburn.

With LifeFlow, the questions above can be answered easily.

6.5.4 Early Progress

In the first few meetings, we discussed the data format that LifeFlow can accept and used a simulated dataset for demonstration. The feedback from these meetings led to the development of DataKitchen (a data conversion tool developed by Hsueh-Chien Cheng, another PhD student), and to new LifeFlow interactions and other improvements:

- Display clear separation between hospital visits (Section 3.7).
- Allow users to filter the alignment point by event attributes. For example, attending physicians and diagnoses are attributes of the event Arrival.
  - Find Arrival events when patients visited a specified physician (such as Dr. Hettinger) and align to see what happened after arrival.
  - Find only Arrival events when patients visited with a particular diagnosis (such as chest pain) and see what happened after arrival.
- Selection from a distribution (Section 3.7).
- Adding/modifying custom attributes (Section 3.7).
- Displaying an attribute summary (Section 3.7).
- A new tooltip design.

6.5.5 Analysis: Personal Exploration

Dr. Hettinger envisioned using LifeFlow for two applications. The first is for a physician to review his or her own performance. The second is for administrative purposes (Section 6.5.6).

Dr. Hettinger used LifeFlow to select only his patients using the advanced alignment. He set the alignment point to be a visit to the hospital where the attribute "attending physician" was him. Then he looked for patients who returned within 72 hours after being discharged. He did not find one, which reflects good performance. To continue the analysis, he selected a few patients who returned soon after their discharges and compared the diagnoses before they were discharged and when they came back, to see if the two visits were related. If they came back with the same complaints, it could indicate that the physician missed something during the patient's prior visit. LifeFlow provided the physician with more information than the canned report he was previously using.

6.5.6 Analysis: Administration

Another way LifeFlow can be used is to look at the data from a managerial perspective and see the overall performance of the hospital. In this analysis, we used an anonymized dataset containing 92,616 patient visits spread over 60,041 patient records. The duration of this dataset is one year.

6.5.6.1 Revisit

To see the proportion of patients who visited once, twice and so on, we aligned them by the first ED Registration. "What I don't understand is why the patterns are not sorted by number of visits?" said Dr. Hettinger. I then explained that the patterns are sorted by number of patients by default, but we could change it.
So we changed the ranking to "Max time to the end" and saw the patterns sorted by number of visits (Figure 6.6). The tooltip displays more information when the cursor is placed over a bar. For example, in Figure 6.7, 1,501 patients (2.5%) visited four times.

Figure 6.6: Patient records aligned by the first ED Reg date (light blue): The bar's height represents the number of patients. The visualization shows the proportion of revisits.

Figure 6.7: Tooltip uncovers more information: Place the cursor over a sequence in Figure 6.6 to see more information in the tooltip. For this sequence with four visits, there were 1,501 patients, which is 2.5% of all patients.

6.5.6.2 Revisit by recurring patients

Another task was to see the number of all visits by patients with at least n visits. This could tell how many visits the recurring patients accounted for. We excluded all events except the ED Reg (registration) date and aligned by all ED Reg dates (Figure 6.8). The patient records were duplicated according to their number of visits. This time, the bar's height represents the number of visits instead of the number of patients. The first bar after the alignment point represents the number of visits by all patients who had at least one visit. The second bar represents the number of visits by all patients who had at least two visits. The tooltip allows access to more useful information (Figure 6.9).

"We can quickly make a nice table out of this," said Dr. Hettinger. "Patients with at least six visits, which is 2% of patients, account for 4.3% of visits. Patients with at least five visits, which is 3% of patients, account for 6.6% of visits. Patients with at least four visits, which is 5.6% of patients, account for 10.2% of visits, and so forth. These are statistics that people enjoy looking at and can be compared between hospitals."

In Dr. Hettinger's opinion, getting the number of visits by the top 5% or more frequent patients is more interesting than knowing who the top 5% are.
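The "at least n visits" figures Dr. Hettinger reads off the tooltips are cumulative tail sums over the visit-count distribution. A small sketch with invented counts (not the actual hospital data):

```python
from collections import Counter

# Hypothetical number of visits for each patient.
visits_per_patient = [1, 1, 1, 1, 2, 2, 3, 4, 6]

dist = Counter(visits_per_patient)          # n_visits -> n_patients
total_patients = len(visits_per_patient)
total_visits = sum(visits_per_patient)

# For each threshold n: the share of patients with >= n visits, and the
# share of all visits those patients account for.
for n in sorted(dist):
    patients = sum(c for v, c in dist.items() if v >= n)
    visits = sum(v * c for v, c in dist.items() if v >= n)
    print(f">= {n} visits: {patients / total_patients:.0%} of patients, "
          f"{visits / total_visits:.0%} of visits")
```

In LifeFlow, the same tail sums come for free from the bar heights after aligning by all ED Reg dates, which is why the tooltip can report them directly.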
"If you wanna say I just want the top 5%, you are gonna get some random number like the top 5% have 6.2 visits. That is kinda useless number. It is actually a much better way to ask the other way round that we just went through down here," said Dr. Hettinger.

Figure 6.8: Patient records aligned by all ED Reg dates (light blue): The bar's height represents the number of visits. The first bar after the alignment point represents the number of visits by all patients who had at least one visit. The second bar represents the number of visits by all patients who had at least two visits.

Figure 6.9: Access more useful information using the tooltip: Patients with at least five visits, which is 3.1% of patients, accounted for 6.6% of visits.

Figure 6.10: Patient records aligned by all ED Reg dates (light blue): The total height represents the total number of visits. The ED Reg date bar (light blue) to the right of the alignment represents all visits. The Admit date (blue) bar to its right shows 19,576 admissions out of a total of 92,616 visits.

6.5.6.3 Admission

Next, we looked at the proportion of people who came to the ED and were admitted. We still aligned the data by all ED Reg dates, but included the Admit date (Figure 6.10). The total height represents the total number of visits. The ED Reg date (light blue) bar to the right of the alignment represents all visits. The Admit date (blue) bar to its right shows admissions after visits. There were 19,576 admissions out of a total of 92,616 ED visits.

Figure 6.11: Patients who died: Patient records are aligned by all ED Reg dates (light blue). The total height represents the total number of visits. Of the total of 92,616 visits to the ED, there were 788 deaths (red).

6.5.6.4 Mortality

Using the same alignment by all ED Reg dates, the Admit date events were excluded but the Death date events were included. The visualization shows that, of the 92,616 total visits to the ED, 788 patients died.
This number includes both patients who died in the ED and patients who died after admission (Figure 6.11). We selected these 788 patients and labeled them by setting an attribute Status to "Die", so the patients could be split into dead and alive groups using the attribute Status. The alignment was then changed from all registrations to the first registration. We zoomed into the dead group, where the visualization shows the proportion of patients who died on their 1st, 2nd, 3rd visits and so forth (Figure 6.12).

Figure 6.12: Patient records aligned by the first ED Reg date (light blue): The total height represents the number of patients.

Next, we added the Admit date events back to the visualization (Figure 6.13) and found that, contrary to expectation, the majority of the patients died after admission, not in the ED. "You kinda think these people who are coming in and die are gonna die in the emergency department, but actually looks like large majority of them were admitted," said Dr. Hettinger. We could also see the proportions of patients who were admitted on their second visit and died, or patients who were admitted twice and died. While the major patterns can be detected easily, the rare patterns are more complex and more difficult to interpret. Dr. Hettinger referred to these patterns as "painful to watch". So we changed the alignment to the Death date to help us focus on what happened before the patients died. Of the people who died, 76% died in the hospital (admitted, then died). The rest were never admitted and died in the ED.

Figure 6.13: The majority of the patients died (red) after admission (blue), more than those who died while in the ED (light blue).

If the data were more reliable and the timestamps accurate, we could look at patients who died within 30 minutes of arriving at the hospital, for example, patients who came in with cardiac arrest. There were also some patients who came in alive, but something happened to them while they were in the ED and they died.
These are usually rare cases. If we had accurate timestamps, we could look at the time distribution. However, there are many problems with these timestamps. Some patients were reported dead without registration. Some were admitted, but the timestamps are not in the correct order.

6.5.6.5 Revived patients and John Doe

We also noticed that some patients were registered after they died. Is that because they came in dead and the registration never got to them, or did they really come in alive but the timestamps were inconsistent? In Dr. Hettinger's opinion, these were patients who came into the ED and died. There were some patients who came into the ED and died. Some came in, were admitted and died. Then there were some patients with only one timestamp, which means that they died with no prior registration. These cases are called "John Doe"; they were probably found dead outside.

6.5.6.6 Identify an interesting pattern and then search for it

Using LifeFlow, Dr. Hettinger was able to drill down to find interesting cases. One patient visited the ED, was discharged, then came back a few months later, was admitted to and discharged from the hospital, and shortly thereafter came back and died. He was diagnosed with kidney failure, dehydration, an undefined kidney problem and kidney disease. This case represents very sick patients who were discharged, came back and then died in the ED. The last diagnoses for this patient were kidney disease and cardiac arrest. Another patient came in with a headache, was discharged, came back four months later, then had cardiac arrest and died. Another patient was diagnosed with edema, shortness of breath and high cholesterol. After this person was discharged, he/she came back four months later and died. "See, this pattern right here. It would be awesome if you can reproduce the Similan search," said Dr. Hettinger. So we switched to the search tab to specify the query ED Reg date → Discharge date → ED Reg date → Death date.
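The exact-match side of such a query can be read as a subsequence test over each record's time-ordered events. A simplified sketch (FTS also ranks non-matching records by similarity, which is omitted here; the event labels are just the ones used above):

```python
def matches(events, query):
    """True if the query event types occur, in order, as a subsequence
    of the record's events sorted by timestamp. Note: events that share
    an identical timestamp sort in arbitrary order, which can make an
    exact-match query fail unexpectedly."""
    ordered = iter(etype for _, etype in sorted(events))
    return all(q in ordered for q in query)

# Hypothetical record: (day, event type) pairs for one patient.
record = [(1, "ED Reg date"), (2, "Discharge date"),
          (90, "ED Reg date"), (91, "Death date")]
query = ["ED Reg date", "Discharge date", "ED Reg date", "Death date"]
print(matches(record, query))  # True
```

The `in` test on the shared iterator consumes events as it scans, so each query step must be found after the previous one, which is exactly the ordered-subsequence semantics of an exact match.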
At first, no record was returned as an exact-match result (Figure 6.14, left). However, records among the other results looked like the pattern we were looking for. This caused some frustration, so we removed Discharge date from the query and submitted the query again. This time we received the expected results (Figure 6.14, right). The results were blank at first because the ED Reg date and Discharge date in each record were at exactly the same time. This indicated that additional data cleaning would be required to correct these two dates.

We found the pattern we were looking for in the current data, and Dr. Hettinger was very happy with it. "These are exactly the patients that would be interesting to look at," he said. "Another thing that you can do is that... From the risk management standpoint, these are cases that are certainly concerning. I think most deaths ended up getting reviewed in the ED, but I don't know if they always go back and look and see what was their last visits? They might have missed something, but certainly it could be interesting learning points as well to pull the cases and use it to teach people, so that if there's something worthwhile to try to figure out to make it worthwhile to bring it to people's attention. That's great. That's excellent that you are able to do that."

Figure 6.14: (left) Search for patients who visited, were discharged, came back and died. No record was returned as an exact-match result; however, records among the other results look like the pattern. This was because the ED Reg date and Discharge date in each record were at exactly the same time. (right) Refine the query to search for patients who visited, came back and died. 237 patients were identified.

Figure 6.15: Summary of complaints (left) and diagnoses (right) of the patients' first visits.

6.5.6.7 Diagnoses

Another thing we spent time looking at was the diagnoses.
To see the common complaints from the first visits, we right-clicked on the first ED Reg date bar, selected "Show attributes summary" and selected the attribute "complaints" (Figure 6.15, left). In the table that came up, we saw various kinds of pain: abdominal pain, chest pain, back pain, etc. "This is fantastic. This kind of summary," said Dr. Hettinger. These complaints come from the patients: the triage nurses ask what brings you in today, the patients answer something, and the nurses then have to decide what to record. I noticed some patients with "end-stage back pain", so I innocently asked if they died after that. "It could be," replied Dr. Hettinger. "Or they're just being sarcastic."

Then we changed the attribute to "V.Dx-Description-1" (Figure 6.15, right), which contains the ICD-9 codes that the physicians wrote. However, these descriptions are very specific. For example, Lumbago is a type of back pain, and Backache Nos stands for "Backache (not otherwise specified)". These codes were also spread across multiple columns (V.Dx-Description-1, V.Dx-Description-2, ...), so it was not possible to show accurate counts here, but it at least gave a sense of what the common diagnoses were. More preprocessing would be required to summarize these ICD-9 codes.

6.5.6.8 Frequent visitors

This time we used LifeFlow to drill down to patients with frequent visits (more than 40 visits in one year). There was one patient who was not admitted during the early visits but, after many visits, was eventually admitted. We checked the complaints and diagnoses for each visit and found that the complaints and diagnoses in the early visits were quite random. However, the first time he/she was admitted, the diagnosis was "overdosed pain medication". The second time was also "overdosed pain medication". This patient was admitted and discharged. The third time was "suicidal ideation". Again, the patient was admitted and discharged. The fourth time was another "suicidal ideation".
This time, the patient was admitted to the psychiatric department.

For the bigger picture, we examined the LifeLines2 view on the right side and characterized two types of patients (Figure 6.16):

1. Patients with clusters of frequent visits: frequent visits followed by a long gap, then another series of frequent visits. These patients may have some recurring disease, such as asthma or a mental disorder. The gaps represent periods when they got better. Another possible explanation is that they went to another hospital.

2. Patients with regular visits: many visits distributed regularly throughout the year.

Figure 6.16: Patients with more than 30 visits: (a) Patients with many visits distributed regularly throughout the year, for example, 1212249 and 1332883. (b) Patients with frequent visits followed by a long gap (marked by red horizontal lines), then another series of frequent visits, for example, 1011068 and 2355683.

In Dr. Hettinger's opinion, these two types of patients are like the Anscombe's Quartet [12] of medical records. They have similar numbers of visits, but are very different. The difference may go unnoticed when looking at the statistics, but can be easily detected with visualization.

6.5.7 Conclusions and Discussion

This is the longest case study that I have conducted. Dr. Hettinger participated from the very beginning of the development and was involved in many design iterations. His medical scenarios inspired many parts of the visualization and user interfaces. The work in this dissertation was found useful for many hospital readmission use cases, many of which involve understanding the overview, for example, checking the admission rate, revisit rate, mortality rate, etc., for which LifeFlow can provide much more information than the standard canned report. Using zoom and filters, users can drill down to particular patients and see their details on demand, for example, examining patients with more than 40 visits.
Users may identify an interesting pattern and then start searching for other patients with similar patterns. This case study demonstrates the wide range of exploratory activities that users can perform on event sequences. At the end of the study, Dr. Hettinger shared his thoughts about LifeFlow in his own words:

"I have enjoyed working with LifeFlow (as well as you and your team). I have not used LifeFlow directly in any clinical applications, but am still exploring the best way to use it. However, we have seen interesting patterns within LifeFlow, including patterns of use among frequent visitors to the ED that may serve as targets for directed intervention to help patients establish care with a primary care doctor and care for chronic conditions. We also looked at patterns of return visits to the ED and specifically focused on patients that died. These cases could later become important cases for education as well as risk management concerns that might not have otherwise been caught if they did not get visualized through the standard processes that usually screen for patients that return within 72 hours. So I believe it will be very useful in the future. I have spoken with various people at conferences and other meetings who are interested in the application of LifeFlow with their datasets and I believe it will provide important insight into many different settings.

I believe there will be two major barriers to the adoption of LifeFlow for looking at datasets for the average user. One is getting access to the data set and formatting it in the necessary manner. DataKitchen is an excellent tool, and with the use of recipes can automate the task, but certainly requires a fair amount of trial and error and familiarity to be used efficiently. Secondly, the controls and features of LifeFlow provide a large amount of customization of the data, but again require training and familiarity to be used well.
Even after not using the software for a few weeks I have to re-orient myself to the application and frequently 'rediscover' features or where things are located. Even today I found myself looking for a zoom-out feature, to have you inform me that the feature was already in place by double clicking the zoom bars. Obvious once you told me, but not apparent from the UI. Not sure the answer to the problem, but future versions could attempt to make the most common features more accessible to the novice, with access to the variety of controls as necessary. This is in no way to diminish what you have been able to accomplish, and it has been a pleasure to work with you on the project."

Working on this case study made me realize the importance of facilitating the preprocessing. By making it easier to preprocess the data, users face less hesitation and overhead in adopting the approach. We also often found data cleaning problems during the analysis and needed to go back to the preprocessing step. In addition, many design issues were identified, such as freezing the tooltip [2] or interpreting the alignment [3]. Some issues were easy to address, while others presented more challenging problems that could lead to future research, such as including attributes in the similarity search. Dr. Hettinger's "rediscover" experience demonstrated that MILCs are also beneficial for repeatedly testing the intuitiveness of the interface.

[2] When the tooltip is very long, it shows a scrollbar, but users cannot move the cursor to the scrollbar because the tooltip will disappear. First, I added a locking mechanism: by pressing F2, users could lock the tooltip and move the cursor to the scrollbar. However, this was found awkward and unintuitive. After receiving complaints from Dr. Hettinger, I removed the lock and let users control the scrollbar of the tooltip with the arrow keys.

[3] Users often forget that sequences on the left and right of the alignment are not connected.
He could remember how to use the main features correctly, but sometimes forgot about less common features, which could be due to the long duration of this study, issues in the design, or a combination of both. A challenge due to the length of the study (about one year) is that users might have difficulty keeping track of added and removed features. Some minor changes could be easily forgotten because users were not using the software every day and many changes happened along the way. In terms of design, some features were found to be "not apparent from the UI" and need to be made more obvious in the future.

6.6 How do people read children's books online?

6.6.1 Introduction

The International Children's Digital Library (ICDL) (Figure 6.17) is an online library that contains 4,627 children's books in 61 different languages (as of November 2011), with over three million unique visitors since its launch in November 2002. Its users come from 228 different countries. One very important question that Anne Rose, the website administrator, wanted to answer is: how do people read children's books from the ICDL website? This encouraged us to look at the Apache web logs from the ICDL website to analyze how people read children's books online.

Figure 6.17: International Children's Digital Library (www.childrenslibrary.org)

6.6.2 Procedure

I requested access to the server and retrieved the Apache web logs for the ICDL website. In the early meetings, I learned about the information contained in the web logs and what we hoped to learn from this dataset. After data conversion, I performed analyses and requested meetings when I had questions or noticed interesting patterns in the data. The total duration of this study was six weeks.

6.6.3 Data

This sample dataset was taken from the period July 01-07, 2011. The original size of the dataset was about 1 GB. We filtered the log entries to select only HTTP requests to access pages of books: http://www.childrenslibrary.org/icdl/BookPage.
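This filtering step can be sketched as follows; the combined-log line format and the sample lines below are assumptions for illustration, not the exact ICDL log format:

```python
import re

# Hypothetical sketch: keep only requests for the book-page URL.
# The combined-log pattern here is an assumption, not the exact ICDL format.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "GET (\S+) HTTP/[\d.]+"')

def is_book_page_request(line):
    """Return True if a log line is a request for /icdl/BookPage."""
    m = LOG_LINE.match(line)
    return bool(m) and m.group(3).startswith("/icdl/BookPage")

# Two hypothetical log lines: a book-page request and an unrelated request.
sample = '133.37.60.191 - - [30/Jun/2011:05:01:06 +0000] "GET /icdl/BookPage?bookid=radjese_00380046&pnum1=2 HTTP/1.1"'
other = '133.37.60.191 - - [30/Jun/2011:05:01:07 +0000] "GET /icdl/SimpleSearch HTTP/1.1"'
```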
Each request contains a query string after the URL, such as:

bookid=amrdima_00310002&pnum1=6&pnum2=7&twoPage=true&lang=English&ilang=English

We parsed the page number and book id from this query string, then grouped the HTTP requests by IP address and book id into records. Each record represents how one IP address (user) read one book, i.e., a book session.

IP address + book id                Page    Time
133.37.60.191_radjese_00380046      2       2011-06-30 05:01:06
133.37.60.191_radjese_00380046      1       2011-06-30 05:01:14
133.37.60.191_radjese_00380046      2       2011-06-30 05:01:20
133.37.60.191_radjese_00380046      4       2011-06-30 05:01:25
133.37.60.191_radjese_00380046      6       2011-06-30 05:01:27

Only the first 30 pages of each book are included in the dataset. The processed dataset has 7,022 records and 57,709 events in total.

6.6.4 Analysis

6.6.4.1 Setup

We encoded the page number using a color gradient from blue to red. Because people can read the books in two modes, single-page (1, 2, 3, ...) and double-page (1, 2, 4, 6, ...)⁴, we hid the odd pages.

⁴Page 1 is the cover.

Figure 6.18: The page numbers are color-coded using a color gradient from blue to red. People started their reading on different pages: some started from the first page, while others jumped into the later pages, probably skipping the empty pages in the beginning. The height of the bars shows that people started on the earlier pages more than the later pages.

6.6.4.2 First observation

The visualization in Figure 6.18 shows that people started their reading on different pages. Some started from the first page, but some jumped into the later pages, probably skipping the empty pages in the beginning. The height of the bars shows that people started on the earlier pages more than the later pages. The color bars show a smooth gradient, meaning that people mostly read in order. (Remember that the color coding is a gradient from blue to red.)
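A minimal sketch of this grouping step, assuming requests arrive as already time-ordered (ip, url, timestamp) tuples (the helper name and tuple layout are hypothetical):

```python
from urllib.parse import urlparse, parse_qs
from collections import defaultdict

def book_sessions(requests):
    """Group (ip, url, timestamp) request tuples into book sessions.

    A session is identified by (ip, bookid); its value is the ordered
    list of (page_number, timestamp) events. A sketch of the grouping
    described above, assuming requests are already time-ordered.
    """
    sessions = defaultdict(list)
    for ip, url, ts in requests:
        q = parse_qs(urlparse(url).query)
        if "bookid" in q and "pnum1" in q:
            page = int(q["pnum1"][0])
            sessions[(ip, q["bookid"][0])].append((page, ts))
    return dict(sessions)

# Hypothetical requests from one reader of one book.
reqs = [
    ("133.37.60.191", "/icdl/BookPage?bookid=radjese_00380046&pnum1=2&lang=English", "05:01:06"),
    ("133.37.60.191", "/icdl/BookPage?bookid=radjese_00380046&pnum1=1&lang=English", "05:01:14"),
]
```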
6.6.4.3 How do people read from the second page?

We aligned the records by the second page to see the reading pattern after visiting the second page (Figure 6.19). The decreasing height of the bars as time went by shows that people gradually dropped out while reading. The small lines that split from the main trend at each step tell us that some readers flipped back to the previous page before they continued reading: the color changed to a slightly darker color (the previous page) and then back to the brighter colors (continued reading). We also noticed some long patterns before the second page.

6.6.4.4 Reading backwards

We zoomed into the mysterious sequences from the previous screen and saw people reading backwards. (The color changes from red to blue instead of blue to red.) We were curious, so we selected them to see more detail (Figure 6.20).

Figure 6.19: After aligning by the second page: people read in order (from blue to red). Some flipped back one page and continued reading (small lines). There are also some long patterns before the second page.

Figure 6.20: The selection shows book sessions in which readers accessed the pages in the backward direction.

In the detail view, the visualization shows one reader who started reading from page 30 and flipped back all the way to page 2, another reader who read from page 4 to the end and then flipped back all the way, and many more cases.

We checked the attributes of these books. They are in English and Spanish (left-to-right languages), so there should not be any confusion about reading direction. To investigate further, we used the measurement tool to measure the time from page 2 to page 30 when users accessed the pages in the forward direction. The mean and median times were 25:02 and 4:45 minutes, respectively. We also measured the time from page 30 to page 2 when users accessed the pages in the backward direction. The mean and median times were 33 and 24 seconds, respectively, which is approximately one second per page.
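The timing comparison above can be approximated in code; this is an illustrative sketch, not LifeFlow's actual measurement tool:

```python
def flip_duration(events, start_page, end_page):
    """Seconds between the first visit to start_page and the next
    subsequent visit to end_page, or None if the pattern is absent.

    events: time-ordered list of (page, epoch_seconds) tuples.
    A sketch of the measurement described above.
    """
    start_t = None
    for page, t in events:
        if start_t is None:
            if page == start_page:
                start_t = t
        elif page == end_page:
            return t - start_t
    return None

# Hypothetical backward reader: page 30 down to page 2 in double-page
# mode, about one second per page.
backward = [(p, 100 + (30 - p)) for p in range(30, 1, -2)]
```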
It appears as though they were not reading backwards, just flipping through the pages.

6.6.5 Conclusions and Discussion

We dug deeper into several records with backward access one by one and hypothesized a few possible scenarios:

1. Finished reading a part of the book: For example, one reader read from page 10 to 18, which is exactly one chapter of the book, then flipped back to page 1 and did not have any interaction after that.

2. Shelf books: ICDL has a membership system in which a member can store books on his or her virtual shelf for reading later. The system also remembers the last page read by the member. We assumed that some members might open a book, find it at page 30, and try to flip back to the first page to start reading again from the beginning.

3. Parent preview: Some users read a few pages, then flipped back to the beginning and started reading from the beginning. We assumed that this is the behavior of parents selecting books for their children: they flipped through a few pages to check that a book was suitable, then went back to the beginning to read it.

In all three scenarios, the users seemed to depend heavily on page-by-page (previous/next button) navigation. To change from page 30 back to page 1, they went to the previous page 30 times instead of using other controls that could let them jump to page 1 faster. This could be because the users could use their arrow keys to go to the previous/next page, which was convenient, so they kept pressing the arrow keys, or because the users did not know how to jump back to the first page directly. However, finding the real explanation of why many people access the online children's books backwards, or whether the navigation needs to be improved, will require a further user study, which is beyond the scope of this case study. Therefore, we would like to take a step back and summarize what we learned from the ICDL web logs so far using LifeFlow.
LifeFlow shows potential for helping the ICDL administrator understand how people are accessing the online books. It highlights the users' reading behaviors, from the majority who read in order, to the readers who skipped the blank early pages and went straight to the content, to the readers who flipped back and continued reading, to the readers who flipped backwards all the way to the first page. By understanding these behaviors better, the administrator can improve the website to suit the readers.

6.7 Studying User Activities in Adaptive Exploratory Search Systems

6.7.1 Introduction

Adaptive exploratory search systems (ESS) can provide efficient, user-centered, personalized search results by incorporating interactive user interfaces. The core component of an adaptive system is its user model, in which the user's search tasks, contexts, and interests are stored so that the system can adapt to those specific tasks or contexts. In conventional adaptive search approaches, the user models are hidden from the users. They are black boxes: users cannot easily estimate what is going on inside, cannot anticipate what personalized results will be returned, and have very limited capability to correct the systems' unexpected erroneous behaviors. Several approaches have tried to solve this problem by transparently revealing the user model contents and providing the ability to directly control the user models.

Dr. Jae-wook Ahn, a post-doctoral researcher at the HCIL, developed two adaptive ESS, TaskSieve [2] (Figure 6.21) and Adaptive VIBE [1] (Figure 6.22), as parts of his PhD dissertation work. TaskSieve uses typical ranked lists, but it has a component for adjusting the relative importance of the user model and the user query. Users can explore different search results by switching among three weight configurations of user model versus user query (1:0, 0.5:0.5, and 0:1) while monitoring their user model contents.

Figure 6.21: TaskSieve
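The weight configurations can be read as a linear blend of query similarity and user-model similarity. The formula below is an illustrative assumption; TaskSieve's actual scoring function may differ:

```python
def blended_score(sim_query, sim_model, model_weight):
    """Illustrative ranking score mixing query and user-model similarity.

    model_weight corresponds to the 1:0 / 0.5:0.5 / 0:1 configurations
    mentioned above (1.0 = model only, 0.0 = query only). This is an
    assumed formulation, not TaskSieve's published formula.
    """
    return model_weight * sim_model + (1 - model_weight) * sim_query
```

With model_weight = 0 the ranking reduces to a plain query-based search; with model_weight = 1 it is driven entirely by the user model.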
Adaptive VIBE extends the open user model using a reference-point-based interactive visualization called VIBE (Visual Information Browsing Environment). It defines the user query and the user model (as a list of keywords) as reference points (Points of Interest, or POIs) and interactively places the search result documents according to their similarities to each POI. Documents are placed closer to more similar POIs, and users drag the POIs in order to visually explore the search space. Both systems have logging facilities that record user activities. These actions are organized into higher-level categories in Tables 6.2 and 6.3. TaskSieve has six categories of actions, while Adaptive VIBE has the same six categories and two additional ones.

Figure 6.22: Adaptive VIBE

Dr. Ahn was interested in analyzing user activity sequences from the log files of the two systems. The log data was extracted from a user study including 33 users in 60 sessions (20 minutes per session). For each session, the systems were loaded with news articles from the Topic Detection and Tracking (TDT4) test collection, and the participants of the user study were asked to finish search tasks. Our goal in this study was to discover two user behavior patterns from the log data.

1.
Switch of activities: Typical look-up search systems have simple activity switch patterns such as query → examine list → examine documents. We expected to see different patterns from the open-user-model-based ESS. Users were supported with more features to explore the search space and to view and control their mental models. We were interested to discover how the users really exploited those features.

2. User activity patterns: It would be useful to have a list of user activity patterns showing how users actually used the features while trying to solve specific sub-tasks. This knowledge will help improve future ESS systems with open user models.

Table 6.2: TaskSieve User Actions

Category                  Action
Login/out                 login, logout
Search                    search
Overview                  search response, navigate page
Examine doc               open doc
Update UM (User Model)    save note, remove note
Manipulate UM             change UM & query weight

Table 6.3: Adaptive VIBE User Actions

Category                  Action
Login/out                 login, logout
Search                    search
Overview                  reset visualization
Examine doc               view by mouse-over, select doc, open doc
Update UM (User Model)    save note, remove note
Manipulate UM             change UM and query weight, select UM POI, move UM POI
POI activities            reset POI, select POI, select query POI, move POI, move query POI
Find subset               select similar documents from a POI, marquee selection, range slider filtering, show similar documents or POIs, view selected documents, view auto-selected documents

6.7.2 Procedure

Dr. Ahn and I met a few times during the first two months of this study. I demonstrated the software and explained the data format. After Dr. Ahn was familiar with the LifeFlow software, he converted his data and analyzed them on his own. He contacted me when he needed technical support or found interesting patterns. In the end, we wrote a paper together to report the results [3]. The total duration of this study was five months.
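As preparation for the analyses below, the action-to-category mapping of Table 6.2 amounts to a simple lookup. The dictionary below mirrors the table and is only a sketch of the preprocessing, not Dr. Ahn's actual conversion code:

```python
# Lookup mirroring Table 6.2: raw TaskSieve log actions mapped to the
# higher-level categories used in the analysis (sketch only).
ACTION_CATEGORY = {
    "login": "Login/out",
    "logout": "Login/out",
    "search": "Search",
    "search response": "Overview",
    "navigate page": "Overview",
    "open doc": "Examine doc",
    "save note": "Update UM",
    "remove note": "Update UM",
    "change UM & query weight": "Manipulate UM",
}

def categorize(actions):
    """Replace each raw action with its high-level category."""
    return [ACTION_CATEGORY[a] for a in actions]
```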
6.7.3 Data

We generated datasets from the activity logs of the 60 user study sessions for the two analyses above:

1. Lists of all actions (per user session) for analyzing switches of activities: one dataset for TaskSieve, which has 1,716 events, and another dataset for Adaptive VIBE, which has 10,646 events.

2. Pairs (i.e., bigrams) of actions for analyzing user activity patterns: this dataset has 19,142 events.

Figure 6.23: Before (above) and after (below) filling gaps with the colors of previous events

6.7.4 Analysis

6.7.4.1 Switch of activities

We used the high-level action categories (Tables 6.2 and 6.3) instead of individual actions in order to reduce the number of event types and the visual clutter. Next, we merged repeated consecutive events into one event using the "merge repeated events" option in LifeFlow. However, the sequences were still long, and visualizing the time gaps remained difficult. So we enabled another option in LifeFlow to fill the gaps between events with the colors of the previous events, instead of light gray, to help us notice the long activities more easily (Figure 6.23).

Events are colored as follows: Login/out is not important, so we colored it gray to draw less attention. Search is encoded in orange. Overview (showing an overview of search results) is a follow-up event of Search, so it is encoded in yellow to show its connection with Search (orange). Examine doc and Update UM are encoded in pale blue and blue. Manipulate UM is an interesting event type, so it is encoded in red to make it stand out. The two additional event types from Adaptive VIBE, POI activities and Find subset, are encoded in light green and green, respectively.

Figure 6.24 compares the overviews of the search activity sequences of TaskSieve and Adaptive VIBE. Remember that the width of a colored block corresponds to the duration of the action. The two most frequent activities that occupied the largest portions of the screens were Examine doc (pale blue) and Update UM (blue).
This was expected, because the users would spend a significant amount of time reading the snippets or full texts of the documents (Examine doc) and editing their notebooks. In both systems, a notebook editing action automatically leads to updating the user model (Update UM). What was more interesting was the density of the exploratory activities. In the visualization-based system (Adaptive VIBE), the distribution of different actions was denser than in the text-based system (TaskSieve). The participants switched actions more frequently, spending less time (smaller-width colored blocks) per action, with the two additional event types (POI activities, light green, and Find subset, green) mixed into the sequences.

Figure 6.24: Overview of user behaviors in TaskSieve (above) and Adaptive VIBE (below): In the visualization-based system (Adaptive VIBE), the distribution of different actions was denser than in the text-based system (TaskSieve). The participants switched more frequently, spending less time (smaller-width colored blocks) per action, with the two additional event types (POI activities, light green, and Find subset, green) mixed into the sequences.

This observation reflects that the participants fully exploited the richer feature set provided by Adaptive VIBE. At the same time, the frequent action switches included many exploratory actions. We could assume that they were able to take advantage of the flexibility of the visualization system and performed more exploratory behaviors.

The second observation concerns user model manipulation. Users could directly manipulate the user model weights or drag the user model keywords (POIs) in order to better reorganize the visualization of retrieved documents and to learn about and control the effects of their user models. In Adaptive VIBE, these activities (red) were more frequent and prevalent, which suggests that the users more actively used the system's visual user model manipulation feature.
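The "merge repeated events" preprocessing used above can be sketched as collapsing runs of identical consecutive event types; this is an illustrative sketch, not LifeFlow's implementation:

```python
from itertools import groupby

def merge_repeats(events):
    """Collapse runs of consecutive identical event types into one event,
    keeping the first timestamp of each run.

    events: time-ordered list of (event_type, timestamp) tuples.
    """
    merged = []
    for etype, run in groupby(events, key=lambda e: e[0]):
        merged.append(next(run))  # first (type, time) of the run
    return merged

# Hypothetical activity sequence with repeated consecutive events.
seq = [("Examine doc", 1), ("Examine doc", 2), ("Search", 3),
       ("Search", 4), ("Examine doc", 5)]
```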
6.7.4.2 User activity patterns, Part I: Frequent patterns

The second part of the analysis was to find the exact user behavior patterns. Learning how users really used these systems can help in designing more efficient adaptive exploratory search systems. We limited this analysis to Adaptive VIBE only, because it provided more diverse adaptive exploration features than TaskSieve.

Each record in this dataset is a bigram taken from a full sequence of a user's activities in the previous section. For example, the sequence Login/out → Search → Overview → Examine doc becomes three bigrams: (1) Login/out → Search, (2) Search → Overview, and (3) Overview → Examine doc.

When these bigrams are visualized in LifeFlow, the result is a distribution of user actions of lengths one and two (Figure 6.25). Each bar's height represents the frequency of the action pattern, and the horizontal gaps between bars represent the average time. In this analysis, we did not fill the gaps with the colors of previous events.

Figure 6.25: Bigrams of user activities visualized in LifeFlow

Reading the visualization from left to right, we could first find the most frequent actions by looking at the tallest first-level bars, and then the frequent actions that follow them. For example, the most frequent action is Examine doc (pale blue). Among the actions that follow Examine doc, the most frequent was Examine doc again, while the second and third were POI activities and Update UM, which means that the users moved or selected POIs (not user model POIs in this case) or updated their user model contents by adding or removing texts to or from the notebook. This hierarchy of the LifeFlow tree could make the structured analysis of the bigrams easier than simple frequency counting, by following down the branches of dominant activities in the first level. Table 6.4 summarizes the list of frequent bigrams discovered from this analysis.
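The bigram preprocessing described above is straightforward to sketch:

```python
def bigrams(sequence):
    """Split a full activity sequence into overlapping pairs, as in the
    preprocessing described above. A sequence of n events yields n - 1
    bigrams."""
    return list(zip(sequence, sequence[1:]))

# The example from the text.
seq = ["Login/out", "Search", "Overview", "Examine doc"]
```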
Overall, the frequencies of the first actions of the bigrams were almost equivalent, except for Examine doc. We thus included all event types in the first level and then counted the top three event types that follow. The findings from the table are as follows:

1. Many exploratory actions (examining the overview; controlling the visualization to change and re-interpret the big picture, including user model updates and manipulation; zooming; and finding subsets) were much more prevalent than simple look-up search actions.

2. There are many sequences of repeating events: Examine doc → Examine doc, POI activities → POI activities, Update UM → Update UM, and Manipulate UM → Manipulate UM.

3. User model exploration was used almost as frequently as other actions.

4. This table can be used as a list of possible user actions for similar adaptive ESS.

Table 6.4: List of dominant user activity bigrams

1st activity     2nd activity     Frequency
Examine doc      Examine doc      6,392
Examine doc      POI activities   254
Examine doc      Update UM        204
Search           Overview         330
Search           Examine doc      108
Search           Search           48
POI activities   POI activities   192
POI activities   Examine doc      107
POI activities   Find subset      102
Find subset      Examine doc      178
Find subset      Find subset      93
Find subset      POI activities   62
Update UM        Update UM        193
Update UM        Examine doc      97
Update UM        Overview         62
Manipulate UM    Manipulate UM    222
Manipulate UM    Find subset      53
Manipulate UM    Examine doc      37
Overview         Examine doc      63
Overview         Find subset      58
Overview         Overview         46

Figure 6.26: Bigrams of user activities after aligning by Manipulate UM: The three most frequent actions before Manipulate UM were still the most frequent actions after Manipulate UM. The user model manipulations appear to occur repeatedly in a chain with these three actions, so the user model manipulation task needs to be considered as a set with those related actions.
6.7.4.3 User activity patterns, Part II: User model exploration

The next step was to look deeper into how the users explored and controlled their open user models, the core module of the adaptive ESS. Adaptive VIBE implements the visual user models as keyword POIs and allows users to drag them around the screen, select them to support further actions (e.g., selecting documents similar to the selected POI), disable some of them temporarily, etc. From the analysis above, these user model manipulation activities were performed as frequently as the others. The three most frequent actions just after Manipulate UM were Manipulate UM (red), Examine doc (pale blue), and Find subset (green) (Table 6.4). This suggests that the user model manipulation actions were done repeatedly and that the users could narrow down their search targets as a result of user model exploration.

Therefore, we aligned the sequences by all Manipulate UM events. In Figure 6.26, the right side after the vertical dashed line shows the pairs in which Manipulate UM preceded the second action; the left side shows the reverse. Comparing the left and right sides, we observed that the three most frequent actions before Manipulate UM were still the most frequent actions after Manipulate UM. This led us to conclude that the user model manipulations occurred repeatedly in a chain with these three actions, and that the user model manipulation task needs to be considered as a set with those related actions. By looking at the overview again (Figure 6.24, below), we could confirm that occurrences of Manipulate UM (red), Examine doc (pale blue), and Find subset (green) are adjacent to each other.

6.7.5 Conclusions and Discussion

This case study provided a preliminary analysis of user behavior patterns in adaptive exploratory search systems. We used LifeFlow to analyze the log data of a user study that included various user actions.
From the analysis, we found that user exploration actions switched more frequently in the visualization-based adaptive ESS. At the same time, the user model exploration features were used as frequently as other features, and we could find patterns showing how user model explorations were combined with other exploratory behaviors. We also generated a list of user action patterns of the adaptive ESS systems to support future design of systems of the same kind. This work could be extended to include deeper analysis of other activities closely related to user model exploration, such as finding subsets, query exploration, and user model updates.

Finally, I would like to include Dr. Ahn's comment on LifeFlow in his own words:

"LifeFlow is an efficient and powerful tool to analyze complex log data. I used to store my log data in a database and used SQL to analyze them. Even though I am familiar with formulating sophisticated SQL commands, they have innate limitations compared to visualizations such as LifeFlow. For example, I can get the activity switch statistics using SQL by calculating the average switch counts. However, they are mostly aggregated scores and cannot provide the overview of the distribution of the activity switches, which LifeFlow can do successfully.

At the same time, the activity switch analysis is prone to the criticism that the frequent switches may only reflect random and meaningless actions of the users (probably due to frustration with a new tool). Using LifeFlow, we can check the visual length of actions (longer actions represent a low chance of random switches) and the regular pattern of sequences in order to address such concerns. For identifying the user-model-related activity patterns and generating a meaningful list of those activities, I believe visual analysis using LifeFlow is one of the most efficient methods.
In particular, it was very helpful to examine the actions before and after the actions of interest. Using this feature, I was able to easily identify the frequent chains of activities that the users performed during the experiment."

6.8 Tracking Cement Trucks

6.8.1 Introduction

Freewill FX Co., Ltd. is a company based in Bangkok, Thailand, that designs and produces wireless sensors and mobile tracking devices. Freewill FX's mission is to help people lead better lives and to help businesses become more effective by using mobile and wireless technologies. One of the products that the company offers is TERMINUS: Advanced Fleet Management System. Knowing just the locations of a fleet does not provide tangible business benefits, so TERMINUS focuses on adapting fleet management systems to business processes and optimizing fleet utilization to generate measurable business benefits.

After hearing about LifeFlow, Mr. Chanin Chanma, product specialist, and Mr. Sorawish Dhanapanichakul, product manager of TERMINUS, were interested in looking at one of their datasets in LifeFlow. One of their clients had ordered devices for tracking cement trucks. The sensor data from these devices were automatically collected into an Excel spreadsheet.

6.8.2 Procedure

Mr. Chanma and I discussed the possibility of using LifeFlow to analyze his data. Following the discussion, he sent me a spreadsheet containing sample data. I then converted the data using DataKitchen and replied with a few screenshots. We then arranged an online meeting via Skype with him and Mr. Dhanapanichakul to look at the data together. After the online meeting, I summarized the findings from the meeting and requested additional feedback via email. The time between meetings was roughly one week, and the total duration of this study was six weeks.

6.8.3 Data

The dataset consists of 821 trips and 8,091 events.
Each trip consists of multiple events tracked from the beginning to the end of the trip: Enter plant, Start load cement, End load cement, Leave plant, Arrive site, Start fill cement, End fill cement, Leave site, and Back to plant. The IDs of the plant, site, and vehicle were also recorded for each trip.

The raw data was in an MS Excel spreadsheet. Preprocessing was required to translate the column headers, which were in Thai, into English. After that, the spreadsheet was converted into the LifeFlow format using DataKitchen.

Figure 6.27: LifeFlow showing 821 trips: The majority of the trips had normal sequences. However, there were some anomalies at the bottom.

6.8.4 Analysis

6.8.4.1 Overview

After loading the dataset with 821 trips, we could see that, most of the time, the trips proceeded step by step as expected (Figure 6.27). We could then see the time distribution of each step by placing the cursor over it, and select trips that took longer than usual from the distribution.

6.8.4.2 Anomalies

In addition to the normal sequences, we also noticed some rare sequences at the bottom of the screen, so we zoomed into them (Figure 6.28). These anomalies fell into two categories:

1. Incomplete trips: Some trucks did not go to the sites, or went to the sites but did not fill the cement. A possible explanation is that the trip might have been cancelled.

2. Erroneous sequences: Some trips were reported with Start fill cement before Arrive site, or Start load cement before Enter plant, which is illogical. Because the mobile sensors trigger events when reaching specified locations, this indicated inaccurate recorded locations for some plants and sites.

Figure 6.28: Anomalies: 1) Some trucks arrived at the sites but did not fill cement. 2) Some trips were reported to have begun filling cement before arriving at the sites. 3) Some trips were reported to have loaded cement before entering plants.

6.8.4.3 Monitoring plants' performance

Mr.
Dhanapanichakul observed that trips from some plants usually take more time, so we tried to confirm his observation using LifeFlow. However, these trucks sometimes stay overnight at plants or sites, so comparing the overall time from Enter plant to Back to plant would include the overnight time, resulting in a biased comparison. Therefore, three event types (Enter plant, Leave site, and Back to plant) were excluded to eliminate the overnight time and provide a more accurate comparison.

Figure 6.29 shows the visualization after excluding the three event types and grouping by Plant ID. The ranking was changed from the default "Number of records" to "Average time to the end", so the plants were sorted by their trip times. The slowest plant was "C313", which is located far from the city. The average time from leaving this plant (green) to arriving at sites (red) was 45 minutes, much longer than for other plants. From our discussion with the engineers at Freewill FX, this plant is responsible for a very wide coverage area and is located in a district with very heavy traffic congestion.

Figure 6.29: Trips grouped by Plant ID. Three event types (Enter plant, Leave site, and Back to plant) were excluded to eliminate the overnight time and provide a more accurate comparison. Trips from plant "C313" took on average 45 minutes from leaving the plant (green) to arriving at sites (red), much longer than for other plants. This is because of its wide coverage area and the regular heavy traffic near its location.

Figure 6.30: All events except Start fill cement (blue) and End fill cement (light blue) are hidden. Next, we aligned by Start fill cement, grouped by the attribute Site ID, and ranked by "Average time to the end". We can see the cement filling duration at each site, ranging from two minutes to four hours.

6.8.4.4 Classifying customers by delay at sites

The cement company was interested in analyzing the cement filling duration at each site.
This could help the manager decide how to treat customers according to the delays they incurred. For this analysis, we started by hiding all events except Start fill cement (blue) and End fill cement (light blue). Next, we aligned by Start fill cement, grouped by the attribute Site ID, and ranked by "Average time to the end". We could see durations ranging from two minutes to four hours on average (Figure 6.30). The interesting cases were low-performance sites that took quite a long time, e.g., four hours, to complete the operation. Zooming into the top portion of the visualization helped us identify those sites. Similar analyses can be performed for these scenarios:

1. Group by Plant ID and analyze the cement loading duration (the time gap between Start load cement and End load cement).

2. Group by Vehicle ID and analyze the travel time (the time gap between Leave plant and Arrive site).

6.8.4.5 Search

The search could be useful for finding particular patterns, for example:

1. Finding trips with a cement loading time longer than a specified period, e.g., 40 minutes.

2. Finding trips with a cement filling duration longer than a specified period, e.g., three hours.

3. Finding trips with a travel time from plant to site longer than a specified period, e.g., 30 minutes.

6.8.5 Conclusions and Discussion

LifeFlow was applied to fleet management in this case study. Because the cement delivery process is structured and straightforward, with few variations, LifeFlow can easily identify anomalous patterns and detect tracking errors. The ability to include and exclude events rapidly allows quick exploration of subsets of the data to answer particular questions, such as the cement filling time or the travel time from plants to sites. Grouping event sequences by attributes and ranking them supports easy comparison between subgroups, especially when all groups have the same sequence. Mr. Chanma said that LifeFlow helps them see the data and understand the big picture.
However, the learning curve can be challenging in the beginning. Proper training is required to help users understand the visualization and learn the interactions. The Freewill FX team envisioned the possibility of using LifeFlow for another scenario when additional data become available. Every day, the cement company has to decide how many trucks should be sent to a particular plant at the same time. For example, should they send one fleet of six trucks or three fleets of two trucks each? Fleets that are too large can lead to long queues at destinations. To begin answering this question, we can group the trips by the index of the truck in a fleet (1, 2, 3, . . . ) and see how long the following trucks in each fleet (2, 3, 4, . . . ) took to fill cement.

6.9 Soccer Data Analysis

6.9.1 Introduction

Sports are activities that captivate a wide range of audiences and attract major media attention. For example, the World Cup 2010 was broadcast in 214 countries, and 715.1 million people (one in ten people alive at the time) watched the final. Key events and statistics from these competitive matches were carefully collected into sports databases. These databases are invaluable for coaches to learn their teams' performance and adjust their coaching strategies. Sports scientists study these data intensively [56], while sports fans also enjoy exploring fun facts from sports statistics. In this study, I explore how LifeFlow can be used for analyzing sports data. The dataset is collected from Manchester United Football Club (Man U), an English professional soccer club based in Old Trafford, Greater Manchester, that plays in the English Premier League. Manchester United has won the most trophies in English football, including a record 19 league titles and a record 11 FA Cups. It is one of the wealthiest and most widely supported football teams in the world. I invited Mr.
Daniel Lertpratchya, a PhD student at the Georgia Institute of Technology and a devoted Man U fan who has supported the team for more than ten years, to participate in this study.

6.9.2 Procedure

I discussed the possibilities of analyzing data with LifeFlow with Mr. Lertpratchya. After he expressed his interest, I sent the software, data and introduction videos to him via email, and encouraged him to explore the data on his own. I also provided technical feedback as necessary. Two weeks later, we arranged a 1.5-hour online meeting via Skype to discuss his findings. I audiotaped this session and later transcribed and summarized all findings into a report. The total duration of this study was five weeks.

6.9.3 Data

I gathered the data from all 61 matches that Man U played in season 2010-2011. Each match contains the following events:

- Kick off: Beginning of the match
- Score: Man U scored.
- Concede: Opponent scored.
- Pen missed: Man U missed a penalty.
- Yellow: Man U conceded a yellow card.
- Red: Man U conceded a red card.
- Opp Yellow: Opponent conceded a yellow card.
- Opp Red: Opponent conceded a red card.
- Final whistle: End of the match

The dataset also contains the following attributes for each match:

- Competition: English Premier League, UEFA Champions League (UCL), FA Cup, Carling Cup, FA Community Shield or friendly
- Opponent: The opponent team
- Result: win, loss or draw
- Venue: home, away or neutral
- Score: Total Man U goal(s)
- Opponent Score: Total opponent goal(s)

6.9.4 Analysis

6.9.4.1 Finding entertaining matches

The first scenario that we discussed was finding entertaining matches to watch in replay. Mr. Lertpratchya defined his entertaining matches as follows:

1. Matches that Man U dominated and scored several goals: Mr. Lertpratchya used the LifeFlow view to select matches in which Man U scored the first three goals in a row (Figure 6.31).
The detail view on the right shows matches against Birmingham City, Blackburn Rovers, Bursaspor, MLS All Stars, Newcastle United, West Ham United and Wigan Athletic. He was interested in the two highest-scoring matches, against Blackburn Rovers and Birmingham City, which Man U won 7-1 and 5-0, respectively. Using tooltips to learn about goalscorers (event attributes), we found that Dimitar Berbatov scored five goals in the Blackburn game and also a hat-trick (three goals) in the Birmingham game.

Figure 6.31: Matches in which Man U scored the first three goals

2. Matches in which both teams scored in "an eye for an eye" fashion: Another type of entertaining match was when one team scored a goal and the other team retaliated by scoring in return. We looked for patterns with alternating Score (green) and Concede (purple) bars and found the match against Wolverhampton Wanderers (Wolves) in the Carling Cup (Figure 6.32). In this game, Man U scored and was equalized by Wolves twice before Javier Hernandez was substituted and scored the winning goal in stoppage time.

3. Matches in which Man U conceded early goals and came back to win the game: Mr. Lertpratchya grouped the matches by results and focused on the winning matches that start with Concede (purple). There were matches against Blackpool and West Ham United in which Man U conceded two goals and came back to win 3-2 and 4-2, respectively (Figure 6.33).

Figure 6.32: Matches in which both teams scored in "an eye for an eye" fashion

Figure 6.33: Matches in which Man U conceded early goals and came back to win the game

Figure 6.34: Matches grouped by Score and Result

6.9.4.2 Predicting chances of winning

To see the percentage of winning when Man U scored n goals, we grouped matches by Score and Result. LifeFlow in Figure 6.34 shows the proportions of win, loss and draw when Man U scored n goals. These proportions can be used as a rough estimate of the chance of winning.

6.9.4.3 Exploring statistics

Mr.
Lertpratchya and I explored the dataset from a sports fan's perspective and found the following interesting facts and observations:

Figure 6.35: Summary of all scorelines

1. Scoreline

We grouped the matches by Score and Opponent Score (Figure 6.35). From the visualization, we found that:

- Man U most often scored two goals (19/61 matches). There were only nine matches (14.75%) in which they did not score.
- 2-1 was the most common scoreline (nine matches), followed by 1-0 (eight matches).
- In four out of a total of eight matches that Man U won 1-0, the winning goals occurred within the last ten minutes. We selected all 1-0 matches, removed the others and saw the distribution split into two groups: matches with early goals and matches with late goals (Figure 6.36).

Figure 6.36: A distribution of the winning goals in all 1-0 matches: Matches are split into two groups, matches with early goals and matches with late goals.

2. Opponent

Figure 6.37 shows the matches grouped by Opponent. Man U competed against Chelsea more often than against any other team (five matches), and always scored the first goal.

3. Competition

Grouping the matches by Competition, we saw the difference between their performance in the English Premier League (EPL), a national competition, and the UEFA Champions League (UCL), the biggest tournament in Europe (Figure 6.38).

Figure 6.37: Matches grouped by Opponent: Man U competed against Chelsea more often in this season than against any other team.

- Man U scored the first goal faster in the EPL than in the UCL: On average, the first goals came after 27 and 43 minutes in the EPL and UCL, respectively. This is understandable because most teams tended to be more cautious in the UCL.
- Matches with many goals occurred more in the EPL than in the UCL: Since the UCL is a more prestigious competition that gathers the best teams in Europe, the teams' quality is higher and they are, therefore, more difficult to score against.
- In all matches that Man U won in the UCL, they scored the first goal: On the other hand, if they conceded the first goal, they drew or lost (Figure 6.39).

Figure 6.38: Matches grouped by Competition: Man U scored (green) the first goal faster in the English Premier League (EPL) than in the UEFA Champions League (UCL).

Figure 6.39: Matches grouped by Competition and Result: Man U scored (green) the first goal in all matches that they won in the UCL.

4. Home and Away

Venue played an important role in the team's performance.

- Man U is very strong at home but not as strong on the road: Grouping by Venue, the visualization shows that Man U scored first in most of the home matches. Grouping by Venue and Result (Figure 6.40), we found that they won most of the home matches and scored first in all of the winning home matches. Grouping by Result and Venue (Figure 6.41) shows that many scoreless (0-0) draws were away matches.
- For away matches, the number of matches decreased with an increasing number of goals scored, but the same is not true for home matches: The numbers of goals, sorted in descending order by the number of matches with that many goals, are 0, 1, 2, 3, 4 and 5 for away matches and 2, 1, 3, 4, 0, 5 and 7 for home matches (Figure 6.42).

Figure 6.40: Matches grouped by Venue and Result: Venue played an important role in the team's performance. Man U had a great performance at home but was not as strong on the road.

Figure 6.41: Matches grouped by Result and Venue: The most common score in a drawn match is 0-0. Most of these matches are away matches.

5. Goals

Goals are the most important events in soccer matches.

- Man U often scored first: They scored first 41 times while the opponents scored first 14 times. The patterns can be seen in Figure 6.43.
- The three fastest goals occurred in the first minute: We aligned by Kickoff and hid Concede. Using a selection from the distribution, we found three matches in which Man U scored very early (Figure 6.44).
One match was a friendly match against the MLS All Stars; Federico Macheda was the goalscorer. The other two matches were in the EPL, against Aston Villa and Chelsea; Javier Hernandez scored in both of them.

Figure 6.42: Matches grouped by Venue, Score and Opponent Score: For away matches, the number of matches decreased with an increasing number of goals scored, but the trend is not the same for home matches.

Figure 6.43: All matches that Man U played in season 2010-2011

Figure 6.44: The three fastest goals occurred in the first minute: Using a selection from the distribution, we could select matches in which Man U scored very early.

- Javier Hernandez was the most frequent scorer of the first goal: We right-clicked on the first green bar (first goal) and brought up a summary of the event attribute "Player". The summary table in Figure 6.45 lists frequent scorers of the first goal: Javier Hernandez (ten matches), Dimitar Berbatov (nine matches) and Wayne Rooney (eight matches).
- Javier Hernandez scored the first goal against Stoke City in the 27th minute twice: By clicking on "Javier Hernandez" in the summary table, the matches in which Hernandez scored were selected (Figure 6.45). We looked at the entire match detail in the LifeLines2 view and noticed that he scored against Stoke City in both fixtures in the exact same minute (27').

Figure 6.45: Javier Hernandez and first goals: We right-clicked on the first green bar (first goal) and brought up a summary of the event attribute "Player". Javier Hernandez was the most frequent scorer of the first goal. He also scored the first goal against Stoke City in the 27th minute twice.

Figure 6.46: George Elokobi, Wolverhampton Wanderers' left defender, scored the first goal against Man U twice. Those two goals were the only two goals in his 76 appearances for Wolves from 2008-2011.
- George Elokobi, Wolverhampton Wanderers' left defender, not a world-class striker, scored the first goal against Man U most often: We had looked at Man U players who scored the first goal, so we were curious about opponent players who scored the first goal. We hid Score and showed Concede instead, then right-clicked on the first Concede (purple) bar to see the attribute summary (Figure 6.46). Each team met Man U only 2-5 times in this season, so most players scored the first goal against Man U only once, except Mr. Elokobi, who scored twice. This was surprising because he is a little-known defender from a small team. Those two goals were also the only two goals in his 76 appearances for Wolves from 2008-2011.

Figure 6.47: Display only yellow cards and group matches by Result: Man U received fewer bookings in winning matches.

6. Yellow Cards

- Paul Scholes received the first yellow card most frequently: We hid all events except Kickoff, Yellow and Final Whistle, right-clicked the first yellow card bar to bring up the attribute summary, and found that Paul Scholes, not so surprisingly, received the first yellow card five times.
- Man U received fewer bookings in winning matches: We grouped matches by Result and found that 17 out of a total of 40 winning matches were free from yellow cards (Figure 6.47). This supported our assumption that the opponents did not put much pressure on the Man U defence, so there were not many fouls. We also thought that the players could be less aggressive and more disciplined when they were winning.
- In all matches that Man U lost, the opponents received yellow cards: This was the opposite situation. We hid all events except Kickoff, Opp Yellow and Final Whistle and grouped matches by Result. The opponents received yellow cards in all matches that Man U lost. We assumed that Man U were attacking to score back and the opponents had to stop them by committing fouls.

7.
Red Cards

Figure 6.48 shows only Red and Opp Red and groups matches by Result.

- When opponents received red cards, Man U always won: The team took advantage of having an extra player and turned it into victories.
- There was no match in which both teams received red cards: Sometimes this situation happens in a derby or rival match when the atmosphere is fiery, or in a stressful match when the stakes are high. However, it did not happen during the period covered by this dataset.

Figure 6.48: Display only red cards for both sides and group matches by Result: When opponents received red cards (blue), Man U always won. There was one match that Man U won with 10 players.

Figure 6.49: (left) Missed a penalty then conceded a goal: A disappointment for the fans as the match against Fulham ended in a 2-2 draw. Nani had a chance to make it 3-1, but he missed the penalty kick. (right) Missed a penalty but nothing happened after that: Man U was awarded a penalty kick while the team was leading 1-0, but Wayne Rooney could not convert it. However, the opponent could not score an equalizer.

6.9.4.4 Search for specific situations

Near the end of our session, we discussed whether certain situations existed in the matches. To answer these questions, we used FTS to specify and search for several situations. Some of the interesting queries are reported below:

1. Missed a penalty then conceded a goal: This was an example of a situation that could lead to newspaper headlines. We found a match against Fulham, in which Nani missed a penalty while Man U was leading 2-1 (Figure 6.49 left). After that, Fulham equalized and the match ended at 2-2.

Figure 6.50: Received a red card then scored a goal: A user searched for a situation in which Man U received a red card and then conceded a goal. He could not find any, but instead found a match against Bolton in which Man U scored after receiving a red card.

2.
Missed a penalty and nothing happened after that: This was the opposite of the previous situation: the team made a mistake and was not punished. We searched for the pattern Pen missed → no Concede / no Score → Final whistle (Figure 6.49 right). There was a match against Arsenal, in which Wayne Rooney missed a penalty while Man U was leading 1-0. Luckily, the match ended at 1-0.

3. Received a red card then conceded a goal: This was a plausible situation because the other team might take advantage of having more players. However, we did not find any match with this pattern (Figure 6.50). The exact match results were empty. To confirm the results, we looked at the other results, and there was not any match with a red card (red) followed by a conceded goal (purple). Instead, we surprisingly found a match in which Man U received a red card and then scored a goal.

6.9.5 Conclusions and Discussion

This dataset, despite its small size (61 records), is interesting because it has several attributes that can be used in combination to produce interesting perspectives. Including/excluding attributes and changing attribute orders were heavily used features in the analysis. The dataset also contains rich event attributes, such as goalscorers, that highlight the usefulness of attribute summaries. Mr. Lertpratchya was able to use the software comfortably after watching an introduction video that I sent him. There were only a few issues that he raised during the study. The first one was about the time axis in LifeFlow. He wondered why some patterns were shorter than 90 minutes, despite the fact that a soccer match is always 90 minutes long. This is because LifeFlow displays a sum of averages, which can be shorter or longer than the actual time. Another issue arose when we were looking at yellow cards in the LifeLines2 view: we knew that the selected yellow card belonged to a specific player, but the tooltip showed another player that we did not expect.
After a moment of frustration, we realized that both players had yellow cards at the same time and the events overlapped. On the positive side, he was satisfied with the software and enjoyed our discussion because it aligned with his interests. He came to understand many interesting facts in a short period of time. In the last part of our study, where FTS was used primarily, I found that having the results displayed in two parts gives more confidence in the results. When the exact match results were empty, Mr. Lertpratchya examined the other results and confirmed that the situations he was looking for did not exist. This is more beneficial than returning nothing and leaving users to wonder whether the query is wrong or the situation really does not exist.

In summary, this case study demonstrates many ways to use LifeFlow with soccer data. It can help video subscribers pick interesting matches from an archive to watch. Mr. Lertpratchya was able to select matches that he felt were entertaining. We also discovered many fun facts in the study. Sports journalists can use LifeFlow to find these fun facts and write about them in the news. Sports fans like Mr. Lertpratchya can enjoy exploring interesting statistics in their free time or use them to predict match results. Coaches can analyze their teams' performance with LifeFlow and adjust their tactics.

Figure 6.51: Features of LifeFlow used in 238 sessions (per-session frequency of each action)
Moreover, LifeFlow has the potential to be useful for other sports, as even a casual user can foresee. Near the end of our discussion, Mr. Lertpratchya asked about the possibilities of using LifeFlow for football, hockey and other sports.

6.10 Usage Statistics

I have collected action logs of LifeFlow usage since March 2011. After removing short sessions, which resulted from debugging and testing, 238 sessions qualified for usage analysis. The frequencies of actions were counted and plotted as a bar chart (Figure 6.51). As shown in Figure 6.51, the most common action was LIFEFLOW_TOOLTIP, followed by LIFEFLOW_HORIZONTAL_PAN_ZOOM and LIFEFLOW_VERTICAL_PAN_ZOOM. The high frequencies of these three actions indicate that users spent most of their time navigating and exploring the LifeFlow overview visualization. The next three most common actions were LIFELINES2_INSTANCE_TOOLTIP, LIFELINES2_SCROLL and LIFELINES2_EVENT_TOOLTIP, which implied that users were also examining records in detail using the LifeLines2 view on the right side, but less frequently.

From the frequencies of the three tooltip actions, users seemed to pay more attention to higher-level information. LIFEFLOW_TOOLTIP, the tooltip for a LifeFlow sequence, was used more than LIFELINES2_INSTANCE_TOOLTIP, the tooltip for each record, and LIFELINES2_INSTANCE_TOOLTIP was used more than LIFELINES2_EVENT_TOOLTIP, the tooltip for each event. After visualizing the tooltip usage pattern in LifeFlow itself (Figure 6.52), the visualization shows that the tooltip for a LifeFlow sequence (LIFEFLOW_TOOLTIP, green) was often used and followed by the tooltip for each record (LIFELINES2_INSTANCE_TOOLTIP, blue). After that, users often used LIFEFLOW_TOOLTIP (green) again, or LIFELINES2_EVENT_TOOLTIP (light blue). Users rarely used LIFELINES2_EVENT_TOOLTIP (light blue) immediately after LIFEFLOW_TOOLTIP (green).
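The per-session frequencies reported above are simple averages: the total number of occurrences of each action divided by the number of sessions, sorted in descending order. A minimal sketch of this computation, assuming a hypothetical log format of one `session_id,action` pair per line (the actual LifeFlow log format is not described here):

```python
from collections import Counter

def action_frequencies(log_lines, n_sessions):
    """Average per-session frequency of each logged action, most common first."""
    counts = Counter()
    for line in log_lines:
        # Hypothetical format: "s042,LIFEFLOW_TOOLTIP"; action names follow
        # the labels used in the usage statistics.
        _session_id, action = line.strip().split(",")
        counts[action] += 1
    # Divide each total by the number of sessions, then sort descending.
    return sorted(
        ((action, total / n_sessions) for action, total in counts.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

log = [
    "s1,LIFEFLOW_TOOLTIP",
    "s1,LIFEFLOW_TOOLTIP",
    "s1,LIFELINES2_SCROLL",
    "s2,LIFEFLOW_TOOLTIP",
]
print(action_frequencies(log, n_sessions=2))
# [('LIFEFLOW_TOOLTIP', 1.5), ('LIFELINES2_SCROLL', 0.5)]
```

Note that the average is taken over all qualified sessions (238 in the study), not only the sessions in which an action occurred, so rarely used actions have averages well below one.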
Beyond navigational and tooltip actions, other frequent actions were "including and excluding events" (CATEGORY_CHANGE_CHECKBOX), "including and excluding attributes" (ATTRIBUTE_CHANGE_CHECKBOX), and "align" (ALIGN). In terms of selection, users were found to select top-down from the LifeFlow overview (LIFEFLOW_SELECT_SEQUENCE and LIFEFLOW_SELECT_FROM_DISTRIBUTION) more than bottom-up from the LifeLines2 view (LIFELINES2_SELECT).

Figure 6.52: Analyzing LifeFlow's tooltip usage with LifeFlow: LIFEFLOW_TOOLTIP (green), the tooltip for a LifeFlow sequence, was often used and followed by LIFELINES2_INSTANCE_TOOLTIP (blue), the tooltip for each record. After that, users often used LIFEFLOW_TOOLTIP (green) again, or LIFELINES2_EVENT_TOOLTIP (light blue), the tooltip for each event.

6.11 A Process Model for Exploring Event Sequences

Based on usage statistics and observations of user behaviors during the case studies, I have developed a process model for exploring event sequences (Figure 6.53). This process model follows the sensemaking loop by Card and others [95, 123], but emphasizes event sequence analysis. My process model drew inspiration from Thomas and Cook's canonical four-step process model for analytical reasoning [123] and its extensions [45, 133]. The process model consists of the following steps:

1. Defining goals.
2. Gathering information.
   (a) Preprocessing data.
   (b) Cleaning data.
3. Re-representing the information to aid analysis.
4. Manipulating the representation to gain insight.
   (a) Manipulating the representation.
   (b) Exploring results of manipulation.
   (c) Searching for patterns.
   (d) Exploring search results.
   (e) Handling findings.
5. Producing and disseminating results.
   (a) Recording findings.
   (b) Producing results.
   (c) Disseminating results.
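The cleaning phase (step 2b) can be partly automated. The sketch below, which is not part of LifeFlow itself, checks records for the two error categories reported in the case studies: illogical sequences (e.g., a patient dying before being transferred to the ICU) and unreasonably long gaps. It assumes each record is a time-sorted list of (timestamp, event type) pairs; the forbidden pairs and the gap threshold are illustrative:

```python
from datetime import datetime, timedelta

# Illustrative constraints drawn from errors found in the case studies:
# Die -> ICU is impossible, and no gap should span more than a year.
FORBIDDEN_ORDERS = {("Die", "ICU"), ("StartFillCement", "EnterSite")}
MAX_GAP = timedelta(days=365)

def integrity_errors(events):
    """Return a list of (kind, detail) violations for one record.

    `events` is a list of (timestamp, event_type) pairs sorted by time.
    """
    errors = []
    for (t1, e1), (t2, e2) in zip(events, events[1:]):
        if (e1, e2) in FORBIDDEN_ORDERS:
            errors.append(("illogical sequence", f"{e1} -> {e2}"))
        if t2 - t1 > MAX_GAP:
            errors.append(("unreasonable gap", f"{e1} -> {e2}"))
    return errors

record = [
    (datetime(2010, 1, 1, 9, 0), "ICU"),
    (datetime(2010, 1, 1, 12, 0), "Die"),
    (datetime(2010, 1, 2, 0, 0), "ICU"),  # transfer after death: an error
]
print(integrity_errors(record))
# [('illogical sequence', 'Die -> ICU')]
```

Checks like these could be run over every record after conversion, before any visual analysis begins; the integrity constraints of temporal databases mentioned later in this chapter generalize this idea.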
Figure 6.53: A Process Model for Exploring Event Sequences

6.11.1 Defining goals

At the beginning of an analysis, users are advised to define goals or questions that they seek to answer. Although this is not strictly required, having goals clarifies what will be needed in the dataset and guides the appropriate data conversion. For instance, in the medical case studies, physicians can treat each visit as one record, or multiple visits by the same patient as one record. In a patient transfer study, the goal is to learn about common trends within a visit; treating each visit as one record emphasizes what happens within each visit. On the other hand, in a hospital readmission study, users focus on finding returning patients, so it is more suitable to group multiple visits into one record.

6.11.2 Gathering information

This step consists of two phases: preprocessing raw data into the LifeFlow format and cleaning data errors.

6.11.2.1 Preprocessing data

LifeFlow does not connect to databases directly, but requires users to export and convert their data into a custom format. A tool called DataKitchen was developed to help users convert their data. However, users still need to understand the data format to fully exploit the benefits of LifeFlow.
A dataset in LifeFlow consists of event sequence records. Each record contains events and a set of attributes. Each event consists of a timestamp, an event type and a set of attributes. The main decisions in this step are defining records and event types.

Record: Definitions of a record vary according to analysis goals. For example, in the ICDL study, I first defined a record as an entire HTTP session for each user, containing all events from when the user enters the ICDL website until they leave it. However, after realizing that the goal was to understand how people read books, not how people navigate the website, I redefined a record as a book reading session instead. Each book reading session contains all events from one user from the time that he or she starts reading one book until he or she stops reading it. One user can have multiple book sessions.

Event types: Although LifeFlow does not limit the number of event types, it is recommended to have fewer than 15 event types displayed at the same time. Therefore, it is important to define a minimal set of event types. Here are two common strategies:

1. Remove irrelevant events from the dataset. For instance, in the ICDL study, all events that do not represent a page of a book are removed. Users can choose to leave these events in the dataset and include/exclude them during analysis as well. However, removing them makes the dataset more compact and reduces the chance that users will feel overwhelmed by having too many event types to choose from.

2. Group multiple event types into a higher-level event type. For example, ICU rooms on the 2nd floor (ICU-2nd) and ICU rooms on the 3rd floor (ICU-3rd) can be grouped into ICU. Users can have both high-level (ICU) and low-level (ICU-2nd, ICU-3rd) event types in the dataset and include/exclude them according to their needs during analysis.
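The record/event structure described above could be modeled as follows. This is a sketch with assumed class and field names, not the actual LifeFlow file format; it also illustrates the second strategy, grouping low-level event types (ICU-2nd, ICU-3rd) into a higher-level type (ICU):

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    timestamp: float           # e.g., minutes since the start of the record
    event_type: str
    attributes: dict = field(default_factory=dict)   # e.g., {"Player": "..."}

@dataclass
class Record:
    record_id: str
    attributes: dict = field(default_factory=dict)   # e.g., {"Result": "win"}
    events: list = field(default_factory=list)       # list of Event, time-sorted

# Strategy 2: map low-level event types to a higher-level type (assumed mapping).
GROUPS = {"ICU-2nd": "ICU", "ICU-3rd": "ICU"}

def regroup(record):
    """Rewrite each event's type using the GROUPS mapping, in place."""
    for e in record.events:
        e.event_type = GROUPS.get(e.event_type, e.event_type)
    return record

r = Record("patient-1", {"Result": "discharged"}, [
    Event(0.0, "Arrival"), Event(30.0, "ICU-2nd"), Event(90.0, "ICU-3rd"),
])
print([e.event_type for e in regroup(r).events])
# ['Arrival', 'ICU', 'ICU']
```

Keeping both the low-level types and a mapping like `GROUPS` (rather than discarding the originals) mirrors the advice above: users can switch between high-level and low-level views during analysis.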
It is common for users to revisit this step once they become unsatisfied with the converted data or an analysis goal is redefined. The early phase of each analysis often consists of trial-and-error data preprocessing.

6.11.2.2 Cleaning data

After data are converted, users should check for data integrity. From my observations, this is the first thing to look for after loading data into LifeFlow for the first time. Users would look at the visualizations and try to ensure that their data were converted correctly. Some errors, such as those due to corrupted timestamps, were very obvious. Common errors found in the case studies fall into these two categories:

1. Illogical sequences: For example, patients died before they were transferred to the ICU (Die → ICU), or cement trucks started filling cement before reaching construction sites (StartFillCement → EnterSite).

2. Unreasonably long gaps: For example, in the traffic incident study, many incidents were reported to last one hundred years.

However, some errors could be obscure and not discovered until later in the analysis, or even left unnoticed. In the soccer study, we found one match that Man U lost to the opponent, but the attribute Result for that match was Win. A systematic way to check for data integrity that could be explored further is to use integrity constraints in temporal databases as a source of error detection.

6.11.3 Re-representing the information to aid analysis

After users are satisfied with their data, the visual analysis begins. I recommend that users customize the fundamental visual properties, especially color codings for event types. Choosing appropriate colors can strengthen cognitive processing and aid analysis. Three color-coding strategies were used in the case studies:

1. Distinct color schemes: Use colors that are distinguishable from each other. It is easier to assign distinct colors to a smaller number of event types. However, not every dataset has a small (≤ 10) number of event types.
When there are many event types, it can be quite difficult. Here are a few guidelines for assigning distinct colors:

- If possible, associate the meanings of colors with the meanings of events. For example, in the patient transfer study, ICU events are encoded in red to indicate critical conditions while Floor events are encoded in green to indicate safe conditions. In the soccer study, red cards and yellow cards are encoded in red and yellow, respectively.
- Use neutral colors, e.g., different shades of gray, for less important events. For instance, in the soccer study, Kickoff events are encoded in gray to attract less attention.
- Reuse the same color if necessary. Some event types that always occur far apart can share the same color. For example, in the soccer study (Figure 6.34), Kickoff and Final whistle events are both encoded in gray. In the cement truck study (Figure 6.27), blue was used for both start loading cement (StartLoadCement) and start filling cement (StartFillCement).
- Use the same color to provide a "bracket" effect. In the cement truck study (Figure 6.27), red was used to indicate arrivals at and departures from sites (ArriveSite and LeaveSite). This creates a visual block, from one red bar to another, that contains the StartFillCement and EndFillCement events in between.

2. Sequential color schemes: This type of scheme is useful for showing continuity when there is an expected sequence of events. For example, in the ICDL study, page numbers are encoded in a gradient from blue to red to indicate reading direction (Figure 6.18). This strategy can also be applied to binned numerical values, levels of intensity, age ranges, temperature ranges, etc. However, assigning different colors for long sequences (> 10 events) can still be a challenge. Users can assign the same color to consecutive events; for instance, pages 27 and 28 in the ICDL study are encoded with the same color.

3. A combination of both: An advanced color scheme can combine distinct and sequential colors.
As seen in the cement truck study (Figure 6.27), distinct colors are used primarily, and sequential colors are used partially to indicate the continuity of loading cement (StartLoadCement → FinishLoadCement) and filling cement (StartFillCement → EndFillCement). StartLoadCement and StartFillCement are encoded in blue while FinishLoadCement and EndFillCement are encoded in light blue.

6.11.4 Manipulating the representation to gain insight

Users can apply the different visual operators available in LifeFlow to change the visual representation, search for patterns, explore results and gain insight. To guide a systematic analysis, I have included a methodology for event sequence analysis, inspired by the 7-step methodology for social network analysis by Perer and Shneiderman [89].

6.11.4.1 Manipulating the representation

Users can use the following visual operators to manipulate the LifeFlow visualization and display different perspectives of the data:

1. Change alignment: An extreme approach is to try all event types as alignment points. A more practical approach is to pick only an interesting event type x and ask what happens before or after x.

2. Include/Exclude event types: After each alignment is set, users can include/exclude different combinations of event types to ask what the relationships are between event types e1, e2, e3, . . . , en. An extreme approach is to try all combinations, which could be very time-consuming. Event types that are irrelevant to an alignment point can be excluded entirely. For example, in the ICDL web logs, when the visualization is aligned by the second page of the books, an event type such as "enter wrong password during log in" seems irrelevant.

3. Include/Exclude/Reorder attributes and use them to group records: After an alignment and a set of event types are specified, users may ask whether there is any difference between groups of event sequences when they are grouped by attributes a1, a2, a3, . . .
, an, or what the proportions of each group are. LifeFlow allows users to include/exclude attributes and group the event sequences in different ways. Using multiple attributes in combination and changing the order of attributes used for grouping were found to be very useful during the soccer analysis.

4. Filter: During an exploration, users can select and remove uninteresting patterns from the dataset to narrow it down. This action can be used in between the three actions above. An example of this strategy is demonstrated in the patient transfer study (Section 6.2.4.2).

6.11.4.2 Exploring results of manipulation

After applying the visual operators in Section 6.11.4.1, users examine the updated visualization to extract findings. If nothing interesting is present, users may continue manipulating the representation or switch to searching.

1. Understand common trends: Examine both sides of an alignment and try to answer what happens before and after the alignment. Extract the major trends from patterns with high proportions. For instance, in the patient transfer study, most patients came to the ER and were discharged (Figure 6.1). In the cement truck study, most of the event sequences follow a certain procedure (Figure 6.27). In the ICDL study, after aligning by the second page, most of the users seemed to read in order and did not have any event before the second page (Figure 6.19).

2. Extract rare or unexpected patterns: For example, users who read books in backward direction, traffic incidents that last for more than one hundred years, or patients who died before they were transferred to the ICU.

3. Extract differences between groups: If one or more attributes are used, find out whether there is any difference between the groups. For example, in the soccer dataset, when grouped by Competition, the visualization shows that Man U scored the first goal in the English Premier League (EPL) faster than in the UEFA Champions League (UCL) (Figure 6.38).

4.
Examine distributions of time gaps: Look for interesting gap information (e.g., three goals occurred in the first minute) or irregularities (e.g., a skewed distribution of Agency C's clearance time in the traffic incident study). When there are many gaps, the rank-by-feature panel (Figure 3.14) can be used to recommend interesting distributions.

5. Examine summaries of record and event attributes: Look for frequent record attributes and event attributes. For example, find the top goalscorers by looking at the frequency of the event attribute Player of the event Score (Figure 6.45).

6.11.4.3 Searching for patterns

According to the case studies, there were a few situations that trigger a search.

1. During exploration, users find an interesting pattern. For example, in the hospital readmission study, my collaborator saw an interesting pattern (Registration → Discharge → Registration → Die) in one patient record and searched for other patients with that pattern (Section 6.5.6.6).

2. Once users are familiar with their datasets and have gained a better understanding from manipulating the representation, they may question the existence of particular scenarios. As reported in the soccer study, my collaborator asked towards the end of the session whether there was any match in which Man U missed a penalty and then conceded a goal (Section 6.9.4.4).

3. It is also possible that users are already familiar with their datasets and can start an analysis by testing their hypotheses about the existence of particular patterns.

6.11.4.4 Exploring search results

After the search is executed, users can explore the search results to learn and gain new insight about the patterns that they have just searched for. Similarly, if nothing interesting is present, users may continue searching or switch to manipulating the representation.

1. Examine exact match results: See if a specified pattern exists in the dataset. Count the number of results. Do the results match users' expectations?

2.
Examine other results: If the exact match results are empty, examine the other results to confirm that the specified pattern does not exist or to check whether there is any similar pattern. Even if the exact match results are not empty, there could be interesting similar patterns. For instance, my collaborator searched for a match in which Man U received a red card and then conceded a goal. The exact match results were empty. However, the other results showed a match in which Man U received a red card and then scored, to the surprise of both of us (Figure 6.50).

6.11.4.5 Handling findings

Each finding from an exploration (Sections 6.11.4.2 and 6.11.4.4) falls into one of three categories: positive (e.g., one that helps users answer their questions), negative (e.g., one that tells them the answer they seek cannot be found) and unexpected (e.g., an unanticipated characteristic of the data is found and leads to new questions) [133]. During this step, users may also redefine their analysis goals according to their findings.

When users reach a positive finding, they try to use their domain expertise to comprehend and explain the finding. If the observation is sound and provides interesting insights, they record the results for dissemination. For example, in the ICDL study, we found sequences in which visitors accessed previous pages before continuing to the next pages, such as 3 → 4 → 3 → 4 → 5. Using an understanding of how a person normally reads a book, we inferred that these sequences reflect the behavior of people who flipped back to a previous page to catch up on some content before continuing (Section 6.6.4.3). Sometimes a positive finding also introduces new questions that lead to further analysis. For instance, in the soccer study, after noticing that Man U often scored first (Figure 6.43), an immediate follow-up question was how fast they usually scored. This encouraged us to analyze the distribution, and we found that the three fastest goals occurred in the first minute (Figure 6.44).
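The exact-match-then-similar workflow described in Section 6.11.4.4 can be sketched in a few lines. This is not the actual FTS or M&M implementation (those also weigh event timing and user-adjustable criteria); it is a minimal, order-only sketch with hypothetical record IDs and event names, using longest-common-subsequence length as a stand-in similarity measure.

```python
def contains_in_order(record, pattern):
    """True if all events in `pattern` occur in `record` in order
    (not necessarily consecutively); `in` consumes the iterator."""
    events = iter(record)
    return all(ev in events for ev in pattern)

def lcs_len(a, b):
    """Length of the longest common subsequence (classic dynamic program)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def search(records, pattern):
    """Split records into exact matches and 'other results' ranked
    by similarity to the query pattern."""
    exact = [rid for rid, seq in records.items()
             if contains_in_order(seq, pattern)]
    others = sorted((rid for rid in records if rid not in exact),
                    key=lambda rid: -lcs_len(records[rid], pattern))
    return exact, others

records = {
    "r1": ["Arrival", "ICU", "Discharge"],
    "r2": ["Arrival", "Floor", "Die"],
    "r3": ["Arrival", "ICU", "Die"],
}
exact, others = search(records, ["ICU", "Discharge"])
# exact contains only r1; r3 ranks above r2 among the other results
```

Even when the exact list is non-empty, scanning the top of the ranked remainder can surface surprising near-misses, like the red-card-then-scored match above.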
On the other hand, a negative finding could indicate a roadblock. In many cases, this roadblock can be passed with more data, or with better preprocessing and cleaning, which pushes users back to the information gathering step. For example, a physician wanted to find common diagnoses for patients' first visits (Section 6.5.6.7). However, the diagnoses were very detailed, such as "Chest Pain Nos" and "Chest Pain Nec", and therefore could not be grouped to show common trends until they were preprocessed into higher-level categories. Some negative findings are dead ends that cannot be pursued further. Some are due to insufficient data or limitations of LifeFlow. In the hospital readmission study, the discharge timestamps were not guaranteed to be accurate, preventing any meaningful interpretation of the time from registration to discharge. The software limitations might be resolved by adding new features or delegating further analysis to external tools. For example, I was asked if LifeFlow could produce a tabular report that summarizes the number of patients according to their number of visits (1,000 patients visited once, 200 patients visited twice, and so on). LifeFlow can provide the number of patients via tooltips, but does not provide a table that contains all the numbers. This feature can be added, or users can use an external tool to perform this task.

An unexpected finding is a wildcard that could lead to discovery. Similar to a positive finding, if an observation is sound and provides interesting insight, users record the results for dissemination. For instance, we found that some visitors accessed books on the ICDL website in backward direction (Section 6.6.4.4). Unfortunately, many of these unanticipated findings are just illogical sequences, which indicate data errors and push users back to the information gathering step.
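Both preprocessing gaps mentioned above, rolling detailed diagnoses up into higher-level categories and tabulating patients by number of visits, are straightforward to perform in an external tool. A minimal sketch, in which the prefix-based mapping table and patient IDs are hypothetical:

```python
from collections import Counter

def roll_up(diagnosis, categories):
    """Map a detailed diagnosis (e.g., 'Chest Pain Nos') to the first
    matching higher-level category; unmatched codes fall into 'Other'."""
    for prefix, category in categories:
        if diagnosis.startswith(prefix):
            return category
    return "Other"

def visits_report(visits_per_patient):
    """Tabulate how many patients had 1 visit, 2 visits, and so on.
    `visits_per_patient` maps a patient ID to a visit count."""
    table = Counter(visits_per_patient.values())
    return sorted(table.items())  # [(number of visits, number of patients), ...]

categories = [("Chest Pain", "Chest Pain")]  # hypothetical mapping table
# roll_up("Chest Pain Nec", categories) and roll_up("Chest Pain Nos", categories)
# both collapse into the single category "Chest Pain"

report = visits_report({"p1": 1, "p2": 2, "p3": 1, "p4": 3})
# report: [(1, 2), (2, 1), (3, 1)], i.e., two patients visited once, etc.
```

A real diagnosis roll-up would use a curated code hierarchy rather than string prefixes, but the shape of the computation is the same.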
6.11.5 Producing and disseminating results

Once users reach their analysis goals, the last step is preparing their findings to share with colleagues. I separate the work in this step into three phases: recording findings, producing results and disseminating results.

6.11.5.1 Recording findings

This phase occurs immediately after users handle findings and find interesting insight. Users record important information that will later be refined into the final results. After that, if the analysis is not completed yet, users go back to manipulating the representation to gain more insight. There are a few common activities in this phase.

Record findings: Users jot down the findings, e.g., "In all matches that Man U won in the UCL, they scored the first goals" or "Patients with at least six visits, which is 2% of patients, account for 4.3% of visits."

Record provenance: Users record the analysis route that took them to their findings. The provenance is very important because it allows users to reproduce the findings and explain their reasoning processes.

Capture screenshots: Screenshots are taken and sometimes annotated.

Export data: In some cases, users also export the final dataset for further analysis in external tools. For instance, in the patient transfer study, the physician can export the medical record numbers of the bounce-back patients to examine their medical histories in more detail.

6.11.5.2 Producing results

After all analyses have been completed, users compile all raw findings and include supplemental information from external sources into the final results. If necessary, users can reproduce findings to create better screenshots, capture more detailed information, etc. Users may also gain more insight while reproducing results.

6.11.5.3 Disseminating results

When LifeFlow users have extracted their insights, the screenshots can be annotated (graphically and textually), captions can be added and a report generated as a slide presentation, document, or video.
Live presentations to groups and email dissemination can reach stakeholders who will make decisions about taking action or refining the analysis.

6.11.6 Summary

The visual analytics process model described in this section is designed to guide users in learning how to explore event sequences and to help designers improve future visual analytics tools. This iterative process promotes insight discovery through multiple steps that involve data understanding, data cleaning, choosing representations, hypothesis generation, hypothesis validation and extraction of evidence to share with colleagues. Currently, this process is conceptual and not yet implemented in the user interface, but it could be in the future.

6.12 Design Recommendations

The case studies, usage statistics and observations have revealed the strengths and weaknesses of LifeFlow. I have noticed many interesting user behaviors and generalized them into a list of recommendations for designing a user interface for exploring event sequences.

1. Use the Align-Rank-Filter (ARF) framework: The ARF framework [131, 132] was originally developed for a list of multiple event sequences. However, applications of this concept are not limited to the list view only. Alignment works well with the LifeFlow overview and has proven to be one of the most useful features in the case studies. Changing the reference point often opens up new perspectives and triggers new questions from a new angle. Ranking offers a way to reorder and organize visual components to gain new insight, such as ranking the LifeFlow sequences by time instead of cardinality. Filtering allows users to trim the dataset and focus on a subset of the data. The filtering mechanisms interact directly with the underlying data model and, therefore, can operate independently from any visual component.

2. Provide multiple levels of information that can be accessed in an organized manner: Event sequence information can be grouped into three levels.
Users should be able to drill down from a higher level to a lower level.

(a) Event-level: Information specific to each event, such as time or event type. This is the lowest level.

(b) Record-level: Information specific to each record, such as ID, and aggregations of event-level information within each record, such as the number of events of each event type or the sequence of events.

(c) Overview-level: Aggregations of record-level information, such as the number of male patients, and aggregations of event-level information across multiple records, such as a distribution of events over time or a frequency table of all sequences of events. This is the highest level.

3. Provide multiple overviews from different aspects: An overview usually requires data aggregation that sacrifices some aspects of the data. Therefore, it is very unlikely that a single overview will be enough to summarize all aspects of the data. For example, LifeFlow can provide a summary of sequences and gaps between sequences, but it cannot show the temporal aspects as well as a histogram can, e.g., how often a patient visited in January 2010. To this end, I believe that multiple overviews will complement each other and provide a better big picture.

4. Use coordinated multiple views that are loosely coupled: The benefit of having multiple views is that, when used in combination, they can complement each other. This can be achieved through brushing and linking. In addition, not every task will require all views at the same time. Therefore, each view should be independent and able to function without knowing or requiring the existence of any other view, following the Model-View-Controller (MVC) paradigm. This way, each view can be hidden or displayed as needed, to better utilize screen space.

5. Incorporate non-temporal attributes: It is challenging to gain insight from temporal information alone. Including other attributes in the analysis, such as patients' genders, physicians'
names, can provide a better understanding of the data. These attributes can sometimes help explain the cause-and-effect relationships between events.

6. Support rapid inclusion/exclusion of event types and grouping event types into a hierarchy: As evidenced in the case studies, the ability to include and exclude events rapidly allows quick exploration of subsets of the data without going back to another round of preprocessing. Several users also expressed their need to dynamically change the granularity of the event types. For example, ICU can be split into ICU-2ndFloor and ICU-3rdFloor. At the beginning of an analysis, users want to treat both event types as ICU, but at a later stage, they want them to be separated.

7. Provide search: Search may often be a forgotten feature at the beginning of an analysis, but once users have identified an interesting pattern, it becomes a valuable feature to have. Users can come back with the same query a month later or use the same query on another dataset.

8. Support data preprocessing and cleaning: Data preprocessing and cleaning are inevitably parts of exploratory data analysis. Many visualization systems delegate these tasks to external tools [58, 57]. However, including some simple tasks in the user interface can reduce the gap between preprocessing and analysis and facilitate a smoother and more continuous process.

9. Keep history and handle mistakes: An exploratory data analysis consists of many iterations. Without proper support from the user interface, users will have a hard time remembering what has been done to the data. The final results should not be only insight, but also analytic provenance: a user's reasoning process as reflected in their interactions with visualizations [94, 42]. Therefore, the user interface can help by storing history. Also, every once in a while, users will make a mistake.
In my research prototype, "Revert to original" was found to be a successful feature because it provides a safe point for users to backtrack to, without reloading the dataset. However, being able to undo individual steps would keep users from having to start over from the beginning and save even more time.

6.13 Summary

As difficult as it is to judge a book by its cover, it is difficult to evaluate the benefits of a rich interactive visualization and user interface in a short period of time, especially in a controlled environment. This chapter describes my attempts to engage with the users, learn and share understandings of their data and problems, and use the innovations from my research to tackle their problems in a new way. Aiming to satisfy users' needs drives my research towards practical use in addition to its theoretical contributions. Many limitations, from tiny design issues to requirements for useful new features, were identified. Many benefits were confirmed, and new use cases keep being discovered. These case studies provide benefits both to my dissertation research and to the satisfaction of the users, many of whom are dealing with critical problems in healthcare. By helping them improve their processes and save more lives in the future, my research is becoming a small part of changing the world into a better place. My experience from these case studies also led to a process model and a set of design recommendations for temporal event sequence exploration that will be useful for future designers.

Chapter 7
Conclusions and Future Directions

7.1 Conclusions

Temporal event sequence analysis is an important task in many domains: medical researchers may study the patterns of transfers within the hospital for quality control; transportation experts may study traffic incident logs to identify best practices. While previous research has focused on searching and browsing these event sequences, overview tasks and exploratory search are often overlooked.
This dissertation describes four contributions that provide new ways to interactively explore temporal event sequences and offers recommendations for future designers. They are summarized as follows:

1. Overview visualization: Lengthy reports often have an executive summary to provide an overview of the report. Unfortunately, there was no executive summary to provide an overview of event sequences. Therefore, the first contribution of my dissertation is an interactive visualization called LifeFlow, which includes data aggregation, a visual representation and interaction techniques, to provide an overview of millions of event sequences in a million pixels. By seeing this overview, users can grasp the big picture of the dataset and often start asking questions about their data that they had never thought of before. A user study with ten participants confirmed that even novice users with 15 minutes of training were able to learn to use LifeFlow and rapidly answer questions about the prevalence of interesting sequences, find anomalies, and gain significant insight from the data. Several MILCs illustrate its benefits in supporting event sequence analysis for real-world problems.

2. Similarity search interface: In an exploratory search, users can be uncertain about what they are looking for and try to find event sequences that are similar to the query. I have developed the M&M measure, a similarity measure, and Similan, a similarity search interface, to support this uncertain scenario. The first version allows users to search for event sequences that are similar to a selected target event sequence. A user study showed a promising direction but also identified room for improvement. The second version addresses the limitations of the first version. The user interface allows users to draw an event sequence example and search for records that are similar to the example, with improved search results visualization.
The similarity measure was also sped up and could be customized by more criteria, increasing its performance and flexibility. After that, a controlled experiment was conducted to assess the benefits of exact match and similarity search interfaces for five tasks, leading to future directions for designing query interfaces that combine the benefits of both interfaces.

3. Hybrid search interface: Searching with an exact match interface, users may not know what they have missed, but they have confidence in the result sets and can count the number of records easily. Searching with a similarity search interface, users can see all records sorted by similarity but have difficulty deciding how many records should really be accepted into the result sets. Flexible Temporal Search (FTS) was then developed to combine the benefits of both interfaces and provide a more powerful search, so that users do not have to choose between the two previous interfaces and make trade-offs. FTS allows users to see a clear cut-off point between records that pass the query and records that do not, making the results easier to count and giving more confidence. All records are sorted by similarity, allowing users to see the "just-off" cases that are right beyond the cut-off point. FTS was found useful in medical scenarios and was often used in the case studies once interesting patterns were identified from the LifeFlow overview.

4. Case study evaluations, a process model and design guidelines: To evaluate the benefits of this research outside of laboratory environments, I recruited domain experts as users and applied these new techniques to tackle their data analysis problems or offer new ways to analyze their data. While this dissertation work was initially motivated by medical scenarios, the variety of applications shown in the case studies demonstrates its generalizability to many domains where event sequences are the main focus.
The experience and lessons from these studies also led to a process model for exploring event sequences and a set of design recommendations for future visualization designers.

7.2 Future Directions

One goal of this dissertation was to explore this research area and open up new directions for future researchers. Throughout my PhD study, I have encountered many limitations in my approach and discovered many new frontiers that I believe are interesting future directions. Some of them are specific while others are more open-ended. I summarize and describe them in this section.

7.2.1 Improving the overview visualization

Although the evaluations demonstrated many benefits of LifeFlow in diverse applications, they also identified limitations of the design. Improvements would be welcome for:

1. Representing event types: LifeFlow uses color to represent event types. This approach has limitations when there are many event types. It is very difficult to choose distinguishable colors for more than 10 categories. Current workarounds are to use many shades of each color to create variations or to use the same color for several event types, but this requires users to carefully pick the colors and does not always solve the problem. However, there are also benefits to using color. It creates a pre-attentive effect, allowing users to notice some event types easily. Using colors instead of text labels also makes the visualization compact and clean. Future researchers who find a better way to represent numerous event types will facilitate still wider usage.

2. Displaying very long event sequences: The current visualization needs a better way to handle datasets with very long sequences, i.e., sequences with a large number of events. Each bar in LifeFlow is assigned a fixed width. The gaps between the bars are used to represent time. When displaying the overview of the entire dataset before any zooming or filtering, it should be able to display everything.
When the sequences are long, the gaps between bars are compressed to fit all the bars on the screen. However, in the worst case, the sequence of bars still cannot fit on the screen even when all the gaps are zero pixels wide. New designs might be needed to support these very long sequences.

3. Alignment: When an alignment is set, users often mistakenly assume that the sequences at the same vertical position on the left and right sides belong to the same records. Actually, they do not, and the two sides (before and after the alignment) are independent. Even expert users sometimes forget this fact.

4. Displaying gaps between two non-consecutive events in the sequence: The current design can display a sequence, such as A → B → C, and the gaps between consecutive events (A → B and B → C). However, it does not show the gap between non-consecutive events (A → C). The horizontal distance between A and C is the sum of the mean/median times of A → B and B → C, which may not be equal to the mean/median time of A → C. I addressed this limitation by adding an interaction (a measurement tool) that allows users to measure the time from A to C. However, many users did not realize in the beginning that the horizontal width of an entire sequence does not represent the time from the first event to the last event in the sequence. Future designs need to emphasize this limitation to avoid misunderstanding. Moreover, there might be a way to include gaps between non-consecutive events in the visual display.

7.2.2 Improving the search

FTS has evolved through many design iterations, beginning with the similarity search in Similan. I have received many suggestions from users and colleagues along the way and used them to improve the design. However, there are still many improvements that can be made.

1. Clustering results: Instead of showing the other results as a list of records, it could be useful to cluster those records by the reasons that they are excluded from the exact match results.
This way, records that fail similar constraints will be grouped together. By providing a visualization of these clusters, users can quickly understand an overview of the results.

2. Caching intermediate results to speed up the search: Currently, the query has to be re-executed every time users adjust the weights, which makes it computationally expensive. It would be more efficient to cache intermediate results that can be reused to avoid unnecessary computation. The challenge is deciding what to store to save the computation.

3. Weights: The FTS user interface allows users to adjust weights and customize the similarity measure. However, this freedom to adjust the knobs could also be challenging for users. Previously, in Similan, a few experimental weight presets were implemented and found promising. Developing a set of common weight presets could help users avoid adjusting the weights manually. Furthermore, a user study that focuses on weight adjustment could provide a better understanding of this issue and help designers learn how to design a better user interface.

4. Including attributes in the query: FTS focuses only on the temporal aspect of the event sequences, looking only at the order, existence and timing of events. However, there is also a non-temporal aspect of the event sequences. They have attributes that should be included in a more complete query. These include record attributes, such as a patient's gender, and event attributes, such as the room number or diagnosis for each visit. Including these attributes in the query will introduce new challenges to both the similarity measure and the query specification.

7.2.3 Supporting more complex data

Event sequence data also differ in complexity from dataset to dataset. The following characteristics require further research to support them:

1. Many similar patterns: The current tree building algorithm groups the sequences by their prefixes, based on the idea that order of occurrence is important.
However, in some datasets, there are many similar sequences that should have been grouped together to reduce the complexity of the visualization, but are not grouped because they have different prefixes, for example, A → B → C → D and A → B → D. Two interesting questions are (1) how to determine what should be grouped together, and (2) what should be displayed on the visualization.

2. Streaming data: The techniques in this dissertation were originally developed for static datasets. However, these techniques might be extended to support streams of real-time event sequences.

3. Interval-based events: My research focuses on point events, where each event represents a point in time. However, some events are not just points in time but have intervals. For example, drug usage has a start and end time. This increases the complexity of both the search and the visualization, which must handle new types of relationships, such as overlap, containment, etc. New approaches have to be developed to address these challenges.

4. Multiple concurrent activities: In some cases, a process can be viewed as a combination of multiple concurrent activities. For example, in the hospital, patients with difficulty breathing were provided different breathing devices (e.g., CPAP → BIPAP → Intubation) throughout the treatment. At the same time, they were transferred between rooms in the hospital (ER → ICU → Floor). This process, as a result, consists of two concurrent event sequences: changes of devices and room transfers. Each event sequence can be analyzed individually using the work in this dissertation. However, these two activities are connected and can provide more complete information when analyzed together, and therefore should not be separated. There are opportunities for new techniques to handle multiple concurrent event sequences.

5.
Concurrent event sequences and numerical time series: Some processes may not only consist of multiple activities but also include changes to numerical variables over time. For example, patients were transferred between rooms while their blood pressure and heart rate were monitored. Could future researchers develop techniques to help users understand the relationships between all these event sequences and numerical time series from multiple patients?

7.2.4 Supporting new tasks

Many exploratory tasks in analyzing event sequences were identified during the MILCs. Many were supported and included in this dissertation work. However, there were also some tasks that are beyond the scope of this dissertation but show potential future directions.

1. Comparison: Comparison is one of the interesting tasks that should be explored further. The aggregated data from LifeFlow, a tree of sequences, could be used to compare the performance of one hospital to another, or the performance of the same hospital in different periods. Using LifeFlow, users can manually inspect two visualizations to see the differences between them, but it is still difficult to see the changes clearly. I believe that there are research opportunities in supporting comparisons between multiple trees of sequences, or other trees with values in nodes and edges in general.

2. Ranking LifeFlow by feature: An extreme way to explore a dataset is to try all combinations of included/excluded event types and attributes, which could lead to a large number of LifeFlow visualizations. Analyzing all of them can be very exhausting and time-consuming. Providing a rank-by-feature framework that can suggest interesting LifeFlow visualizations would be very helpful for the analysis. The main challenge is defining the ranking criteria.

3. Associating outcomes with sequences of events: Many event sequences have associated outcomes. For example, outcomes for medical records could be measured by cost, mortality or discharge rates.
Connecting sequences of events to their associated outcomes can help data analysts discover how certain sequences of events may lead to better or worse outcomes.

4. Investigating the cause-and-effect relationships between events: The work in this dissertation treats all events in an event sequence as observations, or effects, and tries to show an overview of the effects. However, some event types could also be considered factors that may cause the effects. Such factors, such as the administration of a drug to a sick patient, or a soccer player receiving a red card (which leaves his or her team short-handed), can often change the course of subsequent events. Understanding how these factors influenced the subsequent events is another important task. Users should have the option to treat some event types differently as factors, and the visualization should help users detect factors that may influence the effects.

7.2.5 Scalability and Performance

My research focuses on the design of the visualizations and user interface, which are the front-end components. However, the ideal visual exploration tool also needs to handle large datasets effectively and provide responsive feedback independent of the size of the dataset, and that requires more work on the back-end components.

1. In-memory issue: For portability and ease of deployment, the working prototype was designed to load all data into memory. However, this design choice limits the size of the largest datasets to the available memory. A scalable design should hold only the necessary data in memory and store the rest of the data in secondary storage via databases or other techniques.

2. Database: Use of a database can resolve the in-memory issue and also improve performance when the database is properly indexed. A carefully designed data model, indexing and query optimization can increase the performance of the event sequence queries and other operations in the back end of the visualization.

3.
Rendering: Often, when dealing with large datasets, the user interface becomes less responsive. This is partly because of the data processing, but another reason is the rendering. A dataset with millions of records can lead to millions of objects on the screen. Rendering this many objects is a very heavy task. In addition, these objects need to be tracked for event handling and user interactions.

4. Cloud computing: Another trend in recent years is the rise of cloud computing and MapReduce clusters. Many time-consuming tasks, such as data processing, querying and rendering, can be accelerated by parallelizing and offloading the heavy lifting to the cloud. Supporting a gigantic dataset in a responsive manner is a very challenging task, but cloud computing seems promising to make it possible.

7.3 Summary

This chapter summarizes all results and contributions from this dissertation. Each contribution was supported with evaluation results that demonstrate its benefits. I believe that this dissertation has refreshed the way people think about analyzing multiple event sequences and shows the impact of using event sequences for understanding many aspects of people's lives, from book reading behaviors to hospital procedures and more. I have also combined many lessons and feedback learned from my users and colleagues into a list of interesting future directions, which could lead to challenging and fruitful research projects. I believe that there are still many promising opportunities for making sense of these fascinating event sequences, waiting for the research communities to explore.
Appendix A

Examples of Temporal Event Sequences

To exemplify how my research can be used beyond the case studies mentioned in this dissertation, help prospective users decide whether their datasets are applicable, and inspire future researchers about possible datasets, I have compiled a list of additional event sequence data that could be explored with the work in this dissertation. (Note: each dataset may require additional cleaning and preprocessing.)

1. Human activities: Record your daily activities for a period of time and use the data to reflect on yourself. For example, what is the average time gap between when the alarm clock rang and the time that you actually woke up? How long do you take to shower in the morning?

2. Bus stops: Keep track of when the buses arrive at each bus stop. Each record is one bus trip. By analyzing these data, users can track the buses' performance or find bottlenecks on the route.

3. Web logs: Each record can be a user or an IP address, and contains all the pages visited. This can be used to analyze the order in which visitors visited pages on websites. How long did they spend on the pages? Which page did they visit next?

4. Usability logs: Analyze how users use an application or user interface. See which features users often use, or which features are often used after certain actions.

5. U.S. bill status: The Library of Congress keeps the history of U.S. bills in a system called Thomas (http://thomas.loc.gov/). A bill history usually follows this pattern: propose → vote → pass/decline/amend → etc. Each bill can be declined and revised many times, leading to different patterns.

6. Sports events: In the case study, I demonstrated how to analyze soccer events. Similar analyses can be performed on football, hockey, baseball, and other sports.

7.
Role-playing game (RPG) character evolution: In role-playing games such as World of Warcraft or Ragnarok (Figure A.1), characters can evolve by advancing to higher job classes, for example, novice → thief → assassin. Players have to collect experience points or complete requirements to achieve the promotion. From game administrators' and developers' perspective, it might be interesting to learn which jobs the players often chose or how long each job transition took, so they can adjust the difficulty and make the game more balanced.

8. Bug tracking: Project management and bug/issue tracking software, such as TRAC, is widely used in many companies. Users can report bugs by creating tickets. Each ticket contains the times when it was reported, accepted, completed, reopened, etc. Users can also indicate the importance level of each ticket or assign a ticket to a particular developer. These can be used as attributes.

Figure A.1: Ragnarok job tree

9. Package tracking: Millions of packages are delivered every day by carriers such as FedEx, UPS or DHL. How many of these packages reach the final destination? How many are delivered by the guaranteed time? How often do they require a second or third delivery attempt?

10. Journal paper review process: Publishing in a journal often requires many rounds of revising and reviewing. A paper submission could lead to an immediate acceptance, a request for a revision, or a rejection. The revision then follows the same process again. Publishers might be interested in analyzing the peer review process: How many papers were rejected? How long does it take to publish? What are the common paths? Are there any steps that take too much time and could be accelerated to reduce the submission-to-publication time?

11.
Medical check-up: Each person has to go through several stations in a check-up process, for example, blood pressure checks, chest x-rays, pulmonary function testing, blood tests, urinalysis, etc. By tracking the time spent at each station and the waiting time, hospital administrators can learn how to improve the check-up process and the customer's experience.

12. Researchers' publications: A publication can be viewed as an event, which can be categorized by the type of venue (journal, conference, workshop, ...), type of publication (full paper, short paper, note, ...), authorship (first author, second author, ...), and other attributes. The event sequences of publications could be used to review researchers' performance.

13. Student progress: Academic institutions keep track of when students take classes, propose their dissertation topics, publish papers and graduate. These academic histories can reflect the quality of education.

14. Manufacturing process: Many manufacturing processes have certain procedures that need to be followed. Each step in the process can be tracked and recorded. Production managers can then review the current process and find room for improvement.

15. Actions on social networks: Billions of users are using social networks such as Facebook or Twitter. What are they doing? Are there any interesting patterns among the seemingly random activities? Can we learn their behaviors from their actions?

16. Recurring attendees of an event: There are many annual conferences, symposiums, workshops, meetings, etc. Event organizers may wonder how many attendees keep coming back. The organizers can keep records of their attendees to answer this question. Each record represents an attendee and contains all events that he/she attended. For example, Mr. Brown attended CHI '08, CHI '10, CHI '11 and CHI '12 while Miss Scarlet attended CHI '06, CHI '07, CHI '08, CHI '10 and CHI '11.

17.
StarCraft build orders: Many of the industry's journalists have praised StarCraft as one of the best and most important real-time strategy video games of all time. It is particularly popular in South Korea, where players and teams participate in professional competitions, earn sponsorships, and compete in televised tournaments. A build order is an early-game strategy that is designed to optimize economy and timing. It indicates the order in which buildings and units should be produced to maximize efficiency. Expert players can record their games and parse the build orders from their recorded replays. Using LifeFlow, these players can find the steps in a build order where they often made mistakes, and improve their skills.

Appendix B

LifeFlow Software Implementation

B.1 Overview

LifeFlow is implemented in Java. It requires a machine with Java Runtime Environment (JRE) 1.6 or above and at least 1024x768 screen resolution. It has been tested on Mac OS X 10.5-10.7, Windows XP and 7, and Ubuntu Linux. As of March 20, 2012, the LifeFlow SVN repository contained 61,188 lines of code from 867 revisions. This appendix describes its input data format, software architecture and implementation overview.

B.2 Input Data Format

To simplify deployment and reduce adoption overhead, LifeFlow does not require any database installation and accepts text files as input data. It uses two input data files and one config file:

(Required) Event data file (*.txt) contains all events and event attributes.
(Optional) Attribute data file (*.attrib) contains record attributes.
(Optional) Config file (*.xml) keeps the preferred settings of the visualization.

Table B.1: Sample event data file (sample data.txt)

Record ID  Event type  Date and Time        Event attributes
4318       Arrival     2010-01-19 03:53:00  Entrance-Diagnosis="Back pain"
4318       ER          2010-01-19 04:14:00
4318       ICU         2010-01-19 04:49:00  Room No.="125"
4318       Floor       2010-01-19 16:12:00  Room No.="854"
4318       Exit-DISC   2010-01-20 06:44:00  Physician="Dr.
X"; Exit-Diagnosis="Back pain"
5397       Arrival     2010-01-19 06:53:00  Entrance-Diagnosis="Heartburn"
5397       ER          2010-01-19 07:20:00
5397       Floor       2010-01-19 07:46:00  Room No.="854"
5397       Exit-DISC   2010-01-20 09:44:00  Physician="Dr. J"; Exit-Diagnosis="Heartburn"

B.2.1 Event Data File

An event data file is a 4-column tab-delimited text file (Table B.1). The first column is the record ID. The second column is the event type. The third column is the date and time in YYYY-MM-DD HH:MM:SS format. The fourth column contains event attributes in the format:

attribute-name1="attribute-value1"; attribute-name2="attribute-value2"; ...

Use ; to separate attributes. Leave this column blank if there are no event attributes.

B.2.2 Attribute Data File (optional)

An attribute data file stores non-temporal attributes. This file is not required if there are no record attributes. It is a 3-column tab-delimited text file (Table B.2), which has the same name as the event data file but with the extension *.attrib instead of *.txt.

Table B.2: Sample attribute data file (sample data.attrib)

Record ID  Attribute name  Attribute value
4318       Gender          Male
5397       Gender          Female

The first column is the record ID. The second column is the attribute name. The third column is the attribute value.

B.2.3 Config File (optional)

A config file helps users save their customized settings, such as the colors of events, and reuse them in the next analysis. It is an XML file, which is self-descriptive.

B.3 Design

B.3.1 Software Architecture

The LifeFlow architecture follows the Model-View-Controller (MVC) paradigm. In the implementation, I adopted PureMVC, a free open source MVC framework (available from http://puremvc.org). In PureMVC, each view is wrapped by a mediator while each model is wrapped by a proxy. The mediator and proxy provide PureMVC functionality while leaving the original view and model independent from PureMVC. The central controller is called the facade, which acts like a mothership that knows about all views and models.
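The facade-and-notification style of communication can be illustrated with a minimal sketch. This is not PureMVC's actual API; the class and method names below are invented for illustration only:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Minimal illustration of facade-style notification dispatch: components
// register interest in named notifications and never reference each other
// directly. This is NOT PureMVC's actual API; all names are invented.
public class MiniFacade {
    private final Map<String, List<Consumer<Object>>> handlers = new HashMap<>();

    // A mediator or command declares interest in one notification type.
    public void register(String notificationName, Consumer<Object> handler) {
        handlers.computeIfAbsent(notificationName, k -> new ArrayList<>()).add(handler);
    }

    // Any component can send a notification; the facade fans it out to all
    // registered handlers without the sender knowing who receives it.
    public void sendNotification(String notificationName, Object body) {
        for (Consumer<Object> h : handlers.getOrDefault(notificationName, List.of())) {
            h.accept(body);
        }
    }
}
```

The key property this sketch captures is that the sender of a notification and its receivers are fully decoupled, which is what allows LifeFlow's views to stay independent of one another.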
The facade lets all views and models communicate without knowing about each other. The facade also maintains a list of commands that can be triggered by notifications, such as loading a dataset or exiting. Commands handle actions that involve multiple components or actions that should not be handled by mediators or proxies.

Each view is independent and does not know that other views exist. However, when users perform an action in one view, that action can trigger changes in other views even though the views do not know each other. To achieve this, these components send notifications. When a notification is issued, the facade (i.e., the mothership) notifies all mediators and commands that are registered to receive that type of notification. Each view has to declare the types of notifications that it will handle.

In the ideal MVC, no views and models know about each other. In LifeFlow, I relaxed this concept a bit by allowing a mediator (view) to know about a proxy (model). A reference to the proxy of the dataset is stored in each mediator. This allows direct calls to the dataset proxy without passing messages through the facade.

B.3.2 Code Organization

The code is organized to separate classes that are reusable from classes that are specific to this software.

1. Reusable: Most of the classes in LifeFlow are designed to be independent and reusable. More generic classes, such as time, counter, popup menu or color palettes, are split into the package edu.umd.cs.hcil.frozenlife. Classes that are more specific to the work in this dissertation are organized in the following packages:

(a) Data structures: Events, records and trees of sequences are contained in the package edu.umd.cs.hcil.lifeflow.datastructure.

(b) I/O: Package edu.umd.cs.hcil.lifeflow.io contains classes that handle importing/exporting data from/to text files.

(c) ARF: Package edu.umd.cs.hcil.lifeflow.arf contains classes that provide different kinds of alignment, ranking and filtering for the Align-Rank-Filter framework.
(d) Similarity measure: The similarity measure used in FTS and related classes are contained in the package edu.umd.cs.hcil.lifeflow.similarity.

(e) User interface components: Package edu.umd.cs.hcil.lifeflow.components contains LifeFlow, LifeLines2, FTS and other visual components.

2. Application-specific: Classes that are specific to LifeFlow and cannot be reused are contained in the package edu.umd.cs.hcil.lifeflow.mvc. These classes make use of the reusable components and combine them into the LifeFlow software.

(a) Package edu.umd.cs.hcil.lifeflow.mvc.baseclasses contains custom extensions to the PureMVC classes.
(b) Models are in the package edu.umd.cs.hcil.lifeflow.mvc.model.
(c) Views are in the package edu.umd.cs.hcil.lifeflow.mvc.view.
(d) Commands are in the package edu.umd.cs.hcil.lifeflow.mvc.command.

B.4 Data Structures

B.4.1 Time

These two classes are often used to represent time in LifeFlow:

1. AbsoluteTime: Real time, such as Jan 12, 2012. This class wraps Java's standard Calendar class. This object can be converted to the number of milliseconds since January 1, 1970, 00:00:00 GMT, and vice versa.

2. RelativeTime: After an alignment is set, time is represented relative to the alignment time, such as 1 month or -2 years. Relative time can be positive (after the alignment time), negative (before the alignment time) or zero (at the alignment time). This object can be converted to the number of milliseconds after the alignment time, and vice versa.

These two classes also provide many useful methods for manipulating time. Both classes implement the ITime interface.

B.4.2 Fundamental Data Structures

These are the fundamental building blocks in LifeFlow. All classes are contained in the package edu.umd.cs.hcil.lifeflow.datastructure.tc. "TC" stands for "Temporal Categorical".

Event: A tuple of time and event type, for example, (August 17, 2008, Arrival at a hospital). Time is stored as a number of milliseconds since January 1, 1970, 00:00:00 GMT.
An immutable class TCEvent represents an event.

Event Attribute: A property of an event. For example, the attending physician and diagnosis are attributes of each visit (Arrival event). Event attributes are stored in a map (keys, values) in each TCEvent object, using attribute names as keys and attribute values as values.

Record: One entity of a temporal event sequence, for example, a patient. Class TCRecord represents a record. TCRecord contains a unique ID and multiple lists of TCEvent: one list that contains all events and one list per event type. We separate these to optimize filtering events by event type. All TCEvent objects in these lists are sorted by time.

Record Attribute: A property of a record that is not associated with time, for example, gender and age. Record attributes are stored in a map (keys, values) in each TCRecord object, using attribute names as keys and attribute values as values.

Instance: Each copy of a record that is shown in the visualization. Class TCInstance represents an instance. It wraps a record and stores an alignment time and a flag that indicates whether the instance is selected. One record usually has only one instance, with an alignment time of zero, meaning that an alignment point is not specified. One record can also have multiple instances. For example, patient no. 10002 has three Floor events. After aligning this record using all Floor events, LifeFlow will duplicate the record into three instances, each with an alignment point at one of the Floor events. Many data structures in LifeFlow are extended from TCInstance using the decorator design pattern.

Group of Instances: Multiple instances of the same record are grouped together in a TCInstanceGroup object. Both TCInstance and TCInstanceGroup implement the interface ITCInstance.

List of Instances: Class ITCInstanceList is a list of ITCInstance, which can be an instance (TCInstance or its extensions) or a group of instances (TCInstanceGroup).
It keeps track of selected records and instances and provides methods for rapid selection of records and instances.

B.4.3 Handling Datasets

Each dataset is loaded from text files into a data structure called RawDataSet. A RawDataSet consists of a mapping from categories to colors (CategoryLookUpTable) and an immutable array of all records.

At the beginning of an analysis, when users select a dataset and click on "Explore", a new working dataset (WorkingDataSet) is created from the selected RawDataSet. A working dataset contains lists of instances and parameters for the visualizations. Each TCRecord from the raw dataset is converted into a TCInstance and added to a list of instances. This list is called the "original list" (originalList). A "working list" (workingList) is copied from the original list and used for all operations. When users revert the dataset, the working list is cleared and copied from the original list again.

A new working dataset can also be created from another working dataset. In this case, the original list in the new working dataset is copied from the current working list in the old working dataset.

Class WorkingDataSet provides methods for manipulating the dataset, such as align, filter or select. Class WorkingDataSetProxy wraps the WorkingDataSet to provide the same functionality and send MVC notifications when the data are changed. Class CoreModel maintains a list of all working datasets that are loaded into the system. It is wrapped in a CoreModelProxy and displayed in the user interface as a list of all available datasets that users can choose from.

B.5 Main Components

B.5.1 Main class

The Main class is contained in the package edu.umd.cs.hcil.lifeflow.mvc. During startup, class LifeFlowStarter initializes the LifeFlow application by creating an AppFacade. The AppFacade then instantiates all MVC components and opens a MainWindow. The MainWindow contains a menubar and a MainPanel, which contains all the panels.
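The original-list/working-list scheme behind WorkingDataSet can be reduced to a few lines. The following is an illustrative sketch, not the actual class; the names are simplified:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch of the original-list / working-list idea: operations
// such as filtering touch only the working list, and revert() restores it
// from the untouched original list. Not the actual WorkingDataSet class.
public class WorkingListSketch<T> {
    private final List<T> originalList; // never modified after construction
    private List<T> workingList;        // target of filter/align/select

    public WorkingListSketch(List<T> instances) {
        this.originalList = Collections.unmodifiableList(new ArrayList<>(instances));
        this.workingList = new ArrayList<>(originalList);
    }

    // Example operation: keep only instances matching a predicate.
    public void filter(Predicate<T> keep) {
        workingList.removeIf(keep.negate());
    }

    // Revert: discard the working list and copy from the original again.
    public void revert() {
        workingList = new ArrayList<>(originalList);
    }

    public List<T> getWorkingList() {
        return workingList;
    }
}
```

Deriving a new dataset from an old one then amounts to seeding the new sketch's constructor with the old sketch's current working list, exactly as described above.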
B.5.2 LifeFlow

B.5.2.1 Tree of Sequences

Class TreeOfSequences is the backbone of the LifeFlow visualization. It aggregates multiple event sequences into a tree that is later rendered as a LifeFlow visualization. TreeOfSequenceBuilder2 takes a list of instances and builds a tree of sequences from it. Users can choose to create a positive tree or a negative tree by setting a parameter. The builder iterates through each instance and adds the instance to existing nodes if possible, or adds new nodes to the tree if necessary and adds the instance to the new nodes. Each node in the tree (AbstractTreeNode) contains markers (EventMarker) that can backtrack to all instances and events contained in that node. It also provides methods for finding the average time between nodes.

B.5.2.2 Flexible Geometry

To support semantic zooming in the LifeFlow component, I define a coordinate system that can change according to a horizontal zoom factor and a vertical zoom factor. A value in the FlexibleGeometry is called a FlexibleValue, which is defined by two values and one factor:

v = v.f + v.v × v.F

where v.f is the fixed value, v.v is the variable value, and v.F is the multiply factor.

When two FlexibleValue objects share the same multiply factor F, they can perform add and subtract operations by adding and subtracting their fixed values and variable values:

v + w = (v.f + w.f) + (v.v + w.v) × F
v - w = (v.f - w.f) + (v.v - w.v) × F

It also supports multiplication by a scalar value s:

v × s = (v.f × s) + (v.v × s) × F

A point in the FlexibleGeometry (FlexiblePoint) is a tuple of two FlexibleValue objects: x and y. A rectangle in the FlexibleGeometry (FlexibleRectangle) contains four FlexibleValue objects: x, y, width and height.

B.5.2.3 LifeFlow Component

Class JLifeFlow2 is the LifeFlow visual component. It is a JComponent and can be added to Swing containers. The number "2" is its version. (I have rewritten the entire component once to clean up the spaghetti code from the first version.)
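The FlexibleValue arithmetic can be made concrete with a small class. The following is a hypothetical re-implementation of the idea for illustration, not the actual FlexibleGeometry code:

```java
// Illustrative sketch of the FlexibleValue idea: a value defined as
// fixed + variable * factor, supporting add, subtract, and scalar multiply
// when two values share the same multiply factor. Not the actual LifeFlow code.
public class FlexibleValueSketch {
    final double fixed;    // fixed part, in pixels
    final double variable; // variable part, e.g. time or instance count
    final double factor;   // multiply factor, e.g. pixelPerTime

    public FlexibleValueSketch(double fixed, double variable, double factor) {
        this.fixed = fixed;
        this.variable = variable;
        this.factor = factor;
    }

    // Resolved value: v = v.f + v.v * v.F
    public double resolve() {
        return fixed + variable * factor;
    }

    // v + w = (v.f + w.f) + (v.v + w.v) * F, assuming the same factor F
    public FlexibleValueSketch add(FlexibleValueSketch w) {
        return new FlexibleValueSketch(fixed + w.fixed, variable + w.variable, factor);
    }

    // v - w = (v.f - w.f) + (v.v - w.v) * F
    public FlexibleValueSketch subtract(FlexibleValueSketch w) {
        return new FlexibleValueSketch(fixed - w.fixed, variable - w.variable, factor);
    }

    // v * s = (v.f * s) + (v.v * s) * F
    public FlexibleValueSketch scale(double s) {
        return new FlexibleValueSketch(fixed * s, variable * s, factor);
    }
}
```

Because the multiply factor is resolved only at render time, zooming amounts to changing the factor and re-resolving every value, which is what makes semantic zooming cheap.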
JLifeFlow uses Piccolo2D, a scene graph library for Java. Rendering a LifeFlow visualization consists of the following steps:

1. Preprocessing: All instances are preprocessed to remove excluded event types.

2. Building tree: Convert the list of preprocessed instances into two trees of sequences: a positive tree and a negative tree.

3. Plotting: All nodes and edges in the trees are converted into drafts of visual objects. To support semantic zooming, the bounds of a draft are defined using FlexibleGeometry. For the x position and width, the multiply factor is pixelPerTime, the variable value is time in milliseconds and the fixed value is a fixed number of pixels. For example:

Node.width = fixedWidth + time × pixelPerTime

For the y position and height, the multiply factor is pixelPerInstance, the variable value is the number of instances and the fixed value is a fixed number of pixels. For example:

Node.height = fixedHeight + numberOfInstances × pixelPerInstance

4. Pinpointing: Calculate the offset of the visualization and the multiply factors (pixelPerTime and pixelPerInstance) according to the component size, zoom level and offset of the viewport.

5. Painting: Calculate the actual dimensions of the visual objects from the drafts and zoom factors computed in the previous steps, create these visual objects, add them to the Piccolo canvas and bind them with event handlers. Only objects that are within bounds and visible are created.

6. Painting selection: Render highlights to show the selection.

This pipeline can be triggered from any step, and all steps after that one will be executed. For example, resizing a window will trigger pinpointing, because the size of the window has changed, which affects the zoom factor and offset of the visualization. After the pinpointing is completed, the visualization and selections are painted.

B.5.3 LifeLines2

I implemented the LifeLines2 component in LifeFlow using the visual design from the original LifeLines2 software, but did not reuse its code. Class JLifeLines2 is the new component.
JLifeLines2 groups instances of the same record together and allows users to collapse or expand the group as needed. It uses an ItemRenderer to render each component. Future developers can write their own custom renderers and replace the default renderer with theirs. I include two renderers in this software distribution: one used for the default display and another for instances with similarity scores. JLifeLines2 also provides custom events and an event handler interface.

The rendering process consists of the following steps:

1. Creating drafts: Create a draft for each instance in the list. The draft is also used to keep the state of a TCInstanceGroup, i.e., whether it is currently expanded by the user or not.

2. Sort: Sort the drafts according to the specified criteria.

3. Update position: Calculate the position on the screen for each draft.

4. Plot time: Plot the timeline according to the component size and zoom factor.

5. Paint: Draw only objects that are within the display bounds and visible, add them to the Piccolo canvas and bind event handlers.

For convenience, I implemented class JLifeLines2Panel, which wraps JLifeLines2 and includes a scrollbar and header.

B.5.4 Flexible Temporal Search (FTS)

B.5.4.1 Similarity search

A SimilarityQuery object contains the query specification. A query consists of multiple query elements (IQueryElement). When a similarity search is executed, an engine (SimilarityEngine) is created for the given SimilarityQuery. This SimilarityEngine contains a table for dynamic programming, which can be reused for multiple matchings. The engine performs the similarity search using the given SimilarityQuery on a given list of ITCInstance and returns results (SimilaritySearchResult). The results contain two lists of instances: exact match results and other results. Each instance in these two lists is an ITCInstanceWithSimilarityScore. It wraps the original ITCInstance that was passed to the engine and adds a similarity score, following the decorator design pattern.
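The decorator relationship at the end of this pipeline can be sketched roughly as follows. The types here are simplified stand-ins, not the real ITCInstance interfaces:

```java
// Simplified stand-in for the decorator pattern described above: a search
// result wraps the original instance unchanged and adds a similarity score.
// The interface and class here are illustrative, not the actual LifeFlow types.
public class SimilarityDecorator {

    public interface Instance {
        String getRecordId();
    }

    public static class InstanceWithScore implements Instance {
        private final Instance wrapped;       // original instance, untouched
        private final double similarityScore; // added by the search engine

        public InstanceWithScore(Instance wrapped, double similarityScore) {
            this.wrapped = wrapped;
            this.similarityScore = similarityScore;
        }

        @Override
        public String getRecordId() {
            return wrapped.getRecordId(); // delegate to the wrapped instance
        }

        public double getSimilarityScore() {
            return similarityScore;
        }
    }
}
```

The benefit of this structure is that result lists can be handed to any component that expects plain instances, while score-aware components (such as the similarity-score renderer) can read the extra field.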
B.5.4.2 User Interface

Class SearchPanel is the FTS user interface. It consists of two main parts:

1. Query Panel: This part is a component called QueryPanel. It has its own data model (QueryModel) behind it and can convert from this model to a SimilarityQuery, and vice versa. It handles users' interactions to add a Glyph or Gap to the model. Glyphs include all events and negations. The QueryPanel uses a QueryRenderer to render the different types of glyphs and gaps.

2. Results Panel: This panel consists of exact match results and other results. Both parts are JLifeLines2 components with the item renderer set to a renderer for displaying an instance with its similarity score. When users right-click on any instance and select "Show comparison", it initializes a ComparisonDialog and passes the selected ITCInstanceWithSimilarityScore to show the comparison. The time scales of these two JLifeLines2 components are bound together.

Appendix C

Spin-off: Outflow

C.1 Introduction

This appendix describes a spin-off project from my internship with the Healthcare Transformation group at IBM T.J. Watson Research Center. I worked with Dr. David Gotz to develop an alternative visualization for exploring the flow, factors, and outcomes of temporal event sequences. This visualization, called Outflow [141], aggregates multiple event sequences in a new way and includes summaries of factors and associated outcomes.

Many event sequences have associated outcomes. For example, outcomes for EMR data could be measured by cost, mortality or discharge rates. For sports applications, the outcome could be a win, loss or draw. Analyzing common patterns, or pathways, in a set of event sequences can help people better understand aggregate event progression behavior. In addition, connecting these pathways to their associated outcomes can help data analysts discover how certain progression paths may lead to better or worse results.
For example, consider a medical dataset containing information about a set of patients with a dangerous disease. Each patient is described by an outcome measurement (e.g., whether they survived the disease or not) and an event sequence containing the dates when certain symptoms were first observed by a doctor. An analysis of pathways in such a dataset might lead to the discovery that patients with symptoms A, B and C were likely to die in the hospital, while patients with symptoms A, B and D were more likely to recover. Similarly, consider a dataset representing a set of soccer matches where goals scored are events and wins are considered a good outcome. An analysis of pathways in this dataset could help answer questions such as, "Does a team win more often when it comes from behind?"

While analyzing event sequence data as described above can help answer many questions, there are often external factors, beyond the set of event types that define an event sequence, that make an analysis even more complex. Such factors, such as the administration of a drug to a sick patient, or a soccer player receiving a red card (which leaves his/her team short-handed), can often change the course of subsequent events. These factors must be incorporated into an analysis to understand how they influence outcomes.

Finally, event collections can be massive. Major healthcare institutions have millions of medical records containing millions of event sequences and many different event types. The scale and variability of this problem can lead to an extremely complex set of pathways for many scenarios. For example, even for a small dataset with just five event types, where each event sequence has just five events, there are 3,125 (5^5) possible pathways. This vast amount of information can be overwhelming and makes these datasets difficult to analyze.
To address these challenges, I have designed Outflow, an interactive visualization that combines multiple event sequences and their outcomes into a graph-based visual representation. Outflow can summarize collections of event sequences and display all pathways, the time gaps between each step, and their associated outcomes. Users can interact with the visualization through direct manipulation techniques (e.g., selection and brushing) and a series of control widgets. The interactions allow users to explore the data in search of insights, including information about which factors correlate most strongly with specific pathways.

Figure C.1: Outflow processes temporal event data and visualizes aggregate event progression pathways together with associated statistics (e.g., outcome, duration, and cardinality). Users can interactively explore the paths via which entities arrive at and depart various states. This screenshot shows a visualization of Manchester United's 2010-2011 soccer season. Green shows pathways with good outcomes (i.e., wins) while red shows pathways with bad outcomes (i.e., losses).

This appendix describes the Outflow visualization in detail by explaining its key design, rendering process and interaction techniques. I illustrate the generalizability of Outflow by discussing two applications (a medical use case and a sports statistics use case) and demonstrate its power via two example analyses and a user study.

The remainder of this appendix is organized as follows: Section C.2 presents two motivating applications. The Outflow design is discussed in Section C.3. I then report the evaluation results in Sections C.4 and C.5, and summarize in Section C.6.

C.2 Motivation

Outflow provides a general solution for a class of event sequence analysis problems. This section describes two examples from different application domains which served as motivating problems for this work.
C.2.1 Congestive Heart Failure (CHF)

Outflow was originally inspired by a problem faced by a team of cardiologists. They were working to better understand disease evolution patterns using data from a cohort of patients at risk of developing congestive heart failure (CHF). CHF is generally defined as the inability of the heart to supply sufficient blood flow to meet the needs of the body. CHF is a common, costly, and potentially deadly condition that afflicts roughly 2% of adults in developed countries, with rates growing to 6-10% for those over 65 years of age [80]. The disease is difficult to manage, and no system of diagnostic criteria has been universally accepted as the gold standard.

One commonly used system comes from the Framingham study [79]. This system requires the simultaneous presence of at least two major symptoms (e.g., S3 gallop, acute pulmonary edema, cardiomegaly) or one major symptom in conjunction with two minor symptoms (e.g., nocturnal cough, pleural effusion, hepatomegaly). In total, 18 distinct Framingham symptoms have been defined.

While these symptoms are used regularly to diagnose CHF, my medical collaborators are interested in understanding how the various symptoms and their order of onset correlate with patient outcome. To examine this problem, I was given access to an anonymized dataset of 6,328 patient records. Each patient record includes timestamped entries for each time a patient was diagnosed with a Framingham symptom. For example:

Patient #1: (27 Jul 2009, Ankle edema), (14 Aug 2009, Pleural effusion), ...
Patient #2: (17 May 2002, S3 gallop), (1 Feb 2003, Cardiomegaly), ...

The dataset also contains information about medication orders and patient metadata. Available metadata includes date of birth, gender, date of CHF diagnosis, and (when applicable) date of death.

In line with the use of Framingham symptoms for diagnosis, I assume that once a symptom has been observed it applies perpetually.
I therefore filter the event sequences for each patient to select only the first occurrence of a given symptom type. The filtered event sequences describe the flow for each patient through different disease states. For example, a filtered event sequence symptom A → symptom B indicates that the patient's flow is no symptom → symptom A → symptoms A and B. I used the presence (or lack thereof) of a date of death as an outcome measure (dead or alive).

An inspirational task was to examine aggregated statistics for the flows of many patients to find common disease progression paths. In addition, I wanted to discover any correlations between these paths and either (1) patient outcomes (i.e., mortality) or (2) external factors (i.e., medications).

C.2.2 Soccer Result Analysis

Although originally inspired by the medical application outlined above, Outflow itself is not domain specific and can generalize to other application areas. To demonstrate the broad applicability of this work, Outflow is also used to analyze soccer match results. For example, Figure C.1 shows an Outflow visualization of the 2010–2011 season for Manchester United Football Club (Man U.), an English Premier League soccer club based in Old Trafford, Greater Manchester. Man U. has won the most trophies in English soccer and is one of the wealthiest and most widely supported teams in the world.

The 2010–2011 season was another successful one for Man U. in which they won multiple trophies. To better understand their route to success, I collected data from all 61 matches that Man U. played that season. For both Man U. and their opponents, I captured time-stamped events for every kickoff, every goal scored, and every final whistle. I also recorded the outcome for every match (win, loss, or draw) along with timestamped records of every yellow and red card received. As in the healthcare case, events are cumulative. Each time a goal is scored, it is added to the scoreline, with the goal tally increasing over the course of a match.
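Because events are cumulative in both use cases, keeping only the first occurrence of each event type reduces a raw record to a clean state progression. The filtering can be sketched as follows (a minimal illustration; the function name and toy record are mine, not part of the Outflow implementation):

```python
from datetime import date

def first_occurrences(events):
    """Keep only the first occurrence of each event type,
    in chronological order. `events` is a list of (timestamp, type)."""
    seen = set()
    filtered = []
    for ts, etype in sorted(events):
        if etype not in seen:
            seen.add(etype)
            filtered.append((ts, etype))
    return filtered

patient = [
    (date(2009, 7, 27), "Ankle edema"),
    (date(2009, 8, 14), "Pleural effusion"),
    (date(2009, 9, 2), "Ankle edema"),   # repeated diagnosis, dropped
]
print(first_occurrences(patient))  # only the first "Ankle edema" survives
```

The resulting sequence directly encodes the patient's path through disease states: each retained event moves the patient to a state containing one more symptom.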
Using Outflow, sports analysts are able to see the average outcome associated with each scoreline, find the average time to the next goal, predict what is likely to occur next, and understand how non-goal factors, such as red cards, can impact how a match may progress. This use case is the one tested in an evaluation where users were asked to perform many of these tasks. The study design and results are described in detail in Section C.5.

C.3 Description of the Visualization

Outflow's design consists of four key elements: data aggregation, visual encoding, the rendering process, and user interaction. This core design is then expanded to support two important features: simplification and factors.

Figure C.2: Multiple temporal event sequences are aggregated into a representation called an Outflow graph. This structure is a directed acyclic graph (DAG) that captures the various event sequences that led to the alignment point and all the sequences that occurred after the alignment point. Aggregate statistics (e.g., average outcome, average time, number of entities) are then anchored to the graph to describe specific subsets of the data.

Figure C.3: Outflow visually encodes nodes in the Outflow graph using vertical rectangles. Edges are represented using two distinct visual marks: time edges and link edges. Time edge width represents the duration of a transition; height represents the number of entities. Color represents the outcome measure.

C.3.1 Data Aggregation

The first step in creating an Outflow visualization is data aggregation.
I define an entity E (e.g., a patient record) as a timestamped (t_i) progression through different states (S_i). Each S_i is defined as a set of zero or more events (e_j) that an entity has experienced at or before time t_i. A transition from state S_m to state S_n is denoted by T_{m→n}.

E = (S_0, t_0) → (S_1, t_1) → (S_2, t_2) → (S_3, t_3) → ... → (S_n, t_n)
S_i = [e_1, e_2, e_3, ..., e_i]

Given a collection of entities, {E}, I begin by choosing one state (which must be experienced by all E ∈ {E}) as an alignment point. For example, users can align a set of medical records around a state where all patients have the same three symptoms (and no other symptoms). After choosing an alignment point (the system uses S_0, the state where no events have occurred, as the default if no other alignment point is specified), I aggregate all entities that pass through the alignment point into a data structure called an Outflow graph (Figure C.2).

An Outflow graph is a state diagram expressed as a directed acyclic graph (DAG). A node is added to the graph for each unique state observed in {E} (e.g., a node for each unique combination of co-occurring symptoms). Edges are used to represent each state transition observed in {E}, and they are annotated with various statistics: the number of entities that make the corresponding transition, the average outcome for these entities, and the average time gap between the states. An Outflow graph therefore captures all event paths in {E} that lead to the alignment point and all event paths that occur after the alignment point.

In the medical analysis example, users can select a target patient from the database and use the target patient's current state as the alignment point. This approach allows for the analysis of historical data when considering the possible future progression of symptoms for the selected target patient.
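As a rough illustration of this aggregation step, the following sketch accumulates per-edge statistics for an Outflow-style graph from simplified entity records. The dict layout, field names, and numeric timestamps are assumptions made for the example, not Outflow's actual data model:

```python
from collections import defaultdict

def build_outflow_graph(entities):
    """Aggregate event sequences into Outflow-style edge statistics.

    entities: list of dicts with
      'events'  : chronologically ordered list of (timestamp, event_type)
      'outcome' : number in [0, 1] (1 = best outcome)
    Nodes are states (frozensets of events seen so far); each edge
    records entity count, average outcome, and average transition time.
    """
    edges = defaultdict(lambda: {"n": 0, "outcome": 0.0, "time": 0.0})
    for entity in entities:
        state, prev_ts = frozenset(), None
        for ts, etype in entity["events"]:
            nxt = state | {etype}          # cumulative: state gains one event
            e = edges[(state, nxt)]
            e["n"] += 1
            e["outcome"] += entity["outcome"]
            e["time"] += 0.0 if prev_ts is None else ts - prev_ts
            state, prev_ts = nxt, ts
    for e in edges.values():               # convert sums to averages
        e["outcome"] /= e["n"]
        e["time"] /= e["n"]
    return edges

games = [
    {"events": [(0, "Kick off"), (30, "Score")], "outcome": 1.0},
    {"events": [(0, "Kick off"), (55, "Concede")], "outcome": 0.0},
    {"events": [(0, "Kick off"), (10, "Score")], "outcome": 0.5},
]
g = build_outflow_graph(games)
edge = g[(frozenset({"Kick off"}), frozenset({"Kick off", "Score"}))]
print(edge["n"], edge["outcome"], edge["time"])  # → 2 0.75 20.0
```

Because states are sets of events rather than ordered lists, entities that reach the same combination of events by different routes share a node, which is what allows paths to merge in the DAG.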
In the soccer analysis, users can align by a state with a specific score (e.g., 2-1), which would include only matches that, at some point in time during the game, reached the specified state (e.g., two "Score" events and one "Concede" event). This could be useful for prediction by looking at historical data.

C.3.2 Visual Encoding

Based on the information contained in the Outflow graph, I have designed a rich visual encoding that displays (1) the time gap for each state change, (2) the cardinality of entities in each state and state transition, and (3) the average outcome for each state and transition. Drawing in part on prior work from FlowMap [93] and LifeFlow [142], I developed the visual encoding shown in Figure C.3.

Node (State): Each state is represented by a rectangle whose height is proportional to the number of entities.

Layer: A graph is sliced vertically into layers. Layer i contains all states with i events. The layers are sorted from left to right, showing information from the past to the future. For example, in Figure C.1, the first layer (layer 0) contains only one node, which represents all records before any event. The next layer (layer 1) also has one node because all games begin with a "Kick off" event. Layer 2, however, has three nodes because each game evolved in one of three different ways: "Score", "Concede", or "Final whistle".

Edge (Transition): Each state transition is displayed using two visual marks: a time edge and a link edge. Time edges are rectangles whose width is proportional to the average time gap of the transition and whose height is proportional to the number of entities. Link edges connect nodes and time edges to convey sequentiality.

End Edge: Each entity can end in a different state. A trapezoid followed by a circle marks these end points. Like transition edges, the height of the trapezoid is proportional to the number of entities that end at the corresponding state.
The circles are included to ensure that small end edges remain visible.

Color-coding: Colors assigned to edges are used to encode the average outcome for the corresponding set of entities. The outcome values are normalized to the [0,1] range, with higher values representing better outcomes. The normalized values are then mapped to a color scale. In this prototype, the color coding scales linearly from red (outcome of 0) to yellow (outcome of 0.5) to green (outcome of 1). The color scale can be adjusted to accommodate users with color vision deficiency.

Figure C.4: Link edges are rendered using quadratic Bézier curves. Control point placement is selected to ensure horizontal starting and ending edge slopes.

C.3.3 Rendering

Graphs with many nodes and edges can be difficult to visualize due to possible edge crossings and overlapping visual marks. I apply several rendering techniques to emphasize connectivity and reduce clutter in the visualization.

C.3.3.1 Bézier Curve

Each link edge is rendered as a quadratic Bézier curve to emphasize the connectivity and flow of the paths in the visual display. I make the control line from the origin point to the first control point perpendicular to the origin node, and the control line from the destination point to the second control point perpendicular to the destination node. As shown in Figure C.4, this ensures that the edges are horizontal at both the start and end.

C.3.3.2 Sugiyama's Heuristics

Outflow initially sorts nodes and edges in each layer according to their outcomes. However, this often leads to unnecessarily complex visualizations because of edge crossings. Therefore, I apply Sugiyama's heuristics [119], a well-known graph layout algorithm for DAGs, to reorder the elements in each layer and reduce edge crossings. Figures C.5a and C.5b show example layouts of the same data before and after applying Sugiyama's heuristics, respectively.

Figure C.5: A multi-stage rendering process improves legibility. (a) Initial layout after sorting edges by outcome. (b) After applying Sugiyama's heuristics to reduce crossings. (c) The final visualization after applying both Sugiyama's heuristics and Outflow's force-directed layout algorithm to obtain straighter edges.

C.3.3.3 Force-directed Layout

Once the order of nodes in a layer has been determined, the next step is to calculate the actual vertical position for each node. Positions are initially assigned by distributing the nodes equally along a layer's vertical axis. However, this approach can often result in unnecessarily curvy paths (as shown in Figure C.5b) that make the visualization more difficult to follow. To produce straighter paths, I apply a spring-based force-directed optimization algorithm [35] that reduces edge curvature. As illustrated in Figure C.6, nodes and edges are simulated as particles and springs, respectively. To prevent nodes from getting too close to each other, spring repulsions are also inserted between nodes.

Figure C.6: A spring-based optimization algorithm is used to obtain straighter (and easier to read) edges. Nodes and edges are simulated as particles and springs, respectively. Spring repulsions are inserted between nodes. During the optimization, nodes are gradually moved along a vertical axis to reduce the spring tension in their edges.

Through a series of iterations, nodes are gradually moved along the layer's vertical axis to reduce the spring tensions. The optimization stops either when the entire spring system is stable or when a maximum number of iterations has been reached.
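The vertical relaxation just described can be sketched as a one-dimensional spring simulation. This is an illustrative approximation, not Outflow's actual implementation; the force model, constants, and function name are all assumptions:

```python
def relax_vertical(pos, edges, layers, iters=500, step=0.02, rep=0.5):
    """One-dimensional spring relaxation sketch: nodes slide along a
    layer's vertical axis to straighten edges.

    pos:    dict node -> y coordinate (node order was fixed earlier)
    edges:  list of (a, b) pairs; each edge pulls its endpoints toward
            equal y (i.e., toward a horizontal, straight edge)
    layers: list of node lists; nodes in the same layer repel each other
    """
    for _ in range(iters):
        force = {n: 0.0 for n in pos}
        for a, b in edges:                 # spring: tension grows with curvature
            d = pos[b] - pos[a]
            force[a] += d
            force[b] -= d
        for layer in layers:               # repulsion within each layer
            for i, a in enumerate(layer):
                for b in layer[i + 1:]:
                    d = pos[b] - pos[a] or 1e-6
                    force[a] -= rep / d
                    force[b] += rep / d
        for n in pos:                      # small step toward equilibrium
            pos[n] += step * force[n]
    return pos

pos = {"s": 0.0, "a": 3.0, "t": 0.0}       # "a" starts offset, bending s -> a -> t
relax_vertical(pos, [("s", "a"), ("a", "t")], [["s"], ["a"], ["t"]])
# after relaxation, "a" settles between its neighbors and the path is straight
```

A production implementation would also add a stability test to stop early, as the text describes; the fixed iteration count here stands in for that check.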
Figure C.5c shows the more readable visualization with straightened edges obtained by applying this force-directed layout algorithm.

C.3.3.4 Edge Routing

When large time differences exist between two alternative paths, time edges and link edges can overlap as shown in Figure C.7a. This can make it more difficult for users to trace paths through the visualization during an analysis. I resolve this issue by routing link edges through an intermediate point that avoids crossings (Figure C.7b). The intermediate point is calculated by (1) finding the longest time edge among the n neighbor edges in the direction traveled by a link edge (up/down), and (2) moving the origin of the link edge horizontally beyond the longest time edge's x position. This method does not guarantee avoidance for neighbors further than n. However, in practice a low value of n (e.g., n = 3) provides effective reduction in overlaps without excessive routing of edges.

Figure C.7: Edge routing prevents overlaps between time and link edges. (a) A link edge is seen passing "behind" the time edge above it. Outflow's edge routing algorithm extends the link edge horizontally beyond the occluding time edge. (b) The new route avoids the overlap and makes the time edge fully visible.

C.3.4 Basic Interactions

To allow interactive data exploration, I further designed Outflow to support the following user interaction capabilities.

Panning & Zooming: Users can pan and zoom to uncover detailed structure.

Filtering: Users can filter both nodes and edges based on the number of associated entities to remove small subgroups.

Event Type Selection: Users can select which event types are used to construct the Outflow graph. This allows, for instance, for the omission of events that users deem uninteresting. For example, users in the medical use case can include or exclude a symptom (e.g., "Ankle edema") if they deem it relevant or irrelevant to an analysis. In response, the visualization will be recomputed dynamically.
Brushing: Hovering the mouse over a node or an edge will highlight all paths traveled by all entities passing through the corresponding point in the Outflow graph (Figure C.8).

Tooltips: Hovering also triggers the display of tooltips, which provide more information about individual nodes and edges. Tooltips show all events associated with the corresponding node or edge, the average outcome, and the total number of entities in the subgroup (Figure C.8).

Pinning: Users can "pin" a node or edge to freeze the brushed selection. This interaction is performed by clicking on the element to be pinned. Users can then move the mouse pointer to display tooltips for brushed subsets. This allows the quick retrieval of information about subsets that satisfy two constraints. For example, a user in the soccer use case can pin soccer games that reached a 2-2 score before moving the mouse pointer to hover over the 1-0 state to see detailed information about the set of matches that pass through both nodes.

Figure C.8: Interactive brushing allows users to highlight paths emanating from specific nodes or edges in the visualization. This allows users to quickly see alternative progression paths taken by entities passing through a given state.

C.3.5 Simplification

The rendering techniques outlined earlier in this section can significantly reduce visual clutter and make the visualization more legible. However, there are still situations where visual complexity arises due to inherent complexity in the underlying data. To enable analyses of these more challenging datasets, Outflow includes a simplification algorithm that actively simplifies the underlying graph structure used to construct the visualization. I apply a hierarchical clustering method that reduces the number of states in an Outflow graph by merging similar states within the same layer.
States with a similarity distance less than a user-specified threshold are grouped together, while states that do not satisfy the threshold remain distinct. The user controls the threshold via a slider on the user interface. This prototype defines the similarity distance between two states as the difference in average outcomes (Equation C.1). Alternative measures could easily be substituted (e.g., a similarity-based metric using application-specific properties of the underlying entities).

d(node_A, node_B) = |node_A.outcome − node_B.outcome|        (C.1)

Figure C.9: Using the same dataset as illustrated in Figure C.1, a user has adjusted the simplification slider to group states with similar outcomes. Clustered states are represented with gray nodes. This simplified view shows that as more events occur (i.e., as more goals are scored), the paths diverge into two distinct sets of clustered states. The simplified states more clearly separate winning scorelines from losing scorelines. As more goals are scored, the probable outcomes of the games become more obvious.

The hierarchical process begins as follows. First, each state in a layer is assigned to its own cluster, for which it is the only member. Then, distances between all pairs of clusters are computed. The distance between two clusters, defined in Equation C.2, is the average of the distances between all nodes in the first cluster and all nodes in the second cluster.

d(cluster_X, cluster_Y) = ( Σ_{m ∈ cluster_X, n ∈ cluster_Y} d(m, n) ) / ( size(cluster_X) × size(cluster_Y) )        (C.2)

Once the distances have been computed for all pairs of clusters in a given layer, clusters are merged in a greedy fashion. The most similar pair of clusters is merged, and the cluster distances are updated to reflect the new clustering. The greedy process repeats until either (1) only one cluster remains, or (2) the most similar pair of remaining clusters has a distance above the threshold specified by users.
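A minimal sketch of this greedy, average-linkage merging for a single layer, using the outcome-difference distance of Equation C.1 (the function name and toy scorelines are mine):

```python
def simplify_layer(outcomes, threshold):
    """Greedy average-linkage clustering sketch for one layer
    (Equations C.1-C.2): states whose cluster distance is below
    `threshold` are merged. `outcomes` maps state -> outcome in [0, 1].
    Returns a list of clusters (lists of states)."""
    clusters = [[s] for s in outcomes]

    def dist(cx, cy):  # Equation C.2: average pairwise outcome difference
        total = sum(abs(outcomes[m] - outcomes[n]) for m in cx for n in cy)
        return total / (len(cx) * len(cy))

    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        if dist(clusters[i], clusters[j]) >= threshold:
            break  # most similar pair no longer satisfies the threshold
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

states = {"2-0": 0.9, "1-1": 0.5, "0-2": 0.1, "3-0": 0.95}
print(simplify_layer(states, threshold=0.2))
# → [['2-0', '3-0'], ['1-1'], ['0-2']]
```

With a threshold of 0.2, only the two strongly winning scorelines merge; raising the slider would progressively collapse the layer toward a single gray node.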
After the simplification process completes, clusters containing multiple states are rendered as a single node filled with gray in the visualization. To preserve state transition information, the edges are not merged even though their origin nodes or destination nodes may have been simplified (Figure C.9). Nodes that are alone in their cluster are rendered using the normal Outflow visual encoding.

C.3.6 Factors

As described so far, Outflow provides an interactive visualization of event pathways and their associated outcomes. However, it does not yet incorporate external factors that can often influence how the events progress. For example, while goals determine the pathways in the soccer use case, yellow and red cards can have a major impact on how a game unfolds. Similarly, a CHF patient's symptoms may be strongly influenced by the medications they are prescribed. In cases where they can be controlled, factors can be important clues to analysts who are working to figure out whether they can influence how an entity's events may progress.

Figure C.10: Outflow highlights factors that are strongly correlated with specific event pathways. In this screenshot, a physician has focused on a group of patients transitioning from the current state due to the onset of the "NYHA" symptom. This transition seems to be deadly, as seen from the red color-coding. The right sidebar displays medications (factors) with high correlations to this transition. The factor with the highest correlation in this example is prescribing antiarrhythmic agents. This correlation, which may or may not be causal, can help clinicians generate hypotheses about how best to treat a patient.

Given the importance of factor analysis, I extend the basic Outflow data model to associate a set of timestamped factors (f_i, t_i) with each entity. Because of the timestamps, each occurrence of a factor can be placed within the sequence of events associated with the corresponding entity.
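For illustration, positioning factors relative to states might look like the following sketch, which counts how often a factor occurs before entities reach a given state, the kind of frequency used by the correlation metrics defined next. All names and the record layout here are hypothetical:

```python
def reached_time(events, state):
    """Return the time at which an entity's accumulated events first
    equal `state` (a set of event types), or None if never reached."""
    seen = set()
    for t, etype in events:
        seen.add(etype)
        if seen == state:
            return t
    return None

def present_factor_frequency(entities, factor, state):
    """Fraction of entities reaching `state` that experienced `factor`
    strictly beforehand. Each entity has 'events': [(t, type), ...]
    and 'factors': [(t, name), ...]."""
    reaching = with_factor = 0
    for entity in entities:
        t_state = reached_time(entity["events"], state)
        if t_state is None:
            continue                      # entity never passes through state
        reaching += 1
        if any(tf < t_state for tf, name in entity["factors"] if name == factor):
            with_factor += 1
    return with_factor / reaching if reaching else 0.0

entities = [
    {"events": [(1, "A"), (5, "B")], "factors": [(3, "drug X")]},  # factor before state
    {"events": [(2, "A"), (4, "B")], "factors": [(6, "drug X")]},  # factor after state
]
print(present_factor_frequency(entities, "drug X", {"A", "B"}))  # → 0.5
```

This per-state fraction is exactly the kind of numerator/denominator pair that the presence and absence correlation scores below are built from.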
The Outflow graph for a set of entities is constructed as before using only the entities' events. For each node and edge in the graph, I then compute additional statistics to identify correlated factors and suggest them to users via the user interface.

For this prototype, I have derived two metrics to detect factors that occur unusually often (or rarely) before a given state or transition. These metrics could easily be replaced with more sophisticated metrics better suited to specific scenarios. In fact, I envision having a collection of metrics that users could choose from to measure various types of associations. The prototype's baseline metrics, presence correlation and absence correlation, are inspired by the term frequency-inverse document frequency (tf·idf) measure [106] used widely in information retrieval.

1. Presence correlation detects factors that are unusually frequent for a given state or transition. For states, if a factor f_i often occurs before reaching state S_j while f_i rarely occurs elsewhere in the dataset, then f_i will be given a high presence correlation value for the corresponding state. I measure this correlation using a present factor frequency-inverse state frequency (pff·isf) score, which I define as follows.

pff = (number of entities with f_i before S_j) / (number of entities reaching S_j)        (C.3)

isf_p = log( (number of states) / (number of states preceded by f_i + 1) )        (C.4)

pff·isf = pff × isf_p

A similar calculation is made for transitions by substituting S_j with T_{m→n} in Equation C.3 and replacing the number of states with the number of transitions in Equation C.4. For isf_p, only states (or transitions) in the current layer or earlier are counted.

2. Absence correlation detects factors that are unusually rare for a given state or transition. For states, if a factor f_i rarely occurs before reaching state S_j while f_i occurs commonly elsewhere in the dataset, then f_i will be given a high absence correlation value for the corresponding state.
This correlation is formulated as an absent factor frequency-inverse state frequency (aff·isf) score, defined below.

aff = (number of entities without f_i before S_j) / (number of entities reaching S_j)        (C.5)

isf_a = log( (number of states) / (number of states not preceded by f_i + 1) )        (C.6)

aff·isf = aff × isf_a

A similar calculation is made for transitions by substituting S_j with T_{m→n} in Equation C.5 and replacing the number of states with the number of transitions in Equation C.6. For isf_a, only states (or transitions) in the current layer or earlier are counted.

A new sidebar panel is added to the Outflow user interface to display these correlation scores. When users mouse over any time edge, the panel is updated to display the most highly correlated present and absent factors for the corresponding transition. The factor panel can be seen on the right side of Figure C.10. Factors are listed alphabetically and displayed with a histogram to convey the strength of the correlation.

C.4 Preliminary Analysis

Outflow was used to view the evolution over time of a cohort of CHF patients similar to a clinician's current patient. An initial analysis illuminated a number of interesting findings and highlighted that various types of patients evolve differently. Here are two examples of the type of analysis that can be performed using the Outflow technique.

Figure C.11: Outflow aggregates temporal event data from a cohort of patients and visualizes alternative clinical pathways using color-coded edges that map to patient outcome. Interactive capabilities allow users to explore the data and uncover insights.

Figure C.12: The progression from green to red when moving left to right in this figure shows that patients with more symptoms exhibit worse outcomes.

Leading Indicators: In several scenarios, patient outcome is strongly correlated with certain leading indicators. For example, consider the patient cohort visualized in Figure C.11.
The strong red and green colors assigned to the first layer of edges in the visualization show that the eventual outcome for patients in this cohort is strongly correlated with the very first symptom to appear. Similarly, the strong red and green colors assigned to the first layer of edges after the alignment point show that the next symptom to appear may be critical in determining patient outcome.

Progressive Complications: In contrast to the prior example, which showed strong outcome correlation with specific paths, the patient cohort in Figure C.12 exhibits very different characteristics. At each time step, the outcomes across the different edges are relatively equal. However, the outcomes transition from green to red when moving left to right across the visualization. This implies that for this group of patients, no individual path is especially problematic historically. Instead, a general increase in co-occurring symptoms over time is the primary risk factor.

C.5 User Study

I conducted a user study to evaluate Outflow's ability to support a variety of event sequence analysis tasks. I first describe the study's design, which asked users to answer questions about Man U.'s soccer season using a visualization of the data described in Section C.2.2. I then report the study's results and discuss its findings.

C.5.1 Design

Twelve users (eight males and four females) participated in this study. All users were adult professionals who are comfortable with computers. None of the users considered themselves "soccer experts", but all had a basic understanding of the game. None of the users had any prior experience using Outflow.

C.5.1.1 Procedure

Each user participated in a single 60-minute session during which they were observed by a study moderator. Each session started with a brief orientation in which the moderator explained Outflow's design and interactions. Participants were then allowed to experiment with the visualization to gain some experience working with the system.
After roughly 15 minutes, the formal section of the study began. Participants were asked to perform a list of tasks. While the tasks were performed, the moderator recorded both accuracy and time to completion for each task. Users were then asked to freely explore the data and describe any interesting findings. After that, users were given a written questionnaire to gather subjective feedback. Finally, the moderator debriefed the participants to learn about their experience and any comments or suggestions they might have.

C.5.1.2 Tasks and Questionnaire

Each user was given a list of 16 tasks to perform. The tasks were designed to evaluate people's ability to understand the proportion and cardinality of states and transitions, transition time, the outcome of states and transitions, and the factors associated with transitions.

The first nine tasks were designed to measure the participant's ability to interpret Outflow's visual representation. These tasks were further divided into two sets: practice tasks and test tasks. The first four tasks, unbeknownst to the participants, were used only to ensure that participants fully explored Outflow's visual design. Timing and accuracy data for these tasks was not included in this analysis. The five test tasks (T1–T5) asked questions similar to the preceding practice tasks, but on different aspects of the dataset.

The next seven tasks were designed to measure performance when using Outflow's interactive capabilities. Once again, these tasks were split into two groups: practice tasks and test tasks. The first three were designed to give participants practice using Outflow's interactive features, and their results were not included in this analysis. The four remaining tasks (T6–T9) asked similar questions about other aspects of the dataset. The test tasks used in the study are as follows:

T1. "Can you find the state where Man U. conceded the first goal?" Objective: Traverse graph using labels.

T2. "What happened most rarely after Man U.
conceded the first goal?" Objective: Interpret proportion from the height of a time edge.

T3. "Was it faster to concede a goal or to score a goal from the state in T1?" Objective: Interpret time from time edge width.

T4. "Was it more likely for Man U. to win or lose after the state in T1?" Objective: Interpret state outcome.

T5. "What is the most common score after two goals are scored (2-0, 1-1 or 0-2)?" Objective: Traverse graph and interpret proportion from state node height.

T6. "Can you find the state where Man U. led 2-1?" Objective: Traverse graph using labels.

T7. "Which transition from the state in T6 led to the lowest percentage of winning? How many percent?" Objective: Use tooltip.

T8. "Which factor(s) are highly correlated with the transition in T6?" Objective: Understand factors.

T9. "For games that reached 2-2, which situation resulted in better outcomes? (a) Scoring to move from down 1-2 into a 2-2 tie or (b) conceding to move from a 2-1 lead to a 2-2 tie?" Objective: Use tooltip. The actual data shows a 0.73 outcome for (a) and a 0.67 outcome for (b). While the similar outcome values make the difference in color-coding hard to see, the tooltip provides access to the numerical outcome statistics.

At the end of their study session, participants completed a questionnaire. It contained eight questions (Q1–Q8) to which users responded on a 7-point Likert scale (1 = very easy, 7 = very hard). The questionnaire also included free response questions to gather more subjective feedback. The eight Likert-scale questions were the following:

Q1. Is it easy or hard to learn how to use?
Q2. Is it easy or hard to interpret the proportion of states?
Q3. Is it easy or hard to interpret the proportion of transitions?
Q4. Is it easy or hard to interpret transition time?
Q5. Is it easy or hard to interpret the outcome of states?
Q6. Is it easy or hard to interpret the outcome of transitions?
Q7. Is it easy or hard to understand factors correlated with a transition?
Q8.
Is it easy or hard to find a particular state in the graph?

C.5.2 Results

C.5.2.1 Accuracy

Overall, participants were able to complete the tasks accurately, with only three mistakes observed out of 108 total tasks (97.2% accuracy).

Two users erred on T4. These participants were able to identify the node that was the focus of the question. However, neither responded based on the color of the identified node. One user looked at all of the future paths and mentally aggregated/averaged the values (incorrectly) to guess at the eventual outcome. This approach is subject to bias because long paths cover more pixels even if their outcome is not representative. When told of their mistake, the user said that looking at the color of the node "would have been trivial. I just forgot." The second user who answered T4 incorrectly responded based on the color of the largest outbound path. While that path corresponds to the most often occurring next event, it does not by itself represent the overall average outcome for the identified node. I hypothesize that both errors were due in large part to the users' lack of experience.

The only other error occurred on T9. However, during the questionnaire portion of the study session, it was determined that the user had misunderstood the task. He actually used Outflow correctly to answer what he thought the question was asking.

C.5.2.2 Speed

Participants were able to finish the tasks rapidly, with average completion times ranging between 5.33 and 64.22 seconds.² The wide range in timings reflects in part variations in task complexity. However, the standard deviations between timings for individual tasks were fairly large (up to 30.47 s). In general, some users quickly mapped tasks to a feature of the visualization while others spent more time thinking about what exactly the question was asking before settling on an answer.
Overall, the times are quite fast, and I believe that allowing novice users to precisely answer a complex task like T9 in as little as one minute highlights Outflow's utility.

² Task completion times were as follows: (T1) Mean=5.33, SD=4.18 s; (T2) 8.79, 7.09 s; (T3) 8.16, 7.64 s; (T4) 26.26, 14.39 s; (T5) 49.53, 30.47 s; (T6) 10.19, 3.01 s; (T7) 16.98, 10.81 s; (T8) 7.98, 4.01 s; (T9) 64.22, 26.92 s.

Figure C.13: Questionnaire results for each of the eight questions (Q1–Q8), answered on a 7-point scale (1 = very easy, 7 = very hard) by study participants.

C.5.2.3 Unrestricted Exploration

After completing the study tasks, users were given time to freely explore the visualization. Participants were able to quickly identify several interesting aspects of the dataset. For example, many users quickly found the highest scoring game, which had a total of eight goals. Participants were also drawn to states that had multiple "arriving" edges and compared outcomes for the alternative paths. This is similar to what they were asked to do in T9. One user discovered a general trend that Man U. rarely received yellow cards when they lost. This could be interpreted as a sign of a lack of intensity in those games. Users also enjoyed "the ability to investigate the progression of a large number of games on one screen." Users observed a number of strong global trends, including: (1) Man U. wins a large majority of their games, (2) Man U. wins most high scoring games, (3) Man U. often comes back after falling behind, and (4) Man U. rarely loses a game when ahead.

C.5.2.4 Questionnaire and Debriefing

Results from the questionnaire are shown in Figure C.13. A pairwise t-test between all questions shows no significant difference between the ratings of each question.
Average ratings for all questions fall between 1.50 and 2.33, suggesting that participants generally found Outflow easy to learn and easy to use.[3]

However, not all participants responded in the same way. As shown in Figure C.13, there were a small number of higher ratings for questions Q5-Q7 that indicate some frustration. In fact, all of the high scores (> 4) came from a single participant who had difficulty understanding the difference between the outcome of a state and the outcome of a transition. He repeatedly asked for clarification and commented in the questionnaire that tasks on these topics required "some effort to parse the wording". The moderator explained that while a state (e.g., 1-1) has one outcome, the transitions into that state (e.g., conceding to go from 1-0 to 1-1 vs. scoring to go from 0-1 to 1-1) can have different outcomes. The user understood for a moment, but then quickly became confused again. His difficulty in grasping this concept led to his high responses (and slower task completion times).

[3] Question ratings (mean, SD): (Q1) 2.33, 0.78; (Q2) 1.58, 0.67; (Q3) 1.50, 0.67; (Q4) 1.58, 0.90; (Q5) 2.08, 1.56; (Q6) 2.17, 1.53; (Q7) 2.25, 1.71; (Q8) 2.00, 0.95.

Based on free-response questions and interviews with the moderator, participants felt overall that Outflow was "pretty", "looks cool", and that the colors were "very meaningful". They said that it provides a "good high-level view" that encodes a lot of information into a single "simple to follow" visualization. In addition, users felt that the ability to see the outcome associated with alternative paths out of (or into) a given state, and not just the state itself, is a powerful feature. "I like the difference between states and transitions [which allowed me] to compare two paths to the same state and to understand differences." Another participant commented that "highlighting of paths was very helpful."
When asked about learning to use Outflow, some participants expressed that the tool is unique and therefore required some training to get used to. However, those participants also felt strongly that by the end of the study they were proficient in using the tool. One remarked that they would have done much better on the study tasks "with a few more minutes of training at the start", suggesting a short learning curve.

Some limitations of the current design were also identified during the study. One participant pointed out that users tend to read width as time, but that the widest sequences do not necessarily take the longest (in fact, soccer games all take roughly the same amount of time). This is indeed a limitation of the technique. Outflow handles graphs with multiple incoming edges to a node, and because these different paths to a node can have different durations, time cannot be represented horizontally on an absolute time axis. While this design choice makes comparisons between alternative paths easier, it can make temporal comparisons across multiple steps somewhat harder.

Participants also suggested ideas for new features, including (1) the ability to pin multiple states at once, and (2) moving the display of correlated factors into the main visualization space (instead of the separate sidebar used in the current design).

C.6 Summary

This appendix shows an example of new ideas inspired by the work in this dissertation. Outflow, a new visualization technique, was designed to summarize temporal event sequences and to show aggregated pathways, factors, and outcomes. A new visual representation, a number of interactive features, and two simple metrics for factor recommendations were presented. I provided a detailed description of the visualization's design, including a multi-step rendering process designed to reduce visual complexity.
A portion of this process, the combination of Sugiyama's algorithm to reduce edge crossings and force-directed layout to straighten unnecessarily curvy edges, is also applicable to a more general class of DAG layout problems. Two preliminary analyses highlight some of the capabilities of this approach. Results from a user study with twelve participants demonstrated that users were able to learn to use Outflow easily within fifteen minutes of training, and were able to accurately and quickly perform a broad range of analysis tasks. There are many promising directions to explore further, including integration with forecasting/prediction algorithms, the use of more sophisticated similarity measures, and deeper evaluation studies with domain experts.
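As a concrete illustration of the crossing-reduction idea named in the summary, one sweep of Sugiyama-style barycenter ordering sorts the nodes of one layer by the average index of their neighbors in the adjacent, fixed layer. This is a minimal sketch on a toy two-layer graph; the node names and edges are illustrative, not taken from Outflow:

```python
# One barycenter sweep of Sugiyama-style crossing reduction:
# reorder the free layer by the mean index of each node's neighbors
# in the fixed layer.

def barycenter_order(layer_fixed, layer_free, edges):
    """Return layer_free reordered to reduce crossings against layer_fixed.

    edges: set of (u, v) pairs with u in layer_fixed and v in layer_free.
    """
    pos = {u: i for i, u in enumerate(layer_fixed)}

    def barycenter(v):
        neighbors = [pos[u] for (u, w) in edges if w == v]
        # Nodes with no neighbors keep a neutral middle position.
        return sum(neighbors) / len(neighbors) if neighbors else len(layer_fixed) / 2

    return sorted(layer_free, key=barycenter)

# Illustrative two-layer DAG whose edges all cross under the initial order.
layer1 = ["a", "b", "c"]
layer2 = ["x", "y", "z"]
edges = {("a", "z"), ("b", "y"), ("c", "x")}

print(barycenter_order(layer1, layer2, edges))  # ['z', 'y', 'x']
```

A full Sugiyama pipeline alternates such sweeps up and down the layering until the crossing count stops improving; per the summary above, Outflow then applies a force-directed pass to straighten the remaining curvy edges.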