ABSTRACT

Title of dissertation: SITUATED ANALYTICS FOR DATA SCIENTISTS
Andrea Batch, Doctor of Philosophy, 2022
Dissertation directed by: Professor Niklas Elmqvist, College of Information Studies

Much of Mark Weiser's vision of "ubiquitous computing" has come to fruition: We live in a world of interfaces that connect us with systems, devices, and people wherever we are. However, those of us in jobs that involve analyzing data and developing software find ourselves tied to environments that limit when and where we may conduct our work; it is ungainly and awkward to pull out a laptop during a stroll through a park, for example, but difficult to write a program on one's phone. In this dissertation, I discuss the current state of data visualization in data science and analysis workflows, the emerging domains of immersive and situated analytics, and how immersive and situated implementations and visualization techniques can be used to support data science. I will then describe the results of several years of my own empirical work with data scientists and other analytical professionals, particularly (though not exclusively) those employed with the U.S. Department of Commerce. These results, as they relate to visualization and visual analytics design based on user task performance, observations by the researcher and participants, and evaluation of observational data collected during user sessions, represent the first thread of research I will discuss in this dissertation. I will demonstrate how they might act as the guiding basis for my implementation of immersive and situated analytics systems and techniques.

As a data scientist and economist myself, I am naturally inclined to want to use high-frequency observational data to the end of realizing a research goal; indeed, a large part of my research contributions, and a second "thread" of research to be presented in this dissertation, has been around interpreting user behavior using real-time data collected during user sessions. I argue that the relationship between immersive analytics and data science can and should be reciprocal: While immersive implementations can support data science work, methods borrowed from data science are particularly well-suited for supporting the evaluation of the embodied interactions common in immersive and situated environments. I make this argument based on my own empirical work with data scientists, which has shown me both how easy and how important it is to collect spatial data from user sessions using the sensors that immersive systems require in order to function. As part of this thread of research, this dissertation will introduce a framework for interpreting user session data that I evaluate with user experience researchers working in the tech industry.

Finally, this dissertation will present a synthesis of these two threads of research. I combine the design guidelines I derive from my empirical work with machine learning and signal processing techniques to interpret user behavior in real time in Wizualization, a mid-air gesture and speech-based augmented reality visual analytics system.
SITUATED ANALYTICS FOR DATA SCIENTISTS

by Andrea Batch

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2022

Advisory Committee:
Professor Niklas Elmqvist, Chair/Advisor
Professor Kimbal Marriott
Professor Eun Kyoung Choe
Professor Vanessa Frias-Martinez
Professor Jordan Boyd-Graber

© Copyright by Andrea Batch 2022

Dedication

To my wife, Chloe; my mentor, Niklas; my siblings, Jenny, Aaron, & Jonathan; and my grandparents (Kapper and Julca alike), whose lives have been an inspiration to my own.

Acknowledgments

There are many people who have played tremendously important roles in my life throughout the process of my doctoral research without whom I could never have arrived at this point. This dissertation would not have been at all possible without the presence of two people in particular: My advisor, guide, and lifeline from the beginning to the end of my doctorate, Niklas Elmqvist; and my wife, Chloe Batch, without whom I would surely be dead in a ditch somewhere, and whose artist's eye and, in no less than two instances, illustrations have contributed greatly to my body of published work.

I must also give special thanks to Vanessa Frias-Martinez and Eun Kyoung Choe, who have been on my committee since before I had a committee and who have given me invaluable direction throughout my time at Maryland. I will be eternally indebted to Catherine Plaisant for using her emerita time to guide my work. I am equally indebted to Kim Marriott for sacrificing his late-night hours to offer his guidance from the other side of the planet. I thank Jordan Boyd-Graber, as well, for his expert guidance despite his many commitments. I am also truly grateful to the Assistant Chief Economist at the U.S. Bureau of Economic Analysis, Abe Dunn, and to the Chief Economist, Dennis Fixler, and to the BEA as an agency, for their flexibility, support, and direction throughout this long process.

When I think about getting down to the gritty technical details and odd hours spent working on software (and sometimes hardware) engineering, it will forever be my colleagues, coauthors, and comrades Biswaksen Patnaik, Pete Butcher, Sungbok Shin, Max Cordeil, Andrew Cunningham, Sigfried Gold, Hanuma Teja Maddali, Kyungjun Lee, Yipeng Ji, Sebastian Hubenschmid, Jonathan Wieland, Daniel Fink, Johannes Zagermann, and Julia Liu who will come to mind. And of course, what use is engineering without vision and wisdom? Tim Dwyer, Bruce H. Thomas, Harald Reiterer, Panos Ritsos, Jian Zhao, Mingming Fan, and Moses Akazue all plotted the intellectual routes I have followed. Finally, my dear friends Sydney Hodges; Andrew Bossi; Bryan Seaborne Reid IV, Esquire; and Erica and John Henderson: Without our regular chats and D&D sessions, I would likely have become an even more eccentric hermit by the end of this thing than the mildly eccentric semi-hermit that I have indeed become.

Table of Contents

Dedication
Acknowledgements
Part I: Overview
1 Introduction
  1.1 Context
  1.2 Terminology
  1.3 Research Problems, Questions, and Objectives
  1.4 Thesis Statement
  1.5 Relevance and Contributions
  1.6 A Positionality Statement
  1.7 The Structure of this Dissertation
2 Related Work
  2.1 Data Science Workflows: Batch's Visualization Gap
  2.2 "Space to Think"
    2.2.1 Direct Manipulation and Sketching in Visualization
    2.2.2 Immersive Analytics
    2.2.3 Situated Analytics
    2.2.4 Interaction and Situated Analytics
    2.2.5 Mid-Air Gestures and Speech for Visualization
  2.3 Visualization Grammars and Beyond
  2.4 Evaluation
    2.4.1 Cooperative and Contextual Inquiry for Visualization
    2.4.2 Quantitative Measures in Task Performance, Use of Space & Time
    2.4.3 Characterizing User Behavior with Machine Learning
      2.4.3.1 Deep Learning Models for Human Poses
      2.4.3.2 Behavioral Coding
      2.4.3.3 Unsupervised Video Summarization
      2.4.3.4 Machine Learning in HCI
Part II: Data Science Workflow to Design Guidelines
3 The Interactive Visualization Gap
  3.1 Contextual Inquiry with Data Scientists
    3.1.1 Participants
    3.1.2 Apparatus and Locale
    3.1.3 Procedure
    3.1.4 Problem Set
    3.1.5 Data Collection and Analysis
  3.2 Contextual Inquiry Results
    3.2.1 Stage 1: Pre-experiment Interview
      3.2.1.1 Self-Reported Workflows
      3.2.1.2 Work Focus
      3.2.1.3 Self-Reported Tool Use: Revisiting Kandel's Archetypes
    3.2.2 Stage 2: Problem Set
      3.2.2.1 Summary of Tools and Visualizations Used
      3.2.2.2 Discovery
      3.2.2.3 Acquisition and Transformation
      3.2.2.4 Exploration, Modeling, and Communication
    3.2.3 Stage 3: Sketching
    3.2.4 Stage 4: Post-experiment Interview
      3.2.4.1 Not Enough Time
      3.2.4.2 Show me the Numbers!
      3.2.4.3 Visualization is Unnecessary
  3.3 A Discussion of the Visualization Gap
4 Evaluating Performance, Space Use, and Presence in Immersive Analytics
  4.1 Mixed Methods for Immersive Evaluation: Supervised and In-the-Wild
    4.1.1 Setting and Participant Pool
    4.1.2 Apparatus
    4.1.3 Data Collection
    4.1.4 Common Procedure
    4.1.5 Data Analysis
      4.1.5.1 Visualization of Spatial Activity
      4.1.5.2 Replaying Participant Sessions
    4.1.6 Formative: Pilot and "In the Wild" Studies
    4.1.7 Improvements to ImAxes
    4.1.8 Summative: Case Studies in Economics
    4.1.9 Participants
    4.1.10 Procedure
  4.2 Findings in IA with Economists
    4.2.1 Representative Use Case
    4.2.2 Explore Stage
    4.2.3 Presentation Stage
    4.2.4 All Stages
    4.2.5 Self-reported Perceptual and Cognitive Effects
    4.2.6 Qualitative Feedback
  4.3 A Discussion of Our Predictions Versus Our Results
5 View Management for Situated Visualization
  5.1 Properties and Challenges in Situated Visualization View Management
  5.2 Prototyping Situated View Management
    5.2.1 World-in-Miniature
    5.2.2 Summon and Dispel
    5.2.3 Shadowbox
    5.2.4 Cutting Planes
    5.2.5 Data Tour
  5.3 Motivating Scenario
    5.3.1 Situated Analytics Implementation Details
  5.4 Evaluating Situated Analytics
    5.4.1 Participants
    5.4.2 Apparatus and Data
    5.4.3 Experimental Design and Procedures
    5.4.4 Tasks
  5.5 SA View Management Findings
  5.6 A Discussion of Post-Experiment Thoughts
Part III: Models for Interpreting User Session Data
6 Gesture and Action Discovery
  6.1 Observational Data Modeling
    6.1.1 Statistical methods
      6.1.1.1 Joint Angle Segmentation
      6.1.1.2 Symbolic Representation
      6.1.1.3 Semi-Supervised Clustering
    6.1.2 Deep Learning Network
      6.1.2.1 OpenPose CNN
      6.1.2.2 LSTM
  6.2 Experiments in Computer Vision for Pose Grouping
    6.2.1 Dataset
    6.2.2 Methods
    6.2.3 Qualitative Results
    6.2.4 Quantitative Results
  6.3 A Discussion of Our Pipeline Results
7 UX Evaluation using Visualization and Computer Vision
  7.1 Framework: Extracting Behavior from Video
    7.1.1 Data Model
    7.1.2 Practical Considerations
    7.1.3 Applications
  7.2 System Infrastructures
    7.2.1 uxSense: Computer Vision for HCI
      7.2.1.1 Overall Workflow
      7.2.1.2 Feature Extraction Filters
      7.2.1.3 Analysis Interface
      7.2.1.4 Annotlettes: Micro-Report Generation with uxSense
      7.2.1.5 Implementation Notes
  7.3 Expert UX Designer Review of CV for HCI
    7.3.1 Participants
    7.3.2 Apparatus
    7.3.3 Tasks and Procedures
  7.4 Computer Vision for HCI User Study Results
    7.4.1 Think-Aloud Transcripts
    7.4.2 User Experience Survey
    7.4.3 Time Use: Observed and Self-Reported
    7.4.4 Eating Our Own Dogfood: uxSense in Three Vignettes
  7.5 A Discussion of the Vision uxSense Represents and our Evaluation Findings
Part IV: Synthesis, Limitations, and Research Vision
8 Implementation: Wizualization, Optomancy, Weave, and Spellbook
  8.1 Design of the Wizualization Rendering System
    8.1.1 System Overview and Specifications
    8.1.2 Cross-Virtuality: Arcane Focuses and Weave
    8.1.3 Indirect User Input Interpretation
      8.1.3.1 Verbal Components (Spoken Commands)
      8.1.3.2 Somatic Components (Gestures)
      8.1.3.3 Material Components ("Enchanted Items")
  8.2 Optomancy: The Grammar of Wizualization
    8.2.1 Interactions and Spell Chaining: Macro Recording as Spellcrafting
    8.2.2 Grammar Transition Data Format and Cast List
  8.3 Spellbook: A Mixed Reality Code Notebook
    8.3.1 Compendium of Primitives
    8.3.2 Linked Blocks
  8.4 Postmortem Discussion of Wizualization
9 Limitations
  9.1 Limitations in Small-Sample Qualitative Work
  9.2 Technical Limitations
  9.3 Ethical Limitations
10 Conclusions
  10.1 Questions Answered and Objectives Met
  10.2 Future Work
Bibliography

Part I: Overview

Chapter 1: Introduction

Sometimes, we want to keep working even when we're not at our desks. Or maybe we simply want a nice change of scenery to do the work we were already planning on doing in front of a screen in a stuffy office. This has never been more true than it is now, with many of us in a professional climate that has pivoted dramatically toward remote work following the rise of the COVID-19 pandemic. However, programming and data analysis are hard to do using a phone, leaving remote workers just as stuck with the desktop or laptop as their core working environment as they were prior to the global outbreaks beginning in 2020.

There is a world beyond the mouse, keyboard, and monitor that is currently being delved to its depths by a small but growing population within the visualization research community which concerns itself with immersive environments: settings that surround the user with virtual representations of their work, either bringing the data to their world or bringing them into the imagined and abstract world of their data. But the average analytical professional has yet to see the fruits of all the labor performed by these researchers. Within this immersive analytics community is the domain of situated analytics, and therein lies the means through which the analyst can free themselves of the desk. But this begs the questions: Should they? What do they stand to gain, beyond simply changing their surroundings?

We want to build immersive and situated visual analytics systems that fit the needs of the average data analyst based on their real-world workflow. But first, we need to establish that there is justification in doing so: not just because we want to take our analysis on the road and pull ourselves away from the desk, but because there are real benefits in terms of task performance, work enjoyment, and other elements that the analyst takes into consideration when deciding where and how to do their analyses. I will demonstrate through both a review of existing work (Section 2.2) and my own original work (Chapter 4) that immersive visual analytics systems do offer real benefits over traditional environments.

1.1 Context

In his 90-minute-long "Mother of All Demos" given at the 1968 Fall Joint Computer Conference in San Francisco, Douglas Engelbart introduced, among many other elements of modern computing, the computer mouse. (Video of the Mother of All Demos, at the timestamp of the demonstration of the computer mouse, is available at https://www.youtube.com/watch?v=yJDv-zdhzMY&t=1888s.) The manipulation of windows (also introduced by Engelbart in the same demo) and other interface components (several of which were also introduced at the same time) via the mouse cursor would change the course of computer navigation and define what would long be considered modern personal computing. The reduction of the barrier between user and virtual objects represented a dramatic shift toward direct manipulation [217]. In a 1993 magazine article on the concept of ubiquitous computing that he had introduced two years prior, Mark Weiser said that "[the] best user interface is the self-effacing one, the one that you don't even notice" [247, p. 71].
In other words, one of the central assumptions of both ubiquitous computing and of direct manipulation is that the thinner the perceived boundary (i.e., the interface) between the user and their objectives, the better.

Touchscreens further reduce this barrier. While the invention of the touchscreen in 1946 predates the invention of the computer mouse by about two decades, touchscreens at first suffered a bad reputation among both the human-computer interaction (HCI) research community and the computing device industry until the introduction of three redeeming features by researchers at the University of Maryland (UMD) Human-Computer Interaction Lab (HCIL): the "lift-off strategy" of waiting for touch input to end before triggering an event (1988) [190], touchscreen switches and sliding toggles (1990 and 1991) [189], and a comparative analysis of high-precision touchscreens (1991) [211]. (Materials, including papers and video of a demonstration of the touchscreen switches and toggles, can be found at https://www.cs.umd.edu/hcil/touchscreens.)

In 2013, Elmqvist and Irani [66] narrowed Weiser's vision specifically to the area of data analysis, coining the term "ubiquitous analytics": the use of embedded and mobile networked devices for data analysis, and ideally for analyses that exploit so-called "big data." This vision of ubiquitous analytics has been hampered, however, by the fact that serious programming and data analysis remain difficult to do using a phone, and portable devices like laptops may offer a complete working environment for analysts, but they are not truly mobile devices. The following year, Roberts et al. [198] extended Elmqvist and Irani's vision to include immersive displays, such as virtual reality head-mounted displays (HMDs) or multisensory displays that convey data through touch, sound, scent, or taste in addition to (or instead of) vision. It is only now, nearly a decade later, that HMDs are increasingly affordable and capable of rendering high-quality environments and the visual analytics research community is shifting its focus more intently toward the immersive through the domains of immersive analytics (IA) and situated analytics (SA).

As a new field, IA presents a number of "grand challenges," several of which have been outlined by Ens et al. [71]; many of these challenges revolve around SA, the use cases and evaluation of IA, collaboration, and interaction. One such challenge in IA that does not fit into these categories but does mesh well with the focus on large datasets and computer vision modeling seen in early work in ubiquitous analytics is the combination of human and computer intelligence. Computer intelligence, specifically computer vision, was also identified by Roberts et al. [198] as an "enabling technology" for ubiquitous visualization displays.

Figure 1.1: Exploratory data analysis elements and flow (Acquire, Tabulate, Summarize, Graph, Hypotheses, Model).

1.2 Terminology

Below I define some of the terms commonly used throughout this dissertation.

Immersive Analytics (IA): A domain of visualization research covering techniques and systems in which the user is spatially co-located with virtual representations of abstract data.

Situated Analytics (SA): IA that deals specifically with mixed reality (MR) views of data and the situational context of the user.
Exploratory Data Analysis (EDA): The steps after acquiring a dataset during which the analyst tabulates, summarizes, and creates visual representations of data to form hypotheses (Figure 1.1).

Grammar of Graphics: Any declarative language and set of rules that is composed of a set of building blocks for specifying visual representations of data.

Machine Learning (ML): The use of computer algorithms that "improve" (typically in accuracy given a set of target outputs) based on repeated exposure to data.

Computer Vision (CV): The use of computer algorithms, typically machine learning algorithms, for processing and interpreting image and video data.

Human-Computer Interaction (HCI): A field of research focusing on all points at which the human and the computer meet.

Contextual Design: A method of software and system design that involves inserting the researcher into the context in which the target user goes about their daily lives.

Contextual Inquiry: The data collection methods of contextual design involving the observation of, and communication with, participants.

1.3 Research Problems, Questions, and Objectives

The grand challenges described by Ens et al. [71] aside, Elmqvist and Irani's [66] vision of ubiquitous analytics and the emergence of the domains of situated and immersive analytics demonstrate a great deal of interest in data analysis beyond the monitor and keyboard on the part of the visualization research community. I, too, believe that the future of visual analytics is in immersive, situated systems for research, development, and data analysis. Yet researchers in IA and SA face a problem: We do not see data scientists flocking to either mobile or immersive environments to do their daily work. This observation, that data scientists and similar analytical workers are not making use of immersive technology for their work, raises the problem central to this dissertation: If there is so much to be gained by adopting a display system that is both mobile or situated and immersive for serious data analysis, what are the design requirements for real data analysts to voluntarily do real work in such a system?

Augmented reality HMDs of today are, I argue, in a similar market position to the one touchscreens were in following the publications by UMD HCIL noted in Section 1.1: There is a great deal of interest in the research communities to which they are relevant, but the industry and consumer markets for them have not yet caught up to the research. The HoloLens 2 is listed, at the time I am writing, as available to purchase at the hefty (but representative for AR HMDs) price tag of $3,500 (USD); AR HMDs are close to being, but are not quite yet, poised to become a common household appliance. In conjunction with the price, one thing holding the AR HMD back from ubiquity is the lack of a so-called "killer app." I believe that such an app may be found in analytical work; no device but the AR HMD offers as thin an interface between the user and their environment, nor an interface that can be so seamlessly integrated with the tools of the trade and workflows currently common among analytical workers.

In addition to my beliefs about immersive and situated analytics as the future of visualization research, I also argue that systems in this domain are particularly well-suited for machine learning pipelines designed to support user interaction evaluation as part of the system design and development process. I make this argument based on the self-evident observation that immersive and situated systems necessitate a level of detail in spatial tracking of users that is neither needed nor typical for traditional displays; a sketch of the kind of analysis this tracking data affords follows below.
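As a minimal illustration of this point (and not the pipeline introduced in Chapter 6), the following sketch clusters a hypothetical log of head positions sampled from an HMD session to estimate where a user dwelt in the tracked space. The synthetic data, column layout, and cluster count are all assumptions made for the sake of the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical telemetry: one row per frame, columns are head position (x, y, z)
# in meters, sampled continuously during an immersive analytics session.
rng = np.random.default_rng(seed=42)
head_positions = rng.normal(loc=[0.0, 1.6, 0.0], scale=0.5, size=(3000, 3))

# Cluster the samples to recover the "anchor" regions where the user dwelt;
# dwell regions are a crude proxy for how the user organized their workspace.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(head_positions)

# Share of session time spent near each anchor region.
labels, counts = np.unique(kmeans.labels_, return_counts=True)
for label, count in zip(labels, counts):
    center = np.round(kmeans.cluster_centers_[label], 2)
    print(f"anchor {label} at {center}: {count / len(head_positions):.1%} of frames")
```

Even this trivial summary (where did the user spend their time?) is only possible because the display hardware itself produces the tracking stream; Part 3 of this dissertation takes the idea much further.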
Another research question also arises from both the ubiquitous and immersive analytics literature in this context: How can machine learning and related data modeling algorithms be applied to user data to support immersive and situated analytics?

At its highest level, my work here revolves around the objective of creating a full-featured R&D and data analysis workspace beyond the 2D screen that is based on real domain expert work processes, with the extension of the user's view and experience based on their inputs and activities as a focal center of my system's design. These inputs can be any data observable during their user sessions, from keyboard activity to camera or sensor tracking data. In other words, while my published work to date sadly lacks the influence to prove my thesis by virtue of having created the killer app that makes the AR HMD an item found in every household, my research objective is to make full use of multi-modal user inputs (e.g., combining spoken word with hand gesture with direct manipulation of selected objects) in IA and SA environments, which should in turn be designed for easy integration with existing data science workflows.

1.4 Thesis Statement

My dissertation focuses on combining design parameters based on observations of data science workflows with models for interpreting real-time user interaction data to produce an improved system and grammar of graphics for situated analytics.

1.5 Relevance and Contributions

Included in my work here is material from the following research papers, published or still under review, with my role relative to the other authors described following each item in the list below:

1. Andrea Batch and Niklas Elmqvist. The interactive visualization gap in initial exploratory data analysis. IEEE Transactions on Visualization and Computer Graphics, 24(1):278-287, Jan. 2018.
In this work, my first publication (conducted mainly during the summer prior to my first doctoral-level classes), I played the role of the embedded researcher, conducting user sessions with participants across several agencies within the U.S. Department of Commerce and performing a qualitative evaluation of their work and how visualization fits into the initial iterations of their exploratory analysis process.

2. Andrea Batch, Kyungjun Lee, Hanuma Teja Maddali, and Niklas Elmqvist. Gesture and action discovery for evaluating virtual environments with semi-supervised segmentation of telemetry records. In Proceedings of the IEEE International Conference on Artificial Intelligence and Virtual Reality, Piscataway, NJ, USA, 2018. IEEE.
In this publication, I, Kyungjun Lee, and Hanuma Teja Maddali split the tasks of developing the machine learning pipeline for identifying and clustering novel gestures, with my focus being on combining pre-trained joint recognition models with the computational statistics algorithms used to segment and cluster joints, Kyungjun focusing on constructing the computer vision models used to validate the clusters, which I also evaluated, and Teja sharing in both areas of application as well as in constructing the architecture for our pipeline.

3. Andrea Batch, Andrew Cunningham, Maxime Cordeil, Niklas Elmqvist, Tim Dwyer, Bruce H. Thomas, and Kim Marriott.
There is no spoon: Evaluating performance, space use, and presence with expert domain users in immersive analytics. IEEE Transactions on Visualization and Computer Graphics, 26(1):536-546, 2020.
In this publication, I once again took up the mantle of the embedded researcher, this time exclusively at the U.S. Bureau of Economic Analysis, where I led the data collection and analysis; I conducted all user sessions with our expert participants during the course of this study, and all authors contributed to evaluation of the sessions. I also made some minor software development contributions in extending ImAxes, an implementation introduced by my coauthors Maxime Cordeil, Andrew Cunningham, Tim Dwyer, Bruce H. Thomas, and Kim Marriott in their prior work [52].

4. Andrea Batch, Yipeng Ji, Jian Zhao, Mingming Fan, and Niklas Elmqvist. uxSense: Supporting user experience evaluation using visualization and computer vision. Pending review for publication.
In this work, I developed the computational statistics and computer vision pipeline back-end, and co-developed the front end with Yipeng Ji; Yipeng also worked on the audio processing filters of our system. I also conducted the remote user studies and evaluated the data we collected.

5. Andrea Batch, Sungbok Shin, Julia Liu, Peter W.S. Butcher, Panagiotis Ritsos, and Niklas Elmqvist. The world is your Holodeck: View management for situated visualization. Pending review for publication.
In this evaluation study, I led the software development of the implementations we used to test our view management techniques, with Sungbok Shin, Julia Liu, and Pete Butcher sharing the responsibility as co-developers. Pete and I evenly split the task of conducting our user sessions, which I then evaluated.

6. Andrea Batch, Peter W.S. Butcher, Panagiotis Ritsos, and Niklas Elmqvist. Wizualization: A "Hard Magic" WebXR Visualization System. Pending review for publication.
In this implementation, Pete Butcher and I split the roles of developing software for our Wizualization ecosystem, with my work including our rendering system (Wizualization), signaling server (Weave), and code notebook (Spellbook), while Pete constructed our grammar of graphics (Optomancy).

There have also been several peripherally related publications in which I played roles that are relevant enough to cite in this dissertation, but which were either not integral enough to the scope of this dissertation to merit inclusion as content or for which my role was not central enough to justify their inclusion here, including:

1. Biswaksen Patnaik, Andrea Batch, and Niklas Elmqvist. Information olfactation: Harnessing scent to convey data. IEEE Transactions on Visualization and Computer Graphics, 25(1):726-736, 2018.
In this publication, Biswaksen Patnaik engineered the hardware and I developed the software to construct an olfactory display system able to swap modes between both desktop and VR; I also evaluated a selection of work from across several disciplines, including HCI, cognitive science, and neurology, and used it as the grounding for our theoretical model of information olfactation.

2. Andrea Batch, Biswaksen Patnaik, Moses Akazue, and Niklas Elmqvist. Scents and sensibility: Evaluating information olfactation. In Proceedings of the ACM Conference on Human Factors in Computing Systems, pages 1-14, New York, NY, USA, 2020. ACM.
Biswaksen and I both reprised our respective roles in this publication, I as lead software developer and he as lead hardware engineer, with Moses Akazue joining the project as a leading expert on thermal display engineering. Biswaksen conducted the user studies in person for this study, and all co-authors contributed to evaluation.

3. Sebastian Hubenschmid, Jonathan Wieland, Daniel Immanuel Fink, Andrea Batch, Johannes Zagermann, Niklas Elmqvist, and Harald Reiterer. ReLive: Bridging in-situ and ex-situ visual analytics for analyzing mixed reality user studies. In Proceedings of the ACM Conference on Human Factors in Computing Systems, New York, NY, USA, 2022. ACM.
While this publication is relevant to this dissertation, my contributions in this work were not significant enough to justify its inclusion here; my role initially revolved around developing the 2D web-based interface for the ReLive system, but as my other responsibilities and projects piled up, most of the development tasks for the 2D interface were offloaded onto my very gracious co-authors, especially Sebastian Hubenschmid, Jonathan Wieland, and Daniel Fink, who handled it with aplomb. I also contributed anonymized data from one of my prior studies as use cases, and conducted a small number of the user studies with expert visualization researchers for evaluation of the ReLive system.

The publication history of my work in this area does, I hope, indicate that my contributions have been deemed relevant, interesting, and important enough by the visualization and HCI communities to have a place in the discussion of immersive analytics, user evaluation, and the intersection of both specializations. Indeed, the primary two contributions of this dissertation revolve around this intersection, and they are as follows:

First, I construct a design space from several years of evaluating a) the data analysis processes of data science and domain experts in their places of work, and b) user studies of immersive and situated systems and techniques. I also describe in detail the methods, results, and various implementations that I have contributed to as noted in the lists above.

Second, I introduce Wizualization, a novel situated analytics implementation designed to be practical for data scientist and domain expert use based on the first of these contributions: Practical, in that it fits the requirements of the user's workflow and the value it adds to the user's work is worth the inconvenience of routinely using an HMD. While not every aspect of the prior systems featuring machine learning pipelines that I have contributed to (most notably uxSense, an HCI evaluation visual analytics tool featuring a pipeline that supports the detection of a priori events of interest in user data for the purpose of evaluating unanticipated interactions with the interface) has been integrated with the Wizualization system, many of the methods I have picked up for working with multi-modal action and interaction data have been integrated with deterministic models as part of the extensible nature of Wizualization and the components we have developed as part of its ecosystem.

1.6 A Positionality Statement

The data science work process is central to my thesis, and I also want to give you an idea of what my own work process has looked like, so I've tried to represent it here.
I see myself first and foremost as a software developer or engineer: I like to build things that turn data into insight, and that's what I came to Maryland in the hopes of getting better at. That's what the largest part of the process flow I have included in Figure 1.2, "Iterative Design," is all about. But before I could do that, I first needed to understand the ways that my target users already make sense of data, which is where the first step, observing their work processes, on the left of my iterative design cycle, comes into play. This is why so much of my work has revolved around the qualitative evaluation of my target user group.

Figure 1.2: I consider myself to be, above all else, a software architect at heart; given the focus in my work on data science processes, I feel it appropriate to also describe the general cycles of my research and development process.

However, despite this reliance on qualitative work, consider this as a design perspective: We believe tools should be designed to be deployed into uncontrolled, messy settings, where context matters and the task flow is determined by the user. Data collection out in the wild is going to be limited to events and entities in the system and the sensors available on the user's hardware that the system is able to access. As an economist by training and trade, my instinct is to use this observational data to detect patterns in user behavior. This was our perspective in the algorithm we introduced in our work on gesture and action discovery in Chapter 6. We wanted to take this design perspective to its logical conclusion and take it a step further to ease the qualitative evaluation process, which we believe we demonstrate in our work on developing uxSense, presented in Chapter 7.

All the work in this dissertation involving participants has been approved by the University of Maryland, College Park Institutional Review Board.

Figure 1.3: Two threads of research, synthesized in the implementation of Wizualization and the components of its ecosystem. The Part in this dissertation that each thread corresponds to and the Chapter in this dissertation that each study corresponds to are shown in bold.

1.7 The Structure of this Dissertation

The structure of this dissertation is split into four parts: Part 1, which begins with this chapter, presents a top-level roadmap to my work, including the related work particularly in visualization, human-computer interaction, and computer vision.
My work can be viewed as falling along two threads of research, as shown in Figure 1.3; Part 2 deals with the first thread, in which I seek to gain an understanding of the user's data analysis work process and use that to construct design guidelines. Part 3 deals with the second thread, in which I expand on the perspective I have just introduced in Section 1.6 regarding the use of observational data collected during user sessions for identifying patterns in their interactions and experience through the use of statistical and machine learning models. Part 4 synthesizes these two threads of research via the implementation of a system, Wizualization; as I said, I see myself first as a software developer, so what better way to present the synthesis of my work than through the implementation of software?

In the second chapter of Part 1 detailing related work, Chapter 2, I begin with the most important part of any analytical system: The human user. The users that my work has targeted have all been analytical experts of some stripe: Data scientists, economists, UX researchers, and others following similar pursuits. Section 2.1 briefly details prior work investigating the tasks, tools, and processes that such users are often concerned with. Section 2.2 makes what may at first seem a jarring pivot into the discussion of how physical space and the objects in it can support cognition; however, it is this relationship between space and analysis that has motivated my work, so please bear with me. Section 2.3 briefly discusses some of the more influential grammars of graphics for data visualization; the reason for yet another seemingly abrupt shift in direction will become evident to the reader by the end of Section 2.2.5. Section 2.4 covers a broad selection of qualitative and quantitative methods for evaluating the points at which human meets computer that I believe are most germane to the methods that I myself have applied.

Again, the chapters of Part 2 detail my work along that first thread of research in Figure 1.3; each of these chapters presents the reader with my methods, the study results, and the discussion of those results, which lays out any design findings that arose from our study. In the first chapter of Part 2, Chapter 3, I discuss my exclusively qualitative methods in Section 3.1, my results in Section 3.2, and the design ramifications of our findings in Section 3.3. The second, Chapter 4, describes a multi-phase study that involved a mixture of qualitative and quantitative methods, described in Section 4.1, selected specifically for immersive analytics systems; the results of our study are discussed in Section 4.2, and the design implications are discussed in Section 4.3. We narrow our focus further to a study of view management for situated visualization in Chapter 5; much of the focus of this chapter is on the design space itself, namely the properties and challenges of SA view management (Section 5.1) and how those properties and challenges relate to the techniques we opted to evaluate (Section 5.2). We discuss the methods we used in Section 5.4, our results in Section 5.5, and the design implications in Section 5.6.

The chapters of Part 3 detail my work along the second thread of research in Figure 1.3.
Chapter 6 presents a lens through which models for detecting and classifying spatial and multi-modal human behavior might be seen as tools to interpret the time users spend in a system, with IA systems in particular in mind, as they involve a fuller use of the user's body and the space around them; in this chapter, we introduce and evaluate a pipeline for detecting novel actions based on user pose data extracted from video without prior knowledge of what those actions may be. The methods we used for this pipeline, described in Section 6.1, are exclusively quantitative, and cover the use of machine learning and statistical models for evaluating observational data. Chapter 7 continues this train of thought by introducing uxSense, a client/server system that implements our framework for extracting user behavior from video; an overview of our mixed methods for evaluating uxSense is provided in Section 7.3.

Part 4 synthesizes the threads of research discussed in Parts 2 and 3. The first chapter of Part 4, Chapter 8, introduces Wizualization, its grammar of graphics (Optomancy), its signaling server (Weave), and the code notebook we built to demonstrate its ecosystem (Spellbook). Chapter 9 summarizes the limitations of my work, which I view as falling into three categories: Methodological (in the case of qualitative work, which practically necessitates small user samples), technical, and ethical. Finally, this dissertation concludes with Chapter 10. I summarize my contributions in Section 10.1, while Section 10.2 discusses some suggestions for future directions in research.

Chapter 2: Related Work

In order to understand the design space of situated visualization and analysis tools, I must first discuss the design issues and problems that affect the wider world of immersive analytics (IA). For my application in particular, I must also first understand the workflow for which situated analytics is perhaps most relevant.

2.1 Data Science Workflows: Batch's Visualization Gap

Digital tools are critical to data science and analytics workflows, and current practice spans data analysis tools such as R (http://r-project.org/), Pandas [157] (http://pandas.pydata.org/), and SAS (http://www.sas.com/); database systems such as MySQL (http://www.mysql.com/) and MongoDB (http://www.mongodb.com/); data warehousing services such as Amazon Redshift (https://aws.amazon.com/redshift/); and machine learning libraries such as scikit-learn [186] (http://scikit-learn.org/) and TensorFlow [1] (https://www.tensorflow.org/). While there is no formally standardized workflow or process that fits every data scientist, and every professional tends to establish their own, a common process typically consists of the following general stages [4, 18, 119]:

1. Discovery: Formulating an interesting question and determining the data necessary to answer it;
2. Acquisition: Locating, organizing, and preparing data so that it is accessible to the chosen analysis environment (also often called "ETL," meaning "extract, transform, and load");
3. Exploration: Investigating and analyzing the dataset in order to collect insights and understand the data;
4. Modeling: Building, fitting, and validating a model that can explain the dataset and the observed phenomena; and
5. Communication: Disseminating the results to stakeholders in reports, presentations, and charts.

Static visualization is commonly used in the communication phase of data science workflows, and data scientists sometimes use visualizations as part of the analysis as well [87, 119].
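To make these stages concrete, the sketch below shows how they often play out in the kinds of tools named above; the file name, column names, and model choice are hypothetical stand-ins rather than material from any study in this dissertation.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Discovery happens before any code is written: formulating the question
# (here, hypothetically: does time worked relate to income across regions?).

# Acquisition: load and prepare a (hypothetical) dataset for analysis.
df = pd.read_csv("survey.csv").dropna(subset=["income", "hours_worked", "region"])

# Exploration: tabulate, summarize, and chart the data to collect insights.
print(df.describe())
print(df.groupby("region")["income"].median())
df.plot.scatter(x="hours_worked", y="income")  # a quick, static EDA chart

# Modeling: build, fit, and validate an explanatory model.
model = smf.ols("income ~ hours_worked + C(region)", data=df).fit()
print(model.summary())

# Communication: export results for a report, presentation, or chart.
model.params.to_csv("income_model_coefficients.csv")
```

Interactive visualization could, in principle, slot into the exploration and communication stages; the finding of Chapter 3 is that, in practice, it rarely did for the analysts I observed.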
John Tukey's notion of exploratory data analysis [237] is firmly entwined with visual methods. Four years ago, Niklas Elmqvist and I began an exploration of how visualization fit into data scientists' exploratory analysis process, and at that time, interactive visualization was generally not a standard component of this workflow [18]. Chart generation in spreadsheet software, tools that extend data tables, such as Tableau [228], and Python and R libraries such as ggplot2 [252] were used for a variety of static visualization techniques in a format easily accessible and usable by data scientists during the course of my study. While examples of visualization researchers developing techniques using environments popular with data scientists do exist [187], they were not commonplace.

The response to my findings by the visualization community has shifted the needle forward on closing this gap. Shortly before my work was published, Satyanarayan et al. [205] had already begun to address the gap by introducing a high-level grammar of graphics, Vega-Lite, which presents a set of standardized linguistic rules for producing interactive data visualizations using a concise JSON format for the data to be represented by the grammar. Since my publication on the visualization gap, further inquiry has been conducted to characterize the analytical processes of data scientists and domain experts, and how visualization research can better support their work. Milani et al. [160], for example, evaluate the early parts of the data science workflow, creating profiles of data pre-processing activities. In their retrospective analysis, Crisan et al. [54] break the data science workflow into four higher-order processes and fourteen lower-order processes, with each higher-order process containing one lower-order process for which visualization was a key component: profiling (preparation), interpretation (analysis), monitoring (deployment), and dissemination (communication). New libraries and systems have also been introduced which combat the visualization gap, including Vistrates [11] and an implementation that has perhaps held most true to my recommendations, B2 [259], a Jupyter notebook that brings interactive visualization to the tools that are already common for exploratory analysis among data scientists.
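To make the idea of a grammar of graphics concrete, the sketch below declares a small interactive chart in Altair, the Python binding that compiles to Vega-Lite's JSON specification; the dataset and field names are invented for the example and are not drawn from the studies above.

```python
import altair as alt
import pandas as pd

# A small, hypothetical table of indicators by state.
df = pd.DataFrame({
    "state": ["MD", "VA", "CA", "TX"],
    "gdp_growth": [2.1, 1.8, 3.0, 2.6],
    "unemployment": [4.0, 3.5, 4.4, 3.9],
})

# Grammar-of-graphics style: the chart is declared from building blocks
# (data, mark, encodings) rather than drawn procedurally.
chart = (
    alt.Chart(df)
    .mark_point()
    .encode(x="gdp_growth:Q", y="unemployment:Q", tooltip=["state"])
    .interactive()  # adds pan-and-zoom, one of Vega-Lite's interaction primitives
)

print(chart.to_json())  # the underlying Vega-Lite JSON specification
```

Declarative specification of this kind is also the general pattern that Optomancy, the grammar introduced in Chapter 8, adopts for situated visualization.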
2.2 "Space to Think"

Managing and navigating space, virtual or physical alike, has always been central to human cognition. As Norman holds, "it is things that make us smart" [175], and according to distributed [107], embodied [213], and extended [47] forms of cognition, this very much includes physical space. In seminal work from cognitive psychology, Kirsh [125] demonstrated that humans tend to offload cognitive tasks in physical space to simplify choice, perception, and internal computation. But how many of these ideas translate to digital space on a computer screen?

Kirsh and Maglio [126] showed that screen space can support internal computation in so-called epistemic actions (actions that serve no other purpose than to facilitate thought) in the video game Tetris. Similar effects have also been observed for recall through spatial memory: using the Data Mountain [199], where digital objects are arranged on the face of a pseudo-3D "mountain," participants were able to find previously placed website icons significantly faster than when using a conventional bookmark display. This harnessing of spatial memory is also similar to users leveraging physical navigation [14] in large and immersive displays with persistent locations of objects, thus allowing muscle memory and proprioception to replace some of the mental effort involved in spatial navigation.

In particular, having access to large visual spaces has been shown to be useful for analytical tasks. For example, screen space can be organized into complex structures such as lists, stacks, heaps, and composites [216], thus reducing the need for mental models. Tan et al. [231] compared analytical task performance between monitors and wall displays, and showed that a physically large display yields significant improvements due to the increased immersion and presence, which biased participants to adopt an egocentric view of the data. Reda et al. [194] built on such findings to study the impact of physical size on actual visual exploration of data, and found consistent effects where more pixels yielded more discoveries and insights. Finally, Andrews et al. [5] (to whom I owe the title of this section) directly addressed strategies for spatial arrangement of documents on a large, tiled 2D display in a visual analytics task. Their observations unearthed several interesting phenomena, such as the support of external memory, the structuring of the space using grouping and layout, and the high degree of integration between process, representation, and data that the large display space scaffolded.

One of the mechanisms by which IA supports human cognition is by simulating the experience of space and self: Through presence, immersion, and embodiment. Presence is the subjective psychological experience of being in a virtual or remote space, and immersion refers to the objective characteristics of the technology used to present the space [123]. The sense of embodiment refers to the sensations that accrue while being inside, having, and controlling a body in VR. A common method of measuring presence is with questionnaires [15, 138, 210, 222, 241, 257]. These studies make a distinction between immersion and presence, where immersion is a necessary (but not sufficient) condition for the experience of presence in a VR interface [83, 222, 257]. Another necessary condition is involvement (or attention): the internal processes and external conditions influencing the user's ability to focus on stimuli in the environment [257]. Clearly, while immersion is tied to the technology used to deliver the virtual environment, presence is a more holistic property that is harder to pin down. Witmer and Singer argue, backed by other foundational research on the subject, that immersion and presence are determined by factors influencing the user's sense of control, realism, sensory feedback/stimulation, and distraction [138, 214, 257].

2.2.1 Direct Manipulation and Sketching in Visualization

Beyond games, one of the original applications of direct manipulation (as discussed in Section 1.1) was in visualization [217]. As a case in point, the seminal dynamic queries method minimizes indirection and reduces barriers between users and visual representations [2]. Building on the direct manipulation idea, Elmqvist et al. [68] proposed the notion of fluid interaction, arguing that "interaction in visualization is the catalyst for the user's dialogue with the data, and, ultimately, the user's actual understanding and insight into these data."
They define a fluid interface as being one that: (1) promotes flow (a mental state of complete immersion in an activity) [55], (2) supports direct manipulation [217], and (3) minimizes Norman's gulfs of interaction [177] (i.e., the difference between the user's intended action and the actions afforded by the system).

Sketch-based systems are an example of fluidity in interaction. In SketchStory, Lee et al. [136] enable users to give ad-hoc data presentations by authoring visualizations on the fly using sketch-based pen strokes. ScribbleQuery [174] applies touch-based sketching to brushing and selection in parallel coordinate plots. In Visualization-by-Sketching [209], Schroeder and Keefe pivot from analytical users to artistic users as their target population in implementing a system that augments the digital work of artists and designers with real data. The advent of consumer-ready immersive displays reduces the barrier further still, with the most common modes of interaction in immersive analytical environments being virtual hands and virtual ray pointers [242]. Scientific sketching [120] allows for free-form 3D sketching to support data analytics in an immersive environment; the approach has since been adapted to multiple applications, including 3D fluid flow, collaborative analysis, and paleontology [178]. With that said, the use of sketching in immersive systems is not new; in fact, work in the area of direct manipulation was already exploring this mode of input for creating 3D scenes in 1996 with the SKETCH system [263]. However, this and similar work of that era focused on simulated objects or free-form design [109] rather than on data visualization. The increasing affordability of immersive head-mounted displays has given rise to new, more specialized fields of research, most relevantly that of visualization revolving around immersive and situated analytics.

2.2.2 Immersive Analytics

AR, along with virtual reality (VR) and mixed reality (MR), the immersive display and input technologies on the reality-virtuality continuum [161], has long been used for visualizing physically embedded data [128, 131, 197, 239]. Recently, this has been extended to include more abstract data using immersive analytics [43, 65, 154]. IA is a visualization (VIS) framework and research focus on environments in which the user is spatially co-located with virtual representations of abstract data [43]. According to Dwyer et al. [65], "Immersive Analytics is the use of engaging, embodied analysis tools to support data understanding and decision making." This notion fits well with a definition of AR by Mackay [146], which describes an environment augmented by interactive, networked objects. Mackay's viewpoint also approaches Weiser's [246] Ubiquitous Computing [16], and in that regard is well aligned with the notions of ubiquitous [66], immersive, and situated analytics [154] explored in this work.

Several IA applications have emerged that leverage the presence and engagement of VR. Simpson et al. [221] proposed an IA tool to explore climate economy models by leveraging spatial understanding from immersion on 2D multidimensional representations. The open-source ImAxes system [52] introduced the concept of an embodied axis to enable users to quickly build multidimensional visualizations in VR using natural interactions. FiberClay [106] uses an immersive approach for exploring large-scale spatial trajectory data in 3D, and the system was informally evaluated with air traffic controllers.
However, none of these systems involved formal studies on how experts use the available 3D space, or how they might use immersive systems in day-to-day data analysis. Patnaik et al. [183] introduced the design space of information olfactation, the use of scent to convey abstract information, and implemented viScent as a proof of concept. Butscher et al. [38] proposed the ART tool for collaborative AR parallel-coordinate-plot viewing with tabletop touch input and performed an informal group-based walkthrough evaluation of the system with expert users, exploring immersion, presence, spatial layout, and engagement.

The premise of IA is that the immersive setting will yield a richer and more embodied data analysis experience than traditional means. IA has been touted to decrease the level of indirection, allow more natural input mechanisms, and provide the free-form space of a 3D virtual environment, which enables intelligent space usage [5, 125]. However, navigation and orientation issues arise in immersive environments in general, and immersive visualization tools are no exception. Problems like occlusion [67], depth and distance perception [60], as well as interaction with distant objects [249] remain challenging issues in "vanilla" VR/MR and, consequently, in IA. Last but not least, discomfort, motion sickness, and other ocular and non-ocular symptoms of HMD use, well explored in the VR domain [114, pp. 159-221], are worthy of consideration in any immersive implementation.

There are still few studies that test these factors for IA, but empirical grounding for IA and its expanding design space has begun to emerge from both the virtual and mixed reality (VR/MR, collectively denoted XR) [250] and VIS communities [9, 17], including examples of multimodal data representation (systems that present data to the user using senses other than vision), such as the work in olfactation that I have conducted alongside Biswaksen Patnaik, Moses Akazue, and Niklas Elmqvist [21]. In a related study, Steed et al. [225] evaluated these factors with the Samsung Gear VR and Google Cardboard, and they found tangible evidence of aspects of presence and immersion being measurable in this setting. Mottelson and Hornbæk [163] conducted a similar field-deployed evaluation with cardboard VR devices, comparing the results to a laboratory study. Their findings are consistent with those of Steed et al., yet also indicate that performance is impacted by the quality of the VR technology and the internal validity of the study. However, because the discipline is relatively new, the problems involved in designing IA environments have not been thoroughly defined or validated. In Batch et al. [17], Andrew Cunningham, Maxime Cordeil, Niklas Elmqvist, Tim Dwyer, Bruce H. Thomas, Kim Marriott, and I conducted an evaluation of economists exploring multivariate temporal data "in the wild"; few other recent studies exist that study VR in the wild, and even fewer exist for multidimensional data visualization.

2.2.3 Situated Analytics
Situated analytics (SA) [69] is a subspace of IA that deals specifically with MR views of information that visually link virtual and physical objects of interest, registering spatial locations for abstract information and supporting analytical interactions. SA has strong links to early research efforts on mobile, wearable, and often cumbersome AR, such as the Touring Machine [76], which provided information as labels situated near various buildings of a university campus.
Since then, the hardware requirements have become significantly less cumbersome, and the application areas and task requirements of SA systems have also diversified. However, despite the growing number of emerging SA systems and use cases, there are still technical vulnerabilities, such as inaccurate GPS sensors [110, 251], and intrinsic challenges, such as occlusion and distortion, that adversely affect user experience and task performance [45].

While IA and SA are relatively new areas of research, there are a multitude of existing implementations that could be called "immersive analytics" systems, many of them also situated. Some systems address universally challenging issues, such as labelling [89, 149], efficient highlighting [70], the impact of real-world background on visualization perception [203], and the synergy of HMDs and handhelds [132]. Yet most SA systems tend to be use-case-specific. The advent of light, hand-held, multi-functional devices (e.g., smartphones and tablets) has made AR accessible to a larger audience. This, consequently, has aided the emergence of SA systems for the general public, such as for tourism [42], sports [140], entertainment [13], and shopping [69].

In many research and analytical disciplines, the work space is out "in the field" rather than an office or laboratory setting. Indeed, the decision-making processes and situational awareness of manufacturing, construction [24], agricultural [270], and utilities employees who work in the field [207] are an example of an enterprise-facing domain for applications of in situ analysis. In this space, Whitlock, Wu, and Szafir conduct a design probe involving expert users from five such disciplines to evaluate the needs and challenges of existing situated analytical systems for data analysis and collection, and demonstrate their resulting design recommendations via their implementation, FieldView [251]. Last but not least, medicine and medical imaging has been a popular application domain for MR and has yielded important techniques, such as the cutting plane implemented in this work, which originates in the interactive slicing of brain imaging data [99].

For a more comprehensive study of techniques, Zollmann et al. [272] present a taxonomy for visualization in AR, based on many such examples, and use it to extend the traditional data visualization pipeline to situated implementations. They identify six recurring design dimensions in their AR visualization taxonomy, and the domains thereof: purpose (for using AR), visibility (vs. occluded or out of view), depth cues, abstraction (for reducing data complexity), filtering (to a subset of observations to reduce clutter), and compositing (the method for modifying the user's non-augmented view of reality). While my work may be comparable to Zollmann et al. [272] in that my conclusions are based on a compendium of existing implementations, I focus more narrowly on the view management end of the visualization pipeline.

Most situated analytics systems seek to take advantage of the user being embedded in the same space as data with spatial attributes by presenting them with an immersive integration, i.e., one taking up the user's field of vision. If we accept Mackay's definition of AR described in Section 2.2.2, then the means and methods by which objects of interest are networked together and joined to the user's view of reality are inseparable from the experience itself.
While those means include commercially developed platforms such as Unity, Apple's ARKit, and Google's ARCore, they also include open-source solutions developed and maintained by individuals and by research and development communities, such as the Unity-based DXR [219] and IATK [51] toolkits and the web-based framework VRIA [37].

2.2.4 Interaction and Situated Analytics
Interaction has a central role in visualization despite typically receiving much less attention than visual aspects [260], and this is equally true for immersive and situated visualization [156]. Nevertheless, enabled by technological advances in contemporary immersive technologies, recent efforts have explored novel interaction techniques and device synergies in an SA context [35]. For example, Bach et al. [9] assessed the effectiveness of direct tangible interaction with 3D holograms. They compared the use of the Microsoft HoloLens with fiducial markers in an AR visualization setting to a handheld and a desktop-based setup.

The notion of analyzing user interactions in MR spaces has also received attention. MRAT [171] is an MR toolkit that allows the visualization of usage data of interaction techniques in MR, providing mechanisms for interaction tracking, task definition and evaluation, and visual inspection tools with in-situ visualizations. Likewise, Büschel et al. [36] present MIRIA, a toolkit designed to support in-situ visual analysis of spatiotemporal interaction data in mixed reality and multi-display environments. MIRIA provides mechanisms to depict and analyze the movement of users and tracked devices and interaction events, as well as to identify issues such as tracking problems or obstructions from physical objects. Flex-ER [144] is a web-based environment that enables users to design, run, and share investigations in MR, supporting different platforms and interfaces via a JSON specification of interactions and tasks. In a similar theme of cross-device analysis of MR, ReLive [104], a mixed-immersion tool, combines an IA in-situ view with a synchronized visual analytics ex-situ desktop view.

Cross-device synergies have also been exploited toward enhancing the analytical process within SA environments. Butscher et al. [38] investigate synergies between tabletops and AR-enabled HMDs to visualize and manipulate 3D parallel coordinate plots. Hubenschmid et al. [105] explore similar synergies with tablets, and interactions via touch and voice commands. Reipschläger et al. [195] do the same for AR HMDs and touchscreens. Finally, Langner et al. present MARVIS [132], a framework enabling the combination of mobile devices and AR HMDs in an AR-based analytical setting. MARVIS allows the depiction of 3D visualizations above and between devices through the HMD. However, these are all one-off designs for specific displays and devices.

Of particular interest to my work are the interaction affordances of toolkits designed for building immersive experiences. DXR [219], a Unity-based IA/SA toolkit, supports multi-visualization workspaces, with interactions (toggles, filters, etc.) specified in a JSON specification. Interactive elements include tooltips, view manipulation, and configuration controls, and its grammar can be extended to work with other modalities such as tangible, direct manipulation, gesture, and speech input. IATK [51] defines a high-level interaction model that provides filtering, brushing, linking, and details-on-demand functionalities, harnessing GPU power to optimize performance.
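To give a flavor of the declarative specifications such toolkits consume, the following is a minimal, hypothetical sketch of a Vega-Lite-inspired description of an immersive scatterplot, written as a Python dictionary; the field names are illustrative only and do not reproduce DXR's, IATK's, or VRIA's actual schemas.

```python
import json

# Hypothetical, Vega-Lite-inspired specification for an immersive 3D scatterplot.
# Field names are illustrative; DXR, IATK, and VRIA each define their own grammar.
spec = {
    "data": {"url": "gapminder.csv"},            # tabular data source
    "mark": "sphere",                             # 3D glyph instead of a 2D point
    "encoding": {
        "x": {"field": "gdpPercap", "type": "quantitative"},
        "y": {"field": "lifeExp", "type": "quantitative"},
        "z": {"field": "pop", "type": "quantitative"},
        "color": {"field": "continent", "type": "nominal"},
    },
    "interaction": [
        {"type": "filter", "field": "year"},      # e.g., bound to a slider or a voice command
        {"type": "tooltip", "fields": ["country", "lifeExp"]},
    ],
}

# A toolkit runtime would parse this JSON and instantiate the scene; here we simply emit it.
print(json.dumps(spec, indent=2))
```

The appeal of this style is that the same terse specification can, in principle, be authored on a desktop, streamed to a headset, or generated at runtime from gesture and speech input.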
Building on IATK, RagRug [82] uses data streams from Internet-of-Things devices in SA. RagRug portrays the potential of cross-device connectivity with a visualization pipeline that combines IoT devices, data acquisition via MQTT, Node-RED for filtering, and IATK for visual encoding and rendering in MR. Finally, VRIA [37], although predominantly designed to work in VR, also works in AR settings, largely thanks to the ongoing development of the open WebXR specification. Beyond its grammar, which is discussed in the next section, a central aspect of VRIA is that it is built with web technologies, an approach I take in this work as well.

Interestingly, Besançon et al. [28] point out that new interaction techniques for exploring, filtering, selecting, or manipulating 3D data are often published in non-visualization venues, and thus remain unnoticed by visualization researchers. They also note that leveraging sensing technologies and adapting 3D interaction techniques from other contexts has significant potential for positive impact on 3D visualization. Nevertheless, structured mechanisms for creating and defining interaction affordances in SA environments, especially for voice and gesture input (such as my contribution in this paper), are much less explored.

2.2.5 Mid-Air Gestures and Speech for Visualization
Combining speech and gestures as input mechanisms for controlling virtual objects has long been a vision for the future of computing [95]. In 1980, for example, Bolt [32] combined the two for a large-screen display system, "Put-That-There," allowing a user to combine spoken commands with pointing to populate a room-sized space with 3D objects. However, speech with gesture as input is much less well explored in visualization and visual analytics systems. The visualization of gestures [78, 113, 117] is a more prevalent theme than the use of gestures for visualization. One exception is Proxemic Lenses [10], where collaborators use explicit gestures and implicit body language to interact with large-scale data displays. Another is DA-TU [96], which features a tablet-based multi-finger gestural vocabulary for interacting with objects in a large database. A 2018 AVI workshop [137] highlighted this shortcoming in the exploration of input modalities in contemporary visualization research. Even following this workshop, however, the use of mid-air gestures for immersive analytics research has remained uncommon, with one notable exception in Filho et al.'s evaluation of an immersive space-time cube [81].

Considerably more work has been conducted in the area of speech as an input mode for visualization for traditional displays [56, 262], large screens [10], and immersive environments [12]. The Natural Language for Data Visualization (NL4DV) system [170] is noteworthy in that it integrates contemporary visualization tools with multimodal user input and popular analytical tools and workflows. Even the subject of combining speech and touch (not mid-air gesture, but touchscreen interaction) has been addressed in the visualization literature [202], with the results confirming that multimodal input is preferred over single modes of input for either speech or touch. However, I argue that speech and mid-air gestures have not been used in combination for immersive analytics in the existing literature; this is one of my goals in this paper.
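As a rough illustration of the speech-plus-pointing fusion that "Put-That-There" pioneered and that this work revisits for visualization, the following Python sketch resolves a spoken command against the scene object currently intersected by a hand ray. The object names, the naive verb extraction, and the bounding-sphere ray test are hypothetical placeholders, not the implementation used in Wizualization.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SceneObject:
    name: str
    position: np.ndarray  # 3D world position
    radius: float         # coarse bounding sphere

def ray_hit(origin, direction, obj, max_dist=10.0):
    """Return True if a pointing ray passes within obj.radius of obj.position."""
    d = direction / np.linalg.norm(direction)
    t = np.clip(np.dot(obj.position - origin, d), 0.0, max_dist)
    closest = origin + t * d
    return np.linalg.norm(obj.position - closest) <= obj.radius

def resolve_command(utterance, ray_origin, ray_direction, scene):
    """Fuse a spoken command with a deictic pointing gesture ('move THAT')."""
    target = next((o for o in scene if ray_hit(ray_origin, ray_direction, o)), None)
    verb = utterance.strip().lower().split()[0]   # naive verb extraction for illustration
    if target is None:
        return ("clarify", None)                  # nothing under the ray: ask the user
    return (verb, target.name)

# Usage: the user says "move that" while pointing at the second embodied axis.
scene = [SceneObject("axis:gdp", np.array([0.5, 1.2, 2.0]), 0.2),
         SceneObject("axis:lifeExp", np.array([1.5, 1.2, 2.0]), 0.2)]
print(resolve_command("move that", np.zeros(3), np.array([1.5, 1.2, 2.0]), scene))
```

The point of the sketch is the fusion step itself: the speech channel supplies the verb, the gesture channel supplies the referent, and neither modality alone is sufficient to execute the command.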
2.3 Visualization Grammars and Beyond
Visualization grammars, first introduced by Leland Wilkinson as the eponymous Grammar of Graphics [254] in 1999, provide combinatorial building blocks for specifying visual representations using a concise declarative language. This approach is radically different from the standard chart template galleries used in tools such as Microsoft Excel. Wilkinson's grammar was quickly adopted by the visualization and statistics communities. Hadley Wickham's ggplot2 [253] operationalizes the grammar in the R language. Vega [206] is a low-level declarative specification expressed as JSON and rendered on the web using SVG or Canvas; upon it is built the higher-level Vega-Lite [205], which facilitates rapidly building interactive visualizations without the full complexity of the Vega backend. Several special-purpose visualization grammars have since evolved, including Atom [182] for unit visualizations, Cicero [124] for responsive web-based visualizations, and PGoG [191], a probabilistic extension to Wilkinson's original Grammar of Graphics.

Visualization grammars can also serve as low-level specification backends for higher-level visualization environments, allowing point-and-click interaction rather than textual specification. For example, Tableau (originally published as Polaris [228]) is built on the underlying VizQL grammar. Similarly, the Lyra 2 [273] interactive visualization environment generates Vega or Vega-Lite specifications as output.

Finally, there exist some visualization grammars designed specifically for immersive, situated, and ubiquitous analytics. ImAxes [52] can be said to be one such system, enabling a VR user to freely combine axes to author various multidimensional visualizations in 3D using direct manipulation. DXR [219] uses a JSON specification language similar to Vega-Lite to author 3D visualizations for immersive analytics in the Unity engine, even providing an interface for modifying the representations while in the immersive environment. VRIA [37] supports a similar Vega-like JSON specification, but is entirely implemented using open web technologies such as A-Frame (https://aframe.io/), React, and D3.js rather than Unity. Compared to these existing offerings, in this paper I propose a visualization environment for ubiquitous and immersive analytics based on mid-air gesture and speech interaction. Similar to VRIA, my approach is built on open web technologies rather than a proprietary graphics engine. To the best of my knowledge, this work is the first to allow users to author visualization grammar specifications in 3D mixed reality using such a direct manipulation method.

2.4 Evaluation
Given the need for further evaluation and empirical work in IA in general and SA specifically, I must take a broader view of evaluation in the HCI and VIS communities. In the domain of HCI and data visualization, some approaches to understanding the factors influencing users' experiences and needs for systems involve constructing personas: representational archetypes of "typical" users and their daily lives [102]. This often involves qualitative and ethnographic methods in which the researcher tracks, records, and interprets the users' daily activities in collaboration with the participant, reaching a shared understanding of the user's thought processes through interview and activity [102, 215], but approaches using observational data, statistical models, and machine learning are becoming increasingly common.
2.4.1 Cooperative and Contextual Inquiry for Visualization
In their seminal paper, Wixon et al. [258] introduce "contextual design" as a systems development method in which the researcher partners with the user at the user's place of work to "develop a shared understanding" of the user's activities, and they define contextual inquiry as the first part of the broader process. Specifically, contextual inquiry is the data collection step of the field research element of the contextual design method, and it emphasizes four essential principles: (1) the context of the activity being performed by the user, (2) the partnership between the researcher and the participant, (3) the spoken verification that the investigator's interpretation of the activity matches the user's, and (4) the focus of the study as central to the approach taken by the interviewer [29, 102]. The most typical application of contextual inquiry is in the form of a contextual interview, which begins in the user's actual work environment as a traditional interview regarding the user's recollections of their work activities and, within fifteen minutes, is transitioned to an activity in which the participant conducts their work while the researcher watches and takes a participatory role by sharing and summarizing their understanding of the user's work [29, 102].

Cooperative inquiry is a qualitative evaluation method based on an iterative cycle of three primary steps: contextual inquiry, participatory design, and technology immersion [63]. Contextual inquiry is the data collection process in which the researcher and participant form a partnership to reach a shared understanding of the user's experience as part of a broader design study [29, 258]. In my "visualization gap" study [18], I employed contextual inquiry to understand data scientist workflows and their relationship to interactive visualization through in-depth interview sessions. In participatory design, the user partners with the researcher to continuously develop new prototypes for the implementation. One method that I particularly draw from participatory design is to embed a researcher with both the users and designers of the system to act as a values lever: a link between the user and the research team, responsible for translating user requests into technical specifications [215]. On an operational level, this is similar to the pair analytics approach proposed by Arias-Hernandez et al. [6], where a visual analytics expert "drives" the system while a domain expert gives directions.

While alternatives to the qualitative methods for developing personas may be applicable in certain cases (e.g., where mouse events are the most notable method for human-computer interaction) [264], this approach is more difficult to apply in design for the sciences beyond simply categorizing event sequence structures [143]. Field survey methods are still a popular approach for determining the direction of design targeting scientific users [148, 192], including data scientists [23, 119]. Kandel et al. [119] conducted what might be considered a contextual interview study similar to my own in that they analyzed data scientists' self-reported work processes and attempted to interview participants at their place of work in as many cases as possible.
They propose three main archetypes into which data scientists may be classed: Hackers, who build processes chaining together multiple programming languages of different types (analytical, scripting, and database languages, for example) and who use visualization in a variety of environments; Scripters, who perform most of their analysis in an analytical environment (e.g., R) and perform the most complex statistical modeling of the three archetypes, but who do not perform their own ETL; and Application Users, who perform most or all of their work in an application such as Excel or SPSS and, like Scripters, rely on others (namely, their organizations' IT departments) for ETL. The appropriateness of contextual inquiry for analytical professions in more contemporary research is further evidenced by the recent, complete contextual design study of data scientists [181] conducted by IBM, a notable employer of data scientists.

2.4.2 Quantitative Measures in Task Performance, Use of Space & Time
Quantitative analysis of interfaces in the context of data visualization typically revolves around task performance metrics given a type of data [218]. Task performance studies often measure accuracy, correctness, speed, and other measures of how well and how easily the user is able to interpret different types of data, namely quantitative, ordinal, and nominal data [226]. Empirical work on graphical perception, from early seminal work by Cleveland and McGill [49], Mackinlay [147], and Bertin [27] to more recent work in visual perception [180, 250] or, in the case of multisensory IA studies, other senses [21], often attempts to determine an internal ranking between visualization techniques or sensory channels and these three data types.

Events within the interface, such as mouse activity [3], may be used to develop "data-driven personas" [265] for specific types of users; platforms for crowdsourcing experiments, such as Amazon Mechanical Turk [127], make the creation of these types of personas more manageable at larger scales. With the rise of IA and SA, analogous approaches are becoming more popular as a means of characterizing study participants' navigation through space, view arrangement, and time use [17, 22, 204]. These types of studies, whether early or more recent, are often accompanied by qualitative evaluation to provide nuance and identify patterns that were not captured by the quantitative data collected during the study [17, 59, 238].

2.4.3 Characterizing User Behavior with Machine Learning
In the machine learning (ML) community, there has been more than a decade's worth of literature exploring methods for action classification [133, 173], motion and path prediction [145], eye tracking [129], and gesture detection [162]. While there have been a few position papers [39] and more serious studies [98] advocating for a closer relationship between the HCI and machine intelligence communities, the current body of literature on the subject is surprisingly sparse. If a trained ML model can identify individuals' emotions, moods, and expressions [73, 143, 230, 261, 267] and can accurately predict whether a basketball player is good or bad [26], why is there so little work identifying whether a user's experience has been positive or negative, or their task performance good or bad? This section will review a range of applications of computer vision (CV) to human behavior in the literature, and then provide an overview of HCI work that does make use of CV methods.
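Before surveying that literature, a deliberately simplified Python sketch illustrates the kind of pose-based analysis the following subsections discuss: it segments a user session into candidate "actions" by thresholding frame-to-frame motion energy computed from pose keypoints. The keypoint array mimics the per-frame skeletons a detector such as OpenPose produces, but the adaptive threshold and windowing choices are hypothetical rather than the pipeline used in my own work.

```python
import numpy as np

def motion_energy(keypoints):
    """Per-frame motion: mean displacement of keypoints between consecutive frames.

    keypoints: array of shape (frames, joints, 2) holding x/y coordinates,
    e.g., the body keypoints emitted per frame by a pose estimator.
    """
    diffs = np.linalg.norm(np.diff(keypoints, axis=0), axis=2)  # (frames-1, joints)
    return diffs.mean(axis=1)                                    # (frames-1,)

def segment_actions(energy, threshold=None, min_len=5):
    """Return (start, end) frame indices of spans whose motion exceeds a threshold."""
    if threshold is None:
        threshold = energy.mean() + energy.std()   # hypothetical adaptive threshold
    active = energy > threshold
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(active) - start >= min_len:
        segments.append((start, len(active)))
    return segments

# Usage with synthetic data: 200 frames, 25 joints, and a burst of movement in the middle.
rng = np.random.default_rng(0)
kp = rng.normal(0, 0.01, size=(200, 25, 2)).cumsum(axis=0)   # slow background drift
kp[80:120] += rng.normal(0, 0.5, size=(40, 25, 2))           # vigorous mid-session motion
print(segment_actions(motion_energy(kp)))
```

In practice, segments like these would be handed to a human coder or an action classifier rather than treated as labels in themselves, which is exactly the division of labor discussed below.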
2.4.3.1 Deep Learning Models for Human Poses
Considerable prior research has explored the topic of human pose estimation using deep learning in single-person single-camera [271], multi-person single-camera, and multi-person multi-camera settings [40, 86]. Recent work includes top-down approaches using a two-stage pipeline with a CNN for frame-level pose prediction followed by a matching algorithm to efficiently link the predictions to specific people [40, 86, 223, 244]. The CNN itself can use a 3D mask, as in Girdhar et al. [86], to incorporate temporal data for more robust prediction. In my project, I use the pretrained OpenPose model [40] to jointly detect human body, hand, and facial keypoints (135 keypoints in total) on single frames. Walker et al. [243] address the video forecasting problem by taking advantage of the strengths of variational autoencoders (VAEs) and GANs. Instead of solving the forecasting problem directly in pixel space, they project the problem into human-pose space through pose estimation. In Batch et al. [20], my motivation was similar in that I also address the action classification problem in human-pose space instead of classifying actions directly from video.

2.4.3.2 Behavioral Coding
Coding behavioral events in video is common research practice in HCI and other fields, often those related to the social sciences [196]. It is largely performed in three steps. First, a coding scheme that describes the categories of actions has to be created via a bottom-up, top-down [245], or hybrid approach. In a bottom-up approach, the themes for the actions emerge from the data itself and are agreed upon by the coders after watching and rewatching the videos. In a top-down approach, labels emerge from the theoretical literature on human gestures. The second step is to train some number of coders, which takes an amount of time proportional to the complexity of the videos. The final step is to actually label the videos and ensure that the coders are able to label them in a consistent way, which is measured by an agreement metric such as Cohen's kappa [101]. The codebook might be rewritten in iterations during this process. Several existing tools have been built to support the video coding process, particularly to help with coder training and video labeling in a systematic way; examples include ANVIL, Datavyu [232], VACA [34], and VCode [92]. There have also been systems that instead leverage crowdworkers in the codebook creation and video labeling process [134]. In our system [20], Kyungjun Lee, Hanuma Teja Maddali, Niklas Elmqvist, and I implemented a hybrid approach in which an unsupervised clustering mechanism grouped actions in the data by a measure of similarity related to change in pose. A human in the loop then used knowledge of relevant theoretical models to select potential AOIs, either through expected actions or outlier detection. The action detection and label assignment process in my pipeline, however, was completely automated via an action classification model.

2.4.3.3 Unsupervised Video Summarization
Summarization models are probably closer to my objective than any other, but my target is the narrow context of HCI researchers discovering new actions based on user interactions in systems using three-dimensional body motion and gestures, and reducing the computational cost of model training is a high priority. Mahasseni et al.
[151] take what might be considered the most contemporary approach to detecting events in video for summarization by using generative adversarial networks (GANs) to detect keyframes (frames marking the end or beginning of transitions in motion) in high-resolution video. In their model, the generative network (the summarizer) creates a summary of a longer video in order to trick the discriminator, and the discriminator network is trained to discriminate between the summarizer's output and the human-summarized video. They use the SumMe dataset, which has short, human-made summaries for a corresponding set of longer videos (1 to 6 minutes in length) [91].

The use of keyframes itself is not a new idea. In fact, as an alternative approach to detecting keyframes, the study that originated the SumMe benchmark dataset used by Mahasseni et al. [151], Gygli et al. [91], draws from video editing theory in proposing superframe segmentation, a technique that cuts video into arbitrary segments and then shifts the cuts to neighboring frames with the least motion, as part of a video summarization pipeline. Following segmentation, they evaluate numerous other features of the video, including attention, color, contrast, edge distribution, and object detection (people and landmarks), and then calculate an "interestingness" score. The interestingness of a segment must meet a predefined threshold in order to be cut into the output of the model, which concatenates the most interesting segments of the video into a short summary.

2.4.3.4 Machine Learning in HCI
Video and audio recordings tend to be a nearly ubiquitous form of data to capture and analyze during user study sessions. The HCI and visualization research communities have already begun to make advancements that shift away from video and audio recordings being an intractable media format, cheap to capture but expensive to analyze in evaluation studies, and toward the use of video and audio inputs as a revealed-behavior dataset that is cheap in time cost and therefore scalable for the analysis of large user populations. Relevant examples include discovering speech patterns [75], identifying gesture [152, 193, 234] and gaze [164, 266], classifying user emotion and facial expression [94, 159, 229], and detecting characteristics of the user, such as gender [240], by constructing and implementing neural network architectures. (I note that this last approach, like many similar projects conceived with little thought to their sociotechnical impact, is a highly questionable practice.)

The visualization community has also made contributions to the toolkit of methods used in evaluating user video, logs, transcripts, and other qualitative data [44], as well as user gesture analysis [122]. Systems for visualizing and analyzing visual and semantic features of cinematic films in the context of film studies have been implemented, for example, in VIAN [93], which presents information about average frame color to the user, who can then manually segment the video with semantic annotations. Kurzhals et al. [130] introduce a system that uses the text of movie scripts to assign semantic labels to frames, which is graphically represented to the user along with motion and other visual frame information in an interactive dashboard that affords user annotation. Pavel et al. [184] present a system for automatically segmenting and summarizing lecture recordings and appending them with crowdsourced transcripts. QuickCut [236] is a system for fast video editing and annotation that allows audio annotations corresponding to timestamped clip segments to be quickly transcribed, semantically matched, and cut together. Leake et al.
[135] create a system for automatically generating audio-video slideshows using text and imagery from written articles. However, these systems either generate read-only output to be consumed (rather than analyzed) by the user, or they require semantic information that is derived either manually by the user or via existing scripts or metadata containing semantic information beyond that which is contained in the video and audio data itself.

In the scenario I envision, the visualization or HCI researcher can simply add a video and/or audio recording setup, or turn on the onboard camera of a test computer and mobile or wearable device, to collect additional semantic information about the user's speech and physical actions while participating in the user study. The video and audio footage can then be easily and quickly analyzed using off-the-shelf models, resulting in different data streams (e.g., pitch, speech rate, gaze direction, hand posture, and the output from semantic models) synchronized to the rest of the study telemetrics. All of these metrics can then complement task performance data collected in a user study to reveal deeper insights about the evaluated system and/or targeted users. The user can then annotate the session recording with their thoughts as they conduct their analysis.

As ML continues demonstrating its potential, qualitative researchers are becoming increasingly interested in adopting ML into their analysis flows. However, they often face challenges when incorporating ML into their analysis. First, although traditional classification and clustering ML methods are helpful for generating additional labels to inform analysis, these labels alone are often not sufficient for addressing human-centered research problems. Instead, human-centered researchers need to leverage their skills to make sense of the ML-generated labels to gain a deeper and more nuanced understanding of the data. Second, many ML methods require a significant amount of data to optimize parameters and thus have limited accuracy when dealing with small-scale yet rich-in-meaning human-behavior data. Such challenges have inspired researchers to investigate ways, such as interactive visualizations, to better integrate ML into qualitative researchers' analysis workflows.

One line of research is to support qualitative coding, which is a powerful yet labor-intensive method. Felix et al. [77] designed a visual data analysis tool that integrates unsupervised learning methods to provide suggestions to help researchers progressively code a large corpus of texts. Another challenge that qualitative researchers often face is resolving conflicts among researchers when analyzing qualitative data. Drouhard et al. [62] designed a tool, Aeonium, that uses ML to identify potential conflicts in codes created by different coders and highlights them, helping coders spot their disagreements and resolve conflicts efficiently. Another line of research is to support the analysis of user interaction data to uncover users' intentions and reasoning processes. Both low-level user inputs (e.g., mouse clicks, drags, and key presses [88, 200]) and high-level graphical structures of user interactions [97] are captured and visualized to help researchers make sense of their analytic activity.
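A minimal sketch of the low-level interaction capture this line of work builds on follows; the event schema and the derived "dwell" measure are hypothetical, intended only to show how raw input events can be recorded as a stream that later tools visualize alongside video- and audio-derived signals.

```python
import json, time

class InteractionLogger:
    """Append timestamped low-level input events (clicks, drags, key presses) as JSON Lines."""

    def __init__(self, path):
        self.path = path

    def log(self, kind, **details):
        event = {"t": time.time(), "kind": kind, **details}
        with open(self.path, "a") as f:
            f.write(json.dumps(event) + "\n")

def dwell_times(events):
    """Hypothetical derived measure: seconds elapsed between consecutive events."""
    ts = [e["t"] for e in events]
    return [b - a for a, b in zip(ts, ts[1:])]

# Usage: record a few events during a session, then compute gaps for later visualization.
logger = InteractionLogger("session.jsonl")
logger.log("click", target="axis:gdp", x=412, y=318)
logger.log("drag", target="axis:gdp", dx=35, dy=-4)
logger.log("key", key="f")

with open("session.jsonl") as f:
    events = [json.loads(line) for line in f]
print(dwell_times(events))
```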
Moreover, eye-tracking data (e.g., scanning trajectories, areas of interest) have also been visualized to help researchers analyze users' interactions and even predict users' intents [31, 220]. Furthermore, researchers have investigated manually recorded provenance (e.g., user-generated annotations) and developed visual interfaces to uncover hidden sensemaking patterns [268, 269]. In addition to using proxy data (e.g., mouse events, eye-tracking data) and manual provenance (e.g., user-generated annotations), researchers have recently begun to investigate think-aloud data, which are generated by asking users to verbalize their thought processes while working on a task, to better understand users' hidden thinking processes. Think-aloud data have been used to understand analysts' reasoning processes [61, 142] as well as users' interactions [74]. VA2 visualizes think-aloud, interaction, and eye movement data to facilitate the analysis of multiple concurrent evaluation results [30]. Recently, Fan et al. built an ML model that predicts usability problems in think-aloud sessions based on users' speech and verbalization patterns, and further designed VisTA to visualize the ML predictions as well as speech-related features on synchronized timelines [75]. In addition, using advanced analytical technologies, several researchers have developed systems that detect users' moods and facial expressions to facilitate user experience evaluation [57, 167, 224, 230]. Inspired by much of this prior work, in Batch et al. [19], Yipeng Ji, Mingming Fan, Jian Zhao, Niklas Elmqvist, and I extended this line of research by considering a wider range of modalities of data extracted from video and audio footage that are indicative of users' experiences (speech rate, transcripts, gaze direction, facial expressions, semantic actions), to create a more comprehensive visual analytics tool to better support the analysis of users' behaviors.

Part II: Data Science Workflow to Design Guidelines

Chapter 3: The Interactive Visualization Gap
We conducted an investigation of how data scientists engage in the early stages of exploratory data analysis (EDA) with an eye toward how visualization, specifically, fits into that process. In this chapter, I describe the methods we used, our results, and the design recommendations we construct based on our findings.

3.1 Contextual Inquiry with Data Scientists
In Batch et al. [18], we conducted our study as a contextual inquiry [102], where we first interviewed participants to establish their everyday work practice. However, our study deviated slightly from standard contextual inquiry protocols in that we then asked participants to solve specific problems that we provided (instead of using their own datasets). These problems were based on (1) artifacts used throughout the participants' work process, including code, databases, spreadsheets, methods documentation, and checklists; (2) our prior knowledge of data science workflows; and (3) user feedback gathered during beta testing of an R library developed to aid in the extract, transform, and load (ETL) processing of data from a major producer of economic statistical indicators. Our motivation for the modification was that we already had a reasonable understanding of current data science practice (e.g., as described by Anderson [4] and Kandel et al.
[119]), as well as of the practices of our participants based on their organizational artifacts and their feedback, and we were more interested in directing participants toward specific tasks to elicit a better understanding of the initial exploratory stages of the data analysis process. We believe that inferences about these stages would be difficult to make if participants were instead asked only to walk through routine data product maintenance procedures or to give a verbal explanation of already completed projects. By controlling the tasks and problems to work on, we hoped to eliminate some of the wide variation in tools and approaches that individual analysts may exhibit.

3.1.1 Participants
We recruited eight data scientists and economists from several federal agencies in Washington, D.C., USA to participate in our experiment. Five of the participants were male and three were female, their ages ranged from 26 to 50 (mean age: 35.5), and they all had normal or corrected-to-normal vision (self-reported). Six participants had earned master's degrees in quantitative fields, one had started (but not finished) a Ph.D. program in economics, and the remaining participant was in the process of earning a master's degree in economics. The participants' experience in their fields ranged from 4 years to 20 years (self-reported). Participants were screened to be experts in data analysis; all participants reported routinely using data management and analysis operations in their daily work and had several years of experience working on these types of duties. Four of these participants had developed or contributed to the development of interactive data visualization projects.

Once screened, participants self-selected in response to emailed requests for their involvement in our study. The self-selection and small sample size must be acknowledged as limitations to how representative this study may be, but they are not uncommon in field studies involving the entry of researchers into the personal or professional environments of the participants [48, 108, 158]. Similarly, the sample was selected based on their employment with, and roles within, federal agencies, which must also be taken into consideration with respect to generalizing based on our results.

3.1.2 Apparatus and Locale
All inquiry sessions were conducted in the workplace of the participant and using their everyday computing environment to ensure their familiarity and comfort during the study. The exact computing platform, hardware setup, and data analysis software thus varied significantly between participants. Because of this difference, screen recording tools varied across the two organizations: one organization had a preexisting screen recording utility and security settings that prevented the use of external screen recording software, and the other participants used a free screen recording application. All participants used pencils and paper provided by the researchers for the sketching activity.

3.1.3 Procedure
A single inquiry session consisted of the study administrator arriving at the participant's workplace, collecting informed consent, and then giving a brief background of the study. Significantly, at no time, either in recruitment or during the introduction of the session, did the administrator mention the visualization theme of our study. The reason for this omission was to avoid priming and potentially biasing participants with regard to their use of visualization. The rest of the study then consisted of four primary steps:
1. A preliminary interview regarding the participant's work processes and the tools used in their work (10 to 15 minutes);
2. A data analysis activity designed to mimic a standard data science workflow [4] (approximately 1 hour);
3. A formative design activity during which the participants were asked to sketch visualizations appropriate to tasks in the preceding analysis activity (20 to 30 minutes); and
4. A final semi-structured interview on visualization in the context of the participant's workflow (10 to 15 minutes).

Each session lasted approximately two hours. After finishing a session, the administrator summarized the participant's findings, asked for clarifications or corrections, and answered any remaining questions.

3.1.4 Problem Set
Each participant was asked to pick one of the four questions below to answer using real, public data by the end of Stage 2, within one hour of making their selection (see the Appendix for more details):

1. "How has the rate of a specific type of crime changed over the last few years?"
   Optional: "What might be causing this change?"
2. "Tell me something interesting about the careers or personal finances (e.g., income, spending habits, or employment) of a particular group of people compared to (an)other group(s)."
   Optional: "Suggest an explanation for your observations."
3. "When and where has a number of major catastrophic events occurred? Do they share anything in common with events you didn't expect to exhibit similar characteristics?"
   Optional 1: "How frequently and how long after the fact did people talk about/report on these events?"
   Optional 2: "What was the weather like in the area of the event before and afterward?"
4. "What's been going on with gasoline for the past few decades? Tell me as many things about it as you can."

As noted at the beginning of the methods section, these questions were based mainly on artifacts used throughout the participants' work process (code commentary, spreadsheet notes, process documentation, and so on). Questions were made fairly open-ended so that analysts could use their experience not only to determine how they would answer a question, but also to decide what constitutes a satisfactory solution.

3.1.5 Data Collection and Analysis
Participant voices and on-screen activities were recorded during each session, and some participants drew sketches, which were retained by the researchers. Furthermore, the test administrator took extensive notes of observations as well as discussions with the participants during the session. These transcripts and notes form the primary data collected from the study. We followed a basic qualitative interview analysis method when extracting insights from these transcripts. We first listened through the audio recordings in their entirety to form a general understanding of the themes and topics of the discussion. We then used the interviews to start coding these themes and topics. While we did not use a formal Grounded Theory approach, we did apply an open-coding scheme and regularly stopped to calibrate and merge codes as needed.

3.2 Contextual Inquiry Results
In Batch et al. [18], we reported our results for each of the four different stages of the evaluation: (1) preliminary interview, (2) data analysis using a problem set, (3) formative sketching, and (4) final post-experiment interview.
3.2.1 Stage 1: Pre-experiment Interview
With one exception, all participants described their work procedures as largely occurring within the context of existing information systems and data structures.

3.2.1.1 Self-Reported Workflows
The work processes reported by all participants began at the point of understanding the problem or issue they were addressing in their analyses. Participants all moved on to describing the sources of their data, and all participants described a central component of their work being to join or infer relationships between series across different data stores. Three participants noted that the most frustrating part of their work process is often these first two stages, when they require communication with data providers. In describing the methods used, all analysts described a need to extract data from an external source and transform it for use with statistical programming languages (R, FAME, and Python). Participants described using models of varying complexity in their typical work process; most notably, they mentioned statistical language processing and other information matching and retrieval methods, as well as hierarchical and relational structures. Three participants reported the end of their workflow as generally being the communication of their findings, with the remainder reporting archival as the final stage. Five participants reported recent work projects ending in the completion and deployment of tools for data manipulation or analysis; the remaining three conducted their analyses using existing tools.

3.2.1.2 Work Focus
All participants had recently (within the last year) conducted independent analytical or development projects for which they were the lead or sole contributor. One participant described his work as consisting of running projects that primarily start from scratch. This participant had recently developed a search method for large, unstructured, and highly technical text data that had been accruing for roughly forty years. The four remaining participants reported that the primary focus of their work was in the context of an existing information system. Three of these had made lasting and substantive methods contributions to the body of data science or analytical systems within their current agencies: one had built a user interface for querying agency databases; another had restructured a complex, hierarchical data structure; the third had constructed a revision analysis tool referencing a node aggregation structure.

3.2.1.3 Self-Reported Tool Use: Revisiting Kandel's Archetypes
In some ways, the results from the study by Kandel et al. [119] are similar to ours (e.g., finding appropriate data, ETL, and integrating datasets from several sources took up a large share of many of the analysts' time). However, in contrast to the findings that led them to propose their three archetypes, interview responses from the participants in our study indicate that they invariably straddled the "Hacker" and "Scripter" roles; not one of them relied on others within their organization for data ETL (although some reported receiving data from external providers under contract as part of a wider process that involved conducting their own ETL). Perhaps even more importantly, all of our respondents reported performing the bulk of their analyses in a scripting or analytical language and had used multiple languages on the job.
This difference may, admittedly, be a result of our small sample size, but it may also be an indicator that their third archetype, the "Application User," has become passé in analytical professions. Alternatively, it may mean that we have not yet reached a tool maturity where this archetype can become dominant.

In our study, one participant reported mainly using Python, and noted that the SciPy, NumPy, multiprocessing, and glob libraries were essential for recent work, but that a number of additional libraries made their work easier, with the ujson library being among their most favored. This participant also noted that recent work made use of the Python interface for the Stanford Network Analysis Project (SNAP). Four participants reported using R, but only two of these reported using it regularly on the job. Four participants reported developing interactive visualizations using Plot.ly, Leaflet, and D3 [33], among other tools, at least once in the past. Three of these also reported using JavaScript/HTML/CSS infrequently on the job to communicate output from statistical models to colleagues. These same three participants further reported having used Python, but mainly for personal projects (e.g., combining the use of a financial newspaper's API, a string pattern recognition algorithm, and a text-to-speech function in order to find and produce audio summaries of news related to their interests, which they could no longer find the time to read through manually). Five participants reported using Excel and the time-series database and programming environment FAME ("Forecasting Analysis and Modeling Environment") as the primary environment for analysis on the job. (FAME is a time-series database with many easily accessible APIs and a domain-specific programming language.) For all of these participants, FAME was described as the environment used most heavily for analysis, whereas Excel was described as being used mainly for the purpose of viewing data and communicating analysis results to others.

Table 3.1: Participant time use and static visualization rate by task type. Participants spent by far the most time discovering the appropriate dataset to use in answering their selected question. "Static visualization rate" in this context refers to the percentage of participants who created static visualizations during the activity.

Task        | Average Time | Static Visualization Rate
Discovery   | 37 minutes   | 50.0%
Data ETL    | 9 minutes    | 0.0%
Exploration | 14 minutes   | 62.5%

3.2.2 Stage 2: Problem Set
Of the eight participants, two partly answered the question asked in the problem set to their own satisfaction, and the remaining six participants fully answered the question. In all cases, the main stage at which participants encountered impediments to their progress was the "Discovery" stage. Interactive visualization was not implemented at any stage of the problem set activity, but static visualization was used by a majority of participants (Table 3.1). Several participants used interactive visualizations built by others regarding the data they were considering using to answer the problem. We also observed that all participants using programming environments either received syntax error messages or had minor difficulties reshaping the data that required minutes to resolve.

3.2.2.1 Summary of Tools and Visualizations Used
During the activity, one participant used Python without an IDE, three participants used R in RStudio, and five used Excel.
For direct manipulation and analysis of the data, three participants used only Excel, and two participants used only R in RStudio. For the participants who stated during the interview that their primary analytical environment was FAME, if any visualization was produced during their session, both the visualization and the analysis itself were done using Excel. None of the participants in this study used any visualization tools outside of those built into their analytical environments. All participants used the "look at the data" (or "show me the numbers" [79]) approach as the primary means of verifying the relevance and completeness of the data prior to the communication stage (i.e., looking at the data in whatever format it was stored). The two most experienced users in this study did not use visualization at any stage of the problem set.

3.2.2.2 Discovery
The discovery stage was by far the most time-intensive activity for all participants during the approximately hour-long problem set activity, taking participants on average 37 minutes to complete. Of this time spent in discovery:

- An average of approximately 22 minutes was spent reading reference material (excluding metadata) to find potential causal factors and to explore statistical methods, including syntactical options within analytical environments. The participants referred to a combination of news, academic, and data science blog articles to assist with this stage of their process. Three participants mainly referenced articles, two of whom read online tutorials (e.g., the R cookbook), StackOverflow, and R help documentation; of these, one also referred to API documentation and metadata, and the other participant mainly referenced financial news, academic articles, and statistical reports from government agencies. The third of these participants mainly referenced popular press articles and data science blog posts. Two participants made a point of referring to visualizations produced by others in their readings.
- An average of approximately 15.25 minutes was spent referencing site or API metadata and conducting searches as a means to find the location of the correct data. One participant spent the large majority of the discovery stage searching and exploring site metadata, and virtually no time reviewing other reference material. No visual representation of the reference metadata was referenced or created by any of the participants. All participants exclusively selected government data; one used local government data for crime statistics, while all others used federal government data.

3.2.2.3 Acquisition and Transformation
None of the participants used visualization during this stage. The average amount of time spent on data acquisition (ETL) was approximately 9 minutes.

- Data extraction and loading took, on average, approximately 2.25 minutes, which was skewed upward by a participant who needed to extract several large datasets from a site, and skewed downward by a participant who extracted the data using an API request that took only the amount of time required to write the request function (approximately 10 seconds). One participant used a REST API, and the remaining three exclusively used site download tools.
- Once the data was loaded into the analytical environment, transforming it to prepare it for modeling took slightly longer for participants across all environments, taking an average of approximately 7.75 minutes.
  This process was lengthier in cases where the structure of the source data being used in the model was more complex and in cases where the data was being manipulated using a programming language, and it was skewed downward where Excel was used with minimal transformation.

3.2.2.4 Exploration, Modeling, and Communication
This process took, on average, approximately 14 minutes. The most complex model attempted was a basic linear regression model. One participant attempted a categorical parent/child aggregation hierarchy, but was unable to finish the analysis. The participant using this hierarchy did not use visualization at any stage of the problem set activity. Of the remaining participants, one examined a cross-section of ratios across geographic categories; this participant produced a column chart comparing public sector employment rates against private sector employment rates by state using ggplot2 (Figure 3.1). This participant also expressed a desire to create a grid of faceted bar charts (also using ggplot2), but decided against it because it would take too long. One participant examined the rate of change of two potentially related time series with different units of measurement, and produced a line chart comparing the series scaled to different axes to explore the potentially causal relationship. This participant was the only one who used a chart to inform the later stages of analysis, first charting one series and using that information to search the time period of interest, and was the only participant to perform comparative data analysis. The remaining participant examined the rate of change in a single series and produced a line chart representation of the series. All charts used or produced during this activity were static. All participants who used visualization for exploration used the same charts as part of the communication of their findings.

Figure 3.1: One user produced a column chart with U.S. Census Bureau data in RStudio using the ggplot2 library.

3.2.3 Stage 3: Sketching
As in other studies [46], we opted for a sketching activity to allow for the creation of visualizations in instances that may otherwise have been constrained by either technological barriers or the time limitations of our interview sessions. The most common theme in participant sketches of potentially helpful visualizations during this stage was that most participants viewed a table as the most beneficial visual aid. Only four of them drew a chart, and in one of these cases, it was mainly as an afterthought. All participants focused on the work involved in data discovery as the most difficult element of the activity, including participants who were already familiar with the source of the data they selected. All participants were most strongly interested in methods for multistage search-and-filter interface design; all participants included either drop-down menus or search bars (or both) in their sketches. Three participants also included tables in their sketches; two of these sketches contained lists of potential data sources, while the third contained the data itself (Figure 3.2). One participant expressed interest in a related-data search and discovery tool inside the RStudio IDE. Of the participants whose sketches extended beyond search-and-filter methods for data discovery, one drew a bar chart representation of a hierarchical time series and expressed an interest in better illustrating the hierarchy.
Another participant expressed a desire to represent auto-regression models of the series used during the problem set activity, and noted that it would have been easier for them to do using Stata. A third participant, whom we consider to have the most experience in developing interactive visualizations within the study cohort, incorporated interactive elements within his sketch as a small window that appears on mouse-over (i.e., a tooltip) with details regarding data linked to a visual object within the view (Figure 3.3).

Figure 3.2: The fifth participant simply sketched a data table and, not without sarcasm, added a "download" button.

Figure 3.3: Interactivity appeared only once in our study, in a sketch; this indicates that the desire to build interactive views is present within the data science community, but the costs of using the tools outweigh the need during initial exploration.

3.2.4 Stage 4: Post-experiment Interview

When asked about reasons for not using visualization, three recurring themes arose during the post-experiment interviews: (1) visualization was too time-consuming to be worth their effort, (2) numeric data provided more detail in many instances than visualization could, and (3) visualization was simply not needed.

3.2.4.1 Not Enough Time

Five of the eight participants stated that visualization is important, but that they did not have time to do it often. One participant said that only one of their projects, not a routine part of their work process, involved visualization, which was used to check the accuracy of predictive models. This participant said that building visualization into their typical workflow was difficult due to time constraints. When asked about the tools they typically use for visualization, they responded that Excel was most common, but that they have also used Stata, Eviews, and R for visualization in their free time or as a student. Regarding ggplot2, one participant remarked: "The syntax just doesn't feel right [...] to come up with one beautiful graph, if I put it in a nice block format, it would be like fifteen additional lines. To me, that seems superfluous. I also don't like this syntax, using 'plus' signs between each line. R's syntax is more functional; traditional functions have commas, all within the same parens. I understand that maybe the philosophy is that you have to be explicit about [features...] but that seems like overkill." We found this emblematic of the guidelines we propose: it is not enough to build tools for interactive visualization, or even to port them to the researcher's environment; we must also make them syntactically familiar, concise, and convenient to use within that environment.
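The syntactic gap this participant describes can be made concrete with a small Python sketch. The layered "+" grammar of ggplot2 also exists in Python through the plotnine port, which is used here purely for illustration (the participant was describing ggplot2 in R); the single-call form underneath is the one-function, comma-separated style that more than one respondent described as desirable. The data frame and column names are hypothetical.

import pandas as pd
from plotnine import aes, geom_col, ggplot

# Hypothetical cross-section of public-sector employment rates by state.
rates = pd.DataFrame({
    "state": ["MD", "VA", "DC"],
    "public_rate": [0.21, 0.18, 0.33],
})

# Layered-grammar style: each visual property is another "+"-chained layer.
layered_chart = (
    ggplot(rates, aes(x="state", y="public_rate"))
    + geom_col()
)

# Single-call style: one function, parenthetical arguments, nothing else.
single_call_chart = rates.plot.bar(x="state", y="public_rate")

Whether the extra lines of the layered form buy enough expressiveness is exactly the trade-off this participant was weighing.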
3.2.4.2 Show me the Numbers!

One participant said that they occasionally use a line graph to track rates of change, but that they typically just look at the numeric representation of a time series when checking for volatility or revisions, as they find it clearer and more accurate than the line chart. This participant noted, however, that representing thresholds or other important characteristics by changing the color of the number or background was helpful.

3.2.4.3 Visualization is Unnecessary

Five participants noted that the data was straightforward enough that there was not a strong need to visualize it, and one of these, along with one other participant, noted that familiarity with the conceptual context of the data coupled with a quick examination of the numeric data was sufficient for their purposes. One data scientist stated that they virtually never used visualization except to communicate their findings to others, and noted during the post-activity interview that the exception to this was in cases where the data was either structurally complex (e.g., representing networks) or intrinsically spatial.

3.3 A Discussion of the Visualization Gap

In Batch et al. [18], we analyzed the results of a contextual inquiry conducted with eight data scientists and economists using qualitative methods and derived both expected and surprising findings. More specifically, our results confirmed that visualization was primarily seen as a communication tool among professional analysts and that few of our participants ever used visual representations of their data in the middle of an analysis process. The reason stated for this was that visualization tools are generally seen as endpoints in the process in that they (a) are separate from the computational tools that data scientists typically use (R, Matlab, SPSS, and JMP, among others), (b) require extensive data wrangling [118] to use, and (c) provide poor functionality for exporting insights, operations, and filters used in the visualization. We did note that our participants have a quite pragmatic view of the use of visual representations: visualization is just another tool, and they claim no intrinsic bias against its use if it provides clear utility. This represents a promising opportunity for the visualization field, provided that our tools can be better integrated into data science workflows.

Our results highlighted the quandary of visualization in professional data science: visualizations, including static visualizations, were rarely seen as obligatory or even useful components of the initial analytical process, and were instead relegated to the final checking and dissemination stages of the process. In other words, a dynamic visual representation was considered a good tool for communicating results to a lay audience, but was not considered vital when trying to understand which results to communicate in the first place. Based on our conversations with the data scientists involved in our contextual inquiry, we outlined a few measures that the visualization field should focus on:

- For visualization scientists collaborating with data scientists, use the same programming environments and syntax that they do, and build visualization elements into "data discovery" libraries, creating or tying together data ETL tools that can be used in a non-interruptive step within the analytical environment to facilitate sensemaking. Sensemaking is often described as a cognitive skill requiring human intervention [80, 90, 115, 201], and libraries within statistical environments are nothing if not artifacts of data scientists' efforts to simplify that process for their peers.

- Conduct user experience (UX) design sessions with data scientists to investigate ways to soothe the frustration of errors and data foraging. All of our participating data scientists noted that the user experience of their most commonly used tools left much to be desired. Unfortunately, given their small population size and the haphazard and highly personal nature of the data science process, not enough attention has been paid to this topic.

- The verdict on data tables: not bad. Participants in this study gravitated toward the data table format as their visual representation of choice, and every single participant viewed the data in a tabular format.
Those using Excel, which links chart creation with table views, were able to visualize their data more quickly and successfully; however, many of these users expressed a degree of embarrassment at resorting to Excel. Those using R or Python either did not attempt to visualize, or found the syntax to be inconvenient. Bridging visualization and data science may require visualization researchers to spend more time augmenting basic representations such as tables with additional functionality rather than designing entirely novel visual representations.

- Design self-contained visualization components that can integrate into the command-line interfaces that data scientists routinely use while still allowing for full-fledged interaction (zooming and panning, filtering, details-on-demand, etc.) [218]. The syntax for calling the components must match that of the target environment; for instance, calling visualizations using single-line functions with parenthetical variables and specifications was a feature more than one of our respondents mentioned finding desirable. Furthermore, these visualization components should be first-class members of the analytical process so that actions and transformations interactively performed in the component can be exported and passed on to the next component in the sequence.

- Education, not evangelization, is what is primarily needed to improve visualization adoption within data science, including providing easily accessible galleries of useful visualization techniques based on data types and tasks, giving examples of best practices, and finding allies within the data science community who can evangelize on our behalf.

Chapter 4: Evaluating Performance, Space Use, and Presence in Immersive Analytics

Having evaluated the exploratory data analysis process of data scientists in their typical work environment, we wanted to pivot toward studying exploratory data analysis in immersive environments. To do this, we conducted a multi-phase user study with economists and statisticians using an existing IA tool, ImAxes [52].

4.1 Mixed Methods for Immersive Evaluation: Supervised and In-the-Wild

In Batch et al. [17], our study involved four main phases (Figure 4.6): a pilot study (P), a formative "in-the-wild" phase (F), and two in-depth phases (S1+S2).

4.1.1 Setting and Participant Pool

All phases of the study were conducted at a U.S. federal agency where one of the authors was embedded. The participant pool for all experiments thus consisted of data scientists, economic analysts, and economists employed, interning, or contracting at this agency. Overall, the education level was high among our participant pool, with all participants holding advanced degrees in economics (collectively, six master's degrees and three Ph.D.s), statistics/mathematics (one master's, two Ph.D.s), public policy (two master's), political science (one master's, one Ph.D.), or similar domains. Participants in our individual in-depth experiments were screened to be experts in data analysis; they routinely performed data management and analysis operations and had several years of experience in this role.

4.1.2 Apparatus

All studies were conducted in a small office of approximately 10 × 10 feet (3 × 3 meters) dedicated to this study. The computing equipment was a personal computer equipped with a Nvidia GeForce GTX 1060 (6GB) GPU, Intel Xeon E5-2620 v3 (2.40GHz) CPU, and 16GB RAM, and running Microsoft Windows 10.
The rig was equipped with an HTC Vive VR system, including a head-mounted display (HMD), two base stations, and a monitor that enabled the experimenter to observe the viewpoint of the HMD. The ImAxes application was built using Unity 5.6.5f1. Additional evaluation of video and telemetry data was conducted using a PC equipped with an EVGA GeForce GTX 1080 SC (8 GB) GPU, Intel Core i7-7700 CPU (3.60GHz, 4 cores), and 24 GB RAM, also running Windows 10.

4.1.3 Data Collection

Here we review the data collection methods employed across all studies. We use the identifiers P (pilot), F (formative), S1, and S2 (summative 1 and 2) to match collection methods to specific phases:

Demographics Survey (P, S1, S2): We began our sessions by introducing the study and gathering demographic information. Specifically, we used a written survey to inquire about participants' past use of VR, their past use of visualizations, their gaming experience, and their professional and academic experience.

Telemetry Recordings (P, F, S1, S2): The software was instrumented to record controller and headset tracking data over time. The system also recorded specific interactions, such as grabbing and manipulating axes, creating visualizations, and selecting data.

Video Recordings (P, F, S1, S2): Two Raspberry Pi Zeros with 8MP Pi cameras running MotionEyeOS served as webcams set up to capture the interaction space whenever the software was active. One Raspberry Pi was positioned at chest height directly in front of the user's starting position, and the second was positioned in a top corner of the room, a location it shared with one of the Vive's base stations.

Screen Recordings (S2): Screen activities were captured using the Windows built-in screen recorder from the game bar. These showed the virtual environment from the participant's viewpoint.

Audio Recordings (S1, S2): We recorded participant think-aloud utterances during in-depth sessions using a mobile device.

Exit Interview (S1, S2): We ended sessions with a survey and an open-ended interview; answers were recorded and transcribed.

4.1.4 Common Procedure

Users signed in to use the device. Users were only permitted to access the system if they were formal participants of the study who had signed the consent form. Participants were verbally informed that their activities would be recorded even if the researcher was not present during their use of the implementation. At the end of the study, participants were asked to complete an exit interview and a survey. The procedure for S1 and S2 in particular is given in detail in the study preregistration: https://osf.io/53e7n/

4.1.5 Data Analysis

Our collected data was analyzed with several common methods across the different phases. Here we describe these methods in detail.

4.1.5.1 Visualization of Spatial Activity

The tracked 3D telemetry data over time provides important insights into how participants move around (physical navigation), interact with 3D objects (axes and visualizations), and arrange their space. To best analyze and present this data, we aggregate movement data over time into a projected 2D grid of the space. We use a top-down view to study physical navigation as well as spatial arrangements of views and axes (heatmap), and a side view to explore interaction heights (histogram).
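As a rough sketch of this aggregation step, assuming the telemetry has been exported to tables with timestamp and position columns, the top-down grid and the eye-relative interaction heights might be computed as follows. The column names, cell size, and room extent are illustrative assumptions, not the exact pipeline we used.

import numpy as np
import pandas as pd

def topdown_grid(telemetry, cell=0.25, extent=3.0):
    """Bin x/z (floor-plane) positions into a 2D grid of dwell counts,
    which can be rendered as a top-down heatmap of physical navigation."""
    edges = np.arange(-extent, extent + cell, cell)
    grid, _, _ = np.histogram2d(telemetry["x"], telemetry["z"], bins=[edges, edges])
    return grid

def relative_interaction_heights(interactions, telemetry):
    """Express each axis interaction's height relative to eye level by
    matching it to the nearest headset sample in time (headset y ~ eye level)."""
    eyes = telemetry[["t", "y"]].rename(columns={"y": "eye_y"}).sort_values("t")
    merged = pd.merge_asof(interactions.sort_values("t"), eyes,
                           on="t", direction="nearest")
    # Negative values lie below eye level; chest level is roughly -0.30 m.
    return merged["y"] - merged["eye_y"]

A histogram of the returned heights corresponds to the side view described above, and the grid to the top-down heatmaps.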
4.1.5.2 Replaying Participant Sessions

By combining telemetry data and interaction logs, we are able to replay individual participant sessions. This allows us to understand the participant's view of the analytical space at any point in time. This ability to replay sessions is useful for understanding dynamic behavior and for recreating the arrangement of the space at different times.

4.1.6 Formative: Pilot and "In the Wild" Studies

Deploying a novel technical intervention in a new environment typically requires careful customization [212]. Prior to actually evaluating the utility of IA for economic analysis, we thus conducted a month-long formative study that included a pilot study (2 weeks) and an "in-the-wild" deployment (3 weeks). We opted to use the ImAxes platform [52] for immersive multidimensional visualization as our starting point; see the next section for details. An added benefit of this formative approach is that it allowed us to continuously iterate on the design based on results from the user sessions as they occurred throughout the duration of the study. Participants were updated on notable changes to the system as they occurred and were asked to engage in additional tutorial, challenge, exploration, and interview activities following each major change to the system.

The ImAxes System

ImAxes [53] is an IA system based on the concept of embodied axes that lets users build data views in a 3D virtual environment. Each axis corresponds to a dimension in a multivariate dataset. Users define visualizations by positioning axes in the 3D space, with a spatial grammar producing specific visualizations based on their layout. The basic operations consist of combining two or three orthogonal axes, which produces 2D or 3D scatterplots, respectively. Axes arranged parallel to each other yield a parallel coordinates plot [111]. More advanced operations consist of stacking axes at the extremities of the axes of an existing scatterplot, which extends 2D and 3D scatterplots to scatterplot matrices. ImAxes also uses the proximity between visualizations to create linked 2D and 3D scatterplots.
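A minimal sketch of this spatial grammar, reduced to the orthogonal and parallel composition rules just described, is given below. The angle thresholds, the Axis type with a unit direction vector, and the omission of the scatterplot-matrix extension and proximity-linking rules are my own simplifications for illustration, not the ImAxes implementation.

from collections import namedtuple
import numpy as np

Axis = namedtuple("Axis", ["name", "direction"])   # direction: unit 3D vector

def pairwise_angle(a, b):
    """Angle in degrees between two axes' direction vectors, folded to 0-90."""
    cos = abs(np.dot(a.direction, b.direction))
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))

def compose(axes, right_angle_tol=15.0, parallel_tol=15.0):
    """Map a small set of positioned axes to the view an ImAxes-style
    grammar would produce from their mutual orientations."""
    pairs = [(i, j) for i in range(len(axes)) for j in range(i + 1, len(axes))]
    angles = [pairwise_angle(axes[i], axes[j]) for i, j in pairs]
    if len(axes) == 2 and abs(angles[0] - 90.0) < right_angle_tol:
        return "2D scatterplot"
    if len(axes) == 3 and all(abs(a - 90.0) < right_angle_tol for a in angles):
        return "3D scatterplot"
    if len(axes) >= 2 and all(a < parallel_tol for a in angles):
        return "parallel coordinates plot"
    return "separate axes (no composite view)"

# Example: two orthogonal axes form a 2D scatterplot.
x = Axis("Goods", np.array([1.0, 0.0, 0.0]))
y = Axis("TimePeriod", np.array([0.0, 1.0, 0.0]))
print(compose([x, y]))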
Pilot Study

During the initial pilot study, we invited six participants to use the ImAxes platform for hour-long individual sessions, with ImAxes left unmodified from its previous incarnation apart from the inclusion of an embedded tutorial. The dataset used during the pilot was the classic cars dataset [64]. The purpose of the pilot study was to: 1) identify the new features to add to ImAxes, 2) calibrate our data collection mechanisms, and 3) determine the datasets participants wanted to view.

"In-the-Wild" Study

After having established a working baseline system, we launched an "in-the-wild" formative study where the equipment remained available for a full three weeks for anyone to use at their own discretion. The author embedded at the agency advertised the study via the agency intranet, encouraging interested volunteers to bring their own datasets to explore. The room was kept unlocked, basic documentation was made available in the room, and the software was configured to allow new participants to sign in and load their own data. However, while no experimenter was present during these sessions, IRB regulations required us to collect signatures of informed consent from volunteer participants. This allowed us to record video and telemetry whenever the equipment was in use. Similar to the pilot, the purpose of this study was to collect data on how to customize the system for an economist audience. A total of six participants were engaged in this formative study (all provided signatures of informed consent; no unauthorized person used the tool). They logged a total of 3.8 hours of use in ImAxes during this phase.

Figure 4.6 outlines the significant findings from our review of the logged data: this includes several observations that led to refinements, as well as direct feature requests by the participants. Throughout the three weeks of the formative study, we rolled out new features as soon as they were implemented, essentially using the field deployment as a "living laboratory."

4.1.7 Improvements to ImAxes

The original ImAxes system lacked many features necessary for an economics setting, including some that aid users regardless of domain background. We thus extended the system with additional features to support general-use improvements to visual exploration and analysis of data based on feedback from economists. Below we list the main features added (labels refer to Figure 4.6).

DR1: Tooltip (details-on-demand). We implemented details-on-demand as a tooltip for 2D and 3D visualizations using a pointer metaphor (Figure 4.7). By pressing a button on the controller and pointing in the direction of a 2D visualization, the data values of the nearest point are shown in a 2.5D box with a leader attached to the point. To obtain details-on-demand in a 3D scatterplot, a small pointer sphere is attached to the VR controller that can probe the nearest values.

DR2: Time-series data. The original ImAxes supported only scatterplots and parallel coordinates plots. Since many of our users wanted to explore time-series data, we added line graphs as well.

DR3: Visualization design menu. We added a simple menu control panel attached to the VR controller. This allows users to remap data dimensions to axes, create and bind a color gradient to a continuous variable, and map the size of the points or lines to a data attribute.

DR5: Mountain range backdrop. Participants disliked the space's flat, sharp horizon and featureless terrain, causing us to add a mountainous landscape in the distance.

DR6: Axis selection. Vanilla ImAxes used a shelf metaphor for selecting axes, where data axes were arranged in rows like books on a bookshelf (we need axes, lots of axes: https://youtu.be/5oZi-wYarDs). While this metaphor is easy to understand, it does not scale with the number of axes and requires a large amount of locomotion (walking or teleporting). Our first solution, a "Rolodex" (DR4), was poorly received. Instead we implemented a rotational menu based on a "Lazy Susan" metaphor. The menu can be rotated like a Lazy Susan via the controller touchpad. An axis is selected by pulling it out of the Lazy Susan menu. Thus, in the end, there was no shelf.

DR7: Grouped selection. Based on user feedback, we implemented a group selection mechanism that allows the user to move a linked group of visualizations instead of a single one. This enables the user to arrange visualizations around them without breaking links.

4.1.8 Summative: Case Studies in Economics

To understand the utility of IA for professional analysts and data scientists, we conducted a contextual inquiry using our ImAxes tool in case studies involving participants from one of several bureaus of the U.S. federal government. This part of the study was split into two phases: Summative 1 (S1) and Summative 2 (S2). Participants in S1 used a version of ImAxes that was slightly different from the one used in S2 (Figure 4.6). We report on both below, highlighting differences when needed. Unfortunately, a software error precluded collection of axis position data from S1. Other data was collected from both groups.
Table 4.1: Phases and datasets for summative participants.

#    Job Title     Yrs exp   Education                  Phase*   Dataset†
1    Economist     12        M.A., Economics            S1       D1+D2
2    Economist     2         M.A., Economics            S1       D3+D4
3    Economist     9         Ph.D., Economics           S1       D1+D5
4    Economist     6         M.A., Economics            S1       D2
5    Economist     4         M.A., Economics            S1       D6
6    Econ spec.    8         M.A., Int. Business        S1       D6
7    Economist     2         M.A., Econ/Public Policy   F, S2    D7
8    Economist     5         M.A., Public Policy        S2       D8
9    Statistician  3         M.S., Statistics/Math      P, S2    D7
10   Economist     9         Ph.D., Economics           S2       D8
11   Economist     13        M.A., Economics            S2       D8
12   Economist     3         Ph.D., Economics           S2       D8

* P = pilot, F = formative ("in the wild"), S1/2 = summative 1/2.
† Dataset labels are given in Table 4.3.

4.1.9 Participants

We recruited twelve participants (six in each phase) with expertise in economics, statistics, and data science. The participants were all employees at a U.S. federal agency with job descriptions that include data analysis, all with 2-12 years of experience (M = 6.21, SD = 3.93) and graduate degrees in economics or related fields (Table 4.1). They had significant experience in using data analysis tools in their daily work (Table 4.2). While outside the scope of this study, the typical workflow in government and industry data analysis is described by Batch and Elmqvist [18] and Kandel et al. [119]. Six participants had used VR previously, and five participants routinely played video games (1+ hr/wk).

Table 4.2: Count and context of participant use of specific tools.

Environment/Language                              Ever   Work
Graphic analytical env. (e.g., Tableau, Excel)     12     11
Statistical lang. (e.g., Stata, R, SAS, Julia)     11     10
DBMS (e.g., SQL, Postgres, dBase)                   8      6
Econometric DBMS (e.g., FAME, Aremos)               5      5
Markdown/doc-creation (e.g., HTML, LaTeX)           6      4
Object-oriented lang. (e.g., Python, JS)            7      3
Imperative lang. (e.g., FORTRAN, Pascal)            1      1

4.1.10 Procedure

Our study consisted of several stages: preparation, tutorial, exploration, presentation, and post-session interview.

Preparation: Before participants even appeared at the study session, we asked them to send suitable datasets (Table 4.3) that we could integrate into ImAxes prior to the study. Some of these datasets caused us to make changes to the tool itself, such as the axis selection metaphor described in Section 4.1.7 (DR6), which we implemented to accommodate a larger number of variables than was practical in the shelf layout.

Table 4.3: Datasets used by summative participants.

#    Dataset Name
D1   Compensation by State and Industry
D2   Nominal PCE by State and Industry
D3   International Trade: Services
D4   U.S. Military Spending
D5   BLS Consumer Price Index
D6   National PCE Price Indexes
D7   Blended Health Care Satellite Accnt/Capita Exp. Index
D8   Nominal PCE by State

Instructional Tutorial (10 mins): We began by familiarizing the users with ImAxes via a tutorial embedded in the system. Pre-recorded interactions were played, and the user was prompted to follow along to learn how to use the tool. During this stage, the affordances of a single axis were exhaustively demonstrated before moving on to two axes, then three, and finally SPLOMs and parallel coordinate plots. After each feature was demonstrated, we asked participants to use that feature on a sample dataset. Before finishing the training, participants were encouraged to freely explore the sample dataset while verbalizing their thought process using a think-aloud protocol.

Exploration (30 mins): Participants were then set free to explore their own dataset.
Exploration was structured as a sequence of iterations, each no less than five minutes, and started by giving the participant the option of introducing a new dataset if desired. For each iteration, the researcher prompted the participant to maintain the think-aloud protocol, and would gently inquire about their motivations throughout. The goal of each iteration was to generate at least one insight and a corresponding visualization. Participants were told that they would be expected to present their findings, and were regularly updated on the remaining time.

Presentation (30 mins): Finally, the participant was asked to narrate their findings as if they were presenting their analysis to an external party (the experimenter). The participant was reminded that the experimenter could see what they saw on a monitor, and was asked to create at least one distinct visualization for each point in their narrative. They could use speech, gestures, and ImAxes itself to tell their story.

Post-Session Interview: Immediately after the exploration activity, the researcher and participant engaged in a brief, semi-structured interview and survey to (a) validate the researcher's understanding of the user's motivations for their actions during the exploration activity, and (b) evaluate the user's sentiment regarding the existing iteration of the implementation, including features that they felt were lacking.

4.2 Findings in IA with Economists

Table 4.1 reviews the participants and their datasets. Below we discuss a representative use case derived from the experiment. We then present the performance and subjective results.

4.2.1 Representative Use Case

The following scenario is a pastiche based on our observations of participants as they explored and presented insights from their macroeconomic data. It is not a description of a single user session; rather, it is a collection of real observations from multiple sessions organized into a representative, narrative summary. In other words, unlike the scenario in the introduction, it is not fictional; these events all happened. The scenario begins with our economist "Sasha" loading their regional personal consumer expenditures dataset into ImAxes. Sasha has just found this dataset from a public source and wants to explore the 2007-2009 Great Recession's effect on trends in consumer expenditures.

Sasha dons their VR headset, launches ImAxes, and grabs three axes from the Lazy Susan. They build a 3D scatterplot of TimePeriod × Goods × GeoFips (states) by first holding TimePeriod and Goods orthogonal to each other, then placing GeoFips orthogonal to the scatterplot's origin. They orient the visualization so that they are looking down the temporal axis, leveraging the depth perception afforded by VR to provide a view of the states where the goods have trended higher over time. Using this view, they activate the details-on-demand using the controllers for these states to obtain numeric values of points along the axes.

Sasha then flips the view so that they are looking at TimePeriod from the side, and points out the general upward trend in total consumer spending for all commodities over all time periods except the Great Recession around 2009. They create a 2D scatterplot of gasoline expenditures over time, noting that the trend is less stationary (i.e., has greater variance over time) in that particular commodity than in others. Sasha creates a 3D scatterplot of Food Services × TimePeriod × Off Premises Food and Drink.
Grabbing another TimePeriod axis, they switch from a 3D scatterplot to two separate 2D scatterplots, which they stack on top of each other. They observe that there is a switch from spending on restaurants (Food Services) toward spending more on groceries (Off Premises Food and Drink) during the Great Recession.

Once they have constructed all of the charts they intend to discuss with their colleagues, they arrange them in the space in a linear order from left to right, roughly corresponding to the narrative order they plan on following, a little like a museum or gallery of artifacts. As they discuss each point, starting with the most aggregate commodity bundles and drilling down into more detailed commodities, they dynamically interact with the visualization with one or both hands, shifting it for a different viewing angle with one hand and calling the tooltip with the other hand to give their expert audience the detail they would otherwise demand. When they are done discussing the points related to one visualization, they walk to the right to begin their next talking point, until they have run through all of the economic trends they wish to discuss.

4.2.2 Explore Stage

Participants spent between 4 and 10 minutes (M = 4:33, SD = 1:50) in the explore stage. All participants would begin the stage by facing the Lazy Susan within arm's reach, and would rotate it until they found an axis they recognized from which they could start exploring the data. Participants would then often rotate their body away from the Lazy Susan to create a work space by building basic 2D and 3D scatterplots. Figure 4.1 shows that most participants stayed in one place and arranged views egocentrically (E1). However, none utilized the full 360° space. This behavior of recycling the views and axes in their workspace instead of physically moving to a new workspace also supports prediction E1.2 (participants would arrange their views within easy reach).

Figure 4.1: Heatmaps of axis interaction in S2 (top-down). Participant position and view direction is represented by a direction arrow.

To examine prediction E1.1, that participants would arrange views at roughly chest level, we studied interaction patterns with respect to height. Since our tracking data only includes headset position, we estimate chest height to be approximately 30 cm below this position. Figure 4.2 shows a histogram of these relative interactions. Interaction above eye level often occurred when building scatterplot matrices, highlighting a physical limitation of the ImAxes system (that a user must be able to reach the ends of a scatterplot matrix). While this limitation is somewhat mitigated by design refinement DR7 (the grouping mechanism), these observations suggest that the issue is still present.

Figure 4.2: Histogram showing the vertical distance of participant interactions with axes relative to their eye level. Eye level is at 0, and the approximate chest level is represented by the red line.

Participants would often discard axes and visualizations while exploring the data, maintaining only one to two visualizations at a time (supporting E2). Essentially, participants were recycling their views and continuously cleaning their space. Furthermore, we observed that certain types of visualization would be more transient than others. Notably, linked visualizations, whether between two axes or between an axis and a scatterplot, were created
and used more than any other type of visualization, but the majority existed for less than five seconds.

Figure 4.3: Macroeconomics analysis in the ImAxes immersive analytics tool [52]. (Photo by Samuel Zeller on Unsplash.)

4.2.3 Presentation Stage

Participants spent between 7 and 11 minutes each (M = 6:30, SD = 2:30) during the presentation stage. Most participants chose to organize their views in either a linear or semicircular layout. For example, Participants 4 and 5 placed a series of visualizations in a left-to-right "narrative order" (Figure 4.1). This somewhat supports prediction P1 (participants will arrange the views in an exocentric way). However, as can be seen in the "present" columns of Figure 4.1, these arrangements were not strictly exocentric, but remained egocentric (undermining P1). We belatedly realized that since the experimenter, the intended audience of the presentation, viewed the 3D space through the eyes of the participant, there was no incentive for the participant to organize the space in an exocentric fashion. However, we did find support for views being arranged in chronological order (P1.1); Figure 4.3 shows snapshots of several final view layouts.

We predicted (P2) that participants would build more complex visualizations during the presentation stage, as they would spend time carefully crafting a meaningful visualization. This is also supported by our data; Table 4.4 indicates that most scatterplot matrices were used in the presentation stage. All participants except Participant 1 created scatterplot matrices while preparing for the presentation stage; however, only Participant 3 actually used a scatterplot matrix when presenting their data. Five of the six participants explored the data using parallel coordinate plots. However, it is worth noting that during the presentation stage, only Participant 3 used a parallel coordinate plot.

Table 4.4: Count of view creations per participant, split into exploration (E) and presentation (P) stages.

              2D Scatterplot   3D Scatterplot    SPLOM        Link
Participant     E      P         E      P        E     P      E      P
1               8     17        10      2       29     -    111    113
2               9     33         -     12        5     5    116    201
3              27     74         6     16        -    26    103   4136
4               7     43         1     11        8    18    123    516
5               5     58         2     15        6    22     31    195
6               3     49         -     23        2    35    527   2866

4.2.4 All Stages

We captured view creation events for 2D and 3D scatterplots, SPLOMs, and linked views. These events are summarized in Table 4.4. Contrary to prediction A1 (that participants would avoid complex visualizations), all participants (except P3) experimented with creating scatterplot matrices during exploration. The majority of these scatterplot matrices involved adding a third axis to an existing 2D scatterplot in order to see the relationship between two variables along a common axis (such as a time-series axis).

Figure 4.4: Movement frequency distributions by study phase.

All participants except P2 and P6 created 3D scatterplots during the explore stage. However, all participants used 3D scatterplots during their presentation stage. Notably, P2 and P6 used 3D scatterplots exclusively during the presentation stage, and P5 used three 3D scatterplots and a 2D scatterplot during the presentation stage. This result ran counter to prediction A1.1 (participants would avoid creating 3D visualizations).
One participant commented that they felt they might as well use 3D scatterplots and other kinds of visualizations since they were exploring data in VR, saying "I wanted to create more graphs of different types, [especially for] my presentation."

We found only weak support for A2; during the explore stage, participants would merely choose the nearest open space for creating new views, i.e., not using an organizing principle. Only in the presentation stage were they more conscious of structuring the space; more specifically, as noted in our observations supporting P1.1, chronology was a common such organizing principle (also partially supporting A2.1). We also noted that many undertook a "curation" stage where they would select views that should be included in the presentation, and move them to a designated area.

When considering A2.2, we expected participants to minimize their walking, relying instead on rotating their viewpoint. We found that participants walked less during the explore stage compared to the presentation stage. We ran a paired-sample t-test to compare the movement per minute of the explore and present stages. The present stage (M = 8.10 m, SD = 2.18 m) had significantly more movement than the explore stage (M = 5.41 m, SD = 1.8 m); t(5) = -3.456, p = 0.018 (Figure 4.4). One participant commented that "When I start thinking of myself as a visual focal point rather than thinking of myself as being surrounded by [vertical] boards, viewing the environment became easier and I felt comfortable using more of the space." We did not find a significant difference in head rotations per minute between the present stage (M = 6671.86, SD = 5397.94) and the explore stage (M = 2571.38, SD = 1321.33); t(5) = -2.277, p = 0.072.

Figure 4.5: Subjective ratings from exit survey. Subfactors include: controller ease of use [CNTRLR], adjustment to environment [ADJUST], perceived immersion [IMMERS], user engagement [ENGAGE], and avoidance of sensory discord [DISCRD].

4.2.5 Self-reported Perceptual and Cognitive Effects

Even if ImAxes depicts an abstract data analysis setting, it is subject to the same strengths and weaknesses as a general virtual environment; Figure 4.5 shows self-reported perceptual and cognitive effects similar to those of typical such environments (A3). According to the figure, participants reported high scores for perceived engagement (A3.1), rating the experience as enjoyable and engaging the senses. We expected participants to report a high level of presence (A3.2) using the system.
Supporting this prediction, the survey responses (Figure 4.5) show that participants felt that they were interacting in a natural environment (100% described the environment as being realistic and generally feeling natural, and 83% felt moving around was natural). One user described the experience as being somewhat like "being in the Mojave Desert." Several reported that they lost track of time while in the environment. However, full presence may not always be ideal; two thirds of participants reported that their exploration of the data was not intense, and several users pointed out that the sound of the researcher's voice improved their sense of orientation in the real room.

As for increased fatigue level (A3.3), we were not able to find support for this prediction. In fact, as our discussion for A2.2 shows, physical navigation actually increased for the presentation stage (which followed exploration), suggesting that fatigue was not a factor. Furthermore, the nausea level was low, which is another indication that participants were not fatigued by the end of the study.

We made several predictions related to the challenges that an immersive VR system could potentially introduce. We predicted that participants would suffer from reduced text legibility (A3.4). However, the reported Likert scores for the ability to examine closely and obtain details from objects in the environment were high (Figure 4.5), which undermines this prediction. We also expected that participants would suffer from the impreciseness of the VR wand controllers (A3.5). Again to the contrary, the Likert scores indicate that participants were in fact able to interact effectively with the 3D virtual objects in ImAxes. By and large, one of the things participants said they liked most about the environment was that visualizations were very fast and easy (or intuitive) to create relative to their traditional 2D working environment. However, there was a more even split among the participants with regard to wearing the physical VR headset, with one participant reporting "I don't really like wearing a headset. It's cool to look at things in 3D, but it doesn't really add enough value. However, I also don't usually create visualizations in general during my analyses."

Finally, we made two predictions with regard to VR experience: that participants with a lack of VR experience would encounter significant navigation issues (A4), and that gaming experience would mitigate a lack of VR experience (A4.1). Based on differences in survey responses for users with VR experience versus those without, we find support for the first of these predictions, but not for the second. In fact, participants without VR experience who regularly play computer games for more than an hour a week reported having more difficulty with the controls and had a more difficult time examining objects in the environment than those who were not regular gamers.

4.2.6 Qualitative Feedback

Beyond interaction and visualization requests, participants provided several insightful comments. Whiteboard analogies were commonplace: "It feels like I'm surrounded by whiteboards," said one user; another, after arranging axes around himself in a semi-cylinder shape, described it as feeling like a "wraparound whiteboard." When asked how ImAxes compared to their traditional desktop display, participants had a range of responses; 58.3% reported feeling more engaged in the problem while in ImAxes than while in their traditional environment (A3.2).
The most common draw participants felt to the environment was that creating visualizations was easier and faster in ImAxes than in their typical environments. In general, participants said they might be able to use ImAxes for preparing presentations, reports, and video communications, or for exploratory analysis and data validation. One participant responded that they could use it to detect errors during the monthly multi-stage process of reviewing economic indicator estimates prior to publication. Another said that their indicator estimation process involves multilateral aggregation for price indices, and ImAxes could be useful for exploratory analysis during that process. One user noted that export for presentation on 2D displays would be particularly helpful for the purpose of creating reports.

Several participants said that a major barrier to wanting to use ImAxes on the job is that VR is inconvenient for the type of work they perform, which typically involves programming and switching between multiple environments. Said one participant, "VR seems more oriented toward real-time demonstrations, which is great, but that's not useful for [the participant's] analytical process, which involves long periods of exploration and evaluation switching between tabular views, charts, modeling, and programming."

While we were able to implement some changes between our formative phase and our summative phase, there were some changes that were not practical to implement during the span of this study; some of these might be considered applicable for general use, while others are more economics domain-specific. One participant, who was not interested in using ImAxes on the job, said "it would be sick [sic] if I could click something and see the full hierarchy of categories in the data." The absence of this feature wound up being the primary reason for their recalcitrance. Other features this particular participant wanted to see included the ability to run regressions, a group-by mechanism, extra-grammatical filtering mechanisms for building views, and simple computational tasks. Like this participant, several other participants during the summative and earlier phases of the study suggested the inclusion of matrix and column-wise algebraic operations. A number of participants also requested linear modeling operations and views of multicollinearity, which they noted as being particularly relevant for hedonic modeling.

Figure 4.6: Process, timeline, refinements, and observations for our study of ImAxes [17]. DR4 is shown in grey, as it was replaced based on user feedback. (The four phases spanned a pilot of 2 weeks, a formative "in-the-wild" deployment of 3 weeks, Summative 1 of 5 weeks, and Summative 2 of 4 weeks, with six participants in each phase.)
Another common request was that we extend ImAxes as a tool specializing in outlier detection. Economists typically evaluate time series; while our addition of a line mark connecting scatterplot points was one change we did implement to accommodate this activity, participants regularly reused time period axes, and a convenient "favorite axes" quick-access area was requested by multiple users. Finally, one participant strongly suggested the addition of Markov Chain Monte Carlo simulation, stating that it is "what everyone is doing now" in econometric modeling.

Figure 4.7: Tooltip providing details-on-demand for data items.

Figure 4.8: Lazy Susan menu implemented for the study. Participants spin the menu by rotating their thumb on the controller touchpad.

4.3 A Discussion of Our Predictions Versus Our Results

In Batch et al.
[17], we reported on a design study investigating the use of IA [154], specifically virtual reality (VR), for professional economic analysis in a U.S. federal agency. Inspired by Sedlmair's design study methodology [212], this overall study consisted of multiple phases:

1. A design stage where we collected requirements using contextual inquiry methodology [29] and improved an existing immersive VR system for multidimensional data analysis, ImAxes [52], to support macroeconomics data;

2. A formative "in-the-wild" deployment of the prototype application in a communal space, which led to multiple incremental insights and improvements to the prototype; and

3. An in-depth, preregistered mixed methods study involving professional economic analysts exploring their own datasets in our immersive economics environment, and then presenting their findings to the experiment administrator.

The results from these studies include observations, video and audio recordings, interaction logs, and subjective interview and survey feedback from the participants. In particular, we report on the use and organization of space to support analysis and presentation, barriers against effective use of immersive environments for data analysis, and the impact of immersion on navigation and orientation in 3D. Even more specifically, our predictions, organized by the stage of our study they refer to (exploration (E), presentation (P), and all stages (A)), were as follows:

E1 Participants will arrange the views egocentrically around themselves. Motivation: For individual work, it is more efficient to use the local space around yourself. Result: We found evidence supporting this prediction.

E1.1 Participants will tend to arrange their views at chest level. Motivation: Participants have no specific VR training, and will thus likely not utilize the 3D environment to the fullest. Result: We found mixed or inconclusive evidence about this prediction.

E1.2 Participants will arrange their views within easy reach of the center of the space. Motivation: The small space that the study is conducted in will not permit significant physical navigation. Result: We found evidence supporting this prediction.

E2 Participants will build many ephemeral visualizations that they quickly discard. Motivation: ImAxes supports exploration by creating transient and new visualizations through brushing. Result: We found evidence supporting this prediction.

P1 Participants will arrange the views in an exocentric way. Motivation: During presentation, it makes sense to more carefully arrange the views, e.g., in a gallery or sequence. Result: We found mixed or inconclusive evidence about this prediction.

P1.1 Participants will arrange the views in a chronological order w.r.t. their presentation. Motivation: The intelligent use of physical space can help streamline a narrative. Result: We found evidence supporting this prediction.

P2 Participants will build more complex visualizations in the presentation stage than in the explore stage. Motivation: Presentation involves creating a linear, coherent, and comprehensive narrative. Care can thus be spent on crafting complex visualizations. Note: This prediction was not part of our preregistration. Result: We found evidence supporting this prediction.

A1 Participants will prefer basic visual representations (scatterplots, line graphs, maps), and avoid more complex ones (parallel coordinates, scatterplot matrices).
Motivation: These more complex representations are not commonplace in real-world data analysis. Result: We found evidence against this prediction.

A1.1 Participants will avoid using 3D representations (such as 3D scatterplots or surfaces). Motivation: Our participants have no VR training and are accustomed to 2D displays in their work. Result: We found evidence against this prediction.

A2 Participants will utilize the physical space to structure their work. Motivation: Physical space can be used to support specific tasks, e.g., to simplify choice, perception, and computation [125]. Result: We found evidence supporting this prediction.

A2.1 Participants will group views in space based on their logical relationships. Motivation: Views that belong together should be grouped in physical proximity. Result: We found evidence supporting this prediction.

A2.2 Participants prefer interacting with objects at a near distance over those at a far distance. Motivation: Near objects require no physical navigation to access, and ImAxes does not support a reliable distance interaction technique. Result: We found evidence supporting this prediction.

A3 Participants will report typical perceptual and cognitive effects of VR on their performance and perception. Motivation: Even if ImAxes depicts an abstract data analysis setting, it is subject to the same strengths and weaknesses as other VR applications. Result: We found mixed or inconclusive evidence about this prediction.

A3.1 Participants will report a high level of engagement. Motivation: VR is commonly associated with high engagement because of realism and low indirection. Result: We found evidence supporting this prediction.

A3.2 Participants will report a high level of presence. Motivation: VR is commonly associated with high presence because of the low indirection, natural interaction, proprioception, and the perception of physical space. Result: We found evidence supporting this prediction.

A3.3 Participants will report fatigue from physical navigation and interaction. Motivation: The use of gross body motor controls to navigate the virtual environment and interact with its objects will require significant effort by the participants. Result: We found mixed or inconclusive evidence about this prediction.

A3.4 Participants will suffer from reduced legibility of text in the 3D environment. Motivation: HMDs have a significantly lower resolution than typical monitors, and labels in ImAxes are 3D and thus subject to distance and orientation concerns. Result: We found evidence against this prediction.

A3.5 Participants will suffer from the challenge of using VR wands to interact with virtual 3D objects. Motivation: While more direct than using a mouse and keyboard, the HTC Vive controllers still do not allow for hand and finger interaction. Result: We found evidence against this prediction.

A4 Participants will encounter significant navigation and interaction hurdles due to a lack of VR expertise. Motivation: Our participant pool has no specific VR training, and will thus be challenged by 3D navigation and interaction concerns. Result: We found mixed or inconclusive evidence about this prediction.

A4.1 Participants with 3D computer gaming experience will be less hindered by lack of VR training. Motivation: 3D gaming experience will help people interact more efficiently. Result: We found mixed or inconclusive evidence about this prediction.
We were surprised to find that many of our predictions, particularly those anticipating negative effects of VR use, found no support in the collected data. For example, we noticed few effects of fatigue (A3.3), legibility was not a clear concern (A3.4), and even participants with little gaming and/or VR experience were able to use our tool efficiently (A3.5). Some of these findings can be easily explained (e.g., the lack of an exocentric layout likely arises because presenters actually viewed the environment through the eyes of the participant (P1)), but others are more unexpected. Most of the time, while aptly highlighting our lack of knowledge, these contrary results are actually in favor of IA; for example, participants did actually use advanced visualizations (A1), not merely sticking to scatterplots.

On the surface, this finding disappointingly does not extend to the intelligent use of space (A2), as participants in the explore stage merely picked the closest free space for new visualizations. However, when viewed through the sensemaking loop [188], this makes more sense, as one of its early phases involves placing potentially relevant information in a so-called "shoebox." When foraging for information, analysts typically do not have time to worry about structure, similar to how the purpose of sketching for artists is to generate new ideas rather than fixate on existing ones. Only in a secondary curation step in our study would participants evaluate these views and organize them into a designated area in the environment (the "evidence file" in the sensemaking loop). Incidentally, Andrews [5] noted similar observations, referring to them as "evidence marshalling," often with the same chronological organizing principle (A2.1) as in our findings.

Still, it is clear that participants did not use the full available 3D space of the analysis environment to its full potential. Telemetry data and video recordings showed that participants mostly stayed in place and merely used the space directly in front of them. We did see participants making better use of the space in the presentation stage. Reasons for this may include the cramped confines of our experimental space, the interactions needed to move visualizations, and the lack of automatic layout control. This points to the need for system support, such as constraints and organization frameworks, to help users organize their spaces, as has previously been done for 2D GUI tools [141, 169].

That many of our (in retrospect pessimistic) predictions about the drawbacks of the virtual environment were not supported is worth unpacking. One reason may be the high presence and engagement levels reported, leaving participants willing to simply overlook minor usability concerns. The novelty factor may also be working in our favor and producing goodwill toward the tool. Finally, perhaps the natural interaction metaphors in ImAxes simply aided participants in quickly learning the system and exploring their data. In fact, we were surprised by the level of interest in using ImAxes in the workplace. There were a number of domain-specific tasks and requests that we were unable to fully accommodate in the scope of this study (MCMC simulation, linear modeling and views of multicollinearity for hedonics, a suite of features for outlier detection), as well as the more generally applicable requests for matrix algebraic operations, quick-access "favorite" axes (like time series period indices), and integration with hierarchical views of the data.
While these changes were not practical to implement in this study, we view them as low-hanging fruit for future work extending immersive implementations, either for general analytical tasks or for more specialized economic analysis.

Chapter 5: View Management for Situated Visualization

Advances in mobile and wearable display interfaces, positional sensing, and computer graphics have fueled recent efforts of displaying data in situ. Current research themes such as immersive [43], ubiquitous [66], and situated analytics [233] (IA/UA/SA) explore a world where contextual data is readily available at the fingertips of the user, anywhere and anytime. Particularly exciting is the topic of situated visualization, where data relevant to a location is visualized directly in said physical location [69, 219, 248]. However, there are several challenges in making such situated visualization practically useful, such as the intrinsic VR/MR challenges of registration mechanisms, power consumption, device ergonomics, etc.; a recent paper surveyed the grand challenges of immersive analytics research [72].

One such challenge is view management of situated visualizations: effectively viewing (and interacting with) the situated visualizations that co-inhabit the user's physical space in an AR environment. A situated visualization is a data visualization, often 3D volumetric in nature, that has been embedded into an AR or MR environment [8] to support situated [208], ubiquitous [66], and immersive analytics [154]. However, because of the 3D nature of the environment, making efficient use of such situated visualizations requires significantly more overhead, navigation, and layout than for traditional visualizations drawn on a normal 2D screen. Bell et al. [25] define view management for AR and VR as "maintaining visual constraints on the projections of objects on the view plane, such as locating related objects near each other, or preventing objects from occluding each other." Drawing on this definition, we refer to view management for situated visualization as optimizing the user's view of the visualizations, both on an individual and a collective level, in an IA/SA environment.

We present an analysis of the challenges of view management for situated visualizations in MR, enumerating concerns such as physical distance and reach, orientation and legibility, and depth and occlusion. These challenges apply both to the components of a single situated visualization and to multiple visualizations that exist in the same physical environment. Based on this analysis, we revisit existing techniques from the domain of computer graphics and visualization to propose a set of interaction, layout, and presentation techniques that are designed to mitigate these challenges. More specifically, we investigate the following techniques:

1. a shadowbox that enables eliminating effects of perspective foreshortening and occlusion in 3D visualizations, including an unfolding interaction for transforming a 3D visualization into individual orthographic 2D views;

2. a cutting plane interaction for accessing occluded elements of a 3D visualization;

3. a world in miniature (WIM) technique for overviewing and accessing multiple visualizations in a 3D situated analytics environment;

4. a summoning interaction for bringing distant visualizations to the user and arranging them in an accessible grid (dispelling returns them to their original locations); and

5.
a data tour for guiding the user through a 3D situated analytics environment to visit all visualizations of interest.

While each of our techniques is derived from existing work, we claim that their combination, as well as their application to situated visualization in MR, presented in this paper, is novel. Furthermore, to validate the work, we present a practical implementation of these ideas inside a novel situated visualization environment implemented using the web-based VRIA [37] framework. We also report on findings from a remote user study where 12 participants used their smartphones to perform situated analytics tasks using our proposed techniques. While these results highlight many of the typical challenges associated with mixed reality and sensemaking, we also found convincing evidence supporting our suite of view management techniques.

5.1 Properties and Challenges in Situated Visualization View Management

While there are significant and valid concerns with using 3D visualizations in the first place [168], these are mostly moot when discussing SA on MR devices. In such settings, the user is by definition active in the real world using hand-held or HMD technologies, and thus representing visualizations in 3D is inevitable even if the visualizations themselves are not 3D. To understand the underlying challenges of view management for situated visualization, let us first enumerate the basic properties of a visualization inhabiting a SA environment [25]:

• Position: The visualization's location in the environment;

• Size: Its geometric size in relation to the rest of the world;

• Transparency: The opacity of the visualization, which also incorporates its general geometry (i.e., some visualizations, such as a 3D scatterplot, are more sparse than a volume rendering);

• Priority: A relative priority for each individual visualization (potentially whether a visualization is selected or not);

• Orientation: The visualization's 3D rotation;

• Distance: Its distance from the viewer and other visualizations;

• Area of interest: The area (often a 3D volume) from which to optimally view the visualization; and

• Spatial relation to the surrounding world: The visualization's relation to real objects co-inhabiting the physical world.

The above list is by no means a minimal one, as some properties are derivatives of others (distance vs. position, for example). However, it is useful to distinguish each of these properties individually, as they all give rise to specific challenges. We outline these below; again, we make no effort to streamline these challenges (e.g., visibility subsuming occlusion), but instead list them individually because they add reasoning power to our argument. Furthermore, some of these challenges apply within a single visualization (e.g., occlusion within the points in a 3D bar chart), whereas others apply for multiple visualizations (e.g., overview for all of the visualizations in an environment), and some apply to both (occlusion between marks, as well as occlusion between two visualizations).

Visibility. Maintaining visibility of a situated visualization in the user's field of view is a fundamental challenge [25]. Many situated visualizations are just that, situated in a specific location in the physical world, which means that they easily fall outside the user's vision, either by being too far away or above, below, or behind the user.
In such situations, the visualizations cannot be moved to always be visible, and other mechanisms must be employed to make the user aware of their existence and location. This challenge is compounded when multiple visualizations are jostling for space in the user's field of vision.

Occlusion. The three-dimensional nature of SA environments means that a geometric object can be hidden by other objects even if they do not intersect in 3D space [67]; the problem is further exacerbated when they do intersect. This fundamental challenge affects both marks within a single visualization, such as a cluster of marks in a 3D scatterplot occluding an outlier on the far end, and occlusion between multiple visualizations, such as a 3D volume occluding a barchart in the distance.

Overview. Overview is a central aspect of data visualizations [218], but gaining an overview of all of the visualizations in a SA environment is particularly challenging because of the 3D nature of the space. This is not merely about the ability to access and read an individual visualization, but being aware of its existence in the first place; a visualization that is fully occluded by other visualizations, outside the user's field of view, or too far away to see, will inevitably not be included in the overview. This means that many of the below challenges contribute to the overall Overview challenge.

Perspective Foreshortening. A more subtle aspect of the 3D environment is the impact of perspective foreshortening due to visualizations being at different distances from the viewer. Perspective foreshortening arises from the non-linear 3D perspective, essentially making nearer items disproportionately larger than more distant ones. Besides having an impact on the Occlusion challenge, it also makes it difficult to compare two visualization marks at different distances, such as two different bars in a 3D bar chart.

Legibility. A particular concern for situated visualizations that are not rotation invariant or not always facing the user, such as a billboard,1 is legibility, particularly of text. In such situations, the slanted or rotated view of the visualization makes reading more difficult or even impossible. In addition, similar legibility concerns arise when a visualization is far away from the viewer, making graphical features in general, and text in particular, too small to distinguish.

1 In 3D computer graphics, a "billboard" is a 3D object that is drawn to always be facing the viewer either along just the vertical axis (like a sign spinning around its post to face the user), or along both the vertical and horizontal axes.

Physical Navigation. When a situated visualization of interest is too far away to be legible or manipulated, either the visualization or the user (or both) will generally need to move. When this task falls on the user, such as when the visualization cannot be moved from a specific geographic position or real-world object, this translates to the user physically having to navigate to the object of interest. Unlike in dedicated VR spaces, such as open labs or even CAVEs, such navigation can be particularly tricky in a physical environment filled with slippery or uneven surfaces, physical barriers, and other people, as well as when using interaction devices such as wands, gamepads, or touch surfaces to manipulate virtual objects in the environment.

Physical Reach. Even when physical navigation is not needed, many situated visualizations require interaction that involves the physical reach of the user.
In fact, a situated visualization can sometimes be located in a position that is not physically accessible to a person moving around in the real world, and which they therefore cannot reach.

Temporal and Spatial Continuity. Finally, as observed by Bell et al. [25], given view management strategies that minimize the above challenges, it is important to maintain continuity over time and in space so that objects and visualizations do not "jump around" due to discontinuous layouts that are calculated independently from frame to frame. Thus, objects should move smoothly over time and space.

5.2 Prototyping Situated View Management

In Batch et al. [22], Sungbok Shin, Julia Liu, Peter Butcher, Panagiotis Ritsos, Niklas Elmqvist, and I used the design space described in Section 2.2.2 as a generative lens for designing situated views. For each technique below, we enumerated its properties, how the technique modifies the properties of the situated visualizations, and which of the challenges the technique addresses (see Table 5.1 for a summary).

Table 5.1: Challenges addressed by each technique (WIM, Summon & Dispel, ShadowBox, Cutting Planes, and Data Tour), indicating for each technique whether each challenge (Visibility, Occlusion, Overview, Perspective Foreshortening, Legibility, Physical Navigation, Physical Reach, and Temporal & Spatial Continuity) is addressed fully or only partially or indirectly. The per-technique coverage is described in the "Challenges Addressed" paragraphs of Sections 5.2.1 through 5.2.5.

Figure 5.1: Sketch of an example world-in-miniature scenario.

5.2.1 World-in-Miniature

Synopsis: A World-In-Miniature [227] (WIM) is a miniaturized view of the environment that is controlled by the user, allowing them to see their surroundings from any direction and distance.

We apply the basic WIM approach to a situated visualization environment where the WIM is instantiated by the user and is represented by a box containing a miniature representation of all virtual features in the scene (including the user). The WIM omits real-world features, but may include contextual information such as a mesh of the landscape as detected by the device, or a map tile layer based on the user's GPS coordinates. Finally, the WIM, as an object in the user's view itself, has its own properties, in addition to affecting the properties of the situated visualizations.

Properties affected:

• Position: Situated visualizations can be moved by dragging them in the WIM. Their positions within the WIM also reflect their relative positions in the world.

• Size: The WIM duplicates all situated visualizations at a significantly smaller scale within a space.

• Transparency: Making WIM elements semi-transparent allows the user to see the real world as well as spot virtual elements that are occluded by other virtual elements.

• Orientation: The orientation of a situated visualization in the WIM should reflect its true orientation in the world.

• Distance: The relative distance between the user and situated objects is represented accurately by the WIM.

• Spatial relation to the surrounding world: The WIM allows for decoupling situated visualizations from the real world. The WIM itself is non-situated.

Challenges Addressed: The WIM technique directly addresses Overview, Visibility, and Occlusion by creating miniature copies of all objects in the scene, including those not visible to the user, and by giving the user the freedom to rotate the scene to discover and access hidden content. Furthermore, this also eliminates the need for Physical Navigation and Physical Reach, as the miniature allows for easy navigation and access.
However, the technique does not address Perspective Foreshortening and Legibility, but it can be designed to respect Temporal and Spatial Continuity by synchronizing the positions of the miniature visualizations with those of their true situated counterparts.

5.2.2 Summon and Dispel

Synopsis: The Summoning interaction brings all points of interest to the user's position, whereas Dispelling returns them back.

Summoning can be applied to all situated objects, or only to those fitting a specific criterion, such as a particular data type, visualization, or spatial area. We opt for displaying the summoned objects using a "shelf" layout, with the points of interest arranged in a grid in front of the user. Interactive or aggregated alternatives can also be considered, such as a "Lazy Susan" or carousel-style layout [17].

Properties affected:

• Position: Summoning arranges situated visualizations in a neat layout that is readily visible to the user; dispelling reverses this operation.

• Distance: Summoning drastically reduces the distance between a situated visualization and the viewer.

• Area of interest: Each visualization will be placed where it can be optimally viewed and manipulated.

• Spatial relation to the surrounding world: Summoning eliminates a virtual object's relation to the spatial world; dispelling restores this mapping.

Challenges Addressed: Summoning is primarily designed to minimize Physical Navigation and facilitate Physical Reach by essentially bringing the spatial visualization content to the user rather than having the user travel to it. As a secondary effect, this also reduces the Visibility and Occlusion challenges and provides improved Overview of the virtual space. By enabling dispelling the objects to their original locations, the technique also supports Temporal and Spatial Continuity.

5.2.3 Shadowbox

Synopsis: The Shadowbox puts a given 3D object (situated visualization) inside a virtual "display case" represented as a 3D box, with 2D orthogonal projections of the object on each of its faces, and support for unfolding the box.

In this way, the Shadowbox is similar to ExoVis [235] but presents the user with either an exterior or an interior view of the box and enables hiding the 3D object to mitigate occlusion. We also propose "unfolding" the sides of the box to align multiple 2D projections in one plane, allowing the user to view all projections at once (Figure 5.2c).

Properties affected:

• Area of interest: The Shadowbox provides optimal views of a situated visualization along each of the primary axes.

• Spatial relation to the surrounding world: The Shadowbox has the possibly problematic side-effect that it isolates and separates the situated visualization being examined from the rest of the world.

Figure 5.2: The Shadowbox technique, including its (a) exterior and (b) interior projection modes, as well as the (c) unfolding interaction.

Challenges Addressed: The Shadowbox was primarily designed to manage the Perspective Foreshortening typical in 3D environments by using an orthographic projection for each of the planes, thus enabling exact visual comparison for, e.g., the bars in a 3D barchart. However, it can also help aid Visibility and Occlusion as well as support Overview by providing a structured view of the 3D object. This axis alignment can also facilitate Legibility.
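To make the Shadowbox's projection step concrete, the following sketch shows one way to compute orthographic 2D projections of a visualization's point marks onto three faces of an axis-aligned box. This is an illustrative sketch only (the type and function names are not taken from the published implementation), but it captures why the technique sidesteps perspective foreshortening.

```typescript
// Illustrative sketch of Shadowbox-style orthographic face projections.
// Assumes marks are points already expressed in the box's local frame;
// Point3, FaceProjections, and projectToFaces are hypothetical names.

interface Point3 { x: number; y: number; z: number; }
interface Point2 { u: number; v: number; }

interface FaceProjections {
  front: Point2[]; // view along -z: keep (x, y)
  side: Point2[];  // view along -x: keep (z, y)
  top: Point2[];   // view along -y: keep (x, z)
}

function projectToFaces(marks: Point3[]): FaceProjections {
  // An orthographic projection simply drops one coordinate per face, so
  // positions on each face are undistorted by distance from the viewer
  // and marks at different depths can be compared exactly.
  return {
    front: marks.map((p) => ({ u: p.x, v: p.y })),
    side: marks.map((p) => ({ u: p.z, v: p.y })),
    top: marks.map((p) => ({ u: p.x, v: p.z })),
  };
}

// Unfolding then amounts to laying these per-face 2D views out side by
// side in a single plane facing the user.
const faces = projectToFaces([{ x: 0.2, y: 0.8, z: 0.5 }]);
console.log(faces.front, faces.side, faces.top);
```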
5.2.4 Cutting Planes

Synopsis: A Cutting Plane is an interactive filtering and view-flattening mechanism that takes a 2D slice of the 3D object from a position and orientation determined by the user's manipulation of a plane [99].

Like the WIM technique, cutting planes are a classic approach that has stood the test of time. Here we discuss how the technique affects the properties of a situated visualization:

Figure 5.3: A user is guided to a location of interest by an arrow overlay positioned on the ground in front of her.

Properties affected:

• Transparency: A cutting plane essentially renders the cut part of the situated visualization fully transparent, enabling easy visual inspection and access to its interior.

Challenges Addressed: The primary design rationale for cutting planes is to optimize Visibility and to manage Occlusion arising from the 3D object obscuring itself (such as in a 3D scatterplot). It also facilitates Physical Reach of areas that would otherwise be inaccessible to the user.

5.2.5 Data Tour

Synopsis: A Data Tour is a guided walk through all points of interest in an environment.

The complexity of the algorithm for placing navigation cues may vary. A simplified set of steps for implementing a data tour is as follows:

1. Select all nodes not already visited by the user.

2. Render a visual guide in the user's field of view to the currently nearest point of interest.

3. Once the user comes in close proximity to the point of interest, add it to the list of visited points and repeat from Step 1.

A more sophisticated implementation would build actual wayfinding into the system, taking advantage of street-level maps and blueprints to guide the user on an optimal path to a target.

Properties affected:

• Position: Visualizations are not moved; instead, the user visits the visualizations through physical navigation.

• Distance: The Data Tour tries to bring each point of interest within optimal distance to the user over the entire tour.

• Spatial relation to the surrounding world: Significantly, the Data Tour preserves the spatial location and mapping of each situated visualization to the physical world.

Challenges Addressed: The Data Tour facilitates accessing situated visualizations to avoid Occlusion and support Visibility by guiding the user's Physical Navigation. This also means that the Legibility and Physical Reach challenges can be addressed by guiding the user to the objects. In particular, the Data Tour provides Temporal and Spatial Continuity since it does not alter the environment at all.

5.3 Motivating Scenario

Sydney is investigating quarterly productivity measures in her manufacturing company. She has determined that it would be informative to visit the shop floor of a plant that has shown unusual fluctuations in productivity over the last two months. While on the shop floor, it becomes apparent that she needs to collectively evaluate a handful of production input and intermediate output records that had not been on her radar prior to her visit and thus are not present in any of the prepared views she has on hand. Rather than traversing the company's intranet looking for the data, Sydney uses her mobile device to view the measures in situ, using space itself as an index. She creates an assortment of 3D visualizations of the records that are summoned and laid out in a view centered on her current location.
The visualizations form a shared virtual workspace visible only to her, wrapping her in a mixed reality environment of contextual data that is anchored to specific machines and processes on the factory floor. Noticing an unexpected pattern in one such visualization, Sydney first runs a cutting plane through one of the 3D charts, and then instantiates a shadowbox around it, which she unfolds to add precision to her view of relationships between multiple variables in the chart. Based on these findings, Sydney is able to pinpoint a part of the process that is most concerning, select it, then dispel the visualizations to their situated locations. Following a guidance arrow indicating the direction to the workstation most relevant to the process, she quickly arrives at the failing location in physical space. After an inspection of the workstation, Sydney finds that a component of the machine used there for producing an intermediate output is faulty.

5.3.1 Situated Analytics Implementation Details

We have implemented all techniques described in this section as a unified system demonstrating their synthesis using three.js, VRIA [37], and AFrame-React. VRIA was used to create "staged" visualizations: the initial collection of 3D visualizations instantiated upon loading the page. To evaluate our implementation, we set up a MERN stack; the backend of our system is described in Section 5.4.2.

Our motivation for using the Web as a development ecosystem was two-fold. Firstly, we believe that the Web, and in particular the mobile Web, is the most ubiquitous, collaborative, and platform-independent way to build and share information [198]. Unlike game engine-based systems, there is no need to download bespoke applications or executables, whereas the outputs can be easily integrated in other Web-based applications [37]. These characteristics make the Web an excellent platform for situated visualizations as interconnected hypermedia, whether in MR, such as those presented in this work, or not. In addition, standardization advances within the Web ecosystem, such as WebXR or WebRTC, provide interoperability capabilities, such as those implied by Mackay [146]. This makes mobile MR-based SA possible. When these capabilities are coupled with faster communication, such as 5G, the opportunities for providing interconnected, data-driven information, in situ, as enhancements of our physical world, increase significantly.

Secondly, the reliance on such Web technologies and tools, besides providing a familiar and versatile development ecosystem, enabled the collaborative development and real-time inspection of MR prototypes across two continents, much like one would do with "traditional" 2D visualizations, with outputs accessible through a WebXR-compatible browser. Due to VRIA's WebXR support, a single codebase can be experienced with immediacy, on a variety of devices, without prior knowledge of the underlying hardware.

Figure 5.4: The user study implementation, featuring 3D VRIA figures (e, f) and a Shadowbox with exterior and interior projection modes (a and b), as well as the unfolding technique (c), for the analysis stage. The WIM (d), target marker (e), and guided tour (f) were all used for navigation.
Finally, it enabled us to carry out an evaluation, described in the next section, which respected the social distancing rules many of us had to abide by, albeit with some limitations on the platforms examined.

5.4 Evaluating Situated Analytics

In Batch et al. [22], our study's goal was to gauge a selection of the presented view management techniques, described in Section 5.2. Rather than testing hypotheses in a confirmatory experiment, we conducted an exploratory study and evaluated the participants' use of time and space, the correctness of their responses to basic analysis tasks, and self-reported measures of task-related user experience factors. COVID-19 pandemic restrictions forced us to perform all testing remotely. This meant that we had to conduct our evaluation using standard consumer-level hardware (smartphones) rather than the MR HMDs (HoloLens 2) we had initially planned to use. This also impacted our choice of which techniques to assess. More specifically, we did not evaluate the summon and dispel technique in part because it is suited for larger, possibly outdoor spaces, rather than the indoor spaces we anticipated most would use.

5.4.1 Participants

We recruited a convenience sample of 12 participants (hereafter referred to as "users") with professional backgrounds in user experience, design, and interface or systems development, maintenance, and engineering. All users were screened to possess AR-compatible Android mobile devices; at the time of writing, the only WebXR viewer available on Apple iPhones, Mozilla's WebXR Viewer, was no longer being maintained, and its final version suffers from major performance issues with one of the libraries upon which our implementation depends. We opted to filter our users to those with professional experience in these domains because they are the target users that motivate this work, as described in the scenario in Section 5.3. However, professional domain experts are difficult to recruit with monetary incentives alone. Thus, we argue that our approach of collecting a convenience sample of individuals within our personal and professional networks is a necessary one in this instance. Due to the requirements by which we filtered our users (i.e., those with professional experience in technical domains who already possess compatible mobile devices), we argue that the relatively small sample represents an adequate contribution.

5.4.2 Apparatus and Data

As noted in Section 5.4.1, we screened out users who did not possess mobile devices compatible with the libraries upon which our implementation depends. These libraries include A-Frame and three.js. We set up an Express NodeJS server API endpoint that posts to a MongoDB database to log user session events (navigation, multiple choice question responses) as well as position and orientation during the experiment sessions. When the scene is
initialized, an ID is assigned to the session and posted to the server from the client via the API. At each one-second interval after the session is posted, the client interface posts the user's scene camera position and rotation along with their session ID, and the server updates the database collection with these features, the server timestamp, and the user's IP. Whenever the user answers a question, the client interface posts the user's answer, the possible answers (with the correct answer indicated), and the POI and task it corresponds to. These questions all correspond to features of a synthetic dataset comprising spatially-tagged falling accident events on an institution's campus, with features of the time and setting in which the accident took place (weather, season, year), of the victim (gender, role within the institution), and of the accident itself (injury severity).

Table 5.2: Sequence of tasks. The navigation technique (hovering marker control, WIM, or data tour) and the analysis technique (VRIA chart control; ShadowBox folded with interior projection; ShadowBox folded with inverted/exterior projection; ShadowBox unfolded) used at each step followed the order described in Section 5.4.4.

Step  Task        Variable                POI  Correct answer  Incorrect options
1     Navigation  lat/longitude           10   POI 10          --
2     Analysis    injury severity (0-5)   10   3               1, 2, 4
3     Analysis    weather                 10   Clear           Snow, Sleet, Rain
4     Analysis    year                    10   2016            2017, 2018, 2019
5     Analysis    role                    10   Undergrad       Faculty, Grad, Staff
6     Navigation  lat/longitude           11   POI 11          --
7     Analysis    injury severity (0-5)   11   4               1, 2, 3
8     Analysis    weather                 11   Sleet           Snow, Clear, Rain
9     Analysis    year                    11   2018            2016, 2017, 2019
10    Analysis    role                    11   Undergrad       Faculty, Grad, Staff
11    Navigation  lat/longitude           1    POI 1           --
12    Analysis    injury severity (0-5)   1    0               1, 2, 3
13    Analysis    weather                 1    Sleet           Snow, Clear, Rain
14    Analysis    year                    1    2019            2016, 2017, 2018
15    Analysis    role                    1    Undergrad       Faculty, Grad, Staff
16    Navigation  lat/longitude           9    POI 9           --
17    Analysis    injury severity (0-5)   9    1               0, 2, 3
18    Analysis    weather                 9    Clear           Snow, Sleet, Rain
19    Analysis    year                    9    2019            2016, 2017, 2018
20    Analysis    role                    9    Grad            Faculty, Undergrad, Staff
21    Navigation  lat/longitude           3    POI 3           --
22    Analysis    injury severity (0-5)   3    2               1, 3, 4
23    Analysis    weather                 3    Snow            Clear, Sleet, Rain
24    Analysis    year                    3    2019            2016, 2017, 2018
25    Analysis    role                    3    Undergrad       Faculty, Grad, Staff
26    Navigation  lat/longitude           4    POI 4           --
27    Analysis    injury severity (0-5)   4    2               1, 3, 4
28    Analysis    weather                 4    Snow            Clear, Sleet, Rain
29    Analysis    year                    4    2018            2016, 2017, 2019
30    Analysis    role                    4    Grad            Faculty, Undergrad, Staff

User experience survey data was collected via Google Forms following completion of the session tasks. The survey structure was based heavily on the NASA Task Load Index (TLX).2 All questions from the NASA TLX were asked for each technique individually, and the survey ended with two ranked voting questions (one for the techniques used for navigation and one for the techniques used for analysis) and three open-ended text entry questions about the user's experience. NASA TLX questions and scales were used verbatim in our survey, with one exception: We flipped the scale of the question "How successful were you in accomplishing what you were asked to do?" prior to conducting our formal study after several pilot users reported misinterpreting the scale's order for that question specifically. Qualitative data was also collected in the form of researcher notes, video and audio recording, and transcripts from the session.

2 https://humansystems.arc.nasa.gov/groups/tlx/

5.4.3 Experimental Design and Procedures

We conducted user sessions over Zoom video conferencing, lasting approximately 30 minutes to 1 hour, during which the user was given a summary description of what to expect during the session. The researchers verbally confirmed that the user was aware that video and audio recording would be collected throughout the session; the users were also informed once the recording began. Users were asked to stand in the middle of the space they would be using during the session, and to visit a randomly-generated URL for the experiment implementation using the Chrome browser on their mobile device.
The experiment implementation prompted the user to find their way to a point-of-interest (POI) using one of three techniques represented in AR in their living space through their mobile device, and once they had arrived at their destination, to answer a multiple choice question using either a shadowbox technique or the VRIA figure alone. These tasks and the order in which they were encountered by users are described in Section 5.4.4. Once the users completed this series of sequences, they were redirected to a page informing them that the study was complete and providing them with a link to the survey. The users were asked to complete their survey while they were on the video call with the researcher present and to discuss any final thoughts about the session that they did not feel were captured by the survey. After the user completed the survey and discussed any additional feedback about the techniques they wished to provide, the video and audio recording were halted and the session was ended. All user sessions were conducted within the span of four days.

While the techniques we have discussed thus far are applicable to HMD users located in public spaces with real-world objects or places that are linked to virtual objects in an MR view, there are two major constraints that resulted in our use of mobile users in their home environments, in which the "situatedness" of objects in the view is only simulated. First, the relative unavailability of consumer-facing MR HMD devices makes it highly unlikely that recruitment of individuals would be possible, particularly if those users must also be professionals willing to spend an hour of their time completing a user session. Second, the state of pandemic lockdown at the time this study was conducted made in-person sessions impossible due to safety and ethical concerns, as well as organizational policy. Consequently, we were forced to limit the study to mobile users and a simulated situated view, and did not link the virtual objects with real-world ones.

5.4.4 Tasks

We asked users to perform a series of navigation tasks (see Figures 5.4e, 5.4d, and 5.4f), each of which initialized a sequence of analysis tasks in which the user was required to answer multiple choice questions (see Figures 5.4e, 5.4b, 5.4a, and 5.4c) about an observation in the synthetic dataset described in Section 5.4.2. The sequence and permutations of tasks requested of the user are detailed in Table 5.2. The techniques used for navigation tasks include a guided data tour using an arrow on the floor (Figure 5.3), the WIM (Figure 5.1), and a control condition in which a blue ring marker was rendered directly beneath the target POI. When the user's viewport reached a position within 0.5 meters of the POI, the user was prompted by the implementation with the first of a series of three multiple choice questions about variables represented via the position of square marks relative to three axes, each question referencing a different variable from the synthetic dataset. This sequence (navigate, then answer three multiple choice questions) was repeated six times. The multiple choice questions during the first four repetitions referred to only one analysis technique per repetition. The final two repetitions cycled through the analytical techniques, with a different technique being applied for each question.
For the first two repetitions, the guided tour arrow (Figures 5.3, 5.4f) was used for the navigation task; the following two repetitions used the WIM (Figures 5.1, 5.4d); the final two repetitions used the control condition of a blue ring beneath the target point of interest (Figure 5.4e). The first of the four analysis conditions used was the control condition of a 3D VRIA chart (Figure 5.4e). This sequence was followed by another sequence using the exterior wall projection view of the Shadowbox (Figures 5.2a, 5.4a), followed by the interior wall projection (Figures 5.2b, 5.4b), followed by the unfolded view (Figures 5.2c and 5.4c).

Our rationale for selecting the techniques we chose for the tasks, and for omitting the summon and dispel and cutting plane techniques, is as follows: To begin with, we opted to omit the summon and dispel technique for either navigation or analysis tasks because it negates the "situatedness" of the visualizations by clustering them all in front of the viewer. Another reason that we omitted the summon and dispel technique, as mentioned at the beginning of Section 5.4, is that it is more appropriate to the larger public or outdoor spaces we initially envisioned our study taking place in, not the smaller living spaces the pandemic forced us to conduct our study in. The WIM and guided data tour both address or partly address the challenges of physical navigation, physical reach, and spatial/temporal continuity (Table 5.1), and so they meet the criteria of being appropriate for navigation tasks. Our navigation control condition (the use of a hovering ring beneath the navigation target) met the criteria for indicating that a point in space was a target of interest, but we did not feel that it was a good candidate for inclusion in our list of techniques by itself (Section 5.2), because it is more a standalone mark than a full-fledged technique. We chose the three states of the Shadowbox as our sole test condition, omitting the cutting plane, mainly for two reasons: First, the Shadowbox is arguably a novel technique contributed by this work, while cutting planes are not. Second, we believed that the introduction of a fifth analysis task condition with an interactive user input mechanism would push the task load for our users from reasonable to onerous. Finally, we did not evaluate cutting planes in part because none of our abstract datasets benefited from this technique; it is most suited for volumetric 3D representations. In practice, the time required to complete the tasks did indeed approach the maximum amount of time several users were willing to commit. The control condition for analytical tasks, having the user respond to questions using a VRIA figure rather than a shadowbox, represents what we believe to be a reasonable default of using the targets of the Shadowboxes' projections and removing the Shadowboxes.

5.5 SA View Management Findings

Figure 5.5: Distribution of each user's average task completion time for navigation tasks (wayfinding to target point of interest) by technique, shown as box plots and excluding the first navigation task using each technique. The white area of each box plot begins on the left at the 25th percentile, is split at the 50th percentile, and ends at the 75th percentile; the whiskers extend from the 25th and 75th percentile hinges to the farthest observation within 1.5 times the inter-quartile range of either hinge.

In Batch et al. [22], each user was exposed to each technique multiple times, and there was a clear discrepancy between task performance during the tutorial attempt relative to
all following task completion attempts, as the users were still acclimating to the technique during the tutorial stage. For this reason, we opted to evaluate only observations after the first tutorial attempt per technique (i.e., to exclude the first attempts).

Despite the guided tour arrow technique being users' very first method for navigation in the AR scene, users were able to locate the target POI significantly faster with it than with the other two techniques (Figure 5.5). Users also reported feeling most confident in their success at the navigation task when they were using the guided tour in the NASA TLX, and generally did not feel stressed, under time pressure, or as if they had to physically or mentally overexert themselves while using the technique (Figure 5.10). Users took significantly longer finding their way to the target POI using the WIM; like the guided tour, the task completion time matches the users' responses to the NASA TLX, where they reported feeling least successful and generally quite negative about the WIM's application to navigation tasks.

User question response correctness was slightly superior while using the Shadowbox's folded view with 2D charts projected onto the interior walls of the box (Figure 5.2b) relative to 3D charts, although the inverted, exterior projection had a long right tail, with several users correctly answering all questions using the exterior projection. The unfolded Shadowbox performed poorly relative to other techniques, with users answering fewer questions correctly using this view than other views. Despite its poor performance in correctness, the unfolded view did see a faster response time than all other analysis techniques (Figure 5.7), followed by the interior projection and the exterior-wall projection views (roughly tied, with the interior projection having slightly faster mean times but a long tail of slower responses), while 3D view responses took significantly longer.

Figure 5.6: Distribution of average correctness (share of questions answered correctly) for each user by technique, excluding the first attempted question response using each technique.

Participants' use of space is summarized in Figure 5.8. A disclaimer to these results should be provided immediately for the sake of transparency: During three user sessions, the positions of the users became untrackable, resulting in "null" values being recorded for a small portion of the experiment near its end; this issue had not been encountered during the pilots, and we were unable to trace the cause. All users were using different models of mobile devices, so the hardware cannot be pinpointed as the root cause of the matter. Broadly, users preferred viewing the POIs with their viewport at a height between 1.2 and 1.5 meters. The exception to this is most notably the unfolded Shadowbox, which saw users'
viewing angles diverge dramatically from those of all other techniques; they raised their devices higher while answering questions using the unfolded Shadowbox than they did with the other techniques. They also tended to view the unfolded Shadowbox at a greater distance; this was particularly true during their first sequence of questions using the technique, and participants then tended to move closer to the center of the POI for later attempts at interpreting dataset values using this technique. In conjunction with the generally poor survey rating (Figure 5.11) and correctness (Figure 5.6) associated with the unfolded Shadowbox, along with user feedback, we must conclude that the reason for this was that users suffered some difficulty in actually viewing the panels of the unfolded Shadowbox; they also encountered minor occlusion issues, as the remaining 3D VRIA objects were left in the view during this stage of the experiment. Conversely, users tended to prefer viewing interior wall projection Shadowboxes at a closer distance than the other techniques, and moved closer to the POI during the first set of questions using the technique, but then spent more time farther away from the POI during the final question using this technique. In fact, this pattern of dramatically changing the distance and then reverting to a distance more similar to the one observed during the first sequence appears in all techniques except for the unfolded Shadowbox.

Figure 5.7: Distribution of each user's average task completion time for analysis tasks (multiple choice question answering) by technique, excluding the first attempted question response using each technique.

When asked to rank their preferred techniques for navigation, users responded resoundingly against the WIM (Figure 5.9a), with the target marker coming in slightly behind the guided tour as most preferred. When asked to rank their preferred techniques for analysis, users preferred the 3D chart (Figure 5.9b), and, somewhat surprisingly, gave the interior projection view of the Shadowbox the fewest votes.

The measure of "general effort for completion" shown in Figures 5.10 and 5.11 corresponds to the NASA TLX question "How hard did you have to work to accomplish your level of performance?"3 Users reported finding the general amount of effort required to achieve the level of performance they did during navigation sequences of the sessions to be greatest for the guided tour. However, upon reflection and review of the transcripts and observational data discussed above, this appears to be a result of a common misunderstanding of the scale for this question, as several users who explicitly mentioned finding the WIM difficult to use and/or finding the guided tour easy to use rated the WIM as requiring less effort to achieve their level of performance than they rated the guided tour as requiring. In light of this, we must also disregard the results for this question as applied to the analysis task techniques. However, we have opted to include these results in this paper for the sake of transparency.

3 We did not rephrase this question, or any other NASA TLX question; the labels are shortened only for representation in Figures 5.10 and 5.11.
Despite the possible misinterpretation of the question regarding general effort for completion, the remaining patterns in navigation tasks largely match observations, feedback, and results; users reported that the guided tour left them feeling less stressed and irritated, less pressed for time, and required less physical or mental exertion than the other navigation techniques, although it did see competition from the ring marker control condition. One user volunteered that they found the guided tour superior for finding their way to the target POI, but the ring marker did a better job of helping them pick the right POI when multiple POIs were situated near each other.

Figure 5.8: Distributions of user means in viewport height (Y-axis, in meters) and floor distance (XZ-axes, in meters) from the active point of interest (POI) during the question-answering period for each analysis technique, with mean user-POI distances, heights, and the order in which the user interacted with each POI annotated on the marks. These results exclude the first attempted question response for each technique.

Figure 5.9: Technique ranked vote results: (a) navigation technique preferences (World-in-Miniature, Blue Ring Marker, Guiding Arrow) and (b) analysis technique preferences (Interior Wall S.B., Unfolded S.B., Exterior Wall S.B., 3D Chart).

Figure 5.10: Survey response measures for navigation tasks based on the NASA TLX. The yellow/red spectrum is bad, blue or green is good, color intensity represents response intensity, and bar length indicates the share of users in each category. Abbreviations: [GT]: Guided Tour; [RM]: Ring Marker (control condition); [WM]: World-in-Miniature. Note that the labels have been shortened from the phrasing used in the survey itself to improve the legibility of this figure.
On the other hand, other users did find the ring markers easy to pick out or fun to hunt down. The general consensus among users was that the WIM was not very effective for navigation, but that a large part of the problem was the mechanism by which the users controlled its rotation (namely, via the orientation of the phone). Users reported feeling the most successful in their task completion while using the guided tour, and the accuracy of this sentiment is strongly reflected in the task completion times shown in Figure 5.5.

As with the navigation task survey responses, we must disregard the responses measuring "general effort for completion" in light of conflicting interpretations of the question by users. In general, users seemed to feel most comfortable with the 3D VRIA chart visualizations acting as our control condition, which they reported as requiring less mental exertion and inflicting less stress and irritation upon them relative to other techniques. The 3D VRIA charts performed generally well in the users' survey responses for most categories, and users reported feeling most successful using this technique, despite answering more questions correctly using the interior wall projection view of the Shadowbox, and despite taking longer to complete their tasks using the 3D chart than using any other technique. This may be a result of the view itself being somewhat more common relative to the Shadowbox views. Despite this, users did report that the 3D charts required more physical exertion than any other technique, while the interior wall view of the Shadowbox required the least. They also reported that the interior wall projection made them feel least pressed for time. The unfolded Shadowbox was received generally negatively for reasons noted above.

Figure 5.11: Survey response measures for analysis tasks based on the NASA TLX. The yellow/red spectrum is bad, blue or green is good, color intensity represents response intensity, and bar length indicates the share of users in each category. Abbreviations: [3D]: 3D charts (control condition); [ES]: 2D projection onto the exterior walls of the Shadowbox; [IS]: 2D projection onto the interior walls of the Shadowbox; [US]: unfolded Shadowbox. As with Figure 5.10, the labels have been shortened for legibility.

One aspect of the use of situated visualizations that appeared during our user sessions but has not been discussed thus far is that of fun and enjoyment. In response to open-ended questions about their experiences, four users described the ability to move around the scene as "fun" without being prompted, saying that "[it] was fun to navigate", "I like [being able to move and look at charts] from all around, such fun!", "hunting around for [the target ring marker] actually added some fun to the session", and "AR mode was fun!" Several other users who did not mention the factor of "funness"
in the survey did express similar sentiments verbally during the user sessions, and in two cases the user's onlooking partner made note of how fun it was for them to watch the session in progress despite not participating themselves.

5.6 A Discussion of Post-Experiment Thoughts

In Batch et al. [22], we presented an analysis of the challenges of view management for situated visualizations, enumerating concerns such as physical distance and reach, orientation and legibility, and depth and occlusion. These challenges apply both to the components of a single situated visualization and to multiple visualizations that exist in the same physical environment. Based on this analysis, we propose a set of interaction, layout, and presentation techniques that are designed to mitigate these challenges.

Our results indicate that the interior projection view of the Shadowbox outperforms other techniques when the three factors of completion time, correctness of interpreting data, and user experience are all taken into consideration. It is closely followed by the 3D VRIA charts in correctness, slightly outperformed by the 3D charts in the survey, and, like every other technique we evaluated, beaten outright in task completion time by the unfolded Shadowbox. Our results also indicate that the guided tour arrow outperforms the other techniques when completion time and user experience are taken into consideration, given that it received the most favorable ratings on most measures and that user navigation time was significantly faster using this technique than with the other two techniques evaluated in this study. The WIM did not perform well for user navigation, but part of the blame for this result lies with the rotation mechanism in our implementation. Our results also indicated, anecdotally, that users find data exploration fun when it is in AR.

Beyond these broad results, there were a number of details in our results that were conflicting. One such detail was in the participant use of space. We suspect that several of the following factors may be at play in explaining the pattern of differences in users' view distance during the second task discussed in Section 5.5. The difference from the first to the second sequence may be the result of users becoming more competent with the system by the second sequence, correcting view distance issues encountered during the first sequence. The difference from the second to the third sequence may be a result of the user, now a seasoned expert, feeling free to take a leisurely stroll around the scene. Given the potential for confounding factors, we examined the data for outliers, but after reproducing Figure 5.8 several times with one potential outlier user omitted each time, we found that the results did not significantly diverge from Figure 5.8, which includes all users.

There is always the possibility of measurement error as an explanation; if that is true in this case, it may be the result of some of the limitations of the WebXR utilities used in the design of our implementation, which represent the current state of the art in AR-in-the-browser applications. When the user initializes an AR session, the scene camera no longer reports the user's position. As a workaround, we attached a three.js Object3D to the camera, and the world position of this object could be used to derive the user's position. However, upon review in VR views of the scene, the coordinates logged by the Object3D did not always perfectly correspond to those of the camera.
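A minimal sketch of this kind of workaround is shown below; it is illustrative rather than the study's actual code (the /api/telemetry endpoint and payload fields are assumptions), but it reflects the approach of parenting a proxy Object3D to the camera and logging its world pose once per second, as described in Section 5.4.2. Because the proxy's world transform is derived from the camera's, any drift between the two propagates directly into the logged coordinates, which is consistent with the mismatch we observed.

```typescript
// Illustrative sketch (not the actual study code): attach a proxy
// Object3D to the camera so its world transform can stand in for the
// user's pose after an AR session starts, and post it once per second.
// The /api/telemetry endpoint and payload shape are assumptions.
import * as THREE from 'three';

function startTelemetry(camera: THREE.Camera, sessionId: string): number {
  const proxy = new THREE.Object3D();
  camera.add(proxy); // the proxy inherits the camera's world transform

  const position = new THREE.Vector3();
  const rotation = new THREE.Quaternion();

  return window.setInterval(() => {
    proxy.getWorldPosition(position);
    proxy.getWorldQuaternion(rotation);
    fetch('/api/telemetry', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        sessionId,
        position: position.toArray(),
        rotation: rotation.toArray(),
        clientTime: Date.now(),
      }),
    });
  }, 1000);
}
```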
A different issue related to the remote nature of the study was user interpretation of subjective survey measures, which could possibly have been avoided if the researchers and users had been sharing a room, as it may have been easier to notice the discrepancy between user responses relative to user remarks and researcher observations. The brief window of time during which user studies were conducted, and the lack of opportunity for a chance observation, precluded a thorough review of survey responses as they were completed. Future remote research in WebXR should mind such pitfalls in data collection, but we are enthusiastic about its potential.

Part III: Models for Interpreting User Session Data

Chapter 6: Gesture and Action Discovery

With the exception of models such as GOMS, Fitts' law, and the Steering Law, there are few theoretical models in human-computer interaction (HCI) sufficiently sophisticated to enable formal verification. For this reason, validating a new technique, innovation, or system in HCI often leaves empirical evaluation as the only available recourse. It is not uncommon that such empirical evaluation reduces to systematically observing and coding video recordings of users engaging with the interactive system. This is particularly true for virtual reality (VR) systems, where the interaction to be evaluated may involve the user's whole body. However, such coding is generally costly, time-consuming, and prone to inconsistencies [134]. Furthermore, it often requires multiple coders agreeing on and calibrating a common code book. Finally, the very nature of this process injects subjective biases that make the experiment results difficult to reproduce [172].

To address this issue, we propose a semi-automated computer vision system for behavioral coding of videos that will make the process more robust and scalable. Existing systems and action recognition studies tend to focus on actions that are familiar and meaningful in a larger range of contexts, such as walking, running, eating, or opening the fridge. These studies are able to take advantage of large amounts of publicly-available video data, but such datasets are not typically applicable to usability testing of a novel VR system. In VR, users perform specialized actions to interact with objects in the virtual environment using a custom interface. The actions may take the form of moving parts of the body in a non-generalizable way, such as swinging the arms diagonally or tilting the head. These actions do not necessarily map perfectly to real-world scenarios that may be assigned semantic labels in existing video datasets. In our scenario, researchers might be more interested in outlier actions and want to manually filter out certain actions indicative of bugs in the system in unlabeled videos. An automated system that segments videos into potential actions of interest (AOI) and indicates these in unlabeled videos would make the process faster, more scalable, and easier to reproduce. Our approach also draws on additional data (such as depth, audio, and tracked marker positions) synchronized with video for efficient identification of AOIs. We believe that this approach is well-suited to most contemporary VR devices, which rely on sensors to detect the positions of the user's controllers and head-mounted display (HMD). In a user study involving the collection of video data, this ground truth can be used for video segmentation.
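As a purely illustrative sketch of what such synchronized ground truth can look like, the snippet below pairs timestamped telemetry samples (HMD and controller poses) with video frame indices; the record fields and the 30 fps assumption are mine, not the format of any particular dataset or system.

```typescript
// Illustrative sketch: map timestamped telemetry samples onto video
// frame indices so that segment boundaries found in the telemetry can
// label spans of video frames as pseudo-ground-truth. Field names and
// the default 30 fps are assumptions.

interface Pose {
  position: [number, number, number];
  rotation: [number, number, number, number]; // quaternion
}

interface TelemetrySample {
  timeMs: number; // milliseconds since session start
  hmd: Pose;
  leftController: Pose;
  rightController: Pose;
}

function toFrameIndex(sample: TelemetrySample, fps = 30): number {
  return Math.round((sample.timeMs / 1000) * fps);
}

// A segment found in the telemetry (e.g., by change-point detection)
// then labels the corresponding span of video frames.
function segmentToFrameRange(startMs: number, endMs: number, fps = 30): [number, number] {
  return [Math.round((startMs / 1000) * fps), Math.round((endMs / 1000) * fps)];
}
```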
Our pipeline can be used by HCI researchers who collect video and telemetry data capturing users during sessions in which they explore the virtual environment. After all user data has been collected, the telemetry data is segmented, and the segments are clustered into micro-gesture classes using a set of statistical methods described in Section 6.1. The video data is temporally labeled with gesture codes based on the telemetry segmentation, and predicting these gestures becomes the training/testing target of our neural network architecture. While our results leave some room for improvement, we believe that, given the lack of a semantic ground truth, our model performs with reasonably high accuracy relative to the current state of the art in action discovery.

6.1 Observational Data Modeling

In Batch et al. [20], we proposed a novel pipeline for semi-supervised behavioral coding of videos of users testing a device or interface, with an eye toward human-computer interaction evaluation for virtual reality. Our system applied existing statistical techniques for time-series classification, including e-divisive change point detection and Symbolic Aggregate approXimation (SAX) with agglomerative hierarchical clustering, to 3D pose telemetry data. These techniques create classes of short segments of single-person video data: short actions of potential interest that we call "micro-gestures." A long short-term memory (LSTM) layer then learns these micro-gestures from pose features generated purely from video via a pre-trained OpenPose convolutional neural network (CNN) to predict their occurrence in unlabeled test videos. We present and discuss the results from testing our system on the single-user pose videos of the CMU Panoptic Dataset.

Figure 6.1 shows the overall pipeline for our system. The videos in our dataset had synchronized 3D pose data available that was used in the training phase. The output of this part of the pipeline was a list of pseudo-ground-truth labels for a selection of "micro-gestures" detected in this 3D data, which act as our AOI.

1. Clustering Phase: The synchronized 3D pose data is converted to features indicating temporal variance using e-divisive change-point detection followed by SAX edit distance matrix transformation. A hierarchical clustering method is then used to group these features into clusters that indicate similar video segments. A researcher can then be presented with an interface that displays the identified video segment groups to check for qualitative similarity and a set of potential AOI.

2. Training Phase: The AOI video segments act as training data for the following LSTM network. The input frames are converted into pose feature vectors using a pre-trained CNN, and the output of the LSTM is an action label for each input frame.

3. Testing Phase: In the testing phase, we use only the unlabeled video to predict an action label for every frame. The ground truth for these labels is the output of the hierarchical clustering.

Figure 6.1: Our overall pipeline.

6.1.1 Statistical methods

We exploit the spatio-temporal continuity of the human body by using sensor data tracking human joint positions to temporally segment full-length video into short (less than 15 seconds) micro-gestures. Before beginning the process, we select eleven angles (θ) between 15 joints from the CMU Panoptic dataset (Figure 6.2); a sketch of one such angle computation follows below.

Figure 6.2: Subject joint angles.
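As a concrete illustration of the joint-angle features, the following is a minimal sketch of computing one angle θ at a middle joint (for example, an elbow between shoulder and wrist) from three tracked 3D positions. The function name and joint choice are illustrative only; the eleven angles actually used are those shown in Figure 6.2.

type Vec3 = [number, number, number];

// Angle (radians) at `center` formed by the segments center->a and center->b.
function jointAngle(a: Vec3, center: Vec3, b: Vec3): number {
  const u: Vec3 = [a[0] - center[0], a[1] - center[1], a[2] - center[2]];
  const v: Vec3 = [b[0] - center[0], b[1] - center[1], b[2] - center[2]];
  const dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
  const norms = Math.hypot(...u) * Math.hypot(...v);
  // Clamp to guard against floating-point values slightly outside [-1, 1].
  return Math.acos(Math.min(1, Math.max(-1, dot / norms)));
}

Evaluating such a function per frame yields, for each selected joint triple, a univariate angle series whose changes over time drive the segmentation described next.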
6.1.1.1 Joint Angle Segmentation

Matteson and James [155] originate the e-divisive method for detecting changes in the mean of multivariate time series. The estimated temporal divergence measure for any two joints $\theta^X, \theta^Y \in \mathbb{R}^d$ is given by Equation 6.1,

$$\hat{\mathcal{E}}(\theta^X_n, \theta^Y_m; \alpha) = \frac{2}{mn} \sum_{i=1}^{n} \sum_{j=1}^{m} \left|\theta^X_i - \theta^Y_j\right|^{\alpha} - \hat{\delta}(\theta^X_n; \alpha) - \hat{\delta}(\theta^Y_m; \alpha) \qquad (6.1)$$

where $\alpha$ is some value between 0 and 2; following James and Matteson [112], we use $\alpha = 1$. $\hat{\delta}$ is given by Equation 6.2,

$$\hat{\delta}(\theta_n; \alpha) = \binom{n}{2}^{-1} \sum_{1 \le i < k \le n} \left|\theta_i - \theta_k\right|^{\alpha} \qquad (6.2)$$
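For reference, the following is a minimal TypeScript transcription of Equations 6.1 and 6.2 for scalar (one-dimensional) joint-angle samples, with α = 1 as the default. It is a sketch of the divergence measure itself, not of the full e-divisive change-point search, and the function names are illustrative rather than taken from our implementation.

// Equation 6.2: within-sample term, (n choose 2)^{-1} times the sum over pairs of |theta_i - theta_k|^alpha.
function deltaHat(theta: number[], alpha: number): number {
  const n = theta.length;
  let sum = 0;
  for (let i = 0; i < n; i++)
    for (let k = i + 1; k < n; k++)
      sum += Math.abs(theta[i] - theta[k]) ** alpha;
  return sum / (n * (n - 1) / 2);
}

// Equation 6.1: estimated divergence between samples thetaX (length n) and thetaY (length m).
function eHat(thetaX: number[], thetaY: number[], alpha = 1): number {
  const n = thetaX.length, m = thetaY.length;
  let cross = 0;
  for (let i = 0; i < n; i++)
    for (let j = 0; j < m; j++)
      cross += Math.abs(thetaX[i] - thetaY[j]) ** alpha;
  return (2 / (m * n)) * cross - deltaHat(thetaX, alpha) - deltaHat(thetaY, alpha);
}

A large value of eHat for two adjacent windows of an angle series indicates a likely change point between them, which is the signal the segmentation step exploits.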