ABSTRACT

Title of dissertation: TOWARDS VISUAL ANALYTICS IN VIRTUAL ENVIRONMENTS

Eric Krokos, Doctor of Philosophy, 2018

Dissertation directed by: Professor Amitabh Varshney, Department of Computer Science

Virtual reality (VR) is poised to become the new medium through which we engage, view, and consume content. In contrast to traditional 2D desktop displays, which restrict our interaction space to an arbitrary 2D plane with unnatural interaction mechanisms, VR expands the visualization and interaction space into our 3D domain, enabling natural observation of and interaction with information. With the rise of Big Data, processing and visualizing such enormous datasets is of utmost importance and remains a difficult challenge. Machine learning, specifically deep learning, is rising to meet this challenge. In this work, we present several studies: (a) demonstrating the effectiveness of immersive environments over traditional desktops for memory recall, (b) quantifying cybersickness in virtual environments, (c) enabling human analysts and deep learning to support, refine, and enhance each other through visualization, and (d) visualizing root-DNS information, enabling analysts to find new and interesting anomalies and patterns.

In our first work, we conduct a user study in which participants memorize and recall a series of spatially-distributed faces on both a desktop and a head-mounted display (HMD). We found that the use of virtual memory palaces in the HMD condition improves recall accuracy compared with the traditional desktop condition; this improvement was statistically significant. Next, we present our work on quantifying cybersickness through EEG analysis. We found statistically significant correlations between increases in delta, theta, and alpha brain waves and self-reported sickness levels, enabling future virtual reality developers to design countermeasures. Third, we present our work on enabling domain experts to discover hidden labels and communities within unlabeled (or coarsely labeled) high-dimensional datasets using deep learning with visualization. Lastly, we present a 3D visualization of root-DNS traffic, revealing characteristics of a DDOS attack and changes in the distribution of queries received over time. Together, this work takes the first steps in bringing together machine learning, visual analytics, and virtual reality.

TOWARDS VISUAL ANALYTICS IN VIRTUAL ENVIRONMENTS

by Eric Krokos

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2018

Advisory Committee:
Professor Amitabh Varshney, Chair/Advisor
Professor Joseph JaJa
Professor Matthias Zwicker
Professor John Dickerson
Dr. Kirsten Whitley

© Copyright by Eric Krokos, 2018

Table of Contents

List of Tables
List of Figures

1 Introduction
   1.1 Virtual Memory Palaces
      1.1.1 Introduction
      1.1.2 Approach
      1.1.3 Results
   1.2 Characterizing Vection-Induced Cyber Sickness using EEG
      1.2.1 Introduction
      1.2.2 Approach
      1.2.3 Results
   1.3 Enhancing Deep Learning with Visual Interactions
      1.3.1 Introduction
      1.3.2 Approach
      1.3.3 Results
   1.4 Visual Analytics for Root DNS Data
      1.4.1 Introduction
      1.4.2 Approach
      1.4.3 Results
2 Virtual Memory Palaces: Immersion aids Recall
   2.1 Introduction
   2.2 Related Work
      2.2.1 Memory Palaces on a Desktop Monitor
      2.2.2 Memory Palaces on Multiple Displays
      2.2.3 Search and Recall in Head-mounted Displays
      2.2.4 Embodied Interaction and Recall
   2.3 Method
      2.3.1 Participants
      2.3.2 Materials
      2.3.3 Design
      2.3.4 Procedure
   2.4 Results
      2.4.1 Task Performance
      2.4.2 Errors and Skips
      2.4.3 Confidence
      2.4.4 Ordering Effect
   2.5 Discussion
      2.5.1 Study Limitations
      2.5.2 Conclusions
      2.5.3 Future Work
3 Interactive Characterization of Cybersickness in Virtual Environments using EEG
   3.1 Introduction
   3.2 Related Work
      3.2.1 Self-reporting Cybersickness
      3.2.2 Measuring Motion Sickness with EEG
   3.3 Materials and Methods
      3.3.1 Participants
      3.3.2 Experimental Protocol
      3.3.3 Signal Acquisition and Pre-Processing
      3.3.4 Independent Component Analysis
      3.3.5 Time-Frequency Analysis
   3.4 Results
      3.4.1 Self-Reported Cybersickness
      3.4.2 Spectral Differences
      3.4.3 Time-Frequency with User Input Signals
      3.4.4 External Factors
   3.5 Conclusions and Future Work
4 Enhancing Deep Learning with Visual Interactions
   4.1 Introduction
   4.2 Related Work
      4.2.1 Dimensionality Reduction
      4.2.2 High-Dimensional Community Visualization
      4.2.3 Interactive Analysis of High-Dimensional Data
      4.2.4 Label Generation
      4.2.5 Deep Learning Semi-Supervised Classification
      4.2.6 Interactive Intelligent Systems and Active Learning
   4.3 Our Approach
      4.3.1 Point-Distribution Generation
      4.3.2 Cluster Visualization and Manipulation
   4.4 Results
      4.4.1 Pavia University Dataset
      4.4.2 Salinas Valley
      4.4.3 User Study
      4.4.4 Interpreting Discovered Labels
      4.4.5 DNS Query Dataset
   4.5 Discussion
   4.6 Conclusions
5 Visual Analytics for Root DNS Data
   5.1 Introduction
   5.2 Background
      5.2.1 Traditional 2D Network and DNS Visualization
      5.2.2 3D Network Visualization
   5.3 Problem and Solution
      5.3.1 The Challenge
      5.3.2 Approach Overview
      5.3.3 Flow-Map IP-Space Visualization
      5.3.4 IP-Space Observations
      5.3.5 Deep Learning Driven Query Space Visualization
         5.3.5.1 3D Query Flow Visualization
      5.3.6 Dual IP-Query Visualization Interaction
   5.4 Empirical Validation
      5.4.1 DNS Expert A
      5.4.2 DNS Expert B
      5.4.3 DNS Expert C
   5.5 Conclusions
6 Conclusions and Future Work
   6.1 Conclusions
   6.2 Future Work
   6.3 Peer Reviewed Publications
Bibliography

List of Tables

3.1 Correlations (Pearson r-values) between average ERSP values for the four frequency bands and the self-reported cybersickness levels. All the correlations are statistically significant (p < 0.001). The graphs of the various frequency bands for cluster A can be seen in Figure 3.10.
1 Trend scores for each face in Face Set 1 from Google Trends, with an average trend score of 30.5 and a standard deviation of 21.86. The data was collected in April, May, June, and July of 2015.
2 Trend scores for each face in Face Set 2 from Google Trends, with an average trend score of 29.83 and a standard deviation of 18.32. The data was collected in April, May, June, and July of 2015.
3 Angular resolution of faces in the Town and Palace scenes, with the average and standard deviation of the angular resolutions of the set of faces for each scene. The difference in angular resolutions between the two scenes was not statistically significant (p = 0.44 > 0.05).

List of Figures

1.1 One of the virtual memory palace scenes used in our user study: (left) an ornate palace showing some of the faces used, and (right) the same palace with the faces replaced by numbers.
1.2 The overall average recall performance of participants using an HMD was 8.8% higher compared with a desktop.
1.3 The overall confidence scores of participants using an HMD and a desktop.
1.4 The distribution of incorrect answers for each display modality, showing the median, first, and third quartiles.
1.5 The distribution of Simulator Sickness Questionnaire (SSQ) scores obtained from participants after the experiment.
1.6 The names and locations of the 14 EEG electrodes in the Emotiv Epoc headset.
1.7 Comparison of the EEG power spectra between the baseline (blue) and virtual flythrough (green) for ICA cluster A.
1.8 Average over four frequency bands for cluster A compared with the average self-reported cybersickness (in green).
1.9 A brief illustration of the difference between traditional deep learning techniques and our approach.
1.10 A comparison of the initial coarse labeling, the generated refined labels, and the precise (true) labels of the Pavia University dataset after three iterations.
1.11 A comparison of the initial coarse labeling, the generated refined labels, and the precise (true) distribution of labels of the Salinas Valley dataset after five iterations.
1.12 The evolution of the query-space representation over eight iterations, showing the influence of the iterative labeling.
1.13 Our root-DNS dual-visualization that provides both high- and low-level overviews and interactions of the IP and query spaces.
1.14 Overview of the process from raw pcap files to the Flow-Map IP-Space visualization.
1.15 The 2D query-space clustered into semantic query categories (lower left), and the temporal query visualization (lower right).
1.16 The region of the query-space consisting of different distributions of random characters.
1.17 The region of the query-space consisting of different distributions of IP addresses, fragments, and expressions.
1.18 The region of the query-space consisting of unusual characters and queries.
2.1 Giulio Camillo's depiction of a memory palace (1511 AD). Memory palaces like this have been used since classical times as a spatial mnemonic.
2.2 The two virtual memory palace scenes used in our user study: (a) an ornate palace, and (b) a medieval town, as seen from the view of the participants.
2.3 Virtual memory palace: recall phase.
2.4 Locations of faces and numbers in the virtual memory palaces used in our user study: (a) an ornate palace, and (b) a medieval town. Note that this is not the view the participants had during the experiment; these pictures are used to convey the distribution of the face locations. The participants were placed in the middle of these scenes, surrounded by the faces, as seen in Figure 2.2.
2.5 The overall average recall performance of participants in the HMD condition was 8.8% higher compared with the desktop condition. The median recall accuracy for the HMD was 90.48% and for the desktop display was 78.57%. The figure shows the first and third quartiles for each display modality.
2.6 The distribution of incorrect answers for each display modality, showing the median, first, and third quartiles.
2.7 The distribution of faces skipped during recall for each display modality, showing the median, first, and third quartiles.
2.8 The overall confidence scores of participants in the HMD condition and the desktop condition. Each participant gave a confidence score between 1 and 10 for each face they recalled. Those in the HMD condition were slightly more confident about their answers than those in the desktop condition.
2.9 The number of errors made for each display condition for various confidence levels.
2.10 The performance of participants going from a desktop to an HMD and from an HMD to a desktop, showing the median, first, and third quartiles.
3.1 A still from the virtual spaceport flythrough used in our cybersickness study.
3.2 Averaged scalp maps of clustered independent components. The scalp map which correlated with cybersickness is shown in the black box.
3.3 The names and locations of the 14 EEG electrodes in the Emotiv Epoc headset.
3.4 The virtual camera flythrough of the spaceport that each participant in our study experienced. Note how the frames correspond to the self-reported cybersickness levels in Figure 3.5.
3.5 The self-reported cybersickness levels, reported using a joystick, for each participant are shown in the thin colored curves. The bold black curve shows the average of all participants' self-reported cybersickness levels.
3.6 Participant Simulator Sickness Questionnaire (SSQ) scores after the experiment. The plot shows the median, first and third quartiles (orange and grey respectively), with the minimum and maximum shown as error bars.
3.7 A comparison of the average score as reported by the joystick with the SSQ sum for each participant. The SSQ score and the self-reported cybersickness using the joystick have a Pearson correlation r-value of 0.49.
3.8 Comparison of the EEG power spectra between the baseline (blue) and virtual flythrough (green) for ICA cluster A. The paired t-test with Bonferroni correction between the two spectra reveals p < 0.001 for much of the frequency range.
3.9 Time-frequency visualization of cluster A. The average self-reported cybersickness levels are shown below in red.
3.10 Average over four frequency bands for cluster A compared with the average self-reported cybersickness (in green).
3.11 Visualization of the ERSP from a cluster A participant with self-reported cybersickness levels. Note how the changes in ERSP values, especially for the delta and theta bands, align with the participant's self-reported cybersickness.
4.1 A brief illustration of the difference between traditional deep learning techniques and our approach. Deep learning traditionally requires a large, time-consuming, and precisely labeled dataset for training. For many different reasons, such datasets may be inappropriately labeled. In our approach, we start with coarse labels (that are typically far easier to construct) and then refine them through an iterative process involving visual interactions and deep learning.
4.2 Equations used in calculating the contrastive loss for the Siamese network.
4.3 An example of the result of running the variational autoencoder and then refining that result using the Siamese network. By running the Siamese network after the autoencoder, the generated clusters tend to be tighter with more space in between, making the individual clusters easier to identify. The user has the ability to adjust the number of iterations the Siamese network runs, which affects the tightness of the clusters.
4.4 A visual representation of the network structure used in both the variational autoencoder and the Siamese network. The same network structure and weights are shared across the networks. First the variational autoencoder runs, and the Siamese network continues using the network weights generated by the autoencoder. After the Siamese network has run, the resulting network is used to generate the 2D distribution of points. The network weights are also saved, reused, and refined in the following iterations.
4.5 Initial view of points, with all points given the same color; points are only assigned a color once selected or activated by the user.
4.6 The menu interface presented when a user selects a point/group.
4.7 Selection of points through manual paint-brush and circle selection interaction.
4.8 Equations used to compute normalized cut.
4.9 Segmentation of a set of points using normalized cut (Ncut).
4.10 Common interaction techniques.
4.11 Handling of overlapping sets of points.
4.12 An example where a few points are selected from a parent cluster, and over time the points are separated away from the parent cluster to form their own cluster.
4.13 The resilience of the algorithm to labeling mistakes.
4.14 A comparison of the initial coarse labeling, the generated refined labels, and the precise (true) labels of the Pavia University dataset after three iterations. Starting from the three initial categories (natural surfaces, roads, and buildings), we were able to reconstruct the distribution of the 9 labels with an accuracy of 88.2%.
4.15 A comparison between the labeling generated by our system and the ground-truth labeling. The labeling generated by our system is driven by the clearly distinct group off the main body of points above it. This difference is supported by a visual difference in the aerial view.
4.16 A comparison of the initial coarse labeling, the generated refined labels, and the precise (true) distribution of labels of the Salinas Valley dataset after five iterations. Starting with 6 initial coarse labels, we were able to reconstruct the distribution of the 16 labels with an accuracy of 97.4%.
4.17 Participants' accuracy and timings for the Pavia University dataset.
4.18 Participants' accuracy and timings for the Salinas Valley dataset.
4.19 A potential newly discovered sub-group within the painted metal sheets label in the Pavia University dataset. The left shows the point distribution and spatial representation of the labeling as generated by our technique. The right shows a further iteration in which the current points are sub-divided, and the resulting spatial representation.
4.20 A potential newly discovered sub-group within the untrained grapes label in the Salinas Valley dataset. The left figure shows the meta information used, the aerial view of the valley. The middle image shows the labeling as generated using our approach. The right image shows the labeling as given by the dataset. The yellow labeling portion in the middle image matches a discoloring in the aerial view, which suggests that there may be a different material there.
4.21 The evolution of the query-space representation over eight iterations, showing the influence of the iterative labeling. Note that for some clusters certain colors have been re-used due to the high number of groups.
4.22 The output of the DNS query investigation, coarsely labeled to identify the meaning of the individual clusters and their relation to each other. Note that for some clusters certain colors have been re-used due to the high number of groups.
5.1 Overview of the process from raw pcap files to the Flow-Map IP-Space visualization. Starting from a binary pcap file, we extract and count the occurrence of each IPv4 address and type of packet. Next, the IPs are converted from a 4D to a 2D grid representation, with glyphs scaled and colored based on the number and type of packets. This process repeats for each time slice, with slices stacked along the z-axis. The result is then visualized using 3D accelerated rendering, which allows for high-level structure and low-level analysis, helping analysts establish a sense of normalcy (central blue image), identify outliers (green TCP burst), classify and characterize attacks (top right), measure attack impacts (middle right), and monitor after-effects (lower right).
5.2 An example analysis in Wireshark, a widely used pcap analyzer.
5.3 An overview of the process from raw pcap files to the Query-Space visualization. Starting from a binary pcap file, we extract and count each query. Next, each query is converted to a TF-IDF (term frequency, inverse document frequency) character-level feature vector. A deep learning autoencoder is trained using all queries to generate a visually coherent spatial distribution of queries when projected into a 2D space. This distribution of queries is then visualized using 3D accelerated rendering, which allows for high-level temporal structure (top right) and low-level query analysis (bottom right).
5.4 Interesting self-similar patterns of intra-IP-bin queries and across-IP-bin traffic (from the TCP-SYN DDOS) over time in D-Root traffic.
5.5 In the lower left, the 2D query-space clustered into semantic query categories. There is a general trend of alphabetic-based queries towards the left and numeric-based queries towards the right, along with a general trend of normal characters at the top and unusual characters at the bottom. The lower right shows the temporal query visualization, portraying a high-level temporal overview of the distribution of queries. The top reveals a selection of interesting observations, such as rapidly diminishing groups of queries at the start, temporally repeating groups of queries, the large reduction in queries during the attack, and an overall decrease in the number of queries over time.
5.6 Two selected regions of queries. The top selection indicates the queries originate from a wide range of IPs, while the bottom selection indicates those queries came from very few IPs. Non-included IPs may be made transparent (bottom) or left opaque (top). The bottom-right image shows the resulting queries included from an entire IP-bin.
5.7 The region of the query-space consisting of different distributions of random characters.
5.8 The region of the query-space consisting of different distributions of IP addresses, fragments, and expressions.
5.9 The region of the query-space consisting of unusual characters and queries.
1 Face Set 1, containing 21 faces.
2 Face Set 2, containing 21 faces.

Chapter 1: Introduction

The work presented in this dissertation is geared towards enabling visual analytics within virtual environments. A successful virtual reality-based analytical visualization should provide the analyst with an intuitive view of, and interaction with, the data; leverage and support the respective advantages of humans and computers; and be capable of presenting vast amounts of abstract data, enabling the analysis and discovery of actionable intelligence.
This dissertation presents advances in each of those areas: justifying and enabling the use of head-mounted displays, leveraging and enhancing machine learning using human pattern recognition and domain knowledge for data discovery and analysis, and developing a three-dimensional visualization for the exploration and characterization of vast amounts of abstract DNS network data.

The world is amidst a global cyber war. Conducting the analysis of vast amounts of network data for monitoring and safeguarding a core pillar of the internet, the root DNS, is an enormous challenge. It is critical to the continuing stability and operation of the Internet that we discover, understand, and thwart attacks. The current state-of-the-art tools are in essence log-file editors, portraying only specific details about specific Internet packets, with no sense of global structure, spatial orientation, or temporal connections. Humans, in contrast, are adept at organizing and interacting with the world in a spatial manner.

While monitors and traditional displays are generally familiar and comfortable to use, the benefit of head-mounted displays is their ability to leverage and connect our kinesthetic intelligence (our natural inclination for movement and spatial interaction) with the content displayed. Rather than mapping data to a large 2D array of monitors, head-mounted displays allow for an immersive and natural interaction space. Discovering and quantifying the benefits of head-mounted displays over traditional desktops is therefore very important. In our first study we ask: what are the benefits of using a head-mounted display over a traditional desktop display? Our study showed a statistically significant improvement in the recall of spatially-distributed information when using a head-mounted display, as compared with a traditional desktop. We present the summary of our research in section 1.1, and give details in Chapter 2. This result shows that using head-mounted displays has advantages over traditional desktops when engaging with spatial content.

A crucial problem with head-mounted displays that still prevents their wide adoption is the onset of cybersickness. It has been reported [1] that a large proportion of the population, after extended usage of head-mounted displays, experiences feelings of nausea, headache, fatigue, and other negative ailments, which are collectively referred to as cybersickness. This means there are analysts who are unable to use virtual reality for data visualization. At present, there is little understanding of the causes of cybersickness, let alone how to mitigate it. One of the prevailing theories on the cause of cybersickness is the sensory conflict theory, which attributes it to the dissonance between the visual and the vestibular sensory cues [1, 2]. Before we are able to treat and correct for cybersickness, we need to be able to quantify and measure it, motivating our second study. Our goal is to develop a technique for the objective measurement and quantification of cybersickness. Using an electroencephalogram (EEG), we validate our measurements by correlating the EEG signals with a continuous self-diagnosis of cybersickness from our participants using a joystick. We found that the delta, theta, and alpha wave bands correlated with the reported levels of cybersickness. We present the summary of our research in section 1.2, and give details in Chapter 3.
This result provides the framework to evaluate the onset and mitigation of cybersickness in future applications and head-mounted displays.

Having established that head-mounted displays have advantages over desktops, and a framework to detect and quantify cybersickness, we shift focus towards visual analytics. The ability of our technology to collect large amounts of data has far surpassed our ability to process and generate insights and conclusions using existing methods. Traditionally, the visualization of large datasets on desktops is done with large lists, tables, or conventional plots (such as line and bar plots). Each representation has its disadvantages, but their weaknesses are most evident when the datasets visualized are large and high-dimensional. These weaknesses make interpreting and drawing conclusions from such datasets very difficult. Rather than relying on pure visualization for analysis, analysts often turn to machine learning for information discovery and analysis. Deep learning, a branch of machine learning, has proved extremely successful in many application areas of science and visualization. However, deep learning requires a tremendous amount of precisely-labeled training data in order to function properly. In the real world, such training sets are not readily available or easily created, as in the case of training a classifier to detect attacks on DNS. An approach where machines and humans are able to leverage each other's strengths is therefore desirable. This is the motivation for our third study: to develop a technique that couples the pattern-recognition ability of humans with the analytical processing power of machines and deep learning, with the goal of discovering hidden communities and labels within sparsely labeled, high-dimensional, non-spatial datasets. To evaluate our technique, we treated two finely labeled, high-dimensional, spatial datasets as ground truth. We then started the iterative discovery process by stripping away the spatial context and greatly reducing the number of labels, merging many of the precise labels into initial coarse labels. For the study, our goal was to reconstruct, to the best of our ability, the original labeling distribution of the test datasets. For each of the datasets, we were able to rediscover the hidden labels and accurately label many of the data points appropriately. We present the summary of our research in section 1.3, and give details in Chapter 4. This result shows we can discover latent clusters and information through an iterative process, while simultaneously improving the capabilities of the deep learning training model with human guidance.

A virtual reality-based visualization will inherently leverage a 3D environment. While 2D visualizations are often regarded as easier to create and understand (in terms of time required for comprehension), recent research has shown there are many benefits to 3D visualizations over 2D for abstract data visualization [3–5], including clearer spatial separation, reduced over-plotting, and the faster construction of deeper mental models. These ideas have inspired our work, summarized in section 1.4 and presented in more detail in Chapter 5, where we develop a three-dimensional virtual-environment data visualization capable of portraying large amounts of abstract network data.
For our visualization, we sought to tackle a real-world problem: enabling the analysis of the vast amounts of network traffic that flow through the root DNS. Traditional network tools largely consist of advanced text editors, simply listing every aspect of every packet. Although such visualizations provide unparalleled detail, they inhibit the detection of anomalous trends, generally require analysts to know of a search target, and do not scale to handle the ever-increasing amounts of data to be reviewed. Our visualization has been carefully constructed with DNS industry experts to leverage our pattern-recognition and spatial-recognition capabilities. We show that our visualization is capable of providing analysts with a comprehensible analysis of the spatio-temporal data, enabling a definition of normalcy, detecting previously unknown anomalies and clusters, and characterizing large-scale real-world attacks for future attack identification and mitigation.

1.1 Virtual Memory Palaces

1.1.1 Introduction

Since classical times, people have used memory palaces as a spatial mnemonic to help remember information by organizing it in an environment and associating it with salient features in that environment. Virtual reality affords a new medium for exploring large datasets in an embodied way. We present here a summary of Chapter 2, where we explore whether using virtual memory palaces in a head-mounted display (HMD) allows a user to better recall information than a traditional desktop display.

Figure 1.1: One of the virtual memory palace scenes used in our user study: (left) an ornate palace showing some of the faces used, and (right) the same ornate palace with the faces replaced by numbers.

1.1.2 Approach

For this study, we focused on analyzing and leveraging the spatial component of human memory and the inherently spatial nature of immersive virtual reality head-mounted displays. Our hypothesis is that the added immersion of virtual reality will improve the recall performance of human subjects compared with desktop displays. To test this hypothesis, we prepared two realistically rendered 3D environments and placed faces of well-known people within them. The faces were distributed around a central location where we positioned our study participants. After our participants familiarize themselves with the faces and their names, they use either a desktop or a head-mounted display, in one of the two scenes, and view one of two sets of faces. We give the participants five minutes to familiarize themselves with the scene and the faces distributed around it. After a two-minute break, we place participants back in the environment, with the faces replaced with numbers.
The participants then recall the name of the face that was previously at each numbered location and give a confidence rating for each answer (0–10), with 10 being fully confident. This process then repeats for the alternate display, the other environment, and the alternate set of faces. An example of what one of the scenes looked like with faces and with numbers can be seen in Figure 1.1.

Figure 1.2: The overall average recall performance of participants using an HMD was 8.8% higher compared with a desktop. The median recall accuracy for the HMD was 90.48% and for the desktop display was 78.57%. The figure shows the first and third quartiles for each display modality.

Figure 1.3: The overall confidence scores of participants using an HMD and a desktop. Each participant gave a confidence score between 1 and 10 for each face they recalled. Those in the HMD were slightly more confident about their answers than those on the desktop.

Figure 1.4: The distribution of incorrect answers for each display modality, showing the median, first, and third quartiles.

1.1.3 Results

Our study, with 40 participants, found that virtual memory palaces viewed in an HMD provide superior memory recall compared with a traditional desktop display. Specifically, we found a statistically significant increase of 8.8% in recall accuracy for the HMD as compared with a desktop, and 40% of our participants had at least a 10% advantage while using the HMD. The distribution of the accuracies for each display can be seen in Figure 1.2. In addition, we found a statistically significant increase in the confidence of our participants in their answers in the HMD as compared with the desktop (as can be seen in Figure 1.3). This increase in confidence was supported by a statistically significant decrease in the number of errors for the HMD as compared with the desktop (see Figure 1.4). We believe that these types of virtual environments can create natural and more memorable experiences that enhance productivity, enabling better recall and understanding of large amounts of information.

1.2 Characterizing Vection-Induced Cyber Sickness using EEG

1.2.1 Introduction

If virtual reality is to revolutionize the way we view and interact with machines, data, and each other, understanding and mitigating the onset of cybersickness is critical. Virtual and augmented reality are poised to fundamentally shift the way we consume visual information and interact with technology. However, the adoption of this new method for viewing and interacting with information is hampered by the fact that a large proportion of people suffer from what has been named cybersickness [1]. Common symptoms of cybersickness include nausea, increased heart rate, disorientation, sweating, eye strain, and headaches [6, 7]. One of the prevailing theories on the cause of cybersickness (also referred to as simulator sickness or visual fatigue) is the sensory conflict theory, which attributes it to the dissonance between the visual and the vestibular sensory cues [1, 2]. Our goal is to develop a simple, passive, and objective technique that quantifies cybersickness. We present a summary of our study here, and cover its design and results in more detail in Chapter 3.

1.2.2 Approach

We conducted a user study in which we took EEG measurements of participants, using the Emotiv Epoc 14-channel, 128 Hz EEG headset, while the participants were exposed to an experience designed to induce cybersickness in virtual reality. After a baseline EEG reading, the participants fly through a virtual spaceport for approximately a minute. During the flythrough, participants report their current level of cybersickness by tilting a joystick, indicating higher sickness with more tilt. After each session, each participant completes a Simulator Sickness Questionnaire (SSQ), which subjectively assesses the different symptoms related to cybersickness, giving each a score from 0 to 4.

Figure 1.5: The distribution of Simulator Sickness Questionnaire (SSQ) scores obtained from participants after the experiment. The plot shows the median, first and third quartiles (orange and grey respectively), with the minimum and maximum shown as error bars.
For the user study, we recruited 44 participants, of whom we were able to use the data of 43. First, we confirm that our participants experienced cybersickness through analysis of the SSQ scores. The distribution of the SSQ scores can be seen in Figure 1.5; from this distribution, it is clear that our participants, on average, reported experiencing some cybersickness symptoms. To process the collected EEG data, we use the EEG toolkit EEGLab (https://sccn.ucsd.edu/eeglab/). EEGLab takes each participant's EEG signals and performs independent component analysis (ICA), generating 14 independent components (ICs) per participant. The goal of ICA is to deconstruct the original signals into component signals such that each component represents an independent source contributing to the recorded signal, such as eye blinks or noise. Using the calculated independent components from each participant, we cluster them with EEGLab's built-in K-means functionality, so that similarly attributed components across participants are grouped together.

Figure 1.6: The names and locations of the 14 EEG electrodes in the Emotiv Epoc headset.

Figure 1.7: Comparison of the EEG power spectra between the baseline (blue) and virtual flythrough (green) for ICA cluster A. The paired t-test with Bonferroni correction between the two spectra reveals p < 0.001 for much of the frequency range, and p < 0.05 for most ranges.

Figure 1.8: Average over four frequency bands for cluster A compared with the average self-reported cybersickness (in green).

1.2.3 Results

From the 14 generated independent component (IC) clusters, we identified one of the clusters (cluster 12, which we call cluster A) as representative of cybersickness; it is shown in Figure 1.6. For this cluster, we found a statistically significant power increase across many frequency ranges for participants in the sick condition (when flying through the spaceport), as compared with the baseline condition (stationary), which can be seen in Figure 1.7. To validate our results further, we compared the clustered EEG data for the sick condition with the collected joystick information. Across the EEG frequency bands, we see a high correlation between the delta (1.0–4.0 Hz), theta (4.0–7.0 Hz), and alpha (7.0–13.0 Hz) band signals and the self-reported sickness levels, which can be seen in Figure 1.8.

One of the first and most crucial steps towards mitigating cybersickness is quantifying its characteristics. The result of this study offers an objective method to measure and quantify cybersickness, which can be used in further work on mitigating it.
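The analysis above was carried out in EEGLab; purely as an illustration of the final quantification step, the following Python sketch (our own approximation, not the dissertation's code: the function names, window length, and inclusion of a beta band are assumptions) computes per-band spectral power for a single independent component with Welch's method and correlates each band's power trace with the joystick-reported sickness levels.

    import numpy as np
    from scipy.signal import welch
    from scipy.stats import pearsonr

    FS = 128  # Emotiv Epoc sampling rate in Hz
    BANDS = {"delta": (1.0, 4.0), "theta": (4.0, 7.0),
             "alpha": (7.0, 13.0), "beta": (13.0, 30.0)}

    def band_power_traces(ic, fs=FS, win_sec=2.0):
        """Per-band spectral power of one independent component,
        one value per non-overlapping window, via Welch's method."""
        win = int(fs * win_sec)
        traces = {name: [] for name in BANDS}
        for start in range(0, len(ic) - win + 1, win):
            freqs, psd = welch(ic[start:start + win], fs=fs, nperseg=win)
            for name, (lo, hi) in BANDS.items():
                traces[name].append(psd[(freqs >= lo) & (freqs < hi)].mean())
        return {name: np.asarray(vals) for name, vals in traces.items()}

    def sickness_correlations(ic, joystick, fs=FS, win_sec=2.0):
        """Pearson r (and p-value) between each band's power trace and
        the joystick trace, resampled to one value per window."""
        traces = band_power_traces(ic, fs, win_sec)
        n = len(next(iter(traces.values())))
        joy = np.interp(np.linspace(0.0, 1.0, n),
                        np.linspace(0.0, 1.0, len(joystick)), joystick)
        return {name: pearsonr(trace, joy) for name, trace in traces.items()}

A high positive r for the delta, theta, and alpha traces would correspond to the correlations reported above.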
1.3 Enhancing Deep Learning with Visual Interactions

1.3.1 Introduction

Recent advances in deep learning have led to impressive results in many areas of science and visualization. However, deep learning requires huge amounts of finely and precisely labeled training data to function. Such complete training datasets are rare due to their high cost, the large amount of time they take to create, erroneous labels, and labeling subjectivity. In addition, most datasets are abstract or non-spatial, further increasing the difficulty of their annotation. In Chapter 4, we present how we leverage the strengths of human pattern recognition and deep learning analytics to facilitate the exploration of, and interaction with, high-dimensional datasets that have coarse labels, through an interactive and iterative process of refinement, re-training, and visualization. Our approach can be used to (a) alleviate the burden of intensive manual labeling that captures the fine nuances in a high-dimensional dataset through simple visual interactions, (b) replace a complicated (and therefore difficult to design) labeling algorithm with a simpler (but coarse) labeling algorithm supplemented by user interaction to refine the labeling, or (c) use low-dimensional features (such as RGB colors) for coarse labeling and turn to higher-dimensional latent structures, progressively revealed by deep learning, for fine labeling.

Figure 1.9: A brief illustration of the difference between traditional deep learning techniques and our approach. Deep learning traditionally requires a large, time-consuming, and precisely labeled dataset for training. For several reasons, such datasets may be inappropriately labeled. In our approach, we start with coarse labels (that are typically far easier to construct) and then refine them through an iterative process involving visual interactions and deep learning.

1.3.2 Approach

Our goal is to take a coarsely labeled, non-spatial, high-dimensional dataset and generate a low-dimensional representation that is easily interpretable by an analyst. To generate the low-dimensional representation, we employ two convolutional deep neural networks: a variational autoencoder and a Siamese network. The variational autoencoder attempts to project the high-dimensional data to a two-dimensional representation in an unsupervised manner, relying entirely on the underlying distribution and patterns within the data. It is constrained by requiring the network to reconstruct the original high-dimensional input data as accurately as possible from the projected two-dimensional layout. This provides analysts with a starting 2D data layout, derived from the data itself. Once the network generates the 2D distribution of data points, an analyst visually analyzes the distribution of the points, colored based on their currently assigned labels (if provided), and manually selects points to assign or reassign labels. A point's label may change for many reasons, such as its proximity to points with a different label or the evolution of a group. The advantage here is the ability of humans to identify spatial patterns and provide that information to the network.

Once the analyst has adjusted the labeling of the data points to their satisfaction, the points are fed into the Siamese network. The Siamese network consists of an internal network identical in structure and weights to the autoencoder, but it compares how two input points are projected into the two-dimensional space based on their given labels. If two points are close in the projection but have different labels, the network attempts to push them further apart; if they share a label, the network attempts to pull them closer together. With this new model, a new distribution of points is generated using both the autoencoder and Siamese networks. This iterative process of projection and relabeling continues until the generated labeling and distribution are satisfactory. An overview of this process can be seen in Figure 1.9.
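To make the push/pull refinement concrete, here is a minimal PyTorch sketch of the shared-weight encoder and a standard contrastive loss. This is an illustration under simplifying assumptions, not the dissertation's implementation: the actual system uses convolutional networks and the formulation shown in Figure 4.2, while the fully connected layers, margin, and training loop below are hypothetical.

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """Projection network mapping high-dimensional points to 2D.
        The same structure and weights are shared between the
        autoencoder's encoder and both branches of the Siamese network."""
        def __init__(self, in_dim, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 2),  # 2D coordinates for the point layout
            )

        def forward(self, x):
            return self.net(x)

    def contrastive_loss(z1, z2, same_label, margin=1.0):
        """Pull same-label pairs together; push different-label pairs
        apart. same_label is a float tensor of 1s (same) and 0s (different)."""
        d = torch.norm(z1 - z2, dim=1)
        pull = same_label * d.pow(2)
        push = (1.0 - same_label) * torch.clamp(margin - d, min=0.0).pow(2)
        return (pull + push).mean()

    def refine(encoder, x1, x2, same_label, lr=1e-3, steps=100):
        """One Siamese refinement pass over a batch of labeled point
        pairs, run after the analyst adjusts labels and before the
        points are re-projected for the next round of visualization."""
        opt = torch.optim.Adam(encoder.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = contrastive_loss(encoder(x1), encoder(x2), same_label)
            loss.backward()
            opt.step()
        return encoder

Across iterations, the encoder warm-starts from the weights produced by the previous autoencoder and Siamese passes, so each round of analyst relabeling incrementally reshapes the 2D layout.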
1.3.3 Results

To illustrate our process, we evaluate our technique on three datasets: a hyperspectral image of Pavia University [8], a hyperspectral image of Salinas Valley [9], and a collection of text-based queries from the D-Root DNS authority at the University of Maryland. For our purposes, we combined and reduced the number of known labels to start, with the goal of reconstructing the hidden ground-truth labels for the first two datasets; the third (DNS) dataset is completely unlabeled. The initial and final distributions of the data with labeling can be seen for the first two datasets in Figure 1.10 and Figure 1.11, along with the evolution of the final dataset labeling distribution in Figure 1.12, demonstrating the ability of our approach to discover latent clusters and categories.

Figure 1.10: A comparison of the initial coarse labeling, the generated refined labels, and the precise (true) labels of the Pavia University dataset after three iterations. Starting from the three initial categories (natural surfaces, roads, and buildings), we were able to reconstruct the distribution of the 9 labels with an accuracy of 88.2%.

Figure 1.11: A comparison of the initial coarse labeling, the generated refined labels, and the precise (true) distribution of labels of the Salinas Valley dataset after five iterations. Starting with 6 initial coarse labels, we were able to reconstruct the distribution of the 16 labels with an accuracy of 97.4%.

Figure 1.12: The evolution of the query-space representation over eight iterations, showing the influence of the iterative labeling. Note that for some clusters certain colors have been re-used due to the high number of groups.

Our approach offers two main contributions. First, we are able to improve and enhance deep learning by introducing an approach that leverages human pattern recognition to improve the data that is used to train the deep learning model.
Second, we show that we are able to construct spatially-meaningful representations of abstract high-dimensional data that can be easily interpreted and manipulated. The next work expands on the ideas established here, organized in a virtual three-dimensional environment and evaluated using a real-world dataset and experts.

1.4 Visual Analytics for Root DNS Data

1.4.1 Introduction

The analysis of vast amounts of network data for monitoring and safeguarding the root DNS, a core pillar of the internet, is an enormous challenge. We seek an intuitive understanding of the distribution of queries received by the root DNS and of how those queries change over time. Traditional query analysis is performed packet by packet, lacking global, temporal, and visual coherence and obscuring latent trends and clusters. In Chapter 5 we present our approach, which couples human pattern recognition and the computational power of deep learning with 2D and 3D rendering techniques for quick and easy interpretation of, and interaction with, vast amounts of root DNS network traffic. Working with real-world DNS experts, we developed a visualization that reveals several surprising latent clusters of queries, potentially malicious and benign, uncovers previously unknown characteristics of a real-world root DNS DDOS attack, and exposes unforeseen changes in the distribution of queries received over time. These discoveries provide DNS analysts with a deeper understanding of the nature of the DNS traffic under their charge, which will help them safeguard the root DNS against future attacks.

1.4.2 Approach

Our visualization consists of three main components: an IP-space visualization, a 2D query-space visualization, and a 3D query-space visualization. The purpose of these visualizations is to provide analysts with a well-rounded representation of the DNS packets they receive and process, providing a high-level overview while preserving low-level salient details. The first visualization, portraying the IP-space distribution of the received packets as shown in the left of Figure 1.13 (with more detail provided in Figure 1.14), conveys a 2D representation of the 4D IPv4 space, with the X-axis a linear combination of the first two IPv4 octets and the Y-axis a linear combination of the last two octets. Within each cell, a five-second period of packet accumulation is presented, conveying the number and variety of packets received using the size and color of glyphs. As these packet streams aggregate over time along the Z-axis, they reveal patterns, anomalies, and distinct characteristics of DNS attacks. This representation provides DNS analysts with an unparalleled level of temporal and visual coherence for understanding the changing characteristics of their data.

The second visualization, as shown in the middle of Figure 1.13, leverages deep learning to project and organize the distribution of high-dimensional, non-spatial DNS queries into an easy-to-interpret two-dimensional spatial layout. In this representation, distinct clusters of queries emerge, enabling analysts to gain a sense of the normal and abnormal distribution of queries, along with the number of times each of those queries was received. In this projection, a set of semantic axes arises, with alphabet-based queries appearing on the left and number-based queries on the right of the space, along with a general usage of alphanumeric characters at the top and non-alphanumeric characters at the bottom.
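Before the queries can be projected this way, each one is converted to a character-level TF-IDF feature vector that the autoencoder then compresses (detailed in Chapter 5). As a minimal sketch of that featurization step, with the n-gram range and preprocessing as our own assumptions:

    from sklearn.feature_extraction.text import TfidfVectorizer

    def featurize_queries(queries, ngram_range=(1, 2)):
        """Character-level TF-IDF features for DNS query names; these
        high-dimensional vectors are what the autoencoder compresses
        into the 2D query-space layout."""
        vectorizer = TfidfVectorizer(analyzer="char",
                                     ngram_range=ngram_range,
                                     lowercase=False)  # case varies across queries
        features = vectorizer.fit_transform(queries)  # sparse (n_queries, n_ngrams)
        return features, vectorizer

    # Example with a few generic query names of the kinds described above:
    queries = ["www.example.com", "jqupcyohmw", "192.0.2.1", "_ldap._tcp.example"]
    X, vec = featurize_queries(queries)
    print(X.shape)

Because the features are built from character n-grams rather than whole tokens, structurally similar queries (random-character strings, IP-like strings, domain-like strings) end up with similar vectors, which is what lets the learned projection produce the semantic axes described above.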
Through our investigation with DNS experts, we have identified several interesting general groups of queries, as presented in the left of Figure 1.15. Such a query-based spatial visualization was previously unknown to our experts, who achieved new insights into their traffic using the system. The third visualization, as shown in the right of Figure 1.13 and in Figure 1.15, is a 3D temporal expansion of the 2D query-space visualization, providing additional information such as the evolution of the distribution of queries over time, revealing patterns and anomalies. Through such visualizations, our analysts have been able to make connections between the traffic in the IP-space and the query-space, discovering previously unknown clusters, repeating patterns, and anomalies.

Figure 1.13: Our root-DNS visualization, which provides both high- and low-level overviews of, and interactions with, the IP and query spaces. IP packet traffic is visualized on the left, revealing hidden patterns, IP distributions, and a real TCP-SYN flood attack. A two-dimensional query-space generated using deep learning portrays a spatial distribution of received queries and counts. The right image portrays the spatial distribution of queries as they change over time, revealing the diminished number of received queries due to a DDOS.

Figure 1.14: Overview of the process from raw pcap files to the Flow-Map IP-Space visualization. Starting from a binary pcap file, we extract and count the occurrence of each IPv4 address and type of packet. Next, the IPs are converted from a 4D to a 2D grid representation, with glyphs scaled and colored based on the number and type of packets. This process repeats for each time slice, with slices stacked along the Z-axis. The result is then visualized using 3D-accelerated rendering, which allows for high-level structure and low-level analysis, to help analysts establish a sense of normalcy (central blue image), identify outliers (green TCP burst), classify and characterize attacks (top right), measure attack impacts (middle right), and monitor after-effects (lower right).
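As a minimal sketch of the binning stage in Figure 1.14, the following maps each address onto the 2D grid using the octet combinations X = o1 × 256 + o2 and Y = o3 × 256 + o4, scaled to the grid resolution; the 512-bin resolution and function names are illustrative assumptions:

```python
from collections import Counter

def ip_to_bin(ip: str, bins: int = 512) -> tuple[int, int]:
    """Map a 4-octet IPv4 address onto a 2D grid cell: X from the
    first two octets, Y from the last two, scaled from the 65536
    possible values per axis down to `bins` cells."""
    o1, o2, o3, o4 = (int(part) for part in ip.split("."))
    x = (o1 * 256 + o2) * bins // (256 * 256)
    y = (o3 * 256 + o4) * bins // (256 * 256)
    return x, y

# Accumulate one five-second time slice; glyph size would then be
# driven by the per-cell packet count.
slice_counts = Counter(ip_to_bin(ip) for ip in
                       ["192.168.0.1", "192.168.0.7", "8.8.8.8"])
print(slice_counts)        # Counter({(385, 0): 2, (16, 16): 1})
```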
1.4.3 Results

We have validated our approach on a real-world dataset, from one of the servers under the domain of the D-Root DNS authority, that includes a DDOS attack. In the IP-space visualization, a baseline before the attack is established, with the majority of the IP-space consistently empty and many of the used IP-bins falling into one of two categories: low- or high-volume traffic. Within the flow of the traffic, as shown in Figure 5.1, several bursts of short-lived anomalies emerge, and other temporally repeating bins of traffic are observed. In addition, the scale, duration, and distinct characteristics of a real DDOS attack are revealed, such as the distinct range of IP-bins used and the oscillatory nature of the flow of packets. The characterization of this DDOS attack through our visualization provided more detail than the official report on this attack released by the Root Servers (http://root-servers.org/news/events-of-20160625.txt).

Figure 1.15: In the lower-left, the 2D query-space clustered into semantic query categories. There is a general trend with alphabet-based queries towards the left and number-based queries towards the right, along with a general trend of alphanumeric characters at the top and non-alphanumeric characters at the bottom. The lower-right shows the temporal query visualization, portraying a high-level temporal overview of the distribution of queries. The top reveals a selection of interesting observations, such as rapidly diminishing groups of queries at the start, temporally repeating groups of queries, the large reduction in queries during the attack, and an overall decrease in the number of queries over time.

The two- and three-dimensional visualizations of the query distributions reveal the wide variety of received queries for the DNS to resolve. Historically, queries are generally ignored in packet analysis due to their unstructured, high-dimensional, and abstract nature. In this visualization, the distribution, count, and evolving behavior of these queries are revealed. In Chapter 5 we go into more detail about the various discoveries made by the DNS experts. There were three categories of discoveries: (1) those involving the distribution of random characters, as shown in Figure 1.16, (2) a distribution of numeric queries and code fragments (Figure 1.17), and (3) a distribution of queries containing non-alphanumeric characters (Figure 1.18).
Through joint interactions between the query-space and IP-space visualizations, analysts can see from which IP-bins and at what times certain queries arrive, indicating whether a particular query or set of queries comes from a wide or narrow set of users, and revealing the variety of queries sent from a particular bin. Using these visualizations, our DNS experts were able to learn many new aspects of the data that passes through their system.

Figure 1.16: The region of the query-space consisting of different distributions of random characters.

Figure 1.17: The region of the query-space consisting of different distributions of IP addresses, fragments, and expressions.

Figure 1.18: The region of the query-space consisting of unusual characters and queries.

Chapter 2: Virtual Memory Palaces: Immersion aids Recall

2.1 Introduction

Throughout history, humans have relied on technology to help us remember information. From cave paintings, clay tablets, and papyrus, to modern paper, audio, and video, we have used technology to encode and recall information. This chapter addresses the question of whether virtual environments could be the next step in our quest for better tools to help us memorize and recall information. Virtual reality displays, in contrast to traditional displays, can combine visually immersive spatial representations of data with our vestibular and proprioceptive senses.

The technique of memory palaces provides a natural spatial mnemonic to assist in recall. Since classical times, people have used memory palaces (the Method of Loci), taking advantage of the brain's ability to spatially organize thoughts and concepts [10-12]. In a memory palace, one mentally navigates an imagined structure to recall information [13, 14]. Even the Roman orator Cicero is believed to have used the memory palace technique by visualizing his speeches and poems as spatial locations within the auditorium he was in [13, 15]. Spatial intelligence has been associated with a heightened sense of situational awareness and of relationships in one's own surroundings [16, 17].
Research in cognitive psychology has shown that recall is superior in the same environment in which the learning took place [18]. Such findings of context-dependent memory have interesting implications for virtual environments that have not yet been fully explored. Imagine, for instance, a victim of a street assault being asked to recall details of their assailant's appearance. Virtual environments that mirror the scene of the crime could provide superior assistance in recall by placing the victim back into such an environment.

In this chapter we present the results of a user study that examined whether virtual memory palaces could assist in superior recall of faces and their spatial locations, aided by the context-dependent immersion afforded by a head-tracked head-mounted display (HMD condition) as compared with a traditional desktop display with mouse-based interaction (desktop condition). To explore this question we designed an experiment where participants were asked to recall specific information in the two environments: the HMD condition and the desktop condition. We created the virtual memory palaces prior to the start of the study. Our hypotheses are as follows:

• Hypothesis 1: Participant memory recall accuracy will be higher in the HMD condition than in the desktop condition due to the increased immersion.

• Hypothesis 2: Participants will have higher confidence in their answers in the HMD condition than in the desktop condition.

The experiment was a within-subject, 2 × 2 × 2 Latin-square design, ensuring all the different combinations of variables and factors were accounted for. The experimental results of our study support both hypotheses.

2.2 Related Work

Memory palaces have been used since classical times to aid recall by using spatial mappings and environmental attributes. Figure 2.1 shows a depiction of a memory palace attributed to Giulio Camillo in 1511. The idea was to map words or phrases onto a mental model of an environment (in this case an amphitheater), and then recall those phrases by mentally visualizing that part of the environment.

Figure 2.1: Giulio Camillo's depiction of a memory palace (1511 AD). Memory palaces like this have been used since classical times as a spatial mnemonic.

An important component of the memory palace technique is the subjective experience of being virtually present in the palace, even when one is physically elsewhere. This notion of presence has long been considered central to virtual environments, for evaluation of their effectiveness as well as their quality [19]. More precisely, Slater et al. [20] developed the idea of place illusion (PI), referring to the aspects of presence "constrained by the sensorimotor contingencies afforded by the virtual reality system". Sensorimotor contingencies are those actions used in the process of perceiving the virtual world, such as moving the head and eyes to change gaze direction, or seeing around occluding objects to gain an understanding of the space. Slater et al. [20] therefore concluded that establishing presence or "being there" is not feasible for lower-order immersive systems such as desktops. In contrast, the sensorimotor contingencies of walking and looking around facilitated by head-mounted displays contribute to their higher-order immersion and to establishing presence.

Recent research in cognitive psychology [21] suggests that the mind is inherently embodied.
The way we create and recall mental constructs is influenced by the way we perceive and move [22, 23]. The memory system that encodes, stores, recognizes, embodies, and recalls spatial information about the environment is called spatial memory [24]. Several studies have found that embodied navigation and memory are closely connected [25, 26]. Madl et al. [24] state that several different types of brain mechanisms are involved in processing spatial representations in the brain. Grid cells in the entorhinal cortex, used for path integration, are activated by changes in movement direction and speed [27, 28]. Head-direction cells activate in the medial parietal cortex when the head points in a given direction, providing information on viewing direction [29]. Border cells and boundary vector cells in the subiculum and entorhinal cortex activate in close proximity to environment boundaries, depending on head direction [28, 30]. Lastly, place cells in the hippocampus activate in specific spatial locations, independent of orientation, providing an internal representation of the environment [31, 32]. It is believed that place cell fields arise from groups of grid and boundary cells which activate for different spatial scales and environmental geometry to provide a sense of location [33, 34]. In addition, these hippocampal cells also provide information about place-object associations, associating place cell representations of specific locations with the representations of specific objects in recognition memory [35, 36]. This leads us to the possibility that a spatial virtual memory palace, experienced in an immersive virtual environment, could enhance learning and recall by leveraging the integration of vestibular and proprioceptive inputs (the overall sense of body position, movement, and acceleration) [32].

2.2.1 Memory Palaces on a Desktop Monitor

Legge et al. [37] compared the traditional method of Loci, using a purely mental environment, against a 3D graphics desktop environment. In this study, the subjects were divided into three groups. The first group was instructed to use a mental location or scene, the second group a 3D graphics scene, and the third (control) group was not informed of the use of any mnemonic device. The subjects in the three groups were given 10 to 11 uncorrelated words and asked to memorize the words with their mnemonic device, if any. The users then recalled the words serially. This study found that users who used a graphics desktop environment as the basis for their method of Loci performed better than those using a mental scene of their choice, and that those who were not instructed on a memory strategy did not perform as well as those who were.

Fassbender and Heiden [38] compared users' ability to recall a list of 10 words with a desktop memory palace against plain memorization of the word list. The authors created a navigable 3D castle with 4 sections and 10 objects, where each object has a visual and an audible component, with the idea that a user will associate a word with that object. First, each user was given 10 words to memorize and was then asked to recall as many as they could after a two-minute distraction task. Next, the 3D castle was explained and shown to each user on a desktop. After being given time to learn the associations between the words, images, and audio, the users were evaluated on their ability to recall the words in the 3D castle on the desktop.
The study found no significant difference in the users' ability to recall the words immediately after the two-minute break, but after one week there was a 25% difference in recall in favor of the 3D graphics desktop memory palace condition.

The above studies show that, compared to a purely mental mnemonic, a graphics-desktop setup is better at assisting retention and recall. Both of these studies were carried out on desktops and not in immersive HMDs. In our study we compare the performance of users on a desktop with their performance in an immersive HMD.

2.2.2 Memory Palaces on Multiple Displays

The efficacy of varying immersion levels by changing the field of view has also been studied in the context of procedural training [39]. Sowndararajan et al. [40] compared subject performance on a simple and a complex procedural task (involving a different number of steps and interactions) with two different fields of view: one with a laptop and the other with a large rear-projected L-shaped display. The study had participants trained on two procedures, and the performance under the two levels of immersion was compared. The study found that higher levels of immersion (in this case, field of view) were more effective for learning complex procedures that reference spatial locations. In addition, there was no statistical difference in performance on the simple task between the different levels of immersion.

Ragan et al. [41] carried out a user study in which participants were asked to memorize and recall the sequence of placement of virtual objects on a grid shown on three rear-projected screens (one front and two side screens). The participants were divided into multiple groups that performed the task with different fields of view and fields of regard. The field of view is the size of the visual field seen in one instant, while the field of regard is the total size of the visual field that can be seen by a user [39]. Both are measured in degrees of visual angle. Ragan et al. found that a higher field of view and field of regard produced a statistically significant performance improvement.

The above studies examined the effectiveness of memory recall of objects, their locations, and the sequence of placement actions, with a limited field of view and field of regard, in monoscopic display environments with multiple monitors. The field of regard in these studies did not surround the viewer completely. In our study we wanted to examine the effectiveness of the stereoscopic, spherical field of regard afforded by modern HMDs, compared to a desktop, for memory recall of objects and their spatial locations.

2.2.3 Search and Recall in Head-mounted Displays

Pausch et al. [42] studied whether immersion in a virtual environment using a HMD aids in searching for and detecting information. For their study they created a virtual room with letters distributed on the walls, ceiling, and floor. A user was placed in the center of this room and was asked whether a given set of letters was present or not. The test was conducted using a HMD and a traditional display with a mouse and keyboard. They found that when the search target was present, the HMD and the traditional display had no statistically significant difference in performance. However, when the target was absent, the users were able to confirm its absence faster in the HMD than on the traditional display.
In addition, the users who used the HMD first and then moved to a traditional desktop had better performance than those who used the desktop first and then the HMD. This suggests a positive transfer effect from the HMD to a desktop. Our user study is highly influenced by the study of Pausch et al. [42], but in our study users perform recall rather than search.

Ruddle et al. [43] compared user navigation time and relative straight-line distance accuracy (the amount of wasteful navigational movement) between a HMD and a traditional desktop. Users were asked to learn the layout of two virtual buildings, one using a HMD and the other using a desktop. After familiarizing themselves with the buildings, each user was placed in the lobby of a building and told to go to each of five named rooms and then return to the lobby. They found that the users wearing the HMD had faster navigation times, less wasteful movement, and were more accurately able to estimate distances, compared to those using a desktop.

Mania et al. [44] examined accuracy and confidence levels associated with recall and cognitive awareness in a room filled with objects such as pyramids, spheres, and cubes. Participants were exposed to one of the following scenarios: (a) a virtual room using a HMD, (b) a rendered room on a desktop, or (c) a real room experienced through glasses designed to restrict the field of view to 30◦ to match that of the HMD and desktop. All four walls of the room were distinct. After three minutes of exposure, the participants were given a paper containing a representation of the room, which included numbered positions of objects in the various locations. The participants were asked to recall which objects were present and where they were located in the room, and to give a confidence and awareness state with each answer. The study evaluated the participants immediately after the exposure and then again after one week. The study found that immediately after the exposure the participants had the most accurate recall in the real-world scene, were slightly less accurate and confident in the HMD, and were least accurate and confident on the desktop. After one week, the overall scores and confidence levels dropped consistently across the board, with the viewing condition having no effect on the relative reduction in performance. In this inspirational study, the participants only experienced one display. In our study the participants were exposed to both the desktop and the HMD, making it possible to compare recall for the same user across the two display modalities. Further, to use the context provided by immersion, the participants in our study were asked to recall the information while viewing the same virtual scenes on the same display, rather than recording their answers on a paper representation of the scene.

Harman et al. [45] explored immersive virtual environments for memory recall by having participants board an airplane in a virtual airport. After the experience, the participants were asked about the tasks they performed. The participants who experienced the virtual airport in a HMD had more accurate recall than those who used the desktop. In this study each participant used either a HMD or a desktop, and the evaluation of memory recall was done outside of the visual experience, through a questionnaire.
In our study, participants not only experience the virtual environment in both the HMD and the desktop, but are also asked to recall in the same environment in which they experienced the information.

2.2.4 Embodied Interaction and Recall

Virtual walk-throughs have been one of the earliest applications of virtual worlds [46]. Brooks et al. [47] studied whether active participants had superior recall of the layout of a 3D virtual house on a desktop compared to passive participants. Active participants controlled camera navigation via a joystick, while passive participants observed the navigation. They found that active participants had superior environment-layout recall compared to those who were passive. However, they also found that there was no statistically significant difference between the recall or recognition of objects (such as furniture or the entrances and exits of a room) or their positions within the environment between the active and passive participants. This suggests that memory was only enhanced for those aspects of the environment that were interacted with directly, particularly the environment which was navigated.

Richardson et al. [48] had users learn the layout of a complex building through either 2D maps, physically walking through the real building, or a 3D virtual representation of that building built using the Doom II engine and shown on a desktop. The study found that when the building was a single floor, the real-world and virtual-environment-trained users had comparable results. However, when the building had two floors, relative view orientation during learning and testing mattered. If the participants were in the same orientation that they had used during learning, they were able to navigate the environment just as well as those who had physically been in the environment. However, participants were susceptible to disorientation if their starting views differed between training and testing. The authors concluded that training in the virtual and real-world environments likely used similar cognitive mechanisms.

Wraga et al. [49] compared the effectiveness of vestibular and proprioceptive rotations in assisting recall by having participants recall on which of four walls an object was located relative to their orientation before and after rotation. Participants were placed in a virtual room with four distinctly colored alcoves on the four walls and given time to learn and recognize the alcoves. Participants would then rotate, either using the HMD accelerometer or a joystick, to find a certain object in one of the alcoves as described by the tester. Once the user was looking at that object, their view would be frozen and the tester would ask the participant to state where a particular (different) alcove was relative to their orientation. They found that users in a HMD were better able to keep track of the objects by rotating their heads than by using a joystick. In another experiment, the authors also found that users in a HMD who controlled their bearing in a virtual world by actively rotating in a swivel chair were better able to keep track of an object than those who were rotated by a tester. In our study we expect vestibular and proprioceptive inputs to improve performance in the HMD. We study how well people can recall information regardless of their orientation. In addition, our objects are distributed in more than four unique locations.
Perrault et al. [50] leveraged the method of Loci by allowing participants to link gestural commands, which would control some system, to physical objects within a real room. They compared their interaction technique to a mid-air swipe menu which relies on directional swiping gestures. Their idea was to leverage spatial, object, and semantic memory to help users learn and recall a large number of gestures and commands. In a home environment, participants were shown a command (or stimulus) on a television and then performed a motion that a Microsoft Kinect would track and record as representing that command. For the mid-air swipe, the participant would perform a 2-segment marking-menu gesture. For the physical Loci, the participant would simply point at an object in the environment that they wanted associated with the command, such as a chair or a poster. Once the gestures and physical Loci were trained, the participants went into the recall phase. In this phase, a command would be presented on the screen and the user had to quickly and accurately perform the corresponding gesture. The system would then show whether the participant performed the correct gesture or pointed at the correct Loci object that they had originally assigned to that command. The authors found that the physical Loci technique gave users superior and more robust command recall compared to the more traditional mid-air swipe menu.

2.3 Method

A memory palace is a spatial mnemonic technique in which information is associated with different aspects of an imagined environment, such as people, objects, or rooms, to assist in its recall [13, 14]. The goal of our user study was to examine whether a virtual memory palace, experienced immersively in a head-tracked stereoscopic HMD, can assist in recall better than a mouse-based interaction on a traditional, non-immersive, monoscopic desktop display. Previous work has examined the role of spatial organization, immersion, and interaction in assisting recall. This study differs from the previous work in several ways. First, we focus on spatial memory using a 3D model of a virtual memory palace, rather than relying on other forms of memory (such as temporal/episodic). Second, both the training and testing (recall) phases take place within the same virtual memory palace. Third, participants used both the desktop and HMD displays, which allows us to compare each participant's recall across displays. Lastly, the content used in previous studies was either abstract, verbal, textual, visually simplistic, low in diversity, or time-based, whereas our study uses faces, with unique and diverse characteristics.

2.3.1 Participants

Our user study was carried out under IRB ID 751321-1, approved on August 7, 2015 by the University of Maryland College Park IRB board. We recruited 40 participants, 30 male and 10 female, from our campus and surrounding community. Each participant had normal or corrected-to-normal vision (self-reported). The study session for each participant lasted around 45 minutes.

2.3.2 Materials

For this study we used a traditional desktop with a 30-inch (76.2 cm) diagonal monitor and an Oculus DK2 HMD. The rendering for the desktop was configured to match that of the Oculus, with a resolution of 1920 × 1080 pixels (across the two eyes) and a rendering field-of-view (FOV) of 100◦. In order to give the desktop display the same field-of-view as the HMD, the participants were positioned with their heads 10 inches (25.4 cm) away from the monitor.
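As a quick sanity check on this distance (assuming a 16:9 panel, which is not stated here), the viewing distance d needed for a screen of width w to subtend a 100◦ horizontal field of view is:

\[
w = 30'' \times \frac{16}{\sqrt{16^2 + 9^2}} \approx 26.1'', \qquad
d = \frac{w/2}{\tan(100^\circ/2)} \approx \frac{13.05''}{1.19} \approx 11''
\]

which agrees, to within about an inch, with the 10-inch placement used.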
The software used to render the 3D environments on both the desktop and the HMD was identical and was designed in-house using C++ and OpenGL-accelerated rendering. The rendering was designed to replicate a realistic-looking environment as closely as possible, incorporating realistic lighting, shadows, and textures. The models (the medieval town and the palace) were purchased through the 3D modeling distribution website TurboSquid [51, 52].

2.3.3 Design

The participants were shown two scenes, on two display conditions (a head-tracked HMD and a mouse-based interaction desktop), with two sets of faces (within-subject design), all treated as independent variables, with the measured accuracy of recall as the dependent variable. The two scenes (virtual memory palaces) consist of pre-constructed palace and medieval-town environments filled with faces. We decided to use faces given previous work [53, 54] showing the effectiveness of memory palaces in aiding users to recall face-name pairs. We used faces as the objects to be memorized and carefully partitioned them into two sets of roughly equal familiarity. We quantified the familiarity of the faces using Google Trends data over the four months preceding the study. The faces are shown in the appendix (at the very end of the dissertation) in Figures 1 and 2, and the Google Trends statistics are presented in Tables 1 and 2. There was no statistically significant difference between the two sets of Google Trends data: p = 0.45 > 0.05.

The faces in the palace and the medieval town were hand-positioned in each environment before the start of the study and remained consistent throughout the study. We distributed the faces at varying distances from the users' location (see Figure 2.4) so that they surrounded and faced the user. Since we used perspective projection, the sizes of the faces varied. However, the distributions of the angular resolution of the faces across the two sets/environments were not statistically different, with p = 0.44 > 0.05 (see Table 3 in the appendix).

Users were allowed to freely rotate their view but not translate. This effectively simulated a stereoscopic spherical panoramic image with the participant at its center. Our motivation behind this study design decision was that if even this limited level of immersion could show an improvement in recall, it could lead to a better-informed exploration of how greater levels of immersion relate to varying levels of recall.

2.3.4 Procedure

First, each participant familiarized themselves with all 42 faces and their names used in the study. The participants received a randomly permuted collection of printouts, each containing a face-name pair used in the study. Participants were given as much time as needed, until they stated that they were comfortable with the faces. In general, participants did not spend more than 5 minutes on this familiarization. Next, each participant was told about the training and testing procedure, including how many faces would be in each scene (21), how much time they had to view the faces (5 minutes), how the breaks would work, that the faces would be replaced with numbers in the recall phase, and that they were to give a name and a confidence rating for their recalled faces at each numbered position.
In almost every case we recorded the answer as the name explicitly recalled by the participant. However, in rare, exceptional circumstances, when the participant gave an extremely detailed and unambiguous description of the face ("fat, wore a wig, was King of France, and is not Napoleon" for King Louis), we marked it correct. Next, each participant was placed either in front of a desktop monitor with a mouse or inside a head-tracked stereoscopic HMD. They were given as much time as they desired to get comfortable, looking around the scene without numbers or faces. The users rotated the scene with a mouse on the desktop monitor; in the HMD setup they rotated their head and body. No further navigation was possible.

Figure 2.2: The two Virtual Memory Palace scenes used in our user study: (a) an ornate palace and (b) a medieval town, as seen from the view of the participants.

Once each participant was comfortable with the setup and the controls, a set of 21 faces was added to the 3D scene and distributed around the entire space, as shown in Figure 2.4. We used two such scenes, a palace and a medieval town, shown in Figure 2.2. The faces were divided into two consistent sets used for the whole study; if a face appeared in one set (or scene) for a given participant, it would not be shown again in the second set or scene. To cover all possible treatments of the 2 × 2 × 2 Latin-square design, each participant was tested in both scenes, both display conditions (HMD and desktop), and both sets of faces, with their relative ordering counterbalanced across participants.

The 21 faces within the scene were presented to the participants all at once, and the participants were able to view and memorize the faces in any order of their choosing. The faces were deterministically placed in the same order for all participants. However, since the participants were free to look in any direction, the order of presentation of faces was self-determined. Each participant was given five minutes to memorize the faces and their locations within the scene. After the five-minute period, the display went blank and each participant was given a two-minute break in which they were asked a series of questions. Questions we asked included how each participant learned about the study, what their profession/major was, and what their general hobbies or interests were. In the second half of the study, during the break for the alternative display, we asked how often a participant used a computer, what their previous experience was with VR, and their general impressions of VR. We consistently asked these questions of each participant, but did not record the responses.

The reasons for these study design decisions are rooted in foundational research in psychology on memory. From the seminal work in [55] we learn that working memory [56] can only retain 7 ± 2 items. According to Atkinson et al. [57], information in short-term memory decays and is lost within a period of 15-30 seconds. We are therefore confident that having participants recall 21 faces after a two-minute break engages their long-term memory.

After the two-minute break, the scene would reappear on the display with numbers having replaced the faces, as shown in Figure 2.3. Each participant was then asked to recall, in any order, which face had been at each numbered location.
Figure 2.3: Virtual memory palace: recall phase.

Figure 2.4: Locations of faces and numbers in the Virtual Memory Palaces used in our user study: (a) an ornate palace and (b) a medieval town. Note that this is not the view the participants had during the experiment; these pictures are used to convey the distribution of the face locations. The participants were placed in the middle of these scenes, surrounded by the faces, as seen in Figure 2.2.

During this recall phase, each participant could look around and explore the scene just as they did in the training phase, using the mouse on the desktop or rotating their head-tracked HMD. Each participant had up to five minutes to recall the names of all the faces in the scene. Once the participant was confident in all their answers, or the five-minute period had passed, the testing phase ended. After a break, each participant was placed in the display condition they had not previously tested. The process was then repeated with a different scene and a different set of 21 faces to avoid information overlap from the previous test. For each numbered location in the scene, the participants verbally recalled the name of the face at that location, as well as a confidence rating for their answer, ranging from 1 to 10, with 10 being certain. If a participant had no answer for a location, it was given a score of 0. The results were hand-recorded by the study administrator, keeping track of the number, name, user confidence, and any changes to a previously given answer.

To mitigate any learning behavior from the first trial to the second, we employed a within-subject trial structure, using a 2 (HMD condition to desktop condition vs. desktop condition to HMD condition) × 2 (Scene 1 vs. Scene 2) × 2 (Face Set 1 vs. Face Set 2) Latin-square design. By alternating the display shown first (2), the scenes (2), and the face sets (2), we expect to mitigate any confounding effects. In the end, each participant was tested on the two display conditions, desktop and HMD, in two different scenes, and with two different sets of 21 faces. We note that participants could have used personal mnemonics to help remember the locations and ordering of faces. However, since we evaluated recall for each participant over both a desktop and a HMD, their performance should be counterbalanced between the two display conditions.

2.4 Results

Our hypothesis is that a virtual memory palace experienced in an immersive head-tracked HMD (the HMD condition) will lead to more accurate recall than on a mouse-controlled desktop display (the desktop condition). In addition, we hypothesized that participants would be more confident in their answers in the headset and make fewer mistakes or errors in recall. Our null hypothesis is that there is no statistical difference in the accuracy and confidence of results between the HMD and desktop conditions, and that there is no statistical difference due to the ordering of the display conditions. We confirmed using a four-way mixed ANOVA that there were no statistically significant effects on recall due to the scenes (palace and town), F(1, 79) = 0.27, p > 0.05, the two sets of 21 faces, F(1, 79) = 0.27, p > 0.05, or the ordering of display conditions (HMD followed by desktop vs. desktop followed by HMD), F(1, 79) = 1.93, p > 0.05. We found a statistically significant effect for the display condition (HMD vs. desktop), with F(1, 79) = 4.6 and p < 0.05. This means participants were able to recall better in the HMD condition than in the desktop condition, permitting us to reject the null hypothesis.
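As a minimal sketch of the paired analysis used in the following subsections (assuming one accuracy score per participant per condition; the arrays and extra p-values are placeholders, and scipy/statsmodels stand in for whatever statistics package was actually used):

```python
import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
# Placeholder per-participant recall accuracies (40 participants,
# both conditions measured on the same people).
hmd     = rng.uniform(0.60, 1.00, size=40)
desktop = rng.uniform(0.50, 1.00, size=40)

# Paired t-test, since each participant saw both display conditions.
t_stat, p_accuracy = ttest_rel(hmd, desktop)

# Bonferroni-Holm correction across the family of paired tests
# reported in this section (accuracy, errors, skips).
raw_pvals = [p_accuracy, 0.02, 0.006]          # illustrative values
reject, p_corrected, _, _ = multipletests(raw_pvals, alpha=0.05,
                                          method="holm")
print(reject, p_corrected)
```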
2.4.1 Task Performance

The overall average recall performance of participants in the HMD condition was 8.8% higher than in the desktop condition, with a mean recall accuracy of 84.05% in the HMD condition and 75.24% in the desktop condition. Using a paired t-test with Bonferroni-Holm correction, we calculated p = 0.0017 < 0.05, which shows that this result was statistically significant. In Figure 2.5 we present the overall performance of the users in the HMD condition as compared to the desktop condition.

Figure 2.5: The overall average recall performance of participants in the HMD condition was 8.8% higher compared to the desktop condition. The median recall accuracy percentage for the HMD was 90.48% and for the desktop display was 78.57%. The figure shows the first and third quartiles for each display modality.

2.4.2 Errors and Skips

The recall accuracy measures the number of correct answers. In addition, we kept track of when participants in our user studies made an error in recall (i.e., gave an incorrect answer) or skipped answering (i.e., did not provide an answer). We show the percentile distribution of the average number of erroneous answers per participant for each display modality in Figure 2.6. Participants in the HMD condition made, on average, fewer errors than those in the desktop condition. The total number of errors for the 40 participants was 33 out of 840 in the HMD condition and 56 out of 840 in the desktop condition. In addition, the difference in incorrect answers was statistically significant, shown using a paired t-test with Bonferroni-Holm correction resulting in p = 0.0195 < 0.05.

Figure 2.6: The distribution of incorrect answers for each display modality showing the median, first, and third quartiles.

In Figure 2.7, we show that the number of faces for which participants skipped an answer in the desktop condition was significantly higher than in the HMD condition. This was shown to be statistically significant using a paired t-test with Bonferroni-Holm correction with p = 0.0062 < 0.05, which reinforces that participants in the HMD had better recall than those on the desktop.

Figure 2.7: The distribution of faces skipped during recall for each display modality showing the median, first, and third quartiles.

2.4.3 Confidence

Previous work by Mania et al. [44, 58] examined user confidence together with recall accuracy. This allows us to study not only the objective recall accuracy but also the subjective certainty of the user answers. We asked each participant to indicate, for each answer, their confidence on a scale of 1 to 10, with 10 being certain, as a measure of how certain they were of the correctness of their response. The confidence scores aggregated across all 40 participants and all 42 faces that each studied are shown in Figure 2.8.

Figure 2.8: The overall confidence scores of participants in the HMD condition and the desktop condition. Each participant gave a confidence score between 1 and 10 for each face they recalled. Those in the HMD condition are slightly more confident about their answers than those in the desktop condition.

From Figure 2.8, we can see that users were slightly more confident in the HMD condition than in the desktop condition. The average confidence values for the HMD and desktop conditions were 9.4 and 9.1 respectively, ignoring skips.
At the highest confidence level, a confidence score equal to 10, there was a statistical difference between the number of correct answers given in the HMD and the desktop conditions, with p = 0.009 < 0.05 using a chi-square test, and with p = 0.022 < 0.05 including the Yates continuity correction. However, confidence is not always an indication of correctness. We wanted to see whether the HMD condition was giving a false sense of confidence. Figure 2.9 shows the number of errors made in each display condition based on the confidence of participant answers.

Figure 2.9: The number of errors made for each display condition at various confidence levels.

The results in Figure 2.9 show that users were less error-prone in the HMD condition, and that their confidence was better grounded in their recall accuracy than in the desktop condition. In general, participants were more often correct in the HMD condition than in the desktop condition at a given confidence level.

2.4.4 Ordering Effect

In our study we alternated the order in which participants were exposed to the displays. Figure 2.10 shows the accuracy when using the desktop first followed by the HMD versus using the HMD first and then the desktop. Users started with roughly the same performance (accuracy) on both the desktop and the HMD (desktop-1 and HMD-1 in Figure 2.10), but when going to the other display, the performance changed.

Figure 2.10: The performance of participants going from a desktop to a HMD and from a HMD to a desktop, showing the median, first, and third quartiles.

When users went from a desktop to a HMD, their performance generally improved. However, when users went from a HMD to a desktop, their performance surprisingly decreased. When comparing each participant's first trials, desktop-1 and HMD-1, the distributions of recall scores were not significantly different, with p = 0.62 > 0.05, but they were for the second trials, HMD-2 and desktop-2, with p = 0.025 < 0.05.

2.5 Discussion

We next report some interesting observations based on a questionnaire the participants filled out after the study. All our participants were expert desktop users, but almost none had experienced a HMD before. We believe that if there were to be any implicit advantage, it would lie with the desktop, given the overall familiarity with it. Although we gave the participants enough time to get comfortable in the HMD before we began the study, we observed that many were not fully accustomed to the HMD, even though they performed better in it.

We asked each participant which display they preferred for the given task of recall. We explicitly stated that their decision should not be based on the novelty or "coolness" of the display or the experience. All but two of the 40 participants stated they preferred the HMD for this task. They further stated that they felt more immersed in the scene and so were more focused on the task. In addition, a majority of the users (about 70%) reported that the HMD afforded them a superior sense of spatial awareness, which they claimed was important to their success. Approximately a third mentioned that they actively used the virtual memory palace setup by associating the information relative to their own body. This ability to associate information with the spatial context around the body only adds to the benefit of the increased immersion afforded by the HMD.

We note the interesting results we obtained with the display ordering.
When starting with the desktop and then using the HMD, we observed a significant improvement as compared to starting with the HMD and then using the desktop. A possible explanation could be that those who used the HMD first benefit from the HMD's superior immersion, which they lose when they transfer to the desktop. However, when users start on the desktop, they invest greater effort to memorize the information; when they transfer to the HMD, they not only keep their dedication but also gain from the improved immersion.

2.5.1 Study Limitations

In general, it is a difficult design decision to balance the goals of experimental control and ecological validity. In our study, we placed the faces for a particular face set in the same locations for all participants. However, since the participants were free to look in any direction, the order of presentation of faces was self-determined. We could have restricted the participants to look at the faces in a pre-determined order. However, we allowed the participants to look around freely so that the results would achieve greater ecological validity. Randomization of faces could have led to unintended consequences; having the Dalai Lama's face next to Abraham Lincoln's in one instantiation could alter its memorability, as could the opportune positioning of the Dalai Lama on a roof-top background. To avoid such inter-object semantic saliency confounds, we decided to preserve the same ordering of faces for all participants who viewed the scene with a given set of faces. We recognize that not randomizing the stimuli in a within-subject design could introduce a bias. To make sure that this did not result in any significant effects, we carried out a four-way mixed ANOVA (reported at the beginning of Section 2.4), and we did not find any statistically significant effects on recall due to the scenes, face sets, or the ordering of the display conditions. Previous research, such as [59], points out the tradeoffs between experimental control and ecological validity for virtual environments. Parson et al. [60] persuasively argue for designing virtual environment studies that strike a balance between naturalistic observation and the need for exacting control over variables.

The modality of interactive exploration of the virtual environment in the two conditions was different (head tracking versus mouse tracking). Thus, differences in recall performance may be explained by this difference in interaction modality. Our study did not attempt to distinguish the role of proprioceptive and vestibular information from visual stimuli, but examined them in the respective contexts of the immersive HMD and desktop display conditions. It will be interesting to examine, in future user studies, the relative advantage of diverse interaction modalities with the same display modality.

2.5.2 Conclusions

We found that the use of virtual memory palaces in the HMD condition improves recall accuracy when compared to the traditional desktop condition. We had 40 participants memorize and recall faces on two display-interaction modalities in two virtual memory palaces, with two different sets of faces. The HMD condition was found to give an 8.8% improvement in recall accuracy compared to the desktop condition, and this was found to be statistically significant. This suggests an exciting opportunity for the role of immersive virtual environments in assisting recall.
Given the results of our user study, we believe that virtual memory palaces offer us a fascinating insight into how we may be able to organize and structure large information spaces and navigate them in ways that assist in superior recall. One of the strengths of virtual reality is the experience of presence through immersion that it provides [19, 61]. If memory recall can be enhanced through immersively experiencing the environment in which the information was learned, it would suggest that virtual environments could serve as a valuable tool for various facets of retrospective cognizance, including retention and recall.

2.5.3 Future Work

Our study provides a tantalizing glimpse into what may lie ahead in virtual-environment-based tools to enhance human memory. The next steps will be to identify and characterize which elements of virtual memory palaces are most effective in eliciting superior information recall. At present, we have only studied the effect of in-place stereoscopic immersion, in which the participants were allowed to freely rotate their viewpoint but not translate. It will be valuable to study how the addition of translation impacts information recall in a virtual memory palace. Other directions for future studies could include elements of the architecture of the virtual memory palaces, such as their design, the visual saliency of the structure of the model [62], their type, and various kinds of layouts and distributions of content that could help with recall. Another interesting direction would be to allow people to build their own virtual memory palaces, manipulate and organize the content on their own, and then ask them to recall that information. If active participation in the organization of the data in virtual memory palaces makes a meaningful difference, that could be further useful in designing interaction-based virtual environments that could one day assist in far superior information management and recall tools than those currently available to us. Yet another interesting future direction of research could be to compare elements of virtual memory palaces that are highly personal versus those that could be used by larger groups. Much as textbooks and videos are used today for knowledge dissemination, it could be possible for virtual memory palaces to be used one day for the effective transfer of mnemonic devices amongst humans in virtual environments.

Chapter 3: Interactive Characterization of Cybersickness in Virtual Environments using EEG

3.1 Introduction

With the resurgence of virtual reality (VR), cybersickness has become a growing concern for researchers, developers, and users alike. Previous studies have shown that a large portion of the population (40%-60% according to a survey by Kolasinski [1]) may experience moderate to severe cybersickness in virtual environments. While there are several theories on the reasons underlying cybersickness, there does not exist an easy or systematic method of measuring and quantifying cybersickness from one moment to another. Without a reliable tool to measure and interactively quantify cybersickness, understanding and mitigating it remains a challenge. Early work on studying cybersickness and motion sickness relied on examining physiological changes such as sweating and increased heart rate. Eventually, this led to a standardized self-evaluation form for determining the intensity of sickness a person experienced: the Simulator Sickness Questionnaire [63].
A limitation of this approach is that measuring the effects of cybersickness requires either interrupting the subject during the experience (thereby affecting the experience itself, and thus the results) or waiting until the end of the experience to assess their symptoms, which relies on the subject accurately recalling their sickness. This survey-based qualitative approach is unable to provide real-time quantitative measurements, making it difficult to objectively assess real-time cybersickness in the virtual environment.

In this chapter, we present the results of a user study that measures and examines the cybersickness experienced by participants wearing a commercially available HMD and EEG headset. For this study, we designed a 3D environment and a camera path that was likely to evoke a moderate degree of cybersickness among participants. During this experience, the subjects' brain activity is measured using an EEG device and compared against a baseline EEG recorded while the scene is stationary. In addition, we had participants continuously self-report their level of sickness with a joystick interface. We compared the self-reported data with the time-frequency spectral EEG information, showing a correlation between the EEG data and the self-report data.

This chapter makes the following contributions to understanding and quantifying cybersickness in virtual environments:

• We establish that cybersickness in an immersive HMD is correlated with brain-wave activity measured by EEG;

• We find a statistically significant correlation of delta, theta, and alpha waves with self-reported cybersickness;

• Our approach facilitates ease of measurement and characterization of cybersickness by using inexpensive, commodity off-the-shelf VR headsets and EEG devices.

3.2 Related Work

LaViola [6] and Holmes et al. [7] found that common symptoms of cybersickness include nausea, increased heart rate, disorientation, sweating, eye strain, and headaches. One of the prevailing theories on the cause of cybersickness (also referred to as simulator sickness or visual fatigue) is the sensory conflict theory, which attributes it to the dissonance between visual and vestibular sensory cues [1, 2]. This happens, for instance, when a user is immersed in a moving virtual environment while stationary in the real world. The sensory conflict between what the eyes see and what the body feels is believed to lead to a physiological sense of discomfort and the associated cybersickness. Cybersickness is closely related to motion sickness. Motion sickness is often induced by unsettling movement, such as travel in vehicles or aircraft or amusement rides, but can also be caused by a mismatch between visual and vestibular sensation. Some of the techniques to mitigate cybersickness have therefore relied on minimizing this mismatch. A highly creative solution to resolving this mismatch was devised by Maeda et al. [64], who used galvanic vestibular stimulation to produce the sensation of vection or movement. Riecke et al. [65] reduced motion sickness by increasing a user's sense of self-motion without physically moving them. This was elegantly accomplished through auditory cues, seat vibrations, and the introduction of subtle scratches in the periphery of the projection screen. In contrast to the above, a highly innovative research direction has been to examine the role of peripheral vision in cybersickness.
Rebenitsch and Owen [66] presented a thorough review of modern techniques to detect and measure cybersickness and urge more research on this minimally understood subject. In their review, they state that the usage of EEG for such an endeavor is rare, noting only one related previous work. In a seminal study, Lin et al. [67] found that a user's visual field of view was positively correlated with their simulator sickness (SSQ) scores. More recently, Fernandes et al. [68] devised a clever solution to mitigating cybersickness by strategically and automatically manipulating the field of view of the wearer of an HMD based on virtual camera movement (full field of view when stationary and narrow field of view when in motion).

3.2.1 Self-reporting Cybersickness

The most common method for measuring cybersickness is to measure the severity of the users' symptoms using subjective self-reporting surveys [69]. A commonly-used survey is the Simulator Sickness Questionnaire (SSQ) by Kennedy et al. [63], which assesses sixteen symptoms, each rated on a four-point scale (none, slight, moderate, and severe). These symptoms have been further grouped into three categories: oculomotor, disorientation, and nausea. Oculomotor symptoms include effects such as fatigue, eyestrain, and difficulty in focusing. Disorientation includes vertigo, dizziness, and blurred vision. Lastly, the nausea category includes symptoms such as sweating, burping, salivation, and nausea [6]. While the self-reporting surveys are quite informative, they have the shortcoming that they can be administered only at the end of the simulator session or require the interruption of an experiment for a study participant to fill out the questionnaire. Waiting until the end loses the fine temporal granularity of cybersickness reporting. At the same time, interrupting the participant in a continuous experiment may be undesirable or even impossible. Further, an interruption may result in alteration of physiological symptoms in the study participant, which may impact their reporting. For instance, the interruption could result in recovery from motion sickness due to the passage of time and lack of sickness-inducing stimuli. Therefore, passive, but continuous, approaches to measuring cybersickness are highly desirable.

Several biological metrics have been used to detect and measure the presence of motion sickness and cybersickness. These include heart rate, respiratory rate, finger-pulse volume, skin conductance, and gastric tachyarrhythmia [70]. A challenge with these metrics is that not all people suffer from these symptoms when experiencing cybersickness, and cybersickness is not the only cause of these symptoms. Other studies use a user-driven metric, where a participant uses a clicker or a joystick to continuously indicate when and how much cybersickness the participant is feeling at that moment [71].

3.2.2 Measuring Motion Sickness with EEG

EEG has been used to measure motion sickness [71–75]. Previous papers have focused on four frequency ranges: Delta (1.0–4.0 Hz), Theta (4.0–7.0 Hz), Alpha (7.0–13.0 Hz), and Beta (13.0–25.0 Hz). Kim et al. [76] found that an increase in delta power with a decrease in beta power was indicative of cybersickness during an object-finding VR experiment which used a rear-projected CAVE (cave automatic virtual environment) display system to show the visual stimuli. Another study by Min et al.
[77] concluded that a decrease in delta power was indicative of visually-induced motion sickness in a car-driving experiment which used a standard rear-projected display to show the visual stimuli. Chen et al. [71] built a driving simulator using a motion platform inside a 360° rear-projection display, in order to provide both visual and vestibular stimulation to induce motion sickness. In this study, each participant used a controller to continuously log their level of motion sickness. By using independent component analysis (ICA) with time-frequency analysis and cross-correlation analysis, the authors were able to examine the changes in brain-wave activity induced by both visual and vestibular stimuli. They found a more complex interaction of power increases and decreases in different regions of the brain as the level of motion sickness changed. Another set of studies by Naqvi et al. [78, 79] recorded the EEG signals of participants viewing a movie on a 3D LCD TV in either 3D or 2D in order to determine if 3D movies cause greater visual fatigue. Their study found a decrease in theta power in the frontal regions of the brain in addition to a decrease in beta power in the temporal region.

To the best of our knowledge, there has not yet been a systematic study that has used EEG to measure and quantify cybersickness for users in immersive virtual environments wearing head-mounted displays. Given the previous work quantifying motion sickness using EEG, we believe EEG is also appropriate for quantifying cybersickness.

3.3 Materials and Methods

Our study evaluates the EEG dynamics of cybersickness from binocular visual stimuli in a virtual reality head-mounted display. To the best of our knowledge, this is the first study that uses EEG signals to continuously evaluate cybersickness in participants wearing head-mounted displays. We used a 14-channel, 128 Hz Emotiv Epoc EEG device, which has been successfully validated [80] and used in a variety of research studies, including measuring cognitive load [81], examining the relationship between the environment and happiness [82], and as a proof of concept for robust and mobile EEG recording in the outdoors [83]. We used the HTC Vive head-mounted display, which has a 110° field of view, a resolution of 1080×1200 per eye, and a refresh rate of 90 Hz. In this user study, the participants were limited to rotational viewing, with no translational movement permitted. The participants viewed a 3D stereo-rendered scene in the head-mounted display that involved a fly-through of a virtual spaceport with twisting, turning, accelerating, and decelerating of the virtual camera. A screenshot of the scene is shown in Figure 3.1. In addition to the HMD and EEG devices, we used a Thrustmaster joystick device, which the participants used to manually record their current level of cybersickness during the camera fly-through. The participants were instructed to indicate, by tilting the joystick in any direction, the magnitude of their sickness. They were told that no tilt indicated that they felt no sickness and that full tilt indicated extreme sickness. We then examined the correlation between the sickness reported by the participants and their EEG brain-wave recordings.

Figure 3.1: A still from the virtual spaceport flythrough used in our cybersickness study.
3.3.1 Participants

We recruited 44 participants from our university campus and surrounding community for the user study, of which 31 were male and 13 were female, with an average age of 27 years and a standard deviation of 8 years. Every participant had normal or corrected-to-normal vision (self-reported). The study session for each participant lasted around 30 minutes. Due to technical problems associated with the EEG recording interface, we had to discard one participant's data. We used the EEG data from the remaining 43 participants for our analysis. Stanney and Kennedy [84] note that 30–40% of participants in flight-simulator studies do not experience simulator sickness.

3.3.2 Experimental Protocol

The entire procedure, including how to interact with the HMD and the input joystick mechanism, was explained to each study participant. First, the EEG device was placed on the participant's head and manually configured until the EEG device showed that all electrodes had registered good contact with the head. Second, the participant donned the HMD and their interpupillary distance (IPD) was adjusted so that the participant could comfortably see the 3D stereo rendering in the HMD. Third, the participant was given the joystick and instructed to tilt it based on how cybersick they felt. Finally, each participant was given 60 seconds to get used to and comfortable with the EEG, HMD, and the joystick. Throughout the entire study session, the participants were standing while wearing both the EEG and HMD as well as holding the joystick in their hands.

After this acclimation period, a baseline EEG reading was taken. Inside the HMD the location of the user was static, but they were allowed to rotate their view direction by turning their head. The participants were asked to make slow and deliberate head movements while wearing the HMD and EEG devices to minimize any risk of injury and electrode separation. After the baseline reading, the participant was re-instructed to use the joystick device to report their sickness levels and was then virtually flown through the scene, which lasted approximately one minute. Our pre-study trials showed that if a participant was at all susceptible to cybersickness, they would most certainly feel sick in the one-minute virtual fly-through. Restricting the flythrough to one minute kept the exposure time minimal for participant safety but long enough to record a satisfactory amount of data. Throughout the flythrough, each participant continuously logged their self-reported level of cybersickness using the joystick, with no tilt corresponding to no reported sickness and full tilt to severe sickness.

3.3.3 Signal Acquisition and Pre-Processing

As discussed earlier, we recorded the brain-wave activity using an Emotiv Epoc EEG with 14 channels sampling at 128 Hz. The names and locations of the electrodes/channels are shown in Figure 3.3. The EEG headset uses a saline electrolyte solution on its contact heads. The raw data was acquired and saved to disk using the Emotiv Epoc C++ SDK, which was integrated into our rendering program. This enabled the EEG recording and camera path to be synchronized for all participants. We used MATLAB with the widely used EEGLAB toolbox (https://sccn.ucsd.edu/eeglab/) for EEG signal management and processing. The first part of signal processing involved importing each participant's raw Emotiv EEG data into MATLAB and then into EEGLAB for both the baseline and virtual flythrough recordings. Once the data was loaded, the mean of each channel was calculated and subtracted from that channel's data, centering the signals. Next, a high-pass filter with a cut-off frequency of 1 Hz was used in conjunction with a low-pass filter with a cut-off frequency of 50 Hz to remove unwanted noise from the signals. The filtered EEG signals were then manually inspected for any recording anomalies, which can occur if a subject moves too abruptly or if an electrode temporarily loses contact. We manually removed those erroneous sections or rejected the EEG sample entirely. In our study, we found EEG recording anomalies for one subject and removed that subject's data from further processing. After pre-processing, we exported the data into an EEGLAB study package. While viewing the fly-through of the spaceport, participants reported their level of cybersickness with the joystick, with more tilt indicating stronger sickness. The joystick was sampled at 90 Hz, matching the frame rate of the HMD. The cybersickness level is a score between zero and one, reported continuously in real time without interrupting the experiment.
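As an illustration, the centering and band-limiting steps above can be sketched in a few lines of Python with NumPy/SciPy. Our actual pipeline used MATLAB/EEGLAB; the filter order, array shapes, and synthetic data below are illustrative assumptions, not the code used in the study.

import numpy as np
from scipy.signal import butter, filtfilt

FS = 128.0  # Emotiv Epoc sampling rate in Hz

def preprocess(eeg):
    """eeg: array of shape (n_channels, n_samples), e.g., (14, N)."""
    # Center each channel by subtracting its mean.
    centered = eeg - eeg.mean(axis=1, keepdims=True)
    # 1 Hz high-pass combined with 50 Hz low-pass, i.e., a 1-50 Hz
    # band-pass (order 4 is an assumption), applied forward and
    # backward for zero phase distortion.
    b, a = butter(4, [1.0, 50.0], btype="bandpass", fs=FS)
    return filtfilt(b, a, centered, axis=1)

# Example with synthetic data standing in for one 60-second recording:
raw = np.random.randn(14, int(60 * FS))
clean = preprocess(raw)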
3.3.4 Independent Component Analysis

Similar to previous work on EEG-signal analysis [71, 85], we decomposed our filtered EEG signals, for each subject, into independent components, or brain sources, via Independent Component Analysis (ICA) [86] in EEGLAB. The intuition behind the use of ICA is that the observed EEG signals are the result of a mixture of sources throughout the brain and scalp, which are assumed to be independent, such as eye-blinks, muscle movement, or other psycho-physiological stimuli, including cybersickness. In our study, we applied ICA to the EEG recordings of each individual subject, resulting in 14 independent components per participant. From the calculated independent components, we clustered similar components using the built-in EEGLAB K-means independent-component clustering functionality. The idea is to cluster similar independent components so that similar underlying phenomena across participants are grouped together for further analysis, while undesired phenomena are separated from our target source. The resulting scalp maps from the clustered independent components are shown in Figure 3.2. From the 14 generated clusters, we found one to be most representative of cybersickness, as it had the most statistically significant difference between the baseline EEG frequencies and the virtual flythrough EEG frequencies. The selected cluster, labeled cluster 12 by EEGLAB, is referred to as cluster A from this point forward. Figure 3.3 shows the selected cluster with the EEG node labels in more detail. The Emotiv Epoc uses 14 electrodes: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4.

Figure 3.2: Averaged scalp maps of clustered independent components. The scalp map which correlated with cybersickness is shown in the black box.

Figure 3.3: The names and locations of the 14 EEG electrodes in the Emotiv Epoc headset.
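A schematic Python analogue of this per-subject decomposition and cross-subject clustering is sketched below with scikit-learn. Clustering on the mixing-matrix columns (scalp maps) alone is a simplification of EEGLAB's component clustering, which can also include component spectra; the `subjects_eeg` list is a hypothetical stand-in for the study's recordings.

import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans

def decompose(subject_eeg):
    """subject_eeg: (14, n_samples) filtered EEG; returns 14 component
    time courses and the corresponding 14 scalp-map vectors."""
    ica = FastICA(n_components=14, max_iter=1000)
    sources = ica.fit_transform(subject_eeg.T).T  # (14, n_samples)
    scalp_maps = ica.mixing_.T                    # one 14-value map per component
    return sources, scalp_maps

# Pool scalp maps from all subjects and group similar components.
all_maps = []
for eeg in subjects_eeg:          # hypothetical list of (14, N) arrays
    _, maps = decompose(eeg)
    all_maps.append(maps)
labels = KMeans(n_clusters=14).fit_predict(np.vstack(all_maps))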
3.3.5 Time-Frequency Analysis

During the study, participants continuously logged their current feeling of cybersickness using a joystick while virtually flying through the spaceport. We correlated the participants' self-reported cybersickness levels with the ICA cluster power spectra. We hypothesized that, as the reported level of cybersickness changes, the ICA power at different frequencies should change in proportion to the strength or weakness of the reported cybersickness. To calculate the time-frequency spectra, we used the EEGLAB ERSP (Event-Related Spectral Perturbation) function, resulting in an average of the power-spectral density over time. The power-spectral density is then converted into decibel power by EEGLAB.
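A rough stand-in for this ERSP computation is a short-time Fourier spectrogram converted to decibels relative to the mean baseline power at each frequency. The window length and overlap below are illustrative assumptions; EEGLAB's ERSP routine uses its own windowing defaults.

import numpy as np
from scipy.signal import spectrogram

FS = 128.0

def ersp_db(component, baseline):
    """component, baseline: 1-D arrays (one ICA component's activity)."""
    f, t, pxx = spectrogram(component, fs=FS, nperseg=128, noverlap=96)
    _, _, pxx_base = spectrogram(baseline, fs=FS, nperseg=128, noverlap=96)
    base_power = pxx_base.mean(axis=1, keepdims=True)  # mean power per frequency
    return f, t, 10.0 * np.log10(pxx / base_power)     # dB relative to baseline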
(a) The start of the fly-through, with the camera slowly moving towards the spaceport. (b) The camera undergoes a fast acceleration close to the surface of the spaceport at the 20-second mark. (c) A sudden and fast drop off the edge of the spaceport, around the 35-second mark. (d) The free-fall suddenly decelerates near one of the landing platforms, around the 43-second mark. (e) The camera starts to accelerate directly upwards; this moment occurs at the 50-second mark. (f) Finally, at the 60-second mark, the fly-through arrives at a ledge and shows the large depth of the spaceport.

Figure 3.4: The virtual camera flythrough of the spaceport that each participant in our study experienced. Note how the above events correspond to the self-reported cybersickness levels in Figure 3.5.

3.4 Results

In this section, we review the results of our user study exploring cybersickness in virtual reality using EEG. First, we review the subjective sickness levels and symptoms as reported by each participant during and after the experiment. Second, we examine the results of the EEG analysis, showing a statistically significant difference between the averaged baseline and cybersickness EEG recordings. Third, we review the time-frequency spectral power graphs, compare them to the continuous self-reported sickness levels, and show that there is a correlation between them.

3.4.1 Self-Reported Cybersickness

After the baseline EEG recording was taken, the EEG measurements during the virtual flythrough phase began. Each study participant was told that they would virtually fly through the spaceport and that, if they felt any of the previously mentioned symptoms, they were to indicate their presence and strength by tilting the hand-held joystick device. We refer to this input as the participants' self-reported cybersickness levels, which are shown in Figure 3.5. In addition to the joystick information, each participant completed an SSQ form at the end of the study.

The highest peaks of the average of participants' self-reported cybersickness levels, shown in the bold black curve of Figure 3.5, can be attributed to specific events that occurred in the spaceport fly-through (see Figure 3.4). The first peak corresponds to a sudden burst in camera acceleration in close proximity to the surface of the spaceport. The second peak corresponds to a sudden free-fall off an edge of the spaceport. The third peak aligns with the sudden and hard pull-up of the camera after free-falling from the previous event. The fourth peak corresponds to the sudden acceleration upwards after the initial camera pull-up. The final peaks correspond to a sudden deceleration of the camera as it comes to rest on the landing platform of the spaceport.

Figure 3.5: The self-reported cybersickness levels, using the joystick, for each participant are shown in the thin colored curves. The bold black curve shows the average of all the participants' self-reported cybersickness levels.

In addition to the continuous self-reported information from the joystick, the Simulator Sickness Questionnaire (SSQ) scores from each participant were collected at the end of the session. The SSQ consists of 16 questions with 4 severity options, with values from 0 to 3: 0 as none, 1 as slight, 2 as moderate, and 3 as extreme. The average SSQ scores for each of the 16 symptoms and their variances are shown in Figure 3.6. Based on the information in the graph, it can be concluded that our participants primarily experienced varying levels of vertigo, dizziness, and general discomfort.

Figure 3.6: Participant Simulator Sickness Questionnaire (SSQ) scores after the experiment are shown here. The plot shows the median and the first and third quartiles (orange and grey, respectively), with the minimum and maximum shown as error bars.

Our study participants self-reported cybersickness through both the joystick and the SSQ survey. From the distribution of SSQ and joystick scores, we can see that the participants rated their level of cybersickness from mild to severe. This wide range of symptom intensities suggests that the brain-wave EEG data should be diverse and that not all users will be part of the cybersickness-revealing independent-component clusters.

In Figure 3.7 we show a comparison of the average level of self-reported cybersickness as reported through the joystick with the sum of the reported SSQ sickness levels for each participant. The two distributions are correlated, with a statistically significant Pearson correlation of 0.49.

Figure 3.7: A comparison of the average score as reported by the joystick with the SSQ sum for each participant. The SSQ score and the self-reported cybersickness using the joystick have a Pearson correlation r-value of 0.49.

3.4.2 Spectral Differences

In this section, we compare the differences between the spectral frequencies of the EEG recordings of the baseline (green curve) and the virtual flythrough (purple curve). From the frequency spectra of the 14 clusters shown in Figure 3.2, we have closely analyzed the one labeled as the 12th cluster, hereafter referred to as cluster A. The selected cluster had statistically significant differences between the baseline and the virtual flythrough EEG frequency spectra. Further, it represented a meaningful fraction of the participants, comprising 24 out of the 43 total (55%) participants. We find it interesting that this is in general agreement with previous research by Stanney and Kennedy [84], in which they noted that 30–40% of the participants in flight-simulator studies do not experience simulator sickness.

Figure 3.8 shows the mean component power spectra of the selected independent component cluster A for the baseline and virtual flythrough conditions. It is clear from the figure that there is a power increase across many frequencies for participants experiencing the virtual flythrough of the spaceport. In the component cluster spectra plot, we indicate where the EEG power changed significantly using paired t-tests. For the selected ICA cluster, we see that the difference between the baseline and virtual flythrough frequency spectra is statistically significant (p ≤ 0.01 for much of the frequency range), using EEGLAB's built-in paired t-test with Bonferroni-correction statistical analysis. Similar to previous work which studied motion sickness, we also see a power increase across many frequency bands for the virtual flythrough scenario compared to the baseline.
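The per-frequency significance test described above can be summarized in a minimal sketch: a paired t-test at each frequency bin between the baseline and flythrough spectra, with a Bonferroni correction across bins. EEGLAB performs this internally; the function below is an illustrative analogue, and the array shapes are assumptions.

import numpy as np
from scipy.stats import ttest_rel

def significant_bins(base_spectra, fly_spectra, alpha=0.01):
    """base_spectra, fly_spectra: (n_participants, n_freq_bins) arrays
    of per-participant power spectra for the selected cluster."""
    n_bins = base_spectra.shape[1]
    _, p = ttest_rel(fly_spectra, base_spectra, axis=0)  # paired test per bin
    return p < (alpha / n_bins)  # Bonferroni-corrected significance mask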
The EEG spectral power differences between the baseline EEG recording with the stationary scene and the recording during the virtual flythrough indicate that cybersickness can be detected using EEG. More specifically, we have identified that an increase in spectral power, with respect to a baseline recording, is indicative of the onset of cybersickness. For both recording sessions, the participants used the EEG, the HMD, and the joystick, and experienced the same environment while standing. The only difference was the camera motion during the virtual fly-through. We next look at the frequency spectra over time for the selected cluster, to examine when specifically a participant experienced cybersickness and to correlate these spectra with the self-reported cybersickness levels.

Figure 3.8: Comparison of the EEG power spectra between the baseline (blue) and virtual flythrough (green) for ICA cluster A. The paired t-test with Bonferroni correction between the two spectra reveals p < 0.001 for much of the frequency range.

3.4.3 Time-Frequency with User Input Signals

During the virtual flythrough of the spaceport, the study participants continuously recorded their current levels of cybersickness through a joystick device. The self-reported cybersickness levels for each participant are shown in Figure 3.5, along with the average sickness level shown in black. We used time-frequency analysis to evaluate the EEG spectral changes across all participants against the self-reported sickness levels. The values for all frequencies, averaged over all users, for cluster A are shown in Figure 3.9.

Figure 3.9: Time-frequency visualization of cluster A. The average self-reported cybersickness levels are shown below in red.

The average self-reported cybersickness is shown below the time-frequency visualizations in red. We observe a correlation between the spectral power changes shown in the time-frequency plots, especially for the lower frequency bands, and the self-reported cybersickness levels from the participants. To assess the degree of correlation, we computed the correlation between the time-frequency band values and the average self-reported cybersickness information. The Pearson correlation r-values for each of the four frequency bands are presented in Table 3.1. Figure 3.10 compares each of the frequency bands with the average self-reported cybersickness levels over time.

Table 3.1: Correlations (Pearson r-values) between average ERSP values for the four frequency bands and the self-reported cybersickness levels. All the correlations are statistically significant (p < 0.001). The graphs of the various frequency bands for cluster A can be seen in Figure 3.10.

Frequency Band: Cluster A r-value
Delta Band (1.0–4.0 Hz): 0.642
Theta Band (4.0–7.0 Hz): 0.589
Alpha Band (7.0–13.0 Hz): 0.476
Beta Band (13.0–25.0 Hz): 0.465

Our analysis shows that a statistically significant and high correlation exists between the Delta, Theta, and Alpha bands of cluster A and the self-reported cybersickness information from the participants. This is perhaps best illustrated in the time-frequency plot with the self-reported cybersickness through joystick input for one of the participants, as shown in Figure 3.11.

Figure 3.10: Averages over the four frequency bands for cluster A compared with the average self-reported cybersickness (in green).

Figure 3.11: Visualization of the ERSP from a cluster A participant with self-reported cybersickness levels. Note how the changes in ERSP values, especially for the Delta and Theta bands, align with the participant's self-reported cybersickness.
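The band-correlation analysis behind Table 3.1 reduces to a few steps: average the ERSP rows within each band, align the joystick trace with the ERSP time axis, and compute a Pearson r per band. The sketch below assumes the joystick trace has already been resampled to the ERSP time axis; variable shapes are assumptions.

import numpy as np
from scipy.stats import pearsonr

BANDS = {"Delta": (1.0, 4.0), "Theta": (4.0, 7.0),
         "Alpha": (7.0, 13.0), "Beta": (13.0, 25.0)}

def band_correlations(freqs, ersp, joystick):
    """freqs: (n_freqs,); ersp: (n_freqs, n_times) in dB;
    joystick: (n_times,) average self-reported sickness."""
    results = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        band_power = ersp[mask].mean(axis=0)  # average ERSP within the band
        r, p = pearsonr(band_power, joystick)
        results[name] = (r, p)
    return results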
3.4.4 External Factors

We note that the high correlations between the averaged time-frequency signals of the different EEG bands and the average joystick signal may be due to a confounding effect of increased cybersickness and the actual movement of the joystick. However, the joystick movements were very sparse and were not sustained over long periods of time. The participants were instructed to tilt the joystick only when they felt their level of cybersickness change. It has been shown in previous studies [71, 87, 88] that changes in spectral power as a result of finger and hand movements in sustained-attention tasks diminish quickly, on the order of a few seconds. Such an effect would also produce a spectral change and rebound within that short period of time, which we do not see in our EEG signals. In addition, during the baseline recording, our participants held and randomly moved the joystick to simulate the same effect, and these movements do not appear in the EEG signals either. Therefore, any movements of the joystick would not have influenced the overall spectral frequency and power differences in our analysis.

Another external factor for consideration is head movement. In the baseline recording session, participants were instructed to freely look around the same environment that they would be placed in during the fly-through (the spaceport). During the fly-through, the participants could also freely look around their environment as the camera flew through the scene. Therefore, any significant spectral differences between the baseline and sick conditions are unlikely to be due to head movement.

3.5 Conclusions and Future Work

Throughout the course of the study, we witnessed a wide range of reactions to the rendered stimuli. Some participants experienced minor discomfort, while others experienced moderate to high levels of cybersickness. Each participant was asked to briefly report what aspect of the experience made them the most cybersick. They reported that the sudden changes in direction and velocity of movement made them feel ill compared to when the motion was smoother. In addition, they reported that the anticipation of where the camera was going to move heightened their reaction. Lastly, they expressed that if they had been in control of the camera, as opposed to the camera being moved automatically, they might have felt less sick due to prior knowledge and mental preparation for what was about to happen. One observation the test administrator made was that approximately 70% of participants would lean their bodies, with varying (in some cases, almost alarming) degrees of tilt, based on the motion of the camera. Approximately 32% of participants had previous experience with a head-mounted display.

In this chapter, we have presented our findings of a user study with the goal of continuously measuring and quantifying cybersickness. In our study, the participants wore both an HMD and an EEG recording device while being presented with visual stimuli of a virtual flythrough of a spaceport. The recorded EEG data was decomposed using ICA to separate the underlying sources of the brain-wave activity and eliminate noise. The independent components were then clustered across users for the purposes of comparing the EEG of those grouped users.
Through independent component analysis and time-frequency spectral analysis, our findings suggest that a spectral power increase in the Delta, Theta, and Alpha frequency bands, relative to a baseline, correlates strongly with the presence of cybersickness.

Our findings in this chapter are just a first step toward the many opportunities that present themselves in using EEG to study cybersickness in virtual environments. Some of the more important among these include a better understanding of the sources of cybersickness, the relationship of the duration of immersion to cybersickness, and the effect of age and gender on cybersickness. A number of cybersickness mitigation strategies have been studied over the last decade, but their evaluation has been largely based on questionnaires at the end of the immersive experience. An exciting direction of future work is the continuous evaluation of the effectiveness of cybersickness mitigation strategies while the user is immersed in the virtual world. In our study, the participants were not asked to perform a task. It would be interesting to explore what effect, if any, task performance has on cybersickness. Finally, it will be highly desirable, if at all possible, to move towards standards of assessing cybersickness and to use them to rate hardware (headsets, trackers, and displays) as well as content (games, performances, and other immersive experiences).

Chapter 4: Enhancing Deep Learning with Visual Interactions

4.1 Introduction

Computer-based semantic learning systems have made impressive strides in the last few years, but there remains a striking disparity between the abilities of humans and machines. Current-generation deep-learning systems require thousands of finely-labeled images to train. If a child needed a thick stack of images of cats to learn what a cat looks like, we would be in deep trouble [89]. The goal of this chapter is to take the first steps to bridge this gap by using visual interactions to help enhance the performance of deep learning. Furthering the capabilities of deep learning through interactions can help it emerge as a powerful engine for new discoveries in high-dimensional data.

A challenge with deep learning, as previously mentioned, is that it requires large amounts of precisely annotated and labeled data for training purposes. Recent advances in data collection and annotation have allowed for the creation of large high-dimensional datasets, consisting of millions of points, which are often used to train deep-learning models. However, in many real-world datasets, the labels and annotations provided may be incomplete or may not capture all the distinctions within the data, known and unknown. Precisely annotating a high-dimensional dataset that contains many labels is expensive, time-consuming, and error-prone; errors can arise from mislabeled data points, the unintended omission of precise labels, or the subjectivity of the annotator(s). Semi-supervised learning aims to improve classification performance by using both labeled and unlabeled examples. In this work, we extend semi-supervised learning to include coarse labeling that a user can refine based on visual feedback. Since it is often easier and faster to carry out coarse labeling, we expect our approach to be broadly applicable to many more datasets where coming across large amounts of precisely labeled data is difficult.
In this chapter, we present our approach to facilitate visually-driven deep learning that enables the refinement of a coarsely-labeled dataset through intuitive interactions by leveraging the latent structures present in high-dimensional datasets. Through the combined efforts of human analysts and deep learning, we hope not only to facilitate discovery of hidden communities and structures within these high-dimensional datasets, but also to start to bridge the gap in our understanding of deep learning.

This chapter makes the following contributions to interaction-assisted deep learning for high-dimensional semantic labeling:

• We use permutation-invariant deep neural networks to generate 2D point distributions such that similar points are proximal.

• We iteratively and manually refine the distribution and labeling of data points using visual feedback, which are in turn used to refine the deep neural networks.

• We facilitate the discovery of detailed latent groups or labels from datasets that consist of only a few high-level labels or no labels at all.

4.2 Related Work

A new focus on bringing human judgment and intuition back into the data exploration process has recently gained popularity. Individually, machines and people are generally good at solving different problems, but the union of their strengths will lead to new and better insights. Our system builds upon the ideas of Turkay et al. [90], who have shown that placing the human into the data analytics process, particularly for high-dimensional data, is beneficial, especially when the temporal, perceptual, and cognitive abilities of the user can be leveraged. Recent results also show the benefits of increased immersion on the recall of spatially organized data [91]. We drew inspiration from the work by Endert et al. [92], whose insight was to use a model that is continuously updated with information from user interactions, and from Sacha et al. [93], who have shown that although there exist many tools and techniques for dimensionality reduction, there is a lack of general-purpose tools for human-in-the-loop interactive dimensionality reduction and exploration.

We next present a review of related research, touching on dimensionality reduction, high-dimensional visualization, interactive analysis of high-dimensional data, label generation, and active learning, along with comparisons between traditional semi-supervised learning and our goal of label discovery and refinement.

4.2.1 Dimensionality Reduction

The idea behind dimensionality reduction is to capture the dominant trends in the data and project them to a lower dimension. We use dimensionality-reduction techniques to produce a human-comprehensible representation of an input dataset to allow analysts to visually detect clusters and refine the data labeling. Since the previous work on dimensionality reduction is extensive, we refer only to a small representative set here.

One set of dimensionality-reduction techniques is based on Taylor-series expansions. Roweis and Saul [94] introduced locally linear embedding (LLE) and compared its projections of data to those produced by principal component analysis (PCA) and multi-dimensional scaling (MDS) [95]. Belkin and Niyogi [96] introduced Laplacian eigenmaps, which generalize the idea of LLE by computing a low-dimensional manifold representation of a high-dimensional data set such that local neighborhood information is preserved in a least-squares sense.
The dimensionality reduction by this algorithm produces an approximation that reflects the intrinsic geometric structure of the high-dimensional dataset. The idea of spectral dimensionality reduction has also been used in the visual depiction of graph relationships [97].

A popular approach, introduced by Maaten and Hinton [98], is a technique called t-Distributed Stochastic Neighbor Embedding (t-SNE), which is often used to generate visualizations of high-dimensional data in two dimensions. This is done using a modified algorithm based on Stochastic Neighbor Embedding by Taylor et al. [99]. Their approach generates a view that reveals the structure of low-dimensional manifolds within high-dimensional datasets and visualizes those manifolds using a collection of two-dimensional scatterplots. The algorithm is nonlinear, performing different transformations on different regions of the feature space and looking for similarities in these smaller regions. The need to search through many different configurations of hyper-parameters, such as step size and perplexity, just to generate the visualization can lead to considerable time spent simply searching for meaningful and revealing views, while also potentially leading to incorrect interpretations of the underlying data. According to Wattenberg et al. [100], t-SNE does not always produce meaningful representations. For our work, it is crucial that the generated visualizations facilitate intuitive and accurate interpretations of the data.
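As a brief illustration of the usage pattern and the hyper-parameter sensitivity noted above, the scikit-learn sketch below embeds the same data under several perplexity values; the digits dataset is just a convenient stand-in for high-dimensional data, not one of the datasets used in this chapter.

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 64-dimensional points

# The same data can yield very different 2D layouts depending on
# perplexity, which is why a single t-SNE view should be read with care.
embeddings = {p: TSNE(n_components=2, perplexity=p).fit_transform(X)
              for p in (5, 30, 50)}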
We next review deep-learning-based dimensionality-reduction techniques. Hinton et al. [101] popularized the idea of using a multilayer neural network to perform dimensionality reduction. Hadsell et al. [102] developed a convolution-based encoder using deep learning to perform dimensionality reduction, creating a globally coherent nonlinear function that maps the data evenly onto an output space. The advantage of these techniques is that the learning relies only on point-neighborhood relationships and does not require any distance measurements in the high-dimensional input space. More recently, Chen et al. [103] developed a system using deep-learning autoencoders to perform dimensionality reduction on hyper-spectral data. In their approach, they combine neighboring pixels to generate spatial feature sets which produce output class labeling after autoencoding and logistic regression. We use a modified version of the method of Hadsell et al. [102] to enable handling of coarsely labeled data.

Traditional dimensionality reduction and machine learning approaches, such as the ones reviewed above, require fully and accurately labeled data. Further, in contrast to our approach, they are single-shot techniques, with no mechanism to leverage the analyst's domain knowledge or pattern-recognition ability to guide, shape, or influence the clustering of the data.

4.2.2 High-Dimensional Community Visualization

Visualizing high-dimensional data is a core element of our system, since our goal is to reveal the hidden patterns and communities within. In this section, we review a selection of recent work on high-dimensional data visualization. A common technique for high-dimensional visualization is multidimensional scaling (MDS), pioneered by Kruskal [95]. The goal of MDS is to visually group data objects such that similar objects are spatially close to each other and dissimilar data objects are far away, as determined by a similarity function. The idea of plotting similar objects in close spatial proximity is fundamental to our own work.

One example of using spatial proximity for conveying information is the work by Amir et al. [104], who created a 2D visualization tool for the analysis of high-dimensional biological-cell data. Each individual cell is represented as a point in a scatterplot, with the positioning of each point calculated using the t-SNE [98] algorithm.

One of the most common and easiest-to-interpret visualization techniques is the scatterplot. Dang and Wilkinson [105] have improved classical scatterplots with scagnostics (scatterplot diagnostics). One of the main problems in using scatterplots is that as the dimensionality of the dataset increases, the number of generated scatterplots also increases rapidly. To get around this problem, Dang and Wilkinson developed techniques to enable discovery of hidden structures within a subset of scatterplots through various transformations.

The work we present here takes inspiration from the previous visualization works mentioned. Our goal is to build upon the foundations of that work and expand the capability of visualization-based investigation by incorporating and interleaving an intelligent system (deep learning) into the exploration process, rather than to make an advance in the field of visualization itself.

4.2.3 Interactive Analysis of High-Dimensional Data

The ability to directly interact with high-dimensional data in order to search for trends or other interesting information is crucial. For our system, we expect a human analyst to understand, interact, and interpret results based on the output of a dimensionality-reduction algorithm. In this section, we review some of the previous works whose goal was to provide a method to interact in an iterative manner with high-dimensional datasets.

Ip et al. [106] developed a tool that facilitated the visual exploration of hidden spatial structures in volumetric datasets. The tool used a 2D intensity-gradient histogram to enable a user to iteratively search for interesting regions in the 3D volume. The interaction involved manipulating and selecting regions of the generated histogram through normalized cuts. We were inspired by the work done by Ip et al. and have also included the usage of normalized cuts to aid in our own segmentation and clustering process.

Chen et al. [107, 108] developed a system which leverages deep learning to guide user exploration of interesting high-dimensional and temporally changing structures within volumetric cell data. Rather than using the raw features provided by the data, deep learning is able to transform those features into compact and semantically meaningful representations, which better capture and distinguish biological properties, such as boundaries and other components. Using this new feature space, a quantifiable metric of similarity between these deep-learning-constructed features may be calculated, simplifying the transfer function used for volume rendering and revealing the interesting structures within the original biological data.

Liu et al. [109] developed an application that allows for the interaction and exploration of high-dimensional data presented in low-dimensional space. The primary contribution of their work is the use of distortion-guided manipulations, where a user can select a data element and then move or delete it, causing point-wise distortion measures to be re-calculated and visualized.
As a data element is moved in the 2D space, a global structural change occurs on the fly, which provides information regarding the relationship between different parts of the 2D projection. Liu et al. [110] further expanded upon the previous work with the inclusion of a subspace-view navigation graph. This graph allows for animated transitions between different subspace views to facilitate easy comparison between those views. Our work shares a similar goal with Liu et al., exploring high-dimensional data, but their tool is designed for understanding how different variables are related in low-dimensional space by changing the configurations of the projection, rather than for finding new subsets/categories in low-dimensional space. In addition, there is no mechanism in place for influencing the projections based on other input data, such as a newly discovered group or label information.

4.2.4 Label Generation

For many applications, it is useful to be able to generate labels for various content. For example, these labels could be hashtags for Twitter [111], keywords for image or scene search [112], or tags for music labeling [113]. Traditional label classification consists of learning from a dataset where each data example is associated with a single known label. For example, if there are only two distinct labels in the dataset, the problem is known as binary classification [114]. In multi-label classification, each data example is associated with more than a single label. Common applications for multi-label classification are image labeling, online search, and machine learning [115, 116]. However, this chapter tackles a challenge from which many multi-label classification approaches suffer: the lack of large, precisely labeled datasets to train on. Many datasets have missing or oversimplified labels. This is the challenge we address in this chapter: identifying, differentiating, and re-labeling hidden labels and groups within high-dimensional data.

One way to get around this problem is to pre-train the classification network (in the case of deep learning) in an unsupervised manner [117], without the usage of labels. Erhan et al. [117] have shown that pre-training the classification network allows the layers to focus on and capture the variation and nuances of the data itself, which allows for better regularization and generalization of the network. An example of this is the work done by Hinton and Salakhutdinov [101], who use an autoencoder to pre-train a deep-learning network to learn a low-dimensional embedding of high-dimensional data. The primary purpose of the previous work is to increase the performance of trained models by seeding the weights in the network through pre-training. Instead of modifying the methods behind training networks, we improve model performance by modifying the training data itself to be more precise and accurate, resulting in a better model. The benefit of our approach is that the result of our technique, an improved training set, can be leveraged by all forms of machine learning which use training data, not just neural networks, as well as by other applications where labeled datasets are required.

Lee [118] developed a technique that uses labeled and unlabeled data to train a deep neural network in a semi-supervised fashion. For the data with unknown labels, the class with the maximum predicted probability is assigned as a pseudo-label after every update cycle and is then used as if it were the true label.
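A compact sketch of this pseudo-labeling scheme (after Lee [118]) is given below: unlabeled points receive the class with the maximum predicted probability and are folded into the next training round. Here `model` is assumed to be any classifier exposing Keras-style fit/predict with integer class labels; this is an illustration, not Lee's original code.

import numpy as np

def pseudo_label_round(model, x_labeled, y_labeled, x_unlabeled):
    # Train on the currently labeled data.
    model.fit(x_labeled, y_labeled, epochs=1, verbose=0)
    # Assign each unlabeled point its most probable class.
    probs = model.predict(x_unlabeled)
    pseudo = np.argmax(probs, axis=1)
    # Treat the pseudo-labels as if they were true labels next cycle.
    x_all = np.concatenate([x_labeled, x_unlabeled])
    y_all = np.concatenate([y_labeled, pseudo])
    return x_all, y_all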
Testing their approach using the MNIST dataset, they were able to develop a deep neural network capable of state-of-the-art classification performance. Similar to the previous work, our system is designed to help discover the true label of a data point based on the given labeling present in the dataset. As the data points are relabeled, they are placed back into the dataset and are used for re-training the network. However, our technique benefits from the ability of an analyst to use their judgment and domain knowledge to help drive the re-labeling of data, while also benefiting from the suggestions and output of a model. If the model should make a mistake in labeling, a human analyst can step into the loop and correct the mistake before it propagates and influences the future labeling of data. Lastly, the previously mentioned approach can only assign data points a label that already exists in the dataset, and cannot generate new labels or groups, which may be necessary if a large enough portion of the data is without a label.

There has not been much work on including human-in-the-loop interactions to enhance labeling accuracy. Much of the prior work relies on statistical models that are limited by the domain knowledge of the dataset or expert availability. One example of rectifying crowd-sourced labeling with expert review is the work by Hung et al. [119]. In their work, they use statistical methods to generate a few meaningful questions about data previously labeled by crowd-sourcing, to minimize the amount of expert time needed to correct that labeling. The output of the crowd-sourcing and expert refinement of the labeled data may then be used for machine learning. The authors found that they were able to reduce the amount of time needed from domain experts to refine and correct the labeling to achieve near-perfect classification. While the impact and usefulness of the previous work cannot be overstated, the weakness of that method is that there remains a heavy reliance on crowd-sourced information to provide the majority of labels and to handle the burden of labeling the data. This may not always be possible, as the domain of the data to be labeled may not be easily understood by the population in general (such as hyper-spectral imagery or computer-network data). The tasks typically targeted for crowd-sourcing, and those used in the previous work, are image tagging and sentiment analysis. As the ability of a crowd-sourced audience to provide labels decreases, the burden on the expert increases dramatically, and the effectiveness of the previous work diminishes. Our tool is designed to handle high-dimensional and abstract data that at face value would be daunting to label without additional meta-information (such as maps or domain knowledge), and to present that data in an easy-to-interpret-and-manipulate format regardless of the type of data.

4.2.5 Deep Learning Semi-Supervised Classification

In this section, we cover some of the recent related works which use deep learning for classification tasks. A new approach by Rasmus et al. [120] uses a ladder network for a classification task, where unsupervised and semi-supervised training methods are combined and used simultaneously to improve the overall training and performance of a model. A ladder network utilizes an autoencoder model, but with additional "skip" connections between the encoder and decoder to transfer details which would otherwise be lost.
In their work, the ladder network is treated as both a noisy encoder and a denoising decoder, together forming a denoising autoencoder (dAE), which also functions as a hierarchical latent variable model. In the dAE, an autoencoder is trained to reconstruct the original observation from a corrupted version, which is compared to a clean version of the original observation run through the encoder. Using this setup, they are able to perform state-of-the-art semi-supervised classification on the MNIST and CIFAR-10 datasets using only a small labeled subset of the data (100 and 1,000 examples in two different tests), arranged such that every unique label (type) is present and no single label group is over-represented in the training set. Using a ladder network, the authors demonstrate that they can achieve low test-error percentages (high classification accuracy) compared to previous semi-supervised classification results using extremely small labeled training sets.
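To make the reconstruct-from-corrupted-input idea concrete, a minimal denoising-autoencoder sketch in Keras is shown below. The full ladder network adds per-layer noise and encoder-decoder skip connections, which are omitted here; the layer sizes and the 784-dimensional flattened input are assumptions for illustration.

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))
noisy = layers.GaussianNoise(0.3)(inputs)        # corrupt the observation
h = layers.Dense(256, activation="relu")(noisy)  # encoder
code = layers.Dense(32, activation="relu")(h)
h = layers.Dense(256, activation="relu")(code)   # decoder
outputs = layers.Dense(784, activation="sigmoid")(h)

dae = keras.Model(inputs, outputs)
dae.compile(optimizer="adam", loss="binary_crossentropy")

# Train to reproduce the clean input from its corrupted version:
# dae.fit(x_train, x_train, epochs=10, batch_size=128)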
Pezeshki et al. [121] conducted an investigation into how the different aspects of a ladder network function and the influence of different amounts of training data on those components. In their experiments, they removed or reconfigured individual components to learn about their relative importance in the operation of the model. One particularly interesting result reported is that the skip connections between layers become less important as the number of training examples increases, with more emphasis placed on the injection of noise in each layer. The experimental setup included evaluations using a permutation-invariant MNIST dataset, evaluated on 100 and 1,000 training examples, such that every label is equally represented and present. The authors found that, using their own configuration of the ladder network, they were able to achieve state-of-the-art classification accuracy.

Our work differs from the previous work in a few ways. The goal of our work is label discovery: to refine the initial coarse labeling provided with a dataset to a more precise labeling, and to add labels which did not previously exist in the dataset. The power of the previous works is their ability to leverage the unsupervised power of deep learning in conjunction with traditional semi-supervised classification. The previous works are able to generate impressive classification accuracy given a small number of training examples, but they still require knowledge and examples for each of the unique categories, whereas our approach does not have this requirement.

4.2.6 Interactive Intelligent Systems and Active Learning

There have been other systems that combine the abilities of humans and machines to achieve a goal that would be difficult for either alone. The process of continuously updating a model through human interaction, which then produces results for a human to judge and operate with, often called semi-supervised learning, is also known as active learning. The advantage of active learning is that both the machine and the human are able to make better decisions, as well as continuously update and refine their decisions, based on the output from the other. An example of this iterative updating process is the work by Ware et al. [122], who developed a system for interactively training a machine-learning classifier by leveraging the background and domain knowledge of the users. The system used a 2D scatterplot visualization where two user-selected feature attributes define the plane axes. By iterating over the different pairs of attributes, the authors found that users were able to create classifiers on par with the state of the art by visually partitioning the data and drawing decision boundaries.

A system called CueT by Amershi et al. [123] uses human analysts to train and refine a recommender system for network system triage (linking associated error messages into a common problem). In their work, they build a similarity matrix to group and classify incoming alarms and tickets, which is refined through the interactions of the analyst. These recommendations are then sorted, presented at the top of a ticket list, and color-coded based on severity. Soto et al. [124] developed a system called ViTA-SSD for presenting and identifying patterns among semi-structured documents for text mining and analysis. This is done by leveraging a learned corpus of important words from the documents along with meta-data to generate document clusters. These 2D clusters are generated using a dimensionality reduction based on a combination of t-Distributed Stochastic Neighbor Embedding (t-SNE) and K-means clustering, which can be refined using a user-adjustable distance metric for measuring similarity between documents. These clusters are then visualized using a scatterplot and can be refined through the selection of particular keywords or documents that an analyst finds useful, so that documents and clusters related to the user input are further refined in the following iterations. The ultimate goal is to allow analysts to discover correlations, trends, and similarities between different sets of documents and topics, such as similarities of documents across different topic domains.

The primary limitation of the previously mentioned systems is that they have been engineered to handle a specific or discrete type of data, or to try to help solve a specific kind of problem. In addition, the previous systems are designed to help find correlations or differences between data elements, or to improve the speed with which a user can interact with large amounts of a specific kind of data. While also finding correlations and patterns in high-dimensional data, our system allows for the actual modification and improvement of a given dataset for future use. Not only does our approach improve the model used within our own tool, it allows for the improvement of any future models trained using the same data.

4.3 Our Approach

In this chapter, we present our approach to facilitate the discovery of latent groups and labels within high-dimensional datasets through spatially meaningful visualizations generated using deep learning. In addition, we also present our technique that allows an analyst to iteratively define and refine the labeling of a dataset to reveal interesting trends and sub-communities. We illustrate our approach with several datasets.

A general overview of our technique as compared to traditional approaches can be seen in Figure 4.1. The traditional learning paradigm consists of first labeling a large dataset through intensive and tedious work, and then using that constructed dataset to train a model. As stated earlier, this approach can suffer from labeling errors and omissions, which can lead to errors and gaps in classification. In our approach, we take the same datasets along with a simple labeling scheme to train a deep neural network and produce a visually meaningful representation of the dataset and the labels associated with that data.
Using that visual representation, an analyst can visually refine the labeling of the data based on patterns and spatial distinctions, which are fed back into the model to generate a new spatial representation. After a few iterations, our approach can accurately refine the labeling and discover hidden groups within high-dimensional datasets. The end of the iterative process depends on the structure of the data. As the user makes changes to the labeling of various data points, the structure of the presented data changes. As the iterative process continues, the structure changes less and less and eventually converges. Once the structure has converged, the user may conclude the iterative process.

Figure 4.1: A brief illustration of the difference between traditional deep-learning techniques and our approach. Deep learning traditionally requires a large, time-consuming, and precisely labeled dataset for training. For many different reasons, such datasets may be inappropriately labeled. In our approach, we start with coarse labels (that are typically far easier to construct) and then refine them through an iterative process involving visual interactions and deep learning.
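At a high level, this iterate-until-converged loop can be sketched as below. The helpers `train_embedding`, `render_scatterplot`, `collect_user_relabels`, and `structure_change` are hypothetical stand-ins for our deep-learning and visualization components, named here only to show the control flow.

def refine_labels(data, labels, max_rounds=20, tol=1e-3):
    previous = None
    for _ in range(max_rounds):
        points = train_embedding(data, labels)   # one 2D point per data element
        labels = collect_user_relabels(render_scatterplot(points, labels))
        # Stop once the 2D structure changes very little between rounds.
        if previous is not None and structure_change(points, previous) < tol:
            break
        previous = points
    return labels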
Its goal is to refine that network by pushing and pulling on these clusters in 2D space, such that similarly labeled groups of points are pulled together and dissimilarly labeled groups are pushed apart. It is this joint process, leveraging both the labels and the features, that generates the 2D distributions. Figure 4.3 shows the effect of the Siamese network on a distribution of points generated by the Variational Autoencoder. A visual representation of the network structure and procedure can be seen in Figure 4.4.

The deep neural network structure starts with an input layer sized to the dimensionality of the input data. From that input layer, the data is passed into a set of three one-dimensional convolutional layers: a 128-filter, 9-feature-wide layer; a 64-filter, 6-feature-wide layer; and a 64-filter, 3-feature-wide layer. All convolutional layers have rectified linear activation functions [127], also known as ReLU, defined as max(0, x). The last convolutional layer is then passed into a Flatten layer, which brings the internal data structure back to a 1D representation. From this flattened layer, the data is passed into a Dense layer with two nodes, using the identity activation function f(x) = x, which generates the (x, y) coordinates of a given datapoint. This network structure is shared between the Variational Autoencoder and the Siamese networks.

To update the Variational Autoencoder, reconstruction loss (binary cross-entropy [101]) and Kullback-Leibler (KL) divergence [128] are used to refine the network. The Variational Autoencoder is completely unsupervised, relying entirely on the features within the data. The goal is to generate a 2D representation that captures as many of the intrinsic properties of the high-dimensional data as possible, such that the original high-dimensional data can be reconstructed accurately from the 2D representation. For the Siamese network, two data points are fed into two networks with the same structure and weights, with the weights of the networks updated identically across iterations based on the contrastive loss function.

$$\mathit{Distance} = \sqrt{\sum (x_a - x_b)^2}$$
$$\mathit{ContrastiveLoss} = y \cdot \mathit{Distance} + (1 - y) \cdot \max(0,\ 1 - \mathit{Distance})$$

Figure 4.2: Equations used in calculating the contrastive loss for the Siamese network.

To refine the Siamese network, we use a contrastive loss function based on Euclidean distance (as defined in Figure 4.2), which compares the two output 2D points and checks whether they belong to the same label, in conjunction with ADADELTA [129] during training. If the points belong to the same label, the loss is the Euclidean distance between them. Otherwise, we adjust the loss to be high (one minus the distance, floored at zero), so that the points are pushed apart. This creates a 2D distribution of points such that points belonging to the same label are spatially distinct from points belonging to different labels. We use a batch size of 20,000 and train for a total of 10 iterations for both the Variational Autoencoder and the Siamese networks, values chosen to balance real-time usability against the integrity of the data distribution. The same network and weights are used and shared across iteration stages. Once the network is finished training, it is used to generate the 2D distribution of points. After the 2D points are generated, the network weights are saved, to be re-loaded and re-used in the next iteration.
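To make the pipeline above concrete, the following is a minimal sketch in Keras of the shared encoder and the Siamese contrastive step. It is an illustrative reconstruction rather than our production code: the layer sizes, ReLU activations, ADADELTA optimizer, margin of 1, and batch size follow the text, while details such as the padding choice and the helper names are assumptions, and the variational decoder with its reconstruction and KL-divergence losses is omitted for brevity.

```python
# Minimal sketch (illustrative, not the production code) of the shared
# encoder and the Siamese contrastive step. Layer sizes follow the text:
# Conv1D(128, 9) -> Conv1D(64, 6) -> Conv1D(64, 3) -> Flatten -> Dense(2).
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_encoder(input_dim):
    inp = layers.Input(shape=(input_dim, 1))
    x = layers.Conv1D(128, 9, activation="relu", padding="same")(inp)
    x = layers.Conv1D(64, 6, activation="relu", padding="same")(x)
    x = layers.Conv1D(64, 3, activation="relu", padding="same")(x)
    x = layers.Flatten()(x)
    xy = layers.Dense(2, activation="linear")(x)  # the (x, y) coordinates
    return Model(inp, xy, name="shared_encoder")

def contrastive_loss(y_true, distance):
    # y_true is 1 for same-label pairs, 0 otherwise (margin of 1, Figure 4.2).
    return y_true * distance + (1.0 - y_true) * tf.maximum(0.0, 1.0 - distance)

encoder = build_encoder(input_dim=204)  # e.g., the Salinas Valley band count
in_a = layers.Input(shape=(204, 1))
in_b = layers.Input(shape=(204, 1))
# One encoder instance in both branches means the weights are shared and
# updated identically, as a Siamese network requires.
distance = layers.Lambda(
    lambda e: tf.sqrt(tf.reduce_sum(tf.square(e[0] - e[1]), axis=1, keepdims=True))
)([encoder(in_a), encoder(in_b)])
siamese = Model([in_a, in_b], distance)
siamese.compile(optimizer="adadelta", loss=contrastive_loss)
# siamese.fit([pairs_a, pairs_b], same_label_flags, batch_size=20000, epochs=10)
```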
For each of the spatial scatter-plot representations, we process, visualize, and allow the user to manipulate the entire dataset.

Figure 4.3: An example of the result of running the Variational Autoencoder and then refining that result using the Siamese network. By running the Siamese network after the autoencoder, the generated clusters tend to be tighter with more space in between, making the individual clusters easier to identify. The user has the ability to adjust the number of iterations the Siamese network runs, which affects the tightness of the clusters.

Figure 4.4: A visual representation of the network structure used in both the Variational Autoencoder and the Siamese network. The same network structure and weights are shared across the networks. First the Variational Autoencoder runs, and the Siamese network continues using the network weights generated by the autoencoder. After the Siamese network has run, the resulting network is used to generate the 2D distribution of points. The network weights are also saved, reused, and refined in the following iterations.

4.3.2 Cluster Visualization and Manipulation

Our approach utilizes a 2D scatterplot to visualize the output from the deep learning network. Each data-point in the dataset is assigned a 2D position by the deep learning network and colored based on the group or label it currently belongs to. The first distribution of points is generated using the initial coarse labeling from the dataset. As points are removed from and added to new or existing groups, the distribution of the 2D datapoints changes to reflect the changing relationships between them.

We next discuss how the user interacts with the system to modify and reorganize the labeling of the 2D points. Initially, the entire distribution is presented to the user with every point sharing the same color (gray), which helps reveal the density of points, as shown in Figure 4.5(a). When a user clicks on a point, all the points which share the same group as the selected point become activated, as indicated by becoming colored (Figure 4.5(b)). Multiple groups of points can be activated in this way. When the user clicks on the white space between points, all selected points/groups become deactivated and return to the gray color. This scheme allows an almost unlimited number of groups/labels to be handled by our system, especially since only relatively few groups will be interacted with at any given time. When a user identifies a potential sub-cluster, they first “activate” the set of points by clicking on one of the points in the region of interest. After clicking, a menu appears, prompting the user with several different options. An example of this menu can be seen in Figure 4.6. The most basic interaction is “Select Points”, which allows for a manual selection of points. When this button is clicked, the cursor changes to a square glyph, 5 pixels in size, to be used like a paint brush to select points, as can be seen in Figure 4.7(b).
When points are selected, they turn brighter. For larger selections, the control key may be held, which brings up a drag-and-select circle that automatically selects all the points falling within its boundary that belong to the selected group (Figure 4.7(c)). To turn these selected points into a new group, the user then clicks the “Cut Group” button. These controls allow for manual interaction with the points, but can be tedious to use, and should only be used for fine-tuned selection.

For larger changes to the points, we have implemented an automatic partitioning tool, inspired by how humans would naturally segment the points. The “Cut Group” menu option starts a normalized cut operation [130] on the clicked/activated group (if no points are currently selected). From the points belonging to the activated group, a similarity matrix is calculated by comparing the distances between all pairs of points, and then replacing those distance values x with 1.0/x to form a correlation matrix. Using this matrix, a diagonal and a Laplacian matrix [131] are computed. The Laplacian matrix is then passed into an eigensolver, which generates the two eigenvectors with the smallest eigenvalues. Ignoring the trivial eigenvector whose eigenvalue is zero, the elements of the correlation matrix are reordered using the values of the Fiedler vector, the eigenvector with the second-smallest eigenvalue. Next, we compute an appropriate cutting location by minimizing the NCut function described in the formulas in Figure 4.8, with cut(A,B) as the total weight of the edges connecting A and B, assoc(A,V) as the total weight of the edges of A in V, and NCut(A,B) as the cost of cutting A from B, normalized to favor roughly equally sized segments within a tolerance. The index found to have the minimum NCut value is used as the splitting point, and the group of points which was under the cursor at the time of the first click is assigned to a new group. An example of the output from the NCut algorithm can be seen in Figure 4.9, with the left, smaller group of points being selected and cut from the larger group.

The second button, “Select Group”, is used to select an entire set of points belonging to a group. This is useful for merging groups, or for merging a group with a selection of points from another group. To merge a group, the “Merge Group” button is used. This will merge all selected points into a single group (label). The final button, “Deep Learning”, initializes a call to run the deep learning algorithm, which takes the current labeling distribution and retrains the network to generate a new distribution of points.

Figure 4.5: Initial view of points, with all points given the same color; points are only assigned a color once selected or activated by the user. (a) An initial view of the Salinas Valley dataset, with none of the points or groups selected. (b) The same set of points with one group activated.

Figure 4.6: The menu interface presented when a user selects a point/group.

We have identified a few techniques that often enable an analyst to locate latent structures. The most common indication of a latent structure arises as a visually distinct cluster or set of clusters in the 2D visualization. The distinction could appear as geometric or color separation, or a combination of the two. An example of such an interaction can be seen in Figure 4.10(a).
A slightly more complicated and common scenario is one where the previous example exists, but the second cluster overlaps with a cluster of another label. This can indicate either that the secondary group belongs to a separate group, as in Figure 4.10(b), or that the two clusters belong to the same group and should be merged, as in Figure 4.10(c). Such overlap often indicates that the two groups share a common feature subset. A merge may become necessary when trying to extract a hidden cluster: instead of one clear cluster being extracted, two smaller clusters overlap, indicating that they belong to the same group. Another common scenario is when a bulge exists somewhere on a large cluster, as in Figure 4.10(d). Finally, a more complex and rare scenario exists when there are two sub-clusters within one common parent and one child group surrounds the other. An example of this can be seen in Figure 4.10(e). This typically appears as a dense central set of points surrounded by many disparate points of the same label.

Figure 4.7: Selection of points through manual paint-brush and circle selection interaction. (a) A set of points before selection. (b) The same set of points with the top section being selected using the “paint brush” interface. (c) The same set of points selected using a circle selection.

$$\mathit{SimilarityMatrix}(x, y) = \frac{\sqrt{(x - y)^2}}{2\sigma^2}$$
$$\mathit{CorrelationMatrix} = \frac{1.0}{\mathit{SimilarityMatrix}}$$
$$\mathit{cut}(A, B) = \sum_{u \in A,\, v \in B} w(u, v)$$
$$\mathit{assoc}(A, V) = \sum_{u \in A,\, t \in V} w(u, t)$$
$$\mathit{NCut}(A, B) = \frac{\mathit{cut}(A, B)}{\mathit{assoc}(A, V)} + \frac{\mathit{cut}(A, B)}{\mathit{assoc}(B, V)}$$

Figure 4.8: Equations used to compute the normalized cut.

Figure 4.9: Segmentation of a set of points using normalized cut (NCut). (a) A set of points before NCut segmentation. (b) The same set of points after NCut segmentation.

Figure 4.10: Common interaction techniques. (a) An example of clear cluster separation. (b) An example of two clusters which are distinct, but whose separation boundary is not clearly defined. (c) A merge example, where the green and purple clusters should be combined; this can occur when a larger green cluster from a previous iteration is partially split. (d) An example of a hidden cluster adjoining another cluster, typically identified as a bulge protruding off another dense cluster of the same color. (e) An example of a cluster existing within another cluster, typically identified as points of the same label radiating from a dense center.

One common problem across all visualizations which perform dimensionality reduction is overlapping points. In high-dimensional space, points may be far apart, but when projected down into 2 or 3 dimensions, they can overlap and obscure one another. Our tool uses two different techniques to handle overlap. First, all points are rendered with transparency, such that the points below can still be seen. Second, our visualization is designed around the idea that only a few clusters, or sets of points, will be interacted with at any given time. Therefore, when a group is selected, all the points belonging to that group “pop” to the front of the visualization and are given a distinct color, while all inactivated groups retain a high-transparency value and gray color. This allows points of the given group to become easily visible compared to inactive points or points in the background. Figure 4.11 demonstrates a simple example, where the purple and yellow points are highly overlapped, and when one group is selected, the other fades away.
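Before moving on, we make the “Cut Group” operation concrete. The following is a minimal sketch of the normalized-cut procedure described above and summarized in Figure 4.8; the epsilon guard against division by zero and the exhaustive scan over split indices are assumptions made for the sketch, not necessarily the choices made in our tool.

```python
# Minimal sketch of the point-and-click normalized cut: invert pairwise
# distances into an affinity (correlation) matrix, form the Laplacian,
# reorder points by the second-smallest eigenvector, and scan for the
# split index that minimizes NCut.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.linalg import eigh

def ncut_split(points_2d, eps=1e-9):
    dist = squareform(pdist(points_2d))      # pairwise Euclidean distances
    w = 1.0 / (dist + eps)                   # distance values x replaced by 1/x
    np.fill_diagonal(w, 0.0)
    laplacian = np.diag(w.sum(axis=1)) - w   # diagonal (degree) matrix minus w
    # Two smallest eigenpairs; skip the trivial zero-eigenvalue vector and
    # order the points by the Fiedler vector (second-smallest eigenvalue).
    _, vecs = eigh(laplacian, subset_by_index=[0, 1])
    order = np.argsort(vecs[:, 1])
    w_ord = w[np.ix_(order, order)]
    total = w_ord.sum()
    best_i, best_ncut = 1, np.inf
    for i in range(1, len(order)):           # candidate split after index i
        cut = w_ord[:i, i:].sum()            # weight crossing the split
        assoc_a = w_ord[:i, :].sum()         # assoc(A, V)
        assoc_b = total - assoc_a            # assoc(B, V)
        ncut = cut / (assoc_a + eps) + cut / (assoc_b + eps)
        if ncut < best_ncut:
            best_i, best_ncut = i, ncut
    return order[:best_i], order[best_i:]    # indices of the two new groups
```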
Returning to the overlapping clusters of Figure 4.11, our tool lets us individually select and visualize the independent clusters to see their full extent.

Throughout this process, it is possible for a user to make a labeling error, such as mislabeling a few points. This is an inevitable part of the discovery process. If a new group is accidentally created, such that two groups now exist where only one should, it is easy enough to simply select both groups and merge them into a new group. If a few points are mislabeled, it is very likely that, after re-running the point-distribution algorithm, those mislabeled points will appear spatially within the group they actually belong to, and contrast visually with that group's points. This contrast would then lead the user to merge the two groups together. In addition, this mechanism is often leveraged to tease apart tight groups and to test whether a hidden group may exist over multiple iterations (an example of which can be seen in Figure 4.12). We present an example in Figure 4.13, where there exist two independent clusters, and a user has accidentally re-assigned half the points of one cluster into the domain of another. Figure 4.13 shows that our algorithm is resilient against such mistakes: while deep learning takes into account the labels assigned to each data point, the underlying features of the data also drive the distribution of points, and the mistakenly labeled points remain co-located with points sharing their true labeling.

Figure 4.11: Handling of overlapping sets of points. (a) Two clusters, purple and yellow, are obscuring one another. (b) The yellow set of points is selected, and the purple points are greyed out. (c) The purple set of points is selected, and the yellow points are greyed out.

There are three main components to each iteration. The first is training the neural network with the current labeling of the dataset to generate the 2D scatterplot. The second is the creation, modification, and deletion of groups using the visual interface and human interaction. Lastly, the modified labels are fed back into the neural network, where the weights of the network are updated from the previous iterations using the new labeling, and a new distribution of points is generated.

Figure 4.12: An example where a few points are selected from a parent cluster, and over time the points are separated away from the parent cluster to form their own cluster.

4.4 Results

In this section, we present the results of our technique on three different high-dimensional datasets. Our goal was to ease the burden of human labeling and iteratively use deep learning to discover the latent groups and labels within these high-dimensional datasets. Here we review two of the datasets tested in our system: a hyper-spectral image of Pavia University [8] and a hyper-spectral scan of Salinas Valley [9].

Figure 4.13: The resilience of the algorithm to labeling mistakes. (a) The initial labeling distribution. (b) A labeling mistake where more than half of the points are mis-assigned to a nearby group. (c) The result of regenerating the point distribution using the erroneous labeling. (d) The labeling after correction.

While the datasets we have selected and used for our study are spatial by nature, we disregard the spatial and neighborhood aspects of these datasets and consider each data-point individually during our discovery trials.
Many real-world datasets have no natural or obvious spatial component. Our goal, therefore, is to generate meaningful spatial representations using our technique that enable the discovery of latent communities in non-spatial datasets. It is still possible to use spatial datasets in our approach, but the spatial aspect acts as meta information that is not directly utilized by our technique at present. Throughout this chapter, we use the innate spatial distributions of our selected datasets as an easily understandable visual metaphor to help analyze how well we are able to recover the hidden labels, but these visual representations are never used or seen during the iterative discovery process. The only visualizations presented to the user during the iterative discovery process are the constructed scatter-plot spatial representations.

For each dataset, we present a series of figures showing the initial coarse labeling, the feature-space distribution generated through iterative deep learning guided by visual interactions, and the true distribution of labels. Throughout the process, the ground-truth labeling was not used to aid the discovery and refinement process; it is presented here merely as an illustrative tool. Below we discuss the results for each of the previously presented datasets.

4.4.1 Pavia University Dataset

The hyperspectral Pavia University dataset consists of a 610 × 340 pixel image, where each pixel is represented by a 103-dimensional feature vector containing intensity values at different spectral bands. Each pixel has been labeled based on the identity of the surface. There are 9 discrete labels: asphalt, meadows, gravel, trees, metal sheet, soil, bitumen, brick, and shadow. We grouped those labels into three coarse categories: natural surfaces, roads, and buildings, which can be seen in Figure 4.14. In Figure 4.14, we also show the initial spatial representation of the Pavia University dataset.

Figure 4.14: A comparison of the initial coarse labeling, the generated refined labels, and the precise (true) labels of the Pavia University dataset after 3 iterations. Starting from the three initial categories (natural surfaces, roads, and buildings), we were able to reconstruct the distribution of the 9 labels with an accuracy of 88.2%.

Starting from this initial representation, we manually select and re-label points into new clusters, and then generate new point distributions in an iterative manner. Within 3 iterations we were able to generate 9 labels from the initial three coarse labels and improve the labeling accuracy from 67.7% to 88.2%. To test the robustness of our technique, a hold-out test evaluating 10% of the data was conducted, to determine if the hold-out test set could be accurately classified/grouped into the proper cluster when trained on the other 90% of the dataset. For the 10% hold-out test set, 95.3% of those points were correctly assigned to their proper group, confirming the generalization and stability of the technique. There is a lot of overlap between the different “brush” categories: trees, dirt, and meadows.
We were unfortunately unable to differentiate these categories perfectly, but based on the satellite RGB imagery, we believe that these categories are not strictly distinct, and that a lot of overlap therefore exists between them. An example of this is the irregular patch of “meadows” in the center of campus, which many participants in our user study (presented in more detail later) labeled as an independent group, as seen in Figure 4.15. The proximity of this cluster in the 2D space indicates that, while this group of points is similar to the “meadows” group, there is a clear distinction in their features, as supported by the aerial view. The reconstructed labeling we generated for this dataset can be seen in Figure 4.14, along with the ground truth. The average time required to run each iteration of the deep neural network was 9 seconds.

Based on the final data-point distribution shown in Figure 4.14, it is possible to further refine the cluster segmentation and labeling, which we discuss in a later section. The next dataset we present is more complex, involving more dimensions and more hidden labels.

Figure 4.15: A comparison between the labeling generated by our system and the ground truth labeling. The labeling generated by our system is driven by the clearly distinct group off the main body of points above it. This difference is supported by a visual difference in the aerial view.

4.4.2 Salinas Valley

Here we review our results for another hyperspectral dataset, a 224-band scan of Salinas Valley, California. The dataset consists of a 512 × 217 image, with each pixel represented by a 224-dimensional feature vector (204 after removing the water absorption bands). The valley consists of fallows, broccoli weeds, stubble, celery, grapes, various vineyards, corn weeds, bare soils, and various stages of lettuce growth, for a total of 16 different labels. For our purposes, we clustered those 16 labels into 6 groups, where each group contains similar labels. For example, there were four labels consisting of lettuce at various stages of growth. These were clustered into a single group, which would constitute a reasonable assignment if the growth stages were not known at the time of initial coarse labeling. The 6 groups we created are named broccoli, fallow, celery, lettuce, vineyards, and other. This can be seen in Figure 4.16. In the bottom left of Figure 4.16 we show the initial distribution of data points from the Salinas Valley dataset.

After three iterations of re-labeling and re-generating the point position distribution, starting from the 6 initial coarse labels and an accuracy of 58.61%, we reconstruct the 16 precise labels with an accuracy of 97.4%. The state-of-the-art technique on the Salinas Valley dataset achieves 97.11% classification accuracy [132]. The average amount of processing time required by each iteration of the deep neural network was 10 seconds. Another hold-out test using 10% of the data was conducted on the Salinas Valley dataset, to determine if the test set could be accurately classified/grouped into the proper cluster. For the 10% hold-out test set, trained on the remaining 90% of the data, 100% of those points were correctly assigned to their proper group, again confirming the generalization and stability of the technique.
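The text above does not spell out how a held-out point is assigned to a cluster; one simple reading, sketched below under the assumption of a nearest-centroid assignment in the generated 2D space, is to embed the held-out points with the trained encoder and assign each to the closest cluster centroid.

```python
# A hedged sketch of the 10% hold-out check. The nearest-centroid rule in
# the 2D embedding is our assumption; the text only states that held-out
# points were checked against their proper cluster.
import numpy as np

def holdout_accuracy(encoder, x_train, y_train, x_test, y_test):
    z_train = encoder.predict(x_train)       # (n, 2) embedded training points
    z_test = encoder.predict(x_test)         # (m, 2) embedded held-out points
    labels = np.unique(y_train)
    centroids = np.stack([z_train[y_train == c].mean(axis=0) for c in labels])
    # Distance from every held-out embedding to every cluster centroid.
    d = np.linalg.norm(z_test[:, None, :] - centroids[None, :, :], axis=2)
    predicted = labels[d.argmin(axis=1)]
    return (predicted == y_test).mean()
```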
4.4.3 User Study

To demonstrate the usability and capability of our approach and tool, we conducted a small user study, inviting participants to iteratively investigate the two datasets presented earlier in this chapter. The participants were informed that their goal was to determine whether there were any undiscovered groups within the presented visualization, and were briefly shown how to interface with the tool. No information on the number of true clusters, the nature of the data, or how to determine what makes a cluster distinct was provided. Our goal was simply to provide the bare minimum needed for the participant to use the tool. All participants found the tool to be very intuitive and easy to use. Each participant started from the exact same layout of points, network structure, and initial network weights, and was instructed to continue refining their labeling until they verbally stated that they were satisfied with their groupings. After the study, the true nature of the task was explained to the participants, who were astonished at the ease with which they had been able to accomplish it.

Figure 4.16: A comparison of the initial coarse labeling, the generated refined labels, and the precise (true) distribution of labels of the Salinas Valley dataset after five iterations. Starting with 6 initial coarse labels, we were able to reconstruct the distribution of the 16 labels with an accuracy of 97.4%.

The results of the user study are summarized in Figures 4.17 and 4.18. Note that the scores for the primary author have also been added to these plots, showing that the performance of experienced and inexperienced users is comparable. For each dataset, there is a clear iterative upwards improvement in the labeling accuracy, suggesting that even users with no prior experience in dimensionality reduction and data analytics were able not only to use our tool but to achieve positive, even spectacular, results. The solid-line plots show the accuracy of the data labels with respect to the true, hidden labeling at each iterative step: the recovered labeling increases in accuracy with each subsequent iteration. The dashed lines show the amount of time spent on manual interaction during each iteration of the reconstruction process, revealing a general downward trend. This suggests that large labeling changes are made in the beginning, giving way to fine refinements as the iterative labeling process continues. It is important to note that, although we use accuracy as a measure to quantify the success of our system, it is not exactly a proper measure. The goal of our system is hidden label/group discovery, and throughout the rediscovery task, the primary author and participants found a few sub-groups or hidden groups within the existing datasets, which are clearly differentiated in our visualization (covered further in the next section).
The participants, unaware of the underlying metric of accuracy, over-labeled the datasets, driving down their otherwise high accuracy scores; but in doing so, they were successful in their stated goals and the goals of the presented system.

Figure 4.17: Participants' accuracy and timings for the Pavia University dataset.

4.4.4 Interpreting Discovered Labels

For our testing, we used datasets where the precise labeling of the data was complete, and then grouped those precise labels to create coarse labels. Our goal was to reconstruct and rediscover the provided distribution and labeling of the data. Because the full precise labeling (the ground truth) is known, we are able to discern which of the newly discovered labels is associated with each pre-existing label. However, our technique is designed to be used for hidden label/group discovery. A user could start with a group whose label is known, and discover that it contains a sub-group or is composed of a few sub-groups, which could then be found and re-labeled using our technique. It may then be possible to identify what the newly discovered labels correspond to or mean based on additional meta information. For example, we identified through our visualization that the coarse lettuce category from the Salinas Valley dataset had four sub-clusters within it, but it would be very difficult to identify that the reason they differ is age, based only on the hyperspectral imagery and without the additional meta information.

Figure 4.18: Participants' accuracy and timings for the Salinas Valley dataset.

In our iterative discovery process on the two previous datasets, it was possible to continue the iterative process to generate more labels, beyond the given ground-truth labeling. For example, in the Pavia University dataset (shown in Figure 4.14), in addition to the findings presented in Figure 4.15, we discovered another sub-cluster within one of the larger groups, for the painted metal sheets. In Figure 4.19, we show the result and spatial representation of sub-dividing the purple cluster. By using the additional meta-information (in this case, the spatial information) we can see there is a consistent spatial coherence for the newly discovered group. The newly discovered sub-cluster runs along the vertical portions of the roofs of these buildings. We may speculate as to why there is a distinct difference here, but unfortunately the precise labeling and aerial view given with the dataset do not reveal any clues as to why there is a consistent feature difference within the group.

Another set of interesting discoveries was made in the Salinas Valley dataset (as shown in Figure 4.16). In Figure 4.20, we compare the aerial view of the Salinas Valley with the labeling generated using our technique and the ground-truth labeling provided with the dataset. Looking at the aerial view, a discoloration is visible in the field, which clearly does not match the rest of the field, indicating that a different surface material is present. The ground truth labeling does not take this discoloration into account, and techniques which use shape or structure to partition or label an image may miss it entirely. Our technique reveals that this surface material is different, and correlates with the aerial view.
The absence of these hidden groups from the datasets' labeling may be due to one of the reasons presented earlier: lack of knowledge of their existence, a simple mistake, or labeling simplification from relying only on the satellite RGB or overall shape information.

Figure 4.19: A potential newly discovered sub-group within the painted metal sheets label in the Pavia University dataset. The left shows the point distribution and spatial representation of the labeling as generated by our technique. The right shows a further iteration in which the current points have been sub-divided, along with the resulting spatial representation.

4.4.5 DNS Query Dataset

One of the goals of the presented system is to find hidden groups within real-world, large, high-dimensional datasets where there is little to no meta information, such as spatial cues, to aid in that process. In this section, we present the results of our system on a real-world dataset from the University of Maryland D-Root DNS server, one of 13 root DNS services which handle DNS requests for the entire internet. DNS stands for the Domain Name System, which is responsible for answering queries, converting human-understandable URLs into IP-addresses that the computer can understand.

Figure 4.20: A potential newly discovered sub-group within the untrained grapes label in the Salinas Valley dataset. The left figure shows the meta information used, the aerial view of the valley. The middle image shows the labeling as generated using our approach. The right image shows the labeling as given by the dataset. The yellow labeling portion in the middle image matches a discoloring in the aerial view, which suggests that a different material may be present there.

The D-Root DNS service receives billions of such requests per day. As part of their analyses, understanding the nature and distribution of the traffic they receive is crucial. Using this data, our goal was to uncover clusters of queries, to determine if there were any trends in the huge volume of queries the D-Root DNS server receives. To convert the queries into a format that can be passed to deep learning, each query string is converted to a vector of TF-IDF (Term Frequency, Inverse Document Frequency) values, where each value of the feature vector corresponds to a particular character. For our purposes, we process an hour of traffic, which is 7.5 gigabytes for nearly 7 million queries (data points). In Figure 4.21, we show the evolution of the generated query space throughout the discovery and labeling process, as performed by the primary author. To start, there were no labels (as nothing was known about the dataset), and after 8 iterations, 42 groups/labels were generated (although this process could continue to further refine the groups). To verify the success of our process, we manually investigated the discovered clusters and provided each group with a human-understandable label, in addition to noting any trends that exist within a group or across groups. The result of this labeling is shown in Figure 4.22. Many different groups were discovered, the most salient being those regarding erroneous IP-address queries. Using our tool, we were able to identify different configurations of queried IP addresses based on the error of the request.
Other interesting discoveries were sets of code fragments and commands being issued over DNS, which is common for command and control of botnets. Lastly, we were able to find different configurations of alphabetic queries, ranging from differently sized sets of random characters to more formed queries with specific domains. Our collaborators at the University of Maryland D-Root server agree that not only have we confirmed the various types of traffic they see, but we have also revealed other types of traffic that they were unaware of and are now interested in.

Figure 4.21: The evolution of the query-space representation over eight iterations, showing the influence of the iterative labeling. Note that for some clusters certain colors have been re-used due to the high number of groups.

Figure 4.22: The output of the DNS query investigation, coarsely labeled to identify the meaning of the individual clusters and their relation to each other. Note that for some clusters certain colors have been re-used due to the high number of groups.

4.5 Discussion

In recent years there has been an explosion of large high-dimensional datasets. Extracting the meaningful information hidden within these datasets is not trivial, and is made more difficult by erroneous and high-level (coarse) labeling. These errors may be caused by subjectivity, lack of time, or misunderstanding of the data. One way of revealing hidden trends or structures within a high-dimensional dataset is to group similar points based on their features. Identifying similar high-dimensional data-points is difficult in part due to the Curse of Dimensionality: as dimensionality grows, the distances between all pairs of points converge, meaning that traditional distance techniques reveal little information. Another difficult aspect of high-dimensional information visualization is deciding how to show distinctions and similarities between high-dimensional points in an intuitive and insightful way.

In our approach, finer labeling and classification of the data requires more iterations. To reduce the amount of human effort required, we have added the point-and-click normalized cut segmentation. This is particularly effective for quickly segmenting large numbers of smaller clusters. Introducing more automated tools such as this to reduce the effort required of a human, while still keeping a human in the loop with the final say, would be an interesting direction for future research.

4.6 Conclusions

The current advances and future potential of deep learning are without question. The objective of this chapter is to provide a first step in advancing our capability and understanding of deep learning.
In this chapter we presented our approach to facilitate the discovery of latent structures and communities/labels within high-dimensional non-spatial datasets using deep learning. Given a coarsely labeled, broadly labeled, or unlabeled high-dimensional dataset, we generate a 2D distribution of the data based on the idea that similarly attributed and labeled points should be in close proximity to each other. Through an iterative process, an analyst can select points and assign them to new or existing communities, which is then used in conjunction with deep learning to refine the 2D spatial distribution of the points, revealing new information. Supplementing the instinct and ability of analysts to identify patterns with deep learning, we have shown on three different datasets that our technique is able to reconstruct hidden structures and communities. In many previous works, deep learning has been used as a black-box classifier, with little or no human interaction. Our technique enables deep learning and the human analyst to support, refine, and enhance each other through visualization.

Chapter 5: Visual Analytics for Root DNS Data

5.1 Introduction

There are two schools of thought on how to deal with cyber-attacks: automatic detection methods and human-driven investigation. Automatic detection methods work by modeling normal and abnormal behavior through prior knowledge of how malware behaves, since by definition a cyber-attack is abnormal. However, legitimate humans and machines are also capable of abnormal behavior, causing these automatic detection algorithms to raise many false alarms. In addition, these models take months to create, are built using attack data that occurred on average at least six months prior, and generally only detect the presence of old cyber-attacks; they are therefore unprepared for ever-changing, newer attacks. These methods also generally only report the presence of an attack, giving no details on specifics. In contrast, and in part in response to automated mechanisms, many cyber-security analysts prefer manual investigation and analysis of attacks. There is an overall mistrust of automated systems among those who perform cybersecurity analysis [133]. Generally, analysts' tools consist of listing packets and related information, whereby data is inspected line by line. Many other tools provide only very high-level abstractions of the data, typically in the form of histograms of the number of received packets. Few tools fill the gap between very high- and low-level analysis, or provide distinct, informative views of the underlying data.

One major target of cyber-attacks is the Domain Name System (DNS) infrastructure, responsible for converting human-understandable URL queries into machine-understandable IP-addresses. The ubiquity and central importance of DNS make it a tempting target for attack and exploitation. If these domain name systems go down, it would create unprecedented chaos and instability on the internet as IP addresses change, caches expire, and queries remain unresolved. Finding, characterizing, understanding, and mitigating these attacks on DNS is of utmost importance. One core aspect of maintaining and defending the DNS is providing DNS analysts with a method of monitoring the queries received.
DNS analysts who can easily comprehend the variety and scope of the queries that pass through their system will be better able to characterize attacks, anomalies, and normal behavior. The challenge is the sheer amount of root DNS traffic, which ranges from 100 to 300 GB per server, per DNS letter, per day. The primary systems in use today typically focus on packet counts and origination (source IPs). Modern packet analyzers generally present every aspect of every packet in tabular lists, with query information buried deep in expandable subsections of those lists, or as a single column among many. These techniques, while providing unparalleled detail, do not leverage our innate ability to process spatially organized data to find patterns and anomalies. In addition, they place little emphasis on the aspect of DNS that makes it so important: the queries themselves. By presenting DNS queries and IP-activity in a spatially and temporally coherent manner, with cross-visualization interactivity enabling high- and low-level investigation, we enable DNS analysts to more effectively process and organize that data.

In this chapter, we present our visualization, which has been designed to take the vast amounts of root DNS queries, organize them in a spatially comprehensible manner, and facilitate easy investigation, not only to answer existing questions but to help DNS analysts discover new ones. We validate our approach on data from one of these 13 root DNS providers, namely the D-Root. In summary, this chapter makes the following contributions to immersive analytics and visualization for network security:

• We have designed a dual-interactive-visualization system for DNS query and IP data which leverages 3D graphics techniques to convey that data in a novel representation.

• We visualize an order of magnitude more DNS data than previous systems, providing analysts with high-level situational awareness while preserving low-level details and nuance, without the need to switch between multiple different applications.

• We organize abstract DNS queries in an easy-to-interpret spatial layout using a deep learning variational autoencoder, such that co-located queries are semantically similar. We also leverage volume rendering to provide analysts with a high-level spatiotemporal understanding of how the distribution of DNS queries changes.

• We identify and characterize distinct DNS anomalies and attacks through an informal empirical evaluation and discussion of discovered trends and clusters with industry experts.

In the following section, we review the challenges and standard practices, and present and characterize new techniques.

5.2 Background

Originating in the days of ARPANET, the DNS can be considered a simple list of host names with their mappings to and from addresses, maintained in a frequently-updated host table. However, the open nature of DNS makes it vulnerable to a wide variety of attacks and abuses.
The constant attack by malicious sources has necessitated automated intrusion detection systems (IDS). However, many industry operators have observed that modern IDS, although useful, are not optimal or trustworthy [134], in part due to the large number of false positives and the inability to detect the latest threats [133]. Often these systems require a human in the loop to review detection alerts and to contextualize the alerts with additional information [135], often manually, with visualization tools separate from the IDS [136]. Therefore, visualizations that can provide both summary and precise representations of the data are of utmost importance [137]. In the remainder of this section, we review techniques with visualization as a core component of the analytic and investigation process.

5.2.1 Traditional 2D Network and DNS Visualization

Traditional techniques for visualizing network data include charts (histograms), line-plots (including parallel coordinate views), and graphs (including node-link diagrams), among others. The challenge is the enormous and ever-increasing amount of data to portray. Visualizing all aspects of the data at once is untenable. Tools such as Excel, NetStat [138], and Wireshark [139] (Figure 5.2) outline every aspect of every packet. This gives analysts an unprecedented level of detail, but hinders finding trends, correlations, and anomalies over time [140]. Traditionally, an analyst will write queries to explore their data, leveraging their background knowledge of the dataset. This process is extremely tedious and labor-intensive, and generally requires a known starting point [141]. Generally, DNS analysts are interested in monitoring the health of their system, the flows of traffic, patterns, and anomalies.

Histograms are very commonly used for quick analysis of overall trends, such as direct comparisons between adjacent periods of time [142] or the count of a particular feature, such as the number and type of alerts [143]; they portray counts of packets [142], query types [144], and severity [145]. Histograms are arranged in 1D by stacking elements to simultaneously show different properties [142], with curved and circular representations [143], and in 3D [146], where the direction and orientation of the histogram along the z-axis provide additional information. Similar in function to histograms, line-plots can convey counts over time [145]. One common implementation is parallel coordinates, used to find botnets [147] and anomalies [146] in DNS traffic by plotting packet attributes such as IP-address, time, and attribute counts along each axis. Circular representations, such as those used for network intrusion detection [148], can reveal patterns showing what happened, where, and when. Theme rivers, akin to stacked histograms, have been used to visualize changes and anomalies in DNS query traffic [149]. One problem with parallel-coordinate visualizations, including traditional line-plots, is intentional and unintentional obfuscation (Windshield Attacks) [150]. Similarly, as the number of axes and data-points grows, the data elements can self-occlude and hide lingering patterns.
Network graphs, representing IPs, autonomous systems, domains, machines, queries, or users connected via edges (shared traffic, association, or other connections) [133], have been used to visualize communities of hosts in DNS traffic [151], changes in DNS routing and look-up behaviors [152], and anomalous behavior in failed DNS queries [153]. While network graphs are useful, previous research [154] found that their effectiveness decreases dramatically if the graph exceeds approximately twenty vertices, limiting their usefulness for fine-level network analysis. Many new visualizations leverage TreeMaps [155], which color-code packet counts and anomalies in IP-address bins. Other visualizations correlate geospatial aspects of DNS traffic and overlay packet counts [156, 157]. More creative visualizations, such as glTail (http://www.fudgie.org/) and Logstalgia [158], use interactive graphics to render a log file as a dynamic 2D simulation.

The previously mentioned visualizations generally trade scalability for fine-level detail, focusing either on high-level summary overviews of large amounts of data or on detailed views of small amounts of data. Many analysts therefore use multiple tools to gain a complete picture, but this creates unnecessary context switches and overhead. Additionally, many approaches lay out their information with a focus on aesthetic qualities, such as maintaining symmetry with uniform glyph positions, potentially compromising latent global and local data structures [159]. In our visualization, we preserve and show both precise and high-level representations for vast amounts of DNS data. Previously, the focus has been on the evolution of source-IP packet counts over time, with little to no emphasis on the messages in the packets. The DNS system exists to handle queries, so enabling analysts to explore the changing distributions of queries is of critical importance. Such a visualization would be infeasible using traditional techniques due to the arbitrary and high dimensionality of the queries, in addition to the irregular behavior of their transmission. Our work portrays a spatiotemporal distribution of packets and queries over time, revealing patterns and anomalies difficult to identify through earlier means.

5.2.2 3D Network Visualization

While 2D visualizations are regarded as easier to create and understand (in terms of time required for comprehension), recent research has shown there are many benefits to 3D visualizations over 2D for abstract data visualization [3–5], including clearer spatial separation, reduced over-plotting, and the faster construction of deeper mental models. In addition, many visualizations rely heavily on spatialization, encoding information in the location of data-elements; the addition of a third dimension makes available more insightful relative positioning [160]. One of the earliest uses of 3D for cyber-security visualization was by Stephen Lau [161]. The visualization uses a 3D scatter-plot to reveal patterns associated with vulnerability scan attacks. To minimize the clutter of 2D parallel coordinate visualizations, many are expanding into the third dimension [162].
P3D, a 3D parallel-coordinate network security visualization [150], creates multiple 3D planes, each with a set of IPs, packet counts, ports, or other information along the x- and y-axes, with lines connecting these planes representing connections or FTP transfers, to detect port scans while preventing occlusion attacks such as Port Source Confusion and Windshield Wiper attacks [163]. Another example is Daedalus-Viz [164], which consists of several circular rings, corresponding to various monitored organizations, in orbit around a central sphere representing the complete IPv4 space, with connecting lines indicating the transfer of packets.

One main drawback of the previous systems is the relatively small amount of data they can visualize. We expand upon these ideas by combining elements of scatterplots and parallel-coordinate visualizations. Rather than plotting just one element per cell, we interleave multiple data points within a given spatiotemporal cell using 3D transparency, enabling more information to be presented as well as a direct comparison between similar elements. Lastly, plotting many discrete points temporally increases the overall visual complexity. Instead, we have clumped spatially coherent groups of points together into mesh surfaces to minimize visual clutter and to reveal structural patterns and changes within the original query point cloud.

5.3 Problem and Solution

Figure 5.1: Overview of the process from raw pcap files to the Flow-Map IP-Space visualization. Starting from a binary pcap file, we extract and count the occurrence of each IPv4 IP-address and type of packet. Next, the IPs are converted from a 4D to a 2D grid representation, with glyphs scaled and colored based on the number and type of packets. This process repeats for each time slice, with slices stacked along the z-axis. The result is then visualized using 3D accelerated rendering, which allows for high-level structure and low-level analysis, to help analysts establish a sense of normalcy (central blue image), identify outliers (green TCP burst), classify and characterize attacks (top right), measure attack impacts (middle right), and monitor after effects (lower right).

5.3.1 The Challenge

As part of our development process, we interviewed DNS analyst experts from the University of Maryland D-Root. One of the challenges they face is the scope and enormity of the data they manage. Over the course of an average day at just one of their 131 global facilities, they process over 100 GB of traffic, with peak traffic around 300 GB. When under attack by a typical DDOS, one server can process roughly 600 GB in one day.

Figure 5.2: An example analysis in Wireshark, a widely used pcap analyzer.

Traditional query visualization tools are very limiting and typically omit the queries and contents of the packets from the investigation, emphasizing packet counts and the distribution in the IP-space. As stated earlier, most pcap (packet capture) tools present packets line by line.
An example of queries presented in a commonly used tool (Wireshark [139]) is shown in Figure 5.2. While this level of detail can be very useful, it limits the ability of an analyst to generalize and discover trends, due to the lack of global, visual, and temporal coherence, and it elicits a sense of information overload.

There have been many root DNS attacks, historically lasting three [165] and five hours [166]. DDOS attacks outside the realm of the root DNS last, on average, less than twenty hours [167]. To ensure we can cope with the largest of attacks, we visualize 24 hours of traffic in our case study; our visualization is, however, capable of showing longer durations of time. For the purposes of this chapter, we explore one recent root DNS attack. On June 25th, 2016, a moderately sized DDOS hit all root DNS authorities in a coordinated attack. A report published by the root DNS authorities on this specific attack can be found at http://root-servers.org/news/events-of-20160625.txt. According to the official report, all DNS root name servers received a high rate of TCP SYN packets in a SYN flood attack for nearly four hours. The source addresses appeared to be randomized and uniformly distributed throughout the IPv4 address space. The observed traffic volume was up to approximately 10 million packets per second (approximately 17 GB/s) per DNS root name server letter. Our goal is to provide analysts with a sense of normalcy over the course of a day, to contextualize attacks when and if they occur, to aid in subsequent investigation and mitigation strategies, and to help develop a characterization of current and future attacks for comparison.

Previous network visualizations have shown up to approximately 350 million packets [152, 168, 169]. In our presented visualization, we visualize, in real time using 3D accelerated rendering techniques, over 2.4 billion packets, consisting of over 487 million unique queries, spanning 24 hours, from the McLean, Virginia D-Root DNS site.

5.3.2 Approach Overview

There were three design considerations driving our development: to display an entire day of query traffic from a root DNS server, to show high-level structures and patterns in an intuitive manner with interactions enabling finer investigation, and to display time as a spatial dimension. In this chapter, we present two complementary visualizations of the activity in both the IP and query domains. An overview of the IP-space and query-space construction processes, from raw packets to visualization, is presented in Figure 5.1 and Figure 5.3.

5.3.3 Flow-Map IP-Space Visualization

The IP-space visualization uses what we call a Flow-Map, spatially presenting information regarding packet counts and types over time. The IP-space consists of 4 octets, resulting in over 4 billion unique values/indexes. To properly visualize such a space would require a four-dimensional cube, or a very tightly indexed 1D histogram. The compromise reached with our DNS experts, to maintain a fine level of detail without overwhelming the user (as summarized in Figure 5.1), is to reduce the IP space from over 4 billion to 10,000 values by combining the first and second octet pairs into single values. These values are used to bucket (bin) the IP-addresses of the packets into a 2D IP-space. Our current scheme does not take autonomous domains into account.
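For concreteness, the following is a minimal sketch of this binning, assuming scapy for packet parsing and a 100 × 100 grid (consistent with the 10,000 bins above and with the octet-pair mapping sketched in Figure 5.1); the file name and the tooling choice are our own assumptions, not necessarily those of our implementation.

```python
# Minimal sketch (assumptions: scapy for parsing, a 100 x 100 grid) of the
# Flow-Map binning: combine octet pairs into X and Y bins, and bucket
# packets into 5-second time slices by type.
from collections import Counter
from scapy.all import PcapReader, IP, TCP, UDP

GRID = 100  # 100 x 100 = the 10,000 bins described above

def octet_pair_to_bin(hi, lo):
    # Map a 16-bit octet pair (0..65535) onto a bin index 0..GRID-1.
    return (hi * 256 + lo) * GRID // (256 * 256)

counts = Counter()  # (x_bin, y_bin, time_slice, packet_type) -> packet count
for pkt in PcapReader("droot_capture.pcap"):  # hypothetical capture file
    if IP not in pkt:
        continue
    o1, o2, o3, o4 = (int(b) for b in pkt[IP].src.split("."))
    x, y = octet_pair_to_bin(o1, o2), octet_pair_to_bin(o3, o4)
    t = int(pkt.time) // 5                    # 5-second temporal bins
    if TCP in pkt and pkt[TCP].flags.S:
        kind = "TCP-SYN"
    elif UDP in pkt:
        kind = "UDP"
    else:
        kind = "OTHER"
    counts[(x, y, t, kind)] += 1
# Each (x_bin, y_bin, time_slice) cell then drives the size and color of
# the corresponding Flow-Map glyphs.
```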
Our current scheme does not take autonomous domains into account. Therefore, some IP addresses that belong to very different autonomous domains will be sent to the same bucket, while IP addresses that belong to the same corporation could be sent to different buckets. It should be possible, in the future, to add an autonomous-domain layer above it, showing only those IP-bins belonging to a particular autonomous domain, or to reorganize the IP-space into an autonomous-domain space.

Due to irregular packet arrival times, packets are temporally binned into 5-second chunks. Our discussions with DNS experts revealed that knowing the precise IP and arrival time of a particular packet was of little importance; gaining a general understanding of the distribution of packet sources is preferable. A novel characteristic of our visualization is that each bin/cell contains multiple glyphs, seamlessly linked together, sized based on the number of packets received within the time window each represents, and transparently colored based on the type of packet received (UDP, TCP SYN, etc.). The transparency and sizes of the glyphs can be adjusted to minimize occlusion in certain views, reveal hidden information, and emphasize bins with particular packet counts. The advantage of this Flow-Map representation is that time is represented as a spatial component, removing the need for temporal animations or scrolling through individual time slices and thereby providing a globally and temporally coherent model for the analyst. Within this constructed visualization, analysts may freely move and rotate their view to get a high-level overview of the space, zoom in close for an in-depth analysis, and change the current transparency and glyph-scaling levels using the keyboard and mouse. Hovering the cursor over any given element presents additional information, including the range of the IP-bin, the number of packets, and the time.

5.3.4 IP-Space Observations

Three general observations can be made using the Flow-Map representation shown in Figure 5.1. First, most of the space is empty, suggesting that most queries fall into relatively few IP-bins. Second, for most filled bins, the glyphs are relatively small, suggesting that most of the queries received are singletons (customers send just one or a few packets). Third, there are a few persistent high-packet-count buckets that send out thousands of queries in just a few seconds. From this IP-space representation, an analyst can grasp the nature of the changing volume of traffic.

Within the first few hours of data, we found a small selection of interesting patterns. First is the anomaly highlighted in the middle of Figure 5.1, which shows an instance of high TCP-based packet activity, as indicated by the green color. This was a large burst of packets, as indicated by the large size of the glyphs, and was distinct in that no TCP activity preceded or followed this period, which lasted roughly a minute. In the left of Figure 5.4, we have extracted three IP-bins that have regularly repeating, self-similar internal patterns of traffic. This subset of data can be seen near the top-right of the blue Flow-Map in Figure 5.1. From our discussions with D-Root experts, this traffic might be from external monitoring sources that periodically query the root DNS, resulting in this regular pattern. After a closer look at the queries from these IP-bins in the query visualization, we found that the received queries were generally of the form *.trafficmanager.net.
The top-right of Figure 5.1 shows the distribution of IPs used in the TCP-SYN flood attack (half of the first IP octet, and the entirety of the other octets, contrary to the official report indicating that the totality of the IPv4 space was spoofed), the decreased volume of traffic from lost customers, and the resulting hard-drive failure. The right of Figure 5.4 shows a high inter-bin temporal similarity found across all IP-bins involved in the TCP-SYN flood. It is possible that all these characteristics could serve as an attack signature. Thanks to the preservation of the low level of detail, which would otherwise be summarized or abstracted away by other tools, these interesting patterns and anomalies were identified, driving further investigation.

Figure 5.3: An overview of the process from raw pcap files to the Query-Space visualization. Starting from a binary pcap file, we extract and count each query. Next, each query is converted to a TF-IDF (Term Frequency, Inverse Document Frequency) character-level feature vector. A deep learning autoencoder is trained using all queries to generate a visually coherent spatial distribution of queries when projected into a 2D space. This distribution of queries is then visualized using 3D accelerated rendering, which allows for high-level temporal structure (top-right) and low-level query analysis (bottom-right).

Figure 5.4: Interesting self-similar patterns of intra-IP-bin queries and across-IP-bin traffic (from the TCP-SYN DDOS) over time in D-Root traffic.

5.3.5 Deep Learning Driven Query Space Visualization

Our query-space visualization provides analysts with a deeper understanding of the distribution of received queries. Using deep learning, we organize the non-spatial queries into a spatial representation, enabling easy detection of patterns and structure within the large amount of data. Each query is visualized as a 3D sphere, positioned near similar queries and colored to indicate the number of times it was received. Using this visualization, an analyst can see the high-level distribution and volume of queries, then drill down to discover the precise queries received and draw observations, panning and zooming the camera with the keyboard or mouse and obtaining details for a specific data element by hovering over it with the cursor.
The goal is to provide high-level information, in the form of natural and easy-to-interpret geometric categorical structures generated by deep learning, and to facilitate low-level investigation by allowing an analyst to get up close and personal with the raw query data.

Figure 5.5: In the lower-left, the 2D query-space clustered into semantic query categories. There is a general trend of alphabetic-based queries towards the left and numeric-based queries towards the right, along with a general trend of normal characters at the top and unusual characters at the bottom. The lower-right shows the temporal query visualization, portraying a high-level temporal overview of the distribution of queries. The top reveals a selection of interesting observations, such as rapidly diminishing groups of queries at the start, temporally repeating groups of queries, the large reduction in queries during the attack, and an overall decrease in the number of queries over time. (Panel labels in the figure include: Categorized Query Latent Space; 3D Spatiotemporal Query Space; Random Characters; DDOS Attack Interruption; IP Configurations; Home and Local; Queries with Domains; Unusual Queries; IPs, Numbers, and Code Fragments; Unusual Characters.)

To generate the distribution of queries, we use deep learning to find patterns within the set of queries and then project those queries onto a 2D plane. Previous techniques, such as spectral clustering and PCA, were unable to generate adequate representations, much less cope with the huge quantities of data to be projected, due to their requirement of a full similarity matrix between all pairs of elements. In addition, the lack of a ground truth or implicit notion of similarity inhibits the use of other traditional methods. Therefore, we use deep learning, which is able to handle the data in chunks, maintain a global conception of the data-space, and learn its own metric of similarity from the data rather than have one imposed. Each query string is converted to a vector of TF-IDF (Term Frequency, Inverse Document Frequency) values, resulting in a 256-length character feature vector. TF-IDF is commonly used in natural language processing for converting text into a machine-understandable format: each word or letter is assigned a value based on its importance to the local sentence or word relative to the entire corpus.
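A minimal sketch of such a character-level encoding follows. The exact term-frequency and inverse-document-frequency weighting of our implementation is not spelled out in the text, so the smoothing and normalization choices below are illustrative; each of the 256 possible Latin-1 byte values is treated as a "term" (the Latin-1 decoding mirrors the parsing convention described in Section 5.4.3).

```python
import numpy as np

def char_tfidf(queries: list[str]) -> np.ndarray:
    """Encode each query as a 256-length character-level TF-IDF vector.

    tf is a character's relative frequency within one query; idf
    (smoothed, as in common implementations) damps characters that
    occur in almost every query.
    """
    n = len(queries)
    encoded = [q.encode("latin-1", errors="replace") for q in queries]
    df = np.zeros(256)
    for data in encoded:
        for byte in set(data):          # document frequency per byte value
            df[byte] += 1
    idf = np.log((1.0 + n) / (1.0 + df)) + 1.0
    vectors = np.zeros((n, 256))
    for i, data in enumerate(encoded):
        for byte in data:
            vectors[i, byte] += 1.0     # raw character counts
        if data:
            vectors[i] *= idf / len(data)   # tf (relative frequency) * idf
    return vectors

# A random-character query and a domain-style query yield clearly
# different vectors, which is what lets the autoencoder separate them.
vecs = char_tfidf(["jqupcyohmw", "www.example.com"])
```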
All query feature vectors are used to train an unsupervised variational autoencoder, which learns to project the queries into a latent space under the constraint that it must then accurately recover the original input feature vector from that projection; as a result, similarly structured queries become spatially co-located. An example of the queries projected into a semantic latent space is presented in the lower-left of Figure 5.5. The deep learning network uses four decreasingly sized dense layers to capture interesting co-occurrences of features, each using a Rectified Linear Unit (ReLU) activation function [127], defined as max(0, x), except the last output layer, which uses a Sigmoid, defined as 1/(1 + exp(-x)). This network structure is embedded into a variational autoencoder, which refines the internal weights. A visual overview of the model is presented in Figure 5.3. The model is trained using all queries as a pre-processing stage before the visualization starts. The libraries TensorFlow [125] and Keras [126] are used to build, train, and process the deep learning network (a minimal sketch of the architecture is given below).

A coarse labeling of the query-space visualization, with examples of queries within each group, is presented in the left of Figure 5.5. There is a general trend of alphabetic-based queries appearing on the left and numeric-based queries appearing on the right, along with a general usage of normal characters at the top and unusual characters at the bottom. Through our investigation, we have identified several interesting general groups of queries. The most common queries are those primarily composed of random configurations of alphabetical characters. Other discovered groups include configurations of IP addresses and numbers, invalid queries which include distinct domains such as .com and .org, device names with local and home domains, and, more strangely, queries consisting of unusual (non-alphanumeric) characters (most likely binary and code fragments).

5.3.5.1 3D Query Flow Visualization

The enormous scale of the DNS data pushes the limits of dimensionality projection. Earlier solutions would, in general, independently project down segments of the overall high-dimensional data, creating separate local models, resulting either in projections that could not be directly compared due to differences in projection axes, or in a gradual change in axes as more data is projected. Both outcomes are undesirable due to the increased cognitive burden of constantly updating one's understanding of the local axes and interpretation of the projection. With deep learning, we are able to remove this effect by training a globally consistent model, providing analysts with a cohesive and consistent representation. In addition, given a set of projected points, displaying a time-varying scatter-plot is a challenging problem. A common solution is to use animation to convey time. However, research has shown that animation incurs higher cognitive loads, can often be difficult to comprehend, and is generally less effective at communication than a static visualization [170]. Therefore, we opted for a static approach to visualizing temporal information, similar to the IP-space. However, unlike the IP-space, which consists of regularly spaced IP-addresses, the queries have arbitrary distances between each other, requiring a different visualization method.
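Before turning to that method, the following minimal Keras sketch makes the autoencoder described above concrete. The layer widths (0.7x, 0.5x, and 0.3x the input size, down to a 2-node latent layer) and the reconstruction-plus-KL-divergence loss follow Figure 5.3; the optimizer, loss weighting, and sampling details are not specified in the text, so those choices here are illustrative.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

INPUT_DIM = 256  # length of the character-level TF-IDF vectors

# Encoder: decreasingly sized dense ReLU layers down to a 2D latent space.
inputs = keras.Input(shape=(INPUT_DIM,))
h = layers.Dense(int(0.7 * INPUT_DIM), activation="relu")(inputs)
h = layers.Dense(int(0.5 * INPUT_DIM), activation="relu")(h)
h = layers.Dense(int(0.3 * INPUT_DIM), activation="relu")(h)
z_mean = layers.Dense(2)(h)      # the 2D coordinates used for plotting
z_log_var = layers.Dense(2)(h)

def sample(args):
    mean, log_var = args
    eps = tf.random.normal(tf.shape(mean))
    return mean + tf.exp(0.5 * log_var) * eps  # reparameterization trick

z = layers.Lambda(sample)([z_mean, z_log_var])

# Decoder mirrors the encoder; the sigmoid output reconstructs the input.
d = layers.Dense(int(0.3 * INPUT_DIM), activation="relu")(z)
d = layers.Dense(int(0.5 * INPUT_DIM), activation="relu")(d)
d = layers.Dense(int(0.7 * INPUT_DIM), activation="relu")(d)
outputs = layers.Dense(INPUT_DIM, activation="sigmoid")(d)

vae = keras.Model(inputs, outputs)

# Loss: reconstruction error plus KL divergence to a unit Gaussian.
recon = tf.reduce_mean(keras.losses.binary_crossentropy(inputs, outputs))
kl = -0.5 * tf.reduce_mean(
    1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
vae.add_loss(recon + kl)
vae.compile(optimizer="adam")

# After vae.fit(tfidf_vectors, epochs=...), the 2D query positions are
# read from the trained encoder.
encoder = keras.Model(inputs, z_mean)
```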
Returning to the temporal display problem: another solution is to stack individual 2D scatter-plot time-slices, but for large datasets this leads to an overwhelming amount of information as well as a large amount of visual clutter. Therefore, we use a volume-rendered marching-cubes approach implemented with CUDA and OpenGL. This view is designed, in contrast to the earlier blanket temporal 2D view, to provide analysts with a structural, high-level overview of how the query-space changes over time, naturally revealing temporally repeating, absent, and anomalous queries. Similar to the IP-space visualization, the camera in the temporal query-space visualization may be translated, rotated, and zoomed for closer inspection using the keyboard and mouse. An example of this temporal query visualization is presented in Figure 5.5.

Among many observations, one unexpected finding is that as time continues, the overall number and volume of queries, particularly on the periphery involving unusual queries, decreases. Starting from the beginning of the day, many groups of queries have high volume and variety, in particular those belonging to the .pk5001z group. However, as the day continues, this cluster of queries decreases, with a regularly repeating increase and decrease in volume. Other groups, such as those consisting of invalid IP-address fragments, start with a high volume of traffic but quickly drop off, as shown in the top-left of Figure 5.5.

Another emergent observation is the large quantity of temporally repeating queries, which fade in and out, growing and shrinking in volume and diversity. A large group of these, consisting of random-character queries, have temporally aligned high and low points, as shown in the middle of Figure 5.5. Others, such as the aforementioned .pk5001z group, along with many groups of unusual-character queries, such as the group of queries of the form *_dns-sd.udp_*, also share an intermittent and oscillatory emission. The discovery of such patterns would have been almost impossible without such a visualization. Due to the oscillatory nature of these queries, it is likely they originate from an erroneous program.

Lastly, the TCP-SYN flood DDOS, which stands out from the norm in the IP-space visualization as a large increase in TCP-SYN packet activity, appears as an absence of queries in the 3D spatiotemporal visualization, notably as the reduction in traffic before the large empty gap in the right of Figure 5.5. Although we see that some queries are processed, the majority of traffic, particularly on the periphery, has ceased. The large gap, similar to its portrayal in the IP-space, corresponds to a hardware failure. In the IP-space visualization, we learned that there was an overall reduction in the number of queries after the attack. In this view, we can also see that only some of the traffic has returned; we are now informed that primarily those queries consisting of domains are processed and, unexpectedly, that the groups of queries on the periphery are mostly absent.

5.3.6 Dual IP-Query Visualization Interaction

In contrast to previous cyber-visualizations, which focus primarily on packet counts, we present a visualization capable of providing analysts with a well-rounded and complete representation of the DNS data. This involves a dual representation, namely an IP space and a query space. In our system, the IP-space view is presented on the left of the screen, with the query-space shown on the right.
Users simply move their cursor from one view to the other to direct their input focus. Until now, we have focused on the construction of, interactions with, and discoveries made with these visualizations independently. In this section, we review the interactions and discoveries made when analyzing the D-Root DNS data in a dual representation.

Suppose the analyst desires to know from which IPs and times a particular query originated. In the query-space visualization, an analyst can double-click on the particular query to highlight the corresponding bins and times in the IP-space view (a sketch of one possible lookup structure behind this linking is given at the end of this section). We are not filtering out the response packets from the D-Root DNS, so most queries will have at least two IP-space occurrences. To select multiple queries, an analyst holds control and clicks to brush-select the queries. One selection of queries, presented in the top-left of Figure 5.6, originates from a wide range of IP-bins and time periods. The selected queries are of the form *.pk5001z, whose occurrence in the query-space visualization surprised our DNS experts. Initially, they thought these queries were from one or a few IPs (old hardware), but in the dual interaction we see that these packets come from a wide range of IP addresses over a long duration of time. At the bottom-left of Figure 5.6, the selected group of queries arrives in three IP-bins, one corresponding to the D-Root and the others corresponding to sources, indicating that these queries are an anomaly due to the small number of sources. Interestingly, one source is consistent while the other is intermittent. One query was "211.67.67.217.", with the remainder of similar structure. Using our visualization and interaction methodology, analysts may also do the reverse: select individual IP-time-bins and visualize the corresponding queries, or select an entire duration by double-clicking the IP-bin, as shown in the bottom-right of Figure 5.6, revealing that this temporally oscillating IP-bin (the middle-left of Figure 5.4) primarily consists of *.trafficmanager.net queries. The developed visualizations, along with the dynamic interactions, enable analysts to visually identify new behavior, develop hypotheses, and, overall, gain a deeper understanding of the network flow.

5.4 Empirical Validation

The primary motivation of our visualization is to help analysts understand the distribution of queries they receive and how it changes over time, identify anomalous behavior, and explore new questions. Throughout our investigation, we found and reviewed several trends and anomalies using our visualization. We present a few insights that were discovered by three DNS experts, who will be named A, B, and C. These discoveries would have been difficult, if not impossible, to make with traditional analysis tools, which do not organize DNS information in an intuitive spatial and temporal representation.

Figure 5.6: Two selected regions of queries. The top selection indicates that the queries originate from a wide range of IPs, while the bottom selection indicates that those queries came from very few IPs. Non-included IPs may be rendered transparent (bottom) or left opaque (top). The bottom-right image shows the resulting queries included from an entire IP-bin.

The discoveries presented here originate from multiple joint discussions and sessions in which both the authors and the DNS experts engaged with the visualizations.
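As referenced above, the dual-view linking can be served by an inverted index from query strings to IP-time bins, built while the packets are binned. The sketch below is ours, not a description of the actual implementation, and it reuses the hypothetical ip_to_cell function from the earlier Flow-Map sketch.

```python
from collections import defaultdict

# Inverted index: query string -> set of ((x, y) IP cell, time slice) pairs.
query_index: defaultdict = defaultdict(set)

def index_packet(query: str, ip: str, timestamp: float,
                 slice_seconds: int = 5) -> None:
    """Record one packet so its query can later highlight IP-time bins."""
    cell = ip_to_cell(ip)                       # 2D Flow-Map cell
    time_slice = int(timestamp // slice_seconds)
    query_index[query].add((cell, time_slice))

def bins_for_selection(queries: list) -> set:
    """Union of IP-time bins for a brush-selected set of queries."""
    selected: set = set()
    for q in queries:
        selected |= query_index.get(q, set())
    return selected
```

A double-click maps to bins_for_selection with a single query; a brush selection passes the whole selected set, and the returned bins are then highlighted in the Flow-Map.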
5.4.1 DNS Expert A

DNS expert A stated that they would often monitor overall health with a pcap analysis program and look at a small random selection of packets to see how those queries looked. As expert A was interacting with our visualization, he zoomed into a cluster of points and noted that there was a high volume of lower-case random-character queries.

Figure 5.7: The region of the query-space consisting of different distributions of random characters.

He pointed out that it was interesting that our visualization could cluster such queries, which could come from the Chrome internet browser. Upon further discussion, we learned that when Chrome starts, it tries to learn the nature of the DNS it sits behind by issuing multiple random queries, as ISPs tend to wildcard DNS servers to catch all domains and load advertisements. If a random query receives a valid response, then Chrome knows something is interfering with the DNS. For those who do not sit behind one of these particular ISPs, these random queries end up at the root to be resolved. While we cannot attribute all of the random lower-case queries to Chrome, it is likely responsible for a large majority. In our visualization, there is a large chunk of the distribution space dedicated solely to sets of random characters, with small differences between them (typically the frequency and capitalization of individual letters), as can be seen in Figure 5.7.

In addition, DNS expert A was able to learn that many random-character queries contained valid domains. When the root encounters such queries, it forwards them to the authority domain listed as part of the query. This could lead to a kind of DDOS attack from redirected queries. With this new information and our visualization, it may be possible to establish filters to mitigate the effects of such queries (a sketch of one such heuristic appears at the end of this subsection).

Lastly, DNS expert A noticed a large collection of queries from different routers and modems. Human error or malfunctioning machines often result in erroneous queries. One example of this unusual behavior was the presence of a large number of queries of the form *.pk5001z. This initially struck our DNS experts as very unusual, and after some investigation on their end, they found that these types of requests are typically associated with a particular model of modem, namely the PK5001Z. The presence of these queries at the root indicates that someone somewhere has a misconfigured or infected modem sending erroneous queries. In addition, there was an entire distinct cluster dedicated to queries of the form *.Home, *.Belkin, and *.local, indicating erroneous configurations of home routers and devices. The presence of router- and modem-based queries, while known to our analysts, surprised them by its variety and the age of the originating hardware. In particular, DNS expert A found a set of queries belonging to a 20-year-old version of the VxWorks OS. Using our visualization tool, our DNS experts have been informed of the scope of this problem, and of the fact that this traffic can lead to intense bursts when an outdated system desperately searches for a valid DNS response.
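As referenced above, a filter for the lower-case random-character group might start from simple lexical heuristics. The sketch below is entirely illustrative: the length bounds and the entropy threshold are our own assumptions, not parameters of any deployed system.

```python
import math
from collections import Counter

def looks_like_random_probe(query: str) -> bool:
    """Heuristically flag queries resembling browser-style random probes.

    Assumed signals: a single short lower-case alphabetic label (no dots,
    so no valid domain to redirect to) with near-uniform character usage.
    """
    label = query.rstrip(".")
    if "." in label or not label.isalpha() or not label.islower():
        return False
    if not 7 <= len(label) <= 15:
        return False
    counts = Counter(label)
    entropy = -sum((c / len(label)) * math.log2(c / len(label))
                   for c in counts.values())
    return entropy > 2.5  # characters are spread out, not repeated

assert looks_like_random_probe("jqupcyohmw")        # random-character query
assert not looks_like_random_probe("www.example.com")
```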
5.4.2 DNS Expert B

Figure 5.8: The region of the query-space consisting of different distributions of IP addresses, fragments, and expressions.

Rarely, people enter an IP address directly into the web browser to connect to a specific IP-enabled device. However, expert B noted queries containing IP addresses, which often contain different mistakes. Our visualization is able to identify and cluster these mistakes. Previously, IP-based investigations primarily used the source IP in the packet header rather than looking at the IPs in the query itself. Using our visualization, we can see the distribution of queried IPs and the mistakes made when querying them. DNS expert B found that the most common mistake is the use of an invalid IP octet (> 255) or an incorrect port address, often using brackets, dashes, or parentheses to delineate the port. More elaborate mistakes include entering too few or too many octets, surrounding the IPv4 address with brackets and other formatting, or entering two or more IP addresses at once, separated in many different ways. Other errors include partial URLs followed or preceded by IP addresses, erroneous bit masks, IP addresses in which numbers are replaced with letters (perhaps in an attempt to use IPv6), strange hybrid combinations of URLs with IPv6 IPs, generally in the form of http://, and IPs containing many percent symbols, perhaps an attempted regular expression or a fragment from a printf statement. One cause could be programs erroneously copying code or URL fragments into a DNS query packet. As a result, many of these queries reach the root DNS. In our query-space visualization, there are a few clusters dedicated to these types of queries, as shown in Figure 5.8.

In addition, we also often find command and instruction segments or simple statements, such as a large occurrence of for= statements, boolean expressions, and variable assignments. For other queries, we find many instances of queries structured as www. followed by a random collection of hexadecimal and unusual characters. We believe these are instances of broken applications cycling through random permutations of URLs trying to resolve to a valid response. In addition, sending commands through DNS is a common way to control botnets. Learning of the occurrence and distribution of these queries has reinforced our experts' belief that the majority of traffic they receive is machine- rather than human-generated. Just as with the random characters, knowing the types of queries containing IP addresses, and knowing that they cannot be resolved, would allow such traffic to be filtered automatically and earlier.

5.4.3 DNS Expert C

Expert C noted that there were a large number of queries containing unusual characters, as shown in Figure 5.9. These characters are those that cannot be interpreted as normal ASCII. Therefore, for our parsing purposes, we relied on the ISO/IEC 8859-1:1998 (also known as Latin-1) character encoding to properly decode and display the queries.
The very existence of these queries is unusual, as people do not generally perform queries using such characters. Expert C has theorized that these queries are binary fragments of data and code, likely erroneously copied from an invalid buffer. Another theory is that they consist of exfiltrated data exploiting the DNS system (DNS tunneling). Possible examples of this were the large occurrence of long queries containing sequences of prodID= and other delineated information. Another source of these unusual characters could be bad Unicode translation in software applications. In our IP and temporal query-space visualizations, these particular sets of queries tend to fluctuate depending on the hour of the day (with many only being issued in the early morning and late at night, as indicated by our temporal query visualization), suggesting that a machine is likely the initiator. Expert C noted that these observations would have been very difficult to make without our visualization.

Figure 5.9: The region of the query-space consisting of unusual characters and queries.

5.5 Conclusions

The goal of our visualization was to provide a natural and easy-to-use interface for working with large amounts of real-world DNS IP and query data, providing analysts with a general overview of the distribution of packet traffic and queries while also allowing them to investigate small temporal events and individual queries and to find correlations between the IP and query spaces. We have shown that using deep learning to generate a spatial representation of non-spatial queries is a very effective method of presenting such data. By working closely with real-world root DNS experts, we have been able to find new and interesting anomalies, groups, and patterns that were previously unknown and that have led to further investigation.

As the internet of things grows exponentially, the number of erroneous, malformed, and junk queries is set to explode, as are the complexity and scale of future attacks. Knowledge of the different types of queries and packet behaviors, what they tend to look like, and how they change over time may allow DNS analysts to start automatically filtering these packets as traffic gradually increases, keeping operations functioning normally. Prior to our visualization, the DNS experts would often look at only a handful of queries at a time, not fully grasping the variety and dynamics of the queries flowing through their network. With this new knowledge and capability, closer inspections of their vast quantities of DNS data may now be conducted, and preparation for the future may begin with greater confidence.

Chapter 6: Conclusions and Future Work

6.1 Conclusions

In this dissertation, we have presented our research towards enabling visual analytics within virtual environments.
We found that the use of virtual memory palaces with head-mounted displays improves recall accuracy compared to traditional desktops. We had 40 participants memorize and recall faces on two display-interaction modalities for two virtual memory palaces, with two different sets of faces. The HMD condition was found to have a statistically significant 8.8% improvement in recall accuracy compared to the desktop condition. Given the results of our user study, we believe that virtual memory palaces offer us a fascinating insight into how we may be able to organize and structure large information spaces and navigate them in ways that assist in superior recall.

We presented the findings of a user study with the goal of continuously measuring and quantifying cybersickness. Using an EEG, the recorded participant data was decomposed using ICA to separate the underlying sources of the brainwave activity and eliminate noise. The independent components were then clustered across users for the purpose of comparing the EEG of the grouped users. Through independent component analysis and time-frequency spectral analysis, our findings suggest that a spectral power increase in the Delta, Theta, and Alpha frequency bands, relative to a baseline, strongly correlates with the presence of cybersickness.

We presented an approach to facilitate the discovery of latent structures and communities/labels within high-dimensional non-spatial datasets using deep learning. Through an iterative process, an analyst can select and assign points to new or existing communities, which is then used in conjunction with deep learning to refine the 2D spatial distribution of the points, revealing new information. We have shown on three different datasets that our technique is able to reconstruct hidden structures and communities, enabling both deep learning and the human analyst to support, refine, and enhance each other through visualization.

We developed a visualization for working with large amounts of real-world DNS IP and query data. Our goal was to provide analysts with a general overview of the distribution of the packet traffic and queries, while also allowing them to investigate large and small temporal events and individual queries and to find correlations between the IP and query spaces. We have shown that using deep learning to generate a spatial representation of non-spatial queries is a very effective method of presenting such data. By working closely with real-world root DNS experts, we have been able to find new and interesting anomalies, groups, and patterns that were previously unknown and that have led to further investigation.

6.2 Future Work

Virtual reality has recently taken center stage in the graphics and entertainment world. The possibilities for virtual reality are endless, with applications spanning entertainment, engineering, medicine, communication, and more. The DNS visualization research presented in this dissertation is just one of many real-world applications in which analysts struggle to handle vast and ever-increasing amounts of data. The amount of data each of us creates, processes, and interacts with every day is immense. Visualizing such large amounts of data and internet traffic all at once, providing analysts with complete situational awareness, would traditionally call for a large display: more monitors, or larger displays with more screen real estate.
A radical shift away from increasing the display dimensions for traditional visualizations is necessary. Instead of expanding the sizes of our displays for visualizing large amounts of information, it would be interesting if smaller displays, such as head-mounted displays and virtual reality, could be used to visualize the same or more information. Our hope is that the work presented here is a first step towards realizing this goal.

6.3 Peer Reviewed Publications

• Eric Krokos, Kirsten Whitley, and Amitabh Varshney, "Visual Analytics for Root DNS Data", IEEE Symposium on Visualization for Cyber Security (VizSec 2018), Berlin, Germany, October 2018, (accepted for publication, September 2018).

• Eric Krokos and Amitabh Varshney, "Interactive Characterization of Cybersickness in Virtual Environments using EEG", Virtual Reality, (submitted September 2018).

• Eric Krokos, Hsueh-Chien Chen, Jessica Chang, Celeste Lyn Paul, Bohdan Nebesh, Kirsten Whitley, and Amitabh Varshney, "Enhancing Deep Learning with Visual Interactions", ACM Transactions on Interactive Intelligent Systems, (accepted for publication, July 2018).

• Eric Krokos, Catherine Plaisant, and Amitabh Varshney, "Virtual memory palaces: immersion aids recall", Virtual Reality (May 2018): 1-15.

• Eric Krokos, Catherine Plaisant, and Amitabh Varshney, "Spatial Mnemonics using Virtual Reality", In Proceedings of the 10th International Conference on Computer and Automation Engineering (ICCAE 2018), pp 17-30, presented at Brisbane, Australia, February 24, 2018.

• Hsueh-Chien Cheng, Antonio Cardone, Somay Jain, Eric Krokos, Kedar Narayan, Sriram Subramaniam, and Amitabh Varshney, "Deep-learning-assisted Volume Visualization", IEEE Transactions on Visualization and Computer Graphics (accepted for publication, January 2018).

• Hsueh-Chien Cheng, Antonio Cardone, Eric Krokos, Bogdan Stoica, Alan Faden, and Amitabh Varshney, "Deep-learning-assisted visualization for live-cell images", In Proceedings of the IEEE International Conference on Image Processing (ICIP 2017), September 2017, pp 1377-1381, IEEE.

• Eric Krokos, Hanan Samet, and Jagan Sankaranarayanan, "A look into Twitter hashtag discovery and generation", Proceedings of the 7th ACM SIGSPATIAL International Workshop on Location-Based Social Networks, November 2014, pp 49-56, ACM.

Table 1: Trend Scores for each face for Face Set 1 from Google Trends, with an average trend score of 30.5 and a standard deviation of 21.86. The data was collected from April, May, June, and July of 2015.

FaceSet1              APR  MAY  JUN  JUL  AVG
Martin Luther King     14   14   10    8  11.5
Bill Gates             48   50   48   47  48.25
Mahatma Gandhi         57   55   57   54  55.75
Donald Duck            66   71   71   65  68.25
Buzz Lightyear         37   38   36   41  38
George Washington      21   21   15   15  18
George Bush             2    2    2    2   2
Oprah Winfrey          13   12   10   12  11.75
Taylor Swift           59   79   69   68  68.75
Steve Jobs              2    3    2    3   2.5
Michael Jackson         3    3    4    3   3.25
Harry Potter            6    7    8   11   8
Stephen Hawking        43   36   31   30  35
Mona Lisa              38   38   31   29  34
Shrek                   9   10    9    9   9.25
Frodo Baggins          19   18   19   17  18.25
Albert Einstein        44   43   39   34  40
Vladimir Putin         36   31   27   22  29
Galileo Galilei        34   35   32   35  34
King Louis XVI         65   73   60   56  63.5
Napoleon Bonaparte     42   44   46   34  41.5

Table 2: Trend Scores for each face for Face Set 2 from Google Trends, with an average trend score of 29.83 and a standard deviation of 18.32. The data was collected from April, May, June, and July of 2015.
FaceSet2                 APR  MAY  JUN  JUL  AVG
Abraham Lincoln           39   35   26   25  31.25
Katy Perry                37   37   35   34  35.75
Hillary Clinton           32   11   12   13  17
Arnold Schwarzenegger     25   25   34   39  30.75
Tom Cruise                17   16   15   28  19
Batman                    27   27   29   37  30
Mickey Mouse              76   75   73   78  75.5
Marilyn Monroe            49   56   64   45  53.5
Testudo                    2    2    2    3   2.25
Winston Churchill         48   50   38   36  43
Barbie                    42   42   44   45  43.25
Mark Zuckerberg           21   20   18   19  19.5
Robin Williams             2    1    1    2   1.5
Dalai Lama                26   26   32   36  30
Kim Jong-un               20   30   17   16  20.75
Harrison Ford             21   15   12   15  15.75
Bill Clinton              22   18   14   14  17
Michelle Obama             8    6    8    9   7.75
Queen Victoria            48   55   42   40  46.25
Cleopatra                 56   52   50   51  52.25
Nikola Tesla              33   36   32   37  34.5

Table 3: Angular Resolution of Faces in the Town and Palace scenes, with the average and standard deviation of the angular resolutions of the set of faces for each scene. The difference in angular resolutions between the two scenes was not statistically significant (p = 0.44 > 0.05).

Face Number    Town   Palace
1              6.61   7.86
2              7.29   7.86
3              5.72   7.81
4              6.92   7.81
5              7.29   8.02
6              5.83   8.02
7              7.81   4.73
8              7.81   5.46
9              7.60   5.46
10             5.98   4.73
11             6.04   6.51
12             7.44   6.40
13             4.94   4.94
14             4.68   4.94
15             7.44   6.09
16             4.37   6.09
17             5.26   5.46
18             4.53   5.36
19             7.81   7.55
20             5.72   6.35
21             7.81   6.35
Average        6.42   6.37
Standard Dev   1.17   1.16

Figure 1: Face Set 1, containing 21 faces.

Figure 2: Face Set 2, containing 21 faces.

Bibliography

[1] Eugenia M Kolasinski. Simulator sickness in virtual environments. Technical report, DTIC Document, 1995.
[2] Sue VG Cobb, Sarah Nichols, Amanda Ramsey, and John R Wilson. Virtual reality-induced symptoms and effects (VRISE). Presence, 8(2):169–186, 1999.
[3] Jorge Poco, Ronak Etemadpour, Fernando Vieira Paulovich, TV Long, Paul Rosenthal, Maria Cristina Ferreira de Oliveira, Lars Linsen, and Rosane Minghim. A framework for exploring multidimensional data with 3d projections. In Computer Graphics Forum, volume 30, pages 1111–1120. Wiley Online Library, 2011.
[4] Monica Tavanti and Mats Lind. 2d vs 3d, implications on spatial memory. In Information Visualization, 2001. INFOVIS 2001. IEEE Symposium on, pages 139–145. IEEE, 2001.
[5] Antonio Gracia, Santiago González, Víctor Robles, Ernestina Menasalvas, and Tatiana Von Landesberger. New insights into the suitability of the third dimension for visualizing multivariate/multidimensional data: A study based on loss of quality quantification. Information Visualization, 15(1):3–30, 2016.
[6] Joseph J LaViola Jr. A discussion of cybersickness in virtual environments. ACM SIGCHI Bulletin, 32(1):47–56, 2000.
[7] Sharon R Holmes and Michael J Griffin. Correlation between heart rate and the severity of motion sickness caused by optokinetic stimulation. Journal of Psychophysiology, 15(1):35, 2001.
[8] Fabio Dell'Acqua, Paolo Gamba, and Alessio Ferrari. Exploiting spectral and spatial information for classifying hyperspectral data in urban areas. In Geoscience and Remote Sensing Symposium, 2003. IGARSS'03. Proceedings. 2003 IEEE International, volume 1, pages 464–466. IEEE, 2003.
[9] David A Landgrebe. Signal theory methods in multispectral remote sensing, volume 29. John Wiley & Sons, 2005.
[10] Julian Jaynes. The origin of consciousness in the breakdown of the bicameral mind. gli Adelphi, 1976.
[11] Henry L Roediger. Implicit and explicit memory models. Bulletin of the Psychonomic Society, 13(6):339–342, 1979.
[12] Markus Knauff. Space to reason: A spatial theory of human thought. MIT Press, 2013.
[13] Frances Amelia Yates. The art of memory, volume 64. Random House, 1992.
[14] Joshua Harman. Creating a memory palace using a computer. In CHI '01 Extended Abstracts on Human Factors in Computing Systems, pages 407–408, 2001.
[15] Robert Godwin-Jones. Emerging technologies from memory palaces to spacing algorithms: approaches to second-language vocabulary learning. Language, Learning & Technology, 14(2):4, 2010.
[16] John D Mayer, Peter Salovey, David R Caruso, and Gill Sitarenios. Emotional intelligence as a standard intelligence. Emotion, 1(3):232–242, 2001.
[17] Howard Gardner. Multiple intelligences: New horizons. New York: Basic Books, 2006. ISBN: 978-0465047680.
[18] Duncan R Godden and Alan D Baddeley. Context-dependent memory in two natural environments: On land and underwater. British Journal of Psychology, 66(3):325–331, 1975.
[19] Richard Skarbez, Frederick P. Brooks, Jr., and Mary C. Whitton. A survey of presence and related concepts. ACM Comput. Surv., 50(6):96:1–96:39, November 2017.
[20] Mel Slater. Place illusion and plausibility can lead to realistic behaviour in immersive virtual environments. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 364(1535):3549–3557, 2009.
[21] Claudia Repetto, Silvia Serino, Manuela Macedonia, and Giuseppe Riva. Virtual reality as an embodied tool to enhance episodic memory in elderly. Frontiers in Psychology, 7:1839:1–1839:4, 2016.
[22] Lawrence W Barsalou. Grounded cognition. Annu. Rev. Psychol., 59:617–645, 2008.
[23] Lawrence Shapiro. Embodied cognition. Routledge, 2010. ISBN: 978-0415773423.
[24] Tamas Madl, Ke Chen, Daniela Montaldi, and Robert Trappl. Computational cognitive models of spatial memory in navigation space: A review. Neural Networks, 65:18–43, 2015.
[25] Stefan Leutgeb, Jill K Leutgeb, May-Britt Moser, and Edvard I Moser. Place cells, spatial maps and the population code for memory. Current Opinion in Neurobiology, 15(6):738–746, 2005.
[26] György Buzsáki and Edvard I Moser. Memory, navigation and theta rhythm in the hippocampal-entorhinal system. Nature Neuroscience, 16(2):130–138, 2013.
[27] Edvard I Moser, Emilio Kropff, and May-Britt Moser. Place cells, grid cells, and the brain's spatial representation system. Annual Review of Neuroscience, 31, 2008.
[28] Neil Burgess. Spatial cognition and the brain. Annals of the New York Academy of Sciences, 1124(1):77–97, 2008.
[29] Oliver Baumann and Jason B Mattingley. Medial parietal cortex encodes perceived heading direction in humans. Journal of Neuroscience, 30(39):12897–12901, 2010.
[30] Colin Lever, Stephen Burton, Ali Jeewajee, John O'Keefe, and Neil Burgess. Boundary vector cells in the subiculum of the hippocampal formation. Journal of Neuroscience, 29(31):9771–9777, 2009.
[31] Arne D Ekstrom, Michael J Kahana, Jeremy B Caplan, Tony A Fields, Eve A Isham, Ehren L Newman, and Itzhak Fried. Cellular networks underlying human spatial navigation. Nature, 425(6954):184–188, 2003.
[32] Tom Hartley, Colin Lever, Neil Burgess, and John O'Keefe. Space in the brain: how the hippocampal formation supports spatial cognition. Phil. Trans. R. Soc. B, 369(1635):20120510, 2014.
[33] Caswell Barry, Colin Lever, Robin Hayman, Tom Hartley, Stephen Burton, John O'Keefe, Kate Jeffery, and N Burgess. The boundary vector cell model of place cell firing and spatial memory. Reviews in the Neurosciences, 17(1-2):71–98, 2006.
[34] Jangjin Kim, Sébastien Delcasso, and Inah Lee. Neural correlates of object-in-place learning in hippocampus and prefrontal cortex. Journal of Neuroscience, 31(47):16991–17006, 2011.
[35] Malcolm W Brown and John P Aggleton. Recognition memory: what are the roles of the perirhinal cortex and hippocampus? Nature Reviews Neuroscience, 2(1):51–61, 2001.
[36] V Hok, E Save, PP Lenck-Santini, and B Poucet. Coding for spatial goals in the prelimbic/infralimbic area of the rat frontal cortex. Proceedings of the National Academy of Sciences of the United States of America, 102(12):4602–4607, 2005.
[37] Eric LG Legge, Christopher R Madan, Enoch T Ng, and Jeremy B Caplan. Building a memory palace in minutes: Equivalent memory performance using virtual versus conventional environments with the method of loci. Acta Psychologica, 141(3):380–390, 2012.
[38] E Fassbender and W Heiden. The virtual memory palace. Journal of Computational Information Systems, 2(1):457–464, 2006.
[39] Doug A Bowman and Ryan P McMahan. Virtual reality: how much immersion is enough? Computer, 40(7):36–43, 2007.
[40] Ajith Sowndararajan, Rongrong Wang, and Doug A. Bowman. Quantifying the benefits of immersion for procedural training. In Proceedings of the 2008 Workshop on Immersive Projection Technologies/Emerging Display Technologies, IPT/EDT '08, pages 2:1–2:4, 2008.
[41] Eric D Ragan, Ajith Sowndararajan, Regis Kopper, and Doug A Bowman. The effects of higher levels of immersion on procedure memorization performance and implications for educational virtual environments. Presence: Teleoperators and Virtual Environments, 19(6):527–543, 2010.
[42] Randy Pausch, Dennis Proffitt, and George Williams. Quantifying immersion in virtual reality. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '97, pages 13–18, 1997.
[43] Roy A Ruddle, Stephen J Payne, and Dylan M Jones. Navigating large-scale virtual environments: what differences occur between helmet-mounted and desktop displays? Presence: Teleoperators and Virtual Environments, 8(2):157–168, 1999.
[44] Katerina Mania, Tom Troscianko, Rycharde Hawkes, and Alan Chalmers. Fidelity metrics for virtual environment simulations based on spatial memory awareness states. Presence: Teleoperators and Virtual Environments, 12(3):296–310, 2003.
[45] Joel Harman, Ross Brown, and Daniel Johnson. Improved memory elicitation in virtual reality: New experimental results and insights. In IFIP Conference on Human-Computer Interaction, pages 128–146. Springer, 2017.
[46] Frederick P Brooks Jr, John Airey, John Alspaugh, Andrew Bell, Randolph Brown, Curtis Hill, Uwe Nimscheck, Penny Rheingans, John Rohlf, Dana Smith, Douglass Turner, Amitabh Varshney, Yulan Wang, Hans Weber, and Xialin Yuan. Six generations of building walkthrough: Final technical report to the National Science Foundation. 1992. TR92-026, Department of Computer Science, University of North Carolina at Chapel Hill.
[47] Barbara M Brooks. The specificity of memory enhancement during interaction with a virtual environment. Memory, 7(1):65–78, 1999.
[48] Anthony E Richardson, Daniel R Montello, and Mary Hegarty. Spatial knowledge acquisition from maps and from navigation in real and virtual environments. Memory & Cognition, 27(4):741–750, 1999.
[49] Maryjane Wraga, Sarah H Creem-Regehr, and Dennis R Proffitt. Spatial updating of virtual displays. Memory & Cognition, 32(3):399–415, 2004.
[50] Simon T. Perrault, Eric Lecolinet, Yoann Pascal Bourse, Shengdong Zhao, and Yves Guiard. Physical loci: Leveraging spatial, object and semantic memory for command selection. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI '15, pages 299–308, 2015.
[51] 3DMarko. Medieval town 01 dubrovnik, 2011.
[52] 3DMarko. Palace interior 02, 2014.
[53] John E Harris. Memory aids people use: Two interview studies. Memory & Cognition, 8(1):31–38, 1980.
[54] Jennifer A McCabe. Location, location, location! Demonstrating the mnemonic benefit of the method of loci. Teaching of Psychology, 42(2):169–173, 2015.
[55] George A Miller. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review, 63(2):81, 1956.
[56] Alan D Baddeley and Graham Hitch. Working memory. Psychology of Learning and Motivation, 8:47–89, 1974.
[57] Richard C Atkinson and Richard M Shiffrin. Human memory: A proposed system and its control processes. Psychology of Learning and Motivation, 2:89–195, 1968.
[58] Katerina Mania and Alan Chalmers. The effects of levels of immersion on memory and presence in virtual environments: A reality centered approach. CyberPsychology & Behavior, 4(2):247–264, 2001.
[59] Jack M Loomis, James J Blascovich, and Andrew C Beall. Immersive virtual environment technology as a basic research tool in psychology. Behavior Research Methods, Instruments, & Computers, 31(4):557–564, 1999.
[60] Thomas D Parsons. Virtual reality for enhanced ecological validity and experimental control in the clinical, affective and social neurosciences. Frontiers in Human Neuroscience, 9:660, 2015.
[61] Maria V Sanchez-Vives and Mel Slater. From presence to consciousness through virtual reality. Nature Reviews Neuroscience, 6(4):332–339, 2005.
[62] Youngmin Kim, Amitabh Varshney, David W Jacobs, and François Guimbretière. Mesh saliency and human eye fixations. ACM Transactions on Applied Perception (TAP), 7(2):12, 2010.
[63] Robert S Kennedy, Norman E Lane, Kevin S Berbaum, and Michael G Lilienthal. Simulator sickness questionnaire: An enhanced method for quantifying simulator sickness. The International Journal of Aviation Psychology, 3(3):203–220, 1993.
[64] Taro Maeda, Hideyuki Ando, and Maki Sugimoto. Virtual acceleration with galvanic vestibular stimulation in a virtual reality environment. In IEEE Proceedings. VR 2005. Virtual Reality, 2005., pages 289–290. IEEE, 2005.
[65] Bernhard E Riecke, Jörg Schulte-Pelkum, Franck Caniard, and Heinrich H Bülthoff. Towards lean and elegant self-motion simulation in virtual reality. In IEEE Proceedings. VR 2005. Virtual Reality, 2005., pages 131–138. IEEE, 2005.
[66] Lisa Rebenitsch and Charles Owen. Review on cybersickness in applications and visual displays. Virtual Reality, 20(2):101–125, 2016.
[67] JJ-W Lin, Henry Been-Lirn Duh, Donald E Parker, Habib Abi-Rached, and Thomas A Furness. Effects of field of view on presence, enjoyment, memory, and simulator sickness in a virtual environment. In Virtual Reality, 2002. Proceedings. IEEE, pages 164–171. IEEE, 2002.
[68] Ajoy S Fernandes and Steven K Feiner. Combating VR sickness through subtle dynamic field-of-view modification. In 2016 IEEE Symposium on 3D User Interfaces (3DUI), pages 201–210. IEEE, 2016.
[69] Simon Davis, Keith Nesbitt, and Eugene Nalivaiko. A systematic review of cybersickness. In Proceedings of the 2014 Conference on Interactive Entertainment, pages 1–9. ACM, 2014.
[70] Patricia S Cowings, Steve Suter, William B Toscano, Joe Kamiya, and Karen Naifeh. General autonomic components of motion sickness. Psychophysiology, 23(5):542–551, 1986.
[71] Yu-Chieh Chen, Jeng-Ren Duann, Shang-Wen Chuang, Chun-Ling Lin, Li-Wei Ko, Tzyy-Ping Jung, and Chin-Teng Lin. Spatial and temporal EEG dynamics of motion sickness. NeuroImage, 49(3):2862–2870, 2010.
[72] Senqi Hu, Kathleen A McChesney, Kathryn A Player, Amy M Bahl, Jessica B Buchanan, and Jason E Scozzafava. Systematic investigation of physiological correlates of motion sickness induced by viewing an optokinetic rotating drum. Aviation, Space, and Environmental Medicine, 1999.
[73] Li-Wei Ko, Chun-Shu Wei, Shi-An Chen, and Chin-Teng Lin. EEG-based motion sickness estimation using principal component regression. In Neural Information Processing, pages 717–724. Springer, 2011.
[74] Chin-Teng Lin, Li-Wei Ko, Jin-Chern Chiou, Jeng-Ren Duann, Ruey-Song Huang, Sheng-Fu Liang, Tzai-Wen Chiu, and Tzyy-Ping Jung. Noninvasive neural prostheses using mobile and wireless EEG. Proceedings of the IEEE, 96(7):1167–1183, 2008.
[75] Chin-Teng Lin, Shang-Wen Chuang, Yu-Chieh Chen, Li-Wei Ko, Sheng-Fu Liang, and Tzyy-Ping Jung. EEG effects of motion sickness induced in a dynamic virtual reality environment. In Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE, pages 3872–3875. IEEE, 2007.
[76] Young Youn Kim, Hyun Ju Kim, Eun Nam Kim, Hee Dong Ko, and Hyun Taek Kim. Characteristic changes in the physiological components of cybersickness. Psychophysiology, 42(5):616–625, 2005.
[77] Byung-Chan Min, Soon-Cheol Chung, Yoon-Ki Min, and Kazuyoshi Sakamoto. Psychophysiological evaluation of simulator sickness evoked by a graphic simulator. Applied Ergonomics, 35(6):549–556, 2004.
[78] Syed Ali Arsalan Naqvi, Nasreen Badruddin, Munsif Ali Jatoi, Aamir Saeed Malik, Wan Hazabbah, and Baharudin Abdullah. EEG based time and frequency dynamics analysis of visually induced motion sickness (VIMS). Australasian Physical & Engineering Sciences in Medicine, 38(4):721–729, 2015.
[79] Syed Ali Arsalan Naqvi, Nasreen Badruddin, Aamir S Malik, Wan Hazabbah, and Baharudin Abdullah. EEG alpha power: An indicator of visual fatigue. In Intelligent and Advanced Systems (ICIAS), 2014 5th International Conference on, pages 1–5. IEEE, 2014.
[80] Hiran Ekanayake. P300 and Emotiv EPOC: Does Emotiv EPOC capture real EEG? Web publication, http://neurofeedback.visaduma.info/emotivresearch.htm, 2010.
[81] Erik W Anderson, Kristin C Potter, Laura E Matzen, Jason F Shepherd, Gilbert A Preston, and Cláudio T Silva. A user study of visualization effectiveness using EEG and cognitive load. In Computer Graphics Forum, volume 30, pages 791–800. Wiley Online Library, 2011.
[82] Peter Aspinall, Panagiotis Mavros, Richard Coyne, and Jenny Roe. The urban brain: analysing outdoor physical activity with mobile EEG. British Journal of Sports Medicine, pages bjsports–2012, 2013.
[83] Stefan Debener, Falk Minow, Reiner Emkes, Katharina Gandras, and Maarten Vos. How about taking a low-cost, small, and wireless EEG for a walk? Psychophysiology, 49(11):1617–1621, 2012.
[84] Kay M Stanney and Robert S Kennedy. The psychometrics of cybersickness. Communications of the ACM, 40(8):66–68, 1997.
[85] Arnaud Delorme, Terrence Sejnowski, and Scott Makeig. Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis. Neuroimage, 34(4):1443–1449, 2007.
[86] Scott Makeig, Anthony J Bell, Tzyy-Ping Jung, Terrence J Sejnowski, et al. Independent component analysis of electroencephalographic data. Advances in Neural Information Processing Systems, pages 145–151, 1996.
[87] Ruey-Song Huang, Tzyy-Ping Jung, and Scott Makeig. Event-related brain dynamics in continuous sustained-attention tasks. Foundations of Augmented Cognition, pages 65–74, 2007.
[88] Ruey-Song Huang, Tzyy-Ping Jung, Arnaud Delorme, and Scott Makeig. Tonic and phasic electroencephalographic dynamics during continuous compensatory tracking. NeuroImage, 39(4):1896–1909, 2008.
[89] Jason Matheny. Intelligence advanced research projects activity. 3rd Annual BRAIN Initiative Investigators Meeting, North Bethesda, Maryland, 2016.
[90] Cagatay Turkay, Erdem Kaya, Selim Balcisoy, and Helwig Hauser. Designing progressive and interactive analytics processes for high-dimensional data analysis. IEEE Transactions on Visualization and Computer Graphics, 23(1):131–140, 2017.
[91] Eric Krokos, Catherine Plaisant, and Amitabh Varshney. Virtual memory palaces: immersion aids recall. Virtual Reality, pages 1–15, 2018.
[92] Alex Endert, Patrick Fiaux, and Chris North. Semantic interaction for visual text analytics. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 473–482. ACM, 2012.
[93] Dominik Sacha, Leishi Zhang, Michael Sedlmair, John A Lee, Jaakko Peltonen, Daniel Weiskopf, Stephen C North, and Daniel A Keim. Visual interaction with dimensionality reduction: A structured literature analysis. IEEE Transactions on Visualization and Computer Graphics, 23(1):241–250, 2017.
[94] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
[95] Joseph B Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1):1–27, 1964.
[96] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
[97] Yehuda Koren. On spectral graph drawing. In Tandy Warnow and Binhai Zhu, editors, Computing and Combinatorics, volume 2697 of Lecture Notes in Computer Science, pages 496–508. Springer Berlin Heidelberg, 2003.
[98] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
[99] Graham W Taylor, Geoffrey E Hinton, and Sam T Roweis. Modeling human motion using binary latent variables. Advances in Neural Information Processing Systems, 19:1345, 2007.
[100] Martin Wattenberg, Fernanda Viégas, and Ian Johnson. How to use t-SNE effectively. Distill, 2016.
[101] Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
[102] Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 1735–1742. IEEE, 2006.
[103] Yushi Chen, Zhouhan Lin, Xing Zhao, Gang Wang, and Yanfeng Gu. Deep learning-based classification of hyperspectral data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(6):2094–2107, 2014.
[104] El-ad David Amir, Kara L Davis, Michelle D Tadmor, Erin F Simonds, Jacob H Levine, Sean C Bendall, Daniel K Shenfeld, Smita Krishnaswamy, Garry P Nolan, and Dana Pe'er. viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nature Biotechnology, 31(6):545–552, 2013.
[105] Tuan Nhon Dang and Leland Wilkinson. Transforming scagnostics to reveal hidden features. IEEE Transactions on Visualization and Computer Graphics, 20(12):1624–1632, 2014. [106] Cheuk Yiu Ip, Amitabh Varshney, and Joseph JaJa. Hierarchical exploration of volumes using multilevel segmentation of the intensity-gradient histograms. IEEE Transactions on Visualization and Computer Graphics, 18(12):2355– 2363, 2012. 187 [107] Hsueh-Chien Cheng, Antonio Cardone, Eric Krokos, Bogdan Stoica, Alan Faden, and Amitabh Varshney. Deep-learning-assisted visualization for live- cell images. In Proceedings of 2017 IEEE International Conference on Image Processing, ICIP. IEEE, September 2017. [108] Hsueh-Chien Cheng, Antonio Cardone, Somay Jain, Eric Krokos, Kedar Narayan, Sriram Subramaniam, and Amitabh Varshney. Deep-learning- assisted volume visualization. IEEE Transactions on Visualization and Com- puter Graphics, PP(99):1–14, January 2018. [109] Shusen Liu, Bei Wang, P-T Bremer, and Valerio Pascucci. Distortion-guided structure-driven interactive exploration of high-dimensional data. In Com- puter Graphics Forum, volume 33, pages 101–110. Wiley Online Library, 2014. [110] Shusen Liu, Bei Wang, Jayaraman J Thiagarajan, P-T Bremer, and Valerio Pascucci. Visual exploration of high-dimensional data through subspace anal- ysis and dynamic projections. In Computer Graphics Forum, volume 34, pages 271–280. Wiley Online Library, 2015. [111] Eric Krokos and Hanan Samet. A look into twitter hashtag discovery and gen- eration. In Proceedings of the 7th ACM SIGSPATIAL Workshop on Location- Based Social Networks (LBSN14), Dallas, TX, Nov, 2014. [112] Ka-Ping Yee, Kirsten Swearingen, Kevin Li, and Marti Hearst. Faceted meta- data for image search and browsing. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 401–408. ACM, 2003. [113] Konstantinos Trohidis, Grigorios Tsoumakas, George Kalliris, and Ioannis P Vlahavas. Multi-label classification of music into emotions. In The Interna- tional Society of Music Information Retrieval, volume 8, pages 325–330, 2008. [114] Thorsten Joachims. Text categorization with support vector machines: Learn- ing with many relevant features. In European conference on machine learning, pages 137–142. Springer, 1998. [115] Naonori Ueda and Kazumi Saito. Parametric mixture models for multi-labeled text. Advances in neural information processing systems, pages 737–744, 2003. [116] Shantanu Godbole and Sunita Sarawagi. Discriminative methods for multi- labeled classification. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 22–30. Springer, 2004. [117] Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11(Feb):625–660, 2010. [118] Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learn- ing method for deep neural networks. In Workshop on Challenges in Repre- sentation Learning, ICML, volume 3, page 2, 2013. 188 [119] Nguyen Quoc Viet Hung, Duong Chi Thang, Matthias Weidlich, and Karl Aberer. Minimizing efforts in validating crowd answers. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 999–1014. ACM, 2015. [120] Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pages 3546–3554, 2015. 
[121] Mohammad Pezeshki, Linxi Fan, Philemon Brakel, Aaron Courville, and Yoshua Bengio. Deconstructing the ladder network architecture. In Inter- national Conference on Machine Learning, pages 2368–2376, 2016. [122] Malcolm Ware, Eibe Frank, Geoffrey Holmes, Mark Hall, and Ian H Wit- ten. Interactive machine learning: letting users build classifiers. International Journal of Human-Computer Studies, 55(3):281–292, 2001. [123] Saleema Amershi, Bongshin Lee, Ashish Kapoor, Ratul Mahajan, and Blaine Christian. Cuet: human-guided fast and accurate network alarm triage. In Proceedings of the SIGCHI Conference on Human Factors in Computing Sys- tems, pages 157–166. ACM, 2011. [124] Axel J Soto, Ryan Kiros, Vlado Kešelj, and Evangelos Milios. Exploratory visual analysis and interactive pattern extraction from semi-structured data. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(3):16, 2015. [125] Mart́ın Abadi et al. Tensorflow: A system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016. [126] François Chollet et al. Keras: Deep learning library for Theano and Tensor- flow. URL: https://keras. io/k, 2015. [127] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 807–814, 2010. [128] Shun-ichi Amari, Andrzej Cichocki, and Howard Hua Yang. A new learn- ing algorithm for blind signal separation. In Advances in neural information processing systems, pages 757–763, 1996. [129] Matthew D Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012. [130] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on pattern analysis and machine intelligence, 22(8):888– 905, 2000. [131] William N Anderson Jr and Thomas D Morley. Eigenvalues of the laplacian of a graph. Linear and multilinear algebra, 18(2):141–145, 1985. 189 [132] Xudong Kang, Shutao Li, and Jon Atli Benediktsson. Spectral–spatial hyper- spectral image classification with edge-preserving filtering. IEEE Transactions on Geoscience and Remote Sensing, 52(5):2666–2677, 2014. [133] Hadi Shiravi, Ali Shiravi, and Ali A Ghorbani. A survey of visualization sys- tems for network security. IEEE Transactions on visualization and computer graphics, 18(8):1313–1329, 2012. [134] Stephen G Eick. Engineering perceptually effective visualizations for abstract data. In In Scientific Visualization Overviews, Methodologies and Techniques, IEEE Computer Science. Citeseer, 1995. [135] Alissa Torres. Building a world-class security operations center: A roadmap. SANS Institute, May, 2015. [136] John Goodall, Wayne Lutters, and Anita Komlodi. The work of intrusion detection: rethinking the role of security analysts. AMCIS 2004 Proceedings, page 179, 2004. [137] Vinicius Tavares Guimaraes, Carla Maria Dal Sasso Freitas, Ramin Sadre, Liane Margarida Rockenbach Tarouco, and Lisandro Zambenedetti Granville. A survey on information visualization for network and service management. IEEE Communications Surveys & Tutorials, 18(1):285–323, 2016. [138] Giovanni Vigna and Richard A Kemmerer. Netstat: A network-based intrusion detection approach. In Computer Security Applications Conference, 1998. Proceedings. 14th Annual, pages 25–34. IEEE, 1998. [139] Angela Orebaugh, Gilbert Ramirez, and Jay Beale. Wireshark & Ethereal network protocol analyzer toolkit. Elsevier, 2006. [140] Glenn A Fink, Christopher L North, Alex Endert, and Stuart Rose. 
Visualizing cyber security: Usable workspaces. In Visualization for Cyber Security, 2009. VizSec 2009. 6th International Workshop on, pages 45–56. IEEE, 2009. [141] G.A. Fink, C.L. North, A. Endert, and S. Rose. Visualizing cyber security: Usable workspaces. In Visualization for Cyber Security, 2009. VizSec 2009. 6th International Workshop on, pages 45–56, Oct 2009. [142] Kulsoom Abdullah, Chris Lee, Gregory Conti, and John A Copeland. Visualiz- ing network data for intrusion detection. In Information Assurance Workshop, 2005. IAW’05. Proceedings from the Sixth Annual IEEE SMC, pages 100–108. IEEE, 2005. [143] Ying Zhao, FangFang Zhou, XiaoPing Fan, Xing Liang, and YongGang Liu. IDSRadar: a real-time visualization framework for IDS alerts. Science China Information Sciences, 56(8):1–12, 2013. 190 [144] Bin Yu, Les Smith, and Mark Threefoot. Semi-supervised time series mod- eling for real-time flux domain detection on passive DNS traffic. In Interna- tional Workshop on Machine Learning and Data Mining in Pattern Recogni- tion, pages 258–271. Springer, 2014. [145] Anatoly Yelizarov and Dennis Gamayunov. Visualization of complex attacks and state of attacked network. In Visualization for Cyber Security, 2009. VizSec 2009. 6th International Workshop on, pages 1–9. IEEE, 2009. [146] Troy Nunnally, Kulsoom Abdullah, A Selcuk Uluagac, John A Copeland, and Raheem Beyah. Navsec: A recommender system for 3D network security visualizations. In Proceedings of the Tenth Workshop on Visualization for Cyber Security, pages 41–48. ACM, 2013. [147] Inhwan Kim, Hyunsang Choi, and Heejo Lee. BotXrayer: Exposing botnets by visualizing DNS traffic. In KSII the first International Conference on Internet, 2009. [148] Yarden Livnat, Jim Agutter, Shaun Moon, Robert F Erbacher, and Stefano Foresti. A visualization paradigm for network intrusion detection. In Informa- tion Assurance Workshop, 2005. IAW’05. Proceedings from the Sixth Annual IEEE SMC, pages 92–99. IEEE, 2005. [149] Guihua Shan, Yang Wang, Maojin Xie, Haopu Lv, and Xuebin Chi. Visual detection of anomalies in DNS query log data. In Visualization Symposium (PacificVis), 2014 IEEE Pacific, pages 258–261. IEEE, 2014. [150] Troy Nunnally, Penyen Chi, Kulsoom Abdullah, A Selcuk Uluagac, John A Copeland, and Raheem Beyah. P3D: A parallel 3D coordinate visualization for advanced network scans. In Communications (ICC), 2013 IEEE International Conference on, pages 2052–2057. IEEE, 2013. [151] Marios Iliofotou, Prashanth Pappu, Michalis Faloutsos, Michael Mitzen- macher, Sumeet Singh, and George Varghese. Network monitoring using traffic dispersion graphs (tdgs). In Proceedings of the 7th ACM SIGCOMM confer- ence on Internet measurement, pages 315–320. ACM, 2007. [152] Qingnan Lai, Changling Zhou, Hao Ma, Zhen Wu, and Shiyang Chen. Visualiz- ing and characterizing DNS lookup behaviors via log-mining. Neurocomputing, 169:100–109, 2015. [153] N. Jiang, J. Cao, Y. Jin, L. E. Li, and Z. Zhang. Identifying suspicious activities through dns failure graph analysis. In The 18th IEEE International Conference on Network Protocols, pages 144–153, Oct 2010. [154] Mohammad Ghoniem, J-D Fekete, and Philippe Castagliola. A comparison of the readability of graphs using node-link and matrix-based representations. In Information Visualization, 2004. INFOVIS 2004. IEEE Symposium on, pages 17–24. Ieee, 2004. 191 [155] Rosa Romero-Gomez, Yacin Nadji, and Manos Antonakakis. Towards design- ing effective visualizations for DNS-based network threat analysis. 
In Visu- alization for Cyber Security (VizSec), 2017 IEEE Symposium on, pages 1–8. IEEE, 2017. [156] Sean McKenna, Diane Staheli, Cody Fulcher, and Miriah Meyer. Bubblenet: A cyber security dashboard for visualizing patterns. In Computer Graphics Forum, volume 35, pages 281–290. Wiley Online Library, 2016. [157] Iman Sharafaldin, Amirhossein Gharib, Arash Habibi Lashkari, and Ali A Ghorbani. Botviz: A memory forensic-based botnet detection and visualiza- tion approach. In Security Technology (ICCST), 2017 International Carnahan Conference on, pages 1–8. IEEE, 2017. [158] Andrew Caudwell. Logstalgia, 2014. http://logstalgia.io/. [159] Tatiana Von Landesberger, Arjan Kuijper, Tobias Schreck, Jörn Kohlhammer, Jarke J van Wijk, J-D Fekete, and Dieter W Fellner. Visual analysis of large graphs: state-of-the-art and future research challenges. In Computer graphics forum, volume 30, pages 1719–1749. Wiley Online Library, 2011. [160] Hans-Jorg Schulz, Steffen Hadlak, and Heidrun Schumann. The design space of implicit hierarchy visualization: A survey. IEEE transactions on visualization and computer graphics, 17(4):393–411, 2011. [161] Stephen Lau. The spinning cube of potential doom. Communications of the ACM, 47(6):25–26, 2004. [162] Colin Ware. Information visualization: perception for design. Elsevier, 2012. [163] Gregory Conti, Mustaque Ahamad, and John Stasko. Attacking information visualization system usability overloading and deceiving the human. In Pro- ceedings of the 2005 symposium on Usable privacy and security, pages 89–100. ACM, 2005. [164] Daisuke Inoue, Masashi Eto, Koei Suzuki, Mio Suzuki, and Koji Nakao. Daedalus-viz: novel real-time 3D visualization for darknet monitoring-based alert system. In Proceedings of the ninth international symposium on visual- ization for cyber security, pages 72–79. ACM, 2012. [165] Giovane Moura, Ricardo de O Schmidt, John Heidemann, Wouter B de Vries, Moritz Muller, Lan Wei, and Cristian Hesselman. Anycast vs. DDoS: Evalu- ating the november 2015 root DNS event. In Proceedings of the 2016 Internet Measurement Conference, pages 255–270. ACM, 2016. [166] ICANN. Factsheet - root server attack on 6 february 2007, 2007. https://www.icann.org/en/system/files/files/factsheet-dns-attack-08mar07- en.pdf. 192 [167] Steve Mansfield-Devine. The growth and evolution of DDoS. Network Security, 2015(10):13–20, 2015. [168] Christopher Amin, Massimo Candela, Daniel Karrenberg, Robert Kisteleki, and Andreas Strikos. Visualization and monitoring for the identification and analysis of DNS issues. In Proceedings of the Tenth International Conference on Internet Monitoring and Protection, 2015. [169] Michael Aupetit, Yury Zhauniarovich, Giorgos Vasiliadis, Marc Dacier, and Yazan Boshmaf. Visualization of actionable knowledge to mitigate DRDoS attacks. In Visualization for Cyber Security (VizSec), 2016 IEEE Symposium on, pages 1–8. IEEE, 2016. [170] Barbara Tversky, Julie Bauer Morrison, and Mireille Betrancourt. Animation: can it facilitate? International journal of human-computer studies, 57(4):247– 262, 2002. 193