ABSTRACT
Title of dissertation: DATA-DRIVEN STORYTELLING
FOR CASUAL USERS
Zhenpeng Zhao
Doctor of Philosophy, 2019
Dissertation directed by: Professor Niklas Elmqvist
College of Information Studies
Today's overwhelming volume of data has made effective analysis virtually
inaccessible to the general public. The emerging practice of data-driven storytelling
is addressing this by framing data using familiar mechanisms such as slideshows,
videos, and comics to make even highly complex phenomena understandable.
However, current data stories still do not utilize the full potential of the storytelling
domain. One reason for this is that current data-driven storytelling practice does
not leverage the full repertoire of media that can be used for storytelling, such as
speech, e-learning, and video games.
In this dissertation, we propose a taxonomy focused specifically on media types
for the purpose of widening the purview of data-driven storytelling by putting more
tools in the hands of designers. We extend data-driven storytelling
to casual users, that is, consumers of information and non-professionals
with limited time, skills, and motivation, to bridge the data gap
between advanced data analytics tools and everyday internet users. To demonstrate the
effectiveness and wide acceptance of our taxonomy and of data-driven storytelling
among casual users, we found, reviewed, and classified ninety-one examples of
data-driven storytelling.
Using our taxonomy as a generative tool, we also explored two novel
storytelling mechanisms: live-streaming analytics videos (DataTV) and
sequential art (comics) that dynamically incorporates visual representations
(Data Comics) [1]. Meanwhile, we widened the genres we explored to fill the gaps in the
literature. We also evaluated Data Comics and DataTV with user studies and
expert reviews. The results show that Data Comics facilitates data-driven storytelling
in terms of inviting reading, aiding memory, and being viewed as a story. The results
also show that an integrated system such as DataTV encourages authors to create and
present data stories.
DATA-DRIVEN STORYTELLING
FOR CASUAL USERS
by
Zhenpeng Zhao
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2019
Advisory Committee:
Professor Niklas Elmqvist, Chair/Advisor
Professor Leilani Battle
Professor Ben Bederson
Professor Héctor Corrada Bravo
Professor Matthias Zwicker
© Copyright by
Zhenpeng Zhao
2019

Dedication
I dedicate this dissertation to people I loved and those I still love.
Acknowledgments
I owe my gratitude to all the people who have made this thesis possible and
because of whom my graduate experience has been one that I will cherish forever.
First and foremost I'd like to thank my advisor, Professor Niklas Elmqvist, for
giving me an invaluable opportunity to work on challenging and extremely interesting
projects over the past six years. He has always been patient with me throughout
the years. He has always made himself available for help and advice and there has
never been an occasion when I've knocked on his door and he hasn't given me time.
It has been a pleasure to work with and learn from such an extraordinary individual.
I would also like to thank all my committee members: Prof. Leilani Battle, Prof.
Ben Bederson, Prof. Héctor Corrada Bravo, and Prof. Matthias Zwicker. All the
professors did their best to advise me from my preliminary exam to my defense,
although Prof. Alan Sussman was unfortunately unable to attend my defense. They
encouraged me and did their best to accommodate my schedule.
My colleagues from HCIL also gave me extensive help. They helped me during
my research and gave me a great deal of feedback on this dissertation.
It is impossible to remember all, and I apologize to those I've inadvertently
left out.
Lastly, thank you all and thank God!
Table of Contents
Dedication ii
Acknowledgments iii
List of Tables xi
List of Figures xii
1 Introduction 1
1.1 Motivation: Harnessing the Data Deluge . . . . . . . . . . . . . . . . 2
1.1.1 Opportunity: The Scale of Big Data . . . . . . . . . . . . . . 3
1.1.2 Problem: The Digital Data Divide . . . . . . . . . . . . . . . 4
1.1.3 Setting: Data for Casual Users . . . . . . . . . . . . . . . . . 6
1.1.4 Solution: Data-Driven Storytelling . . . . . . . . . . . . . . . 7
1.2 Background: Data Visualization . . . . . . . . . . . . . . . . . . . . . 8
1.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.2 Examples of Data Visualization . . . . . . . . . . . . . . . . . 8
1.2.3 Data Visualization for the Masses . . . . . . . . . . . . . . . . 9
1.3 Background: Data-Driven Storytelling . . . . . . . . . . . . . . . . . 10
1.3.1 Definition of Storytelling, Story, and Data-driven Storytelling 10
1.3.2 Existing Frameworks . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.3 Brief History of Storytelling . . . . . . . . . . . . . . . . . . . 14
1.4 Purpose of This Work . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Related Work 16
2.1 Visualization and Visual Analytics . . . . . . . . . . . . . . . . . . . 16
2.1.1 Foundational Work on Data Visualization . . . . . . . . . . . 16
2.1.2 Examples of Classic and Well-known Visualizations . . . . . . 17
2.2 Production, Presentation, and Dissemination . . . . . . . . . . . . . . 18
2.3 Visual Communication and Visualization . . . . . . . . . . . . . . . . 19
2.4 Visual Storytelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5 Data-Driven Storytelling . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6 Comics as a Storytelling Medium . . . . . . . . . . . . . . . . . . . . 21
2.7 Animated Graphics as a Storytelling Medium . . . . . . . . . . . . . 23
3 Taxonomy of Media for Data-Driven Storytelling 25
3.1 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.1 Exhibit 1: Storytelling in Movies and Documentaries . . . . . 27
3.2.2 Exhibit 2: Dance . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.3 Exhibit 3: Data-Driven Storytelling in Video Games . . . . . 30
3.3 Taxonomy Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3.1 Audience Cardinality (A) . . . . . . . . . . . . . . . . . . . . 33
3.3.2 Space and Time (S/T) . . . . . . . . . . . . . . . . . . . . . . 34
3.3.3 Visual Components (VC) . . . . . . . . . . . . . . . . . . . . . 36
3.3.4 Data Components (DC) . . . . . . . . . . . . . . . . . . . . . 37
3.3.5 Media Viewing Sequence (SQ) . . . . . . . . . . . . . . . . . . 38
3.3.6 Storage and Persistence (S/P) . . . . . . . . . . . . . . . . . . 39
3.4 Examples Using the Taxonomy (Applications) . . . . . . . . . . . . . 39
3.4.1 Data Comics . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.4.2 Data TV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.5 Implication for the Taxonomy . . . . . . . . . . . . . . . . . . . . . . 45
4 Designing Data-Driven Tools for Casual Users 51
4.1 Casual Users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.2 Casual Information Visualization . . . . . . . . . . . . . . . . . . . . 51
4.3 Why Data-driven Storytelling for Casual Users . . . . . . . . . . . . . 52
4.4 Which Media to Use . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5 Data Comics 54
5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1.1 Basic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1.2 Panel Content . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.1.3 Characters, Annotation, and Effects . . . . . . . . . . . . . . . 56
5.1.4 Layout Management . . . . . . . . . . . . . . . . . . . . . . . 57
5.1.5 Viewing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 Implementing Data Comics . . . . . . . . . . . . . . . . . . . . . . . 58
5.2.1 Clipper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2.2 Decorator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.2.3 Composer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.2.4 Presenter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.3 Telling Stories Using Data Comics . . . . . . . . . . . . . . . . . . . . 61
5.3.1 Comics Narration . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.3.2 Panel Content and Size . . . . . . . . . . . . . . . . . . . . . . 62
5.3.3 Textual Narration . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.4 DataComics Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.4.1 Euro Debt Crisis . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.4.2 U.S. Census Population Pyramid . . . . . . . . . . . . . . . . 66
5.4.3 Scientific Journal Comparisons . . . . . . . . . . . . . . . . . . 66
5.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.5.1 Study 1: Authoring Data Comics . . . . . . . . . . . . . . . . 67
5.5.1.1 Method . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.5.2 Study 2: Presenting Data Comics . . . . . . . . . . . . . . . . 70
5.5.2.1 Participants . . . . . . . . . . . . . . . . . . . . . . . 70
5.5.2.2 Apparatus . . . . . . . . . . . . . . . . . . . . . . . . 70
5.5.2.3 Task and Datasets . . . . . . . . . . . . . . . . . . . 71
5.5.2.4 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.5.2.5 Factors . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.5.2.6 Procedure . . . . . . . . . . . . . . . . . . . . . . . . 73
5.5.2.7 Quantitative Results . . . . . . . . . . . . . . . . . . 74
5.5.2.8 Qualitative Feedback . . . . . . . . . . . . . . . . . . 75
5.5.3 Study 3: Partitioning and Sequence in Storytelling . . . . . . 76
5.5.3.1 Participants . . . . . . . . . . . . . . . . . . . . . . . 77
5.5.3.2 Apparatus . . . . . . . . . . . . . . . . . . . . . . . . 77
5.5.3.3 Task and Datasets . . . . . . . . . . . . . . . . . . . 77
5.5.3.4 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.5.3.5 Factors . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.5.3.6 Procedure . . . . . . . . . . . . . . . . . . . . . . . . 80
5.5.3.7 Quantitative Results . . . . . . . . . . . . . . . . . . 82
5.5.3.8 Qualitative Feedback . . . . . . . . . . . . . . . . . . 82
6 DataTV 84
6.1 Design: Supporting Streaming Data Video Production . . . . . . . . 84
6.2 DataTV: A Streaming Data Video Editor . . . . . . . . . . . . . . . . 86
6.2.1 Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.2.2 Source Management . . . . . . . . . . . . . . . . . . . . . . . 89
6.2.3 Scene Management . . . . . . . . . . . . . . . . . . . . . . . . 89
6.2.4 Video Annotation . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.2.5 Recording and Live Streaming . . . . . . . . . . . . . . . . . . 91
6.2.6 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.3 DataTV Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.3.1 Nobel Prize Data Analysis with Keshif . . . . . . . . . . . . . 93
6.3.2 Stock Market Data Analysis with TimeFork . . . . . . . . . . 96
6.3.3 NY Times Comment Data Analysis with CommentIQ . . . . . 97
6.4 Qualitative Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.4.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.4.2 Apparatus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.4.3 Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.4.4 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.4.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
7 Discussion 108
7.1 Explaining the Findings . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.1.1 DataComics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.1.2 DataTV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.2 Generalizing the Findings . . . . . . . . . . . . . . . . . . . . . . . . 110
7.2.1 DataComics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.2.2 DataTV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.3.1 DataComics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.3.2 DataTV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
7.4 Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7.4.1 Be Open to Unique Media for Storytelling . . . . . . . . . . . 113
7.4.2 Avoid Relying on Artistic Skill . . . . . . . . . . . . . . . . . 113
7.4.3 Start from Existing Examples, Don't Be Too Unique . . . . . 114
8 Conclusion 115
9 Future Work 117
A Survey of Data-Driven Storytelling Media 119
A.1 Storytelling in Movies and Documentaries . . . . . . . . . . . . . . . 119
A.1.1 Stop Marine Plastic Pollution . . . . . . . . . . . . . . . . . . 119
A.1.2 A Beautiful Planet . . . . . . . . . . . . . . . . . . . . . . . . 121
A.1.3 ABC News for Irma Hurricane . . . . . . . . . . . . . . . . . . 123
A.1.4 Is Height All in Our Gene . . . . . . . . . . . . . . . . . . . . 124
A.1.5 Ancient Greece in 18 minutes . . . . . . . . . . . . . . . . . . 125
A.1.6 The history of Asia: every year . . . . . . . . . . . . . . . . . 126
A.1.7 Wealth Inequality in America . . . . . . . . . . . . . . . . . . 128
A.1.8 The Joy of Stats . . . . . . . . . . . . . . . . . . . . . . . . . 129
A.1.9 Religions and babies . . . . . . . . . . . . . . . . . . . . . . . 130
A.1.10 Gene Pool Decline . . . . . . . . . . . . . . . . . . . . . . . . 132
A.1.11 How To End Poverty . . . . . . . . . . . . . . . . . . . . . . . 133
A.1.12 China's Geography Problem . . . . . . . . . . . . . . . . . . . 134
A.1.13 Imaginary Numbers Are Real . . . . . . . . . . . . . . . . . . 135
A.1.14 Big Data Revolution . . . . . . . . . . . . . . . . . . . . . . . 137
A.1.15 The Truth About Population . . . . . . . . . . . . . . . . . . 138
A.1.16 Inside the mind of a master procrastinator . . . . . . . . . . 139
A.1.17 How data will transform business . . . . . . . . . . . . . . . . 140
A.1.18 Will Saving Poor Children Lead to Overpopulation . . . . . . 142
A.2 Data Comics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
A.2.1 PhD Comic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
A.2.2 NFL Player Data . . . . . . . . . . . . . . . . . . . . . . . . . 145
A.2.3 Graphic Comic . . . . . . . . . . . . . . . . . . . . . . . . . . 147
A.2.4 Comic style Dashboard . . . . . . . . . . . . . . . . . . . . . . 148
A.2.5 Infographic Comic . . . . . . . . . . . . . . . . . . . . . . . . 150
A.2.6 NYC Restaurant Data Vis Comic . . . . . . . . . . . . . . . . 151
A.2.7 Marvel vs DC Comics . . . . . . . . . . . . . . . . . . . . . . 153
A.2.8 Body Cartoon . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
A.2.9 Spider Man Comic Visualization . . . . . . . . . . . . . . . . . 157
A.2.10 Cell Phone Comic . . . . . . . . . . . . . . . . . . . . . . . . . 158
A.2.11 Linear Regression Comic . . . . . . . . . . . . . . . . . . . . . 160
A.2.12 Vocation Stress Comic . . . . . . . . . . . . . . . . . . . . . . 161
A.2.13 Desk Entropy Comic . . . . . . . . . . . . . . . . . . . . . . . 162
A.2.14 PhD Grooming Comic . . . . . . . . . . . . . . . . . . . . . . 164
A.2.15 PhD Procrastination . . . . . . . . . . . . . . . . . . . . . . . 165
A.2.16 Day of Life an American . . . . . . . . . . . . . . . . . . . . . 167
A.2.17 Curve Fitting Comic . . . . . . . . . . . . . . . . . . . . . . . 169
A.2.18 Seashell Probability Comic . . . . . . . . . . . . . . . . . . . . 171
A.3 Web Article with Data Visualization . . . . . . . . . . . . . . . . . . 172
A.3.1 The Two Americas . . . . . . . . . . . . . . . . . . . . . . . . 173
A.3.2 Strikeouts on the Rise . . . . . . . . . . . . . . . . . . . . . . 177
A.3.3 1.5 Million Missing Black Men . . . . . . . . . . . . . . . . . . 179
A.4 Visualization Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
A.4.1 VisJocky . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
A.4.2 ChartAccent . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
A.5 Sketch Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
A.5.1 SketchStory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
A.5.2 Sketcholution . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
A.5.3 DataSketches: Royal Constellations . . . . . . . . . . . . . . . 186
A.5.4 DataSketches: Cardcaptor Sakura . . . . . . . . . . . . . . . . . 187
A.5.5 The Big Short Movie Explained Animated . . . . . . . . . . . 188
A.6 Infographics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
A.6.1 New Orleans Housing Population . . . . . . . . . . . . . . . . 190
A.6.2 Top writers for best sellers . . . . . . . . . . . . . . . . . . . . 191
A.6.3 Public Library Report . . . . . . . . . . . . . . . . . . . . . . 193
A.6.4 US Poverty . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
A.6.5 London March . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
A.6.6 Yemen War . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
A.6.7 Space Industry . . . . . . . . . . . . . . . . . . . . . . . . . . 198
A.6.8 Household Air Pollution . . . . . . . . . . . . . . . . . . . . . 199
A.6.9 Air Pollution Linked Death . . . . . . . . . . . . . . . . . . . 200
A.6.10 North and South Korean Comparison . . . . . . . . . . . . . . 201
A.6.11 In the Shadow of Foreclosure . . . . . . . . . . . . . . . . . . 202
A.6.12 Word of Democrats and Republicans . . . . . . . . . . . . . . 204
A.6.13 UK and US Firearms . . . . . . . . . . . . . . . . . . . . . . . 205
A.6.14 White House Correspondent Dinner . . . . . . . . . . . . . . . 206
A.6.15 Day vs. Night: What NYC's Population Looks Like . . . . . . 207
A.6.16 Who owns everything: Big Data Today . . . . . . . . . . . . . 208
A.6.17 Big Welsh Coast Walk . . . . . . . . . . . . . . . . . . . . . . 210
A.6.18 Hungry USA . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
A.6.19 NYC Celebrity Map . . . . . . . . . . . . . . . . . . . . . . . 212
A.7 Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
A.7.1 Halo: Reach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
A.7.2 Call of Duty: Black Ops . . . . . . . . . . . . . . . . . . . . . 215
A.7.3 StarCraft II . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
A.8 Social Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
A.8.1 TwitterSheep . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
A.8.2 Twitter Interactive Game of Thrones . . . . . . . . . . . . . . 219
A.8.3 Twitter Interactive: How tweets spread . . . . . . . . . . . . 220
A.9 Augmented Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
A.9.1 Uber Mobile Visualization . . . . . . . . . . . . . . . . . . . . 221
A.9.2 AR Data Visualization Design . . . . . . . . . . . . . . . . . 223
A.9.3 AR 3D Design . . . . . . . . . . . . . . . . . . . . . . . . . . 225
A.9.4 AR Flight Data . . . . . . . . . . . . . . . . . . . . . . . . . 226
A.9.5 AR Street Visualization . . . . . . . . . . . . . . . . . . . . . 227
A.9.6 AR pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
A.9.7 AR Infrastructure Visualization . . . . . . . . . . . . . . . . . 230
A.9.8 AR Bio-Chemical Visualization . . . . . . . . . . . . . . . . . 232
A.10 Virtual Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
A.10.1 Adobe VR Data Visualization . . . . . . . . . . . . . . . . . 233
A.10.2 VR Baseball training . . . . . . . . . . . . . . . . . . . . . . 235
A.10.3 VR Big Data Analysis . . . . . . . . . . . . . . . . . . . . . . 236
A.10.4 VR Lens Big Data . . . . . . . . . . . . . . . . . . . . . . . . 237
A.10.5 VR Immersive visualization for Big Data . . . . . . . . . . . 239
A.10.6 VR Bio-informatics Visualization . . . . . . . . . . . . . . . . 240
A.10.7 VR Geo Map Visualization . . . . . . . . . . . . . . . . . . . 241
A.10.8 VR Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . 243
B Data Comics Evaluation Protocol 245
B.1 Evaluation: DataComics vs PowerPoint: Test Cases and Scripts . . . 245
B.1.1 Twitter Heatmap for Stocks . . . . . . . . . . . . . . . . . . . 245
B.1.2 The U.S. Census Population Pyramid . . . . . . . . . . . . . . 246
B.1.3 World Happiness . . . . . . . . . . . . . . . . . . . . . . . . . 247
B.1.4 Star Wars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
B.2 Evaluation: Single Frame vs Frame Panels: Test Cases and Scripts . . 250
B.2.1 The Origin of Major Beer Types . . . . . . . . . . . . . . . . . 250
B.2.2 The Arabic-Israel War . . . . . . . . . . . . . . . . . . . . . . 251
B.2.3 Cell Phone Phishing . . . . . . . . . . . . . . . . . . . . . . . 252
B.2.4 Global Wealthy Population . . . . . . . . . . . . . . . . . . . 253
B.3 Evaluation: Expert Review . . . . . . . . . . . . . . . . . . . . . . . . 254
C Data TV Evaluation Protocol 255
C.1 Questions and Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . 255
C.2 Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
C.2.1 Data Set One . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
C.2.2 Data Set Two . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
Bibliography 258
List of Tables
3.1 Classification of our representative sample of different media used for
data-driven storytelling, part one. . . . . . . . . . . . . . . . . . . . 40
3.2 Classification of our representative sample of different media used for
data-driven storytelling, part two. . . . . . . . . . . . . . . . . . . . 41
3.3 Classification of our representative sample of different media used for
data-driven storytelling, part three. . . . . . . . . . . . . . . . . . . 42
List of Figures
1.1 Genres of narrative visualization by Segel and Heer [2]. . . . . . . . . 11
2.1 Visualization of locations of cholera cases and wells [3]. . . . . . . . . 17
2.2 Visualization of Napoleon's failed campaign into Russia [4]. . . . . . 18
3.1 The distribution of all the sources of examples. . . . . . . . . . . . . . 26
3.2 Image from the video A Day in the Life of the (Polluted) Ocean. . . . 27
3.3 Physical dance demonstration of a bubble sort algorithm. . . . . . . . 28
3.4 Data-driven story of a session in the PC game Civilization 3. . . . . . 30
3.5 The figure shows the parallel coordinate graph of the design space of
the examples categorized by the taxonomy . . . . . . . . . . . . . . . 48
3.6 Data Comic on the European debt crisis. . . . . . . . . . . . . . . . . 49
3.7 Webcam and Keshif visualization being recorded for the Nobel Prize
Winner data video. A data video can be made with user as the
presenter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.1 (a) Basic DataComicsJS interface with the Composer in the middle
of the workspace and Decorator on the left side. The Decorator is
composed of four expandable menus including data importing, operations,
comic elements, and collection of clips. (b) Data import is used
to load and update visualization clips from a server where the user
can also save finished work and reload it for presentation. SVG
clips are the content captured by the Clipper. . . . . . . . . . . . . 58
5.2 Viewing individual panels using the Presenter in slideshow mode.
Buttons allow for navigating slides in sequence. . . . . . . . . . . . . 60
5.3 (a) The use of text to drive narrative. (b) Encapsulation of moments
in panels juxtaposed yield closure as the viewer connects them and
sees the whole. (c) Panels organized into tiers with horizontal gutters;
multiple tiers form pages with vertical gutters. . . . . . . . . . . . . . 60
5.4 Examples of Data Comics. (a) The role of Greece in the European
debt crisis. (b) Trends in scientific journals in neuroscience. . . . . . . 64
5.5 Data Comics of the U.S. baby boom of the 1960s. . . . . . . . . . . . 65
5.6 A comparison of the story frames of Birth data from the U.S. Census
Bureau with DataComics and PowerPoint. Stories are composed with
different methods but with one-to-one correspondence in details to
make the user study fair. . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.7 Comparison between DataComics (DC) and PowerPoint (PPT) of
subjective ratings (Likert 1-5 scale) for engagement, speed, space-
efficiency, ease of use, and enjoyability. . . . . . . . . . . . . . . . . . 73
5.8 (a) The original infographics without any changes. (b) Adding red
dotted boxes to highlight the locations for the panels. . . . . . . . . . 78
5.9 (a) The infographic is partitioned into panels. (b) The infographic
is partitioned into panels and captions are added to help address the
storyline. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.10 Comparison between single visualization (V), visualization with cap-
tion (VC), data comic panels without captions (DC) and data comic
panels with captions (DCC) of subjective ratings (Likert 1-5 scale)
for engagement, speed, space-efficiency, ease of use, and enjoyability. . 81
6.1 Main user interface of the DataTV prototype tool. A live mode tool-
bar allows for panning and zooming sources as well as scribbling di-
rectly on top of the output. The list panes at the bottom of the
interface allow for controlling the scenes and sources being displayed. 87
6.2 DataTV's live mode interface where users can zoom and pan in a
data source as well as annotate using a pen, eraser, and highlighter. . 91
6.3 Keshif browser for the Nobel Prize Winners dataset. . . . . . . . . . . 93
6.4 A live DataTV session composing a data video using the Keshif sys-
tem for the Nobel Prize Winners dataset. The user imports an exter-
nal visualization tool to display the economy of South Africa. . . . . . 94
6.5 Webcam and Keshif visualization being recorded for the Nobel Prize
Winner data video. The author is recording a live video "talking
head" view using a webcam and microphone source as input, which
is composed into the final video output. . . . . . . . . . . . . . . . . . 95
6.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.8 Interface of CommentIQ system supporting multidimensional analysis
for online article comments. . . . . . . . . . . . . . . . . . . . . . . . 99
6.9 DataTV being used to record a streaming data video on race, murder
rate, and SWAT activity. . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.10 DataTV recording the CommentIQ system being used to filter com-
ments over time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.11 Expert 1 reviewed the U.S. budget trend in 2013 during the time of
President Obama. The expert presented their insights using DataTV. 104
6.12 Expert 2 reviewed the population change of Florida. Insights on
housing and population gain were presented using DataTV. . . . . . . 105
A.1 Marine Plastic Pollution [5]. . . . . . . . . . . . . . . . . . . . . . . . 119
A.2 Scene that shows relations between CO2 and temperature change. . . 120
A.3 Ice is melting as temperature increases. . . . . . . . . . . . . . . . 121
A.4 Scene showing that seasons are shifting and causing problems for animals 121
A.5 News for Irma Hurricane [5]. Part of Miami will be flooded. . . . . . 123
A.6 Height change [6] over time. The average human height
increases over time. . . . . . . . . . . . . . . . . . . . . . . . . . . 124
A.7 Ancient Greek [7] ruler. . . . . . . . . . . . . . . . . . . . . . . . . . . 125
A.8 The change of Asia [8] viewed from a map. . . . . . . . . . . . . . . . 126
A.9 The documentary [9] shows how one percent of the population occu-
pies a large percent of wealth. . . . . . . . . . . . . . . . . . . . . . . 128
A.10 The documentary [10] shows how data visualization works. . . . . . . 129
A.11 The documentary [11] shows how the number of babies varies with religion. 130
A.12 The documentary [12] shows how the human gene pool declines. . . . 132
A.13 The documentary [13] shows how to end poverty. . . . . . . . . . . . 133
A.14 The documentary [14] shows China's problems with its neighbours. 134
A.15 The documentary [15] shows where real numbers come from. . . . . 135
A.16 The documentary [16] shows how the revolution of big data takes place. 137
A.17 The documentary [17] shows the relation between wealth and the size
of population. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
A.18 The documentary [18] shows how a procrastinator thinks. . . . . . . 139
A.19 The documentary [19] shows how big data transforms business . . . 140
A.20 The video [20] shows how the poor have more children than others. . 142
A.21 PhD comic [21]. Ambition decreases over time. . . . . . . . . . . . . 143
A.22 NFL player report [22] . . . . . . . . . . . . . . . . . . . . . . . . . . 145
A.23 Graph comic for European relations. . . . . . . . . . . . . . . . . . . 147
A.24 Comic style dashboard [23] . . . . . . . . . . . . . . . . . . . . . . . . 148
A.25 Infographical Comics [24] . . . . . . . . . . . . . . . . . . . . . . . . . 150
A.26 Comics for data of restaurants in NYC [25] . . . . . . . . . . . . . . . 151
A.27 Comics of the comparison of characters from Marvel and DC [26] . . 153
A.28 Comics of someone having a tattoo [27] on his arm as data visualization. 155
A.29 Spider Man Visualization in Comics [28] . . . . . . . . . . . . . . . . 157
A.30 Cell Phone Visualization in Comics [29] . . . . . . . . . . . . . . . . . 158
A.31 Linear Regression Visualization in Comics [30] . . . . . . . . . . . . . 160
A.32 Visualization to show the stress level change with vocation in Comics [31] 161
A.33 The Increase of Entropy of Visualization in Comics [32] . . . . . . . . 162
A.34 The Need of Grooming of Visualization in Comics [33] . . . . . . . . 164
A.35 The Change of Procrastination and Stress Level in Comics [34] . . . . 165
A.36 The Change of Procrastination and Stress Level in Comics [35] . . . . 167
A.37 Curve Fitting Visualization in Comics [36] . . . . . . . . . . . . . . . 169
A.38 Illustration of Conditional Probability Visualized in Comics [37] . . . 171
A.39 The Two Americas . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
A.40 The Two Americas: Trump. . . . . . . . . . . . . . . . . . . . . . . 174
A.41 The Two Americas: Clinton . . . . . . . . . . . . . . . . . . . . . . 174
A.42 The Two Americas: description of each side. . . . . . . . . . . . . . 175
A.43 The Two Americas: description of Clinton side . . . . . . . . . . . . 175
A.44 The Two Americas: comparison of two sides. . . . . . . . . . . . . . 176
A.45 Strikeouts on the rise . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
A.46 Strikeouts on the rise [38] for each player. . . . . . . . . . . . . . . . . 178
A.47 Missing men for different races [39] . . . . . . . . . . . . . . . . . . . 179
A.48 Which areas have men missing . . . . . . . . . . . . . . . . . . . . . . 180
A.49 Missing men for blacks and whites. . . . . . . . . . . . . . . . . . . . 180
A.50 VisJocky interface [40] . . . . . . . . . . . . . . . . . . . . . . . . . . 181
A.51 ChartAccent interface [41] . . . . . . . . . . . . . . . . . . . . . . . 183
A.52 Process of designing a story with SketchStory . . . . . . . . . . . . 184
A.53 Sketcholution comic strip and summary. [42] . . . . . . . . . . . . . . 185
A.54 The sketch visualization of the royal members [43]. . . . . . . . . . . 186
A.55 The sketch connects the royal members . . . . . . . . . . . . . . . . . 186
A.56 Visual Explanation of the Relationship of a Cartoon Series [44] . . . . 187
A.57 How mortgage bond combined into sub-prime mortgage [45]. . . . . . 188
A.58 New Orleans housing population decreases [46] . . . . . . . . . . . . . 190
A.59 Top writers for best sellers [47] . . . . . . . . . . . . . . . . . . . . . 191
A.60 Public library report [48] . . . . . . . . . . . . . . . . . . . . . . . . . 193
A.61 How poverty is distributed in the U.S. [49] . . . . . . . . . . . . . . . 195
A.62 Distribution of London Marches in Visualization [50] . . . . . . . . . 196
A.63 Geo-distribution of All the Forces in Yemen Civil War in Visualiza-
tion [51] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
A.64 The Stats of Space Industry and Technology in Visualization [52] . . 198
A.65 The Source of Household Air Pollution in Visualization [53] . . . . . . 199
A.66 The Source of Death Caused by Air Pollution in Visualization [54] . . 200
A.67 The Comparison of North and South Korea in Visualization [55] . . . 201
A.68 The situation of foreclosure across the U.S. [56] . . . . . . . . . . . . 202
A.69 The comparison for the Democrats and Republicans during election [57] 204
A.70 The comparison of the firearm possession of the U.K. and the U.S. [58] 205
A.71 The guest distribution of the White House Correspondent Dinner [59] 206
A.72 The change of population during days and nights [60] . . . . . . . . . 207
A.73 The wealth distribution [61] . . . . . . . . . . . . . . . . . . . . . . . 208
A.74 The participants and locations of the big Welsh coast walk [62] . . . . 210
A.75 Public library report [63] . . . . . . . . . . . . . . . . . . . . . . . . . 211
A.76 The locations of celebrities' homes [64] . . . . . . . . . . . . . . . . . 212
A.77 Visualization in Halo: Reach [65] . . . . . . . . . . . . . . . . . . . . 214
A.78 Visualization in Call of Duty: Black Ops [66] . . . . . . . . . . . . . . 215
A.79 Visualization in StarCraft II [67] . . . . . . . . . . . . . . . . . . . . 216
A.80 TwitterSheep interface [68] . . . . . . . . . . . . . . . . . . . . . . . . 217
A.81 Twitter Interactive Game of Thrones [69] . . . . . . . . . . . . . . . . 219
A.82 Twitter Interactive [70]: how world cup news spread. . . . . . . . . . 220
A.83 The architecture for the augmented reality [71] to have a data visu-
alization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
A.84 The architecture for the augmented reality [72] to have a data visu-
alization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
A.85 How AR [72] is used to visualize the world population . . . . . . . . . 224
A.86 The process for the engineers with augmented reality [73] to design
a building with data visualization . . . . . . . . . . . . . . . . . . . . 225
A.87 The process for passengers to view flight data in augmented reality [74] 226
A.88 Street viewers obtain information from augmented reality [75] on the
street . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
A.89 Engineers have the pipes shown with augmented reality [76] . . . . . 229
A.90 Engineers have the underground infrastructure shown with augmented
reality [77] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
A.91 A biochemistry researcher has a molecular structure shown with
augmented reality [78] . . . . . . . . . . . . . . . . . . . . . . . . . . 232
A.92 How virtual reality tool [79] is used to visualize the baseball training
data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
A.93 How virtual reality tool [80] is used to visualize the baseball training
data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
A.94 How virtual reality tool [81] is used to visualize big data in 3D. . . . 236
A.95 How virtual reality tool [82] is used to visualize big data with virtual
objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
A.96 How virtual reality tool [83] is used to do data analytics. . . . . . . . 239
A.97 How virtual reality tool [84] is used to visualize gene information with
graph in 3D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
A.98 How virtual reality tool [85] is used to visualize geographical infor-
mation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
A.99 How virtual reality tool [86] is used to do data analysis. . . . . . . . . 243
B.1 Stock heat map in comic style . . . . . . . . . . . . . . . . . . . . . . 245
B.2 US census data in comic style . . . . . . . . . . . . . . . . . . . . . . 246
B.3 World happiness data in comic style . . . . . . . . . . . . . . . . . . . 247
B.4 Star data in comic style . . . . . . . . . . . . . . . . . . . . . . . . . 249
B.5 The geo origin of beer around world in sequenced panels . . . . . . . 250
B.6 The Arab-Israeli War in sequenced panels . . . . . . . . . . . . . . . 251
B.7 The data about cellphone phished in sequenced panels . . . . . . . . 252
B.8 The data about global wealthy population in sequenced panels . . . . 253
Chapter 1: Introduction
If raw data is the crude oil of the information age, then digital tools are the
refineries needed to turn data into information (or crude into gas) ready for
consumption. Unfortunately, current data science tools typically require significant
expertise far beyond that of the normal citizen. Visualization, which uses interactive
graphical representations of data to aid cognition [87], has the potential to lower
the threshold of understanding data by virtue of using visual representations that
are both accessible to casual users [88] and scale gracefully to large data [89].
However, even interactive visualization tools such as Tableau [90], Spotfire [91], and
QlikView [92], which provide point-and-click interfaces on standard computers and
devices, require specialized knowledge in mathematics, statistics, and data man-
agement as well as access to clean, complete, and properly formatted data sources.
These barriers are essentially insurmountable to the average person, or what we
here call a casual user [88]: a person with average knowledge in science, technol-
ogy, and mathematics, with normal computer savvy, and with moderate interest in
harnessing data to enrich or enhance their lives.
This dissertation proposes methods for data-driven communication [93] for
casual users [88] using so-called data-driven storytelling [2]: interactive visualization
in conjunction with storytelling methods. The focus is primarily on casual users as
consumers of information, not as authors.
In this chapter, we set the stage for this dissertation: the data deluge in today's
society, the use of storytelling to make all this data accessible to casual users, and
the gaps in current data-driven storytelling practice that this work will address.
1.1 Motivation: Harnessing the Data Deluge
Our society is under a veritable deluge of data [94]; we are inundated with
massive, detailed, and complex datasets, and we need digital tools to help us stay
afloat. This deluge is a double-edged sword. It is an opportunity, because never before
in human history has so much of our world and our society, of science and of arts, of
medicine and of life itself been captured by sensors and stored in machine readable
form. If we were to harness this data effectively, there is tremendous potential
in improving the collective lives of millions, even billions, of people around the
world. For example, extracting informative patterns and knowledge from big data
gives the public sector an opportunity to improve productivity and achieve higher
levels of efficiency and effectiveness [95, 96]. The opportunities lie in fields such as
E-health, Internet of Things, public utilities, transportation, logistics, public services,
government monitoring, and so on [97].
It is also a challenge, because the data is complex, large, heterogeneous, uncertain,
and fleeting [98]. Extracting actionable information from such data requires a careful
and deliberate approach. When handling big data problems, there are difficulties in
data capture, storage, searching, sharing, analysis, and visualization [96,99].
Furthermore, in a world of ever-increasing surveillance, there are several ethical
concerns that must also be considered [100,101].
1.1.1 Opportunity: The Scale of Big Data
Today's world is facing the problem that the amount of available data is
exploding [102]. Until 2003, humans had created approximately 5 exabytes (10^18
bytes) of data [103]. By 2012, this amount of information was created in two days,
and the digital world of data had expanded to 2.72 zettabytes (10^21 bytes) [103].
In 2012, IBM indicated that every day 2.5 exabytes of data was created and 90%
of the data produced by then was produced from 2010 to 2012 [98]. By the year
2020, 50 billion devices will be connected to networks and the Internet [104]. Ac-
cording to estimates, the volume of business data worldwide, for most companies,
doubles every 1.2 years [95, 96]. Several resources exist surveying the scope of big
data [94,105]. To take YouTube as an example, in 2014, there were about 300 hours
of video uploaded to YouTube every minute [106], and almost 1 billion minutes of
videos watched on YouTube every single day [107].
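As a rough sanity check, the figures quoted above can be related to each other with a few lines of arithmetic. The sketch below is illustrative only: it uses nothing but the numbers cited in this section, and the variable and function names are our own. It shows that "5 exabytes in two days" is consistent with the IBM estimate of 2.5 exabytes per day, and that a doubling time of 1.2 years corresponds to data growing by roughly 78% per year.

```python
# Back-of-the-envelope check of the data-growth figures quoted in the text.
# All inputs are numbers cited in this section; names are illustrative.

EXABYTE = 10**18    # bytes
ZETTABYTE = 10**21  # bytes


def annual_growth_factor(doubling_time_years: float) -> float:
    """Yearly multiplier implied by a fixed doubling time."""
    return 2 ** (1 / doubling_time_years)


# "5 exabytes ... created in two days" implies a daily creation rate (in EB).
daily_rate_eb = 5 / 2

# 2.72 zettabytes as a multiple of the pre-2003 total of 5 exabytes.
total_2012_bytes = 272 * ZETTABYTE // 100
ratio_2012_to_2003 = total_2012_bytes / (5 * EXABYTE)

# "Doubles every 1.2 years" expressed as a yearly growth factor.
yearly_factor = annual_growth_factor(1.2)

print(f"Implied daily rate: {daily_rate_eb} EB/day")
print(f"2012 total vs. pre-2003 total: {ratio_2012_to_2003:.0f}x")
print(f"Implied yearly growth factor: {yearly_factor:.2f}x")
```

Under these figures, the 2012 total is several hundred times the amount created up to 2003, which illustrates the order-of-magnitude change described in this section.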
In an era when a common smartphone holds more computing power than all of
NASA's computers in 1969 combined [108], it is no surprise that humans are producing
a huge amount of data every day and everywhere. Accordingly, the ability to handle
big data is increasingly in demand in virtually all fields, including physics
simulation, finance, business, and personal care [94,105]. Furthermore, beyond sheer
scale, the data comes in many shapes and forms, can be faulty, and is produced at
significant speed.
Obviously, the opportunity in harnessing this data for governmental, societal,
humanistic, or business interests is significant. Many of the approaches for doing so
are automatic and based on machine learning and artificial intelligence. For example,
today's recommendation engines know the preferences of their customers, and are able
to anticipate which products the consumer will want to buy [109]. YouTube, as the largest
and most popular online video community, can analyze personal preferences of its
large user group [110]. Modern web services, such as Amazon Web Services [111],
Microsoft Azure [112], and Google Cloud [113] have successfully served businesses
across the world with very large data volume and storage requirements.
1.1.2 Problem: The Digital Data Divide
For the average American who is not specialized in data analysis, dealing with
all this data is increasingly challenging. In today's information society, it is difficult
to be an effective citizen without coming into contact with the internet and the
digital sphere. Most people have to deal with different kinds of data in their daily
lives, from health data collected by sport apps on their smartphones, to stock market
data governing their finances [102]. They have to use the internet to balance their
checkbooks, pay their bills, and manage their mortgages. The sheer amount and
variety of this data make it increasingly difficult for the average American to handle.
Given the ubiquity of data, there is a risk that people not capable of analyzing
and understanding data will fall behind in society [114]. This effect is only exacer-
bated by the increasing scope and magnitude of the aforementioned data deluge and
the lack of effective, accessible, and approachable tools for casual data analysis. We
refer to the gap between the difficulty of processing large amounts of data and the
limited data analysis skills and processing power of casual users as the digital data
divide.
Overcoming this barrier requires tackling both the lack of ICTs (information
and communication technologies, akin to the original definition of the "digital
divide" [115,116]) and the lack of specialized data analytics knowledge. For the
former, technology progress is fortunately helping; there is more computing power
available for casual users than ever before. A Samsung flagship smartphone to-
day possesses more floating point computing power than the supercomputer Deep
Blue [117, 118], which beat the world champion in chess [118], and mobile devices
are coming down in price while the interfaces are improving. Even elderly users can
make full use of most apps on a smartphone with basic training. These factors all
contribute to shrinking the divide in terms of device access.
However, this still leaves a significant gap in data analytics capability. Most
current data science tools are simply not designed for novice users. One possible way
to overcome this is to leverage data visualization: the use of interactive graphical
representations of data to amplify cognition [87]. Unfortunately, even commercial
point-and-click data visualization tools, such as Tableau [119], Spotfire [120], and
Qlik [92], rely on significant training and intimate knowledge of many advanced
mathematical and statistical concepts.
1.1.3 Setting: Data for Casual Users
While visualization is more approachable than typical data science tools,
visualizations must still be deliberately designed to be appropriate for casual users:
non-professionals with limited time, skills, and motivation. In 2007, Pousman et
al. [88] proposed the topic of casual information visualization as visualization
systems that are not designed for professional work tasks. They went on to paint a
vision for using computer-mediated tools to depict personally meaningful information
in visual ways that support everyday users in both everyday work and non-work
situations.
Many examples of casual visualizations exist; we give a sampling here. Some
help normal citizens understand complex scientific phenomena [121]. Embedding
visualization into presentation software such as PowerPoint [122], Tableau [123],
and Excel [124] can give people who are not familiar with visualization great
leverage in interpreting and analyzing large amounts of data. Other tools dealing
with data for everyday users are also rich in visualization applications. For example,
line charts can be used to show the moving average of changes in the stock
market [125], health apps use histograms and pie charts to show the daily
distribution of workouts [126], and personal accounting software uses visualizations
such as line charts, treemaps, and histograms to show the annual change of a person's
financial situation [127].
However, while casual visualization is a powerful method for conveying data to
the average user, this entire topic represents a large design space. Applying casual
visualization to aid data-driven communication requires adopting a specific strategy.
In this dissertation, we use storytelling for this purpose.
1.1.4 Solution: Data-Driven Storytelling
Data-driven storytelling [128] is the use of traditional storytelling techniques,
such as those from oral narratives, fiction, and film, to convey narratives about data.
This approach to visual communication [93] is particularly appealing because of
its high familiarity and accessibility to casual users. Thus, our focus here is on
data-driven storytelling for casual users. As defined above, a casual user is a
non-professional consumer of data who has limited time, skills, and motivation. In
addition, our emphasis here is primarily on casual users as consumers; the authors
of a data-driven story may still need to be experts.
Data-driven storytelling has seen increasing popularity in the visualization
and human-computer interaction communities. Using methods such as narration,
animation, cinematography, art, and design, this practice aims to shape findings
from complex and large-scale data into a story form that is more amenable to human
understanding than typical data displays. In particular, a recent book on data-
driven storytelling by Riche et al. [128] reviews this burgeoning field. However, the
repertoire of available media types for such data-driven stories is still limited.
1.2 Background: Data Visualization
Here we take a step back and review the research area of data visualization
and its related topics and technologies.
1.2.1 Definition
Data visualization represents data using visual displays to improve
cognition [129]. It can also be considered a medium for capturing and sharing thoughts
on data with others [130]. More specifically, it is a process of exploring, analyzing,
and presenting data using graphical information and tools [131,132].
1.2.2 Examples of Data Visualization
Data visualizations are designed for both visualization professionals and casual
users to understand and analyze data. While visualization has been around for a long
time [133], the academic field was formally founded in 1987 with a special volume on
the use of computer graphics for scientific and engineering applications [134]. Here
we review some representative examples of data visualization.
Multiple visualization tools exist, both commercial and academic. Tableau [90,
135] is a popular commercial visual analytics tool used for creating visualizations.
This tool is mostly used by people with a background in business analytics or
information visualization. Additional commercial tools include Spotfire [91], Qlik [92],
and Keshif [136].
In the last decade, visualization has increasingly moved from personal com-
puters to a web-based ecosystem. D3.js [137] is the primary JavaScript package used
by front-end developers to create attractive data visualizations. Google Charts [138]
is another widely used package for front-end development to create rich interactive
data visualizations. The most recent development is the introduction of visualiza-
tion grammars, such as Vega [139] and Vega-Lite [140], which enable creating charts
using declarative specifications.
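To make the idea of a declarative specification concrete, the following is a minimal sketch of a Vega-Lite bar chart specification, written here as a Python dictionary and serialized to JSON. The data values and field names ("day", "hours") are hypothetical and chosen purely for illustration; only the top-level structure (data, mark, encoding) follows the Vega-Lite grammar. Note that the specification states what to show, not how to draw it.

```python
import json

# Hypothetical toy data: hours of exercise per weekday (illustrative values only).
workout_hours = [
    {"day": "Mon", "hours": 1.0},
    {"day": "Tue", "hours": 0.5},
    {"day": "Wed", "hours": 1.5},
]

# A minimal Vega-Lite specification: the chart is described declaratively
# by naming the data, the mark type, and the field-to-channel encodings.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "description": "Bar chart of hypothetical daily workout hours.",
    "data": {"values": workout_hours},
    "mark": "bar",
    "encoding": {
        "x": {"field": "day", "type": "nominal"},
        "y": {"field": "hours", "type": "quantitative"},
    },
}

# The resulting JSON can be handed to any Vega-Lite renderer.
print(json.dumps(spec, indent=2))
```

Changing the chart type in such a specification only requires changing the mark (for example, from "bar" to "line"), which is what makes grammars of this kind attractive for authoring tools.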
1.2.3 Data Visualization for the Masses
More and more people are facing the need to analyze, present and interpret
data [141]. Data visualization is not only used by professionals, but also by casual
users who adopt visualization as a tool to improve presentation, exploration, and
analysis. Wattenberg et al. [142] gave an example of how a visualization of baby
names can bring together communities of people on the internet. Many Eyes [143]
was one of the first visualization systems "for the masses" that enabled people, often
casual users, to collaborate on data using just their web browsers. This has shown
that visualization is both a challenge and an opportunity for the masses, who must
deal with a large range of visualization tools to handle and analyze data. Viégas
and Wattenberg call this idea communication-minded visualization [93].
Today, data visualization has affected lives in many different ways in education,
medicine, finance, meteorology, astronomy, etc. We have our smartphones loaded
with apps with data visualization such as Uber, Google Maps, and stock market
apps. We see people tweet about their workouts with visualizations of their
physical data. We use Google Maps every day for navigation. We predict stock prices,
house prices, and other price trends with line charts or other dashboards. We also
use data visualization to present data in our PowerPoint slides.
Adopting visualization for a mass audience requires adopting casual visualization
practices. Freyne et al. [114] note that next-generation visualization tools are
set to play an important role in overcoming the challenge of large amounts of raw data,
but that the current generation of visualization tools is sometimes too complex for
typical users. Infographics are a widely used medium that primarily consists of data
visualization, and tools such as Infogram [144] provide means to create such artifacts
easily. More effort is needed to make these kinds of mechanisms accessible to casual
users.
1.3 Background: Data-Driven Storytelling
In data-driven storytelling, we harness the age-old practice of storytelling to
create narratives about data. This section describes the definition, background, and
existing work for data-driven storytelling.
1.3.1 Definition of Storytelling, Story, and Data-driven Storytelling
Storytelling is the conveyance of a sequence of events (often involving characters
and places), known as stories, using speech, sound, visuals [145], and other
multisensory stimuli, and has a history spanning thousands of years [146, 147]. It is
one of the oldest forms of human communication, record-keeping, and entertainment,
well predating the written word [146-148]. Stories are particularly well suited for
this purpose because their chronological structure enables memorization and recall,
entices listeners, and facilitates understanding [145, 148]. For this reason,
narration and storytelling retain important roles even in today's information
society, where these properties are particularly important in helping people
understand an increasingly complex world. This has recently given rise to data-driven
storytelling where narrative techniques are utilized for telling stories about
data [2], often using visual media [149,150].
Figure 1.1: Genres of narrative visualization by Segel and Heer [2].
A visual narrative is a story told primarily using visual media, such as
illustrations, photographs, animations, video, and now visualization [149, 150]. In
particular, visualization has a specific proclivity for communication by virtue of its
graphical form, yielding the notion of communication-minded visualization [93] to
support collaborative analysis. Combining the idea of communication-minded
visualization with storytelling yields the notion of data-driven storytelling: narrative
techniques for telling stories about data [2, 128].
1.3.2 Existing Frameworks
Our focus in this thesis is on exploring communication mechanisms, or media,
used for conveying data-driven stories that are specifically suitable for casual
users. Here, we define "media" as the channels or tools used to store and deliver
information. While data-driven storytelling is a nascent research topic in visual
communication and visualization, there has so far been no focus directed at
the specific media used. Instead, existing efforts tend to revolve around the seven
genres of narrative visualization proposed by Segel and Heer [2] (Figure 1.1):
• Magazine Style: A data-driven image integrated in a page of text, where
the text refers to and explains the image.
• Annotated Chart: Chart adorned with descriptive text and labels for the
purpose of explaining its contents.
• Partitioned Poster: A poster or dashboard consisting of multiple images,
each with separate data.
• Flow Chart: Visually ordered sequence of images and annotations designed
to tell a story.
• Comic Strip: Sequence of frames containing images and text organized in a
comic-style strip layout.
• Slide Show: Deck of slides combining images and text to sequentially tell a
story.
• Film/Video/Animation: Motion graphics that incorporate data-driven
imagery and visualizations, often animated.
However, as readily admitted by Segel and Heer, their findings are limited to
a sample of 58 examples. They also do not claim that their genres are exhaustive,
noting for example that their work did not include video games or e-learning tools.
Furthermore, the above seven genres conflate the media used for storytelling with
the format, method, and components employed. The seven genres have proven to
be extremely powerful for categorizing research in this field. They have even played
a prescriptive role, with Graph Comics [151] and Data Comics [1, 152] arising as
examples of the Comic Strip genre, Data Videos [153,154] drawing inspiration from
the Film/Video/Animation genre, and infographics [155] from the combination of
text and visualization. However, there certainly is room for expanding the framework
further.
This suggests that the community may be limiting itself by needlessly adhering
to a framework that was intended to be generative rather than delimiting. What
about using the spoken word for data-driven storytelling, i.e., supporting speakers
talking to an audience? What about the written word, i.e., data-driven prose,
such as for inclusion in a textual report? Much innovation remains in data-driven
storytelling, but this requires going beyond existing frameworks.
1.3.3 Brief History of Storytelling
From the hunter returning from his latest foray to tell tall tales of stalking his
prey, to the shaman spinning a yarn about the origins of the gods, the stars, and the
moon, storytelling is one of the oldest ways for people to record, communicate, and
pass on information. Drawings portraying people fighting wild animals date back to
the Stone Age. The famous Homeric Hymns have provided us with countless insights
into ancient Greek history. We have learned how ancient people thought and
behaved from all kinds of historical literature in the form of stories. Oral, written,
and drawn artifacts were the three major ways of storytelling in ancient times.
Modern technologies have helped storytelling evolve into multimedia forms
including audio, video, games, etc. Recently, virtual reality and augmented reality
have joined the arsenal of methods and media for storytelling.
1.4 Purpose of This Work
The overall purpose of this Ph.D. thesis is to expand the horizon of media
for data-driven storytelling to aid casual users in viewing, analyzing, and
understanding data. To achieve this, our work collects extensive examples and
explores the use of different media types for this purpose. From the resulting taxonomy
and guidelines, we investigate media types particularly useful for casual users
with little professional training or background in data visualization and analysis.
In particular, based on recommendations from Segel and Heer [2], this work
was the first to propose a practical system for leveraging comics (sequential art [156])
for data-driven storytelling [1] (first submitted in September 2014). Furthermore,
our work is the first to propose live-streaming data videos for this purpose (in sub-
mission). Finally, our taxonomy reviewing media types for data-driven storytelling
is, to the best of our knowledge, the first of its kind (in submission).
Chapter 2: Related Work
In this chapter, we will examine the literature that helped set up the
background, definitions, and other relevant material.
2.1 Visualization and Visual Analytics
Visualization is defined as the translation of data into images [157], or the use of
interactive graphical representations of data to aid cognition [87]. Visual analytics
is an analytic discourse that makes the processing of information and data more
transparent [158]. In "Illuminating the Path" [159], visual analytics is defined as
the science of analytical reasoning facilitated by interactive visual interfaces.
2.1.1 Foundational Work on Data Visualization
There have been many early attempts at building taxonomies of visualization
systems based on their data [160] and algorithms [161]. Work on the design space of
information visualization has laid foundations for research on the components of
visualization [162]. Both information visualization and visual analytics are possible
ways to handle the problem of information overload [158].
2.1.2 Examples of Classic and Well-known Visualizations
While only recently recognized as a research field in its own right, visualization
has a long history [133]. For instance, the famous map of Napoleon's march by Charles
Minard [4] in Figure 2.2 is a well-known example of using visualization to tell the
story of Napoleon's failed campaign into Russia. John Snow's cholera map [3] in
Figure 2.1 is an early example of using a dot map to show the relation between
cholera deaths and the locations of water wells. The diagram of the deaths during the
Crimean War by Florence Nightingale shows the relation between causes of death
and time [163].
Figure 2.1: Visualization of locations of cholera cases and wells [3].
Figure 2.2: Visualization of Napoleon's failed campaign into Russia [4].
2.2 Production, Presentation, and Dissemination
The research agenda for visual analytics called for a focus on the production,
presentation, and dissemination of analytics results to stakeholders, policymakers,
and the public [159].
• The case for communication: Viégas and Wattenberg remarked upon
the proclivity of visualization for communication by virtue of its graphical
form. They encouraged focusing on so-called communication-minded
visualization [93] where communication enables collaborative analysis.
• Embedded dissemination: To reach its full potential, communication
capabilities should be integrated into the visualization tools themselves [159]; for
example, Tableau now incorporates the story points feature [119], and most
commercial tools support exporting interactive dashboards and workspaces to
the web.
• Literacy and casual viewers: Presenting insights from data to the masses
requires taking the visualization literacy [164] of everyday users into account.
Thus, the notion of "casual visualization" [88] is important.
2.3 Visual Communication and Visualization
Visual aids such as images, signs, typography, icons, and drawings have long
been used as a particularly effective medium for communication [150]. Beyond
human perceptual factors, part of the reason for this effectiveness is the mutual
knowledge, mutual beliefs, and mutual assumptions that visual communication en-
joys. These mutually agreed-upon conventions allow a particular medium?such as
the visual?to encode significant amounts of information with minimal resources
given.
Visualization is a particular form of visual language traditionally used for soli-
tary sense-making. The notion of communication-minded visualization (CMV) [93]
builds on ideas from visual communication by noting that visualization can often
be used for more than just individual insights. There are many examples of CMV
systems that exist as precursors to storytelling, including Themail [165], the Baby
Name Explorer [142], and Isis [166].
2.4 Visual Storytelling
Storytelling is the conveyance of a sequence of events using plots, locations,
and characters. It is a particularly effective communication medium because of
the typically high degree of common ground shared between narrator and listeners.
Visual storytelling draws upon imagery, both static and dynamic, for this purpose,
and includes media such as film, television, animation, design, and even art.
As early as 2001, Gershon and Page [167] suggested using storytelling in
visualization to improve its use for visual communication. In fact, the newly popular
infographics practiced on the web are built on many such storytelling principles.
Despite this, it is only recently that data-driven storytelling was fully embraced by
the visualization community, as evidenced by the survey by Segel and Heer in 2010 [2]
and successful workshops at the annual conference in 2010 and 2011 [168,169]. Hullman
and Diakopoulos followed this up by studying how framing, context, and design
impact the rhetoric of a narrative [170]. Since then, several practical methods
and techniques have been proposed, including using free-form sketching for nar-
ration [171], story points in Tableau [119], and automatic spatialization for visual
exploration [172]. Most recently, Hullman et al. [173] studied sequence in narrative
visualization, proposing a graph-driven approach for transitioning between views to
minimize load on the viewer.
2.5 Data-Driven Storytelling
Combining the idea of communication-minded visualization with storytelling
yields the notion of data-driven storytelling: narrative techniques for telling stories
about data [2]. Gershon and Page first proposed using storytelling for
visualization [167], and their work has since been followed up by workshops [168, 169],
surveys [2, 170], and even commercial tools [119].
The purview of data-driven storytelling has quickly grown, from dashboards
and slideshow presentations [119] to more esoteric formats such as sketching [171],
journeys in time and space, and even comics [151,174]. In their survey of narrative
visualization, Segel and Heer [2] identified seven genres (magazine style, annotated
charts, partitioned posters, flow charts, comic strips, slideshows, and videos) and
also suggested that future storytelling approaches may combine genres.
2.6 Comics as a Storytelling Medium
Comics are often defined as sequences of images ("sequential art" [175] or
"sequential images" [176]) that combine to tell a story using graphical means [149].
They are therefore a visual communication medium. Because of their familiarity to many,
they enjoy a high level of common ground for conveying information. Furthermore,
the visual language of comics is often clear, concise, and intuitive [176,177].
How comics affect the reader has long been a topic of study. Dorfman et
al. [178] discussed the ideology in Disney comics from the perspective of culture and
economics. McCloud [175] built on these ideas by suggesting that the engagement
in comics mainly arises from the simplified and non-photorealistic appearance of
faces and characters, increasing recognizability and facilitating imagination. In a
way, this "vague and unspecific" nature and lack of fixation in comics helps bridge
the gap between books and film [179].
Some efforts have tried to harness comics for visualization. Jin and Szekely [174,
180] proposed a visual query environment that uses a comic-strip metaphor for
querying and presenting temporal patterns. However, their system uses the comic
strip solely for layout, and does not leverage the full potential of comics as a
visual communication medium.
The most recent and most relevant work in this vein is Graph Comics [151],
which are data-driven comics used for telling stories about dynamic networks.
Developed concurrently with the work in this dissertation, Graph Comics draw on the
same comic medium principles as Data Comics, but are restricted to node-link
diagrams. Furthermore, while the Graph Comics work proposes the idea and explores
its utility, it does not provide any authoring support for creating such comics.
Thus the Graph Comics effort is complementary to the more general Data Comics
approach proposed here.
There are other relevant works, including applications and surveys. Kim et
al. [181] proposed DataToon for interactive data comics and dynamic networks.
Moore et al. [182] introduced comic-strip narratives in time geography. Bryan et
al. [183] proposed an approach for interactive annotation of narrative visualizations
with comic-strip style snapshots. Wang et al. [184] compared the effectiveness and
engagement of data comics and infographics. Wang et al. [185] also explored how
to teach data visualization and data-driven storytelling with data comics. Bach et
al. [186] explored the design patterns of data comics for data-driven storytelling.
2.7 Animated Graphics as a Storytelling Medium
Video-based storytelling is making significant inroads on the internet as a
whole. Never mind streaming services such as Netflix, Hulu, and Amazon Prime,
which according to the network service company Sandvine account for up to 70%
of peak internet traffic [187] (December 2015), the new generation of so-called
"YouTubers", or YouTube celebrities, is challenging the boundaries of the medium
through novel formats such as "vlogging" (short for video blogging), "reaction
videos" (recordings of a person reacting to an event), or "unboxings" (videos of
someone unpacking a new product, commonly a high-tech one). One format in particular
is fascinating: so-called "Let's Play" videos capture the gameplay as well as live
audio and video of a person playing a computer or console game. Sometimes likened
to watching a friend play a game while sitting on a couch in their home, Let's
Play videos are characterized by a focus on the often humorous, irreverent, and
sometimes profanity-filled commentary that the recorded person provides. Popularized
by Swedish YouTuber (and now general celebrity) Felix Kjellberg, better known by
his YouTube alias PewDiePie and holding the distinction of having more than 42
million followers and more than 10 billion views, this phenomenon has since given
rise to live-streaming Let's Play videos, such as through the online streaming service
Twitch.tv.1 Combined with its focus on broadcasting eSports, where professional
gamers play electronic games for prize money and salaries, Twitch has quickly risen
to be the fourth largest source of internet traffic in the United States [188].
Amini et al. [153, 154] recently identified data videos as motion graphics that
combine both sound and visuals to tell a data story. Pointing to prominent examples
from the New York Times and the Guardian, their work is descriptive and formative
in nature, engaging professional storytellers to use visuals to craft their narratives.
While obviously deeply influential to our work, their treatment focuses on the careful
and deliberate production of painstakingly designed data videos through ideation,
sketching, storyboarding, capturing, and editing; live streaming is not covered or
even considered, and the software, resources, and skills needed for their approach
are substantial. Furthermore, unlike our DataTV platform, their work provides no
technical interventions to support data video authoring.
1http://www.twitch.tv/
Chapter 3: Taxonomy of Media for Data-Driven Storytelling
We present a new taxonomy focused on the media for data-driven storytelling
with the purpose of opening the field to a wider set of future possibilities. Our work
started with collecting a significant corpus of evidence of data-driven storytelling
using novel and diverse media, from the spoken word to interpretative dance and
choreography. We then use this wealth of data to derive a taxonomy and classify
all of these examples into a coherent framework. Finally, by generalizing across
storytelling practice for different media, we derive design guidelines for data-driven
storytelling and discuss how future narrative techniques about data may look.
3.1 Method
To collect examples of sufficient quality and quantity, we started from data
features and storytelling features, collecting examples with proper data
components and a clear storyline. The sources of examples are online articles, blogs,
visualization tools, video websites, books, research papers, and software packages.
In order to generate a robust taxonomy, we chose the examples from a wide range
of genres including infographics, documentaries/data videos, data comics, visual
analytics/visualization tools, virtual reality visualization tools, augmented reality
visualization tools, computer game tools, and others. Figure 3.1 shows the
distribution of all the sources of examples. We started from about one hundred and thirty
examples. Ninety-one are left after removing examples without a clear storyline,
explicit data for a story, or a clearly defined audience group.
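[Figure 3.1 (pie chart, "Distribution of Examples"): Data videos 23.7%, Infographics 14.4%,
Others 10.3%, Augmented reality 9.3%, Virtual reality 7.2%, Computer game tools 3.1%.
The remaining slices are Web articles, Data comics, and Visual analytics/, with the
values 18.6%, 8.2%, and 5.2%; the extracted text does not show which value belongs to which.]
Figure 3.1: The distribution of all the sources of examples.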
3.2 Evidence
To illustrate the prevalence and variety of different data-driven narratives in
the world, we enumerate and discuss a set of representative and innovative
examples here. We only list three representative ones; the rest are explained in
Appendix A with classification in Table 3.1. The purpose is to provide a basis for
a taxonomy that can be used to classify the storytelling media used for the data-
driven narrative. For each example, we use an informal classification scheme to
describe the media in more detail. This scheme will then feed into our taxonomy in
the following section.
Figure 3.2: Image from the video A Day in the Life of the (Polluted) Ocean.
3.2.1 Exhibit 1: Storytelling in Movies and Documentaries
The movie "A Day in the Life of the (Polluted) Ocean" [189] follows the
pollution emitted by the main character (the person in yellow) over one day. The
storyline follows him as a regular person who gets up in the morning and proceeds
with his normal day. The idea is to show how much pollution a normal person can
produce with simple activities.
Figure 3.2 depicts one scene that shows the amount of plastic waste
produced per capita on each continent. In this scene, the underlying data is
the weight of plastic waste, and the story is that too much plastic waste is
produced for generating a certain amount of value. There is also a simple chart in
the upper-left corner illustrating the portion of plastic waste per capita for each
continent. This scene has both simple numbers and data visualizations to facilitate
the storytelling.
Informal classification: As a movie that is shared online, the audience is
potentially many people on the internet, who typically view it individually on their
personal devices (although public mass viewings are certainly possible). The delivery
method is thus distributed and asynchronous. The visual components used include
full-motion video, animated graphics, text, and non-interactive static data
visualizations. Video is static in that it cannot be manipulated or interacted with (except
for controlling the playback) by the audience. The typical way to view a video is
in sequence. It is also replicable, as it is stored and can be played back at
any time.
Figure 3.3: Physical dance demonstration of a bubble sort algorithm.
3.2.2 Exhibit 2: Dance
Figure 3.3 shows an image from a performance where dancers demonstrate how
the bubble sort algorithm works [190]. This algorithm repeatedly pushes the largest
remaining element to the right until the numbers form an ordered sequence. For clarity,
each dancer is labeled in this picture. In reality, the dancers wear uniforms with numbers on
them, where each dancer stands for a different element in the array. The storyline
is the movement of all the array elements. The data is the ordered sequence.
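
The sorting procedure that the dancers act out can be sketched in a few lines of Python. This is a minimal illustration of textbook bubble sort, not code taken from the performance; the function name `bubble_sort_passes` and the sample array are assumptions chosen for the example. Each pass swaps adjacent out-of-order neighbors, so the largest remaining element ends up at the right end, much as a dancer moves toward the right of the line.

```python
def bubble_sort_passes(values):
    """Return the array state after each pass of bubble sort.

    Hypothetical helper for illustration: one recorded state corresponds
    to one full sweep of the line of dancers.
    """
    items = list(values)              # work on a copy of the input
    states = []
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            # Adjacent neighbors swap places when they are out of order.
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        states.append(list(items))    # largest remaining value is now at index `end`
        if not swapped:               # no swaps means the sequence is ordered
            break
    return states


if __name__ == "__main__":
    for step, state in enumerate(bubble_sort_passes([5, 4, 3, 2, 1]), start=1):
        print(f"pass {step}: {state}")
```

Recording one state per pass mirrors the storyline of the dance: the data is the sequence, and the story is how that sequence changes from one sweep to the next.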
A similar data-driven dance performance was created by the NSF-funded
Dance.Draw project [191], where the movements of dancers in a physical space
were conveyed using visual representations. This mechanism could also be used as a
vehicle for conveying data-driven stories.
Informal classification: Dance performances typically take place in an
auditorium or studio, which supports a large audience. Disregarding video recordings
of the performance (which would be another form of media), this does require the
audience to be physically co-located with the performance, and to consume it
synchronously, in real time. This means that the performance is not stored; it is
ephemeral. The sorting process demonstrated by the dancers is viewed in sequence. The
components of the performance are human bodies in motion over the duration of the
performance, and can also include text, sound, light, and visuals (typically projected
in the workspace).
Figure 3.4: Data-driven story of a session in the PC game Civilization 3.
3.2.3 Exhibit 3: Data-Driven Storytelling in Video Games
While Segel and Heer [2] explicitly note that they chose not to include video
games in their survey, games have long integrated visualization.
As it turns out, they have also been used for data-driven storytelling.
Figure 3.4 depicts an image from the replay session that is shown at the end of
a completed game session, i.e., after one of the players (human or computer) has
achieved one of the victory conditions: conquering all opponents, winning the space
race, taking over most of the land, or scoring a diplomatic or cultural victory. The
replay shows a history of how each civilization was founded, expanded across the
world, and was eventually defeated. While the interaction is limited, playback
controls allow the user to go back and forward in the history. Similar session
playback functionality, often called theater mode, can be found in Halo 3 and
Call of Duty: Black Ops.
Informal classification: The audience for most theater modes is individual
players who want to study their own and other players' performance. However,
many theater modes also allow the player to cut and paste clips together,
eventually producing a video to share with others. The resulting video will
have the same features as a data video (see above). A playback session typically
uses a map view, so the bandwidth requirement is lower than for full-motion video, thus
reducing the cognitive load. One feature of most theater modes is that they make
it easy to navigate in 2D or 3D in the scene, thus changing the viewpoint. A
playback session is viewed in sequence. While the action itself cannot
be changed (since it represents events that already happened), viewpoint navigation is
powerful in that it can, for example, allow a player to put themselves in the shoes
of another player to see what an encounter looked like from their viewpoint.
3.3 Taxonomy Dimensions
We propose to identify, study, and classify media that have traditionally been
used for storytelling. The purpose of this activity is to expand existing genres of
narrative visualization to encompass the entire scope of storytelling in society. This
would generate a wealth of new research ideas for how to best use such media for
data-driven storytelling. We found the following dimensions useful in classifying
such media, based on our survey of the existing evidence of a wide variety of media
for data-driven storytelling:
• Audience Cardinality (A): Who is the intended recipient for the story?
• Space and Time (S/T): What is the temporal and spatial delivery mechanism
  for the story?
• Visual Components (VC): What are the visual and sound building blocks
  employed? Are the components interactive?
• Data Components (DC): How is the data conveyed to the viewer?
• Media Viewing Sequence (SQ): How is the media viewed by the viewer?
• Storage and Persistence (SP): How is the media preserved? Is it temporary
  or replicable?
The methods we used to derive these dimensions included reviewing a sample
of about one hundred and thirty examples of data-driven storytelling, and then
narrowing our selection down to a set of 91 representative examples. We looked at
the fundamentals of the process of creating data stories, broadcasting them,
and interacting with them. From the dimensions we selected, we are able to determine
a profile of a data-driven storytelling media type, so that the media types are
described in sufficient detail in our taxonomy. For example, the dimension Data Bandwidth
is highly correlated with Data Components, but this dimension is very important
for storytellers when choosing a media type to convey a large amount of data in real
time. These examples have been classified into the taxonomy and can be viewed in
Table 3.1.
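
The idea of a per-medium profile can be made concrete with a small data structure. The following is a minimal sketch, assuming that each of the six dimensions is stored as a field and validated against the abbreviations defined in the subsections of Section 3.3; the class name `MediaProfile` and its field names are hypothetical and are not part of any system described in this dissertation.

```python
from dataclasses import dataclass

# Value abbreviations as defined per dimension in Section 3.3.
AUDIENCE = {"1:1", "1:N", "N:N"}
SPACE = {"coloc", "distr"}
TIME = {"sync", "async"}
VISUAL = {"aud", "pho", "vid", "gra", "ani", "txt"}
DATA = {"tab", "map", "stat", "event", "time", "graph", "txtvis", "fmla"}
SEQUENCE = {"seq", "brch", "prll"}
STORAGE = {"eph", "rep"}


@dataclass(frozen=True)
class MediaProfile:
    """Hypothetical record with one value (or value set) per dimension."""
    name: str
    audience: str        # A
    space: str           # S/T, space half
    time: str            # S/T, time half
    visual: frozenset    # VC, set-valued
    data: frozenset      # DC, set-valued
    sequence: frozenset  # SQ, the tables list combinations such as seq | brch
    storage: str         # SP

    def __post_init__(self):
        # Single-valued dimensions must use one known abbreviation.
        single = [(self.audience, AUDIENCE), (self.space, SPACE),
                  (self.time, TIME), (self.storage, STORAGE)]
        for value, allowed in single:
            if value not in allowed:
                raise ValueError(f"unknown value: {value!r}")
        # Set-valued dimensions must be non-empty subsets of the known values.
        multi = [(self.visual, VISUAL), (self.data, DATA),
                 (self.sequence, SEQUENCE)]
        for values, allowed in multi:
            if not values or not set(values) <= allowed:
                raise ValueError(f"unknown or empty value set: {sorted(values)}")


# The documentary from Exhibit 1, using its classification in Table 3.1.
ocean = MediaProfile(
    name="Life of the Ocean",
    audience="1:N", space="distr", time="async",
    visual=frozenset({"vid"}),
    data=frozenset({"tab", "stat", "txtvis"}),
    sequence=frozenset({"seq"}),
    storage="rep",
)
print(ocean)
```

Making the record immutable and validating it on construction is one possible design choice; it keeps misspelled abbreviations out of a classification table at the point where they are entered.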
3.3.1 Audience Cardinality (A)
In our notion of the audience of the data-driven story, we also include the
storytellers: whether it is one or several people who are creating or viewing the
narrative, respectively. A group performance, such as the bubble sort dance in the
example above, would be an example of many storytellers (the dancers) conveying
data to many recipients (the audience in the dance studio). When choosing the
Audience Cardinality, the storytellers can adjust the way the story is conveyed.
For example, by changing the way a computer game is recorded, we can change the
media type from local video to online broadcasting, and thus change the
Audience Cardinality from one-to-one to one-to-many.
Audience has one of the values below:
• One-to-one (1:1): one storyteller and one recipient, such as in a private
  conversation. The traditional storytelling process from one storyteller to a
  single listener is a typical one-to-one process.
• One-to-many (1:N): one storyteller and many recipients, such as a speaker
  giving a talk to a group. Most media with broadcast ability are one-to-many,
  since one storyteller often creates the data story and communicates it to more
  than one recipient. Examples include the creation and broadcasting of
  infographics, data comics, and data videos.
• Many-to-many (N:N): many storytellers and many recipients, such as a dance
  troupe giving a performance to an audience. There are media types that
  allow multiple storytellers to broadcast at the same time. For example, in the
  advanced version of DataTV, the system allows more than one user to tell
  data stories at the same time.
Although the values of this dimension seem straightforward, there are
situations in which choosing a value can be hard. For example, during live
streaming of data stories, the video is broadcast to multiple viewers. This seems a typical
one-to-many cardinality. However, in extreme cases where there is only one
recipient, the broadcast of the media becomes one-to-one. In general, we only cover
the basic cases here.
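
The edge case described above can be illustrated by deriving the cardinality from head counts. This is a minimal sketch under the assumption that the numbers of storytellers and recipients are known; the function `audience_cardinality` is hypothetical. Because the taxonomy lists no many-to-one value, the sketch reports that case as an error rather than guessing a label.

```python
def audience_cardinality(storytellers: int, recipients: int) -> str:
    """Map head counts to one of the values 1:1, 1:N, or N:N."""
    if storytellers < 1 or recipients < 1:
        raise ValueError("need at least one storyteller and one recipient")
    if storytellers == 1:
        # A single storyteller is 1:1 or 1:N depending on the audience size.
        return "1:1" if recipients == 1 else "1:N"
    if recipients == 1:
        # Many-to-one is not among the listed values, so it is flagged.
        raise ValueError("many-to-one is not covered by the listed values")
    return "N:N"


if __name__ == "__main__":
    print(audience_cardinality(1, 250))  # a live stream with many viewers
    print(audience_cardinality(1, 1))    # the same stream with a single viewer
    print(audience_cardinality(6, 80))   # a dance troupe and its audience
```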
3.3.2 Space and Time (S/T)
We borrow the notion of space and time from CSCW [192], where the space-
time matrix has long been used to characterize forms of groupware based on the
spatial and temporal relations of the human users. Some media types naturally
require that the storytellers and recipients are located in the same place, such as
dancers and their audiences. Other media types, such as infographics, movies, and
comics, are easily recorded and broadcast. Their storytellers and recipients can
be in different places and time periods. It is a useful property because both space
and time have a significant impact on the delivery and storage mechanism for the
data-driven story.
Space and time has two values, one for each dimension:
• Space: relative physical locations of storyteller and recipient.
  - Co-located (coloc): the storyteller and the recipient are in the same
    physical space. Traditionally, the storyteller is in the same location as the
    audience when communicating a story, such as giving a speech, performing
    a dance, or giving a presentation in person.
  - Distributed (distr): the storyteller and the recipient are not in the same
    physical space. Some examples of this space attribute are media types
    using modern technology such as live video streaming, phone meetings,
    and other collaborative work tools.
• Time: temporal locations of storyteller and recipient.
  - Synchronous (sync): the storyteller is delivering the story to the
    recipient in real time. This time attribute applies when it is required that the
    storytellers and recipients share the same temporal locations. Such
    instances include media types that have a broadcasting ability
    and have no method of saving a record, such as live streaming and dance
    performances.
  - Asynchronous (async): the storyteller is delivering the story in a form
    that will be consumed by the recipient at a later time. Examples of
    this include media types that have default saving capabilities, such as
    infographics, in which the recipient can consume the story anytime after
    the story has been delivered.
The dimension values Co-located and Distributed are very important for
determining which media type to use when resources are limited. For example,
if the storyteller and audience are located separately, the storyteller has to choose
a media type that is able to convey the story in a Distributed way.
This dimension can be heavily affected by the habits and preferences of the
storytellers and recipients. For example, dance performances are often recorded on
video. As a result, the habitual way to tell this story is by distributing it in both time
and space, allowing recipients to view it at a different location in the future.
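
Choosing a medium under a space or time constraint, as described above, amounts to filtering classified examples by their S/T values. The following is a minimal sketch; the helper `media_for` and the `CATALOG` dictionary are hypothetical, and the five entries are a small hand-picked sample whose space and time values are copied from Table 3.1.

```python
# (space, time) values copied from rows of Table 3.1; a small sample only.
CATALOG = {
    "Dance of Sorting": ("coloc", "sync"),
    "Microsoft PowerPoint": ("coloc", "sync"),
    "Call of Duty: Black Ops": ("coloc", "async"),
    "Life of the Ocean": ("distr", "async"),
    "Graph Comics": ("distr", "async"),
}


def media_for(space=None, time=None, catalog=CATALOG):
    """Return the names of media matching the given space and/or time value.

    Passing None for a dimension leaves that dimension unconstrained.
    """
    matches = []
    for name, (s, t) in catalog.items():
        if (space is None or s == space) and (time is None or t == time):
            matches.append(name)
    return sorted(matches)


if __name__ == "__main__":
    # A storyteller who is separated from the audience needs a distributed medium.
    print(media_for(space="distr"))
    # Media in this sample that are consumed in real time.
    print(media_for(time="sync"))
```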
3.3.3 Visual Components (VC)
This dimension captures the composition of the media being used
for data-driven storytelling. Since we are typically talking about composite media
types, Visual Components is a set variable that can include one or several of the
values below:
• Audio (aud): audio, such as speech recordings, ambient noise, or sampled
  sound effects.
• Photographs (pho): pixmap images.
• Live video (vid): animated pixmap images with basic control.
• Static graphics (gra): non-dynamic vector graphics.
• Animated graphics (ani): dynamic vector graphics with control.
• Text (txt): textual representations.
We covered the media components that are used for data-driven storytelling, and
did not include other components that do not contribute as much. This list of media
components may not be complete, as we only covered the major media components
shown by the examples.
3.3.4 Data Components (DC)
The core purpose of a data-driven storytelling artifact is to convey data from
the storyteller to the viewer. The form that this takes is captured in the Data
Components (DC) dimension. For this dimension, we enumerate the most common
components used by the examples. When designing the Data Components for
data-driven storytelling media, storytellers choose how the data will
be presented. Our enumeration of the components is sufficient to cover most of the
common data expressions. For example, a table is used for structured
data, a map is used for geographical information, and a time visualization is used for
continuous time series. It is a set variable that takes one or several of the
values below:
• Table (tab)
• Map (map)
• Statistical graphics (stat)
• Discrete event visualization (event)
• Continuous time visualization (time)
• Graph visualization (graph)
• Text visualization (txtvis)
• Formula (fmla)
This dimension covers the major artifacts that carry data in the examples.
The list of values may also be incomplete because of the limited scope of
the examples.
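
Because Data Components is a set variable, a classification cell such as `tab | stat | txtvis` can be read as a set of the abbreviations listed above. The following is a minimal sketch of such a reader; the function `parse_components` is hypothetical, and rejecting unknown labels is an assumption of this sketch rather than a rule stated by the taxonomy.

```python
# Abbreviations of the Data Components values listed above.
DATA_COMPONENTS = {"tab", "map", "stat", "event", "time", "graph", "txtvis", "fmla"}


def parse_components(cell: str, allowed=DATA_COMPONENTS) -> frozenset:
    """Turn a cell such as 'tab | stat |time' into a validated set of labels."""
    # Split on the pipe and trim, so irregular spacing around '|' is tolerated.
    values = {part.strip() for part in cell.split("|") if part.strip()}
    if not values:
        raise ValueError("a set variable needs at least one value")
    unknown = values - set(allowed)
    if unknown:
        raise ValueError(f"unknown component(s): {sorted(unknown)}")
    return frozenset(values)


if __name__ == "__main__":
    print(sorted(parse_components("tab | stat |time | txtvis")))
    try:
        parse_components("map | events")   # 'events' is not a listed label
    except ValueError as err:
        print("rejected:", err)
```

The same function works for the other set-valued dimensions when a different set of allowed labels is passed in.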
3.3.5 Media Viewing Sequence (SQ)
Depending on the nature of the media type, some viewing sequences are better
accepted than others. For example, a comic strip can be viewed in sequence or with
branches in most cases. A video is mostly viewed in sequence. Infographics
can be viewed in parallel, which makes them capable of conveying information faster
in certain situations, but can also make them more difficult to understand [1].
• Sequence (seq): viewed linearly; the storyline is in sequence.
• Branch (brch): viewed in sequence, but with branches.
• Parallel (prll): can be viewed in parallel.
3.3.6 Storage and Persistence (SP)
While related to the delivery mechanism of the story artifact, the storage and
persistence aspects of the artifact govern how and when it can be consumed. This
dimension takes one of the following values:
• Ephemeral (eph): generated once and not recorded, not easily replicated.
  Some media types have stories received by the audience immediately after
  the stories are generated and told, such as gestures, speeches, and facial
  expressions. These types of storytelling rely on other tools to record
  them and are hard to replicate. They can be recorded by other
  media types, but we only consider the media types by themselves.
• Replicable (rep): generated once, but can be replicated by the storyteller on
  demand. Some media types are easily replicated and stored, such as video,
  audio, and infographics.
3.4 Examples Using the Taxonomy (Applications)
With our taxonomy, we explored a few applications with new designs. By
changing the values of certain dimensions, we were able to test or propose new
media for data-driven storytelling.
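
Using the taxonomy generatively can be pictured as enumerating combinations of dimension values and looking for those that no collected example occupies. The following is a minimal sketch restricted to three dimensions (A, S/T, and SP); the function `unexplored` is hypothetical, and the `OBSERVED` set is a sample of combinations copied from rows of Tables 3.1 and 3.3, not the full corpus. The enumeration is naive: it does not check whether a combination is meaningful.

```python
from itertools import product

AUDIENCE = ("1:1", "1:N", "N:N")
SPACE_TIME = ("coloc/sync", "coloc/async", "distr/sync", "distr/async")
STORAGE = ("eph", "rep")

# (A, S/T, SP) combinations taken from classified rows; a sample only.
OBSERVED = {
    ("1:1", "coloc/sync", "rep"),    # Video game replay
    ("1:1", "coloc/async", "rep"),   # Call of Duty: Black Ops
    ("1:N", "distr/async", "rep"),   # Life of the Ocean
    ("1:N", "distr/async", "eph"),   # Twittersheep
    ("1:N", "coloc/sync", "rep"),    # Microsoft PowerPoint
    ("1:N", "coloc/sync", "eph"),    # Uber Mobile Vis
    ("N:N", "distr/async", "rep"),   # HALO Reach
    ("N:N", "coloc/sync", "eph"),    # Dance of Sorting
}


def unexplored(observed):
    """Return every (A, S/T, SP) combination that is absent from `observed`."""
    all_combinations = set(product(AUDIENCE, SPACE_TIME, STORAGE))
    return sorted(all_combinations - set(observed))


if __name__ == "__main__":
    gaps = unexplored(OBSERVED)
    print(f"{len(gaps)} combinations without an example in this sample")
    for combination in gaps:
        print(combination)
```

In this sample, the one-to-many, distributed, and synchronous combinations are among the gaps; that profile matches how the preceding sections characterize live streaming.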
Media Evidence A S/T VC DC SQ SP
Video game replay 1:1 coloc/sync vid map | event seq rep
Life of the Ocean [189] 1:N distr/async vid tab | stat | txtvis seq rep
A Beautiful Planet [193] 1:N distr/async vid tab | stat | txtvis seq rep
News Irma Hurricane [5] 1:N distr/async vid map | graph seq rep
Height Gene [6] 1:N distr/async vid tab | stat | txtvis seq rep
NFL Players Data [22] 1:N distr/async gra | txt tab | stat | txtvis | event | time seq | brch rep
Graph Comics [151] 1:N distr/async gra | txt graph seq | brch rep
The Two Americas [194] 1:N distr/async gra | txt tab | stat | txtvis | event seq | brch rep
Strikeouts on the Rise [38] 1:N distr/async gra | txt tab | time | event prll rep
Missing Black Men [39] 1:N distr/async gra | txt tab | time | event prll rep
VisJocky for Stocks [40] 1:N distr/async gra | txt time | event seq rep
ChartAccent for Countries [41] 1:N distr/async gra | txt time | event seq rep
SketchStory [171] 1:N distr/async gra | txt tab | time | event | graph seq rep
Skecholution [42] 1:N distr/async gra event | graph seq rep
New Orleans Housing [46] 1:N distr/async gra | txt event | graph prll rep
Top writers for best sellers [47] 1:N distr/async gra | txt tab | event prll rep
Public Library Report [48] 1:N distr/async gra | txt tab | stat | txtvis | event prll rep
HALO Reach [65] N:N distr/async gra map seq rep
Call of Duty: Black Ops [66] 1:1 coloc/async gra event seq rep
StarCraft II [67] 1:1 coloc/async gra event seq rep
Twittersheep [68] 1:N distr/async txt txtvis seq eph
Twitter Interactive GOT [69] 1:N distr/async gra | txt graph | txtvis prll eph
Interactive: Tweets Spread [70] 1:N distr/async ani | txt map | stat | graph | time seq rep
Dance of Sorting [190] N:N coloc/sync music | txt txtvis seq eph
Microsoft PowerPoint 1:N coloc/sync aud | gra | ani | txt tab | stat | event | time | txtvis seq rep
IBM Many Eyes [143] 1:N distr/async gra | txt tab | stat | event | time | txtvis seq rep
Storyfy [195] 1:N distr/async gra | ani | txt tab | stat | event | time | txtvis seq rep
Table 3.1: Classification of our representative sample of different media used for data-driven storytelling (part one).
Media Evidence A S/T VC DC SQ SP
Height Gene [6] 1:N distr/async vid tab | stat | time | txtvis seq rep
Ancient Greece [7] 1:N distr/async vid tab | map | stat | graph | event seq rep
The history of Asia [8] 1:N distr/async vid map | graph seq rep
U.S. Wealth Inequality [9] 1:N distr/async vid stat | map | graph | txtvis seq rep
Saving Poor Children [20] 1:N distr/async vid stat seq rep
Carcaptor Sakura [44] 1:N distr/async gra | txt tab | stat | graph seq | brch eph
Movie Explained Animated [45] 1:N distr/async vid txtvis | event seq rep
Comic style Dashboard [23] 1:N distr/async gra | txt tab | stat | graph seq | brch rep
Marvel&DC Comic [26] 1:N distr/async gra | txt tab | stat seq | brch rep
Body Cartoon Comic [27] 1:N distr/async gra | txt stat | time seq | brch eph
Spiderman Suit [28] 1:N distr/async gra | txt txtvis seq | brch rep
Phone Comic [29] 1:N distr/async gra | txt stat | time seq | brch rep
Linear Regression [30] 1:N distr/async gra | txt tab | graph seq | brch rep
Curve Fitting [36] 1:N distr/async gra | txt tab | graph seq | brch rep
Seashell Comic [37] 1:N distr/async gra | txt fmla seq | brch rep
Infographic Comic [24] 1:N distr/async gra | txt event | stat seq | brch rep
DataSketches [43] 1:N distr/async gra | txt event | graph seq | brch eph
Shadow of Foreclosure [56] 1:N distr/async gra | txt map | event prll rep
US Poverty [49] 1:N distr/async gra | txt tab | stat | map prll rep
Democrats and Republicans [57] 1:N distr/async gra | txt tab | stat | txtvis prll rep
UK and US Firearms [58] 1:N distr/async gra | txt stat | time prll rep
Correspondent Dinner [59] 1:N distr/async gra | txt tab | event | txtvis prll rep
Day vs. Night: NYC [60] 1:N distr/async gra | txt map | event prll rep
Who owns everything [61] 1:N distr/async gra | txt graph | txtvis | event prll rep
Big Welsh Coast Walk [62] 1:N distr/async gra | txt tab | stat | txtvis prll rep
Hangry USA [63] 1:N distr/async gra | txt tab | stat | map prll rep
NYC Celebrity Map [64] 1:N distr/async gra | txt map prll rep
VR Data Analysis [86] 1:N distr/async txt | ani | gra tab | map | stat | event | time | txtvis seq rep
VR Baseball training [80] 1:N distr/async gra | txt | ani tab | stat seq rep
Adobe VR Data Vis [79] 1:N distr/async gra | ani | txt tab | stat | graph | time seq rep
Table 3.2: Classification of our representative sample of different media used for data-driven storytelling (part two).
Media Evidence A S/T VC DC SQ SP
Gene Pool Decline [12] 1:N distr/async vid stat | graph seq rep
The Joy of Stats [10] 1:N distr/async vid stat seq rep
End Poverty [13] 1:N distr/async vid tab | stat seq rep
China Geo Problem [14] 1:N distr/async vid map seq rep
Data Trans Biz [19] 1:N distr/async vid tab | stat | event | time seq rep
Bigdata Revolution [16] 1:N distr/async vid tab | stat | map | time seq rep
PhD Comic [21] 1:N distr/async gra | txt stat | txtvis | time seq | brch rep
Vocation Comic [31] 1:N distr/async gra | txt time seq | brch rep
Desk Entropy Comic [32] 1:N distr/async gra | txt time seq | brch rep
PhD Grooming Comic [33] 1:N distr/async gra | txt time seq | brch rep
Procrastination Comic [34] 1:N distr/async gra | txt time seq | brch rep
Procrastinator Mind [18] 1:N distr/async vid tab | stat | event seq rep
Imaginary Numbers [15] 1:N distr/async vid stat | txtvis seq rep
Religions and babies [11] 1:N distr/async vid tab | stat | map seq rep
Population truth [17] 1:N distr/async vid stat | txtvis seq rep
Household Pollution [53] 1:N distr/async gra | txt tab | txtvis prll rep
Air Pollution [54] 1:N distr/async gra | txt stat | txtvis prll rep
Yemen War [51] 1:N distr/async gra | txt map prll rep
London March [50] 1:N distr/async gra | txt stat | txtvis prll rep
Space Industry [52] 1:N distr/async gra | txt stat | txtvis prll rep
NYC Restaurant Vis [25] 1:N distr/async gra | txt tab | graph | stat | txtvis prll rep
VR Geo Map [85] 1:N distr/async gra | txt | tab | ani map | stat | txtvis seq rep
VR BioInfo Vis [84] 1:N distr/async gra | txt | ani tab | stat | graph | txtvis seq rep
VR Immersive [83] 1:N distr/async gra | txt | ani stat | tab seq rep
VR Lens [82] 1:N distr/async gra | txt | ani stat | tab | map seq rep
VR Big Data [81] 1:N distr/async gra | txt | ani stat | tab | map seq rep
AR Flight [74] 1:N distr/async gra | ani | txt map | stat | graph | time seq eph
AR Street View [75] 1:N distr/async gra | ani | txt map | stat | event seq rep
AR Pipeline [76] 1:N coloc/sync gra | ani | txt stat | graph | event seq eph
AR BioChemical [78] 1:N distr/async gra | ani | txt graph seq rep
AR Infrastructure [76] 1:N coloc/sync gra | ani | txt graph | time seq eph
Uber Mobile Vis [71] 1:N coloc/sync gra | ani | txt tab | stat | event | txtvis seq eph
AR Data Vis Design [72] 1:N distr/async gra | txt tab | stat | event | time | txtvis seq rep
AR 3D Design [73] 1:N distr/async gra | ani | txt tab | stat | graph seq rep
Table 3.3: Classification of our representative sample of different media used for data-driven storytelling (part three).
3.4.1 Data Comics
In an effort to show how sequential art, also known as comics, can be used
as a novel method for storytelling, we present a technique for
creating sequential art for data-driven storytelling. We developed the Data Comics
system following a design principle from our taxonomy: combining storytelling with
the comic medium [152].
The data comics technique, introduced in detail in appendix [152], allows
users to build narratives using comic layouts of panels containing both snapshots
and live visualizations. Comic features are seamlessly merged into the comic layout
panel. The storytelling process leverages the continuous frames to organize
the storyline, and comic features such as speech bubbles, comic figures, and
directional arrows to bring data-based storytelling into another dimension.
The above figure is an example of a data comic frame produced with our
design philosophy by the data comic system. The design of data-driven storytelling
in the data comic system is not much different from the traditional oral way: users are still
dominant in the system, as the system is not capable of generating the storyline by
itself. More specifically, users start by collecting data visualizations by cropping
a subgraph and deciding the size and location of the cropped part. Such cropping
allows the user to choose the level of focus and decontextualization, which encourages
users to summarize the most interesting part of the whole visualization. Then
they are free to map the data visualization to frames of any size and sequence.
3.4.2 Data TV
DataTV is a prototype system for authoring live-streaming data videos using
a single, integrated interface. The DataTV prototype is based on OBS Studio1, a
popular open-source project for game streaming that supports multiple
operating systems. The prototype supports three separate modes for (1) produc-
tion, (2) recording, and (3) editing, in a highly streamlined and optimized workflow
that allows a single content creator to control the entire process, even during live
streaming. The tool incorporates multimedia sources such as live webcams, live au-
dio recordings, web browsers, image viewers, and full-motion video. In particular, it
supports live recording of any selected window presumably containing an interactive
visualization, such as a web browser or dedicated application window.
Furthermore, the tool incorporates advanced video functionality, such as chroma
keying (making parts of a stream transparent, such as for blue or green screens),
picture-in-picture, hand-drawn annotations (for highlighting important parts of a
stream), viewport control (zooming and panning), and advanced source composition
operations (transitions, stretching, and fitting). To validate the DataTV platform,
we used three visualization tools from published work in the
information visualization community and created data videos analyzing a specific
topic with each of them. Furthermore, we asked two visualization experts to create a data
video less than one minute long with DataTV on a random topic to test whether our system
adapts to a wide range of topics.
1 Open Broadcaster Software: http://obsproject.com/
Our DataTV interface supplies as much space as needed for the web-based
visualization to support its interactive features (Figure 3.7).
3.5 Implication for the Taxonomy
We have labeled 91 existing examples with our taxonomy, as shown in
Tables 3.1, 3.2, and 3.3 and in Figure 3.5. The examples cover movies, documentaries,
web articles with data visualization, infographics, comics, social media,
visualization tools, games, dance, and sketching tools, which together span a large part of the
major categories for storytelling. This broad classification provides good
evidence that our taxonomy is sufficient and complete for the media types in our sample.
In Figure 3.5, thick paths indicate that
more examples are categorized with the values in these dimensions. The parts with
light paths or without paths are the gaps between examples. To fill the gaps between
the taxonomy and the examples, we can look for examples along light paths or among the
dimension values that have not been explored. For example, to create the media type
data comics from infographics, we can follow the design space and dimensions of
infographics, then change the viewing sequence from parallel to sequential or branching.
Using partitioning and sequencing, we can reorganize the content of infographics and
add static comic graphics to create data comics. The detailed process and evaluation
are described in Chapter 5.
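The gap-finding idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis behind Figure 3.5: the dimension names (`space_time`, `sequence`, `persistence`) and the candidate value `par` are hypothetical labels, and the four sample rows borrow only a few values from Table 3.3.

```python
from collections import Counter
from itertools import product

# Hypothetical dimension names and value sets (assumptions for illustration).
DIMENSIONS = {
    "space_time": ["distr/async", "coloc/sync"],
    "sequence": ["seq", "par"],
    "persistence": ["rep", "eph"],
}

# A few classified examples, using values that appear in Table 3.3.
EXAMPLES = [
    {"name": "VR Big Data", "space_time": "distr/async", "sequence": "seq", "persistence": "rep"},
    {"name": "AR Flight", "space_time": "distr/async", "sequence": "seq", "persistence": "eph"},
    {"name": "AR Pipeline", "space_time": "coloc/sync", "sequence": "seq", "persistence": "eph"},
    {"name": "AR 3D Design", "space_time": "distr/async", "sequence": "seq", "persistence": "rep"},
]


def path_weights(examples, dimensions):
    """Count how many examples follow each combination of dimension values."""
    keys = list(dimensions)
    return Counter(tuple(ex[k] for k in keys) for ex in examples)


def find_gaps(examples, dimensions):
    """Return the value combinations that no example covers (the missing paths)."""
    weights = path_weights(examples, dimensions)
    return [combo for combo in product(*dimensions.values()) if weights[combo] == 0]


weights = path_weights(EXAMPLES, DIMENSIONS)
gaps = find_gaps(EXAMPLES, DIMENSIONS)
```

Combinations with a high count correspond to thick paths, and the entries of `gaps` are candidate starting points for new media designs.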
From collecting the examples, we have found media types primarily in the
form of data videos (23%), infographics (14%), web articles (5%), visual analytics/visualization
tools (8%), data comics (18%), virtual reality visualization tools
(7%), augmented reality visualization tools (9%), and computer game tools (3%).
We also observe that the values of some dimensions are more diverse than others.
For example, the Data Component dimension has the most diversity, although
some examples share part of their values for this dimension.
Our taxonomy is an extension that builds on the foundation that Segel and
Heer [2] laid in 2010. While this foundation has proven instrumental in guiding
the development of data-driven storytelling, their model is limited in scope and conflates
the delivery mechanism with the media used for the message. We believe our
taxonomy provides a more comprehensive view of media for data-driven storytelling
while still building on their foundational work. Using our taxonomy, designers will
be able to widen the horizon of data-driven storytelling. By providing a taxonomy
with detailed dimensions, we explored the possible values for each dimension. By
expanding the list of dimensions and dimension values, we can also keep track of
the emerging media types. In particular, the terms and dimensions in our taxonomy
provide a standardized vocabulary to use when discussing data-driven storytelling.
This enables researchers and practitioners alike to classify their own work so that
existing and new media can be systematically organized with a common ground.
However, the true value of a taxonomy such as ours is in generating new ideas
by identifying gaps in the literature. By grouping and labeling the dimensions of
existing media, our taxonomy can help researchers identify new areas to explore
in the future. For example, this new design space can be generated by exploring
previously untested combinations of dimensions. While we have done so in the
previous section, specifically in terms of the Data Comics and Data TV applications,
the space is wide open for even more radical ideas. For example, consider employing
data-driven storytelling in e-learning, social media, or even video games. What
about interpretative dance, theater, improv, music, or even song for data-driven
storytelling? The possibilities are endless.
Figure 3.5: Parallel coordinates plot of the design space of the examples categorized by the taxonomy.
Figure 3.6: Data Comic on the European debt crisis.
Figure 3.7: Webcam and Keshif visualization being recorded for the Nobel Prize
Winner data video. A data video can be made with user as the presenter.
Chapter 4: Designing Data-Driven Tools for Casual Users
The overall purpose of this Ph.D. thesis is to expand the horizon of media
for data-driven storytelling to aid casual users in viewing, analyzing, and
understanding data. Traditionally, infovis systems are designed for experts and
professionals with strong domain knowledge.
4.1 Casual Users
To reach a larger population, including everyday internet users, the elderly,
and young people, we define casual users as the group of consumers of information
such as data stories, who leverage storytelling and infovis systems to gain meaningful
information for both everyday work and non-work situations.
4.2 Casual Information Visualization
Pousman et al. [88] proposed casual information visualization by expanding
traditional visualization research to edge cases such as ambient infovis, social infovis,
and artistic infovis. These three sub-domains seem far from core infovis research.
• Ambient Infovis are systems that can be loosely defined as infovis, with very
sparse and abstract expressions of data.
• Social Infovis are collaborative visualization systems that work on social
information.
• Artistic Infovis are information visualization systems that work on data-driven
art.
None of these kinds of infovis is strictly bounded by the traditional
visualization tools widely used by professionals, such as Tableau [135],
Microsoft Power BI [196], and Many Eyes [143].
4.3 Why Data-driven Storytelling for Casual Users
To make a greater impact on the largest user group, we choose to focus on
casual users. As smartphones are widely adopted by people of all ages, from
young kids to the elderly, a large amount of computing power is widely available as well. We want
to leverage this computing power to close the data gap between experts and
casual users with data-driven storytelling media.
In today's internet world, the majority of users are young and open to new
media types. One of the most popular YouTubers, "PewDiePie" [197], who makes
original videos and live streams on YouTube for a living, has a subscriber count of
70 million, which is larger than the population of some large countries.
For casual users without professional training, using visualization tools
can be both an opportunity and a challenge. Visualization tools already
provide a powerful way to analyze, understand, present, and interpret raw data,
but current visualization tools are often too difficult for inexperienced users.
4.4 Which Media to Use
Most of the media types on the internet, such as online articles, online videos,
live streaming, and online infographics, are well accepted by casual users. For
instance, we developed DataTV using live streaming and data visualization to help
broadcast data videos to casual users. We also developed Data Comics with comics
as the medium, as well as an online application for creating data comics.
Chapter 5: Data Comics
In this chapter, we present Data Comics, a technique that juxtaposes
multiple visualizations in comic strip layouts for casual users as consumers
of information. A Data Comic consists of a sequence of panels, each annotated
with both visual and textual elements, arranged so that the sequence progressively
develops the overarching story told in the comic.
To facilitate the creation of Data Comics, we present DataComicsJS (Fig-
ure 5.1(b)), a Google Chrome extension that consists of four components: (1) the
Clipper, for collecting both snapshots of visualizations and images as well as raw
data from any webpage viewed in the browser; (2) the Decorator, for editing the vi-
sual design of an individual panel, including clips, images, captions, and comic-style
visual elements; (3) the Composer, for managing the layout, size, and position of
panels making up the comic; and (4) the Presenter, for ultimately allowing a viewer
to navigate in a finished Data Comic, including viewing the entire comic as a whole,
as well as viewing single panels in sequence.
5.1 Definition
Data Comics is a visual storytelling method based on sequential images consisting
of data-driven visual representations. Its purpose is to support expert
users in building engaging narratives about data. Our inspiration for this method came
from several sources, including the recent focus on storytelling for visualization [198],
the increasing use of comics for "serious" applications (e.g., [175, 199, 200]), and the
EuroVis 2011 keynote by Scott McCloud on comics [156].
Our motivation is to take advantage of both the plethora of existing visual-
izations on the web and the familiar visual language of comics, including layout,
characters, and comic elements such as motion lines, speech bubbles, and arrows.
Below we review an operational model for data comics and then discuss each of
its aspects, including creating panels, managing their layout, and letting a viewer
navigate the comic.
5.1.1 Basic Model
For the purposes of this chapter, a comic consists of a sequence of panels
organized into one-dimensional tiers (or strips) and separated by gutters, or spacing,
between the panels [149, 175]. The panels in a tier are organized to be read from
left to right to form a narrative (at least in Western cultures). Tiers can in turn be
organized into pages, where each tier becomes a row separated by a vertical gutter,
and several pages can be linked together into a book (or comic book).
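The panel, tier, page, and book hierarchy described above can be written down as a small data model. The following Python sketch is only an illustration; the class names mirror the terms in the text, while the `caption` field and the `reading_order` helper are hypothetical and not part of DataComicsJS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Panel:
    """One frame of the comic; `caption` is a hypothetical field."""
    caption: str


@dataclass
class Tier:
    """A one-dimensional strip of panels, read left to right."""
    panels: List[Panel] = field(default_factory=list)


@dataclass
class Page:
    """A stack of tiers, read top to bottom."""
    tiers: List[Tier] = field(default_factory=list)


@dataclass
class Book:
    """Several pages linked together."""
    pages: List[Page] = field(default_factory=list)

    def reading_order(self) -> List[Panel]:
        """Flatten the book into the Western reading order of its panels."""
        return [p for page in self.pages for tier in page.tiers for p in tier.panels]


book = Book(pages=[
    Page(tiers=[
        Tier(panels=[Panel("A"), Panel("B")]),
        Tier(panels=[Panel("C")]),
    ]),
    Page(tiers=[Tier(panels=[Panel("D")])]),
])
order = [p.caption for p in book.reading_order()]
```

Flattening the hierarchy this way yields the linear sequence that a single-panel viewing mode would step through.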
5.1.2 Panel Content
Unlike a normal comic, most panels in a Data Comic consist of visualizations
that convey information using graphical means.1 These could be simple and familiar
statistical graphics such as bar charts, time-series charts, and pie charts, or more
advanced visualizations such as treemaps [201], node-link diagrams, or even parallel
coordinate plots [202], all depending on the visualization literacy of the intended
audience and the instructional annotations in the panel.
Because of this focus on data-driven graphics, the designer is largely relieved
from creating artistic content, which requires drawing skills that few people
have. Instead, the visual content can be constructed by either creating entirely new
visualizations from raw data, or by clipping a snapshot from an existing visualiza-
tion.
5.1.3 Characters, Annotation, and Effects
A Data Comic would not truly be a comic if it did not also leverage the visual
language of comics. Designers creating Data Comics can be given access to this in
several ways:
Comic-style rendering: To emphasize the comic medium, content can be
drawn using non-photorealistic rendering (e.g., [203]).
Characters: Characters often drive the narrative and complement the data-driven
visualizations. Because this requires artistic talent, designers should be given
a library of characters.
Comic elements: Designers should also be given access to common visual
elements used in comics, such as motion lines, highlights, or even onomatopoeia
(words that mimic sounds).
Captions, speech, thoughts: Visuals are often scaffolded by text in captions
as well as speech and thought balloons [149, 175].
1For engagement and effect, a few panels may consist solely of artistic content, but this puts
a corresponding artistic burden on the designer.
5.1.4 Layout Management
The layout of a Data Comic, that is, the organization of panels into tiers and pages,
is an important consideration in creating a narrative. To make constructing
a narrative easy, the Data Comics model should allow a designer to change the
order of panels freely.
5.1.5 Viewing
Finally, after a Data Comic has been created, its purpose is to be viewed
by its intended audience to convey its designer's story (and message). Just like a
traditional comic, the default view for a Data Comic is to view an entire page, with
all of the panels visible. Since screens are different from the written page, however,
it also makes sense to support a single-panel navigation mode, where the viewer
can sequentially navigate backwards and forwards in the comic. This is not unlike
traditional slideshows in PowerPoint or Keynote.
5.2 Implementing Data Comics
We have implemented Data Comics as a web application called DataComicsJS.
It is a hybrid application consisting of both client-side and server-side components.
Client-side components are built using jQuery for DOM manipulation and
D3 for visualization. The content is stored on the server-side backend, implemented
as a simple Python server communicating using JSON-RPC.
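To make the client-server exchange concrete, the following Python sketch shows what a JSON-RPC round trip for saving and loading a comic could look like. It is an illustration under assumptions, not the DataComicsJS backend: we assume JSON-RPC 2.0 message framing, and the method names `save_comic` and `load_comic`, the parameter names, and the in-memory `store` are all hypothetical.

```python
import json


def make_request(method, params, request_id):
    """Build a JSON-RPC 2.0 request object as a JSON string."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": request_id})


def handle_request(raw, store):
    """Dispatch a request against an in-memory store and return the response string."""
    request = json.loads(raw)
    params = request.get("params", {})
    if request["method"] == "save_comic":  # hypothetical method name
        store[params["comic_id"]] = params["panels"]
        result = True
    elif request["method"] == "load_comic":  # hypothetical method name
        result = store.get(params["comic_id"])
    else:
        # -32601 is the JSON-RPC 2.0 error code for "Method not found".
        return json.dumps({"jsonrpc": "2.0", "id": request["id"],
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": request["id"]})


store = {}
handle_request(make_request("save_comic", {"comic_id": "demo", "panels": ["p1", "p2"]}, 1), store)
response = json.loads(handle_request(make_request("load_comic", {"comic_id": "demo"}, 2), store))
unknown = json.loads(handle_request(make_request("delete_comic", {}, 3), store))
```

A real server would add transport (for example HTTP), validation, and persistent storage; the sketch only shows the message shapes.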
(a) Composer in DataComicsJS. (b) Data importing.
Figure 5.1: (a) Basic DataComicsJS interface with the Composer in the middle of
the workspace and the Decorator on the left side. The Decorator is composed of four
expandable menus: data importing, operations, comic elements, and the collection
of clips. (b) Data import is used to load and update visualization clips from a
server, where the user can also save finished work and reload it for presentation.
SVG clips are the content captured by the Clipper.
5.2.1 Clipper
The Clipper component of DataComicsJS is implemented as a Google Chrome
extension that users download and install in their local browser. This allows the
system to integrate with and extend the browser so that users can clip content from
any website they visit. This is achieved by traversing the DOM from the element
that the user indicates. Three different types of content can be clipped: (1) Raw
data: structured data, such as in an HTML table element or tab-delimited text, can
be parsed and saved; (2) Snapshot (SVG): part or all of an SVG element can be
clipped, including any CSS that affects its appearance; and (3) Snapshot (raster): a
specified bounding box of the webpage can also be clipped as a raster screenshot.
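As an illustration of the raw-data case, the following Python sketch parses tab-delimited text into rows keyed by column header. It is not the Clipper's code (the Clipper runs as a Chrome extension inside the browser); the function name `parse_tab_delimited` and the sample values are hypothetical and serve only to show the parsing step.

```python
import csv
import io


def parse_tab_delimited(text):
    """Parse tab-delimited text into a list of row dictionaries keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Made-up sample values, used only to exercise the parser.
sample = "decade\tbirths\n1960\t100\n1970\t80\n"
rows = parse_tab_delimited(sample)
```

All parsed values are strings; converting them to numbers would be a separate step before building a visualization from the clipped data.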
5.2.2 Decorator
The panel is the basic building block of a Data Comic, and panels are created
using the Decorator, which consists of a toolbox (graphical tools such as translation,
scaling, rotation, annotation, and cropping), the workspace (a blank area with panel
borders that can be dragged to change the aspect ratio of the panel), the clip
collection (the visual and raw data content that the user has collected using the
Clipper; the user can simply drag and drop elements from the collection, or drop
raw datasets to bring up a dialog to select the visualization), and a symbols library
(comic-style elements and direct access to the Noun Project free icon library).
5.2.3 Composer
The Composer gives the user control over the layout and organization of panels
for a specific Data Comic. The layout flows to fit the page from left to right, top to
bottom based on the number of panels and the page width. The interface provides
buttons for adding and removing panels as well as dragging and dropping them to
change their order. Double-clicking on a panel will open it in the Decorator.
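The flowing layout can be approximated with a greedy packing rule. The Python sketch below is one plausible reading of "left to right, top to bottom", not the Composer's actual algorithm: the function name `flow_layout`, the pixel widths, and the `gutter` parameter are assumptions made for illustration.

```python
def flow_layout(panel_widths, page_width, gutter=10):
    """Greedily pack panels into tiers, left to right, then top to bottom.

    Returns a list of tiers; each tier is a list of panel indices.
    A panel wider than the page still gets a tier of its own.
    """
    tiers, current, used = [], [], 0
    for index, width in enumerate(panel_widths):
        # Width the current tier would occupy if this panel were appended.
        needed = width if not current else used + gutter + width
        if current and needed > page_width:
            # The panel does not fit: close the tier and start a new one.
            tiers.append(current)
            current, needed = [], width
        current.append(index)
        used = needed
    if current:
        tiers.append(current)
    return tiers


tiers = flow_layout([300, 300, 300, 600, 200], page_width=800)
```

Because the result depends only on panel order, panel widths, and page width, reordering panels by drag and drop simply means rerunning the function on the new order.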
Figure 5.2: Viewing individual panels using the Presenter in slideshow mode. But-
tons allow for navigating slides in sequence.
(a) Text driving narrative. (b) Seeing the whole from parts. (c) Tiers and pages.
Figure 5.3: (a) The use of text to drive narrative. (b) Encapsulation of moments in
panels juxtaposed yield closure as the viewer connects them and sees the whole. (c)
Panels organized into tiers with horizontal gutters; multiple tiers form pages with
vertical gutters.
5.2.4 Presenter
Once the designer has created and exported a Data Comic, he or she can share
it widely using a unique URL that can be sent via e-mail, shared on social media,
or posted on a forum. Everyone opening the URL will be given the Presenter view
of the comic, which is a read-only viewer. The Presenter supports two viewing (and
printing) modes: (1) viewing the entire comic, as is common for normal comics,
and (2) viewing individual panels by navigating using on-screen buttons or the
arrow keys (Figure 5.2). The latter presentation mode is implemented with smooth
animated transitions from one panel to the next to reinforce the comic metaphor.
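The single-panel mode amounts to moving an index back and forth over the sequence of panels. The Python sketch below illustrates that state handling only; the class name `PanelNavigator` is hypothetical, and stopping at the first and last panel (rather than wrapping around) is an assumption, since the behavior at the ends is not specified here.

```python
class PanelNavigator:
    """Track the current panel in single-panel (slideshow) mode."""

    def __init__(self, panel_count):
        if panel_count < 1:
            raise ValueError("a comic needs at least one panel")
        self.panel_count = panel_count
        self.index = 0

    def forward(self):
        """Advance one panel, stopping at the last one."""
        self.index = min(self.index + 1, self.panel_count - 1)
        return self.index

    def backward(self):
        """Go back one panel, stopping at the first one."""
        self.index = max(self.index - 1, 0)
        return self.index


nav = PanelNavigator(panel_count=3)
visited = [nav.forward(), nav.forward(), nav.forward(), nav.backward()]
```

In a browser implementation, the on-screen buttons and arrow keys would call the equivalents of `forward` and `backward` and then trigger the animated transition to the new panel.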
5.3 Telling Stories Using Data Comics
Creating a Data Comic requires some knowledge of the narrative style of
comics; the layout, content, and size of panels; as well as the use of text in driving
the story. Here we draw on the general literature of comics to operationalize these
concepts for data stories [175].
5.3.1 Comics Narration
While comics narration is generally linear, the medium by necessity cannot
present the continuous narrative made possible by media such as digital video or
film. Instead, comics narration depends on encapsulation, the focus on particular
moments in the possible narrative arc to be represented in individual panels. Writers
and pencillers exercise syntagmatic choice in establishing a series of juxtaposed
images in the panels. The edges of the panels represent the limits of representation
within this smaller bit of the page layout, but each panel also contributes to the
larger unit, leading to an interpretive experience that is both linear and holistic since
each panel is interpreted individually but also as part of the larger page. The gutters,
while delineating the limits of each panel, also indicate the necessity for the reader
to engage in closure in order to generate a more coherent narrative (Figure 5.3(b)).
The progression of syntagmatically related panels can follow a number of pat-
terns. The progression may be temporal, following the same event as it unfolds in
time. It may be spatial or spatial/temporal, moving between a number of locations
at a given moment or during a specified time, or else presenting a series of localized
micro-events happening within a larger framework (akin to a series of reaction shots
to the same event used in film). Panels could also relate conceptually, presenting a
series of related abstractions and calling upon the reader to form associations be-
tween them. The panels in Figure 5.5, operating in a relatively static timeframe and
without a specific location, might best be thought of as such a conceptual arrange-
ment, isolating and explaining particular phenomena within the broader structure
of the U.S. Census data.
5.3.2 Panel Content and Size
The comic panel is among the most fungible methods of representation, capable
of presenting visual data ranging from a full landscape or urban horizon to a mid-range
"street scene" shot to a close-up portrait shot or even the so-called extreme
close-up of a particular detail. While layout possibilities are nearly limitless, most
American (or European)-produced comics follow some variation of a grid pattern
in which rows are read left to right while proceeding from the top to the bottom
of the page. Panels might all be of equal size, as in Figure 5.5, but it is typical
to encounter panels with particularly important details occupying more space on
the page (Figure 5.3(c)). The extreme of this narration is the "splash page," in
which one panel occupies an entire page, or even two pages, a rare panel known as
a "double-page spread."
The combination of larger and smaller panels can often be used to convey a
large quantity of visual information in a large panel while highlighting particular
elements of the larger panel's content in smaller panels, such as presenting a landscape
or a cutaway view of a house in a larger panel and then presenting close-ups of
various visual elements in smaller panels collected near the larger panel. (The visual
content need not be a natural or built environment, however: a similar layout for
Figure 5.5 with a single large panel depicting the overview of the U.S. Census data
could readily be supplemented by rows of smaller panels presenting the individual
elements of the data set.)
5.3.3 Textual Narration
Comics panels may feature dialogue or textual narration, or these elements
may be omitted, a stylistic choice embedded in the form's eighteenth-century origins.
The satirical engravings of the British artist William Hogarth generally omitted dialogue,
whereas political cartoons dating from the eighteenth and nineteenth centuries
often featured multiple characters speaking (to all appearances simultaneously) in a
one-panel image. Narration and dialogue are typically omitted for action sequences,
focusing the reader's attention on the implied movement of the characters or objects
in the panels. Dialogue is generally placed in speech balloons that are superimposed
on the image. Narrative captions can be placed in a box, usually located at the top
of the panel, and delivered in the first, second, or third person (Figure 5.3(a)).
An entirely different method of narration is available in the form of direct
address using a character in the comic, usually anthropomorphized to one degree
or another. Breaking the so-called "fourth wall" between the page and the reader,
such characters can function as avatars drawing the reader into the narrative or
as interlocutors presenting essential information to the reader. (The narrator in
Figure 5.5 represents the latter approach.) Scott McCloud combines both functions
in his Understanding Comics [175] guide to the visual interpretation of the medium,
deploying a cartooned avatar of himself to walk the reader through the essential
physical elements of comics layouts and illustration styles while also dispensing
information necessary to the reader's interpretation of these elements.
(a) The Greek debt crisis. (b) Journals in neuroscience.
Figure 5.4: Examples of Data Comics. (a) The role of Greece in the European debt
crisis. (b) Trends in scientific journals in neuroscience.
5.4 Data Comics Examples
To validate the Data Comics method and give a concrete idea of what it is
capable of, we present a few examples based on real-life data. The inspiration comes
from online data visualizations and infographics we experience on a daily basis. Our
work here reformulates some of these insights using the comics medium.
Figure 5.5: Data Comics of the U.S. baby boom of the 1960s.
5.4.1 Euro Debt Crisis
Dataset: The comic is based on a European Debt visualization available from
The New York Times.2 The visualization is a flow chart describing the debt relations
between major European countries, the United States, and Japan.
Insight: The original visualization shows the direction and amount of the debt.
It is obvious that the debt system is tangled, and the debt is huge. The Data Comic
can tell the story in smaller, more manageable chunks.
Construction and Visualization: We took a snapshot of the visualization and added a comic
figure and a comic-style text diagram to it. Figure 5.4(a) shows that the Greek debt
crisis has a strongly negative impact on the economy of the whole of Europe. The exaggerated
lightning sign and the character figure are intended to show the seriousness
of this crisis while adding a human angle. This shows how the Data Comic method
can change the style of an existing visualization.
2http://www.nytimes.com/interactive/2011/10/23/sunday-review/an-overview-of-the-euro-crisis.html
5.4.2 U.S. Census Population Pyramid
Dataset: Figure 5.5 is the Data Comics version of a U.S. Census dataset from
the 1960s to the 2000s.3
Insight: The horizontal bar chart illustrates a population distribution based
on an age scale and a time scale in decades, partitioned by gender. We clearly see
the babies born in each decade and the trend of the population growth. We want
to tell a story about population growth during and after the baby boom to reveal
how births are the key factor shaping the population structure.
Construction and Visualization: To track the changes in births during
the period from 1960 to 2000, we captured the bar charts of the population pyramid for
this time period, and then generated a line chart based on these data. Within
each panel, we labeled dimensions such as age, number of babies, and changes in
births from previous years. The panels clearly show that the number of babies born
decreased during the middle two of the four decades. Later, the number of babies
born increased in the last bar chart.
5.4.3 Scientific Journal Comparisons
Dataset: The data is from a visualization comparing publication counts over
time in scientific journals covering different topics in neuroscience and brain stimulation.
The data was gathered from the U.S. National Library of Medicine.4
Insight: The comparisons among different publications over time are widely
spread across the visualization space. Through comparison, we found that the journal
Brain Stimulation shows steady growth in the field of transcranial magnetic stimulation.
3http://vis.stanford.edu/jheer/d3/pyramid/shift.html
Construction and Visualization: We captured the visualization, and then
added a hand-shaped indicator and a female figure to help guide the narrative.
In Figure 5.4(b), which is part of the six-panel story we built for this topic, the
Data Comic acts as a personalized guide highlighting the differences for
comparison.
5.5 Evaluation
We focus on two separate aspects of Data Comics in this chapter: first, how
an analyst can go about authoring a data comic, and second, how an audience will
respond to the presentation using a data comic. No single user study can explore
both of these aspects, so below we report on two separate studies: authoring as well
as presentation.
5.5.1 Study 1: Authoring Data Comics
Our primary contribution in this work is the concept of Data Comics rather
than the prototype tool we have built to facilitate authoring them. After all, given
sufficient time and effort, the comics shown in this chapter can be replicated using
traditional drawing tools. However, we still wanted to study how the presence
of a specialized authoring tool affected their creation. Our hypothesis is that when
explicitly framing data-driven storytelling through the lens of comics, new modalities
for visual communication arise.
4http://neuralengr.com/asifr/journals/
5.5.1.1 Method
We performed an expert review with two data visualization professionals using
the basic method proposed by Tory and Möller [204]. Two independent experts from
a neighboring research group at our university used DataComicsJS to create a three-panel
data comic based on their current work. These two experts have extensive
experience with and knowledge of data visualization theory and tools. They were
asked to follow an informal think-aloud protocol and were allotted a session of one
hour in total.
P1: Visual Analytics Researcher: P1 is a visual analytics and human-
computer interaction researcher at our university working on online visualization
system design with more than three years of experience.
During the expert review, P1 raised several points: (1) Better insight. The
process of having to determine a narrative structure for the data comic helped the
expert gain a better understanding of the data. This may be common to storytelling
in general, but the expert felt that the highly visual and visceral comic medium made
this particularly clear. (2) Better evidence. The process of assembling sufficient
material to compose the story was also beneficial in general in collecting evidence
for the comic. (3) Thinking in comics. The expert noted that the fact that our
DataComicsJS tool gives ready access to the visual language of comics (including
layout, narrative structure, and visual elements) reduced the workload and helped
P1 to think purely of comic storytelling rather than the mechanics of the interface.
(4) Web integration. Being able to effortlessly clip imagery and data from the web
was cited by the expert as one of the main benefits of the tool.
P2: Information Visualization Researcher: P2 is a researcher specialized
in web-based information visualization system design, particularly for big data. P2
has more than three years of experience in designing infographics and interactive
dashboards.
During the expert review, P2 raised several points: (1) Accessibility. Comics,
by their nature, tell stories in a very friendly and accessible way, which can be beneficial
for data that is complex or even intimidating. (2) Sequence. Infographics typically
rely on layout to deliver the story, and there is no explicit sequence. If readers do
not follow the layout in the way designed by the creator, they can lose the causality
of the story. Comics use a pre-determined temporal sequence where the causal
relation is obvious (see Section 5.3). (3) Motivational. The expert (P2) noted that
creating data stories in a comic style was intrinsically fun and motivated the design
process. It also invited thinking about how to best present the data using the visual
language of comics.
5.5.2 Study 2: Presenting Data Comics
Our working hypothesis in this chapter is that Data Comics provide a com-
pelling way of telling stories about data. To empirically explore the virtues of this
premise, we conducted a qualitative user study comparing Data Comics to traditional
PowerPoint slideshows. The reason we chose PowerPoint is not to prove that
our prototype implementation, DataComicsJS, is in any way superior to PowerPoint
or any other presentation software, but to pick an application and style of presentation
widely used in the real world. Here we describe the methods and results from
this evaluation.
5.5.2.1 Participants
Participants: We recruited 12 paid participants (6 male, 6 female) to par-
ticipate in the user study. The participants were self-selected from the student
population at our university, were between 20 and 31 years of age, had normal
or corrected-to-normal vision, and were proficient computer users (all demographics
were self-reported).
5.5.2.2 Apparatus
Apparatus: We conducted the experiment on a laptop computer equipped
with a 15-inch 1280 × 800 LCD screen, a standard keyboard, and a three-button
mouse. Both the Data Comics prototype and Microsoft PowerPoint were maximized
to fill the screen during the experiment.
5.5.2.3 Task and Datasets
Each trial consisted of the participant using either MS
PowerPoint or Data Comics to answer questions about a data story. Each story
consisted of several panels of narration. The number of panels was limited to five
or six to keep the simplicity of layout while providing sufficient information for the
story. Each story presentation focused on a single topic of visualized data and came
with an associated list of 7 to 9 questions designed to make the participant focus on
details of the visualization. Participants had access to all questions during the time
they were interacting with the story.
We created four stories for the evaluation: Twitter heatmap for stocks (S1),
the U.S. Census Population pyramid (S2), world happiness (S3), and Star Wars
character fans' personality rating (S4).
(a) Panel in a DataComicsJS story. (b) Slide in a PowerPoint story.
Figure 5.6: A comparison of the story frames of Birth data from the U.S. Census
Bureau with DataComics and PowerPoint. Stories are composed with different
methods but with one-to-one correspondence in details to make the user study fair.
The stories were first created as a Data Comic by clipping from online data
sources and visualizations, creating an appropriate number of panels to tell the story,
and finally decorating the panels with comic annotations, characters, and captions.
We then created a corresponding PowerPoint slideshow with the exact same num-
ber of slides as the number of panels in the Data Comic. There was a one-to-one
correspondence between the visualizations and captions of the panels and slides,
as shown in Figure 5.6; the only difference between the two versions was that the PowerPoint
slideshow only used visualizations and text, whereas the Data Comic included char-
acters and comic-style symbology. Furthermore, while the text explanations were
identical across versions, the Data Comic version integrated them in comic-style
captions or speech and thought balloons.
5.5.2.4 Metrics
Our focus with the evaluation was not to primarily study quantitative metrics,
such as time and accuracy, but to collect subjective and qualitative feedback on
the difference between Data Comics and traditional slideshow presentations. For
this reason, we developed a questionnaire polling participants on their subjective
experience of a story. This was administered to participants directly after each
story, and consisted of the following 1-5 Likert-scale questions: engagement, speed,
space efficiency, ease of use, and enjoyability. We also asked participants for general
feedback on the tool.
5.5.2.5 Factors
We included two factors in the experiment, described below.
Presentation (P): This factor modeled the presentation technique P given
for solving questions in a trial: 1) Data Comics: The narrative visualization was
presented as a Data Comic in the Presenter of our prototype implementation.
Participants were able to view the entire comic and navigate panel by panel in the
comic. 2) PowerPoint: The narrative was presented as a PowerPoint slideshow.
Participants were able to navigate backwards and forwards in the slideshow. They could
also view all of the slides at once in the "slide sorter" view.
Story (S): We also hypothesized that the specific story and topic of the data
visualization may impact our outcome. Thus we added a factor S to model the
different stories.
Figure 5.7: Comparison between DataComics (DC) and PowerPoint (PPT) of sub-
jective ratings (Likert 1-5 scale) for engagement, speed, space-efficiency, ease of use,
and enjoyability.
5.5.2.6 Procedure
An experimental session started with the participant arriving, reading and
signing the consent form, and being assigned an identifier and story order. The
administrator then explained the general goals and task. Each trial started with the
administrator demonstrating how to use a Data Comic. The participant was then
given two examples, one Data Comic and one PowerPoint, and was allowed to ask
questions about the examples and task during this time.
When the participant finished the training, they were given a story opened
in the appropriate tool and a paper sheet with questions. They were given up
to 10 minutes to answer the questions, and were encouraged to use all of the time.
After answering all questions, the participant was given the subjective questionnaire
polling their experience in the trial. This was repeated for all four stories: two using
Data Comics, two using PowerPoint. A full session lasted approximately 50 minutes,
including training and questionnaires.
5.5.2.7 Quantitative Results
Figure 5.7 depicts boxplots of the subjective ratings for both Data Comics and
PowerPoint on engagement, speed, space-efficiency, ease of use, and enjoyability
(Q1 through Q5). We analyzed the 5-point Likert scale of subjective ratings for
effects of presentation technique P (Data Comic vs. PowerPoint), and found that
engagement (Q1), space-efficiency (Q3), and enjoyability (Q5) were significantly different
between the two techniques (Friedman tests, p < .05), but speed (Q2) and
ease of use (Q4) showed no significant difference (Friedman tests, p = .51 and p = .08,
respectively). We also found no significant effect of story S on any of the metrics.
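The Friedman analysis reported above can be sketched as follows. This is a minimal, standard-library-only illustration and not the analysis script used in the study: the ratings are made-up placeholder values, and `average_ranks`, `chi2_sf`, and `friedman_test` are hypothetical helper names. The sketch ranks each participant's ratings across conditions (averaging ties), computes the tie-corrected Friedman statistic, and compares it against a chi-square distribution with k - 1 degrees of freedom.

```python
import math

def average_ranks(values):
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper tail probability of the chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, x/2)
    while a < df / 2:
        # Recurrence: Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1)
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return q

def friedman_test(blocks):
    """blocks: one row per participant, one column per condition."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for v in set(row):
            t = row.count(v)
            tie_term += t * (t * t - 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        return 0.0, 1.0  # every participant rated all conditions equally
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    stat /= correction
    return stat, chi2_sf(stat, k - 1)

# Placeholder 1-5 ratings (NOT the study data): rows are participants,
# columns are (Data Comic, PowerPoint).
ratings = [(5, 3), (4, 3), (4, 4), (5, 2), (3, 4), (5, 4)]
stat, p = friedman_test(ratings)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

With only two conditions the Friedman test reduces to a sign test on the untied pairs; the same function accepts rows with more columns when more than two techniques are compared.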
5.5.2.8 Qualitative Feedback
Inviting Reading: Nine out of twelve participants mentioned that the comic-
style rendering helped them view the materials as a whole story from the very
beginning without any explicit direction. They noted that the speech balloon helps
focus by creating a feeling that there is a virtual conversation going on, and the comic
figure kept them more involved in the scenario of the story. All these comments from
the participants suggest that Data Comics invite reading, even when incorporating
only simple and trivial comic elements.
Viewing as a Story: The Presenter is organized to show not just the current
panel, but also the two surrounding ones (Figure 5.2). While this nominally is a
waste of visual space (a slideshow shows each slide in full-screen mode, yielding
more pixels to complex visualizations), participants seemed to enjoy this view,
presumably because it suggests continuity and story flow ("there is more to see beyond
this panel") and it evokes the "comic state of mind" we seek. Our observations and
interviews confirmed this fact; the extra context panels seem to encourage partici-
pants to keep reading. Also, our Likert scale results show that participants actually
felt that comics were more space-efficient (Q3) than the slideshows.
In fact, we observed that all but two of our participants would start each Data
Comic task by first reading through the entire comic from beginning to end. Thus,
the comics format seems to invite reading. This is in contrast to the PowerPoint
slideshows, which no participants were observed to read fully before answering
questions. Several participants remarked that the slideshows did not "feel" like stories,
but rather information sheets that they were just flipping through to find informa-
tion. While there is no intrinsic value to this story aspect of Data Comics, we do
think it increases user engagement, as evidenced by our quantitative results.
Facilitating Memory: Eight out of twelve participants mentioned that the
comic version of each story helped them remember the contents, even down to the
individual panel for specific information. Figures are naturally memorable, even
when they are not relevant to the current topic; this mirrors findings by Bateman
et al. [205] on the beneficial effect of "chart junk" on recall in visualizations and
infographics. We also noted that a Data Comic does not need to be designed in
a very artistic way, but can incorporate basic clipart-like imagery and graphics.
Participants also mentioned that even a little variation of comic figures can help
distinguish frames.
5.5.3 Study 3: Partitioning and Sequence in Storytelling
Our working hypothesis in this chapter is that data comics provide a more
effective way of telling stories than a single visualization. To empirically explore the
virtues of this premise, we conducted a qualitative user study comparing multi-panel
data comics with a single infographic-style visualization for the same data. Here we
describe the methods and results from this evaluation.
5.5.3.1 Participants
We recruited 12 paid participants (9 male, 3 female) to participate in the study.
The participants were self-selected from the student population at our university,
were aged between 20 and 31 years of age, had normal or corrected-to-normal vision,
and were proficient computer users (all demographics were self-reported).
5.5.3.2 Apparatus
We conducted the experiment on a laptop computer equipped with a 15-inch
1280 × 800 LCD screen, a standard keyboard, and a three-button mouse. Both
the data comics and the infographic were maximized to fill the screen during the
experiment.
5.5.3.3 Task and Datasets
Each trial consisted of the participant using one of four types of data stories to
answer questions about the story: a single infographic versus a data comic, with and
without captions for the different parts (see Section 5.5.3.5 below). Each story
consisted of several panels of narration. The number of panels was limited to five
or six to keep the simplicity of layout while providing sufficient information for the
story. Each story presentation focused on a single topic of visualized data and came
with an associated list of 7 to 9 questions designed to make the participant focus on
details of the visualization. Participants had access to all questions during the time
they were interacting with the story.
We created four stories for the evaluation: Beer Origin Map (S1), the Arab-
Israeli Conflict (S2), Smart Phishing Attacks (S3), and World Wealthy People Dis-
tribution (S4).
The sequence of stories was the same for all participants. We varied the method
paired with each story to counterbalance learning effects across methods.
The four methods and the four stories combine into sixteen method-story pairings,
of which we randomly picked twelve to assign to the twelve participants.
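One common way to realize this kind of counterbalancing is a Latin-square rotation, sketched below. This is an assumption for illustration only: the text does not state the exact assignment scheme, and `assign_methods` is a hypothetical helper. Stories keep a fixed order while the method paired with each story rotates across participants, so every method meets every story equally often. The method labels follow Figure 5.10.

```python
STORIES = ["S1", "S2", "S3", "S4"]    # fixed presentation order
METHODS = ["V", "VC", "DC", "DCC"]    # method labels as used in Figure 5.10

def assign_methods(participant_index):
    """Pair each story with a method by rotating METHODS (one Latin square row)."""
    shift = participant_index % len(METHODS)
    rotated = METHODS[shift:] + METHODS[:shift]
    return list(zip(STORIES, rotated))

# Twelve participants cycle through the four rotations three times.
schedule = {f"P{i + 1}": assign_methods(i) for i in range(12)}
for pid, trials in schedule.items():
    print(pid, trials)
```

Under this scheme each participant sees every method exactly once, and each of the sixteen story-method pairings is used by three of the twelve participants.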
(a) Composer in DataComicsJS. (b) Data importing.
Figure 5.8: (a) The original infographic without any changes. (b) The infographic with red
dotted boxes added to highlight the locations of the panels.
The stories were created based on existing infographics that we found online.
The data stories are based on the topic, structure, and organization of the online
visualizations. The organization of an infographic can be classified into several genres [2],
including overview-detail, cause-effect, and chronological. We first followed these
patterns to find stories in our selected infographics. Then we partitioned each storyline into
panels and wrote captions to form the final data comic.
(a) Composer in DataComicsJS. (b) Data importing.
Figure 5.9: (a) The infographic is partitioned into panels. (b) The infographic is
partitioned into panels and captions are added to help address the storyline.
5.5.3.4 Metrics
Our focus with the evaluation was both to study quantitative metrics, such
as time and accuracy, and to collect subjective and qualitative feedback on
the difference between data comics and the original visualizations. For
this reason, we developed a questionnaire polling participants on their subjective
experience of a story. This was administered to participants directly after each
story, and consisted of the following 1-5 Likert-scale questions: engagement, speed,
space efficiency, ease of use, and enjoyability. We also asked participants for general
feedback on the tool.
Moreover, we required participants to answer every question correctly. We also found that
the amount of time spent by each participant with the different techniques was comparable.
We therefore analyze only the subjective scores from the 1-5 Likert-scale questions on
engagement, speed, space efficiency, ease of use, and enjoyability.
5.5.3.5 Factors
We included three factors in the experiment, described below.
• Presentation (P): This factor modeled the presentation technique P given
for solving questions: infographic (IG) or data comic (DC). In other words,
this was the primary factor intended to differentiate between different data
story mechanisms.
• Captions (C): Whether or not the participant had access to the partitions
and captions. For the infographic, having access to the captions would show
the bounding boxes of the partitions as well as the associated caption we had
written for each partition (Figure 5.8, Figure 5.9).
• Story (S): We also hypothesized that the specific story and topic of the data
visualization may impact our outcome. Thus we added a factor S to model
the different stories.
5.5.3.6 Procedure
An experimental session started with the participant arriving, reading and
signing the consent form, and being assigned an identifier and story order. The
administrator then explained the general goals and task. Each trial started with the
administrator demonstrating how to read a data comic. The participant was then
given four examples, two with original visualizations (with and without highlighted
panel locations) and two with data comics (with and without captions), and was allowed
to ask questions about the examples and task during this time.
Figure 5.10: Comparison between single visualization (V), visualization with
captions (VC), data comic panels without captions (DC), and data comic panels with
captions (DCC) of subjective ratings (Likert 1-5 scale) for engagement, speed,
space-efficiency, ease of use, and enjoyability.
When the participant finished the training, they were given a story opened
in the appropriate tool and a paper sheet with questions. They were given up
to 10 minutes to answer the questions, and were encouraged to use all of the time.
After answering all questions, the participant was given the subjective questionnaire
polling their experience in the trial. This was repeated for all four stories: one with
the infographic, one with the infographic with captions, one with panels without
captions, and one with panels with captions. A full session lasted approximately 50
minutes, including training and questionnaires.
5.5.3.7 Quantitative Results
Figure 5.10 depicts boxplots of the subjective ratings for the four types of tasks:
a) infographic, b) infographic with highlights of panel focus locations and captions,
c) data comic without captions, and d) data comic with captions. The ratings
cover the following aspects on a 5-point Likert scale: engagement, speed, space-efficiency,
ease of use, and enjoyability (Q1 through Q5). We analyzed the 5-point Likert
scale of subjective ratings for effects of the technique P (infographic vs. data comic)
and captions C (no captions vs. with captions), and found that engagement
(Q1), speed (Q2), space-efficiency (Q3), and ease of use (Q4) were significantly different
between the four techniques (Friedman tests, p < .05), but enjoyability (Q5) showed no
significant difference (Friedman test, p = .12). We also found no significant effect
of story S on any of the metrics.
5.5.3.8 Qualitative Feedback
Easy to Start and Follow the Story: Reading an infographic is not always
easy. Some infographics are designed for professionals, as they are packed with
information of all kinds. Partitioning the visualization into panels, especially when
adding captions, can help the reader follow the sequence of panels and captions
to form a thread. One participant mentioned that the Arab-Israeli conflict
example is overwhelming at first look, but that the first data comic panel
is a great summary of the whole visualization, with all the excessive supplementary
information partitioned into other panels. Another participant mentioned that she can
skip a few panels when reading through the data comic panels while still following
the big picture of the story.
Facilitating Focusing: The data comic panels are organized in a sequence
following the storyline suggested by each infographic. The audience's attention is
directed by the panel, so that the important information is contained by certain
panels. One participant mentioned that during the user study, he was able to easily
locate a couple of panels whenever he needed a certain kind of information. Another
participant mentioned that having captions and panels is like having labels for the
whole visualization.
Facilitating Memory: Individuals have different habits when reading an
infographic, and people start reading a large infographic from different positions. In
our study, several participants mentioned that the panels suggested a structured
and progressive way of reading that builds up the information through the sequence of
the panels. Five of the twelve participants mentioned that reading the data comic
panels helped them remember the information when answering questions. Even when they
did not catch the information in detail, they were able to go back to the correct
panels most of the time. This result matches the findings of Borkin et al. [206],
which show that visualizations are more memorable when they include pictograms
or cartoons of a recognizable image.
Chapter 6: DataTV
Past work has shown that animated narratives can be particularly effective
for data-driven storytelling [153, 154]. In this chapter, we describe DataTV, an
approach to live data video production using online streaming technologies. DataTV
is implemented as an integrated system that lets casual users, as consumers of
information, view data video stories created from narration, interactive
visualization, annotation, and other media.
6.1 Design: Supporting Streaming Data Video Production
We claim that there is a need for a multimedia platform for creating live data
videos at a pace and scale where they can be streamed and uploaded to an online
video sharing service, such as YouTube or Twitch. Here we describe the major
design decisions of the DataTV platform we designed to meet this need.
D1 Standalone application: We design our tool to be a standalone desktop
application rather than a web-based one.
• Video production and real-time streaming require high-performance
processing and significant storage.
• No existing streaming software is entirely web-based; in fact, some even
employ specialized hardware.
• Alternative: A web-based tool is platform-agnostic, but the performance
demands for real-time video capture are too high.
D2 Web integration: We embed a web browser as a capture source to support
web-based visualizations.
• Toolkits such as D3 [137] have made the web a unified platform for
delivering visualization to the masses.
• Modern web browsers are full-featured multimedia platforms supporting
a wide range of content, including video, sound, speech, vectors, 3D, etc.
• Alternative: A web-based tool would trivially support web technologies,
but is not practical due to performance constraints (see above).
D3 Optimized workflow: The tool supports streaming with a single user acting
as talent, engineer, and producer.
• Sustainable workflows for creating streams on a weekly or even daily basis
must be time-efficient.
• Most streamers on Twitch (even established ones with thousands of
subscribers) operate alone.
• Alternative: Abandoning real-time control would prevent streaming.
D4 Native media support: We provide source handlers for many media types,
such as video, music, webcams, etc.
• Effective data videos incorporate multiple media types beyond "just" the
visualizations themselves [153].
• Capturing directly from on-screen windows trivially enables native
support for all applications.
• Alternative: Using special software for specific media breaks the
workflow, requires expertise, and reduces efficiency, but avoids the need to
integrate all of the media handlers in the same tool.
D5 Simplified video production elements: We design the tool to provide
simplified video production operations using easily accessible actions.
• Typical analysts do not have a background in video editing, much less
storytelling using motion graphics.
• Providing the building blocks of video production will help creators think
in terms of storytelling rather than mundane tool operations.
• Alternative: Dedicated video editing software has a richer set of video
production elements, but its use would break the workflow.
6.2 DataTV: A Streaming Data Video Editor
We present DataTV, a prototype data video streaming utility. DataTV is a
standalone desktop tool for multiple platforms that allows recording multiple video
sources from any number of windows on the user's desktop.
Figure 6.1: Main user interface of the DataTV prototype tool. A live mode toolbar
allows for panning and zooming sources as well as scribbling directly on top of the
output. The list panes at the bottom of the interface allow for controlling the scenes
and sources being displayed.
6.2.1 Workflow
The DataTV tool is modeled along the standard workflow used in game stream-
ing software such as OBS, XSplit, and GameShow, where the streamed output of
the tool is managed using the concept of sources and scenes:
• Source: Streaming input such as from a window, specific application, audio
source, or multimedia content.
• Scene: Composition of sources on an empty display that can be recorded and
streamed as output from the tool.
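The source and scene concepts can be illustrated with a small data model. The sketch below is an assumption made for exposition only; the class and field names (`Source`, `Scene`, `visible`, `render_order`) are hypothetical and do not reflect the actual C++ types of DataTV or OBS.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    name: str
    kind: str             # e.g. "window", "video", "audio", "media"
    visible: bool = True  # sources can be toggled on and off

@dataclass
class Scene:
    name: str
    sources: list = field(default_factory=list)  # stored back to front

    def add(self, source):
        self.sources.append(source)

    def render_order(self):
        """Names of the visible sources, in back-to-front drawing order."""
        return [s.name for s in self.sources if s.visible]

intro = Scene("Intro")
intro.add(Source("Keshif browser", "window"))
intro.add(Source("Webcam", "video"))
intro.add(Source("Title card", "media", visible=False))
print(intro.render_order())
```

Keeping the sources in a plain ordered list makes the depth order explicit: reordering the list changes which source is drawn on top.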
DataTV operates in one of two distinct modes: (a) offline mode, where the
user configures sources and scenes, or (b) live mode, where scenes are recorded (and
possibly streamed). Most operations, such as creating, modifying, and deleting
scenes and sources, can be performed in both modes so that users can respond
to live events (e.g., adding a new web-based visualization to the stream on a
spur-of-the-moment idea), but the most common workflow is as follows:
1. Preparation: The user prepares all scenes and sources for recording in offline
mode. This involves creating the sources, creating the scenes, and composing
the sources for each scene on the output display canvas.
2. Recording: The user switches to live mode, sets up the stream settings (if
enabled), and begins the recording. If the video is being streamed live, the
user cannot easily switch back to preparation; instead, any changes to scenes or
sources must be made while recording. If the video is merely being recorded
and not streamed, the user can stop recording and go back to preparation,
allowing the clips to be edited together at a later stage.
3. Storytelling: The user creates the data video in live mode. This entails switch-
ing between different scenes (by selecting the scene to display in the scene
manager), managing specific sources (transforming, toggling on and off,
annotating, etc.), interacting with visualizations and other windows, and potentially
narrating and/or capturing webcam video of the user.
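The mode switching described in the steps above can be sketched as a tiny state machine. This is a simplified illustration under stated assumptions: `Session`, `go_live`, and `back_to_preparation` are hypothetical names, and the rule that a live stream blocks the return to offline mode is a strict reading of "cannot easily switch back".

```python
class Session:
    """Tracks the offline/live mode and whether the output is streamed."""

    def __init__(self):
        self.mode = "offline"
        self.streaming = False

    def go_live(self, stream=False):
        """Step 2: switch to live mode, optionally streaming the output."""
        self.mode = "live"
        self.streaming = stream

    def back_to_preparation(self):
        """Return to offline mode; refused while a live stream is running."""
        if self.mode == "live" and self.streaming:
            return False  # changes must be made while recording instead
        self.mode = "offline"
        self.streaming = False
        return True

streamed = Session()
streamed.go_live(stream=True)
print(streamed.back_to_preparation(), streamed.mode)

recorded = Session()
recorded.go_live()
print(recorded.back_to_preparation(), recorded.mode)
```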
6.2.2 Source Management
The tool maintains a list of currently available sources in the source listing
window (Figure 6.1). Using this widget, sources can be created, toggled on and off,
and deleted both during offline and live modes. Sources are created by selecting the
source type and then associating the source with the appropriate object, such as a
specific Google Chrome browser window on the desktop. Supported sources include
the following:
• Window capture: Video output from a specific window on the desktop based
on the window title, class, or executable.
• Video capture: Video output from an external device, such as a webcam or
video camera.
• Audio capture: Audio output from a microphone.
• Media objects: Sources based on static images, video files, and rich text.
6.2.3 Scene Management
A scene in DataTV is an empty canvas containing sources that form the current
output of the tool. Only one scene can be active at a time in DataTV; the active
scene is displayed in the main composition window (Figure 6.1) and governs what
is recorded and streamed when switching to live mode. The user can easily switch
scenes using the scene list.
Managing a scene essentially entails managing the sources involved in the
scene. In offline mode, most scene management operations are "heavyweight" in
that they require significant setup that is not amenable to live recording. Examples
of such operations include adding sources to the scene, managing their depth order
(governing the drawing order of the sources), and transforming them (translating,
scaling, rotating, and cropping). These operations are achieved by interacting with
the sources in the composition window.
In live mode, users should mostly use "lightweight" operations that are
designed for easy interaction while recording. This includes toggling the visibility of
sources, zooming and panning in a source using mouse dragging and the mouse
wheel (such as to zoom in on a particular part of a visualization or window), and
scribbling using the annotation feature.
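Mouse-wheel zooming is commonly implemented so that the point under the cursor stays fixed on screen. The sketch below shows that calculation under an assumed convention (a source point p is drawn at offset + scale * p); `View`, `pan`, and `zoom_at` are hypothetical names and not DataTV's actual interface.

```python
from dataclasses import dataclass

@dataclass
class View:
    scale: float = 1.0
    offset_x: float = 0.0
    offset_y: float = 0.0

    def to_screen(self, x, y):
        """Map a point in source coordinates to output coordinates."""
        return (self.offset_x + self.scale * x, self.offset_y + self.scale * y)

    def pan(self, dx, dy):
        """Mouse drag: shift the source by the drag distance in pixels."""
        self.offset_x += dx
        self.offset_y += dy

    def zoom_at(self, cx, cy, factor):
        """Mouse wheel: scale by `factor`, keeping screen point (cx, cy) fixed."""
        self.offset_x = cx - factor * (cx - self.offset_x)
        self.offset_y = cy - factor * (cy - self.offset_y)
        self.scale *= factor

view = View()
view.zoom_at(400, 300, 2.0)  # zoom in 2x around the cursor at (400, 300)
view.pan(-50, 0)             # then drag 50 px to the left
print(view)
```

The offset update follows from requiring that the source point under the cursor maps to the same screen position before and after the scale change.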
6.2.4 Video Annotation
Sportscasters regularly use annotation to scribble symbols and highlights
directly on the video feed, such as to explain specific events in an instant replay of a
touchdown or goal. Presentation software such as Microsoft PowerPoint supports
similar "ink annotations" where the presenter can draw pen and highlight strokes
directly on a slide to illustrate a specific point. DataTV supports a similar video
annotation feature through its live mode interface (Figure 6.2), which provides a
pen, highlighter, and eraser tool. The interface also allows the user to select the
drawing color as well as to clear all of the annotation from the output when moving
to a new scene.
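The annotation tools described above can be modeled as an overlay of strokes on top of the composited output. The following is a minimal sketch under assumptions: `Stroke`, `AnnotationLayer`, and the whole-stroke eraser behavior are hypothetical choices for illustration, not a description of the DataTV implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Stroke:
    tool: str    # "pen" or "highlighter"
    color: str
    points: list = field(default_factory=list)  # (x, y) output coordinates

class AnnotationLayer:
    """Overlay of strokes drawn on top of the composited video output."""

    def __init__(self):
        self.strokes = []

    def draw(self, tool, color, points):
        self.strokes.append(Stroke(tool, color, list(points)))

    def erase_near(self, x, y, radius=10.0):
        """Eraser: drop every stroke with a point within `radius` of (x, y)."""
        def hit(stroke):
            return any((px - x) ** 2 + (py - y) ** 2 <= radius ** 2
                       for px, py in stroke.points)
        self.strokes = [s for s in self.strokes if not hit(s)]

    def clear(self):
        """Remove all annotations, e.g. when moving to a new scene."""
        self.strokes = []

layer = AnnotationLayer()
layer.draw("pen", "red", [(0, 0), (10, 10)])
layer.draw("highlighter", "yellow", [(100, 100), (120, 100)])
layer.erase_near(5, 5)  # removes only the red pen stroke
print([s.tool for s in layer.strokes])
```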
Figure 6.2: DataTV's live mode interface where users can zoom and pan in a data
source as well as annotate using a pen, eraser, and highlighter.
6.2.5 Recording and Live Streaming
Our DataTV prototype supports recording to standard native video file
formats (FLV, MP4, MOV, etc.) as well as live streaming output to services such as
YouTube and Twitch.tv. The tool also provides full control over stream settings for
both video and audio output.
6.2.6 Implementation
We implemented our DataTV prototype based on OBS (Open Broadcaster
Software) Studio, an Open Source live streaming package for multiple platforms.
Our extensions were made in C++ and significantly modify the workflow of the tool
to include a streamlined live mode interface, including scene management, zooming
and panning, and live video annotation.
6.3 DataTV Examples
To validate the DataTV prototype and to give a concrete idea of what it is
capable of, we present a few examples based on real-life data and existing visualiza-
tion systems. The inspiration for these DataTV examples comes from creating and
understanding web-based data visualizations and infographics we experience on a
daily basis. Our work here reformulates and reimagines some of the insights from
the data video creation process by Amini et al. [153,154].
The data videos below consist of the following media sources:
• Live webcam video;
• Live microphone recordings;
• Web-based visualizations in a browser; and
• Video and images.
The media source selection for each story varies depending on the needs of
each particular story.
Figure 6.3: Keshif browser for the Nobel Prize Winners dataset.
6.3.1 Nobel Prize Data Analysis with Keshif
The first example uses the Keshif [136] system¹ to explore the "Nobel Prize
Winners" dataset. Keshif is a visualization system designed for interacting with
multi-dimensional data using a sophisticated faceted browser. The key interaction
in the Keshif system is linked selection, which is a generalization of brushing and
linking supporting highlighting, filtering, and comparison.
The Nobel Prize Winners dataset is composed of the basic profiles of all win-
ners, including their pictures, year of winning the prize, nationality, etc. The overall
story of this example is to briefly introduce South Africa, particularly its former
leader and Nobel Peace Prize winner, the late Nelson Mandela. The analyst uses a
YouTube video of Mandela (Figure 6.4) giving a speech as the introduction. Live
¹ http://keshif.me/
Figure 6.4: A live DataTV session composing a data video using the Keshif system
for the Nobel Prize Winners dataset. The user imports an external visualization
tool to display the economy of South Africa.
video used in this way will make the presentation engaging and draw in the viewer.
It is easily achieved in DataTV using a YouTube source and controlled live by the
user; no specific off-line editing is needed. Then we turn off the video and use a
live window with Keshif to lead the story from introducing the categories of Nobel
Prize to their age distribution. Again, the storyteller can do this in real-time sim-
ply by interacting with the Keshif visualization in a normal web browser window,
potentially narrating his findings using the webcam and microphone. The DataTV
platform will capture all of these sources, compose them in real-time, and stream
them to a remote server. Compared to static media types, such as infographics,
the ability to use an interactive interface enables the analyst to change the topic
and approach during the storytelling. For example, potentially in response to an
audience question through the Twitch chat service, the analyst may decide to give
a little history of the Nobel Peace Prize, as well as Nelson Mandela's term as
president, culminating in him winning the Nobel Prize. We then conclude the session
with a short YouTube video of him giving another speech.
Figure 6.5: Webcam and Keshif visualization being recorded for the Nobel Prize
Winner data video. The author is recording a live video "talking head" view using
a webcam and microphone source as input, which is composed into the final video
output.
Figure 6.3 shows the original interface of the Keshif system, which clearly
requires a significant amount of screen space to use properly. Since our DataTV
prototype supports real-time controls for zooming and scaling, the presenter can
easily adapt the size of the browser in the composite video output, even zooming in
to show specific features (Figure 6.5).
Figure 6.6: TimeFork [207] prediction space for stock market data.
6.3.2 Stock Market Data Analysis with TimeFork
For this example, we use TimeFork [207], an interactive visual prediction tech-
nique to support users exploring the future of time-series data. The TimeFork
implementation allows an analyst to explore a multitude of potential futures for
specific stocks by initiating a dialogue between the analyst and the user. Here we
use TimeFork to create a narrative for tech market stocks (in this case Apple and
Netflix).
Our user starts off the session with an introductory scene involving a YouTube
video featuring Warren Buffett talking about the current stock market. Then the
user switches to a scene incorporating the TimeFork tool, using it to predict the
current trend of hot tech stocks Apple and Netflix by simply interacting with the
tool in a web browser. The user can even switch to a window of a desktop visual-
ization tool such as Tableau or Spotfire and include that into the video if desired.
Meanwhile, the user is narrating the interaction and the findings using a live
recording of himself using the computer's webcam and its built-in microphone. All of
these media sources are composed, recorded, and streamed in the DataTV tool in
real-time using the active scene specification. The main narrative of the data video
would describe a scenario for stock market trends, similar to what you may hear on
financial news. The user closes the video with a PowerPoint information slide that
summarizes the main trends.
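To make the idea of an active scene specification more concrete, the following is a minimal sketch of how the three scenes in this example could be described as data. It is not the DataTV implementation: the class names, fields, and layout values (`MediaSource`, `Scene`, `Storyboard`, fractional coordinates) are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MediaSource:
    """One input placed on the composite canvas (all fields are hypothetical)."""
    name: str           # e.g. "webcam" or "timefork"
    kind: str           # "video", "browser", "window", "webcam", or "slide"
    x: float = 0.0      # left edge, as a fraction of the canvas width
    y: float = 0.0      # top edge, as a fraction of the canvas height
    scale: float = 1.0  # fraction of the canvas width the source occupies


@dataclass
class Scene:
    """A named layout of media sources, drawn in list order (last on top)."""
    title: str
    sources: List[MediaSource] = field(default_factory=list)


class Storyboard:
    """Holds the scenes of a session and tracks which one is active."""

    def __init__(self, scenes: List[Scene]) -> None:
        if not scenes:
            raise ValueError("a storyboard needs at least one scene")
        self.scenes = scenes
        self.active_index = 0

    @property
    def active(self) -> Scene:
        return self.scenes[self.active_index]

    def switch_to(self, title: str) -> Scene:
        """Make the scene with the given title the active one."""
        for i, scene in enumerate(self.scenes):
            if scene.title == title:
                self.active_index = i
                return scene
        raise KeyError(title)


# The narrator's webcam stays in the lower-right corner of every scene.
webcam = MediaSource("webcam", "webcam", x=0.75, y=0.75, scale=0.25)
storyboard = Storyboard([
    Scene("intro", [MediaSource("buffett-clip", "video"), webcam]),
    Scene("analysis", [MediaSource("timefork", "browser"), webcam]),
    Scene("summary", [MediaSource("trends-slide", "slide"), webcam]),
])

storyboard.switch_to("analysis")
print([source.name for source in storyboard.active.sources])
```

In this sketch, switching scenes only changes which layout is active; a real compositor would additionally render, record, and stream the active layout on every frame.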
6.3.3 NY Times Comment Data Analysis with CommentIQ
The CommentIQ [208] visual analytics system is designed to help online
community moderators manage large numbers of comments associated with online
articles by automatically ranking them based on criteria such as relevance, readability,
personal experience, and length.
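As a rough illustration of ranking comments by several criteria, the sketch below combines per-criterion scores with moderator-chosen weights. This is an assumption made for exposition, not CommentIQ's actual ranking method: the weighted average, the score range of [0, 1], and all sample values are hypothetical.

```python
from typing import Dict, List

# Criteria named in the text; the scores and weights below are made up.
CRITERIA = ("relevance", "readability", "personal_experience", "length")


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights[c] for c in CRITERIA)
    if total_weight == 0:
        raise ValueError("at least one criterion needs a non-zero weight")
    return sum(weights[c] * scores[c] for c in CRITERIA) / total_weight


def rank_comments(comments: List[dict], weights: Dict[str, float]) -> List[dict]:
    """Return the comments sorted from highest to lowest weighted score."""
    return sorted(
        comments,
        key=lambda comment: weighted_score(comment["scores"], weights),
        reverse=True,
    )


comments = [
    {"id": "a", "scores": {"relevance": 0.9, "readability": 0.4,
                           "personal_experience": 0.1, "length": 0.5}},
    {"id": "b", "scores": {"relevance": 0.5, "readability": 0.8,
                           "personal_experience": 0.9, "length": 0.5}},
]

# A moderator who mostly wants comments that share personal experience.
personal_weights = {"relevance": 1.0, "readability": 1.0,
                    "personal_experience": 3.0, "length": 0.0}
print([c["id"] for c in rank_comments(comments, personal_weights)])
```

Changing the weights changes the ranking, which is the kind of adjustment a moderator would make when looking for a different type of comment.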
In this example, the user wants to author a streaming data video about the
community response to an article from The New York Times (http://www.nytimes.com/) titled "City Reacts:
State of Emergency" during the 2015 racial unrest in Ferguson, Missouri following
the death of Michael Brown at the hands of a white policeman. The user collects
Figure 6.7: DataTV recording session for a stock market prediction story involving
a YouTube clip of Warren Buffett talking about the current state of the stock market
as well as the TimeFork web-based visualization tool [207] for predicting the stock
price of Apple and Netflix.
two infographics, one on murder rates across races and one on SWAT deployment
rates for different races. The online interactive visualization CommentIQ looks deeper
into the comments of the article from The New York Times.
This data video leverages the advantages of each of the three types of media
source: video, infographics, and visualization. First, the narrator uses a YouTube
video of the Ferguson incident as the introduction. This video shows the confronta-
tion between the protesters and the police, providing a suitable framing to the video
that emphasizes the direness of the situation. The two infographics (Figure 6.9) give
background information by showing an overview of the guns and crimes in the area.
Finally, the CommentIQ visualization allows the narrator to discover trends in how
Figure 6.8: Interface of CommentIQ system supporting multidimensional analysis
for online article comments.
the NYT commenter community responded to the article. In all of these cases, the
narrator is able to scribble directly on the composited video output to highlight
interesting or important aspects of the video, such as outliers or trends.
The live streaming functionality of the DataTV platform opens up an entirely
new potential for the New York Times to provide a live complement to go with their
online comment system. Using the DataTV live stream, community moderators
could aggregate and discuss comments in real time, for example when polling voter
panels for political debates.
6.4 Qualitative Evaluation
In order to understand how experts and practitioners from the field of infor-
mation science and HCI would use DataTV, and to identify potential advantages
Figure 6.9: DataTV being used to record a streaming data video on race, murder
rate, and SWAT activity.
and challenges in using DataTV, we conducted a usability study. Because there is
no comparable tool for creating live-streamed data videos in real time, we opted
to not perform a comparative study, but instead to focus on the affordances and
capabilities of DataTV in a qualitative evaluation.
Our intent with the evaluation was to understand how DataTV can be used
in different scenarios. We thus picked two separate data representations, and de-
signed a set of tasks for each data representation such that the tasks were of similar
complexity across the two scenarios. Each participant was assigned one data rep-
resentation, with the associated set of tasks. During the process we use the basic
expert review method proposed by Tory and Möller [204], which includes experts
evaluating a tool using pre-defined heuristics. The purpose of the study was to test
Figure 6.10: DataTV recording the CommentIQ system being used to filter com-
ments over time.
the tool's usability in the context of the experts' own work.
6.4.1 Participants
We engaged two volunteer participants (Expert 1 and Expert 2) to evaluate our
system. Both participants have extensive experience in using information
visualization to tell data stories, are very knowledgeable about existing
information visualization and visual analytics systems, and have created and
reviewed tools in both fields. The participants were Ph.D. students in the field of HCI
or information visualization with at least three years of experience. Both were male.
6.4.2 Apparatus
We conducted the experiment on a standard laptop computer equipped with a
15-inch LCD screen (resolution 1280×800), a standard keyboard, and a three-button
mouse. The built-in camera was used to record the user's video and speech.
6.4.3 Tasks
Each participant's task was to create data videos using DataTV and the data
visualization randomly assigned to them. Participants were required to use at least
one interactive visualization in their video. They were allowed to pick any other
appropriate media sources on the web to support their stories. The tasks were
devised so that the experts would need to explore the visualizations, select media
sources, and sift through information to create a narrative. The data stories we
asked them to create were inspired by our examples from Section 6.3:
• Obama Budget: A visualization illustrating the components of the government
budget in the year 2013.
• U.S. Census: Demographics for the state of Florida.
A set of tasks was prepared for each scenario, requiring the participants to
explore the data visualizations in detail and to look for supporting information
online, before finally creating a data video to answer the question. A sample set of
tasks for one of the scenarios (Obama budget) is given below:
• Explain roughly how the total ($3.7 trillion) was allocated (using a 30-second
video);
• Explain the main types of spending (using a 30-second video);
• Explain how the spending has changed since the last budget (using a 40-second
video); and
• Describe how much of the budget was allocated to social security (using a
60-second video).
6.4.4 Procedure
Participants were shown a demonstration of DataTV and given as much time
as they needed to explore and familiarize themselves with the system. We then
gave them a sheet of tasks and asked them to create a data video in response to
each task. The resulting data video would ideally make use of multiple media types,
interactive visualization, and innovative storytelling techniques. During the creation
process, which was capped at 60 minutes, the experts were allowed to ask questions
about the interface. We followed a think-aloud protocol with the participants, and
recorded their behavior via video and written observations.
For the purpose of this study, the participant-generated stories were recorded
and saved (instead of streamed dynamically). The final participant-generated videos
were limited to a duration of one minute. During the process, the participants used
the DataTV tool to design, sketch, record, and produce their videos. After com-
pleting their tasks, we followed up with an interview where participants explained
their process and provided feedback on the system.
6.4.5 Results
We collected and analyzed both the products as well as the observations and
interview feedback from each expert review session.
Products: Each expert made several data videos, all less than one minute
long. Representative data videos are attached as supplemental materials. We also
captured screenshots of the exact final workspace for each expert (Figure 6.11,
Figure 6.12). All in all, the resulting data videos were all of good quality and suggest
that the DataTV platform was instrumental in the process.
Figure 6.11: Expert 1 reviewed the U.S. budget trend in 2013 during the time of
President Obama. The expert presented their insights using DataTV.
Observations: Overall, participants used the tool with little training. Both
Figure 6.12: Expert 2 reviewed the population change of Florida. Insights on housing
and population gain were presented using DataTV.
experts familiarized themselves with the basic operation by first making a few trial
videos that were recorded and saved locally. Once satisfied with the basic mental
model, they spent a considerable time (20-25 minutes) finding source materials, se-
lecting visualizations, and writing notes. They then spent an additional 10 minutes
to create their scenes, including arranging the layout and size of the different media
sources. The actual recording of each video was surprisingly quick; given the prepa-
ration, the experts were both able to record their videos in a single take and with
few mistakes.
From our observations, it appeared as if the experts did not need much prompt-
ing to get familiar with the DataTV interface. The experts seemed to think the
interface behaved in a logical and predictable fashion, and they quickly became
proficient with very little instruction and within 10 minutes of training. Most im-
portantly, observations and think-aloud remarks seem to indicate that the experts
rapidly internalized the DataTV controls and were able to focus on the craft of
data-driven storytelling. This was indicated by their utterances increasingly dealing
with how to best organize and present insights rather than minutiae of the interface.
Interview Feedback: During the structured interview after completing the
tasks, the experts were asked to give feedback on the system, including advantages
and disadvantages of DataTV over existing approaches. We summarize their feed-
back below:
• Positive: Expert 1 thought that the DataTV interface was simple and
straightforward, with few opportunities to make mistakes. Expert 2 remarked that
compared to using multiple software platforms, our system makes the video
creation process seamless, as it requires fewer operations and vital tools such as
annotation, compositing, and transformations are very accessible.
Expert 2 also thought that the use of our system was surprisingly easy, par-
ticularly the streamlined workflow where very little preparation is necessary.
The expert noted that the composite video output was helpful so as to always
know what is being streamed and recorded, striking a good balance between
real-time control and accuracy.
• Negative: Both experts remarked on the lack of video editing capabilities. We
responded that such editing functionality would have precluded
real-time streaming in the tool, and they both remarked that the compromise
was acceptable. Expert 1 also felt that the DataTV interface should
provide better control over media sources in the workspace.
Chapter 7: Discussion
7.1 Explaining the Findings
7.1.1 DataComics
Our qualitative evaluation indicated that data comics, especially those with captions,
were significantly more engaging, space-efficient, faster, and easier to use than the
original visualization or infographic. Feedback from the participants also indicates that
partitioning content into panels helps focus and memory. The captions are particularly helpful
for following the story and remembering the details. The participants mostly felt
that the data comic was more effective because it invites reading and helps build up
the story. The sequence of the panels in a data comic is important, especially when
helping participants recall detailed information from one of the panels with the
help of captions. The overall sequence is more important than the local ordering within a small
range; i.e., two panels about parallel topics can be swapped without
harming the overall storyline.
A future evaluation with more participants that tracks changes in eye focus
during the experiment would help. However, we did not have the equipment or
time to conduct such an experiment.
7.1.2 DataTV
Our informal evaluation indicated that DataTV facilitated live data-driven
storytelling. It should be noted that the main contribution of Chapter 6 is
the method of live-streaming data videos, whereas our implementation is merely a
prototype to show the validity of the concept. Both experts indicated
that an easy-to-use, well-built, and integrated system helps storytellers
create data video stories.
It is important to note that all of the functionality of the DataTV tool can
be replicated with a combination of desktop recording tools (such as VLC) and video
editing tools (such as Adobe Premiere Pro), given sufficient time and effort. While
DataTV makes constructing data videos easy with its integration of media source
picking functionality, media label editing, and video recording, each of the DataTV
videos showcased in Chapter 6 can be built using other tools. However, the
argument for the DataTV platform and related software is two-fold: (1) a single
unified platform is needed to allow for live streaming and rapid production, and (2)
the integration of all of these data-driven storytelling features in a single tool enables
the analyst to think about data videos more in terms of storytelling rather than
low-level software, mechanics, and features. Results from our qualitative evaluation
support these two arguments.
We believe our work surfaces several new issues that were not considered in
the past. For example, while Amini et al. already suggested the data videos con-
cept in 2015 [153], their work still results in a static and prerecorded video, not a
live-streamed one. One of the benefits may be that it is easier to quickly produce
a data video using DataTV than painstakingly using a suite of tools such as screen
recorders, video editors, and audio production tools. However, DataClips [154], pre-
sented in 2017, does provide functionality for quickly assembling several clips using
predefined visualizations. On the other hand, a live data video can be responsive
to an audience, for example in responding to questions or requests for more infor-
mation. In this way, DataTV is much more of an interactive presentation tool than
typical data video production tools (such as DataClips). This is reinforced by the
emphasis on live video in DataTV, whereas the narrator is typically disembodied
in most existing data videos. We think this suggests that live data videos such as those
supported in our work are a unique data-driven storytelling medium in their own right.
It is also the reason that we found no easy baseline for a comparative evaluation.
We leave comparisons to live presentation software, such as Microsoft PowerPoint,
to future work.
7.2 Generalizing the Findings
How general are these findings? We discuss this below.
7.2.1 DataComics
We explicitly chose not to measure time or correctness. There is likely little
difference between data comics and a single visualization or infographic on these
measures, and this perception was also confirmed by participants in our experiment. Rather, the strength
of data comics comes from their approachable, compelling, and intuitive format. This
is further supported by Lee et al. [171], who only collected subjective ratings from
participants in their SketchStory evaluation.
7.2.2 DataTV
The utility of live-streaming data videos as a concept can be questioned. It is
certainly true that we do not foresee "Let's Analyze" videos dethroning the "Let's
Play" category on Twitch or YouTube anytime soon. However, the power of the
internet as both a medium as well as an audience should not be underestimated.
There is already a small but growing group of Twitch communities devoted to non-
gaming, such as painting, gardening, and programming. The step is not too far from
such topics to data analysis. Besides, even if live data videos never become popular,
many of the real-time authoring techniques pioneered in DataTV will be invaluable
for creating normal, non-streaming data videos, going beyond what even tools such
as DataClips [154] can do.
7.3 Limitations
7.3.1 DataComics
First of all, much of our argumentation for using sequential art for data is
based on two assumptions: that the audience has (a) prior experience, and (b) a
favorable opinion about comics. With no prior experience, much of the benefit of an
established common ground in the visual language of comics is lost. Furthermore,
given the sometimes questionable respectability of comics [149, 175], its use as a
communication medium may be problematic. For example, it can be argued that
a data comic may not be the best vehicle for presentations in very formal settings,
such as a boardroom meeting. Similarly, the intrinsically light-hearted nature of
comics may be inappropriate for sensitive or difficult topics, such as natural disasters,
emergency situations, and other types of crises or stories on the loss of lives or
livelihoods.
7.3.2 DataTV
First of all, much of our argumentation for using data video for storytelling
is based on two assumptions: that the audience has (a) enough knowledge to
understand the data video, and (b) a favorable opinion about video storytelling.
Without enough knowledge, much of the benefit of an established common ground
in the visual language of data video is lost. Furthermore, video places higher
demands on the viewing environment, and playing video might be inappropriate in some
communication situations. A data video might therefore not be the best choice for
distributing information in situations where the noise level matters or the display device is
not well equipped. For example, it can be argued that a DataTV video may not be the
best vehicle for presentations in very noisy settings, such as a couple of people
discussing a topic in a train station, where static material might be more suitable. It
should also be noted that the content of a DataTV video, which often includes personal webcam
video, can be inappropriate for public broadcasting.
7.4 Guidelines
Having explored the taxonomy and examples of data-driven storytelling
media, we suggest the following guidelines for others who examine and
explore data-driven storytelling media.
7.4.1 Be Open to Unique Media for Storytelling
When facing a new type of media, one should not be so restrictive as to assume that only
certain types of media are suitable for data-driven storytelling. Dancing is not often
considered a viable means of data-driven storytelling. However, with proper
labelling and moves, dancing can tell a story about how to conduct a bubble sort
very intuitively [190]. Augmented reality (AR) and virtual reality (VR) have gained
much attention recently. We found examples [72, 78, 80, 86] that use AR and VR
extensively for data analysis and exploration. These applications were not invented
to conduct data-driven storytelling, but they show that VR and AR,
as newly introduced media, are great examples of how new technologies can be
integrated into the scope of data-driven storytelling.
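For readers unfamiliar with the algorithm that the dance example conveys, the following is a minimal sketch of bubble sort. The input values are arbitrary and are not taken from the performance cited in [190]; each comparison and swap of neighbors in the code corresponds to the kind of step that adjacent dancers could act out.

```python
from typing import List


def bubble_sort(values: List[int]) -> List[int]:
    """Return a sorted copy of `values` using bubble sort."""
    items = list(values)  # work on a copy so the input is left untouched
    # After each pass, the largest remaining item has "bubbled" to position `end`.
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            # Compare two neighbors and swap them if they are out of order.
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            break  # no swaps in a full pass means the list is already sorted
    return items


print(bubble_sort([3, 0, 1, 8, 7, 2, 5, 4, 6, 9]))
```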
7.4.2 Avoid Relying on Artistic Skill
Not everyone is an artist. Many data-driven storytelling media require artistic
skill. For instance, documentaries [12, 13, 15] with data stories require editing and
video shooting skills. DataComics is another example that requires a certain level of
artistic skill. Conducting data-driven storytelling with comics [21, 151, 156] requires
that the creator choose comic figures and other comic features appropriately, so that
the generated data comic is enjoyable and easy to follow.
To make the creation process less dependent on the creator's artistic skills, the
applications for new media should be loaded with more automatic features, such as
template recommendation [143, 196], computer vision for caption recommendation,
and natural language processing for caption generation. The applications should
have the most frequently used functions well integrated into the interface, creating an
environment convenient for storytellers so that they can focus on the content
instead of the user interface.
7.4.3 Start from Existing Examples, Don't Be Too Unique
When creating new data-driven storytelling media types, one can start
from existing examples. It is unnecessary to devise media types that are totally
unique. One can use the taxonomy to determine the value of each dimension for an
existing example, and then change the values of certain dimensions.
Some examples of data-driven storytelling with augmented reality and virtual
reality show that the latest technologies are practical in existing application areas such
as design [72], construction [76], and data analysis [81, 86].
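The process of starting from an existing example and changing the values of certain dimensions can be sketched as follows. The field names below are our own shorthand for the dimensions that appear in the informal classifications of Appendix A, and the values given for the derived media type are illustrative assumptions, not a classification made in this dissertation.

```python
from dataclasses import dataclass, fields, replace
from typing import List, Tuple


@dataclass(frozen=True)
class MediaType:
    """One point in the taxonomy; field names are illustrative shorthand."""
    name: str
    audience: str
    delivery: Tuple[str, ...]
    cognitive_load: str
    visual_components: Tuple[str, ...]
    interactivity: str
    viewing_order: str
    replicable: bool


# An existing example, loosely following the online-documentary classification.
documentary = MediaType(
    name="online documentary",
    audience="many individual viewers",
    delivery=("distributed", "asynchronous"),
    cognitive_load="high",
    visual_components=("full-motion video", "animated graphics", "text",
                       "non-interactive data visualization"),
    interactivity="static",
    viewing_order="in sequence",
    replicable=True,
)

# Derive a new candidate media type by changing only some dimension values.
live_stream = replace(
    documentary,
    name="live-streamed data video",
    delivery=("distributed", "synchronous"),
    visual_components=documentary.visual_components + ("webcam video",),
)


def changed_dimensions(a: MediaType, b: MediaType) -> List[str]:
    """List the taxonomy dimensions (ignoring the name) on which a and b differ."""
    return [f.name for f in fields(a)
            if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)]


print(changed_dimensions(documentary, live_stream))
```

Listing the changed dimensions makes explicit how far a new media type departs from the example it was derived from.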
Chapter 8: Conclusion
In this dissertation, we studied data-driven storytelling media for casual users
as the consumers of information by expanding its horizon, and exploring how it
aids casual users in viewing, analyzing, and understanding data. In order to achieve
this, we present a new taxonomy focused on media types for data-driven storytelling
with the purpose of opening the field to a wider set of future possibilities. Our
work started with collecting a large amount of evidence of data-driven storytelling
using novel and diverse media, from the spoken word to interpretative dance and
choreography. From the taxonomy and the guidelines derived from it, we investigated media types
particularly useful for casual users with little professional training or background in
data visualization and analysis.
With our taxonomy and the guidelines derived, we proposed two examples:
DataComics, leveraging comics (sequential art [156]) for data-driven storytelling [1],
and DataTV, live-streaming data videos for this purpose.
Through collecting and studying the examples as well as the two systems we
proposed, we found several common phenomena:
• Many of the new media types that we studied are not well investigated under
the scope of data-driven storytelling. For example, there are still very few
dedicated tools for story authoring with Augmented Reality or Virtual Reality
for data visualization.
• Most of the media types for data-driven storytelling we examined can already
be leveraged using existing software systems. For example, a data comic could
be created using only Microsoft PowerPoint, or a data video using Adobe
Premiere.
• However, it is very important to have an integrated system for authoring
and presenting data-driven stories. Most existing authoring tools had
alternatives before they were invented. For example, Adobe Premiere, with
its advanced artificial intelligence video editing functions, can be replaced by
multiple simpler software systems, such as basic video editing tools, and more
human effort. The availability of dedicated or automatic tools allows the user
to focus on the subject matter rather than the technical or logistical aspects
of the process.
• Evaluation of data-driven storytelling systems should study not only the
presentation of data-driven stories, but also the authoring experience. Although
it is easier to assess the effect on presentation, having a tool that is well integrated
and easy to use for the authoring process is equally important and is the key
to encouraging the creation of new data-driven stories.
Chapter 9: Future Work
In the future, our taxonomy can be further refined by exploring more
types of data-driven storytelling media. More specifically, the current list of
dimensions and values for each media type can be further justified to make the taxonomy
more robust and thorough. Our exploration of data-driven storytelling media has
mostly followed our own taxonomy, but connecting it with
similar taxonomies (such as taxonomies of general storytelling) and with taxonomies from other fields
(such as taxonomies of visualization techniques) is also potentially beneficial. Such
connections can be used to expand and complete the current taxonomy, as well as
to inspect our taxonomy and guidelines from other directions.
As new types of data-driven storytelling media are invented over time, they
will still need to be categorized using our taxonomy. This thesis does not cover all the
possible media types for data-driven storytelling. New media types are coming
out all the time, and their combination with data-driven storytelling is pending
further investigation. Also, many existing media types are available for use in
data-driven storytelling.
For example, imagine creating a data-driven storytelling tool designed to
support speech. Such a tool could use natural language processing
technology to automatically generate statistical diagrams that illustrate the ideas
extracted from the speech. Clearly, there is ample opportunity for leveraging
data-driven storytelling in many more forms than have so far been studied in the
visualization community.
The process of creating new data-driven storytelling media for casual users
is based on the guidelines we proposed. We have shown the effectiveness of the
guidelines through creating DataTV and Data Comics [152]. However, the guidelines
may not be sufficient for all future situations. It is likely that new guidelines will
be added to the current collection.
Appendix A: Survey of Data-Driven Storytelling Media
A.1 Storytelling in Movies and Documentaries
Figure A.1: Marine Plastic Pollution [5].
A.1.1 Stop Marine Plastic Pollution
The marine plastic pollution documentary in Figure A.1 shows the sources and
conditions of marine pollution. The map shows the geographic distribution of the
pollution.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback) there is signifi-
cant potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The viewing sequence is in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
Figure A.2: Scene that shows relations between CO2 and temperature change.
Figure A.3: Ice is melting as temperature increases.
Figure A.4: Scene showing that seasons are shifting and causing problems for animals.
A.1.2 A Beautiful Planet
This is a documentary [193] that tells a story about global warming. The
movie consists of scenes of warming signs such as melting glaciers, an illustration of the
correlation between temperature and CO2, and changes in animal birth timing. The
movie is not just a video of one person giving a presentation with data visualizations;
rather, a large portion of it is footage of natural scenes and interviews. The storytelling
uses data to argue that global warming is an undeniable truth that needs
people's attention immediately. Figures A.2, A.3, and A.4 show that both data ((1) the
changes in temperature and CO2 level and the correlation between them, and (2) the
change in birth timing) and natural footage are important in composing such a story.
Informal classification: As a movie that is shared on-line, the audience is po-
tentially many people on the Internet, typically viewed individually on their personal
devices (although public mass viewings are certainly possible). The delivery method
is thus distributed and asynchronous. Because full-motion video is a high-bandwidth
medium, and the user has no control over the pacing (except to pause the play-
back) there is significant potential for cognitive overload. The visual components
used include full-motion video, animated graphics, text, and non-interactive data
visualizations. Video is static in that it cannot be manipulated or interacted with
(except for controlling the playback) by the audience. The viewing sequence is in
sequence. However, it is also replicable, as it is stored and can be played back at any
time.
Figure A.5: News for Irma Hurricane [5]. Part of Miami will be flooded.
A.1.3 ABC News for Irma Hurricane
The ABC News segment in Figure A.5 shows the forecast for Hurricane Irma and its
potential damage. The heatmap illustrates the flooded area and the intensity of the
wind.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback) there is signifi-
cant potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The viewing sequence is in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.4 Is Height All in Our Gene
Figure A.6: Height change [6] over time. The average human height
increases over time.
The data-driven video shows how human height is affected by genes. A line chart
is used to show the change in height over time in Figure A.6. The story is
the change in human height over time and the variation among people within the same period.
The data is the height of people in specific groups over time.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback) there is signifi-
cant potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, table, and non-interactive data visualizations.
Video is static in that it cannot be manipulated or interacted with (except for
controlling the playback) by the audience. The viewing sequence is in sequence.
However, it is also replicable, as it is stored and can be played back at any time.
A.1.5 Ancient Greece in 18 minutes
Figure A.7: Ancient Greek [7] ruler.
The data-driven video shows how the territory of Greece changed throughout its
history. A colored map is used to show different countries in Figure A.7.
The story is the change of Greece, narrated by text, and time is visualized with a bar
chart as data.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback) there is signifi-
cant potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The viewing sequence is in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.6 The history of Asia: every year
Figure A.8: The change of Asia [8] viewed from a map.
The data-driven video shows how the territories of Asian countries changed throughout
history. A colored map is used to show different countries in
Figure A.8.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.7 Wealth Inequality in America
Figure A.9: The documentary [9] shows how one percent of the population holds
a large share of the wealth.
The data-driven video shows how unevenly wealth is distributed among different
groups of people (Figure A.9). The bar chart and annotations show that half of
the stocks, bonds, and mutual funds are owned by one percent of the people. The
super-rich one percent has so much wealth that the bar chart cannot fit it in the
current view.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.8 The Joy of Stats
Figure A.10: The documentary [10] shows how data visualization works.
The data-driven video shows how statistical visualization can summarize two
hundred years of history in a few minutes (Figure A.10). Scatter plots show
how income and population change over time.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.9 Religions and babies
Figure A.11: The documentary [11] shows how the number of babies varies with
religion.
The data-driven video shows the relation between baby population and religion
across different years (Figure A.11). The scatter plot shows the distribution of baby
population size, which is easy to map onto the visualization.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.10 Gene Pool Decline
Figure A.12: The documentary [12] shows how the human gene pool declines.
The data-driven video shows how the gene pool declines over time as we rely more
on medical treatment (Figure A.12). The graph and statistical visualization show how
the human gene pool evolves and regresses over time.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.11 How To End Poverty
Figure A.13: The documentary [13] shows how to end poverty.
The data-driven video shows how poverty is distributed across locations and
time (Figure A.13). The statistical graph and map show how poverty decreases over
time, from ancient times to the modern day.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.12 China's Geography Problem
Figure A.14: The documentary [14] shows China's problems with its neighbours.
The data-driven video shows the problems China faces because of its geographic
situation (Figure A.14). The video uses map and event visualizations to show the
conflicts between China and surrounding countries.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.13 Imaginary Numbers Are Real
Figure A.15: The documentary [15] shows where imaginary numbers come from.
The data-driven video shows how imaginary numbers are generated and why
they are useful (Figure A.15). The diagrams and numbers show how imaginary numbers
differ from rational numbers.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.14 Big Data Revolution
Figure A.16: The documentary [16] shows how the big data revolution takes place.
The data-driven video shows how animation and statistical visualization with
VR and AR can change the use of big data in daily life (Figure A.16).
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.15 The Truth About Population
Figure A.17: The documentary [17] shows the relation between wealth and the size
of population.
The data-driven video shows how population groups with different levels of
income are treated in society (Figure A.17). The bar chart shows that the
group with a 100-dollar income tends to ignore the difference between the groups
with 10-dollar and 1-dollar incomes.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.16 Inside the mind of a master procrastinator
Figure A.18: The documentary [18] shows how a procrastinator thinks.
The data-driven video shows the process by which someone used to procrastinating
gradually turns into a master procrastinator (Figure A.18). The animated event
and statistical visualizations show how his time is distributed.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.17 How data will transform business
Figure A.19: The documentary [19] shows how big data transforms business.
The data-driven video shows how the digital revolution takes place and how the
stock of data is booming, using statistical, event, and continuous visualizations
(Figure A.19).
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.1.18 Will Saving Poor Children Lead to Overpopulation
Figure A.20: The video [20] shows how the poor have more children than others.
The data-driven video shows how saving poor children will stop overpopulation
(Figure A.20). The bar chart shows that, for the majority of the population, two
parents have only two children, while poor families have more than two children
on average, with some children dying at a young age.
Informal classification:
As a documentary that is shared on-line, the audience is potentially many peo-
ple on the Internet, typically viewed individually on their personal devices (although
public mass viewings are certainly possible). The delivery method is thus distributed
and asynchronous. Because full-motion video is a high-bandwidth medium, and the
user has no control over the pacing (except to pause the playback), there is significant
potential for cognitive overload. The visual components used include full-motion
video, map, animated graphics, text, and non-interactive data visualizations. Video
is static in that it cannot be manipulated or interacted with (except for controlling
the playback) by the audience. The content is viewed in sequence. However, it is
also replicable, as it is stored and can be played back at any time.
A.2 Data Comics
Figure A.21: PhD comic [21]. Ambition decreases over time.
A.2.1 PhD Comic
The comic in Figure A.21 tells a story in which PhD students are stressed and their
ambitions are decreasing. The content and the result of the story are shown in
the first and last frames, respectively. The data is shown qualitatively in the
middle frame as a decreasing trend.
Informal classification: As a published comic that is shared on-line, the au-
dience is potentially many people on the Internet, typically viewed individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a low-
bandwidth medium, but the user has no control. The visual components used include
text, photographs, comic figures, and data visualizations. The data visualization
in this comic is static in that it cannot be manipulated or interacted with.
The content can be viewed in sequence or by branching. However, it is also
replicable, as it is stored and can be viewed again at any time.
Figure A.22: NFL player report [22]
A.2.2 NFL Player Data
The comic in Figure A.22 is another example of the application of the data
comics design. The story is about the arrest data of NFL players. First, the
layout is divided into irregular frames like a typical comic strip. Comic features
such as speech bubbles and comic figures also lead the storyline. The text in
the speech bubbles consists of transition sentences that tell readers what is about
to happen and what to expect in this or the next few frames. The data is represented
with bar charts and scatter plots, illustrating detailed numbers such as the
number of DUIs and the number of arrests. Within the data visualizations,
comic features such as speech bubbles and directional arrows are used to highlight
numbers. The original page of this data comic is interactive, in that the
visualizations can be clicked to show more information.
Informal classification:
As a published comic that is shared on-line, the audience is potentially many
people on the Internet, typically viewed individually on their personal devices (al-
though public mass viewings are certainly possible). The delivery method is thus
distributed and asynchronous. Data visualization is a low-bandwidth medium, but
the user has no control. The visual components used include text, photographs,
comic figures, and data visualizations. The data visualization in this comic is
static in that it cannot be manipulated or interacted with. The content can be
viewed in sequence or by branching. However, it is also replicable, as it is stored
and can be viewed again at any time.
Figure A.23: Graph comic for European relations.
A.2.3 Graphic Comic
Bach et al. [151] extended the design of data comics to network graphs (Figure A.23).
This still falls into the category of telling data-driven stories with comics. The
comic-style graphs are organized into strips of frames like comic strips. The story is
about how European countries formed alliances during the early 1900s. The data
describes how different countries changed their relationships over time. The
storytelling in this example is simple, using text descriptions and comic-style
network graphs, but it enables a general audience to quickly understand complicated
temporal changes.
Informal classification:
As a published comic that is shared on-line, the audience is potentially many
people on the Internet, typically viewed individually on their personal devices (al-
though public mass viewings are certainly possible). The delivery method is thus
distributed and asynchronous. Data visualization is a low-bandwidth medium, but
the user has no control. The visual components used include text, photographs,
comic figures, and data visualizations. The data visualization in this comic is
static in that it cannot be manipulated or interacted with. The content can be
viewed in sequence or by branching. However, it is also replicable, as it is stored
and can be viewed again at any time.
A.2.4 Comic style Dashboard
Figure A.24: Comic style dashboard [23]
The design of a comic-style dashboard with data visualization (Figure A.24) can be
extended into panels with comics and data visualization. This still falls into the
category of telling data-driven stories with comics. The comic-style graphs are
organized into a sequence of panels. The story is about how a commentator can help
illustrate an idea with data visualization. The data is the marketing and
sales trend.
Informal classification:
As a published comic that is shared on-line, the audience is potentially many
people on the Internet, typically viewed individually on their personal devices (al-
though public mass viewings are certainly possible). The delivery method is thus
distributed and asynchronous. Data visualization is a low-bandwidth medium, but
the user has no control. The visual components used include text, comic figures,
and data visualizations. The data visualization in this comic is static in that
it cannot be manipulated or interacted with. The content can be viewed in sequence
or by branching. However, it is also replicable, as it is stored and can be viewed
again at any time.
A.2.5 Infographic Comic
Figure A.25: Infographical Comics [24]
The design of data comics for network graphs can be extended into
infographics with comics (Figure A.25). This still falls into the category of telling
data-driven stories with comics. The comic-style graphs are organized into a big
infographic. The story is about how gravity changes from the edge of the solar
system to the inner circle, such as Venus and Mars. The data is the gravity levels,
depicted as the level of water and the altitude of mountains.
Informal classification:
As a published comic that is shared on-line, the audience is potentially many
people on the Internet, typically viewed individually on their personal devices (al-
though public mass viewings are certainly possible). The delivery method is thus
distributed and asynchronous. Data visualization is a low-bandwidth medium, but
the user has no control. The visual components used include text, comic figures,
and data visualizations. The data visualization in this comic is static in that
it cannot be manipulated or interacted with. The content can be viewed in sequence
or by branching. However, it is also replicable, as it is stored and can be viewed
again at any time.
A.2.6 NYC Restaurant Data Vis Comic
Figure A.26: Comics for data of restaurants in NYC [25]
The design of data comics with statistical graphs can be extended into
infographics with comics (Figure A.26). This still falls into the category of telling
data-driven stories with comics. The story is about how the styles and locations of
restaurants in New York City are distributed. Different visualization types are used
to show the different data fields of the restaurants.
Informal classification:
As a published comic that is shared on-line, the audience is potentially many
people on the Internet, typically viewed individually on their personal devices (al-
though public mass viewings are certainly possible). The delivery method is thus
distributed and asynchronous. Data visualization is a low-bandwidth medium, but
the user has no control. The visual components used include text, comic figures,
and data visualizations. The data visualization in this comic is static in that
it cannot be manipulated or interacted with. The content can be viewed in sequence
or by branching. However, it is also replicable, as it is stored and can be viewed
again at any time.
A.2.7 Marvel vs DC Comics
Figure A.27: Comics of the comparison of characters from Marvel and DC [26]
The design of data comics with statistical graphs can be extended into
infographics with comics (Figure A.27). This still falls into the category of telling
data-driven stories with comics. The comic-style graphs are organized into a big
infographic. The story is about the comparison of the comic figures from the Marvel
and DC series. The data is the visualization of the distribution of the figures and
their comparison across different fields such as dress and mental status.
Informal classification:
As a published comic that is shared on-line, the audience is potentially many
people on the Internet, typically viewed individually on their personal devices (al-
though public mass viewings are certainly possible). The delivery method is thus
distributed and asynchronous. Data visualization is a low-bandwidth medium, but
the user has no control. The visual components used include text, comic figures,
and data visualizations. The data visualization in this comic is static in that
it cannot be manipulated or interacted with. The content can be viewed in sequence
or by branching. However, it is also replicable, as it is stored and can be viewed
again at any time.
A.2.8 Body Cartoon
Figure A.28: Comic of someone with tattoos [27] on his arm as data visualizations.
The design of data comics with statistical graphs can be extended into data
visualization with comics (Figure A.28). This still falls into the category of telling
data-driven stories with comics. The comic-style graphs are organized into a few
visualizations on one human arm. The story is about how the visualizations are
rendered as tattoos on a human body. The data is the location of the visualizations
and the data visualizations themselves.
Informal classification:
As a published comic that is shared on-line, the audience is potentially many
people on the Internet, typically viewed individually on their personal devices (al-
though public mass viewings are certainly possible). The delivery method is thus
distributed and asynchronous. Data visualization is a low-bandwidth medium, but
the user has no control. The visual components used include text, comic figures,
and data visualizations. The data visualization in this comic is static in that
it cannot be manipulated or interacted with. The content can be viewed in sequence
or by branching. However, it is also replicable, as it is stored and can be viewed
again at any time.
A.2.9 Spider Man Comic Visualization
Figure A.29: Spider Man Visualization in Comics [28]
The design of data comics for network graphs can be extended into
infographics with comics (Figure A.29). This still falls into the category of telling
data-driven stories with comics. The comic-style graphs are organized into a big
infographic. The story is about the technology that Spider-Man's suit uses. The data
is the layers and functions of the suit.
Informal classification:
As a published comic that is shared on-line, the audience is potentially many
people on the Internet, typically viewed individually on their personal devices (al-
though public mass viewings are certainly possible). The delivery method is thus
distributed and asynchronous. Data visualization is a low-bandwidth medium, but
the user has no control. The visual components used include text, comic figures,
and data visualizations. The data visualization in this comic is static in that
it cannot be manipulated or interacted with. The content can be viewed in sequence
or by branching. However, it is also replicable, as it is stored and can be viewed
again at any time.
A.2.10 Cell Phone Comic
Figure A.30: Cell Phone Visualization in Comics [29]
The design of data comics for network graphs can be extended into
infographics with comics (Figure A.30). This still falls into the category of telling
data-driven stories with comics. The comic-style graphs are organized into a comic
strip with information visualizations. The story is about how the trends in cancer
and cell phones change. The data is the visualization that illustrates the change
in the number of cancer incidents and cell phones.
Informal classification:
As a published comic that is shared on-line, the audience is potentially many
people on the Internet, typically viewed individually on their personal devices (al-
though public mass viewings are certainly possible). The delivery method is thus
distributed and asynchronous. Data visualization is a low-bandwidth medium, but
the user has no control. The visual components used include text, comic figures,
and data visualizations. The data visualization in this comic is static in that
it cannot be manipulated or interacted with. The content can be viewed in sequence
or by branching. However, it is also replicable, as it is stored and can be viewed
again at any time.
A.2.11 Linear Regression Comic
Figure A.31: Linear Regression Visualization in Comics [30]
The design of data comics for network graphs can be extended into
infographics with comics (Figure A.31). This still falls into the category of telling
data-driven stories with comics. The comic-style graphs are organized into a comic
strip with information visualizations. The story is about the visualization of a
linear regression. The data is the visual comparison of a linear regression and a
randomly drawn diagram based on the data points.
Informal classification:
As a published comic that is shared on-line, the audience is potentially many
people on the Internet, typically viewed individually on their personal devices (al-
though public mass viewings are certainly possible). The delivery method is thus
distributed and asynchronous. Data visualization is a low-bandwidth medium, but
the user has no control. The visual components used include text, comic figures,
and data visualizations. The data visualization in this comic is static in that
it cannot be manipulated or interacted with. The content can be viewed in sequence
or by branching. However, it is also replicable, as it is stored and can be viewed
again at any time.
A.2.12 Vacation Stress Comic
Figure A.32: Visualization showing how the stress level changes with vacation in
comics [31]
The design of data comics for network graphs can be extended into
infographics with comics (Figure A.32). This still falls into the category of telling
data-driven stories with comics. The comic-style graphs are organized into a comic
strip with information visualizations. The story is about the visualization of the
stress level as the vacation starts and ends. The data is the visualization of how
the stress level does not change as expected, because the vacation is interrupted
by worrying about work.
Informal classification:
As a published comic that is shared on-line, the audience is potentially many
people on the Internet, typically viewed individually on their personal devices (al-
though public mass viewings are certainly possible). The delivery method is thus
distributed and asynchronous. Data visualization is a low-bandwidth medium, but
the user has no control. The visual components used include text, comic figures,
and data visualizations. The data visualization in this comic is static in that
it cannot be manipulated or interacted with. The content can be viewed in sequence
or by branching. However, it is also replicable, as it is stored and can be viewed
again at any time.
A.2.13 Desk Entropy Comic
Figure A.33: The Increase of Entropy of Visualization in Comics [32]
The design of data comics for network graphs can be extended into
infographics with comics (Figure A.33). This still falls into the category of telling
data-driven stories with comics. The comic-style graphs are organized into a comic
strip with information visualizations. The story is about how messy a PhD student's
desk can be. The data is the visualization showing that a PhD student's desk gets
messier and messier over time.
Informal classification:
As a published comic that is shared on-line, the audience is potentially many
people on the Internet, typically viewed individually on their personal devices (al-
though public mass viewings are certainly possible). The delivery method is thus
distributed and asynchronous. Data visualization is a low-bandwidth medium, but
the user has no control. The visual components used include text, comic figures,
and data visualizations. The data visualization in this comic is static in that
it cannot be manipulated or interacted with. The content can be viewed in sequence
or by branching. However, it is also replicable, as it is stored and can be viewed
again at any time.
A.2.14 PhD Grooming Comic
Figure A.34: The Need of Grooming of Visualization in Comics [33]
The design of data comics for network graphs can be extended into
infographics with comics (Figure A.34). This still falls into the category of telling
data-driven stories with comics. The comic-style graphs are organized into a comic
strip with information visualizations. The story is about the grooming condition of
a PhD student. The data is the visualization of the grooming condition getting
worse over time.
Informal classification:
As a published comic that is shared online, the audience is potentially many
people on the Internet, who typically view it individually on their personal
devices (although public mass viewings are certainly possible). The delivery method
is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium, and the user has no control. The visual components used include full text,
comic figures, and data visualizations. The data visualization in this comic is
static in that it cannot be manipulated or interacted with. The viewing sequence is
sequential or branching. It is also replicable, as it is stored and can be viewed
again at any time.
A.2.15 PhD Procrastination
Figure A.35: The Change of Procrastination and Stress Level in Comics [34]
The design of data comics can be expanded beyond network graphs into
infographics with comics, as in Figure A.35. This still falls into the category of
telling data-driven stories with comics. The comic-style graphs are organized into
a comic strip with information visualizations. The story is about procrastination.
The data is the visualization of how the level of procrastination increases when
the workload is high in reality.
Informal classification:
As a published comic that is shared online, the audience is potentially many
people on the Internet, who typically view it individually on their personal
devices (although public mass viewings are certainly possible). The delivery method
is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium, and the user has no control. The visual components used include full text,
comic figures, and data visualizations. The data visualization in this comic is
static in that it cannot be manipulated or interacted with. The viewing sequence is
sequential or branching. It is also replicable, as it is stored and can be viewed
again at any time.
A.2.16 A Day in the Life of an American
Figure A.36: The Change of Procrastination and Stress Level in Comics [35]
The design of data comics can be expanded beyond network graphs into
infographics with comics, as in Figure A.36. This still falls into the category of
telling data-driven stories with comics. The comic-style graphs are organized into
a comic strip with information visualizations. The story is about an average
American's life. The data is the comic strips that show what this person does
throughout the day.
Informal classification:
As a published comic that is shared online, the audience is potentially many
people on the Internet, who typically view it individually on their personal
devices (although public mass viewings are certainly possible). The delivery method
is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium, and the user has no control. The visual components used include full text,
comic figures, and data visualizations. The data visualization in this comic is
static in that it cannot be manipulated or interacted with. The viewing sequence is
sequential or branching. It is also replicable, as it is stored and can be viewed
again at any time.
A.2.17 Curve Fitting Comic
Figure A.37: Curve Fitting Visualization in Comics [36]
The design of data comics can be expanded beyond network graphs into
infographics with comics, as in Figure A.37. This still falls into the category of
telling data-driven stories with comics. The comic-style graphs are organized into
a comic strip with information visualizations. The story is about the comparison of
curve-fitting methods. The data is the visualization of how the different
curve-fitting methods match.
Informal classification:
As a published comic that is shared online, the audience is potentially many
people on the Internet, who typically view it individually on their personal
devices (although public mass viewings are certainly possible). The delivery method
is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium, and the user has no control. The visual components used include full text,
comic figures, and data visualizations. The data visualization in this comic is
static in that it cannot be manipulated or interacted with. The viewing sequence is
sequential or branching. It is also replicable, as it is stored and can be viewed
again at any time.
A.2.18 Seashell Probability Comic
Figure A.38: Illustration of Conditional Probability Visualized in Comics [37]
The design of data comics can be expanded beyond network graphs into
infographics with comics, as in Figure A.38. This still falls into the category of
telling data-driven stories with comics. The comic-style graphs are organized into
a comic strip with information visualizations. The story is an illustration of
conditional probability. The data is the visualization of those probabilities.
Informal classification:
As a published comic that is shared online, the audience is potentially many
people on the Internet, who typically view it individually on their personal
devices (although public mass viewings are certainly possible). The delivery method
is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium, and the user has no control. The visual components used include full text,
comic figures, and data visualizations. The data visualization in this comic is
static in that it cannot be manipulated or interacted with. The viewing sequence is
sequential or branching. It is also replicable, as it is stored and can be viewed
again at any time.
A.3 Web Article with Data Visualization
Some web pages consist of very rich media, including text, images, and
visualizations. Unlike infographics, rich web pages are more interactive, as all the
media types inside a web page can be dynamic. The data can be updated in real
time, and the visualization can be interactive. The forms of data in a web page are
more versatile than in infographics.
A.3.1 The Two Americas
Figures A.39, A.40, A.41, A.42, A.43, and A.44 [194] show an example of a web
page telling a data story that compares Donald Trump and Hillary Clinton in the
election. The story is composed of a visualization map, a text description, and a
table.
Figure A.39: The Two Americas
Figure A.40: The Two Americas: Trump.
Figure A.41: The Two Americas: Clinton.
Figure A.42: The Two Americas: description of each side.
Figure A.43: The Two Americas: description of the Clinton side.
Figure A.44: The Two Americas: comparison of the two sides.
Informal classification:
As a published article that is shared online, the audience is potentially many
people on the Internet, who typically view it individually on their personal
devices (although public mass viewings are certainly possible). The delivery method
is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium. The visual components used include full text, photographics, and data
visualizations. The data visualization in this article is interactive in that it
supports hovering, clicking, and other ways of interacting with the data
visualization. The viewing sequence is parallel. It is also replicable, as it is
stored and can be viewed again at any time.
A.3.2 Strikeouts on the Rise
Figure A.45: Strikeouts on the rise
Figure A.46: Strikeouts on the rise [38] for each player.
Figures A.45 and A.46 show that the visualization can be interactive and more
informative. This story shows that strikeouts were on the rise for the league in
2012. The line chart combined with a scatter plot enables the audience to hover
over each point and see the exact data for that time. For more detailed
information, a table with hyperlinks is provided for each team. Here I show only
part of the web page, but the audience can easily draw a story with detailed
information from the visualization, text, and table.
Informal classification:
As a published article that is shared online, the audience is potentially many
people on the Internet, who typically view it individually on their personal
devices (although public mass viewings are certainly possible). The delivery method
is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium. The visual components used include full text, photographics, and data
visualizations. The data visualization in this article is interactive in that it
supports hovering over data points. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.3.3 1.5 Million Missing Black Men
Figure A.47: Missing men for different races [39]
Figure A.48: Which areas have men missing
Figure A.49: Missing men for blacks and whites.
The example in Figures A.47, A.48, and A.49 shows the number of black men
missing per one hundred people in the population. The story is that more black men
are missing than men of other races.
Informal classification:
As a published article that is shared online, the audience is potentially many
people on the Internet, who typically view it individually on their personal
devices (although public mass viewings are certainly possible). The delivery method
is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium. The visual components used include full text, photographics, and data
visualizations. The data visualization in this article is not interactive. The
viewing sequence is parallel. It is also replicable, as it is stored and can be
viewed again at any time.
A.4 Visualization Tools
A.4.1 VisJocky
Figure A.50: VisJocky interface [40]
Figure A.50 shows a tool that simply visualizes data and adds annotations to
form a story. The tool creates a basic line chart with highlighted text annotations
as descriptions. The original data is a table of the Dow Jones index average and
the S&P 500 index for a few months. The story is that the Dow index lost 109 points
and the S&P index surged.
Informal classification:
As a visualization that is used individually but shared publicly, the audience
is potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has basic controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
produced by this tool is interactive in that it supports hovering, clicking, and
other ways of interacting with the data visualization. The viewing sequence is
sequential. It is also replicable, as it is stored and can be viewed again at any
time.
A.4.2 ChartAccent
Figure A.51: ChartAccent interface [41]
The tool shown in Figure A.51 is used to add annotation and highlighting to
an existing data visualization. The blue area connecting the United States and
other small dots shows a subset of countries that are located in North and South
America. Data: A table of countries with their life expectancy and fertility rate.
Story: The added annotation shows how life expectancy and fertility rate are
distributed for a group of North and South American countries. The connected
highlighted area clearly indicates that these countries are sparsely spread.
Informal classification:
As a visualization that is used individually but shared publicly, the audience
is potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has basic controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
produced by this tool is interactive in that it supports hovering, clicking, and
other ways of interacting with the data visualization. The viewing sequence is
sequential. It is also replicable, as it is stored and can be viewed again at any
time.
A.5 Sketch Tools
A.5.1 SketchStory
Figure A.52: Process of design story with SketchStory
How to tell a story using SketchStory [171]: The presenter can (a) start by
sketching an icon and a chart axis as an example, (b) let the SketchStory system
finish the rest of the chart by combining the sketches with the underlying data,
and (c) obtain the produced media, whose chart the presenter can interact with.
Informal classification:
As a sketching tool that is used individually but shared publicly, the audience
is potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
high-bandwidth medium, and the user has basic controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
produced by this tool is interactive in that it supports hovering, clicking, and
other ways of interacting with the data visualization. The viewing sequence is
sequential. It is also replicable, as it is stored and can be played back at any
time.
Figure A.53: Sketcholution comic strip and summary. [42]
A.5.2 Sketcholution
Sketcholution is a method to create visual histories of hand sketches
automatically. The resulting aggregation dendrogram in Figure A.53 can be adjusted
to any level based on the display space. It can also be used to create a visual
history in two styles: a comic strip highlighting differences and a single summary
frame annotating each object in the sketch scene.
Informal classification:
As a sketching tool that is used individually but shared publicly, the audience
is potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
high-bandwidth medium, and the user has basic controls. The visual components used
include full text, photographics, and data visualizations. The viewing sequence is
sequential. It is also replicable, as it is stored and can be played back at any
time.
A.5.3 DataSketches: Royal Constellations
Figure A.54: The sketch visualization of the royal members [43].
Figure A.55: The sketch connects the royal members
The example shows that a sketch-based visualization tool is a way to visually
demonstrate 1,000 years of ancestral connections in the European royal families.
The resulting animated sketches in Figure A.54 show the relationships between two
royal figures. The story also includes annotations from the storyteller. The data is
the information on the relationships between royal family members.
Informal classification:
As a sketching tool that is used individually, the audience is potentially many
people on the Internet, who typically view it individually on their personal
devices (although public mass viewings are certainly possible). The delivery method
is thus distributed and synchronous. Data visualization is a low-bandwidth medium,
and the user has basic controls. The visual components used include full text,
sketches, and data visualizations. The data visualization produced by this tool is
interactive. The viewing sequence is sequential or branching. It is also replicable,
as it is stored and can be played back at any time.
A.5.4 DataSketches: Cardcaptor Sakura
Figure A.56: Visual Explanation of the Relationship of a Cartoon Series [44]
The example shows that a sketch-based animation is a way to visually explain
the relationships in a cartoon series. The resulting animated sketches in
Figure A.56 show the relationships between selected cartoon figures. The story also
includes annotations from the storyteller. The data is the information on the
relationship networks in the cartoon series.
Informal classification:
As a sketching tool that is used individually but shared publicly, the audience
is potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and synchronous. Data visualization is a
low-bandwidth medium, and the user has basic controls. The visual components used
include full text, sketches, and data visualizations. The data visualization
produced by this tool is interactive. The viewing sequence is sequential or
branching. It is also replicable, as it is stored and can be played back at any
time.
A.5.5 The Big Short Movie Explained Animated
Figure A.57: How mortgage bond combined into sub-prime mortgage [45].
The example shows that a sketch-based animation is a way to visually explain a
movie story line. The resulting animated sketches in Figure A.57 present the story
of the movie in steps. The story also includes the storyteller's annotations and
personal understanding of the movie. The data is the information on the financial
products in the movie.
Informal classification:
As a sketching tool that is used individually but shared publicly, the audience
is potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
high-bandwidth medium, and the user has no controls. The visual components used
include full text, sketches, and data visualizations. The data visualization
produced by this tool is not interactive. The viewing sequence is sequential. It is
also replicable, as it is stored and can be played back at any time.
A.6 Infographics
Infographics are another example of storytelling media that combine a few basic
forms of media to interpret a data story. Most infographics involve some form of
data visualization and text explanation. Infographics are widely used for purposes
ranging from entertainment to education. The data for the story is mostly
represented by data visualizations and tables.
Figure A.58: New Orleans housing population decreases [46]
A.6.1 New Orleans Housing Population
For the example in Figure A.58, the story is about the population change of
New Orleans. It clearly shows a trend in which the coastal area and downtown of
New Orleans have an increasing black share of the population, as well as decreasing
occupancy of the housing units. The story is revealed by both the heat map and the
text description. Most of the time, infographics are not meant to be interactive;
the story is organized by the positioning and annotation of the data visualization.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.2 Top writers for best sellers
Figure A.59: Top writers for best sellers [47]
The infographic in Figure A.59 shows that very few writers of color were
ranked as top bestsellers. The data is the number of writers of color and the two
top sellers ranked the most times. The story is that the writers of the bestsellers
lack diversity.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.3 Public Library Report
Figure A.60: Public library report [48]
The infographic in Figure A.60 shows the management status of a public
library. It includes data on people, inventory, and financials. Data: The original
data is a data sheet of the running conditions of the public library. Story: The
story is that the business is running well. All major data, including people, usage,
and products, show that the library is in good condition.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.4 US Poverty
Figure A.61: How poverty is distributed in the U.S. [49]
The infographic in Figure A.61 shows the percentage of poor people and their
ethnic distribution around the US. Data: The distribution of poor people. Story:
The way poor people are distributed among subgroups.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.5 London March
Figure A.62: Distribution of London Marches in Visualization [50]
The infographic in Figure A.62 shows the distribution of the features of
marches that took place in London. Data: The distribution of marches. Story: The
way the marches in London are distributed by time and size.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.6 Yemen War
Figure A.63: Geo-distribution of All the Forces in Yemen Civil War in Visualiza-
tion [51]
The infographic in Figure A.63 shows the positions of all forces in the Yemen
Civil War. Data: The distribution of all forces. Story: The way the different
forces are distributed and fight against each other.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.7 Space Industry
Figure A.64: The Stats of Space Industry and Technology in Visualization [52]
The infographic in Figure A.64 shows the distribution of all kinds of satellite
launches in the space industry. Data: The distribution of the cost of all kinds of
commercial launches. Story: The way the different kinds of satellite launches, for
both governmental and commercial purposes, are distributed.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.8 Household Air Pollution
Figure A.65: The Source of Household Air Pollution in Visualization [53]
The infographic in Figure A.65 shows the deaths caused by household air
pollution. Data: The distribution of all kinds of death caused by household air
pollution. Story: Household air pollution causes many kinds of death.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.9 Air Pollution Linked Death
Figure A.66: The Source of Death Caused by Air Pollution in Visualization [54]
The infographic in Figure A.66 shows the deaths caused by air pollution.
Data: The distribution of all kinds of death caused by air pollution. Story:
Deaths can be caused by both household and outdoor air pollution.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.10 North and South Korean Comparison
Figure A.67: The Comparison of North and South Korea in Visualization [55]
The infographic in Figure A.67 shows a comparison of many aspects of North
and South Korea. Data: Figures on the compared aspects of the two countries. Story:
How North and South Korea differ across those aspects.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.11 In the Shadow of Foreclosure
Figure A.68: The situation of foreclosure across the U.S. [56]
The infographic in Figure A.68 shows the distribution of the foreclosure
percentage. The West Coast and much of Florida are the most heavily affected
areas. Data: The percentage of houses foreclosed in each state. Story: An
impression of the level of pain across the US due to the sub-prime mortgage crisis.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.12 Words of Democrats and Republicans
Figure A.69: The comparison for the Democrats and Republicans during elec-
tion [57]
The infographic in Figure A.69 shows the words that Democrats and Republicans
most frequently say during the election. Data: The heat map of different words and
mentions of countries. Story: Democrats and Republicans focus on very different
topics during the election.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.13 UK and US Firearms
Figure A.70: The comparison of the firearm possession of the U.K. and the U.S. [58]
The infographic in Figure A.70 compares firearm possession in the U.K. and
the U.S. Data: Firearm possession figures for the two countries. Story: How firearm
possession differs between the U.K. and the U.S.
Informal classification:
As an infographic that is used individually but shared publicly, the audience is
potentially many people on the Internet, who typically view it individually on
their personal devices (although public mass viewings are certainly possible). The
delivery method is thus distributed and asynchronous. Data visualization is a
low-bandwidth medium, and the user has no controls. The visual components used
include full text, photographics, and data visualizations. The data visualization
in this infographic is not interactive. The viewing sequence is parallel. It is also
replicable, as it is stored and can be viewed again at any time.
A.6.14 White House Correspondent Dinner
Figure A.71: The guest distribution of the White House Correspondent Dinner [59]
The infographic in Figure A.71 shows a comparison among the guests invited to
the White House correspondent dinner. Data: The details of the different
correspondents over time. Story: The trend in which correspondents are more
popular.
Informal classification:
As an infographic is used individually but shared publicly, the audience is
potentially many people on the Internet, typically viewed individually on their per-
sonal devices (although public mass viewings are certainly possible). The delivery
method is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium, but the user has no controls. The visual components used include full-
text, photographics, and data visualizations. The data visualization produced by
this tool is not interactive. The viewing sequence is in parallel. However, it is also
replicable, as it is stored and can be played back at any time.
A.6.15 Day vs. Night: What NYC's Population Looks Like
Figure A.72: The change of population during days and nights [60]
The infographic in Figure A.72 shows the difference between the daytime and
nighttime population distribution in New York City. There are more people in midtown
and downtown in the daytime than at night. Data: The actual distribution of
population density during the day and at night in New York City. Story: There are far
more people in Manhattan in the daytime than at night. The average commute
time is 34 minutes.
Informal classification:
As an infographic is used individually but shared publicly, the audience is
potentially many people on the Internet, typically viewed individually on their per-
sonal devices (although public mass viewings are certainly possible). The delivery
method is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium, but the user has no controls. The visual components used include full-
text, photographics, and data visualizations. The data visualization produced by
this tool is not interactive. The viewing sequence is in parallel. However, it is also
replicable, as it is stored and can be played back at any time.
A.6.16 Who owns everything: Big Data Today
Figure A.73: The wealth distribution [61]
The infographic in Figure A.73 shows the ownership relationships among large
corporations in the U.S. It includes data on the capital value of each company and the
share each company holds in other companies. Data: The ownership data between companies.
Story: The story is how the giant corporations hold shares in each other.
Informal classification:
As an infographic is used individually but shared publicly, the audience is
potentially many people on the Internet, typically viewed individually on their per-
sonal devices (although public mass viewings are certainly possible). The delivery
method is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium, but the user has no controls. The visual components used include full-
text, photographics, and data visualizations. The data visualization produced by
this tool is not interactive. The viewing sequence is in parallel. However, it is also
replicable, as it is stored and can be played back at any time.
A.6.17 Big Welsh Coast Walk
Figure A.74: The participants and locations of the big Welsh coast walk [62]
The infographic in Figure A.74 shows the coast walk activity in Wales. It
includes the distribution of participants among the cities and the total money raised.
Data: The original data is a data sheet of the running conditions of a public library.
Story: The story is that the business is running well. All major data, including people,
usage, and products, show that the library is in good condition.
Informal classification:
As an infographic is used individually but shared publicly, the audience is
potentially many people on the Internet, typically viewed individually on their per-
sonal devices (although public mass viewings are certainly possible). The delivery
method is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium, but the user has no controls. The visual components used include full-
text, photographics, and data visualizations. The data visualization produced by
this tool is not interactive. The viewing sequence is in parallel. However, it is also
replicable, as it is stored and can be played back at any time.
A.6.18 Hungry USA
Figure A.75: Public library report [63]
The infographic in Figure A.75 shows the hangry level of U.S. states as a colored
map. It includes the causes and effects of hangriness at the average level. Data: The level
of hangriness across the U.S. and the average data for different states. Story: The
states of New York and California have the highest level of hangriness in the
country. People in South Dakota and Illinois are pretty chill.
Informal classification:
As an infographic is used individually but shared publicly, the audience is
potentially many people on the Internet, typically viewed individually on their per-
sonal devices (although public mass viewings are certainly possible). The delivery
method is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium, but the user has no controls. The visual components used include full-
text, photographics, and data visualizations. The data visualization produced by
this tool is not interactive. The viewing sequence is in parallel. However, it is also
replicable, as it is stored and can be played back at any time.
A.6.19 NYC Celebrity Map
Figure A.76: The locations of celebrities' homes [64]
The infographic in Figure A.76 shows the management status of a public
library. It includes data on people, inventory, and financials. Data: The original
data is a data sheet of the running conditions of a public library. Story: The story is
that the business is running well. All major data, including people, usage, and products,
show that the library is in good condition.
Informal classification:
As an infographic is used individually but shared publicly, the audience is
potentially many people on the Internet, typically viewed individually on their per-
sonal devices (although public mass viewings are certainly possible). The delivery
method is thus distributed and asynchronous. Data visualization is a low-bandwidth
medium, but the user has no controls. The visual components used include full-
text, photographics, and data visualizations. The data visualization produced by
this tool is not interactive. The viewing sequence is in parallel. However, it is also
replicable, as it is stored and can be played back at any time.
A.7 Games
A.7.1 Halo: Reach
Figure A.77: Visualization in Halo: Reach [65]
The heatmaps in Figure A.77 compare three players in terms of their deaths
(top row), kills (middle row), and the differences between the two (bottom row) in the
game Halo: Reach [65]. The difference map shows the areas of the game map in which one
player has an advantage over the others in the likelihood of survival.
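A difference map of this kind can be sketched as a cell-wise subtraction of two grids. The snippet below is a minimal illustration under our own assumptions: the grid values are invented, the function names are hypothetical, and we assume the difference is computed as kills minus deaths per cell, which the figure source does not confirm.

```python
def difference_map(kills, deaths):
    """Cell-wise kills minus deaths; positive cells favor the player."""
    same_shape = len(kills) == len(deaths) and all(
        len(k_row) == len(d_row) for k_row, d_row in zip(kills, deaths)
    )
    if not same_shape:
        raise ValueError("kill and death grids must have the same shape")
    return [
        [k - d for k, d in zip(k_row, d_row)]
        for k_row, d_row in zip(kills, deaths)
    ]


def advantage_cells(diff):
    """Return (row, column) positions where the player has more kills than deaths."""
    return [
        (r, c)
        for r, row in enumerate(diff)
        for c, value in enumerate(row)
        if value > 0
    ]


# Hypothetical 2x3 grids of event counts for one player.
kills = [[3, 0, 1], [2, 5, 0]]
deaths = [[1, 2, 1], [0, 1, 4]]

diff = difference_map(kills, deaths)
print(diff)                   # [[2, -2, 0], [2, 4, -4]]
print(advantage_cells(diff))  # [(0, 0), (1, 0), (1, 1)]
```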
Informal classification:
As a visualization in the game is used individually but can be shared publicly,
the audience is potentially many people on the Internet, typically viewed individually
on their personal devices (although public mass viewings are certainly possible).
The delivery method is thus distributed and asynchronous. Data visualization is a
high-bandwidth medium, but the user has basic controls. The visual components
used include full-text and data visualizations. The viewing sequence is in parallel.
However, it is also replicable, as it is stored and can be played back at any time.
A.7.2 Call of Duty: Black Ops
Figure A.78: Visualization in Call of Duty: Black Ops [66]
The bar chart in Figure A.78 shows the wager earnings of different
players competing in the game Call of Duty: Black Ops [66].
Informal classification:
As a visualization in the game is used individually but can be shared publicly,
the audience is potentially many people on the Internet, typically viewed individually
on their personal devices (although public mass viewings are certainly possible).
The delivery method is thus distributed and asynchronous. Data visualization is a
high-bandwidth medium, but the user has basic controls. The visual components
used include full-text and data visualizations. The viewing sequence is in sequence.
However, it is also replicable, as it is stored and can be played back at any time.
A.7.3 StarCraft II
Figure A.79: Visualization in StarCraft II [67]
Figure A.79 [67] shows a comparison of the two opposing teams in terms of
the build order of their installations. Players are listed separately in the left and
right columns.
Informal classification:
As a visualization in the game is used individually but can be shared publicly,
the audience is potentially many people on the Internet, typically viewed individually
on their personal devices (although public mass viewings are certainly possible).
The delivery method is thus distributed and asynchronous. Data visualization is a
high-bandwidth medium, but the user has basic controls. The visual components
used include full-text and data visualizations. The viewing sequence is in sequence.
However, it is also replicable, as it is stored and can be played back at any time.
A.8 Social Media
A.8.1 TwitterSheep
Figure A.80: TwitterSheep interface [68]
The application shown in Figure A.80 gathers the topics of one's tweets. This
example is the result for a software developer working in the IT industry. Data: The
texts of all tweets from one account. Story: A summary of the major topics
of a software engineer's tweets; he is interested in the web and technology.
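To illustrate how a topic summary can be derived from the text of an account's tweets, the sketch below counts word frequencies after dropping common stopwords. This is a simple stand-in technique chosen for illustration, not a description of how TwitterSheep itself works; the stopword list, the function name, and the sample tweets are all hypothetical.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real tool would use a fuller one.
STOPWORDS = frozenset(
    {"the", "a", "an", "and", "to", "of", "in", "is", "for", "my", "on", "with"}
)


def top_topics(tweets, n=3):
    """Return the n most frequent non-stopword terms across all tweets."""
    words = []
    for tweet in tweets:
        for word in re.findall(r"[a-z']+", tweet.lower()):
            if word not in STOPWORDS and len(word) > 2:
                words.append(word)
    return Counter(words).most_common(n)


# Invented sample tweets for a software developer's account.
tweets = [
    "Shipping a new web app today",
    "The web is the platform for technology",
    "Technology news: web standards keep improving",
]

print(top_topics(tweets, 2))  # [('web', 3), ('technology', 2)]
```

The resulting counts could then drive the font sizes of a tag cloud, which is one common way to present such a summary.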
Informal classification:
As a visualization in the social media is used individually but can be shared
publicly, the audience is potentially many people on the Internet, typically viewed
individually on their personal devices (although public mass viewings are certainly
possible). The delivery method is thus distributed and asynchronous. Data visu-
alization is a low-bandwidth medium, but the user has basic controls. The visual
components used include full-text and data visualizations. The viewing sequence is
in sequence. However, it is also replicable, as it is stored as images.
A.8.2 Twitter Interactive Game of Thrones
Figure A.81: Twitter interactive Game of Thrones [69]
The text tool in Figure A.81 shows the story summary of each episode.
The visualization on the right side is a connection graph showing the relationships
between different districts. The colored points are coded by family and side.
Data: The texts and highlighted names of the main characters, their backgrounds, and
their relationships. Story: The interaction frequencies and targets of the main characters.
Informal classification:
As a visualization in the social media is used individually but can be shared
publicly, the audience is potentially many people on the Internet, typically viewed
individually on their personal devices (although public mass viewings are certainly
possible). The delivery method is thus distributed and asynchronous. Data visu-
alization is a high-bandwidth medium, but the user has full controls. The visual
components used include full-text, animation, sound, and data visualizations. The
viewing sequence is in sequence. The data visualization produced is interactive.
However, it is also replicable, as it is stored.
A.8.3 Twitter Interactive: How tweets spread
Figure A.82: Twitter Interactive [70]: how world cup news spread.
This example in Figure A.82, from NYT-sm, is a Twitter application designed to
show how news of the 2010 German soccer World Cup spread. The application
runs an animation at the same speed at which the news actually spread.
Data: The spreading locations, based on the tweet locations, of the 2010 German soccer
World Cup news. Story: The sequence and speed of the news spreading internationally. The
bar chart at the top shows the frequency of tweets.
Informal classification:
As a visualization in the social media is used individually but can be shared
publicly, the audience is potentially many people on the Internet, typically viewed
individually on their personal devices (although public mass viewings are certainly
possible). The delivery method is thus distributed and asynchronous. Data visu-
alization is a high-bandwidth medium, but the user has basic controls. The visual
components used include full-text, animation, and data visualizations. The data
visualization produced is interactive. The viewing sequence is in sequence. However,
it is also replicable.
A.9 Augmented Reality
A.9.1 Uber Mobile Visualization
Figure A.83: The architecture for the augmented reality [71] to have a data visu-
alization
This example in Figure A.83 is an augmented reality application designed
to show road conditions to drivers. The application runs with animation
and data visualization at a speed that follows the road conditions. Data: The
road conditions and traffic information. Story: The information on top of the bus
shows the bus schedule and warns that it is leaving in two minutes. The bar
charts and numbers on the right show the navigation and weather information.
Informal classification: This example is primarily intended to be general. As
a visualization in the social media is used individually, the audience is potentially
one driver, typically viewed individually on their personal devices. The delivery
method is thus distributed but synchronous. Data visualization is a high-bandwidth
medium, but the user has basic controls. The visual components used include full-
text, animation, and data visualizations. The data visualization produced is not
interactive. The viewing sequence is in sequence. However, it is also ephemeral.
A.9.2 AR Data Visualization Design
Figure A.84: The architecture for the augmented reality [72] to have a data visu-
alization
Figure A.85: How AR [72] is used to visualize the world population
This example in Figure A.84 and Figure A.85 is an augmented reality
application designed to show data visualizations. The application runs with
animation and data visualization driven by the user's interaction. Data: The data
visualization and annotation of the world population. Story: The information is shown as
a bar chart to demonstrate the distribution of the world population.
Informal classification:
As a visualization rendered with personal tools, it is used individually but
can be shared publicly; the audience is potentially one person who is working
on the dataset, although public mass viewings are certainly possible. The delivery
method is thus distributed and asynchronous. Data visualization is a high-bandwidth
medium, but the user has full controls. The visual components used include full-
text, animation, and data visualizations. The viewing sequence is in sequence.
However, it is also replicable.
A.9.3 AR 3D Design
Figure A.86: The process for the engineers with augmented reality [73] to design a
building with data visualization
This example in Figure A.86 is an augmented reality application designed to
show data visualizations combined with a real working site. The application
runs with animation and data visualization driven by the user's interaction.
Data: The data visualization, images, and annotation of the world population. Story:
The information is shown as frames, a line chart, and text to demonstrate the
metadata and expected layout of the finished building.
Informal classification:
As a visualization rendered with personal tools, it is used individually but can
be shared publicly; the audience is potentially one person who is working on the
dataset and the actual working site, although public mass viewings are certainly
possible. The delivery method is thus distributed and asynchronous. Data visu-
alization is a high-bandwidth medium, but the user has full controls. The visual
components used include full-text, images, animation, and data visualizations.
The data visualization produced is fully interactive. The viewing sequence is in
sequence. However, it is also replicable.
A.9.4 AR Flight Data
Figure A.87: The process for passengers to view flight data in augmented reality [74]
This example in Figure A.87 is an augmented reality application designed to
show a data visualization of flight tracking information. The application runs
with animation and data visualization to give users a virtual route of flight
positions. Data: The data visualization, animation, and annotation of flight
information. Story: The information is shown as animated tracks that show the
altitudes and locations of planes in real time.
Informal classification:
As a visualization rendered with personal tools, it is used individually but can
be shared publicly; the audience is potentially one person who is working on the
dataset and the actual working site, although public mass viewings are certainly
possible. The delivery method is thus distributed and asynchronous. Data visu-
alization is a high-bandwidth medium, but the user has basic controls. The visual
components used include full-text, images, animation, and data visualizations. The
viewing sequence is in sequence. However, it is also ephemeral.
A.9.5 AR Street Visualization
Figure A.88: Street viewers obtain information from augmented reality [75] on the
street
This example in Figure A.88 is an augmented reality application designed to
show data visualizations combined with a real working site. The application
runs with animation and data visualization at a speed that follows the user's
interaction. Data: The data visualization, images, text, and annotation of the
street view. Story: The information is shown to annotate the street view in real
time.
Informal classification:
As a visualization rendered with personal tools, it is used individually but can
be shared publicly; the audience is potentially one person who is working on the
dataset and the actual working site, although public mass viewings are certainly
possible. The delivery method is thus distributed and asynchronous. Data visu-
alization is a high-bandwidth medium, but the user has basic controls. The visual
components used include full-text, images, animation, and data visualizations. The
viewing sequence is in sequence. However, it is also replicable.
A.9.6 AR pipeline
Figure A.89: Engineers have the pipes shown with augmented reality [76]
This example in Figure A.89 is an augmented reality application designed to
show data visualizations combined with a production pipeline. The application
runs with animation and data visualization driven by the user's interaction.
Data: The data visualization, images, and annotation of the pipeline and devices.
Story: The information shown is supporting information about a pipeline:
its production rate, efficiency, and running condition.
Informal classification:
As a visualization rendered with personal tools, it is used individually but
can be shared publicly; the audience is potentially one person who is working on
the dataset and the actual working site, although public mass viewings are cer-
tainly possible. The delivery method is thus distributed and asynchronous. Data
visualization is a high-bandwidth medium, but the user has no controls. The visual
components used include full-text, images, animation, and data visualizations. The
data visualization produced is not interactive. The viewing sequence is in sequence.
However, it is also replicable.
A.9.7 AR Infrastructure Visualization
Figure A.90: Engineers have the underground infrastructure shown with augmented
reality [77]
This example in Figure A.90 is an augmented reality application designed
to show data visualizations combined with a real working site. The application
runs with animation and data visualization at a speed that follows the
view. Data: The data visualization, images, and annotation of the street view and
highlighted pipes. Story: The information is shown as frames, a line chart, and text
to demonstrate the metadata and expected layout of the underground pipes.
Informal classification:
As a visualization rendered with personal tools, it is used individually but
can be shared publicly; the audience is potentially one person who is working on
the dataset and the actual working site, although public mass viewings are cer-
tainly possible. The delivery method is thus distributed and asynchronous. Data
visualization is a high-bandwidth medium, but the user has no controls. The visual
components used include full-text, images, animation, and data visualizations. The
data visualization produced is not interactive. The viewing sequence is in sequence.
However, it is also ephemeral.
A.9.8 AR Bio-Chemical Visualization
Figure A.91: A biochemistry researcher views a molecular structure shown with
augmented reality [78]
This example in Figure A.91 is an augmented reality application designed to
show a data visualization of the molecular structure of biochemical materials. The
application runs with animation and data visualization driven by the user's
interaction. Data: The data visualization, graph visualization, and annotation of
the molecular structure. Story: The information shown helps biochemical
researchers understand the structures of different molecules.
Informal classification:
As a visualization rendered with personal tools, it is used individually but can
be shared publicly; the audience is potentially one person who is working on the
dataset and the actual working site, although public mass viewings are certainly
possible. The delivery method is thus distributed and asynchronous. Data visual-
ization is a high-bandwidth medium, but the user has basic controls. The visual
components used include full-text, images, animation, and data visualizations.
The data visualization produced has basic interactivity. The viewing sequence is in
sequence. However, it is also replicable.
A.10 Virtual Reality
A.10.1 Adobe VR Data Visualization
Figure A.92: How virtual reality tool [79] is used to visualize the baseball training
data.
This example in Figure A.92 is a virtual reality application designed to show
data visualizations in virtual space. The application runs with animation,
text, and data visualization driven by the user's interaction. Data: The data
visualization, images, and annotation of page views in different parts of the world.
Story: The information compares different parts of the world in
terms of web page views. The trend of each part can be highlighted.
Informal classification:
As a visualization rendered with personal tools, it is used individually but can
be shared publicly; the audience is potentially one person who is working on the
dataset and the actual working site, although public mass viewings are certainly
possible. The delivery method is thus distributed and asynchronous. Data visu-
alization is a high-bandwidth medium, but the user has full controls. The visual
components used include full-text, images, animation, and data visualizations.
The data visualization produced is fully interactive. The viewing sequence is in
sequence. However, it is also replicable.
A.10.2 VR Baseball training
Figure A.93: How virtual reality tool [80] is used to visualize the baseball training
data.
This example in Figure A.93 is a virtual reality application designed to
facilitate baseball training by highlighting the trace of the baseball. The application
runs with animation, text, and data visualization at a speed that follows the
user's interaction. Data: The data visualization, images, and highlighting of the
baseball trajectory. Story: The information is shown as frames, a line chart, and
text to help the user improve their baseball training by showing the correct trajectory.
Informal classification:
As a visualization rendered with personal tools, it is used individually but can
be shared publicly; the audience is potentially one person who is working on the
dataset and trying to improve baseball techniques, although public mass viewings are
certainly possible. The delivery method is thus distributed and asynchronous. Data
visualization is a high-bandwidth medium, but the user has full controls. The visual
components used include full-text, images, animation, and data visualizations. The
data visualization produced is fully interactive. The viewing sequence is in sequence.
However, it is also replicable.
A.10.3 VR Big Data Analysis
Figure A.94: How virtual reality tool [81] is used to visualize big data in 3D.
This example in Figure A.94 is a virtual reality application designed to
facilitate understanding of natural resource levels globally. The application runs with
animation, text, and data visualization at a speed that follows the user's
interaction. Data: The data visualization, images, and map highlighting of natural
resources. Story: The information compares the natural resources of
different countries with a map visualization.
Informal classification:
As a visualization rendered with personal tools, it is used individually but can
be shared publicly; the audience is potentially one person who is working on the
dataset, although public mass viewings are
certainly possible. The delivery method is thus distributed and asynchronous. Data
visualization is a high-bandwidth medium, but the user has basic controls. The visual
components used include full-text, images, animation, and data visualizations.
However, it is also replicable.
A.10.4 VR Lens Big Data
Figure A.95: How virtual reality tool [82] is used to visualize big data with virtual
objects.
This example in Figure A.95 is a virtual reality application designed to
facilitate training in the environment of a new industrial facility. The application
runs with animation, text, and data visualization driven by the user's
interaction. Data: The data visualization, images, and highlighting of the information
for each device. Story: The information shows the trainee the
condition of each device in a virtual industrial facility.
Informal classification:
As a visualization rendered with personal tools, it is used individually but can
be shared publicly; the audience is potentially one person who is working on the
dataset, although public mass viewings are
certainly possible. The delivery method is thus distributed and asynchronous. Data
visualization is a high-bandwidth medium, but the user has full controls. The visual
components used include full-text, images, animation, and data visualizations.
The data visualization produced is fully interactive. The viewing sequence is in
sequence. However, it is also replicable.
A.10.5 VR Immersive visualization for Big Data
Figure A.96: How virtual reality tool [83] is used to do data analytics.
This example in Figure A.96 is a virtual reality application designed to
facilitate the understanding of high-dimensional data. The application runs with
animation, text, and data visualization driven by the user's interaction. Data:
The data visualization, images, and virtual viewer figure. Story: The information
shows how to visualize high-dimensional data generated by users from a 2D user
interface.
Informal classification:
As a visualization rendered with personal tools, it is used individually but can
be shared publicly; the audience is potentially one person who is working on the
dataset, although public mass viewings are
certainly possible. The delivery method is thus distributed and asynchronous. Data
visualization is a high-bandwidth medium, but the user has full controls. The visual
components used include full-text, images, animation, and data visualizations.
The data visualization produced is fully interactive. The viewing sequence is in
sequence. However, it is also replicable.
A.10.6 VR Bio-informatics Visualization
Figure A.97: How virtual reality tool [84] is used to visualize gene information with
graph in 3D.
This example in Figure A.97 is a virtual reality application designed to
facilitate understanding of gene sequences. The application runs with animation,
text, and data visualization driven by the user's interaction. Data: The data
visualization, images, and the connections between gene sequences. Story: The
information shows a researcher in the virtual space surrounded by labeled gene
sequences.
Informal classification:
As a visualization rendered with personal tools, it is used individually but can
be shared publicly; the audience is potentially one person who is working on the
dataset, although public mass viewings are
certainly possible. The delivery method is thus distributed and asynchronous. Data
visualization is a high-bandwidth medium, but the user has full controls. The visual
components used include full-text, images, animation, and data visualizations.
The data visualization produced is fully interactive. The viewing sequence is in
sequence. However, it is also replicable.
A.10.7 VR Geo Map Visualization
Figure A.98: How virtual reality tool [85] is used to visualize geographical informa-
tion.
This example in Figure A.98 is a virtual reality application designed to
facilitate comparison of the populations of certain areas on Earth. The application runs
with animation, text, and data visualization driven by the user's interaction.
Data: The data visualization, images, and map. Story: The information shows
the population as a 3D bar chart on a virtual Earth.
Informal classification:
As a visualization rendered with personal tools, it is used individually but can
be shared publicly; the audience is potentially one person who is working on the
dataset, although public mass viewings are
certainly possible. The delivery method is thus distributed and asynchronous. Data
visualization is a high-bandwidth medium, but the user has full controls. The visual
components used include full-text, images, animation, and data visualizations.
The data visualization produced is fully interactive. The viewing sequence is in
sequence. However, it is also replicable.
A.10.8 VR Data Analysis
Figure A.99: How virtual reality tool [86] is used to do data analysis.
This example in Figure A.99 is a virtual reality application designed to show
data visualizations in virtual space. The application runs with animation,
text, a virtual figure, and data visualization at a speed that follows the user's
interaction. Data: The data visualization, images, and annotation of a
multidimensional data set. Story: The information is shown as frames, charts, and text to
demonstrate how data can be analyzed along multiple dimensions with virtual reality.
Informal classification:
As a visualization rendered with personal tools, it is used individually but can
be shared publicly; the audience is potentially multiple users who are working on the
dataset, although a single user is certainly possible. The delivery method is
thus distributed and synchronous. Data visualization is a high-bandwidth medium,
but the user has basic controls. The visual components used include full-text,
images, animation, and data visualizations. The data visualization produced is
fully interactive. The viewing sequence is in sequence. However, it is also replicable.
Appendix B: Data Comics Evaluation Protocol
B.1 Evaluation: DataComics vs PowerPoint: Test Cases and Scripts
B.1.1 Twitter Heatmap for Stocks
Figure B.1: Stock heat map in comic style
1. What are the six sections of stocks this story talks about? ____, ____, ____,
____, ____, ____
2. Which stock has the largest volume for the financial sector?
3. Which stock has the largest volume for the consumer goods section?
4. Which stock has the largest volume for the service section?
5. Which stock has the largest volume for the technology section?
6. Which section has 30 percent of the tweet volume of all sections?
7. Which stock has the second largest volume for the consumer goods section?
8. Which stock has the second largest volume for the service section?
9. Which stock has the second largest volume for the financial section?
B.1.2 The U.S. Census Population Pyramid
Figure B.2: US census data in comic style
1. In what year did the baby boom start?
2. What is the level of births in the 1970s? Boom or Normal (circle one)
3. What is the level of births in the 1980s? Boom or Normal or Keep dropping
(circle one)
4. In the 2000s, the population of the baby boom generation is almost the same as
the population aged from ____ to ____.
5. In which two decades did births drop? ____, ____
6. How many babies were born in 1970?
7. How many babies were born in 1980?
8. How many babies were born in 2000?
B.1.3 World Happiness
Figure B.3: World happiness data in comic style
1. How many votes worldwide are there for this happiness data?
2. What is the percentage of European participants that are feeling bad at the
moment?
3. What is the percentage of North American participants that are feeling bad at
the moment?
4. What is the percentage of Oceania participants that are feeling bad at the
moment?
5. What is the percentage of Asian participants that are feeling good at the
moment?
6. Frame six compares participants feeling bad across five different
continents. Of these continents, which one do you think has the highest
percentage of participants feeling bad?
7. What is the percentage of Oceania participants that are feeling good at the
moment?
8. How many votes in Asia are there for this happiness data?
9. How many votes in Oceania are there for this happiness data?
B.1.4 Star Wars
Figure B.4: Star Wars data in comic style
1. We like people ____ to us.
2. Which movie series is this data comics story talking about?
3. For the male fans of Anakin Skywalker, on which two personalities do they
rate themselves with the highest scores? (____ and ____)
4. For the female fans of Anakin Skywalker, on which three personalities do they
rate themselves with the highest scores? (____, ____ and ____)
5. For the female fans of Luke Skywalker, which personality is best rated? ____.
6. For the male fans of Master Yoda, which two personalities are best rated?
____, ____
7. For the female fans of Master Yoda, which two personalities are best rated?
____, ____
8. For the male fans of Darth Vader, which personality is best rated?
9. Eight Star Wars characters were counted while collecting likes
from the fans. The last frame shows how many times each personality is
rated with the highest score by the male fans. From the bar chart/table, please
tell which personality is rated highest the fewest times.
B.2 Evaluation: Single Frame vs Frame Panels: Test Cases and
Scripts
B.2.1 The Origin of Major Beer Types
Figure B.5: The geographic origin of beer around the world in sequenced panels
1. Which countries do most beer styles originate from?
2. List the styles of Scottish Ale.
3. List the styles of Japanese Lager.
4. Do America and Belgium share any beer styles?
5. Do America and Germany share any beer styles?
6. Do America and Britain share any beer styles?
7. Which country in Asia is mentioned for its styles of beer?
8. Does Australia have any beer styles? If yes, what are they?
B.2.2 The Arab-Israeli War
Figure B.6: The Arab-Israeli War in sequenced panels
1. How many important persons are involved in the Arab-Israeli conflict?
2. Which conflict happened in the 1980s?
3. In which country/countries did conflicts 6, 7, 8, and 9 happen?
4. In which country/countries did conflicts 1, 2, 3, 4, and 5 happen?
5. How many conflicts happened in the 1970s?
6. How many conflicts happened in the 1960s?
7. Who is the important figure involved in conflicts 6, 7, 8, and 9?
8. Who is the important figure involved in conflicts 10, 11, and 12?
B.2.3 Cell Phone Phishing
Figure B.7: Data about cell phone phishing in sequenced panels
1. What percentage of adults in the U.S. access the web via cell phone?
2. What percentage of adults in the U.S. check their email 1-3 times with a cell
phone?
3. Are mobile users more likely to submit their login information than PC users?
4. How many Facebook users have been attacked by phishing sites?
5. Is there any scam involving the IRS and tax refunds mentioned?
6. Is there any scam involving supporting kids in Africa mentioned?
7. Should we be cautious about the sender of an email in order to prevent
phishing attacks?
8. Should we install anti-phishing apps on cell phones to prevent phishing attacks?
B.2.4 Global Wealthy Population
Figure B.8: The data about global wealthy population in sequenced panels
1. Which area has the largest number of wealthy people among all the continents?
2. What is the number of millionaires worth more than 30M dollars?
3. What percentage of the global population are millionaires?
4. Which city in Europe has the most millionaires?
5. Among North America and Asia, which city has the most millionaires?
6. Which city in Asia has the most millionaires?
7. Which area has the fastest growth in the number of millionaires?
8. What kind of pattern can we observe in the relation between the number
and worth of millionaires?
B.3 Evaluation: Expert Review
The experts were interviewed with the following questions:
1. Does the system help storytelling?
2. If yes, how does it help storytelling?
3. Do you think the partitioning helps storytelling and understanding of the story?
4. Please explain how the system helps storytelling and helps the user understand
the story.
Appendix C: Data TV Evaluation Protocol
During the process we used the basic expert review method proposed by Tory
and Möller [204], in which experts evaluate a tool using pre-defined heuristics.
We had two datasets prepared for the experts.
C.1 Questions and Protocol
The experts were given, but not restricted to, a few directions for recommendations
as part of the training process:
1. What do you think of the interface overall?
2. What do you think of the ease of use of the interface?
3. What do you think of the functions of the system?
The experts were interviewed with the following questions:
1. Do you think this system will help storytelling?
2. If yes, how does it help?
3. Which functions do you think helped you the most?
4. Which part of the system impressed you the most?
5. Which part of the system did you like the least?
6. Do you have any suggestions for improving the system?
C.2 Data Sets
Each participant's task was to create data videos using DataTV and the data
visualization randomly assigned to them. Participants were required to use at least
one interactive visualization in their video.
The resulting data video would ideally make use of multiple media types,
interactive visualization, and innovative storytelling techniques.
C.2.1 Data Set One
For this data set, we use TimeFork [207], an interactive visual prediction
technique that supports users in exploring the future of time-series data. Here the
expert uses TimeFork to create a narrative for tech market stocks (here Apple and Netflix).
C.2.2 Data Set Two
The CommentIQ [208] system is designed to help community moderators manage
large numbers of comments associated with online articles by automatically ranking
them based on criteria such as relevance, readability, personal experience, and
length. Here the expert wants to author a streaming data video about the community
response to an article from The New York Times¹ titled "City Reacts: State of
Emergency" during the 2015 racial unrest in Ferguson, Missouri following the death
of Michael Brown at the hands of a white policeman.
¹ http://www.nytimes.com/
Bibliography
[1] Zhenpeng Zhao, Rachael Marr, Jason Shaffer, and Niklas Elmqvist. Under-
standing partitioning and sequence in data-driven storytelling. In Proceedings
of the iConference, 2019. To appear.
[2] Edward Segel and Jeffrey Heer. Narrative visualization: Telling stories
with data. IEEE Transactions on Visualization and Computer Graphics,
16(6):1139–1148, 2010.
[3] John Snow. 1854 Broad Street cholera outbreak. 1855.
[4] Charles Joseph Minard. Graphic Storytelling and Visual Narrative. 1812.
[5] Hurricane irma news, 2017. https://www.youtube.com/watch?v=
KDmFbAhlh3w.
[6] Is height all in our gene, 2018. https://www.youtube.com/watch?v=
0cuO5OSDMbw&t=328s.
[7] Ancient greece in 18 minutes, 2017. https://www.youtube.com/watch?v=
gFRxmi4uCGo.
[8] The history of asia: every year, 2017. https://www.youtube.com/watch?v=
c8TNvvjoqvw.
[9] Wealth inequality in america, 2012. https://www.youtube.com/watch?v=
QPKKQnijnsM.
[10] The joy of stats, 2010. https://www.youtube.com/watch?v=jbkSRLYSojo.
[11] Religions and babies, 2012. https://www.youtube.com/watch?v=
ezVk1ahRF78.
[12] Gene pool decline, 2018. https://www.youtube.com/watch?v=k2N4ZO57fjE.
[13] How to end poverty, 2015. https://www.youtube.com/watch?v=5JiYcV_
mg6A.
[14] China's geography problem, 2017. https://www.youtube.com/watch?v=
GiBF6v5UAAE.
[15] Imaginary numbers are real, 2015. https://www.youtube.com/watch?v=
T647CGsuOVU.
[16] Big data revolution, 2013. https://www.youtube.com/watch?v=
bIY3LUZ7i8Y.
[17] The truth about population, 2013. https://www.youtube.com/watch?v=
QpdyCJi3Ib4.
[18] Inside the mind of a master procrastinator, 2016. https://www.youtube.
com/watch?v=arj7oStGLkU&t=588s.
[19] How data will transform business, 2014. https://www.youtube.com/watch?
v=EHTmxmuhZ10.
[20] Will saving poor children lead to overpopula-
tion, 2012. https://www.gapminder.org/answers/
will-saving-poor-children-lead-to-overpopulation/.
[21] Phd comic your life ambition, 2017. http://phdcomics.com/comics/
archive.php?comicid=1012.
[22] Nfl player data report, 2017. https://public.tableau.com/profile/
mikevizneros#!/vizhome/IsThatRight/IsThatTrue.
[23] Comic style dashboard, 2012. https://atrowpoole.wordpress.com/
portfolio/data-visualization/.
[24] Infographic comic, 2010. https://xkcd.com/681/.
[25] The new york restaurant vis, 2017. https://www.menglugao.com/blog/
2017/12/7/new-york-city-restaurants-data-visualization.
[26] Marvel vs dc comics, 2018. http://
jobloving.com/infographics/data-visualization/
data-visualization-infographics-marvel-vs-dc-comics/.
[27] Body cartoon, 2018. https://www.cartoonstock.com/cartoonview.asp?
catref=jcen1296/.
[28] Spider man comic visualization, 2018. https://vignette.wikia.nocookie.
net/marveldatabase/images/b/bc/Iron_Spider_Armor_V1.1_from_
Official_Handbook_of_the_Marvel_Universe_Vol_5_Spider-Man_-_
Back_in_Black.jpg/revision/latest?cb=20141204030850/.
[29] Cell phone comic, 2018. https://vignette.wikia.nocookie.net/
marveldatabase/images/b/bc/Iron_Spider_Armor_V1.1_from_Official_
Handbook_of_the_Marvel_Universe_Vol_5_Spider-Man_-_Back_in_
Black.jpg/revision/latest?cb=20141204030850/.
[30] Linear regression comic, 2018. https://xkcd.com/1725/.
[31] Comic for vocation stress, 2016. https://twitter.com/pvermeul_peter/
status/756524771702149121.
[32] Comic for desk entropy, 2005. http://phdcomics.com/comics/archive.
php?comicid=575.
[33] Comic for phd grooming, 2010. https://
strangelyincoherentloveletters.files.wordpress.com/2010/12/
phd061209s.gif.
[34] Comic for phd procrastination, 2016. https://substance.etsmtl.ca/en/
power-procrastination-according-phd-comics.
[35] A day of an american life, 2019. https://flowingdata.com/2019/04/02/
data-comic-shows-an-average-american-day/.
[36] Curve fitting comic, 2018. https://xkcd.com/2048/.
[37] Seashell comic, 2018. https://xkcd.com/1236/.
[38] Strikeouts on the rise, 2017. http://www.nytimes.com/interactive/2013/
03/29/sports/baseball/Strikeouts-Are-Still-Soaring.html.
[39] 1.5 million missing black men, 2017. https://www.nytimes.com/
interactive/2015/04/20/upshot/missing-black-men.html.
[40] Bum Chul Kwon, Florian Stoffel, Dominik Jäckle, Bongshin Lee, and Daniel
Keim. Visjockey: Enriching data stories through orchestrated interactive vi-
sualization. In Proceedings of the Symposium on Computation+Journalism,
January 2014.
[41] Donghao Ren, Matthew Brehmer, Bongshin Lee, Tobias Höllerer, and Eun Ky-
oung Choe. ChartAccent: Annotation for data-driven storytelling. In Pro-
ceedings of the IEEE Pacific Symposium on Visualization. IEEE, April 2017.
[42] Zhenpeng Zhao, William Benjamin, Niklas Elmqvist, and Karthik Ramani.
Sketcholution. International Journal of Human-Computer Studies, 82(C):11–20,
October 2015.
[43] Datasketches: royal constellations, 2010. http://www.datasketch.es/
october/code/nadieh/.
[44] Datasketches: cardcaptor sakura, 2010. http://www.datasketch.es/june/
code/nadieh/.
[45] The big short movie explained animated, 2016. https://www.youtube.com/
watch?v=UFlHwkiAmyU.
[46] Population decline in new orleans, 2017. http://www.nytimes.com/
interactive/2011/02/03/us/0203-nat-census-orleans.html.
[47] The ny times top 10 bestsellers, 2017. http://blog.leeandlow.com/2013/
12/10/wheres-the-diversity-the-ny-times-top-10-bestsellers-list/.
[48] Non profit revenue report, 2017. goo.gl/HsvQDB.
[49] Attack of the little people, 2011. https://graphicviolence.wordpress.
com/2011/09/18/attack-of-the-little-people/.
[50] London marches, 2018. https://www.slow-journalism.com/
infographics/infographic-londons-largest-protest-marches.
[51] Yemen civil war, 2018. https://www.slow-journalism.com/
infographics/map-the-conflict-in-yemen-june-2018.
[52] Global space industry, 2018. https://www.slow-journalism.com/
infographics/infographic-starship-enterprises-charting-the-new-space-race.
[53] Air pollution for household, 2016. https://www.who.int/airpollution/
infographics/Air-pollution-INFOGRAPHICS-English-5-1200px.jpg?
ua=1.
[54] Air pollution linked death, 2016. https://www.who.int/airpollution/
infographics/Air-pollution-INFOGRAPHICS-English-2-1200px.jpg?
ua=1.
[55] North and south korean comparison, 2012.
https://www.slow-journalism.com/infographics/
the-great-divide-north-and-south-korea-compared.
[56] In the shadow of foreclosure, 2017. http://infographicsnews.blogspot.
com/2009/02/.
[57] Words democrats and republican used, 2011. https://archive.nytimes.
com/www.nytimes.com/interactive/2008/09/04/us/politics/20080905_
WORDS_GRAPHIC.html?_r=0.
[58] Uk and us firearms, 2014. https://www.statista.com/chart/2628/
police-firearms-discharges/.
[59] White correspondent dinner, 2015. https://www.6sqft.com/
what-nycs-population-looks-like-day-vs-night/.
[60] Day and night: Nyc population, 2015. https://www.6sqft.com/
what-nycs-population-looks-like-day-vs-night/.
[61] Who owns everything: Big data today, 2015. https://www.6sqft.com/
what-nycs-population-looks-like-day-vs-night/.
[62] Big welsh coast walk, 2015. https://graphs.net/easel-ly-infographics.
html.
[63] Hangry usa, 2015. https://kfbk.iheart.com/content/
2018-01-10-california-is-hangry-are-you/.
[64] Nyc celebrity map, 2015. https://www.addressreport.com/blog/
nyc-celebrity-map-star-map/.
[65] Bungie, inc.,halo: Reach, 2010. Microsoft Game Studios.
[66] Treyarch, call of duty: Black ops, 2010. Activision.
[67] Starcraft ii: Wings of liberty, 2010. Blizzard.
[68] Twittersheep, 2017. http://www.twittersheep.com/.
[69] Game of thrones discussion of twitter, 2017. http://www.twittersheep.
com/.
[70] How tweets spread, 2017. https://interactive.twitter.com/tenyears/
#?lang=EN.
[71] Uber mobile visualization, 2016. https://hackernoon.com/
can-augmented-reality-solve-mobile-visualization-f06c008f8f84.
[72] Ar data visualization design, 2016. http://www.jolamux.com/v3.5/works/
ar/dataViz.html.
[73] Ar 3d design, 2010. https://www.researchgate.net/figure/
A-conceptual-image-of-AR-overlay-of-3D-design-and-contextual-data-Dunston-Shin-2009_
fig4_221906912.
[74] Ar flight data, 2016. https://hololens.reality.news/news/
holoflight-turns-flight-data-into-cool-mixed-reality-visualizations-0173138/.
[75] Ar street visualization, 2017. https://hackernoon.com/
silent-augmented-reality-f0f7614cab32.
[76] Ar pipeline, 2017. http://thearea.org/
augmented-reality-and-the-internet-of-things-boost-human-performance/.
[77] Ar underground infrastructure, 2018. https://
communities.bentley.com/other/old_site_member_
blogs/bentley_employees/b/stephanecotes_blog/posts/
augmentation-of-subsurface-utilities-the-problem-of-spatial-perception.
[78] Ar bio-chemical vis, 2018. https://ideastations.org/science-matters/
science-news/augmented-reality-revolutionizes-surgery-and-data-visualization-vcu.
[79] Adobe vr 3d design, 2018. https://edgylabs.com/
project-new-view-leverages-vr-ai-tools-for-3d-immersive-data-visualization.
[80] Vr baseball training, 2018. https://www.baseballamerica.com/stories/
better-data-equals-better-training-with-trinityvr/.
[81] Vr big data visualization, 2016. https://www.youtube.com/watch?v=
wacNaAVGXdU.
[82] Vr lens for big data, 2016. https://www.datanami.com/2015/03/09/
a-virtual-reality-lens-for-big-data-visualization/.
[83] C. Donalek, S. G. Djorgovski, A. Cioc, A. Wang, J. Zhang, E. Lawler, S. Yeh,
A. Mahabal, M. Graham, A. Drake, S. Davidoff, J. S. Norris, and G. Longo.
Immersive and collaborative data visualization using virtual reality platforms.
In Proceedings of the IEEE International Conference on Big Data, pages 609–614,
Oct 2014.
[84] Vr bio-informatics, 2017. https://vrsconference.
com/2017/10/not-playing-games-enterprise-vr/
bio-informatics-data-visualization-at-greenlight-insights-vrs-2017/.
[85] Vr geo map visualization, 2015. https://ocean.sagepub.com/blog/2018/
6/20/experimenting-with-data-visualization-in-vr/.
[86] Vr data analysis, 2018. https://www.springwise.com/
vr-enables-immersive-3d-data-analysis/.
[87] Stuart K. Card, Jock D. Mackinlay, and Ben Shneiderman, editors. Readings
in Information Visualization: Using Vision to Think. Morgan Kaufmann, San
Francisco, CA, 1999.
[88] Zachary Pousman, John T. Stasko, and Michael Mateas. Casual informa-
tion visualization: Depictions of data in everyday life. IEEE Transactions on
Visualization and Computer Graphics, 13(6):1145–1152, 2007.
[89] Jean-Daniel Fekete and Catherine Plaisant. Interactive information visualiza-
tion of a million items. In Proceedings of the IEEE Symposium on Information
Visualization, pages 117–124, 2002.
[90] Chris Stolte, Diane Tang, and Pat Hanrahan. Polaris: A system for query,
analysis, and visualization of multidimensional relational databases. IEEE
Transactions on Visualization and Computer Graphics, 8(1):52–65, 2002.
[91] Christopher Ahlberg. Spotfire: An information exploration environment. SIG-
MOD Record, 25(4):25–29, 1996.
[92] Qlik. https://www.qlik.com/.
[93] Fernanda Viégas and Martin Wattenberg. Communication-minded visualiza-
tion: A call to action. IBM Systems Journal, 45(4):801–812, 2006.
[94] Okyay Kaynak and Shen Yin. Big data for modern industry: Challenges and
trends [point of view]. Proceedings of the IEEE, 103:143–146, 02 2015.
[95] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs,
Charles Roxburgh, and Angela Hung Byers. Big data: The next frontier for
innovation, competition, and productivity. 05 2011.
[96] C.L. Philip Chen and Chun-Yang Zhang. Data-intensive applications, chal-
lenges, techniques and technologies: A survey on big data. Information Sci-
ences, 275:314–347, 2014.
[97] Ahmed Oussous, Fatima-Zahra Benjelloun, Ayoub Ait Lahcen, and Samir
Belfkih. Big data technologies: A survey. Journal of King Saud University -
Computer and Information Sciences, 30(4):431–448, 2018.
[98] Sachchidanand Singh and Nirmala Singh. Big data analytics. 2012 Interna-
tional Conference on Communication, Information & Computing Technology
(ICCICT), pages 1–4, 2012.
[99] J. Ahrens, B. Hendrickson, G. Long, S. Miller, R. Ross, and D. Williams. Data-
intensive science in the US DOE: Case studies and future challenges. Computing
in Science & Engineering, 13(6):14–24, Nov 2011.
[100] Katie Shilton. Values and ethics in human-computer interaction. Foundations
and Trends in Human-Computer Interaction, 12(2):107–171, 2018.
[101] Andrej Zwitter. Big data ethics. Big Data & Society, 1(2):2053951714559253,
2014.
[102] Stephen Kaisler, Frank Armour, J. Alberto Espinosa, and William Money.
Big data: Issues and challenges moving forward. In Proceedings of the Hawaii
International Conference on System Sciences, pages 995–1004, Washington,
DC, USA, 2013.
[103] S. Sagiroglu and D. Sinanc. Big data: A review. In 2013 International Con-
ference on Collaboration Technologies and Systems (CTS), pages 42–47, May
2013.
[104] B. Gerhardt, K. Griffin, and R. Klemann. Unlocking value in the fragmented
world of big data analytics. Cisco Internet Business Solutions Group, 2012.
[105] Vasant Dhar. Data science and prediction. Communications of the ACM,
56(12):64–73, 2013.
[106] Hours uploaded, 2014. https://tubularinsights.com/
300-hours-video-youtube-advertisers/.
[107] Hours watched, 2014. https://techcrunch.com/2017/02/28/
people-now-watch-1-billion-hours-of-youtube-per-day/.
[108] Cellphone more powerful than old NASA computers,
2014. https://www.zmescience.com/research/technology/
smartphone-power-compared-to-apollo-432/.
[109] Michael J. Pazzani and Daniel Billsus. Content-Based Recommendation Sys-
tems, pages 325–341. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
[110] James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor
Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston,
and Dasarathi Sampath. The YouTube video recommendation system. In Pro-
ceedings of the ACM Conference on Recommender Systems, pages 293–296,
New York, NY, USA, 2010. ACM.
[111] Andreas Wittig and Michael Wittig. Amazon Web Services in Action. Man-
ning Publications Co., Greenwich, CT, USA, 1st edition, 2015.
[112] Sherif Talaat. Pro PowerShell for Microsoft Azure. Apress, Berkeley, CA, USA,
1st edition, 2015.
[113] Asit K. Mishra, Joseph L. Hellerstein, Walfredo Cirne, and Chita R. Das. To-
wards characterizing cloud backend workloads: Insights from Google compute
clusters. SIGMETRICS Performance Evaluation Review, 37(4):34–41, March
2010.
[114] Jill Freyne and Barry Smyth. Visualization for the masses: Learning from the
experts. In Case-Based Reasoning. Research and Development, pages 111–125,
Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
[115] Digital divide. https://www.internetworldstats.com/links10.htm.
[116] Digital divide definition. https://cs.stanford.edu/people/eroberts/
cs181/projects/digital-divide/start.html.
[117] Phone vs computer, 2014. https://www.phonearena.com/news/
A-modern-smartphone-or-a-vintage-supercomputer-which-is-more-powerful_
id57149.
[118] Murray Campbell, A. Joseph Hoane, and Feng-hsiung Hsu. Deep Blue. Arti-
ficial Intelligence, 134(1):57–83, 2002.
[119] Robert Kosara. Story points in Tableau Software. Keynote at Tableau Cus-
tomer Conference, September 2013.
[120] Spotfire. https://www.tibco.com/products/tibco-spotfire.
[121] Peter Fox and James Hendler. Changing the equation on scientific data visu-
alization. Science, 331(6018):705–708, 2011.
[122] N Erdemir. The effect of PowerPoint and traditional lectures on students'
achievement in physics. Journal of Turkish Science Education, 8:176–189, 09
2011.
[123] Dan Murray. Tableau Your Data!: Fast and Easy Visual Analysis with Tableau
Software. Wiley Publishing, 1st edition, 2013.
[124] Stephen Few. Now You See It: Simple Visualization Techniques for Quanti-
tative Analysis. Analytics Press, USA, 1st edition, 2009.
[125] Pu Shen. The p/e ratio and stock market performance. Economic Review,
pages 23–36, 01 2000.
[126] M Silver, T Sakata, H C Su, C Herman, Steven Dolins, and M J O'Shea. Case
study: How to apply data mining techniques in a healthcare data warehouse.
Journal of Healthcare Information Management, 15:155–64, 02 2001.
[127] Miriam Lux. Visualization of financial information. In Proceedings of the
Workshop on New Paradigms in Information Visualization and Manipulation,
pages 58–61, New York, NY, USA, 1997. ACM.
[128] Nathalie Henry Riche, Christophe Hurter, Nicholas Diakopoulos, and Sheelagh
Carpendale. Data-Driven Storytelling. A. K. Peters, Ltd., Natick, MA, USA,
1st edition, 2018.
[129] Colin Ware. Information Visualization: Perception for Design. Morgan Kauf-
mann Publishers Inc., San Francisco, CA, USA, 2004.
[130] Steven F. Roth. Capstone address: Visualization as a medium for capturing
and sharing thoughts. In Proceedings of the IEEE Symposium on Information
Visualization, 2004.
[131] Peter R. Keller and Mary M. Keller. Visual Cues: Practical Data Visualiza-
tion. IEEE Computer Society Press, Los Alamitos, CA, USA, 1994.
[132] Chun-houh Chen, Wolfgang Härdle, and Antony Unwin. Handbook of Data
Visualization (Springer Handbooks of Computational Statistics). Springer-Verlag
TELOS, Santa Clara, CA, USA, 1st edition, 2008.
[133] Michael Friendly. A brief history of data visualization. Handbook of Compu-
tational Statistics: Data Visualization, III, 2007.
[134] Thomas A. Defanti and Maxine D. Brown. Visualization in scientific comput-
ing. Volume 33 of Advances in Computers, pages 247–307. Elsevier, 1991.
[135] Dan Murray. Tableau Your Data!: Fast and Easy Visual Analysis with Tableau
Software. Wiley Publishing, 1st edition, 2013.
[136] M. Adil Yalcin, Niklas Elmqvist, and Benjamin B. Bederson. Keshif: Rapid
and expressive tabular data exploration for novices. IEEE Transactions on
Visualization and Computer Graphics, 2017.
[137] Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. D3: Data-driven
documents. IEEE Transactions on Visualization and Computer Graphics,
17(12):2301–2309, 2011.
[138] Google charts api, 2007. https://developers.google.com/chart/.
[139] Arvind Satyanarayan, Kanit Wongsuphasawat, and Jeffrey Heer. Declarative
interaction design for data visualization. In Proceedings of the ACM Sympo-
sium on User Interface Software and Technology, pages 669–678, 2014.
[140] Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey
Heer. Vega-lite: A grammar of interactive graphics. IEEE Transactions on
Visualization and Computer Graphics, 23(1):341–350, 2017.
[141] Andy Kirk. Data Visualization: A Successful Design Process. Community
experience distilled. Packt Pub., 2012.
[142] Martin Wattenberg. Baby names, visualization, and social data analysis. In
Proceedings of the IEEE Symposium on Information Visualization, pages 1–7,
2005.
[143] Many eyes, 2007. http://www.boostlabs.com/
ibms-many-eyes-online-data-visualization-tool/.
[144] Infogram, 2017. https://infogram.com/.
[145] Jonathan Gottschall. The Storytelling Animal: How Stories Make Us Human.
Mariner Books, 2012.
[146] Roger C. Schank and Robert P. Abelson. Knowledge and memory: The real
story. In Jr. Robert S. Wyer, editor, Knowledge and Memory: The Real Story,
pages 1–85, Hillsdale, NJ, 1995. Lawrence Erlbaum Associates.
[147] Jan Vansina. Oral Tradition as History. University of Wisconsin Press, Madi-
son, WI, 1985.
[148] Thomas M. Leitch. What Stories Are: Narrative Theory and Interpretation.
Pennsylvania State University Press, University Park, PA, 1986.
[149] Will Eisner. Graphic Storytelling and Visual Narrative. W. W. Norton &
Company, 2008.
[150] David Sless. Learning and Visual Communication. Wiley, 1981.
[151] Benjamin Bach, Natalie Kerracher, Kyle Wm. Hall, Sheelagh Carpendale,
Jessie Kennedy, and Nathalie Henry Riche. Telling stories about dynamic
networks with Graph Comics. In Proceedings of the ACM Conference on
Human Factors in Computing Systems, pages 3670–3682, 2016.
[152] Zhenpeng Zhao, Rachael Marr, and Niklas Elmqvist. Data comics: Sequential
art for data-driven storytelling. Technical Report HCIL-15-15, University of
Maryland, College Park, October 2015.
[153] Fereshteh Amini, Nathalie Henry Riche, Bongshin Lee, Christophe Hurter, and
Pourang Irani. Understanding data videos: Looking at narrative visualization
through the cinematography lens. In Proceedings of the ACM Conference on
Human Factors in Computing Systems, pages 1459–1468, 2015.
[154] Fereshteh Amini, Nathalie Henry Riche, Bongshin Lee, Andres Monroy-
Hernández, and Pourang Irani. Authoring data-driven videos with DataClips.
IEEE Transactions on Visualization and Computer Graphics, 23(1):501–510,
2017.
[155] Robert L. Harris. Information Graphics: A Comprehensive Illustrated Refer-
ence. Oxford University Press, Oxford, United Kingdom, 1999.
[156] Scott McCloud. Comics: A medium in transition. Computer Graphics Forum,
30(3):xiii, 2011.
[157] Caroline Ziemkiewicz and Robert Kosara. Embedding Information Visualiza-
tion within Visual Representation. Springer Berlin Heidelberg, Berlin, Heidel-
berg, 2009.
[158] Daniel Keim, Gennady Andrienko, Jean-Daniel Fekete, et al. Visual An-
alytics: Definition, Process, and Challenges, pages 154–175. Springer Berlin
Heidelberg, Berlin, Heidelberg, 2008.
[159] James J. Thomas and Kristin A. Cook, editors. Illuminating the Path: The
Research and Development Agenda for Visual Analytics. IEEE Computer
Society, 2005.
[160] B. Shneiderman. The eyes have it: a task by data type taxonomy for in-
formation visualizations. In Proceedings of the IEEE Symposium on Visual
Languages, pages 336–343, Sept 1996.
[161] Melanie Tory and Torsten Möller. Rethinking visualization: A high-level tax-
onomy. In Proceedings of the IEEE Symposium on Information Visualization,
pages 151–158, 2004.
[162] Stuart K. Card and Jock Mackinlay. The structure of the information visual-
ization design space. In Proceedings of the IEEE Conference on Visualization,
pages 92–99, Oct 1997.
[163] Florence Nightingale. The causes of mortality in the army in the East.
[164] Jeremy Boy, Ronald A. Rensink, Enrico Bertini, and Jean-Daniel Fekete. A
principled way of assessing visualization literacy. IEEE Transactions on Vi-
sualization and Computer Graphics, 20(12):1963–1972, 2014.
[165] Fernanda B. Viégas, Scott Golder, and Judith Donath. Visualizing email
content: portraying relationships from conversational histories. In Proceedings
of the ACM Conference on Human Factors in Computing Systems, pages 979–988,
2006.
[166] Doantam Phan, Andreas Paepcke, and Terry Winograd. Progressive multiples
for communication-minded visualization. In Proceedings of Graphics Interface,
pages 225–232, 2007.
[167] Nahum D. Gershon and Ward Page. What storytelling can do for information
visualization. Communications of the ACM, 44(8):31–37, 2001.
[168] Nick Diakopoulos, Joan DiMicco, Jessica Hullman, Karrie Karahalios, and
Adam Perer. Telling stories with data: The next chapter: a VisWeek 2011
workshop, 2011.
[169] Joan DiMicco, Matt McKeon, and Karrie Karahalios. Telling stories with
data: a VisWeek 2010 workshop, 2010.
[170] Jessica Hullman and Nicholas Diakopoulos. Visualization rhetoric: Framing
effects in narrative visualization. IEEE Transactions on Visualization and
Computer Graphics, 17(12):2231–2240, 2011.
[171] Bongshin Lee, Rubaiat Habib Kazi, and Greg Smith. SketchStory: Telling
more engaging stories with data through freeform sketching. IEEE Transac-
tions on Visualization and Computer Graphics, 19(12):2416–2425, 2013.
[172] Waqas Javed and Niklas Elmqvist. ExPlates: Spatializing interactive analysis
to scaffold visual exploration. Computer Graphics Forum, 32(3pt4):441–450,
2013.
[173] Jessica Hullman, Steven M. Drucker, Nathalie Henry Riche, Bongshin Lee,
Danyel Fisher, and Eytan Adar. A deeper understanding of sequence in
narrative visualization. IEEE Transactions on Visualization and Computer
Graphics, 19(12):2406–2415, 2013.
[174] Jing Jin and Pedro A. Szekely. Interactive querying of temporal data using
a comic strip metaphor. In Proceedings of the IEEE Symposium on Visual
Analytics Science and Technology, pages 163–170, 2010.
[175] Scott McCloud. Understanding Comics: The Invisible Art. William Morrow
Paperbacks, 1994.
[176] Neil Cohn. The Visual Language of Comics: Introduction to the structure and
cognition of sequential images. Bloomsbury, London, 2014.
[177] Neil Cohn. Beyond speech balloons and thought bubbles: The integration of
text and image. Semiotica, 197:35–63, 2013.
[178] Ariel Dorfman and Armand Mattelart. How to Read Donald Duck: Imperialist
Ideology in the Disney Comic. Intl General, 1984.
[179] Hans-Christian Christiansen. Comics and film: a narrative perspective. In
Anne Magnussen and Hans-Christian Christiansen, editors, Comics & culture:
Analytical and theoretical approaches to comics, pages 107–122. Museum
Tusculanum Press, University of Copenhagen, 2000.
[180] Jing Jin and Pedro A. Szekely. QueryMarvel: A visual query language for
temporal patterns using comic strips. In Proceedings of the IEEE Conference
on Visual Languages and Human-Centered Computing, pages 207–214, 2009.
[181] Nam Wook Kim, Nathalie Henry Riche, Benjamin Bach, Guanpeng Xu,
Matthew Brehmer, Ken Hinckley, Michel Pahud, Haijun Xia, Michael J.
McGuffin, and Hanspeter Pfister. Datatoon: Drawing dynamic network comics
with pen + touch interaction. In Proceedings of the 2019 CHI Conference on
Human Factors in Computing Systems, CHI '19, pages 105:1–105:12, New
York, NY, USA, 2019. ACM.
[182] Antoni B. Moore, Mariusz Nowostawski, Christopher Frantz, and Christina
Hulbe. Comic strip narratives in time geography. ISPRS International Journal
of Geo-Information, 7(7), 2018.
[183] C. Bryan, K. Ma, and J. Woodring. Temporal summary images: An approach
to narrative visualization via interactive annotation generation and placement.
IEEE Transactions on Visualization and Computer Graphics, 23(1):511–520,
Jan 2017.
[184] Zezhong Wang, Shunming Wang, Matteo Farinella, Dave Murray-Rust,
Nathalie Henry Riche, and Benjamin Bach. Comparing effectiveness and
engagement of data comics and infographics. In Proceedings of the 2019 CHI
Conference on Human Factors in Computing Systems, CHI '19, pages
253:1–253:12, New York, NY, USA, 2019. ACM.
[185] Zezhong Wang, Harvey Dingwall, and Benjamin Bach. Teaching data
visualization and storytelling with data comic workshops. In Extended Abstracts of
the 2019 CHI Conference on Human Factors in Computing Systems, CHI EA
'19, pages CS26:1–CS26:9, New York, NY, USA, 2019. ACM.
[186] Benjamin Bach, Zezhong Wang, Matteo Farinella, Dave Murray-Rust, and
Nathalie Henry Riche. Design patterns for data comics. In Proceedings of
the 2018 CHI Conference on Human Factors in Computing Systems, CHI '18,
pages 38:1–38:12, New York, NY, USA, 2018. ACM.
[187] Sandvine. Global internet phenomena report. Technical report, Sandvine
Incorporated, December 2015.
[188] David M. Ewalt. The ESPN of video games. Forbes, (December 2), 2013.
[189] Stop marine plastic pollution, 2014.
https://www.youtube.com/watch?v=02WjKxk1veQ.
[190] Sorting algorithms shown by dance, 2017.
http://www.pdviz.com/different-sorting-algorithm-demonstrated-with.
[191] Celine Latulipe, David Wilson, Sybil Huskey, Berto Gonzalez, and Melissa
Word. Temporal integration of interactive technology in dance: Creative
process impacts. In Proceedings of the ACM Conference on Creativity &
Cognition, pages 107–116, 2011.
[192] Ronald M. Baecker. Readings in Groupware and Computer-Supported
Cooperative Work. Morgan Kaufmann Publishers, San Francisco, 1993.
[193] A beautiful planet, 2016. http://www.imdb.com/title/tt2800050/.
[194] The two Americas, 2017.
https://www.nytimes.com/interactive/2016/11/16/us/politics/the-two-americas-of-2016.html?smid=pl-share&_r=0.
[195] Storify, 2017. https://storify.com/.
[196] Power BI, 2014. https://powerbi.microsoft.com/en-us/blog/tag/pdf/.
[197] YouTuber PewDiePie.
https://www.youtube.com/channel/UC-lHJZR3Gqxm24_Vd_AJ5Yw.
[198] Robert Kosara and Jock D. Mackinlay. Storytelling: The next step for
visualization. IEEE Computer, 46(5):44–50, 2013.
[199] Larry Gonick and Art Huffman. The Cartoon Guide to Physics.
HarperPerennial, New York, 1990.
[200] Larry Gonick and Woollcott Smith. The Cartoon Guide to Statistics.
HarperCollins, New York, 1993.
[201] Ben Shneiderman. Tree visualization with tree-maps: A 2-D space-filling
approach. ACM Transactions on Graphics, 11(1):92–99, January 1992.
[202] Alfred Inselberg. The plane with parallel coordinates. The Visual Computer,
1(2):69–91, 1985.
[203] John P. Collomosse, D. Rowntree, and P. M. Hall. Rendering cartoon-style
motion cues in post-production video. Graphical Models, 67(6):549–564, 2005.
[204] Melanie Tory and Torsten Möller. Evaluating visualizations: Do expert
reviews work? IEEE Computer Graphics and Applications, 25(5):8–11, 2005.
[205] Scott Bateman, Regan L. Mandryk, Carl Gutwin, Aaron Genest, David
McDine, and Christopher A. Brooks. Useful junk?: the effects of visual
embellishment on comprehension and memorability of charts. In Proceedings of the
ACM Conference on Human Factors in Computing Systems, pages 2573–2582.
ACM, 2010.
[206] M. A. Borkin, A. A. Vo, Z. Bylinskii, P. Isola, S. Sunkavalli, A. Oliva, and
H. Pfister. What makes a visualization memorable? IEEE Transactions on
Visualization and Computer Graphics, 19(12):2306–2315, Dec 2013.
[207] Sriram Karthik Badam, J. Zhao, N. Elmqvist, and D. S. Ebert. TimeFork:
Interactive prediction of time series. In Proceedings of the ACM Conference
on Human Factors in Computing Systems, pages 5409–5420, 2016.
[208] Deok Gun Park, Simranjit Singh, Nicholas Diakopoulos, and Niklas Elmqvist.
Supporting comment moderators in identifying high quality online news
comments. In Proceedings of the ACM Conference on Human Factors in
Computing Systems, pages 1114–1125, 2016.