ABSTRACT
Title of Dissertation: AUTOMATING THE DISCOVERY
OF CENSORSHIP EVASION STRATEGIES
Kevin Bock
Doctor of Philosophy, 2022
Dissertation Directed by: Professor Dave Levin
Department of Computer Science
Censoring nation-states deploy complex network infrastructure to regulate
what content citizens can access, and such restrictions to open sharing of infor-
mation threaten the freedoms of billions of users worldwide, especially marginalized
groups. Researchers and censoring regimes have long engaged in a cat-and-mouse
game, leading to increasingly sophisticated Internet-scale censorship techniques and
methods to evade them. In this dissertation, I study the technology that under-
pins this Internet censorship: middleboxes (e.g., firewalls). I argue the following
thesis: It is possible to automatically discover packet sequence modifications that
render deployed censorship middleboxes ineffective across multiple application-layer
protocols.
To evaluate this thesis, I develop Geneva, a novel genetic algorithm that auto-
matically discovers packet-manipulation-based censorship evasion strategies against
nation-state level censors. Training directly against a live adversary, Geneva com-
poses, mutates, and evolves sophisticated strategies out of four basic packet manip-
ulation primitives (drop, tamper, duplicate, and fragment).
I show that Geneva can be effective across different application layer proto-
cols (HTTP, HTTPS+SNI, HTTPS+ESNI, DNS, SMTP, FTP), censoring regimes
(China, Iran, India, and Kazakhstan), and deployment contexts (client-side, server-
side), even in cases where multiple middleboxes work in parallel to perform censor-
ship. In total, I present 112 client-side strategies (85 of which work by modifying
application layer data), and the first ever server-side strategies (11 in total). Finally,
I use Geneva to discover two novel attacks that show that censoring middleboxes can
be weaponized to launch attacks against innocent hosts anywhere on the Internet.
Collectively, my work shows that censorship evasion can be automated and
that censorship infrastructures pose a greater threat to Internet availability than
previously understood.
AUTOMATING THE DISCOVERY
OF CENSORSHIP EVASION STRATEGIES
by
Kevin Bock
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2022
Advisory Committee:
Professor Dave Levin, Chair/Advisor (University of Maryland)
Professor Bobby Bhattacharjee (University of Maryland)
Professor Eric Wustrow (University of Colorado, Boulder)
Professor Michel Cukier (University of Maryland)
Professor John Dickerson (University of Maryland)
? Copyright by
Kevin Bock
2022
Acknowledgments
I first want to thank my advisor, Dave Levin. Dave?s endless support extended
beyond just research, and I am a better student, writer, presenter, researcher, run-
ner, mentor?and most important of all, a better person?from having worked with
him over these years. It is Dave?s enthusiasm for research and making a difference
that got me into graduate school, and his mentorship that got me through. It is
well-known within the lab that Dave?s top priority is his students and bringing out
the best of those students: Dave supported me through many late nights, early
mornings, and everything in between, and I will be forever grateful of his generosity
with his time and energy spanning years.
I want to thank the team of students that worked with me over the years in
the Breakerspace Lab. George Hughey was the first student to join the project, and
I will always appreciate his initial leap of faith and support that got this project
off the ground. Louis-Henri Merino worked with me across multiple projects, mul-
tiple degrees, and multiple academic institutions, and his unwavering support and
dedication to the project was a constant source of energy for me. In roughly chrono-
logical order, thank you to all the students that contributed to the many various
projects under the Geneva umbrella: George Hughey, Louis-Henri Merino, Tania
Arya, Daniel Liscinsky, Regina Pogosian, Gabriel Naval, Kyle Reese, Yair Fax,
Pranav Bharadwaj, Jasraj Singh, Nathan Stiff, Sadena Rishindran, Quinton David-
son, Alden Schmidt, Michael Harrity, Kyle Hurley, Freddy Sell, Brendan Mcmahan,
Amanda Li, Josephine Chow, Katie Sullivan, Melissa Hoff, Sadia Nourin, Aaron
ii
Ortwein, and the other students who chose not to be named here. I have grown
tremendously from having worked with you all, and each of you had a meaningful
impact on the project.
I also want to thank my many collaborators and those who have helped me. At
Colorado Boulder, Eric Wustrow and Abdul Alaraj were close collaborators multiple
projects (including some of the work that comprises this thesis), and Eric served on
my proposal and defense committees. Thank you both for all of your time, energy,
and insights: I am a better researcher and person from having worked with you both.
At Berkeley, I thank Xiao Qiang for lending early support to the Geneva project:
without your help, Geneva may never have left the lab. I thank Neil Spring for
serving on my proposal committee, and thank John Dickerson for serving on both
my proposal and dissertation committee; both of you provided valuable feedback and
support during the process that helped my work grow. I thank Bobby Bhattacharjee
for serving on my proposal and dissertation committees and for giving me the tough
feedback that I needed: your tough love helped me grow as a researcher.
I thank the Open Technology Fund, whose early support and enthusiasm
helped get this work off the ground. I also thank the OONI community for their
support and for the community they have built. There are many other activists
and researchers that contributed their time, networks, and expertise to the project
whom I cannot thank by name here: thank you.
I next want to thank Michel Cukier and the ACES staff. Michel Cukier is the
director of the Advanced Cybersecurity Exeprience for Students (ACES) program
on campus, and welcomed me onto his research team early in my undergraduate
iii
career. Bertrand Sobesto led the research project I was working on, and took me
under his wing for multiple semesters of research. I credit Michel and Bertrand for
first igniting my interest in research, and giving me the space to explore and grow
my research skills as an undergraduate. The entire ACES staff (Michel Cukier, Jan
Plane, Liz Rogers, Bertrand Sobesto, and the many other assistants during my time
in the program) helped to curate and build a solid foundation for me to launch my
academic career: thank you.
I must also thank the CS department for their support throughout my doc-
torate. First, I thank Tom Hurst in the graduate advising office for his endless
patience, kind support, and legendary email response time. Tom was available and
supportive for hours of questions even before I became a graduate student, and it
is his patience and investment in me that helped make me comfortable to first take
the plunge into graduate school. Throughout my degree, Tom handled more ques-
tions, policy edge cases, form submissions, and other academic concerns than I can
possibly count, and did it all with a smile. Thank you, Tom. I thank Sharron McEl-
roy (and the entire Purchasing team) for the tremendous behind the scenes work
keeping our infrastructure up, available, and running smoothly: without you, the
project would not have been possible. I also thank the broader team of personnel
within the department that helped me throughout the process.
I also have a significant support network outside of school that helped me
along the way. Ashton Webster, Daven Patel, and Ryan Eckenrod read early drafts
and gave feedback for every major research paper of my academic career. Baldwin
Mei, Chris Fu, Nick Cataldo, Brian Gross, Caroline Juang, Alex Comerford and
iv
more helped to review and give feedback on multiple papers that comprised this
dissertation. Brian Bock helped review papers, articles, patiently listened to dozens
of technical discussions, and even contributed graphics to the project website. Thank
you all for your help, and for keeping me grounded and sane over the years!
I want to extend my sincere thanks to my family. My parents have always
been incredible role models for me, and my siblings and entire huge family has been
the most supportive squad I could have asked for. My grandparents, Nana and
Papa, were also important role models and an amazing component of my support
network. Thank you all for your endless support and enthusiasm!
Lastly, I want to thank my wife and life-long supporter Sydnee, who has been
endlessly supportive of my graduate pursuits, despite many long days and nights.
You have been my fiercest defender, strongest supporter, a patient sounding board
through more technical discussions than I can count. I love you all with all my
heart.
Grants This dissertation was supported in part by the Open Technology Fund
and NSF grants CNS-1816802 and CNS-1943240.
Collaborations This dissertation involved collaborative efforts with the following
people:
? Chapter 3: My co-authors are George Hughey, Xiao Qiang, and Dave Levin, and
this work appeared in ACM CCS in 2019 [1]. I would also like to thank Ra-
makrishna Padmanabhan, Neil Spring, the Breakerspace lab, and the anonymous
reviewers for their helpful feedback.
v
? Chapter 4: My co-authors are George Hughey, Louis-Henri Merino, Tania Arya,
Daniel Liscinsky, Regina Pogosian, Dave Levin, and this work appeared in ACM
SIGCOMM in 2020 [2]. I would also like to thank my collaborators from the OTF
and OONI communities, who have contributed insights and resources that made
this work possible, and the anonymous reviewers for their helpful feedback.
? Chapter 5: My co-authors are Michael Harrity, Freddy Sell, and Dave Levin, and
this work appeared in USENIX Security 2022. I would also like to thank our
shepherd Paul Pierce, the anonymous reviewers, David Fifield, my collaborators
from the OTF and OONI communities, as well as the University of Maryland
UMIACS IT Staff, who contributed insights and resources that made this work
possible.
? Chapter 6: My co-authors are Yair Fax, Kyle Reese, Jasraj Singh, and Dave
Levin, and this work appeared in USENIX FOCI in 2020 [3]. I would also like to
thank our shepherd David Fifield and the anonymous reviewers for their helpful
feedback. I also thank the OTF and OONI communities who have contributed
insights and resources that made this work possible.
? Chapter 7: My co-authors are Gabriel Naval, Kyle Reese, and Dave Levin, and
this work appeared in SIGCOMM FOCI in 2021 [4]. I would also like to thank the
anonymous reviewers and our shepherd, Rob Jansen, for their helpful feedback.
? Chapter 8: My co-authors are Abdulrahman Alaraj, Yair Fax, Kyle Hurley, Eric
Wustrow, and Dave Levin, and this work appeared in USENIX Security in 2021 [5].
I would also like to thank network infrastructure team at the University of Col-
vi
orado Boulder for supporting our scanning efforts and providing the resources
that made this work possible. I also thank the anonymous reviewers for their
helpful feedback. Finally, I thank our collaborators from the OTF and OONI
communities for contributing resources that enabled this work.
? Chapter 9: My co-authors are Pranav Bharadwaj, Jasraj Singh, Dave Levin, and
this work appeared in USENIX WOOT in 2021 [6]. I would also like to thank our
shepherd Kevin Borgolte and the anonymous reviewers for their helpful feedback;
Will Scott for his support with SP3; and our collaborators from the OTF and
OONI communities for contributing insights and resources that made this work
possible. Also, I thank the anonymous Artifact Evaluators for their diligent efforts.
To acknowledge the many collaborators and supporters that contributed to
this work, I will use the word ?we? within many chapters.
vii
Table of Contents
Acknowledgements ii
Table of Contents viii
List of Tables xii
List of Figures xv
1 Introduction 1
1.1 Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Ethical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Background and Threat Model 10
2.1 Nation-state Censors: Threat Model . . . . . . . . . . . . . . . . . . 10
2.2 Related Work: Measuring Censors . . . . . . . . . . . . . . . . . . . . 13
2.3 Evasion via Packet Manipulation . . . . . . . . . . . . . . . . . . . . 14
2.4 Automating Censorship Evasion . . . . . . . . . . . . . . . . . . . . . 17
2.5 Fuzzing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3 Discovering Client-side Evasion Strategies with Geneva 21
3.1 Geneva Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1.1 Overview and Challenges . . . . . . . . . . . . . . . . . . . . . 22
3.1.2 Geneva?s Genetic Building Blocks . . . . . . . . . . . . . . . . 23
3.1.3 Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Evaluation against real censors . . . . . . . . . . . . . . . . . . . . . 37
3.3.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.2 China: The Great Firewall . . . . . . . . . . . . . . . . . . . . 39
3.3.3 Other Countries . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.3.4 Training Defunct Strategies . . . . . . . . . . . . . . . . . . . 52
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
viii
4 Server-side Evasion 60
4.1 Client-Side Strategies do not Generalize . . . . . . . . . . . . . . . . . 63
4.2 Server-side Methodology . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2.1 Geneva Extensions . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2.2 Data Collection Methodology . . . . . . . . . . . . . . . . . . 67
4.3 Server-Side Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.3.1 Server-side Evasion in China . . . . . . . . . . . . . . . . . . . 70
4.3.2 Server-side Evasion in India & Iran . . . . . . . . . . . . . . . 81
4.3.3 Server-side Evasion in Kazakhstan . . . . . . . . . . . . . . . 82
4.4 Multiple Censorship Boxes . . . . . . . . . . . . . . . . . . . . . . . . 87
4.5 Client Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.6 Deployment Considerations . . . . . . . . . . . . . . . . . . . . . . . 91
4.7 Ethical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5 Application-Layer Evasion 95
5.1 Application-Layer Censorship Background . . . . . . . . . . . . . . . 98
5.2 Fuzzer Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.2.1 Grammars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2.2 Manipulations . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2.3 Fitness Function . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.2.4 Using Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.4 HTTP Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.4.1 Summary Results . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.4.2 Evasion Strategies . . . . . . . . . . . . . . . . . . . . . . . . 116
5.4.3 External Validation . . . . . . . . . . . . . . . . . . . . . . . . 124
5.5 DNS Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.7 Ethical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6 Censorship-in-Depth: Iran 137
6.1 Iranian Censorship Background . . . . . . . . . . . . . . . . . . . . . 139
6.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.3 Protocol Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.3.1 How Iran?s Protocol Filter Works . . . . . . . . . . . . . . . . 141
6.3.2 Whom the Filter Is Applied To . . . . . . . . . . . . . . . . . 144
6.3.3 Protocol Fingerprints . . . . . . . . . . . . . . . . . . . . . . . 147
6.4 Evading the Protocol Filter . . . . . . . . . . . . . . . . . . . . . . . 150
6.4.1 Old Strategies Do Not Apply . . . . . . . . . . . . . . . . . . 150
6.4.2 Evolving New Strategies . . . . . . . . . . . . . . . . . . . . . 151
6.4.3 Discovered Evasion Strategies . . . . . . . . . . . . . . . . . . 153
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
ix
7 Censorship-in-Depth: China?s SNI Censorship 157
7.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.2 Evasion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
7.2.1 MB-RA Evasion Strategies . . . . . . . . . . . . . . . . . . . . . 165
7.2.2 Evading MB-RA and MB-R . . . . . . . . . . . . . . . . . . . . . 168
7.3 How does MB-R work? . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
7.4 Ethical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 174
7.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
8 Weaponizing Censors for Amplification Attacks 176
8.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
8.2 Discovering TCP-based Reflection Attacks . . . . . . . . . . . . . . . 184
8.2.1 Automated Discovery of Amplification . . . . . . . . . . . . . 184
8.2.2 Training Methodology . . . . . . . . . . . . . . . . . . . . . . 186
8.2.3 Discovered Amplification Attacks . . . . . . . . . . . . . . . . 187
8.2.3.1 Amplifying Packet Sequences . . . . . . . . . . . . . 188
8.2.3.2 Packet Sequence Modifications . . . . . . . . . . . . 191
8.3 Internet Scanning Methodology . . . . . . . . . . . . . . . . . . . . . 195
8.4 Internet Scanning Results . . . . . . . . . . . . . . . . . . . . . . . . 197
8.4.1 Which strategies work best? . . . . . . . . . . . . . . . . . . . 198
8.4.2 Are these actually amplifiers? . . . . . . . . . . . . . . . . . . 201
8.4.3 Are these middleboxes? . . . . . . . . . . . . . . . . . . . . . . 202
8.4.4 What kind of packets do amplifiers send? . . . . . . . . . . . . 205
8.4.5 Are these national firewalls? . . . . . . . . . . . . . . . . . . . 206
8.4.6 Routing Loops . . . . . . . . . . . . . . . . . . . . . . . . . . 209
8.5 ?Mega-amplifiers? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
8.6 Ethical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 216
8.7 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
8.7.1 Middleboxes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
8.7.2 End Hosts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
8.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
9 Weaponizing Censors for Availability Attacks 222
9.1 Background & Related Work . . . . . . . . . . . . . . . . . . . . . . . 225
9.2 Measurement Methodology . . . . . . . . . . . . . . . . . . . . . . . . 228
9.3 State of Residual Censorship . . . . . . . . . . . . . . . . . . . . . . . 230
9.4 Residual Censorship Attack . . . . . . . . . . . . . . . . . . . . . . . 239
9.4.1 Launching the Attack . . . . . . . . . . . . . . . . . . . . . . . 239
9.4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
9.5 Attack Impact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
9.6 Mitigations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
9.6.1 Censors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
9.6.2 Potential Victims . . . . . . . . . . . . . . . . . . . . . . . . . 253
9.7 Ethical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 254
9.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
x
10 Defending Against Geneva 256
10.1 What would it take to defend against Geneva? . . . . . . . . . . . . . 256
10.2 Does Geneva help the censor? . . . . . . . . . . . . . . . . . . . . . . 260
11 Conclusion and Future Work 262
11.1 Immediate Term Challenges . . . . . . . . . . . . . . . . . . . . . . . 262
11.2 Long Term Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Bibliography 268
xi
List of Tables
3.1 Species, subspecies, and variants Geneva found (with success rates)
against the GFW. For readability, we omit all ?send?s from the ge-
netic code (e.g., duplicate(,) is equivalent to duplicate(send,send)).
This is correct, syntactic sugar for Geneva. . . . . . . . . . . . . . . . 57
3.2 Mock censors developed for in-lab training, and strategies Geneva
learned to defeat them. . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 Prior work?s effective TCP-based strategies and whether Geneva re-
derived the strategy in the lab or in the wild, regardless of whether
the strategy is still effective. Note that Geneva had no knowledge of
HTTP fields and could not introduce delays into the request. . . . . . 59
4.1 Client locations and protocols used in our experiments. . . . . . . . . 67
4.2 Summary of server-side-only strategies and their success rates. All
of these strategies manipulate only TCP, and yet, against China?s
GFW, their success rates are application-dependent. Kazakhstan?s
HTTPS and Iran?s DNS-over-TCP censorship infrastructure are cur-
rently inactive. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.1 DNS Open Resolvers we conduct experiments with. All of these open
resolvers are accessible from within China. . . . . . . . . . . . . . . . 110
5.2 HTTP evasion strategies and where they succeed. A strategy is suc-
cessful against a nation if it evades that nation?s censor. A strategy
is successful to a server if it evades in at least one country and is
accepted by the server. CN-H and CN-K stand for the China Head-
ers and China Keyword modes respectively. ?***? denotes a strategy
found against a live server we did not control; though these evade in
some of our tested countries, but do not receive responses from the
servers we tested. This table is continued i Table 5.3. . . . . . . . . . 126
xii
5.3 Continuation of Table 5.2. A strategy is successful against a nation if
it evades that nation?s censor. A strategy is successful to a server if
it evades in at least one country and is accepted by the server. CN-H
and CN-K stand for the China Headers and China Keyword modes
respectively. ?***? denotes a strategy found against a live server we
did not control; though these evade in some of our tested countries,
but do not receive responses from the servers we tested. . . . . . . . . 127
5.4 Summary of the five DNS strategy families we discover that defeat
all three DNS injectors simultaneously, and which DNS resolvers re-
spond to them: Cloudflare (CF), Google (G), Quad9 (Q9), OpenDNS
(OD), CleanBrowsing (CB), ComodoSecure (CS), Verisign (V), and
DNS.Watch (DW). Our system successfully identified strategies for
every DNS resolver, and also identified four more unique variants to
these strategies that only disabled a subset of the injectors. . . . . . . 128
6.1 Top 10 providers for affected IP addresses. . . . . . . . . . . . . . . . 144
6.2 Top 10 providers for unaffected IP addresses . . . . . . . . . . . . . . 145
8.1 TCP-based reflected amplification attacks discovered against 184 Quack
servers. Each packet with the PSH flag set includes an offending HTTP
GET request in the payload. . . . . . . . . . . . . . . . . . . . . . . . 188
8.2 Total data received (GB) from the top 100,000 IP addresses for each
combination of target URL and packet sequence. Bolded is the max-
imum value for each target URL. . . . . . . . . . . . . . . . . . . . . 198
8.3 Number of IP addresses with amplification factor over 100? for each
combination of target URL and packet sequence. Bolded is the max-
imum value for each sequence. . . . . . . . . . . . . . . . . . . . . . 199
8.4 Nation-states with nation-wide censorship infrastructure and the fin-
gerprint they most frequently respond to clients with. Numbers in
parentheses denote packet sizes in bytes. . . . . . . . . . . . . . . . . 206
9.1 The current state of residual censorship, among the countries and
protocols we tested (those that we tested but are not in the table did
not residually censor in our tests). We were unable to reproduce SNI
censorship in China; in that row, we report prior results [7]. *: Iran?s
SNI residual censorship sometimes lasts longer than 180s; in a small
number of our experiments, we found it to last upwards of 5 minutes. 231
xiii
9.2 Success rates in weaponizing each country?s censorship infrastructure
against each victim vantage point from our attacker in Seattle, WA.
(X denotes 100%, 8 denotes 0%, and N/A denotes a location that
does not cross the border of the censor.) Note that the success rates
are not always consistent, even to victims in the same country, or
between censored protocols in each censored regime. Iran is consistent
and reliable; Kazakhstan is consistently unreliable for HTTP, but
consistently reliable for HTTPS. In China, however, the attack was
not always consistent by protocol, victim location, or server location. 239
xiv
List of Figures
4.1 Server-side evasion strategies in China. All of the strategies work
without modifications to the client, and yet they induce client-side
behavior that helps circumvent censorship. (Standard packets at the
beginning and the end are grayed out to emphasize the critical dif-
ferences from normal behavior.) . . . . . . . . . . . . . . . . . . . . . 72
4.2 Server-side evasion strategies that are successful against HTTP in
Kazakhstan. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.3 Single versus multiple censorship boxes. A standard assumption is
that evasion strategies that work for one application will work for
another within a given country. However, our results indicate that
China?s GFW uses distinct censorship boxes for each protocol, each
with their own network stacks (and bugs). . . . . . . . . . . . . . . . 88
5.1 Structure of an HTTP request for example.com. Note that ? ?
denotes where whitespace is required by the RFC, typically 1 space.
Typically, HTTP Requests contain multiple headers separated by a
\r\n. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2 Structure of a DNS request for example.com. Note that the Bit
Flags field (detailed in the lower box) is two bytes wide. Although
DNS requests typically only contain one Question Record, the RFC [8]
allows for multiple DNS Questions to be included with no separator
between them. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.3 Examples of three HTTP strategies we discover. Each of
these strategies defeats censorship for a different censor or mecha-
nism (Header-based in China, in India, and Keyword-based in China). 117
6.1 Iran?s layered censorship system, employing defense in depth. Note
that the order of censorship systems is unknown; this is simply a
graphical depiction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
xv
7.1 A waterfall diagram of the TCP 3-way handshake and the TLS hand-
shake, denoting where the already known MB-RA and newly discovered
MB-R middleboxes act during the connection. Note that MB-R does
not act until deeper in the handshake than MB-RA (and only if MB-RA
does not act), seemingly acting as a backup middlebox for China?s
HTTPS (SNI) censorship. . . . . . . . . . . . . . . . . . . . . . . . . 162
8.1 The maximum amplification factor we obtained per IPv4 address,
based on several Internet-wide scans. (Note: the axes are log-scale.) . 177
8.2 Rank order plot of maximum amplification factor from Quack-identified
IP addresses. The maximum amplification factor was 7,455?. . . . . 186
8.3 Types of attacks we find. Thick arrows denote amplification; red
ones denote packets that trigger amplification. We find that infinite
amplification is caused by (d) routing loops that fail to decrement
TTLs and (e) victim-sustained reflection. . . . . . . . . . . . . . . . . 195
8.4 Rank order plot of the amplification factor received from each IP ad-
dress for the triggering payloads containing www.youporn.com across
all five packet sequences. . . . . . . . . . . . . . . . . . . . . . . . . . 198
8.5 Rank order plot of the amplification factor received from each IP ad-
dress for the ?SYN; PSH+ACK? packet sequence across all seven scanning
payloads. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
8.6 The increase factor in the number of bytes we receive between sending
5 probes and sending 1 probe. 46% of IP addresses responded with
exactly 5? as much data. . . . . . . . . . . . . . . . . . . . . . . . . 202
8.7 The fraction of the top million hosts that we confirm are middle-
boxes, using TTL-limited probe. The small gap at x ? 100,000 and
the large gap in the middle of the plot correspond to networks that
block traceroutes at their borders. Accounting for this, we find in-
jected responses from 82.9% of the top million IP addresses are from
confirmed middleboxes. . . . . . . . . . . . . . . . . . . . . . . . . . 204
8.8 Rank order plot of the amplification factor by country for the www.youporn.com
scan with the ?SYN; PSH+ACK? packet sequence. . . . . . . . . . . . . . 207
8.9 CDF of the increase factor in amplification of candidate looping IP
addresses when scanned with a TTL of 255 and 64. Because the in-
crease factor is affected by the number of hops away an IP address
is, we expect routing loops to have an increase factor of at least 4.
Larger increase factors are further away from our scanner, limiting
the overall amplification factor from our perspective. . . . . . . . . . 210
8.10 The /24 prefixes with at least one routing loop, rank-ordered by the
fraction of their 256 IP addresses that we observe to loop. Of the
2,763 looping prefixes, 54 (2%) have over 90% of their IP addresses
loop, but 1,705 (62%) have only one looping IP address. (Note that
the x-axis is log-scale.) . . . . . . . . . . . . . . . . . . . . . . . . . . 211
xvi
8.11 Attack bandwidth received at two vantage points from a self-sustaining
amplifying IP address, which (based on its block page) appears to be
a component of a Russian ISP?s censorship system. The dashed line
marks when the packet sequence was sent from the second vantage
point. Note how the bandwidth we get from the system is divided
evenly between the vantage points. This experiment supports our
hypothesis that self-sustaining amplification is caused by an infinite
routing loop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
8.12 Rank order plot of amplification factor of two scans for the www.youporn.com
keyword requested with the ?SYN; PSH+ACK? packet sequence: one with
outbound RST and RST+ACK packets being dropped and the other nor-
mally. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
9.1 Vantage points in our experiments. The green dot is our attacker
running SP3 [9]; black dots represent victim vantage points; and
the red dots denote the location of the servers inside the censoring
regimes we studied: China, Iran, and Kazakhstan (outlined in red).
Note that some dots overlap. . . . . . . . . . . . . . . . . . . . . . . . 229
9.2 The relationship between the number of times censorship is triggered
and the reliability of HTTP residual censorship, as measured from our
Beijing 2 vantage point. As the number of times residual censorship
is triggered increases, the reliability improves. (Error bars represent
95% confidence.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
xvii
Chapter 1: Introduction
Many nations around the world today engage in country-wide censorship of
Internet traffic. Although there are many forms of censorship?including politi-
cal pressure [10], outright blocking of certain protocols [11, 12], or simply taking
large swaths of the internet offline [13]?one of the most pervasive form of online
censorship involves in-network monitoring and censoring of forbidden keywords.
China [14], Pakistan [15], and more [10] deploy on-path middleboxes?similar to net-
work intrusion detection systems (NIDS) [16]?that monitor all the Internet traffic
that crosses their borders to detect, tear down, and in some cases outright block net-
work connections that carry a prohibited word, content, or protocol that they view
as threatening. These countries regularly block news, information about women?s
reproductive health, political views that oppose those in power, and recently even
credible allegations of sexual assault against top political officials [17, 18]. Such re-
strictions to open sharing of information threaten the freedoms of billions of users
worldwide, especially marginalized groups.
For years, security researchers have engaged in a cat-and-mouse game, de-
veloping new schemes to evade [16, 19?26] censors, who in turn have developed
increasingly sophisticated countermeasures [12,27?32]. Unfortunately, censors have
1
long had an inherent advantage.
Discovering new censorship evasion techniques has, to date, been a laborious,
manual process. Details of censors? infrastructures and implementations are gen-
erally not made publicly known, and thus researchers typically must first measure
and develop an understanding of how a particular censor works before they can
develop strategies to evade them [23, 24]. Further complicating matters, many of
the middleboxes that power censorship systems operate transparently, adhere to no
open standards, and multiple different middleboxes may run in parallel to censor
the same content [33]. As a result, when a new censorship technique is deployed or
new content is censored, there can often be considerable loss of availability until the
new censorship technique is detected by researchers, measured, reverse-engineered,
and circumvented [34,35].
My insight is to automate the discovery of censorship evasion techniques. Au-
tomated approaches to evasion allow evaders to react quickly to new censorship
techniques or deployments. I focus my thesis on studying the core building blocks
of censorship infrastructures themselves?middleboxes?and how an attacker can
render them ineffective at implementing their network policies. In so doing, I ex-
pose problems that are broader than the censors themselves.
1.1 Thesis
It is possible to automatically discover packet sequence modifications
that render deployed censorship middleboxes ineffective across multiple
2
application-layer protocols.
By ?deployed censorship middleboxes?, I refer to the middleboxes that power
censorship infrastructure that are currently in use as of time of writing. Although
most of this dissertation will focus on nation-state censorship infrastructure, in
Chapter 8, I will also demonstrate attacks on non-nation-state middleboxes. By
?ineffective?, I specifically mean ?not correctly implementing its policy?, and I fore-
see two categories in which this failure can occur: Either a middlebox can fail to
correctly censor a connection when it should, or it can incorrectly try to censor
an innocuous connection. In this dissertation, I will demonstrate both cases across
multiple protocols and across multiple nation-state censorship systems. I will also
discuss what my results suggest about the limits of this approach.
To evaluate this broad thesis statement, I decompose it into the following
research questions:
? Is it possible to automate the discovery of censorship evasion through client-
side manipulation of IP and TCP headers?
? Is censorship evasion possible without requiring clients inside of censoring
regimes to take any anti-censorship measures whatsoever? Can these server-
side evasion strategies be discovered automatically?
? Is it possible to automate the discovery of censorship evasion through client-
side manipulation of application-layer data?
? Can automatically-discovered evasion strategies allow researchers to gain novel
3
insights into how censorship infrastructures operate?
? Is it possible for attackers to weaponize censorship infrastructures, and can
those attacks be discovered in an automated way?
My work answers each of these questions in the affirmative, thereby collectively
proving my thesis. Moreover, I prove my thesis constructively, resulting in various
open source tools and other contributions, which I summarize next.
1.2 Contributions
Constructively proving this hypothesis leads to the following contributions:
Geneva, a new open-source tool for automating the discovery of censorship
evasion strategies. Geneva demonstrates that it is possible to automate the
discovery of censorship evasion strategies, even against a black-box adversary. I
developed Geneva and released it open-source, and its extensibility has enabled us
to successfully respond quickly to new censorship events [3, 4, 36].
Discovery of the first server-side evasion strategies. Until this work, censor-
ship evasion always required the client to do something in order to evade censorship
(such as to install or configure anti-censorship software). This is because it is more
difficult to discover strategies that work from the server side because there is little
opportunity for the server to influence the state of the connection.
This work presents the first known censorship evasion strategies that work
exclusively from the server, enabling servers outside of censoring regimes to subvert
4
censorship on users? behalf. Server-side strategies can be easier to deploy in real-
world settings, as modification of packet headers typically requires elevated privileges
that are difficult to attain on mobile devices.
Discovery of the first TCP-based reflected amplification attack. To
date, almost all reflected amplification attacks have leveraged UDP. This is because
launching non-trivial (going beyond the SYN) amplification attacks over TCP had
long been thought to be impossible: to go beyond the SYN would seem to require
an attacker to (1) guess the amplifier?s 32-bit initial sequence number (ISN) in their
SYN+ACK packet and (2) prevent the victim from responding to the amplifier with
a RST [37]. This work demonstrates that TCP-based reflected amplification attacks
are indeed possible: by leveraging TCP non-compliance in middleboxes, an attacker
can leverage middleboxes as reflection points. In Chapter 8, I will demonstrate this
attack and its evaluation on today?s IPv4 Internet.
The first empirical analysis of residual censorship across multiple coun-
tries. Residual censorship is a little studied feature of many nation-state censor-
ship systems. After a given TCP connection triggers a censor (e.g., by including a
forbidden keyword in a plaintext HTTP GET request), some censors not only tear
down the connection, but ?residually censor? all future communication between the
two end-hosts (on particular ports) for some period of time?even if the subsequent
traffic is completely innocuous. I perform the first empirical survey of the current
state of residual censorship around the world today: what countries employ it, how
it operates, how long it lasts, and so on. My results demonstrate a wide variety
5
in the implementation of residual censorship systems?even within a given country,
residual censorship can operate very differently from one protocol to another.
1.3 Ethical Considerations
Ethical considerations were a careful and important piece of this disserta-
tion. As this work is not human subjects research, it falls outside the scope of my
university?s IRB. Still, many chapters of this dissertation posed unique ethical con-
siderations; for this reason, each chapter will describe its own ethical considerations
and responsible disclosure process where appropriate.
1.4 Roadmap
The rest of this dissertation is structured as follows:
Chapter 2: Background and Threat Model
I will start by offering a background that is relevant to all subsequent chapters:
on middleboxes and the wider space of censorship research, with a particular focus on
censorship evasion of nation-state censorship. I will also discuss packet manipulation
for censorship evasion, the foundation for my thesis work, and reason about the
threat model described by nation-state censorship infrastructure. Some chapters
will require more specific background material, so I will provide background specific
to each chapter within that chapter if relevant.
Chapter 3: Discovering Client-side Evasion Strategies with Geneva
Next, I will present the design and results of Geneva, a novel genetic algorithm
6
that evolves network-level censorship evasion strategies directly against real world
censors. Geneva automatically discovers TCP/IP packet manipulation sequences
that, when applied only from one side of the connection, confuse censoring mid-
dleboxes without impacting the underlying connection. In the lab, Geneva quickly
re-derived almost all prior work in the space of packet manipulation strategies.
Against real world censors in China, India, Kazakhstan, and Iran, Geneva has dis-
covered dozens of strategies, including previously unknown strategies those that
exploit what seem to be bugs in implementation in censors. This chapter demon-
strates that it is possible to automatically discover packet manipulation strategies
that render nation-state censorship middleboxes ineffective at enforcing their policy.
Chapter 4: Server-side Evasion
Next, I show that using Geneva, server-side censorship evasion is possible,
allowing a server to subvert censorship on a client?s behalf. This permits unmodified
clients to connect directly to forbidden servers without requiring them to install
any anti-censorship software. I evaluate this approach across 5 different network
protocols (HTTP, HTTPS, DNS, FTP, and SMTP), demonstrating that it is possible
to automatically render nation-state censorship middleboxes ineffective at enforcing
their policy across multiple network protocols.
Chapter 5: Application-Layer Evasion
In this chapter, I show that it is possible to discover censorship evasion strate-
gies that themselves operate exclusively at the application layer. I design new mod-
ification primitives to explore modifications to HTTP and DNS requests, and show
that even modifications limited to these application-layer protocols can render mid-
7
dleboxes ineffective.
Chapter 6: Censorship-in-Depth: Iran
Nation-state middlebox deployments often involve multiple middleboxes de-
ployed in parallel, creating ?censorship-in-depth?. These deployments make finding
censorship evasion strategies and studying the censorship systems more difficult. To
evade censorship, we would need to find an overlap in evasion strategies that defeats
both systems, and to study either system individually, we would need to be able to
disentangle the effects of both systems. In this chapter, I study a novel example
of censorship-in-depth in Iran, and show that even in these cases, it is possible to
individually render middleboxes ineffective at properly enforcing their policies.
Chapter 7: Censorship-in-Depth: China?s SNI Censorship
Next, I present a second example of rendering middleboxes ineffective in a
?censorship-in-depth? deployment. I study China?s deployment of a secondary,
backup censorship system to their existing HTTPS (SNI) middleboxes. Unlike in
Iran, this is a system in which two different middleboxes operate on the same set of
packets with the same goal. This chapter, as with the previous, supports my thesis
in the context of more complex, real-world middlebox deployments.
Chapter 8: Weaponizing Censors for Amplification Attacks
In this chapter, I present a new attack that shows that middleboxes can be
coerced into (trying to) enforce their policy when they should not. The new at-
tack works by eliciting censorship responses from middleboxes to launch volumetric
reflected denial of service attacks.
Chapter 9: Weaponizing Censors for Availability Attacks
8
In this chapter, I present a second attack that demonstrates that nation-state
censors can be coerced into blocking arbitrary IP pairs from communicating across
their borders across multiple protocols. This attack makes use of a relatively little
studied feature of many nation-state censorship systems: residual censorship.
Chapter 10: Defending Against Geneva
Before concluding, in this chapter I take a step back and reason about what
it would take to defend against the myriad attacks I present in this dissertation.
What are the limits of this work, and should we expect it to work forever?
Chapter 11: Conclusion and Future Work
Finally, I conclude by revisiting the contributions of this work. I discuss imme-
diate next steps for this work, and comment on future challenges in the censorship
evasion space.
9
Chapter 2: Background and Threat Model
In this chapter, I provide a background relevant to all chapters of this thesis:
on nation-state censorship and middleboxes. I will also define the threat model
that this work operates within. Some chapters will require additional background
material specific to that chapter; where appropriate, individual chapters will provide
additional background material.
2.1 Nation-state Censors: Threat Model
Much of this dissertation studies nation-state censors. These are powerful
entities who are able to inspect [16], inject [38], and sometimes also drop [39] traffic
throughout their countries. Nation-state censors operate in two broad ways: on-
path (man-on-the-side) or in-path (man-in-the-middle) [24,40], and my experiments
span both kinds. In this section, I will also discuss other relevant properties of
nation-state censors: failing open or closed, the eavesdropper?s dilemma, and more.
On-path Censors On-path (man-on-the-side) censors can obtain copies of pack-
ets, allowing them to overhear all communication on a connection. To determine
whether to censor, these attackers perform deep-packet inspection (DPI) and typ-
10
ically look for keywords they wish to censor, such as DNS queries [28, 38, 41] or
resources in HTTP GETs [23,24,42].
On-path censors are also able to inject packets to both ends of the connection.
Because they are able to view all traffic on the connection, they can trivially inject
packets that the end-hosts will accept?unlike traditional off-path attackers who
must guess sequence numbers, query IDs, or port numbers [43,44]. On-path censors
have been observed to inject TCP RSTs to tear down connections [16, 23, 24, 40, 42,
45,46] and DNS lemon responses to thwart address lookup [38,41].
To reconstruct application-layer messages and track sequence numbers, on-
path censors maintain a Transmission Control Block (TCB) for each flow. A TCB
comprises sequence numbers, received packets, and other information about the con-
nection. A considerable amount of work has gone into modeling and understand-
ing how censors synchronize and re-synchronize their TCB state with the ongoing
connection?s [23, 24]. Understanding this can enable researchers to craft a packet
sequence that causes the censor to synchronize on incorrect data.
In-path Censors In-path (man-in-the-middle) censors also perform DPI to de-
termine whether to block a connection, but they can do more than just inject a
RST or lemon response. For example, an in-path censor is able to simply drop a
connection?s packets altogether. Alternatively, an in-path censor can also hijack a
connection entirely, inject a block-page, and prevent the client?s packets from reach-
ing the server. Evading an in-path censor requires tricking the censor into believing
that a connection should not be censored, for instance by hiding the true identity of
11
the server [20,21,47], obfuscating the protocol [11,48,49], or modifying the packets
in such a way that the censor no longer recognizes the forbidden query as a target.
The Eavesdropper?s Dilemma Almost all on-path and in-path middleboxes
must contend with the eavesdropper?s dilemma, which states that it is difficult to
accurately model the state of a connection from the middle of that connection [50].
The reason for this is that unless a middlebox ensures that every packet is delivered,
accepted, and processed by the end-server in the same way as the middlebox, an
attacker may be able to tamper with the middlebox?s internal state about the connec-
tion. For example, I will demonstrate an attack in Chapter 3 in which the attacker
sends a packet with a payload and a reduced TTL: the packet will be processed by
the middlebox (causing it to advance its internal TCB), but will be dropped before
it reaches the end-server. In this case, the middlebox will now be desynchronized
from the connection, and may be unable to correctly inspect the rest of the flow.
As I will discuss in Chapter 10, for a middlebox to mandate consistent state with
end-hosts can be difficult in practice, and may require a significant re-architecture
of the censorship infrastructure in the world today. Every nation-state middlebox
I study in this dissertation is susceptible to at least some attacks enabled by the
eavesdropper?s dilemma.
Failing Open Most nation-state censors operate in a fail-open capacity: any
packet that cannot be processed or matched to internal state is allowed to pass.
Failing-open reduces the collateral damage of censorship (e.g., an unknown protocol
will not be erroneously targeted), but it presents more opportunities for evaders. It
12
is difficult for an on-path censor to reliably fail closed, however: if the middlebox
requires connection state to disrupt a that connection (as is the case with injected
RSTs), if that state is incorrect, the censor will be unable to correctly censor the
connection.
Across all the middleboxes I study in this dissertation, only one approximates
a fail-closed system: Iran?s Protocol Filter, discussed in Chapter 6. This system
operated a strict protocol allow-list, and any protocol or packet sequence that could
not be positively identified would be dropped. As we will see in Chapter 6, however,
this system is not a perfect fail-closed system, and was still susceptible to attacks.
Throughout this dissertation, I will be explicit about the specific threat model
that each individual censorship system falls into. In addition to the above informa-
tion, I make several common assumptions that hold across all the threat models in
this dissertation. I assume that censors cannot break encryption that is considered
secure: only publicly known weaknesses are considered. I also assume censors do not
have sufficient resources to record indefinite packet captures of all network traffic
leaving their borders.
2.2 Related Work: Measuring Censors
There has been a wide range of work measuring how censors work and what
they block. This can be broadly broken down into two broad categories:
First are studies into what specific content or destinations censors block [39,
51?54]. My work is largely orthogonal to these prior efforts; the primary goal is not
13
to discover who or what is being censored, but to measure and understand how it
is being censored (and evade it).
Second is the body of work that studies how censors operate [12,14,15,23,24,
28, 38, 55, 56]. My work is complementary to these prior efforts, in that I am able
to lend new insights into how several censors perform on-path censorship, as well as
gaps in their logic and bugs in their implementations. For instance, I believe I am
the first to observe that censors use different transport-layer techniques depending
on the overlying application.
2.3 Evasion via Packet Manipulation
There is a history of evading on-path and in-path censorship through the
application of packet-manipulation strategies. At a high level, these techniques alter
and inject packets at one of the communicating endpoints (typically the client). In
so doing, their goal is to either de-synchronize the censor?s state (e.g., by injecting
TTL-limited RSTs [57]) or to confuse the censor into not recognizing a forbidden
keyword (e.g., by segmenting TCP packets).
Client-side evasion The earliest packet-manipulation strategies to evade on-
path censors come from an open-source project from 2011, sniffjoke [46]. sniffjoke
introduced a handful of client-side strategies, such as injecting packets with ran-
dom sequence numbers or injecting packets that shift the sequence number but
corrupt the payload. Unfortunately, many of the specific strategies sniffjoke em-
ployed have long been defunct, but its broad approaches were later re-discovered by
14
other work [23,24].
In 2013, Khattak et al. [16] crafted 17 different evasion strategies to exploit
specific implementation weaknesses against the GFW. In 2017, Wang et al. [24]
developed a suite of highly effective hand-crafted strategies, and their open-source
system INTANG could systematically identify the best evasion strategy from this
suite for a given server and network path. They perform empirical tests regarding
the behavior of the GFW, and make hypotheses on previously unknown updates
to the GFW. Li et al. [23] studied numerous middlebox traffic classifiers in their
2017 work, and pioneered automated work of identifying traffic differentiation. Once
traffic differentiation is detected, their system could choose from a library of pre-built
evasion techniques to evade the censor. They tested their work on many censorship
regimes, including the GFW, and many of the censorship techniques they leverage
are still relevant today.
My work is informed by and extends these prior efforts: I will present over 100
censorship evasion schemes discovered by Geneva, including some previously thought
impossible. My results also lead me to refine prior work?s findings. For instance,
Wang et al. [24] showed that the GFW was capable of reassembling TCP streams
to detect censored keywords in HTTP requests; my result confirms this for HTTP,
but show that the GFW is frequently incapable of doing so over FTP, indicating
that censors use different transport-layer techniques depending on the application.
Server-side evasion To the best of my knowledge, all prior censorship evasion
systems (including Geneva in Chapter 3) require some degree of client-side eva-
15
sion software. Even techniques that rely on server-side features, such as domain
fronting [58] or decoy routing [21], require client-side changes. However, there are
two server-side strategies that are similar in spirit to the novel server-side censorship
strategies I will describe in Chapter 4. In 2010, Beardsley and Qian [59] demon-
strated that a variant of TCP simultaneous open was able to bypass some intrusion
detection systems; these do not appear to work against censors, but we show in
?4.3 that Geneva discovered multiple simultaneous open-based strategies that work
against China?s GFW. brdgrd [45] intercepted packets sent by a Tor bridge to the
Tor client, and employed a relatively simple strategy?it lowered the TCP window
size of outbound SYN+ACK packets. This caused Tor clients to segment their TLS
handshake packets, splitting the set of supported ciphersuites across multiple TCP
packets. At that time, the GFW was unable to reassemble TCP segments, and thus
this strategy avoided detection and blocking. In 2013, the GFW added the ability
to reassemble TCP segments, rendering brdgrd defunct. Since then, we are aware
of no other work on this topic: all prior literature in this space has explored only
client-side strategies [16, 23,24,40].
More Broadly Beyond packet manipulation-based censorship evasion, there is
a much wider space of prior work for circumventing censorship. Researchers have
explored tunneling traffic over a wide variety of mediums, including email [60], video
games [61], VoIP [62], SSH [63], WebRTC [64], HTTP [65], just to name a few. Other
systems seek to hide the true destination of traffic, such as with Tor [20], domain
fronting [58], Decoy or Refraction Routing [21, 47, 66, 67], or to avoid the censoring
16
country altogether (Alibi Routing [68], DeTor [25]). Traffic mimicry systems have
also been developed to disguise network traffic as another protocol [48,49,69]; though
these appear to have inherent limitations [11]. Geneva is orthogonal to all of these
systems, and, as demonstrated with INTANG [24], could be used in tandem with
them to help bolster their ability to circumvent censors.
2.4 Automating Censorship Evasion
In the next chapter, I will describe the design and implementation of Geneva,
the first system to automate the discovery of censorship evasion strategies. Since
Geneva?s publication, however, there have been two notable works in the space of
automating censorship evasion that deserve mention here.
In 2020, Wang et al. released SymTCP [70]: a system to automatically discover
discrepancies between how censors and end servers process packets using symbolic
execution of the TCP implementation in Linux. SymTCP offers a contrasting ap-
proach towards the same end goal as Geneva: while Geneva treats the censor and
the end-host as ?black boxes? and explores the space of strategies by evolution,
SymTCP performs symbolic execution of the end host?s TCP stack and explores
the strategy space systematically. There are trade-offs between the approach taken
by Geneva and by SymTCP. SymTCP?s principled exploration of the strategy space
offers a more deterministic approach towards censorship evasion. However, by treat-
ing censors and end-hosts as black boxes, Geneva can be more easily deployed against
previously unknown or new censorship systems. For example, no additional effort
17
is required to train Geneva with a censored Windows HTTPS server compared to
a Linux SMTP server, whereas SymTCP would require the ability to execute that
server within its symbolic execution engine. This has allowed Geneva to be highly
responsive to new censorship events and systems [36,71].
In 2019, Moon et al. released Alembic [72], a system to automatically infer
state models for middleboxes. Alembic applies symbolic execution and finite-state
machines (FSM) to infer the state of a stateful firewall. Alembic takes a contrasting
approach to Geneva: Geneva defeats censorship first, and then researchers can infer
the firewall?s model from the strategies it discovers and discards, whereas Alembic
first discovers the firewall?s model, and then researchers can use that model to deter-
mine evasion strategies. Like SymTCP, Alembic offers a more principled approach
towards identifying evasion opportunities, and knowing a firewall?s model can be
useful beyond just identifying evasion strategies, such as to improve the accuracy
of network testing tools. With nation-state censorship, Geneva offers other advan-
tages over Alembic. For firewall model inference, Alembic requires an offline training
stage that can last for tens of hours, which may not scale to real nation-state cen-
sors. Further, Geneva supports a much larger ?alphabet? of potential actions (and
the ability for researchers to add new actions), making Geneva more expressive.
However, Alembic and Geneva have not trained against the same systems, so it is
difficult to compare their effectiveness directly.
18
2.5 Fuzzing
Fuzz testers [73] mutate inputs non-deterministically in an effort to evaluate
the correctness, security, and coverage of programs. Most relevant to my work is
the space of grammar-based fuzzers, which define an input grammar for the target
protocol, and differential-based fuzzers, which send fuzzed inputs to multiple systems
to identify any differences in behaviors. Grammar-based fuzzers (including those
based on genetic algorithms) have been used successfully against many targets [74],
including web applications [75] and other popular protocols [76]. The Peach Fuzzer
is a grammar-based protocol fuzzer that allows a user to specify an input grammar,
but only its Community Edition is available since Gitlab purchased it in 2020 [77].
WFuzz is another powerful fuzzer for HTTP web servers, but it has no support for
other protocols or extending its grammar [78].
My work differs from existing fuzzers in two subtle but important ways: First,
Geneva has a different goal from traditional fuzzers: instead of searching for modi-
fied inputs that elicit incorrect behavior from the application, our work must find
a modified input that elicits correct behavior from the application but incorrect
behavior from the eavesdropping censor. Second, my goal is not just to find any
output that evades a censor, but rather to identify a modification that can be made
to an existing user query to enable the user to bypass the censor. Whereas fuzz
testers traditionally generate inputs, our approach generates what amounts to small
pieces of code (built from its manipulation primitives) that are in turn applied to
inputs (user traffic). Therefore, we search over the space of manipulation actions,
19
not over the input space itself.
Genetic algorithms have been used for fuzzing, including in the well known
American Fuzzy Lop (AFL) [74] and iFuzzer [79]. Genetic algorithm fuzzing tech-
niques have been applied to web applications [75] and other popular protocols [76].
To my knowledge, I am the first to apply such techniques to censorship evasion.
20
Chapter 3: Discovering Client-side Eva-
sion Strategies with Geneva
I begin by demonstrating that it is possible to automate the discovery of
evasion strategies through client-side manipulation of IP and TCP packets. To
achieve this, I have designed and implemented Geneva, a novel genetic algorithm
that discovers how to evade censorship against a live adversary. I trained Geneva
against real-world censorship infrastructure in China, India, Iran, and Kazakhstan,
and present a total of 27 strategies (including strategies that prior work posited
should be impossible). I will detail how these strategies work, and what these new
evasion strategies teach us about how Chinese censorship works. This chapter will
demonstrate that censors can be rendered ineffective.
3.1 Geneva Design
In this section, I describe its genetic algorithm-based design in terms of its
building blocks and how it composes and evolves them over time. I begin by pro-
viding a high-level overview of the approach.
21
3.1.1 Overview and Challenges
Genetic algorithms [80] are a biologically-inspired approach to automate al-
gorithm design. They require three core components: (1) genetic building blocks
that provide a way to programmatically represent different algorithms, (2) a fit-
ness function to capture how well a given algorithm performs, and (3) methods for
performing mutation and crossover to generate new algorithms. Iteratively, over
successive generations (rounds), genetic algorithms simulate evolutionary natural
selection: Given a set of individuals (candidate algorithms), it runs each one to
compute their fitness, allows only some of the fittest to survive, and mutates or
crosses-over the surviving ones to generate new individuals for the next generation.
One primary challenge faced in applying genetic algorithms to censorship eva-
sion lies in how many degrees of freedom we permit in its genetic building blocks.
On the one hand, we could allow virtually unlimited degrees of freedom by, say,
treating all packets merely as bit strings and allowing the genetic algorithm to con-
struct strategies out of bit flips, bit removals, and bit insertions. Such an approach
would eventually learn virtually any possible strategy, but would require an inordi-
nate amount of time to do so. On the other extreme, we could use existing evasion
strategies from prior work as building blocks; this would learn more quickly, but risks
?over-fitting? to the strategies that are already known. Therefore, Geneva needs ge-
netic building blocks that balance between finding new strategies and finding them
efficiently.
22
3.1.2 Geneva?s Genetic Building Blocks
Strategies in Geneva comprise a set of (trigger, action tree) pairs. Packets
that match a given trigger (for instance, all TCP packets with the ACK flag set) are
modified using the corresponding sequence of actions in an action tree. We permit
Geneva to evolve the triggers, the structure of the action trees, and the properties
of the individual actions themselves.
Here, we present the design of triggers, actions, and action trees, as well as
a syntax that comprises the genetic code of individuals to unambiguously describe
Geneva strategies.
Triggers Triggers represent fields in a packet header that, when matched, cause
packet manipulation actions to be applied. In this work, we have restricted trig-
gers to span only TCP and IP, though adding support for additional protocols is
straightforward in our implementation. Triggers are expressed with the following
syntax: [PROTOCOL:FIELD:VALUE]. For example, [TCP:flags:R] is a trigger that
fires when the TCP field flags is set to RST. Geneva requires exact matches: for
instance, a packet with only the TCP RST flag set would not match a trigger for
[TCP:flags:RA].
Actions To balance expressiveness with efficiency, we permit four distinct packet-
level actions:
1. duplicate(A1, A2) copies a packet and applies action sequence A1 to the
original packet and A2 to the duplicate.
23
2. fragment{protocol:offset:inOrder}(A1, A2) fragments or segments the
packet (depending on if the protocol is set to IP or TCP) at a specific byte
offset, applies A1 to the first fragment, A2 to the second, and optionally
returns them inOrder.
3. tamper{protocol:field:mode[:newValue]}(A1) alters the given field of
a packet and then applies action sequence A1 to it. tamper always tries to
keep the packet in a valid state unless otherwise directed, and will recompute
the headers? checksums and/or lengths if needed (unless field is a checksum
or length). Note that if the specified field is optional and not present, such
as a TCP option, it will be added to the packet. tamper has two modes of
operation: replace and corrupt. replace:newValue sets the given field of
the packet to newValue. corrupt replaces the given field of the packet with
a random value of the same bitsize (a new random value is selected each time
the action is invoked).
4. drop causes a given packet to be dropped.
Action Trees Geneva?s actions are composed to form a binary tree: duplicate
and fragment both have two children; tamper has one child; and drop has no
children. An action tree encapsulates a packet modification scheme?each packet
that matches the associated trigger enters at the root of the tree and is passed down
via in-order traversal to the actions of the tree. Packets that emerge at the leaves
are sent on or accepted from the wire. We refer to an ordered list of (trigger, action
tree) pairs as a forest, and forests can be combined to represent a strategy. Triggers
24
need not be unique within a forest?if multiple action-trees have the same trigger,
each action-tree is given its own fresh copy of the original packet, and runs serially,
in isolation, in the order the trees exist in the forest. Note that action-trees are
stateless, and operate only on singular packet inputs (though they may result in
sending multiple packets). An interesting area of future work would be to extend
Geneva to operate over packet streams.
Outbound vs. Inbound We allow Geneva to evolve action-trees for both inbound
and outbound packets. A strategy in Geneva is thus two components: an inbound
and outbound forest of triggers and action-trees. This lets Geneva independently
alter outgoing packets and alter (or ignore) incoming packets. Due to limitations of
NFQueue, branching actions (duplicate and fragment) are disallowed in inbound
forests. We represent the overall strategy syntactically as outbound-forest \/
inbound-forest.
Example To demonstrate Geneva?s syntax, consider the following:
Strategy 1: TCB Turnaround / RST Drop
[TCP:flags:S]-
duplicate(
tamper{TCP:flags:replace:SA}(
send),
send)-| \/
[TCP:flags:R]-drop-|
This example strategy has one outbound and one inbound tree. The first
(outbound) action-tree duplicates outgoing SYN packets; it replaces the first copy?s
TCP flags with SYN/ACK before sending it. It then sends the second copy of the
25
SYN packet unmodified. On the inbound forest, the only action-tree triggers on
RST packets and drops them. Collectively, this strategy implements a hybrid of
two previously known strategies: TCB-Reversal [24] (characterized by sending a
SYN/ACK before the three-way handshake) and RST-Drop [42]. (Unfortunately, as
we will see in Section 3.3, both halves of this hybrid species are now extinct against
the GFW.)
Expressiveness Note that Geneva?s genetic building blocks reflect the set of packet
manipulations that can occur at the IP layer: as a result, we posit that they can
be composed to generate any packet stream. To evaluate this hypothesis, we tested
whether it was possible to express all prior work?s strategies [16, 23, 24] through
combinations of duplicate, fragment, tamper, drop, and send alone. Indeed,
we were able to express 30 (83.3%) of the 36 previously published strategies?the
only exceptions were strategies that (1) manipulated HTTP packets, as was done
by Khattak et al. [16], and those that (2) paused for 40?240 seconds, as was done
by lib?erate [23]. These are not fundamental limitations: one could easily extend
Geneva to support HTTP manipulation or sleeping through tamper actions. For this
chapter, we chose to limit Geneva to only manipulate IPv4 and TCP (as this was
the central focus of most prior work), and not to include pauses: including pauses
would significantly slow down training time. As we will show in ?3.2, Geneva was
able to independently discover all of these 30 strategies in in-lab experiments, and it
discovered many more strategies when trained against a live censor: China?s GFW.
Geneva automatically derives these strategies through the process of evolution, which
26
we describe next.
3.1.3 Evolution
Geneva automatically derives censorship evasion strategies through evolution,
which takes place over a series of discrete generations. Each generation comprises
multiple individuals (strategies, represented as inbound and outbound forests of
action-trees), and includes three broad steps: (1) mutation and crossover, (2) eval-
uation of individuals? fitness, and (3) selection of individuals to survive to the next
generation.
Population Initialization We explored two ways to initialize Geneva?s popula-
tion. For most of our experiments, we randomly generated an initial population of
individuals. We generated 200 individuals, each with random but valid action-trees
with precisely 3 actions each. Additionally, we explored seeding the population with
?extinct? strategies. With a population seed, the initial population is comprised of
duplicates of the seed: this allows the algorithm to focus evolution on improving a
given strategy.
Mutation As in biological systems, Geneva?s genetic building blocks can be altered
through random mutations. Mutations can occur at the level of actions, action-trees,
and entire individuals. Each action mutates in the following ways:
? duplicate mutations swap the order of the children (i.e., duplicate(A1, A2)
? duplicate(A2, A1)).
? fragment mutations change the protocol (fragmentation or segmentation), the
27
order of the packet fragments, or the fragmentation index.
? tamper mutations depend on the mode it is in: replace mode mutations can
alter the field they replace or the new value it changes it to, whereas corrupt
mode mutations can alter the field it corrupts. Both modes can mutate to the
other mode.
? drop does not support mutations.
Triggers can also be mutated similarly to the tamper action: the protocol, field, or
value to trigger on may be changed.
To mutate an action tree, one of four primitives is applied with some config-
urable probability1: a new action can be chosen at random and added to the tree in
a random location (20% probability in our implementation), an existing action can
be removed from the tree (20%), the trigger can be mutated (20%), or one of the
actions can be mutated (40%).
An individual (which in turn comprises outbound and inbound action-forests)
can be mutated in one of four ways, also with configurable probability: a new
random action tree can be added to one of its forests (10%); an existing action tree
can be removed from one of its forests (10%); trees in its forests can be reordered
(5%); or specific trees within each forest can be mutated (25%). In each generation,
each individual is mutated with a configurable probability (90%).
As actions and triggers must operate on real-world packet data, it is challenging
to mutate the actions or triggers in such a way that it results in packet values that
1We verified that Geneva was still effective when each option was chosen with equal probability.
We chose our specific values based on our intuition during in-lab experimentation, and leave a full
parameter sweep optimization for future work.
28
are seen in the real world. For example, if the algorithm was to mutate the TCP flags
header field to a valid random value (any value from 0?65535) it would very rarely
choose a valid combination of TCP flags. Therefore, during mutation, actions and
triggers are given access to a packet capture of their previous run against a censor.
The triggers (and tamper action) can draw from the values contained in real packets
to mutate.
Drawing from real packet captures also confers a second advantage to the
evading system. If the censor interacts with the strategy (e.g., by forging RST
packets), these injected packets will be available in the packet capture for the action
system to draw from and use for mutation. This allows action trees to find triggers
that apply only to injected packets.
Crossover Unlike mutations?which are random perturbations of singular strate-
gies or actions?crossovers serve as a form of ?breeding? between two different
individuals. To perform crossover, two individuals are chosen at random from the
population pool, and one of the following occurs. Trees in each action forest are
randomly swapped, or a randomly chosen tree in each forest is mated with a ran-
domly chosen tree from the other. To mate two trees, an action is chosen from each
tree, and the subtrees of that action are swapped between each tree. If each action
forest for a specific direction only has one tree, crossover will be applied using the
second mechanism.
In each generation, crossover is applied between every other individual in the
pool with a configurable probability (40% by default). In our implementation,
29
crossover is applied before mutation.
Fitness At the end of each generation, all individuals are evaluated for their fitness.
Genetic algorithms rely on some domain-specific fitness function when determining
which individuals should be allowed to survive to the next generation. Geneva
evaluates fitness by running directly against the censor. This way, Geneva evolves
in the presence of the real deployment, and can therefore adapt to the details and
idiosyncrasies of a particular censor?s implementation.
To evaluate a given strategy, a Geneva client simply tries to make a forbidden
GET request through an actual censor (or a simulated censor, for in-lab testing),
while the strategy runs on the client side. The specific request depends on the
censor: against the GFW, Geneva makes an HTTP GET request with a forbidden
word, against India?s Airtel ISP, we make an HTTP GET request to a blocked
URL; against Kazakhstan?s HTTPS MITM, we make an HTTPS request. Geneva
assigns a positive numerical fitness metric if the connection can properly finish; if the
connection is censored (is reset, blocked, or gets the injected certificate respectively),
a large negative value is added to the fitness. As we will see in ?3.3, some censors
may not work 100% of the time. To prevent false positives in strategy evaluation,
Geneva evaluates each strategy twice and records the lower of the two fitness scores.
Three additional adjustments are made to the fitness measure to help refine
and optimize successful strategies: First, the fitness is punished if any vestigial
action-trees are present?action-trees whose triggers which are never fired during
an evaluation. Punishing for vestigial actions kills off strategies without effective
30
triggers early in the evolution process, allowing the framework to evolve good trig-
gers before it discovers fully functional action-trees, and encourages pruning unused
action-trees. Second, the fitness is punished for strategy overhead?the number of
additional packets that a strategy adds to the data-stream. Punishing for strategy
overhead encourages precise triggers (such as triggering only on PSH/ACK packets,
instead of every packet). Finally, the strategy is punished for strategy complex-
ity?a count of the number of actions across all of the action-trees in the strategy
to encourage succinct strategies. Critically, punishments for strategy overhead and
complexity are applied only when the fitness of an individual is positive to encourage
the algorithm to explore the strategy space as much as necessary in the early stages
of evolution.
Selection In the final step of a generation, Geneva runs a selection tournament [81].
Some individuals are drawn at random (with replacement) from the population; the
highest-fitness individual among them is added to the offspring pool. This process
repeats until the offspring pool is the same size as the population pool; then, the
offspring pool becomes the population for the next generation.
Selection tournaments have several benefits. High-fitness individuals have a
greater probability of being selected for the next generation?and because they are
chosen with replacement, multiple copies of them are likely to be selected. This
allows Geneva to focus on improving promising strategies. While low-fitness indi-
viduals decrease in number, they have non-zero probability of surviving to the next
generation. This has the benefit of promoting genetic diversity, thereby steering
31
Geneva away from local maxima.
As the evolutionary framework will run for many generations, it is possible to
find a successful strategy, but mutate away from it or break it in ensuing genera-
tions. To prevent the loss of successful strategies as the algorithm progresses, the
system maintains a ?Hall of Fame?: a global sorted collection of every individual
the algorithm has evaluated during a run. At the end of each generation, the Hall
of Fame is updated with the highest performing individuals.
Strategy Coverage The evolutionary process we have described thus far does not,
by itself, promote a broad exploration or coverage of the strategy space. As we will
see in Section 3.3, when running in a real environment, some header fields have
a higher probability of contributing to a successful strategy. As a result, Geneva
tends to find them first, and there is no evolutionary pressure to deviate from those
individuals to find new strategies. To broaden coverage, we add an optional meta
layer on top of normal evolution: if, across multiple consecutive experiments a
particular header field is repeated across all of the successful strategies, Geneva can
preclude it from future training sessions. This encourages broader exploration in
other portions of the space of potential strategies.
3.1.4 Implementation
We implemented Geneva in approximately 6,000 lines of Python. Geneva runs
strictly at the client, and uses NetfilterQueue [82] to interpose on (and possibly
alter) all of the client?s outbound and inbound packets. As a result, Geneva does
32
not require any modifications to the applications. To demonstrate this, we deployed
an unmodified Google Chrome browser on a client running Geneva in China, and,
using the strategies we present in ?3.3, verified that we were able to browse free of
keyword censorship.
In its current implementation, Geneva requires root access?as with all prior
work on packet-manipulation-based censorship evasion [16,23,24,45,46], root privi-
lege is necessary for most of their packet manipulations. However, we demonstrate
in ?3.3 that Geneva is also able to find strategies that operate strictly through TCP
segmentation. Strategies such as these could be deployed without root privilege.
Recall that Geneva currently only supports modifications of IP and TCP packets;
it would be straightforward to also add application-layer modifications, in the form
of new tamper primitives for HTTP, DNS, and so on. These would not require
root privilege, and given prior successes at application-layer manipulations [16, 23],
we speculate that Geneva would also fare well, but this is beyond the scope of this
chapter.
3.2 Validation
In this section, we validate Geneva?s design by investigating whether it can
re-derive strategies found from prior work [23, 24]. Unfortunately, the techniques
employed by censors are not guaranteed to be the same today as when these prior
studies were performed. To achieve a fair comparison, we have implemented mock
censors that exhibit the behavior reported in prior work, and validate against them
33
in a controlled environment.
Mock Censors We first developed a suite of mock censors (11 in total) to mimic
specific aspects of nation-state censor behavior as hypothesized by previous re-
searchers [15, 23, 24, 55]. This includes on-path censors injecting TCP RST packets
to disrupt a connection (China), varied TCB synchronization/teardown behavior
(China, Iran), in-path censors dropping packets (India, China), TCB resynchro-
nization behavior (China), and so on. A full list of the censors we developed is
included below in this section.
We implemented a Dockerized [83] evaluation system for Geneva to train
against these censors. We ran each strategy in an isolated environment with three
containers (a client, a mock censor, and server). We isolated each training session
from the others, with a starting population pool of 1,000 individuals, capped at 50
generations. In the lab setting, Geneva evaluated 3?5 strategies per second, and
each generation took 4.4 minutes on average to complete.
Validation Results Geneva found successful strategies against every mock censor.
We analyzed the strategies that Geneva discovered and found that, of the 36 strate-
gies suggested by previous work [16,23,24], Geneva automatically re-derived 30 (83%)
of them. The strategies that Geneva did not find are not possible to create with our
genetic building blocks (drop, tamper headers, duplicate, and fragment). Specif-
ically, Geneva did not rediscover the ability to delay packet transmissions [23, 24],
perform state exhaustion [16,24], or perform HTTP-specific tweaks [16] (Geneva was
not given the HTTP protocol structure to perform specific minor modifications).
34
In addition to learning simple behavior against weak censors, Geneva finds
strategies in the TCB Creation, Data Reassembly, and TCB Teardown species, and
learned more complex behavior. For example, prior work theorized that the GFW
would enter a ?resynchronization state? after a RST or RST/ACK, and that the GFW
updates its TCB with the next packet in the stream. Such a feature would allow
it to recover to continue censoring a connection, even after an injected insertion
RST [24]. Against a similar censor in the lab, Geneva evolved a strategy that injects
an insertion RST packet after the connection is established, then injects an insertion
packet with an invalid sequence number. Geneva also evolved strategy variants with
additional behavior, such as TCB Turnarounds, various fragmentation attacks, and
different forms of TCB teardown [23, 24, 84]. While training in the lab, Geneva
identified 9 now-patched bugs in scapy [85], a bug in Docker for Mac [83], and a bug
in NetfilterQueue [82].
All the discovered strategies require only 1?2 action trees in the outbound
forest to express; besides the initial strategy of dropping inbound RSTs, none of the
strategies relied on the inbound forest at all (Geneva typically pruned them quickly).
Why does Geneva work? At first glance, it seems counter-intuitive that Geneva
would be effective at searching the space of strategies: after all, there is no continuous
cost function against which it can gradient descent (changing one TCP flag can cause
the entire connection to terminate). Yet, Geneva finds a working strategy in all of
its experiments (which comprise at most 10,000 individuals). By comparison, when
we run a strawman scheme that simply generates random strategies, it found no
35
working strategies until we manually assisted it by handing it working triggers, and
even then it only found one working strategy after 100,000 individuals. Why is
Geneva so much more effective?
Observing Geneva?s strategies throughout the duration of its experiments, we
can broadly classify four major ?development phases? that Geneva naturally goes
through. First, Geneva learns which triggers are relevant; in early generations, indi-
viduals try a highly variable number of triggers, but those who randomly generate
relevant triggers receive higher fitness, and the selection tournament converges on
a set of workable triggers. Second, Geneva learns how not to kill the ongoing TCP
connection; action trees that have at the root tamper{TCP:chksum:corrupt} are
likely to be doomed?such action trees get very low fitness and are thus likely to
be weeded out in the selection tournament. Third, with working TCP connections,
Geneva tends to tweak its action trees through mutation, crossover, and mating to
iterate on various modifications that ultimately trick the censor. Finally, with work-
ing strategies, Geneva?s fitness function punishes strategies with more actions; thus
mutations drive it towards smaller strategies until a local minimum is reached.
We emphasize that we did not encode these various ?stages? into Geneva: these
emerge naturally from its genetic algorithm and fitness function.
These in-lab validation experiments demonstrate that Geneva?s genetic building
blocks are expressive enough to span a wide range of strategies, and that our evo-
lutionary process is effective at finding successful ones. Next, we evaluate against
real world censors.
36
3.3 Evaluation against real censors
We have three high-level questions in evaluating Geneva: (1) Can Geneva find
successful circumvention strategies efficiently when training against a real censor?
(2) What novel strategies can Geneva find against a real censor? and (3) Does
Geneva generalize to multiple censoring regimes?
To answer these questions, we ran Geneva against three nation-state censors:
China?s Great Firewall, India?s ISP-based censorship (Airtel), and Kazakhstan?s
recent HTTPS MITM infrastructure. Table 3.1 lists the success rates, descriptions,
and taxonomy of all strategies and strategy variants Geneva found against these
censors.
3.3.1 Experiment Setup
Vantage points We used VPSes in Mainland China from four vantage points
(Shanghai, Zhengzhou, Shenzen, and Beijing); in India, we used VPSes in Banga-
lore; and in Kazakhstan, VPSes in Almaty and Qaraghandy. Censorship strategies
can vary based on ISP, routing path, or egress points [24, 86], but we observed no
significant difference in the success rate between any two of our vantage points in any
of the countries we tested. Nonetheless, it is possible that running Geneva from more
locations would result in more varied success rates, or different strategies entirely.
Initialization In each evolution experiment we performed, we initialized Geneva
with a set of individuals generated at random, each with three actions and one
37
trigger (all selected and parameterized with random values), and disallowed it from
accessing results from previous runs. We configured each training session with a
starting pool of 200 individuals, and capped it at 50 generations, or until popu-
lation convergence occurred (whichever came first). On average, each generation
generated approximately 500KB in outbound traffic and 2MB in inbound traffic.
Each generation took 5?10 minutes to complete; overall, training sessions took 4?8
hours.
Triage Recall that during training, Geneva evaluates each strategy in the popu-
lation by making real connections to censored resources as a part of the fitness
function. To compute a success rate for a given strategy in a given country, we re-
peatedly evaluated the strategy from each of our vantage points within the country
and averaged the success rates of each.
After Geneva completed its experiments, we then manually analyzed the set
of successful strategies it found. To verify that all of the actions in each strategy
were strictly necessary, we manually removed individual actions and verified that the
strategy was no longer successful as a result. To better understand why the strategies
were successful, we manually altered, removed, added, and swapped actions. We
emphasize that all manual changes were only done as a post hoc analysis, and all
strategies and strategy variants presented herein were independently discovered by
Geneva.
38
3.3.2 China: The Great Firewall
We focus specifically on GFW?s HTTP censorship. The GFW injects RST
packets if a forbidden word is included in the URL of an HTTP GET request.
The GFW also employs ?residual censorship? [24]: after a client makes a censored
request to a given website, the GFW forbids new connections between the client?s
IP address and the website?s IP:port pair for approximately 90 seconds.
To avoid residual censorship, we compiled a pool of destination servers to train
against by querying all sites from the Alexa Top 10,000 that are initially reachable
with an HTTP GET but censored when the request includes a forbidden word. This
allows us to test whether Geneva can be effective at evading keyword censorship of
real, popular websites. It also filters servers that are in the GFW?s IP blacklist
(e.g., Facebook or Google); those blocked by DNS; and those hosted in-country
(in which case the GFW may not necessarily be in-between our machine and the
server). We find 7,917 sites out of the above 10,000 that were outside the GFW and
not immediately censored. This is similar to GreatFire?s census, which found that
147 of the top 1,000 Alexa sites are blocked in China [87]. While evaluating Geneva,
we chose sites at random, limited to only those that were both accessible and not
subject to residual censorship.
As previously shown [16, 23, 24], strategies deployed against the GFW do not
succeed or fail consistently; in fact, if no strategy is used whatsoever, we find that it
still succeeds 2.8% of the time. Throughout this section and in Table 3.1, we include
each strategy?s success rate against the GFW.
39
We allowed Geneva to train against the GFW directly in 27 discrete, isolated
experiments over 16 days. Geneva discovered successful strategies in 23 of the 27
training sessions, across four different species of strategy. Geneva failed to discover
strategies only when we heavily restricted its access to header fields, in an effort to
explore a broader set of strategies (e.g., it failed to identify strategies when disallowed
from accessing the entire TCP header). Below, we detail several successful strategies
from each of the four species Geneva was able to discover against the GFW.
Species 1: TCB Desynchronization This species? strategies inject an insertion
packet with a payload. The GFW treats the packet as legitimate, so the GFW
advances the associated TCB, desynchronizing from the connection. Geneva quickly
discovered this species; every subspecies emerged within the first three generations.
The most common way Geneva exploits this weakness is with a single outbound
action-tree, triggered on PSH/ACK packets (which contain the censored keyword). For
instance, Strategy 2 creates an insertion packet by duplicating the offensive packet,
setting the TCP data offset to 10, and corrupting the checksum.
Strategy 2: TCB Desynchronization 98% (CN)
[TCP:flags:PA]-duplicate(
tamper{TCP:dataofs:replace:10}(
tamper{TCP:chksum:corrupt}(send)),
send)-| \/
Interestingly, this strategy sends the forbidden keyword twice (in both du-
plicates? payloads), seemingly increasing the likelihood of detection. Yet, neither
request elicits a RST from the censor. Why?
40
The first packet invalidates the checksum, but this only causes the destination
web server to ignore it, as the GFW does not verify checksums. The first packet also
increases the dataofs. This field controls the size of the TCP header; increasing
it causes a receiver to interpret the beginning of the payload as additional bytes in
the TCP header. This is sufficient for the GFW to no longer identify the payload as
an HTTP request, and thus it ignores the keyword, treats it as a legitimate part of
the connection, and consequently desynchronizes from the connection. The censor
therefore ignores the second packet altogether (the sequence number appears out of
window), but the destination server accepts it.
Geneva also identifies seven other unique variants that exploit this issue us-
ing different combinations of header fields, operations, and action trees; these are
available in Table 3.1.
Species 2: TCB Teardown This species? strategies inject an insertion packet
with TCP flags to trigger a teardown of the GFW?s associated TCB before sending
the censored request. Once the TCB is torn down, the GFW ignores the connection?s
subsequent packets. Others have identified this species [23, 24], but Geneva has
discovered new variants that reveal that the GFW works differently than suggested
by prior work.
The most successful TCB Teardown strategy, shown in Strategy 3, has one
outbound action-tree, triggered on ACK packets. It duplicates the ACK; it sends the
first one unaltered, and turns the second one into a RST with a corrupted checksum
before sending it. As with Strategy 2, the server ignores the RST, but the GFW does
41
not verify checksums and accepts the packet.
Strategy 3: TCB Teardown Variant 1 95% (CN)
[TCP:flags:A]-duplicate(send,
tamper{TCP:flags:replace:R}(
tamper{TCP:chksum:corrupt}(send)))-| \/
Through mutation, Geneva also found a variant of Strategy 3 that swaps the
two packets: the corrupted RST is sent before the original ACK. This swap lowers the
success rate to 51%. Through additional mutation, Geneva discovered Strategy 4,
which improves this less successful variant by adding a second outbound action tree
that corrupts ACK packets. This improves the success rate to 92%.
Strategy 4: TCB Teardown Variant 2 92% (CN)
[TCP:flags:A]-tamper{TCP:seq:corrupt}-|
[TCP:flags:A]-duplicate(
tamper{TCP:flags:replace:R}(
tamper{TCP:chksum:corrupt}(send)),
send)-| \/
To understand why Strategy 4 works, recall that when multiple action trees
fire on the same trigger, each is given a fresh copy of the original packet. Thus,
the third and final packet sent in this strategy is the original, uncorrupted copy,
and the three-way handshake is able to complete. The server ignores the other two,
corrupted packets, but the GFW does not.
According to prior work [24], Strategies 3 and 4 should not work (at least, not
nearly as well as they do). Prior work hypothesized that the GFW may enter a
?resynchronization? state upon seeing a RST or RST/ACK packet [24]. In this case,
once Strategy 4 sends the RST, the GFW should resynchronize the TCB on the next
42
packet in the datastream (the original ACK) and resume censoring the connection.
If this were the case, then modifying Strategy 4 to move the first action tree (with
the corrupted ACK) to the end of the outbound forest should be equally successful.
However, this modification causes the strategy?s success rate to plummet to 47%.
Why?
These results indicate that the GFW is tracking the state of the TCP three-way
handshake, and sometimes enters a resynchronization state only while the three-
way handshake is unfinished. Concretely, we update the resynchronization state
hypothesis as follows: upon receiving a RST or RST/ACK packet before the three-way
handshake is complete, the GFW may enter the resynchronization state (about 50%
of the time) instead of tearing down the TCB. Further, these strategies suggest that
the GFW tracks the three-way handshake without paying attention to sequence
numbers: the mere presence of an ACK packet is enough to fool the GFW into
thinking that the three-way handshake is complete.
Geneva also lends insight into how the GFW processes RST packets. Consider
Strategy 5:
Strategy 5: TCB Teardown with Invalid Flags 96% (CN)
[TCP:flags:A]-duplicate(send,
tamper{TCP:flags:replace:FRAPUN}(
tamper{IP:ttl:replace:10}(send))-| \/
FRAPUN is a completely invalid combination of TCP flags, and yet the strategy
is still highly effective. We hypothesize that the GFW is looking only for the presence
of a RST flag to teardown the TCB, and not validating that a legitimate combination
43
of flags is present in the packet. Table 3.1 shows variants of this strategy with many
other invalid combinations of TCP flags.
Species 3: Segmentation This species? strategies take advantage of how the
GFW mishandles TCP payloads that are segmented across multiple TCP packets.
The Segmentation species is fundamentally different than the Data Reassembly
species from prior work [16]. Data Reassembly takes advantage of the censor?s
inability to differentiate which fragments or which data from fragments should be
accepted. For instance, some such strategies extend one segment with junk data
and overlap the second segment with the correct data. Prior work theorized that
the GFW would accept the first packet to arrive with a specific IP fragment, but the
second packet to arrive with a particular TCP segment [16]. Other Data Reassembly
strategies leveraged this to inject insertion segments or fragments, tricking the GFW
into accepting the wrong packet. Conversely, strategies from the Segmentation
species exercise no IP fragmentation, no segment overlapping, and no inert packet
injection?and can be performed from within an application, without raw sockets.
Nonetheless, these are the only strategies Geneva has found to date that are highly
successful across all three countries we experimented in.
Geneva has discovered two main Segmentation subspecies that are effective
against the GFW. The first subspecies, shown in Strategy 6, segments the HTTP
request (triggered on the PSH/ACK) at 8 bytes and corrupts packets with only the
ACK flag set:
Corrupting the sequence number of the ACK packet breaks the original three-
44
Strategy 6: Segmentation with ACK 94% (CN)
[TCP:flags:PA]-fragment{tcp:8:True}(send,send)-|
[TCP:flags:A]-tamper{TCP:seq:corrupt}(send)-| \/
way handshake, but the ACK flag set in the PSH/ACK packet finishes the handshake.
Table 3.1 lists additional variants.
One might expect that this strategy simply splits the forbidden word across
multiple packets, and that the GFW must not be properly reassembling the seg-
ments. However, this is not the case. Our TCP payload is ?GET /?search=ultrasurf?:
the first segment is ?GET /?se? and the censored word appears in its entirety in the
second segment. Changing the length of the censored word (e.g., to ?falun-gong?)
does not affect the strategy?s success rate.
Each component of Strategy 6 is required?for instance, it fails without the
corrupted ACK?but it works surprisingly well even as many of the individual values
vary. Decreasing the size of the first segment to anything less than 8 is equally
effective, but increasing it to larger than 8 renders the strategy completely ineffec-
tive. The length of the HTTP parameter does not affect the strategy?s success rate.
As long as the sequence number is altered and the segmentation index is less than
or equal to 8, the GFW seems insensitive to additional changes tried by strategy
variants, such as corrupting both the sequence and acknowledgement numbers.
The second subspecies Geneva discovered is even stranger:
This strategy produces three segments, the first of size 8, the second of size 4,
and the final containing the remainder of the original packet. Again, this does not
segment the keyword: applying Strategy 7 to the original HTTP request results in
45
Strategy 7: Multi-segmentation 98% (CN)
[TCP:flags:PA]-
fragment{tcp:8:True}(send,
fragment{tcp:4:True}(send, send))-| \/
segments (1) ?GET /?se?, (2) ?arch?, and (3) ?=ultrasurf HTTP/1.1\r\nHost...?.
In a post-hoc analysis of this strategy, we explored different values for the
segment offsets m and n (m = 8 and n = 4 in Strategy 7). We found that Strategy 7
works with near identical success rate so long as 0 < m ? 8, m + n ? 12, and the
second segment does not contain ?HTTP/1?. The strategy?s effectiveness is also
unaffected by the segment ordering.
Frankly, we do not yet fully understand why these strategies work. We hypoth-
esize that this species exploits the GFW?s inability to match or identify the packet
as HTTP, but it is still unclear why Strategy 6 works; some interplay between how
the GFW synchronizes its TCB after the three-way handshake also affects its ability
to process segments.
The Segmentation species required significantly more generations to find than
the previous two species. Strategy 6 emerged after 23 generations, and it required
4 more generations to achieve population convergence. Strategy 7 required 12 gen-
erations to identify. This implies that more nuanced strategies may simply require
more generations to find, and there exists an opportunity to identify additional such
strategies with a higher generation limit.
Overall, the Segmentation species is a significant departure from previously
hand-developed strategies. Unlike almost all strategies from previous work [16,
46
23, 24, 84], Segmentation strategies do not require insertion packets, and can be
deployed without raw sockets (let alone root privilege). Prior work has found that
middleboxes can drop certain insertion packets [23,24], and the requirement of root
privilege may be a deployment barrier for some users. Thus, evasion strategies
that can be deployed without insertion packets and without root privilege have an
advantage of being more reliable and easier to deploy. Moreover, we believe it would
be very challenging for a human to develop such a strategy as it exploits multiple
instances of previously unknown dynamics with the GFW.
Species 4: Hybrid The final strategy Geneva discovered against the GFW is so
distinct from other strategies that we classified it into its own species. The Hybrid
species (Strategy 8) triggers on the HTTP request (the PSH/ACK). Before sending
the original request, it sends a corrupted version, with the TCP flags set to FIN and
the IP length set to 78.
Strategy 8: Hybrid Species 53% (CN)
[TCP:flags:PA]-
duplicate(
tamper{TCP:flags:replace:F}(
tamper{IP:len:replace:78}(send)),
send)-| \/
This is not a variant of TCB Teardown: injecting a FIN packet is not sufficient
to trigger a teardown for the GFW [24]. Instead, this strategy actually causes a
desynchronization in the GFW. Why?
Recall that checksums are calculated over the entire packet?s data, but as the
packet propagates, only the bytes within the specified packet length will be sent.
47
Thus, while the client sends a correct checksum, the subsequent hops will recompute
the checksum as being different than what the client sent. In other words, the
network assists in constructing a successful insertion packet.
The IP length change cuts the censored GET request at the Host: header,
after the censored word appears. Like with the Segmentation species, this should
be sufficient for the GFW to identify it as a censored HTTP request?indeed, if we
remove the FIN flag, the strategy immediately fails. We hypothesize that the FIN
packet carrying a payload induces the GFW to enter the resynchronization state, and
causes it to resynchronize immediately on the current packet. This resynchronization
behavior is unusual. We believe the GFW has made a special case for FIN packets
with data (after one such packet in a connection, there are usually no further packets
to resynchronize on). To test this, we instrumented a client to increase the sequence
number of the valid copy of the forbidden request by the length of the injected
packet payload (in this case, 38). The GFW tried to tear down this connection,
confirming our hypothesis.
Although Geneva discovered this strategy with a fixed IP length (78), we find
that any value works so long as only one HTTP header is included in the injected
packet. We do not understand why this is the case. Our results suggest that the
GFW has a separate processing pipeline when in the resynchronization state which
differs from their regular protocol parsing. This allows us to exploit weaknesses in
this specific code path. It is this secondary bug exploitation that makes this strategy
a unique species.
This strategy also presents an interesting dilemma for the GFW as it pertains
48
to the resynchronization state. In examining the TCB Teardown variants that only
succeeded 50% of the time, our results indicated that if the GFW were to enter
the resynchronization state more frequently, they would be better protected from
TCB attacks. However, this strategy demonstrates that it is not so simple: though
increasing the likelihood of resynchronization worsens the performance of some of
the TCB Teardown variants, it would improve the Hybrid variants.
3.3.3 Other Countries
To demonstrate Geneva?s generalizability beyond China, we apply it to censors
in two other countries: India and Kazakhstan.
India Our vantage points in India are within the Airtel ISP, specifically in Ban-
galore, which performs HTTP censorship by injecting a block page response if a
request is made with a forbidden Host: header [86]. In our evaluation, we perform
an HTTP GET request to a censored site (e.g., pornhub.com) from our vantage
points, and consider the strategy to have failed if we receive the Airtel block page
instead of the requested site. Airtel does not employ residual censorship, so we do
avoid connections to blocked sites. Also, unlike the GFW, all of the strategies we
tested either work 0% or 100% of the time against Airtel. Table 3.1 evaluates all
strategies found from all of our vantage points against all three censors.
Geneva identified two broad species in India, both of which we believe are
previously unknown.
First, Geneva discovered that Airtel is incapable of handling any invalid TCP
49
options; by adding invalid TCP options to requests, we can evade censorship com-
pletely. Geneva identified variants of this strategy using almost every available TCP
option. We find that all the end-hosts we test ignore every option we add ex-
cept timestamp, so this strategy does not damage the underlying TCP connection.
Geneva also identifies additional subspecies that generate invalid options by control-
ling the dataofs field.
Second, Geneva found that Airtel is incapable of handling TCP segment re-
assembly; simply segmenting the request is sufficient for the connection to succeed.
Similarly, Strategy 9 sends only a portion of the payload before sending the entire
payload, thereby rendering the censor unable to identify the connection:
Strategy 9: Stutter Request 100% (IN)
[TCP:flags:PA]-duplicate(
tamper{IP:len:replace:64}(send),
send)-|
Collectively, we find these evasion strategies to be much simpler than those
required to evade China?s GFW. Indeed, Geneva did not identify any strategies in
India resembling the TCB Teardown strategy, and many of the strategies that take
advantage of the increased complexity of the GFW do not work against Airtel.
Kazakhstan Starting on July 17, 2019, Kazakhstan began intercepting HTTPS
connections to many social media sites using a fake root certificate [88]. Though
this interception has fortunately since ended [89], we deployed Geneva against the
system while it was active. To perform strategy evaluation, we sent an SNI request
with a targeted hostname (such as facebook.com) to HTTPS servers hosted in
50
Kazakhstan within the affected region. We consider the strategy to have failed if
our client receives the injected certificate; if we receive the correct certificate, we
consider it a success.
Within 4 hours, Geneva discovered three successful species.
Similar to Airtel?s censorship, we find that Kazakhstan?s HTTPS MITM can-
not process TCP segmentation; segmenting the targeted SNI request is sufficient
alone to evade the MITM.
Geneva discovered a second species that was originally manually developed
(and is now extinct) against the GFW: the TCB Turnaround (Strategy 1), which
sends a SYN/ACK before the SYN to make the censor believe the roles of client and
server are reversed.
Geneva also identified strategies that resemble TCB Desynchronization, though
they are simpler than the desynchronization strategies Geneva found against the
GFW. As shown in Strategy 10, simply sending a second SYN packet with a payload
circumvents the MITM with 100% success rate. All of the other desynchronization
attacks learned against the GFW also worked (see Table 3.1).
Strategy 10: Simple TCB Desynchronization 100% (KZ)
[TCP:flags:S]-duplicate(send,
tamper{TCP:load:corrupt}(send,))-|
As with India, strategies to evade Kazakhstan?s MITM attack are less sophis-
ticated and easier for Geneva to find than the GFW. These results show that Geneva
is capable of attacking diverse censorship systems and can apply broadly.
51
3.3.4 Training Defunct Strategies
Extinct Strategies In addition to deriving new strategies, we also tried multiple
strategies in now-extinct species and subspecies suggested by previous works against
the GFW. We find the TCB Creation species to be extinct; Geneva was unable to
find any functional strategies that create a new TCB. In manual testing, we also
found that strategies that relied on this species from former work no longer work, and
even improved versions of this strategy, such as TCB Creation + Resync/Desync [24]
do not work against the GFW. This includes related subspecies, such as the TCB
Turnaround [24].
TCB Teardown using a FIN or FIN/ACK packet [24] seems to be similarly ex-
tinct: the only successful TCB Teardown strategies that Geneva identified required
the RST flag to be set to successfully function. We also find the Data Reassem-
bly (as defined by previous works) species to be largely extinct. This finding also
confirms results from previous work [24], which found that IP fragment ordering
strategies were no longer effective against the GFW. However, given the nuance of
the Segmentation species, we hesitate to definitively rule out any species as fully
extinct.
Seeded Training We next experimented with how Geneva could cope with chang-
ing firewall rules in the real world. For this experiment, we seeded the evolution
using the extinct TCB Creation + Resync/Desync strategy [24] against the GFW.
Seeding the evolution spawns the initial population pool using copies of this strategy
instead of a randomly initialized pool. It takes just 4 generations for the first set of
52
new functional strategies to emerge, and within 15 generations, a sizable population
of TCB Desynchronization strategies emerged. In a second experiment, it takes just
2 generations to derive various less successful subspecies of TCB Teardown, and a
further 6 to hone it to a fully reduced, effective strategy. This demonstrates that
even if a species has achieved full population saturation and the GFW updates to
make them go extinct, Geneva is capable of pivoting to find new successful strategies.
3.4 Discussion
Is Geneva Necessary? Would it be possible to realize Geneva-like functionality with
less complexity? One alternative would be to simply enumerate the entire space of
packet manipulations. Unfortunately, this is infeasible; INTANG [24] presents a
strategy (?TCB Creation + Resync/Desync?) that would require a Geneva action
tree of size nine to represent. However, because Geneva can support modifications
to all IP and TCP fields (including multiple TCP options), there are a huge number
of potential action trees. We conservatively estimate2 that there are 289 functionally
distinct Geneva trees of size nine.
Alternatively, we could ostensibly try to distill down the lessons that Geneva
learns and use them to manually craft rules to guide strategy generation. However,
this is unnecessary (Geneva learns these lessons by itself), and worse yet, it introduces
bias : if we were to encode how we believe the censor?s implementation of TCP works
into how Geneva searches the space of solutions, we would not allow Geneva to find
2In this under-estimate, we assume that tampering with identifier fields (e.g., seq, chksum) can
only take one of two values: correct, or incorrect, and cardinal fields (e.g. dataofs) can take on
only one of three values: too-small, too-large, or just-right.
53
unintuitive strategies or bugs in the censor?s implementation.
It is possible that there is another form of machine learning that is more
accurate or more efficient than Geneva?s use of genetic algorithms. Exploring these
alternatives is beyond the scope of this chapter?my primary goal to support my
thesis is to show that the problem can be automated, and to discover strategies
manual efforts have not.
Censor Countermeasures We envision two broad ways in which censors can react
to Geneva. First and foremost, they can fix their systems. For implementation bugs,
this may be a simple matter?in fact, they may use Geneva themselves to find bugs
prior to deployment. More difficult to repair, however, are errors the censors make
in their underlying assumptions. For example, the TCB Teardown strategies exploit
the GFW?s shortcut of tearing down TCBs to save state; fixing this may introduce
significant computational overhead.
Second, censors could try to detect and thwart Geneva itself, for instance, by
detecting its training packets, and poisoning our datasets by making strategies ap-
pear (not) to work. Geneva tampers with packets in random ways, often resulting in
strange combinations of flags that would be easy to detect, like FRAPUN in Strategy 5.
Geneva could be modified to avoid this, for instance by constraining its mutations
or by punishing ?detectability? in the fitness function.
We see these as logical conclusions to the ongoing censorship arms race: even-
tually, censors will either have to fully patch their system (which seems costly)
or thwart future efforts to probe their systems (which seems infeasible). Geneva?s
54
automation speeds us to these ends. I discuss these countermeasures (and the diffi-
culties in implementing them in practice) in greater depth in Chapter 10.
Limitations of Our Evaluation We did not evaluate our system on as many van-
tage points in China as some prior work [23,24] because, since those studies, China
has made it significantly more difficult for non-Chinese residents to rent machines
in mainland China. Obtaining the vantage points we had required considerable ef-
fort. The difficulty with which to run these experiments also limits the ease with
which the results can be reproduced, a limitation that unfortunately applies to all
work in the space of nation-state censorship evasion. We find this trend concerning,
and caution users to fully understand the risks before undertaking similar studies.
Nonetheless, by applying Geneva in three fundamentally different censoring regimes,
we have shown it generalizes, and expect it would be applicable to other vantage
points in these countries, as well.
Ethical Considerations We designed Geneva to have minimal impact on other
hosts. To the best of our knowledge, the state of one host?s TCP connections does
not affect the connections of other hosts. Geneva was designed not to spoof IP
addresses or ports, and our interactions with the GFW should have had no impact
on any other users. Moreover, we designed Geneva to evaluate strategies serially,
which effectively limits the rate at which it creates TCP connections and sends data,
mitigating any impact it may have had on other hosts on the same network.
Beyond these traditional concerns of evaluating systems on shared infrastruc-
ture, there are also ethical concerns with evaluating in a censoring regime. Similar
55
to some prior work [16, 23, 24], we evaluated Geneva by running it solely on hosts
that we rented and controlled?as opposed to recruiting unwitting users [90]?to
mitigate ethical concerns.
3.5 Conclusion
There has long been a cat-and-mouse game between censors and a community
of researchers and practitioners who seek to evade them. The current evade-detect
cycle requires extensive manual measurement, reverse-engineering, and creativity
to obtain new means of censorship evasion. In this chapter, I presented Geneva, a
genetic algorithm for automatically discovering censorship evasion strategies against
network censors. Through evaluation both in-lab and against the GFW, I have
demonstrated that Geneva can efficiently discover strategies, and that its genetic
building blocks allow it to both re-derive all previously published schemes that it can
support, and derive altogether new strategies that prior work posited would not be
effective. Geneva supports my thesis and shows that middleboxes can automatically
be rendered ineffective from the client-side. Geneva represents an important first
step towards automating censorship evasion, and to this end, I have made the code
publicly available at https://geneva.cs.umd.edu.
In the next chapter, I will extend Geneva to support my thesis across multiple
protocols and in a brand-new deployment context: evading censorship from the
server-side.
56
Success Rate
Species Subspecies Variant Genetic Code CN IN KZ
None None None \/ 3% 0% 0%
[TCP:flags:PA]-duplicate(tamper{TCP:dataofs:replace:10}
Corrupt Chksum 98% 0% 100%
(tamper{TCP:chksum:corrupt},),)-|
[TCP:flags:PA]-duplicate(tamper{TCP:dataofs:replace:10}
Small TTL 98% 0% 100%
(tamper{IP:ttl:replace:10},),)-|
[TCP:flags:PA]-duplicate(tamper{TCP:dataofs:replace:10}
Inc. Dataofs Invalid Flags 26% 0% 100%
(tamper{TCP:flags:replace:FRAPUN},),)-|
[TCP:flags:PA]-duplicate(tamper{TCP:dataofs:replace:10}
Corrupt Ack 94% 0% 100%
(tamper{TCP:ack:corrupt},),)-|
[TCP:flags:PA]-duplicate(tamper{TCP:options-wscale:corrupt}
TCB Desync Corrupt WScale 98% 0% 100%
(tamper{TCP:dataofs:replace:8},),)-|
[TCP:flags:PA]-duplicate(tamper{TCP:load:corrupt}
Corrupt Chksum 80% 0% 100%
(tamper{TCP:chksum:corrupt},),)-|
[TCP:flags:PA]-duplicate(tamper{TCP:load:corrupt}
Inv. Payload Small TTL
(tamper{IP:ttl:replace:8} | 98% 0% 100%,),)-
[TCP:flags:PA]-duplicate(tamper{TCP:load:corrupt}
Corrupt Ack { } | 87% 0% 100%(tamper TCP:ack:corrupt ,),)-
Simple Payload SYN [TCP:flags:S]-duplicate(,tamper{TCP:load:corrupt})-| 3% 0% 100%
Stutter Request Stutter Request [TCP:flags:PA]-duplicate(tamper{IP:len:replace:64},)-| 3% 100% 0%
[TCP:flags:A]-duplicate(,tamper{TCP:flags:replace:R}
95% 0% 0%
(tamper{TCP:chksum:corrupt},))-|
Corrupt Chksum
[TCP:flags:A]-duplicate(tamper{TCP:flags:replace:R}
{ 51% 0% 0%(tamper TCP:chksum:corrupt},),)-|
[TCP:flags:A]-duplicate(,tamper{TCP:flags:replace:R}
87% 0% 0%
(tamper{IP:ttl:replace:10},))-|
With RST Small TTL
[TCP:flags:A]-duplicate(tamper{TCP:flags:replace:R}
{ } | 52% 0% 0%(tamper IP:ttl:replace:9 ,),)-
[TCP:flags:A]-duplicate(,tamper{TCP:options-md5header:corrupt}
86% 0% 0%
(tamper{TCP:flags:replace:R},))-|
Inv. md5Header
[TCP:flags:A]-duplicate(tamper{TCP:options-md5header:corrupt}
44% 0% 0%
(tamper{TCP:flags:replace:RA},),)-|
[TCP:flags:A]-duplicate(,tamper{TCP:flags:replace:RA}
{ 80% 0% 0%(tamper TCP:chksum:corrupt},))-|
Corrupt Chksum
[TCP:flags:A]-duplicate(tamper{TCP:flags:replace:RA}
66% 0% 0%
(tamper{TCP:chksum:corrupt},),)-|
[TCP:flags:A]-duplicate(,tamper{TCP:flags:replace:RA}
94% 0% 0%
(tamper{IP:ttl:replace:10},))-|
Small TTL
[TCP:flags:A]-duplicate(tamper{TCP:flags:replace:RA}
{ } | 57% 0% 0%(tamper IP:ttl:replace:10 ,),)-
Teardown With RST/ACK
[TCP:flags:A]-duplicate(,tamper{TCP:options-md5header:corrupt}
94% 0% 0%
(tamper{TCP:flags:replace:R},))-|
Inv. md5Header
[TCP:flags:A]-duplicate(tamper{TCP:options-md5header:corrupt}
(tamper{TCP:flags:replace:R} | 48% 0% 0%,),)-
[TCP:flags:A]-duplicate(tamper{TCP:flags:replace:RA}
43% 0% 0%
(tamper{TCP:ack:corrupt},),)-|
Corrupt Ack
[TCP:flags:A]-duplicate(,tamper{TCP:flags:replace:RA}
31% 0% 0%
(tamper{TCP:ack:corrupt},))-|
[TCP:flags:A]-duplicate(,tamper{TCP:flags:replace:FRAPUEN}
{ 89% 0% 0%(tamper TCP:chksum:corrupt},))-|
Corrupt Chksum
[TCP:flags:A]-duplicate(tamper{TCP:flags:replace:FRAPUEN}
{ } | 48% 0% 0%(tamper TCP:chksum:corrupt ,),)-
[TCP:flags:A]-duplicate(,tamper{TCP:flags:replace:FREACN}
96% 0% 0%
(tamper{IP:ttl:replace:10},))-|
Invalid Flags Small TTL
[TCP:flags:A]-duplicate(tamper{TCP:flags:replace:FRAPUEN}
56% 0% 0%
(tamper{IP:ttl:replace:10},),)-|
[TCP:flags:A]-duplicate(,tamper{TCP:flags:replace:FRAPUN}
94% 0% 0%
(tamper{TCP:options-md5header:corrupt},))-|
Inv. md5Header
[TCP:flags:A]-duplicate(tamper{TCP:flags:replace:FRAPUEN}
55% 0% 0%
(tamper{TCP:options-md5header:corrupt},),)-|
[TCP:flags:PA]-fragment{tcp:8:False}-|
With ACK Offsets { 94% 100% 100%[TCP:flags:A]-tamper TCP:seq:corrupt}-|
Segmentation
Reassembly Offsets [TCP:flags:PA]-fragment{tcp:8:True}(,fragment{tcp:4:True})-| 98% 100% 100%
Simple In-Order [TCP:flags:PA]-fragment{tcp:-1:True}-| 3% 100% 100%
[TCP:flags:PA]-duplicate(tamper{TCP:flags:replace:F}
Hybrid With FIN Cut Header 53% 100% 0%
(tamper{IP:len:replace:78},),)-|
TCB Turnaround TCB Turnaround TCB Turnaround [TCP:flags:S]-duplicate(tamper{TCP:flags:replace:SA},)-| 3% 0% 100%
Invalid Options Invalid Options Corrupt UTO [TCP:flags:PA]-tamper{TCP:options-uto:corrupt}-| 3% 100% 0%
Table 3.1: Species, subspecies, and variants Geneva found (with success rates)
against the GFW. For readability, we omit all ?send?s from the genetic code (e.g.,
duplicate(,) is equivalent to duplicate(send,send)). This is correct, syntactic
sugar for Geneva.
57
Censor behavior Learned strategy to defeat
1. Synchronizes TCB on the first SYN only; sends RSTs Drop inbound RST packets.
only to the client if a censored word appears any-
where in any packet and a matching TCB exists.
2. Synchronizes TCB on the first SYN only; sends RSTs Inject a SYN packet with a different se-
to the client and server if a censored word appears quence number.
anywhere in any packet and a matching TCB exists.
3. Synchronizes TCB on the first SYN only, drops all fu- Inject a SYN packet with a different se-
ture client/server communication if a censored word quence number.
appears anywhere in any packet and a matching
TCB exists.
4. Synchronizes TCB on SYN and ACK packets; sends Inject an insertion ACK packet with a
RSTs to the client and server if a censored word ap- different sequence number after the 3-
pears anywhere in any packet and a matching TCB way handshake.
exists.
5. Synchronizes TCB on SYN, and resynchronizes pe- Inject an insertion ACK packet with a
riodically every few packets packets; sends RSTs to different sequence number after the 3-
the client and server if a censored word appears any- way handshake.
where in any packet and a matching TCB exists.
6. Synchronizes TCB using only IP addresses on SYN Inject an insertion RST packet after the
and SYN/ACK; sends RSTs to the client and server 3-way handshake, or induce the server
if a censored word appears anywhere in an HTTP to send a RST on another port.
header or packet payload unless TCB is torn down.
7. Synchronizes TCB using only IP/port tuples on SYN Inject an insertion RST packet after the
and SYN/ACK; sends RSTs only to the client if a cen- 3-way handshake.
sored word appears anywhere in any packet unless
TCB is torn down.
8. Synchronizes TCB on SYN, SYN/ACK, and ACK; sends Inject an insertion RST packet after the
RSTs only to the client if a censored word appears 3-way handshake.
anywhere in any packet unless TCB is torn down.
9. Synchronizes TCB on SYN and ACK; sends RSTs only Inject an insertion RST or FIN after the
to the client if a censored word appears anywhere 3-way handshake, and then send a fol-
in any packet, and enters a resynchronization state lowup insertion packet with a different
on any RST or FIN packet. sequence number.
10. Synchronizes TCB on SYN, only processes packets Inject an insertion RST packet after
with correct checksums; sends RSTs only to the client the 3-way handshake using a non-
if a censored word appears anywhere in any packet, checksum insertion mechanism (e.g.,
and enters a resynchronization state on any RST or low TTL), immediately followed by
FIN packet. another insertion packet with an incor-
rect sequence number.
11. Synchronizes TCB on SYN, only processes packets Inject an insertion RST packet after the
with correct checksums, lengths, and data offsets; 3-way handshake using a low TTL, im-
sends RSTs only to the client if a censored word ap- mediately followed by another inser-
pears anywhere in any packet, and enters a resyn- tion packet with an incorrect sequence
chronization state on any valid RST or FIN packet. number.
Table 3.2: Mock censors developed for in-lab training, and strategies Geneva learned
to defeat them.
58
Found?
Species Strategy [16] [23] [24] Geneva
w/ low TTL X X X
TCB Creation w/ corrupt checksum X X
(Improved) and Resync/Desync X X
w/ RST and low TTL X X X X
w/ RST and corrupt checksum X X X
w/ RST and invalid timestamp X X
w/ RST and invalid MD5 Header X X
w/ RST/ACK and corrupt checksum X X
w/ RST/ACK and low TTL X X X X
TCB Teardown
w/ RST/ACK and invalid timestamp X X
w/ RST/ACK and invalid MD5 Header X X
w/ FIN and low TTL X X X
w/ FIN and corrupt checksum X X
(Improved) X X
and TCB Reversal X X
TCP Segmentation w/ out of order data X X X
Overlapping fragments X X X
Overlapping segments X X X
In-order data w/ low TTL X X
In-order data w/ corrupt ACK X X X
In-order data w/ corrupt checksum X X
Reassembly
In-order data w/ no TCP flags X X
Out-of-order data w/ IP fragments X X
Out-of-order data w/ TCP segments X X
(Improved) In-order data overlapping X X
Payload splitting X X
Payload reordering X X
Inert Packet Insertion w/ corrupt checksum X X
Traffic Misclassification
Inert Packet Insertion w/o ACK flag X X
Send > 1KB of traffic X
State Exhaustion
Classification Flushing ? Delay X X
> 1 space between method and URI X
Keyword at location > 2048 X
HTTP Incompleteness Keyword in 2nd or higher of multiple
X
requests in one segment
URL encoding (except %-encoding) X
Table 3.3: Prior work?s effective TCP-based strategies and whether Geneva re-
derived the strategy in the lab or in the wild, regardless of whether the strategy
is still effective. Note that Geneva had no knowledge of HTTP fields and could not
introduce delays into the request.
59
Chapter 4: Server-side Evasion
In the previous chapter, I demonstrated that it is possible to automatically
discover censorship evasion strategies that run purely at the client, but this left
open a critical question: Do all censorship evasion strategies have to run at the
client, or could servers evade censorship on clients? behalves? Indeed, I am aware
of no prior censorship evasion that runs purely server-side. In this chapter, I show
that server-side evasion is indeed possible, and that it can be used to evade multiple
protocols (HTTP, HTTPS, DNS, and more). My results from training against many
protocols also exposes new insights into the designs and deployments of censorship
infrastructures.
For a client inside a censoring regime to access censored content, it seems
quite natural that the client would have to deploy something. Indeed, to the best
of our knowledge, all prior work in censorship evasion has required some degree of
deployment at the clients within the censoring regime. Proxies [65, 91], decoy rout-
ing [21,47], VPNs, anonymous communication protocols [20], domain fronting [58],
protocol obfuscation [26, 48, 49], and recent advances that confuse censors by ma-
nipulating packets [16,23,24,40]?all of these prior solutions require various degrees
of active participation on behalf of clients.
60
Unfortunately, active participation on the part of clients can limit the reach of
censorship evasion techniques. In some scenarios, installing anti-censorship software
can put users at risk [92]. For users who are willing to take on this risk, it can be
difficult to bootstrap censorship evasion, as the anti-censorship tools themselves may
be censored [93, 94]. Worse yet, there are many users who do not seek out tools to
evade censorship because they do not even know they are being censored [95].
Ideally, servers located outside of a censoring regime would be able to help
clients evade censorship without the client having to install any extra software what-
soever. If possible, this could result in a more open Internet for users who are
otherwise unable (or unfamiliar with how) to access censored content.
To our knowledge, there has been no prior work that has explored evasion
techniques that involve no client-side participation whatsoever. This is not for lack
of want; rather, at first glance, it would appear that server-side-only techniques
could not possibly provide a sufficient solution. To see why, let us consider all of the
packets that are transmitted that lead up to an HTTP connection being censored
due to the client issuing a GET request for a censored keyword. First, the client
would initiate a TCP three-way handshake, during which the client sends a SYN,
the server responds with a SYN+ACK, and the client responds with an ACK. Then,
the client would send a PSH+ACK packet containing the HTTP request with the
censored keyword, at which point the censor would tear down the connection (e.g.,
by injecting RST packets to both the client and the server). Note that the only packet
a server sends before a typical censorship event is just a SYN+ACK?this would seem
to leave very little room for a censorship evasion strategy.
61
In this chapter, I present the first purely server-side censorship evasion strategies?
11 in total, spanning four countries (China, India, Iran, and Kazakhstan). Like a
recent string of papers [16, 23, 24, 40], these strategies do not involve a custom pro-
tocol, but rather operate by manipulating packets of existing applications, e.g., by
inserting, duplicating, tampering, or dropping packets. We verify that each of these
strategies (sometimes with small tweaks) work with completely unmodified clients
running any major operating system.
To find these strategies, we make use Geneva. While this required several
modest extensions to the tool, I do not claim them as a primary contribution of
this chapter. Rather, the primary contributions are the discovery that server-side
strategies are possible at all, and the various insights we have gained from follow-up
experiments that explain why the strategies Geneva found work. Though the specific
circumvention strategies may be patchable, the underlying insights they allowed us
to glean are, we believe, more fundamental. These findings include:
? Server-side-only circumvention strategies are possible! We succeeded in finding
them in every country we tested (China, India, Iran, and Kazakhstan) and for all
of the protocols we were able to trigger censorship with (DNS-over-TCP, FTP,
HTTP, HTTPS, and SMTP).
? The so-called Great Firewall (GFW) of China has a more nuanced ?resynchro-
nization state? than previously reported [24,40].
? China uses different network stacks for each of the protocols that it censors; cir-
cumvention strategies that work for one application-layer protocol (e.g., HTTPS)
62
do not necessarily work for another (e.g., HTTP or SMTP).
The rest of this chapter is organized as follows. ?4.1 empirically shows that, un-
fortunately, client-side techniques do not generalize to server-side. ?4.2 presents our
experiment methodology. We present 11 new server-side evasion strategies in ?4.3,
and through further examination, shed new light on the inner workings of censor-
ship in China, India, Iran, and Kazakhstan. ?4.4 explores our theory that censors
employ different network stacks for each censored application. ?4.5 shows that our
server-side strategies work for a wide diversity of client OSes. We discuss deployment
considerations in ?4.6 and ethical considerations in ?4.7. Finally, ?4.8 concludes this
chapter.
4.1 Client-Side Strategies do not Generalize
First, we answer a natural question: do previously discovered client-side results
generalize to server-side?
Prior work has identified a wealth of client-side strategies for circumventing
censorship. Some of these strategies are tailored specifically to the client; for in-
stance, ?Segmentation? strategies split up a client?s HTTP GET request across
multiple TCP packets, exploiting an apparent bug in some censors? packet reassem-
bly code [40]. However, other client-side strategies appear as if they would work
from the server, as well. For example, a seminal circumvention strategy has the
client send a TCP RST with a TTL large enough to reach the censor but too small
to reach the server [16,23,24,40,57]. As a result of this strategy, the censor believes
63
the connection has been torn down and thus pays no attention to future packets from
that connection, allowing the client to send requests that would have otherwise been
censored. Should such strategies not also work from the server?
We experimentally evaluated whether client-side strategies can be translated
to work from the server-side, as well. Starting with all 36 of the currently working
client-side strategies described in the previous chapter, we manually identified 11
strategies that had no obvious server-side analog (such as Segmentation) and dis-
carded them. All the remaining 25 strategies involved sending an ?insertion packet?
(a packet that is processed by the censor but not by the server, like the TTL-limited
RST) during or immediately after the 3-way handshake.
The only packet a server typically sends before the censored query is a SYN+ACK.
For each strategy, we generate two new server-side analogs: one that sends the
insertion packet before the SYN+ACK, and one that sends it after. We then tested
these strategies with clients at vantage points within China connecting to a server
we control at a vantage point in the US.
Unfortunately, none of these strategies worked when run server-side. This
is surprising: many of the ?TCB Teardown? strategies described in the previous
chapter involve the client sending tear-down packets (insertion packets with RST
or RST+ACK flags) immediately after receiving the server?s SYN+ACK; these server-
side analogs also send tear-down packets immediately after the SYN+ACK, the only
difference being that they come from the server. We considered the possibility that
network delays were causing the server?s tear-down packets to arrive at the censor
64
after the client?s censored query1. To account for this, we instrumented our client
to delay sending its query until it received the insertion packets, but this was also
unsuccessful at evading censorship.
In other words, for some of these strategies, the only difference was whether
it was the client or the server that sent the insertion packets, and yet none of them
work. We considered that the censor may be treating inbound packets differently
than outbound?for instance, it may have been the case that the censor simply
ignores inbound RST packets. To test for this, we also ran the server from inside
China and the client in the US, but the strategies continued to fail. This indicates
that the GFW tries to determine which host is the client (the one who initiated the
connection), and processes the client?s packets differently than the server?s.
Collectively, these results show that client-side strategies do not generalize
to server-side. Moreover, the results show that clients? and servers? packets are
processed differently, and therefore the censors? shortcomings that previous work
exploited client-side do not necessarily lend insight into how to circumvent from
server-side. In short: server-side censorship circumvention requires a blank-slate
approach.
4.2 Server-side Methodology
In this section, I describe my methodology in deploying Geneva, data collection,
and experimentation.
1This is not an issue when clients send both the tear-down and the query, because we can
generally expect packets to arrive FIFO.
65
4.2.1 Geneva Extensions
New Protocols Geneva?s initial design was initially applied only to HTTP. In
this chapter, I show that Geneva can be applied to be able to train over a variety
of applications across a variety of protocols. Specifically, I added support for DNS-
over-TCP, FTP, HTTPS, and SMTP.
Non-additions I also explored applying server-side evasion to Tor Bridges and
Telegram MTProxy servers [96, 97]. Although Tor and Telegram are both blocked
at the IP and DNS level, as of time of writing, I was unable to trigger active probing
to private unpublished Tor bridges or MTProxies. The Tor team is aware that Tor
did not trigger active probing as of time of writing, and these findings are consistent
with recent reports [24, 40]. We focus our efforts on the protocols that are getting
censored now, and we leave a deeper exploration of server-side training over other
anti-censorship protocols to later work.
Server-side Evasion Geneva is largely agnostic to packet semantics; it is able to
recompute checksums, but it is not configured to understand the meanings behind
any particular packet header fields. As a result, converting Geneva from client-side
to server-side was relatively straightforward, requiring only minor changes to its
implementation.
We configured Geneva to initialize each population pool with 300 individuals,
and allowed evolution to take place for 50 generations, or until population conver-
gence occurs. Although Geneva is capable of evolving not only how it manipulates
packets but also which packets it triggers on, we observed that for DNS-over-TCP,
66
Country Vantage Points Protocols
China Beijing, Shanghai DNS, FTP, HTTP,
Shenzen, Zhengzhou HTTPS, SMTP
India Bangalore HTTP
Iran Tehran, Zanjan HTTP, HTTPS
Kazakhstan Qaraghandy, Almaty HTTP
Table 4.1: Client locations and protocols used in our experiments.
HTTP, HTTPS, and SMTP, the only packet the server could trigger on before a
censorship event was the SYN+ACK packet. Thus, as a slight optimization, for these
protocols, we restricted Geneva to only be able to trigger on SYN+ACKs.
4.2.2 Data Collection Methodology
Over the span of five months, we ran Geneva server-side in six countries?
Australia, Germany, Ireland, Japan, South Korea, and the US?on five proto-
cols: DNS (over TCP), FTP, HTTP, HTTPS, and SMTP (all over IPv4). We
used unmodified clients within four nation-state censors?China, India, Iran, and
Kazakhstan?to connect to our servers. For each nation-state censor, we trained on
each protocol for which we were able to trigger censorship; all four countries cen-
sored HTTP, but only China censored all six protocols.2 Table 4.1 shows the client
locations and protocols we used throughout our experiments. Within each censored
regime, we find no significant difference in strategy effectiveness across the different
vantage points or external servers.
Each country and protocol required a slightly different configuration to trigger
censorship:
2Contrary to the findings by Aryan et al. [55], we find that Iran no longer censors DNS-over-TCP
at all.
67
? DNS-over-TCP (China): We make a censored request with an unmodified DNS
client to open resolvers (Google and Cloudflare), as well as resolvers we control
outside China.
? FTP (China): We sign into FTP servers we control and issue requests for files
with sensitive keywords as names (e.g., ultrasurf).
? HTTP (all countries): In China, we issue GET requests with a censored keyword
in the URL parameters (for instance, ?q=ultrasurf). In India, Iran, and Kaza-
khstan, we issue GET requests with a blacklisted website in the Host: header.
? HTTPS (China and Iran): We perform a TLS handshake with a forbidden URL
(e.g., youtube.com in Iran and www.wikipedia.org in China) in the Server Name
Indication (SNI) field.
? SMTP (China): We connect to SMTP servers we control and, from our unmod-
ified clients, send an email to a forbidden email address, xiazai@upup.info [98].
In all of the above settings, we configure Geneva to consider censorship to have been
avoided if the connection is not forcibly torn down and if the client receives the
correct, unaltered data.
Residual Censorship In China, we observe that different protocols are handled
differently by the GFW. For example, over HTTP, the GFW has residual censor-
ship: for approximately 90 seconds after a forbidden request is censored, all TCP
requests to the server IP and port elicit tear-down packets from the GFW immedi-
ately following the three-way handshake. Prior work has documented the existence
68
of residual censorship in some cases for HTTPS; however, we do not observe this
behavior from any of our vantage points during our experiments and confirm that as
of time of writing, HTTPS residual censorship is not active in China. Further, we do
not observe this behavior from any of our vantage points in China for SMTP, DNS-
over-TCP, or FTP; after the forbidden request on these protocols is censored, the
user is free to make a second follow-up request immediately. I will report on more
specific dynamics of residual censorship later in this dissertation; for this chapter,
residual censorship is primarily relevant towards informing the methodology.
Evasion Success Rates It has been shown that, somewhat surprisingly, some
packet-manipulation strategies succeed only some of the time; for instance, in the
previous chapter, we found some client-side strategies that work roughly 50% of the
time. Throughout this chapter, we present the success rates of the various strategies
Geneva has found. For DNS in particular, this requires some special consideration,
because, according to RFC 7766 [99] on DNS-over-TCP: DNS clients SHOULD
retry unanswered queries if the connection closes before receiving all outstanding
responses. No specific retry algorithm is specified in this document. Censorship by
the GFW qualifies as a premature connection close, and thus results in retries, but
the RFC leaves the exact number of retries up to the implementer. This serves to
greatly improve the success rates of any server-side strategies for DNS-over-TCP:
even if the strategy works only 50% of the time, with just 2 retries (3 total queries),
the success rates will improve to 87.5%.
We have found that, in practice, applications choose different numbers of DNS
69
retries. Some dig versions make only 1 retry, others retry repeatedly (sometimes
3?5 times), and others allow the user to specify how many. Python?s DNS library
tries 3 times over TCP when faced with the GFW?s TCP RSTs. Google Chrome on
Windows retries 4 times after a censorship event (for a total of 5 requests per page
load). Chrome also periodically retries failed page loads (often over 20 times, we
have observed). To be consistent with most DNS clients, we test all of our strategies
with a maximum of 3 tries.
Follow-up Experiments At the end of each run, Geneva outputs the packet-
manipulation strategies that succeeded (and failed). We then perform follow-up
experiments to understand why the strategies work (or fail) and to glean information
about how these various censors operate. We describe the specific steps we take in-
line with our results.
4.3 Server-Side Results
Here, we detail newly discovered strategies that defeat censors from the server-
side. Table 4.2 summarizes our results across all countries (China, India, Iran, and
Kazakhstan) and applications (DNS-over-TCP, FTP, HTTP, HTTPS, and SMTP).
4.3.1 Server-side Evasion in China
We applied Geneva from the server side against the GFW across DNS, FTP,
SMTP, HTTP, and HTTPS. Geneva identified 8 distinct server-side only strategies
that are successful at least 50% of the time for at least one protocol in China: 4 for
70
Strategy Success Rates
# Description DNS FTP HTTP HTTPS SMTP
China
? No evasion 2% 3% 3% 3% 26%
11 Sim. Open, Injected RST 89% 52% 54% 14% 70%
12 Sim. Open, Injected Load 83% 36% 54% 55% 59%
13 Corrupt ACK, Sim. Open 26% 65% 4% 4% 23%
14 Corrupt ACK Alone 7% 33% 5% 5% 22%
15 Corrupt ACK, Injected Load 15% 97% 4% 3% 25%
16 Injected Load, Induced RST 82% 55% 52% 54% 55%
17 Injected RST, Induced RST 83% 85% 54% 4% 66%
18 TCP Window Reduction 3% 47% 2% 3% 100%
India
? No evasion 100% 100% 2% 100% 100%
18 TCP Window Reduction ? ? 100% ? ?
Iran
? No evasion 100% 100% 0% 0% 100%
18 TCP Window Reduction ? ? 100% 100% ?
Kazakhstan
? No evasion 100% 100% 0% 100% 100%
18 TCP Window Reduction ? ? 100% ? ?
19 Triple Load ? ? 100% ? ?
20 Double GET ? ? 100% ? ?
21 Null Flags ? ? 100% ? ?
Table 4.2: Summary of server-side-only strategies and their success rates. All of
these strategies manipulate only TCP, and yet, against China?s GFW, their success
rates are application-dependent. Kazakhstan?s HTTPS and Iran?s DNS-over-TCP
censorship infrastructure are currently inactive.
DNS, 5 for FTP, 1 for SMTP, 4 for HTTP, and 2 for HTTPS. We provide packet wa-
terfall diagrams in Figure 4.1 which show the resulting server- and client-behaviors
when the strategies are run. Although the strategies require no client-side modi-
fications whatsoever, they induce client-side behavior that assists in circumventing
censorship. In the rest of this subsection, we explore each of these strategies, explain
why they work, and describe what they teach us about China?s GFW.
Strategy 11: Simultaneous Open, Injected RST (China)
DNS (89%), FTP (52%), HTTP (54%), HTTPS (14%), SMTP (70%)
[TCP:flags:SA]-
duplicate(
tamper{TCP:flags:replace:R},
tamper{TCP:flags:replace:S})-| \/
Simultaneous Open Strategy 11 triggers on outbound SYN+ACK packets. Instead
71
Client Server Client Server Client Server Client Server Client Server Client Server Client Server Client Server Client Server
SYN SYN SYN SYN SYN SYN
SYN
SYN SYN
RST SYN SYN/ACK SYN/ACK
FIN
RST SYN/ACK
SYN/ACK (bad ackno) (bad ackno) SYN/ACK (w/ load)
(bad ackno) (small window)
SYN SYN
(corrupted) SYN/ACK SYN/ACK SYN/ACKACK SYN RST (bad ackno) (bad ackno) ACK
SYN/ACK
PSH/ACK RST RST RST RST
(query) SYN/ACK PSH/ACKACK SYN/ACK (query segment)
SYN/ACK ACK (w/ load)
ACK ACK
PSH/ACK SYN/ACK SYN/ACK PSH/ACK
(query) ACK PSH/ACK (query segment)ACK
PSH/ACK ACK ACK
ACK (query)
(response) ACK PSH/ACK PSH/ACK PSH/ACK PSH/ACK PSH/ACK
PSH/ACK (query) ACK (query) (query) (query)
(query segment)
PSH/ACK (query)
(response) ACK ACK ACKACK PSH/ACK ACK
ACK (response)
PSH/ACK PSH/ACK PSH/ACK
PSH/ACK (response) (response) (response) PSH/ACK
PSH/ACK (response) (response)
(response)
Strategy 1 Strategy 2 Strategy 3 Strategy 4 Strategy 5 Strategy 6 Strategy 7 Strategy 8
Normal behavior Simultaneous open, Simultaneous open, Corrupted ACK, Corrupted ACK, Corrupted ACK, Corrupted load, Injected RST, TCP window
injected RST injected load simultaneous open alone injected load induced RST induced RST reduction
Figure 4.1: Server-side evasion strategies in China. All of the strategies work with-
out modifications to the client, and yet they induce client-side behavior that helps
circumvent censorship. (Standard packets at the beginning and the end are grayed
out to emphasize the critical differences from normal behavior.)
of sending the SYN+ACK, it replaces it with two packets?a RST and a SYN?and sends
them instead. How does an unmodified client respond to this strange sequence of
packets?
First, the RST packet is actually ignored by the client, because it does not
have the ACK flag set and the TCP connection is not yet in a synchronized state.
Despite RFC 793 [100] suggesting that the connection be torn down, we find that
in practice, TCP implementations across all modern operating systems ignore this
RST. Second, the injected SYN packet serves to initiate TCP simultaneous open.
RFC 793 [100] requires TCP implementations to support simultaneous open.
Originally, simultaneous open was meant to occur when two hosts attempt to open
a connection by sending SYN packets to each other at the same time. However, a
server can simulate simultaneous open by responding to a SYN packet from the client
with a SYN packet of its own. To the client, this resembles simultaneous open, since
the client receives a SYN packet, and therefore must respond with a SYN+ACK packet.
This strategy employs simultaneous open by first sending an inert RST packet, then
72
by setting up the connection with a SYN packet.
When used for HTTP, Strategy 11 has a success rate of 54%. We see similar
success rates for FTP and for each single DNS-over-TCP query (recall that DNS
will try up to 3 times).
It is tempting to assume that this strategy works because the injected RST tears
down the connection, and the SYN packet looks like an entirely new connection in the
reverse direction (thereby making the censored request sent by the client ignored).
However, this is not the case?as demonstrated above, injected RST packets either
inside or outside the 3-way handshake from the server are unable to tear down
a connection. Another potential theory is that the GFW simply cannot properly
handle TCP simultaneous open; this too, however, is incorrect: if the RST is removed
from the strategy, the strategy fails. Instead, we hypothesize that this strategy is far
more nuanced, and is actually performing a desynchronization attack by exploiting
a bug in the GFW?s resynchronization state.
Prior work has hypothesized that the presence of a RST packet during the
three-way handshake can put the GFW in a resynchronization state with about
50% probability [24, 40]. Therefore, we expect the injected RST packet not to tear
down the connection, but instead to put the GFW into the resynchronization state.
Wang et al. hypothesized that the only packets sent by the server that the GFW
resynchronizes on are SYN+ACK packets, so the next packet for the GFW to resyn-
chronize on is the SYN+ACK packet sent by the client. At this point, the GFW should
just properly resynchronize onto our connection?but it does not. Why?
When TCP simultaneous open is performed, the sequence number does not
73
advance during the handshake in the same fashion as it does in a regular TCP three-
way handshake. During TCP simultaneous open, the SYN+ACK packet sent by the
client retains the same sequence number as the original SYN packet, and 1 is not
added to the sequence number until the ACK packet is sent. Therefore, if the GFW?s
resynchronization state is not aware that simultaneous open is being performed, it
will synchronize onto this SYN+ACK packet and assume that the sequence number has
already been incremented by 1, as it would be if this were an ACK packet finishing
the regular 3-way handshake. As such, the GFW will fail to advance its sequence
number by 1 when the request is sent by the client, making the GFW desynchronized
by 1 byte from the real connection.
To test this theory, we instrumented a client-side request to decrement the
sequence number of the forbidden request by 1 while the strategy is run on the
server side. If the theory holds, we expect to experience censorship approximately
50% of the time (as this is how frequently China?s censors enter the resynchronization
state [24]). Indeed, when we perform this experiment, that is exactly the result we
see. Note that if we perform this sequence number adjustment experiment without
running the server-side strategy, we never experience censorship as expected, because
the real query is now desynchronized from the connection.
This experiment suggests that Strategy 11 actually performs a desynchroniza-
tion attack against the GFW, and that a bug exists in the GFW?s resynchronization
state handling of simultaneous open. As we will see, this bug is quite powerful, and
Geneva identifies it repeatedly in our experiments.
Strangely, Strategy 11 does not work well against HTTPS. We hypothesize
74
this is because the RST does not cause the GFW to enter the resynchronization
state for HTTPS, but does for the other protocols. The rest of this section explores
a number of cases in which TCP/IP-level attacks work well for one application-level
protocol but not another; ?4.4 offers an explanation why this occurs.
Strategy 12: Simultaneous Open, Injected Load (China)
DNS (83%), FTP (36%), HTTP (54%), HTTPS (55%), SMTP (59%)
[TCP:flags:SA]-
tamper{TCP:flags:replace:S}(
duplicate(,
tamper{TCP:load:corrupt}),)-| \/
Strategy 12 also relies on simultaneous open, but with a slightly different
mechanism. Rather than injecting a RST, it changes the outgoing SYN+ACK packet
into two SYN packets: the first SYN is well-formed and the second has a random
payload. It has comparable success to Strategy 11, though slightly worse for FTP
(36% vs. 52%) and SMTP (59% vs. 70%), and better for HTTPS (55% vs. 14%).
Like with the first strategy, when the first SYN packet reaches the client, it
triggers simultaneous open, prompting the client to respond with a SYN+ACK. Since
both SYN packets are sent simultaneously, both likely cross the GFW before the
client responds. The second SYN packet with a payload will induce the GFW to
enter the resynchronization state, and like last time, the next packet available for
it to resynchronize on is the SYN+ACK packet from the client, again desynchronizing
the GFW by 1 from the connection. We confirmed this by repeating the prior
experiment on this strategy.
Strategy 12 does not damage the TCP connection despite the client being
75
unmodified. Although it is uncommon for SYN packets to carry a payload, this
is permitted by the RFC (this behavior is required by TCP Fast Open), and the
payload is ignored by the client (though the client does respond with an ACK to
acknowledge the current sequence number).
Strategy 13: Corrupted ACK, Simultaneous Open (China)
DNS (26%), FTP (65%), HTTP (4%), HTTPS (4%), SMTP (23%)
[TCP:flags:SA]-
duplicate(
tamper{TCP:ack:corrupt},
tamper{TCP:flags:replace:S})-| \/
Geneva identified one final strategy relying on simultaneous open. Strategy 13
copies the SYN+ACK packet: it corrupts the ack number of the first, and converts
the second to a SYN. The SYN+ACK with the corrupted ack number induces the
client to send a RST packet, before responding with a SYN+ACK to initiate the TCP
simultaneous open. However, unlike Strategies 11 and 12, this strategy is the most
successful for FTP.
Wang et al. [24], while studying HTTP censorship, hypothesized that a SYN+ACK
from the server with an incorrect ack number is sufficient to trigger the GFW?s resyn-
chronization state. We observe that this is no longer true for; however, it does work
for FTP censorship. Therefore, when the SYN+ACK with the corrupted ack num-
ber is sent, the FTP portion of the GFW enters the resynchronization state and
resynchronizes on the next packet from the client?the RST induced by the incorrect
ack number. Because the RST packet has the incorrect sequence number, the GFW
will become desynchronized from the connection. Geneva also identified successful
76
variants of this species in which the order of the two packets is reversed.
Strategy 14: Corrupt ACK Alone (China)
DNS (7%), FTP (33%), HTTP (5%), HTTPS (5%), SMTP (22%)
[TCP:flags:SA]-
duplicate(
tamper{TCP:ack:corrupt},)-| \/
Strategy 14 is identical to Strategy 13, but without simultaneous open. This
shows that, although simultaneous open is not required to evade FTP censorship,
it improves the success rate (33% vs. 65%).
Strategy 15: Corrupt ACK, Injected Load (China)
DNS (15%), FTP (97%), HTTP (4%), HTTPS (3%), SMTP (25%)
[TCP:flags:SA]-
duplicate(
tamper{TCP:ack:corrupt},
tamper{TCP:load:corrupt})-| \/
Strategy 15 offers an even greater improvement in success rate. This strategy
sends a SYN+ACK with a corrupted ack number, followed by another SYN+ACK with a
random payload. As with the previous strategies, the corrupted ack number induces
the client to send a RST packet, which the GFW resynchronizes on. This RST is
critical to the strategy?s success: if we instrument the client to drop this induced
RST, the strategy stops being effective.
Strategy 15 is highly successful (97%), but again, largely only applicable to
FTP. We do not yet understand the reason for the improvement in success rate with
the inclusion of simultaneous open or an inert payload.
We draw special attention here to the specific order that the injected packets
77
are sent (first, corrupted ack, followed by injected payload). When we reverse the
order of the packets, the strategy is ineffective. However, Geneva discovered a suc-
cessful species almost identical to this experimental ineffective strategy, requiring
only one modification:
Strategy 16: Injected Load, Induced RST (China)
DNS (82%), FTP (55%), HTTP (52%), HTTPS (54%), SMTP (55%)
[TCP:flags:SA]-
duplicate(
duplicate(
tamper{TCP:flags:replace:F}(
tamper{TCP:load:corrupt},),
tamper{TCP:ack:corrupt}),)-| \/
Resynchronization State, Revisited Strategy 16 replaces the outbound SYN+ACK
with three packets: (1) A FIN with a random payload, (2) A SYN+ACK with a cor-
rupted ack number, and (3) The original SYN+ACK. Note the apparent similarity with
Strategy 15: an inert payload and SYN+ACK with corrupted ack are both sent to the
client, but Geneva found that adding the FIN makes the strategy more effective for
all but FTP. We also found that this strategy works equally well if an ACK flag is
sent instead of FIN.
When the FIN (or ACK) packet with the payload arrives at the client, it is
ignored, and like with previous strategies, when the corrupted SYN+ACK packet ar-
rives, it induces a RST. However, unlike the previous strategies, this RST packet is
not a critical component of the strategy, but rather a vestigial side-effect of it?if
we instrument the client to drop the RST, the strategy is still equally effective. This
is because the GFW is resynchronizing not on the RST, but instead on the SYN+ACK
78
packet with an incorrect ack number.
This presents a stark difference from Strategy 15?once the corrupted ack
number caused the GFW to enter the resynchronization state over FTP, the GFW
did not resynchronize on the next packet in the connection (which would be a
SYN+ACK with the correct sequence and ack numbers), but rather on the next packet
from the client (the RST with an incorrect sequence number). This has a surprising
implication: depending on the reason the GFW enters the resynchronization state,
it behaves differently.
In summary, our hypothesis for the new behavior of the resynchronization
state is as follows:
1. A payload from the server on a non-SYN+ACK packet causes the GFW to resyn-
chronize on the next SYN+ACK packet from the server or the next packet from
the client with the ACK flag set for every protocol.
2. A RST from the server causes the GFW to resynchronize on the next packet it
sees from the client for each protocol except HTTPS.
3. A SYN+ACK with a corrupted ack number only causes a resync for FTP, and it
resynchronizes on the next packet from the client.
We test this theory with Strategy 17, which begins by copying the SYN+ACK
packet twice. To the first duplicate, the flags are changed to RST, to the second
duplicate, the ack number is corrupted, and the third is left unchanged. All three
packets are then sent. The first RST packet is ignored by the client, the corrupted
79
ACK induces the client to send a RST, and finally the client responds to the server?s
SYN+ACK with an ACK to properly finish the handshake.
Strategy 17: Injected RST, Induced RST (China)
DNS (83%), FTP (85%), HTTP (54%), HTTPS (4%), SMTP (66%)
[TCP:flags:SA]-
duplicate(
duplicate(
tamper{TCP:flags:replace:R},
tamper{TCP:ack:corrupt}),)-|
If our above new model for the resynchronization state holds true, we expect
the first RST packet of Strategy 17 to put the GFW in the resynchronization state
for every protocol but HTTPS, and resynchronize not on the next packet it sees in
the connection or the next SYN+ACK, but on the next packet it sees from the client,
which is the induced RST with an incorrect sequence number.
To test this, we instrumented a client to adjust its sequence numbers to match
that in the RST packet. This resulted in censorship, indicating that the GFW indeed
synchronized on this packet, and confirming our new model of GFW?s resynchro-
nization state.
Strategy 18: TCP Window Reduction (China)
DNS (3%), FTP (47%), HTTP (2%), HTTPS (3%), SMTP (100%)
[TCP:flags:SA]-
tamper{TCP:window:replace:10}(
tamper{TCP:options-wscale:replace:},)-|\/
TCP Window Reduction Strategy 18 works by reducing the TCP window size
and removing wscale options from the SYN+ACK packet, inducing the client to seg-
80
ment the forbidden request. This strategy is almost the exact same strategy iden-
tified by brdgrd [45] in 2012. The fact that this strategy works at all is highly
surprising?the GFW has had the capacity to reassemble segments since brdgrd
became defunct in 2012. It appears that the portion of the GFW responsible for
FTP censorship is incapable of reassembling TCP segments. This strategy is also
the most effective at evading SMTP censorship in China, and as we show next, it is
highly effective in other countries, as well.
4.3.2 Server-side Evasion in India & Iran
Our vantage points in India are all within the Airtel ISP, and we confirm
that Airtel only censors over HTTP [28]. Our vantage points in Iran are in Zanjan
and Tehran; here, HTTP, HTTPS, and DNS is censored (though DNS-over-TCP is
uncensored, so we will focus on HTTP and HTTPS here).
Airtel?s censorship injects an HTTP 200 with a block page with a FIN+PSH+ACK
packet instead of tearing down the connection. Iran?s censorship simply ?blackholes?
the traffic, dropping the offending packet and all future packets from the client in
the flow for 1 minute. In India, as reported by Yadav et al., we also observe a
follow-up RST packet from the middlebox for good measure [28].
We find that both countries only censor on each protocol?s default ports (80,
443); hosting a web server on any other port defeats censorship completely. Both
countries? middleboxes also do not seem to track connection state at all: sending a
forbidden request without performing a three-way handshake to the server elicits a
81
censorship response.
Given the lack of state tracking for these middleboxes, the problem of server-
side evasion becomes even more challenging: there is no censor state to invalidate or
teardown, so the only feasible strategies are those that mutate the client?s forbidden
request in a manner that cannot be processed by the censor. When deployed from
the server side, Geneva identifies one such strategy in both countries that we have
already seen: TCP Window Reduction (Strategy 18).
Again, simply by reducing the TCP window size of the SYN+ACK packet, it
induces the client to segment the forbidden request. This works because the mid-
dleboxes in both countries appear incapable of reassembling TCP segments, so once
the forbidden request is segmented, it is uncensored.
This result, combined with the similar success of this strategy in China against
FTP and SMTP, suggests a pattern of generalizability for client-side strategies.
Client-side strategy species that work by performing simple segmentation can be
re-deployed at the server-side in the form of a strategy that induces simple segmen-
tation.
4.3.3 Server-side Evasion in Kazakhstan
Kazakhstan has deployed multiple types of censorship. Previous works have
explored weaknesses in their now-defunct HTTPS man-in-the-middle [40]. Here, we
focus on their in-network DPI censorship of HTTP. Like the Airtel ISP, the censor
steps in when a forbidden URL is specified in the Host: header of an HTTP GET
82
Client Server Client Server Client Server
SYN SYN SYN
SYN/ACK SYN/ACK ?
(rand load) (benign GET) (no flags)
SYN/ACK SYN/ACK SYN/ACK
(rand load) (benign GET)
SYN/ACK ACK
(rand load) ACK
PSH/ACK
ACK ACK (query)
ACK PSH/ACK ACK
ACK (query)
PSH/ACK
PSH/ACK ACK (response)
(query)
PSH/ACK
ACK (response)
PSH/ACK
(response)
Strategy 9 Strategy 10 Strategy 11
Triple Random Double Benign Null Flags
Payload HTTP GET
Figure 4.2: Server-side evasion strategies that are successful against HTTP in Kaza-
khstan.
request. When the censor activates, it first performs a man-in-the-middle, so all
packets in the TCP stream (including the forbidden request) for approximately 15
seconds are intercepted by the censor and will not reach the server. The censor
then injects a FIN+PSH+ACK packet with a block page to inform the user the page is
blocked and the connection terminates.
We provide an overview of our successful server-side evasion strategies against
Kazakhstan in Figure 4.2.
Strategy 19 takes the outbound SYN+ACK packet, adds a random payload, and
83
Strategy 19: Triple Load (Kazakhstan) HTTP (100%)
[TCP:flags:SA]-
tamper{TCP:load:corrupt}(
duplicate(
duplicate,),)-| \/
then duplicates it twice, effectively sending three back-to-back SYN+ACK packets with
payloads. The payloads and duplicate packets are ignored by the client, and the
client completes the 3-way handshake. This strategy works 100% of the time in
Kazakhstan.
Strangely, we find that Strategy 19 is effective only if the packet with the load
is sent at least three times. Increasing the number of duplicates does not reduce
the effectiveness of the strategy, but removing any of them renders the strategy
unsuccessful.
We find the size of the payload injected by the server does not affect the success
of the strategy; whether just 1 byte is injected or hundreds, the strategy is equally
effective. This suggests that it is the presence of the payloads, not the length of the
payloads, that causes the censor to fail.
We also find that it is critical that each of the SYN+ACK packets have the
payload. If we instrument the strategy instead to send just one SYN+ACK with a
payload (either first, in the middle, or last), the strategy fails, or if we instrument
the strategy to send two SYN+ACK with a payload (back-to-back in the beginning,
back-to-back at the end, and with an empty SYN+ACK in between), the strategy fails.
The strategy only works if three back-to-back packets with a payload are sent during
the handshake.
84
We first test if this strategy is causing a desynchronization in the censor. If
the censor advances its TCB upon seeing the SYN+ACK payload, we do not know if
the censor will advance it for all of the packets, or just some subset of them. To test
each of these cases, we instrumented the client to increment the sequence number of
its forbidden request by single, double, and triple the length of the injected payload.
However, none of these instrumented requests trigger censorship, suggesting that
this attack does not perform a desynchronization attack against the censor.
Instead, we hypothesize the censor monitors connections specifically for pat-
terns that resemble normal HTTP connections, and seeing payloads from the server
during the handshake violates this model, causing it to ignore the connection. How-
ever, we do not understand why three payloads are required to enter this state. The
next strategies identified by Geneva support this hypothesis.
Strategy 20: Double GET (Kazakhstan) HTTP (100%)
[TCP:flags:SA]-
tamper{TCP:load:replace:GET / HTTP1.}(
duplicate,)-| \/
Strategy 20 duplicates the outbound SYN+ACK packet and sets the load to the
first few bytes of a well-formed, benign HTTP GET request. Since this payload is
on the SYN+ACK, the client ignores it, and the TCP connection is unharmed, but the
payload is processed by the censor. The above strategy shows the minimum portion
of a HTTP GET request required for the strategy to work (if the ?.? is removed,
the strategy stops working). As long as the GET request is well-formed up to the
?.?, the strategy works; for example, the strategy works equally well if we specify
85
the rest of the GET request or use a different or longer path. We also find that the
duplicate is required for this strategy to work; if the GET is only sent once, the
strategy does not work.
Frankly, we do not understand why this strategy works. We hypothesize the
request is just enough to pass a regular expression or pattern matching inside the
censor, and seeing the well-formed GET request is sufficient for the censor to think
the server is actually the client. To confirm the censor is processing injected packets,
we try probing the censor by injecting forbidden GET requests. We find two ways
to inject the content such that it elicits a response from the censor: injecting two
GET requests during the handshake, or performing simultaneous open and injecting
one GET request after during the handshake.
We do not understand why two requests are required to elicit a response dur-
ing the handshake; we hypothesize the first request is needed to break out of the
censor?s ?handshake? state and the second request is then processed. To test this
hypothesis, we try injecting a forbidden request followed by a benign request, and no
censorship occurs. This indicates that when content is injected before a connection
is established, it is the second request that the censor processes.
Strategy 21: Null Flags (Kazakhstan) HTTP (100%)
[TCP:flags:SA]-
duplicate(
tamper{TCP:flags:replace:},)-| \/
Strategy 21 duplicates outbound SYN+ACK packet. To the first duplicate, all
of the TCP flags are cleared before it is sent, and the second duplicate is sent
86
unchanged. We find this strategy works 100% of the time. Although Geneva first
discovered this strategy by clearing the TCP flags, it also identified the strategy
works as long as FIN, RST, SYN, and ACK are not used. We hypothesize the censor
is monitoring for ?normal? TCP handshake patterns, and when those patterns are
violated, the connection is ignored.
Finally, as expected, Strategy 18 also works in Kazakhstan: inducing client
segmentation is sufficient to defeat the censor.
4.4 Multiple Censorship Boxes
The server-side evasion strategies from ?4.3 exhibit a surprising property: al-
though they strictly operate at the level of TCP (specifically the 3-way handshake),
they have varying success rates depending on the higher-layer application within
a given country. This defies expectation: our evasion strategies exploit gaps in
censors? logic or implementation at the transport layer, and thus those same gaps
ought to be exploitable by all higher-layer applications. Exceptions to this indicate
either a cross-layer violation or a different network stack implementation for each
application?two phenomena that are necessarily rare in the layered design of the
Internet.
The remaining explanation is that China uses distinct boxes?with distinct
network stack implementations?for each of the application protocols they censor.
We depict this in Figure 4.3.
This raises an important question: how does the censor know which box to
87
Network path
? ?
DNS FTP HTTP HTTPS SMTP
TCP TCP TCP TCP TCP TCP
IP IP IP IP IP IP
(a) Single censorship box (b) Multiple censorship boxes
Figure 4.3: Single versus multiple censorship boxes. A standard assumption is that
evasion strategies that work for one application will work for another within a given
country. However, our results indicate that China?s GFW uses distinct censorship
boxes for each protocol, each with their own network stacks (and bugs).
apply? This is not as simple as triggering on port numbers; recall that, in our exper-
iments, we randomize the server?s port numbers, and yet still experience censorship
for each protocol. Indeed, most of the GFW?s censorship is not port-specific.
We posit that each of the GFW?s separate censorship boxes individually track
all TCP connections until it identifies network traffic that matches its target protocol
(i.e., until the request). Note, however, that most of our strategies complete before
the end of the 3-way handshake?before it can be determined which application is
using it. Thus, if our theory is correct, then when an application-specific TCP-level
strategy is used, all of the protocols? processing engines react, but only some of
them respond incorrectly.
Separate censoring boxes would also explain why the GFW never ?fails closed?;
i.e., it does not default to censorship if it observes packets that are not associated
with a TCB or that it cannot parse. Our multi-box theory suggests that the GFW
can never fail closed because, although one box may not recognize a packet, it must
assume that another box might. If each censorship box were fail-closed, the GFW
88
DNS
FTP
HTTP
HTTPS
SMTP
would destroy every connection.
To see if we can detect the presence of multiple boxes, we sought to locate them
via TTL-limited censored probes [28]. We instrumented a client to perform 3-way
handshakes with servers of various protocols, and then send the query repeatedly
with incrementing TTLs until it elicits a response from a censor. We found that, in
China, censorship occurred at the same number of hops for each protocol at each
vantage point. This indicates that, if there are indeed multiple boxes, then China
collocates them.
4.5 Client Compatibility
The evasion strategies presented in ?4.3 take advantage of esoteric features
of TCP that appear to have faulty implementations in nation-state censors? fire-
walls. Server-side deployment risks making the server unreachable to any client
that also has the same shortcomings. Conversely, strategies that work for a diverse
set of clients are readily deployable. Here, we comprehensively evaluate of all of
the strategies against a diversity of client operating systems, and we provide some
anecdotal evidence across different link types.
Experiment Setup We formed a private network consisting of an Ubuntu 18.04.3
server running each of the server-side TCP strategies (using Apache2.4 for HTTP
and HTTPS). For our clients, we used 17 different versions of 6 popular operating
systems: Windows (XP SP3, 7 Ultimate SP1, 8.1 Pro, 10 Enterprise (17134),
Server 2003 Datacenter, Server 2008 Datacenter, Server 2013 Standard, Server 2018
89
Standard), MacOS (10.15), iOS (13.3), Android (10), Ubuntu (12.04.5, 14.04.3,
16.04.4, 18.04.1), and CentOS (6, 7). We tried each protocol and each server-side
strategy against each client.
OS Results We found that all but three strategies worked on every version of every
client OS. The only exceptions were Strategies 15, 19, and 20, each of which failed
to work on any of the versions of Windows and MacOS. These three strategies all
involve sending a SYN+ACK with a payload; Linux?s TCP stack ignores these, but
Windows? and MacOS?s do not.
However, we can slightly alter Strategies 15, 19, and 20 to make them work
with all clients. The key insight is that these strategies work on Linux precisely
because Linux ignores the payload (but censors do not). However, we can modify
the strategy in other ways to make the client ignore the packet while the censor still
accepts it; this is commonly referred to as an ?insertion? packet, and there are other
ways to create insertion packets [40]. For instance, we can send the payload packets
with a corrupted chksum (so they are processed by the censor but not the client),
and send the original SYN+ACK packet unmodified afterwards. We re-evaluated these
three strategies with this modification, and found that with this small change, the
strategies worked for all client operating systems. An area of future work is evolving
strategies directly against many operating systems to avoid requiring these post-hoc
modifications.
Results Can Vary by Network We close this section with an anecdotal obser-
vation. In addition to the tests on our private network, we also tested all strategies
90
from a Pixel 3 running Android 10 on wifi and two cellular networks: T-Mobile,
and AT&T in a non-censoring country (anonymized for submission). All strategies
worked over wifi, and all worked on the two cellular networks except Strategies 11
and 13 for T-Mobile and Strategies 11, 12, and 13 (all of the simultaneous open
strategies) for AT&T. We speculate that the failures were caused by other in-network
middleboxes. This indicates that, while the client may not be an issue with some
server-side strategies, the client?s network might.
These results collectively demonstrate that, when deploying server-side strate-
gies, it is important to test across a wide range of clients and network middleboxes.
Fortunately, many of the strategies we have found appear to work across a very wide
range of networks and client types, but for practical deployments, a global study of
network compatibility would be an important and interesting avenue of future work.
4.6 Deployment Considerations
Where to Deploy? Though we refer to them as ?server-side,? the strategies
we have presented could be deployed at any point in the path between the censor
and the server. For instance, a reverse proxy (such as a CDN), a common hosting
platform (like Amazon AWS), or even a middlebox along the path (like in Tap-
Dance [47]) could run our strategies by manipulating packets in-flight. However, for
ease of deployment, we anticipate that our strategies will mainly be run at whichever
host is performing the 3-way handshake with the client. Our strategies incur lit-
tle computation or communication overhead (at most three extra payloads), so we
91
expect that they could be deployed even in performance-critical settings.
Which Strategies to Use? As our results have shown, strategies that work in
one country or ISP do not necessarily work in another. Thus, in deployment, the
server must determine which strategy to use on a per-client basis. This may prove
challenging, as the server must make its determination based only on the client?s
SYN packet. Coarse-grained, country-level IP geolocation may suffice for nation-
states that exhibit mostly consistent censorship behavior throughout their borders
(like China). However, for countries with region-specific behavior (such as Iran
or Russia), finer-grained determination of ISP may be required. Rapid, accurate
determination of which strategies to use is an important area of future work.
4.7 Ethical Considerations
Ethical Experiments We designed our experiments to have minimal impact
on other hosts and users. All of our testing and training was done from machines
directly under our control. Geneva generates relatively little traffic while training [40]
and does not spoof IP addresses or ports. We follow the precedent of evaluating
strategies strictly serially, which rate-limits how quickly it creates connections and
sends data. We believe this mitigates any potential impact it may have had on other
hosts on the same network.
Ethical Considerations of Server-side Evasion In traditional, client-side tools
for censorship evasion, the user is directly responsible for attempting to evade the
censor, and is taking a deliberate action to do so. As such, the user has the oppor-
92
tunity to both decide and consent to the evasion, and (ideally) is knowledgeable of
the risk associated with attempting to (and/or failing to) evade censorship.
However, such an opportunity may not always be present when server-side
strategies are applied to traditional, non-evasive protocols (like DNS, FTP, HTTP,
and SMTP). Every server-side strategy discussed in this work runs during the 3-way
handshake, so the user has no in-band opportunity to be informed or consent to the
server applying strategies over their connection. This raises an ethical question:
Should servers have to seek informed consent from users before evading censorship
on their behalf?
There are several precedents that lead us to believe that such consent is not
necessary. Various evasion techniques are regularly deployed without explicit sup-
port from users, such as wider deployments of HSTS, HTTPS, or encrypted SNI,
and new techniques such as DNS-over-TLS and DNS-over-HTTPS.
Whatever the answer to this question, we did not face any of these concerns
during our experimentation: our servers were not public-facing, served no sensitive
content, and were not connected to by anyone besides our own clients.
4.8 Conclusion
In this chapter, I supported my thesis across multiple network protocols and
in a novel deployment context: server-side evasion. I have presented eleven server-
side packet-manipulation strategies for evading nation-state censors?ten of which
are novel and, to my knowledge, the only working server-side strategies at time of
93
writing. My results lend greater insight into how the national censors in China,
India, Iran, and Kazakhstan operate: we find, for instance, that the GFW appears
to use separate censoring systems for each application it censors, and that each
such system has gaps in its logic, bugs in its implementation, and different network
stacks?all of which we have shown can be exploited to evade censorship. Such
heterogeneity severely complicates the process of evading censorship. Fortunately,
we have shown that, by applying automated tools like Geneva, it is possible to
efficiently evade (across multiple protocols) and understand a threat as nuanced
(and buggy) as nation-state censors.
In the next chapter, I will lend additional support to my thesis across multi-
ple protocols. This chapter?s results showed that TCP/IP level packet manipula-
tion could render middleboxes ineffective across multiple application layer protocols.
Next, I will show that packets can be efficiently manipulated at the application layer
itself to render middleboxes ineffective.
94
Chapter 5: Application-Layer Evasion
The previous two chapters demonstrated that both client- and server-side eva-
sion strategies can be automatically discovered, but were limited for the most part
to manipulations of IP and TCP headers. This leads me to ask: Are TCP/IP-level
packet manipulations the only way that middleboxes can be rendered ineffective?
Can censorship be evaded via manipulating application-layer data, instead? In this
chapter, I will explore how middleboxes can be rendered ineffective, even if packet
modifications are restricted to the application-layer.
The ability to automatically discover censorship evasion strategies is powerful,
but by focusing only on TCP and IP headers, the approach suffers from several
limitations:
Difficulty of deployment. As a practical matter, manipulating TCP and IP
headers requires administrative privileges on most platforms. Some platforms limit
such access (most mobile platforms do not have options for raw IP sockets), and
some tools are reluctant to seek root privileges in the first place (notably, Tor [20]).
Ideally, censorship evasion could take place by manipulating only application-layer
data, which could take place in unprivileged usermode.
95
Lack of UDP support. Geneva (in addition to other tools published after
Geneva?s release [70, 72]) only support TCP-based applications. While this is ex-
tremely useful?spanning HTTP, HTTPS, and even DNS over TCP?it misses out
on arguably the most important and common protocol: DNS (over UDP). Without
reliable and uncensored DNS, users and applications would have to know IP ad-
dresses of the services they wish to connect to, which is untenable. However, UDP
is such a simple protocol that manipulating UDP headers alone is unlikely to lead
to viable censorship evasion strategies. Again, it would be ideal to explore how to
alter application-layer data to evade censorship.
Surprisingly, despite advances in fuzzing techniques in other domains, techniques
to automate the discovery of censorship evasion strategies in the application space
remain relatively unexplored. At the time we started this project, we were unaware
of any application-layer fuzzers that could generalize to multiple protocols and be
modified to train against nation-state censorship infrastructure.
To address this, in this chapter we present what we believe to be the first work that
automatically discovers application-layer censorship evasion strategies. We extend
Geneva with application-layer fuzzing and new fitness functions. The fuzzing engine
we have built is not the primary contribution of this chapter; indeed, it is a relatively
standard fuzzer. What is surprising, however, is that, to the best of our knowledge,
fuzzers have not been applied to censors at all.
As such, we make the following contributions:
? We take the first steps toward automating the discovery of application-layer cen-
96
sorship evasion strategies. These are easier to deploy than their headers-only
counterparts.
? We use our extended build of Geneva to perform a wide-scale empirical study
in several countries (China, India, and Kazakhstan), two protocols (HTTP and
DNS), and many different versions of server software.
? We discover and report on 77 unique circumvention strategies for HTTP and 9
for DNS. We describe many of these strategies in detail, and provide the full list
in Tables 5.2 and 5.3.
? We perform a thorough analysis of these strategies to gain new insights into how
censorship is implemented in different places and how evasion strategies generalize
at the application layer.
Why study censorship of unencrypted protocols? HTTPS adoption is on
the rise for most of the web [101], and browsers have started to request HTTPS
by default [102]. Similarly, with development of encrypted DNS transports, such
as DNS-over-TLS (DoT), DNS-over-HTTPS (DoH), and DNS-over-QUIC (DoQ),
why study ?vanilla? DNS? Despite the availability of more secure alternatives, un-
encrypted protocols are still heavily used around the world. Unencrypted DNS still
dominates the market, and encrypted DNS alternatives are not yet widely adopted
anywhere. HTTP traffic is also still unfortunately prevalent in censored regimes. As
of the time of this writing, HTTP traffic comprises nearly 20% of all traffic out of
China to Cloudflare [103]. Worse yet, many censored websites still do not support
HTTPS. We issued HTTPS requests to all the domains in Citizenlab?s censorship
97
test lists [104] and found that 18% of them did not support HTTPS, and 52% of
the domains on their China-specific list did not load over HTTPS. Lastly, censors
have grown increasingly hostile to new privacy advances in HTTPS, blocking TLS
1.3?s ESNI [36], and launching HTTPS man-in-the-middle attacks [105?107]. Taken
together, we believe HTTP and DNS will be prevalent in censored regimes for the
foreseeable future. Our work shows that HTTP and DNS censorship can be evaded
in easily deployable ways.
Roadmap The rest of this chapter is structured as follows: ?5.1 presents further
background on the specifics of censorship in the countries we study in this chapter.
?5.2 describes the design of our extensions to Geneva and the specific application to
DNS and HTTP. ?5.3 describes our experimental methodology. ?5.4 presents our
results from training over HTTP and ?5.5 presents our results from training over
DNS. We discuss these results, and what we can learn about censors in ?5.6, and
address ethical considerations in ?5.7.
5.1 Application-Layer Censorship Background
In this section, we review additional details about the specific nation-state
censorship infrastructure studied in this chapter and additional background about
application-layer fuzzing techniques.
Censors commonly filter HTTP traffic in one of two ways: either by examining
the requested domain (via the Host header), or by searching for forbidden keywords
in the request string itself [1, 2, 28]. Censors in India and Kazakhstan examine the
98
Host header, while the Great Firewall of China uses both techniques. All three of
these countries perform HTTP censorship differently. Airtel?s ISP in India injects a
block page to the user, the Great Firewall of China injects RST+ACK packets to tear
down the connection, and the Kazakhstani censor drops the offending traffic (and
subsequent traffic) from the client. To censor DNS, censors will commonly inject
responses that contain an incorrect IP address. As of time of writing, China has
deployed three independent DNS censorship systems running in parallel, each with
their own fingerprints and block-lists [33]. Although some DNS and HTTP servers
are censored by IP-blocking, the focus of this work will be on the active censorship
performed at the application level.
Why extend Geneva? For this work, due to the number of DNS resolvers, HTTP
servers, and censoring countries, we will use an automated approach for discovering
application layer strategies.
We are familiar with three existing systems to automating censorship evasion:
Geneva [1], SymTCP [70], and Alembic [72]. Although each of these systems takes
a different approach, the high level goal is the same: to find a sequence of packets
that cause the censor to be unable to teardown a connection (while preserving the
connection to the server itself). Geneva uses a genetic algorithm, and treats censors
and destinations as black boxes, not unlike a fuzz tester. Alembic and SymTCP
require access to the source code to perform symbolic execution of the server. In our
case, we may not have access to the source code of the application servers, and will
also run across multiple versions of multiple server types. For this work, we chose
99
to extend Geneva, and we will detail our design in ?5.2.
Application Fuzzing In addition to the relevant fuzzing works described in
Chapter 2, most similar to this chapter is a concurrent work T-Reqs [108], a
grammar-based differential HTTP fuzzer to identify HTTP Request Smuggling at-
tacks. HTTP Request Smuggling is the process of modifying an HTTP request such
that a firewall or proxy fails to identify a second, hidden request. Although HTTP
Request Smuggling is similar in spirit to censorship evasion, the goals are slightly
different: with censorship evasion, our goal is not to sneak a second request past
a censor, but simply to allow the original request to bypass the censor. T-Reqs
created a detailed context-free grammar for the HTTP specification, and randomly
mutated inputs to discover differences in how popular HTTP proxies and servers
handle content. With modification, T-Reqs (or other grammar-based fuzzers) could
likely also be applied to censorship evasion.
5.2 Fuzzer Design
In this section, for completeness, we discuss the design and implementation of
our fuzzer to automatically discover censorship circumvention strategies for HTTP
requests and DNS queries.
Prior approaches to automating censorship evasion techniques have taken a
fuzzing approach (Geneva [1]) or a symbolic execution approach (SymTCP [70] and
Alembic [72]) to identify successful modifications to network packets. In this work,
we will not always have access to the source code for every application layer server
100
GET?<PATH>?HTTP/1.1\r\n Request
Method Path Components HTTP Version Delimiter Line
Host:?example.com\r\n\r\n HTTP
Name Value: Domain End of Header Header
/path?foo=bar&foo2=bar2#anchor
Path Param Value Param Value Anchor
Path End Param Delimiter Anchor Delimiter
Figure 5.1: Structure of an HTTP request for example.com. Note that ? ?
denotes where whitespace is required by the RFC, typically 1 space. Typically,
HTTP Requests contain multiple headers separated by a \r\n.
13 37 <> 00 01 00 00 00 00 00 00 DNS
Query ID Bit Flags Query Count Answer Count NS Count Add. Records Header
07 example 03 com 00 00 01 00 01 Question
Length Effective 2nd Length TLD End Type (A) Class Record
Level Domain
Bit Flags
0 0000 0 0 0 0 000 0000
QR Opcode AA TC RD RA Z Response
 Truncated  Recursion Reserved
  
Code
Avail.
Authoritative Recursion
Answer Desired
Figure 5.2: Structure of a DNS request for example.com. Note that the Bit
Flags field (detailed in the lower box) is two bytes wide. Although DNS requests
typically only contain one Question Record, the RFC [8] allows for multiple DNS
Questions to be included with no separator between them.
Param Delimiter
101
we need to train with (such as Google?s public DNS resolver). Therefore, we will
use a fuzzing approach for our design, and specifically will extend Geneva?s to the
application layer space and re-use its existing genetic algorithm.
What lessons can we learn from the design of Geneva to inform how we should
fuzz for application-layer strategies? Geneva built censorship evasion strategies out
of small, individual manipulation primitives (called actions) that could modify a
packet. Each action takes parameter values, which were chosen at random or from
packet captures of previous strategies. Since some actions can introduce new packets
into the network (such as duplicate), these actions compose to form trees that
describe how a packet should be modified, and each tree has an associated trigger
to describe which packet to modify. Despite the simplicity of the manipulation
actions, by composing them together with associated triggers, Geneva?s strategies
can be expressive enough such that a strategy can transform any set of packets into
any other set of packets.
Each strategy is evaluated with a fitness function, which applies the strategy
to modify a request for a forbidden resource and assigns it a numeric fitness value
based on its success, overhead, and complexity. The genetic algorithm uses the
fitness values to decide which strategies should survive to the proceeding generations
and propagate.
How can we apply these ideas to application-layer requests? We observe that
in abstract, manipulating individual packets is tantamount to manipulating smaller
components of a broader request. To translate this approach to the application-
layer space, we identify the constituent units of the broader requests for HTTP and
102
DNS. Though HTTP starts with a few constant fields (Method, Path, Version), the
majority of an HTTP request is made up of a variable number of smaller HTTP
headers. DNS requests, too, are comprised of constant fields, followed by a variable
number of DNS question records. Therefore, we will allow our manipulations to
access the constant fields and chain together modifications that affect the variable
fields (HTTP Headers and DNS Question records, respectively). We note that even
beyond the scope of this chapter, other popular application layer protocols follow
this pattern; for example, TLS packets usually have many TLS Messages and TLS
Extensions.
5.2.1 Grammars
Next, we define a grammar that allows us to parse and modify these requests.
HTTP Grammar We specifically scope this work to HTTP Version 1 (HTTP/1.0
and HTTP/1.1). The HTTP protocol grammar is specified by RFCs 2616, 7230,
7231, 7232, 7233, 7234, 7235, and 3986 [109?116]. An HTTP Request starts with
the HTTP Method (sometimes called a ?verb?), which defines the type of request,
followed by a single space. Next, a request contains the request path, which specifies
the resource location the HTTP request is for, as well as any HTTP parameters and
values for the request. The path generally starts with a /, and if HTTP parameters
are included, a ? denotes the end of the path and the start of the query parameters.
RFC 3986 specifies that in certain circumstances, other characters may mark the
start of the path, but these are restricted to specific circumstances [116]. Multiple
103
parameters may be specified within the request line by delimiting them with a &.
After the path, a single space separates the HTTP version, and HTTP headers
comprise the remainder of the request. The end of the starting line containing the
method, path, and version is ended with a \r\n. Each line within the HTTP header
is delimited with a \r\n, and the end of all the headers is marked with an empty
line followed by a \r\n. This will look like a header followed by \r\n\r\n, signifying
all following data is the message body. Using this grammar, our system will parse
the given HTTP request to extract the constant fields (Method, Path, Version), and
variable headers into a list. See Figure 5.1 for an example HTTP request.
DNS Grammar In this work, we focus specifically on normal DNS Requests,
so extensions or other DNS technologies (such as DNSSEC or running DNS over
other protocols) are out of scope. The structure of DNS queries are defined by RFC
1035 [8]. DNS Queries are comprised of a set of fixed constant fields, followed by a
variable number of DNS Question Records which specify the domains to lookup. By
convention, DNS Queries usually only have 1 DNS Question (and as we will see in
Section 5.5, many DNS servers will only respond to queries with 1 DNS Question),
but the RFC still permits multiple Question Records in a request. See Figure 5.2
for the fields in a DNS Query.
5.2.2 Manipulations
Now that we can parse HTTP and DNS requests, our goal will be to design
simple manipulation primitives that can be composed together such that for a given
104
application, a strategy can transform any request into any other request. Therefore,
our actions must be able to add, remove, or manipulate any constituent components
of the request. We will define duplicate and drop to add or remove components
from a request, but most importantly, we must be able to modify one of these
components. Unfortunately, application-layer data is significantly less structured
than packet headers, and HTTP headers in particular are primarily composed of
raw, unstructured text. We require a new set of actions that will allow us to modify
unstructured text.
Inserting New Bytes We define a new modification primitive to insert new bytes
into a given header or question record:
insert(<VALUE>, <WHERE>, <COMPONENT>, <NUM>)
The action takes four parameters, which control what bytes are inserted, where
within the existing text they should be inserted (start, middle, end, random),
which component should be affected, if applicable (such as HTTP header name or
value), and the number of times the bytes should be inserted. As the genetic
algorithm runs, these parameters can be mutated and learned through the process
of evolution.
Replacing Bytes We define a second modification primitive to allow our system
to replace existing bytes within a given header or question record:
replace(<VALUE>, <COMPONENT>, <NUM>)
The action takes three parameters, what bytes should replace the existing text,
which component should be affected, if applicable (such as HTTP header name or
105
value), and the number of times the bytes should be placed in that location. This
action also incorporates the ability to delete the component, by replacing with a
value of an empty string. As the genetic algorithm runs, these parameters can be
mutated and learned through the process of evolution.
Changing String Case We define this action to take in a string and change the
case of all alphabetical characters in the header name and value.
changecase(<CASE>)
This action takes one parameter, which is what case all letters should be changed to.
It can change all characters to lower or upper case, or randomly assign each letter
to be upper or lower case, irrespective of its current case. Nothing will happen to
non-alphabetical characters.
5.2.3 Fitness Function
In this work, we do not modify Geneva?s original genetic algorithm, but we will
update its fitness function to allow us to evaluate application-layer strategies. We
will evaluate strategies directly against real-world censors by using them to mod-
ify a request for forbidden resources, sending the resulting request across a censor
to a destination server, and checking that the request did not trigger censorship
and successfully obtained the forbidden content. Each time we train the genetic
algorithm, we will initialize it with a clean slate with no access to prior results or
knowledge of the censorship system. Our system will execute each training run for
a pre-specified number of generations or until population convergence occurs. Be-
106
tween each training run, we perform post-hoc analysis of the results and strategies
the system identified.
HTTP Evaluation To evaluate HTTP strategies, the fitness function makes a
request that either contains a forbidden Host header, or a forbidden keyword in the
request string. To train for HTTP strategies, we will run our system from vantage
points we control within a censored country and make a request to a server we
control outside the censored country. This will allow us to control the server type
and version.
Our design must account for the effects of residual censorship. In China, for
90 seconds after the censor tears down a forbidden request, any follow-up request to
the same three-tuple (server IP, server port, and client IP) will result in censorship,
even if that request is benign. Fortunately, China?s HTTP censorship is active on
every destination port. Therefore, the fitness function will use a different destination
port within a large range of ports for every strategy, and all of these ports will be
forwarded to a single port the server runs on. In this regard, we can train without
residual censorship affecting the fitness function.
DNS Evaluation To evaluate each DNS strategy, the fitness function applies each
strategy to a DNS request that contains a DNS Question Record for a forbidden
domain.
Recall that the Great Firewall of China runs three separate DNS censorship
systems, and any subset of them can respond to a forbidden query [33]. The GFW
does not drop the offending query packet, so in addition to the DNS injectors, the in-
107
tended destination of the request will also receive it and respond. As a consequence,
if a client within China makes a forbidden DNS query to a reachable DNS server
outside of China, the client could get anywhere from 0 to 4 DNS responses (up to
three from the injectors, optionally followed by the real uncensored response). Since
any strategy could affect the response or any of the censors or the destination server
itself, it is difficult to identify whether a given DNS response constitutes censorship
without issuing a follow-up query to the IP address in the response, which is slow.
To avoid this problem, we run training for DNS outside of China. To evalu-
ate a strategy, the fitness function applies the strategy to a query for a forbidden
domain (such as google.sm). First, the resulting modified query is sent to an uncen-
sored DNS server, such as an open resolver, like Google?s 8.8.8.8. If the strategy
successfully gets a response from the DNS server, we know the query is valid, and
the fitness function rewards the strategy?s fitness value. Next, we send the same
modified query into China to a machine under our control that is not running any
DNS server at all. In this case, if the query gets any DNS responses, we know these
responses originated from the Great Firewall (and the fitness function punishes the
fitness value).
Importantly, as with the HTTP fitness function (and fitness functions from
prior work), the fitness function gives a lower fitness value to a strategy that breaks
the underlying request than if the resulting request was still valid but experienced
censorship. This encourages the genetic algorithm to explore the space of strategies
that preserve the validity of the original request, but can impact the censor.
108
5.2.4 Using Strategies
To make our strategies useful for real users, we developed a standalone ?proxy?
application, which applies a given strategy to live traffic. This proxy application
accepts the original strategy syntax, so any of the strategies presented herein can
be copied and used, with no further set up. We tested this proxy by browsing with
it through our vantage point in India to multiple forbidden websites, and validate
that these strategies can be used on real user traffic.
5.3 Methodology
In this section, we describe our experiment methodology for training our sys-
tem. As we will see, many application-layer strategies only work with specific des-
tination servers; therefore, we need to repeatedly train to different popular servers
for DNS and HTTP.
HTTP Servers On September 3rd 2020, we downloaded a list of the most popular
HTTP servers currently in use from W3Techs [117] and BuiltWith [118]. According
to both resources, Apache [119] was the most popular (with 36.5% and 35% esti-
mated market share from each respective resource) and Ngninx [120] was the second
most popular (with 32.5% and 34% share respectively). W3Techs identified Cloud-
flare?s hosting as the third most popular (15.7%), and both identified Microsoft IIS
as the next most popular (7.9% and 13% respectively). For this work, we choose to
focus on the servers with the maximal market share: Apache and Nginx. Deploy-
109
DNS Resolver Org. Resolver Address
Cloudflare 1.1.1.1
Google 8.8.8.8
Quad9 9.9.9.9
OpenDNS 208.67.222.222
CleanBrowsing 185.228.168.168
ComodoSecure 8.26.56.26
DNS.Watch 84.200.69.80
Verisign 64.6.64.6
Table 5.1: DNS Open Resolvers we conduct experiments with. All of these open
resolvers are accessible from within China.
ments of Apache and Nginx span many versions; we selected the four most popular
versions for each, according to W3Techs [117], specifically 2.4.6, 2.4.18, 2.4.29, and
2.4.43 for Apache and 1.13.4, 1.14.1, 1.16.1, and 1.19.0 for Nginx.
DNS Resolvers Most DNS traffic is handled by large resolvers; in 2019, DNS
Observatory studied over 1 trillion DNS transactions and found that over 60% of
them were handled by just 1,000 nameservers and flowed to authoritative servers
run by less than 10 organizations [121]. For this reason, we choose to train directly
with the most popular open resolvers. We tested if these resolvers are affected by
IP-blocking censorship by making innocuous DNS lookups from our vantage point
within China, and found that none are affected and all are reachable. See Table 5.1
for a full list of the resolvers we test.
Vantage Points We obtained vantage points in China (Beijing), India (Banga-
lore), and Kazakhstan (Almaty) to use in our experiments. We also set up servers
we controlled in uncensored countries in Europe (Ireland), Japan (Tokyo), and the
United States (at our university) to conduct experiments.
To train our system in these countries, our system will trigger censorship
110
depending on the country and type of censorship. For HTTP, in India and Kaza-
khstan, we sent an HTTP request with a forbidden domain in the Host header
(youporn.com). Recall that China censors HTTP both by censoring keywords in
the HTTP parameter list and by examining the Host header, so we train in China
against both types of censorship (specifically, using the forbidden word ultrasurf
as an HTTP parameter and youporn.com in the Host header). For DNS, we send
a DNS query containing a question for a domain forbidden by China between two
hosts we control across the censor. Recall that the landscape of DNS censorship is
more complex in China than with HTTP, with three parallel DNS censorship injec-
tors. We specifically choose to train with only those domains that are affected by
all three censorship systems, such as google.sm.
Like all censorship research, our results are limited by the censorship we can
access and test with; still, we believe that testing against three different censors
for HTTP and DNS is sufficient breadth to demonstrate the generalizability of this
technique.
HTTP Experiment Methodology We ran our experiments over the span of
seventeen months, starting in December 2020. We evaluated against a diverse set of
censorship types: India, Kazakhstan, China-Host, and China-keyword. For all four
types of censors, and for all eight types/versions of HTTP servers, we conducted 5
training runs (160 in total). Each training run executed with a population pool of
500 individuals for 50 generations.
For each HTTP server, for training runs with Host header based censorship,
111
we configure the server with a VirtualHost to require the Host header; this pre-
vents a strategy from ?succeeding? by simply removing, or mangling the forbidden
value from the request. For keyword-based censorship training, the fitness func-
tion requires that the forbidden keyword is present in the outbound request. Note
also that we limited our system to only actions at the application layer space, so
TCP segmentation is not permitted, and the fitness function cannot make additional
requests.
To avoid residual censorship in China, we ensured that no two strategies used
the same destination port within a 90-second window. In particular, we allocated
15,000 contiguous ports, assigned each port to one strategy, and used iptables to
redirect all of these ports to a single port that hosts the server. The fitness function
ensures that each strategy gets its own port. Since residual censorship lasts for 90
seconds, we evaluated fewer than 167 strategies per second (15,000/90) so as not to
exhaust our ports.
We evaluate each strategy serially, with no sleep in between. On average,
the fitness function for HTTP evaluates 1-2 strategies per second and each HTTP
request is initially 40 bytes. For example, an initial HTTP request (before it is
modified by a strategy) in India is:
GET / HTTP/1.1\r\n
Host: youporn.com\r\n\r\n
We also tested if this technique is applicable to servers outside our control by training
to 12 censored domains over HTTP (6 in KZ, 6 in IN); we show the successful results
112
of these experiments in ?5.4.3.
DNS Experiment Methodology For DNS, we chose to train against all three of
China?s DNS Injectors simultaneously, so the resulting strategies could be applied to
any forbidden domains. We can do this by using a domain that appears on all three
injectors? block-lists. We reached out to Anonymous et al.?who originally discov-
ered that the GFW?s DNS infrastructure was powered by three injectors?and the
authors provided a list of domains that appeared on each injectors? block-lists [33].
By choosing which domain name we used to trigger censorship, we can tailor our
training to specific DNS injectors. For this work, we chose to use google.sm, which
appears on the block-lists for all three injectors.
For each of the 8 DNS resolvers we train with, we conduct 5 training runs.
We use the same hyperparameters for training as with HTTP: each training run is
executed with a population pool of 500 individuals over 50 generations.
Since DNS runs on UDP, the fitness function can evaluate the strategies much
more quickly?about 20 strategies per second?and each request is initially 27 bytes.
The total network load for DNS training to an open resolver is approximately 11kbps,
and lasts than less approximately 20 minutes per training run; these network loads
should be negligible for resolvers of this size. Fortunately, residual censorship is not
a concern for DNS in China, allowing us to train more quickly.
Post-Hoc Analysis After each training run for DNS and HTTP, we perform man-
ual analysis to investigate the strategies our system discovers and perform manual
experiments to understand why each strategy works. We also follow precedent from
113
prior Geneva work: after each training run, we disable any fields or actions that
dominated the search space to encourage strategy diversity. For example, if the first
training run discovers that any changes to a specific field always evade censorship
and those strategies quickly dominate, we will remove that field from the proceeding
training runs to encourage the algorithm to discover new strategies.
Strategy Success Rates After we completed all the training runs, we re-tested
every discovered strategy against every other server version in each country. We
tested every DNS strategy 1,000 times and HTTP strategy 100 times. We did not
observe any differences in the success rates of our strategies from when they were
initially collected to this success rate testing.
Manual Verification To confirm that the strategies we discovered work the way
we expect, we performed several additional manual verification steps. First, we
manually ran every strategy presented in this paper against every server type and
confirmed we receive the correct server response page. For a more rigorous check for
a subset of our servers, we also compared server responses to unmodified requests
and requests modified by our strategies and confirmed they were byte-wise identical.
Finally, as mentioned in ?5.2.4, we manually tested a sample of strategies in India
with a real web browser using our proxy server and validated that we could browse
blocked websites successfully.
114
5.4 HTTP Results
In this section, we will detail our results from training our system against
HTTP censorship against Host- and Keyword-based censorship in China, and Host-
based censorship in India and Kazakhstan. For a strategy to succeed, it must
modify a request sufficiently to evade censorship, while still being accepted by the
destination server.
5.4.1 Summary Results
We only report on strategies for which at least one HTTP server we tested
correctly responded. We consider a strategy unique if it defeats censorship, or is
accepted by a server, for a unique reason. This means for each strategy, there are
often many ways to craft strategy variants that do functionally the same thing,
but the total number of strategies we report are only those that work for a unique
reason.
In total, we identify 77 unique HTTP strategies, and we manually performed
experiments to understand how they work and determine their success rate against
each country and HTTP server. We found the most strategies that defeated Airtel?s
censorship in India: of the 77 strategies we discovered, an incredible 56 of them by-
passed the Indian censor. A total of 29 strategies bypass the Kazakhstani censor. In
China, we found a total of 22 evasion strategies that evaded path-based censorship,
and 27 strategies that evaded the host-based censorship.
As we will see, the number of strategies we discover against each censor does
115
not necessarily imply that the censor is non-compliant with the RFCs; on the con-
trary, our results suggest if a censor is more RFC-compliant than the destination
server, there will be many more opportunities for evasion.
Due to space constraints, we cannot discuss every strategy we discovered.
Instead, in this section, we will describe each strategy family and give examples of
where and why they work.
5.4.2 Evasion Strategies
Version Mangling The first strategy we discuss is surprisingly simple: corrupting
the HTTP version. The resulting request would seem to be in violation of the RFC,
as RFC 7230 (Section 2.6), specifies that servers should respond with an error page
if they receive an unknown version. However, the RFC also admits that a server
may respond anyway ?if it is known or suspected that the client incorrectly imple-
ments the HTTP specification and is incapable of correctly processing later response
versions?. We find that several server versions (Apache 2.4.6 and 2.4.18) choose to
be maximally permissive and ignore malformed versions, responding normally. We
also find that the tested versions of Nginx will respond normally if the version is
corrupted by inserting a % character (%25).
This strategy evades censorship for both types of HTTP censorship in China,
which is surprising: the HTTP version appears after the path that contains the
forbidden keyword. This suggests that the censor validates the HTTP Version or
will only perform DPI on the packet if the Version has an expected value. Version
116
??
GET / HTTP/1.1\r\n
Extra Space Injected
Host: youporn.com\r\n\r\n
Forbidden Header Unmodified
(a) Request Line Whitespace: Inserting an extra space between the
Method and Path evades Host-based censorship in China. The cen-
sor assumes that there will only be one whitespace character in that
location, but the RFC [110] permits more.
??
GET ///.../// HTTP/1.1\r\n
1,409 '/' Injected
Host: youporn.com\r\n\r\n
Forbidden Header Unmodified
(b) Induced Segmentation: Evades Airtel?s censorship in India by
forcing the request to be segmented across two TCP packets. The
entire request, with headers, is larger than the Ethernet MTU, but
India?s censorship does not properly handle segmentation.
??
GET /?ultrasurf HTTP/1.1\r\n
Request Line Unmodified
AAA...:AAAAAAAA...AAA\r\n
64-byte Name 1,207 Values
Host: youporn.com\r\n
Forbidden Header Unmodified
B:BBB...\r\n\r\n
129-byte Header
(c) Sandwich Strategy : Evades keyword- and Host header-based
censorship in China. This breaks the parsing in such a way that
the censor cannot process the host header, which is needed for path
reconstruction.
Figure 5.3: Examples of three HTTP strategies we discover. Each of these
strategies defeats censorship for a different censor or mechanism (Header-based in
China, in India, and Keyword-based in China).
117
mangling also defeats censorship in India.
Kazakhstan, on the other hand, will censor a request with a corrupted version
unless enough bytes are inserted into the field to lengthen it to 1,434 bytes long.
At this point, the censor ignores the request, and we can evade successfully. We do
not believe the Kazakhstani censor is doing any validation of the version; instead,
we believe it is more likely that the censor has a limit to the number of bytes it will
buffer before processing it.
Four Element Request Line The HTTP RFCs specify that the request line
should be split on whitespace between the three request line parameters. We dis-
covered a class of strategy that inserts a space into the middle of a field within
the path or the version, in such a way that the important aspects of the path and
HTTP parameters can still be understood. We believe this strategy works for the
same reason that HTTP version mangling does. When a censor?s DPI splits the
request line, the third component is no longer a well-formed HTTP version. These
strategies are also in violation of the RFC, but are still understood by versions of
Apache.
The reason these strategies work is the initial path is being interpreted as the
real path, HTTP server logs confirmed this, whereas the whitespace is creating a
new request line element that might be interpreted as the version. We found these
strategies worked in China and India, but not in Kazakhstan, which is consistent
with our results from HTTP Version mangling.
Changing Case In HTTP requests, there are some components that the RFCs
118
specify should be case-sensitive, including the method (RFC7230 Section-3.1.1) and
version (RFC7230 Section-2.6), while others that should be case-insensitive, like
header names (RFC7230 Section-3.2). We discovered strategies that change the
case of the method, version, or of the Host header name itself (such as to host). All
of these work in India, but do not work in China or Kazakhstan. These strategies
tell us that the Airtel censor is too strict in how it processes HTTP requests.
Request Line Whitespace RFC 7230 specifies that a single space should delimit
between the Method, Path, and Version fields, but that servers should ignore extra-
neous whitespace before the method and after the version, and treat any contiguous
blocks of whitespace as a single space [110, Section 3.5]. The RFC classifies ?whites-
pace? as space (URL-encoded: %20), horizontal tab (%09), vertical tab (%0B), form
feed (%0C), or bare carriage return (%0D). It also states that servers should treat
newlines (%0A) as a \r\n, or the intended line delimiter.
These rules permit a wide variety of ways to modify a request line without
altering syntax, and we found a total of 33 unique strategies that take advantage
of inserting some form of whitespace within the request line. Some of these strate-
gies are simple: in China, we can insert a single additional space after the HTTP
Method and evade Host-based censorship (though this does not work for keyword-
based censorship). We present an example in Figure 5.3a. Other strategies in this
family are more complicated: in Kazakhstan, if a strategy inserts 1,434 whitespace
characters after any item in the request line, it will evade the censor. We find that
the strategy can get away with inserting only one whitespace character if it inserts
119
it before the method. The Indian censor we tested was the most brittle with respect
to whitespace. We discover other strategies in this class that work by inserting
certain patterns of additional whitespace between the HTTP version and the \r\n.
For example, appending a \n\t to the Version is not sufficient to evade the Indian
censor, but \n\t\n\t, (or any number of spaces), will evade.
Although not all of our servers under test correctly responded to all of these
strategies, most of them did, and whitespace-inserting strategies remain the strategy
class that is most broadly successful across server and censor types.
Host Header Whitespace Similar to inserting whitespace around the request
line, we also discovered 21 strategies that involve inserting certain amounts of specific
whitespace characters around the Host header. RFC 7230 defines the correct format
for headers as:
<NAME>:<OPT WSPACE><VALUE><OPT WSPACE>
where <OPT WSPACE> is optional whitespace, consisting only of spaces and horizon-
tal tabs (RFC 7230, section 3.2) [110]. Strategies in this class insert additional
whitespace into the optional whitespace locations or even around the header name
itself.
In China, inserting whitespace before the header name (which is not RFC com-
pliant), successfully evades Host-based censorship, but not path-based censorship.
This suggests the GFW fails to parse headers that begin with whitespace, but it
can still parse and identify forbidden keywords in the path. In India, we find that if
a strategy inserts a whitespace character before or after the Host header name, or
120
a single newline character around the Host header value, it will evade the censor.
In Kazakhstan, we found similar rules for which strategies work and why. We
find that inserting one space after the header value or anywhere around the name
evades. Using tabs or newlines instead of spaces works only slightly changes the
requirements: inserting one tab anywhere around the header name or value or a
newline anywhere except the end of the header, evades censorship.
Induced Segmentation One simple-seeming strategy we discovered in India
works by simply inserting more data anywhere in the request to make it at least
1,449 bytes long. We present an example in Figure 5.3b. What is special about this
number of bytes? With an HTTP request at least 1,449 bytes long, the added bytes
for IP (20 bytes), and TCP headers (32 bytes, including the timestamp option) total
52, bringing the request size up to 1501 bytes. Since this is exactly one byte past the
Ethernet MTU (1500 bytes) [122], we conclude that this strategy works by inducing
segmentation. Prior work has found that the Indian censor can be evaded by simple
segmentation, which supports this hypothesis [2].
We observe a similar strategy in Kazakhstan, but slightly more complexity
is required. Instead of inducing segmentation anywhere in the request, our system
discovered that if a strategy induces segmentation specifically at the byte index
between the Host header name and value, it will evade censorship. It accomplishes
this by inserting enough bytes such that the 1,449th byte is the last byte before
the host header value, and the final two bytes before the host header value must
both be spaces. We do not understand why two spaces are required for this strategy
121
to work. These strategies are perfectly RFC-compliant, and every server we tested
responded correctly. We found no evidence that this type of strategy has any effect
on China?s censors, however many of these strategies still evade in China due to
other unrelated reasons, such as whitespace insertion or long header names.
Path Confusion Another family of strategies we discovered involves adding addi-
tional characters, parameters, or anchors to the path that are ignored by the server,
but processed by the censor. For example, the strategy that inserts a single ? before
the start of the path evades in India and China (for both header and keyword cen-
sorship). Technically, ? is only allowed to start a path if the path is empty, but we
find that every Apache version we tested still correctly processed the path and the
request. Another strategy in this family works by inserting a new very long HTTP
parameter (at least 1,003 bytes long) before the forbidden keyword; this only works
in China.
Host Header Shield The next strategy we discuss evades China?s keyword and
host-based censorship. Recall that inserting a single space after the HTTP Method
is sufficient to evade China?s Host-based censorship, but does not evade its keyword
censorship. Our system found that by also inserting a new header before the host
header with a header name that is at least 64 bytes long, it could evade both keyword
and Host censorship simultaneously. This only works if whitespace is inserted before
the HTTP Method or between the Method and Path, not anywhere else in the
request line.
Why does this strategy work? It seems strange that adding a space before the
122
path is required to evade Host-based censorship, and adding a long header before
the Host header is required to evade keyword-based censorship (although we note
this is sufficient on its own to evade header censorship). Our results suggest that a
64+ byte header name prevents the GFW from reading any further headers, which
explains why the longer header is enough to defeat header censorship. We believe
that the added space in the request line forces the GFW to look for the Host header
before it processes the path. If the strategy does not include the modified header,
or includes it after the Host header, the GFW inspects the path correctly, but if we
interfere with this search for the Host header, the GFW fails to check the contents
of the path.
Sandwich Strategy The last type of strategy we will analyze creates a sandwich
of headers around the Host header, and we find that if these headers are crafted in
the correct way, we can bypass keyword and header censorship in China and India.
We present an example in Figure 5.3c.
In China, we find the following constraints:
? The first header that appears in the packet must have at least 64 characters
in the header name.
? Enough data must be transferred in the headers such that some header?s value
starts at least 1280 bytes away from the start of the headers (first character
of header value is at least the 1281st byte after the request line)
? The last header must be at least 129 bytes total (including ending \r\n and
the separator ?:?)
123
? The Host header cannot be the first or last header.
This type of strategy works in both header- and path-based censorship, though
we note it is technically overkill to defeat header-based, as a single long (64+ byte)
header is enough. We also found that many sandwich strategies work in India, but
only because the header size induces segmentation.
5.4.3 External Validation
To demonstrate that this approach works without control of the destination
server, we trained our system against 12 censored domains (6 in Kazakhstan and
6 in India). We downloaded CitizenLab?s censorship test lists for India and Kaza-
khstan [123], and tested all the domains to identify which were censored, and then
chose 6 randomly for each country. We do not know the type or version of these
servers.
Our system successfully identified evasion strategies for every domain we tested.
Across these twelve experiments, we discovered 13 unique strategies, 7 of which do
not work on any of the other HTTP servers we tested. These experiments demon-
strate the generalizability of this technique to new application servers, and under-
score the importance of having an automated solution in this space.
Method Mangling Here, we showcase a surprising class of strategies we dis-
covered during this validation phase. This strategy works by simply corrupting
the HTTP method and replacing it with another string. Note that this is abso-
lutely not RFC-compliant; RFC 7231 (Section 4) specifically mentions that any
124
non-conforming method should be denied [111]. However, we find that some HTTP
servers, when confronted with an HTTP method they do not recognize, choose to
default to an HTTP GET request and respond as normal. We found this behavior
only on a subset of HTTP servers that hosted censored domains outside our control,
and we identified that nginx 1.10.3 responds to this query. The Apache and Nginx
server versions we controlled did not respond to these requests with invalid methods.
None of the censors we tested could censor this strategy, including for both
China?s Host-based and keyword-based censorship. This suggests that the censors
validate or require a valid HTTP Method before processing the rest of the request.
5.5 DNS Results
We trained our system against all three of China?s DNS injectors by using a
domain that is on all three blocklists (?google.sm?) to eight different open resolvers
(see Table 5.1). In prior work, researchers identified that these different DNS injec-
tors could be differentiated based on the fields set in the DNS responses. To avoid
ambiguity, we will refer each of the three injectors using the same terminology as
Anonymous et al. and identify them by idiosyncratic fields they set in their response
headers: Injector #1 (TTL=60, AA=1, DF=0), Injector #2 (AA=0, DF=1), and
Injector #3 (AA=0, DF=0, IPID=0) [33].
In total, we discovered 9 unique strategy types, 5 of which defeat all three
injectors simultaneously. After our training runs, we performed manual analysis of
the strategies to understand why they worked against each DNS injector. For each
125
Apache 2.4.X Nginx 1.X.X Country
CN-CN-
Family Strategy 6 18 29 43 13.414.116.119.0 IN KZ
H K
Case [HTTP:host:*]-changecase{lower}-? 3 3 3 3 3 3 3 3 - - 3 -
Sensitivity [HTTP:host:*]-changecase{upper}-? 3 3 3 3 3 3 3 3 - - 3 -
Four [HTTP:version:*]-insert{%09:middle:value:14}-? 3 3 - - - - - - 3 3 3 -
Element [HTTP:path:*]-insert{%09:end:value:1434}-?
3 3 - - - - - - - 3 3 -
Request [HTTP:path:*]-insert{1:start:value:507}-?
Line [HTTP:path:*]-insert{%20:end:value:1}-?
3 3 - - 3 3 3 3 - 3 3 -
[HTTP:path:*]-insert{g:end:value:1013}-?
[HTTP:path:*]-insert{%20:start:value:1}-?
[HTTP:host:*]-duplicate(replace{/:name:64} 3 3 - - 3 3 3 3 3 3 - -
(replace{/?ultrasurf:value},),)-?
Host
[HTTP:host:*]-duplicate(replace{a:name:64},)-? 3 3 3 3 3 3 3 3 3 - - -
Header
[HTTP:method:*]-insert{%09:end:value}-?
Shield 3 3 - - - - - - - - 3 3
[HTTP:host:*]-duplicate(replace{a:name:64},)-?
[HTTP:method:*]-insert{%0A:start:value:1}-?
3 3 - - 3 3 3 3 - - 3 3
[HTTP:host:*]-duplicate(replace{%2F:name:64},)-?
[HTTP:method:*]-insert{%20:end:value:1}-?
3 3 - - 3 3 3 3 3 3 - -
[HTTP:host:*]-duplicate(replace{%2F:name:64},)-?
[HTTP:path:*]-insert{%20:start:value:1}-?
3 3 - - 3 3 3 3 3 3 - -
[HTTP:host:*]-duplicate(replace{%C2%B0:name:32},)-?
[HTTP:host:*]-duplicate(insert{%0A:end:value:1},)-? 3 3 - - 3 3 3 3 - - 3 -
[HTTP:host:*]-duplicate(insert{%0A:random:name:1},)-? - - - - 3 3 3 3 - - 3 -
[HTTP:host:*]-duplicate(insert{%20%0A:end:name:1},)-? - - - - 3 3 3 3 - - 3 -
[HTTP:host:*]-insert{%09:end:name}-? 3 3 - - - - - - - - 3 3
[HTTP:host:*]-insert{%09:end:value:1}-? 3 3 3 3 - - - - - - - 3
[HTTP:host:*]-insert{%09:start:value:1}-? 3 3 3 3 - - - - - - - 3
Host
***[HTTP:host:*]-insert{%0A%0A:start:value:1}-? - - - - - - - - - - 3 3
Header
[HTTP:host:*]-insert{%0A%20:start:value:1}-? 3 3 - - - - - - - - 3 3
Whitespace
[HTTP:host:*]-insert{%0A:end:value:1}-? 3 3 - - 3 3 3 3 - - 3 -
[HTTP:host:*]-insert{%20%0A:start:name:1}-? - - - - 3 3 3 3 3 - 3 3
[HTTP:host:*]-insert{%20:end:name:1}-? 3 3 - - - - - - - - 3 3
[HTTP:host:*]-insert{%20:end:value:1}-? 3 3 3 3 3 3 3 3 - - - 3
***[HTTP:host:*]-insert{%20:start:name:1}-? - - - - - - - - 3 - 3 3
***[HTTP:host:*]-insert{%20:start:value:2}-? - - - - - - - - - - - -
[HTTP:path:*]-replace{/:value:1434}-? 3 3 3 3 3 3 3 3 - - 3 -
[HTTP:host:*]-insert{%20:start:value:1413}-? 3 3 3 3 3 3 3 3 - - 3 -
[HTTP:host:*]-insert{%20:start:value:1434}-? 3 3 3 3 3 3 3 3 - - 3 3
[HTTP:method:*]-duplicate(,replace{a:name:1407})-? 3 3 3 3 3 3 3 3 3 - 3 -
[HTTP:method:*]-insert{%09:end:value:2568}-? 3 3 - - - - - - - - 3 3
[HTTP:method:*]-insert{%0A:start:value:4336}-? - - - - 3 3 3 3 3 3 3 3
[HTTP:method:*]-insert{%20:end:value:1413}-? 3 3 - - 3 3 3 3 3 - 3 -
[HTTP:method:*]-insert{%20:end:value:1720}-? 3 3 - - 3 3 3 3 3 - 3 3
[HTTP:path:*]-duplicate(replace{a:name:1}
3 3 3 3 3 3 3 3 - - 3 -
(insert{a:start:value:1408},),)-?
Long [HTTP:path:*]-insert{%0D:end:value:1434}-? 3 3 - - - - - - 3 3 3 -
Request [HTTP:path:*]-insert{%20:end:value:1413}-? 3 3 - - 3 3 3 3 - - 3 -
[HTTP:path:*]-insert{%20:start:value:1}-?
[HTTP:path:*]-replace{3:value:511} 3 3 - - 3 3 3 3 3 3 - -
(insert{&:start:value},)-?
[HTTP:path:*]-insert{%23:end:value:1413}-? 3 3 - - 3 3 3 3 - - 3 -
[HTTP:path:*]-insert{%23:end:value:1}
{ } 3 3 - - 3 3 3 3 - - 3 -(insert %C3:end:value:470 ,)-?
[HTTP:path:*]-insert{%3F:end:value:1413}-? 3 3 3 3 3 3 3 3 - - 3 -
[HTTP:path:*]-insert{%3F:start:value:1413}-? 3 3 3 3 - - - - 3 - 3 -
[HTTP:path:*]-replace{/:value:1414}-? 3 3 3 3 3 3 3 3 - - 3 -
[HTTP:version:*]-insert{%20:end:value:1434}-? 3 3 - - 3 3 3 3 - - 3 3
[HTTP:version:*]-insert{%20:start:value:1434}-? 3 3 - - 3 3 3 3 - - 3 3
[HTTP:version:*]-insert{%25:middle:value:1434}-? 3 3 - - - - - - 3 3 3 3
[HTTP:version:*]-insert{%C2%81:end:value:773}-? 3 3 - - - - - - - - 3 3
[HTTP:version:*]-insert{%C3%8B:middle:value:717}-? 3 3 - - - - - - 3 3 3 3
Table 5.2: HTTP evasion strategies and where they succeed. A strategy is successful
against a nation if it evades that nation?s censor. A strategy is successful to a server
if it evades in at least one country and is accepted by the server. CN-H and CN-K
stand for the China Headers and China Keyword modes respectively. ?***? denotes
a strategy found against a live server we did not control; though these evade in some
of our tested countries, but do not receive responses from the servers we tested. This
table is continued i Table 5.3.
126
Apache 2.4.X Nginx 1.X.X Country
CN-CN-
Family Strategy 6 18 29 43 13.414.116.119.0 IN KZ
H K
***[HTTP:method:*]-duplicate(,)-? - - - - - - - - - - 3 3
Method
***[HTTP:method:*]-replace{%3A:value:1}-? - - - - - - - - 3 3 3 3
Mangling
***[HTTP:method:*]-replace{HTTP/1.1:value:1}-? - - - - - - - - 3 3 3 3
[HTTP:path:*]-duplicate(insert{3:middle:value:1004},
Path { } 3 3 3 3 3 3 3 3 - 3 3 -replace &ultrasurf:value )-?
Confusion
[HTTP:path:*]-insert{%3F:start:value:1}-? 3 3 3 3 - - - - 3 - 3 -
[HTTP:method:*]-insert{%09:end:value:1}-? 3 3 - - - - - - - - 3 3
***[HTTP:method:*]-insert{%09:start:value:1}-? - - - - - - - - - - 3 3
[HTTP:method:*]-insert{%0A:start:value:1}-? 3 3 - - 3 3 3 3 - - 3 3
[HTTP:method:*]-insert{%0B:end:value:1}-? 3 3 - - - - - - - - 3 3
[HTTP:method:*]-insert{%0D:end:value:2}-? 3 3 - - - - - - 3 3 3 3
[HTTP:path:*]-insert{%09:end:value:1}-? 3 3 - - - - - - - - 3 -
[HTTP:path:*]-insert{%09:start:value:1}-? 3 3 - - - - - - 3 - 3 -
Request
[HTTP:path:*]-insert{%0C:start:value:1}-? 3 3 - - - - - - 3 - 3 -
Line
[HTTP:path:*]-insert{%0D:start:value:1}-? 3 3 - - - - - - 3 3 3 -
Whitespace
[HTTP:path:*]-insert{%20:end:value:1}-? 3 3 - - 3 3 3 3 - - 3 -
[HTTP:path:*]-insert{%20:start:value:1}-? - - - - - - - - 3 - - -
[HTTP:version:*]-insert{%0A%09%0A%09:end:value:1}-? - - - - 3 3 3 3 - - 3 3
[HTTP:version:*]-insert{%0A%09:end:value:1}-? - - - - 3 3 3 3 - - - 3
[HTTP:version:*]-insert{%0A%20%0A%20:end:value:1}-? - - - - 3 3 3 3 - - 3 3
[HTTP:version:*]-insert{%20%0A%09:end:value:1}-? - - - - 3 3 3 3 - - 3 3
[HTTP:version:*]-insert{%20:end:value:1}-? 3 3 - - 3 3 3 3 - - 3 -
[HTTP:host:*]-duplicate(replace{%C3%97:name:596},
3 3 - - - - - - 3 3 3 3
insert{%20:end:name:786})-?
[HTTP:host:*]-replace{%5E:name:926}
Sandwich (duplicate(duplicate(,replace{host:name:1} 3 3 3 3 3 3 3 3 3 3 - 3
Strategy (insert{%20:start:value:3238},)),),)-?
[HTTP:host:*]-replace{%C3%97:name:1358}
(duplicate(duplicate(,replace{host:name:1} 3 3 - - 3 3 3 3 3 3 3 3
(insert{%20:end:value},)),),)-?
[HTTP:host:*]-replace{%C3%97:name:1371}
3 3 - - 3 3 3 3 3 3 3 -
(duplicate(duplicate(,replace{host:name:1}),),)-?
[HTTP:host:*]-insert{%20:end:value:4081}
(duplicate(duplicate(,replace{a:name:1}), 3 3 3 3 3 3 3 3 - 3 - 3
insert{%09:start:name:3238}),)-?
[HTTP:host:*]-insert{%20:end:value:4081}
(duplicate(duplicate(insert{%09:start:name:3238},), 3 3 - - 3 3 3 3 - 3 - 3
replace{a:name:1}),)-?
[HTTP:host:*]-replace{PUT:name:423}
{ } 3 3 3 3 3 3 3 3 3 3 3 -(duplicate(duplicate(,replace host:name ),),)-?
Version [HTTP:version:*]-duplicate-? 3 3 - - - - - - - - 3 -
Mangling [HTTP:version:*]-replace{OPTIONS:value:1}-? 3 3 - - - - - - 3 3 3 -
Table 5.3: Continuation of Table 5.2. A strategy is successful against a nation if
it evades that nation?s censor. A strategy is successful to a server if it evades in
at least one country and is accepted by the server. CN-H and CN-K stand for the
China Headers and China Keyword modes respectively. ?***? denotes a strategy
found against a live server we did not control; though these evade in some of our
tested countries, but do not receive responses from the servers we tested.
127
Strategy Family Strategy CF G Q9 OD CB CS DW V
[DNS:*:*]-tamper{DNS:nscount:replace:1} 3 - - 3 3 - - -
Elevated Count;
(tamper{DNS:z:replace:1}
ZBit truncated
(tamper{DNS:tc:replace:1},),)-?
Elevated Count [DNSQR:qname:*]-tamper{DNS:qdcount:replace:2}-? 3 - - - - - - -
Long Secondary [DNSQR:qclass:]-tamper{DNS:ancount:replace:98}-? 3 - - 3 - - - -
query; Elevated Count [DNSQR:qtype:]-replace{%C3%95:name:262}-?
Long Secondary Query [DNSQR:qname:*]-duplicate(,replace{%C2%91:name:957})-? - - 3 - - 3 3 3
[DNS:*:*]-tamper{DNS:qd:compress}
Compression 3 3 - - - - - -
(tamper{DNS:qdcount:replace:2},)-?
Table 5.4: Summary of the five DNS strategy families we discover that defeat all
three DNS injectors simultaneously, and which DNS resolvers respond to them:
Cloudflare (CF), Google (G), Quad9 (Q9), OpenDNS (OD), CleanBrowsing (CB),
ComodoSecure (CS), Verisign (V), and DNS.Watch (DW). Our system successfully
identified strategies for every DNS resolver, and also identified four more unique
variants to these strategies that only disabled a subset of the injectors.
of the success rates below, we test each strategy 1000 times. See Table 5.4 for the
full breakdown of results.
Elevated Count Fields The simplest family of strategy types we discovered
works by simply increasing one of the count fields (qdcount, ancount, arcount, or
nscount) by 1. All four of these strategies are in violation of the RFC: the request
only contains 1 Question Record and 0 Answer, Name Server, or Additional records.
Surprisingly, each of the GFW?s injectors and open resolvers respond differently
depending on which field we modify.
Elevating the qdcount field to 2 evades all three GFW injectors with 100%
success rate, but only Cloudflare will respond to the query. Elevating the ancount,
arcount, or nscount evade only DNS injector 2 and 3. Cloudflare responds to all of
these queries, OpenDNS responds only to elevated ancount and nscount, and none
of the other resolvers responded to any of them.
DNS Compression The next strategy we discover works by performing DNS
compression on the DNS query and then increasing the qdcount field to 2. DNS
compression (defined by RFC 1035 [8]) works by splitting the DNS query across
128
multiple records at the separator. This strategy is related to the Elevated Count
Fields strategies, but uses DNS compression to increase the number of DNS Ques-
tion Records in the packet to actually be 2. Technically, since the domain is com-
pressed across multiple DNS question records, the request has two DNS Question
Records attached to it, even though they only comprise one DNS Question. This
strategy evades all three DNS injectors with 100% reliability, but is only supported
by Google and Cloudflare. We note that DNS compression alone does not evade
censorship, it must be paired with the elevated qdcount.
Truncated-Reserved The next strategy we discover works by increasing the
nscount to 1 (which evades GFW injector #2 and #3), setting the reserved z field
to 1, and setting the tc (truncated) bit to 1. The combination of the truncated field
and reserved field both being set to 1 evades injector #1 with approximately 50%
success rate. Therefore, if this strategy is used with a domain blocked by injector
#2 or #3, it will evade with 100% reliability, but if the domain is also included
on injector #1?s blocklist, it will only evade with 50% reliability. Frankly, we do
not understand the cause of why this strategy works only 50% of the time against
injector #1.
Multibyte Long Query Injection The next strategy type we discover relies on
injecting new text into the requests; specifically, it creates a second DNS Question
Record after the forbidden query containing a request for a domain filled with 2-
byte-wide multibyte UTF-8 characters. Surprisingly, all three of the GFW?s injectors
have problems handling requests that contain multibyte characters, but a different
129
number of multibyte characters is required to cause trouble for each injector. A
strategy will evade injector #1 if it inserts a new DNS Question Record containing
at least 241 2-byte-wide multibyte characters. A strategy will also evade injector
#3 with at least 482 multibyte characters; any less, and the strategy fails to evade
#3. We note that the required number to evade injector #3 is exactly double that
required to evade injector #1. Injector #2 can also be evaded with a 36% success
rate with 721 2-byte-wide multibyte characters; any less than 721 and the strategy
fails to evade #2. This success rate can be increased to 97% with at least 1,334
multibyte characters. Interestingly, not all multibyte characters work: for all three
injectors, only the characters within the range of %C[2-F]%[80-BF] succeed, and
only 2-byte-wide characters work; 3-byte-wide characters do not.
Note that all of these requests are not RFC compliant. According to RFC
1035 (Section 2.3.4), the limit to names is 255 bytes; in all the above cases, the
DNS Question Record contains many more bytes than this. Different DNS resolvers
have different policies as to if they respond to these queries. Quad9, Comodo, and
DNS.Watch all respond to these queries normally, while Verisign responds only to
25% of the queries (we suspect this is due to load balancing between resolvers that
may or may not be able to handle the queries). None of the other resolvers respond
to these requests.
Multibyte and ARCount Our system also identified a combination strategy of
the above multibyte strategy and elevated arcount; this strategy creates a second
DNS Question Record that contains 242 multibyte characters and sets the arcount
130
field to 1. This strategy exemplifies how the different injectors can be defeated
individually; by setting the arcount field, the strategy bypasses injector #2 and
injector #3, and using 242 multibyte characters bypasses injector #1. The benefit
of this re-combination of the above strategies is that it permits different resolvers
to respond: by injecting fewer characters, Cloudflare and OpenDNS now respond to
the query, but Quad9, Comodo, and DNS.Watch will not respond to the elevated
arcount.
5.6 Discussion
In this section, we discuss our results, and what we can learn about the nation-
state censors.
How can censors defend against these attacks? Censors could read this work
and try to patch each individual issue we identify; however, we do not think censors
will be able to easily (or cheaply) defend against all these attacks. Our results point
to a broader trend about protocol compliance in censoring middleboxes. In order to
effectively defend against these attacks, censors must always be more permissive in
inputs they tolerate than servers on the other side of the connection. In cases where
the censor was significantly more RFC-compliant (such as in India), our system had
the easiest time discovering ways to evade censorship.
Even beyond censors needing to be more permissive than servers, to effectively
censor, the censor must also maintain at least as much state as servers on the
other side of the connection. If a server buffers more bytes than the censor does, a
131
client can simply make the request longer until the forbidden keyword or header is
outside the censors buffer, as we?ve seen in China. This is good news for evaders, as
addressing this issue completely will likely require the censors to buffer vastly more
data than they do currently. These trends hold across both HTTP and DNS.
What HTTP strategies work most often, and what do censors most com-
monly do wrong? The most common strategy we find by far is various forms
of injecting whitespace, in both the headers and the request line. In fact, 53 of our
77 strategies work by inserting some form of whitespace, and 38 of which require no
further modifications. The HTTP RFCs have many rules about where whitespace
should be allowed, ignored, or disallowed, and we identified many cases in which
the censor processes whitespace where it should not, or fails to process it where
it should. Another common failure mode we observed from the censor was being
unable to process a large request from a client, though each censor we studied was
affected for a different reason.
What class of strategies are most broadly applicable across server versions
and resolvers? For HTTP, we again find that inserting whitespace in different
places around the request line or header value. The RFCs mention that certain types
of whitespace should be ignored for robustness, so strategies that inject whitespace
in these locations are most commonly versatile across server versions. We find
that many of the server versions we tested often accept too much whitespace for
robustness?s sake, despite what the RFC says.
For DNS, we found little overlap between the queries accepted between the
132
different resolvers. Our most broadly applicable strategies only worked on half
of the resolvers we tested, and most worked across even less. In general, lack of
generalizability for DNS strategies does not affect usability the same way for HTTP.
The reason for this is that if a user wishes to use our strategies to perform forbidden
DNS lookups, the user can do all of those lookups to the same resolver. Over HTTP,
by contrast, the evasion strategy must be compatible with the server on the other
end of the connection, and every site the user visits may be using a different server
version.
Is any one location in the HTTP or DNS header more prone to having
viable evasion strategies? Overall, we found strategies for every major com-
ponent of the HTTP request: 31 strategies acted on the Host header, 16 acted on
the Method, 22 acted on the Path, and 13 acted on the Version. Note that these
numbers do not add to 77, as there is overlap in strategies that act on multiple parts
of the request. In DNS, our strategies were also fairly well distributed throughout
the DNS header, and only a few fields were never co-opted by a strategy for evasion.
How does China?s Host header censorship compare to keyword censor-
ship? In general, we find that almost all the strategies that evade keyword-based
censorship in China also evade host-based censorship (17 out of 22). This interest-
ing finding suggests that in order to correctly censor keywords, the GFW must be
able to read the Host header, or read all the headers without problems and find no
host header. Our results also suggest that the reverse is not true: no strategies that
affected only the Host header were able to evade keyword-based censorship. We
133
also find that more strategies can evade host-based censorship by simply injecting
whitespace, compared to keyword censorship.
How do China?s three DNS injectors compare to one another? We find
differences between all three injectors that affects how well our strategies work.
Injector #1 was the most permissive to fields being incorrect in the DNS header,
and therefore had fewer strategies work; for example, Injector #1 still correctly
processed forbidden DNS queries if the arcount, ancount, or the nscount fields
were non-zero. Injector #2 had the most idiosyncratic responses to multibyte UTF
characters: injecting between 721 and 1,333 multibyte characters caused Injector
#2 to fail at least 33% of the time (and the failure rate increased as the number of
inserted characters increased); after 1,334 characters, Injector #2 fails 100% of the
time. Every strategy that evaded Injector #2 also evaded Injector #3, though we
discover that Injector #3 has different limits to the number of multibyte characters
it will tolerate in the DNS Query Records (a limit of 482). Overall, our results
further emphasize that these injectors are truly separate, each with their own block
list and weaknesses.
How generalizable is this technique to the future? We believe this technique
should generalize well to other protocols. Many application-layer protocols fit the
abstraction we defined for this chapter (with smaller, discrete components that
compose within a larger message). For example, TLS records are comprised of
fixed static fields, and dynamic TLS Messages and TLS Extensions. We leave the
implementation of this to future work.
134
5.7 Ethical Considerations
We design our experiments to limit the potential impact to other hosts and
the risk to real users. This work does not involve human subjects, and therefore falls
outside the purview of our Institutional Review Board; still, we follow best practices
laid out by prior censorship studies [1, 52].
We perform all of our system training exclusively from vantage points we
control, and our work does not require recruiting users (unwitting or not [90]).
Our system does not spoof IP addresses or impersonate other machines, and our
interactions with the censors should have had no impact on any other users. To limit
the effect of our training on the network, we evaluate strategies serially (and with
a small sleep for DNS), which limits how quickly our system can generate traffic.
This is important, as some of our training runs that involved hosts outside our
control (such as with open DNS resolvers), and we believe our impact to these hosts
is minimal. For example, our DNS training had a network load of approximately
11kbps, which should be a negligible volume of traffic for the size of the networks
we test with.
5.8 Conclusion
In this chapter, we present the first techniques to automate the discovery of
new censorship evasion techniques purely in the application layer. The approach
is applicable to HTTP and DNS, and we trained our system against three distinct
135
HTTP and DNS censors across China, India, and Kazakhstan. In total, we discover
9 unique strategies for DNS and 77 unique evasion strategies for HTTP, which
exploit differences between how the censor and destination server process a request.
All of these evasion strategies require only application-layer modifications, making
them easier to incorporate into applications and deploy.
Taken collectively with the Chapter 3 and Chapter 4, I have demonstrated
that it is possible to render middleboxes ineffective at implementing their policy
(incapable of correctly censoring traffic when they should) from the client-side and
server-side and via both TCP/IP and application-layer packet manipulations.
136
Chapter 6: Censorship-in-Depth: Iran
Through the years of implementing, evaluating, and applying Geneva, I have
observed that censoring nation-states have deployed new, more sophisticated censor-
ship infrastructures, with multiple middleboxes running in parallel. This provided
a unique opportunity to evaluate whether my thesis applies even as censors evolve:
that is, whether Geneva is able to quickly and effectively render new forms of cen-
sorship ineffective. In this chapter and the next, I evaluate this in the context of
new forms of censorship in Iran and China, respectively.
Censoring nation-states employ defense-in-depth, layering multiple orthogo-
nal censorship mechanisms to make it more difficult to communicate with certain
destinations or via certain protocols. Typically, such ?censorship-in-depth? involves
wholly different systems, such as combining lemon DNS responses [38,41], IP block-
ing [1, 124], and TLS SNI blocking [2]. As a result, each form of censorship targets
different packets, and can often be studied and defeated in isolation.
Far less common are censorship mechanisms that directly compose with one
another, and target the same packets. In such situations, it is more difficult to
study censorship because the two mechanisms? side effects can be conflated, and
it is more difficult to evade censorship because one must evade both mechanisms
137
simultaneously.
In early 2020, Iran launched such a form of censorship-in-depth by deploying
their protocol filter. A protocol filter only allows a small list of protocols to be used,
and censors protocols it forbids. A similar system in Iran was first reported on by
Aryan et al. [55] in 2013, but to the best of our knowledge was not used for years
until it was turned back on in 2020. We are also unfamiliar with any work detailing
how Iran?s protocol filter works or how to evade it?underscoring the difficulties
inherent in measuring and circumventing censorship-in-depth.
In this chapter, I present a detailed analysis of Iran?s protocol filter: how it
works, its limitations, and how it can be defeated. Even though the protocol filter
operates concurrently with and on the same traffic as Iran?s standard deep packet
inspection (DPI)-based censorship, we demonstrate that it is possible to engage
with each censoring mechanism in isolation. That is, we show how to evade the
filter only, the regular censorship system only, and both in tandem. We report on
the three evasion techniques Geneva discovered, as well as the results from our follow-
on experiments that expose what the filter targets and what protocol fingerprints it
uses.
The rest of the chapter is organized as follows. ?6.1 reviews prior work in
measuring Iranian censorship. ?6.2 describes our methodology and vantage points
we use for our experiments. ?6.3 presents our analysis of the protocol filter. ?6.4
discusses how the protocol filter can be evaded. Finally, ?6.5 concludes.
138
6.1 Iranian Censorship Background
Iranian censorship has been studied in broader efforts to measure global cen-
sorship [52, 125?129]. There have been fewer studies specific to how Iran?s censor-
ship operates. Notably, Anderson proposed a technique for detecting censorship via
throttling in Iran [130].
The most closely related study to this chapter was a 2013 study by Aryan et
al. [55]. They observed throttling between two vantage points that affected SSH,
custom obfuscated SSH, and custom obfuscated HTTP. Since HTTP and HTTPS
were unaffected, the authors hypothesized that Iran had deployed a protocol filter
and were throttling connections that did not match HTTP and HTTPS. This be-
havior disappeared shortly after Iran?s June 2013 election, and to the best of our
knowledge, there have been no further reports on protocol filtering.
The censorship system observed by Aryan et al. in 2013 differs significantly
from what we observe in 2020. First, the censorship mechanism is different; the
prior system throttled forbidden protocols, but we observe outright dropping of all
packets for some period of time. Second, the affected ports appear to be different;
Aryan et al. observed filtering of SSH but not HTTP, but we find this no longer to
be true (we find more nuanced behavior, and test a wider set of protocols). We are
the first to delve deeply into how the protocol filter works and how to evade it, and
thus cannot compare our results directly.
139
6.2 Methodology
We performed our experiments from 6 vantage points geographically dispersed
within Iran: Fars, Isfahan, Khorasan, Razavi, Tehran, and Zanjan. These contain
a mix of both residential and business networks.
In our experiments to measure the protocol filter (?6.3), we performed active
measurements from these vantage points to servers we controlled outside of Iran, in
Amazon EC2, Microsoft Azure, and DigitalOcean (located geographically in Japan,
Ireland, the United States, Australia, and India). We find no significant difference
in the behavior of the protocol filter across any of our vantage points or external
servers, nor did we observe any change in the behavior of the filter during the course
of our experiments to the time of writing.
To develop new evasion strategies (?6.4), we used Geneva and trained it from
the client-side and the server-side to discover ways to defeat the protocol-filter in
isolation.
6.3 Protocol Filter
In this section, we explore how Iran?s protocol filter operates and whom it
affects, and we detail precisely what properties it looks for when filtering DNS,
HTTP, and HTTPS traffic.
140
6.3.1 How Iran?s Protocol Filter Works
We performed active measurements to answer the following questions about
the mechanics of the protocol filter:
How does the protocol filter censor forbidden protocols? Once a connection
is observed to be communicating with a disallowed protocol, the protocol filter
censors the connection. The filter censors connections by dropping all packets from
the client in the flow1. The protocol filter can be triggered manually by sending
any data stream on a monitored port that does not resemble a permitted protocol.
Packets within the censored flow from the server are unaffected: the client still
receives all of the packets sent by the server even after the protocol filter has been
tripped. However, because the client cannot acknowledge or respond to any data,
the connection is effectively censored.
Which ports and protocols does the filter monitor? From our vantage
points, we made connections to servers we controlled outside of Iran and repeatedly
sent messages containing just the string ?test? (a payload that is not compliant with
any of the protocols we tested) between sleeps for every possible destination port
value (0-65535). Connections that time out identify which ports are likely affected
by the filter. We repeated this experiment three times to validate our results.
We find that Iran?s protocol filter affects only TCP traffic, and only on ports
53 (commonly DNS), 80 (commonly HTTP), and 443 (commonly HTTPS). Traffic
1We define ?flow? to refer to the unique four-tuple of source and destination IP addresses and
ports.
141
Protocol Standard
Filter Censorship
Permitted Internet
Client
Blackhole
Not permitted
Figure 6.1: Iran?s layered censorship system, employing defense in depth. Note that
the order of censorship systems is unknown; this is simply a graphical depiction.
sent on any other port is not filtered (and is therefore also not subject to Iran?s
standard censorship, which only operates over these same ports).
We then sent well-formatted messages of a variety of protocols (DNS, HTTP,
HTTPS, SMTP, and SSH) on these ports. Of these, we find that the filter permits
only DNS, HTTP, and HTTPS traffic. However, none of these are bound to their
standard ports: the filter matches all three protocols on any of the three ports.
How many packets does the filter monitor? To answer this question, we
sent multiple packets with non-protocol data (e.g., ?test?) before well-formatted
allowed protocol data. We determined that the filter monitors the first two data-
carrying packets from the client at the start of a connection. If either of those two
packets matches a protocol fingerprint, the flow is unharmed; if no packet does, the
second packet and rest of the flow are dropped.
How long does the filter censor an offending flow? To test this, we in-
142
tentionally tripped the protocol filter, waited an interval of time, and then sent
non-data-carrying packets in the censored flow. Recall that once we trigger the fil-
ter, these packets will be dropped if the filter is still censoring our flow. We repeat
this experiment with time intervals from 1 second to 90 seconds, each time using
different source ports to avoid experiments conflicting with one another.
We find that, once tripped, the filter will continue to drop the offending flow?s
network traffic for 60 seconds, but each time an additional packet is sent in a flow, the
60 second timer resets. This means that, in practice, because TCP will retransmit
packets that are not acknowledged, an offending flow will be affected by the filter
for much longer than 60 seconds.
Is the protocol filter bidirectional? ?Bidirectional? censorship systems do not
differentiate between the client being the host inside or outside the nation-state.
Iran?s standard censorship system operates bidirectionally; it can be triggered by
making requests from outside the country to servers inside the country (or vice
versa). As a result, bidirectional censorship is often easier for researchers to study.
However, we find that the filter is not bidirectional: it only affects connections
where the client is inside Iran. The server also receives almost no indication cen-
sorship has taken place. Recall that packets from the server are unaffected: unlike
with the Great Firewall of China, which sends RSTs in both directions [24], Iran?s
protocol filter only affects the packets sent by the client. This makes it difficult to
identify and study the protocol filter without vantage points within Iran.
Can the protocol filter reassemble TCP segments? We repeatedly made
143
#IPs Provider
1,453 Amazon Technologies Inc.
565 Cloudflare, Inc.
229 Akamai Technologies, Inc.
171 Amazon.com, Inc.
167 Fastly
146 DigitalOcean, LLC
97 Amazon Data Services Limited
92 RIPE Network Coordination Centre
64 Linode
60 Amazon Data Services
Table 6.1: Top 10 providers for affected IP addresses.
valid but segmented DNS, HTTP, and HTTPS requests on filtered ports2. We find
that segmenting our requests too many times incurs censorship from the protocol
filter, indicating that, like Iran?s regular censorship infrastructure [1,2,55], the filter
is incapable of reassembling TCP segments. We also note that the filter also does
not check the checksums of the packets it processes.
6.3.2 Whom the Filter Is Applied To
During our experiments, we noticed that the protocol filter is not applied to
all server IP addresses. We find that whether or not an IP address is filtered is
consistent between our vantage points; we could not identify any destination IP
addresses for which the protocol filter was active from one vantage but inactive
from another.
To identify which IP addresses are affected by the filter, we tested the effects
of the filter on the Alexa top-20,000 most popular websites. To avoid the effects of
2We disabled Nagle?s algorithm for this experiment to avoid spurious segment reassembly in-
terfering with our results.
144
#IPs Provider
4,541 Cloudflare, Inc.
1,465 Unknown
657 Google, LLC
657 Alisoft
580 Amazon Technologies, Inc.
544 Asia Pacific NIC
537 RIPE Network Coordination Centre
287 Alibaba.com LLC
277 Amazon.com, Inc.
253 Akamai Technologies, Inc.
Table 6.2: Top 10 providers for unaffected IP addresses
DNS censorship or requesting IP addresses inside of Iran (as the requests would not
cross the filter), we used dig outside of Iran to get IP addresses for all 20,000.
Inside of Iran, we set up an experiment with two conditions. Our experiment
The first condition was a control: we made normal GET requests to all 20,000 IP
addresses and recorded the success or failure of each request. The second condition
tested for the filter: we requested all 20,000 IP addresses again, this time sending
?G?, ?ET?, and ?/? in separate messages3. IP addresses that respond in the first
condition but time out in the second condition are likely affected by the protocol
filter. We perform this experiment ten times to validate the results.
Over all ten experiments, 3,595 IP addresses (17.9%) tripped the filter at
least eight times. Of those, 3,499 were affected all ten times (17.4%), and 278
(1.4%) IP addresses were affected 3?7 times. Tables 6.1 and 6.2 show the number of
IP addresses per provider that were affected and unaffected by the protocol filter,
respectively. Overall, we find that IP address provider is not correlated with whether
the filter affects an IP address or not, but some prefixes are affected significantly
3We performed this experiment over raw sockets, with Nagle?s algorithm again disabled.
145
more heavily than others.
Case Study: Cloudflare We explore how Cloudflare in particular is affected by
Iran?s protocol filter, as Cloudflare hosts the most IP addresses from our dataset.
Cloudflare makes its entire list of IP addresses publicly available4. Many of these
prefixes are prohibitively large; instead of testing every IP address in each prefix,
we sampled 256 IP addresses at random from each prefix to test. We performed a
similar experiment to the one above: given a Cloudflare IP address, we made two
requests to it (first normally, then segmented); IP addresses that respond in the first
condition but time out in the second condition are likely affected. We repeated this
experiment five times for each prefix.
We found that only two of Cloudflare?s prefixes contained IP addresses that are
affected by the filter: 104.18.0.0/16 and 104.31.82.0/24. All of the IP addresses
we tested in both of these prefixes were affected by the filter, but none of the IP
addresses from the other prefixes were. It is unclear why these prefixes are targeted
specifically. We were unable to identify any commonality between the sites hosted
on these prefixes compared to unaffected prefixes.
We also performed traceroutes to a sample of the affected and unaffected
IP addresses owned by Cloudflare. We were unable to identify consistent routing
differences between them. At this time, it is not clear why the protocol filter affects
the IP addresses it does.
4https://www.cloudflare.com/ips/
146
6.3.3 Protocol Fingerprints
By repeatedly, manually tweaking the payloads of permitted protocols and
observing what gets censored and what does not, we reverse engineered the filter?s
fingerprints for each protocol. Knowing the fingerprints can be a powerful tool for
evaders: recall that the filter only monitors the first two data-carrying packets, and
thus sending compliant packets at the start of a flow can allow all subsequent packets
to bypass the filter. Since the filter will match any of these fingerprints on all three
ports, any fingerprint can be used on any protocol-filtered ports.
DNS Fingerprint To match the protocol filter?s fingerprint for DNS-over-TCP,
the following conditions must be met:
1. The TCP payload must be at least 12 bytes long.
2. The query/response (qr) field must be 0.
3. The question count must be less than 15.
4. The answer count must be 0.
5. The structure of the TCP payload must be a valid DNS-over-UDP header, not
a DNS-over-TCP header.
For example, the following message would be permitted by the DNS fingerprint:
\x00\x00\x01\x00\x00\x01
\x00\x00\x00\x00\x00\x00
147
The last requirement appears to be a bug in the implementation of the DNS
fingerprint. Recall that the DNS-over-UDP header is slightly different than DNS-
over-TCP?s; over TCP, the DNS header includes a length field [131]. Since the filter
is only active over TCP but does not take the extra field into account, it will never
match a legitimate DNS-over-TCP packet. We believe the reason this oversight has
not caused a significant issue is because DNS-over-TCP generally only requires a
single data-carrying packet from the client, but Iran?s protocol filter only begins
dropping packets on the second data-carrying packet.
However, the faulty DNS fingerprint does still pose a problem: clients can
reuse DNS-over-TCP connections [132]. In such cases, the filter would allow the
first query, but block any subsequent queries made within 60 seconds.
HTTP Fingerprint To match the HTTP fingerprint, the following conditions
must be met:
1. The TCP payload must be at least 8 bytes long.
2. The payload must start with one of the following HTTP verbs: GET, POST,
HEAD, CONNECT, OPTIONS, DELETE, or PUT.
3. The HTTP verb must be followed by one space.
Note that two HTTP verbs are not supported by the protocol filter: PATCH
and TRACE. Any website in the affected IP address space that uses either of these
would be censored.
For example, a message permitted by the HTTP fingerprint is: GET testing123.
148
HTTPS Fingerprint To match the HTTPS fingerprint, the following conditions
must be met.
1. The TCP payload must be at least 41 bytes long: 5 bytes for the TLS header,
36 bytes for the TLS Client Hello.
2. The length field of the TLS Header must correctly describe the length of the
Client Hello.
3. The TLS version header (bytes 2 and 3 of the TCP payload) must be TLS 1.0
(\x03\x01), 1.1 (\x03\x02), or 1.2 (\x03\x03).
The last requirement makes no practical difference; real TLS 1.x Client Hellos
all have TLS 1.0 in this field.
Also, the last requirement again appears to be an error in the design of the
protocol filter. It allows TLS versions 1.0, 1.1, and 1.2 to be declared, but this
version field is not used accurately in practice: TLS servers must accept any two
byte value in this field so long as the first byte is \x03 [133, Appendix E].
The HTTPS fingerprint does not filter specific HTTPS connections or appli-
cations; it simply enforces that generic TLS is used. As a result, censorship evasion
tools that use TLS will likely be unaffected by the protocol filter at this time, as
they will fulfill the above fingerprint requirements by default. This also means the
protocol filter would spare more secure DNS transport protocols, such as DNS-over-
HTTPS and DNS-over-TLS, if those protocols were used over one of the affected
ports.
After the first 5 bytes of the packet (the type, version, and the length), the
149
protocol filter does not check any of the remaining contents of the Client Hello. So
long as the first 5 bytes match the fingerprint and the packet is of the proper length,
the rest of the packet can comprise arbitrary data and bypass the filter.
An example message that matches the HTTPS fingerprint is: \x16\x03\x01\x02\x00
followed by 512 null bytes, where \x16 is the indication of a handshake, \x03\x01
is TLS version (1.0), and \x02\x00 is the length of the Client Hello (512 bytes).
Using Fingerprints We find that any of the fingerprints can be used to evade
the filter. This presents an opportunity for censorship evasion tool developers: by
sending any fingerprint at the start of a connection (or injecting it as an ?insertion
packet? [1, 24, 134]), we can ensure the filter will permit the rest of the flow, re-
gardless of the actual protocol used. As we will see in the next section, Geneva also
independently discovers strategies to inject innocuous fingerprints from the client-
side.
6.4 Evading the Protocol Filter
In this section, we demonstrate how to evade Iran?s protocol filter. We begin
by demonstrating that known evasion strategies developed against Iran?s standard
censorship infrastructure do not apply to the protocol filter.
6.4.1 Old Strategies Do Not Apply
We first explored whether we could apply the same strategies that work against
Iran?s regular censorship system (affecting HTTP and HTTPS) to evade the protocol
150
filter.5 The only functioning strategy in Iran we are aware of is simple segmentation:
simply splitting the censored request into multiple packets to take advantage of the
censor?s inability to reassemble TCP segments. We find that no other strategies
identified by Geneva or prior work defeats Iran?s censorship system.
Unfortunately, the effectiveness of the segmentation strategy depends on its
implementation: it does not necessarily generalize, and at worst, can be counterpro-
ductive to evasion. In the worse case, if the HTTP request is segmented at a byte
index less than 8, although the regular HTTP censor can no longer recognize it,
the first packet will not match the protocol filter fingerprints and incur censorship.
However, if the HTTP request is segmented such that the first segment fulfills the
requirements of the HTTP fingerprint (it is at least 8 bytes long and is well-formed),
and the Host: header is split across the second segment, the strategy can defeat
both the protocol filter and the HTTP censor.
Importantly (and as we will see throughout this section), merely evading the
regular censorship system does not necessarily imply defeating the protocol filter.
6.4.2 Evolving New Strategies
To identify new strategies to defeat the protocol filter, we leveraged Geneva, an
open-source genetic algorithm designed to evolve packet-manipulation strategies to
evade censorship [1]. Unlike most anti-censorship systems, Geneva does not require
deployment at both ends of the connection: it runs exclusively at one side (client
5Contrary to the 2013 findings by Aryan et al. [55], from our vantage points, we find that Iran?s
standard censorship infrastructure no longer targets DNS-over-TCP at all.
151
or server) and defeats censorship by manipulating the packet stream to confuse the
censor without impacting the underlying connection. Geneva?s packet manipulation
strategies are expressed in a domain-specific language [1]; we describe each in plain
English, but to allow us to unambiguously express strategies, we also present them
using Geneva?s language.
Geneva evaluates strategies with a fitness function, which returns a numeric
score that captures how successful a given strategy is at evading censorship. Strate-
gies that receive a higher score are more likely to survive and pass their ?genetic
code? to the next generation. Geneva tries to perform some forbidden action while
a strategy manipulates the packet sequence: if the forbidden action succeeds, the
fitness function rewards the strategy; if it fails, the strategy is punished. To apply
Geneva to the protocol filter, we wrote a custom fitness function. Our custom fitness
function connected to a vantage point outside of Iran and repeatedly sent messages
to intentionally trip the filter. As Geneva allows for new fitness functions to be added
dynamically, this required no changes to Geneva itself. Using this fitness function,
we can test and train strategies directly against the filter. Note that this fitness
function does not try to trigger the standard censorship system.
We deployed Geneva against the protocol filter with a single evolution from the
client-side. We follow the original training hyperparameters for Geneva and configure
Geneva with a population pool of 200 individuals and 50 generations. In under two
hours, it discovered three simple strategies that defeat it. All the strategies discussed
herein have a 100% success rate against the protocol filter.
152
6.4.3 Discovered Evasion Strategies
Strategy 22: Innocuous Fingerprint The simplest strategy Geneva identified
was to inject a PSH/ACK packet with a corrupt checksum and an innocuous HTTP
request as the payload immediately following the 3-way handshake. This trivially
serves to bypass the filter, as it matches the protocol fingerprints. However, because
the checksum is corrupt, the server will not accept this packet. There are other
variants of this strategy that ensure that the filter processes the packet but the
server does not, such as setting the TTL large enough to reach the censor but too
small to reach the server [1].
We note that we did not need to encode anything in Geneva for it to discover
this strategy; Geneva already has the capacity to replace the TCP payload with a
well-formed query for several protocols within its tamper primitive.
Strategy 22: Innocuous Fingerprint
[TCP:flags:PA]-duplicate(
tamper{TCP:load:replace:GET%20testing123}(
tamper{TCP:chksum:corrupt},),
),)-| \/
Strategy 23: Double FINs This strategy works by sending two additional packets
before the 3-way handshake starts: two empty packets with the FIN flag set. To the
server, the FIN packets are ignored, as they are not a part of an active connection,
but the filter processes them and causes it to ignore the rest of the connection. We
do not understand why this strategy works, though we hypothesize the FIN packets
trick the filter into thinking it has already missed the relevant data packets, causing
153
it to ignore the rest of the flow.
Strategy 23: Double FIN
[TCP:flags:S]-duplicate(
tamper{TCP:flags:replace:F}(
duplicate,),
)-| \/
Although Geneva discovers this strategy with two FIN packets, we find that
sending more than two FIN packets also works.
Strategy 24: Nine ACKs The final client-side strategy we present is stranger than
the first two: this strategy works by sending nine copies of the ACK packet during the
3-way handshake. This causes the filter to ignore the rest of the flow. This strategy
works 100% of the time, and does not affect the underlying TCP connection. We
hypothesize this works because the filter has some internal limit on the number of
packets it will process for a given flow.
Strategy 24: Nine ACKs
[TCP:flags:A]-duplicate(
duplicate(duplicate,duplicate),
duplicate(duplicate,duplicate(
duplicate(duplicate,),
))
)-|
This strategy does not require ACK packets to work: any combination of non-
data-carrying packets, including RSTs or SYNs, is also effective. The nine injected
packets also need not have the correct seq or ack numbers: the strategy defeats the
protocol filter regardless.
154
This strategy presents us with an opportunity to evade the protocol filter from
the server side. Server-side censorship evasion allows completely unmodified clients
to connect directly to a server while the server subverts censorship on behalf of the
clients [2].
Since Strategy 24 is effective with any set of TCP flags, if a server can in-
duce the client to send nine non-data-carrying packets before it sends its forbidden
request, we can defeat the protocol filter. We can accomplish this using a trick
from prior deployments of Geneva: by sending multiple SYN+ACK packets during the
three-way handshake with a corrupted ack number, we induce the client to respond
with multiple RST packets.
Strategy 25: Nine Induced RSTs, Server Side
[TCP:flags:SA]-duplicate(
tamper{TCP:ack:corrupt}(duplicate(
duplicate(duplicate,duplicate),
duplicate(duplicate,duplicate(
duplicate,))
),),
)-| \/
Strategy 25: Nine Induced RSTs This strategy sends nine corrupted SYN+ACKs,
followed by one unaltered SYN+ACK. This induces the client to send nine RST packets
with corrupted sequence numbers before sending its normal ACK, thereby evading
the protocol filter.
We note that all of these strategies defeat the protocol filter only, not the regular
censorship system that works in tandem. These allow us to bypass the filter and
study Iran?s existing DPI censorship system in isolation.
155
6.5 Conclusion
In 2020, Iran took the latest step in censorship-in-depth by deploying a pro-
tocol filter alongside their standard censorship infrastructure. In this chapter, I
have performed a deep investigation into Iran?s protocol filter. Using vantage points
within Iran and servers outside, we empirically demonstrated how the protocol fil-
ter works, what its fingerprints are, and to a lesser extent whom it filters. Also,
using Geneva [1, 2], I identified four ways to bypass the protocol filter?three from
client-side and one from server-side. My results collectively show that Iran?s two
censorship systems can still be studied in isolation, and bypassed together.
Iran has had a greater capacity for censorship than they have exercised in the
past, and the protocol filter can pose a threat to existing deployments of censorship-
evasion tools (VPNs, Tor, etc.). As the censorship arms race advances, we anticipate
censorship-in-depth to become increasingly common. In the next chapter, I will
show a second example of a censorship-in-depth censorship deployment, this time
in China, and will show that my thesis still holds.
156
Chapter 7: Censorship-in-Depth: China?s
SNI Censorship
As shown in the previous chapter, censorship-in-depth deployments can com-
plicate censorship measurements and censorship evasion. In this chapter, I will
showcase a second example of censorship-in-depth, this time in China, where I dis-
covered that the GFW was using two independent middleboxes running in parallel
to censor HTTPS connections with SNI.
As much of the web transitions to HTTPS, nation-state network censors have
less information to base their decisions of whether to block or tear down a connec-
tion. Whereas HTTP permitted deep packet inspection (DPI) of keywords, HTTPS
hides all request and response data through encryption. However, the server name
indication (SNI) field in the TLS handshake reveals the website to which the client
wishes to connect. Censors such as China and Iran have thus used the plaintext
SNI field to guide their censorship decisions and, in some cases, outright block all
traffic that seeks to hide the SNI through encryption (ESNI) [36].
As a result, significant effort has been paid to understanding and evading
SNI censorship, with particular attention paid to one of the world?s largest censors,
157
the so-called Great Firewall of China (GFW). In 2019, Chai et al. [7] empirically
evaluated how SNI censorship operated in China, and argued for the importance of
using ESNI. Unfortunately, China began blocking all ESNI traffic the next year [36].
In 2020, Bock et al. investigated how to evade China?s SNI censorship [2] and recently
demonstrated how to weaponize it to launch availability attacks [135]. Through all
of this work, a mental model emerged that indicated that China uses a single model
of middlebox to detect and react to SNI connections.
In this chapter, I show that in fact China?s GFW uses two distinct censorship
mechanisms in parallel to censor HTTPS based on SNI.1 We first discovered this
second HTTPS censorship middlebox while trying to reproduce the censorship eva-
sion results from Chapter 3 for HTTPS. We observed that some censorship evasion
strategies could evade the GFW?s known HTTPS censorship, but small modifica-
tions could cause strategies to fail unexpectedly: via a single RST packet deeper
in the TLS handshake. Now, we understand and report on the root cause of this
strange behavior: the GFW had a second censorship middlebox all along.
In this chapter, I present a detailed analysis of China?s secondary HTTPS
censorship middlebox: how it works, how it can be triggered, and how it can be
defeated. We confirm this behavior is caused by a separate middlebox by identify-
ing unique TCP-layer bugs in each middlebox, suggesting separate TCP stacks [2].
These findings are important in refining our understanding of SNI censorship in
China?they resolve some of the confusing behavior previously identified and chart
1We only focus on SNI-based censorship of HTTPS, and thus use ?HTTPS censorship? and
?SNI censorship? interchangeably.
158
a clearer path forward for how to measure and evade SNI censorship more precisely.
This is especially important now, as China has effectively stopped the roll-out of
ESNI within its borders [36] and Russia is actively working to do the same [136].
These findings also support my thesis, and show that middleboxes can be rendered
ineffective, even in more complex deployment scenarios.
Whereas prior approaches and the previous chapter investigate cooperating
mechanisms that aim to censor different but complementary protocols, we have
identified two distinct mechanisms that both aim to censor the same exact protocol
(SNI-based HTTPS). As we will demonstrate, this makes it particularly challeng-
ing to disentangle the two, as they both operate on the same packets. Our find-
ings demonstrate what we believe to be a novel way in which nation-states employ
censorship-in-depth. Although it is tempting to think of them as a single ?black
box? of censorship, this chapter shows that it is both possible and important to
tease them apart into their constituent components, even in this deployment con-
text.
The rest of this chapter is organized as follows. ?7.1 discusses the methodology
for our experiments. ?7.2 shows how we can evade the newly discovered censorship
middlebox and how censorship evasion is critical for our measurements of the new
middlebox. ?7.3 studies the functionality of the new middlebox. Finally, ?7.4 dis-
cusses ethical considerations and ?7.5 concludes.
159
7.1 Methodology
Measuring two censorship mechanisms that both operate on the same packets
is challenging. To understand how they both operate independently and in con-
junction with one another, our methodology involves evading one of the boxes to
selectively measure the other. In this section, we describe our high-level approach
to evasion and measurement.
Admittedly, our approach was somewhat circular: our initial measurements
provided insight that allowed us to begin evading, which let us perform more mea-
surements, and so on. Thus, to best understand our methodology, it is useful to
also understand at a high level how the two censorship mechanisms work, which we
also provide here.
Vantage Points We obtained two censored vantage points inside China (Beijing)
and external uncensored vantage points in Japan (Tokyo) and the United States
(Iowa, Virginia). Our Chinese vantage points are located within different ISPs, but
Xu et al. found that the GFW?s actual deployment of certain censoring middleboxes
may vary based on the type of ISP [137], so our conclusions are limited by the ISPs we
can measure. We use the vantage points in China as our ?client,? and our vantage
points outside as our ?server.? Throughout our experiments, we only connect to
machines we control.
Detecting Evasion of One Mechanism It is straightforward to determine if we
have evaded both of the censorship mechanisms?we need only see if we received
160
the censored content. But how can we determine if we have evaded censorship of
only one box?
The key insight is that the two mechanisms block censorship in different ways.
The GFW?s primary (already known) censorship middlebox operates by injecting an
idiosyncratic pattern of three RST+ACK packets to both the client and the server once
it observes a TLS Client Hello with a forbidden Server Name Indication (SNI) field [2,
7,135]. We will refer to this primary middlebox as MB-RA (MiddleBox RST+ACK). The
GFW?s secondary SNI censorship middlebox, by contrast, tears down connections by
injecting one single RST packet: we will refer to this middlebox as MB-R (MiddleBox
RST).
Unless otherwise specified, we configured our vantage points to drop all out-
bound RST and RST+ACK packets. Thus, we expect any RST or RST+ACK packets
received by our client to come from the MB-R or MB-RA middleboxes, respectively.
Triggering Censorship We trigger censorship by injecting forbidden domain names
in the SNI field (though all communication is strictly between the client and server
machines we control). However, we have found that it is not always sufficient to
stop sending packets at that time.
Unlike MB-RA, MB-R does not tear down a connection immediately after ob-
serving a forbidden SNI. Instead, it waits to inject its RST packet until the client
sends the next packet in the TLS handshake: the ClientKeyExchange or the
ClientChangeCipherSpec. Note that the forbidden SNI field is not present in
either of these messages. MB-R is a stateful middlebox that is triggered by the for-
161
Figure 7.1: A waterfall diagram of the TCP 3-way handshake and the TLS hand-
shake, denoting where the already known MB-RA and newly discovered MB-R mid-
dleboxes act during the connection. Note that MB-R does not act until deeper in
the handshake than MB-RA (and only if MB-RA does not act), seemingly acting as a
backup middlebox for China?s HTTPS (SNI) censorship.
162
bidden SNI field in the Client Hello message but does not act until after the client
continues the handshake. We believe this is the reason researchers have not reported
on this middlebox until now. Figure 7.1 illustrates the TCP 3-way handshake and
TLS handshake and where each of the two middleboxes acts.
Isolating the Second Middlebox Studying MB-R is also made more difficult
because MB-RA and MB-R seem to interact with one another. Specifically, when MB-RA
takes action to tear down a connection, MB-R does not act even if MB-RA fails to tear
down the connection or the connection continues. We performed an experiment in
which we instrumented a vantage point within China and a server outside of China
to drop all inbound RST+ACK packets and tried to complete a TLS handshake with
a forbidden SNI between them. If MB-R and MB-RA operated independently, after
both sides of the connection drop the RST+ACKs injected by MB-RA, we would expect
MB-R to inject its RST packet once the client continues the TLS handshake.
Instead, we find that any time MB-RA injects packets, MB-R stops paying at-
tention to the connection entirely. We believe the injected RST+ACK packets from
MB-RA are causing MB-R to tear down its TCB (Transmission Control Block) for the
connection. This experiment suggests that MB-R is a backup censorship middlebox
for MB-RA: it only injects RST packets if MB-RA fails to take action.
This interaction between MB-RA and MB-R also offers MB-R a way to avoid state
exhaustion: once a connection is torn down by MB-RA, MB-R does not need to continue
tracking it. We believe this interaction also explains why other components of the
GFW will stop paying attention to a connection if the client injects a RST+ACK packet
163
(a TCB Teardown attack). Researchers have wondered why the GFW continues to
be vulnerable to TCB Teardown attacks to this day, despite having been reported
for years [2,16,23,24,40,70]. If the GFW is architected to internally use the RST+ACK
packets injected by one middlebox to prevent state exhaustion in other middleboxes,
this would explain why TCB Teardown attacks have not been patched.
Unfortunately, this interaction between MB-R and MB-RA makes studying it in
isolation difficult. The only signal we have to measure MB-R is its injection of RST
packets, but it does not inject these packets until deeper in the TLS handshake than
MB-RA. We could repeatedly make forbidden connections until MB-RA fails to inject
packets, but to make reliable measurements, instead we leverage packet manipula-
tion evasion strategies to evade MB-RA without affecting MB-R.
Evading Censorship We leveraged an open-source tool called Geneva (Genetic
Evasion), a genetic algorithm designed to discover packet manipulation-based cen-
sorship evasion strategies. Geneva has been used successfully against the GFW in
the past [2, 36,40], as well as censorship infrastructure in other countries [2, 3].
The output of Geneva is sequences of packet manipulations that confuse or
disable a censoring middlebox. Central to Geneva?s ability to find evasion strategies
is its fitness function, which evaluates how successful a strategy is against a given
censor. For this work, we made a small modification to Geneva?s reward function
to optionally ignore inbound RST packets on both sides of the connection. This
enables us to optionally train Geneva to find strategies that defeat only the RST+ACK
middlebox (since MB-RA injects RST+ACK packets, not RST packets).
164
After using Geneva, we performed manual follow-up experiments to understand
how each strategy works. To compute reliability for each strategy, we used each
strategy 100 times while trying to complete a full TLS handshake with a censored
keyword in the SNI field (wikipedia.org) between vantage points within China
and outside of China.
Because of the interaction between the two middleboxes, we are only able to
defeat either MB-RA alone or both of them together. Recall that the only signal we
have to measure MB-R?s reaction is it injecting RST packets, but it does not do this
injection until later in the TLS handshake after MB-RA may act. It is possible that
there exist packet sequences that confuse or disable MB-R without disabling MB-RA,
but we are unable to confirm this.
7.2 Evasion
In this section, we will report on client-side strategies we discovered with
Geneva that defeat only MB-RA and both MB-RA and MB-R. Following precedent from
prior work, we will report on the strategies we find both in text and include the
Geneva syntax that implements the strategy.
7.2.1 MB-RA Evasion Strategies
The most reliable working client-side strategy that we found first sends two
SYN packets, then splits the TLS Client Hello in half to make two TCP segments, and
165
sends them out of order2. In our testing, this strategy worked with 99% reliability.
Strategy 26: MB-RA: Double-SYN Segmentation
[TCP:flags:S]-duplicate-|
[TCP:flags:PA]-fragment{tcp:-1:False}-|
The fact that this strategy works is strange and surprising. The GFW is
known to be capable of reassembling TCP segments, even if sent out of order [2].
Indeed, if the second SYN packet is removed, the strategy no longer works, as MB-RA
reassembles the TLS Client Hello and censors the connection. This strategy suggests
that MB-RA is keeping track of both the TCP handshake and the TLS handshake, but
seeing the unexpected SYN packet interferes with its ability to reassemble messages.
We do not know why this is. Note that this strategy does not evade MB-R; this only
disables MB-RA.
The second type of client-side strategy we discovered that defeats MB-RA also
involves abusing MB-RA?s ability to reassemble TCP segments. This strategy involves
performing 6 TCP segmentations to create 7 total TCP segments out of the original
TLS Client Hello, with each segmentation reversing the order of the segments. In
the end, this strategy reverses the order of the segments exactly. This strategy
worked with 100% reliability.
This is not the only variant of this strategy that works to defeat MB-RA, but it
is not sufficient to simply split the TLS Client Hello into any seven segments. Geneva
found dozens of strategies with similar number and ordering of segmentation that
2Note that Geneva?s syntax represents TCP segmentation with the fragment action with the
tcp parameter.
166
Strategy 27: MB-RA: Segmentation Overload
[TCP:flags:PA]-fragment{tcp:-1:False}(
fragment{tcp:-1:False}(
,fragment{tcp:-1:False}),
fragment{tcp:-1:False}(
fragment{tcp:-1:False},
fragment{tcp:-1:False})
)-| \/
function, and hundreds more that do not.
Without a second SYN packet, at least 7 segments are required for this strategy
to work, and further segmentation does not negatively affect the reliability of the
strategy. Exactly reversing the order of the segments too is not a requirement; other
variants of this strategy exist that defeat MB-RA without defeating MB-R without this
property. Previous researchers found that different parts of the GFW have issues
reassembling segments less than 8 bytes long [2,70], but each segment in this example
is at least 24 bytes long.
We originally hypothesized that this series of segmentations must simply split
up the SNI field across multiple packets, but when this strategy is used, the SNI
field is intact and unchanged in a single TCP segment. Leaving the SNI field intact
is also not a requirement; other versions of this strategy that split the SNI field
across multiple segments and work equally well. Frankly, we do not understand why
this strategy defeats MB-RA.
167
7.2.2 Evading MB-RA and MB-R
Next, we will discuss strategies that can defeat both MB-RA and MB-R. Geneva
discovered variants of the aforementioned Segmentation Overload strategy that de-
feat both MB-RA and MB-R simultaneously, with 99% reliability. Like before, this
strategy performs multiple rounds of TCP segmentations on the TLS Client Hello
packet to produce 7 individual packets, most of which are out of order. Again, it is
not clear why this strategy works.
Strategy 28: MB-R & MB-RA: In-Order Segmentation Overload
[TCP:flags:PA]-fragment{tcp:-1:False}(
fragment{tcp:-1:False}(
,fragment{tcp:-1:False})
,fragment{tcp:-1:False}(
fragment{tcp:-1:True},
fragment{tcp:-1:False})
)-| \/
The most salient difference between strategies that defeat MB-R compared to
the previously discussed MB-RA-beating strategies is that these strategies contain at
least one middle pair of segments that remain in-order. The location of the SNI field
does not impact the reliability of this strategy; it can be included in any segment
or be split across multiple segments.
Geneva also found that it could combine pieces of the In-Order Segmentation
strategy to reduce strategy complexity. This next strategy works by duplicating the
SYN packet and performing three TCP segmentations of the TLS Client Hello.
In our follow-up experimentation, we find that in order for this strategy to
168
Strategy 29: MB-R & MB-RA: Double SYN, Triple Segmentation
[TCP:flags:S]-duplicate-|
[TCP:flags:PA]-fragment{tcp:-1:False}(
,fragment{tcp:-1:False}(
fragment{tcp:-1:True},)
)-| \/
defeat both MB-RA and MB-R, the segments must be sent in a specific order: the
fourth segment must be sent first, then the second segment, then the third, and
finally the first segment. Any deviation from this order causes MB-R to detect the
sequence, though any order in which the first segment is not sent first is sufficient
to evade MB-RA.
We verified that only the order in which the segments are sent matters, not
the content or size of the segments. We manually tested different strategies that
would make a single segment 188 bytes long (making each of the other segments just
a single byte long); as long as the correct segment order is maintained, the strategy
evades MB-RA and MB-R. We do not understand why these constraints apply.
We also rediscovered several strategies that researchers had found in the past
for other components of the GFW [24, 36, 40]: TCB Teardowns (injecting a TTL-
limited or checksum corrupted RST) and TCB Desynchronization (injecting a TTL-
limited or corrupt checksum with data).
7.3 How does MB-R work?
Now that we have a robust way to trigger MB-R in isolation, we can explore
how MB-R works. In this section, we report on MB-R?s functionality.
169
Which packets from the client will MB-R act upon? We performed a series
of experiments in which we instrumented a client to send a TLS Client Hello with a
forbidden SNI field (such as wikipedia.org), followed by different client handshake
messages or packet payloads, including empty packets, garbage messages, and HTTP
payloads. We did not observe a response from MB-R for any non-TLS messages nor
for ClientHandshakeFinished messages. We find that MB-R will only take action
if it sees a ClientKeyExchange or ClientChangeCipherSpec.
Is MB-R bidirectional? Yes, both MB-R and MB-RA track connections that originate
from both inside and outside of China. First, we confirmed that MB-RA is still
bidirectional: we made requests from vantage points we controlled outside of China
to our vantage points inside China, and in the opposite direction; in both cases, we
can trigger MB-RA. Next, we tested if MB-R also monitors traffic inbound to China
by sending multiple different packet sequences that evade MB-RA (in different ways)
but trigger MB-R and confirmed that MB-R is also bidirectional.
What is the reliability of MB-R and MB-RA? Previous researchers have found
that the GFW is not 100% reliable in its censorship (usually around 97%) [2,24,40].
To test the reliability of both the primary and secondary middleboxes, we sent
2,000 packet sequences with small sleeps in between for both MB-R and MB-RA from
a vantage point outside of China to servers we controlled inside China, each from
a fixed source port to a unique destination port. By observing which ports are
interfered with, we can estimate the reliability of each middlebox.
We find that MB-R interfered with 87.0% of the connections, and MB-RA inter-
170
fered with 88.2% of connections. Interestingly, these numbers composed together
explain the approximately 97% total reliability found by previous researchers [2]: the
likelihood of both middleboxes failing is approximately 1.8%, for a total reliability
of 98%.
What ports does MB-R monitor? Researchers in the past have reported that
the GFW?s SNI censorship middlebox (MB-RA) monitors all ports 1-65,535 [2]. To
test which ports MB-R monitors, we conducted an experiment in which we sent the
sequences of packets that trigger MB-R from our vantage points outside of China to
servers we control within China on every destination port. For this experiment, we
configured the server within China to drop all outbound RST and RST+ACK packets,
so we expect any RST or RST+ACK packet received by our vantage points outside of
China to originate from the middlebox. We also verified the sequence numbers of
inbound RST packets to prevent any spurious RST packets from interfering with the
experiment. To account for MB-R not being 100% reliable, for any port that did
not elicit censorship, we repeat the packet sequences to confirm whether or not the
failure was a fluke. We find that MB-R, like the already known MB-RA, monitors all
ports.
Does MB-R monitor ESNI or omit-SNI? In 2020, researchers discovered that
China had deployed a new censorship middlebox to censor uses of HTTPS with
Encrypted SNI (ESNI) [36]. They found that the new ESNI censorship middlebox
does not censor omit-SNI (Client Hello messages with the SNI field omitted), al-
though other censorship middleboxes have been observed censoring omit-SNI [138].
171
They determined that this censorship middlebox was different from the already
known MB-RA HTTPS (SNI) censorship middlebox and confirmed that MB-RA does
not monitor or censor uses of ESNI or omit-SNI. Does MB-R censor ESNI or omit-
SNI?
To test this, we modified the sequence of packets we discovered that trigger
MB-R. In the first experiment, we replaced the forbidden SNI TLS Client Hello with
a TLS 1.3 Client Hello with an ESNI extension. In the second experiment, we
replaced the forbidden SNI TLS Client Hello with a TLS Client Hello with no SNI
extension at all. We find that MB-R does not censor ESNI or omit-SNI connections.
Does MB-R middlebox have residual censorship? Residual censorship is a
feature of some censorship middleboxes in which after a censorship event occurs
between a pair of hosts, the censor continues to interfere with benign connections
between them for a short amount of time [135]. Some prior work has reported
that MB-RA has residual censorship [7], but other researchers have reported that
this residual censorship may be specific to certain vantage points [135]. From our
vantage points in China, we do not observe residual censorship for MB-RA: after a
censorship event, future benign connections between the same pair of hosts are not
affected.
To test if MB-R has residual censorship, we issued packet sequences that trigger
MB-R, and then sent follow-up benign connections. We find the same result as
MB-RA: we do not observe residual censorship. Unfortunately, like all censorship
measurement research, we are limited in what vantage points we can access, and
172
absence of evidence for residual censorship at both of our vantage points is not
evidence of its absence throughout the network. It is possible that MB-R?s residual
censorship varies by geographic location.
Does MB-R and MB-RA have the same blocklist? To test if MB-R and MB-RA have
different blocklists, we downloaded CitizenLab?s China (567 domains) and Global
(1,435 domains) test lists [104] to see if there were any domains censored by one
middlebox that was not censored by the other. For each domain on the test list,
we sent trigger packet sequences for both MB-RA and MB-R from the vantage points
we controlled in China to a vantage point outside of China with the test domain
in the SNI field of the TLS Client Hello. We used a unique source port for each
of these connections and our vantage points were configured to drop all outbound
RST and RST+ACK packets, so we expect any RST or RST+ACK packets we receive
to originate from the GFW. Note that since our vantage points do not experience
residual censorship for MB-RA or MB-R, residual censorship is not a concern for this
experiment. Since the reliability of MB-R and MB-RA are not 100%, we repeated this
experiment 5 times. As long as a test domain triggers a middlebox at least once,
we know it is censored.
We find that both middleboxes had the same response to all of the domains we
tested; if MB-RA censored it, so did MB-R and vice versa. This experiment supports
our theory that MB-R acts as a backup middlebox to MB-RA.
Where is MB-R deployed relative to MB-RA? To test where MB-R and MB-RA
are located on the network, we performed an experiment in which we TTL limited
173
the packet trigger sequences for both MB-R and MB-RA. By repeatedly sending a
trigger sequence of packets with increasing TTL values, we can see at what hop
each middlebox performs traffic injection. We repeated this experiment from both
of our vantage points inside of China destined to multiple vantage points outside
the country and then again in the reverse direction. We find that MB-RA and MB-R
were the same number of hops away from each test vantage point; this suggests
that they are collocated on the network level. This finding aligns with a previous
exploration of China?s censorship middlebox, which also found that China collocated
the censorship infrastructure for other protocols [2].
7.4 Ethical Considerations
We designed our experiments to minimize impact on other hosts and to mini-
mize risk to other users. All of our experiments with MB-R and training with Geneva
was done strictly between hosts we controlled and hosts not located in residential
networks. Geneva does not spoof IP addresses and generates a fairly small amount
of traffic while training [40]. We also followed the original experiment design of
Geneva and evaluated strategies serially to limit the volume of data we sent at once.
7.5 Conclusion
In this chapter, I showed that China?s SNI-based censorship has continued
to evolve, and supported my thesis in the context of more complex middlebox de-
ployments. We discover and report on the existence of a secondary SNI censorship
174
middlebox and show that the middleboxes can be studied in isolation.
It is somewhat surprising that China continues to invest in its SNI-based cen-
sorship, as TLS is evolving to incorporate encrypted versions with Encrypted SNI
(ESNI) and Encrypted Client Hello (ECH). Indeed, China continues to do so, and
they (as with other countries [136]) are working to block ESNI outright [36]. This
indicates that there is not yet enough critical mass behind ESNI/ECH to make
the collateral damage of blocking them prohibitively large for China. Until it is,
SNI-based censorship will remain a threat.
Our work also uncovers a more fundamental finding: censors are employing
censorship-in-depth not just by blocking multiple intersecting protocols but by de-
ploying middleboxes that target the same protocol in slightly different ways. The
techniques we presented in this chapter provide a potential path forward for under-
standing and evading these robust forms of censorship.
Collectively, these results show that automated, packet-manipulation-based
censorship evasion can render censoring middleboxes ineffective at censoring, and
that this thesis holds even as censors evolve and employ censorship-in-depth.
In the next chapter, I will demonstrate how packet manipulation attacks can
be used to render middleboxes ineffective by coercing them to enforce their policy
when they should not, to disastrous effect. Whereas Chapter 3-7 demonstrated
that packet manipulation strategies can render middleboxes ineffective at censoring,
the following chapters demonstrate another class of middlebox policies that can
be rendered ineffective. In particular, I will show that automated techniques can
discover how to weaponize middleboxes to launch attacks against innocent hosts.
175
Chapter 8: Weaponizing Censors for Am-
plification Attacks
In the previous chapters, I showed it is possible to trick middleboxes into failing
to implement their policy when they should, but it still leaves open the question of
the reverse: can middleboxes be coerced into taking action when they should not?
In this chapter, I show that this indeed, middleboxes can be rendered ineffective
in this way, and that by doing so, middleboxes can actually be leveraged to launch
startlingly effective volume-based reflected denial of service attacks.
Volume-based distributed denial of service (DDoS) attacks operate by pro-
ducing more traffic at a victim?s network than its capacity permits, resulting in
decreased throughput and limited availability. An important component in the ar-
senal of a DDoS attacker is the ability to amplify its traffic. Instead of sending traffic
directly to a victim V , the attacker spoofs V ?s source address, sends b bytes to some
amplifier host A, who then ?replies? to V with ? ? b bytes for some ? > 1. In this
manner, the attacker hides its IP address(es) from the victim, making it difficult to
simply filter the attack traffic at a firewall, and increases its effective capacity by
the amplification factor ?.
176
9
10
8
10
7
10
6
10
5
10
4 Memcached
10
(51,000x)
3
10
2 NTP
10 (556.9x)
1
10
0
10
0 1 2 3 4 5 6 7 8
10 10 10 10 10 10 10 10 10
IP Address Rank
Figure 8.1: The maximum amplification factor we obtained per IPv4 address, based
on several Internet-wide scans. (Note: the axes are log-scale.)
Some reflected amplification attacks can elicit impressive amplification factors.
Among the most notable, DNS has been shown to have an amplification factor of
54, while NTP offers up to 556.9 [139]. Misconfigured Memcached [140] servers can
provide amplifications over 51,000 [141,142], and were used against Github in 2018
in the largest known DDoS attack to date, achieving 1.35 Tbps at peak [143].
To date, almost all reflected amplification attacks have leveraged UDP. In fact,
to the best of our knowledge, there are no known TCP-based reflected amplification
attacks that send beyond a single SYN packet.1 This is because such attacks ap-
pear virtually impossible: to go beyond the SYN would seem to require an attacker
to (1) guess the amplifier?s 32-bit initial sequence number (ISN) in their SYN+ACK
packet2 and (2) prevent the victim from responding to the amplifier with a RST [37].
In this chapter, we show that it is indeed possible to launch reflected amplifi-
1We discuss non-reflected TCP-based amplification attacks in Section 8.1.
2We will use + to denote when a single packet has multiple TCP flags set.
177
Amplification Factor
cation attacks with TCP beyond a single SYN packet without having to guess initial
sequence numbers. The key insight is to not elicit responses from the destination,
but rather from middleboxes on the path to the destination.
Many middleboxes (especially nation-state censors) inject block pages or other
content (such as RST packets) [24, 86, 126, 144] into established TCP connections
when they detect forbidden requests. Moreover, because middleboxes cannot rely on
seeing all packets in a connection [50], they are often designed to operate even when
they see only one side of the connection. Our attacks tend to leverage non-compliant
middleboxes that respond without having to observe both ISNs. Our measurements
show that such middleboxes are surprisingly common on today?s Internet, and that
they can lead to amplification factors surpassing even many of the best UDP-based
amplification factors to date.
We introduce a novel application of a recent network-based genetic algo-
rithm [40] that discovers sequences of TCP packets that elicit large amplification
factors from middleboxes.
We perform a series of IPv4-wide scans of the Internet using ZMap [145], to
identify how many hosts can serve as amplifiers and quantify their amplification
factor. Figure 8.1 provides an overview of the maximum amplification factor we
were able to get from all IP addresses after several Internet-wide scans. We find
386,187 IP addresses that yield an amplification factor of at least 100?; 97,079
IP addresses that elicit a larger amplification factor than the infamous NTP at-
tack [139], and over 192 IP addresses that responded with a higher amplification
factor than Memcached [142].
178
Compared to SYN-only reflective amplification attacks, our attack identifies
two orders of magnitude more IP addresses [146,147], and we also find amplification
factors above 2,500?.
In fact, we find many hosts that effectively have an infinite amplification: in
response to one or two attack packets, these machines respond at their full capacity
indefinitely (barring packet drops) without any additional attacker involvement.
Czyz et al. [148] observed similar behavior when studying NTP amplification, and
called such hosts ?mega-amplifiers.? We at last answer the open question of why
some hosts provide such abnormally high amplification factors: we show that many
are actually sustained by the victims themselves, and others are due to routing loops.
Collectively, our results show that there is significant, untapped potential for
TCP-based reflective amplification attacks. To enable this new area of study, we
have made our code publicly available at https://geneva.cs.umd.edu/weaponizing.
Contributions We make the following contributions:
? We introduce a novel application of genetic algorithms to discover and maximize
the efficacy of TCP-based reflective amplification attacks, and identify 5 attacks
in total.
? We scan the IPv4 Internet to determine how many IP addresses can be used as
TCP-based amplifiers, and their amplification factor.
? We confirm that these amplified responses typically come from network middle-
boxes, including government censorship infrastructure and corporate firewalls.
? We resolve the open question of the root causes of ?mega-amplifiers.? We attribute
179
them to infinite routing loops and what we call ?victim-sustained amplification?,
in which victims? default responses (RSTs) actually induce the reflector to send
more data without additional effort from the attacker, leading to virtually infinite
amplification.
The rest of this chapter is organized as follows. I provide additional back-
ground on DDoS attacks specifically in ?8.1. In ?8.2, we present novel techniques
for discovering new TCP-based amplification attacks, and the results from applying
these techniques to live censoring middleboxes. Next, I describe our methodology
(?8.3) and results (?8.4) from scanning the entire IPv4 Internet with our newfound
attacks. I explore ?mega-amplifiers? in ?8.5. I discuss ethical considerations and
our responsible disclosure in ?8.6, potential countermeasures in ?8.7, and conclude
this chapter in ?8.8.
8.1 Background
Here, we define our threat model and review details of TCP and in-network
middleboxes that are relevant to our attacks.
Threat Model To maximize the applicability of our attacks, we make very few
assumptions about the adversary?s capabilities. In particular, we assume a com-
pletely off-path attacker: it cannot eavesdrop, intercept, drop, or alter any packets
other than the ones destined to it. We also assume that the attacker has the ability
to source-spoof its victim?s IP address. This would not be possible if the attacker?s
network performs egress filtering?that is, if it verified that the packets leaving its
180
network had IP addresses originating from within its network?but egress filtering
is still not yet widely deployed in practice [146,149,150].
TCP Basics To ensure in-order delivery of bytes, both ends of a TCP connection
assign 32-bit sequence numbers to the bytes they send. TCP connections begin
with a three-way handshake, during which the end-hosts inform one another of their
(random) initial sequence number (ISN). In a standard three-way handshake, the
client sends a SYN packet containing its ISNclient, to which the server responds with
a SYN+ACK that contains both its own ISNserver and ISNclient + 1 to acknowledge the
client?s ISN. Finally, the client acknowledges ISNserver by including it (plus one) in
an ACK packet. Following this, a typical client sends a PSH+ACK packet containing
its application-layer data (e.g., an HTTP GET request).
For a TCP connection to complete, the ISNs must be acknowledged with
perfect accuracy. If the client were to send an ACK acknowledging anything but
ISNserver + 1, the server would not accept the connection.
TCP-based Reflection Attacks In a reflection attack, an adversary sends to a
destination r a packet that spoofs the source IP address to be that of victim v. As
a result, r will believe v sent the packet, and will send its response to v. Reflection
can be useful to hide the attacker?s identity from the victim, and is commonly used
when the reflector r is also an amplifier, sending more data to v than r received
from the attacker.
Note that an adversary within our threat model cannot feasibly complete a
three-way handshake in a reflection attack. The adversary would send the SYN
181
while source-spoofing v, and thus the server?s SYN+ACK?with ISNserver?would be
sent to v, not the attacker. To complete the handshake, the attacker would have to
send a source-spoofed ACK, but would only have 2?32 chance of guessing the correct
ISNserver. Moreover, even if the adversary were to guess ISNserver, the victim (if
online) will respond to the server?s spurious SYN+ACK with a RST, thereby tearing
down the connection at the server.
Given these challenges, prior work assumed that TCP-based reflection attacks
were limited to the initial handshake, in which the attacker sends a source-spoofed
SYN and does not try to guess the appropriate ACK, let alone send an application-
layer PSH+ACK [146,147]. Ku?hrer et al. [147] showed that a single TCP SYN can result
in a surprising amount of amplification. Compliant servers amplify a small amount
because they retransmit SYN+ACKs a handful of times, until they timeout, receive
the appropriate ACK, or receive a RST from the victim. Ku?hrer et al. also found a
few non-compliant machines on the Internet that respond to SYNs with many more
packets, affording a greater amplification [146,147].
In this work, we discover that middleboxes enable more sophisticated TCP-
based reflected attacks beyond a single SYN. Compared to prior work, these new
middlebox-enabled attacks yield even higher amplification rates and provide larger
numbers of amplifiers that attackers can use.
Why should we think middleboxes might be vulnerable to this attack? Mid-
dleboxes often track the content of connections across multiple packets to handle
re-ordered or dropped packets. However, middleboxes may not see packets in both
182
directions. This is because the Internet can exhibit route asymmetry, whereby pack-
ets between two end-hosts may traverse different paths [151]. Consequently, a mid-
dlebox may only see one side of a TCP connection (e.g., the packets from client to
server). To handle this asymmetry, middleboxes often implement non-compliant or
partial TCP reassembly, allowing them to still block connections even though they
don?t see all of the packets in a connection.
Middleboxes? resilience to missing packets presents an opportunity to attack-
ers: a reflecting attacker may not need to complete the three-way handshake so long
as it can convince the middlebox that the handshake had been completed. Com-
bined with the packets they inject?especially block pages?middleboxes could be
attractive targets for reflected amplification. In the remainder of this chapter, we
show packet sequences that trick middleboxes into responding, and we show that
middleboxes can yield very large amplification factors.
Non-reflective and UDP Amplification Attacks Other amplification attacks
abuse TCP but involve directly connecting to the victim. Sherwood et al. [152]
showed an attacker can use optimistic acknowledgments to induce a server to send
a file at higher rates, ultimately DoSing its own network. The Great Cannon injects
Javascript into Baidu webpages, turning visiting browsers into denial of service
bots [153]. Our attack is effectively the reverse: instead of a censor co-opting the
bandwidth of users to perform an attack, an attacker can co-opt the bandwidth of
the censor.
Reflected UDP attacks have been studied extensively [139,140,154,155]. How-
183
ever, we are the first to study the use of middleboxes as reflectors.
Victim-sustained Attacks As we will see later in this chapter, we discover a
mechanism by which an attacker attacks a victim in such a way that the victim
themselves sustains the attack. Sargent et al. [156] identified 79 hosts that respond
to a particular IGMP request by repeating the request. Ostensibly, source-spoofing
this request could cause an infinite loop between two such hosts, and is thus similar
to our victim-sustained attacks in ?8.5. Our attacks are more widely applicable,
since they rely on standard client behavior (sending RSTs to unsolicited packets);
and as a result we identified several orders of magnitude more targets of victim-
sustained infinite amplification. However, their findings motivate applying tools
like Geneva at the application layer to discover application-specific bugs.
8.2 Discovering TCP-based Reflection Attacks
In this section, we present the first non-trivial, TCP-based reflected amplifi-
cation attacks. We present a novel way to automatically discover new amplification
attacks (?8.2.1), train it against a set of censoring middleboxes (?8.2.2), and report
on the amplification attacks we discovered (?8.2.3).
8.2.1 Automated Discovery of Amplification
Our goal is to identify sequences of packets that will elicit amplified responses
from middleboxes, without requiring us to establish a legitimate TCP connection
or guess ISNs. This requires identifying non-compliant TCP behavior. Unlike
184
UDP [148] or TCP SYN-based [147] reflected amplification attacks?which take ad-
vantage of weaknesses in protocol designs?we must find weaknesses in TCP imple-
mentations.
We make two modest changes to Geneva to find new amplification attacks
against middleboxes:
Initial Packet Sequence Geneva operates by manipulating an existing packet
sequence, such as a real client?s packets as it browses the web. To discover new
amplification attacks, we use a single PSH+ACK packet with a well-formed HTTP GET
request with the Host: header set to a given URL (we describe which URLs we use
in ?8.2.2). We chose HTTP as the input traffic because recent work demonstrated
both how widely deployed HTTP filtering middleboxes are [126] and that many
HTTP censors inject large block pages in response to small web requests [52].
Fitness Function Our goal is to find packet sequences that maximize amplification
from middleboxes. The straightforward approach would be to set the fitness function
to the amplification factor itself (number of bytes received divided by the number of
bytes sent). However, we found that this sometimes encourages Geneva to try to elicit
many small (e.g., SYN+ACK) packets from the end-host, rather than larger (e.g., block
page) packets from middleboxes. To encourage Geneva to elicit responses specifically
from middleboxes, our fitness function is the amplification factor, but ignoring all
incoming packets that have no application-level payload. This optimization applies
only to the fitness function; we report on all bytes sent and received in our results.
185
10,000
1,000
100
10
1
 0  20  40  60  80  100  120  140  160  180
IP Address Rank
Figure 8.2: Rank order plot of maximum amplification factor from Quack-identified
IP addresses. The maximum amplification factor was 7,455?.
8.2.2 Training Methodology
Geneva trains on live networks, and thus requires destination IP addresses to
train against. To identify destination IP addresses that are likely to have mid-
dleboxes on the path from our measurement machine to them, we use data from
Quack [52], a part of the Censored Planet [157] platform that performs active mea-
surements of censorship. Quack regularly sends HTTP GET requests with poten-
tially forbidden URLs in the Host: header to echo servers around the world, and
detects injected censorship responses from middleboxes.
We use Quack?s daily reports [54] to find endpoints that are likely to have
middleboxes on the path, and the URLs likely to trigger them. We downloaded
Quack?s March 28th, 2020 dataset and extracted the IP addresses that experienced
HTTP injection interference. This identified 209 IP addresses with active censoring
middleboxes on their path, along with the offending URLs. We began training
186
Amplification Factor
against them on March 29th.
To train Geneva with an IP address from Quack?s data, we set the destination
of the generated traffic to the IP address, and set the Host: header in the HTTP
GET request to one of the URLs that triggered interference to this IP address.
We let Geneva train for 10 generations with an initial population of 1,000
randomly generated strategies3. Training took approximately 25 minutes per IP
address. To limit our impact on the network, we spaced our experiments out over
four days; we sent each end-host just 2.8 Kbps of traffic on average (comparable to
Quack?s scans).
Before each experiment, we repeated Quack?s methodology to the destination
IP address to confirm it is still experiencing interference, and we skipped IP ad-
dresses that we did not experience interference. During our experiments, 25 of the
209 IP addresses (11.9%) stopped responding or no longer experienced interference,
consistent with the churn rates seen in Quack?s original experiments [52]. This left
184 IP addresses with active censoring middleboxes that Geneva trained against.
Next, we present the packet sequences Geneva discovered.
8.2.3 Discovered Amplification Attacks
For 178 (96.7%) of the 184 IP addresses from the Quack dataset, Geneva found
at least one packet sequence that elicited a response, and achieved an amplification
factor greater than 1 for 169/178 (94.9%). Figure 8.2 shows the maximum amplifi-
cation factors we discovered across all of these 169 hosts. Some of the middleboxes
3We forgo a full hyperparameter sweep to limit our impact on end hosts.
187
Strategy Response % Max Amplification
?SYN; PSH+ACK? 69.5% 7,455?
?SYN; PSH? 65.7% 24?
PSH 44.6% 14?
PSH+ACK 33.1% 21?
SYN (with GET) 11.4% 572?
Table 8.1: TCP-based reflected amplification attacks discovered against 184 Quack
servers. Each packet with the PSH flag set includes an offending HTTP GET request
in the payload.
provided high amplification factors: 17 (9.5%) had greater than 100?, and the
maximum amplification factor was 7,455?.
We identify five unique packet sequences that elicit responses and five addi-
tional modifications to improve amplification factor. We summarize them in Ta-
ble 8.1 and describe them in turn below.
8.2.3.1 Amplifying Packet Sequences
?SYN; PSH+ACK? The most successful strategy we discovered sends a SYN packet
(with no payload) with sequence number s, followed by a second PSH+ACK packet
containing sequence number s + 1 and the forbidden GET request. Although this
strategy comes at the cost of an entire additional packet, we find it to be highly
effective at getting responses from middleboxes. It elicited responses from 128/184
(69.6%) of the middleboxes, with a maximum amplification factor of 7,455?.
From a middlebox?s perspective, this packet sequence looks like a traditional
TCP connection, missing the server?s SYN+ACK and the client?s ACK. As with nor-
mal TCP connections, the sequence number of the SYN is one less than the sequence
number of the PSH+ACK. As discussed in ?8.1, middleboxes must be resilient to asym-
188
metric routes, so it is expected that they would respond while missing the server?s
SYN+ACK. We note this sequence omits the client?s ACK in a typical handshake, though
the PSH+ACK may suffice to replace it. Geneva tried adding the client?s ACK, but elim-
inated it during training?in follow-up experiments, we verified that adding the ACK
had no effect on how the middleboxes responded.
?SYN; PSH? This sequence sends a SYN with sequence number s (and no payload)
followed by a PSH with sequence number s + 1 and the forbidden GET request as
its payload. Note that this is the same as the ?SYN; PSH+ACK? strategy, but with the
ACK flag cleared in the second packet.
?SYN; PSH? elicited responses from 121/184 (65.7%) of middleboxes, with a
maximum amplification of 24?. Most (118, or 97.5%) of these also responded to
the ?SYN; PSH+ACK? sequence with the same amplification factors: those middleboxes
appear not to be sensitive to the presence of the ACK flag on the packet containing
the request. However, 10 middleboxes responded only when the ACK flag was set
and 3 middleboxes responded only when it was not. We explore these differences
more deeply with full IPv4 scans in ?8.4.
We also explored if an additional ACK packet between the SYN packet and the
PSH packet would improve response rate. Like with the ?SYN; PSH+ACK? sequence,
we found it had no effect on the middleboxes? responses.
PSH This sequence sends only a single packet: a PSH with the forbidden GET
request. It elicited responses from 82 (44.6%) of middleboxes, with a maximum
amplification factor of 14?. Note that this is the same as the ?SYN; PSH? sequence,
189
without the SYN. All but one (98.8%) of the middleboxes that responded to just the
PSH also responded to ?SYN; PSH?, indicating that the SYN was not necessary. For
those hosts, avoiding the SYN resulted in an increase in amplification factor.
PSH+ACK This also sends a single packet: a PSH+ACK with a forbidden GET request.
No TCP-compliant host should respond to this packet with anything besides an
empty RST, as there is no three-way handshake. Still, 61 (33.2%) middleboxes
responded with injected responses, with a maximum amplification factor of 21?.
This strategy is identical to the ?SYN; PSH+ACK? sequence, minus the SYN
packet. We find that all of the middleboxes that responded to a lone PSH+ACK
also responded to the ?SYN; PSH+ACK?, with the responses of the same size. For
those hosts, sending the additional SYN strictly decreases the amplification factor.
Most (51, or 83.6%) of the middleboxes that responded to PSH+ACK also re-
sponded to PSH; these middleboxes? responses were the same for both strategies,
indicating no change in amplification. 10 middleboxes responded to PSH+ACK but
not to PSH; these gave PSH+ACK its greatest amplification factor. However, 31 mid-
dleboxes responded to PSH but not PSH+ACK. Overall, PSH elicited more responses,
but PSH+ACK elicited larger ones.
SYN with Payload This strategy sends the forbidden GET request as the pay-
load of a single SYN packet. This elicited the fewest responses?21 (11.4%) of the
middleboxes?but one of the largest amplification factors: 527?.
It is not common to send payloads in SYN packets4, which led us to hypothesize
that the middleboxes that responded to this might only be looking at the payloads.
4This is generally reserved for TCP Fast Open, which is rare in practice.
190
But this appears not to be the case: only 3 (14.3%) of the middleboxes that re-
sponded to SYN also responded to PSH+ACK, and only 6 (28.6%) also responded to
PSH.
8.2.3.2 Packet Sequence Modifications
Geneva identified five additional modifications to the above packet sequences
that improve the amplification factor for some middleboxes. One of these (increasing
TTLs) never resulted in lower amplifications, and appear to be worth doing against
all middleboxes. Four improve amplification for some middleboxes but lower it for
others; to use such modifications in a practical setting, an attacker would ideally
identify the middleboxes it uses ahead of time.
Increased TTLs Every IP header includes a time-to-live (TTL) field to limit the
number of hops a packet should take; routers are supposed to decrement this at each
hop, and drop the packet if the TTL reaches zero. Against one middlebox, Geneva
learned to increase the TTL of both packets in the ?SYN; PSH+ACK? sequence to its
maximum value (255) to improve the amplification factor. It is very surprising that
the TTL would have any impact on the amplification factor; the default TTL was
already large enough to reach the destination.
To understand its root cause, we sent packet sequences to this middlebox with
TTLs ranging from 0 to 255, and counted the number of responses for each. We find
a perfectly linear relationship between TTL and amplification factor: we received
t ? 13 block pages for all TTL values t ? 13. At the maximum TTL value (255), it
191
sent 242 copies of its block page!
This behavior can be explained by routing loops in the network of the censoring
middlebox. Each time the packet sequence circles the routing loop, it re-crosses the
censoring middlebox, causing it to re-inject its block page. That this only works for
TTLs greater than 13 indicates that the routing loop is 13 hops from our measurement
host. We show in ?8.4 that routing loops are surprisingly common on the Internet
at large, and they can be exploited by attackers for significant improvements to the
amplification factor.
We found that setting a high TTL on packets has no effect on the response rate
of any of the other packet sequences, so this modification can be made at no cost
to freely exploit routing loops for maximum amplification.
Increased wscale Window scaling (or wscale) is a TCP option that controls
how large the TCP window can grow. Geneva discovered an optimization that gets
7 (3.8%) more middleboxes to respond to the ?SYN; PSH+ACK? sequence: setting the
wscale TCP option in the SYN packet to an integer greater than 12. Based on the
block page these middleboxes injected, we believe they are instances of Symantec?s
Web Gateway (SWG).
To understand this behavior, we sent the modified packet sequence 1,000 times
to the candidate middleboxes in Quack?s dataset, and repeated this experiment five
times. Strangely, in each case, the middleboxes responded only ?25% of the time.
We could successfully ping the end-hosts behind each SWG with innocuous requests,
suggesting that packet drops are not the root cause of the reduced response rate.
192
Varying the time between each packet sequence had no effect on the response rate,
indicating we were not overloading the SWGs. The behavior is also not affected
by packets sent by the end-host: if we limit the TTL of all of our packets such that
they reach the middlebox but not the end-host, the middlebox still injects content
to 25% of requests. Finally, altering the actual value of wscale had no effect on
response rate. We do not understand why SWG is sensitive to this option.
Like with increased TTLs, increasing wscale had no adverse effect on response
rates or sizes. However, because wscale is a TCP option, it requires additional
bytes, thereby potentially lowering the amplification factor.
TCP Segmentation One modification Geneva identified for some middleboxes
is to simply segment the forbidden GET request across multiple packets, either by
adding an additional packet to single-packet sequences, or across the two packets
in the ?SYN; PSH? or ?SYN; PSH+ACK? sequences. Geneva discovered that 5/184 (2%)
middleboxes would send the block page a second time, once for each packet segment.
For these middleboxes, this serves as an optimization for the amplification factor:
although it comes at the cost of an additional packet with some payload, the payoff is
a doubling in traffic elicited from the middleboxes. Strangely, this modification only
works for two segments: any further segmentation causes two of the middleboxes to
not respond, and the other three only send a maximum of two block pages.
Although this optimization can improve the amplification from middleboxes
with this behavior, 26 others (14%) are unable to perform packet reassembly and
stop responding entirely. Worse, for the middleboxes that do perform reassembly
193
and still respond, segmenting the request across multiple packets lowers the ampli-
fication factor.
FIN+CWR Another modification Geneva identified against four (2%) middleboxes was
to change the TCP flags of the PSH+ACK packet in the ?SYN; PSH+ACK? sequence to
FIN+CWR. The CWR flag??Congestion Window Reduced??is used for TCP?s Explicit
Congestion Notification (ECN), and generally should not be combined with a FIN
flag. The modified packet sequence elicits 12 copies of the middleboxes? block pages,
each sent 0.4 seconds apart. The block page duplication increases the amplification
factor of these middleboxes to 301?. If the CWR flag is not present on the packet, no
response is sent. According to the injected block pages, these middleboxes appear
to be instances of Fortinet Application Guard; this modification appears to only
improve amplification factor for these middleboxes.
Shorter HTTP Geneva discovered an optimization against one middlebox: cutting
off the four bytes in the HTTP GET request that immediately follow the forbidden
URL (\r\n\r\n). Although this slightly improves the amplification factor for one
middlebox, none of the other 183 middleboxes responded. This suggests that it is
important for the HTTP GET request to be well-formed.
Failed Approaches We expected that changing the TCP window in our packet
sequences might have an impact on amplification. Recall that TCP window size
determines how much data the other endpoint can send before expecting an ac-
knowledgement. However, we found that none of the middleboxes respected this
TCP feature. Similarly, though TCP mandates that data sent should not exceed
194
D D D D D A Attacker
D Destination
R M R
M M
M Middlebox
A V A V A V A V A V R Router
V Victim
(a) Destination (b) Middlebox (c) Destination and (d) Routing loop (e) Victim-sustained
reflection reflection middlebox reflection reflection reflection
Figure 8.3: Types of attacks we find. Thick arrows denote amplification; red ones
denote packets that trigger amplification. We find that infinite amplification is
caused by (d) routing loops that fail to decrement TTLs and (e) victim-sustained
reflection.
the maximum segment size (MSS) TCP option, every middlebox ignored this option.
8.3 Internet Scanning Methodology
We perform ZMap [145] scans of the IPv4 Internet to measure the effectiveness
each of the attack packet sequences from ?8.2.
Modifications to ZMap ZMap allows us to create arbitrary probe packets with
the ?probe modules?; we wrote a custom probe module for the packet sequences
identified by Geneva. ZMap does not natively have the ability to send multiple
distinct packets in each probe (e.g., SYN followed by PSH+ACK), so we modified ZMap
to add this capability.
Selecting Forbidden URLs Quack?s dataset contains 1,052 URLs that triggered
censorship. Ideally, we could perform full Internet-wide scans for each URL and
determine which ones produce the highest amplification. Unfortunately, this would
take over 6 weeks of scanning at full 1 Gbps line rate per Geneva strategy, and would
likely have diminishing returns.
Instead, we chose to estimate the smallest combination of URLs that collec-
195
tively elicit responses from the largest number of IP addresses. To do this, we
construct every set of size 1 ? N ? 7 of the 1,052 URLs from the Quack dataset,
and for each set compute the number of Quack IP addresses it would have triggered.
We find the ideal set to be of size N = 5, each coincidentally from a different
website category as identified by the Citizen Lab Block List [158]: www.youporn.com
(pornography), plus.google.com (social networking), www.bittorrent.com (file shar-
ing), www.roxypalace.com (online gambling), and www.survive.org.uk (sexual health
services). These five keywords collectively elicit responses from 83% of the Quack IP
addresses, after which there are diminishing returns (adding a sixth keyword only
increased the response rate by 3.6%).
We acknowledge that the Quack dataset may not be representative of the
entire Internet. Moreover, coverage of IP addresses is not necessarily the same as
coverage of middleboxes; however, few IP addresses (4%) in the Quack dataset share
the same /24 prefix, so we expect little middlebox overlap. It is possible that other
keywords will elicit broader coverage or greater amplification; we leave this to future
work.
Data Collection From April 9th to April 26th, 2020, we performed 5 sets of
Internet scans, one for each mutually exclusive packet configuration (?8.2.3). For
each set, we performed 7 Internet-wide scans: one for each of the 5 domains and our
two control scans (?example.com?, and no payload at all). To avoid saturating our
link, we scanned at 350 Mbps; and each scan took approximately 2?4 hours. After
each scan, we aggregated the number of bytes and packets we received from each IP
196
address that responded to our probes. Following convention, we include the size of
the Ethernet header in the size of our probes and response packets when computing
amplification factors.
8.4 Internet Scanning Results
This section presents the results of sending our attack packet sequences from
?8.2 to the entire IPv4 Internet. We make two notes upfront that are important in
understanding our results:
Responder variation Our packet sequences elicit a wide range of behaviors. We
broadly classify them in Figure 8.3; for some destinations and packet sequences, we
get response packets directly from destinations, from middleboxes (pretending to be
the destination), or some combination of the two. We confirm in ?8.4.3 that over 82%
of the largest responses we receive come from middleboxes, but unfortunately it is
difficult to perform this analysis for every destination IP address we send to. Thus,
for consistency (and because middlebox de-aliasing is difficult and error-prone), we
report on the number of destination IP addresses from which we can elicit responses
throughout this chapter. We explore clustering and identifying middleboxes by their
responses in ?8.4.4.
Infinite amplification We discover many IP addresses that continue to respond,
seemingly indefinitely, to our probes. The amplification factors for these IP addresses
are technically infinite, but we report the (finite) amplification we obtained during
our scans. These tend to be orders of magnitude larger than other hosts. We explore
197
8
10 syn+psh
syn+pshack
6 syn
10 psh
pshack
4
10
2
10
0
10
0 1 2 3 4 5 6 7 8
10 10 10 10 10 10 10 10 10
IP Address Rank
Figure 8.4: Rank order plot of the amplification factor received from each IP ad-
dress for the triggering payloads containing www.youporn.com across all five packet
sequences.
?SYN; ?SYN;
URL SYN PSH PSH+ACK PSH? PSH+ACK?
www.youporn.com 49.4 4.4 23.2 13.9 52.0
roxypalace.com 5.8 4.4 16.5 13.6 31.3
plus.google.com 7.4 7.0 5.9 13.4 14.9
bittorrent.com 3.7 3.2 3.8 10.6 13.7
survive.org.uk 4.4 2.8 2.4 11.0 11.2
example.com 3.4 2.9 2.8 11.2 8.4
empty 0.06 0.01 0.02 0.05 0.06
Table 8.2: Total data received (GB) from the top 100,000 IP addresses for each
combination of target URL and packet sequence. Bolded is the maximum value for
each target URL.
infinite amplifiers in ?8.5.
8.4.1 Which strategies work best?
We begin by measuring the impact that packet sequence and keyword have on
response rate and amplification factor.
Figure 8.4 compares the amplification factors for each of the 5 packet se-
quences with the URL www.youporn.com. We immediately observe that each of
198
Amplification Factor
?SYN; ?SYN;
URL SYN PSH PSH+ACK PSH? PSH+ACK?
www.youporn.com 116,120 67,503 78,830 92,765 97,689
roxypalace.com 128,843 52,168 63,080 86,010 97,213
plus.google.com 39,177 27,815 24,827 54,916 63,090
bittorrent.com 33,187 19,171 24,682 47,348 193,754
survive.org.uk 98,038 14,600 13,060 45,953 43,927
example.com 28,909 15,669 15,911 46,469 27,962
empty 65 27 49 42 59
Table 8.3: Number of IP addresses with amplification factor over 100? for each
combination of target URL and packet sequence. Bolded is the maximum value for
each sequence.
these strategies elicits responses from over 5M destination IP addresses with am-
plification greater than one. Moreover, we find that all of them elicit very large
amplification factors; for each packet sequence, there are over 50,000 destination IP
addresses that yield over 100?.
To focus on the heaviest hitters, Table 8.2 compares the total volume of traffic
generated from the top 100,000 IP addresses for each scan, and Table 8.3 shows the
number of IP addresses with amplification factor greater than 100?. ?SYN; PSH?
and ?SYN; PSH+ACK? get responses from the largest number of unique IP addresses:
29? more than the SYN scan. Despite requiring an additional packet, they also yield
higher amplification factors for most of the top 1,000 IP addresses, and elicited
the highest total amount of traffic across every URL. Sending a SYN packet with a
forbidden HTTP GET was surprisingly effective at eliciting responses: for half of
the URLs, it had the most IP addresses with an amplification factor greater than
100?.
The choice of URL has a strong impact on how well a given packet sequence
amplifies. Figure 8.5 shows the amplification factors from using each of the key-
199
8
10
7 www.youporn.com
10
6 example.com
10 plus.google.com
5
10 www.roxypalace.com
4 www.survive.org.uk
10 www.bittorrent.com
3
10 empty
2
10
1
10
0
10
?1
10
0 1 2 3 4 5 6 7 8
10 10 10 10 10 10 10 10 10
IP Address Rank
Figure 8.5: Rank order plot of the amplification factor received from each IP address
for the ?SYN; PSH+ACK? packet sequence across all seven scanning payloads.
word/strategy combination.
Overall, www.youporn.com was the most effective for eliciting the most re-
sponses, with two notable exceptions. First, www.bittorrent.com elicited double
the number of IP addresses with amplification factor greater than 100?. The source
of this is highly amplifying censorship of two networks with /16 prefixes: one run
by the University of Ghent; the other, the City of Jacksonville, Florida. Second,
roxypalace.com on SYN packets similarly elicited responses from more IP addresses
than any other URL, and this is largely due to triggering the border firewall at
Brigham Young University, which runs a /16 prefix.
Surprisingly, scans for the control keyword example.com trigger many ampli-
fiers. It under-performed every other keyword in number of IP addresses and amount
of data elicited, but thousands of IP addresses still responded with 20? amplifica-
tion. It is possible the middleboxes who respond to this do so as a means of access
control. Scans with an empty payload received the fewest amplifiers, smallest total
200
Amplification Factor
data elicited, and smallest total amplification: the ?SYN; PSH+ACK? scan elicited three
orders of magnitude more data than an empty SYN scan.
Summary The ?SYN; PSH+ACK? packet sequence with www.youporn.com is overall
the most effective at eliciting amplification, but other URLs and sequences are
needed to trigger specific, large networks.
8.4.2 Are these actually amplifiers?
We next explore if these IP addresses can be (ab)used for real-world attacks. In
a real attack, an attacker would not send just one trigger packet sequence; instead,
she would repeatedly send trigger packet sequences to these IP addresses to amplify
the response traffic. To test if the IP addresses we identify are true amplifiers,
we perform an experiment with the top 1 million IP addresses with the highest
amplification factor from the ?SYN; PSH+ACK? scan with www.youporn.com keyword.
Using ZMap, we perform two independent scans to these IP addresses: first, by
sending 5 trigger packet sequences to each IP address, and second (as a control),
just one trigger packet sequence5.
Figure 8.6 presents the increase factor : the ratio of bytes we received from
each IP address when sending 5 probes to the bytes received from 1 probe. Perfect
amplifiers have an increase factor of 5?. Our results suggest that the majority of the
top 1 million IP addresses are true amplifiers. Over 46% of IP addresses responded
with exactly 5? as much data, and another 30% responded with between 2? and
5When sending multiple probes, we modify ZMap so that each probe is sent from a different
source port, so the packets are not identical.
201
 1
 0.9
 0.8
 0.7
 0.6
 0.5
 0.4
 0.3
 0.2
 0.1
 0
 1  2  3  4  5  6  7  8  9  10
Increase Factor from 1 Probes to 5 Probes
Figure 8.6: The increase factor in the number of bytes we receive between sending
5 probes and sending 1 probe. 46% of IP addresses responded with exactly 5? as
much data.
5? as much data, likely representing amplifiers that missed or dropped one or more
of our packets. Notably, many of the IP addresses that sent the most data do not
increase by the same rate. Of the top 100 amplifiers, none of them increased by
exactly a factor of 5?, and only 10 increased by 4?6?.
8.4.3 Are these middleboxes?
Next, we determine if the responses we receive are truly coming from middle-
boxes. We performed a traceroute using a custom ZMap probe module on the top
million IP addresses by bytes received in our ?SYN; PSH+ACK? www.youporn.com scan.
Our ZMap module sent three TTL-limited TCP SYN packets for each TTL between 10
and 25 to each of the million hosts, and recorded the resulting ICMP TTL-exceeded
messages. This allowed us to construct a (partial) traceroute for each target for hops
10?25. Out of the million targets, 99.5% provided at least one router hop, with an
202
Cumulative Fraction of Hosts
average of at least 6 hops per traceroute.
For each target, we extracted the last hop that we received a TTL-exceeded
message for (i.e., the last hop we learned on the traceroute to the target). We then
sent a follow up ?SYN; PSH+ACK? sequence with www.youporn.com to the target, but
TTL-limited to the last known hop. This probe is certain to not reach the target, as
it should generate a TTL-exceeded message by the last-hop router. Therefore, if we
still receive a response from the endpoint, we can tell the response is coming from
a middlebox along the path to the target, and not the target itself.
If we do not receive a response, we cannot conclude that responses normally
come from the target endpoint, as it could be that our traceroute was incomplete:
there may be a middlebox further along the path but still before the endpoint.
However, we can interpret the presence of a response to our TTL-limited probe as
confirmation that it was produced by a middlebox.
Figure 8.7 shows the results of this scan, binning IP addresses into bins of size
1,000 and plotting the fraction of the IPs in the bin that we identified as middleboxes.
Overall, 36.8% of the 1M targets responded to our TTL-limited probe, positively
confirming their responses were produced by a middlebox. Notably present, however,
are two gaps in the graph in which almost no responses were received:
The small gap has ?10,000 IP addresses (104,000 ? x ? 114,000). All of
these IPs are in three /20-sized subnets that belong to the Texas State Technical
College Harlingen (TSTCH). Their responses correspond to block pages generated
by a SonicWall network security appliance, a common middlebox we see in our
data. It appears that TSTCH blocks traceroutes at its border, meaning that our
203
TSTCH Saudi Arabia
 1
 0.8
 0.6
 0.4
 0.2
 0
0 200k 400k 600k 800k 1M
IP Rank (bin size 1000)
Figure 8.7: The fraction of the top million hosts that we confirm are middleboxes,
using TTL-limited probe. The small gap at x ? 100,000 and the large gap in the
middle of the plot correspond to networks that block traceroutes at their borders.
Accounting for this, we find injected responses from 82.9% of the top million IP
addresses are from confirmed middleboxes.
last-observed traceroute hop occurs before the SonicWall appliance.
The larger gap has ?465,000 IP addresses (213,000 ? x ? 678,000). 98.6%
of them geolocate to Saudi Arabia. Looking at their traceroutes, their last hops
comprise just 2,068 unique router IPs, with 90% of IP addresses sharing only 10
last-hop routers (all within Saudi Arabia). It appears that Saudi Arabia also blocks
traceroutes at their border, preventing us from being able to traceroute into the
country. However, the response that comes back from 97% of the IP addresses
in this block corresponds to the standard block page of Saudi Arabian censorship,
describing that the website is blocked, and also suggesting a middlebox is responsible
for this response.
Conservatively labelling the 10,000 IP addresses from TSTCH and 97% of
the 465,000 Saudi Arabian IPs as encountering on-path middleboxes increases the
204
Fraction Middleboxes
percent of IPs that encounter on-path middleboxes to 82.9% of the million targets
we scanned. We conclude that responses from the vast majority of IP addresses in
our dataset are produced by middleboxes.
8.4.4 What kind of packets do amplifiers send?
We analyzed the packets we received in our ?SYN; PSH+ACK? scan with www.youporn.com.
This scan received a total of over 105 GB of data from 337 million IP addresses.
For each IP address, we generate a fingerprint from the response packet sequence,
consisting of a vector of (TCP flags, packet size) tuples; this allows us to ef-
ficiently group IP addresses that send us similar responses. We then counted the
number of IP addresses that sent each fingerprint. We ignore order to allow for
packet re-ordering.
Overall, we discover 63,662 unique fingerprints. Each fingerprint repre-
sents a unique set of packets sent by amplifiers. The fingerprint returned by the
most IP addresses is a sequence of three 54-byte RST+ACKs, which we received from
approximately 154 million IPs. This is a well-known censorship pattern produced by
the Great Firewall of China (GFW) [24,40], and using the MaxMind database [159],
we find 99.9% of these IPs geolocate to China. We note this is weakly-amplifying,
sending 162 bytes for our 149 byte probe.
The fingerprints representing the largest number of bytes are less common. For
example, the top fingerprint is 528,007 410 byte FIN+PSH+ACK packets and 525,110
RST+ACKs, sent by a single IP address in India. We investigate these mega-amplifiers
205
#Responsive % Sending
Country IP addresses fingerprint Fingerprint
China 170,858,209 90.0% 3? RST+ACK (54)
S Korea 15,981,100 7.6% PSH+FIN+ACK (119)
Iran 8,612,544 75.7% PSH+FIN+ACK (402?405);
RST+PSH+ACK (54)
Egypt 2,909,897 89.8% RST+ACK (54)
Bangladesh 1,375,908 81.4% PSH+FIN+ACK (248)
Saudi Arabia 894,858 45.3% PSH+ACK (97);
2? PSH+ACK (1354)
Oman 596,546 94.7% RST (54)
Qatar 387,625 89.4% RST (54)
Uzbekistan 253,098 91.8% FIN+ACK (74)
Kuwait 173,126 31.3% PSH+FIN+ACK (114)
UAE 161,014 52.0% RST (54)
Table 8.4: Nation-states with nation-wide censorship infrastructure and the finger-
print they most frequently respond to clients with. Numbers in parentheses denote
packet sizes in bytes.
more in ?8.5. The largest fingerprints sent by more than one IP address consist of a
single SYN+ACK and multiple megabytes worth of PSH+ACK packets containing data.
These appear to be sent by buggy TCP servers that simply respond to our non-
compliant GET request with real data. We find approximately 746,000 IP addresses
with this behavior.
8.4.5 Are these national firewalls?
We find that nation-state censorship infrastructure makes up a significant frac-
tion of the TCP amplifiers we discover. Figure 8.8 breaks down the amplification we
see for the top 5 countries by number of amplifying IP addresses. Out of these, all
but the US have deployed nationwide Internet censorship infrastructure [160, 161],
visible by long flat plateaus in the graph which indicate a large number of IP ad-
dresses with uniform amplification. The US is a notable exception, and we explore
why it is so prevalent later in this section. Amplification factors vary significantly
206
8
10
7 China
10 US
6
10 Iran
5 S Korea
10 Russia
4
10
3
10
2
10
1
10
0
10
0 1 2 3 4 5 6 7 8
10 10 10 10 10 10 10 10 10
IP Address Rank
Figure 8.8: Rank order plot of the amplification factor by country for the
www.youporn.com scan with the ?SYN; PSH+ACK? packet sequence.
country-to-country due to different censorship methods.
By extracting fingerprints that were shared by many IP addresses that ge-
olocate to the same country, we can identify censoring nation-states. For example,
over a million IP addresses geolocate to Bangladesh and respond with a 248-byte
FIN+PSH+ACK. Table 8.4 shows a sample of censoring countries and their most popu-
lar fingerprint. At a slightly higher amplification, we observe four similar fingerprints
with two packets each: a 402?405-byte FIN+PSH+ACK and a 54-byte RST+PSH+ACK.
We received these fingerprints from 8.6 million IP addresses in Iran, representing
76% of all the responding IP addresses that geolocate to Iran.
The censorship infrastructure of Saudi Arabia also shows prominently in our
dataset: its fingerprint is three packets: a 97-byte PSH+ACK and two 1354-byte
PSH+ACKs, offering an amplification factor of 18.9?. We received this fingerprint
from over 400K IP addresses, 99% of which geolocate to Saudi Arabia, comprising
45% of all the responding IP addresses that geolocate to Saudi Arabia.
207
Amplification Factor
In general, we find the amplification factor from nation-state censors is small:
most countries we surveyed provide less than 4? amplification. The GFW of China
is the largest?but also the weakest?amplifier we find. Curiously, we find that the
GFW has a different fingerprint between two of our scans: the ?SYN; PSH+ACK? scan
with plus.google.com elicited three RST+ACKs and a RST packet, but this extra RST
packet is missing in scans for www.youporn.com. This RST was also absent when
plus.google.com was sent with the ?SYN; PSH? sequence. The presence of the RST
raises the amplification factor of the GFW from 1.08? to 1.45?.
We do not understand why the GFW behaves differently between these key-
words and sequences. Researchers have hypothesized that the RST+ACK and RST
packets from the GFW originate from different, co-located censorship systems [24,
40]; our results support this theory, and even suggest that the block lists themselves
can be processed differently between the two censorship systems depending on the
sequences of packets.
We also discover hundreds of IP addresses in routing loops in Russia that
contain censoring middleboxes with 250.9? amplification. The highest amplifying
nation-state censors are two censoring ISPs located in Russia that seem to have
infinite routing loops in their network, that sent us packets for weeks after our
scans. We examine the effects of routing loops more closely next in ?8.4.6.
Nation-state censors pose a more significant threat to the Internet than their
amplification factor alone suggests. First, nation-state censorship infrastructure is
located at high-speed ISPs, and is capable of sending and injecting data at incred-
ibly high bandwidths. This allows an attacker to amplify larger amounts of traffic
208
without worry of amplifier saturation. Second, the enormous pool of source IP ad-
dresses that can be used to trigger amplification attacks makes it difficult for victims
to simply block a handful of reflectors [162]. Nation-state censors effectively turn
every routable IP addresses within their country into a potential amplifier.
While nation-state censors are well-represented in our amplifiers dataset, other
large non-censoring countries, such as the US, are prevalent as well. Specifically for
the US, we observe a more diverse set of fingerprints: over 13,000 unique fingerprints,
compared to 7,553 in Russia, and under 3,000 from South Korea. This indicates a
diversity of networks, rather than a coordinated, nationwide deployment. Indeed,
we observe several university and enterprise firewalls that respond with identifiable
and amplifying fingerprints.
These results demonstrate that nation-state censors enable TCP amplification
attacks, but that they are far from the sole contributor to this problem.
8.4.6 Routing Loops
Routing loops are the result of network misconfigurations, inconsistencies, and
errors in routing protocol implementations. Packets caught in a routing loop will
typically eventually be dropped when their TTL reaches zero. However, even a
finite routing loop can hypothetically have significant impact on amplification factor.
Suppose an amplifying middlebox were in a routing loop; every time an offending
packet traversed the loop, it would re-trigger the middlebox. Such a scenario would
make the network self-amplifying: at no additional cost to an attacker, the effective
209
 1
 0.9
 0.8
 0.7
 0.6
 0.5
 0.4
 0.3
 0.2
 0.1
 0
 4  5  6  7  8  9  10
Increase Factor from TTL=64 to 255
Figure 8.9: CDF of the increase factor in amplification of candidate looping IP
addresses when scanned with a TTL of 255 and 64. Because the increase factor is
affected by the number of hops away an IP address is, we expect routing loops to
have an increase factor of at least 4. Larger increase factors are further away from
our scanner, limiting the overall amplification factor from our perspective.
amplification rate of a middlebox would be increased by the number of times the
packet crosses the middlebox in the routing loop.
The maximum value of TTL in the IPv4 header is 255, so the number of times a
single trigger packet sequence can elicit responses from an RFC-compliant middlebox
is `(255 ? d), where d is the number of hops between the attacker machine and
the routing loop and ` is the number of times the packets traverse the amplifying
middlebox per loop.
So far, our scans were conducted with a TTL value of 255, in accordance with
the optimizations discovered by Geneva in ?8.2. We performed follow-up scans with
a reduced TTL value in order to observe which IP addresses send us a corresponding
reduction in the number of packets, allowing us to identify which amplifiers involve
routing loops.
210
Cumulative Fraction of Hosts
 1
 0.9
 0.8
 0.7
 0.6
 0.5
 0.4
 0.3
 0.2
 0.1
 0
0 1 2 3
10 10 10 10
Rank Order of /24 Prefixes with At Least One Routing Loop
Figure 8.10: The /24 prefixes with at least one routing loop, rank-ordered by the
fraction of their 256 IP addresses that we observe to loop. Of the 2,763 looping
prefixes, 54 (2%) have over 90% of their IP addresses loop, but 1,705 (62%) have
only one looping IP address. (Note that the x-axis is log-scale.)
For this experiment, we use the ?SYN; PSH+ACK? packet sequence with the
www.youporn.com trigger keyword. We use the top 1 million hosts (by number pack-
ets sent during the scans), and perform two follow-up scans to these IP addresses:
one with the TTL set to 255 and one set to 64 (approximately 1/4 the value). As
we are knowingly re-triggering machines with potentially enormous amplification
factors, we reduced the scanning speed to 100 kbps6.
We can identify routing loops by comparing the number of packets we receive
per IP address across scans. For a routing loop d hops from our scanner, we expect a
probe with TTL = 255 to receive (255?d)/(64?d) times more packets than a probe
with TTL = 64. Note that this value increases as d increases, and, for a routing loop,
has a minimum value of ?4 (when the routing loop is zero hops away). Therefore,
we label an IP addresses as having a routing loop if it has an increase factor of at
6Despite our low send rate, we received back on average around 800 Mbps, representing a total
amplification of 8,000? for this experiment.
211
Fraction of Prefix?s
Addresses that Loop
least 4 and sent more than 10 packets when probed with a TTL of 255. From our top
1 million IP sample, we label 53,041 IP addresses as routing loop amplifiers
using this heuristic, spanning 2,763 distinct /24 prefixes. Figure 8.9 presents a CDF
of the increase factor for these routing loop IPs.
Loops per subnet One would expect that if sending to a given IP address
results in a routing loop, then all of the other IP addresses in its /24 prefix would
experience a loop, as well. Surprisingly, we find that 62% of /24 prefixes with at
least one routing loop have exactly one loop. Figure 8.10 shows the fraction of
IP addresses found in each looping /24 prefix. Only 54 subnets have over 90%
(231 of 256) of their IP addresses show evidence of being a routing-loop amplifier.
On the other hand, 81.2% (2,244) of looping prefixes have fewer than 10 looping IP
addresses. This means that even if an attacker can elicit responses from a middlebox
by sending packets to any IP address that routes through it, she may only be able
to take advantage of routing loops to a small number of IP addresses.
8.5 ?Mega-amplifiers?
In our scans, we identify a surprising number of hosts that send enormous
amounts of data in response to a single packet sequence?on the order of many
gigabytes. We believe these are the same ?mega-amplifiers? that Czyz et al. [148]
reported in 2014. We identify two phenomena that contribute to mega-amplification:
self-sustaining amplifiers and victim-sustained amplifiers.
Self-Sustaining Amplifiers Self-sustaining amplifiers are IP addresses that, once
212
triggered, continue sending data indefinitely. In our scans, we have observed these
continuing for weeks after our probes. We hypothesize the cause of self-sustaining
amplifiers is infinite routing loops: routing loops between middleboxes that do not
decrement TTLs.
An infinite routing loop suggests these amplifiers are sending responses at the
maximum capacity of their links. To confirm, we sent a packet sequence to a self-
sustaining amplifier we identified in an ISP?s censorship system in Russia. A short
time later, we sent the same packet sequence from a different vantage point, and we
recorded the bandwidth received from each. Figure 8.11 shows the bandwidth we
received on both vantage points during our experiment. When we send a probe from
a second vantage point, the response bandwidth was split equally between them.
We were unable to terminate the barrage of packets sent to us by this ampli-
fier. We sent RST packets, and also tried FIN+ACK, FIN, RST+ACK, and ICMP port
unreachable messages with no effect. Ultimately, the traffic stopped after approx-
imately six days to the first vantage point, and 22 hours for the second. We believe
the reason they finally stopped was because the routing loop eventually dropped a
packet.
Fortunately, we find very few self-sustaining amplifiers: only 19 IP addresses
sent data continuously. We identified 6 IP addresses (each in a different /24 prefix)
located in China that sent the known censorship pattern from the GFW indefinitely,
possibly indicating a loop across the GFW itself. Two ISPs in Russia also sent block
pages indefinitely.
213
 2000
Vantage Point 1
Vantage Point 2
 1500
 1000
 500
 0
 0  100  200  300  400  500  600  700
Seconds Since Experiment Start
Figure 8.11: Attack bandwidth received at two vantage points from a self-sustaining
amplifying IP address, which (based on its block page) appears to be a component of
a Russian ISP?s censorship system. The dashed line marks when the packet sequence
was sent from the second vantage point. Note how the bandwidth we get from the
system is divided evenly between the vantage points. This experiment supports our
hypothesis that self-sustaining amplification is caused by an infinite routing loop.
Victim-Sustained Attacks The TCP standard says that when a host receives an
unsolicited non-RST packet, it should send a RST packet in response [100]. For TCP
amplification victims, this means they will send RST packets for any received (am-
plified) traffic. Normally, victim-generated RST packets have no effect on middlebox
amplifiers7.
However, our scans identify amplifying IP addresses that send an additional
response to RST packets instead of ignoring them. This causes the victim to send
another RST, inducing more responses, and so on. This packet storm continues
indefinitely until a packet is dropped.
By default, our scanning machine sent outbound RST packets in response to
data, thereby eliciting additional packets from victim-sustained amplifiers. To ex-
7Conversely, they may serendipitously halt SYN-based amplification attacks that target end-
hosts [146,147].
214
Kbps
plore the effect that outbound RST packets have on amplification factor, we per-
form two additional scans: one with outbound RST packets turned off for the
www.youporn.com keyword in the ?SYN; PSH+ACK? sequence, and one with RSTs en-
abled (default). Figure 8.12 shows a comparison between these two scans. Dropping
outbound RST packets has the effect of lowering the amplification factor for the top
amplifying IP addresses, while raising the amplification factor of many IP addresses
in the ?long-tail?.
We find several thousand IP addresses that behave this way, which we classify
into two classes: censoring repeaters and ?acknowledgers?.
For censoring repeaters, we find 4,154 middleboxes that re-send a block
page in response to a RST. This appears to be a buggy flow-tracking middlebox
that, once a TCP flow triggers blocking, will continue injecting its block page in
response to any subsequent packet, including RSTs.
For acknowledgers, we find 10,645 IPs that respond with an ACK to both data
payloads and subsequent RST packets. This behavior is also not TCP compliant.
To investigate what operating systems these ?acknowledgers? are, we performed
Operating System (OS) identification nmap [163] scans on 500 randomly sampled
victim sustained IP addresses. Of the 452 (90.2%) IP addresses with a successful
OS match, 267 (59%) were Dell SonicWall NSA 220. We believe this firewall model
is to blame for most of the acknowledger victim-sustained behavior: the next most
common OS match was Linux 2.68, with only 14 hosts (3%).
8We note this is not standard Linux 2.6 behavior.
215
8
10
7 RSTs
10
No RSTs
6
10
5
10
4
10
3
10
2
10
1
10
0
10
?1
10
0 1 2 3 4 5 6 7 8
10 10 10 10 10 10 10 10 10
IP Address Rank
Figure 8.12: Rank order plot of amplification factor of two scans for the
www.youporn.com keyword requested with the ?SYN; PSH+ACK? packet sequence: one
with outbound RST and RST+ACK packets being dropped and the other normally.
8.6 Ethical Considerations
Internet Scanning We followed best practices for scans as outlined by ZMap and
Quack [52, 145]. We set up reverse DNS and hosted a webpage on the IP address
we performed scans from, explaining the purpose of our scans. We also listed an
email address to receive complaints and allow people to opt out of future scans. We
received 8 removal requests over the course of our study comprising 2.1 million IP
addresses which we removed from our scans.
Censorship-focused Internet-wide scans require additional careful considera-
tions to avoid causing harm or falsely implicating users in making censored requests.
In prior work on active probing to trigger censorship, researchers used alternative
techniques to avoid having clients in censored countries make requests for banned
content [52, 161, 164, 165]. Similarly in our work, the requests are made by our
216
Amplification Factor
scanning machine from outside the censored countries to all IPv4 addresses, making
it unlikely that a government would punish any individual, due to the directional-
ity and ubiquity of the scans. The packet sequences we probe with are non-TCP
compliant and do not induce any in-country clients to make sensitive requests in
response. For these reasons, we believe wide-scale scans of this nature pose minimal
risk to individuals in censored regions.
Saturation Experiments A natural question with all amplification studies is: at
what point do amplifiers? link saturate? For example, a single host with amplification
factor of 5,000? may not be very valuable if it only has a 100kbps uplink.
Measuring the saturation of a specific amplifier requires sending the triggering
packet sequence in rapid succession and measuring the response it triggers. For
ethical reasons, we do not perform such an experiment. These experiments would
effectively perform denial of service attacks against the specific middlebox or the IP
address, or could adversely impact other networks on path.
We unintentionally triggered mega-amplifiers, and report on our findings in
this chapter. However, after discovering these IP addresses and the nature of their
responses, we removed them from future scans.
Responsible Disclosure Responsibly disclosing our findings is challenging given
the large number of potentially affected vendors and network operators. It is both
difficult to fingerprint specific vendors or manufacturers of middleboxes, and also
difficult to identify the networks where middleboxes are responding from, as they
spoof their source IP address by design.
217
Nonetheless, we attempted to reach out to both operators and vendors of mid-
dleboxes we discovered in our study. We contacted several country-level Computer
Emergency Readiness Teams (CERT) that coordinate disclosure for their respective
countries, including China, Egypt, India, Iran, Oman, Qatar, Russia, Saudi Arabia,
South Korea, the United Arab Emirates, and the United States. We also reached
out to several middlebox vendors and manufacturers, including Check Point, Cisco,
F5, Fortinet, Juniper, Netscout, Palo Alto, SonicWall, and Sucuri.
We also publicly provide a repository of scripts that can help manufacturers
and network operators test their middleboxes for amplifying behavior.
8.7 Countermeasures
Unlike previous amplification attack vectors [139, 140, 148], our attack is not
isolated to a specific protocol and impacts a wide range of implementations and
devices. Unfortunately, this means there is no single vendor or network that can
be patched to correct the problem. Instead, this issue is systemic to middleboxes,
particularly those that must operate seeing only one side of a connection.
Nonetheless, we offer potential remedies that can eliminate or partially miti-
gate amplification attacks, for both middleboxes and potential victims.
8.7.1 Middleboxes
Connection directionality While many middleboxes see asymmetric sides of
a connection (e.g., only traffic to the server), there are others that see both sides,
218
such as middleboxes deployed at the gateways of networks. These middleboxes
can accurately infer if a connection is live and only inject content if the three-way
handshake is valid. We recommend such middleboxes require seeing traffic in both
directions (to client and to server), and only inject block pages if this condition is
met. This makes it more difficult for an attacker to spoof a connection, as it is
infeasible for them to get both sides of a spoofed connection to pass by the same
middlebox to induce injection. However, this solution will not work for large-scale
middleboxes that sit in large transit networks and more frequently see only one side
of a connection.
Limit injected response sizes Some middleboxes inject large block pages, di-
rectly enabling large amplification attacks. An alternative approach is for these
middleboxes to only respond with a single RST to close a forbidden connection, or a
with a minimal HTTP redirect to a different server that hosts a block page. If the
middlebox?s response size is smaller than the minimum size required to trigger it,
this ensures that the middlebox will not be a productive amplifier.
Egress filtering Though middleboxes are only supposed to block websites for a
limited group (such as a country or within a corporate or school network), many
operate ?bidirectionally?, such that users outside the network accessing content
within can also trigger injected responses. For instance, users outside China can
still elicit the Great Firewall of China to inject RST packets despite not being the
intended target of censorship. Instead, middleboxes should be configured to only
censor requests originating from within the intended network, limiting the scope of
219
victims of amplification.
Remove or limit censorship devices Many middleboxes inject block pages into
censored HTTP requests which use an outdated protocol that has been far surpassed
in traffic volume and page loads by HTTPS [166]. The utility that HTTP-injecting
devices provide is shrinking, and will ultimately disappear as more sites use TLS.
However, the damage they inflict via amplification attacks will remain until these
devices are removed. Disabling HTTP injection in these devices altogether would
prevent abuse from attackers.
8.7.2 End Hosts
End hosts can take steps to mitigate the potential impact of these attacks.
Hosts that drop outbound RST packets are more susceptible to TCP handshake-
based attacks, but hosts that do not are susceptible to sustaining a packet storm
from a victim-sustained amplifier. Instead, we recommend end hosts be configured
to drop outbound RST packets probabilistically; this prevents an infinite packet
storm, while still offering some protection from handshake-based amplifiers.
8.8 Conclusion
In this chapter, I presented the first non-trivial TCP-based reflected amplifica-
tion attacks, and demonstrated that middleboxes could be automatically rendered
ineffective at policy implementation to disastrous effect. To discover these attacks,
I trained Geneva directly against censoring middleboxes with a new fitness function.
220
We then scanned the Internet dozens of times and find over 200 million IPv4 ad-
dresses that provide amplification from 1? to over 700,000?, as well as others that
effectively yield infinite amplification.
Through a series of thorough follow-up experiments, we found that these TCP
amplifiers are predominantly middleboxes, and frequently nation-state censorship
devices. It has long been understood that nation-state censors restrict open com-
munication for those in their borders; our work shows that they pose an even greater
threat to the Internet as a whole, as attackers can weaponize their powerful infras-
tructures to attack anyone.
Our results show that middleboxes introduce an unexpected, as-yet untapped
threat that attackers could leverage to launch powerful DoS attacks. Since the
publication of this work [5], these attacks have since been found in the wild [167].
Protecting the Internet from these threats will require concerted effort from many
middlebox manufacturers and operators. To assist in these efforts, we released our
code publicly available at:
https://geneva.cs.umd.edu/weaponizing
In the next chapter, I will demonstrate another attack that renders middle-
boxes ineffective at correctly executing their policies by coercing them to disrupt
innocuous communication.
221
Chapter 9: Weaponizing Censors for Avail-
ability Attacks
The previous chapter demonstrated that middleboxes can be rendered inef-
fective at policy implementation by implementing policy when they should not, to
disastrous effect. The previous chapter focused only on HTTP, however, and did
not affect censoring middleboxes that drop traffic to censor. This leads me to ask:
can middleboxes that drop forbidden traffic to censor also be weaponized to launch
attacks?
To answer this question, in this chapter I demonstrate a second attack that
shows middleboxes can be coerced into executing their policy when they should
not. There are additional benefits to answering this question, because this chapter
also shows that censoring regimes pose a greater threat to the Internet than pre-
viously understood. In particular, we show that attackers can weaponize censoring
infrastructure to keep two end-hosts separated by that country?s borders from being
able to communicate with one another, effectively blocking innocuous hosts. The
attacker need not be within the censoring regime; it merely needs the ability to
source-spoof packets.
222
The attack makes use of a little-studied but widespread feature of many cen-
soring infrastructures: residual censorship. After a given TCP connection triggers a
censor (e.g., by including a forbidden keyword in a plaintext HTTP GET request),
some censors not only tear down the connection, but ?residually censor? all future
communication between the two endhosts (on particular ports) for some period of
time?even if the subsequent traffic is completely innocuous.
Armed with this insight, our attack is relatively straightforward: the adversary
spoofs the victim endhosts, sending packets with censored content across the censor?s
border, thereby triggering censorship and blocking the victims from communicating
for some time.
Although conceptually simple, there are several challenging aspects of this
attack in practice. In particular, most censoring middleboxes are stateful (they track
connections across packets), and so it would seem that the attacker would have to
fake a TCP three-way handshake in order to be able to send a valid censored packet
in the first place. We show that, surprisingly, the attack is indeed possible, even
with a completely off-path attacker.
The central contributions of this chapter are not just in demonstrating the
possibility of weaponizing residual censorship, but also in performing two compre-
hensive feasibility studies for the attack:
First, we perform active measurements to analyze the current state of residual
censorship around the world today: what countries employ it, how it operates, how
long it lasts, and so on. Our results demonstrate a wide variety in the implementa-
tion of residual censorship systems?even within a given country, residual censorship
223
can operate very differently from one protocol to another.
Second, we analyze our attack?s success and feasibility by launching it us-
ing (and targeting) hosts we control in three censoring nation-states?China, Iran,
and Kazakhstan?across four protocols (HTTP, HTTPS+SNI, HTTPS+ESNI, and
Iran?s protocol filter [3]). This study sheds light on the limitations of the attack?for
instance, we find that the attacker generally needs to be on the same side of the
censor as the victim client. It also shows several surprising strengths of the attack.
For example, Iran and Kazakhstan extend the duration of residual censorship when-
ever the censor sees a matching packet?as a result, once the attack is started, the
victim?s own packets help sustain the attack on themselves.
Our results show that even a low-resource attacker can weaponize censoring
nation-states to launch an effective availability attack. In China, a source-spoofing
attacker needs to send only four packets every three minutes to indefinitely sus-
tain blocking between a given pair of end-hosts on a given destination port. An
attacker that can sustain 1,093 packets per second (about 600 kbps) can weaponize
Kazakhstan?s censor, or 728 packets per second (422 kbps) to weaponize Iran?s. Col-
lectively, our results show that censorship infrastructures as they are deployed today
have the potential to cause even more harm to the Internet at large than previously
understood.
The rest of this chapter is organized as follows. In Section 9.1, we review re-
lated work and provide a background on nation-state censorship, residual censorship,
and availability attacks. We describe our experiment methodology in Section 9.2.
Section 9.3 presents our study of the current state of residual censorship, and Sec-
224
tion 9.4 presents our feasibility study from launching the attack against hosts under
our control. We speculate about the breadth of the attack and discuss limitations in
Section 9.5, explore potential mitigations in 9.6, and present ethical considerations
in Section 9.7. Finally, I conclude this chapter in Section 9.8.
9.1 Background & Related Work
How censors operate There have been many measurement studies to under-
stand how various censoring infrastructures work?far too many and varied to do
full justice here. Instead, we highlight several key properties that are critical to
understanding our results.
In-network censors generally have two broad components: a mechanism for
determining whether to censor, and a set of mechanisms for actually tearing down
the offensive connection. Determining whether to censor a connection has been
shown to depend on keywords (e.g., in HTTP GET requests [40,168]), domain names
(e.g., in the Server Name Indication (SNI) field during an HTTPS connection [2, 7,
36]), or the very protocol being used [3,12]. Our evaluation spans different types of
these.
To actually tear down a connection, censors often employ one of two tactics:
Some simply drop the offending user?s (or connection?s) traffic. This is referred to
as null routing, and is obviously a very effective way of terminating a connection.
However, it is also costly for the censor, as it requires them to have a box on the
path between source and destination at which they can drop the traffic. More
225
commonly, censors are deployed not as man-in-the-middle adversaries, but as man-
on-the-side: they sit just off of the path, and the ISPs send copies of packets (in
both directions) to the censor for processing. For such deployments, the censor tears
down the connection not by dropping the offending traffic, but by injecting spoofed
TCP RSTs (or lemon DNS responses [38]) to both client and server, causing them
both to believe the other had terminated the connection. In our experiments, we
study both null-routing and tear-down censors.
Residual censorship Residual censorship is a feature observed in some censorship
systems in which the censor continues to block innocuous requests for a short pe-
riod of time after censoring a forbidden request. We are not the first to observe this
behavior; the Censored Planet datasets [54] report on instances where innocuous
queries are blocked shortly after sending a censored query. It has also been noted
in the context of studying censorship in China [7], Iran [3], and others [40] that,
for some countries and some protocols, once a connection triggers censorship, sub-
sequent connections can also be censored. However, to the best of our knowledge,
we are the first to systematically study residual censorship?what precise protocols
and ports it targets, for how long, and whether innocuous traffic can keep residual
censorship in place?and how attackers can weaponize it.
An important facet of residual censorship is precisely what the censor blocks
after censorship is initially triggered. There are three basic options available to an
adversary: 2-tuple (client IP, server IP), 3-tuple (client IP, server IP+port), or 4-
tuple (client IP+port, server IP+port)1. We are not aware of any censors who use
1It is also conceivable that a censor could block multiple IP addresses at a time, such as a /24,
226
2-tuple residual censorship. All prior work of which we are aware that had identified
some form of residual censorship focused only on 3-tuple. To our knowledge, we are
the first to identify 4-tuple censorship, and yet, as we will show, it is one of the most
widespread forms of residual censorship.
Weaponizing censors Besides the attack outlined in the previous chapter, I am
aware of only one instance of coercing a censor into blocking someone else. In 2014,
the developers of VPN Gate realized that the Great Firewall of China (GFW) had
developed an active system for scraping the IP addresses of their VPNs and auto-
matically blocking them without validating that these IP addresses were actually
VPNs. The researchers began to mix innocent IP addresses into their published list
of VPN servers and were able to control which IP addresses were globally blocked by
the GFW for two days until the GFW added verification checks [169]. Our approach
differs considerably; in our setting, an attacker can trigger the censorship, without
needing the GFW to actively scan them. Moreover, our attack appears to be more
difficult for the GFW to mitigate.
Off-path attacks This chapter fits into a much broader space of off-path at-
tacks. Prior work has explored how to adversely affect TCP connections between
two end-hosts in myriad ways, including TCP side channels [170] and data injec-
tion [171]. Other work has shown that an off-path attacker can weaponize network
infrastructure to launch amplification attacks [147,172,173]. Each of these prior at-
tacks manipulate the state at the end-hosts it targets. Our work broadens this space
by showing that attackers can manipulate the state of middleboxes in the network
but we did not study this.
227
itself to adversely affect end-hosts? ability to communicate.
9.2 Measurement Methodology
As with all censorship measurement research, we are limited by the vantage
points we can access and the censorship we can experience. For our experiments, we
obtained four vantage points within censoring countries: two in China (Beijing), one
in Iran (Tehran), and one in Kazakhstan (Qaraghandy). We also performed exper-
iments from two vantage points we obtained in India (Bangalore) and one vantage
point we obtained in Russia (Khabarovsk), but as we will see in the next section, we
were unable to identify residual censorship in either location. We also obtained van-
tage points located in geographically disparate locations around the world that do
not experience censorship: Australia (Sydney), India (Mumbai), Ireland (Dublin),
Japan (Tokyo), United Arab Emirates (Dubai), and the United States (Iowa, Col-
orado, and Virginia). Figure 9.1 shows the locations of each of these vantage points,
along with the censoring regimes in which we validated our attack.
To test for residual censorship, we issued queries that trigger censorship fol-
lowed by queries that do not trigger censorship on their own and observed if the
censor interferes. The specific queries we issued for each protocol are as follows (for
ease of exposition, we will refer to HTTPS with SNI as simply ?SNI?, and HTTPS
with ESNI as simply ?ESNI?):
? SMTP: Sent an SMTP request with a forbidden email address (such as ?xi-
azai@upup.info? in China [2]) in the MAIL FROM: field.
228
Figure 9.1: Vantage points in our experiments. The green dot is our attacker running
SP3 [9]; black dots represent victim vantage points; and the red dots denote the
location of the servers inside the censoring regimes we studied: China, Iran, and
Kazakhstan (outlined in red). Note that some dots overlap.
? DNS: Issued a DNS query (over both UDP and TCP) with a forbidden ques-
tion record (such as ?facebook.com? in China) both to real DNS resolvers and
to resolvers we controlled.
? HTTP: Issued a HTTP GET request with a forbidden URL in the host header
(such as Host: youporn.com), or with a forbidden keyword as an HTTP
parameter (such as ?q=ultrasurf).
? HTTPS (SNI): Initiated a TLS handshake with a forbidden domain in the
SNI field to servers we controlled.
? HTTPS (ESNI): Initiated a TLS handshake configured with ESNI to servers
we controlled.
229
? Protocol Filter (Iran)2: Sent two messages back to back containing the
message ?test?. As this trivially does not match any approved protocol, it
triggers censorship [3].
We also tested different patterns of follow-up requests and packets. To identify
3-tuple residual censorship, we issued follow-up queries with the same protocol to the
same destination, containing an innocuous payload (such as ?example.com?). We
also tested making innocuous queries of different protocols and malformed payloads
that do not resemble any protocol (such as just the string ?test?). To identify 4-
tuple residual censorship, we sent follow-up packets with the same source port to the
same destination IP address and port (but with an out-of-window TCP sequence and
acknowledgment number) and confirmed that our packets arrived at the destination
correctly and without interference. We performed this check with SYN packets, PSH
packets, PSH+ACK packets, and RST packets. We then repeated these experiments
across many ports to identify which ports were affected.
9.3 State of Residual Censorship
In this section, we present the results from our comprehensive study of the
current state of residual censorship in China, Iran, and Kazakhstan. Table 9.1
provides a breakdown of all of our results in this section.
Which countries employ residual censorship? We found some form of resid-
ual censorship (3-tuple or 4-tuple) for multiple protocols in China (SNI, ESNI, and
2In addition to its standard content filter, Iran uses a protocol filter, which censors unrecognized
protocols on monitored ports [3].
230
Country Protocol Ports Type Duration Bidirectional Timer Reset Mechanism
HTTP Any 3-tuple 90s X 8 Injected RST
China SNI Any 3-tuple 60s X Unknown Injected RST
ESNI Any 3 and 4-tuple 120-180s X 8 Null Routing
HTTP Any 4-tuple 120s X X Null Routing
Kazakhstan
SNI Any 4-tuple 120s X X Null Routing
HTTP 53, 80, 443 4-tuple 180s X X Null Routing
Iran SNI 53, 80, 443 4-tuple 180s* X X Null Routing
Protocol Filter 53, 80, 443 4-tuple 60s 8 X Null Routing
Table 9.1: The current state of residual censorship, among the countries and pro-
tocols we tested (those that we tested but are not in the table did not residually
censor in our tests). We were unable to reproduce SNI censorship in China; in that
row, we report prior results [7]. *: Iran?s SNI residual censorship sometimes lasts
longer than 180s; in a small number of our experiments, we found it to last upwards
of 5 minutes.
HTTP), Iran (HTTP, SNI, and its protocol filter), and Kazakhstan (HTTP and
SNI).
China and Iran in particular employ residual censorship for only some of the
protocols they censor. Neither have residual censorship for any of their DNS censor-
ship (DNS-over-UDP or DNS-over-TCP)3. Further, China does not employ residual
censorship for their SMTP censorship.
Some countries we tested do not employ residual censorship at all against our
vantage points. Both of our vantage points within the Airtel ISP in India experienced
HTTP and SNI censorship, but neither experienced residual censorship. We were
also unable to trigger censorship from our vantage point in Russia to any of our
destination vantage points, so we exclude both of these from our analysis.
What types of residual censorship do censors employ? We find that censors
vary between 3-tuple and 4-tuple residual censorship, depending on the protocol
being censored.
China uses 3-tuple residual censorship for HTTP traffic and censors by in-
3In Iran, although some prior work has reported DNS-over-TCP censorship [55], we are unable
to trigger any DNS-over-TCP censorship at this time (similar to what was reported in [2]).
231
jecting TCP RST packets. This has been observed in the past [24, 40]. Prior work
has reported residual censorship in China for SNI [7] by injecting RSTs, but neither
of our two vantage points experienced any SNI residual censorship to any of our
vantage destinations.
ESNI censorship in China presents a more complicated picture. Less than
1 second after the GFW sees a TLS ClientHello containing the ESNI extension,
it begins dropping all traffic that matches the connection?s 4-tuple (note that the
ESNI packet itself reaches the server unaffected). This is 4-tuple residual censorship.
For approximately five seconds, the GFW also drops all traffic that matches the
connection?s 3-tuple: a short window of 3-tuple residual censorship. But if the
client sends a second ESNI request with the same 3-tuple within the next three
minutes, the GFW will begin dropping all traffic that matches the 3-tuple for three
minutes: a long window of 3-tuple residual censorship. Unlike for HTTP and SNI,
ESNI?s residual censorship does not operate equally in both directions. Researchers
have hypothesized in the past that China censors each protocol using a different
set of middleboxes; the vast disparity between residual censorship implementation
across our vantage points supports this hypothesis [2, 174].
In Iran and Kazakhstan, we find that the mechanism used for residual censor-
ship (null-routing) and type of residual censorship (4-tuple) is consistent between
protocols. As we will see later in this section, however, there are other inconsis-
tencies in the implementations of the residual censorship for each censored protocol
within Iran and Kazakhstan, such as the duration of censorship.
232
Does residual censorship use the same mechanisms as the initial cen-
sorship? We find that residual censorship is generally enforced using the same
mechanism as the initial censorship. For example, China injects RST packets to cen-
sor HTTP normally, and injects RST packets for its residual censorship (the same
is also reported for China?s SNI censorship [7]). China?s ESNI censorship operates
with null-routing, as does its residual censorship. The censorship mechanisms are
also consistent in Iran and Kazakhstan, with one exception.
We find that Iran censors HTTP using multiple methods simultaneously: in-
jecting a block page with a packet that has the RST flag set while simultaneously
null routing the connection. Despite using three censorship mechanisms for regular
censorship, only 4-tuple null-routing continues for residual censorship.
What ports are affected by residual censorship? We tested this by issuing
censored requests to vantage points we controlled destined to all 65,535 ports and
confirmed that all were affected. We find that the ports affected by residual censor-
ship match the ports affected by the regular censorship in each country we studied,
but each country monitors a different set of ports. In China (with HTTP and ESNI)
and Kazakhstan (with HTTP and SNI), we find that we can trigger residual censor-
ship on any arbitrary port, including ephemeral ports. In Iran, however, both the
protocol filter and the standard censorship system only monitor ports 53, 80, and
443, and therefore we can only trigger residual censorship to these ports. Note that
in Iran, residual censorship can be triggered for any protocol on any of those three
ports: for example, we can trigger HTTP residual censorship to port 53.
233
Is residual censorship applied bidirectionally? Even within the same coun-
try, residual censorship is not always applied equally to connections entering the
country as to those exiting the country. Although we find that Iran?s standard
censorship system can be triggered bidirectionally, we confirm the findings of [3]
that the protocol filter (and by extension, its residual censorship) only operates on
flows leaving Iran. China?s ESNI censorship operates bidirectionally, but it operates
differently (and more aggressively) against traffic entering the country than exiting
the country.
For every other censorship system we tested, we were able to trigger censorship
(and residual censorship) equally from outside the country. Like all censorship
research, our study is limited by the vantage points we can access; it is possible
that there are other censorship systems that only employ residual censorship on
connections leaving the country that we cannot study.
We find that the direction of subsequent traffic is important in whether it
is affected by residual censorship. If a client within a censored regime makes a
forbidden request to a server outside, we find that only traffic sent by the client
is affected by residual censorship. This makes sense: traffic direction is encoded
in both 3-tuple and 4-tuple flow tracking. However, this does impose an important
limitation on attackers: an attacker generally must be on the same side of the censor
as their victim.
What packets are affected by residual censorship? Which packets are
impacted by residual censorship changes depending on the censorship mechanism
234
used. China?s HTTP residual censorship mechanism of injecting RST packets does
not initiate until after the client has sent a new request in a PSH+ACK packet. None
of the 3-way handshake is impacted; it reaches the server without interference.
However, China?s ESNI residual censorship (both 3-tuple or 4-tuple) null-routes:
all packets leaving the client, including SYN packets are affected by the residual
censorship.
We find the same effect for the null-routing residual censorship in Kazakhstan
and Iran. Note that the direction of traffic matters for every censor we studied:
only packets from the client are impacted. If a server sends packets in a connection
being null-routed, the packets will reach the client unaffected.
How long does residual censorship last? To determine the duration of residual
censorship, we performed an experiment in which we varied the duration of time
between triggering censorship and making a follow-up request, and recorded whether
residual censorship took place.
We find the duration of residual censorship also varies between countries and
protocols, but is generally less than three minutes in every country we studied.
HTTP residual censorship in China lasts approximately 90 seconds (as observed
in [24,40]) and ESNI is residually censored for 120 seconds (as observed in [36]). We
note that for ESNI censorship in China, other researchers have reported both 120
and 180 seconds of residual censorship [36]. In Iran, while its protocol filter residually
censors for 60 seconds, its HTTP and SNI censorship systems residually censor for
180 seconds (and in a small number of our experiments, the SNI system continued
235
to residually censor requests up to approximately 5 minutes). In Kazakhstan, both
HTTP and SNI residual censorship systems operate for 120 seconds.
We find that both Iran and Kazakhstan restarts their residual censorship timer
if the client sends a matching packet, thereby extending the duration of time that
the client is affected. Due to TCP retransmissions, in practice this means that Iran
and Kazakhstan will drop traffic for much longer than their original time. This
is presumably done to make their censorship systems more robust against TCP
retransmissions. As we will see in the next section, however, this timer reset makes
our attack easier to launch.
Does residual censorship require a full 3-way handshake? No! We were able
to trigger residual censorship without a proper 3-way handshake for every censor we
studied. To discover this, we followed the methodology of Bock et al. [5] to attempt
subsets of the TCP 3-way handshake before sending a PSH+ACK with a censored
keyword.
The Airtel ISP in India enacted residual censorship without any of the 3-way
handshake (one needs only send the PSH+ACK). Censorship of clients within this ISP
appears to maintain no TCP state for their censored system.
Other countries required a subset, but not the entirety, of the TCP 3-way
handshake. We sent a single SYN packet with a decremented sequence number,
followed by a PSH+ACK containing the forbidden payload (we will refer to these two
packets as the ?censorship trigger?). This successfully triggered censorship (and
residual censorship) for every censorship system we studied.
236
 1
 0.8
 0.6
 0.4
 0.2
 0
 1  2  3  4  5  6  7  8  9
Number of residual-censorship triggers
Figure 9.2: The relationship between the number of times censorship is triggered
and the reliability of HTTP residual censorship, as measured from our Beijing 2
vantage point. As the number of times residual censorship is triggered increases,
the reliability improves. (Error bars represent 95% confidence.)
How reliable is residual censorship? We define the ?reliability? of residual
censorship as the fraction of follow-up innocuous requests made within the residual
censorship window that are successfully censored. Note that this is distinct from the
reliability of censorship itself, which traditionally refers to the fraction of forbidden
requests a censor successfully censors [2].
We performed an experiment to measure residual censorship reliability from
each of our censored vantage points. We triggered censorship and then made one
innocuous request per second and recorded how many requests were impacted; this
experiment was repeated 10 times, spaced evenly throughout a 24 hour period. For
every protocol in Iran and Kazakhstan and for ESNI censorship in China, we find
that 100% of our requests were residually censored as expected. For HTTP residual
censorship in China, however, we find that only approximately 50% of our requests
are correctly residually censored. We find this pattern holds bidirectionally.
237
Fraction of innocuous
requests censored
We next explored if we could improve the reliability of HTTP residual cen-
sorship. We performed an experiment in which we varied the number of forbidden
requests we made before starting our test innocuous queries. From our Beijing 1
vantage point, we varied the number times we issued forbidden requests between 1
and 9 times, and then made one innocuous request per second for one minute. We
randomized the order of the trials, implemented 5 minutes of sleep between each,
and issued innocuous test queries before starting each experiment to ensure that
the experiments did not interfere with each other. We repeated this experiment 6
times.
Figure 9.2 shows the average fraction of innocuous queries that were censored
as a function of the number of residual-censorship triggers we send ahead of time. We
find that as we increase the number of forbidden queries, we improve the reliability
of residual censorship and after seven retries, the success rate levels out.
We hypothesize that the GFW is internally load balancing queries from this
vantage point and that different middleboxes within the GFW do not communicate
with one another when residual censorship starts. As we add additional queries,
we are more likely to trigger residual censorship with multiple middleboxes, thereby
increasing the likelihood that as future requests are made, they will get routed
through a middlebox with active residual censorship.
238
Destination Location
Kazakhstan Iran Beijing 1 Beijing 2
Victim Location HTTP HTTPS HTTP HTTPS HTTP ESNI HTTP ESNI
Australia Sydney X X X X 50% 10% 55% X
Beijing 1 8 X X X N/A N/A N/A N/A
China
Beijing 2 8 X X X N/A N/A N/A N/A
Mumbai 8 X X X 8 8 8 30%
India Bangalore 1 X X X X 50% 10% X X
Bangalore 2 X X X X 25% 10% X X
Iran Tehran X X N/A N/A 8 50% 75% X
Dublin 1 8 X X X 8 8 8 5%
Ireland
Dublin 2 8 X X X 50% 8 8 8
Japan Tokyo X X X X 25% 8 8 X
Kazakhstan Qaraghandy N/A N/A X X 50% 8 20% 8
Russia Khabarovsk X X X X X 8 X 8
Dubai 1 8 X X X 85% 8 95% 8
UAE
Dubai 2 8 X X X 8 10% 8 50%
Colorado X X X X 8 8 X 8
USA Iowa 8 X X X 8 8 8 60%
Virginia X X X X 50% X 55% 8
Table 9.2: Success rates in weaponizing each country?s censorship infrastructure
against each victim vantage point from our attacker in Seattle, WA. (X denotes
100%, 8 denotes 0%, and N/A denotes a location that does not cross the border of
the censor.) Note that the success rates are not always consistent, even to victims
in the same country, or between censored protocols in each censored regime. Iran is
consistent and reliable; Kazakhstan is consistently unreliable for HTTP, but consis-
tently reliable for HTTPS. In China, however, the attack was not always consistent
by protocol, victim location, or server location.
9.4 Residual Censorship Attack
The results from our measurement of residual censorship indicate that it would
be possible for an off-path attacker to get a victim?s connections residually censored.
Because censors do not look for the entire 3-way handshake, an attacker could
simply source-spoof the victim, send a censored request, thereby residually censoring
communication between the victim and server.
In this section, we empirically evaluate the feasibility of this attack by launch-
ing it against ourselves.
9.4.1 Launching the Attack
Since all of our vantage points employed egress filtering, we cannot launch the
attack directly from our censored vantage points within China, Iran, or Kazakhstan.
239
Instead, we leverage a public deployment of SP3 (A Simple Practical & Safe
Packet Spoofing Protocol) [9] deployed at the University of Washington, to ethically
send source-spoofed packets and thus act as our attacker. SP3 is a web server
that offers the ability to send spoofed packets, but mandates that a client consent
to receiving source-spoofed packets. A client gives this consent by creating and
holding open a websocket connection to SP3. When the client connects, SP3 returns
a UUID16 challenge string. As long as the websocket connection is held open, other
servers can connect to SP3 with a websocket, supply the challenge code, and can
give SP3 packets through binary frames to send to that client.
We launched the attack on ourselves as follows. We used SP3 to send a se-
quence of packets to trigger residual censorship to a server that crosses the censor,
with the source addresses spoofed to be a test victim under our control. Recall that
traffic direction matters to residual censorship in each of these three countries: the
attacker must be on the same side of the censor as the victim. Since SP3 is located
in the United States, this means we are launching the attack from outside-in for
each censoring country. Fortunately, as we saw in Section 9.3, residual censorship
is bidirectional for most of the protocols we study. Our vantage points within each
country acted as the server; we launched the attack against all of our geographically
disparate vantage points around the world as victims.
Then, we used our ?victim? to make requests to the server, and recorded if
the connection succeeded or if it was impacted by residual censorship. We varied
our test request based on the protocol and type of residual censorship. For 3-tuple
residual censorship, the client makes an innocuous request with a different source
240
port to the same server IP address and port. For 4-tuple residual censorship, we
ensure the client uses the same source port as the attacker. Of course, in a real
attack scenario, the attacker cannot know the source port a victim will use a priori.
Therefore, to weaponize 4-tuple residual censorship systems, the attacker would re-
trigger censorship for all 65,535 possible source ports. We investigate the limitations
imposed by this later in this section; for now to demonstrate the attack, we allow
the attacker to access the source port.
We launched this attack against every uncensored vantage point for every
bidirectional, residually censored protocol to each of our vantage points in China,
with HTTP and ESNI, in Kazakhstan, with HTTP and SNI, and in Iran, with
HTTP and SNI. Recall that Iran?s protocol filter censorship cannot be triggered
from outside the country, and therefore we omit it from these experiments. To
determine attack reliability, we repeated each attack 20 times.
Before we launched each attack, we also record two traceroutes. First, we
performed a regular traceroute between the victim and the destination. Second, we
performed a source-spoofed traceroute using SP3. Our server (inside the censored
regime) connects to SP3 and consents to receive TCP SYN packets with the TTL
ranging from 1 to 30, with the source address of the packets spoofed to be the
victim. While SP3 sends these packets, the victim (a vantage outside of the censored
country) records TTL ?Time Exceeded? messages. This allows us to reconstruct the
network path taken by the packets spoofed by SP3, and compare it to the network
path taken by the victim?s test request.
241
9.4.2 Results
In every country we tested, we could successfully weaponize the censorship
infrastructure against every victim vantage point at least once around the world.
We find that the attack is sensitive to the chosen protocol (for example, HTTPS
offers better results in Kazakhstan than HTTP). Table 9.2 presents an overview of
our results.
Collectively, our results suggest that there are many shared paths through the
censorship infrastructure of each country, and an attacker that can access just one
source spoofed capable machine is capable of launching highly effective availability
attacks. A more well resourced attacker could likely get even better results by
choosing vantage points with even more similar paths as their victims.
In the remainder of this section, we detail the results in each of the countries
we tested.
Kazakhstan In Kazakhstan, 100% of the attacks succeeded if the attacker trig-
gered residual censorship with SNI payloads. However, we find that if a forbidden
HTTP payload is used instead, the success varies depending on the victim vantage
point, and this pattern persists irrespective of the port the attacker uses.
First, we explored why the success of the HTTP attack changes depending on
the victim location. We hypothesize the reason for this is that the network path
of the packets sent by the attacker and sent by the victim enter at different ingress
points within the censor?s infrastructure, and triggering censorship at one ingress
does not initiate residual censorship at the other. To gain insight into this, we
242
can compare the two traceroutes taken before the attack is launched: one from the
attacker and one from the victim. Although both traceroutes are performed with
the same source IP address, since they start from different geographic locations, the
packets will necessarily take (at least partially) different paths to reach the server.
By comparing the paths taken for each traceroute, we can try to determine if the
paths converged before the packets reached the censor, or afterwards. If the paths
converge after the packets reach the censor, it is possible that the attacker?s traffic
and victim?s traffic will take different ingress points, and therefore be processed by
different censoring middleboxes. To determine how many hops away the censor is
from the server inside the censoring regime, we send TTL-limited forbidden queries
until we initiate censorship. We find that our vantage point inside Kazakhstan is
5 hops away from the censor. Necessarily, this analysis will not be perfect; many
routers and middleboxes can simply choose not to send a TTL Time Exceeded mes-
sage and hide themselves from this analysis.
Nevertheless, for all victims for which the attack failed, we find that paths do
not converge until less than 5 hops away from reaching the server.
Why then, even for victims with paths that do not converge, does the attack
succeed when HTTPS is used, even when the same destination ports are used as in
HTTP? Frankly, we do not know. We hypothesize this could be due to Kazakhstan
having physically fewer HTTPS censoring middleboxes, and therefore fewer internal
paths for the attacker and victim?s traffic to be split between.
What sending rate is required for an attacker to weaponize Kazakhstan?s cen-
sor to block a 3-tuple (source IP address, destination IP address, destination port)?
243
Since both HTTP and SNI residual censorship can be triggered on any port, the
attacker can choose to use whichever is more convenient. Both are 4-tuple residual
censorship systems, which means the attacker must trigger censorship with the same
source port that the victim will use. Since the attacker cannot know the victim?s
source ports ahead of time, instead the attacker will trigger censorship for all 65,535
possible source ports. It requires 2 packets to trigger censorship (a SYN, followed
by a PSH+ACK with the forbidden payload), and once triggered, residual censorship
will last for 120 seconds. Therefore, an attacker needs to send 2 ?65,535 = 1,093
120
packets per second to sustain the attack indefinitely. The SYN packet is 54 bytes
long (including the Ethernet header), but the length of the PSH+ACK will change
depending on the protocol. Our HTTP trigger payload is 91 bytes long (54 bytes
of headers and 37 bytes for the HTTP request), and our HTTPS trigger payload is
379 bytes long (54 bytes of headers and 325 bytes of TLS ClientHello). To sustain
the HTTP attack, an attacker must be able to send (54 + 91) ?65,535 = 79,188
120
bytes per second, or 634 kbps. For HTTPS: (54 + 379) ?65,535 = 236,473 bytes per
120
second, or 1,892 kbps.
Recall that we found no difference in reliability between HTTP and SNI, and
therefore an attacker could opt to use the smaller HTTP triggers and reduce the
amount of required bandwidth unless their victim was located in a geographically
disadvantageous location.
Would it be advantageous for an attacker to try to trigger residual censorship
with both protocols? We cannot be sure, but an attacker likely does not need to.
Since both censorship systems reset the duration of their residual censorship anytime
244
a matching packet is encountered, once the attacker triggers one censorship system,
any packets sent to trigger the other will reset the timer for the first. We also
note that the effects of censorship for HTTP and SNI are identical: for this reason,
we cannot be certain whether packets being residually censored by one censorship
system reach the other.
China The attack was inconsistent to both of our vantage points in China. The
success rate of the attack varied based on multiple factors: the victim location,
server location, and the chosen residually censored protocol.
As in Kazakhstan, we consulted the traceroutes to examine if the network
paths could explain the lack of success for the attack. We repeatedly sent TTL-
limited forbidden requests to determine how many hops both of our machines are
away from the GFW (6 hops and 9 hops respectively). We hypothesized that the
attack should succeed greater than 0% of the time if the paths converge before it
reaches the censor. Recall from Section 9.3 that in China, triggering HTTP residual
censorship once does not guarantee that all future requests that match the 3-tuple
will be censored; therefore, even if the attacker?s and victim?s paths converge before
packets reach the GFW, we cannot guarantee success. Nevertheless, the traceroutes
do not contradict our hypothesis: we find almost no path convergence for every
victim against which the attack frequently failed (such as Ireland 1& 2).
Why are these success rates not either 100% or 0%, as in Iran and Kazakhstan?
Bock et al. observed a similar phenomenon in [2] and posited that the GFW is a
heterogeneous deployment of many different middleboxes, all running in parallel.
245
We hypothesize that fractional success rates are caused by geographic variation in
deployments of the GFW itself, and load balancing between multiple middleboxes
running in parallel.
For an attacker, weaponizing the GFW poses an interesting opportunity, as it
offers both types of residual censorship (3-tuple or 4-tuple) and multiple different
censorship mechanisms (null routing or injected RSTs). Attackers within the country
can choose to trigger ESNI residual censorship at either the 3-tuple or 4-tuple with
null routing, or trigger 3-tuple HTTP residual censorship to get injected RSTs. Out-
side the country, ESNI censorship is limited to 4-tuple residual censorship, so the
attacker can choose whether to launch one or the other depending on the location
of their victim.
With 3-tuple censorship systems at an attackers disposal, weaponizing the
GFW to prevent a victim from communicating with a given destination IP address
and port is trivial. An attacker needs to trigger censorship only once to initiate the
residual censorship, and can trivially re-send the censorship triggers to improve the
reliability if needed. If 3-tuple residual censorship is unavailable, the attacker can
fall back to leveraging 4-tuple residual censorship, as we demonstrated in Iran and
Kazakhstan, which also lasts for 120 seconds. To trigger ESNI?s 4-tuple residual
censorship, the attacker must send a SYN (54 bytes), followed by the PSH+ACK con-
taining the ESNI trigger (54 bytes for headers and 65 bytes of payload). An attacker
needs to send 2 ?65,535 = 1,093 packets per second, equivalent to (54 + 119) ?65,535
120 120
= 94,480 bytes per second, or 756 kbps to sustain the attack indefinitely.
Could an attacker simply try to invoke both censorship systems simultaneously
246
in an attempt to improve the reliability of this attack? We find the answer is yes:
the attacker can send multiple back-to-back packet sequences to trigger censorship
using different protocols, as long as each source port is different. For example, the
attacker can trigger 3-tuple HTTP residual censorship, followed by a trigger for 4-
tuple ESNI censorship with a different source port. We find that if both triggers
are sent with the same source port, only the first trigger will be successful. The
reason for this was posited by [2]: once the HTTP censorship system sees the ESNI
payload, it stops paying attention to the connection. However, since the HTTP
residual censorship is 3-tuple, the attacker can use one source port to trigger the
HTTP residual censorship system and still trigger 4-tuple residual censorship on all
of the other source ports.
With both censorship systems performing residual censorship in parallel, which
one affects a victim? We find the answer is the ESNI censorship system: this is
because the ESNI residual censorship affects all packets, but the HTTP residual
censorship system does not teardown a connection until after the 3-way handshake
has completed. In our testing, we did not see an improvement in reliability when
combining censorship triggers, but its utility may increase for victims in other geo-
graphic locations.
Iran Our attack was most successful in Iran. Here, 100% of the attacks succeeded
using both forbidden HTTP and HTTPS (SNI) against every victim we tested. Both
of these protocols are 4-tuple censored for a full 180 seconds, and both timers reset
in the presence of any matching packet.
247
What is required for an attacker to effectively block a victim from commu-
nicating with a destination IP address and port across the censor? The attacker
requires 2 packets to trigger censorship (a SYN, followed by a PSH+ACK with the for-
bidden payload), and once triggered, residual censorship will last for 180 seconds.
Therefore, an attacker needs to send 2 ?65,535 = 729 packets per second to sustain
180
the attack indefinitely. The triggers are the same for Iran as for Kazakhstan: the
SYN packet is 54 bytes long (including the Ethernet header), our HTTP trigger pay-
load is 91 bytes long (54 bytes of headers and 37 bytes for the HTTP request), and
our HTTPS trigger payload is 379 bytes long (54 bytes of headers and 325 bytes of
TLS ClientHello). To sustain the HTTP attack, an attacker must be able to send
(54 + 91) ?65,535 = 52,792 bytes per second, or 422 kbps. For HTTPS: (54 + 325)
180
?65,535 = 137,987 bytes per second, or 1.1 Mbps?a modest amount.
180
The length of the payload required to trigger SNI censorship is significantly
larger than the payload required to trigger HTTP censorship, and since each protocol
worked equally well for our attacker, there is no incentive to use the longer SNI
trigger. Of course, like in Kazakhstan, if the HTTP trigger fails for a given victim
location, A bandwidth constrained attacker could opt to start with HTTP triggers
and only switch to SNI triggers if their victim is in a disadvantageous geographic
area.
248
9.5 Attack Impact
Here, we reason about the potential impact of this attack by considering the
potential breadth and limitations.
Breadth What is the true breadth of this attack? Unfortunately, we are limited
by our vantage points to answer this definitively. Nevertheless, we can speculate
about what other systems could potentially be weaponized.
We restricted our analysis only to censoring countries in which we could obtain
vantage points that experienced residual censorship. Although we were unable to
test this attack in India or Russia, prior work has found that other ISPs in India
(Vodafone and Idea [28]) and Russia [175] employ null routing for censorship. De-
pending on how the null routing is implemented, these ISPs may be vulnerable to
this attack, but we were unable to obtain vantage points within these systems to
confirm this.
Our analysis assumed that either the server or victim is located physically
inside a censoring regime. However, researchers in the past have observed that
traffic that simply traverses the Internet borders of a censored regime can trigger
censorship, even if neither the client nor server are located within the country [38].
Performing this attack against traversing traffic is an interesting area of future work.
We can also speculate about the breadth of this attack by examining the results
of Quack, a powerful censorship scanning tool from Censored Planet [52]. Every day,
Quack sends well-formed HTTP GET requests with potentially forbidden domains in
the Host: header to echo servers around the world to identify interference. Quack
249
records the cause of censorship and also monitors for 3-tuple residual censorship
(called ?stateful disruption?). In the December 27th, 2020 dataset, Quack had
identified censoring middleboxes in 33 countries where 3-tuple stateful disruption
was present and in 18 countries where null routing was used to censor. These
results suggest that this attack may be significantly more broadly applicable.
Limitations Despite the potential breadth, there are limitations to this attack.
An attacker must be able to obtain a vantage point (1) without egress filtering that
(2) shares a similar enough path with their victim and (3) the traffic crosses a censor
(4) with residual censorship (5) that can be triggered statelessly.
Our experiments suggest that there are a surprisingly high number of joint
network paths, even for geographically disparate victims (such as Australia and
USA). Still, not every attacking vantage point will be able to affect every victim,
and the attacker has no mechanism to confirm whether their attack successfully
blocked the victim.
Another potential limitation is that this attack may not work for every IP
address. Researchers have observed in the past that some censorship systems vary
their response based on the destination [3]. We were unaffected by this for all of
our victim locations, but an interesting area of future work would be to repeat this
study across a very broad range of IP addresses.
Lastly, there are some limitations to how completely an attacker could cut
off two hosts. Could an attacker weaponize these censorship systems to completely
cut two hosts from communicating? It depends on the type of residual censorship.
250
We believe it is infeasible for an attacker to use a 4-tuple censorship system to
completely prevent two IP addresses from communicating, as this would require
triggering censorship for all 232 possible combinations of source and destination
ports. However, for a 3-tuple residual censorship system, the attacker could trigger
residual censorship 65,535 times to all possible destination ports and accomplish
this.
Does this attack become infeasible if middleboxes start properly tracking the
3-way handshake? Yes, but we believe it would be difficult for censors to do so.
Particularly at the scale at which nation-state censors must operate, censors must
content with path asymmetry: the network path used by traffic exiting the country
may be different than the path used by traffic entering the country, even for the same
connection. This makes properly tracking the 3-way handshake difficult: different
middleboxes may see the SYN packet from the client than those that see the SYN+ACK
packet from the server.
Can the attacker trigger residual censorship for UDP-based protocols as well?
In our experiments, we only identified residual censorship for TCP-based protocols.
However, this is only a partial limitation, since all of the null-routing residual cen-
sorship we studied affected both TCP and UDP traffic. If an attacker wishes to
interfere with UDP traffic, she can simply trigger null-routing residual censorship
over TCP and the victim?s UDP traffic will be censored.
251
9.6 Mitigations
In this section, we discuss our recommendations to potential victims and cen-
soring regimes to mitigate this attack.
9.6.1 Censors
Null-routing should track sequence numbers, or should not be used. All
of the null-routing censorship systems we study (Iran, Kazakhstan, and China?s
ESNI censorship) operate only at the 4-tuple, and do not do any validation of the
sequence or acknowledgment numbers of the packets they drop. Unfortunately,
this implementation of censorship with null-routing is inherently flawed. TCP is
designed to be tolerant to packet loss, so most end-hosts will continue to retry
sending packets when confronted with null-routing. This forces censors to maintain
the flow?s null-routing for a long enough period of time to exceed the duration of
time that network stacks will retransmit (or further reset their internal timer when
an offending packet is sent). Unfortunately, the longer this window of time is, the
easier it is for an attacker to abuse null-routing to perform this attack. Therefore, to
eliminate 4-tuple residual censorship, we recommend that middleboxes who use null-
routing only drop packets with the correct sequence and acknowledgment numbers,
or to avoid using null-routing entirely.
Eliminate (or modify) 3-tuple residual censorship. Presumably, 3-tuple
residual censorship is designed as a deterrent system: users who search for a forbid-
252
den term are ?punished? and forbidden from trying to communicate with the same
server again for a small period of time. Unlike 4-tuple residual censorship, the effect
of 3-tuple residual censorship is salient to the user. However, we question the effi-
cacy of this feature as a deterrent, since there is no communication or information to
the end-user to alert them why they are continually being censored in all countries
we tested in (China, Iran, Kazakhstan). Consider a user in China that searches for
a long string of text containing a single verboten word. The GFW only sends RST
packets: it does not inform the user the cause of censorship, and an uneducated user
may be unaware that censorship is the reason their subsequent connections continue
to fail. Worse, as we showed in Section 9.3, residual censorship is not even always be
effective, and can fail depending on the users network route. For these reasons, we
recommend that middleboxes?particularly the GFW?remove their residual cen-
sorship components altogether or modify their response from null routing to sending
a block page or some response that indicates to the user who is being censored that
they are being ?punished? for their search.
We also echo many of the suggestions made by Bock et al. [5], as the root
of our attack also stems from the ability to trigger censorship systems without a
proper 3-way handshake.
9.6.2 Potential Victims
Unfortunately, once the attack is initiated, there is very little a victim can do
to stop it. Nevertheless, we make recommendations here to mitigate or work around
253
this attack.
Use a proxy. Since our availability attack is generally limited by the 3-tuple or
4-tuple, changing the source IP address that the censor sees is an effective way to
bypass the attack. Therefore, we recommend that an affected user switch to use
some proxying system, such as VPN, Tor, or an HTTP proxy. Further, a victim
can rapidly rotate between proxies in an effort to stay ahead of an attacker. Unfor-
tunately, this is only a stopgap solution; if the path from the victim to the proxy?s
entry nodes also crosses the censor, an attacker can simply switch to attacking the
proxy itself.
Do not immediately try to reconnect. In some censorship systems, the
presence of additional matching traffic causes the residual censorship timer to reset,
thereby prolonging the attack. Therefore, if a user is affected, they should not
continue trying to reconnect; instead, they should stop sending network traffic and
wait a few minutes.
9.7 Ethical Considerations
Experiment Design We took care in designing our experiment to ensure that
it would not involve or cause harm to any other users. Our experiments do not
induce any in-country clients outside of our control to send forbidden requests; all
communication was strictly between hosts we fully controlled. To the best of our
knowledge, none of our vantage points in-country were NATted with other hosts,
making it unlikely other users were affected.
254
Responsible Disclosure It is difficult to responsibly disclose our findings, as the
affected censorship systems have historically been unresponsive to similar issues [5]
or unwilling to intentionally weaken their censorship systems. Nevertheless, we are
in the process of contacting several country-level Computer Emergency Readiness
Teams (CERT) that coordinate disclosure for their respective countries.
9.8 Conclusion
In this chapter, I demonstrated that it is possible to weaponize the censorship
infrastructure in Iran, Kazakhstan, and China to perform availability attacks. We
launched this attack against 17 different geographically disparate victims under our
control and show that even a weak attacker (with access to a single low-bandwidth
source spoofer) can launch effective availability attacks.
Collectively, Chapters 8 and 9 show that middleboxes can be rendered inef-
fective at executing their network policy by coercing them to censor content they
should not. These results show that the negative impact of censorship extends well
beyond the censor?s borders, and that they pose an even larger threat to the Internet
writ large. Taken together, Chapters 3-9 constructively prove my thesis, showing
multiple ways that censoring middleboxes? policies can be rendered ineffective in
automated manners.
In the next chapter, I will take a step back and discuss what it would take
for a censored regime to defend itself against the myriad attacks I presented in this
dissertation and reason about the limits of my automated approach.
255
Chapter 10: Defending Against Geneva
10.1 What would it take to defend against Geneva?
What would it take to defend against Geneva?s strategies? In this dissertation,
I have presented a total of 141 evasion strategies that evade censorship in 4 countries
(China, India, Iran, and Kazakhstan) across 16 unique, real-world censorship sys-
tems (China: HTTP, HTTPS SNI Primary, HTTPS SNI Secondary, HTTPS ESNI,
SMTP, FTP, DNS; India: HTTP, HTTPS; Iran: HTTP, HTTPS, DNS-over-TCP,
Protocol Fitler; Kazakhstan: HTTP, HTTPS, HTTPS MITM). To defend against
all of these strategies, the minimal characteristics that a middlebox must have are:
it must possess no bugs, fully process every packet in a connection, and always
maintain consistent state with the end hosts. Intuitively, if all of these conditions
are met, then the middlebox will correctly process exactly the same set of packets
as the end server, or the packets will not be delivered. In this section, I will show
that each of these are necessary, and that if any one does not hold, there may be
a potential for attack. Much of this section will focus on TCP-based protocols, as
they require more from the censor, but I will also argue these characteristics are still
necessary to censor DNS over UDP.
256
Fix Bugs Most trivially, packet manipulators can make use of bugs to evade
policies, so a first step is for middlebox manufacturers to fix all their bugs. Many
Geneva strategies, particularly the server-side strategies, are examples of this. Bugs
represent 28/141 of Geneva?s strategies: Turnaround (1), Invalid Options(1), Four
Element Request Line (3), Host Header Shield (6), Host Header Whitespace (14),
Path Confusion (2), and Double FIN (1).
If there are exploitable bugs available in the middlebox, they may be leveraged to
render the middlebox ineffective.
Fully Process All Packets There are multiple reasons for which a middlebox
would not fully process every packet within a connection. Some middleboxes stop
paying attention to a connection after a certain threshold number of packets have
been exchanged, such as Iran?s Protocol Filter, which only tracked the first 9 pack-
ets in a connection [3]. Some middleboxes watch only until specific packets have
been sent [3,4], such as China?s backup SNI censorship system, that stops watching
after certain TLS messages have been sent by the client. I have reported on cases
in which middleboxes stop processing packets after the connection appears to have
been terminated. This problem also arises in the application-layer space: some mid-
dleboxes have a fixed amount of buffer space they store requests in, and if a request
is too long, the middlebox can miss the forbidden request. Other middleboxes miss
traffic due to asymmetric routes, load balancing, and more [5].
This broad category encompasses the majority of the strategies reported in
this dissertation, as in particular, most of the application-layer strategies trick the
257
middlebox into not processing or identifying the forbidden keyword. In total, fully
processing all packets would eliminate 74/141 strategies in the species.
If a middlebox does not monitor all traffic and fully process each packet in a given
connection, a packet manipulator may be able to inject a packet that causes the
middlebox to ignore the rest of the connection, become desynchronized from the con-
nection, or miss the forbidden query entirely.
Mandate Consistent State Many packet manipulation attacks exploit the eaves-
dropper?s dilemma, which states that it is difficult for a middlebox to maintain con-
sistent state with the end-hosts of the connection. For example, injecting a payload
that the middlebox processes with a limited TTL will cause the censor?s state to
update without reaching the server, making the middlebox desynchronized. I fore-
see two possible approaches that enable a middlebox to mandate consistent state,
despite the eavesdropper?s dilemma.
First, a middlebox could operate in-path and fail-closed. The idea of a fail-
closed system is straightforward: if the middlebox encounters any packet or request
that it cannot parse, does not match its internal state, or contains ambiguity in its
interpretation, then that packet should not be delivered. In order for a fail-closed
system to be effective, however, it must operate in-path and drop offending traffic:
if the middlebox requires per-flow state to disrupt a connection and its internal state
is incorrect, it will not be able to correctly disrupt the connection. Operating fail-
closed and in-path defends against eavesdropper?s dilemma-based attacks by simply
mandating that only traffic that matches its internal state will be allowed through.
258
Under this model, an attacker is welcome to try to desynchronize the middlebox
from the connection, but in so doing, the attacker will cause the middlebox to drop
the real connection when it does not match any internal state.
Second, a middlebox could normalize the traffic. A defensive traffic normalizer
was first proposed by Vern Paxson et al. in 2001 [84] to defend against packet ma-
nipulation attacks and contend with the eavesdropper?s dilemma. The normalizer?s
goal is to ensure that the state of the middlebox is always consistent with the state
at the end-host. To achieve this, the normalizer modifies network traffic as it goes
by: it overwrites TTL values to ensure packets reach the end-host, drops packets
with incorrect checksums or that will be ignored by the server, etc.
Neither of these approaches can be perfect, however. A key limitation to traffic
normalizing middleboxes is that they cannot know a priori the semantics for a given
connection [84]. A canonical example of this is with the TCP Urgent pointer: if
a client sends the message robot with the urgent pointer pointed to b, depending
on the server?s connection setup, the server may process either root or robot.
If there are other semantics imposed by the application-layer on the underlying
connection as to what bytes should be accepted or not, it is possible that a packet
manipulator could sneak data or a request past the middlebox, even with consistent
state. Inconsistent state issues were responsible for 39/141 of Geneva?s strategies.
If the middlebox does not store consistent state with the end-hosts, it may be vulner-
able to desynchronization attacks.
DNS Censorship Much of this section has focused on TCP-based protocols, and
259
the limitations inherit to reliably censoring these protocols. However, one of the
most important protocols for censors, DNS, runs over UDP. For middleboxes, DNS-
over-UDP requires less complexity to censor compared to any TCP-based protocols,
as the middlebox does not need to track state, reassemble data streams, and more.
The above requirements still hold for DNS censorship. If there are exploitable
bugs present, Geneva may be able to discover a packet modification to evade the
censor. If the packets are not processed completely, Geneva may be able to pad
the packet with innocuous data until the forbidden query is ignored. Finally, if the
middlebox does not mandate that only packets that are completely and correctly
processed should be delivered, Geneva may be able to send a request that is not
correctly parsed by the censor due to RFC ambiguities and subvert censorship. I
presented an example of all three of these scenarios in Chapter 5.
10.2 Does Geneva help the censor?
I report on many circumvention strategies (including those that are likely bugs
in censor implementations) in this dissertation, and discuss what would be required
for a censor to mitigate 100% of the issues in this chapter. Are these requirements a
recipe for censors to follow in the future? Although they would defend against all the
packet manipulation attacks discovered by Geneva and discussed in this dissertation,
actually implementing these changes would likely be exceedingly challenging at scale.
For example, mandating consistent state in the presence of asymmetric routes
and load balancing may be very difficult. As an example, the GFW is currently a
260
fail-open system, and we hypothesized this is the case because they operate many in-
dependent middleboxes in parallel [2]. In this deployment context, every middlebox
must be fail-open, because each middlebox must assume that some other middlebox
may be able to handle any traffic it cannot. In these circumstances, imposing the
requirements stated in this chapter could require a significant re-architecture and
re-implementation of their entire censorship system.
There may also be a high cost to imposing these requirements. For example,
mandating that traffic must be correctly parsed and understood to be delivered
may cause a high degree of collateral damage, as there are a wide variety of server
implementations running in the wild. Storing more state about every connection
than the most stateful end-server may impose a high memory cost.
Therefore, even though a censor may use Geneva to identify bugs and limita-
tions, actually fixing those limitations may be challenging in practice.
Lastly, although Geneva is one mechanism that a censor can use to identify
their own bugs, middlebox manufacturers have access to their own code. Existing
tools have demonstrated that fuzzing can be done significantly faster with code
instrumentation [74], so censors could have been fuzzing their own systems to find
these issues from their initial development. By releasing Geneva open source, we are
democratizing the ability to find bugs and limitations in their censorship systems.
261
Chapter 11: Conclusion and Future Work
In this thesis, I demonstrated that it is possible to automate the discovery of
ways to render middleboxes ineffective at implementing their network policies. I
developed Geneva, a novel genetic algorithm that can learn packet sequence modifi-
cations against a live adversary, and I showed that it could be used to discovery both
new ways to evade censorship (across multiple network protocols and deployment
contexts) and to launch dangerous network attacks. In this chapter, I will speak to
future work in this space.
11.1 Immediate Term Challenges
Before speaking to longer term future work, I will note several challenges for
the immediate term.
TLS Support Although Geneva has support for HTTP and DNS, extending it to
support TLS has the potential for great impact, as the web is increasingly moving
to HTTPS. There are several challenges in adding TLS support. First, the TLS
state machine is significantly more complicated than any of the other protocols that
Geneva supports, dramatically expanding the search space. Techniques to reduce
262
the search space, such as testing a strategy against a local server before testing
it against a live adversary, will likely be required to make the problem tractable.
Second, there are many implementations and versions of TLS in active use. In order
to effectively walk through the entire search space, Geneva must be able to handle
each TLS version and extension, even if those implementations are conflicting. For
example, there have been multiple implementations of TLS 1.3?s Encrypted Client
Hello (ECH) as the standard evolved. Still, with the widespread use of HTTPS,
TLS support would be an impactful direction to explore, and could lend us insights
into how middleboxes themselves have handled TLS?s evolution over time.
Training without client instrumentation Server-side evasion strategies are
easier to deploy than client-side evasion techniques. Unfortunately, the process
of discovering server-side evasion strategies with Geneva has historically required
instrumentation from a client: to make requests with specific parameters to evaluate
each strategy during the evolution process. As a consequence, training is limited
only to those countries within which we can safely procure a vantage point that can
be remotely instrumented.
In the future, it would be impactful if it were possible to train Geneva without
requiring control of the client. Designing such a mechanism has its challenges,
however. First, if Geneva cannot control or instrument the client, the first major
challenge is how to direct traffic that will trigger the censor to its active strategies.
This may require cultivating a dedicated user base of testers, a standalone program
that can generate connections, or by partnering with an existing forbidden server.
263
Second, Geneva benefits from the ability to collect additional information from its
clients while evaluating strategies, such as how the strategy impacted the underlying
connection, in order to inform the fitness function. It is an engineering challenge
to recover this information from the server-side of the connection. Finally, if the
learning algorithm must depend on clients over which it has no control, there may
risk of a sybil attack from the adversary trying to pollute the algorithm?s training
set.
Measuring Middlebox-based Amplification Attacks Already, the middlebox-
based TCP reflected amplification attacks have been detected in the wild [167], but
there is no measurement yet of how these attacks have progressed, who they are
attacking, and who is launching the attacks. In the future, developing a system
to globally monitor for attackers trying to use this attack vector could help us
learn more about how quickly attackers can incorporate and optimize the attack,
and better protect those under attack. I foresee two principle ways that we can
detect attackers using this threat vector: during the attacker?s discovery phase or
during the attack phase. During the discovery phase, the attacker must find and
discover potential amplifiers on the Internet, which requires Internet scanning. In
the future, we can develop tools to detect these Internet-wide scans to determine
who is scanning to identify potential amplifiers. Such a system could even respond
to these scans with a modest (but bandwidth constrained) amplification amount,
so that the system also gets included in the attacker?s attack phase. In the attack
phase, we can develop tools to detect the fingerprints of middlebox responses and
264
partner with organizations that have a wide network view to detect and measure
live attacks.
Understanding the True Limits of Automating Evasion The eavesdrop-
per?s dilemma suggests some fundamental limitations for middleboxes, and helps
to inform the limits of this approach [50]. However, it is unknown the true limits
of this approach: is it the case that in order for any middlebox system to render
binary censorship decisions at line-speed will necessarily incur one of the weaknesses
described in Chapter 10? Defining a formalization for middlebox network functions
might allow us to formally reason about whether every type of network middlebox
will be vulnerable to packet manipulation attacks (and if so, if those attacks can be
automatically discovered deterministically).
11.2 Long Term Challenges
Contending with Adversarial Systems Today, Geneva?s adversaries are rela-
tively static while it is training. Censors may make changes or deploy new systems
over time, but in the timespan of the hours that Geneva is training, to the best of
my knowledge, the functionality of censors is static. This means that the censor
does not adapt to what Geneva is doing in real time. In the future however, mid-
dleboxes and censors may take a more adversarial role during the training process,
and directly try to interfere with Geneva?s training.
For example, if a censor were to identify hosts training Geneva, they could
apply different network policies, or change their network policies dynamically to
265
pollute Geneva?s training data. Alternatively, a censor could simply cut Geneva off
from the network entirely. None of the three existing automated tools for discov-
ering evasion strategies (SymTCP, Alembic, and Geneva) are designed to handle a
dynamic adversary that changes at runtime.
Designing a new algorithm that is equipped to handle a dynamic adversary
is challenging. Ideally, a learning algorithm hardened to work against an active
adversary would need to escape identification while running, blending into normal
network traffic.
Preparing for the Next Censorship Arms Race Various activists I work
with have warned of several troubling future censorship capabilities against which
the anti-censorship community is not prepared. Sophisticated techniques like throt-
tling instead of outright blocking, and using machine learning to fingerprint anti-
censorship protocols require us to reconsider how we evade censorship. Moreover,
some evidence points to countries like China personalizing what content gets cen-
sored based on a user?s occupation or social credit score. This will require a complete
redesign of how we approach censorship measurement: no longer will it suffice to say
that a site is blocked, we will have to understand for whom a site is blocked. Ad-
vances like these point to an even greater need for automated techniques to measure
and circumvent censorship. Personalized censorship may require personalized eva-
sion, but one of the challenges I foresee is that training could put users at risk against
an aggressive adversary. Developing new ways surreptitiously and collaboratively
train in a federated manner could enable users safely learn from one another.
266
To support future researchers in taking on these challenging problems, I have
made my dissertation?s various artifacts publicly available at:
https://geneva.cs.umd.edu.
267
Bibliography
[1] Kevin Bock, George Hughey, Xiao Qiang, and Dave Levin. Geneva: Evolv-
ing Censorship Evasion Strategies. In ACM Conference on Computer and
Communications Security (CCS), 2019.
[2] Kevin Bock, George Hughey, Louis-Henri Merino, Tania Arya, Daniel Liscin-
sky, Regina Pogosian, and Dave Levin. Come as You Are: Helping Unmodified
Clients Bypass Censorship with Server-Side Evasion. In ACM SIGCOMM,
2020.
[3] Kevin Bock, Yair Fax, Kyle Reese, Jasraj Singh, and Dave Levin. Detecting
and Evading Censorship-in-Depth: A Case Study of Iran?s Protocol Whitelis-
ter. In USENIX Workshop on Free and Open Communications on the Internet
(FOCI), 2020.
[4] Kevin Bock, Gabriel Naval, Kyle Reese, and Dave Levin. Even Censors Have a
Backup: Examining China?s Double HTTPS Censorship System. In USENIX
Workshop on Free and Open Communications on the Internet (FOCI), 2021.
[5] Kevin Bock, Abdulrahman Alaraj, Yair Fax, Kyle Hurley, Eric Wustrow, and
Dave Levin. Weaponizing Middleboxes for TCP Reflected Amplification. In
USENIX Annual Technical Conference, 2021.
[6] Kevin Bock, Pranav Bharadwaj, Jasraj Singh, and Dave Levin. Your Censor is
My Censor: Weaponizing Censorship Infrastructure for Availability Attacks.
In USENIX Workshop on Offensive Technologies (WOOT), 2021.
[7] Zimo Chai, Amirhossein Ghafari, and Amir Houmansadr. On the Importance
of Encrypted-SNI (ESNI) to Censorship Circumvention. In USENIX Work-
shop on Free and Open Communications on the Internet (FOCI), 2019.
[8] P. Mockapetris. RFC 1035, 1987. https://datatracker.ietf.org/doc/
html/rfc1035.
[9] Will Scott. A Secure, Practical & Safe Packet Spoofing Service. 2017.
268
[10] Reporters Without Borders. Enemies of the Internet 2013, Re-
port. http://surveillance.rsf.org/en/wp-content/uploads/sites/2/
2013/03/enemies-of-the-internet_2013.pdf, March 2013.
[11] Amirr Houmansadr, Chad Brubaker, and Vitaly Shmatikov. The Parrot is
Dead: Observing Unobservable Network Communications. In IEEE Sympo-
sium on Security and Privacy, 2013.
[12] Roya Ensafi, David Fifield, Philipp Winter, Nick Feamster, Nicholas Weaver,
and Vern Paxson. Examining How the Great Firewall Discovers Hidden Cir-
cumvention Servers. In ACM Internet Measurement Conference (IMC), 2015.
[13] CAIDA IODA (Internet Outage Detection and Analysis). https://ioda.
caida.org/.
[14] Xueyang Xu, Morley Mao, and J. Alex Halderman. Internet Censorship in
China: Where Does the Filtering Occur? In Passive and Active Network
Measurement Workshop (PAM), 2011.
[15] Zubair Nabi. The Anatomy of Web Censorship in Pakistan. In USENIX
Workshop on Free and Open Communications on the Internet (FOCI), 2013.
[16] Sheharbano Khattak, Mobin Javed, Philip D. Anderson, and Vern Paxson.
Towards Illuminating a Censorship Monitor?s Model to Facilitate Evasion.
In USENIX Workshop on Free and Open Communications on the Internet
(FOCI), 2013.
[17] Ana Bita Samba Vasilis Ververis, Fadelkon. Women on Web website censored
in Spain. https://blog.magma.lavafeld.org/post/women-on-web-blocking/.
[18] Kai Wang and Wanyuan Song. Peng Shuai: How China censored a tennis
star. https://www.bbc.com/news/59338205.
[19] Dave Levin, Youndo Lee, Luke Valenta, Zhihao Li, Victoria Lai, Cristian
Lumezanu, Neil Spring, and Bobby Bhattacharjee. Alibi Routing. In ACM
SIGCOMM, 2015.
[20] Roger Dingledine, Nick Mathewson, and Paul Syverson. Tor: The Second-
Generation Onion Router. In USENIX Security Symposium, 2004.
[21] Eric Wustrow, Scott Wolchok, Ian Goldberg, and J. Alex Halderman. Telex:
Anticensorship in the Network Infrastructure. In USENIX Security Sympo-
sium, 2011.
[22] Josh Karlin, Daniel Ellard, Alden W. Jackson, Christine E. Jones, Greg Lauer,
David P. Mankins, and W. Timothy Strayer. Decoy Routing: Toward Un-
blockable Internet Communication. In USENIX Workshop on Free and Open
Communications on the Internet (FOCI), 2011.
269
[23] Fangfan Li, Abbas Razaghpanah, Arash Molavi Kakhki, Arian Akhavan Niaki,
David Choffnes, Phillipa Gill, and Alan Mislove. lib.erate, (n): A library for
exposing (traffic-classification) rules and avoiding them efficiently. In ACM
Internet Measurement Conference (IMC), 2017.
[24] Zhongjie Wang, Yue Cao, Zhiyun Qian, Chengyu Song, and Srikanth V. Kr-
ishnamurthy. Your State is Not Mine: A Closer Look at Evading Stateful
Internet Censorship. In ACM Internet Measurement Conference (IMC), 2017.
[25] Zhihao Li, Stephen Herwig, and Dave Levin. DeTor: Provably Avoiding Ge-
ographic Regions in Tor. In USENIX Security Symposium, 2017.
[26] Richard McPherson, Amir Houmansadr, and Vitaly Shmatikov. CovertCast:
Using Live Streaming to Evade Internet Censorship. In Privacy Enhancing
Technologies Symposium (PETS), 2016.
[27] Max Schuchard, John Geddes, Christopher Thompson, and Nicholas Hopper.
Routing Around Decoys. In ACM Conference on Computer and Communica-
tions Security (CCS), 2012.
[28] Tarun Kumar Yadav, Akshat Sinha, Devashish Gosain, Piyush Kumar
Sharma, and Sambuddho Chakravarty. Where The Light Gets In: Analyz-
ing Web Censorship Mechanisms in India. In ACM Internet Measurement
Conference (IMC), 2018.
[29] Daniel Anderson. Splinternet Behind the Great Firewall of China. Queue,
10(11), November 2006.
[30] Philipp Winter and Stefan Lindskog. How the Great Firewall of China is
Blocking Tor. In USENIX Workshop on Free and Open Communications on
the Internet (FOCI), 2012.
[31] Amir Houmansadr, Chad Brubaker, and Vitaly Shmatikov. The Parrot is
Dead: Observing Unobservable Network Communications. In IEEE Sympo-
sium on Security and Privacy, 2013.
[32] John Geddes, Max Schuchard, and Nicholas Hopper. Cover Your ACKs: Pit-
falls of Covert Channel Censorship Circumvention. In ACM Conference on
Computer and Communications Security (CCS), 2013.
[33] Anonymous, Arian Akhavan Niaki, Nguyen Phong Hoang, Phillipa Gill, and
Amir Houmansadr. Triplet Censors: Demystifying Great Firewall?s DNS Cen-
sorship Behavior. In USENIX Workshop on Free and Open Communications
on the Internet (FOCI), 2020.
[34] Moxie Marlinspike. Doodles, stickers, and censorship circumvention for Sig-
nal Android. https://signal.org/blog/doodles-stickers-censorship/,
2017.
270
[35] Signal. Egypt keeps trying to block Signal, inadvertently blocking all of
Google, and having to stop as a result. We?ll also expand domain fronts.
https://twitter.com/signalapp/status/817062093094604800, 2017.
[36] Kevin Bock, iyouport, Anonymous, Louis-Henri Merino, David Fifield, Amir
Houmansadr, and Dave Levin. Exposing and Circumventing China?s Censor-
ship of ESNI. https://geneva.cs.umd.edu/posts/china-censors-esni/
esni/, 2020.
[37] Robert T. Morris. A Weakness in the 4.2BSD Unix TCP/IP Software. CSTR
117, 1985.
[38] Anonymous. The Collateral Damage of Internet Censorship. ACM SIGCOMM
Computer Communication Review (CCR), 42(3):21?27, 2012.
[39] Rachee Singh, Rishab Nithyanand, Sadia Afroz, Paul Pearce, Michael Carl
Tschantz, Phillipa Gill, and Vern Paxson. Characterizing the Nature and
Dynamics of Tor Exit Blocking. In USENIX Security Symposium, 2017.
[40] Kevin Bock, George Hughey, Xiao Qiang, and Dave Levin. Geneva: Evolving
Censorship Evasion. In ACM Conference on Computer and Communications
Security (CCS), 2019.
[41] Anonymous. Towards a Comprehensive Picture of the Great Firewall?s DNS
Censorship. In USENIX Workshop on Free and Open Communications on the
Internet (FOCI), 2014.
[42] Richard Clayton, Steven J. Murdoch, and Robert N. M. Watson. Ignoring
the Great Firewall of China. In Privacy Enhancing Technologies Symposium
(PETS), 2006.
[43] Yue Cao, Zhiyun Qian, Zhongjie Wang, Tuan Dao, Srikanth V. Krishna-
murthy, and Lisa M. Marvel. Off-Path TCP Exploits: Global Rate Limit
Considered Dangerous. In USENIX Security Symposium, 2016.
[44] Dan Kaminsky. It?s The End of the Cache As We Know It. http://kurser.
lobner.dk/dDist/DMK_BO2K8.pdf, 2008.
[45] Philipp Winter. brdgrd (Bridge Guard). https://github.com/
NullHypothesis/brdgrd, 2012.
[46] Claudio Agosti and Giovanni Pellerano. SniffJoke: transparent TCP connec-
tion scrambler. https://github.com/vecna/sniffjoke, 2011.
[47] Eric Wustrow, Colleen M. Swanson, and J. Alex Halderman. TapDance: End-
to-Middle Anticensorship without Flow Blocking. In USENIX Annual Tech-
nical Conference, 2014.
271
[48] Hooman Mohajeri Moghaddam, Baiyu Li, Mohammad Derakhshani, and Ian
Goldberg. SkypeMorph: Protocol Obfuscation for Tor Bridges. In ACM
Conference on Computer and Communications Security (CCS), 2012.
[49] Zachary Weinberg, Jeffrey Wang, Vinod Yegneswaran, Linda Briesemeister,
Steven Cheung, Frank Wang, and Dan Boneh. StegoTorus: A Camouflage
Proxy for the Tor Anonymity System. In ACM Conference on Computer and
Communications Security (CCS), 2012.
[50] Eric Cronin, Micah Sherr, and Matthew Blaze. The Eavesdropper?s Dilemma,
2006.
[51] Kei Yin Ng, Anna Feldman, and Chris Leberknight. Detecting Censorable
Content on Sina Weibo: A Pilot Study. In Hellenic Conference on Artificial
Intelligence (SETN), 2018.
[52] Benjamin VanderSloot, Allison McDonald, Will Scott, J. Alex Halderman,
and Roya Ensafi. Quack: Scalable Remote Measurement of Application-Layer
Censorship. In USENIX Security Symposium, 2018.
[53] Paul Pearce, Ben Jones, Frank Li, Roya Ensafi, Nick Feamster, Nick Weaver,
and Vern Paxson. Global Measurement of DNS Manipulation. In USENIX
Security Symposium, 2017.
[54] Roya Ensafi. CensoredPlanet Raw Data.
https://censoredplanet.org/data/raw.
[55] Simurgh Aryan, Homa Aryan, and J. Alex Halderman. Internet Censorship in
Iran: A First Look. In USENIX Workshop on Free and Open Communications
on the Internet (FOCI), 2013.
[56] Jill Jermyn and Nicholas Weaver. Autosonda: Discovering Rules and Triggers
of Censorship Devices. In USENIX Workshop on Free and Open Communi-
cations on the Internet (FOCI), 2017.
[57] Thomas H. Ptacek and Timothy N. Newsham. Insertion, Evasion, and Denial
of Service: Eluding Network Intrusion Detection. In Secure Networks, 1998.
[58] David Fifield, Chang Lan, Rod Hynes, Percy Wegmann, and Vern Paxson.
Blocking-resistant communication through domain fronting. In Privacy En-
hancing Technologies Symposium (PETS), 2015.
[59] Tod Beardsley and Jin Qian. The TCP Split Handshake: Practical Effects on
Modern Network Equipment. Network Protocols and Algorithms, 2(1):197?
217, 2010.
[60] Wenxuan Zhou, Amir Houmansadr, Matthew Caesar, and Nikita Borisov.
SWEET: Serving the Web by Exploiting Email Tunnels. In Privacy Enhancing
Technologies Symposium (PETS), 2013.
272
[61] Paul Vines and Tadayoshi Kohno. Rook: Using Video Games as a Low-
Bandwidth Censorship Resistant Communication Platform. In Workshop on
Privacy in the Electronic Society (WPES), 2015.
[62] Amir Houmansadr, Thomas Riedl, Nikita Borisov, and Andrew Singer.
IP over Voice-over-IP for censorship circumvention. In arXiv preprint
arXiv:1207.2683, 2012.
[63] Brandon Wiley. Dust: A Blocking-Resistant Internet Transport Protocol.
http://blanu.net/Dust.pdf.
[64] David Fifield. Threat modeling and circumvention of Internet censorship. In
PhD thesis, 2017.
[65] David Fifield, Nate Hardison, Jonathan Ellithorpe, Emily Stark, Dan Boneh,
Roger Dingledine, and Phil Porras. Evading Censorship with Browser-Based
Proxies. In Privacy Enhancing Technologies Symposium (PETS), 2012.
[66] Daniel Ellard, Christine Jones, Victoria Manfredi, W. Timothy Strayer, Bishal
Thapa, Megan Van Welie, and Alden Jackson. Rebound: Decoy routing on
asymmetric routes via error messages. 2015.
[67] Amir Houmansadr, Giang T. K. Nguyen, Matthew Caesar, and Nikita Borisov.
Cirripede: Circumvention Infrastructure using Router Redirection with Plau-
sible Deniability. In ACM Conference on Computer and Communications
Security (CCS), 2011.
[68] Dave Levin, Youndo Lee, Luke Valenta, Zhihao Li, Victoria Lai, Cristian
Lumenzanu, Neil Spring, and Bobby Bhattacharjee. Alibi Routing. In ACM
SIGCOMM, 2015.
[69] Qiyan Wang, Xun Gong, Giang T.K. Nguyen, Amir Houmansadr, and Nikita
Borisov. CensorSpoofer: Asymmetric communication using IP Spoofing for
Censorship-resistant Web Browsing. In ACM Conference on Computer and
Communications Security (CCS), 2012.
[70] Zhongjie Wang, Shitong Zhu, Yue Cao, Zhiyun Qian, Chengyu Song,
Srikanth V. Krishnamurthy, Kevin S. Chan, and Tracy D. Braun. SymTCP:
Eluding Stateful Deep Packet Inspection with Automated Discrepancy Discov-
ery. In Network and Distributed System Security Symposium (NDSS), 2020.
[71] Kevin Bock, Yair Fax, Kyle Reese, Jasraj Singh, and Dave Levin. Detecting
and Evading Censorship-in-Depth: A Case Study of Iran?s Protocol Whitelis-
ter. In USENIX Workshop on Free and Open Communications on the Internet
(FOCI), 2020.
[72] Soo-Jin Moon, Jeffrey Helt, Yifei Yuan, Yves Bieri, Sujata Banerjee, Vyas
Sekar, Wenfei Wu, Mihalis Yannakakis, and Ying Zhang. Alembic: Automated
273
Model Inference for Stateful Network Functions. In Symposium on Networked
Systems Design and Implementation (NSDI), 2019.
[73] George T. Klees, Andrew Ruef, Benjamin Cooper, Shiyi Wei, and Michael
Hicks. Evaluating Fuzz Testing. In ACM Conference on Computer and Com-
munications Security (CCS), 2018.
[74] American Fuzzy Lop. http://lcamtuf.coredump.cx/afl/.
[75] Scott Michael Seal. Optimizing Web Application Fuzzing with Genetic Algo-
rithms and Language Theory. In Master of Science Thesis, 2016.
[76] Li Haifeng, Wang Shaolei, Zhang Bin, Shuai Bo, and Tang Chaojing. Net-
work protocol security testing based on fuzz. In International Conference on
Computer Science and Network Technology (ICCSNT), 2015.
[77] Gitlab. Gitlab Protocol Fuzzer Community Edition, 2021. https://gitlab.
com/gitlab-org/security-products/protocol-fuzzer-ce.
[78] Xavi Mendez. WFuzz: The Web Fuzzer, 2020. wfuzz.io.
[79] Spandan Veggalam, Sanjay Rawat, Istvan Haller, and Herbert Bos. IFuzzer:
An Evolutionary Interpreter Fuzzer using Genetic Programming. In European
Symposium on Research in Computer Security (ESORICS), 2016.
[80] Lawrence Davis. Handbook of genetic algorithms. 1991.
[81] Fe?lix-Antoine Fortin, Franc?ois-Michel De Rainville, Marc-Andre? Gardner,
Marc Parizeau, and Christian Gagne?. DEAP: Evolutionary algorithms made
easy. Journal of Machine Learning Research, 13:2171?2175, July 2012.
[82] NetFilter. https://netfilter.org.
[83] Dirk Merkel. Docker: Lightweight Linux Containers for Consistent Develop-
ment and Deployment. Linux Journal, 239(2), 2014.
[84] Mark Handley, Vern Paxson, and Christian Kreibich. Network Intrusion De-
tection: Evasion, Traffic Normalization, and End-To-End Protocol Semantics.
In USENIX Security Symposium, 2001.
[85] Scapy. https://scapy.net.
[86] Tarun Kumar Yadav, Akshat Sinha, Devashish Gosain, Piyush Kumar
Sharma, and Sambuddho Chakravarty. Where The Light Gets In: Analyz-
ing Web Censorship Mechanisms in India. In ACM Internet Measurement
Conference (IMC), 2018.
[87] Censorship of Alexa Top 1000 Domains in China. https://en.greatfire.
org/search/alexa-top-1000-domains, 2019.
274
[88] Ram Sundara Raman, Leonid Evdokimov, Eric Wustrow, Alex Halder-
man, and Roya Ensafi. Kazakhstan?s HTTPS Interception. https://
censoredplanet.org/kazakhstan, 2019.
[89] Kazakhstan?s HTTPS Interception Live! https://censoredplanet.org/
kazakhstan/live, 2019.
[90] Sam Burnett and Nick Feamster. Encore: Lightweight Measurement of Web
Censorship with Cross-Origin Requests. In ACM SIGCOMM, 2015.
[91] Roger Dingledine. Obfsproxy: the next step in the censorship arms
race. https://blog.torproject.org/obfsproxy-next-step-censorship-
arms-race, 2012.
[92] Sigal Samuel. China is installing a secret surveillance app on tourists?
phones. https://www.vox.com/future-perfect/2019/7/3/20681258/
china-uighur-surveillance-app-tourist-phone, 2019.
[93] Philipp Winter and Stefan Lindskog. How the Great Firewall of China is
Blocking Tor. In USENIX Workshop on Free and Open Communications on
the Internet (FOCI), 2012.
[94] agrabeli. Internet Censorship in Iran: Findings from 2014-2017.
https://blog.torproject.org/internet-censorship-iran-findings-
2014-2017, 2017.
[95] Li Yuan. A Generation Grows Up in China Without Google, Facebook
or Twitter. https://www.nytimes.com/2018/08/06/technology/china-
generation-blocked-internet.html, 2018.
[96] TelegramMessenger. MTProxy. https://github.com/TelegramMessenger/
MTProxy, 2019.
[97] Inc. The Tor Project. Tor Project: Bridges. https://2019.www.torproject.
org/docs/bridges.html.en.
[98] fqrouter. Detailed GFW?s three blocking methods for SMTP
protocol. https://web.archive.org/web/20151121091522/http:
//fqrouter.tumblr.com/post/43400982633/%E8%AF%A6%E8%BF%B0gfw%
E5%AF%B9smtp%E5%8D%8F%E8%AE%AE%E7%9A%84%E4%B8%89%E7%A7%8D%E5%B0%
81%E9%94%81%E6%89%8B%E6%B3%95, 2015.
[99] DNS Transport over TCP - Implementation Requirements. RFC 7766, RFC
Editor, March 2016.
[100] Transmission Control Protocol. RFC 793, RFC Editor, September 1981.
[101] Adrienne Porter Felt, Richard Barnes, April King, Chris Palmer, Chris
Bentzel, and Parisa Tabriz. Measuring HTTPS Adoption on the Web. In
USENIX Security Symposium, 2017.
275
[102] Chromium Development Team. A safer default for navigation:
HTTPS. https://blog.chromium.org/2021/03/a-safer-default-for-
navigation-https.html, 2020.
[103] Cloudflare. Cloudflare Radar: Up to date Internet trends and insight. https:
//radar.cloudflare.com/cn?date_filter=last_30_days, 2022.
[104] CitizenLab. URL testing lists intended for discovering website censorship.
https://github.com/citizenlab/test-lists/, 2022.
[105] wkrp. HTTPS MITM of various GitHub IP addresses in China. https:
//github.com/net4people/bbs/issues/27, 2020.
[106] Ram Sundara Raman, Leonid Evdokimov, Eric Wustrow, Alex Halder-
man, and Roya Ensafi. Kazakhstan?s HTTPS Interception. https://
censoredplanet.org/kazakhstan, 2019.
[107] Ram Sundara Raman, Leonid Evdokimov, Eric Wustrow, Alex Halderman,
and Roya Ensafi. Investigating Large Scale HTTPS Interception in Kaza-
khstan. In ACM Internet Measurement Conference (IMC), 2020.
[108] Bahruz Jabiyev, Steven Sprecher, Kaan Onarlioglu, and Engin Kirda. T-Reqs:
HTTP Request Smuggling with Differential Fuzzing. In ACM Conference on
Computer and Communications Security (CCS), 2021.
[109] RFC 2616, 1999. https://datatracker.ietf.org/doc/html/rfc2616.
[110] Roy Fielding and Julian Reschke. RFC 7230, 2014. https://www.rfc-
editor.org/rfc/rfc7230.html.
[111] Roy Fielding and Julian Reschke. RFC 7231, 2014. https://www.rfc-
editor.org/rfc/rfc7231.html.
[112] Roy Fielding and Julian Reschke. RFC 7232, 2014. https://www.rfc-
editor.org/rfc/rfc7232.html.
[113] Roy Fielding, Yves Lafon, and Julian Reschke. RFC 7233, 2014. https:
//www.rfc-editor.org/rfc/rfc7233.html.
[114] Roy Fielding, Mark Nottingham, and Julian Reschke. RFC 7234, 2014. https:
//www.rfc-editor.org/rfc/rfc7234.html.
[115] Roy Fielding and Julian Reschke. RFC 7235, 2014. https://www.rfc-
editor.org/rfc/rfc7235.html.
[116] Tim Berners-Lee, Roy Fielding, and Larry Masinter. RFC 3986, 2005. https:
//www.rfc-editor.org/rfc/rfc3986.
[117] Usage statistics of web servers, 2020. https://w3techs.com/technologies/
overview/web_server.
276
[118] Web Server Usage Distribution in the Top 1 Million Sites, 2020. https:
//trends.builtwith.com/web-server.
[119] COMMUNITY-LED DEVELOPMENT ?THE APACHE WAY?, 2022.
https://www.apache.org/.
[120] NGINX Part of F5, 2022. https://www.nginx.com/.
[121] Pawel Foremski. Tracking the DNS Stars: The DNS Observatory,
2019. https://www.farsightsecurity.com/blog/txt-record/dnsstars-
20190610/.
[122] Charles Hornig. RFC 894, 1984. https://datatracker.ietf.org/doc/
html/rfc894.
[123] CitizenLab. CitizenLab Test Lists. https://github.com/citizenlab/test-
lists, 2020.
[124] Philipp Winter and Jedidiah R. Crandall. The Great Firewall of China: How
It Blocks Tor and Why It Is Hard to Pinpoint. ;login:, 37(6), 2012.
[125] Paul Pearce, Ben Jones, Frank Li, Roya Ensafi, Nick Feamster, Nick Weaver,
and Vern Paxson. Global-Scale Measurement of DNS Manipulation. In
USENIX Security Symposium, 2017.
[126] Ram Sundara Raman, Adrian Stoll, Jakub Dalek, Armin Sarabi, Reethika
Ramesh, Will Scott, and Roya Ensafi. Measuring the deployment of network
censorship filters at global scale. In Network and Distributed System Security
Symposium (NDSS), 2020.
[127] Arian Niaki, Shinyoung Cho, Zachary Weinberg, Nguyen Hoang, Abbas Raza-
ghpanah, Nicolas Christin, and Phillipa Gill. ICLab: A Global, Longitudinal
Internet Censorship Measurement Platform. In IEEE Symposium on Security
and Privacy, 2020.
[128] OONI: Open Observatory of Network Interference. https://ooni.org/.
[129] CAIDA IODA: Internet Outage Detection and Analysis. https://ioda.
caida.org/.
[130] Collin Anderson. Dimming the Internet: Detecting Throttling as a Mechanism
of Censorship in Iran. In arXiv preprint arXiv:1306.4361, 2013.
[131] Paul Mockapetris. Domain Names - Implementation and Specification. https:
//tools.ietf.org/html/rfc1035, November 1987. RFC 1035.
[132] J. Dickinson, S. Dickinson, R. Bellis, A. Mankin, and D. Wessels. DNS Trans-
port over TCP - Implementation Requirements. https://tools.ietf.org/
html/rfc7766, March 2016. RFC 7766.
277
[133] T. Dierks and E. Rescorla. The Transport Layer Security (TLS) Protocol:
Version 1.2. https://tools.ietf.org/html/rfc5246, August 2008. RFC
5246.
[134] Zhongjie Wang, Shitong Zhu, Yue Cao, Zhiyun Qian, Chengyu Song,
Srikanth V. Krishnamurthy, Kevin S. Chan, and Tracy D. Braun. SymTCP:
Eluding Stateful Deep Packet Inspection with Automated Discrepancy Discov-
ery. In Network and Distributed System Security Symposium (NDSS), 2020.
[135] Kevin Bock, Pranav Bharadwaj, Jasraj Singh, and Dave Levin. Your censor
is my censor: Weaponizing censorship infrastructure for availability attacks.
In USENIX Workshop on Offensive Technologies (WOOT), 2021.
[136] Catalin Cimpanu. Russia wants to ban the use of secure protocols such as
TLS 1.3, DoH, DoT, ESNI. https://www.zdnet.com/article/russia-
wants-to-ban-the-use-of-secure-protocols-such-as-tls-1-3-doh-
dot-esni/, 2020.
[137] Xueyang Xu, Z. Morley Mao, and J. Alex Halderman. ?Internet Censorship in
China: Where Does the Filtering Occur??. In Neil Spring and George F. Riley,
editors, Passive and Active Measurement, pages 133?142, Berlin, Heidelberg,
2011. Springer Berlin Heidelberg.
[138] Russia Censoring Omitted SNI. https://github.com/net4people/bbs/
issues/10, 2019.
[139] Christian Rossow. Amplification Hell: Revisiting Network Protocols for DDoS
Abuse. In Network and Distributed System Security Symposium (NDSS), 2014.
[140] Kulvinder Singh and Ajit Singh. Memcached DDoS Exploits: Operations,
Vulnerabilities, Preventions and Mitigations. 2018.
[141] UDP-Based Amplification Attacks: Alert (TA14-017A). National Cyber
Awareness System Alerts, January 2014. https://www.us-cert.gov/ncas/
alerts/TA14-017A.
[142] CVE-2018-1000115: Memcached version 1.5.5. National Vulnerability
Database, March 2018. http://nvd.nist.gov/nvd.cfm?cvename=CVE-
2018-1000115.
[143] Sam Kottler. February 28th DDoS incident report. https://github.blog/
2018-03-01-ddos-incident-report/, Mar 2018.
[144] Ben Jones, Tzu-Wen Lee, Nick Feamster, and Phillipa Gill. Automated De-
tection and Fingerprinting of Censorship Block Pages. In ACM Internet Mea-
surement Conference (IMC), 2014.
278
[145] Zakir Durumeric, Eric Wustrow, and J. Alex Halderman. ZMap: Fast Internet-
wide Scanning and its Security Applications. In USENIX Security Symposium,
2013.
[146] Marc Ku?hrer, Thomas Hupperich, Christian Rossow, and Thorsten Holz. Exit
from Hell? Reducing the Impact of Amplification DDoS Attacks. In USENIX
Security Symposium, 2014.
[147] Marc Ku?hrer, Thomas Hupperich, Christian Rossow, and Thorsten Holz. Hell
of a Handshake: Abusing TCP for Reflective Amplification DDoS Attacks. In
USENIX Security Symposium, 2014.
[148] Jakub Czyz, Michael Kallitsis, Manaf Gharaibeh, Christos Papadopoulos,
Michael Bailey, and Manish Karir. Taming the 800 Pound Gorilla: The Rise
and Decline of NTP DDoS Attacks. In ACM Internet Measurement Confer-
ence (IMC), 2014.
[149] Robert Beverly and Steven Bauer. The Spoofer Project: inferring the Extent
of Source Address Filtering on the Internet. In USENIX Workshop on Steps
to Reducing Unwanted Traffic on the Internet (SRUTI), 2005.
[150] The Spoofer Project: State of IP Spoofing. https://spoofer.caida.org/
summary.php.
[151] Vern Paxson. End-to-End Routing Behavior in the Internet. In ACM SIG-
COMM, 1996.
[152] Rob Sherwood, Bobby Bhattacharjee, and Ryan Braud. Misbehaving TCP
Receivers Can Cause Internet-Wide Congestion Collapse. In ACM Conference
on Computer and Communications Security (CCS), 2005.
[153] Bill Marczak, Nicholas Weaver, Jakub Dalek, Roya Ensafi, David Fifield,
Sarah McKune, Arn Rey, John Scott-Railton, Ron Deibert, and Vern Pax-
son. An Analysis of China?s ?Great Cannon?. In USENIX Workshop on Free
and Open Communications on the Internet (FOCI), 2015.
[154] Marios Anagnostopoulos, Georgios Kambourakis, Panagiotis Kopanos, Geor-
gios Louloudakis, and Stefanos Gritzalis. DNS Amplification Attack Revisited.
Computers & Security, 39(B):475?485, November 2013.
[155] Bingshuang Liu, Skyler Berg, Jun Li, Tao Wei, Chao Zhang, and Xinhui Han.
The Store-and-Flood Distributed Reflective Denial of Service Attack. 2014.
[156] Matthew Sargent, John Kristoff, Vern Paxson, and Mark Allman. On the Po-
tential Abuse of IGMP. ACM SIGCOMM Computer Communication Review
(CCR), 47(1), 2017.
279
[157] Ram Sundara Raman, Prerana Shenoy, Katharina Kohls, and Roya Ensafi.
Censored Planet: An Internet-wide, Longitudinal Censorship Observatory. In
ACM Conference on Computer and Communications Security (CCS), 2020.
[158] Citizen Lab. Block test list. https://github.com/citizenlab/test-lists.
[159] MaxMind. GeoLite2. https://dev.maxmind.com/geoip/geoip2/geolite2,
2020.
[160] Freedom House. Freedom in the world report. https://freedomhouse.org/
countries/freedom-world/scores.
[161] Arturo Filasto and Jacob Appelbaum. OONI: Open Observatory of Network
Interference. In USENIX Workshop on Free and Open Communications on
the Internet (FOCI), 2012.
[162] Matthew Prince. The DDoS That Almost Broke the Internet. Cloud-
flare Blog, March 2013. https://blog.cloudflare.com/the-ddos-that-
almost-broke-the-internet/.
[163] Gordon Lyon. nmap. https://nmap.org/.
[164] Paul Pearce, Ben Jones, Frank Li, Nick Feamster, Nick Weaver, and Vern
Paxson. Global Measurement of DNS Manipulation. In USENIX Annual
Technical Conference, 2017.
[165] Craig Partridge and Mark Allman. Addressing ethical considerations in net-
work measurement papers. In NS Ethics@ SIGCOMM, 2015.
[166] Let?s Encrypt Stats. Percentage of Web Pages Loaded by Firefox Using
HTTPS. https://letsencrypt.org/stats/#percent-pageloads, 2018.
[167] TCP Middlebox Reflection: Coming to a DDoS Near You, 2022. https:
//www.akamai.com/blog/security/tcp-middlebox-reflection.
[168] Roya Ensafi, Philipp Winter, Abdullah Mueen, and Jedidiah R. Crandall.
Analyzing the Great Firewall of China Over Space and Time. In Privacy
Enhancing Technologies Symposium (PETS), 2015.
[169] Daiyuu Nobori and Yasushi Shinjo. VPN Gate: A Volunteer-Organized Pub-
lic VPN Relay System with Blocking Resistance for Bypassing Government
Censorship Firewalls. In Symposium on Networked Systems Design and Im-
plementation (NSDI), 2014.
[170] Yue Cao, Zhiyun Qian, Zhongjie Wang, Tuan Dao, Srikanth V. Krishna-
murthy, and Lisa M. Marvel. Off-Path TCP Exploits: Global Rate Limit
Considered Dangerous. In USENIX Security Symposium, 2016.
[171] Yossi Gilad and Amir Herzberg. Off-Path Attacking the Web. In USENIX
Workshop on Offensive Technologies (WOOT), 2012.
280
[172] Florian Adamsky, Syed Ali Khayam, Rudolf Ja?ger, and Muttukrishnan Ra-
jarajan. P2P File-Sharing in Hell: Exploiting BitTorrent Vulnerabilities to
Launch Distributed Reflective DoS Attacks. In USENIX Workshop on Offen-
sive Technologies (WOOT), 2015.
[173] Jonas Bushart. Optimizing Recurrent Pulsing Attacks using Application-
Layer Amplification of Open DNS Resolvers. In USENIX Workshop on Of-
fensive Technologies (WOOT), 2018.
[174] Jan Beznazwy and Amir Houmansadr. How china detects and blocks shad-
owsocks. In ACM Internet Measurement Conference (IMC), 2020.
[175] Reethika Ramesh Ram, Sundara Raman, Matthew Bernhard, Victor Ongkow-
ijaya, Leonid Evdokimov, Annie Edmundson, S. Sprecher, Muhammad Ikram,
and Roya Ensafi. Decentralized Control: A Case Study of Russia. In Network
and Distributed System Security Symposium (NDSS), 2020.
281