ABSTRACT
Title of dissertation: TRACE OBLIVIOUS PROGRAM EXECUTION
Chang Liu, Doctor of Philosophy, 2016
Dissertation directed by: Professor Michael Hicks
Department of Computer Science, University of Maryland
and Professor Elaine Shi
Department of Computer Science, Cornell University
The big data era has dramatically transformed our lives; however, security
incidents such as data breaches can put sensitive data (e.g. photos, identities,
genomes) at risk. To protect users’ data privacy, there is a growing interest in
building secure cloud computing systems, which keep sensitive data inputs hidden,
even from computation providers. Conceptually, secure cloud computing systems
leverage cryptographic techniques (e.g., secure multiparty computation) and trusted
hardware (e.g. secure processors) to instantiate a secure abstract machine consisting
of a CPU and encrypted memory, so that an adversary cannot learn information
through either the computation within the CPU or the data in the memory. Unfor-
tunately, evidence has shown that side channels (e.g. memory accesses, timing, and
termination) in such a secure abstract machine may potentially leak highly sensitive
information, including cryptographic keys that form the root of trust for the secure
systems.
This thesis broadly expands the investigation of a research direction called
trace oblivious computation, where programming language techniques are employed
to prevent side channel information leakage. We demonstrate the feasibility of
trace oblivious computation, by formalizing and building several systems, including
GhostRider, which is a hardware-software co-design to provide a hardware-based
trace oblivious computing solution, SCVM, which is an automatic RAM-model se-
cure computation system, and ObliVM, which is a programming framework to facili-
tate programmers to develop applications. All of these systems enjoy formal security
guarantees while demonstrating a better performance than prior systems, by one to
several orders of magnitude.
TRACE OBLIVIOUS PROGRAM EXECUTION
by
Chang Liu
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2016
Advisory Committee:
Professor Michael W. Hicks, Co-Chair/Co-Advisor
Professor Elaine Shi, Co-Chair/Co-Advisor
Professor Charalampos Papamanthou
Professor Zia Khan
Professor Lawrence C. Washington
c© Copyright by
Chang Liu
2016
Dedication
To my loving mother.
ii
Acknowledgments
I am very fortunate to have spent three fantastic years at UMD. Through-
out my journey to learn how to do research, many people have helped me. This
dissertation would not have been possible without them.
First, I want to thank my advisors, Elaine Shi and Michael Hicks, for their
guidance to this amazing academic world. Through collaborating with them, Elaine
and Mike have set up examples for me with both precept and practice. I not only
benefited from insightful technical discussions with them, but also learnt a lot about
other aspects of research: how to pick an important problem to work on, how to
criticize a work, especially the one from myself, and so on.
I have learnt from many people that a piece of research is an execution of a
great taste: I have learnt from Elaine a great taste of choosing research problems and
pushing them closer to perfection. I have learnt from Mike a great taste to develop
elegant theories to explain phenomenon and to communicate the ideas precisely and
concisely. I also have learnt from Jon Froehlich, my proposal committee member, a
great taste of giving presentation, and I have learnt from Dawn Song, who I worked
closely in my last year, a great taste of envisioning novel research directions. I am
grateful that I can work with these fantastic researchers during my PhD life.
My committee members – my advisor, Babis Papamanthou, Zia Khan, and
Larry Washington – have been extremely supportive. My thanks also go to all other
members of the MC2 faculty. I am also fortunate to be surrounded by a diverse and
inspiring group of (former and current) students at MC2.
iii
I am fortunate to work with my collaborators. Many ideas in this dissertation
cannot be executed so well without their help. They are my advisors, Xiao Wang,
Natik Kayak, Yan Huang, Martin Maas, Austin Harris, Mohit Tiwari and Jonathan
Katz without an order.
Now, I would like to thank some dear friends. Xi Chen, Yuening Hu, Ke Zhai,
He He, and I have spent a wonderful summer at Microsoft Research at Redmond,
which is the most enjoyable summer during my PhD life. I am fortunate to be
friends of Yulu Wang and Shangfu Peng, who made my early graduate life happy
and substantial. Qian Wu and Hang Hu are my old friends who always support
me on both life and research. Xi Yi and Yu Zhang brought both happiness and
insightful suggestions into my life. I am happy to become the introducer of the
couple to meet each other.
Life at UMD would have been much more difficult without members of the
administrative and technical staff. Jennifer Story has always been patient and help-
ful beyond the call of duty. Joe Webster has made resources available just as we
needed it. Jodie Gray and Sharron McElroy have minimized bureaucracy when it
comes to tax and money. Thank you all.
Last but in no way the least, I thank my family who have given me uncondi-
tional love throughout my life. My mother, to whom I owe a lot due to my academic
career, has been supporting me altruistically for me to pursuit my dream. Words
are not enough to express my gratitude for them. Finally, I thank Danqi Hu, who
has always been there for me as an inspiring partner. Thank you for steering me to
a better self.
iv
Table of Contents
List of Figures viii
1 Introduction 1
1.1 Beyond Oblivious RAM . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 A hardware-software co-design for ensuring memory trace obliviousness 4
1.3 Automatic RAM-model secure computation . . . . . . . . . . . . . . 5
1.4 A programming framework for secure computation . . . . . . . . . . . 7
1.5 Our Results and Contributions. . . . . . . . . . . . . . . . . . . . . . 8
2 Memory Trace Obliviousness: Basic Setting 11
2.1 Threat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Motivating Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Approach Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Memory trace obliviousness by typing . . . . . . . . . . . . . . . . . . 16
2.4.1 Operational semantics . . . . . . . . . . . . . . . . . . . . . . 17
2.4.2 Memory trace obliviousness . . . . . . . . . . . . . . . . . . . 21
2.4.3 Security typing . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5 Compilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5.1 Type checking source programs . . . . . . . . . . . . . . . . . 30
2.5.2 Allocating variables to ORAM banks . . . . . . . . . . . . . . 32
2.5.3 Inserting padding instructions . . . . . . . . . . . . . . . . . . 33
2.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.6.1 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 37
2.7 Conclusion Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3 GhostRider: A Compiler-Hardware Approach 39
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.1.1 Our Results and Contributions. . . . . . . . . . . . . . . . . . 40
3.2 Architecture and Approach . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.1 Motivating example . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.2 Threat model . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
v
3.2.3 Architectural Overview . . . . . . . . . . . . . . . . . . . . . . 44
3.3 Formalizing the target language . . . . . . . . . . . . . . . . . . . . . 46
3.3.1 Instruction set . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Security by typing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.4.1 Memory Trace Obliviousness . . . . . . . . . . . . . . . . . . . 50
3.4.2 Typing: Preliminaries . . . . . . . . . . . . . . . . . . . . . . 53
3.4.3 Type rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4.4 Security theorem . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.5 Compilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.5.1 Source Language . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.5.2 Memory bank allocation . . . . . . . . . . . . . . . . . . . . . 65
3.5.3 Basic compilation . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.5.4 Padding and register allocation . . . . . . . . . . . . . . . . . 68
3.6 Hardware Implementation . . . . . . . . . . . . . . . . . . . . . . . . 69
3.7 Empirical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4 RAM-model Secure Computation 79
4.1 Technical Highlights . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.2 Background: RAM-Model Secure Computation . . . . . . . . . . . . 83
4.3 Technical Overview: Compiling for RAM-Model Secure Computation 87
4.3.1 Instruction-Trace Obliviousness . . . . . . . . . . . . . . . . . 87
4.3.2 Memory-Trace Obliviousness . . . . . . . . . . . . . . . . . . . 88
4.3.3 Mixed-Mode Execution . . . . . . . . . . . . . . . . . . . . . . 89
4.3.4 Example: Dijkstra’s Algorithm . . . . . . . . . . . . . . . . . 90
4.4 SCVM Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.4.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.4.2 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.4.3 Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.4.4 Type System . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.4.5 From SCVM Programs to Secure Protocols . . . . . . . . . . 109
4.5 Compilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.6.1 Evaluation Methodology . . . . . . . . . . . . . . . . . . . . . 116
4.6.2 Comparison with Automated Circuits . . . . . . . . . . . . . . 118
4.6.2.1 Repeated sublinear-time queries . . . . . . . . . . . . 119
4.6.2.2 Faster one-time executions . . . . . . . . . . . . . . . 121
4.6.3 Comparison with RAM-SC Baselines . . . . . . . . . . . . . . 124
4.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5 ObliVM: A Programming Framework for Secure Computation 129
5.1 ObliVM Overview and Contributions . . . . . . . . . . . . . . . . . . 130
5.1.1 Applications and Evaluation . . . . . . . . . . . . . . . . . . . 134
5.1.2 Threat Model, Deployment, and Scope . . . . . . . . . . . . . 135
vi
5.2 Programming Language and Compiler . . . . . . . . . . . . . . . . . 136
5.2.1 Language features for expressiveness and efficiency . . . . . . 136
5.2.2 Language features for security . . . . . . . . . . . . . . . . . . 140
5.3 User-Facing Oblivious Programming Abstractions . . . . . . . . . . . 145
5.3.1 MapReduce Programming Abstractions . . . . . . . . . . . . . 145
5.3.2 Programming Abstractions for Data Structures . . . . . . . . 150
5.3.3 Loop Coalescing and New Oblivious Graph Algorithms . . . . 153
5.4 Implementing Rich Circuit Libraries . . . . . . . . . . . . . . . . . . . 158
5.4.1 Case Study: Basic Arithmetic Operations . . . . . . . . . . . 158
5.4.2 Case Study: Circuit ORAM . . . . . . . . . . . . . . . . . . . 160
5.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.5.1 Back End Implementation . . . . . . . . . . . . . . . . . . . . 162
5.5.2 Metrics and Experiment Setup . . . . . . . . . . . . . . . . . . 162
5.5.3 Comparison with Previous Automated Approaches . . . . . . 163
5.5.4 ObliVM vs. Hand-Crafted Solutions . . . . . . . . . . . . . . . 168
5.5.5 End-to-End Application Performance . . . . . . . . . . . . . . 170
5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
6 Conclusion Remarks and Future Directions 176
6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
6.2 Future Direction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6.2.1 Verifying Hardware ORAM Implementation . . . . . . . . . . 177
6.2.2 Parallel Trace Oblivious Execution . . . . . . . . . . . . . . . 177
6.2.3 Differentially Privately Oblivious Execution . . . . . . . . . . 178
A Proof of Theorem 1 179
A.1 Trace equivalence and lemmas . . . . . . . . . . . . . . . . . . . . . . 179
A.2 Lemmas on trace pattern equivalence . . . . . . . . . . . . . . . . . . 184
A.3 Proof of memory trace obliviousness . . . . . . . . . . . . . . . . . . . 185
B Proof of Theorem 2 195
C Proof of Theorem 3 236
C.1 Proof of Theorem 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
D The hybrid protocol and the proof of Theorem 5 246
Bibliography 256
vii
List of Figures
2.1 Language syntax of Lbasic . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Auxiliary syntax and functions for semantics in Lbasic . . . . . . . . . 17
2.3 Operational semantics of Lbasic . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Trace equivalence in Lbasic . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5 Auxiliary syntax and functions for typing in Lbasic . . . . . . . . . . . 23
2.6 Typing for Lbasic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.7 Trace pattern equivalence in Lbasic . . . . . . . . . . . . . . . . . . . . 27
2.8 Finding a short padding sequence using the greatest com-
mon subsequence algorithm. An example with two abstract
traces Tt = [T1;T2;T3;T4;T5] and Tf = [T1;T3;T2;T4]. One great-
est common subsequence as shown is [T1;T2;T4]. A shortest common
super-sequence of the two traces is Ttf = [T1;T3;T2;T3;T4;T5]. . . . 34
2.9 Simulation Results for Strawman, Opt 1, and Opt 2. . . . . . . . . . 37
3.1 Motivating source program of GhostRider. . . . . . . . . . . . . . . . 43
3.2 GhostRider architecture. . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3 Syntax for LGhostRider language, comprising (1) ldb and stb instruc-
tions that move data blocks between scratchpad and a specific ERAM
or ORAM bank, and (2) scratchpad-to-register moves and standard
RISC instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4 LGhostRider code implementing (part of) Figure 3.1 . . . . . . . . . . . 49
3.5 Symbolic values, labels, auxiliary judgments and functions . . . . . . 53
3.6 Trace patterns and their equivalence in LGhostRider . . . . . . . . . . . 56
3.7 Security Type System for LGhostRider (Part 1) . . . . . . . . . . . . . . 57
3.8 Security Type System for LGhostRider (Part 2) . . . . . . . . . . . . . . 60
3.9 Security Type System for LGhostRider (Part 3) . . . . . . . . . . . . . . 62
3.10 Simulator-based execution time results of GhostRider. . . . . . . . . . 73
3.11 Legends of Figure 3.10 . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.12 FPGA based execution time results: Slowdown of Baseline and
Final versions compared to non-secure version of the program. Note
that unlike Figure 3.10, Final uses only a single ORAM bank and
conflates ERAM and DRAM (cf. Section 3.6). . . . . . . . . . . . . . 76
viii
4.1 Dijkstra’s shortest distance algorithm in source (Part) . . . . . . . . . 90
4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.3 Formal results in SCVM. . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.4 Syntax of SCVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.5 Auxiliary syntax and functions for SCVM semantics . . . . . . . . . . 98
4.6 Operational semantics for expressions in SCVM . . . . . . . . . . . . 101
4.7 Operational semantics for statements in SCVM (Part 1) . . . . . . . . 102
4.8 Operational semantics for statements in SCVM (Part 2) . . . . . . . . 103
4.9 Type System for SCVM . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.10 SCVM vs. automated circuit-based approach (Binary Search) . . . . . 118
4.11 SCVM vs. hand-constructed linear scan circuit (Binary Search) . . . . 119
4.12 Heap insertion in SCVM . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.13 Heap extraction in SCVM . . . . . . . . . . . . . . . . . . . . . . . . 122
4.14 KMP string matching for median n (fixing m = 50) in SCVM . . . . . 123
4.15 KMP string matching for large n (fixing m = 50) in SCVM . . . . . . 124
4.16 Dijkstra’s shortest-path algorithm’s performance in SCVM . . . . . . 125
4.17 Dijkstra’s shortest-path algorithm’s speedup of SCVM . . . . . . . . . 125
4.18 Aggregation over sliding windows’s performance in SCVM . . . . . . . 126
4.19 Aggregation over sliding windows’s speedup in SCVM . . . . . . . . . 126
4.20 SCVM’s Savings by memory-trace obliviousness optimization (inverse
permutation). the non-linearity (around 60) of the curve is due to the
increase of the ORAM recursion level at that point. . . . . . . . . . . 127
4.21 Savings by memory-trace obliviousness optimization (Dijkstra) . . . . 127
5.1 Streaming MapReduce in ObliVM-Lang. See Section 5.3.1 for
oblivious algorithms for the streaming MapReduce paradigm [36]. . . 147
5.2 Oblivious stack by non-specialist programmers. . . . . . . . . . . . . 151
5.3 Code by expert programmers to help non-specialists implement obliv-
ious stack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
5.4 Loop coalescing. The outer loop will be executed at most n times in
total, the inner loop will be executed at most m times in total – over
all iterations of the outer loop. A naive approach compiler would pad
the outer and inner loop to n and m respectively, incurring O(nm)
cost. Our loop coalescing technique achieves O(n+m) cost instead. 154
5.5 Karatsuba multiplication in ObliVM-Lang. . . . . . . . . . . . . 159
5.6 Part of our Circuit ORAM implementation (Type Defini-
tion) in ObliVM-Lang. . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.7 Part of our Circuit ORAM implementation (ReadAndRemove)
in ObliVM-Lang. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
5.8 Sources of speedup in comparison with state-of-the-art in
2012 [44]: an in-depth look. . . . . . . . . . . . . . . . . . . . . 166
B.1 Well formedness judgments for proof of Memory-Trace Obliviousness
of LGhostRider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
B.2 Symbolic Execution in LGhostRider . . . . . . . . . . . . . . . . . . . . 218
ix
C.1 Operational semantics for simA . . . . . . . . . . . . . . . . . . . . . 238
C.2 Operational semantics for statements in simA (part 1) . . . . . . . . 239
C.3 Operational semantics for statements in simA (part 2) . . . . . . . . 253
D.1 Hybrid Protocol piG (Part I) . . . . . . . . . . . . . . . . . . . . . . . 254
D.2 Hybrid Protocol ΠG(PartII) . . . . . . . . . . . . . . . . . . . . . . . 255
x
Chapter 1: Introduction
Cloud computing allows users to outsource both data and computation to
third-party cloud providers, and promises numerous benefits such as economies of
scale, easy maintenance, and ubiquitous availability. These benefits, however, come
at the cost of giving up physical control of one’s computing infrastructure and private
data. Privacy concerns have held back government agencies and businesses alike
from outsourcing their computing infrastructure to the public cloud [18,94].
To protect users’ data privacy, there is a growing trend to build secure cloud
computing systems, which enable computation over two or more parties’ sensitive
data, while revealing nothing more than the results to the participating parties,
and nothing at all to any third party providers Conceptually, secure cloud comput-
ing systems leverage cryptographic techniques (e.g. secure multiparty computation)
and trusted hardware (e.g. secure processors) to instantiate a “secure” abstract
machine consisting of a CPU and encrypted memory, so that an adversary cannot
learn information through either the computation within the CPU or the data in
the memory. Unfortunately, evidence has shown that side channels (e.g., memory
accesses, timing, and termination) in such a “secure” abstract machine may poten-
tially leak highly sensitive information including cryptographic keys that form the
1
root of trust for the secure systems.
The thesis of this work is that programming language-based techniques—
notably compilers and type systems— can be used to make programs secure, despite
powerful adversaries with a fine-grained view of execution, by enforcing the property
of trace obliviousness, which we define in this thesis.
1.1 Beyond Oblivious RAM
To cryptographically obfuscate memory access patterns, one can employ Obliv-
ious RAM (ORAM) [33, 35], a cryptographic construction that makes memory ad-
dress traces computationally indistinguishable from a random address trace. En-
couragingly, recent theoretical breakthroughs [80,84,90] have allowed ORAM mem-
ory controllers to be built [28, 63] – these turn DRAM into oblivious, block-
addressable memory banks.
The simplest way to deploy ORAM is to implement a single, large ORAM
bank that contains all the code and data (assuming that the client can use standard
PKI to safely transmit the code and data to a remote secure processor). A major
drawback of this baseline approach is efficiency: every single memory-block access
incurs the ORAM penalty which is roughly (poly-)logarithmic in the size of the
ORAM [35, 80]. In practice, this translates to almost 100× additional bandwidth
that, even with optimizations, incurs a ∼10× latency cost per block [28,63]. Another
issue is that, absent any padding, the baseline approach reveals the total number of
memory accesses made by a program, which can leak information about the secret
2
inputs.
In practice, we observe that many programs are intrinsically oblivious, meaning
that their execution traces observed by an adversary do not leak information. For
example, let us consider a program to compute the summation of an array of integers,
which are to be hidden from the adversary. This program will sequentially scan
through the entire array, and assuming the array is encrypted in the memory, its
memory access pattern will not leak any information about the secret integers stored
in the array. In this case, ORAM is not necessary.
To enable such optimizations is not trivial. A program may not always be
oblivious. For example, a program may mistakenly allocate an array whose access
pattern indeed leaks sensitive information outside any ORAM banks. We take
the approach to formalize the property of memory trace obliviousness (MTO), and
design novel type systems to enforce that a well-typed program enjoys memory trace
obliviousness.
A type system is a static analysis method to enforce a well-typed program to
satisfy certain properties. Intuitively, the type systems we develop are an extension
of language-based information flow systems [79], which is used to keep track of
whether or not a variable used by a program contains sensitive information. Our
extension further keeps track of whether the memory access patterns produced by
a program’s execution will leak information to the adversary.
In the following, we will discuss two main solutions to achieve secure cloud
computing, i.e., secure-processor-based solution and secure-computation-based so-
lution.
3
1.2 A hardware-software co-design for ensuring memory trace obliv-
iousness
To protect the confidentiality of sensitive data in the cloud, thwarting software
attacks alone is necessary but not sufficient. An attacker with physical access to
the computing platform (e.g., an malicious insider or intruder) can launch various
physical attacks, such as tapping memory buses, plugging in malicious peripherals,
or using cold-(re)boots [40,81]. Such physical attacks can uncover secrets even when
the software stack is provably secure.
A secure processor enables memory encryption [32,56,85–87] to hide the con-
tents of memory from direct inspection, but an adversary can still observe memory
addresses transmitted over the memory bus. As argued above, however, the mem-
ory address trace is a side channel that can leak sensitive information. We thus
exploit the above idea to enable memory obfuscation within the secure processor,
and employ programming language techniques to optimize program’s execution.
Deploying the above idea to build a memory trace oblivious system in realistic
hardware architectures is non-trivial. First, as explained above, the entire memory
needs be split into regions of ORAM banks, whose access addresses are obfuscated,
versus encrypted RAM, whose access addresses are not. Second, cache behaviors can
break a program’s memory trace oblivious execution, and it is hard to be tracked
statically. The compiler needs hardware support to deterministically control the
cache behavior. Third, the type system needs to deal with assembly code directly,
4
since otherwise the the MTO property may be broken during compilation.
To tackle these problems, we present a hardware-software co-design called
GhostRider. GhostRider’s hardware architecture supports both an encrypted RAM
region, and multiple ORAM banks, which can be leveraged by programs to optimize
the performance. GhostRider compiler can translate a program written in a C-like
high-level language into assembly code using GhostRider’s instruction set architec-
ture (ISA). GhostRider provides a type checker over the assembly code to enforce a
well-typed program enjoys MTO. We will explain GhostRider in detail in Chapter 3.
1.3 Automatic RAM-model secure computation
An alternative route to achieve secure cloud computing is through secure com-
putation. Secure computation is a cryptographic technique that allows mutually
distrusting parties to make collaborative use of their local data without harming
privacy of their individual inputs. Since Yao’s seminal paper [96], research on secure
two-party computation—especially in the semi-honest model we consider here—has
flourished, resulting in ever more efficient protocols [11, 38,52,97] as well as several
practical implementations [16, 43, 45, 46, 54, 65]. Since the first system for general-
purpose secure two-party computation was built in 2004 [65], efficiency has improved
substantially [11,46].
Almost all previous implementations of general-purpose secure computation
assume the underlying computation is represented as a circuit. While theoretical
developments using circuits are sensible (and common), compiling typical programs,
5
which assume a von Neumann-style Random Access Machine (RAM) model, to
efficient circuits can be challenging. One significant challenge is handling dynamic
memory accesses to an array in which the memory location being read/written
depends on secret inputs. A typical program-to-circuit compiler typically makes an
entire copy of the array upon every dynamic memory access, thus resulting in a
huge circuit when the data size is large. Theoretically speaking, generic approaches
for translating RAM programs into circuits incur, in general, O(TN) blowup in
efficiency, where T is an upper bound on the program’s running time, and N is the
memory size.
To address these limitations, researchers have more recently considered secure
computation that works directly in the RAM model [38, 62]. The key insight is,
to rely on ORAM to enable dynamic memory access with poly-logarithmic cost,
while preventing information leakage through memory-access patterns. Gordon et
al. [38] observed a significant advantage of RAM-model secure computation (RAM-
SC) in the setting of repeated sublinear-time queries (e.g., binary search) on a large
database. By amortizing the setup cost over many queries, RAM-SC can achieve
amortized cost asymptotically close to the run-time of the underlying program in
the insecure setting.
To enable secure computation with practical usage, it is ideal to have a com-
plete system, so that developers can implement the applications in a high-level lan-
guage (which are mostly in RAM-model) rather than implementing circuits directly,
and the compiler can translate the program into an efficient secure computation pro-
tocol while enforcing security.
6
While pursuing this goal, the MTO approach that we developed for GhostRider
can be leveraged in this setting as well. In particular, for a program, we can use the
same kind of MTO analysis approach to decide whether or not we should store some
data in an ORAM bank, and ensure that a program is MTO using a type system.
Secure computation, however, imposes more security restrictions to be considered.
For example, a secure computation requires the circuit to be evaluated by both par-
ties. This means that both parties know the instruction being executed. Therefore,
secure computation requires instruction trace obliviousness beyond memory trace
obliviousness. Further, since ORAM protocols are implemented in circuits in secure
computation, this allows programmers to make non-blackbox usage of ORAMs as
well. For example, developers can use tree-based non-recursive ORAM [90], which
is a less expensive building block of ORAM, directly to achieve better performance.
To address these issues, we developed the SCVM system which includes a SCVM
intermediate representation, a compiler, and a secure type system to demonstrate
how to achieve automatic efficient RAM-model computation. We detail SCVM in
Chapter 4.
1.4 A programming framework for secure computation
As a last contribution, we also deliver the ObliVM system as an extension on
top of SCVM. It provides a programming framework with more expressive power
and easy-programmability to help developers write better algorithms more easily.
ObliVM focuses more on how to facilitate developers to build secure computation
7
applications. The design goal is to allow both cryptographic experts to improve the
efficiency of low-level cryptographic protocols, and application developers who may
not be familiar with cryptography to implement efficient applications in a high-
level language. To meet these goals, we designed a new high-level programming
language, called ObliVM-Lang, as an extension to SCVM and implement a compiler.
Using this language, we can implement programming abstractions, which enable ap-
plication developers to implement algorithms in an easy and efficient way. ObliVM
also provides a backend called ObliVM-SC to allow cryptographic experts to imple-
ment different protocols to further accelerate the execution of the whole system. We
will explain ObliVM in Chapter 5.
1.5 Our Results and Contributions.
In this thesis, we propose the trace oblivious computation theory and bring
it to practice. We design and build GhostRider, a hardware/software platform for
provably secure, memory-trace oblivious program execution, which can compile pro-
grams to a realistic architecture while formally ensuring MTO. We also design and
build SCVM and ObliVM to enable developers to develop efficient secure computation
applications. In summary, our contributions are:
Trace obliviousness theory. We greatly extend the study of trace oblivious
program execution by providing a theory to establish when if a program is trace
oblivious. Particularly, we demonstrate the way how to formalize a language such
that the adversary-observable execution traces generated by the program can be
8
modeled. We also demonstrate how type systems can be used to enforce the trace
obliviousness of programs in these languages. On the one hand, using these type
systems, we extend the set of oblivious programs that can be verified automatically,
to include more efficient implementations. On the other hand, these type systems
can be extended to analyze other side-channel leakages that can be expressed as
traces, and thus used to defend against attacks leveraging these channels.
GhostRider system. GhostRider is the first system to bring trace oblivious
computation theory to practice. By building GhostRider itself, we build the first
memory-trace obliviousness compiler that emits target code for a realistic ORAM-
capable processor architecture. GhostRider’s compiler optimizes the generated as-
sembly code while ensuring the optimized code still satisfies MTO. To enable these
optimizations, GhostRider builds on the Phantom processor architecture [63] but
exposes new features and knobs to the software. Our empirical results on a real
processor demonstrate the feasibility of our architecture and show that compared to
the baseline approach of placing everything in a single ORAM bank, our compile-
time static analysis achieves up to nearly an order-of-magnitude speedup for many
common programs.
SCVM system. We build SCVM as the first system to enable automatic RAM-
model secure computation. SCVM provides a complete system that takes a program
written in a high-level language and compiles it to a protocol for secure two-party
computation of that program. To achieve this goal, SCVM provides an intermedi-
ate representation, a type system to ensure any well-typed program will generate
a secure computation protocol secure in the semi-honest model, and a compiler
9
to transform a program written in a high-level language into a secure two-party
computation protocol while integrating compile-time optimizations curcial for im-
proving performance. Our evaluation shows a speedup of 1–2 orders of magnitude
as compared to standard circuit-based approaches for securely computing the same
programs.
ObliVM system. Building on top of SCVM, we design and implement ObliVM, which
focuses more on richer expressive power and easy programmability while achieving
state-of-the-art performance for secure computation. ObliVM provides an expressive
programming language to allow both cryptographic experts and non-experts to use
ObliVM to customize both front-end and back-end optimizations very easily. Us-
ing this programming language, we implement several programming abstractions in
ObliVM to help developers design and implement new efficient oblivious algorithms.
Experiments show that the automatically generated circuits incurs only 0.5% to 2%
overhead over manually optimized implementations.
10
Chapter 2: Memory Trace Obliviousness: Basic Setting
In this chapter, we discuss a simple setting for memory trace oblivious program
execution, which still captures the essential ideas. We consider a simple client-server
scenario. A program is run on the client, but the data operated by the program is
stored on the server. The goal is to enforce the server cannot learn any sensitive
information from both the data stored on the server, and its interaction with the
client-side program. We assume the server manages the data as a big memory, so
that the only way a client program can interact with the server is through memory
read/write APIs. Particularly, through a read call read(i) will request the data
block indexed by i from the server, and a write call write(i, data) will update
the data block indexed by i with data. The server thus can observe i and data
during such interactions.
This chapter will explain how can we use ORAM to prevent the server to
learn any information through these interactions, and how a compiler and a type
system can help enforcing this security property over a program automatically while
preserving efficiency.
Before we go into the details, we want to emphasize that although this setting
itself has interesting applications in this setting, it can be extended to more appli-
11
cations such as secure processor applications, where CPU and RAM correspond to
client and server respectively, and secure multi-party computation. We will discuss
these extensions in Chapter 3, 4, 5. We want to explain the basic ideas and tech-
niques to achieve trace oblivious computation in this chapter, which will be further
extended in later chapters.
This chapter is based on a paper that I co-authored with Michael Hicks and
Elaine Shi [58]. I developed the formalism and conducted the proof of the main
memory trace obliviousness theorem under the help of Michael Hicks, and provided
preliminary experimental results on performance.
2.1 Threat Model
Particularly, a server stores all the data, and a client runs programs interacting
with the server to get data. We assume that the client has small local storage. The
server manages the data as a random access memory (RAM), which supports read
and write operation with random addresses. The adversarial model assumes that
the server can observe all addresses that the client program is accessing, but not
client programs’ internal states. We assume the data stored on the server are all
encrypted, so that the server cannot observe the data directly. Client program will
decrypt the data once retrieved, and re-encrypt them before uploading to the server
(via write operations). We consider the honest-but-curious model, such that the
server does not modify the stored data. Orthogonal techniques, such as Merkle’s
hash tree, can be used to enforce data integrity.
12
Though they are a real threat, timing and other covert channels are not con-
sidered in this basic setting. Later, we will show how our GhostRider system (Chap-
ter 3) and ObliVM (Chapter 5) can prevent leakage through these channels.
2.2 Motivating Examples
Let us consider the following program, where the client does not have enough
space to store a big dataset, but offloads it to the server instead.
1: int findmax(public int n, secret int* data) {
2: secret int max = data[0];
3: for (public int i=1; i<n; ++i) {
4: if (data[i] > max)
5: max = data[i]
}
}
In this client program, data refers to the client’s data stored on the server.
The keyword secret is used to denote this data is sensitive. A straightforward idea
to enforce that the server does not learn information through addresses is that the
client program can run an ORAM protocol with the server and stores all data in
a giant ORAM. This approach, however, has two drawbacks. First, this may not
secure, because the total number of memory accesses may still leak the information.
For example, in the above example, based on the total number of accesses, the
server will learn how many times line 5 is executed. The server can infer about
13
some information such as whether data is in ascending order.
To mitigate this problem, the client program needs perform dummy accesses
to the server even though the condition at line 4 is not satisfied. To this aim,
this program needs be rewritten such that in the false-branch of line 4, a dummy
statement dummy max = data[i] is inserted, where dummy max is an inserted dummy
variable which has no side-effect to other data such as max and data.
Second, this is not efficient, as it will incur an ORAM overhead which is
unnecessary. In particular, This program sequentially scans through the entire data
array, and keeps the maximal value in max. In this case, ORAM is not necessary,
since the data access addresses, i.e. i, are public information which can be inferred
by the server without knowing any information about the secret data.
2.3 Approach Overview
In the following, we present several technical highlights in the following.
Memory Trace Obliviousness. The adversary can observe the stream of ac-
cesses to server, even if he cannot observe the content of those accesses, and such
observations are sufficient to infer secret information. To eliminate this channel of
information, we need a way to run the program so that the event stream does not
depend on the secret data—no matter the values of the secret, the observable events
will be the same. Programs that exhibit this behavior enjoy a property we call
memory trace obliviousness.
Padding. Toward ensuring memory trace obliviousness, the compiler can add
14
padding instructions to either or both branches of if statements whose guards
reference secret information (we refer to such guards as high guards). This idea
is similar to inserting padding to ensure uniform timing [6, 10, 22, 42]. We need to
insert dummy accesses to the two branches such that both branches have equivalent
access patterns. We will detail our approach in Section 2.5.
ORAM for secret data. We store secret data in Oblivious RAM (ORAM), and
extend our trusted computing base with an client side ORAM library. This library
will encrypt/decrypt the secret data and maintain a mapping between addresses
for variables used by the program and actual storage addresses for those variables
on server. For each read/write issued for a secret address, the ORAM library will
issue a series of reads/writes to the server with actual addresses, which has the
effect of hiding which of the accesses was the real address. Moreover, with each
access, the ORAM library will shuffle program/storage address mappings so that
the physical location of any program variable is constantly in flux. Asymptotically,
ORAM accesses are polylogarithmic blowup in the size of the ORAM [35]. Note
that if we were concerned about integrity, we could compose the ORAM library with
machinery for, say, authenticated accesses.
Multiple ORAM banks. ORAM can be an order of magnitude slower than
regular DRAM [83]. Moreover, larger ORAM banks containing more variables incur
higher overhead than smaller ORAM banks [35, 80]; as mentioned above, ORAM
accesses are asymptotically related to the size of the ORAM. Thus we can reduce
run-time overhead by allocating code/data in multiple, smaller ORAM banks rather
than all of it in a single, large bank.
15
Arrays. Implicitly we have assumed that all of an array is allocated to the same
ORAM bank, but this need not be the case. Indeed, for our example it is safe to
simply encrypt the contents of data[i] because knowing which memory address we
are accessing does not happen to reveal anything about the contents of data[], as
we have explained before.
If we allocate each array element in a separate ORAM bank, the running time
of the program becomes roughly 2n accesses: each access to data[i] is in a bank of
size 1. Each iteration will read data twice, and thus there are 2n accesses in total
for n iterations.
In comparison, the na¨ıve strategy of allocating all variables in a single ORAM
bank would incur 2n · poly log(n+ 2)) memory accesses (for secret variables), since
each access to an ORAM bank of size m requires O(poly log(m)) actual server ac-
cesses. This shows that we can achieve asymptotic gains in performance for some
programs.
2.4 Memory trace obliviousness by typing
This section formalizes a type system for verifying that programs enjoy mem-
ory trace obliviousness. In the next section we describe a compiler to transform
programs like the one in Section 2.2 so they can be verified by our type system.
We formalize our type system using a simple language Lbasic presented in
Figure 2.1. A program is a statement S which can be a sequence S;S, no-op skip,
assignments to variables and arrays, conditionals, and loops. Expressions e consist
16
Variables x, y, z ∈ Vars
Numbers n ∈ Nat
ORAM bank o ∈ ORAMbanks
Expressions e ::= x | e op e | x[e] | n
Statements S ::= skip | x := e | x[e] := e | if(e, S, S) | while (e, S) | S;S
Figure 2.1: Language syntax of Lbasic
Arrays m ∈ Arrays = Nat ⇀ Nat
Labels l ∈ SecLabels = {L} ∪ORAMbanks
Memory M ∈ Vars ⇀ (Arrays ∪Nat)× SecLabels
Traces t ::= read(x, n) | readarr(x, n, n) | write(x, n) | writearr(x, n, n)
| o | t@t | 
get(m,n) =
{
m(n) if 0 ≤ n < |m|
0 otherwise
upd(m,n1, n2) =
{
m[n1 7→ n2] if 0 ≤ n1 < |m|
m otherwise
evt(l, t) =
{
l if l ∈ ORAMbanks
t otherwise
Figure 2.2: Auxiliary syntax and functions for semantics in Lbasic
of constant natural numbers, variable and array reads, and (compound) operations.
For simplicity, arrays may contain only integers (and not other arrays), and bulk
assignments between arrays (i.e., x := y when y is an array) are not permitted.
2.4.1 Operational semantics
We define a big-step operational semantics for Lbasic in Figure 2.3, which refers
to auxiliary functions and syntax defined in Figure 2.2. Big-step semantics is sim-
pler than the small-step alternative, and though it cannot be used to reason about
non-terminating programs, our cloud computing scenario generally assumes that
17
〈M, e〉 ⇓t n
E-Var
M(x) = (n, l) t = evt(l, read(x, n))
〈M,x〉 ⇓t n
E-Const 〈M,n〉 ⇓ n
E-Op
〈M, e1〉 ⇓t1 n1 〈M, e2〉 ⇓t2 n2 n = n1 op n2
〈M, e1 op e2〉 ⇓t1@t2 n
E-Arr
〈M, e〉 ⇓t n M(x) = (m, l) n′ = get(m,n)
t1 = evt(l, readarr(x, n, n
′))
〈M,x[e]〉 ⇓t@t1 n′
〈M, s〉 ⇓t M ′
S-Skip 〈M, skip〉 ⇓ M
S-Asn
〈M, e〉 ⇓t n M(x) = (n′, l) t′ = evt(l,write(x, n))
〈M,x := e〉 ⇓t@t′ M [x 7→ (n, l)]
S-AAsn
〈M, e1〉 ⇓t1 n1 〈M, e2〉 ⇓t2 n2 M(x) = (m, l)
m′ = upd(m,n1, n2) t = evt(l,writearr(x, n1, n2))
〈M,x[e1] := e2〉 ⇓t1@t2@t M [x 7→ (m′, l)]
S-Cond
〈M, e〉 ⇓t1 n 〈M,Si〉 ⇓t2 M (i = ite(n, 1, 2))
〈M, if (e, S1, S2) 〉 ⇓t1@t2 M ′
S-WhileT
〈M, e〉 ⇓t n n 6= 0
〈M, (S; while(e, S))〉 ⇓t′ M ′
〈M,while(e, S)〉 ⇓t@t′ M ′
S-WhileF
〈M, e〉 ⇓t 0
〈M,while(e, S)〉 ⇓t M
P-Stmts
〈M,S1〉 ⇓t1 M ′ 〈M ′, S2〉 ⇓t2 M ′′
〈M,S1;S2〉 ⇓t1@t2 M ′′
Figure 2.3: Operational semantics of Lbasic
18
programs terminate. The main judgment of the former figure, 〈M,S〉 ⇓t M ′ (shown
at the bottom), indicates that program S when run under memory M will termi-
nate with new memory M ′ and in the process produce a memory access trace t.
We also define judgments 〈M, e〉 ⇓t n for evaluating statements and expressions,
respectively.
We model server side storage as memory M , which is a partial functions from
variables to labeled values, where a value is either an array m or a number n, and
a label is either L or an ORAM bank identifier o. Thus we can think of an ORAM
bank (managed by the client side ORAM library) o containing all data for variables
x such that M(x) = ( , o), whereas all data labeled L is stored on the server directly.
We model an array m as a partial function from natural numbers to natural numbers.
We write |m| to model the length of the array; that is, if |m| = n then m(i) is defined
for 0 ≤ i < n but nothing else. To keep the formalism simple, we assume all of the
data in an array is stored in the same place, i.e., all on the server directly or all in
the same ORAM bank. We also assume data referred by non-array variables are
also stored on the server. We will show how these assumptions can be relaxed while
describing GhostRider (Chapter 3) and ObliVM (Chapter 3).
A memory access trace t is a finite sequence of events arising during pro-
gram execution that are observable to the server. These events include read events
read(x, n) which states that number n was read from variable x and read(x, n1, n2),
which states number n2 was read from x[n1]. The corresponding events for writes to
variables and arrays are similar, but refer to the number written, rather than read.
Event o indicates an access to ORAM—only the storage bank o is discernable, not
19
the precise variable involved or even whether the access is a read or a write. (Each
ORAM read/write in the program translates to several actual DRAM accesses, but
we model them as a single abstract event.) Finally, t@t represents the concatenation
of two traces and  is the empty trace.
The rules in Figure 2.3 are largely straightforward. Rule (E-Var) defines vari-
able reads by looking up the variable in memory, and then emitting an event con-
sonant with the label on the variable’s memory. This is done using the evt function
defined in Figure 2.2: if the label is some ORAM bank o then event o will be emitted,
otherwise event read(x, n) is emitted since the access is to the server directly.
The semantics treats array accesses as “oblivious” to avoid information leakage
due to out-of-bounds indexes. In particular, rule (E-Arr) indexes the array using
auxiliary function get , also defined in Figure 2.2, that returns 0 if the index n is
out of bounds. Rule (S-AAsn) uses the upd function similarly: if the write is out
of bounds, then the array is not affected.1 We could have defined the semantics to
throw an exception, or result in a stuck execution, but this would add unnecessary
complication. Supposing we had such exceptions, our semantics models wrapping
array reads and writes with a try-catch block that ignores the exception, which is
a common pattern, e.g., in Jif [19, 49], and has also been advocated by Deng and
Smith [26].
The rule (S-Cond) for conditionals is the obvious one; we write ite(x,y,z) to
denote y when x is 0, and z otherwise. Rule (S-WhileT) expands the loop one
1The syntax m[n1 7→ n2] defines a partial function m′ such that m′(n1) = n2 and otherwise
m′(n) = m(n) when n 6= n1. We use the same syntax for updating memories M .
20
unrolling when the guard is true and evaluates that to the final memory, and rule
(S-WhileF) does nothing when the guard is false. Finally rule (P-Stmts) handles
sequences of statements.
2.4.2 Memory trace obliviousness
The security property of interest in our setting we call memory trace oblivious-
ness. This property generalizes the standard (termination-sensitive) noninterference
property to account for memory traces. Intuitively, a program satisfies memory trace
obliviousness if it will always generate the same trace (and the same final memory)
for the same adversary-visible memories, no matter the particular values stored in
ORAM. We formalize the property in three steps. First we define what it means
for two memories to be low-equivalent, which holds when they agree on memory
contents having label L.
Definition 1 (Low equivalence). Two memories M1 and M2 are low-equivalent,
denoted as M1 ∼L M2, if and only if ∀x, v.M1(x) = (v,L)⇔M2(x) = (v,L).
Next, we define the notion of the Γ-validity of a memory M . Here, Γ is the
type environment that maps variables to security types τ , which are either Nat l
or Array l (both are defined in Figure 2.5). In essence, Γ indicates a mapping of
variables to memory banks, and if memory M employs that mapping then it is valid
with respect to Γ.
Definition 2 (Γ-validity). A memory M is valid under a environment Γ, or Γ-valid,
21
@t ≡ t@ ≡ t t1 = t2
t1 ≡ t2
t1 ≡ t2
t2 ≡ t1
t1 ≡ t2 t2 ≡ t3
t1 ≡ t3
t1 ≡ t′1 t2 ≡ t′2
t1@t2 ≡ t′1@t′2
(t1@t2)@t3 ≡ t1@(t2@t3)
Figure 2.4: Trace equivalence in Lbasic
if and only if, for all x
Γ(x) = Nat l⇔ ∃n ∈ Nat.M(x) = (n, l)
Γ(x) = Array l⇔ ∃m ∈ Arrays.M(x) = (m, l)
Finally, we define memory trace obliviousness. Intuitively, a program enjoys
this property if all runs of the program on low-equivalent, Γ-valid memories will
always produce the same trace and low-equivalent final memory.
Definition 3 (Memory trace obliviousness). Given a security environment Γ, a
program S satisfies Γ-memory trace obliviousness if for any two Γ-valid memories
M1 ∼L M2, if 〈M1, S〉 ⇓t1 M ′1 and 〈M2, S〉 ⇓t2 M ′2, then t1 ≡ t2, and M ′1 ∼L M ′2.
In this definition, we write t1 ≡ t2 to denote that t1 and t2 are equivalent.
Equivalence is defined formally in Figure 2.4. Intuitively, two traces are equivalent
if they are syntactically equivalent or we can apply associativity to transform one
into the other. Furthermore,  plays the role of the identity element.
2.4.3 Security typing
Figure 2.6 presents a type system that aims to ensure memory trace oblivi-
ousness. Auxiliary definitions used in the type rules are given in Figure 2.5. This
22
Types τ ::= Nat l | Array l
Environments Γ ∈ Vars ⇀ Types
Trace Pats T ::= Read x |Write x | Readarr x |Writearr x
| Loop(T, T ) | o | T@T | T + T | 
l1 unionsq l2 =
{
l1 if l1 6= L
l2 otherwise
l1 v l2 iff l1 = L orl2 6= L and l2 6= n
select(T1, T2) =
{
T1 if T1 ∼L T2
T1 + T2 otherwise
Figure 2.5: Auxiliary syntax and functions for typing in Lbasic
type system borrows ideas from standard security type systems that aim to enforce
(traditional) noninterference. For the purposes of typing, we define a lattice order-
ing v among security labels l as shown in Figure 2.5, which also shows the unionsq (join)
operation. Essentially, these definitions treat all ORAM bank labels o as equivalent
for the purposes of typing (you can think of them as the ”high” security label). In
the definition of v, we also consider the case when l2 could be a program location
n, which is treated as equivalent to L (this case comes up when typing labeled
statements).
The typing judgment for expressions is written Γ ` e : τ ;T , which states that
in environment Γ, expression e has type τ , and when evaluated will produce a trace
described by the trace pattern T . The judgments for statements s and programs S
are similar. Trace patterns describe families of run-time traces; we write t ∈ T to
say that trace t matches the trace pattern T .
Trace pattern elements are quite similar to their trace counterparts: ORAM
accesses are the same, as are empty traces and trace concatenation. Trace pattern
23
Γ ` e : τ ;T
T-Var
Γ(x) = Nat l T = evt(l,Read(x))
Γ ` x : Nat l;T T-Con Γ ` n : Nat L; 
T-Op
Γ ` e1 : Nat l1;T1 Γ ` e2 : Nat l2;T2 l = l1 unionsq l2
Γ ` e1 op e2 : Nat l;T1@T2
T-Arr
Γ(x) = Array l Γ ` e : Nat l′;T l′ v l
T ′ = evt(l,Readarr(x))
Γ ` x[e] : Nat l unionsq l′;T@T ′
Γ, l ` S;T
T-Skip
Γ, l0 ` skip; 
T-Asn
Γ ` e : Nat l;T Γ(x) = Nat l′ l0 unionsq l v l′
Γ, l0 ` x := e;T@evt(l′,Write(x))
T-AAsn
Γ ` e1 : Nat l1;T1 Γ ` e2 : Nat l2;T2 Γ(x) = Array l l1 unionsq l2 unionsq l0 v l
Γ, l0 ` x[e1] := e2;T1@T2@evt(l,Writearr(x))
T-Cond
Γ ` e : Nat l;T Γ, l unionsq l0 ` Si;Ti (i = 1, 2)
l unionsq l0 6= L⇒ T1 ∼L T2 T ′ = select(T1, T2)
Γ, l0 ` if(e, S1, S2);T@T ′
T-While
Γ ` e : Nat l;T1 Γ, l0 ` S;T2 l unionsq l0 v L
Γ, l0 ` while(e, S); Loop(T1, T2)
T-Seq
Γ, l0 ` S1;T1 Γ, l0 ` S2;T2
Γ, l0 ` S1;S2;T1@T2
Figure 2.6: Typing for Lbasic
24
events for reads and writes to variables and arrays are more abstract, mentioning
the variable being read, and not the particular value (or index, in the case of arrays);
we have read(x, n) ∈ Read(x) for all n, for example. There is also the or -pattern
T1 + T2 which matches traces t such that either t ∈ T1 or t ∈ T2. Finally, the trace
pattern for loops, Loop(T1, T2), denotes the set of patterns T1 and T1@T2@T1 and
T1@T2@T1@T2@T1 and so on, and thus matches any trace that matches one of them.
Turning to the rules, we can see that each one is structurally similar to the
corresponding semantics rule. Each rule likewise uses the evt function (Figure 2.2)
to selectively generate an ORAM event o or a basic event, depending on the label of
the variable being read/written. Rule (T-Var) thus generates a Read(x) pattern if
x’s label is L, or generates the ORAM event l (where l 6= L implies l is some bank
o). As expected, constants n are labeled L by (T-Con), and compound expressions
are labeled with the join of the labels of the respective sub-expressions by (T-Op).
Rule (T-Arr) is interesting in that we require l v l′, where l is the label of the index
and l′ is the label of the array, but the label of the resulting expression is the join
of the two. As such, we can have a public index of a secret array, but not the other
way around. This is permitted because of our oblivious semantics: a public index
reveals nothing about the length of the array when the returned result is secret, and
no out-of-bounds exception is possible.
The judgment for statements Γ, l0 ` S;T is similar to the judgment for expres-
sions, but there is no final type, and it employs the standard program counter (PC)
label l0 to prevent implicit flows. In particular, the (T-Asn) and (T-AAsn) rules
both require that the join of the label l of the expression on the rhs, when joined
25
with the program counter label l0, must be lower than or equal to the label l
′ of the
variable; with arrays, we must also join with the label l1 of the indexing expression.
Rule (T-Cond) checks the statements Si under the program counter label that is
at least as high as the label of the guard. As such, coupled with the constraints
on assignments, any branch on a high-security expression will not leak information
about that expression via an assignment to a low-security variable. In a similar way,
rule (T-Lab) requires that the statement location p is lower or equal to the program
counter label, so that a public instruction fetch cannot be the source of an implicit
flow.
Rule (T-Cond) also ensures that if the PC label or that of the guard expression
is secret, then the actual run-time trace of the true branch (matched by the trace
pattern T1) and the false branch (pattern T2) must be equal; if they were not, then
the difference would reveal something about the respective guard. We ensure run-
time traces will be equal by requiring the trace patterns T1 and T2 are equivalent,
as axiomatized in Figure 3.6. The first two rows prove that  is the identity, that
∼L is a transitive relation, and that concatenation is associative. The third row
unsurprisingly proves that ORAM events to the same bank and fetches of the same
location/bank are equivalent. More interestingly, the third row claims that public
reads to the same variable are equivalent. This makes sense given that public writes
are not equivalent. As such, reads in both branches will always return the same
run-time value they had prior to the conditional. Notice that the public reads to
the same arrays are also not equivalent, since indices may leak information. Finally,
the (T-Cond) emits trace T ′, which according to the select function (Figure 2.5) will
26
T ∼L T
@T ∼L T@ ∼L T  ∼L 
T1 ∼L T2 T2 ∼L T3
T1 ∼L T3 o ∼L o
T1 ∼L T ′1 T2 ∼L T ′2
T1@T2 ∼L T ′1@T ′2
(T1@T2)@T3 ∼L T1@(T2@T3) Read x ∼L Read x
Figure 2.7: Trace pattern equivalence in Lbasic
be T1 when the two are equivalent. As such, conditionals in a high context will never
produce or-pattern traces (which are not equivalent to any other trace pattern).
In Rule (T-While), the constraint l unionsq l0 v L mandates that loop guards be
public (which is why we need not join l0 with l when checking the body S). This
constraint ensures that the length of the trace as related to the number of loop
iterations cannot reveal something about secret data. Fortunately, this restriction
is not problematic for many examples because secret arrays can be safely indexed
by public values, and thus looping over arrays reveals no information about them.
Finally, we can prove that well typed programs enjoy memory trace oblivious-
ness.
Theorem 1. If Γ, l ` S;T , then S satisfies memory trace obliviousness.
The full proof can be found in Appendix A.
2.4.4 Examples
Now we consider a few programs that do and do not type check in our sys-
tem. In the examples, public (low security) variables begin with p, and secret (high
security) variables begin with s; we assume each secret variable is allocated in its
own ORAM bank (and ignore statement labels).
27
There are some interesting differences in our type system and standard infor-
mation flow type systems. One is that we prohibit low reads under high guards that
could differ in both branches. For example, the program if s > 0 then s := p1
else s := p2 is accepted in the standard type system but rejected in ours. This
is because in our system we allow the adversary to observe public reads, and thus
he can distinguish the two branches, whereas an adversary can only observe public
writes in the standard noninterference proof. On the other hand, the program if
s > 0 then s := p+1 else s := p+2 would be accepted, because both branches
will exhibit the same trace.
Another difference is that we do not allow high guards in loops, so a program
like the following is acceptable in the standard type system is rejected in ours:
s := slen; sum := 0;
while s ≥ 0 do
sum := sum + sarr[p];
s := s - 1;
done
The reason we reject this program is that the number of loop iterations, which in
general cannot be determined at compile time, could reveal information about the
secret at run-time. In this example, the adversary will observe O(s) memory events
and thus can infer slen itself. Prior work on mitigating timing channels often makes
the same restriction for the same reason [6, 10, 22, 42]. Similarly, we can mitigate
the restrictiveness of our type system by padding out the number of iterations to a
28
constant value. For example, we could transform the above program to be instead
p := N; sum := 0;
while p ≥ 0 do
if p < slen then sum := sum + sarr[p];
else sdummy := sdummy + sarr[p];
p := p - 1;
done
Here, N is some constant and sdummy and sum are allocated in the same ORAM
bank. The loop will always iterate N times but will compute the same sum assuming
N ≥ slen.
We also do not allow loops with low guards to appear in a conditional with a
high guard. As above, we may be able to transform a program to make it accept-
able. For example, for some S, the program if s > 0 then while (p > 0) do
S; done could be transformed to be while (p > 0) do if s > 0 then S; done
(assuming s is not written in S). This ensures once again that we do not leak
information about the loop guard.
2.5 Compilation
Rather than requiring programmers to write memory-trace oblivious programs
directly, we would prefer that programmers could write arbitrary programs and rely
on a compiler to transform those programs to be memory trace oblivious. While
more fully realizing this goal remains future work, we have developed a compiler
29
algorithm that automates some of the necessary tasks.
In particular, given a program P in which the inputs and outputs are labeled as
secret or public, our compiler will (a) infer the least labels (secret or public) for
the remaining, unannotated variables; (b) allocate all secret variables to distinct
ORAM banks; (c) insert padding instructions in conditionals to ensure their traces
are equivalent; and finally, (d) allocate instructions appearing in high conditionals to
ORAM banks. These steps are sufficient to transform the max program in section 2.2
into its memory-trace oblivious counterpart. We can also transform other interesting
algorithms, such as k-means, Dijkstra’s shortest paths, and matrix multiplication,
as we discuss in the next section.
We now sketch the different steps of our algorithm.
2.5.1 Type checking source programs
The first step is to perform label inference on the source program to make sure
that we can compile it. This is the standard, constraint-based approach to local type
inference as implemented in languages like Jif [49] and FlowCaml [74]. We introduce
fresh constraint variables for the labels of unannotated program variables, and then
generate constraints based on the structure of the program text. This is done by
applying a variant of the type rules in Figure 2.6, having three differences. First,
we treat labels l has being either L, representing public variables; H, representing
secret variables (we can think of this as the only available ORAM bank); or α,
representing constraint variables. Second, premises like l1 v l2 and l0 unionsq l1 v l2
30
that appear in the rules are interpreted as generating constraints that are to be
solved later. Third, all parts having to do with trace patterns T are ignored. Most
importantly, we ignore the requirement that T1 ∼L T2 for conditionals.
Given a set of constraints generated by an application of these rules, we at-
tempt to find the least solution to the variables α that appear in these constraints,
using standard techniques [30]. If we can find a solution, the compilation may con-
tinue. If we cannot find a solution, then we have no easy way to make the program
memory-trace oblivious, and so the program is rejected.
As an example, consider the findmax program in Section 2.2, but assume that
variables i and max are not annotated, i.e., they are missing the secret and public
qualifiers. When type inference begins, we assign i the constraint variable αi and
max the constraint variable αm. In applying the variant type rules (with the PC
label l0 set to L) to this program (that is, the part from lines 5–7), we will generate
the following constraints:
(αi unionsq L) unionsq L v L line 3
αi v H line 4, for h[i] in guard
l0 = αi unionsqH unionsq αm unionsq L PC label for checking if branch
αi v H line 5, for h[i] in assignment
l0 unionsq (H unionsq αi) v αm line 5, assignment
L unionsq (αi unionsq L) v αi line 3
(For simplicity we have elided the constraints on location labels that arise due to
31
(T-Lab), but normally these would be included as well.) We can see that the only
possible solution to these constraints is for αi to be L and αm to be H, i.e., the
former is public and the latter is secret.
Assuming that the programmer minimally labels the source program, only
indicating those data that must be secret and leaving all other variables unlabeled,
then the main restriction on source programs is the restriction on the use of loops:
all loop guards must be public, and no loop may appear in a conditional whose guard
is high. As mentioned in the previous section, the programmer may transform such
programs into equivalent ones, e.g., by using a constant loop bound, or by hoisting
loops out of conditionals. We leave the automation of such transformations to future
work.
2.5.2 Allocating variables to ORAM banks
Given all variables that were identified as secret in the previous stage, we
need to allocate them to one or more ORAM banks. At one extreme, we could put all
secret variables in a single ORAM bank. The drawback is that each access to a secret
variable could cause significant overhead, since ORAM accesses are polylogarithmic
in the size of the ORAM [35] (on top of the encryption/decryption cost). At the
other extreme, we could put every secret variable in a separate ORAM bank. This
lowers overhead by making each access cheaper but will force the next stage to insert
more padding instructions, adding more accesses overall. Finally, we could attempt
to choose some middle ground between these extreme methods: put some variables
32
in one ORAM bank, and some variables in others.
Ultimately, there is no analytic method for resolving this tradeoff, as the “break
even” point for choosing padding over increased bank size, or vice versa, depends
on the implementation. A profile-guided approach to optimizing might be the best
approach. With our limited experience so far we observe that storing each secret
variable in a separate ORAM bank generally achieves very good performance. This
is because when conditional branches have few instructions, the additional padding
adds only a small amount of overhead compared to the asymptotic slowdown of
increased bank size. Therefore we adopt this method in our experiments. Neverthe-
less, more work is needed to find the best tradeoff in a practical setting.
We also need to assign secret statements (i.e., those statements whose location
label must be H) to ORAM banks. At this stage, we assign all statements under
a given conditional to the same ORAM bank, but we make a more fine-grained
allocation after the next stage, discussed below.
2.5.3 Inserting padding instructions
The next step is to insert padding instructions into conditionals, to ensure the
final premise of (T-Cond) is satisfied, so that both branches will generate the same
traces.
To do this, we can apply algorithms that solve the shortest common superse-
quence problem [31] when applied to two traces (a.k.a. the 2-scs problem). That
is, given the two trace patterns Tt and Tf for the true and false branches of an if
33
T1 T T T
T1 TT T
T3
T3 T5
T3
23 4
2 54
T
T :=
:=f
t
Figure 2.8: Finding a short padding sequence using the greatest com-
mon subsequence algorithm. An example with two abstract traces Tt =
[T1;T2;T3;T4;T5] and Tf = [T1;T3;T2;T4]. One greatest common subsequence
as shown is [T1;T2;T4]. A shortest common super-sequence of the two traces is
Ttf = [T1;T3;T2;T3;T4;T5].
(following ORAM bank assignment), let Ttf denote the 2-scs of Tt and Tf . The
differences between Ttf and the original traces signal where, and what, padding
instructions must be inserted. The standard algorithm builds on the dynamic pro-
gramming solution to the greatest common subsequence (gcs) algorithm, which runs
in time O(nm) where n and m are the respective lengths of the two traces [23].
Using this algorithm to find the gcs reveals which characters must be inserted into
the original strings, as illustrated in Figure 2.8.
When running 2-scs on traces, we view Tt and Tf as strings of characters
which are themselves trace patterns due to single statements. Each statement-level
pattern will always consist of zero or more of the following events: Read, oi for
ORAM bank i .2 For example, suppose we have the program skip; x[y] := z where,
after ORAM bank assignment, the type of y is Nat o1, the type of z is Array o1,
and the type of x is Nat o2. This program generates trace pattern @o1@o1@o2. For
the purposes of running 2-scs, this trace consists of three characters: o1, o1, and o2,
which corresponds to the statement x[y] := z.
2Because of the restrictions imposed by the type system, Tt and Tf will never contain loop
patterns, (public) read-array or write patterns, or or-patterns.
34
Once we have computed the 2-scs and identified the padding characters needed
for each trace, we must generate “dummy” statements to insert in the program that
generate the same events. This is straightforward. In essence, we can allocate a
“dummy” variable do for each ORAM bank o in the program, and then read, write,
and compute on that variable as needed to generate the correct event. Suppose
we had the program if(e, skip, x[y] := z) and thus Tt =  and Tf = o1@o1@o2.
Computing the 2-scs we find that Tt can be pre-padded with o1@o1@o2 while Tf
needs not be padded. We can readily generate statements that correspond to both.
For the second, we do not need to pad anything. For the first, we can produce
do2 := do1 +do1 . When we must produce an event corresponding to a public read, or
read from an array, we can essentially just insert a read from that variable directly.
Note that this approach will generate more padding instructions than is strictly
needed. In the above example, the final program will be
if(e, (do2 := do1 + do1 ; skip), (x[y] := z))
Peephole optimizations can be used to eliminate some superfluous instructions.
However, a better approach is to use a finer-grained alphabet which in practice
is available when using three address code, i.e., as the intermediate representation
of an actual compiler. We will demonstrate this in GhostRider.
Once padding has been inserted, both branches have the same number of
statements, and thus we can allocate each pair of statements in its own ORAM
bank. Assuming we did not drop the skip statements in the program above, we
35
Table 2.1: Programs and parameters used in our simulation.
No. Description
1 Dijkstra (n = 100 nodes)
2 K-means (n = 100 data points, k = 2, I = 1 iteraion)
3 Matrix multiplication (n× n matrix where n = 40)
4 Matrix multiplication (n× n matrix where n = 25)
5 Find max (n = 100 elements in the array)
6 Find max (n = 10000 elements in the array)
could allocate them both in ORAM bank o3 and allocate the two assignments in
ORAM bank o4, rather than allocate all instructions in ORAM bank o as is the case
now.
2.6 Evaluation
To demonstrate the efficiency gains achieved by our compiler in comparison
with the straightforward approach of placing all secret variables in the same ORAM
bank, we choose four example programs: Dijkstra single-source shortest paths, K-
means, Matrix multiplication (na¨ıve O(n3) implementation), and Find-max.
We will compare three different strategies:
Strawman: Place all secret variables in the same ORAM bank .
Opt 1: Store each variable in a separate ORAM bank, but store whole arrays
in the same bank.
Opt 2: Store each variable and each member of an array in a separate ORAM
bank .
In all three cases, we insert necessary padding to ensure obliviousness.
36
Figure 2.9: Simulation Results for Strawman, Opt 1, and Opt 2.
2.6.1 Simulation Results
We also performed simulation to measure the performance of the example
programs when compiled by our compiler. Table 2.1 shows the parameters we choose
for our experiment. We built a simulator in C++ that can measure the number of
memory accesses for transformed programs. Implementing a full-fledged compiler
that integrates with our ORAM-capable hardware concurrently being built [?] is left
as future work.
Simulation results are given in Figure 2.9 for the six setups described in Ta-
ble 2.1. The ORAM scheme we used in the simulation is due to Shi et al [80]. The
figure shows that Opt 1 is 1.3 to 5 times faster the strawman scheme; and Opt 2 is
1 to 3 orders of magnitude faster than the strawman for the chosen programs and
parameters.
37
2.7 Conclusion Remarks
In this chapter, we have briefly explained how to achieve memory trace obliv-
iousness (MTO) in a basic client-server setting. We have demonstrated how MTO
is defined under Lbasic syntax and semantics, and how a type system can enforce
MTO by keeping track of traces. Throughout the rest of this thesis, we will exploit
the same idea in more settings, where hardware (in GhostRider) or algorithmic
(in ObliVM) constraints will require reasoning about more leakage. We will also
empirically demonstrate the advantages of the MTO approach in these settings.
38
Chapter 3: GhostRider: A Compiler-Hardware Approach
3.1 Introduction
In Chapter 2, we have explained the basic idea how a type system can enforce a
program to be memory trace oblivious. In this chapter, we consider a realistic setting
to protect the programs against physical attackers who have control to everything
except the processor. As explained in above, data encryption is necessary but not
sufficient. Our approach is to build a processor with an ORAM controller [28,
63] so that it can obfuscate the memory accesses to the ORAM. As explained in
Chapter 2, this is not the most efficient approach, since when a program’s memory
access pattern does not depend on secret data, encryption is sufficient and ORAM
is not necessary. To enable such an optimization, we enhance the existing ORAM
processor design to allow splitting memory into regions, consisting of one or more
ORAM banks, encrypted memory, and normal DRAM.
On top of such a design, building a compiler from a high-level source language
into a low-level assembly language is non-trivial, even given our results from Chap-
ter 2. In particular, the cache behavior and handling assembly code make designing
a MTO type system much more challenging.
In this Chapter, we present GhostRider as a hardware-compiler co-design ap-
39
proach to achieve memory trace obliviousness and efficiency at the same time.
This Chapter is based on a paper that I co-authored with Austin Harris,
Michael Hicks, Martin Maas, Mohit Tiwari, and Elaine Shi [57]. I developed the
formalism and the proof under the help of Michael Hicks. I also developed the
compiler, which implements both the optimization and the type checker, and emits
code that is runnable on an FPGA implementation developed by my co-authors
Austin Harris, Martin Maas, and Mohit Tiwari. I conducted experiments to show
the compiler’s effectiveness with the help of Austin Harris, Martin Maas, and Mohit
Tiwari.
3.1.1 Our Results and Contributions.
In this paper, we make the first endeavor to bring the theory of MTO to
practice. We design and build GhostRider, a hardware/software platform for prov-
ably secure, memory-trace oblivious program execution. Compiling to a realistic
architecture while formally ensuring MTO raises interesting challenges in the com-
piler and type system design, and ultimately requires a co-operative re-design of the
underlying processor architecture. Our contributions are:
New compiler and type system. We build the first memory-trace oblivious
compiler that emits target code for a realistic ORAM-capable processor architec-
ture. The compiler must explicitly handle low-level resource allocation based on
underlying hardware constraints, and while doing so is standard in non-oblivious
compilers, achieving them while respecting the MTO property is non-trivial. Stan-
40
dard resource allocation mechanisms would fail to address the MTO property. For
example, register allocation spills registers to the stack, thereby introducing memory
events. Furthermore, caching serves memory requests from an on-chip cache, which
suppresses memory events. If these actions are correlated with secret data, they
can leak information. We introduce new techniques for resolving such challenges.
In lieu of implicit caches we employ an explicit, on-chip scratchpad. Our compiler
implements caching in software when its use does not compromise MTO.
To formally ensure the MTO property, we define a new type system for a
RISC-style low-level assembly language. We show that any well-typed program in
this assembly language will respect memory-trace obliviousness during execution.
When starting from source programs that satisfy a standard information flow type
system [26], our compiler generates type-correct, and therefore safe, target code.
Specifically, we implement a type checker that can verify the type-correctness of the
target code.
Processor architecture for MTO program execution. To enable an auto-
mated approach for efficient memory-trace oblivious program execution, we need
new hardware features that are not readily available in existing ORAM-capable
processor architectures [27, 29, 63]. GhostRider builds on the Phantom processor
architecture [63] but exposes new features and knobs to the software. In addition to
supporting a scratchpad, as mentioned above, the GhostRide architecture comple-
ments Phantom’s ORAM support with encrypted RAM (ERAM), which is not obliv-
ious and therefore more efficiently supports variables whose access patterns are not
sensitive. Section 3.6 describes additional hardware-level contributions. We proto-
41
typed the GhostRider processor on a Convey HC2 platform [21] with programmable
FPGA support. The GhostRider processor supports the RISC-V instruction set [93].
Implementation and Empirical Results. Our empirical results are obtained
through a combination of software emulation and experiments on an FPGA proto-
type. Our FPGA prototype supports one ERAM bank, one code ORAM bank, and
one data ORAM bank. The real processor experiments demonstrate the feasibility
of our architecture, while the software simulator allows us to test a range of config-
urations not limited by the constraints of the current hardware. In particular, the
software simulator models multiple ORAM banks at a higher clock rate.
Our experimental results show that compared to the baseline approach of
placing everything in a single ORAM bank, our compile-time static analysis achieves
up to nearly an order-of-magnitude speedup for many common programs.
3.2 Architecture and Approach
This section motivates our approach and presents an overview of GhostRider’s
hardware/software co-design.
3.2.1 Motivating example
We wish to support a scenario in which a client asks an untrusted cloud
provider to run a computation on the client’s private data. For example, suppose the
client wants the provider to run the program shown in Figure 3.1, which is a simple
histogram program written in a C-like source language. As input, the program takes
42
void histogram(secret int a[], // ERAM
secret int c[]) { // ORAM (output)
public int i;
secret int t, v;
for(i=0;i<100000;i++) // 100000 <= len(c)
c[i]=0;
i=0;
for(i=0;i<100000;i++) { // 100000 <= len(a)
v=a[i];
if(v>0) t=v%1000;
else t=(0-v)%1000;
c[t]=c[t]+1; } }
Figure 3.1: Motivating source program of GhostRider.
an integer array a, and as output it modifies integer array parameter c. We assume
both arrays have size 100,000. The function’s code is straightforward, computing
the histogram of the absolute values of integers modulo 1000 appearing in the input
array. The client’s security goal is data confidentiality : the cloud provider runs the
program on input array a, producing output array c, but nevertheless learns nothing
about the contents of either a or c. We express this goal by labeling both arrays
with the qualifier secret (data labeled public is non-sensitive).
3.2.2 Threat model
The adversary has physical access to the machine(s) being used to run client
computations. As in prior work that advocates the minimization of the hardware
trusted computing base (TCB) [85–87], we assume that trust ends at the bound-
ary of the secure processor. Off-chip components are considered insecure, including
memory, system buses, and peripherals. For example, we assume the adversary
can observe the contents of memory, and can observe communications on the bus
43
Instruc/on!
Scratchpad!
Data!
Scratchpad!
Determinis/c!CPU!pipeline!
Data!!
Transfer!!
Unit!Encrypted!
RAM!Ctrl!
Oblivious!!
RAM!Ctrl!H
os
t@
Co
pr
oc
es
so
r!!
In
te
rf
ac
e!Host%
CPU%
+%%
OS%
Remote%
User%
!
GhostRider%Secure%Co4processor%
Main%Memory%
Encrypted!
RAM!Bank! …!
Oblivious!!
RAM!Bank! …!RAM!Bank! …!
Figure 3.2: GhostRider architecture.
between the processor and the memory. By contrast, we assume that on-chip com-
ponents are secure. Specifically, the adversary cannot observe the contents of the
cache, the register file, or any on-chip communications. Finally, we assume the ad-
versary can make fine-grained timing measurements, and therefore can learn, for
example, the gap between observed events. Analogous side channels such as power
consumption are outside the scope of this paper.
3.2.3 Architectural Overview
As mentioned in the introduction, one way to defend against such an adversary
is to place all data in a single (large) ORAM; e.g., for the program in Figure 3.1 we
place the arrays a and c in ORAM. Unfortunately this baseline approach is not only
expensive, but also leaks information through the total number of ORAM accesses
(if the access trace is not padded to a value that is independent of secret data). We
44
now provide an architectural overview of GhostRider (Figure 3.2) and contrast it
with this baseline.
Joint ORAM-ERAM memory system In the GhostRider architecture, main
memory is split into three types—normal (unencrypted) memory (RAM), encrypted
memory (ERAM), and oblivious RAM (ORAM)—with one or more (logical) banks
of each type comprising the system’s physical memory. The differentiation of mem-
ory into banks allows a compiler to place only arrays with sensitive access patterns
inside the more expensive ORAM banks, while keeping the remaining data in the
significantly faster RAM or ERAM banks. For example, notice that in the program
in Figure 3.1 the array a is always accessed sequentially while access patterns to
the array c can depend on secret array contents. Therefore, our GhostRider com-
piler can place the array a inside an ERAM bank, and place the array c inside an
ORAM bank. The program accesses different memory banks at the level of blocks
using instructions that specify the bank and a block-offset within the bank (after
moving data to on-chip memory as described below). Our hardware prototype fixes
block sizes to be 4KB for both ERAM and ORAM banks (which is not an inherent
limitation of the hardware design).
Software-directed scratchpad As mentioned earlier, cache hit and miss behavior
can lead to differences in the observable memory traces. To prevent such cache-
channel information leakage, the GhostRider architecture turns off implicit caching,
and instead offers software-directed scratchpads for both instructions and data.
These scratchpads are mapped into the program’s address space so that the compiler
can generate code to access them explicitly, and thereby avoid information leaks.
45
For example, the indices of array a in Figure 3.1 are deterministic; they do not
depend on any secret input. As such, it is safe to use the scratchpad to cache array
a’s accesses. The compiler generates code to check whether the relevant block is in
the scratchpad, and if not loads the block from memory. On the other hand, all
accesses to array c depend on the secret input a, so a memory request will always
be issued independent of whether the requested block is in the scratchpad or not.
Deterministic Processor Pipeline To avoid timing-channel leakage, our pipelined
processor ensures that instruction timings are deterministic. We do not use dynamic
branch prediction and fix variable-duration instructions, such as division, to take the
worst-case execution time, and disable concurrent execution of other instructions.
Initialization We design the oblivious processor and memory banks as a co-processor
that runs the application natively (i.e., without an OS) and is connected to a net-
worked host computer that can be accessed remotely by a user. We assume that
the secure co-processor has non-volatile memory for storing a long-term public key
(certified using PKI), such that the client can securely ship its encrypted code and
data to the remote host, and initialize execution on the secure co-processor. Imple-
menting the secure attestation is standard [4], and we leave it to future work.
3.3 Formalizing the target language
This section presents a small formalization of GhostRider’s instruction set,
which we call LGhostRider. The next section presents a type system for this language
that guarantees security, and the following section describes our compiler from a
46
m,n ∈ Z o1, ..., on ∈ ORAMbanks
k ∈ Block IDs r ∈ Registers
l ∈ Labels = {D,E} ∪ORAMbanks
ι ::= ldb k ← l[r] load block to scratchpad
| stb k store block to memory
| r ← idb k retrieve the block ID
| ldw r1 ← k[r2] load a scratchpad val. to reg.
| stw r1 → k[r2] store a reg. val. to scratchpad
| r1 ← r2 aop r3 compute an operation
| r1 ← n assign a constant to a register
| jmp n (relative) jump
| br r1 rop r2 ↪→ n compare and branch
| nop empty operation
I ::= ι | I; ι instruction sequence
Figure 3.3: Syntax for LGhostRider language, comprising (1) ldb and stb instructions
that move data blocks between scratchpad and a specific ERAM or ORAM bank,
and (2) scratchpad-to-register moves and standard RISC instructions.
C-like source language to well-typed LGhostRider programs.
3.3.1 Instruction set
The core instructions of LT are in the style of RISC-V [93], our prototype’s
instruction set, and are formalized in Figure 3.3. We define labels l that distinguish
the three kinds of main memory: D for normal (D)RAM, E for ERAM, and oi for
ORAM. For the last, the i identifies a particular ORAM bank. We can view each
label as defining a distinct address space.
The instruction ldb k ← l[r] loads a block from memory into the scratchpad.1
Here, l is the address space, r is a register containing the address of the block to
load from within that address space, and k is the scratchpad block identifier. Our
1In our hardware prototype the scratchpad is mapped into addressable memory, so this in-
struction and its counterpart, stb, are implemented as data transfers. In addition, the compiler
implements idb. We model them in LT explicitly for simplicity; see Section 3.6 for implementation
details.
47
formalism refers to scratchpad blocks by their identifier, treating them similarly to
registers. Our architecture remembers the address space and block address within
that address space that the scratchpad block was loaded from so that writebacks,
via the stb k instruction, will go to the original location. We enforce this one-to-one
mapping to avoid information leaks via write-back from the scratchpad (e.g., that
where a scratchpad block is written to, in memory, could reveal information about
a secret, or that the effect of a write could do so, if blocks are aliased).
To access values from the scratchpad, we have scratchpad-load and scratchpad-
store instructions. The scratchpad-load instruction loads a word from a scratchpad
block, having the form ldw r1 ← k[r2]. Assuming register r2 contains n, this
instruction loads the n-th word in block k into register r1 (notice that we use word-
oriented addressing, not byte-oriented). The scratchpad-store instruction is similar,
but goes in the reverse direction. The instruction r ← idb k retrieves the block
offset of a scratchpad block k.
We have two kinds of assignment instructions, one in the form of r1 ←
r2 aop r3, and the other in the form of r ← n. In LT we only model integer
arithmetic operations, such as addition, subtraction, multiplication, division, and
modulus.
Jumps and branches use relative addressing. The jump instruction jmp n
bumps the program counter by n instructions (where n can be negative). Branches,
having the form br r1 rop r2 ↪→ n, will compare the contents of r1 and r2 using rop,
and will bump the pc by n if the comparison result is true. An instruction sequence
I is defined to be a sequence of instructions concatenated using a logical operation ;.
48
; v=a[i]
1 t1 ← ri div sizeblk
2 t2 ← ri mod sizeblk
3 ldb k1 ← E[t1]
4 ldw rv ← k1[t2]
; if(v>0) t= ...
5 br rv ≤ 0 ↪→ 3
6 rt ← rv % 1000
7 jmp 3
; else t= ...
8 t1 ← 0 − rv
9 rt ← t1 % 1000
; c[t]=c[t]+1
10 t1 ← rt >> 9
11 t2 ← rt & 511
12 ldb k2 ← O[t1]
13 ldw t3 ← k2[t2]
14 t4 ← t3 + 1
15 stw t4 → k2[t2]
16 stb k2
Figure 3.4: LGhostRider code implementing (part of) Figure 3.1
We overload ; to operate over two instruction sequences such that I; (I ′; ι) , (I; I ′); ι
and I1; I2; I3 , (I1; I2); I3.
Note that our formalism does not model the instruction scratchpad; essen-
tially it assumes that all code is loaded on-chip prior to the start of its execution.
Section 3.5 discusses how the instruction scratchpad is used in practice.
3.3.2 Example
Figure 3.4 shows LGhostRider code that corresponds to the body of the second
for loop in the source program from Figure 3.1. We write rX for a register cor-
responding to variable X in the source program (for simplicity) and write ti for
i ∈ {1, 2, ...} for temporary registers. In the explanation we refer to the names of
variables in the source program when describing what the target program is com-
puting.
The first four lines load the ith element of array a into v. Line 1 computes
the address of the block in memory that contains the ith element of array a and
49
line 2 computes the offset of the element within that block. Here sizeblk is the size
of each block, which is an architecture constant. Line 3 then loads the block from
ERAM, and line 4 loads the appropriate value from the loaded block into v.
The next five lines implement the conditional. Line 5 jumps three instructions
forward if v is not greater than 0, else it falls through to line 6, which computes t.
Line 7 then jumps past the else branch, which begins on line 8, which negates v to
make it positive before computing t.
The final seven lines increment c[t]. Lines 10–13 are analogous to lines 1–4;
they compute the address of tth element of array c and load it into temporary
t3. Notice that this time the block is loaded from ORAM, not ERAM. Line 14
increments the temporary; line 15 stores it back to the block in the scratchpad; and
line 16 stores the entire block back to ORAM.
3.4 Security by typing
This section presents a type system for LT that guarantees programs obey the
strong memory trace obliviousness (MTO) security property.
3.4.1 Memory Trace Obliviousness
Memory trace obliviousness is a noninterference property that also considers
the address trace, rather than just the initial and final contents of memory [79].
MTO’s definition relies on the notion of low equivalence which relates memories
whose RAM contents are identical. We formally define this notion below, using the
50
following formal notation:
M ∈ Addresses→ Blocks
a ∈ Addresses = Labels×Nat
b ∈ Blocks = Nat→ Z
We model a memory M as a map from addresses to blocks, where an address is a
pair consisting of a label l (corresponding to an ORAM, ERAM, or RAM bank, as
per Figure 3.3) and an address n in that bank. A block is modeled as a map from
an address n to a (integer) value. Here is the definition of memory low-equivalence:
Definition 4 (Memory low equivalence). Two memories M1, M2 are low equivalent,
written M1 ∼L M2, if and only if for all n such that 0 ≤ n < size(D) we have
M1(D,n) = M2(D,n).
The definition states that memories M1 and M2 are low equivalent when only
the RAM bank’s values of the memories are the same, but all of the other values
could differ.
Intuitively, memory trace obliviousness says two things given two low-equivalent
memories. First, if the program will terminate under one memory, then it will ter-
minate under the other. Second, if the program will terminate and lead to a trace
t under one memory, then it will do so under the other memory as well while also
finishing with low-equivalent memories.
To state this intuition precisely, we need a formal definition of a LT execu-
tion, which we give as an operational semantics. The semantics is largely stan-
51
dard, and can be found in the Appendix. The key judgment has the form I `
(R, S,M, pc) −→t (R′, S ′,M ′, pc′), which states that program I, with a register file
R, a (data) scratchpad S, a memory M , and a program counter pc, executes some
number of steps, producing memory trace t and resulting in a possibly modified
register file R′, scratchpad S ′, memory M ′, and program counter pc′.
Definition 5 (Memory trace obliviousness). A program I is memory trace obliv-
ious if and only if for all memories M1 ∼L M2 we have I ` (R0, S0,M1, 0) −→t1
(R′1, S
′
1,M
′
1, pc1), and I ` (R0, S0,M2, 0) −→t2 (R′2, S ′2, M ′2, pc2), and |t1| = |t2|
implies t1 ≡ t2 and M ′1 ∼L M ′2.
Here R0 is a mapping that maps every register to 0, and S0 maps every address
to a all-0 block. Traces t consist of reads/writes to RAM (both address and value)
and ERAM (just the address), accesses to ORAM (just the bank), and instruction
fetches. For the last we only model that a fetch happened, not what instruction
it is, as we assume code will be stored in a scratchpad on chip. We write t1 ≡ t2
to say that traces t1 and t2 are indistinguishable to the attacker; i.e., they consist
of the same events in the same order. Our formalism models every instruction as
taking unit time to execute – thus the trace event also models the time taken to
execute the instruction. On the real GhostRider architecture, each instruction takes
deterministic but non-uniform time; as this difference is conceptually easy to handle
(by accounting for instruction execution times in the compiler), we do not model it
formally, for simplicity (see Section 3.5).
52
Sym. vals. sv ∈ SymVals = n | ? | sv1 aop sv2 | Ml[k, sv]
Sym. Store Sym ∈ Registers ∪Block IDs→ SymVals
Sec. Labels ` ∈ SecLabels = L | H
Label Map Υ ∈ (Registers→ SecLabels) ∪ (Block IDs→ Labels)
sv1 ≡ sv2
`safe sv1 `safe sv2
sv1 = sv2
sv1 ≡ sv2
Auxiliary Functions
select(l, a, b, c) =
a if l = D
b if l = E
c if otherwise.
slab(l) =
select(l, L, H, H)
ite(x, a, b) ={
a if x is true.
b otherwise.
`safe sv
l = D `safe sv
`safe Ml[k, sv]
`safe n
`safe sv1 `safe sv2
`safe sv1 aop sv2
`const sv
`const n `const ?
`const sv1 `const sv2
`const sv1 aop sv2
`const Sym
∀r. `const Sym(r)
∀k. `const Sym(k)
`const Sym
Figure 3.5: Symbolic values, labels, auxiliary judgments and functions
3.4.2 Typing: Preliminaries
Now we give a type system for LGhostRider programs and prove that type correct
programs are MTO.
Symbolic values To ensure that the execution of a program cannot leak infor-
mation via its address trace, we statically approximate what events a program can
produce. An important determination made by the type system is when a secret
53
variable can be stored in ERAM—because the address trace will leak no information
about it—and when it must be accessed in ORAM. As an example, suppose we had
the source program
if(s) then x[i]=1 else x[i]=2;
If x is secret but stored in RAM, then the value in x[i] after running this program
will leak the contents of secret variable s. We could store x in ORAM to avoid this
problem, but this is unnecessary: both branches will modify the same element of x,
so encrypting the content of x is enough to prevent the address trace from leaking
information about s. The type system can identify this situation by symbolically
tracking the contents of the registers, blocks, etc.
To do this, the type rules maintain a symbolic store Sym, which is a map
from register and block IDs to symbolic values. Figure 3.5 defines symbolic values
sv, which consist of constants n, (symbolic) arithmetic expressions, values loaded
from memory Ml[k, sv], and unknowns ?. Most interesting is memory values, which
represent the address of a loaded value: l indicates the memory bank it was loaded
from, sv corresponds to the offset (i.e., the block number) within that bank, and k
is the scratchpad block into which the memory block is loaded.2
The type rules also make use of a label map Υ mapping registers to security
labels and block IDs to (memory) labels; the latter tracks the memory bank from
which a scratchpad block was loaded.
The figure defines several judgments; the form of each judgment is boxed. The
2In actual traces t, the block number k is not visible; we track it symbolically to model the
scratchpad’s contents, in particular to ensure that the same memory block is not loaded into two
different scratchpad blocks.
54
first defines when two symbolic values can be deemed equivalent, written sv1 ≡ sv2:
they must be syntactically identical and safe static approximations. The latter is
defined by the judgment `safe sv, which accepts constants, memory accesses to RAM
involving safe indexes, and arithmetic expressions involving safe values. Judgement
`const sv says that symbolic value sv is not a memory value. That is, sv is either
a constant, a ?, or a binary expression not involving memory values. Further, for a
symbolic store Sym, if all the symbolic values that it maps to can be accepted by
`const sv, then we have `const Sym. The latter judgment is needed when checking
conditionals.
Finally, we give three auxiliary functions used in the type system. Based on
whether l is D, E, or an ORAM bank, function select(l, a, b, c) returns a, b, or c
respectively. Function slab(·) maps a normal label l to a security label `, which
is either L or H. The label H classifies encrypted memory—any ORAM bank and
ERAM—while label L classifies RAM. These two labels form the two-point lattice
with L @ H. Note that L is equivalent to the public label used in Figure 3.1, and
H is equivalent to secret. Finally, function ite(x, a, b) returns a if x is true, and
returns b if x is false.
Trace patterns Figure 3.6 defines trace patterns T , which are largely similar to
those for Lbasic that approximate traces t. The first line in the definition of T defines
single events. The first two indicate reads and writes to RAM or ERAM; they
reference the memory bank, block identifier in the scratchpad, and a symbolic value
corresponding to the block address (not the actual value) read or written. Pattern
F corresponds to a non memory-accessing instruction. The next pattern indicates
55
Trace Pats. T ::= read(l, k, sv) | write(l, k, sv) | F | o
| T1@T2 | T1 + T2 | loop(T1, T2)
sv1 ≡ sv2
read(l, k, sv1) ≡ read(l, k, sv2)
o ≡ o T1 ≡ T2
T2 ≡ T1 F ≡ F
sv1 ≡ sv2
write(l, k, sv1) ≡ write(l, k, sv2)
T1 ≡ T ′1 T2 ≡ T ′2
T1@T2 ≡ T ′1@T ′2
T1@(T2@T3) ≡ (T1@T2)@T3
Figure 3.6: Trace patterns and their equivalence in LGhostRider
a read or write from ORAM bank o: this bank is the trace event itself because the
adversary cannot determine whether an access is a read or a write, or which block
within the ORAM is accessed. Trace pattern T1@T2 is the pattern resulting from
the concatenation of patterns T1 and T2. Pattern T1 + T2 represents either T1 or
T2, and is used to type conditionals. Finally, pattern loop(T1, T2) represents zero
or more loop iterations where the guard’s trace is T1 and the body’s trace is T2.
Trace pattern equivalence T1 ≡ T2 is defined in Figure 3.6. In this definition,
reads are equivalent to other reads accessing exactly the same location; the same
goes for writes. Two ORAM accesses to the same ORAM bank are obviously treated
as equivalent. Sum patterns specify possibly different trace patterns, and loop pat-
terns do not specify the number of iterations; as such we cannot determine their
equivalence statically. The concatenation operator @ is associative with respect to
equivalence.
56
Instructions
T-LOAD
l 6∈ ORAMbanks⇒ Υ(r) = L
Υ′ = Υ[k 7→ l] Sym′ = Sym[k 7→ Sym(r)]
T0 = read(l, k, Sym(r)) T = select(l, T0, T0, l)
` ` ldb k ← l[r] : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T
T-STORE
Sym(k) = sv Υ(k) = l
T0 = write(l, k, sv) T = select(l, T0, T0, l)
` ` stb k : 〈Υ, Sym〉 → 〈Υ, Sym〉;T
T-LOADW
l = Υ(k) Υ(r2) v slab(l)
Υ′ = Υ[r1 7→ slab(l)] sv = Ml[k, Sym(r2)]
Sym′ = Sym[r1 7→ sv]
` ` ldw r1 ← k[r2] : 〈Υ, Sym〉 → 〈Υ′, Sym′〉; F
T-STOREW
` unionsqΥ(r1) unionsqΥ(r2) v slab(Υ(k))
` ` stw r1 → k[r2] : 〈Υ, Sym〉 → 〈Υ, Sym〉; F
T-IDB
Sym(k) = sv Υ(k) = l
Υ′ = Υ[r 7→ select(l, L, L, H)] Sym′ = Sym[r 7→ sv]
` ` r ← idb k : 〈Υ, Sym〉 → 〈Υ′, Sym′〉; F
T-BOP
`′ = Υ(r2) unionsqΥ(r3) Υ′ = Υ[r1 7→ `′]
sv = Sym(r2) aop Sym(r3) Sym
′ = Sym[r1 7→ sv]
` ` r1 ← r2 aop r3 : 〈Υ, Sym〉 → 〈Υ′, Sym′〉; F
T-ASSIGN
Υ′ = Υ[r 7→ L] Sym′ = Sym[r 7→ n]
` ` r ← n : 〈Υ, Sym〉 → 〈Υ′, Sym′〉; F
T-NOP ` ` nop : 〈Υ, Sym〉 → 〈Υ, Sym〉; F
T-SEQ
` ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T1
` ` ι : 〈Υ′, Sym′〉 → 〈Υ′′, Sym′′〉;T2
` ` I; ι : 〈Υ, Sym〉 → 〈Υ′′, Sym′′〉;T1@T2
Figure 3.7: Security Type System for LGhostRider (Part 1)
57
3.4.3 Type rules
Figures 3.7, 3.8, 3.9 define the security type system for LT . The figures are
divided into three parts.
Instructions Figure 3.7 presents judgment ` ` ι : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T , which
is used to type instructions ι. Here, ` is the security context, used in the standard
way to prevent implicit flows. The rules are flow sensitive: The judgement says
that instruction ι has a type 〈Υ, Sym〉 → 〈Υ′, Sym′〉, and generates trace pattern
T . Informally, we can say by executing ι, a state corresponding to security type
〈Υ, Sym〉 will be changed to have type 〈Υ′, Sym′〉.
Rule T-LOAD types load instructions. The first premise ensures that the
contents of register r, the indexing register, are not leaked by the operation. In
particular, the loaded memory bank l must either be ORAM, or else the register r
may only contain public data (from RAM). In the latter case, there is no issue with
leaking r, and in the former case r will not be leaked indirectly by the address of
the loaded memory since it is stored in ORAM. The final two premises determine
the final trace pattern. When the memory bank l is D or E, then the trace pattern
indicates a read event from the appropriate block and address. When reading from
an ORAM bank the event is just that bank itself. The other premises in the rule
update Υ to map the loaded block k to the label of the memory bank, and update
Sym to track the address of the block.
We defer discussion of rule T-STORE for the moment, and look at the next
three rules, T-LOADW, T-STOREW, T-IDB, which are used to load and store
58
values related blocks in the scratchpad. The first two rules resemble standard infor-
mation flow rules. The second premise of T-LOADW is similar to the first premise
of T-LOAD in preventing an indirect leak of index register r2, which would occur if
the label of r2 was H but the label of k was L. Likewise, the premise of T-STOREW
prevents leaking the contents of r1 and r2 into the stored block, and also prevents
an implicit flow from ` (the security context). As such, these two rules ensure that
a block k with label ` never contains information from memory labeled `′ such that
`′ A `. The remaining premises of Rule T-LOADW flow-sensitively track the label
and symbolic value of the loaded register. In particular, they set the label of r1 to
be that of the block loaded, and the symbolic value of r1 to be the address of the
loaded value in memory. T-STOREW changes neither Υ nor Sym: even though the
content of the scratchpad has changed, its memory label and its address in memory
has not. Both rules emit trace pattern F as the operations are purely on-chip. We
emit this event to account for the time taken to execute an instruction; assuming
uniform times for instructions and memory accesses, MTO executions will also be
free of timing channels.
Returning to rule T-STORE, we can see that the store takes place unconditionally—
no constraints on the labels of the memory or block must be satisfied. This is because
the other type rules ensure that all blocks k never contain information higher than
their security label `, and thus the block can be written straight to memory having
the same security label. That said, information could be leaked through the memory
trace, so the emitted trace pattern will differ depending on the label of the block: If
the label is D or E then the trace pattern will be a write event, and otherwise it will
59
Branching
T-IF
I = ι1; It; ι2; If |It| = n1 − 2 |If |+ 1 = n2
ι1 = br r1 rop r2 ↪→ n1 ι2 = jmp n2
`′ = ` unionsqΥ(r1) unionsqΥ(r2)
`′ ` It : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T1
`′ ` If : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T2
`′ = H⇒

T1@F ≡ T2 ∧
` = L⇒`const Sym ∧
∀r.Υ′(r) = L⇒ `safe Sym′(r)

T = ite(`′ = H, F@T1@F, F@((T1@F) + T2))
` ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T
T-LOOP
I = Ic; ι1; Ib; ι2
|Ib| = n1 − 2 |Ic|+ n1 = 1− n2
ι1 = br r1 rop r2 ↪→ n1 ι2 = jmp n2
` unionsqΥ(r1) unionsqΥ(r2) v L
` ` Ic : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T1
` ` Ib : 〈Υ′, Sym′〉 → 〈Υ, Sym〉;T2
` ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉; loop(T1, T2)
Figure 3.8: Security Type System for LGhostRider (Part 2)
be the appropriate ORAM event. Leaks via the memory trace are then prevented
by T-IF and T-LOOP, discussed shortly.
Rule T-IDB is similar to rule T-LOADW. For the third premise, if l is either
D or E, the block k has a public address, and thus the value assigned to register r
is public; otherwise, when l is an ORAM bank, the register r is secret.
Rule T-BOP types binary operations, updating the security label of the target
register to be the join of labels of the source registers. Rule T-ASSIGN gives the tar-
get register label L as constants are not secret. Rules T-NOP is always safe and has
no effect on the symbolic store or label environment. All of these operations occur
on-chip, and so have pattern F. Finally, rule T-SEQ types instruction sequences by
composing the symbolic maps, label environments, and traces in the obvious way.
60
Branching Figure 3.8 presents rules T-IF and T-LOOP for structured control flow.
Rule T-IF deals with instruction sequences of the form of I = ι1; It; ι2; If , where
ι1 is a branching instruction deciding, ι2 is a jump instruction jumping over the
false branch, and It and If are the true and false branches respectively; the relative
offsets n1 and n2 are based on the length of these code sequences. We require both
branches to have the same type, i.e. 〈Υ, Sym〉 → 〈Υ′, Sym′〉, as for the sequence I
itself.
When the security context is high, i.e. ` = H, or when the if-condition is
private, i.e. Υ(r1) unionsq Υ(r2) = H, then `′ will be H and we impose three restrictions.
First, both of the blocks It and If must have equivalent trace patterns. (The trace
of the true branch is T1@F where T1 covers It and F covers the jump instruction
ι2.) Second, if the security context is public, i.e. ` = L, then we restrict `const Sym
to enforce Sym(r) does not map to memory values. The reason is that in a public
context, two equivalent symbolic memory values may refer to two different concrete
values, since the memory region D can be modified. Third, for any register r, its
value after taking either branch must be the same, or the register r must have a
high security label (i.e. Υ′(r) = H). So if Υ′(r) = L, the type system enforces that
its symbolic values on the two paths are equivalent, i.e. Sym′(r) ≡ Sym′(r), which
only requires `safe Sym′(r).
The final premise for rule T-IF states that the sequence’s trace pattern T is
either F@T1@F when both branches’ patterns must be equal, or else is an or-pattern
involving the trace T2 from the else branch.
Rule T-LOOP imposes structural requirements on I similar to T-IF. The
61
Subtyping
T-SUB
` ` ι : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T
Υ′  Υ′′ Sym′  Sym′′
` ` ι : 〈Υ, Sym〉 → 〈Υ′′, Sym′′〉;T
S-LABEL
∀r.Υ(r) v Υ′(r)
∀k.Υ(k) = Υ′(k)
Υ  Υ′
S-SYM
∀r.Sym′(r) = ? ∨ Sym(r) = Sym′(r)
∀k.Sym′(k) = ? ∨ Sym(k) = Sym′(k)
Sym  Sym′
Figure 3.9: Security Type System for LGhostRider (Part 3)
premise ` unionsq Υ(r1) unionsq Υ(r2) v L implies two restrictions. On the one hand, ` v L,
prevents any loop from appearing in a secret if-statement, because otherwise the
number of loop iterations may leak information about which branch is taken. On
the other hand, Υ(r1)unionsqΥ(r2) v L implies that the loop condition must be public, or
otherwise, similarly, the number of iterations would leak secret information about
r1 and/or r2.
Subtyping Finally, Figure 3.9 presents subtyping rules. Rule T-SUB supports
subtyping on the symbolic store and the label map. For the first, a symbolic store
Sym can approximated by a store Sym′ that either agrees on the symbolic values
mapped to be Sym or maps them to ?. For the second, a register’s security label
can be approximated by one higher in the lattice; block labels may not change.
Subtyping is important for typing join points after branches or loops. For example,
if a conditional assigned a register r the value 1 in the true branch but assigned r
to 2 in the false branch, we would use subtyping to map r to ? to ensure that the
62
symbolic store at the end of both branches agrees, as required by T-IF.
3.4.4 Security theorem
All well-typed programs are memory-trace oblivious:
Theorem 2. Given I, Υ, and Sym, if there exists some Υ′, Sym′ and T such
that L ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T , where ∀r.Sym(r) = ? and Υ(r) = L and
∀k.Sym(k) = ? and Υ(k) = D then program I is memory-trace oblivious.
The proof can be found in the Appendix B.
3.5 Compilation
We have developed a compiler from an imperative, C-like source language,
which we call LS, to LGhostRider. Our compiler is implemented in about 7600 lines of
Java, with roughly 400 LoC dedicated to the parser, 700 LoC to the type checker,
3500 LoC to the compiler/optimizer, 950 LoC to the code generator, and the remain-
der to utility functions. This section informally describes our compilation approach.
3.5.1 Source Language
Syntax An LS program is a collection of (possibly mutually recursive) functions and
a collection of (possibly mutually recursive) type definitions. A type definition is
simply a mapping of a type name to a type where types are either natural numbers,
arrays, or pointers to records (i.e., C-style structs). Each type is annotated with a
63
security label which is either secret or public indicating whether the data should
be visible/inferrable by the adversary or not.
A function consists of a sequence of statements s which are either no-ops,
variable assignments, array assignments, conditionals, while loops, or returns. As
usual, conditional branches and loop bodies may consist of one or more statements.
Expressions e appearing in statements (e.g., in assignments) consist of variables x,
arithmetic ops e1 aop e2, array reads e[e], and numeric constants n. Variables may
hold any data other than functions (i.e., there are no function pointers). Guards in
conditionals and while loops consist of predicates involving relational operators.
Typing LS programs are type checked before they are compiled. We do this using
an information flow-style type system (cf. the survey of Sabelfeld and Myers [79]).
As is standard, the type system prevents explicit flows and implicit flows. In par-
ticular, it disallows assignments like p = s where p is a public variable and s is
a secret variable, and disallows conditionals like if (s == 0) then p = 0 else
p = 1, which leaks information about s since after the conditional the adversary
knows p == 0 implies s == 0. It also disallows array writes like p[s] = 5 since the
adversary can learn the value of s by seeing which element of the public array has
changed. Note that accessing s[p] is safe because, despite knowing the index, an
adversary cannot learn the value being accessed.
To prevent the length of a memory trace from revealing information, we re-
quire that loop guard expressions only involve public values (which is a standard
restriction [79]). One can work around this problem by “padding out” loop iter-
ations, e.g., by converting a loop like while (slen > 0) { sarr[slen--]++; }
64
to be plen = N; while (plen > 0) { if (plen <= slen) sarr[--plen]++; }
where N is a large, fixed constant. For similar reasons we also require that whether
a function is called or returned from, and which function it is, may not depend on
secret information (e.g., the call or return may not occur in a conditional whose
guard involves secret information).
Compilation overview After source-language type checking, compilation proceeds
in four stages—memory layout, translation, padding, and register allocation—after
which the result is type checked using the LGhostRider type system, to confirm that
it is memory-trace oblivious.3
3.5.2 Memory bank allocation
The first stage of compilation allocates global variables to memory banks.
Public variables are always stored in RAM, while secret variables will be allocated
either to ERAM or ORAM. Two blocks in the scratchpad are reserved for secret
and public variables, respectively, that will fit entirely within the block; these are
essentially those that contain numbers, (pointers to) records, and small arrays. Such
variables will be loaded into the scratchpad at the start of executing a program, and
written back to memory at the end. The remaining scratchpad blocks are used for
handling (large) arrays; the compiler will always use the same block for the same
array. Public arrays are allocated in RAM, and secret arrays always indexed by
public values are allocated in ERAM, and ORAM otherwise. The compiler initially
3This is essentially a kind of translation validation [73], which removes the compiler from the
trusted computing base. We believe that well typed LS programs yield well typed LGhostRider
programs, but leave a proof as future work.
65
assigns a distinct logical ORAM bank for each secret array, and allocates logical
banks up to the hardware limit.
3.5.3 Basic compilation
The next stage is basic compilation (translation). Expressions are compiled
by loading relevant variables/data into registers, performing the computation, and
then storing back the result. Statements are compiled idiomatically to match the
structure expected by the type rules in Figure 3.7, Figure 3.8,and Figure 3.9 (with
some work deferred to the padding stage).
Perhaps the most interesting part is handling variable accesses. Variables
permanently resident in the scratchpad are loaded at the start of the program, and
stored back at the end. Each read/write results in a ldw, to load a variable into a
temporary register, and a stw to store back the result. Accesses to data (i.e., arrays)
not permanently stored in the scratchpad will also require a ldb to load the relevant
block into the scratchpad first and likewise a stb to store it back. A standard
software cache, rather than a scratchpad, could eliminate repeated loads and stores
of blocks from memory but could violate MTO. This is because a non-present block
will induce memory traffic while a present block will not, and the presence/absence
of traffic could be correlated with secret information. To avoid this, we have the
compiler emit instructions that perform caching explicitly, using the scratchpad,
with caching only enabled when in a public context, i.e., in a portion of code whose
control flow does not depend on secret data. To support software-based caching,
66
the compiler statically maps memory-resident data to particular scratchpad blocks,
always loading the same data to the same block. Prior to doing so, and when safe,
the compiler uses the idb instruction to check whether the relevant scratchpad block
contains the memory block we want and loads directly from it, if so.
Supporting functions requires handling calling contexts and local variables.
We do this with two stacks, one in RAM and one in ERAM. Function calls are
only permitted in a public context, which means that normal stack allocation and
deallocation reveal no information, so no ORAM stack is needed. When a function
is called, the current scratchpad variable blocks are pushed on the relevant stacks.
At the start of a function, we load the blocks that hold the local variables. Local
variables implementing ORAM arrays are stored by reference, with the variable
pointing to the actual array stored in ORAM. This array is deallocated when its
variable is popped from the stack, when the function returns (which like calls are
allowed only in a public context).
The compiler is also responsible for emitting instructions that load code into
the instruction scratchpad, as implicit instruction fetches could reveal information.
(To bootstrap, the first code block is loaded automatically.) At the moment, our
compiler emits code that loads the entire program into the scratchpad at the start;
we leave to future work support for on-the-fly instruction scratchpad use.
67
3.5.4 Padding and register allocation
Both branches of a secret conditional must produce the same trace. We ensure
they do so by inserting extra instructions in one or both branches according to the
solution to the shortest common supersequence problem [31]. When matching the
two branches, we must account for the memory trace and instruction execution
times. Only ldb and stb emit memory events; we discuss these shortly. While
our formalism assumes each instruction takes unit time, the reality is different (cf.
Table 3.2): times are deterministic, but non-uniform. For single-cycle operations
(e.g., 64b ALU ops), we pad with nops. For two-cycle ldw and stw instructions, we
pad with two nops. For multiply and divide instructions, which take 70 cycles each,
we could pad with 70 nops but this results in a large space overhead. As such, we
pad both with the instruction r0← r0 ∗ r0, where r0 is always 0. For conditionals,
we pad the not-taken branch with two nops, to account for the hardware-induced
delay on the taken branch.
Padding for stb and ldb requires instructions that generate matching trace
events. An access to ORAM is the simplest to pad, since the adversary cannot
distinguish a read from a write. We can load any block (e.g., the first block of the
ORAM) into a dedicated “dummy” scratchpad block, i.e. this block is used for
loading and saving dummy memory blocks only.
For RAM and ERAM, the address being accessed is visible, so we need to make
sure that the equivalent padding accesses the same address. To do this, the compiler
should insert further instructions to compute the address. These instructions can
68
be computed using the symbolic value: (1) if the symbolic value is a constant,
then insert an assign instruction; (2) if the symbolic value is a binary operation of
two symbolic values, then insert instructions to compute the two symbolic values
respectively, and then another instruction to compute the binary operation; and (3)
if the symbolic value is a memory value, then insert instructions to compute the
offset first, and then insert a ldw instruction.
With instructions inserted to compute the address, we must emit either a load
or a store depending on the instruction we are trying to match. For RAM, this
instruction will always be a load because we perform padding in the H context, and
the type system prevents writing to RAM. To mimic the read(l, k, sv) trace pattern,
we first compute sv and then insert a ldb k ← l[r] instruction where r stores the
value for sv. To handle ERAM writes is challenging because we want the write to
be a no-op but not appear to be so. To do this, we require the compiler to always
follow an ERAM ldb with a stb back to the same address. In doing so, the compiler
also prevents the padded instruction from overwriting a dirty scratchpad block.
At the conclusion of the padding stage we perform standard register allocation
to fill in actual registers for the temporaries we have used to this point.
3.6 Hardware Implementation
We implement our deterministic processor by modifying Rocket, a single-issue,
in-order, 6-stage pipelined CPU developed at UC Berkeley [77]. Rocket implements
the RISC-V instruction set [93] and is comparable to an ARM Cortex A5 CPU.
69
We modified the baseline processor to remove branch prediction logic (so that con-
ditional branches are always not-taken) and to make each instruction execute in a
fixed number of cycles. We describe the remaining changes below.
Instruction-set Extension. We customize RISC-V to add a single data transfer
instruction that implements ldb and stb from the formalism. We do this using a
Data Transfer accelerator (Figure 3.2) that attaches to the processor’s accelerator
interface [88]. We also interface the Data Transfer accelerator with the x86-Linux
host through Rocket’s control register file so that it can load an elf-formatted
binary into GhostRider’s memory and reset its processor. Once this is done, the
host performs processor control register writes to initiate transfers from the co-
processor memory to the code ORAM for the code and data sections of the binary.
The first code block of a program is loaded into the instruction scratchpad to begin
execution; if subsequent instruction blocks are needed they must be loaded explicitly.
Scratchpads. GhostRider has two scratchpads, one for code and one for data,
each of which can hold eight 4KB blocks. The instruction scratchpad is imple-
mented similar to an 8-way set-associative cache, where each way contains one block.
The accelerator transfers one block at a time to a specified way in the instruction
scratchpad. Once a block has been written, the valid and tag bits for that block
are updated. The architecture does not implement the idb instruction from the
formalism; instead, the compiler uses the first 8 bytes of every block to remember
its address.
ORAM controller. We implement ORAM by building on the Phantom ORAM
controller [63] and implement an ORAM tree 13 levels deep (i.e., 212 leaf buckets),
70
with 4 blocks per bucket and an effective capacity of 64MB. ORAM controllers in-
clude an on-chip stash to temporarily buffer ORAM blocks before they are written
out to memory. We set this stash to be 128 blocks. The Phantom design (and like-
wise, Ascend’s [27–29]) treats the stash as a cache for ORAM lookups, which is safe
when handling timing channels by controlling the memory access rate. GhostRider
mitigates timing channels by having the compiler enforce MTO while assuming that
events take the same time. As such, we modify Phantom’s design to generate an
access to a random leaf in case the requested block is found in the stash, to ensure
uniform access times.
FPGA Implementation. GhostRider is implemented on one of Convey HC-
2ex’s [21] four Xilinx Virtex-6 LX760 FPGAs. We measure hardware design size
in terms of FPGA slices for logic and Block RAMs for on-chip memory. A slice com-
prises four 6-input, 2-output lookup tables (implementing configurable logic) and
eight flip-flops (as storage elements) in addition to multiplexers, while each BRAM
on Virtex-6 is either an 18Kb or 36Kb SRAM with up to two configurable read-
write ports. The GhostRider prototype uses 47,357 such slices (39% of total) to
implement both the CPU and the ORAM controller, and requires 685 of 1440 18Kb
BRAMs (47.5%). Table 3.1 shows how these resources are broken up between the
Rocket CPU and the ORAM controller, with the remaining resources being used by
Convey HC-2ex’s boilerplate logic to interface with the x86 core and DRAM. Note
that this breakdown is a synthesis estimate before place and route.
Our prototype currently supports one data ORAM bank, one code ORAM
bank, and one ERAM bank. We do not implement encryption (it is a small, fixed
71
cost and uninteresting in terms of performance trends), and do not have separate
DRAM; all public data is stored in ERAM when running on the hardware.
The Convey machine requires the hardware design to be run at 150 MHz while
our ORAM controller prototype currently synthesizes to a maximum operating fre-
quecy of 140MHz. Pending further optimization to meet 150 MHz timing, we run
both the CPU and the ORAM controller in a 75 MHz clock domain, and use asyn-
chronous FIFOs to connect the ORAM controller to the DDR DRAM controllers.
GhostRider simulator timing model. In addition to demonstrating feasibility
with our hardware prototype, we study the effect of GhostRider ’s compiler on al-
ternate, more efficient ORAM configurations, e.g., Phantom at 150MHz [63] with
two ORAM banks and a distinct (non-encrypting) DRAM bank. Hence we gen-
erate a timing model for both the modified processor and ORAM banks based on
Phantom’s hardware implementation [63], and incorporate the timing model into an
ISA-level emulator for the RISC-V architecture; the model is shown in Table 3.2.
Slices BRAMs
Rocket 9287 (8.8%) 36 (10.5%)
ORAM 12845 (12.2%) 211 (61.5%)
Table 3.1: FPGA synthesis results on Convey HC-2ex.
3.7 Empirical Evaluation
Programs. Table 3.3 lists all the programs we use in our evaluation. These pro-
grams range from standard algorithms to data structures and include predictable,
72
Feature Latency (# cycles)
64b ALU 1
Jump taken/not taken 3/1
64b Multiply/Divide 70/70
Load/Store from Scratchpad 2
DRAM (4kB access) 634
Encrypted RAM (4kB access) 662
ORAM 13 levels (4kB block) 4262
Table 3.2: Timing model for GhostRider simulator.
Figure 3.10: Simulator-based execution time results of GhostRider.
partially predictable, and predominantly irregular (data-driven) memory access pat-
terns.
Execution time results. We present measurements both for the simulator and for
the actual FPGA hardware, starting with the former because the simulator allows
us to evaluate the benefits from splitting memory into ERAM and ORAM banks
73
Name Brief Description Input Size (KB)
sum Summing up all positive elements in an array 103
findmax Find the max element in an array 103
heappush insert an element into a min-heap 103
perm computing a permutation executing a[b[i]] = i for all i 103
histogram compute the number of occurances of each last digit 103
dijkstra Single-source shortest path 103
search binary search algorithm 1.7× 104
heappop pop the minimal element from a min-heap 1.7× 104
Table 3.3: Benchmark programs for GhostRider organized into programs with pre-
dictable, partially predictable, and data dependent memory access patterns (in order
from top).
Non-secure
Non-secure program: all variables in ERAM, no padding,
and uses scratchpad.
Baseline
Secure baseline: all secret variables in a single ORAM,
no scratchpad.
Split ORAM
Variables can be split across multiple ORAM banks, or
placed in ERAM. Performs padding. No scratchpad.
Final Scratchpad on top of Split ORAM.
Figure 3.11: Legends of Figure 3.10
v. additionally using a scratchpad. We also discuss the execution time results by
categorizing them based on the regularity in the programs’ access patterns.
Simulator-based results. Figure 3.10 depicts the slowdown of various configura-
tions relative to a non-secure configuration that simply stores data in ERAM and
employs the scratchpad. The legends are explained in Figure 3.11. Our non-secure
baseline uses a scratchpad instead of a hardware cache in order to isolate the cost
of MTO/ORAM. The secure Baseline configuration places all secret variables in
a single ORAM, while Split ORAM employs the GhostRider optimization of using
74
ERAM and multiple ORAM banks, and Final further adds the (secure) use of a
scratchpad.
Three out of eight programs—sum, findmax, and heap- push—have a pre-
dictable access pattern and the secure program generated by GhostRider relies
mainly on ERAM. Hence, each MTO program (Final) has almost no slowdown
to 3.08× slowdown in comparison its non-secure counterpart (Non-secure), and
correspondingly faster than Baseline by 5.85× to 9.03×.
For perm, histogram, and dijkstra, which have partially predictable and
partially sensitive memory access patterns, our compiler attempts to place sensitive
arrays inside both ERAM and ORAM and also favors splitting into several smaller
ORAM banks without breaking MTO. As shown in Figure 3.10, for such programs,
Final can achieve a 1.30× to 1.85× speedup over Baseline (with 7.56× to 10.68×
slowdown compared to Non-secure, respectively).
For search and heappop, which have predominantly sensitive memory access
patterns, the speedup of Final over Baseline is not as significant, i.e. 1.07× and
1.12× respectively, and is due mostly to the usage of two ORAMs to store arrays
instead of a single ORAM.
Examining the impact of the use of the scratchpad in the results, we can see
that for the first six programs, Final reduces execution time compared to Split
ORAM by a factor from 1.05× up to 2.23×.For search and heappop, the scratchpad
provides no benefit because for these programs all data is allocated in ORAM, as
array indices are secret (so the access pattern is sensitive), and our type system
disallows caching of ORAM blocks. The reason is that the presence of the data in
75
Figure 3.12: FPGA based execution time results: Slowdown of Baseline and Final
versions compared to non-secure version of the program. Note that unlike Fig-
ure 3.10, Final uses only a single ORAM bank and conflates ERAM and DRAM
(cf. Section 3.6).
the cache could reveal something about the secret indices. A more sophisticated
type system, or a relaxation of MTO, could do better; we plan to explore such
improvements in future work.
FPGA-based results. For the FPGA we run the same set of programs as in
Table 3.3, but restrict the input size to be around 100 KB, due to limitations of our
prototype. Speedups of Final over the secure Baseline follow a trend similar to the
simulator, as shown in Figure 3.12. Regular programs have speedups in the range of
4.33× (for heappush) to 8.94× (for findmax). Partially regular programs like perm
and histogram get a speedup of 1.46× and 1.3× respectively. Finally, irregular
programs such as search and heappop see very little improvements (1.08× and
1.02× respectively).
76
Differences between the simulator and hardware numbers can be attributed to
mulitple factors. First, the simulator imperfectly models the Convey memory sys-
tem’s latency, always assuming the worst case, and thus slowdowns compared to the
non-secure baseline are often worse on the simulator (cf. heappop and heappush).
Second, the timing of certain hardware operations is different on the prototype
and the simulator (where we consider the latter to be aspirational, per the end of
Section 3.6). In particular, per Table 3.2, the simulator models access latency for
ORAM as 4262 cycles and ERAM as 662 cycles, accounting for both reading data
blocks from DRAM and moving the chosen 4KB block into the scratchpad BRAMs
on the FPGA. On the hardware, ORAM and ERAM latencies are 5991 and 1312
cycles, respectively, measured using performance counters in the hardware design.
The higher ERAM and ORAM access times reduce the slowdown on the simulator
by amplifying the benefit of the scratchpad, which is used by the non-secure baseline,
but not by the secure baseline (cf. findmax and sum).
Third, the benefit of using the scratchpad can differ depending on the input
size. This effect is particularly pronounced for Dijkstra, where the ratio of secure
to non-secure baseline execution is smaller for the hardware than for the simulator.
The reason is that the hardware experiment uses a smaller input that fills only about
1/5 of a scratchpad block. Hence, in the non-secure baseline, the block is reloaded
after relatively fewer accesses, resulting in a relatively greater number of block loads
and thus bringing the performance of the non-secure program closer to that of the
secure baseline.
Finally, note that the simulator’s use of multiple ORAM banks, and DRAM
77
with different timings, is a source of differences, but this effect is dwarfed by the
other effects.
3.8 Conclusion
We have presented the first complete memory trace oblivious system—GhostRider—
comprising of a novel compiler, type system, and hardware architecture. The com-
piled programs not only provably satisfy memory trace obliviousness, but also ex-
hibit up to nearly order-of-magnitude performance gains in comparison with placing
all variables in a single ORAM bank. By enabling compiler analyses to target a joint
ERAM-ORAM memory system, and by employing a compiler-controlled scratchpad,
this work opens up several performance optimization opportunities in tuning bank
configurations (size and access granularity) and, on a broader level, into co-designing
data structures and algorithms for a heterogeneous yet oblivious memory hierarchy.
78
Chapter 4: RAM-model Secure Computation
Secure computation is a cryptographic technique allowing mutually distrusting
parties to make collaborative use of their local data without harming privacy of
their individual inputs. Since the first system for general-purpose secure two-party
computation was built in 2004 [65], efficiency has improved substantially [11,46].
Almost all previous implementations of general-purpose secure computation
assume the underlying computation is represented as a circuit. While theoreti-
cal developments using circuits are sensible (and common), typical programs are
mostly expressed in von Neumann-style Random Access Machine (RAM) model.
Compiling a RAM-model program into its efficient circuit-based representation can
be challenging, especially when handling dynamic memory accesses to an array in
which the memory location being read/written depends on secret inputs. Existing
program-to-circuit compiler typically makes an entire copy of the array upon every
dynamic memory access, thus resulting in a huge circuit when the data size is large.
To address this limitations, recent research [38] shows that secure computa-
tion ORAM can be used to compile a dynamic memory access into a circuit with
poly-logarithmic size while preventing information leakage through memory-access
patterns. We refer such an approach as a RAM-model secure computation (RAM-
79
SC) approach, and Gordon et al. [38] observed that RAM-SC exhibits a significant
advantage in the setting of repeated sublinear-time queries (e.g., binary search) on
a large database, where an initial setup cost can be amortized over subsequent
repeated queries.
Our Contributions. We continue work on secure computation in the RAM model,
with the goal of providing a complete system that takes a program written in a high-
level language and compiles it to a protocol for secure two-party computation of that
program.1 In particular, we
• Define an intermediate representation (which we call SCVM) suitable for effi-
cient two-party RAM-model secure computation;
• Develop a type system ensuring that any well-typed program will generate
a RAM-SC protocol secure in the semi-honest model, if all subroutines are
implemented with a protocol secure in the semi-honest model.
• Build an automated compiler that transforms programs written in a high-level
language into a secure two-party computation protocol, and integrate compile-
time optimizations crucial for improving performance.
We use our compiler to compile several programs including Dijkstra’s shortest-
path algorithm, KMP string matching, binary search, and more. For moderate data
sizes (up to the order of a million elements), our evaluation shows a speedup of 1–2
orders of magnitude as compared to standard circuit-based approaches for securely
1Note that Gordon et al. [38] do not provide such a compiler; they only implement RAM-model
secure computation for the particular case of binary search.
80
computing these programs. We expect the speedup to be even greater for larger
data sizes.
SCVM is our first attempt to demonstrate the feasibility to optimize secure
computation in the RAM-model using ORAMs. In the next Chapter, we extend
SCVM to build ObliVM which focuses more on richer expressiveness power and
easy-programmability while achieving the state-of-the-art performance for secure
computation.
This chapter is based on a paper that I co-authored with Michael Hicks, Yan
Huang, Jonathan Katz, and Elaine Shi [59]. I developed the formalism and the
proof under the help of Michael Hicks, Elaine Shi and Jonathan Katz. I developed
the compiler, which implements the optimization and the type checker, and emits
code that is runnable over a secure computation backend. I conducted experiments
to show the compiler’s effectiveness with the help of Yan Huang.
4.1 Technical Highlights
As explained in Sections 4.2 and 4.3, the standard implementation of RAM-SC
entails placing all data and instructions inside a single Oblivious RAM. The secure
evaluation of one instruction then requires i) fetching instruction and data from
ORAM; and ii) securely executing the instruction using a universal next-instruction
circuit (similar to a machine’s ALU). This approach is costly since each step must
be done using a secure-computation sub-protocol.
An efficient representation for RAM-SC. Our type system and SCVM inter-
81
mediate representation are capable of expressing RAM-SC tasks more efficiently by
avoiding expensive next-instruction circuits and minimizing ORAM operations when
there is no risk to security. These language-level capabilities allow our compiler to
apply compile-time optimizations that would otherwise not be possible. Thus, we
not only obtain better efficiency than circuit-based approaches, but we also achieve
order-of-magnitude performance improvements in comparison with straightforward
implementations of RAM-SC (see Section 4.2).
Program-trace simulatability. A well-typed program in our language is guaran-
teed to be both instruction-trace oblivious and memory-trace oblivious. Instruction-
trace obliviousness ensures that the values of the program counter during execution
of the protocol do not leak information about secret inputs other than what is
revealed by the output of the program. As such, the parties can avoid securely eval-
uating a universal next-instruction circuit, but can instead simply evaluate a circuit
corresponding to the current instruction. Memory-trace obliviousness ensures that
memory accesses observed by one party during the protocol’s execution similarly do
not leak information about secret inputs other than what is revealed by the output.
In particular, if access to some array does not depend on secret information (e.g., it
is part of a linear scan of the array), then the array need not be placed into ORAM.
We formally define the security property ensured by our type system as program-
trace simulatability. We define a mechanism for compiling programs to protocols that
rely on certain ideal functionalities. We prove that if every such ideal functionality is
instantiated with a semi-honest secure protocol computing that functionality, then
any well-typed program compiles to a semi-honest secure protocol computing that
82
program.
Additional language features. SCVM supports several other useful features.
First, it permits reactive computations by allowing output not only at the end of
the program’s execution, but also while it is in progress. Our notation of program-
trace simulatability also fits this reactive model of computation.
SCVM also integrates state-of-the-art optimization techniques that have been
suggested previously in the literature. For example, we support public, local, and
secure modes of computation, a technique recently explored (in the circuit model)
by Kerschbaum [52] and Rastogi et al. [76] Our compiler can identify and encode
portions of computation that can be safely performed in the clear or locally by one
of the parties, without incurring the cost of a secure-computation sub-protocol.
Our SCVM intermediate representation generalizes circuit-model approaches.
For programs that do not rely on ORAM, our compiler effectively generates an effi-
cient circuit-model secure-computation protocol. This paper focuses on the design
of the intermediate representation language and type system for RAM-model secure
computation, as well as the compile-time optimization techniques we apply. Our
work is complementary to several independent, ongoing efforts focused on improv-
ing the cryptographic back end.
4.2 Background: RAM-Model Secure Computation
In this section, we review some background for RAM-model secure compu-
tation. Our treatment is adapted from that of Gordon et al. [38], with notation
83
adjusted for our purposes.
A key underlying building block of RAM-model secure computation is Oblivi-
ous RAM (ORAM). ORAM is a cryptographic primitive that hides memory-access
patterns by randomly reshuffling data in memory. With ORAM, each memory read
or write operation incurs poly log n actual memory accesses.
Existing RAM-model secure computation, which we refer as straightforward
RAM-SC, employs the following scheme. The entire memory denoted mem, contain-
ing both program instructions and data, is placed in ORAM, and the ORAM is
secret-shared between the two participating parties as discussed above, e.g., using
a simple XOR-based secret-sharing scheme. With ORAM, a memory access thus
requires each party to access the elements of their respective arrays at pseudoran-
dom locations (the addresses are dictated by the ORAM algorithm), and the value
stored at each position is then obtained by XORing the values read by each of the
parties. Alternatively, the server can hold an encryption of the ORAM array, and
the client holds the key. The latter was done by Gordon et al. to ensure that one
party holds only O(1) state. All CPU states are also secret-shared between the two
parties.
Straightforward RAM-SC proceeds as follows. Each step of the computation
must be done using some secure computation subprotocol. In particular, SC-U is a
secure computation protocol that securely evaluates the universal next instruction
circuit, and SC-ORAM is a secure computation protocol that securely evaluates the
ORAM algorithm. For ORAM.Read, each party supplies a secret share of the raddr,
and during the course of the protocol, the ORAM.Read protocol will emit obfuscated
84
Scenario Potential benefits of RAM-model se-
cure computation
1 Repeated sublinear queries over a
large dataset (e.g., binary search,
range query, shortest path query)
• Amortize preprocessing cost over mul-
tiple queries
• Achieve sublinear amortized cost per
query
2 One-time computation over a large
dataset
Avoid paying O(n) cost per dynamic memory
access
Table 4.1: Two main scenarios and advantages of RAM-model secure computation
addresses for each party to read from. At the end of the protocol, each party obtains
a share of the fetched data. For ORAM.Write, each party supplies a secret share of
waddr and wdata, and during the course of the protocol, the ORAM.Read protocol
will emit obfuscated addresses for each party to write to, and secret shares of values
to write to those addresses.
Deployment scenarios and threat model for RAM-model secure compu-
tation. SCVM presently supports a two-party semi-honest protocol. We consider
the following primary deployment scenarios:
1. Two parties, Alice and Bob, each comes with their own private data, and
engage in a two-party protocol. For example, Goldman Sachs and Bridgewater
would like to perform joint computation over their private market research
data to learn market trends.
2. One or more users break their private data (e.g., genomic data) into secret
shares, and split the shares among two non-colluding cloud providers. The
shares at each cloud provider are completely random and reveal no informa-
tion. To perform computation over the secret-shared data, the two cloud
85
providers engage in a secure 2-party computation protocol.
3. Similar as the above, but the two servers are within the same cloud or under
the same administration. This can serve to mitigate Advanced Persistent
Threats or insider threats, since compromise of a single machine will no longer
lead to the breach of private data. Similar architectures have been explored
in commercial products such as RSA’s distributed credential protection [3].
In the first scenario, Alice and Bob should not learn anything about each
other’s data besides the outcome of the computation. In the second and third
scenarios, the two servers should learn nothing about the users’ data other than the
outcome of the computation – note that the outcome of the computation can also
be easily hidden simply by XORing the outcome with a secret random mask (like a
one-time pad). We assume that the program text (i.e., code) is public.
With respect to the types of applications, while Gordon et al. describe RAM-
model secure computation mainly for the amortized setting, where repeated com-
putations are carried out starting from a single initial dataset, we note that RAM-
model secure computation can also be meaningful for one-time computation on large
datasets, since a straightforward RAM-to-circuit compiler would incur linear (in the
size of dataset) overhead for every dynamic memory access whose address depends
on sensitive inputs. Table 4.1 summarizes the two main scenarios for RAM-model
secure computation, and potential advantages of using the RAM model in these
cases.
86
4.3 Technical Overview: Compiling for RAM-Model Secure Compu-
tation
This section describes our approach to optimize RAM-model secure computa-
tion. Our key idea is use static analysis during compilation to minimize the use of
heavyweight cryptographic primitives such as garbled circuits and ORAM.
4.3.1 Instruction-Trace Obliviousness
The standard RAM-model secure computation protocol described in Section 4.2
is relatively inefficient because it requires a secure-computation sub-protocol to com-
pute the universal next-instruction circuit U . This circuit has large size, since it
must interpret every possible instruction. In our solution, we will avoid relying on
a universal next-instruction circuit, and will instead arrange things so that we can
securely evaluate instruction-specific circuits.
Note that it is not secure, in general, to reveal what instruction is being carried
out at each step in the execution of some program. As a simple example, consider
a branch over a secret value s:
if(s) x[i]:=a+b; else x[i]:=a-b
Depending on the value of s, a different instruction (i.e., add or subtract)
will be executed. To mitigate such an implicit information leak, our compiler trans-
forms a program to an instruction-trace oblivious counterpart, i.e., a program whose
program-counter value (which determines which instruction will be executed next)
87
does not depend on secret information. The key idea there is to use a mux operation
to rewrite a secret if-statement. For example, the above code can be re-factored to
the following:
t1 := s;
t2 := a+b;
t3 := a-b;
t4 := mux(t1, t2, t3);
x[i] := t4
At every point during the above computation, the instruction being executed is
pre-determined, and so does not leak information about sensitive data. Instruction-
trace obliviousness is similar to program-counter security proposed by Molnar et
al. [67] (for a different application).
4.3.2 Memory-Trace Obliviousness
Using ORAM for memory accesses is also a heavyweight operation in RAM-
model secure computation. The standard approach is to place all memory in a
single ORAM, thus incurring O(poly log n) cost per data operation, where n is a
bound on the size of the memory.
We have demonstrated in the context of securing remote execution against
physical attacks (Chapter 2,3) that not all access patterns of a program are sensitive.
For example, a findmax program that sequentially scans through an array to find the
maximum element has predictable access patterns that do not depend on sensitive
88
inputs. We propose to apply a similar idea to the context of RAM-model secure
computation. Our compiler performs static analysis to detect safe memory accesses
that do not depend on secret inputs. In this way, we can avoid using ORAM
when the access pattern is independent of sensitive inputs. It is also possible to
store various subsets of memory (e.g., different arrays) in different ORAMs, when
information about which portion of memory (e.g., which array) is being accessed
does not depend on sensitive information.
4.3.3 Mixed-Mode Execution
We also use static analysis to partition a program into code blocks, and then
for each code block use either a public, local, or secure mode of execution (described
next). Computation in public or local modes avoids heavyweight secure compu-
tation. In the intermediate language, each statement is labeled with its mode of
execution.
Public mode. Statements computing on publicly-known variables or variables that
have been declassified in the middle of program execution can be performed by both
parties independently, without having to resort to a secure-computation protocol.
Such statements are labeled P. For example, the loop iterators (in lines 1, 3, 10)
in Dijkstra’s algorithm (see Figure 4.2) do not depend on secret data, and so each
party can independently compute them.
Local mode. For statements computing over Alice’s variables, public variables, or
previously declassified variables, Alice can perform the computation independently
89
1 for(i = 0; i < n; ++i) {
2 int bestj = -1; bestdis = -1;
3 for(int j=0; j<n; ++j) {
4 if( ! vis[j] && (bestj < 0
5 || dis[j] < bestdis))
6 bestj = j;
7 bestdis = dis[j];
8 }
9 vis[bestj] = 1;
10 for(int j=0; j<n; ++j) {
11 if( !vis[j] && (bestdis +
12 e[bestj][j] < dis[j]))
13 dis[j] = bestdis + e[bestj][j];
14 }
15 }
Figure 4.1: Dijkstra’s shortest distance algorithm in source (Part)
without interacting with Bob (and vice versa). Here we crucially rely on the fact
that we assume semi-honest behavior. Alice-local statements are labeled A, and
Bob-local statements are labeled B.
Secure mode. All other statements that depend on variables that must be kept
secret from both Alice and Bob will be computed using secure computation, making
ORAM accesses along the way if necessary. Such statements are labeled O (for
“oblivious”).
4.3.4 Example: Dijkstra’s Algorithm
In Figure 4.2, we present a complete compilation example for part of Dijkstra’s
algorithm in Figure 4.1. Here one party, Alice, has a private graph represented
by a pairwise edge-weight array e[][] and the other party, Bob, has a private
source/destination pair. Bob wishes to compute the shortest path between his source
90
O: orame :=oram( e );
P: i :=0; P: cond1 := i < n ;
P:while( cond1 ) do
O:bestj:=-1; O:bestdis:=-1;
P: j :=0; P: cond2 := j < n ;
P:while( cond2 ) do
O:t1:= vis [ j ]; O:t2:=!t1; O:t3:=best<0;
O:t4:= dis [ j ]; O:t5:=t4<bestdis;
O:t6:=t3||t5; O:cond3:=t2 && t6;
O:best :=mux(cond3, j , best);
O:bestdis:=mux(cond3, t4, bestdis);
P: j := j +1; P: cond2 := j < n ;;
O: vis [ bestj ]:=1;
P: j :=0; P: cond2 := j < n ;
P:while(cond2) do
O:t7:= vis [ j ]; O:t8:=!t7;
O:idx:=bestj*n; O:idx:=idx+ j ; O:t9:= orame [idx];
O:t10:=bestdis + t9; O:t11:= dis [ j ];
O:t12:=t10 < t11; O:cond4:=t8 && t12
O:t13:=mux(cond4, t10, t11); O: dis [ j ]:=t13;
P: j := j +1; P: cond2 := j < n ;
Figure 4.2: Compilation example: Part of Dijkstra’s shortest-path algo-
rithm. The code on the left is compiled to the annotated code on the right. Array
variable e is Alice’s local input array containing the graph’s edge weights; Bob’s
input, a source/destination pair, is not used in this part of the algorithm. Array
variables vis and orame are placed in ORAMs. Array variable dis is placed
in non-oblivious (but secret-shared) memory. (Prior to the shown code, vis is ini-
tialized to all zeroes except that vis[source]—where source is Bob’s input—is
initialized to 1, and dis[i] is initialized to e[source][i].) Variables n , i , j
and others boxed in white background are public variables. All other variables are
secret-shared between the two parties.
91
and destination in Alice’s graph. The figure shows the code that computes shortest
paths (Bob’s inputs are elided).
Our specific implementation of Dijkstra’s algorithm uses three arrays, a dis
array which keeps track of the current shortest distance from the source to any other
node; an edge-weight array orame which is initialized by Alice’s local array e, and
an indicator array vis, denoting whether each node has been visited. In this case,
our compiler places arrays vis and e in separate ORAMs, but does not place array
dis in ORAM since access to dis always follows a sequential pattern.
Note that parts of the algorithm can be computed publicly. For example, all
the loop iterators are public values; therefore, loop iterators need not be secret-
shared, and each party can independently compute the current loop iteration. The
remaining parts of program all require ORAM accesses; therefore, our compiler
annotates these instructions to be run in secure mode, and generates equivalent
instruction- and memory-trace oblivious target code.
4.4 SCVM Language
This section presents SCVM, our language for RAM-model secure computation,
and presents our formal results.
In Section 4.4.1, we present SCVM’s formal syntax. In Section 4.4.2, we give
a formal, ideal world semantics for SCVM that forms the basis of our security the-
orem. Informally, each party provides their inputs to an ideal functionality F that
computes the result and returns to each party its result and a trace of events it is
92
Type checking	

Γ, pc S 
S is	

 Γ-progressive	

Thm 1	

Thm 2	

S is 	

Γ-simulatable	
 πG implements F Thm 3	
 [Canetti ‘00]	

If ρ implements G,    
 then πρ implements F 
πG:  hybrid-world protocol	

πρ :  real-world protocol	

F :  ideal-world functionality	

Figure 4.3: Formal results in SCVM.
allowed to see; these events include instruction fetches, memory accesses, and declas-
sification events, which are results computed from both parties’ data. Section 4.4.3
formally defines our security property, Γ-simulatability. Informally, a program is
secure if each party, starting with its own inputs, memory, the program code, and
its trace of declassification events, can simulate (in polynomial time) its observed
instruction traces and memory traces without knowing the other party’s data. We
present a type system for SCVM programs in Section 4.4.4, and in Theorem 3 prove
that well-typed programs are Γ-simulatable. Theorem 4 additionally shows that
well-typed programs will not get stuck, e.g., because one party tries to access mem-
ory unavailable to it. Finally, in Section 4.4.5 we define a hybrid world functionality
that more closely models SCVM’s implemented semantics using ORAM, garbled cir-
cuits, etc. and prove that for Γ-simulatable programs, the hybrid-world protocol
securely implements the ideal functionality. The formal results are summarized in
Figure 4.3.
93
4.4.1 Syntax
The syntax of SCVM is given in Figure 4.4. In SCVM, each variable and
statement has a security label from the lattice {P,A,B,O}, where v is defined to
be the smallest partial order such that P v l v O for l ∈ {A, B}. The label of
each variable indicates whether its memory location should be public, known to
either Alice or Bob (only), or secret. For readability, we do not distinguish between
oblivious secret arrays and non-oblivious secret arrays at this point, and simply
assume that all secret arrays are oblivious. Support for non-oblivious, secret arrays
will be added in Section 4.5.
An information-flow control type system, which we discuss in Section 4.4.4,
enforces that information can only flow from low (i.e., lower in the partial order)
security variables to high security variables. For example, for a statement x := y to
be secure, y’s security label should be less than or equal to x’s security label. An
exception is the declassification statement x := declassl(y) which may declassify a
variable y labeled O to a variable x with lower security label l.
The label of each statement indicates the statement’s mode of execution. A
statement with the label P is executed in public mode, where both Alice and Bob
can see its execution. A statement with the label A or B is executed in local mode,
and is visible to only Alice or Bob, respectively. A statement with the label O is
executed securely, so both Alice and Bob know the statement was executed but do
not learn the underlying values that were used.
Most SCVM language features are standard. We highlight the statement x :=
94
Variables x, y, z ∈ Vars
Security Labels l ∈ SecLabels = {P, A, B, O}
Numbers n ∈ Nat
Operation op ::= + | − | ...
Expressions e ::= x | n | x op x | x[x] |mux(x, x, x)
Statements s ::= skip | x := e | x[x] := x | if (x) then S else S |
while (x) do S | x := declassl(y) | x := oram(y)
Labeled Statements S ::= l : s | S;S
Figure 4.4: Syntax of SCVM
oram(y), by which variable x is assigned to an ORAM initialized with array y’s
contents, and the expression mux(x0, x1, x2), which evaluates to either x1 or x2,
depending on whether x0 is 0 or 1.
4.4.2 Semantics
We define a formal semantics for SCVM programs which we think of as defin-
ing a computation carried out, on Alice and Bob’s behalf, by an ideal functionality
F . However, as we foreshadow throughout, the semantics is endowed with suffi-
cient structure that it can be interpreted as using the mechanisms (like ORAM
and garbled circuits) described in Sections 4.3. We discuss such a hybrid world in-
terpretation more carefully in Section 4.4.5 and prove it also satisfies our security
properties.
Memories and types. Before we begin, we consider a few auxiliary definitions
given in Figure 4.5. A memory M is a partial map from variables to value-label
pairs. The value is either a natural number n or an array m, which is a partial map
from naturals to naturals. The security labels l ∈ {P,A,B,O} indicate the conceptual
95
visibility of the value as described earlier. Note that in a real-world implementation,
data labeled O is stored in ORAM and secret-shared between Alice and Bob, while
other data is stored locally by Alice or Bob. We sometimes find it convenient to
project memories whose values are visible at particular labels:
Definition 6 (L-projection). Given memory M and a set of security labels L,
we write M [L] as M ’s L-projection, which is itself a memory such that for all
x, M [L](x) = (v, l) if and only if M(x) = (v, l) and l ∈ L.
We define types Nat l and Array l, for numbers and arrays, respectively,
where l is a security label. A type environment Γ associates variables with types,
and we interpret it as a partial map. We sometimes consider when a memory is
consistent with a type environment Γ:
Definition 7 (Γ-compatibility). We say a memory M is Γ-compatible if and only
if for all x, when M(x) = (v, l), then v ∈ Nat⇔ Γ(x) = Nat l and v ∈ Array⇔
Γ(x) = Array l.
Ideal functionality. Once Alice and Bob have agreed on a program S, we imagine
an ideal functionality F that executes S. Alice and Bob send to F memories MA
and MB, respectively. Alice’s memory contains data labeled A and P, while Bob’s
memory contains data labeled B and P. (Data labeled O is only constructed during
execution.) F then proceeds as follows:
1. It checks that MA and MB agree on P-labeled values, i.e., that MA[{P}] =
MB[{P}]. It also checks that they do not share any A/B-labeled values, i.e.,
96
that the domain of MA[{A}] and the domain of MB[{B}] do not intersect. If
either of these conditions fail, F notifies both parties and aborts the execution.
Otherwise, it constructs memory M from MA and MB:
M = {x 7→ (v, l) |MA[{A,P}](x) = (v, l) ∨MB[{B}](x) = (v, l)}
2. F executes S according to semantics rules having the form 〈M,S〉 (ia,ta,ib,tb)−−−−−−→
〈M ′, S ′〉 : D. This judgment states that starting in memory M , statement S
runs, producing a new memory M ′ and a new statement S ′ (representing the
partially executed program) along with instruction traces ia and ib, memory
traces ta and tb, and declassification event D. We discuss these traces/events
shortly. The ideal execution will produce one of three outcomes (or fail to
terminate):
• 〈M,S〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′, S ′〉 : D, where D = (da, db). In this case, F
outputs da to Alice, and db to Bob. Then F sets M to M ′ and S to S ′
and restarts step 2.
• 〈M,S〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′, l : skip〉 : . In this case, F notifies both parties
that computation finished successfully.
• 〈M,S〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′, S ′〉 : , where S ′ 6= l : skip, and no rules further
reduce 〈M ′, S ′〉. In this case, F aborts and notifies both parties.
Notice that the only communications between F and each party about the computa-
tion are declassifications da and db (to Alice and Bob, respectively) and notification
97
Arrays m ∈ Array = Nat ⇀ Nat
Memory M ∈ Vars ⇀ (Array ∪Nat)× SecLabels
Type τ ::= Nat l | Array l
Type Environment Γ ::= x : τ | ·
Instruction Traces i ::= l : x := e | l : x[x] := x | l : declass(x, y) | l : init(x, y)
l : if(x) | l : while(x) | i@i | 
Memory Traces t ::= read(x, n) | readarr(x, n, n) |
write(x, n) | writearr(x, n, n) | x | t@t | 
Declassification d ::= (x, n) | 
Declass. event D ::= (d, d) | 
select(l, t1, t2) =

(t1, t1) if l = P
(t1, ) if l = A
(, t1) if l = B
(t2, t2) if l = O
inst(l, i) = select(l, l : i, l : i)
get(m, i) =
{
m(i) 0 ≤ i < |m|
0 otherwise
set(m, i, v) =
{
m[i 7→ v] 0 ≤ i < |m|
m otherwise
arr(x,m) = readarr(x, 0,m(0))@...@readarr(x, n,m(n)) where n = |m| − 1
t1 ≡ t2 t ≡ t t@ ≡ @t ≡ t
t1 ≡ t′1 t2 ≡ t′2
t1@t2 ≡ t′1@t′2
Figure 4.5: Auxiliary syntax and functions for SCVM semantics
of termination. This is because we assume that secure programs will always ex-
plicitly declassify their final output (and perhaps intermediate outputs, e.g., when
processing multiple queries), while all other variables in memory are not of conse-
quence. The memory and instruction traces, though not explicitly communicated
by F , will be visible in a real implementation (described later), but we prove that
they provide no additional information beyond that provided by the declassification
events.
98
Traces and events. The formal semantics incorporate the concept of traces to
define information leakage. There are three types of traces, all given in Figure 4.5.
The first is an instruction trace i. The instruction trace generated by an assignment
statement is the statement itself (e.g., x := e); the instruction trace generated by
a branching statement is denoted if(x) or while(x). Declassification and ORAM
initialization will generate instruction traces declass(x, y) and init(x, y), respec-
tively. The trace  indicates an unobservable statement execution (e.g., Bob cannot
observe Alice executing her local code). Trace equivalence (i.e. t1 ≡ t2) is defined
in Figure 4.5.
The second sort of trace is a memory trace t, which captures reads or writes
of variables visible to one or the other party. Here are the different memory trace
events:
• P: Operations on public arrays generate memory event readarr(x, n, v) or
writearr(x, n, v) visible to both parties, including the variable name x, the in-
dex n, and the value v read or written. Operations on public variables generate
memory event read(x, v) or write(x, v). To initialize an ORAM from a public
array will access each item in the array, so a sequence of readarr(x, i,m(i))
for i = 0, ..., |m| − 1, is visible to both Alice and Bob. We use arr(x,m) to
indicate such a sequence of memory events.
• A/B: Operations on Alice’s secret arrays generate memory event readarr(x, n, v)
or writearr(x, n, v) visible to Alice only. Operations on Alice’s secret variables
generate memory event read(x, v) or write(x, v) visible to Alice only. Initial-
99
izing an ORAM from Alice’s secret array generate memory events arr(x,m)
visible to Alice only. Operations on Bob’s secret arrays/variables are handled
similarly.
• O: Operations on a secret array generate memory event x visible to both Alice
and Bob, containing only the variable name, but not the index or the value.
A special case is the initialization of ORAM bank x with y’s value: a memory
trace y, but not its content, is observed.
Memory-trace equivalence is defined similarly to instruction-trace equivalence.
Finally, each declassification executed by the program produces a declassifi-
cation event (da, db), where Alice learns the declassification da and Bob learns db.
There is also an empty declassification event , which is used for non-declassification
statements. Given a declassification event D = (da, db), we write D[A] to denote
Alice’s declassification da and D[B] to denote Bob’s declassification db.
Semantics rules. Now we turn to the semantics, which consists of two judgments.
Figure 4.6 defines rules for the judgment l ` 〈M, e〉 ⇓(ta,tb) v, which states that in
mode l, under memory M , expression e evaluates to v. This evaluation produces
memory trace ta (resp., tb) for Alice (resp., Bob). Which memory trace event to
emit is chosen using the function select, which is defined in Figure 4.5. The security
label l is passed in by the corresponding assignment statement (i.e. l : x := e or
l : y[x1] := x2). If l is A or B, then the accesses to public variables are not observable
to the other party, whereas if l is O then both parties know that an access took
place; the l? label defined in E-Var and E-Array ensures the proper visibility of such
100
l ` 〈M, e〉 ⇓(ta,tb) v
E-Const l ` 〈M,n〉 ⇓(,) n
E-Var
M(x) = (v, l′) v ∈ Nat l′ v l
l = O⇒ l? = l′ l 6= O⇒ l? = l
(ta, tb) = select(l
?, read(x, v), x)
l ` 〈M,x〉 ⇓(ta,tb) v
E-Array
M(x) = (m, l′) m ∈ Array l′ v l
l = O⇒ l? = l′ l 6= O⇒ l? = l
l ` 〈M,y〉 ⇓(t′a,t′b) v v′ = get(m, v)
(t′′a, t′′b ) = select(l
?, readarr(x, v, v′), x)
ta = t
′
a@t
′′
a tb = t
′
b@t
′′
b
l ` 〈M,x[y]〉 ⇓(ta,tb) v′
E-Op
l ` 〈M,xi〉 ⇓(tia,tib) vi i = 1, 2
v = v1 op v2
ta = t1a@t2a tb = t1b@t2b
l ` 〈M,x1 op x2〉 ⇓(ta,tb) v
E-Mux
l ` 〈M,xi) ⇓(tia,tib) vi i = 1, 2, 3
v1 = 0⇒ v = v2 v1 6= 0⇒ v = v3
ta = t1a@t2a@t3a tb = t1b@t2b@t3b
l ` 〈M,mux(x1, x2, x3)〉 ⇓(ta,tb) v
Figure 4.6: Operational semantics for expressions in SCVM
events. Note the E-Array rule uses the get() function to retrieve an element from
an array; this function will return a default value 0 if the index is out of bounds.
Most elements of the rules are otherwise straightforward.
Figure 4.7 and 4.8 define rules for the judgment 〈M,S〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′, S ′〉 : D,
which says that under memory M , the statement S reduces to memory M ′ and
statement S ′, while producing instruction trace ia (resp., ib) and memory trace
ta (resp., tb) for Alice (resp., Bob), and generating declassification D. Most rules
are standard, except for handling memory traces and instruction traces. Instruction
traces are handled using function inst defined in Figure 4.5. This function is defined
101
〈M,S〉 (ia,ta,ib,tb)−−−−−−−→ 〈M ′, S′〉 : D
S-Skip 〈M, l : skip;S〉 (,,,)−−−−→ 〈M,S〉 : 
S-Assign
l ` 〈M, e〉 ⇓(t′a,t′b) v
M ′ = M [x 7→ (v, l)] (ia, ib) = inst(l, x := e)
(t′′a, t′′b ) = select(l,write(x, v), x)
ta = t
′
a@t
′′
a tb = t
′
b@t
′′
b
〈M, l : x := e〉 (ia,ta,ib,tb)−−−−−−−→ 〈M ′, l : skip〉 : 
S-Declass
M(y) = (v, O) l 6= O ta = tb = y
M ′ = M [x 7→ (v, l)] i = O : declass(x, y)
D = select(l, (x, v), )
〈M, O : x := declassl(y)〉 (i,ta,i,tb)−−−−−→ 〈M ′, O : skip〉 : D
S-Cond
(ia, ib) = inst(l, if(x)) M(x) = (v, l)
(ta, tb) = select(l, read(x, v), x)
v = 1⇒ c = 1 v 6= 1⇒ c = 2
〈M, l : if(x)then S1else S2〉 (ia,ta,ib,tb)−−−−−−−→ 〈M,Sc〉 : 
S-ORAM
M(y) = (m, l) l 6= O
M ′ = M [x 7→ (m, O)] (t′a, t′b) = select(l, arr(y,m), )
i = O : init(x, y) ta = t
′
a@x tb = t
′
b@x
〈M, O : x := oram(y)〉 (i,ta,i,tb)−−−−−→ 〈M ′, O : skip〉 : 
Figure 4.7: Operational semantics for statements in SCVM (Part 1)
such that if the label l of a statement is A or B, then the other party cannot observe
the statement; otherwise, both parties observe the statement.
A skip statement generates empty instruction traces and memory traces for
both parties regardless of its label. An assignment statement first evaluates the ex-
pression to assign, and its trace and the write event constitute the memory trace for
this statement. Note that expression is evaluated using the label l of the assignment
statement as per the discussion of E-Var and E-Array above.
Declassification x := declassl(y) declassifies a secret variable y (labeled O) to
102
S-ArrAss
M(y) = (m, l) l ` 〈M,xi〉 ⇓(tia,tib) vi i = 1, 2
m′ = set(m, v1, v2) M ′ = M [y 7→ (m′, l)]
(t′a, t
′
b) = select(l,writearr(y, v1, v2), y)
ta = t1a@t2a@t
′
a tb = t1b@t2b@t
′
b
(ia, ib) = inst(l, y[x1] := x2)
〈M, l : y[x1] := x2〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′, l : skip〉 : 
S-While-False
M(x) = (0, l) (ia, ib) = inst(l,while(x))
(ta, tb) = select(l, read(x, 0), x)
S = l : while(x)do S ′
〈M,S〉 (ia,ta,ib,tb)−−−−−−→ 〈M, l : skip〉 : 
S-While-True
M(x) = (v, l) v 6= 0
(ta, tb) = select(l, read(x, v), x)
S = l : while(x)do S ′
〈M,S〉 (ia,ta,ib,tb)−−−−−−→ 〈M,S ′;S〉 : 
S-Seq
〈M,S1〉 (ia,ta,ib,ib)−−−−−−→ 〈M ′, S ′1〉 : D
〈M,S1;S2〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′, S ′1;S2〉 : D
S-Concat
〈M,S〉 (ia,ta,ib,ib)−−−−−−→ 〈M ′, S ′〉 : 
〈M ′, S ′〉 (i
′
a,t
′
a,i
′
b,i
′
b)−−−−−−→ 〈M ′′, S ′′〉 : D
〈M,S〉 (ia@i
′
a,ta@t
′
a,ib@i
′
b,tb@t
′
b)−−−−−−−−−−−−−−→ 〈M ′′, S ′〉 : D
Figure 4.8: Operational semantics for statements in SCVM (Part 2)
103
a non-secret variable x (not labeled O). Both Alice and Bob will observe that y is
accessed (as defined by ta and tb), whereas the label l of variable x determines who
sees the declassified value as indicated by the declassification event D.
ORAM initialization produces a shared, secret array x from an array y pro-
vided by one party. Thus, the security label of x must be O, and the security label of
y must not be O. This rule implies that the party who holds y will observe memory
events arr(y,m), and then both parties can observe accesses to x.
Rule S-ArrAss handles an array assignment. Similar to rule E-Array, out-of-
bounds indices are ignored (cf. the set() function in Figure 4.5). For if-statements
and while-statements, no memory traces are observed other than those observed
from evaluating the guard x.
Rule S-Seq sequences execution of two statements in the obvious way. Finally,
rule S-Concat says that if 〈M,S〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′′, S ′′〉 : D, the transformation may
perform one or more small-step transformations that generate no declassification.
4.4.3 Security
The ideal functionality F defines the baseline of security, emulating a trusted
third party that runs the program using Alice and Bob’s data, directly revealing to
them only the explicitly declassified values. In a real implementation run directly
by Alice and Bob, however, each party will see additional events of interest, in
particular an instruction trace and a memory trace (as defined by the semantics).
Importantly, we want to show that these traces provide no additional information
104
about the opposite party’s data beyond what each party could learn from observing
F . We do this by proving that in fact these traces can be simulated by Alice and
Bob using their local data and the list of declassification events provided by F . As
such, revealing the instruction and memory traces (as in a real implementation)
provides no additional useful information.
We call our security property Γ-simulatability. To state this property formally,
we first define a multi-step version of our statement semantics:
〈M,P 〉 Γ,(ia,ta,ib,tb)−−−−−−−→
?
〈Mn, Pn〉 : D1, ..., Dn n ≥ 0
〈Mn, Pn〉 (i
′
a,t
′
a,i
′
b,t
′
b)−−−−−−→ 〈M ′, P ′〉 : D′
D′ 6=  ∨ P ′ = l : skip M and M ′ are both Γ-compatible
〈M,P 〉 Γ,(i
′
a,t
′
a,i
′
b,t
′
b)−−−−−−−→
?
〈M ′, P ′〉 : D1, ..., Dn, D′
This allows programs to make multiple declassifications, accumulating them as a
trace, while remembering only the most recent instruction and memory traces and
ensuring that intermediate memories are Γ-compatible.
Definition 8 (Γ-simulatability). Let Γ be a type environment, and P a program.
We say P is Γ-simulatable if there exist simulators simA and simB, which run
polynomial time in the data size, such that for all M, ia, ta, ib, tb,M
′, P ′, D1, ..., Dn, if
〈M,P 〉 Γ,(ia,ta,ib,tb)−−−−−−−→
?
〈M ′, P ′〉 : D1, ..., Dn, then simA(M [{P,A}], D1[A], ..., Dn−1[A]) ≡
(ia, ta) and simB(M [{P,B}], D1[B], ..., Dn−1[B]) ≡ (ib, tb).
Intuitively, if P is Γ-simulatable there exists a simulator simA that, given
public data M [{P}], Alice’s secret data M [{A}], and all outputs D1[A], ..., Dn−1[A]
declassified to Alice so far, can compute the instruction traces ia and memory traces
105
ta produced by the ideal semantics up until the next declassification event Dn,
regardless of the values of Bob’s secret data.
Note that Γ-simulatability is termination insensitive, and information may be
leaked based upon whether a program terminates or not [9]. However, as long as
all runs of a program are guaranteed to terminate (as is typical for programs run in
secure-computation scenarios), no information leakage occurs.
4.4.4 Type System
This section presents our type system, which we prove ensures Γ-simulatability.
There are two judgments, both defined in Figure 4.9. The first, written Γ ` e : τ ,
states that under environment Γ, expression e evaluates to type τ . The second
judgment, written Γ, pc ` S, states that under environment Γ and a label context
pc, a labeled statement S is type-correct. Here, pc is a label that describes the
ambient control context; pc is set according to the guards of enclosing conditionals
or loops. Note that since a program cannot execute an if-statement or a while-
statement whose guard is secret, pc can be one of P, A, or B, but not O. Intuitively,
if pc is A (resp., B), then the statement is part of Alice’s (resp., Bob’s) local code.
In general, for a labeled statement S = l : s we enforce the invariant pc v l, and if
pc 6= P, then pc = l. In so doing, we ensure that if the security label of a statement
is A (including if-statements and while-statements), then all nested statements also
have security label A, thus ensuring they are only visible to Alice. On the other
hand, under a public context, the statement label is unrestricted.
106
Γ ` e : τ T-Var Γ(x) = Nat l
Γ ` x : Nat l T-Const Γ ` n : Nat P
T-Op
Γ(x1) = Nat l1 Γ(x2) = Nat l2
Γ ` x1 op x2 : Nat l1 unionsq l2
T-Array
Γ(y) = Array l1
Γ(x) = Nat l2 l2 v l1
Γ ` y[x] : Nat l1
T-Mux
Γ(xi) = Nat li i = 1, 2, 3
l = l1 unionsq l2 unionsq l3
Γ `mux(x1, x2, x3) : Nat l
Γ, pc ` S T-Skip
pc v l
pc 6= P⇒ pc = l
Γ, pc ` l : skip T-Seq
Γ, pc ` S1 Γ, pc ` S2
Γ, pc ` S1;S2
T-Assign
Γ(x) = Nat l Γ ` e : Nat l′
pc unionsq l′ v l pc 6= P⇒ l = pc
Γ, pc ` l : x := e T-Declass
pc = P Γ(y) = Nat O
Γ(x) = Nat l l 6= O
Γ, pc ` O : x := declassl(y)
T-ORAM
pc = P Γ(x) = Array O
Γ(y) = Array l l 6= O
Γ, pc ` O : x := oram(y) T-ArrAss
Γ(y) = Array l
Γ(x1) = Nat l1
Γ(x2) = Nat l2
pc unionsq l1 unionsq l2 v l
pc 6= P⇒ l = pc
Γ, pc ` l : y[x1] := x2
T-Cond
Γ(x) = Nat l pc v l
l 6= O pc 6= P⇒ l = pc
Γ, l ` Si i = 1, 2
Γ, pc ` l : if(x)then S1else S2
T-While
Γ(x) = Nat l pc v l
l 6= O Γ, l ` S
pc 6= P⇒ l = pc
Γ, pc ` l : while(x)do S
Figure 4.9: Type System for SCVM
Now we consider some interesting aspects of the rules. Rule T-Assign requires
pcunionsq l′ v l, as is standard: pc v l prevents implicit flows, and l′ v l prevents explicit
ones. We further restrict that Γ(x) = Nat l, i.e., the assigned variable should have
the same security label as the instruction label. Rule T-ArrAss and rule T-Array
require that for an array expression y[x], the security label of x should be lower than
the security label of y. For example, if x is Alice’s secret variable, then y should
be either Alice’s local array, or an ORAM shared between Alice and Bob. If y is
107
Bob’s secret variable, or a public variable, then Bob can observe which indices are
accessed, and then infer the value of x. In the example from Figure 4.2, the array
access vis[bestj] on line 9 requires that vis be an ORAM variable since bestj is.
For rules T-Declass and T-ORAM, since declassification and ORAM initial-
ization statements both require secure computation, we restrict the statement label
to be O. Since these two statements cannot be executed in Alice’s or Bob’s local
mode, we restrict that pc = P.
Rule T-Cond deals with if-statements; T-While handles while loops similarly.
First of all, we restrict pc v l and Γ(x) = Nat l for the same reason as above. Fur-
ther, the rule forbids l to be equal to O to avoid an implicit flow revealed by the pro-
gram’s control flow. An alternative way to achieve instruction- and memory- trace
obliviousness is through padding [58]. However, in the setting of secure-computation,
padding achieves the same performance as rewriting a secret-branching statement
into a mux (or a sequence of them). And, using padding would require reasoning
about trace patterns, a complication our type system avoids.
A well-typed program is Γ-simulatable:
Theorem 3. If Γ, P ` S, then S is Γ-simulatable.
Notice that some rules allow a program to get stuck. For example, in rule
S-ORAM, if the statement is l : x := oram(y) but l 6= O, then the program will
not progress. We define a property called Γ-progress that formalizes the notion of a
program that never gets stuck.
Definition 9 (Γ-progress). Let Γ be a type environment, and let P = P0 be a
108
program. We say P enjoys Γ-progress if for any Γ-compatible memories M0, . . . ,Mn
for which 〈Mj, Pj〉 (i
j
a,t
j
a,i
j
b,t
j
b)−−−−−−→ 〈Mj+1, Pj+1〉 : Dj for j = 0, ..., n − 1, either Pn = l :
skip, or there exist i′a, t
′
a, i
′
b, t
′
b,M
′, P ′ such that 〈Mn, Pn〉 (i
′
a,t
′
a,i
′
b,t
′
b)−−−−−−→ 〈M ′, P ′〉 : D′.
Γ-progress means, in particular, that the third bullet in step (2) of the ideal
functionality (Section 4.4.2) does not occur for type-correct programs.
A well-typed program never gets stuck:
Theorem 4. If Γ, P ` S, then S enjoys Γ-progress.
Proofs of both theorems above can be found in Appendix C.
4.4.5 From SCVM Programs to Secure Protocols
Let P be a program, and let F be the ideal functionality based on this program
as described earlier. Here we define a hybrid-world protocol piG based on P , where
G = (Fop,Fmux,Foram,Fdeclass) is a fixed set of ideal functionalities that implement
simple binary operations (Fop), a MUX operation (Fmux), ORAM access (Foram),
and declassification (Fdeclass). Input to each of these ideal functionalities can either
be Alice or Bob’s local inputs, public inputs, and/or the shares of secret inputs (each
share supplied by Alice and Bob respectively). Each ideal functionality is explicitly
parameterized by the types of the inputs. Further, except for Fdeclass which per-
forms an explicit declassification, all other ideal functionalities return shares of the
computation or memory fetch result to Alice and Bob, respectively. Further details
of the ideal functionalities are given in Appendix D, along with formal definitions
of the simulator and hybrid world semantics.
109
Informally, the hybrid world protocol piG runs as follows:
1. Alice and Bob first agree on public values, ensuring that MA[{P}] = MB[{P}].
During the protocol each maintains a declassification list, for keeping track
of previously declassified values, and a secret memory that contains shares
of secret (non-ORAM) variables. To start, both the lists and memories are
empty, i.e., declsA := declsB :=  and M
S
A = M
S
B = [].
2. Alice runs her simulator (locally) on her initial memory to obtain (ia, ta) =
simA(MA, declsA), where ia and ta cover the portion of the execution starting
from just after the last provided declassification (i.e., the final da in the list
declsA) up to the next declassification instruction or the terminating skip
statement. Bob does likewise to get (ib, tb) = simB(MB, declsB).
3. Alice executes the instructions in ia using the hybrid-world semantics, which
reads (and writes) secret shares from (to) MSA and obtains the values of other
reads from events observed in ta. Bob does similarly with ib, M
S
B and tb. The
semantics covers three cases:
• If an instruction in ia is labeled P, then so is the corresponding instruction
in ib. Both parties execute the instruction.
• If an instruction in ia is labeled A, then Alice executes this instruction
locally. Bob does similarly for instructions labeled B.
• If an instruction in ia is labeled O, then so is the corresponding instruction
in ib. Alice and Bob call the appropriate ideal-world functionality from
110
G to execute this instruction. If the instruction is a declassification, then
Fdeclass will generate an event (da, db).
4. If the last instruction executed in step 3 is a declassification, then Alice ap-
pends her declassification to her local declassification list (i.e., declsA :=
declsA++[da]), and Bob does likewise; then both repeat step 2. Otherwise,
the protocol completes.
We have proved that if P is Γ-simulatable, then piG securely implements F
against semi-honest adversaries.
Theorem 5. (Informally) Let P be a program, F the ideal functionality corre-
sponding to P , and piG the protocol corresponding to P as described above. If P is
Γ-simulatable, then piG securely implements F against semi-honest adversaries in
the G-hybrid model.
Using standard composition results for cryptographic protocols, we obtain as
a corollary that if all ideal functionalities in G are implemented by semi-honest
secure protocols, the resulting (real-world) protocol securely implements F against
semi-honest adversaries.
A formal definition of piG, formal theorem statement, and a proof of the theo-
rem can be found in Appendix D.
4.5 Compilation
We informally discuss how to compile an annotated C-like source language
into a SCVM program. An example of our source language is:
111
int sum(alice int x, bob int y) {
return x<y ? 1 : 0;
}
The program’s two input variables, x and y, are annotated as Alice’s and Bob’s
data, respectively, while the unannotated return type int indicates the result will be
known to both Alice and Bob. Programmers need not annotate any local variables.
To compile such a program into a SCVM program, the compiler takes the following
steps.
Typing the source language. As just mentioned, source level types and initial
security label annotations are assumed given. With these, the type checker infers
security labels for local variables using a standard security type system [79] using
our lattice (Section 4.4.4). If no such labeling is possible without violating security
(e.g., due to a conflict in the initial annotation), the program is rejected.
Labeling statements. The second task is to assign a security label to each state-
ment. For assignment statements and array assignment statements, the label is the
least upper bound of all security labels of the variables occurring in the statement.
For an if-statement or a while-statement, the label is the least upper bound of all
security labels of the guard variables, and all security labels in the branches or loop
body.
On secret branching. The type system defined in Section 4.4.4 will reject an
if-statement whose guard has security label O. As such, if the program branches
on secret data, we must compile it into if-free SCVM code, using mux instructions.
The idea is to execute both branches, and use mux to activate the relevant effects,
112
based on the guard. To do this, we convert the code into Static-Single-Assignment
form (SSA) [7], and then replace occurrences of the φ-operator with a mux. The
following example demonstrates this process:
if(s) then x:=1; else x:=2;
The SSA form of the above code is
if(s) then x1:=1; else x2:=2; x:=phi(x1, x2);
Then we eliminate the if-structure and substitute the φ-operator to achieve
the final code:
x1:=1; x2:=2; x:=mux(s, x1, x2)
(Note that, for simplicity, we have omitted the security labels on the statements
in the example.)
On secret while loops. The type system requires that while loop guards only
reference public data, so that the number of iterations does not leak information.
A programmer can work around this restriction by imposing a constant bound on
the loop; e.g., manually translating while (s) do S to while (p) do if (s) S
else skip, where p defines an upper bound on the number of iterations.
Declassification. The compiler will emit a declassification statement for each
return statement in the source program. To avoid declassifying in the middle of
local code, the type checker in the first phase will check for this possibility and
relabel statements accordingly.
Extension for non-oblivious secret RAM. The discussion so far supports only
secret ORAMs. To support non-oblivious secret RAM in SCVM, we add an ad-
113
ditional security label N such that P v N v O. To incorporate such a change,
the memory trace for the semantics should include two more kinds of trace event,
nread(x, i) and nwrite(x, i), which represent that only the index of an access is
leaked, but not the content. Since label N only applies to arrays, we allow types
Array N but not types Nat N. The rules T-Array and T-ArrAss should be revised
to deal with the non-oblivious RAM. For example, for rule T-ArrAss, where l is the
security label for the array, l1 is the security label of the index variable and l2 is the
security label of the value variable, the type system should still restrict l1 v l, but
if l = N, the type system accepts l2 = O, but requires l1 = P.
Correctness. We do not prove the correctness of our compiler, but instead can
use a SCVM type checker (using the above extension) for the generated SCVM code,
ensuring it is Γ-simulatable. Ensuring the correctness of compilers is orthogonal
and outside the scope of this work, and existing techniques [20] can potentially be
adapted to our setting.
Compiling Dijkstra’s algorithm. We explain how compilation works for Dijk-
stra’s algorithm, previously shown in Figure 4.2. First, the type checker for the
source program determines how memory should be labeled. It determines that the
security labels for bestj and bestdis should be O, and the arrays dis and vis
should be secret-shared between Alice and Bob, since their values depend on both
Alice’s input (i.e., the graph’s edge weights) and Bob’s input (i.e., the source). Then,
since on line 9 array vis is indexed with bestj, variable vis should also be put in
an ORAM. Similarly, on line 12, array e is indexed by bestj so it must also be se-
cret; as such we must promote e, owned by Alice, to be in ORAM, which we do by
114
initializing a new ORAM-allocated variable orame to e at the start of the program.
The type checker then uses the variable labeling to determine the statement
labeling. Statements on lines 4–7, 9, and 11–13, require secure computation and
thus are labeled as O. Loop control-flow statements are computed publicly, so they
are labeled as P.
The two if-statements both branch on ORAM-allocated data, so they must be
converted to mux operations. Lines 4–7 are transformed (in source-level syntax) as
follows
cond3 := !vis[j] && (bestj<0||dis[j]<bestdis);
bestj := mux(cond3, j, bestj);
bestdis := mux(cond3, dis[j], bestdis);
Lines 11-13 are similarly transformed
tmp := bestdis + orame[bestj*n+j];
cond4 := !vis[j] && (tmp<dis[j]);
dis[j] := mux(cond4, tmp, dis[j]);
Finally, the code is translated into SCVM’s three-address code style syntax.
4.6 Evaluation
Programs. We have built several secure two-party computation applications.
As run-once tasks, we implemented both the Knuth-Morris-Pratt (KMP) string-
matching algorithm as well as Dijkstra’s shortest-path algorithm. For repeated
sublinear-time database queries, we considered binary search and the heap data
115
Name Alice’s Input Bob’s Input
Run-once setting
Dijkstra’s shortest path a graph a (src, dest) pair
Knuth-Morris-Pratt string matching a sequence a pattern
Aggregation over sliding windows a key-value table an array of keys
Inverse permutation share of permutation share of permutation
Repeated sublinear-time query
Binary search sorted array search key
Heap (insertion/extraction) share of the heap share of the heap
Table 4.2: Programs used in evaluation of SCVM
structure. All applications are listed in Table 4.2.
Compilation time. All programs took little time (e.g., under 1 second) to compile.
In comparison, some earlier circuit-model compilers involve copying datasets into
circuits, and therefore the compile-time can be large [54,65] (e.g., Kreuter et al. [54]
report a compilation time of roughly 1000 seconds for an implementation of an
algorithm to compute graph isomorphism on 16-node graphs).
In our experiments, we manually checked the correctness of compiled programs
(we have not yet implemented a type checker for SCVM, though doing so should be
straightforward).
4.6.1 Evaluation Methodology
Although our techniques are compatible with any cryptographic back-end se-
cure in the semi-honest model by the definition of Canetti [15], we use the garbled
circuit approach in our evaluation [46].
We measure the computational cost by calculating the number of encryptions
required by the party running as the circuit generator (the party running as the
116
evaluator does less work). For every non-XOR binary gate, the generator makes 3
block-cipher calls; for every oblivious transfer (OT), 2 block-cipher operations are
required since we rely on OT extension [47]. For the run-once applications (i.e.,
Dijkstra shortest distance, KMP-matching, aggregation, inverse permutation), we
count in the ORAM initialization cost when comparing to the automated circuit
approach (which doesn’t require RAM initialization). The ORAM initialization can
be done using a Waksman shuffling network [89]. For the applications expecting
multiple executions we do not count the ORAM initialization cost since this one-
time overhead will be amortized to (nearly) 0 over many executions.
We implemented the binary tree-based ORAM of Shi et al. [80] using garbled
circuits, so that array accesses reveal nothing about the (logical) addresses nor the
outcomes. Throughout the experiments, we set the ORAM bucket size to 32 (i.e.,
each tree-node can store up to 32 blocks), which corresponds to roughly 25-bit
of statistical security (according to the simulation of ORAM failures). Following
Gordon et al.’s ORAM encryption technique [38], every block is XOR-shared (i.e.,
the client stores secret key k while the server stores (r, fk(r)⊕m) where f is a family
of psuedorandom permutations and m the data block). This adds one additional
cipher operation per block (when the length of an ORAM block is less than the
width of the cipher). We note specific choices of the ORAM parameters in related
discussion of each application.
Metrics. We use the number of block-cipher evaluations as our performance metric.
Measuring the performance by the number of symmetric encryptions (instead of wall
clock times) makes it easier to compare with other systems since the numbers can
117
Figure 4.10: SCVM vs. automated circuit-based approach (Binary Search)
be independent of the underlying hardware and ciphering algorithms. Additionally,
in our experiments these numbers represent bandwidth consumption since every
encryption is sent over the network. Therefore, we do not report separately the
bandwidth used. Modern processors with AES support can compute 108 AES-128
operations per second.
4.6.2 Comparison with Automated Circuits
Presently, automated secure computation implementations largely focus on
the circuit-model of computation, handling array accesses by linearly scanning the
entire array with a circuit every time an array lookup happens; this incurs prohibitive
overhead when the dataset is large. In this section, we compare our approach with
the existing compiled circuits, and demonstrate that our approach scales much better
with respect to dataset size.
118
Figure 4.11: SCVM vs. hand-constructed linear scan circuit (Binary Search)
4.6.2.1 Repeated sublinear-time queries
In this scenario, ORAM initialization is a one-time operation whose cost can
be amortized over multiple subsequent queries, achieving sublinear amortized cost
per query.
Binary search. One example application we tested is binary search, where one
party owns a confidential (sorted) array of size n, and the other party searches for
(secret) values stored in that array.
In our experiments, we set the ORAM bucket size to 32. For binary search, we
aligned our experimental settings with those of Gordon et al. [38], namely, assuming
the size of each data item is 512 bits. We set the recursion factor to 8 (i.e., each
block can store up to 8 indices for the data in the upper level recursion tree) and the
recursion cut-off threshold to 1000 (namely no more recursion once fewer than 1000
units are to be stored). Comparing to a circuit-model implementation—which uses
119
a circuit of size O(n log n) that implements binary search—our approach is faster
for all RAM sizes tested (see Figure 4.10). For n = 220, our approach achieves a
100× speedup.
Note it is also possible to use a smaller circuit of size O(n) that just performs a
linear scan over the data. However, such a circuit would have to be “hand-crafted,”
and would not be output by automated compilation of a binary-search program. Our
approach runs faster for large n even when compared to such an implementation
(see Figure 4.11). On data of size n = 220, our approach achieves a 5× speedup
even when compared to this “hand-crafted” circuit-based solution.
Heap. Besides binary search, we also implemented an oblivious heap data structure
(with 32-bit payload, i.e., size of each item). The costs of insertion and extraction
respecting various heap sizes are given in Figure 4.12 and 4.13, respectively. The
basic shapes of the performance curves are very similar to that for binary search
(except that heap extraction is twice as slow as insertion because two comparisons
are needed per level). We can observe an 18× speedup for both heap insertion and
heap extraction when the heap size is 220.
The speedup of our heap implementation over automated circuits is even
greater when the size of the payload is bigger. At 512-bit payload, we have an
100× speedup for data size 220. This is due to the extra work incurred from realiz-
ing the ORAM mechanism, which grows (in poly-logarithmic scale) with the size of
the RAM but independent of the size of each data item.
120
Figure 4.12: Heap insertion in SCVM
4.6.2.2 Faster one-time executions
We present two applications: the Knuth-Morris-Pratt string-matching algo-
rithm (representative of linear-time RAM programs) and Dijkstra’s shortest-path
algorithm (representative of super-linear time RAM programs). We compare our
approach with a naive program-to-circuit compiler which copies the entire array for
every dynamic memory access.
The Knuth-Morris-Pratt algorithm. Alice has a secret string T (of length n)
while Bob has a secret pattern P (of length m) and wants to scan through Alice’s
string looking for this pattern. The original KMP algorithm runs in O(n + m)
time when T and P are in plaintext. Our compiler compiles an implementation of
KMP into a secure string matching protocol preserving its linear efficiency up to a
polylogarithmic factor (due to the ORAM technique).
We assume the string T and the pattern P both consist of 16-bit characters.
121
Figure 4.13: Heap extraction in SCVM
The recursion factor of the ORAM is set to 16. Figure 4.14 and 4.15 show our
results compared to those when a circuit-model compiler is used. From Figure 4.14,
we can observe that our approach is slower than the circuit-based approach on
small datasets, since the overhead of the ORAM protocol dominates in such cases.
However, the circuit-based approach’s running time increases more quickly as the
dataset’s size increases. When m = 50 and n = 2 × 106, our program runs 21×
faster.
Dijkstra’s algorithm. Here Alice has a secret graph while Bob has a secret
source/destination pair and wishes to compute the shortest distance between them.
Compiling from a standard Dijkstra shortest-path algorithm, we obtain anO(n2 log3 n)-
overhead RAM-model protocol.
In our experiment, Alice’s graph is represented by an n× n adjacency matrix
(of 32-bit integers) where n is the number of vertices in the graph. The distances
associated with the edges are denoted by 32-bit integers. We set ORAM recursion
122
Figure 4.14: KMP string matching for median n (fixing m = 50) in SCVM
factor to 8. The results (Figure 4.16) show that our scheme runs faster for all sizes of
graphs tested. As the performance of our protocol is barely noticeable in Figure 4.16,
the performance gaps between the two protocols for various n is explicitly plotted
in Figure 4.17. Note the shape of the speedup curve is roughly quadratic.
Aggregation over sliding windows. Alice has a key-value table, and Bob has a
(size-n) array of keys. The secure computation task is the following: for every size-k
window on the key array, look up k values corresponding to Bob’s k keys within
the window, and output the minimum value. Our compiler outputs a O(n log3 n)
protocol to accomplish the task. The optimized protocol performs significantly
better, as shown in Figure 4.18 and 4.19 (we fixed the window size k to 1000 and
set recursion factor to 8, while varying the dataset from 1 to 6 million pairs).
123
Figure 4.15: KMP string matching for large n (fixing m = 50) in SCVM
4.6.3 Comparison with RAM-SC Baselines
Benefits of instruction-trace obliviousness. The RAM-SC technique of Gor-
don et al. [38], described in Section 4.2, uses a universal next-instruction circuit to
hide the program counter and the instructions executed. Each instruction involves
ORAM operations for instruction and data fetches, and the next-instruction circuit
must effectively execute all possible instructions and use an n-to-1 multiplexer to
select the right outcome. Despite the lack of concrete implementation for their gen-
eral approach, we show through back-of-the-envelope calculations that our approach
should be orders-of-magnitude faster.
Consider the problem of binary search over a 1-million item dataset: in each
iteration, there are roughly 10 instructions to run, hence 200 instructions in total to
complete the search. To run every instruction, a universal-circuit-based implemen-
tation has to execute every possible instruction defined in its instruction set. Even
124
Figure 4.16: Dijkstra’s shortest-path algorithm’s performance in SCVM
Figure 4.17: Dijkstra’s shortest-path algorithm’s speedup of SCVM
if we conservatively assume a RISC-style instruction set, we would require over 9
million (non-free) binary gates to execute just a memory read/write over a 512M bit
RAM. Plus, an extra ORAM read is required to obliviously fetch every instruction.
Thus, at least a total of 3600 million binary gates are needed, which is more than 20
times slower than our result exploiting instruction trace obliviousness. Furthermore,
notice that binary search is merely a case where the program traces are very short
(with only logarithmic length). Due to the overwhelming cost of ORAM read/write
125
Figure 4.18: Aggregation over sliding windows’s performance in SCVM
Figure 4.19: Aggregation over sliding windows’s speedup in SCVM
instructions, we stress that the performance gap will be much greater with respect
to programs that have relatively fewer memory read/write instructions (comparing
to binary search, 1 out of 10 instructions is a memory read instruction).
Benefits of memory-trace obliviousness. In addition to avoiding the overhead
of a next-instruction circuit, SCVM avoids the overhead of storing all arrays in a
single, large ORAM. Instead, SCVM can store some arrays as non-oblivious se-
cret shared memory, and others in separate ORAM banks, rather than one large
126
Figure 4.20: SCVM’s Savings by memory-trace obliviousness optimization (inverse
permutation). the non-linearity (around 60) of the curve is due to the increase of
the ORAM recursion level at that point.
Figure 4.21: Savings by memory-trace obliviousness optimization (Dijkstra)
ORAM. Doing so does not compromise security because the type system ensures
memory-trace obliviousness. Here we assess the advantages of these optimizations
by comparing against SCVM programs compiled without the optimizations enabled.
The results for two applications are given in Figure 4.20 and 4.21.
• Inverse permutation. Consider a permutation of size n, represented by an
array a of n distinct numbers from 1 to n, i.e., the permutation maps the i-th
object to the a[i]-th object. One common computation would be to compute
127
its inverse, e.g., to do an inverse table lookup using secret indices. The inverse
permutation (with result stored in array b) can be computed with the loop:
while (i < n) { b[a[i]]=i; i=i+1;}
The memory-trace obliviousness optimization automatically identifies that the
array a doesn’t need to be put in ORAM though its content should remain
secret (because the access pattern to a is entirely public known). This yields
50% savings, which is corroborated by our experiment results (Figure 4.20).
• Dijkstra’s shortest path. We discussed the advantages of memory-trace
obliviousness in Section 4.3 with respect to Dijkstra’s algorithm. Our experi-
ments show that we consistently save 15 ∼ 20% for all graph sizes. The savings
rates for smaller graphs are in fact higher even though it is barely noticeable
in the chart because of the fast (super-quadratic) growth of overall cost.
4.7 Conclusions
We describe the SCVM system as the first automated approach for RAM-
model secure computation. In the next Chapter, we will extend SCVM to ObliVM
with richer programming features to improve expressive power, easy programmabil-
ity, and performance. Directions for future work include extending our framework
to support malicious security; applying orthogonal techniques (e.g., [20]) to ensure
correctness of the compiler; incorporating other cryptographic backends into our
framework; and adding additional language features such as higher-dimensional ar-
rays and structured data types.
128
Chapter 5: ObliVM: A Programming Framework for Secure Compu-
tation
In Chapter 4, we have seen that trace oblivious approach and SCVM system
can improve the performance for RAM-model secure computation. In this Chapter,
we extend SCVM to build a programming framework to make secure computation
both easy-to-program and practically efficient.
Architecting a system framework for secure computation presents numerous
challenges. First, the system must allow non-specialist programmers without secu-
rity expertise to develop applications. Second, efficiency is a first-class concern in
the design space, and scalability to big data is essential in many interesting real-life
applications. Third, the framework must be reusable: expert programmers should be
able to easily extend the system with rich, optimized libraries or customized crypto-
graphic protocols, and make them available to non-specialist application developers.
We design and build ObliVM, a system framework for automated secure multi-
party computation. ObliVM is designed to allow non-specialist programmers to write
programs much as they do today, and our ObliVM compiler compiles the program
to an efficient secure computation protocol. To this end, ObliVM offers a domain-
specific language that is intended to address a fundamental representation gap,
129
namely, secure computation protocols (and other branches of modern cryptography)
rely on circuits as an abstraction of computation, whereas real-life developers write
programs instead. In architecting ObliVM, our main contribution is the design of pro-
gramming support and compiler techniques that facilitate such program-to-circuit
conversion while ensuring maximal efficiency. Presently, our framework assumes
a semi-honest two-party protocol in the back end. To demonstrate an end-to-end
system, we chose to implement an improved Garbled Circuit protocol as the back
end, since it is among the most practical protocols to date. Our ObliVM framework,
including source code and demo applications, will be open-sourced on our project
website http://www.oblivm.com.
This chapter is based on a paper that I co-authored with Yan Huang, Kartik
Nayak, Elaine Shi, and Xiao Shaun Wang [60]. I designed the novel programming
language and implemented the programming frameworks. I developed the compiler
to support the new language, which emits code that is runnable on a new secure
computation backend designed and implemented by Yan Huang and Xiao Shaun
Wang. I conducted experiments to show the compiler’s effectiveness with the help
of Xiao Shaun Wang.
5.1 ObliVM Overview and Contributions
In designing and building ObliVM, we make the following contributions.
Programming abstractions for oblivious algorithms. The most challenging
part about ensuring a program’s obliviousness is memory-trace obliviousness – there-
130
fore our discussions below will focus on memory-trace obliviousness. A straightfor-
ward approach (henceforth referred to as the generic ORAM baseline) is to provide
an Oblivious RAM (ORAM) abstraction, and require that all arrays (whose access
patterns depend on secret inputs) be stored and accessed via ORAM. This approach,
which was effectively taken by SCVM [59], is generic, but does not necessarily yield
the most efficient oblivious implementation for each specific program.
At the other end of the spectrum, a line of research has focused on customized
oblivious algorithms for special tasks (sometimes also referred to as circuit struc-
ture design). For example, efficient oblivious algorithms have been demonstrated
for graph algorithms [14, 37], machine learning algorithms [70, 71], and data struc-
tures [51, 66, 92]. The customized approach can outperform generic ORAM, but
is extremely costly in terms of the amount of cryptographic expertise and time
consumed.
ObliVM aims to achieve the best of both worlds by offering oblivious pro-
gramming abstractions that are both user- and compiler friendly. These program-
ming abstractions are high-level programming constructs that can be understood
and employed by non-specialist programmers without security expertise. Behind
the scenes, ObliVM translates programs written in these abstractions into efficient
oblivious algorithms that outperform generic ORAM. When oblivious programming
abstractions are not applicable, ObliVM falls back to employing ORAM to translate
programs to efficient circuit representations. Presently, ObliVM offers the follow-
ing oblivious programming abstractions: MapReduce abstractions, abstractions for
oblivious data structures, and a new loop coalescing abstraction which enables novel
131
oblivious graph algorithms. We remark that this is by no means an exhaustive list of
possible programming abstractions that facilitate obliviousness. It would be exciting
future research to uncover new oblivious programming abstractions and incorporate
them into our ObliVM framework.
An expressive programming language. ObliVM offers an expressive and versa-
tile programming language called ObliVM-Lang. When designing ObliVM, we have
the following goals.
• Non-specialist application developers find the language intuitive.
• Expert programmers should be able to extend our framework with new fea-
tures. For example, an expert programmer should be able to introduce new,
user-facing oblivious programming abstractions by embedding them as li-
braries in ObliVM (see Section 5.3.2 for an example).
• Expert programmers can implement even low-level circuit libraries directly
atop ObliVM-Lang. Recall that unlike a programming language in the tra-
ditional sense, here the underlying cryptography fundamentally speaks only
of AND and XOR gates. Even basic instructions such as addition, multipli-
cation, and ORAM accesses must be developed from scratch by an expert
programmer. In most previous frameworks, circuit libraries for these basic
operations are developed in the back end. ObliVM, for the first time, allows
the development of such circuit libraries in the source language, greatly re-
ducing programming complexity. Section 5.4.1 demonstrates case studies for
implementing basic arithmetic operations and Circuit ORAM atop our source
132
language ObliVM.
• Expert programmers can implement customized protocols in the back end
(e.g., faster protocols for performing big integer operations or matrix opera-
tions), and export these customized protocols to the source language as native
types and native functions.
To simultaneously realize these aforementioned goals, we need a much more
powerful and expressive programming language than any existing language for se-
cure computation [44, 54, 59, 75, 98]. Our ObliVM-Lang extends the SCVM language
presented in Chapter 4 and offers new features such as phantom functions, generic
constants, random types, as well as native types and functions. We will show why
these language features are critical for implementing oblivious programming abstrac-
tions and low-level circuit libraries.
Additional architectural choices. ObliVM also allows expert programmers to
develop customized cryptographic protocols (not necessarily based on Garbled Cir-
cuit) in the back end. These customized back end protocols can be exposed to
the source language through native types and native function calls, making them
immediately reusable by others. Section 5.5.1 describes an example where an ex-
pert programmer designs a customized protocol for BigInteger operations using
additively-homomorphic encryption. The resulting BigInteger types and opera-
tions can then be exported into our source language ObliVM-Lang.
133
5.1.1 Applications and Evaluation
ObliVM’s easy programmability allowed us to develop a suite of libraries and
applications, including streaming algorithms, data structures, machine learning al-
gorithms, and graph algorithms. These libraries and applications will be shipped
with the ObliVM framework. Our application-driven evaluation suggests the follow-
ing results:
Efficiency. We use ObliVM’s user-facing programming abstractions to develop a
suite of applications. We show that over a variety of benchmarking applications,
the resulting circuits generated by ObliVM can be orders of magnitude smaller than
the generic ORAM baseline (assuming that the state-of-the-art Circuit ORAM [90]
is adopted for the baseline) under moderately large data sizes. We also compare our
ObliVM-generated circuits with hand-crafted designs, and show that for a variety of
applications, our auto-generated circuits are only 0.5% to 2% bigger in size than
oblivious algorithms hand-crafted by human experts.
Development effort. We give case studies to show how ObliVM greatly reduces
the development effort and expertise needed to create applications over secure com-
putation.
New oblivious algorithms. We describe a few new oblivious algorithms that we
uncover during this process of programming language and algorithms co-design.
Specifically, we demonstrate new oblivious graph algorithms including oblivious
Depth-First-Search for dense graphs, oblivious shortest path for sparse graphs, and
an oblivious minimum spanning tree algorithm.
134
5.1.2 Threat Model, Deployment, and Scope
Deployment scenarios and threat model. ObliVM are designed for the same
deployment scenarios under the same threat model as SCVM.
Scope. A subset of ObliVM’s source language ObliVM-Lang has a security type
system which, roughly speaking, ensures that the program’s execution traces are
independent of secret inputs [58,59].
ObliVM-Lang’s type system is further extended to support reasoning about the
declassifications of random numbers to provide a principled guidance on how devel-
opers should use random numbers properly while enforcing security. This extended
type system, however, does not guarantee the security on random number usages.
We argue that this extended type system is still useful in capturing subtle bugs in
the implementations like oblivious data structures. We leave developing a sound
and complete type system to handle random numbers as a future work.
By designing a new language, ObliVM does not directly retrofit legacy code.
Such a design choice maximizes opportunities for compile-time optimizations. We
note, however, that in subsequent work joint with our collaborators [39], we have
implemented a MIPS CPU in ObliVM, which can securely evaluate standard MIPS
instructions in a way that leaks only the termination channel (i.e., total runtime of
the program) – this secure MIPS CPU essentially provides backward compatibility
atop ObliVM whenever needed.
135
5.2 Programming Language and Compiler
As mentioned earlier, we wish to design a powerful source language ObliVM-
Lang such that an expert programmer can i) develop oblivious programming abstrac-
tions as libraries and offer them to non-specialist programmers; and ii) implement
low-level circuit gadgets atop ObliVM-Lang.
ObliVM-Lang builds on top of the recent SCVM IR as described in Chapter 4 –
the only known language to date that supports ORAM abstractions , and therefore
offers scalability to big data. In this section, we will describe new features that
ObliVM-Lang offers and explain intuitions behind our security type system.
As compelling applications of ObliVM-Lang, in Section 5.3, we give concrete
case studies and show how to implement oblivious programming abstractions and
low-level circuit libraries atop ObliVM-Lang.
5.2.1 Language features for expressiveness and efficiency
Security labels. Except for the new random type introduced in Section 5.2.2, all
other variables and arrays are either of a public or secure type. secure variables
are secret-shared between the two parties such that neither party sees the value.
public variables are observable by both parties. Arrays can be publicly or secretly
indexable. For example,
• secure int10[public 1000] keys: secret array contents but indices to the
array must be public. This array will be secret shared but not placed in
136
ORAMs.
• secure int10[secure 1000] keys: This array will be placed in a secret-
shared ORAM, and we allow secret indices into the array.
Standard features. ObliVM-Lang allows programmers to use C-style keyword
struct to define record types. It also supports generic types similar to templates
in C++. For example, a binary tree with public topological structure but secret
per-node data can be defined without using pointers (assuming its capacity is 1000
nodes):
struct KeyValueTable<T> {
secure int10[public 1000] keys;
T[public 1000] values;
};
In the above, the type int10 means that its value is a 10-bit signed integer. Each
element in the array values has a generic type T similar to C++ templates. ObliVM-
Lang assumes data of type T to be secret-shared. In the future, we will improve the
compiler to support public generic types.
Generic constants. Besides general types, ObliVM-Lang also supports generic con-
stants to further improve the reusability. Let us consider the following tree example:
struct TreeNode@m<T> {
public int@m key;
T value;
public int@m left, right;
};
struct Tree@m<T> {
TreeNode<T>[public (1<<m)-1] nodes;
public int@m root;
};
137
This code defines a binary search tree implementation of a key-value store, where
keys are m-bit integers. The generic constant @m is a variable whose value will be
instantiated to a constant. It hints that m bits are enough to represent all the posi-
tion references to the array. The type int@m refers to an integer type with m bits.
Further, the capacity of array nodes can be determined by m as well (i.e. (1<<m)-1).
Note that Zhang et al. [98] also allow specifying the length of an integer, but re-
quire this length to be a hard-coded constant – this necessitates modification and
recompilation of the program for different inputs. ObliVM-Lang’s generic constant
approach eliminates this constraint, and thus improves reusability.
Functions. ObliVM-Lang allows programmers to define functions. For example,
following the Tree defined as above, programmers can write a function to search
the value associated with a given key in the tree as follows:
1 T Tree@m<T>.search(public int@m key) {
2 public int@m now = this.root, tk;
3 T ret;
4 while (now != -1) {
5 tk = this.nodes[now].key;
6 if (tk == key)
7 ret = this.nodes[now].value;
8 if (tk <= key)
9 now = this.nodes[now].right;
10 else
11 now = this.nodes[now].left;
12 }
13 return ret
14 };
This function is a method of a Tree object, and takes a key as input, and returns a
138
value of type T. The function body defines three local variables now and tk of type
public int@m, and ret of type T. The definition of a local variable (e.g. now) can
be accompanied with an optional initialization expression (e.g. this.root). When
a variable (e.g. ret or tk) is not initialized explicitly, it is initialized to be a default
value depending on its type.
The rest of the function is standard, C-like code, except that ObliVM-Lang
requires exactly one return statement at the bottom of a function whose return
type is not void. We highlight that ObliVM-Lang allows arbitrary looping on a
public guard (e.g. line 4) without loop unrolling, which cannot be compiled in
previous loop-elimination-based work [13,43,44,55,65,98].
Function types. Programmers can define a variable to have function type, similar
to function pointers in C. To avoid the complexity of handling arbitrary higher order
functions, the input and return types of a function type must not be function types.
Further, generic types cannot be instantiated with function types.
Native primitives. ObliVM-Lang supports native types and native functions. For
example, ObliVM-Lang’s default back end implementation is ObliVM-SC, which is
implemented in Java. Suppose an alternative BigInteger implementation in ObliVM-
SC (e.g., using additively homomorphic encryption) is available in a Java class called
BigInteger. Programmers can define
typedef BigInt@m = native BigInteger;
Suppose that this class supports four operations: add, multiply, fromInt
and toInt, where the first two operations are arithmetic operations and last two
operations are used to convert between Garbled Circuit-based integers and HE-based
139
integers. We can expose these to the source language by declaring:
BigInt@m BigInt@m.add(BigInt@m x, BigInt@m y)
= native BigInteger.add;
BigInt@m BigInt@m.multiply(BigInt@m x, BigInt@m y)
= native BigInteger.multiply;
BigInt@m BigInt@m.fromInt(int@m y) = native BigInteger.fromInt;
int@m BigInt@m.toInt(BigInt@m y) = native BigInteger.toInt;
5.2.2 Language features for security
The key requirement of ObliVM-Lang is that a program’s execution traces will
not leak information. These execution traces include a memory trace, an instruction
trace, a function stack trace, and a declassification trace. The trace definitions are
similar to SCVM in Chapter 4, and we develop a security type system for ObliVM-
Lang, which is similar to the one for SCVM to enforce trace obliviousness.
In addition, ObliVM-Lang provides an extended type system that imposes fur-
ther constraints on how random numbers and functions should be used to achieve
security. This type system extension does not enforce formal security, but it provides
useful hints to capture subtle bugs, e.g., when implement oblivious data structures.
Random numbers and implicit declassifications. Many oblivious programs
such as ORAM and oblivious data structures crucially rely on randomness. In par-
ticular, their obliviousness guarantee has the following nature: the joint distribution
of memory traces is identical regardless of secret inputs (these algorithms typically
have a cryptographically negligible probability of correctness failure). ObliVM-Lang
140
supports reasoning of such “distributional” trace-obliviousness by providing random
types associated with an affine type system. For instance, rnd32 is the type of a
32-bit random integer. A random number will always be secret-shared between the
two parties.
To generate a random number, there is a built-in function RND with the fol-
lowing signature:
rnd@m RND(public int32 m)
This function takes a public 32-bit integer m as input, and returns m random bits.
Note that rnd@m is a dependent type, whose type depends on values, i.e. m. To avoid
the complexity of handling general dependent types, the ObliVM-Lang compiler re-
stricts the usage of dependent types to only this built-in function, and handles it
specially.
In our ObliVM framework, outputs of a computation can be explicitly declas-
sified with special syntax. Random numbers are allowed implicit declassification –
by assigning them to public variables. Here “implicitness” means that the declassi-
fication happens not because this is a specified outcome of the computation.
For security, we must ensure that each random number is implicitly declassi-
fied at most once for the following reason. When implicitly declassifying a random
number, both parties observe the random number as part of the trace. Now consider
the following example where s is a secret variable.
141
1 rnd32 r1 = RND(32), r2= RND(32);
2 public int32 z;
3 if (s) z = r1; // implicit declass
4 else z = r2; // implicit declass
. . . . . .
XX public int32 y = r2; // NOT OK
In this program, random variables r1 and r2 are initialized in Line 1 – these vari-
ables are assigned a fresh, random value upon initialization. Up to Line 4 random
variables r1 and r2 are each declassified no more than once. Line XX, however,
could potentially cause r2 to be declassified more than once. Line XX clearly is not
secure since in this case the observable public variable y and z could be correlated
– depending on which secret branch was taken earlier.
Therefore, we use an affine type system to ensure that each random variable
is implicitly declassified at most once. This way, each time a random variable is
implicitly declassified, it will introduce a independently uniform variable to the
observable trace. In our security proof, a simulator can just sample this random
number to simulate the trace.
It turns out that the above example reflects the essence of what is needed to
implement oblivious RAM and oblivious data structures in our source language. We
refer the readers to Sections 5.3 and 5.4.2 for details.
Function calls and phantom functions. A straightforward idea to prevent
stack behavior from leaking information is to enforce function calls in a public
context. Then the requirement is that each function’s body must satisfy memory-
and instruction-trace obliviousness. Further, by defining native functions, ObliVM-
Lang implicitly assumes that their implementations satisfy memory- and instruction-
142
trace obliviousness.
Beyond this basic idea, ObliVM-Lang makes a step forward to enabling function
calls within a secret if-statement by introducing the notion of phantom function. The
idea is that each function can be executed in dual modes, a real mode and and a
phantom mode. In the real mode, all statements are executed normal with real
computation and real side effects. In the phantom mode, the function execution
merely simulates the memory traces of the real world; no side effects take place;
and the phantom function call returns a secret-shared default value of the specified
return type. This is similar to padding ideas used in several previous works [6, 78].
We will illustrate the use of phantom function with the following prefixSum
example. The function prefixSum(n) accesses a global integer array a, and com-
putes the prefix sum of the first n + 1 elements in a. After accessing each element
(Line 3), the element in array a will be set to 0 (Line 4).
1 phantom secure int32 prefixSum
2 (public int32 n) {
3 secure int32 ret=a[n];
4 a[n]=0;
5 if (n != 0) ret = ret+prefixSum(n-1);
6 return ret;
7 }
The keyword phantom indicates that the function prefixSum is a phantom func-
tion.
Consider the following code to call the phantom functions:
if (s) then x = prefixSum(n);
To ensure security, prefixSum will always be called no matter s is true or
143
false. When s is false, however, it must be guaranteed that (1) elements in array
a will not be assigned to be 0; and (2) the function generates traces with the same
probability as when s is true. To this end, the compiler will generate target code
with the following signature:
prefixSum(idx, indicator)
where indicator means whether the function will be called in the real or phan-
tom mode. To achieve the first goal, the global variable will be modified only if
indicator is false. The compiler will compile the code in line 4 into the following
pseudo-code:
a[idx]=mux(0, a[idx], indicator);
It is easy to see, that all instructions will be executed, and thus the generated
traces are identical regardless of the value of indicator. Note, that such a function
is not implementable in any prior loop-unrolling based compiler, since n is provided
at runtime only.
It is worth noticing that phantom function relaxed the restriction posed by
previous memory trace oblivious type systems [58], which do not allow looping in
the secure context (i.e. within a secret conditional). The main difficulty in previous
systems was to quantify the numbers of loop iterations in the two branches of an
if-statement, and to enforce the two numbers to be the same. Phantom functions
remove the need of this analysis by executing both branches, with one branched
really executed, and the other executed phantomly. As long as an adversary is
unable to distinguish between a real execution from a phantom one, the secret
144
guard of the if-statement will not be leaked, even when loops are virtually present
(i.e. in a phantom function).
5.3 User-Facing Oblivious Programming Abstractions
Programming abstractions such as MapReduce and GraphLab have been pop-
ularized in the parallel computing domain. In particular, programs written for a
traditional sequential programming paradigm are difficult to parallelize automati-
cally by an optimizing compiler. These new paradigms are not only easy for users
to understand and program with, but also provide insights on the structure of the
problem, and facilitate parallelization in an automated manner.
In this section, we would like to take a similar approach towards oblivious
programming as well. The idea is to develop oblivious programming abstractions
that can be easily understood and consumed by non-specialist programmers, and our
compiler can compile programs into efficient oblivious algorithms. In comparison, if
these programs were written in a traditional imperative-style programming language
like C, compile-time optimizations would have been much more limited.
5.3.1 MapReduce Programming Abstractions
An interesting observation is that “parallelism facilitates obliviousness” [24,
36]. If a program (or part of a program) can be efficiently expressed in parallel
programming paradigms such as MapReduce and GraphLab [2, 64] (with a few ad-
ditional constraints), there is an efficient oblivious algorithm to compute this task.
145
We stress that in this paper, we consider MapReduce merely as a programming
abstraction that facilitates obliviousness – in reality we compile MapReduce pro-
grams to sequential implementations that runs on a single thread. Parallelizing the
algorithms is outside the scope of this work.
Background: Oblivious algorithms for streaming MapReduce. A streaming
MapReduce program consists of two basic operations, map and reduce.
• The map operation: takes an array denoted {αi}i∈[n] where each αi ∈ D for
some domain D, and a function mapper : D → K × V . Now map would
apply (ki, vi) := mapper(αi) to each αi, and output an array of key-value pairs
{(ki, vi)}i∈[n].
• The reduce operation: takes in an array of key-value pairs denoted {(ki, vi)}i∈[n]
and a function reducer : K×V2 → V . For every unique key k value in this ar-
ray, let (k, vi1), (k, vi2), . . . (k, vim) denote all occurrences with the key k. Now
the reduce operation applies the following operation in a streaming fashion:
Rk := reducer(k, . . . reducer(k, reducer(k, vi1 , vi2), vi3), . . . , vim)
The result of the reduce operation is an array consisting of a pair (k,Rk) for
every unique k value in the input array.
Goodrich and Mitzenmacher [36] observe that any program written in a stream-
ing MapReduce abstraction can be converted to efficient oblivious algorithms, and
they leverage this observation to aid the construction of an ORAM scheme.
146
1 Pair<K,V>[public n] MapReduce@m@n<I,K,V>
2 (I[public m] data, Pair<K, V> map(I),
3 V reduce(K, V, V), V initialVal,
4 int2 cmp(K, K)) {
5 public int32 i;
6 Pair<K, V>[public m] d2;
7 for (i=0; i<m; i=i+1)
8 d2[i] = map(data[i]);
9 sort@m<K, V>(d2, 1, cmp);
10 K key = d2[0].k;
11 V val = initialVal;
12 Pair<int1, Pair<K, V>>[public m] res;
13 for (i=0; i+1<m; i=i+1) {
14 res[i].v.k = key;
15 res[i].v.v = val;
16 if (cmp(key, d2[i+1].k)==0) {
17 res[i].k.val = 1;
18 } else {
19 res[i].k.val = 0;
20 key = d2[i+1].k;
21 val = initialVal;
22 }
23 val = reduce(key, val, d2[i+1].v);
24 }
25 res[m-1].k.val = 0;
26 res[m-1].v.k = key;
27 res[m-1].v.v = val;
28 sort@m<int1, Pair<K, V>>
29 (res, 1, zeroOneCmp);
30 Pair<K, V>[public n] top;
31 for (i=0; i < n; i = i + 1)
32 top[i] = res[i].v;
33 return top;
34 }
Figure 5.1: Streaming MapReduce in ObliVM-Lang. See Section 5.3.1 for obliv-
ious algorithms for the streaming MapReduce paradigm [36].
147
• The map operation is inherently oblivious, and can be done by making a linear
scan over the input array.
• The reduce operation can be made oblivious through an oblivious sorting
(denoted o-sort) primitive.
– First, o-sort the input array in ascending order of the key, such that all
pairs with the same key are grouped together.
– Next, in a single linear scan, apply the reducer function: i) If this is the
last key-value pair for some key k, write down the result of the aggregation
(k,Rk). ii) Else, write down a dummy entry ⊥.
– Finally, o-sort all the resulting entries to move ⊥ to the end.
Providing the streaming MapReduce abstraction in ObliVM. It is easy to
implement the streaming MapReduce abstraction as a library in our source language
ObliVM-Lang. The ObliVM-Lang implementation of streaming MapReduce paradigm
is provided in Figure 5.1.
MapReduce has two generic constants, m and n, to represent the sizes of the
input and output respectively. It also has three generic types, I for inputs’ type, K,
for output keys’ type, and V, for output values’ type. All of these three types are
assumed to be secret.
It takes five inputs, data for the input data, map for the mapper, reduce for
the reducer, initialVal for the initial value for the reducer, and cmp to compare
two keys of type K.
148
Lines 6-10 are the mapper phase of the algorithm, then line 11 uses the func-
tion sort to sort the intermediate results based on their keys. After line 11, the
intermediate results with the same key are grouped together, and line 12-29 pro-
duce the output of the reduce phase with some dummy outputs. Finally, lines 30-35
use oblivious sort again to eliminate those dummy outputs, and eventually line 36
returns the final results.
Notice that in these functions, there are three arrays, data, d2, and res.
The program declares all of them to have only public access pattern, because they
are accessed by either a sequential scan, or an oblivious sorting. In this case, the
compiler will not place these arrays into ORAM banks.
Using MapReduce. Figure 5.1 needs to be written by an expert developer only
once. From then on, an end user can make use of this programming abstraction.
We further illustrate how to use the above MapReduce program to implement
a histogram. In SCVM (Chapter 4), a histogram program is as below.
for (public int i=0; i<n; ++i) c[i] = 0;
for (public int i=0; i<m; ++i) c[a[i]] ++;
This program counts the frequency of each values in [0..n − 1] in the array a of
size m. Since the program makes dynamic memory accesses, the SCVM compiler
would decide to put the array c inside an ORAM.
An end user can write the same program using a simple MapReduce abstrac-
tion as follows. Our ObliVM-Lang compiler would generate target code that relies
149
on oblivious sorting primitives rather than generic ORAM, improving the perfor-
mance by a logarithmic factor in comparison with the SCVM implementation. In
Section 5.5, we show that the practical performance gain ranges from 10× to 400×.
int2 cmp(int32 x, int32 y) {
int2 r = 0;
if (x < y) r = -1;
else if (x > y) r = 1;
return r;
}
Pair<int32, int32> mapper(int32 x) {
return Pair<int32, int32>(x, 1);
}
int32 reducer(int32 k, int32 v1, int32 v2) {
return v1 + v2;
}
The top-level program can launch the computation using
c=MapReduce@m@n<int32, int32, int32>(a, mapper, reducer, cmp, 0);
5.3.2 Programming Abstractions for Data Structures
We now explain how to provide programming abstractions for a class of pointer-
based oblivious data structures described by Wang et al. [92]. Figure 5.2 gives an
example, where an expert programmer provides library support (Figure 5.3) for im-
plementing a class of pointer-based data structures such that a non-specialist pro-
grammer can implement data structures which will be compiled to efficient oblivious
algorithms that outperform generic ORAM. We stress that while we give a stack
example for simplicity, this paradigm is also applicable to other pointer-based data
structures, such as AVL tree, heap, and queue.
150
1 struct StackNode@m<T> {
2 Pointer@m next;
3 T data;
4 };
5 struct Stack@m<T> {
6 Pointer@m top;
7 SecStore@m store;
8 };
9 phantom void Stack@m<T>.push(T data) {
10 StackNode@m<T> node = StackNode@m<T> (
top, data);
11 this.top = this.store.allocate();
12 store.add(top.(index, pos), node);
13 }
14 phantom T Stack@m<T>.pop() {
15 StackNode@m<T> res = store
.readAndRemove(top.(index, pos));
16 top = res.next;
17 return res.data;
18 }
Figure 5.2: Oblivious stack by non-specialist programmers.
Implementing oblivious data structure abstractions in ObliVM. We assume
that the reader is familiar with the oblivious data structure algorithmic techniques
described by Wang et al. [92]. To support efficient data structure implementations,
an expert programmer implements two important objects (see Figure 5.3):
• A Pointer object stores two important pieces of information: an index vari-
able that stores the logical identifier of the memory block pointed to (each
memory block has a globally unique index); and a pos variable that stores
the random leaf label in the ORAM tree of the memory block.
• A SecStore object essentially implements an ORAM, and provides the follow-
ing member functions to an end-user: The SecStore.remove function essen-
151
1 rnd@m RND(public int32 m) = native lib.rand;
2 struct Pointer@m {
3 int32 index;
4 rnd@m pos;
5 };
6 struct SecStore@m<T> {
7 CircuitORAM@m<T> oram;
8 int32 cnt;
9 };
10 phantom void SecStore@m<T>.add(int32 index,
int@m pos, T data) {
11 oram.add(index, pos, data);
12 }
13 phantom T SecStore@m<T>
.readAndRemove(int32 index, rnd@m pos) {
14 return oram.readAndRemove(index, pos);
15 }
16 phantom Pointer@m SecStore@m<T>.allocate() {
17 cnt = cnt + 1;
18 return Pointer@m(cnt, RND(m));
19 }
Figure 5.3: Code by expert programmers to help non-specialists implement oblivious
stack.
tially is a syntactic sugar for the ORAM’s readAndRemove interface [80, 90],
and the SecStore.add function is a syntactic sugar for the ORAM’s Add inter-
face [80,90]. Finally, the SecStore.allocate function returns a new Pointer
object to the caller. This new Pointer object is assigned a globally unique
logical identifier (using a counter cnt that is incremented each time), and a
fresh random chosen leaf label RND(m).
Stack implementation by a non-specialist programmer. Given abstractions
provided by the expert programmer, a non-specialist programmer can now imple-
ment a class of data structures such as stack, queue, heap, AVL Tree, etc. Figure 5.2
gives a stack example.
152
Role of affine type system. We use Figure 5.3 as an example to illustrate how our
rnd types with their affine type system can ensure security. As mentioned earlier,
rnd types have an affine type system. This means that each rnd can be declassified
(i.e., made public) at most once. In Figure 5.3, the oram.readAndRemove call will
declassify its argument rnd@m pos inside the implementation of the function body.
From an algorithms perspective, this is because the leaf label pos will be revealed
during the readAndRemove operation, incurring a memory trace where the value
rnd@m pos will be observable by the adversary.
5.3.3 Loop Coalescing and New Oblivious Graph Algorithms
We introduce a new programming abstraction called loop coalescing, and show
how this programming abstraction allowed us to design novel oblivious graph al-
gorithms such as Dijkstra’s shortest path and minimum spanning tree for sparse
graphs. Loop coalescing is non-trivial to embed as a library in ObliVM-Lang. We
therefore support this programming abstraction by introducing special syntax and
modifications to our compiler. Specifically, we introduce a new syntax called bounded-
for loop as shown in Figure 5.4. For succinctness, in this section, we will present
pseudo-code.
In the program in Figure 5.4, the bwhile(n) and bwhile(m) syntax at Lines 1
and 3 indicate that the outer loop will be executed for a total of n times, whereas
the inner loop will be executed for a total of m times – over all iterations of the
outer loop.
153
1 bwhile(n) (; u<n;) {
2      total = total + 1;
3      i=s[u];
4      bwhile (m) (i<s[u+1]) {
5          // do something
6          i=i+1;
7      }
8      u=u+1;
9 }
bwhile(n) (; u<n;) {
total = total + 1;
i=s[u];
bwhile (m) (i<s[u+1]) {
// do something
i=i+1;
}
u=u+1;
}
Block 1 × 𝑛
Block 2 ×𝑚
Block 3 × 𝑛
state = (u<n) ? 1 : -1;
for (__itr=0; __itr<n+m+n; __itr++) {
if (state==1) {   total=total+1; i=s[u];
state = (i<s[u+1]) ? 2 : 3
} else if (state==2) { // do something
i=i+1; state = (i<s[u+1]) ? 2 : 3
} else if (state==3) {
u=u+1; state = (u<n) ? 1 : -1
} // else execution is finished
}
𝑛 +𝑚 + 𝑛
iterations in total
Figure 5.4: Loop coalescing. The outer loop will be executed at most n times in
total, the inner loop will be executed at most m times in total – over all iterations
of the outer loop. A naive approach compiler would pad the outer and inner loop to
n and m respectively, incurring O(nm) cost. Our loop coalescing technique achieves
O(n+m) cost instead.
Algorithms Our Complexity Best Known
Dijkstra’s Algorithm
O((E + V ) log2 V ) O((E + V ) log3 V ) (Generic ORAM baseline [90])
(Sparse Graph)
Prim’s Algorithm
O((E + V ) log2 V )
O(E log
3 V
log log V ) for E = O(V log
γ V ), γ ≥ 0 [37]
(Sparse Graph) O(E log
3 V
logδ V
) for E = O(V 2log
δ V ), δ ∈ (0, 1) [37]
O(E log2 V ) for E = Ω(V 1+),  ∈ (0, 1] [37]
Table 5.1: Summary of algorithmic results. All costs reported are in terms of
circuit size. The asymptotic notation omits the bit-length of each word for sim-
plicity. Our oblivious Dijkstra’s algorithm and oblivious Prim’s algorithm can be
composed using our novel loop coalescing programming abstraction and oblivious
data structures.
154
Algorithm 1 Dijkstra’ algorithm with bounded for
Secret Input: s: the source node
Secret Input: e: concatenation of adjacency lists stored in a single ORAM array.
Each vertex’s neighbors are stored adjacent to each other.
Secret Input: s[u]: sum of out-degree over vertices from 1 to u.
Output: dis: the shortest distance from source to each node
1: dis := [∞,∞, ...,∞]
2: PQ.push(0, s)
3: dis[s] := 0
4: bwhile(V)(!PQ.empty())
5: (dist, u) := PQ.deleteMin()
6: if(dis[u] == dist) then
7: dis[u] := −dis[u];
8: bfor(E)(i := s[u]; i < s[u + 1]; i = i + 1)
9: (u, v, w) := e[i];
10: newDist := dist + w
11: if (newDist < dis[v]) then
12: dis[v] := newDist
13: PQ.insert(newDist, u)
To deal with loop coalescing, the compiler partitions the code within an
bounded-loop into code blocks, each of which will branch at the end. The num-
ber of execution times for each code block will be computed as the bound number
for the inner most bounded-loop that contains the code block. Then the compiler
will transform a bounded loop into a normal loop, whose body simulates a state
machine that each state contains a code block, and the branching statement at the
end of each code block will be translated into an assignment statement that moves
the state machine into a next state. The total number of iterations of the emitted
normal loop is the summation of the execution times for all code blocks. Figure 5.4
illustrates this compilation process.
We now show how this loop coalescing technique leads to new novel oblivious
graph algorithms.
155
Algorithm 2 Oblivious Dijkstra’ algorithm
Secret Input: e, s: same as Algorithm 1
Output: dis: the shortest distance from s to each node
1: dis := [∞,∞, ...,∞]; dis[source] = 0
2: PQ.push(0, s); innerLoop := false
3: for i := 0→ 2V + E do
4: if not innerLoop then
5: (dist, u) := PQ.deleteMin()
6: if dis[u] == dist then
7: dis[u] := −dis[u]; i := s[u]
8: innerloop := true;
9: end if
10: else
11: if i < s[u + 1] then
12: (u, v, w):= e[i]
13: newDist := dist + w
14: if newDist < dis[u] then
15: dis[u] := newDist
16: PQ.insert(newDist, v′)
17: end if
18: i = i + 1
19: else
20: innerloop := false;
21: end if
22: end if
23: end for
Oblivious Dijkstra shortest path for sparse graphs. It is an open problem
how to compute single source shortest path (SSSP) obliviously for sparse graphs
more efficiently than generic ORAM approaches. Blanton et al. [12] designed a
solution for a dense graph, but their technique cannot be applied when the graph
is sparse.
Recall that the priority-queue-based Dijkstra’s algorithm has to update the
weight whenever a shorter path is found to any vertex. In an oblivious version of
Dijkstra’s, this operation dominates the overhead, as it is unclear how to realize
it more efficiently than using generic ORAMs. Our solution to oblivious SSSP is
156
more efficient thanks to (1) avoiding this weight update operation, and (2) a loop
coalescing technique.
Avoiding weights updating. This is accomplished by two changes to a standard
priority-queue-based Dijkstra’s algorithm, i.e., lines 6-7 and line 12 in Algorithm 1.
The basic idea is, whenever a shorter distance newDist from s to a vertex u is
found, instead of updating the existing weight of u in the heap, we insert a new pair
(newDis, u) into the priority queue. This change can result in multiple entries for
the same vertex in the queue, leading to two concerns: (1) the size of the priority
queue cannot be bounded by V ; and (2) the same vertex might be popped and
processed multiple times from the queue. Regarding the first concern, we note the
size of the queue can be bounded by E = O(V 2) (since E = o(V 2) for sparse
graphs). Hence, each priority queue insert and deleteMin operation can still be
implemented obliviously in O(log2 V ) [92]. The second concern is resolved by the
check in lines 6-7: every vertex will be processed at most once because dis[v] will
be set negative once vertex v is processed.
Loop coalescing. In Algorithm 1, the two nested loops (line 4 and line 8) use
secret data as guards. In order not to leak the secret loop guards, a naive approach is
to iterate each loop a maximal number of times (i.e., V +E, as V alone is considered
secret).
Using our loop coalescing technique, we can derive an oblivious Dijkstra’s
algorithm that asymptotically outperforms a generic ORAM baseline for sparse
graphs. The resulting oblivious algorithm is described in Algorithm 2. Note that at
most V vertices and E edges will be visited, we coalesce the two loops into a single
157
one. The code uses a state variable innerloop to indicate whether a vertex or an
edge is being processed. Each iteration deals with one of a vertex (lines 5-8), an edge
(lines 15-18), or the end of a vertex’s edges (line 13). So there are 2V +E iterations
in total. Note the ObliVM-Lang compiler will pad the if-branches in Algorithm 2
to ensure obliviousness. Further, an oblivious priority queue is employed for PQ.
Cost analysis. In Algorithm 2, each iteration of the loop (lines 3-18) makes
a constant number of ORAM accesses and two priority queue primitives (insert
and deleteMin, both implemented in O(log2 V ) time). So, the total runtime is
O((V + E) log2 V ).
Additional algorithmic results. Summarized in Table 5.1, our loop coalescing
technique also immediately gives a new oblivious Minimum Spanning Tree (MST)
algorithm whose full description is omitted.
5.4 Implementing Rich Circuit Libraries
5.4.1 Case Study: Basic Arithmetic Operations
The rich language features provided by ObliVM-Lang make it possible to im-
plement complex arithmetic operations easily and efficiently. We give a case study
to demonstrate how to use ObliVM-Lang to implement Karatsuba multiplication.
Implementing Karatsuba multiplication. Figure 5.5 contains the implemen-
tation of Karatsuba multiplication [50] in ObliVM-Lang. Karatsuba multiplication
implements the following recursive algorithm to compute multiplication of two n
bit numbers, x and y, taking O(nlog2 3) amount of time. As a quick overview, the
158
1 int@(2 ∗ n) karatsubaMult@n(
int@n x, int@n y) {
2 int@2 ∗ n ret;
3 if (n < 18) {
4 ret = x*y;
5 } else {
6 int@(n− n/2) a = x$n/2~n$;
7 int@(n/2) b = x$0~n/2$;
8 int@(n− n/2) c = y$n/2~n$;
9 int@(n/2) d = y$0~n/2$;
10 int@(2 ∗ (n− n/2)) t1 =
karatsubaMult@(n− n/2)(a, c);
11 int@(2 ∗ (n/2)) t2 =
karatsubaMult@(n/2)(b, d);
12 int@(n− n/2 + 1) aPb = a + b;
13 int@(n− n/2 + 1) cPd = c + d;
14 int@(2 ∗ (n− n/2 + 1)) t3 =
karatsubaMult@(n− n/2 + 1)(aPb, cPd);
15 int@(2 ∗ n) padt1 = t1;
16 int@(2 ∗ n) padt2 = t2;
17 int@(2 ∗ n) padt3 = t3;
18 ret = (padt1<<(n/2*2)) + padt2 +
((padt3 - padt1 - padt2)<<(n/2));
19 }
20 return ret;
21 }
Figure 5.5: Karatsuba multiplication in ObliVM-Lang.
algorithm works as follows. First, express the n-bit integers x and y as the concate-
nation of n
2
-bit integers: x = a*2n/2+b, y = c*2n/2+d. Now, x*y can be calculated
as follows:
t1 = a*c; t2 = b*d; t3 = (a+b)*(c+d);
x*y = t1<<n + t2 + (t3-t1-t2)<<(n/2);
where the multiplications a*c and b*d are implemented through a recursive call to
the Karatsuba algorithm itself (until the bit-length is small enough).
To implement Karatsuba efficiently, we need to perform operations on a subset
of bits. We hence introduce the following syntactic sugar in ObliVM-Lang: In lines
159
1 #define BUCSIZE 3
2 #define STASHSIZE 33
3 struct Block@n<T>{
4 int@n id, pos;
5 T data;
6 };
7 struct CircuitOram@n<T>{
8 dummy Block@n<T>[public 1<<n+1]
[public BUCSIZE] buckets;
9 dummy Block@n<T>[public STASHSIZE] stash;
10 };
Figure 5.6: Part of our Circuit ORAM implementation (Type Definition)
in ObliVM-Lang.
6 to 9 of Figure 5.5, the syntax num$i~j$ means extracting the part of integer num
from i-th bit to j-th bit.
5.4.2 Case Study: Circuit ORAM
In Figure 5.7, we show part of the Circuit ORAM implementation using
ObliVM-Lang. Line 3 to line 6 is the definition of a ORAM block containing two
metadata fields of an index of type int, and a position label of type rnd, along with
a data field of type <T>.
Circuit ORAM (line 7-10) is organized to contain an array of buckets (i.e.
arrays of ORAM blocks), and a stash (i.e. an array of blocks). The dummy keyword
in front of Block@n<T> indicates the value of this type can be null. In many cases,
(e.g. Circuit ORAM implementation), using dummy keyword leads to a more efficient
circuit generation.
Line 11-30 demonstrates how readAndRemove can be implemented. Taking
the input of an secret integer index id, and a random position label pos, the label
160
11 phantom T CircuitOram@n<T>
.readAndRemove(int@n id, rnd@n pos) {
12 public int32 pubPos = pos;
13 public int32 i = (1 << n) + pubPos;
14 T res;
15 for (public int32 k = n; k>=0; k=k-1) {
16 for (public int32 j=0;j<BUCSIZE;j=j+1)
17 if (buckets[i][j] != null &&
18 buckets[i][j].id == id){
19 res = buckets[i][j].data;
20 buckets[i][j] = null;
21 }
22 i=(i-1)/2;
23 }
24 for (public int32 i=0;i<STASHSIZE;i=i+1)
25 if (stash[i]!=null&&stash[i].id==id) {
26 res = stash[i].data;
27 stash[i] = null;
28 }
29 return res;
30 }
Figure 5.7: Part of our Circuit ORAM implementation (ReadAndRemove)
in ObliVM-Lang.
pos is first declassified into public. Then affine type system allows declassifying pos
once, i.e. pos is never used for the rest of the program. Further in a function calling
readAndRemove with inputs arg1 and arg2, arg2 cannot be used either for the rest
of the program. This is crucial to enforce that every position labels will use revealed
only once after its generation, and, to our best knowledge, no prior work enables
such an enforcement in a compiler.
This Circuit ORAM implementation can be type-checked by ObliVM-Lang’s
extended type checker, which gives users stronger confidence that the implementa-
tion does not leak information through its execution traces.
161
5.5 Evaluation
5.5.1 Back End Implementation
Our compiler emits code to a Java-based secure computation back end called
ObliVM-SC. ObliVM-SC is designed to be extensible through a central notion called
computational environments. Conceptually, our compiler emits circuit designs; whereas
a computation environment decides how a circuit design, namely, how each AND and
XOR gate will be executed. In other words, computational environments provide a
separation between circuit designs and their executions, allowing circuit gadgets to
be potentially reusable for multiple cryptographic protocols, such as Garbled Cir-
cuit [95] or GMW [34]. Currently, ObliVM provides a Garbled Circuit protocol with
semi-honest security. However, adapting a circuit design to a different protocol such
as GMW would simply require changing to an alternative computation environment,
and does not involve modification of the compiler.
5.5.2 Metrics and Experiment Setup
Number of AND gates. In Garbled Circuit-based secure computation, func-
tions are represented in boolean circuits consisting of XOR and AND gates. Due
to well-known Free XOR techniques [8, 17, 53], the cost of evaluating XOR gates
are insignificant in comparison with AND gates. Therefore, a primary performance
metric is the number of AND gates. This metric is platform independent, i.e., inde-
pendent of the artifacts of the underlying software implementation, or the hardware
162
configurations where the benchmark numbers are measured. This metric facilitates
a fair comparison with existing works based on boolean circuits, and is one of the
most popular metrics used in earlier works [44,46,54,55,59,71,72,91,92].
Wall-clock runtime. Unless noted otherwise, all wall-clock numbers are measured
by executing the protocols between two Amazon EC2 machines of types c4.8xlarge
and c3.8xlarge. This metric is platform and implementation dependent, and there-
fore we will explain how to best interpret wallclock runtimes, and how these runtimes
will be affected by the underlying software and hardware configurations.
Compilation time. For all programs we ran, the compilation time is under 1 sec-
ond. Therefore, we do not separately report the compilation time for each program.
5.5.3 Comparison with Previous Automated Approaches
The first general-purpose secure computation system, Fairplay, was built in
2004 [65]. Since then, several improved systems were built [13, 43, 44, 46, 54, 55, 98].
Except for our prior work SCVM, existing systems provide no support for ORAM –
and therefore, each dynamic memory access would be compiled to a linear scan of
memory.
We now evaluate the speedup ObliVM achieves relative to previous approaches.
To illustrate the sources of the speedup, we consider the following sequence of pro-
gressive baselines. We start from Baseline 1 which is representative of a state-of-
the-art automated secure computation system. We then add one feature at a time
to the baseline, resulting in the next baseline, until we arrive at Baseline 5 which is
163
Oblivious
programming
abstractions
and compiler
optimizations
demonstrated
Parameters for
Figure 5.8
Parameters for
Table 5.3 and
Table 5.4
Dijkstra’s Loop coalescing
abstraction (see
Section 5.3.3).
V = 214, E = 3V V = 210, E = 3VAlgorithm
MST
Heap Oblivious data
structure
abstraction (see
Section 5.3.2).
N = 227,K =
32, V = 480
N = 223,K = 32, V =
992
Map/Set
Binary Search N = 223,K =
32, V = 992
AMS Sketch
Compile-time
optimizations:
split data into
separate
ORAMs [59].
 = 6× 10−5, δ =
2−20
 = 2.4× 10−4, δ = 2−20
Count Min Sketch  = 3× 10−6, δ = 2−20
K-Means
MapReduce
abstraction (see
Section 5.3.1).
N = 218 N = 216
Table 5.2: List of applications used in Figures 5.8. For graph algorithms,
V,E stand for number of vertices and edges; for data structures, N,K, V stand for
capacity, bit-length of key and bit-length of value; for streaming algorithms, , δ
stand for relative error and failure probability; for K-Means, N stands for number
of points.
164
essentially our ObliVM system.
• Baseline 1: A state-of-the-art automated system with no ORAM
support. Baseline 1 is intended to characterize a state-of-the-art automated
secure computation system with no ORAM support. We assume a compiler
that can detect public memory accesses (whose addresses are statically in-
ferrable), and directly make such memory accesses. For each each dynamic
memory access (whose address depends on secret inputs), a linear scan of
memory is employed. Baseline 1 is effectively a lower-bound estimate of the
cost incurred by CMBC-GC [44], a state-of-the-art system in 2012.
• Baseline 2: With GO-ORAM [35]. In Baseline 2, we implement the GO-
ORAM scheme on top of Baseline 1. Dynamic memory accesses made by a
program will be compiled to GO-ORAM accesses. We make no additional
compile-time optimizations.
• Baseline 3: With Circuit ORAM [90]. Baseline 3 is essentially the same
as Baseline 2 except that we now replace the ORAM scheme with a state-of-
the-art Circuit ORAM scheme [90].
• Baseline 4: Language and compiler. Baseline 4 assumes that the ObliVM
language and compiler are additionally employed (on top of Baseline 3), re-
sulting in additional savings due to our compile-time optimizations as well as
our oblivious programming abstractions.
• Baseline 5: Back end optimizations. In Baseline 5, we employ additional
165
Figure 5.8: Sources of speedup in comparison with state-of-the-art in
2012 [44]: an in-depth look.
back end optimizations atop Baseline 4. Baseline 5 reflects the performance
of the actual ObliVM system.
We consider a set of applications in our evaluation as described in Table 5.2.
We select several applications to showcase our oblivious programming abstractions,
including MapReduce, loop coalescing, and oblivious data structure abstractions.
For all applications, we choose moderately large data sizes ranging from 768KB to
10GB. For data structures (e.g., Heap, Map/Set) and binary search, for Baseline 1,
we assume that each operation (e.g., search, add, delete) is done with a single linear
scan. For Baseline 2 and 3, we assume that a typical sub-linear implementation is
adopted. For all other applications, we assume that Baseline 1 to 3 adopt the most
straightforward implementation of the algorithm.
Results. Figure 5.8 shows the speedup we achieve relative to a state-of-the-art
automated system that does not employ ORAM [44]. This speedup comes from the
following sources:
No ORAM to GO-ORAM: For most of the cases, the data size considered was
166
not big enough for GO-ORAM to be competitive to a linear-scan ORAM. The only
exception was AMS sketch, where we chose a large sketch size. In this case, using
GO-ORAM would result in a 300× speedup in comparison with no ORAM (i.e.,
linear-scan for each dynamic memory access). This part of the speedup is reflected
in purple in Figure 5.8. Here the speedup stems from a reduction in circuit size (as
measured by the number of AND gates).
Circuit ORAM: The red parts in Figure 5.8 reflect the multiplicative speedup
attained when we instead use Circuit ORAM (as opposed to no ORAM or GO-
ORAM, whichever is faster). This way, we achieve an additional 51× to 530 perfor-
mance gains – reflected by a reduction in the total circuit size.
Language and compiler: As reflected by the blue bars in Figure 5.8, our obliv-
ious programming abstractions and compile-time optimizations bring an additional
2× to 2500× performance savings on top of a generic Circuit ORAM-based ap-
proach. This speedup is also measurable in terms of reduction in the circuit size.
Back end optimizations: Our ObliVM-SC is a better architected and more
optimized version of its predecessor FastGC [46] which is employed by CMBC-
GC [44]. FastGC [46] reported a garbling speed of 96K AND gates/sec, whereas
ObliVM garbles at 670K AND gates/sec on a comparable machine. In total, we
achieve an 7× overall speedup compared with FastGC [46].
We stress, however, that ObliVM’s main contribution is not the back end im-
plementation. In fact, it would be faster to hook up ObliVM’s language and compiler
with a JustGarble-like system that employs a C-based implementation and hardware
167
AES-NI. However, presently JustGarble does not provide a fully working end-to-end
protocol. Therefore, it is an important direction of future work to extend JustGarble
to a fully working protocol, and integrate it into ObliVM.
Comparison with SCVM. In comparison with SCVM, ObliVM’s offers the follow-
ing new features: 1) new oblivious programming abstractions; 2) Circuit ORAM
implementation that is 20× to 30× times faster than SCVM’s binary-tree ORAM
implementation for 4MB to 4GB data sizes; and 3) ability to implement low-level
gadgets including the ORAM algorithm itself in the source language.
Since the design of efficient ORAM algorithms is mainly the contribution of
the Circuit ORAM paper [90], here we focus on evaluating the gains from program-
ming abstractions. Therefore, instead of comparing with SCVM per se, we compare
with SCVM + Circuit ORAM instead (i.e., SCVM with its ORAM implementation
updated to the latest Circuit ORAM).
5.5.4 ObliVM vs. Hand-Crafted Solutions
We show that ObliVM achieves competitive performance relative to hand-
crafted solutions for a wide class of common tasks. We also show that ObliVM
significantly reduces development effort in comparison with previous secure compu-
tation frameworks.
Competitive performance. For a set of applications, including Heap, Map/Set,
AMS Sketch, Count-Min Sketch, and K-Means, we compared implementations auto-
generated by ObliVM with implementations hand-crafted by human experts. Here
168
the human experts are authors of this paper. We assume that the human ex-
perts have wisdom of employing the most efficient, state-of-the-art oblivious al-
gorithms when designing circuits for these algorithms. For example, Histogram and
K-Means algorithms are implemented with oblivious sorting protocols instead of
generic ORAM. Heap and Map/Set employ state-of-the-art oblivious data structure
techniques [92]. The graph algorithms including Dijkstra and MST employ novel
oblivious algorithms proposed in this paper. In comparison, our ObliVM programs
for the same applications do not require special security expertise to create. The
programmer simply has to express these tasks in the programming abstractions
we offer whenever possible. Over the suite of application benchmarks we consider,
our ObliVM programs are competitive to hand-crafted implementations – and the
performance difference is only 0.5%− 2% throughout.
We remark that the hand-crafted circuits are not necessarily the optimal cir-
cuits for each computation task. However, they do represent asymptotically the best
known algorithms (or new algorithms that are direct implications of this paper). It
is conceivable that circuit optimization techniques such as those proposed in Tiny-
Garble [82] can further reduce circuit sizes by a small constant factor (e.g., 50%).
We leave this part as an interesting direction of future research.
Developer effort. We use two concrete case studies to demonstrate the significant
reduction of developer effort enabled by ObliVM.
Case study: ridge regression. Ridge regression [41] takes as input a large num-
ber of data points and finds the best-fit linear curve for these points. The algorithm
169
is an important building block in various machine-learning tasks [72]. Previously,
Nikolaenko et al. [72] developed a system to securely evaluate ridge regression, using
the FastGC framework [46], which took them roughly three weeks [5]. In contrast,
we spent two student·hours to accomplish the same task using ObliVM. In addi-
tion to the speedup gain from ObliVM-SC back end, our optimized libraries result
in 3× smaller circuits with aligned parameters. We defer the detailed comparison
to the online technical report [61].
Case study: oblivious data structures. Oblivious AVL tree (i.e, the Map/Set
data structure) is an example algorithm that was previously too complex to program
as circuits, but now becomes very easy with ObliVM. In an earlier work [92], we
designed an oblivious AVL tree algorithm, but were unable to implement it due to
high programming complexity. Now, with ObliVM, we implement an AVL tree with
311 lines of code in ObliVM-Lang, consuming under 10 student·hours (including the
implementation as well as debugging).
We stress that it is not possible to implement oblivious AVL tree in previous
languages for secure computation, including the state-of-the-art Wysteria [75].
5.5.5 End-to-End Application Performance
Currently in ObliVM-SC, we implemented a standard garbling scheme with
Garbled Row Reduction [69] and FreeXOR [53]. We also implemented an OT ex-
tension protocol proposed by Ishai et al. [48] and a basic OT protocol by Naor and
Pinkas [68].
170
Program Input size
CMBC-GC ObliVM Framework ObliVM + JustGarble
(estimate) (estimate)
#AND Total #AND Total Online Total Online
gates time gates time time time time
Basic instructions
Integer addition 1024 bits 2977 31ms 1024 1.7ms 0.6ms 0.12ms 0.05ms
Integer mult. 1024 bits 6.4M 66.4s 572K 833ms 274ms 69.4ms 28.9ms
Integer Comparison 16384 bits 32K 335.7ms 16384 26ms 8.58ms 1.96ms 0.82ms
Floating point addition 64 bits 10K 104ms 3035 4.32ms 1.45ms 0.36ms 0.15ms
Floating point mult. 64 bits 10K 104ms 4312 6.29ms 2.02ms 0.52ms 0.22ms
Hamming distance 1600 bits 30K 310ms 3200 5.07ms 1.71ms 0.39ms 0.16ms
Linear or super-linear algorithms
K-Means 0.5MB 550B 66d 2269M 62.1min 23.6min 4.58min 1.9min
Dijkstra’s Algorithm 48KB 755B 91d 10B 12.6h 3.09h 20.4min 8.5min
MST 48KB 755B 91d 9.6B 12.4h 3h 19.6min 8.2min
Histogram 0.25MB 137B 16.5d 866M 21.5min 8.56min 1.7min 42.5s
Sublinear-time algorithms
Heap 1GB 32B 3.9d 12.5M 59.3s 10.42s 1.5s 625ms
Map/Set 1GB 32B 3.9d 23.9M 117.2s 20.67s 2.9s 1.2s
Binary Search 1GB 32B 3.9d 1562K 7.36s 1.34s 189ms 78.8ms
Count Min Sketch 0.31GB 9.9B 30.8h 8088K 20.77s 6.4s 0.98s 0.41s
AMS Sketch 1.25GB 40B 5.18d 9949K 36.76s 9.95s 1.21s 504ms
Table 5.3: Application performance. Actual measured numbers are in bold.
The remainder are estimated numbers and should be interpreted with care. ObliVM
numbers for basic instructions and sublinear-time algorithms are the mean of 20
runs. Since for all these applications, our measurements have small spread (all runs
are within 6% from the mean), we use a single run for linear-time and super-linear
algorithms (the same for Table 5.4).
171
Setup. For evaluation, here we consider a scenario where a client secret shares its
data between two non-colluding cloud providers a priori. For cases where inputs are
a large dataset (e.g., Heap, Map/Set, etc), depending on the application, the client
may sometimes need to place the inputs in an ORAM, and secret-share the resulting
ORAM among the two cloud providers. We do not measure this setup cost in our
evaluation – this cost can depend highly on the available bandwidth between the
client and the two cloud providers. Therefore, our evaluation begins assuming this
one-time setup has completed.
End-to-end application performance. In Table 5.3, we consider three types of
applications, basic instructions (e.g., addition, multiplication, and floating point op-
erations); linear or super-linear algorithms (e.g., Dijkstra, K-Means, Minimum Span-
ning Tree, and Histogram); and sublinear-time algorithms (e.g., Heap, Map/Set,
Binary Search, Count Min Sketch, AMS Sketch). We report the circuit size, online
and total costs for a variety of applications at typical data sizes.
In Table 5.3, we also compare ObliVM with a state-of-the-art automated se-
cure computation system CMBC-GC [44]. We note that the authors of CMBC-GC
did not run all of these application benchmarks, so we project the performance of
CMBC-GC using the following estimate: we first change our compiler to adopt a
linear scan of memory upon dynamic memory accesses – this allows us to obtain an
estimate of the circuit size CMBC-GC would have obtained for the same applica-
tions. For the set of application benchmarks (e.g., K-Means, MST, etc) CMBC-GC
did report in their paper, we confirmed that our circuit size estimates are always a
lower bound of what CMBC-GC reported. We then estimate the runtime of CMBC-
172
GC based on their reported 96K AND gates per sec – assuming that a network
bandwidth of at least 2.8MBps is provisioned.
As mentioned earlier, the focus of this paper is our language and compiler, not
the back end cryptographic implementation. It should be relatively easy to integrate
our language and compiler with a JustGarble-like back end that employs hardware
AES-NI. In Table 5.3, we also give an estimate of the performance we anticipate
if we ran our ObliVM-generated circuits over a JustGarble-like back end. This is
calculated using our circuit sizes and the 11M AND gates/sec performance number
reported by JustGarble [11].
• Online cost. To measure online cost, we assume that all work that is in-
dependent of input data is performed offline, including garbling and input-
independent OT preprocessing. Our present ObliVM implementation achieves
an online speed of 1.8M gates/sec consuming roughly 54MBps network band-
width.
• Offline cost. When no work is deferred to an offline phase, ObliVM achieves a
garbling speed of 670K gates/sec consuming 19MBps network bandwidth.
Slowdown relative to a non-secure baseline. For completeness, we now de-
scribe ObliVM’s slowdown in comparison with a non-secure baseline where compu-
tation is performed in cleartext. As shown in Table 5.4, our slowdown relative to a
non-secure baseline is application dependent, and ranges from 45× to 9.3 × 106×.
We also present the estimated slowdown if a JustGarble-like back end is used for
ObliVM-generated circuits. These numbers are estimated based on our circuit sizes
173
Task
Cleartext ObliVM ObliVM+JustGB (estimate)
Time Runtime Slowdown Runtime Slowdown
K-Means (Online) 0.4ms 24min 3.6× 106 1.9min 2.9× 105
K-Means (Total) 0.4ms 62min 9.3× 106 4.58min 6.9× 105
Distributed GWAS (Online) 40ms 1.8s 45 0.14s 3.5
Distributed GWAS (Total) 40ms 5.2s 130 0.28s 7
Binary Search (Online) 10µs 1.3s 1.3× 105 78.8ms 7.9× 103
Binary Search (Total) 10µs 7.4s 7.4× 105 189ms 1.9× 104
AMS Sketch (Online) 80µs 9.5s 1.2× 105 0.5s 6.3× 103
AMS Sketch (Total) 80µs 36.8s 4.6× 105 1.2s 1.5× 104
Hamming (Online) 0.3µs 1.71ms 6× 103 0.16ms 5.3× 102
Hamming (Total) 0.3µs 5.07ms 1.7× 104 0.39ms 1.3× 103
Table 5.4: Slowdown of secure computation compared with non-secure,
cleartext computation. Parameter choices are the same as Table 5.3. Online
cost only includes operations that are input-dependent. All time measurements as-
sume data are pre-loaded to the memory. ObliVM requires a bandwidth of 19MBps.
Numbers for JustGarble are estimated using ObliVM-generated circuit sizes assum-
ing 315MBps bandwidth.
as well as the reported 11M AND gates/sec performance metric reported by Just-
Garble [11].
In particular, we elaborate on the following interesting cases. First, the dis-
tributed genome-wide association study(GWAS) application is Task 1 in the iDash
secure genomic analysis competition [1], with total data size 380KB. This task
achieves a small slowdown, because part of the computation is done locally – specif-
ically, Alice and Bob each performs some local preprocessing to obtain the alle
frequencies of their own data, before engaging in a secure computation protocol
to compute χ2-statistics. For details, we refer the reader to our online short note
on how we implemented the competition tasks. On the other hand, benchmarks
with floating point operations such as K-Means incur a relatively larger slowdown
because modern processors have special floating point instructions which makes it
174
favorable to the insecure baseline.
5.6 Conclusion
We design ObliVM, a programming framework for automated secure computa-
tion. Additional examples can be found at our project website http://www.oblivm.
com, including popular streaming algorithms, graph algorithms, data structures, ma-
chine learning algorithms, secure genome analysis [1], etc.
175
Chapter 6: Conclusion Remarks and Future Directions
6.1 Summary
In this thesis, we investigate in a set of cloud-related security applications
in which programs’ execution traces may leak information. We propose principled
approaches to achieve performant trace oblivious program execution. In particular,
we exploit the intrinsic obliviousness within each program, so that expensive cryp-
tographic ORAM constructions and their overheads can be saved. Security type
systems are developed to enforce that our optimization does not violate privacy and
security requirements in the application domains.
Based on these principled methods, we build GhostRider, a hardware-software
co-designed system, as a hardware-based solution, and ObliVM, a RAM-model se-
cure computation framework, as a cryptography-based solution to mitigate attacks
from cloud’s insiders and intruders. Both systems demonstrate superior improve-
ments over previous ones by orders of magnitudes.
176
6.2 Future Direction
While this thesis greatly expanded the study in trace oblivious execution,
several future directions are promising.
6.2.1 Verifying Hardware ORAM Implementation
While we have demonstrated that ObliVM’s type system can help indentifying
bugs in Circuit ORAM implementations, it remains an interesting question whether
the obliviousness of ORAM algorithms can be verified automatically. An particu-
larly interesting and important direction is to verify a hardware ORAM controller,
such as in GhostRider, is implemented secure. Several challenges may arise during
this investigation. First, no static analysis-based approach has been developed to
verify the obliviousness of ORAM algorithms. The main challenge is to enforce ran-
dom numbes are handled correctly. Second, there is a gap between the languages
that we have been studying and popular hardware programming languages, such as
Verilog. This gap may introduce more technical difficulties to design such a verifier.
6.2.2 Parallel Trace Oblivious Execution
So far, we have focused on sequential programs. While parallel programs be-
come the new main fashion in applications such as big data and deep learning, it
is interesting and important to study how to achieve trace obliviousness for par-
allel programs. This is not easy. In particular, the design of concurrent ORAM
algorithms are still in its theoretical phase, and most existing concurrent ORAM
177
algorithms are not practically. Therefore, syntactic hints from the executed program
to exploit its obliviousness are more promising than relying on parallel ORAM algo-
rithms. In fact, many big-data programming frameworks, such as MapReduce [25],
already force programmers to express their computations into mappers and reduc-
ers, which are both able to be executed in parallel without leaking any information
through the execution traces. GraphSC [70] has made the first attempt to adopt
this idea to present a set of programming interfaces so that programs using these
interfaces can be turned into there parallel oblivious version automatically without
incurring too much overhead.
6.2.3 Differentially Privately Oblivious Execution
In this thesis, we focus on programs that leak absolutely no information
through their execution traces. Most practical programs, however, do not have
a counterpart satisfying this property: it is very easy for the program to include a
loop whose guard depends on some secret data. Therefore, it is interesting to seek
for a weaker version of trace obliviousness.
Privacy researches in recent ten years advocate for weaker privacy notions such
as differential privacy to be enforced in real applications. Therefore, it is interesting
to study whether trace obliviousness has a differentially private counterpart. If
related techniques can be developed, trace oblivious program execution will be more
practical in such applications where absolute trace obliviousness is unnecessary but
differential privacy is sufficient.
178
Appendix A: Proof of Theorem 1
A.1 Trace equivalence and lemmas
We shall further study some properties of trace equivalence. First of all, we
define the length of a trace t, denoted as |t| to be:
|t| =

1 if t = read(x, n) | write(x, n) | readarr(x, n, n′) |
writearr(x, n, n′)
0 if t = 
|t1|+ |t2| if t = t1@t2
(A.1)
Lemma 1. If t1 ≡ t2, then |t1| = |t2|.
Proof. Let us prove by induction on how t1 ≡ t2 is derived. If t1 = t2, then the
conclusion is obvious. If t1 = @t2, then |t1| = || + |t2| = |t2|. Similarly, we can
prove the conclusion when t1 = t2@, t2 = @t1, t2 = t1@, or t1 = @t and t2 = t@.
If t1 = t11@t12, t2 = t21@t22, t11 ≡ t21, and t12 ≡ t22, then by induction, we
have |t11| = |t21|, and |t12| = |t22|. Therefore |t1| = |t11|+ |t12| = |t21|+ |t22| = |t2|.
Finally, if t1 = (t
′
1@t
′
2)@t
′
3 and t2 = t
′
1@(t
′
2@t
′
3), then |t1| = |t′1@t′2| + |t′3| =
|t′1|+ |t′2|+ |t′3| = |t′1|+ |t′2@t′3| = |t2|.
179
Now, we define the i-th element in a trace, denoted t[i], as follows:
t[i] =

 if i ≤ 0 ∨ i > |t|
t if i = 1 ∧ t = read(x, n) | write(x, n) |
readarr(x, n, n′) | writearr(x, n, n′)
t1[i] if t = t1@t2 ∨ 1 ≤ i ≤ |t1|
t2[i− |t1]] if t = t1@t2 ∨ |t1| < i ≤ |t|
It is easy to see that if ∀i.t1[i] = t2[i] implies |t1| = |t2| by the following lemma.
Lemma 2. t[i] 6=  for all i such that 1 ≤ i ≤ |t|, and  otherwise.
Proof. The second part of the conclusion is trivial since it directly follows the defi-
nition. We prove the first part by induction on |t|. If |t |= 0, then the conclusion is
trivial.
If |t| = 1, and 1 ≤ i ≤ |t|, then i must be 1. Therefore, t[i] is one of read(x, n,
write(x, n), readarr(x, n, n′), and writearr(x, n, n′), and therefore t[i] 6= .
If |t| > 1, then t must be a concatenation of two subsequences, i.e. t1@t2. If
1 ≤ i ≤ |t1|, then t[i] = t1[i], and by induction, we know that t[i] 6= . Otherwise,
if |t1| < i ≤ |t|, then 0 < i − |t1| ≤ |t| − |t1| = |t2|. For natural number n,
n > 0 implies n ≥ 1. Therefore 1 ≤ i − |t1| ≤ |t2|, and by induction, we have
t[i] = t2[i− |t1]] 6= .
Before we go to the next lemma, we shall define the canonical representation
180
of a trace. First, we define the number of blocks in a trace t, denoted by #(t), as
#(t) =

#(t1) + #(t2) if t = t1@t2
1 otherwise
Then we define an order t between two traces t1 and t2 as follows: t1 t t2 if
and only if either of the following two conditions hold true: (i) #(t1) < #(t2), or
(ii) #(t1) = #(t2) ≥ 2, t1 = t′1@t′′1, t2 = t′2@t′′2, and either of the following three
sub-conditions holds true: (ii.a) #(t′1) > #(t
′
2); (ii.b) #(t
′
1) = #(t
′
2) and t
′
1 t t′2; or
(ii.c) t′1 = t
′
2 and t
′′
1  t′′2. It is easy to see that t is complete.
Definition 10 (canonical representation). The canonical representation of a trace
t is the minimal element in the set {t′ : t ≡ t′} under order t.
Lemma 3. The canonical representation of t is (i)  is |t |= 0; or (ii) can(t) =
(...((t1@t2)@t3)...@tn), where n = |t| > 0, and ti = t[i].
Proof. On the one hand, it is easy to see that can(t) belongs to the set {t′ : t ≡ t′}.
In fact, we can prove by induction on #(t). If #(t) = 1, then either |t |= 1, or |t |= 0.
For the former case, t is one of read(x, n, write(x, n), fetch(p), readarr(x, n, n′),
and writearr(x, n, n′), and thus t = t[1] = can(t). For the later case, t = .
Now suppose #(t) > 1, and thus t = t′@t′′. Suppose |t′ |= l1 and |t′′ |= l2. If
l2 = 0, by induction, t
′′ = , and thus t ≡ t′. Furthermore, we have |t| = |t′|, and
∀i.t[i] = t′[i] by definition. Therefore t ≡ t′ ≡ can(t′) = can(t). Similarly, we can
prove the conclusion is true when l1 = 0. Now suppose l1 > 0 and l2 > 0, then
can(t′) = (...((t1@t2)@t3)...tl1), and can(t
′′) = (...((tl1+1@tl1+2)@tl1+3)...@tl1+l2).
181
Then t ≡ can(t′)@can(t′′) ≡ can(t).
On the other hand, we shall show that can(t) is the minimal one in {t′ : t ≡ t′}.
To show this point, we only need to show that for all t′ ≡ can(t), we have can(t) t t′.
We prove by induction on n. If n = 1, the conclusion is obvious. Suppose n > 1
and the conclusion holds true for all n′ < n.
It is easy to see that #(t′) > 1, therefore we suppose t′ = tl@tr. Then we prove
that there exists k such that tl ≡ (...(t1@t2)...@tk) and tr ≡ (...(tk+1@tk+2)...@tn).
We prove by induction on n and how many steps of equivalent-transitive rule, i.e.,
t1 ≡ t2∧t2 ≡ t3 ⇒ t1∧t3, should be applied to derive can(t) ≡ t′. If we should apply
0 step, then we know one of the following situations holds: (i) t′ = t′′@tn where t′′ ≡
(...(t1@t2)...@tn−1); (ii) t′ = (...(t1@t2)...tn−2)@(tn−1@tn); (iii) t′ = t@; or (iv) t′ =
@t. In any case, our conclusion holds true. Now suppose we need to apply n > 0
steps to derive t′, where in the n−1 step, we derive that can(t) ≡ t′′ and we can derive
t′′ ≡ t′ without applying the equivalent-transitive rule. Therefore by induction,
we know that t′′ = t′′l @t
′′
r , and there is k such that t
′′
l ≡ (...(t1@t2)...@tk) and
t′′r ≡ (...(tk+1@tk+2)...@tn). Since we can derive t′′ ≡ t′ without applying equivalent-
transitive rule, we know that one of the following situations holds:
1. t′′l ≡ tl and t′′r ≡ tr;
2. t′ = @t′′;
3. t′ = t′′@;
4. t′′ = t′@;
182
5. t′′ = @t′;
6. @t′ = t′′@;
7. t′@ = @t′′;
8. t′′l = (tl1@tl2), tl ≡ tl1, and tr ≡ tl2@t′′r (in this case, we have (tl1@tl2)@t′′r ≡
tl1@(tl2@t
′′
r)); and
9. t′′r = (tr1@tr2), tl ≡ t′′l @tr1, and tr ≡ tr2 (in this case, we have t′′l @(tr1@tr2) ≡
(t′′l @tr1)@tr2).
For the first 7 cases, the conclusion is trivial. For Case 8, by induction, we know
there are some k′ such that tl1 ≡ (...(t1@t2)...@tk′) and tl2 ≡ (...(tk′+1@tk′+2)...@tk).
Therefore tl ≡ (...(t1@t2)...@tk′), and tr ≡ (...(tk′+1@tk′+2)...@tk)@(...(tk+1@tk+2)...@tn)
while the later is equivalent to (...(tk′+1@tk′+2)...@tn). Similarly, we can prove under
case 9, the conclusion is also true.
Next, we prove that can(t) t tl@tr. To show this point, by induction, we
know (...(t1@t2)...@tk)  tl and (...(tk+1@tk+2)...@tn)  tr. If either #(tl) > k or
#(tr) > n − k, we have #(t′) > n and thus can(t)  t′. Suppose #(tl) = k and
#(tr) = n− k. If k < n− 1, then by the definition of , we have can(t) t t′. Next
suppose k = n − 1, then by induction, we know (...(t1@t2)...@tn−1) t tl, and thus
can(t) t tl@tr = t′.
Next is the most important lemma about trace-equivalence.
Lemma 4. t1 ≡ t2, if and only if ∀i.t1[i] = t2[i].
183
Proof. “⇒” Suppose ∀i.t1[i] = t2[i]. Then by Lemma 3, we know can(t1) = can(t2),
and thus t1 ≡ can(t1) ≡ can(t2) ≡ t2.
“⇐” Suppose t1 ≡ t2, then by Lemma 3, we have can(t1) ≡ can(t2). Due to
both can(t1) and can(t2) have the same form, we know they are identical. Therefore,
we can conclude that ∀i.t1[i] = t2[i].
A.2 Lemmas on trace pattern equivalence
Trace pattern equivalence has similar properties as trace equivalence. If fact,
we define the length of a trace pattern T , denoted as |T |, to be
|T | =

1 if T = Read(x) | Fetch(p)
0 if T = 
|T1|+ |T2| if T = T1@T2
Similar to trace, we define the i-th element in a trace pattern T , denoted T [i],
as follows:
T [i] =

 if i ≤ 0 ∨ i > |T |
T if i = 1 ∧ T = Read(x)
T1[i] if T = T1@T2 ∨ 1 ≤ i ≤ |T1|
T2[i− |T1]] if T = T1@T2 ∨ |T1| < i ≤ |T |
Using exactly the same technique, we can prove the following lemma:
Lemma 5. T1 ∼L T2, if and only if ∀i.T1[i] = T2[i].
To avoid verbosity, we do not provide the full proof here. It is quite similar to
184
the proof of Lemma 4
A.3 Proof of memory trace obliviousness
To prove Theorem 1, memory trace obliviousness by typing, we shall first prove
the following lemma:
Lemma 6. If Γ ` e : Nat L;T , then for any two Γ-valid low-equivalent memories
M1, M2, if 〈M1, e〉 ⇓t1 n1, 〈M2, e〉 ⇓t2 n2, then t1 = t2 and n1 = n2
Proof. We use structural induction on expression e to prove this lemma. If e is
in form of x, then Γ(x) = Nat L, and thus M1(x) = M2(x) = n according to the
definition of low-equivalence and Γ-validity. Therefore t1 = read(x, n) = t2, and
n1 = n2 = n.
If e is in form of e1 op e2, then Γ ` e1 : Nat L and Γ ` e2 : Nat L. Suppose
〈Mi, ej〉 ⇓tij n′ij, for i = 1, 2, j = 1, 2. Then t1j = t2j and n1j = n2j for j = 1, 2.
Therefore t1 = t11@t12 = t21@t22 = t2, and n1 = n11 op n12 = n21 op n22 = n2.
Next, we consider the expression in form of x[e]. We know that Γ(x) =
Array L, which implies Γ ` e : Nat L. Suppose 〈Mi, e〉 ⇓t′i n′i, then by induction
t′1 = t
′
2 and n
′
1 = n
′
2. Furthermore, since M1 ∼L M2, we have ∀i ∈ Nat.M1(x)(i) =
M2(x)(i). Therefore t1 = t
′
1@readarr(x, n
′
1,M1(x)(n
′
1)) = t
′
2@readarr(x, n
′
2,M2(x)(n
′
2)) =
t2, and n1 = M1(x)(n
′
1) = M2(x)(n
′
2) = n2.
Finally, the conclusion is trivial for constant expression.
For convenience, we define lab : Type→ SecLabels as:
185
lab(τ) =

l if τ = Int l
l if τ = Array l
Similar to Lemma 6, we can prove the following lemma:
Lemma 7. If Γ ` e : Nat l;T and l ∈ ORAMBanks, then for any two Γ-valid
low-equivalent memories M1, M2, if 〈M1, e〉 ⇓t1 n1, 〈M2, e〉 ⇓t2 n2, then t1 = t2
Proof. If l = L, then the conclusion is obvious by Lemma ??. We only consider
l ∈ ORAMBanks. We use structural induction to prove this lemma. If e is
in form of x, then according to the definition of Γ-validity and evt(·), we have
t1 = lab(Γ(x)) = t2.
If e is in form of e1 op e2, then Γ ` e1 : Nat l1 and Γ ` e2 : Nat l2. Suppose
〈Mi, ej〉 ⇓tij n′ij, for i = 1, 2, j = 1, 2. Then t1j = t2j, for j = 1, 2 by induction.
Therefore t1 = t11@t12 = t21@t22 = t2.
Finally, we consider the expression in form of x[e]. We know that Γ ` e :
Nat l′. Suppose 〈Mi, e〉 ⇓t′i n′i. If l′ = L, then t′1 = t′2 by Lemma ??. Otherwise,
l ∈ ORAMBanks, and by induction assumption, we have t′1 = t′2. Since l ∈
ORAMBanks, we know l = lab(Γ(x)), and thus t1 = t
′
1@l = t
′
2@l = t2.
Now we shall study the property of trace pattern equivalence. First of all, we
have the following lemma:
Lemma 8. Suppose s and S are a statement and a labeled statement respectively.
If Γ, l0 ` S;T , l0 ∈ ORAMBanks and 〈M,S〉 ⇓t M ′, then M ∼L M ′.
186
Proof. We prove by induction on the statement S. Notice that the statement is
impossible to be while statement. The conclusion is trivial for the statement skip.
If s is x := e, then l0 ⊆ lab(Γ(x)), and thus lab(Γ(x)) ∈ ORAMBanks.
Therefore M ′ = M [x 7→ (n, l)] for some natural number n and some security la-
bel l, which implies M ′ ∼L M . Similarly, if s is x[e1] := e2, then lab(Γ(x)) ∈
ORAMBanks. Furthermore, 〈M,x[e1] := e2〉 ⇓t M [x 7→ (m, l)] for some mapping
m, and some security label l ∈ ORAMBanks. Therefore M ′ = M [x 7→ (m, l)],
which implies for x such that M(x) = (n,L), we know that M ′(x) = (n,L). There-
fore M ′ ∼L M .
Next, let us consider statement if(e, S1, S2). Then we know either of the two
conditions holds true: (1) 〈M,S1〉 → M ′, and (2) 〈M,S2〉 → M ′. Since Γ, l0 `
if(e, S1, S2);T , we have Γ, l
′ ` S1;T1, and Γ, l′ ` S2;T2, where l0 v l′. Therefore we
know for either condition, we have M ∼L M ′.
Finally, for sequence of two statements S1;S2, suppose 〈M,S1〉 ⇓t M1, and
〈M1, S2〉 ⇓t′ M ′. Then M ∼L M1 ∼L M ′.
According to definition of the trace pattern equivalence, it is obvious to see
that, if T ∼L T ′, then T is a sequence, whose element each is in the form of Fetch(p),
Read(x), , and o.
We shall define a trace t belongs to a trace pattern T , under a memory M ,
denoted by t ∈ T [M ] as follows:
187
 ∈ [M ] o ∈ o[M ] M(x) = (n,L), n ∈ Nat
read(x, n) ∈ Read(x)[M ]
t1 ∈ T1[M ] t2 ∈ T2[M ]
t1@t2 ∈ T1@T2[M ]
t ∈ T [M ] T ∼L T ′
t ∈ T ′[M ]
Now, we prove the most important lemma for t ∈ T [M ]:
Lemma 9. t ∈ T [M ] if and only if |t| = |T | and ∀i.t[i] ∈ (T [i])[M ].
Proof. “⇒” Suppose |t| = |T | and ∀i.t[i] ∈ (T [i])[M ]. We prove by induction on
#(t). If #(t) = 1, then the conclusion is trivial. Assume the conclusion holds for
all #(t′) < n, now suppose #(t) = n > 1. Then we know t = t1@t2. If t1 = ,
then we know |t2| = |t| = |T | and ∀i.t2[i] = t[i] ∈ (T [i])[M ], by induction, we know
t2 ∈ T [M ]. Furthermore, we have t1 =  ∈ [M ], therefore t1@t2 ∈ @T [M ]. Since
@T ∼L T , we have t = t1@t2 ∈ T [M ]. A similar argument shows that if t2 = ,
then we also have t ∈ T [M ].
Now let us consider when |t1 |= 0. By induction, we have t1 ∈ [M ] and
t2 ∈ T [M ], and then again, we have t ∈ T [M ]. Similarly, if |t2 |= 0, we also have
t ∈ T [M ].
Now assume |t1| > 0 and |t2| > 0, and suppose T1 = (...(T1@T2)...@T|t1|) and
T2 = (...(T|t1|+1@T|t1|+2)...@T|T |). Then by induction, we know that t1 ∈ T1[M ]
and t2 ∈ T2[M ], and thus t1@t2 ∈ T1@T2[M ]. According to Lemma 5, we have
T1@T2 ∼L T , and thus t = t1@t2 ∈ T [M ].
“⇐” We prove by induction on how many steps to derive t ∈ T [M ]. Suppose
we need only 1 step, then one of the following four conditions is true: (i) t =  = T ;
188
(ii) t = o = T ; (iii) t = read(x, n), T = Read(x) and M(x) = n. In either case, the
conclusion is trivial.
Then suppose we need n step, and the last step is derived from t = t1@t2,
T = T1@T2, and t1 ∈ T2[M ] and t2 ∈ T2[M ]. Then by induction we have |t1| = |T1|,
|t2| = |T2|, ∀i.t1[i] ∈ (T1[i])[M ], and ∀i.t2[i] ∈ (T2[i])[M ]. For i < 1 or i > |T |,
then t[i] =  = T [i], and thus t[i] ∈ (T [i])[M ]. If 1 ≤ i ≤ |T1|, then t[i] = t1[i]
and T [i] = T1[i], and by induction, we have t[i] ∈ (T [i])[M ]; if |T1| < i ≤ |T |, then
t[i] = t2[i− |t1]] and T [i] = T2[i− |T1]], and by induction, we have t[i] ∈ (T [i])[M ].
Finally, suppose we need n step, and the last step is derived from t ∈ T ′[M ]
and T ′ ∼L T . Then according to Lemma 5, we know that ∀i.T ′[i] = T [i], which also
implies that |T ′| = |T |. By induction, we have |t| = |T ′| and ∀i.t[i] ∈ (T ′[i])[M ],
and therefore, we have ∀i.t[i] ∈ (T [i])[M ] and |t| = |T |.
We have the following corollaries.
Corollary 1. If M1 ∼L M2, and t ∈ T [M1], then t ∈ T [M2].
Proof. By Lemma 9, we only need to show that ∀i.t[i] ∈ (T [i])[M2].
Let us prove by structural induction on how t ∈ T [M ] is derived. If t =  = T ,
or t = o = T , or t = t1@t2 and T = T1@T2, then the conclusion is trivial. The
only condition we need to prove is when t = read(x, n), and T = Read(x). If
so, since t ∈ T [M1], therefore M1(x) = (n,L). Since M1 ∼L M2, we know that
M2(x) = (n,L). Therefore, we have t = read(x, n) ∈ Read(x)[M2] = T [M2].
According to the definition of T [i], we know it is in one of the following three
forms: , o, or Read. If T [i] = , then we know i < 1 or i > |T | = |t1|. Therefore
189
t[i] = , and thus t[i] ∈ (T [i])[M2]. If T [i] = o, then we know t[i] = o. In both
situations, we have t[i] ∈ (T [i])[M2]. Finally, if T [i] = Read(x), then we know
t[i] = read(x, n) where n = M1[x]. Since M1 ∼L M2, we have M2[x] = n, and thus
t[i] ∈ (T [i])[M2].
Corollary 2. If t1 ∈ T [M ] and t2 ∈ T [M ], then t1 ≡ t2.
Proof. Assume t1 ∈ T [M ], and t2 ∈ T [M ], according to Lemma 9, we have |t1| =
|T | = |t2|, ∀i.t1[i] ∈ (T [i])[M ], and ∀i.t2[i] ∈ (T [i])[M ]. According to the definition
of T [i], we know it is in one of the following three forms: , o, or Read. If T [i] = ,
then we know i < 1 or i > |T | = |t1| = |t2|. Therefore t1[i] = t2[i] = . If
T [i] = o, then we know t1[i] = t2[i] = o. Finally, if T [i] = Read(x), then we know
t1[i] = read(x, n1), n1 = M [x], t2[i] = read(x, n2), and n2 = M [x]. Therefore
n1 = n2, and thus t1[i] = t2[i]. Therefore ∀i.t1[i] = t2[i], and according to Lemma 4,
we have t1 ≡ t2.
Then we have the following lemmas:
Lemma 10. Suppose Γ ` e : τ ;T , T ∼L T ′ for some T ′, and memory M is Γ-valid.
If 〈M, e〉 ⇓t n, then t ∈ T [M ].
Proof. We prove by structural induction on e. If e is n, then T =  = t.
If e is x, then T = evt(lab(Γ(x)),Read(x)). If lab(Γ(x)) = l ∈ ORAMBanks,
then t = l ∈ l[M ]. If lab(Γ(x)) = L, then T = Read(x), and t = read(x, n), where
M(x) = (n,L). According to the definition, we know t ∈ T [M ].
If e is e1 op e2, then suppose 〈M, ei〉 ⇓ti ni and Γ ` ei : li;Ti for i = 1, 2.
190
Then according to the induction assumption, we have ti ∈ Ti[M ] for i = 1, 2. Since
t = t1@t2, and T = T1@T2, we know t ∈ T [M ].
Next we consider x[e′]. Suppose Γ ` e′ : Nat l′;T ′, and 〈M, e′〉 ⇓t′ n′, then
T = T ′@evt(lab(Γ(x)),Readarr(x)), and t = t′@evt(lab(Γ(x)), readarr(x, n′, n′′))
for some n′′. Moreover, we have t′ ∈ T ′[M ] by induction. Since T ∼L T ′, we know
lab(Γ(x)) ∈ ORAMBanks. Therefore t = t′@lab(Γ(x)) ∈ T ′@lab(Γ(x))[M ] =
T [M ].
Lemma 11. Assume Γ, l0 ` S;T , T ∼L T ′ for some T ′, and l0 ∈ ORAMBanks,
and M is a Γ-valid memory. If 〈M,S〉 ⇓t M ′, then t ∈ T [M ].
Proof. We prove by structural induction on the statement S. Since l0 6= L, therefore
we know S cannot be a while statement. If S is skip, then T =  = t.
Let us consider when S is x := e. Then 〈M, e〉 ⇓t′ n′, and Γ ` e : τ ;T ′, and
T = T ′@evt(lab(Γ(x)),Write(x)). Since T ∼L T , T does not contain Write(x), and
thus lab(Γ(x)) ∈ ORAMBanks. Therefore t = t′@lab(Γ(x)) ∈ T ′@lab(Γ(x))[M ]
by Lemma 10.
Next, suppose S is x[e1] = e2. Suppose 〈M, ei〉 ⇓ti ni, and Γ ` ei : τ ;Ti
for i = 1, 2 by induction. Then ti ∈ Ti[M ] for i = 1, 2. Similar to the discussion
for x := e, we know lab(Γ(x)) ∈ ORAMBanks, and thus t = t1@t2@lab(Γ(x)) ∈
T1@T2@lab(Γ(x))[M ].
Next, let us consider (if)(e, S1, S2). Then Γ, l0 ` Si;Ti for i = 1, 2, and
T1 ∼L T2. As well Γ ` e : τ ;Te, 〈M, e〉 ⇓te ne, idx = ite(ne, 1, 2), and 〈M,Sidx〉 ⇓tidx
M ′. Then T = Te@T1, and te ∈ Te[M ]. If idx = 1, then 〈M,S1〉t1M ′, and thus
191
t1 ∈ T1[M ]. Therefore t = te@t1 ∈ Te@T1[M ] = T [M ]. Similarly, if idx = 2,
then 〈M,S2〉t2M ′, and thus t2 ∈ T2[M ]. Therefore t2 ∈ T1[M ]. As a conclusion
t = te@t2 ∈ Te@T1[M ] = T [M ].
Finally, suppose S is S1;S2. Then we know Γ, l0 ` Si;Ti for i = 1, 2, 〈M,S1〉 ⇓t1
M ′, and 〈M ′, S2〉 ⇓t2 M ′′. Since l0 ∈ ORAMBanks, we know M ∼L M ′ ∼L M ′′.
By induction assumption, we know t1 ∈ T1[M ], and t2 ∈ T2[M ′]. Since M ∼L M ′,
according to Corollary 1, we know t2 ∈ T2[M ]. Therefore t = t1@t2 ∈ T1@T2[M ] =
T [M ].
Lemma 12. Suppose Γ, l0 ` Si;Ti, for i = 1, 2, where l0 ∈ ORAMBanks, and
T1 ∼L T2. Given two Γ-valid low-equivalent memories M1, M2, if 〈M1, S1〉 ⇓t1 M ′1,
and 〈M2, S2〉 ⇓t2 M ′2, then M ′1 ∼L M ′2, and t1 ≡ t2.
Proof. According to Lemma 11, we know that ti ∈ Ti[Mi] for i = 1, 2. According
to Lemma 8, we know that M ′1 ∼L M1 and M ′2 ∼L M2. Since M1 ∼L M2, we know
that M ′1 ∼L M1 ∼L M2 ∼L M ′2. Because t1 ∈ T1[M1], and M1 ∼L M2, therefore
t1 ∈ T1[M2]. Furthermore, since T1 ∼L T2, we have t1 ∈ T2[M2]. Finally, since
t2 ∈ T2[M2], and according to Corollary 2 we have t1 ≡ t2.
Now we are ready to prove Theorem 1.
Proof of Theorem 1. We extend this conclusion by considering both normal state-
ment and labeled statement, and shall prove by induction on the statement s. For
notational convention, we suppose 〈M1, s〉 ⇓t1 M ′1, and 〈M2, s〉 ⇓t2 M ′2, and thus
Γ, l0 ` S;T with M1 ∼L M2 and both are Γ-valid. Our goal is prove t1 ≡ t2, and
M1 ∼L M2.
192
If s is skip, it is obvious.
Suppose s is x := e, then Γ ` e : Nat l;T . Suppose 〈Mi, e〉 ⇓t′i ni, for
i = 1, 2. According to Lemma 6 and Lemma 7, we know t′1 = t
′
2. If lab(Γ(x)) ∈
ORAMBanks, then M ′1 = M1[x 7→ (n1, l1)] ∼L M1 ∼L M2 ∼L M2[x 7→ (n2, l2)] =
M ′2, and t1 = t
′
1@lab(Γ(x)) = t
′
2@lab(Γ(x)) = t2, which implies t1 ≡ t2.
If Γ(x) = Nat L, then we know Γ ` e : Nat L;T , and according to Lemma 6,
we have n1 = n2. Then we also have M
′
1 = M1[x→ n1] ∼L M2[x→ n2] = M ′2, and
t1 = t
′
1@read(x, n1) ≡ t′2@read(x, n2) = t2.
Next, suppose s is x[e1] := e2. Suppose 〈Mi, ej〉 ⇓tij nij for i = 1, 2, j =
1, 2. If lab(Γ(x)) = L, then we know Γ ` ej : Nat L, for j = 1, 2. Then by
Lemma 6, we have t1j = t2j, which implies t1j ≡ t2j, n1j = n2j, and according to
the definition of Γ-validity and low-equivalence, ∀i.M1(x)(i) = M2(x)(i). Therefore
t1 = t11@t21@writearr(x, n11, n12) ≡ t21@t22@writearr(x, n21, n22) = t2, and M ′1 =
M1[x→M1(x)[n11 → n12]] ∼L M2[x→M2(x)[n21 → n22]] = M ′2.
Otherwise, if Γ(x) ∈ ORAMBanks, suppose Γ ` ei : Nat li;Ti, for i = 1, 2.
Then we know l0 unionsq l1 unionsq l2 v lab(Γ(x)). Therefore, by Lemma 7, based on the same
reasoning as above for Nat l case, we have t1 = t11@t21@Γ(x) ≡ t21@t22@Γ(x) = t2.
Furthermore, M ′1 = M1[x → m1] ∼L M1 ∼L M2 ∼L M2[x → m2] = M ′2 for some
two mappings, m1 and m2.
Then suppose the statement is if(e, S1, S2). There are two situations. If Γ `
e : Nat le;Te, where le unionsq l0 ∈ ORAMBanks, then according to Lemma 12, we
know M ′1 ∼L M ′2, and t1 ≡ t2. Otherwise, we have le = L and l0 = L. Suppose
〈Mi, e〉 ⇓t′i ni, for i = 1, 2, then according to Lemma 6, we know t′1 = t′2, which
193
implies t′1 ≡ t′2, and n1 = n2. If ite(n1, 1, 2) = 1, then we know 〈M1, S1〉 ⇓t′′1 M ′1 and
〈M2, S1〉 ⇓t′′2 M ′2. Therefore t1 = t′1@t′′1 ≡ t′1@t′′2 = t2, and M ′1 ∼L M ′2 by induction.
We can show the conclusion for ite(n1, 1, 2) = 2 similarly.
Next, let us consider the statement while(e, S). We know Γ ` e : Nat L;T ,
therefore there exists a constant n, and a trace t, such that 〈Mi, e〉 ⇓t, n for both
i = 1, 2, by Lemma 6.
We prove by induction on how many steps applying the S-WhileT rule and
S-WhileF rule (WHILE rules for short) to derive 〈while(e, S),M1〉 ⇓t1〉. If we only
apply one time, then we must apply S-WhileF rule, and thus n = 0. Then we
have t1 = t = t2, and M
′
1 = M1 ∼L M2 = M ′2. Suppose the conclusion is true
when we need to apply n − 1 steps of WHILE rules, now let us consider when we
need to apply n > 0 steps. Then we know n 6= 0. Suppose 〈Mi, S〉 ⇓ti1 Mi1, and
〈Mi1,while(e, S)〉 ⇓ti2 M ′i , for i = 1, 2. Then we know that we need to apply n− 1
steps of WHILE rules to derive 〈M11,while(e, S)〉 ⇓t12 M ′1. By induction, we have
t11 = t21, t12 = t22, and M11 ∼L M21. Therefore M ′1 ∼L M ′2, and t1 = t11@t12 =
t21@t22 = t2.
Finally, let us consider S1;S2. Suppose Γ, l0 ` S1;T1, Γ, l0 ` S2;T2, 〈Mi, S1〉 ⇓ti1
Mi1, and 〈Mi1, S2〉 ⇓t21 M ′i . Then by induction assumption, we have t11 = t21, t12 =
t22, M11 ∼L M12, and thus M ′1 ∼L M ′2. Therefore t1 = t11@t12 = t21@t22 = t2.
194
Appendix B: Proof of Theorem 2
Our proof proceeds in two steps. First, we prove a terminating version (The-
orem 6) of Theorem 2. By having the assumption that the program will finally
terminate, i.e. its execution does not end in an infinite loop, we will show how our
type system enforces the MTO property. Then by using Theorem 6, we will show
the MTO property holds for non-terminating programs, (i.e. Theorem 7), which
implies Theorem 2 as an obvious corollary.
We start with the terminating case. We first prove some useful lemmas for
terminating programs.
Lemma 13. For all I, `,Υ, Sym, Υ′, Sym′, T , such that
` ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T
if there is some i such that I(i) = jmp n, then 0 ≤ i+ n ≤ |I|.
Proof. We prove by induction on
` ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T
It is clearly that rule T-LOAD, T-STORE, T-LOADW, T-STOREW, T-IDB, T-
195
BOP, T-ASSIGN, T-NOP cannot derive I.
If this judgement is derived using rule T-SEQ, then we know I = I1; I2, and
we have
` ` I1 : 〈Υ, Sym〉 → 〈Υ′′, Sym′′〉;T1
` ` I2 : 〈Υ′′, Sym′′〉 → 〈Υ′, Sym′〉;T2
Then either i < |I1|, or |I1| ≤ i < |I1| + |I2| = |I|. For the first case, we have
I1(i) = jmp n, and thus by induction, we have 0 ≤ i + n ≤ |I1| < |I|. For the
second case, we have I2(i − |I1|) = jmp n, and thus by induction assumption, we
have 0 ≤ i− |I1| ≤ |I2|. Therefore, we have 0 < |I1| ≤ i ≤ |I1|+ |I2| = |I|.
If this judgement is derived by rule T-IF, then we know I = ι1; It; ι2; If , and
ι1 = br r1 rop r2 ↪→ n1
ι2 = jmp n2
n1 − 2 = |It|
n2 = |If |+ 1
Then, there are three possible scenarios:
1. 1 ≤ i ≤ 1+|It|. In this case, we know It(i−1) = jmp n, and 0 ≤ i−1+n ≤ |It|.
Therefore 0 < 1 ≤ i+ n ≤ |It|+ 1 < |I|;
2. i = 1 + |It|. In this case, I(i) = ι2. Therefore, we have 0 < i + n2 =
196
2 + |It|+ |If | = |I|;
3. 2 + |It| ≤ i < 2 + |It| + |If | = |I|. In this case, we can prove the conclusion
similarly to Case 1.
If this judgement is derived by rule T-WHILE, then we can prove the conclu-
sion similarly to the T-IF case.
Finally, if the judgement is derive by rule T-SUB, then the result follows by
induction.
Lemma 14. For all I, `,Υ, Sym, Υ′, Sym′, T , such that
` ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T
if there is some i such that I(i) = br r1 rop r2 ↪→ n, then 0 ≤ i+ n ≤ |I|.
Proof (sketch). The proof is similar to the proof for Lemma 13.
Lemma 15. Given I = I1; I2; I3, Ri, Si,Mi, pci for i = 1, ..., k + 1, and ti for
i = 1, ..., k,
|I1| ≤ pci < |I1|+ |I2| ∀i ∈ {1, ..., k}
|I1| ≤ pck+1 ≤ |I1|+ |I2|
and
|ti| = 1 ∀i ∈ {1, ..., k}.
Then
I ` (Ri, Si,Mi, pci)→ti (Ri+1, Si+1,Mi+1, pci+1)
197
holds true for i = 1, ..., k, if and only if
I2 ` (Ri, Si,Mi, pci − |I1|)→ti
(Ri+1, Si+1,Mi+1, pci+1 − |I1|)
holds true for i = 1, ..., k.
Proof. We only prove the only-if-direction, and the if-direction is similar. We prove
by induction on k. We first prove for k = 1. Consider each possibility for I(pc1).
Case ldb k ← l[r]. By rule LOAD, we know
n = |R(r) mod size(l) |
b = M(l, n)
S2 = S1[k 7→ (b, (l, n))]
t1 = select(l, read(n, b), eread(n), l)
R2 = R1 M2 = M1 pc2 = pc1 + 1
Clearly, we have pc2 − |I1| = pc1 − |I1| + 1, and I2(pc1 − |I1|) = ldb k ← l[r], and
thus
I2 ` (R1, S1,M1, pc1 − |I1|)→t1 (R2, S2,M2, pc2 − |I1|)
Cases stb k, k ← idb r, ldw r1 ← k[r2], stw r1 → k[r2], r1 ← r2 op r3,
r1 ← n, or nop are similar.
198
Case jmp n′. We have
R2 = R1 S2 = S1 M2 = M1 pc2 = pc1 + n
′
Since pc2 − |I1 |= pc1 − |I1| + n′, then the conclusion is obvious. We can prove
similarly for I(pc1) = br r1 rop r2 ↪→ n′.
Now, we assume the conclusion holds true for k ≤ k′. For k = k′+1, We know
I2 ` (Ri, Si,Mi, pci − |I1|)→ti (Ri+1, Si+1,Mi+1, pci+1 − |I1|)
holds true for i = 1, ..., k′ by the induction assumption. Further, since the assump-
tion says the conclusion holds for k = 1, thus
I2 ` (Rk′ , Sk′ ,Mk′ , pck′ − |I1|)→tk′
(Rk′+1, Sk′+1,Mk′+1, pck′+1 − |I1|)
Therefore, the conclusion holds true.
Using Lemma 13, 14, 15, we can prove the following important lemma.
Lemma 16. Given I = Ia; I
′; Ib, where any of Ia and Ib can be empty, `, `′,
Υ1,Υ
′
1,Υ2,Υ
′
2, Sym1, Sym
′
1, Sym2, Sym
′
2, T, T
′, such that
` ` I : 〈Υ1, Sym1〉 → 〈Υ2, Sym2〉;T
199
and
`′ ` I ′ : 〈Υ′1, Sym′1〉 → 〈Υ′2, Sym′2〉;T ′
If for pc = |Ia| and pc′, where pc′ < |Ia| or pc′ ≥ |Ia|+ |I ′|, such that
I ` (R, S,M, pc)→t (R′, S ′,M ′, pc′),
then there exists R′′, S ′′,M ′′, pc′′, t′, t′, such that t = t′@t′′ (where t′′ can be ), and
I ` (R, S,M, pc)→t′ (R′′, S ′′,M ′′, pc′′)
I ` (R′′, S ′′,M ′′, pc′′)→t′′ (R′, S ′,M ′, pc′)
pc′′ = |Ia|+ |I ′| t ≡ t′@t′′ |t′| > 0 |t′′| ≥ 0
We provide an intuitive explanation of this lemma as follows. It says suppose a
type-checked program I contains a type-checked segment I ′, and if the the program
runs from the start of the segment I ′ (i.e. pc = |Ia|), and ends outside the segment
(i.e. pc′ < |Ia| or pc′ ≥ |Ia|+ |I ′|), then it must stop at the end of the segment (i.e.
pc′′ = |Ia|+ |I ′|) first.
Notice the correctness of this lemma does not depend on whether Ia or Ib can
type-check. Further, none or either or even both of Ia and Ib can be empty. In the
last case, the conclusion is trivial.
Now we prove this lemma.
200
Proof. We suppose for k = |t| − 1,
I ` (Ri, Si,Mi, pci)→ti (Ri+1, Si+1,Mi+1, pci+1)
for i = 0, ..., k, where
|ti| = 1 ∀i ∈ {0, ..., k}
R0 = R S0 = S M0 = M pc0 = pc
Rk = R′ Sk = S ′ Mk = M ′ pck = pc′
Suppose i? is the smallest one such that pci? < |Ia| or pci? ≥ |Ia| + |I ′|. Then, by
Lemma 15, we know
I ′ ` (Ri, Si,Mi, pˆci)→ti (Ri+1, Si+1,Mi+1, pˆci+1)
for i = 0, ..., i? − 2, where pˆci = pci − |I|. Clearly, we know 0 ≤ pˆci?−1 < |I ′| by the
definition of i?. Now, we consider the instruction I ′(pˆci?−1). If it is not jmp n
′ or
br r1 rop r2 ↪→ n′, then we know
pci? = pci?−1 + 1 > |Ia|.
Therefore
pci? ≥ |Ia|+ |I ′|
201
Further, since pci?−1 − |Ia |= pˆci?−1 < |I ′|, we have
pci? ≤ |Ia|+ |I ′|
Therefore, we have pci? = |Ia| + |I ′|, and thus R′′ = Ri? , S ′′ = Si? , M ′′ = Mi? ,
pc′′ = pci? , t′ = t0@...@ti?−1, and t′′ = ti?@...tk satisfy the property.
If I ′(pˆci? − 1) = jmp n′, then by Lemma 13, we have
0 ≤ pˆci?−1 + n′ ≤ |I ′|
Therefore, we have
pci? = pci?−1 + n′ = pˆci?−1 + |Ia|+ n′ ≤ |Ia|+ |I ′|
and
pci? = pci?−1 + n′ = pˆci?−1 + |Ia|+ n′ ≥ |Ia|
Therefore, by the same argument as above, we know pci? = |Ia| + |I ′|, and the
conclusion is true.
Finally, if I ′(pˆci? − 1) = br r1 rop r2 ↪→ n′, we can prove the conclusion
similarly using Lemma 14.
Now we can state and prove Theorem 6. Some judgments mentioned in the
theorem are defined in Figure B.1.
Theorem 6. Given a program I in LT , such that ` ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T ,
202
mem((b, (l, n))) = l
idx((b, (l, n))) = n
block((b, (l, n))) = b
∀k ∈ BlockIDs.
Υ(k) = mem(S1(k)) = mem(S2(k))
Υ(k) = D ⇒ S1(k) = S2(k)
Υ(k) = E ⇒ idx(S1(k)) = idx(S2(k))
Υ ` S1 ∼ S2
∀r ∈ Registers.Υ(r) = L⇒ R1(r) = R2(r)
Υ ` R1 ∼ R2
(S, sv) ⇓ v
mem(S(k)) = l = D
(S, sv) ⇓ v2
v = block(S(k))(v2)
(S,Ml[k, sv]) ⇓ v
(S, n) ⇓ n
(S, sv1) ⇓ v1
(S, sv2) ⇓ v2
v = v1 aop v2
(S, sv1 aop sv2) ⇓ v
Figure B.1: Well formedness judgments for proof of Memory-Trace Obliviousness of
LGhostRider
two memories M1,M2, two register mapping R1, R2, and two scratchpad mapping
S1, S2, if the following assumptions are satisfied:
1. M1 ∼L M2;
2. Υ ` R1 ∼ R2;
3. Υ ` S1 ∼ S2;
4. ∀r ∈ Registers, i ∈ {1, 2}.(` = H∨ `const Sym(r))∧ `ok Sym(r)⇒ (Si, Sym(r)) ⇓
Ri(r);
5. ∀k ∈ BlockIDs, i ∈ {1, 2}.(` = H∨ `const Sym(r))∧ `ok Sym(k)⇒ ∃n.(Si, Sym(k)) ⇓
n ∧ |nmod size(Υ(k)) | = idx(Si(k)).
203
and
I ` (R1, S1,M1, 0)→t1 (R′1, S ′1,M ′1, pc′)
I ` (R2, S2,M2, 0)→t2 (R′2, S ′2,M ′2, pc′′)
where pc′ = pc′′ = |I|, then we have the following conclusions:
1. M ′1 ∼L M ′2;
2. Υ′ ` R′1 ∼ R′2;
3. Υ′ ` S ′1 ∼ S ′2;
4. ∀r ∈ Registers, i ∈ {1, 2}.(` = H∨ `const Sym(r))∧ `safe Sym′(r) ⇒
(S ′i, Sym
′(r)) ⇓ R′i(r);
5. ∀k ∈ BlockIDs, i ∈ {1, 2}.(` = H∨ `const Sym(k))∧ `safe Sym′(k) ⇒
∃n.(S ′i, Sym′(k)) ⇓ n ∧ |nmod size(Υ′(k)) | = idx(S ′i(k)).;
6. t1 ≡ t2
We first provide an intuition of this theorem.
Proof of Theorem 6. We prove Theorem 6 by induction on the length of derivation
to derive
L ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T
Case T-SEQ. Then I = I1; I2, and by inversion, we have
L ` I : 〈Υ, Sym〉 → 〈Υ′, Sym〉;T
204
L ` I1 : 〈Υ, Sym〉 → 〈Υ1, Sym1〉;T1
L ` I2 : 〈Υ1, Sym1〉 → 〈Υ′, Sym′〉;T2
By Lemma 15 and Lemma 16, we know
I1 ` (R1, S1,M1, 0)→t11 (R′′1, S ′′1 ,M ′′1 , |I1|)
I2 ` (R′′1, S ′′1 ,M ′′1 , 0)→t21 (R′1, S ′1,M ′1, |I2|)
Similarly, we have
I1 ` (R2, S2,M2, 0)→t12 (R′′2, S ′′2 ,M ′′2 , |I1|)
I2 ` (R′′2, S ′′2 ,M ′′2 , 0)→t22 (R′2, S ′2,M ′2, |I2|)
By induction assumption, we have
1. M ′′1 ∼L M ′′2 ;
2. Υ ` R′′1 ∼ R′′2;
3. Υ ` S ′′1 ∼ S ′′2 ;
4. ∀r ∈ Registers, i ∈ {1, 2}.(` = H∨ `const Sym′′(r))∧ `safe Sym′′(r) ⇒
(S ′′i , Sym
′′(r)) ⇓ R′i(r);
5. ∀k ∈ BlockIDs, i ∈ {1, 2}.(` = H∨ `const Sym′′(k)) ∧ ∧ `safe Sym′′(k)⇒
∃n.(S ′′i , Sym′′(k)) ⇓ n ∧ |n mod size(Υ′′(k)) | = idx(S ′′i (k)).
205
6. t11 ≡ t12
By induction assumption again, we have conclusions 1-5, and t21 ≡ t22. Therefore, we
know t1 = t
1
1@t
2
1 ≡ t12@t22 = t2, which is conclusion 6.
Case T-LOAD. Then I = ldb k ← l[r], and pc′ = pc′′ = 1. Further, by
inversion, we know
ni = |Ri(r) mod size(l)|
bi = Mi(l, ni) S
′
i = Si[k 7→ (bi, (l, ni))]
ti = select(l, read(ni, bi), eread(ni), l) i = 1, 2
We prove conclusions 1-6 hold true. 1 holds true trivially, since the memories are
not changed, i.e. M ′1 = M1 ∼L M2 = M ′2, and 2 and 4 hold true for the same reason.
We prove conclusion 3 as follows. First, we know Υ′(k) = l = mem(S ′1(k)) =
mem(S ′2(k)). If Υ
′(k) = D or Υ′(k) = E, then we know l 6∈ ORAMbanks, and
thus Υ′(r) = L, which implies that idx(S ′1(k)) = n1 = n2 = idx(S
′
2(k)). Further, if
Υ′(k) = l = D, then we know b1 = M1(D,n1) = M2(D,n2) = b2 (due to M1 ∼L M2),
which implies S ′1(k) = S
′
2(k). Therefore, we know Υ ` S ′1 ∼ S ′2.
For conclusion 5, since for all k′ 6= k, Sym′(k′) = Sym(k′) and S ′i(k′) =
Si(k
′) (i = 1, 2), therefore conclusion 7 holds true. For k, if `safe Sym′(k) does
not hold, then the conclusion is trivial. Now we assume `safe Sym′(k), if and
only if `safe Sym(r). By assumption 6, we know (Si, Sym(r)) ⇓ Ri(r). Since
|Ri(r) mod size(Υ′(k)) | = |Ri(r) mod size(l) | = ni, we know conclusion 7 holds
true.
206
For conclusion 6, if l = D, then following the discussion for conclusion 3, we
know n1 = n2 and b1 = b2, and thus t1 = read(n1, b1) = read(n2, b2) = t2. If l = E,
then similarly we know n1 = n2, and thus t1 = eread(n1) = eread(n2) = t2. If
l ∈ ORAMBanks, then we know t1 = l = t2. Thus conclusion 6 holds true.
Case T-STORE. If I = stb k, and I is typed using rule T-STORE, then we
have pc′ = pc′′ = 1. Further, we know
(bi, ai) = Si(k) ai = (li, ni) M
′
i = Mi[ai 7→ bi]
ti = select(li, write(ni, bi), ewrite(ni), li) i = 1, 2
Conclusions 2-5 are trivial, since registers and scratchpads are not changed. We first
prove conclusion 6. By assumption 3, we know l1 = l2. If l1 = D, then we know
t1 = write(n1, b1), and t2 = write(n2, b2). By assumption 4, we know n1 = n2 and
b1 = b2, therefore t1 = t2. If l1 = E, then by assumption 5, we know n1 = n2.
Therefore t1 = ewrite(n1) = ewrite(n2) = t2. Finally, if l1 ∈ ORAMBanks, then
t1 = l1 = l2 = t2.
Now we prove conclusion 1. The only difference between M ′i and Mi is the
value for ai = (li, ni). To show that M
′
1 ∼L M ′2, we only need to show that if
l1 = l2 = D, and b1 = b2. This point is induced by assumption 4. Therefore,
conclusion 1 holds.
Case T-LOADW. If I = ldw rx ← k[ry], and I is typed using rule T-
207
LOADW, then we have pc′ = pc′′ = 1. Further, we know
(bi, ai) = Si(k) ni = |R(ry) mod size(bi) |
R′i = Ri[rx 7→ bi(ni)] i = 1, 2
Since scratchpad and memories are not changed, conclusions 1, 3, and 5, trivially
hold true. Further, since t1 = f = t2, conclusion 6 is also true. We only need to prove
for conclusions 2 and 4. For conclusion 2, if Υ′(rx) = H, the conclusion is trivial.
If Υ′(rx) = L, then we know l = D, and by rule T-LOADW, we know Υ(ry) = L,
which implies, by assumption 2, n1 = n2. Further, since l = D, by assumption 3,
we know b1 = b2. Therefore R
′
1(rx) = b1(n1) = b2(n2) = R
′
2(rx).
For conclusion 4, all we need to show is that for i = 1, 2, either (` = H∨ `const
Ml[k, Sym(ry)])∧ `safe Ml[k, Sym(ry)] is not true, ` = L or
(Si,Ml[k, Sym(r2) mod size(bi)]) ⇓ bi(ni)
holds true. First of all, `const Ml[k, Sym(ry)] is not true. W.L.O.G. we suppose
`safe Ml[k, Sym(ry)] and ` = H, then we know l = D and `safe Sym(ry). Therefore
we have (Si, Sym(ry) mod size(bi)) ⇓ R(ry) mod size(bi) = ni. Further, by
conclusion 3, we know Υ′(k) = mem(Si(k)) = D. Combining with bi = block(Si(k)),
we have
(Si,Ml[k, Sym(r2) mod size(bi)]) ⇓ bi(ni)
Case T-IDB. If I = k ← idb r, and I is typed using rule T-IDB, then we
208
have pc′ = pc′′ = 1. Further, we know
(bi, (li, ni)) = Si(k) R
′
i = Ri[r 7→ ni] i = 1, 2
Conclusions 1, 3, 5, and 6 hold trivially. Conclusions 2 and 4 are implied by as-
sumptions 3 and 5 respectively.
Case T-STOREW. If I = stw rx → k[ry], and I is typed using rule T-
STOREW, then we have pc′ = pc′′ = 1. Further, we know
(bi, ai) = Si(k) ni = |Ri(ry) mod size(b) |
S ′i = Si[k 7→ (bi[ni 7→ Ri(rx)], ai) i = 1, 2
Since registers and memories are not changed, conclusions 1, 2, and 4 hold true.
Further, since t1 = f = t2, we know conclusion 6 is true. We now prove conclusions 3
and 5. Clearly, we only need to prove for Υ′(k) and Sym(k). Suppose ai = (li, idxi),
then we know Υ′(k) = Υ(k) = l1 = l2. Further, if Υ′(k) = Υ(k) = D, then
we know Υ(rx) = Υ(ry) = L, which, by assumption 2, implies R1(rx) = R2(rx)
and R1(ry) = R2(ry). Therefore n1 = n2. Since Υ(k) = D, by assumption 4,
we have S1(k) = S2(k), and thus b1 = b2. Therefore we have S
′
1(k) = (b1[n1 7→
R1(rx)], a1) = (b2[n2 7→ R2(rx)], a2) = S ′2(k). Finally, if Υ′(k) = Υ(k) = E, then
by assumption 5, we know idx1 = idx(S1(k)) = idx(S2(k)) = idx2. Then we have
idx(S ′1(k)) = idx1 = idx2 = idx(S
′
2(k)). Therefore, conclusion 3 is true.
For conclusion 5, first we suppose ` = H and `safe Sym′(k). Since Sym′(k) =
Sym(k), we know `safe Sym(k), and thus by assumption 5, we know for some
209
n1, n2, (Si, Sym
′(k)) ⇓ n and |n |Υ(k) = idxi. Further, since ` = H, we know
slab(Υ(k)) = H, and thus mem(Si(k)) = li 6= D for i = 1, 2. Now we prove
that if (S1, sv) ⇓ v, then (S ′1, sv) ⇓ v by induction on sv. If sv = Ml[k′, sv1],
then we know mem(S1(k
′)) = l = D, and (S1, sv) ⇓ v2. Since mem(S1(k)) 6=
D, we know k 6= k′. Therefore we know S ′1(k) = S1(k), and thus by induction
assumption, we have (S ′1, sv) ⇓ v. Next, if sv = n, then the conclusion holds
trivially. If sv = sv1 aop sv2, then we know (S1, sv1) ⇓ v1, (S1, sv2) ⇓ v2, and
v = v1 aop v2. Therefore we know (S
′
1, sv) ⇓ v by induction. Similarly, we can prove
that if (S2, sv) ⇓ v, then (S ′2, sv) ⇓ v. Therefore, we know (S ′i, Sym′(k)) ⇓ ni, and
|ni mod size(Υ′(k)) | = idxi = idx(S ′i(k)), which means conclusion 5 holds true.
Similarly, if `const Sym′(k) and `safe Sym′(k), we can also prove conclusion 5 easily.
Case T-BOP. If I = r1 ← r2 aop r3, and I is typed using rule T-BOP, then
we have pc′ = pc′′ = 1. Further, we know
ni = Ri(r2) aopRi(r3) R
′
i = Ri[r1 7→ ni] i = 1, 2
Conclusions 1, 3, 5, and 6 are trivial. For conclusion 2, we only need to prove if
Υ′(r1) = l′ = Υ(r2)unionsqΥ(r3) = L, then R1(r1) = n1 = R2(r1) = n2. To see this, since
Υ(r2) unionsq Υ(r3) = L, we know Υ(r2) = Υ(r3) = L. Therefore by assumption 2, we
know R1(r2) = R2(r2) and R1(r3) = R2(r3). Therefore, we have
n1 = R1(r2) aop R1(r3) = R2(r2) aop R2(r3) = n2
210
For conclusion 4, we only need to consider Sym′(r1). If
`safe Sym′(r1) = Sym(r2) aop Sym(r3)
then we know `safe Sym(r2) and `safe Sym(r3). Further, we know ` = H or `const
Sym(r1) and `const Sym(r2). Therefore, by assumption 6, we know
(Si, Sym(r2)) ⇓ Ri(r2), (Si, Sym(r3)) ⇓ Ri(r3) for i = 1, 2
Therefore, we have
(Si, Sym
′(r1)) ⇓ Ri(r2) aop Ri(r3) = ni = R′i(r1) i = 1, 2
Therefore conclusion 4 holds true.
If I = r ← n, and I is typed using rule T-ASSIGN, then we have pc = 0 and
pc′ = pc′′ = 1. Further, we know
R′i = Ri[r1 7→ n] i = 1, 2
Similar to the T-BOP rule, conclusions 1, 3, 5 and 6 are trivial. For conclusion 2,
we know R′1(r1) = R
′
2(r1) = n. For conclusion 4, we have (Si, n) ⇓ n for i = 1, 2.
Case T-NOP. If I = nop, and I is typed using T-NOP, then this is trivial,
because Mi = M
′
i , Ri = R
′
i, Si = S
′
i for i = 1, 2, and Υ = Υ
′ and Sym = Sym′.
211
Case T-SUB. If I is typed using T-SUB, then we know
` ` ι : 〈Υ, Sym〉 → 〈Υ′′, Sym′′〉;T
where Υ′′  Υ′ and Sym′′  Sym′. W.L.O.G, we assume ` ` ι : 〈Υ, Sym〉 →
〈Υ′′, Sym′′〉;T is not derived by rule T-SUB (i.e.  is transitive). By induction, we
have the following results
1. M ′1 ∼L M ′2;
2. Υ ` R′1 ∼ R′2;
3. Υ ` S ′1 ∼ S ′2;
4. ∀r ∈ Registers, i ∈ {1, 2}.(` = H∨ `const Sym′′(r))∧ `safe Sym′′(r) ⇒
(S ′i, Sym
′′(r)) ⇓ R′i(r);
5. ∀k ∈ BlockIDs, i ∈ {1, 2}.(` = H∨ `const Sym′′(r))∧ `safe Sym′′(k) ⇒
∃n.(S ′i, Sym′(k)) ⇓ n ∧ |n mod size(Υ′(k)) | = idx(S ′i(k)).
6. t1 ≡ t2
Therefore, conclusions 1 and 6 follow results 1 and 6. For conclusion 2, if Υ′(r) = L,
then since Υ′′(r) v Υ′(r), we know Υ′′(r) = L, and thus R′1(r) = R′2(r).
Conclusion 3 naturally holds true, since Υ′′(k) = Υ′(k). For conclusion 4, since
Sym′′  Sym′, we know Sym′(r) =? or Sym′(r) = Sym′′(r). In the former case,
`safe Sym′(r) does not hold, and thus conclusion 4 is vacuously true. In the later
212
case, conclusion 4 directly follows results 4. Similarly, we can prove conclusion 5 as
well.
Case T-LOOP. If I is typed using rule T-LOOP, then we know I = Ic; ι1; Ib; ι2,
and
` ` Ic : 〈Υ, Sym〉 → 〈Υ′, Sym′〉
` ` Ib : 〈Υ′, Sym′〉 → 〈Υ, Sym〉
We prove the conclusion by induction on number of times that instruction ι1 is
executed. First, assume ι1 is executed once. Therefore, by Lemma 16, we know that
Ic ` (R1, S1,M1, 0)→t11 (R′′1, S ′′1 ,M ′′1 , |Ic|)
I ` (R′′1, S ′′1 ,M ′′1 , |Ic|)→t21 (R′1, S ′1,M ′1, pc′)
Ic ` (R2, S2,M2, 0)→t12 (R′′2, S ′′2 ,M ′′2 , |Ic|)
I ` (R′′2, S ′′2 ,M ′′2 , |Ic|)→t22 (R′2, S ′2,M ′2, pc′′)
t1 = t
1
1@t
2
1 t2 = t
1
2@t
2
2
By induction assumption, we have
1. M ′′1 ∼L M ′′2 ;
2. Υ ` R′′1 ∼ R′′2;
3. Υ ` S ′′1 ∼ S ′′2 ;
213
4. ∀r ∈ Registers, i ∈ {1, 2}.(` = H∧ `const Sym′′(r))∧ `safe Sym′′(r) ⇒
(S ′′i , Sym
′′(r)) ⇓ R′′i (r);
5. ∀k ∈ BlockIDs, i ∈ {1, 2}.(` = H∧ `const Sym′′(r))∧ `safe Sym′′(k) ⇒
∃n.(S ′′i , Sym′(k)) ⇓ n ∧ |n mod size(Υ′′(k)) |= idx(S ′′i (k)).
6. t11 ≡ t12
Further, we know I(|Ic|) = ι1 = br r1 rop r2 ↪→ n1, where Υ(r1)unionsqΥ(r2) v L, which
implies Υ(r1) = Υ(r2) = L. Therefore, we know R
′′
1(r1) = R
′′
2(r1) and R
′′
1(r2) =
R′′2(r2), which implies if
I ` (R′′i , S ′′i ,M ′′i , |Ic|)→f (R?i , S?i ,M?i , pc?i )
then pc?1 = pc
?
2, which is either |Ic| + 1 or |Ic| + n1 = |I|. If the first case is true,
then we can show
Ic ` (R1, S1,M1, 0)→t11 (R′′1, S ′′1 ,M ′′1 , |Ic|)
I ` (R′′1, S ′′1 ,M ′′1 , |Ic|)→t21 (R′′1, S ′′1 ,M ′′1 , |Ic|+ 1)
the branch instruction ι1 will not be taken.
I ` (R′′1, S ′′1 ,M ′′1 , |Ic|+ 1)→t31 (R′′′1 , S ′′′1 ,M ′′′1 , |I| − 1)
I ` (R′′′1 , S ′′′1 ,M ′′′1 , |I| − 1)→t41 (R′′′1 , S ′′′1 ,M ′′′1 , 0)
I ` (R′′′1 , S ′′′1 ,M ′′′1 , 0)→t51 (R′1, S ′1,M ′1, |I|)
Then by the same analysis, we know during I ` (R′′′1 , S ′′′1 ,M ′′′1 , 0)→t51 (R′1, S ′1,M ′1, |I|),
the instruction ι1 will be executed at least once, and thus it will be executed at least
214
twice in total, which contradicts our assumption that ι1 is executed only once.
Therefore, we know pc?1 = pc
?
2 = |I|. Therefore, we know t1 = t11@f ≡ t22@f = t2, and
R′′i = Ri, S
′′
i = Si, M
′′
i = Mi (i = 1, 2). In this case, conclusions 1-6 all hold true.
Next, we assume the conclusions hold true for the number of the times that
ι1 is executed less than u > 1, and we consider the case when |t1| = u. By the same
analysis as above, we know
Ic ` (R1, S1,M1, 0)→t11 (R′′1, S ′′1 ,M ′′1 , |Ic|)
I ` (R′′1, S ′′1 ,M ′′1 , |Ic|)→f (R′′1, S ′′1 ,M ′′1 , |Ic|+ 1)
the branch instruction ι1 will not be taken.
I ` (R′′1, S ′′1 ,M ′′1 , |Ic|+ 1)→t21 (R′′′1 , S ′′′1 ,M ′′′1 , |I| − 1)
I ` (R′′′1 , S ′′′1 ,M ′′′1 , |I| − 1)→f (R′′′1 , S ′′′1 ,M ′′′1 , 0)
I ` (R′′′1 , S ′′′1 ,M ′′′1 , 0)→t31 (R′1, S ′1,M ′1, |I|)
Ic ` (R2, S2,M2, 0)→t12 (R′′2, S ′′2 ,M ′′2 , |Ic|)
I ` (R′′2, S ′′2 ,M ′′2 , |Ic|)→f (R′′2, S ′′2 ,M ′′2 , |Ic|+ 1)
the branch instruction ι1 will not be taken.
I ` (R′′2, S ′′2 ,M ′′2 , |Ic|+ 1)→t22 (R′′′2 , S ′′′2 ,M ′′′2 , |I| − 1)
I ` (R′′′2 , S ′′′2 ,M ′′′2 , |I| − 1)→f (R′′′2 , S ′′′2 ,M ′′′2 , 0)
I ` (R′′′2 , S ′′′2 ,M ′′′2 , 0)→t32 (R′2, S ′2,M ′2, |I|)
t1 = t
1
1@f@t
2
1@f@t
3
1 t2 = t
1
2@f@t
2
2@f@t
3
2
215
By induction assumption, we know
1. M ′′1 ∼L M ′′2 , M ′′′1 ∼L M ′′′2 , and M ′1 ∼L M ′2;
2. Υ ` R′′1 ∼ R′′2, Υ ` R′′′1 ∼ R′′′2 , and Υ ` R′1 ∼ R′2;
3. Υ ` S ′′1 ∼ S ′′2 , Υ ` S ′′′1 ∼ S ′′′2 , and Υ ` S ′1 ∼ S ′2;
4. ∀r ∈ Registers, i ∈ {1, 2}.(` = H∧ `const Sym′′(r)) ∧ `safe Sym′′(r) ⇒
(S ′′i , Sym
′′(r)) ⇓ R′′i (r);
5. ∀r ∈ Registers, i ∈ {1, 2}.(` = H∧ `const Sym′′′(r)) ∧ `safe Sym′′′(r) ⇒
(S ′′′i , Sym
′′′(r)) ⇓ R′′′i (r);
6. ∀r ∈ Registers, i ∈ {1, 2}.(` = H∧ `const Sym′(r)) ∧ `safe Sym′(r) ⇒
(S ′i, Sym
′(r)) ⇓ R′i(r);
7. ∀k ∈ BlockIDs, i ∈ {1, 2}.(` = H∧ `const Sym′′(r)) ∧ `safe Sym′′(k) ⇒
∃n.(S ′′i , Sym′′(k)) ⇓ n
∧|n mod size(Υ′′(k)) |= idx(S ′′i (k)).
8. ∀k ∈ BlockIDs, i ∈ {1, 2}.(` = H∧ `const Sym′′′(r)) ∧ `safe Sym′′′(k) ⇒
∃n.(S ′′′i , Sym′′′(k)) ⇓ n
∧|n mod size(Υ′′′(k)) |= idx(S ′′′i (k)).
9. ∀k ∈ BlockIDs, i ∈ {1, 2}.(` = H∧ `const Sym′(r)) ∧ `safe Sym′(k) ⇒
∃n.(S ′i, Sym′(k)) ⇓ n
∧|n mod size(Υ′(k)) |= idx(S ′i(k)).
10. t11 ≡ t12, t21 ≡ t22, and t31 ≡ t32
216
Then conclusions 1-5 are true by the above results, and for conclusion 6, we have
t1 = t
1
1@f@t
2
1@f@t
3
1 ≡ t12@f@t22@f@t32 = t2.
Case T-IF. Thus I = ι1; It; ι2; If , where ι1 = br r1 rop r2 ↪→ n1. Consider `′.
If `′ = L, then we know Υ(r1) = Υ(r2) = L. In this case, by assumption 2, we know
R1(r1) = R2(r1) and R1(r2) = R2(r2). Therefore the program counter goes to the
same value in both cases. Formally speaking, we have
I ` 〈Ri, Si,Mi, 0〉 →f 〈Ri, Si,Mi, pci〉 i = 1, 2
where pc1 = pc2, which is either 1, or n1 = |It| + 2. If pc1 = 1, then by Lemma 16,
we know
I ` 〈Ri, Si,Mi, 1〉 →t′i 〈R′′i , S ′′i ,M ′′i , |It|+ 1〉 i = 1, 2
Further, since I(|It|+ 1) = ι2 = jmp n2, we know
I ` 〈R′′i , S ′′i ,M ′′i , |It|+ 1〉 →f 〈R′′i , S ′′i ,M ′′i , |I|〉
for i = 1, 2, which implies R′′i = R
′
i, S
′′
i = S
′
i, M
′′
i = M
′
i , and ti = f@t
′
i@f. It is
easy to prove, by induction assumption, that conclusions 1-5 hold true, and for 6,
we have
t1 = f@t
′
1@f = f@t
′
2@f = t2
Similarly, if pc1 = pc2 = n1, we can also prove the conclusions. Intuitively, this
means if ` = L, then the branching statement in this if-block will go to the same
217
T = read(l, k, sv) l = D (S, sv) ⇓ n
b = M(l, n) S ′ = S[k 7→ (b, (l, n))] t = read(n, b)
〈S,M, T 〉 →t 〈S ′,M〉
T = read(l, k, sv)
l = E (S, sv) ⇓ n
t = eread(n)
S ′ = S[k 7→ (?, (l, n))]
〈S,M, T 〉 →t 〈S ′,M〉
T = write(l, k, sv) l = D
S(k) = (b, (l, n)) t = write(n, b)
M ′ = M [(l, n) 7→ b]
〈S,M, T 〉 →t 〈S,M ′〉
T = write(l, k, sv) l = E
S(k) = (?, (l, n)) t = write(n, b)
M ′ = M [(l, n) 7→ b]
〈S,M, T 〉 →t 〈S,M〉
T = o t = o
〈S,M, T 〉 →t 〈S,M〉
T = F t = f
〈S,M, T 〉 →t 〈S,M〉
T = T1@T2
〈S,M, T1〉 →t1 〈S ′,M ′〉
〈S ′,M ′, T2〉 →t2 〈S ′′,M ′′〉
〈S,M, T1@T2〉 →t1@t2 〈S ′′,M ′′〉
t1 ≡ t2
〈S,M, T 〉 →t1 〈S ′,M ′〉
〈S,M, T 〉 →t2 〈S ′,M ′〉
Figure B.2: Symbolic Execution in LGhostRider
branch, and we can prove the theorem.
Next, we consider when `′ = H. If the branching statement goes to the same pc,
then based on the above discussion, we know the conclusions hold true. With out loss
of generality, we can consider when the branching instruction goes to different pc.
We first study the relationship between a trace and a trace pattern. We first prove
conclusion 6, i.e. t1 = t2. To prove this, we first define a new notion 〈S,M, T 〉 →t
〈S ′,M ′〉 as in Figure B.2.
Here, we use a special value ? to indicate a special block, that we do not care
about its content. By doing so, we can use only the DRAM part of a memory M to
218
make this definition. It is easy to see that evaluating a symbolic value requires only
scratchpad and memory corresponding to DRAM. We defined the DRAM projection
of scratchpad and memory as follows:
SD(k) =

(b, (D,n)) S(k) = (b, (D,n))
(?, (E, n)) S(k) = (b, (E, n))
undefined otherwise
Actually, the DRAM projection of scratchpad does not contain only DRAM, but also
contain the index information about ERAM. But all these information are assumed
to be public.
MD(l, n) =

M(l, n) l = D
undefined otherwise
It is easy to see, if 〈S,M, T 〉 →t 〈S ′,M ′〉, then we also have
〈SD,MD, T 〉 →t 〈S ′D,M ′D〉
It is worth mentioning that given two low-equivalent memories M1 ∼L M2 if
and only if M1D = M
2
D.
Based on these rules, we can prove the following lemmas.
Lemma 17. Given S,M, S ′,M ′, S ′′,M ′′, T1, T2, if T1 ≡ T2, and
〈S,M, T1〉 →t1 〈S ′,M ′〉
219
〈S,M, T2〉 →t2 〈S ′′,M ′′〉
then we have
t1 ≡ t2
S ′ = S ′′
M ′ = M ′′
Proof. We consider how T1 ≡ T2 is derived. Case 1. T1 = read(l, k, sv1) and
T2 = read(l, k, sv2). Then we know `safe sv1 and sv1 = sv2. Therefore we know
if (S, sv1) ⇓ n1 and (S, sv2) ⇓ n2, then n1 = n2. If l = D, then we know b =
M(l, n1) = M(l, n2), and thus t1 = read(n1, b) = read(n2, b) = t2. Further S
′ =
S[k → (b, (l, n1))] = S[k → (b, (l, n2))] = S ′′, and M ′ = M = M ′′. If l = E, then
t1 = eread(n1) = eread(n2) = t2. S
′ = S[k → (?, (l, n1))] = S[k → (?, (l, n2))] =
S ′′ and M ′ = M ′′ is trivial. It is impossible for l to be an ORAM bank.
Case 2. T1 = write(l, k, sv1) and T2 = write(l, k, sv2). If l = D or l = E,
then S = (b, (l, n)) (where b = ? if l = E), and t1 = write(n, b) = t2. Further
M ′ = M [(l, n) 7→ b] = M ′′, and S ′ = S = S ′′. It is impossible for l to be an ORAM
bank.
Case 3. T1 = T2 = o or T1 = T2 = F, then the conclusion is trivial.
Case 5. T1 = Tx@(Ty@Tz) and T2 = (Tx@Ty)@Tz. In this case, we know
〈S,M, Tx〉 →tx 〈S2,M2〉
220
〈S2,M2, Ty@Tz〉 →tyz 〈S ′,M ′〉
Further, we have
〈S2,M2, Ty〉 →ty 〈S3,M3〉
〈S2,M2, Tz〉 →tz 〈S ′,M ′〉
Therefore we have t1 = tx@(ty@tz). Further, we know
〈S,M, Tx〉 →t′x 〈S?,M?〉
〈S?,M?, Ty〉 →t′y 〈S??,M??〉
〈S??,M??, Tz〉 →t′z 〈S ′′,M ′′〉
By induction assumption, we know S? = S2, M? = M2, tx ≡ t′x. Further, by
induction assumption again, we have S?? = S3, M?? = M3, ty ≡ t′y. Finally, by
applying induction assumption once more, we finally have S ′ = S ′′, M ′ = M ′′, and
tz ≡ t′z. Therefore we know
t1 = tx@(ty@tz) ≡ (tx@ty)@tz ≡ (t′x@t′y)@t′z ≡ t2
Case 6. T1 = Tx@Ty and T2 = T
′
x@T
′
y. Then we know
〈S,M, Tx〉 →tx 〈S1,M1〉
221
〈S1,M1, Ty〉 →ty 〈S ′,M ′〉
and
〈S,M, T ′x〉 →t′x 〈S2,M2〉
〈S2,M2, T ′y〉 →t′y 〈S ′′,M ′′〉
Since Tx ≡ T ′x, we know by induction assumption, tx ≡ t′x, S1 = S2, and M1 = M2.
Then applying induction assumption again, we have ty ≡ t′y, S ′ = S ′′, and M ′ =
M ′′.
Lemma 18. Given a program I, register mappings R,R′, scratchpads S, S ′, and
memories M,M ′, if
` ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T
where ` = H, T ≡ T ′ (for some T ′), R,R′, S, S ′,M,M ′ satisfy assumptions 1-7 in
Theorem 6 and
I ` 〈R, S,M, 0〉 →t 〈R′, S ′,M ′, |I|〉
then we have
〈SD,MD, T 〉 →t 〈S ′D,M ′D〉
Proof. We prove by induction on how I is typed.
Case T-SEQ. We know I = I1; I2, and
H ` I1 : 〈Υ, Sym〉 → 〈Υ′′, Sym′′〉;T1
222
H ` I2 : 〈Υ′′, Sym′′〉 → 〈Υ′, Sym′〉;T2
where T = T1@T2 Based on the discussion above, we know
I1 ` 〈R, S,M, 0〉 →t1 〈R′′, S ′′,M ′′, |I1|〉
I2 ` 〈R′′, S ′′,M ′, 0〉 →t2 〈R′, S ′,M ′, |I2|〉
where t = t1@t2. Therefore, by induction assumption, we know
〈SD,MD, T1〉 →t1 〈S ′′D,M ′′D〉
〈S ′′D,M ′′D, T2〉 →t1 〈S ′D,M ′D〉
Therefore, we have
〈SD,MD, T1@T2〉 →t1@t2 〈S ′D,M ′D〉
Case T-LOAD. I = ldb k ← l[r]. If l = D, then T = read(l, k, sv),
where sv = Sym(r). Suppose b = M(l, n), where n = R(r). By assumption 2,
we know (S, sv) ⇓ n, therefore we know (SD, sv) ⇓ n hold true as well. Further,
S ′ = S[k 7→ (b, (l, n))], and thus S ′D = SD[k 7→ (b, (l, n))]. We also know M ′D = MD,
and t = read(n, b). In conclusion, we have
〈SD,MD, T 〉 →t 〈S ′D,M ′D〉
If l = E, then we know T = read(l, k, sv) as well, where sv = Sym(r). By
223
assumption 2, we have (SD, sv) ⇓ n, where n = R(r). Combining with t = eread(n),
M ′D = MD, and S
′
D = SD[k 7→ (?, (l, n))], we get our conclusion.
If l ∈ ORAMBanks, then we know T = l, and t = l. Combining with
M ′D = MD, and S
′
D = SD, , the conclusion is trivial.
Case T-STORE. I = stb k. Suppose S(k) = (b, (l, n)). If l = D or l = E,
then SD(k) = (b, (l, n)) (where b = ? if l = E), T = write(l, k, sv), t = write(n, b),
and M ′D = MD[(l, n) 7→ b]. Therefore, we can get our conclusion.
If l ∈ ORAMBanks, similar to the T-LOAD case, we can get our conclusion.
Case T-STOREW. I = stw r1 → k[r2]. Since ` = H, we know slab(k) = H,
which implies k = E or k ∈ ORAMBanks. Therefore S ′D = SD and M ′D = MD.
Further, since T = F and t = f, we know
〈SD,MD, T 〉 →t 〈S ′D,M ′D〉
Case T-LOADW, T-IDB, T-BOP, T-ASSIGN and T-NOP. In all these
rules, T = F, and t = f. Further, it is easy to see that S ′ = S and M ′ = M in all
these rules. Therefore, the concusion
〈SD,MD, T 〉 →t 〈S ′D,M ′D, T 〉
holds true.
Case T-UP. The conclusion is trivial by induction assumption.
Case T-LOOP. This is impossible, since T-LOOP requires ` = L.
224
Case T-IF. I = ι1; It; ι2; If , where ι1 = br r1 rop r2 ↪→ n1 and ι2 = jmp n2.
Depending on the value of R(r1) and R(r2), it may jumps to one of the two branches.
If the true branch is taken, then we know t = f@t1@f, where
It ` 〈R, S,M, 0〉 →t1 R′, S ′,M ′, |It|〉
Since It is typable, we know, by induction assumption,
〈SD,MD, Tt〉 →t1 〈S ′D,M ′D〉
Since
〈SD,MD,F〉 →f 〈SD,MD〉
〈S ′D,M ′D,F〉 →f 〈S ′D,M ′D〉
and T = F@t1@F, we can derive our conclusion.
If the false branch is taken, then we know t = f@t2, where
If ` (R, S,M, 0)→t1 (R′, S ′,M ′, |If |)
Therefore, we know
〈SD,MD, Tf〉 →t2 〈S ′D,M ′D〉
225
Further, since T1@F ≡ T2, by Lemma 17, we have t1@f ≡ t2, which implies
f@t1@f ≡ f@t2
Therefore, we have
〈SD,MD, T 〉 →f@t2 〈S ′D,M ′D〉
Now, we get back to prove Theorem 2 for T-IF rule. Let us remind that
I = ι1; It; ι2; If .
W.L.O.G, we suppose
I ` (R1, S1,M1, 0)→f (R1, S1,M1, 1)
It ` (R1, S1,M1, 0)→tt (R′1, S ′1,M ′1, |It|)
I ` (R′1, S ′1,M ′1, |It|+ 1)→f (R′1, S ′1,M ′1, |I|)
and
I ` (R2, S2,M2, 0)→f (R2, S2,M2, |It|+ 2)
If ` (R2, S2,M2, 0)→tf (R′2, S ′2,M ′2, |If |)
By Lemma 18, we know
〈S1D,M1D, T1〉 →tt 〈S ′1D,M ′1D〉
226
〈S2D,M2D, T2〉 →tf 〈S ′2D,M ′1D〉
Since 〈S ′1D,M ′1D,F〉 →f 〈S ′1D,M ′1D〉, we have
〈S1D,M1D, T1@F〉 →tt@f 〈S ′1D,M ′1D〉
Further, by assumption 1 and 2, we know S1D = S2D and M1D = M2D. By
Lemma 17, we know
t1@f ≡ t2
S ′1D = S
′
2D
M ′1D = M
′
2D
Therefore, we have t1 = f@tt@f ≡ f@tf = t2, which is conclusion 6. Further, the
last two assertions show that conclusions 1 and 3 are true. So we only need to prove
conclusions 2, 4, and 5.
The first step is to prove conclusion 4. If `safe Sym(r) is not true, then
conclusion 4 is vacuum. Otherwise, since `′ = H, we know either ` = H or `const
Sym(r). In either case, assumption 4, we know (Si, Sym(r)) ⇓ Ri(r) for i = 1, 2.
Then, by induction, we conclude that conclusion 4 holds true: if (`′ = H∨ `const
Sym′(r))∧ `safe Sym′(r), then (S ′i, Sym′(r)) ⇓ R′i(r) (i = 1, 2). Since (` = H∨ `const
Sym′(r))∧ `safe Sym′(r) implies (`′ = H∨ `const Sym′(r))∧ `safe Sym′(r), we know
conclusion 4 is true. We can prove conclusion 5 similarly.
The next step is to prove conclusion 2, suppose Υ(r) = L, then by T-IF,
227
(since `′ = H) we know `safe Sym(r) and `const Sym(k). Therefore, by conclusion
4, we know (S ′1D, Sym(r)) ⇓ R1(r), and (S ′2D, Sym(r)) ⇓ R2(r). However, since
S ′1D = S
′
2D, we know R1(r) = R2(r). This means conclusion 4 implies conclusion 2.
Theorem 7. Given a program I in LT , such that ` ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T ,
two memories M1,M2, two register mapping R1, R2, and two scratchpad mapping
S1, S2, if the following assumptions are satisfied:
1. M1 ∼L M2
2. Υ ` R1 ∼ R2
3. Υ ` S1 ∼ S2
4. ∀r ∈ Registers, i ∈ {1, 2}.(` = H∨ `const Sym(r))∧ `ok Sym(r)⇒ (Si, Sym(r)) ⇓
Ri(r);
5. ∀k ∈ BlockIDs, i ∈ {1, 2}.(` = H∨ `const Sym(r))∧ `ok Sym(k)⇒ ∃n.(Si, Sym(k)) ⇓
n ∧ |nmod size(Υ(k)) | = idx(Si(k)).
and
I ` (R1, S1,M1, 0)→t1 (R′1, S ′1,M ′1, pc′)
I ` (R2, S2,M2, 0)→t2 (R′2, S ′2,M ′2, pc′′)
If |t1| = |t2|, then we have the following conclusions:
1. M ′1 ∼L M ′2
228
2. Υ′ ` R′1 ∼ R′2
3. Υ′ ` S ′1 ∼ S ′2
4. ∀r ∈ Registers, i ∈ {1, 2}.(` = H∨ `const Sym(r))∧ `safe Sym′(r) ⇒
(S ′i, Sym
′(r)) ⇓ R′i(r);
5. ∀k ∈ BlockIDs, i ∈ {1, 2}.(` = H∨ `const Sym(r))∧ `safe Sym′(k) ⇒
∃n.(S ′i, Sym′(k)) ⇓ n ∧ |nmod size(Υ′(k)) | = idx(S ′i(k)).
6. t1 ≡ t2
Proof. Next, we prove the non-terminating case. Again, we suppose
` ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T
and
I ` 〈R1, S1,M1, 0〉 →t1 〈R′1, S ′1,M ′1, pc1〉
I ` 〈R2, S2,M2, 0〉 →t2 〈R′2, S ′2,M ′2, pc2〉
where |t1| = |t2|. Then we prove by induction on how to derive L ` I : 〈Υ, Sym〉 →
〈Υ′, Sym′〉;T . If it is derived by applying one of rules T-LOAD, T-STORE, T-
LOADW, T-STOREW, T-BOP, T-ASSIGN, T-NOP, then we know |t1| = |t2| = 1,
and the conclusion follows directly from Theorem 6. If it is derived by applying
rule T-UP, then the conclusion is trivial. So we only need to consider rule T-IF,
T-LOOP, and T-SEQ.
229
Case T-SEQ. Suppose I = I1; I2. We prove by contradiction. With out loss
of generality, we suppose |t1| = |t2| is minimal such that t1 6≡ t2 or M ′1 6∼L M ′2.
There are two sub cases:
I1 ` (R1, S1,M1, 0)→t′1 (R′′1, S ′′1 ,M ′′1 , |I1|)
I2 ` (R′′1, S ′′1 ,M ′′1 , 0)→t′′1 (R′1, S ′1,M ′1, pc1 − |I1|)
where pc1 > |I1| and t1 = t′1@t′′1, Or
I1 ` (R1, S1,M1, 0)→t1 (R′′1, S ′′1 ,M ′′1 , pc1)
In the first case, by assuming |t1| is minimal, we know
I1 ` (R2, S2,M2, 0)→t′2 (R′2, S ′2,M ′′2 , |I2|)
I2 ` (R′2, S ′2,M ′′2 , 0)→t′′2 (R2, S2,M ′2, pc2 − |I1|)
where pc2 > |I1|. Then by induction assumption, we can prove that t′1 ≡ t′2 and
t′′1 ≡ t′′2, and thus t1 ≡ t2. In the second case, by induction assumption, we directly
prove that t1 ≡ t2.
Case T-LOOP. Suppose I = Ic; ι1; Ib; ι2. Since Ic is typable, by Lemma 16,
we know that By Lemma 16, we know either one of the following two cases happens:
Ic ` (R1, S1,M1, 0)→t11 (R′′1, S ′′1 ,M ′′1 , |Ic|)
230
I ` (R′′1, S ′′1 ,M ′′1 , |Ic|)→t21 (R′1, S ′1,M ′1, pc′)
or
Ic ` (R1, S1,M1, 0)→t1 (R1, S1,M ′1, pc′)
where pc′ ≤ |Ic|.
In the latter case, the conclusion follows by induction assumption. For the
former case, we know
Ic ` (R1, S1,M1, 0)→t11 (R′′1, S ′′1 ,M ′′1 , |Ic|)
I ` (R′′1, S ′′1 ,M ′′1 , |Ic|)→t21 (R′1, S ′1,M ′1, pc′)
Similarly, we have
Ic ` (R2, S2,M2, 0)→t12 (R′′2, S ′′2 ,M ′′2 , |Ic|)
I ` (R′′2, S ′′2 ,M ′′2 , |Ic|)→t22 (R′2, S ′2,M ′2, pc′′)
By Theorem 6, we have
1. M ′′1 ∼L M ′′2 ;
2. Υ ` R1 ∼ R2;
3. Υ ` S1 ∼ S2;
4. ∀r ∈ Registers, i ∈ {1, 2}.(` = H∧ `const Sym′(r))∧ `safe Sym′(r) ⇒
231
(S ′′i , Sym
′′(r)) ⇓ R′i(r);
5. ∀k ∈ BlockIDs, i ∈ {1, 2}.(` = H∧ `const Sym′(r))∧ `safe Sym′(k) ⇒
∃n.(S ′′i , Sym′(k)) ⇓ n ∧ |nΥ′′(k) |= idx(S ′′i (k)).
6. t11 ≡ t12
Further, we know I(|Ic|) = ι1 = br r1 rop r2 ↪→ n1, where Υ(r1)unionsqΥ(r2) v L, which
implies Υ(r1) = Υ(r2) = L. Therefore, we know R
′′
1(r1) = R
′′
2(r1) and R
′′
1(r2) =
R′′2(r2), which implies if
I ` (R′′i , S ′′i ,M ′′i , |Ic|)→f (R?i , S?i ,M?i , pc?i )
then pc?1 = pc
?
2, which is either |Ic| + 1 or |Ic| + n1 = |I|. If later, then we know
t1 = t
1
1@f ≡ t22@f = t2, and R′′i = Ri, S ′′i = Si, M ′′i = Mi (i = 1, 2). In this case,
conclusions 1-6 all hold true.
In the former case, there are still two sub cases: (1)
Ib ` (R′′1, S ′′1 ,M ′′1 , 0)→t21 (R31, S31 ,M31 , pc1)
Ib ` (R′′2, S ′′2 ,M ′′2 , 0)→t22 (R32, S32 ,M32 , pc2)
Then by induction, we know t21 ≡ t22, and therefore t1 ≡ t2, and all conclusions 1-5
are true.
(2)
Ib ` (R′′1, S ′′1 ,M ′′1 , 0)→t21 (R31, S31 ,M31 , |Ib|)
232
Ib ` (R′′2, S ′′2 ,M ′′2 , 0)→t22 (R32, S32 ,M32 , |Ib|)
I ` (R31, S31 ,M31 , 0)→t′1 (R′1, S ′1,M ′1, |Ib|)
I ` (R32, S32 ,M32 , 0)→t′2 (R′2, S ′2,M ′2, |Ib|)
where ti = t
1
i@f@t
2
i@f@t
′
i (i = 1, 2). Similar to Theorem 6, we can show the conclu-
sions 1-6 hold true for this case.
Case T-IF. Suppose I = ι1; It; ι2; If , where ι1 = br r1 rop r2 ↪→ n1. We need
the following lemma.
Lemma 19. If H ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉 : T , and
I ` (R, S,M, pc)→t (R′, S ′,M ′, pc′)
Then pc′ > pc.
Proof (sketch). Since while-statement cannot be typed in high security context, and
while-statement is the only place allowing jumping or branching back, in high secu-
rity context, the program counter will only increase.
If Υ(r1) = Υ(r2) = L, then we know
I ` (Ri, Si,Mi, 0)→f (Ri, Si,Mi, pc)
I ` (Ri, Si,Mi, pc)→t′i (R′i, S ′i,M ′i , pci)
where ti = f@t
′
i for i = 1, 2. In this case, we can prove the conclusions by induction
233
assumption.
If ` unionsqΥ(r1) unionsqΥ(r2) = H, then by Lemma 19, we know
I ` (Ri, Si,Mi, 0)→t′i (Ri, Si,Mi, |I|)
for i = 1, 2, where t1 and t2 are prefixes of t
′
1 and t
′
2. By Theorem 6, we know t
′
1 ≡ t′2.
Therefore, we know t1 ≡ t2. Conclusion 1-5 can be proven similar to Theorem 6.
Proof of Theorem 2. Suppose I is a program, Υ and Sym satisfy: (1) ∀r.Sym(r) =
? ∧ Υ(r) = L; and (2) ∀k.Sym(k) =? ∧ Υ(k) = D. There are Υ′ and Sym′, such
that
` ` I : 〈Υ, Sym〉 → 〈Υ′, Sym′〉;T
where ` = L, and
I ` (R0, S0,M1, 0)→t1 (R1, S1,M ′1, pc1)
I ` (R0, S0,M2, 0)→t2 (R2, S2,M ′2, pc2)
where M1 ∼L M2, ∀r.R0(r) = 0, ∀k.S0(k) = (b0, (D, 0)), and |t1| = |t2|. Clearly, we
have
1. M1 ∼L M2
2. Υ ` R0 ∼ R0;
3. Υ ` S0 ∼ S0;
234
4. ∀r ∈ Registers, k ∈ Blocks. 6`safe Sym(r)∧ 6`safe Sym(k);
Therefore, by applying Theorem 7, we have t1 ≡ t2, and M ′1 ∼L M ′2.
235
Appendix C: Proof of Theorem 3
We begin by discussing how to construct simA; the simulator simB is con-
structed similarly.
Since Alice does not have the view of Bob’s local data, and those data secret-
shared between them two, we define a special notion • as the values not observable
to Alice. We define the operations on top of • as follows:
• op v = • v op • = • • (v) = • m(•) = •
We define the following auxiliary functions accordingly:
(selectA(l, t, t
′), selectB(l, t, t′)) := select(l, t, t′)
readA(l, v) :=

v l v A
• otherwise
val(v, l) := v
val(m, l) := m
lab(v, l) := l
lab(m, l) := l
236
We then define Alice’s snapshot of a memory M , denoted as M ↓ A, in the
following:
Definition 11. Given a memory M , Alice’s snapshot of M , denoted as M ↓ A, is
defined as a memory such that
M ↓A(x) =

M(x) if M(x) = (v, l) where l v A
• otherwise
We further define the Alice-similarity property of two memories as follows:
Definition 12. We say two memories M1 and M2 are Alice-similar, denoted as
M1 ∼A M2, if and only if M1 ↓A = M2 ↓A.
Figure C.1 defines how simA evaluate an expression. The judgement in the
form of l `A 〈M, e〉 ⇓t v says that given a memory M , the simulator simA evaluates
an expression e to value v, producing memory trace t.
Figure C.2 and Figure C.3 defines how simA simulates the instruction- and
memory-traces until the next declassification. The judgement 〈Mi, Si〉 (i,t)−−→A 〈M ′i , S ′i〉
says that given a statement Si and a memory M , simA evaluates the program Si over
memory Mi and reduces to program S
′
i and memory M
′
i emitting Alice’s instruction
trace i and memory trace t.
The judgement 〈Mi, Si〉 (i,t)−−→
?
A 〈M ′i , S ′i〉 is similar to 〈Mi, Si〉
(i,t)−−→A 〈M ′i , S ′i〉,
but requires the last statement evaluated must be a declassification statement. We
emphasize that our rules enforce that the memory over which the program is eval-
uated must be Γ-compatible.
237
l `A 〈M, e〉 ⇓ta v
Sim-E-Const l `A 〈M,n〉 ⇓ n
Sim-E-Var
lab(M(x)) = l′ v = readA(l, val(M(x)))
t = selectA(l, read(x, v), x)
l `A 〈M,x〉 ⇓t v
Sim-E-Op
lab(M(xi)) = li l `A 〈M,xi〉 ⇓ti vi
t = t1@t2 v = v1 op v2 i = 1, 2
l `A 〈M,x1 op x2〉 ⇓t v
Sim-E-Array
lab(M(y)) = l′ l `A 〈M,y〉 ⇓t1 v
lab(M(x)) = l′′ v1 = val(M(x))
t2 = selectA(l, readarr(x, v1, v), x)
t = t1@t2 v2 = readA(l, val(M(x))(v))
l `A 〈M,x[y]〉 ⇓t v2
Sim-E-Mux
lab(M(xi)) = Nat li l `A 〈M,xi〉 ⇓ti vi i = 1, 2, 3
v1 = 0⇒ v = v2 v1 6= 0⇒ v = v3
v1 = • ⇒ v = • t = t1@t2@t3
l `A 〈M,mux(x1, x2, x3)〉 ⇓t v
Figure C.1: Operational semantics for simA
The simulator simA(M,S,D1, ..., Dn) runs as follows. Initially set M1 to be
M and S1 to be S. For each i = 1, ..., n, simA evaluates 〈Mi, Si〉 (i,t)−−→
?
A 〈M ′i , S ′i〉. If
Di = , then set Mi+1 to be M
′
i ; otherwise, Di = (x, v), set Mi+1 to be M
′
i [x 7→ v].
Finally, simA evaluates 〈Mn, Si〉 (i,t)−−→
?
A 〈M ′, S ′〉, and returns (i, t).
The following lemma shows that the semantics for simA generates the same
memory trace as the semantics for SCVM.
Lemma 20. If l ` 〈M, e〉 ⇓(ta,tb) v and Γ ` e : Nat l′ and l `A 〈M ′, e〉 ⇓t v′ and
M ∼A M ′, and l′ v l, and M and M ′ are Γ-compatible, then ta ≡ t and if l v A,
then v = v′. Otherwise v′ = •.
Proof. Prove by structural induction on e. If e = x, then Γ(x) = Nat l′. If l v A,
238
〈M,S〉 (i,t)−−→
?
A 〈M ′, S′〉
Sim-Declass
t = y i = declass(x, y)
〈M, O : x := declassl(y)〉 (i,t)−−→
?
A 〈M, O : skip〉
Sim-Seq
〈M,S1〉 (i,t)−−→
?
A 〈M ′, S′1〉
〈M,S1;S2〉 (i,t)−−→
?
A 〈M ′, S′1;S2〉
Sim-Concat
〈M,S, 〉 (i,t)−−→A 〈M ′, S′, 〉
〈M ′, S′〉 (i
′,t′)−−−→
?
A 〈M ′′, S′′〉
〈M,S〉 (i@i
′,t@t′)−−−−−−→
?
A 〈M ′′, S′′〉
Figure C.2: Operational semantics for statements in simA (part 1)
then v′ = val(M ′(x)) = val(M(x)) = v, therefore v = v′. Further t = read(x, v′) =
read(x, v) = ta if l v A. If l = B, then v′ = •, and t =  = ta. If l = O, then v′ = •,
and t = x = ta.
If e = n, then t =  = ta, and v
′ = n = v, and l = P v A.
If e = x1 op x2. Then we know l `A 〈M ′, xi〉 ⇓ti v′i, and 〈M,xi〉 ⇓(tia,tib) vi
for i = 1, 2. By induction assumption, we know ti ≡ tia, and thus t = t1@t2 ≡
t1a@t
2
a = ta. For its value, suppose Γ(xi) = Nat li, i = 1, 2, if l v A, then li v A
holds true, and by induction assumption, we know vi = v
′
i for i = 1, 2, and thus
v = v1 op v2 = v
′
1 op v2 = v
′. Otherwise, either or both v1 and v2 are •, and thus we
know v′ = •.
If e = x[y]. We first reason about the value. If l v A, then suppose Γ(y) =
Nat l′′, then l′′ v l′ v l v A according to Γ ` x[y] : Nat l′. Then we know v′1 =
val(M ′(y)) = val(M(y)) = v1. Further, we know (m′, l) = M ′(x) = M(x) = (m, l),
and thus v′ = get(m′, v′1) = get(m, v1) = v. If l 6v A, then v = •.
239
Then we reason about the trace. If l v A, then
t = read(y, v1)@readarr(x, v1, v) ≡ read(y, v′1)@readarr(x, v1, v′) = ta
If l = B, we have t ≡  ≡ ta. If l = O, we have t ≡ y@x ≡ ta.
For e = mux(x1, x2, x3), based on a very similar argument as for x1 op x2, we
can get the conclusion.
The following lemma further claims that if an expression has a type B, then
simulating it will generate no observable instruction traces and memory traces to
Alice.
Lemma 21. If Γ ` e : Nat l′, and B ` 〈M, e〉 ⇓t v, and M is Γ-compatible then
t ≡ .
Proof. Prove by structure induction on e. If e = x, then t =  by rule Sim-E-Var.
If e = x1 op x2. Suppose Γ ` xi : Nat li for i = 1, 2, then we know li v B.
Therefore ti ≡ , and thus t ≡ .
If e = x[y], the conclusion follows the fact that B ` 〈M, y〉 ⇓ v, and
selectA(B, readarr(x, v1, v), x) = .
If e = mux(x1, x2, x3), similar to binary operation, we know t ≡ .
Lemma 22. If B ` S, and 〈M,S〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′, S ′〉, where M is Γ-compatible,
then ia ≡ , ta ≡ , and M ∼A M ′
Proof. We prove by induction on the structure of S. If S = l : skip, then the
conclusion is trivial.
240
If S = l : x := e, then we know l = B and Γ(x) = Nat B. Then ia =  and
M ′ ∼A M follow trivially. According to Lemma 21, we can prove ta ≡ .
If S = O : x := oram(y) or S = O : x := declassl(y), then pc is required to be
P, so that the conclusion is vacuous.
If S = l : y[x1] := x2, then l = B, and thus ia = . Therefore ta = t1a@t2a@t
′
a,
where B ` 〈M,xi〉 ⇓(tia,tib) vi for i = 1, 2, and t′a = selectA(B,writearr(y, v1, v2), y) =
. Therefore t1a ≡ t2a ≡  according to Lemma 21. In conclusion, we have ta =
t1a@t2a@t
′
a ≡ . Finally, for memory, M ′ = M [y 7→ m′] ∼A M .
If S = l : if(x)then S1 else S2, then l = B, and Γ, B ` Si for i = 1, 2.
Suppose M(x) = (v, B), then 〈M,S〉 (,,i
′
b,t
′
b)−−−−−→ 〈M,Sc〉, where v = 1 ⇒ c = 1 and
v 6= 1⇒ c = 2. There are two cases: (1) M ′ = M and S = Sc, then the conclusion
is trivial; (2) 〈M,Sc〉 (i
′′
a ,t
′′
a ,i
′′
b ,t
′′
b )−−−−−−→ 〈M ′, S ′〉. In this case, by induction assumption, we
have M ′ ∼A M , i′′a ≡  and t′′a ≡ , so that ia = @i′′a ≡  and ta = @t′′a ≡ .
For S = l : while(x)do S ′, the conclusion can be proven similarly to the
if-case.
Finally, for S = S1;S2, we know either (1) 〈S1,M〉 (ia,ta,ib,tb)−−−−−−→ 〈S ′1,M ′〉; or (2)
〈S1,M〉 (i
′
a,t
′
a,i
′
b,t
′
b)−−−−−−→ 〈l : skip,M ′′〉 and 〈S2,M ′′〉 (i
′′
a ,t
′′
a ,i
′′
b ,t
′′
b )−−−−−−→ 〈S ′,M ′〉, where ia = i′a@i′′a,
and ta = t
′
a@t
′′
a. In both cases, the conclusions can be proven easily.
The following lemma is the main lemma saying that when evaluating over
Alice-similar memories, simA and SCVM will generate the same instruction traces
and memory traces, and produce Alice-similar memory profiles.
Lemma 23. If 〈M,S〉 (ia,ta,ib,tb)−−−−−−→ 〈M1, S ′〉 : D where Γ, P ` S, and M ∼A M ′,
241
and 〈M ′, S〉 (i,t)−−→A 〈M ′1, S ′′〉 (for D = ) or 〈M ′, S〉
(i,t)−−→
?
A 〈M ′1, S ′′〉 (for D 6= ),
then S ′ = S ′′, ia ≡ i and ta ≡ t. If D = , then M1 ∼A M ′1; otherwise, suppose
D = (x, v), then M1 = M
′
1[x 7→ v].
Proof. The conclusion S ′ = S ′′ can be trivially done by examining the corre-
spondence of each E- and S- rules and Sim- rules. Therefore, we only prove (1)
M1 ∼A M ′1, (2) ia ≡ i, and (3) ta ≡ t.
We prove by induction on the length of steps L toward generating declas-
sification event D. If L = 0, then we know S = O : x := declassl(y);S2 (or
O : x := declassl(y)). Since we assume Γ, P ` S, by typing rule T-Declass, we have
l 6= O, Γ(x) = Nat l. If l v A, then D[A] = (x, v), and thus M ′1[x 7→ v] ∼A M [x 7→
v] = M1. Further, we know ia = declass(x, y) = i, and ta = y = t. Second, if
lx = B, then M
′
1 = M
′ ∼A M = M1, ia = declass(x, y) = i, and ta = y = t.
We next consider L > 0, then S = S1;S2. Since (Sa;Sb);Sc is equivalent
to Sa; (Sb;Sc) in the sense that if 〈M, (Sa;Sb);Sc〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′, S ′〉 : D, then
〈M,Sa; (Sb;Sc)〉 (i
′
a,t
′
a,i
′
b,t
′
b)−−−−−−→ 〈M ′, S ′〉 : D, where ia ≡ i′a, ib ≡ i′b, ta ≡ t′a, and tb ≡ t′b.
Therefore we only consider S1 not to be a Seq statement, then we know S1 = l : s1.
By taking one step, we only need to prove claims (1)-(3), then the conclusion can
be shown by induction assumption. In the following, we consider how this step is
executed.
Case l : skip. If S1 = l : skip, the conclusion is trivial, i.e. ia =  = i and
ta =  = t and M
′
1 = M
′ ∼A M = M1.
Case l : x := e. If S1 = l : x := e, ia = l : x := e = i. Then we show t ≡ ta.
242
If l v A, t ≡ ta directly follows Lemma 20. If l = B, then by Lemma 21, we have
t ≡  ≡ tb. If l = O, then we consider e separately. If e = y, then t = y@x = ta.
If e = y[z], then t = z@y@x = ta. If e = n, then t = x = ta. If e = y op z, then
t = y@z@x = ta. Finally, if e = mux(x1, x2, x3), then t = x1@x2@x3@x = ta.
Finally, we prove the memory equivalence. If l v A, then according to
Lemma 20, e evaluates to the same value v in the semantics, and in the sim-
ulator. Therefore M ′1 = M
′[x 7→ v] ∼A M [x 7→ v] = M1. If B v l, then
M ′1 = M
′ ∼A M ∼A M [x 7→ v] = M1. Therefore, the conclusion is true.
Case O : x := oram(y). It is easy to see that M ′1 = M
′ ∼A M ∼A M [x 7→ m] = M1,
and i = O : init(x, y) = ia. Suppose Γ(y) = Nat l, then we know l 6= O. If l v A,
then t = y@x = ta. Otherwise, l = B, then we know t = x = ta.
Case l : y[x1] := x2. By typing rule T-ArrAss, we know Γ(y) = Array l, Γ(x1) =
Nat l1, Γ(x2) = Nat l2, where l1 v A and l2 v A. If l v A, then we have ta =
read(x1, v1)@read(x2, v2)@writearr(a, v1, v2) = t, and ia = l : y[x1] := x2 = i.
For memory, M?? = M ′[y 7→ set(m, v1, v2)] ∼A M [y 7→ set(m, v1, v2)] = M?, where
(m, l) = M ′(y) = M(y), (v1, l1) = M(x1), and (v2, l2) = M(x2).
If l = B, then M ′1 = M
′ ∼A M ∼A M [y 7→ m′] = M1, i =  = ia, t =  = ta.
Case l : if(x)then S1else S2. If l = B, then according to Lemma 22, M
′
1 =
M ′ ∼A M ∼A M1, t ≡ ta, and i ≡ ia. If l v A, then i = l : if(x) = ia, and
t = read(x, v) = ta. Further, M
′
1 = M
′ ∼A M = M1. Therefore, the conclusion is
also true.
Case l : while(x)do S ′. For S1 = while(x)do Sb, the proof is very similar to the
if-statement.
243
Lemma 23 immediately shows that simA can simulate the correct traces.
Therefore Theorem 3 holds true. Q.E.D
C.1 Proof of Theorem 4
Theorem 4 is a corollary of the following theorem:
Theorem 8. If Γ, pc ` S, then either S is l : skip, or for any Γ-compatible memory
M , there exist ia, ta, ib, tb,M
′, S ′, D such that 〈M,S〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′, S ′〉 : D, M ′ is
Γ-compatible, and Γ, pc ` S ′.
Proof. We prove by induction on the structure of S. If S = l : skip, then the
conclusion is trivial.
If S = l : x := e, then Γ(x) = Nat l, Γ ` e : Nat l′, pcunionsq l′ v l. We discuss the
type of e. If e = x′, then we know Γ(x′) = Nat l′. Since M is Γ−compatible, we
know M(x′) = (v, l′), where v ∈ Nat. Therefore, 〈M,x′〉 ⇓(ta,tb) v, where (ta, tb) =
select(l, read(x′, v), x′), and thus 〈M,S〉 (ia,t
′
a,ib,t
′
b)−−−−−−→ 〈M ′, l : skip〉 : , where (ia, ib) =
inst(l, x := e), t′a = ta@t
′′
a, and t
′
b = tb@t
′′
b , where (t
′′
a, t
′′
b ) = select(l,write(x, v), x).
Further, M ′ = M [x 7→ (v, l)]. Therefore, M ′ is also Γ-compatible, and the conclusion
holds true. Similarly, we can prove that if Γ ` e : τ is derived using T-Const, T-Op,
T-Array, or T-Mux, then the conclusion is also true.
If S = O : x := declassl(y), then Γ(y) = Nat O, Γ(x) = Nat l where
l 6= O, and pc = P. Since M is Γ−compatible, we know M(y) = (v,O), M ′ =
M [x 7→ (v, l)]. Therefore M ′ is Γ−compatible. Further, ta = y = tb, ia = ib = O :
declass(x, y), D = select(l, (x, v), ), and 〈M,S〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′, O : skip〉, and we
244
know that Γ, P ` O : skip. Therefore the conclusion is true.
Similarly, we can prove the conclusion is true for S = O : x := oram(y).
For S = l : y[x1] := x2, then Γ(y) = Array l, Γ(x1) = Nat l1, Γ(x2) = Nat l2,
and pc unionsq l1 unionsq l2 v l. Since M is Γ−compatible, we know M(y) = (m, l), M(x1) =
(v1, l1), and M(x2) = (v2, l2). Therefore M
′ = M [y 7→ (set(m, v1, v2), l)] is also
Γ−compatible. Further, (t′a, t′b) = select(l,writearr(y, v1, v2), y), ta = t1a@t2a@t′a,
tb = t1b@t2b@t
′
b, and (ia, ib) = inst(l, y[x1] := x2). Therefore, 〈M,S〉
(ia,ta,ib,tb)−−−−−−→
〈M ′, l : skip〉, where we can prove Γ, pc ` l : skip easily. Therefore, the conclusion
is true.
For S = l : if(x)then S1 else S2, we have Γ(x) = Nat l. Therefore M(x) =
(v, l). If v = 1, then 〈M,S〉 ia,ta,ib,tb−−−−−→ 〈M,S1 where (ia, ib) = inst(l, if(x)) and
(ta, tb) = select(l, read(x, v), x). Further, we know Γ, l ` S1. Since pc v l, it is easy
to prove by induction that Γ, pc ` S1 is true as well. Therefore, the conclusion is
true. On the other hand, if v 6= 1, then 〈M,S〉 ia,ta,ib,tb−−−−−→ 〈M,S2〉. We can also prove
the conclusion.
The proof for S = l : while(x)do S ′ is similar to the branching-statement by
using rule S-While-True and S-While-False.
For S = S1;S2, then we know Γ ` S1. The conclusion directly follows the
induction assumption by applying rule S-Seq and rule S-Skip.
245
Appendix D: The hybrid protocol and the proof of Theorem 5
In this section, we present the hybrid protocol, and show it emulates the ideal
world functionality F . To start with, we present smaller ideal functionalities in G
used by the hybrid world protocol.
1. F (l1,l2)op are the ideal functionalities for binary operation op. They are param-
eterized by two type labels l1 and l2 from {P, A, B, O} indicating which party
provides the data to the functionality. Suppose the operation is x op y. l1
and l2 correspond to x and y respectively. If l1 is P , then both Alice and Bob
will hand in the value of x, and the functionality verifies these two values are
the same. If l1 is A (or B), then Alice (or Bob) hands in the value of x to the
functionality. If l1 is O, then both Alice and Bob hand in their secret shares
to the functionality respectively. The value of l2 has the same meaning but
is for the data source of y. These ideal functionalities output secret shares
of the result to Alice and Bob respectively. For example, F (P,A)op accepts input
x, y from Alice, and x from Bob and return the results [v]a to Alice and [v]b
to Bob. We denote this as ([v]a, [v]b) = F (P,A)op (x@y, x).
2. F (l1,l2,l3)mux are the ideal functionalities for the multiplex operations. The three
parameters l1, l2, and l3 have the same meaning as above, but correspond to
246
the three input of the multiplex operation. These functionalities also return
secret shares to Alice and Bob.
3. Fxoram for each array x is an interactive Oblivious RAM functionality. It sup-
ports three operations.
• initl to initialize the ORAM with a given array. l is from {P, A, B}. If l
is P or A, then Alice hands in her array. If l is B, then Bob hands in his
array.
• read to read the content for a given index. The index is provided as a
pair of secret shares from Alice and Bob. The output is also a pair of
secret shares, which are returned to Alice and Bob respectively.
• write to write a value into a given index. It takes four inputs: the secret
shares of the index and the secret shares of the values from Alice and
Bob respectively.
4. F ldeclass is the declassification function, which takes secret shares from Alice
and Bob as its input, and returns the revealed value to the party corresponding
to l.
The protocol ΠG is then presented in Figure D.1 and Figure D.2. During the
protocol’s execution, Alice and Bob consumes their instruction traces and memory
traces. Since the memory traces contain all information of the public memory and
their local memories, both Alice and Bob only store locally their secret shares [M ]A
and [M ]B and the instruction- and memory- traces.
247
Figure D.1 presents the rules for local execution. Since all local and public
data to be used in secure computation are contained in memory traces, Alice and
Bob do not maintain their local data and public data. The rules are in the form
of (i, t) → (, ), which means the instruction trace i and memory trace t will be
consumed. In each rule, only one local instruction, i.e. the security label l 6= O,
and its corresponding memory trace for each instruction will be consumed. It is not
hard to verify the following proposition:
Proposition 1. Assuming 〈M,S〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′, S ′〉 :  and s is a statement in the
set {x := e, x[x] := x, if(x),while(x)}. If ia = l : s, where l 6= O, then (ia, ta) →
(, ); If ib = l : s, where l 6= O, then (ib, tb)→ (, ).
Note local execution rules only handle executing one instruction. The sequence
of multiple instructions are handled using rule H-LocalA, H-LocalB, and H-Seq
explained later.
Figure D.2 presents two parts. The first part consists the rules, in the form
of 〈[M ]A, ta, [M ]B, tb, e〉 ⇓ ([v]a, [v]b), which securely evaluate an expression e. [M ]A
and [M ]B are the mapping from variables to their secret shares, and ta and tb are
memory traces from Alice and Bob respectively. All information to evaluate e are
contained in [M ]A, [M ]B, ta, and tb. The result is in the format of ([v]a, [v]b), where
[v]a and [v]b are secret shares of the result for Alice and Bob respectively. The rules
restrict that ta and tb must be the memory traces generated by the ideal functionality
F when evaluating e.
248
Rule SE-Const deals with constant expression n. ([v]a, [v]b) can be acquired
by secret-sharing n, which is implemented using F (A,B)+ (n, 0). Rule SE-Var secret
shares the value of a variable expression x. The value of x can be extracted from
[M ]A and [M ]B, ta, or tb according to Γ(x). If Γ(x) is P or A, then ta = read(x, v),
and [v]a and [v]b can be computed using F (A,B)+ (v, 0). If Γ(x) is B, the computation
is similar, but Bob hands in his value v. If Γ(x) is O, then [M ]A(x) and [M ]B(x) can
be directly returned.
Rule SE-OP handles a binary operation x op y. It can be directly computed
using a binary operation functionality F (Γ(x),Γ(y))op . The input to the functionality is
[M ]A〈ta〉 and [M ]B〈tb〉, which is defined as follows. Suppose [M ] is a mapping from
variables to secret shares, and t is a trace. Then [M ]〈t〉 is defined inductively as
[M ]〈read(x, v)〉 = v [M ]〈x〉 = [M ](x) [M ]〈t1@t2〉 = [M ]〈t1〉@[M ]〈t2〉
Notice that [M ]〈t〉 is defined over only read(x, v), x, and concatenations of them.
This is because this notion is used for binary operation and multiplex, where array
read events and write events do not occur. The rule SE-MUX for multiplex operation
is similar.
For array expression y[x], there are two rules, SE-ArrVar and SE-L-ArrVar. If
Γ(y) = O, then evaluating y[x] is an ORAM read operation. Rule SE-ArrVar calls
the ORAM functionality Fyoram to get the secret shares ([v]a, [v]b). Otherwise, y[x]
can be computed locally, and rule SE-L-ArrVar handles this case.
249
The second part of the rules (Figure D.2) are for hybrid protocol, which are in
the form of 〈[M ]A, ia, ta〉, 〈[M ]B, ib, tb〉  〈[M ′]A, i′a, t′a〉, 〈[M ′]B, i′b, t′b〉 : D, meaning
that Alice and Bob keeping their shares of secret variables, i.e. [M ]A and [M ]B
respectively, execute over their simulated traces, i.e. ia and ta for Alice, and ib and
tb for Bob, evaluates to new shares [M
′]A and [M ′]B, and new traces i′a, t
′
a, i
′
b and
t′b, and generate declassification D, which is either  or (da, db).
Rule H-Assign deals with the instruction O : x := e. The trace must be in the
format of ta = t
′
a@x and tb = t
′
b@x, where (t
′
a, t
′
b) are the memory traces for Alice
and Bob to evaluate e. This rule first evaluates the expression e to get [v]a and [v]b.
Then it substitute the mapping for x in [M ]A and [M ]B accordingly.
Rule H-ORAM handles ORAM initialization instruction O : init(x, y). Either
of Alice’s or Bob’s memory trace must be readarr(y, 0,m(0))@...@readarr(y, l,m(l))@x,
where l = |m| − 1. From this trace, one party is able to reconstruct the memory m,
which is later fed into ORAM functionality Fxoram to initialize it.
Rule H-ArrAss handles the instruction O : y[x1] := x2. First, the secret shares
for evaluating xi are [v]ia and [v]ib for i = 1, 2 respectively. Then they are fed into
the ORAM functionality Fyoram to perform a write operation.
Rule H-Cond-While handles O : if(x) and O : while(x), which only consumes
the corresponding memory traces x, and does not modify [M ]A and [M ]B.
Rule H-Declass handles the instruction O : declass(x, y), which is the only in-
struction generating non-empty declassification. According to rule S-Declass, both
memory traces are y. It calls the declassification functionality FΓ(x)declass([M ]A(y), [M ]B(y))
to release the value of v to the party corresponding to Γ(x).
250
The rules discuss above handles only one instructions. There is a proposition
similar to Proposition 1 that holds true for hybrid rules. We start by introducing
the concept of consistency of secret-sharing mapping with a memory:
Definition 13. Given a type environment G, we say a pair of secret share mappings
[M ]A and [M ]B is consistent with a G-compatible memory M if and only if for all
x such that Γ(x) = O, M(x) = FPdeclass([M ]A(x), [M ]B(x)).
Now we are ready to present the following proposition.
Proposition 2. Assuming 〈M,P 〉 (ia,ta,ib,tb)−−−−−−→ 〈M ′, P ′〉 : . We use the notation s to
denote one element of the set {x := e, x[x] := x, if(x),while(x), init(x, y),declass(x, y)}.
If ia = ib = O : s, and [M ]A and [M ]B are consistent with M , then
〈[M ]A, ia, ta〉, 〈[M ]B, ib, tb〉 〈[M ′]A, i′a, t′a〉, 〈[M ′]B, i′b, t′b〉 : D,
and [M ′]A and [M ′]B are consistent with M ′.
The rest four rules deal with multiple instructions. H-Seq and H-Concat are
similar to S-Seq and S-Concat correspondingly. H-LocalA and H-LocalB are used
to execute local and public instructions.
We first show the hybrid protocol piG generates the same declassification events.
This can be easily proved by induction leveraging Proposition 2.
We then show that the hybrid protocol piG securely emulates the ideal world
functionality F (Theorem 5). We suppose Alice is the semi-honest adversary, and
Bob’s case is symmetric. To show this, the adversary of piG can learn ia, ta, a sequence
251
of secret share mappings [M ]A, [M
′]A, ..., and declassification events D1A, D
2
A, .... In
the ideal world, and adversary can learn all the declassification events D1A, ..., and
it can simulate to get ia and ta. Further the secret share mappings [M ]A, [M
′]A, ...
are indistinguishable to random bits. Therefore, the adversary in real world can
securely simulates the hybrid world’s adversary.
252
〈M,S〉 (i,t)−−→A 〈M ′, S′〉
Sim-Skip 〈M, l : skip;S〉 (,)−−→A 〈M,S〉
Sim-ORAM
lab(M(x)) = O lab(M(y)) = l
t = select(l, y, y)@x i = l : init(x, y)
〈M, O : x := oram(y)〉 (i,t)−−→A 〈M, O : skip〉
Sim-ArrAss
l `A 〈M,xj〉 ⇓tj vj j = 1, 2 lab(M(y)) = l
l v A⇒M ′ = M [y 7→ set(val(M(y)), v1, v2)]
B v l⇒M ′ = M
t = t1@t2@selectA(l,writearr(y, v1, v2), y)
i = selectA(l, l : y[x1] := x2, l : y[x1] := x2)
〈M, l : y[x1] := x2〉 (i,t)−−→A 〈M ′, l : skip〉
Sim-Assign
lab(M(x)) = l l `A 〈M, e〉 ⇓t′ v
l v A⇒M ′ = M [x 7→ (v, l′)]
B v l⇒M ′ = M
t = t′@selectA(l,write(x, v), x)
i = selectA(l, l : x := e, l : x := e)
〈M, l : x := e〉 (i,t)−−→A 〈M ′, l : skip〉
Sim-Cond
l `A 〈M,x〉 ⇓t′ v
v = 1⇒ c = 1 v 6= 1⇒ c = 2
i = selectA(l, l : if(x), l : if(x))
t = selectA(l, t
′, t′)
l v A⇒ S′ = Sc l = B⇒ S′ = P : skip
〈M, l : if(x)then S1else S2〉 (i,t)−−→A 〈M,S′〉
Sim-While-True
lab(M(x)) = l l v A
l `A 〈M,x〉 ⇓t v v 6= 0
S = l : while(x)do S′
〈M,S〉 (l:while(x),t)−−−−−−−−→A 〈M,S′;S〉
Sim-While-False
lab(M(x)) = l l v A
l `A 〈M,x〉 ⇓t v v = 0
S = l : while(x)do S′
〈M,S〉 (l:while(x),t)−−−−−−−−→A 〈M, P : skip〉
Sim-While-Ignore
S = B : while(x)do S′
〈M,S〉 (,)−−→A 〈M, l : skip〉
Figure C.3: Operational semantics for statements in simA (part 2)
253
〈[M ]A, ta, [M ]B, tb, e〉 ⇓ ([v]a, [v]b)
SE-Const
([v]a, [v]b) = F (A,B)+ (n, 0)
〈[M ]A, , [M ]B, , n〉 ⇓ ([v]a, [v]b)
SE-Op
([v]a, [v]b) = F (Γ(x),Γ(y))op ([M ]A〈ta〉, [M ]B〈tb〉)
〈[M ]A, ta, [M ]B, tb, x op y〉 ⇓ ([v]a, [v]b)
SE-Var
(ta, tb) = select(Γ(x), read(x, v), x)
Γ(x) v A⇒ ([v]a, [v]b) = F (A,B)+ (v, 0)
Γ(x) = B⇒ ([v]a, [v]b) = F (A,B)+ (0, v)
Γ(x) = O⇒ ([v]a, [v]b) = ([M ]A(x), [M ]B(x))
〈[M ]A, ta, [M ]B, tb, x〉 ⇓ ([v]a, [v]b)
SE-Mux
([v]a, [v]b) = F (Γ(x),Γ(y),Γ(z))mux ([M ]A〈ta〉, [M ]B〈tb〉)
〈[M ]A, ta, [M ]B, tb,mux(x, y, z)〉 ⇓ ([v]a, [v]b)
SE-ArrVar
ta = t
′
a@y tb = t
′
b@y〈[M ]A, t′a, [M ]B, t′b, x〉 ⇓ ([v′]a, [v′]b)
([v]a, [v]b) = Fyoram(read, [v′]a, [v′]b)
〈[M ]A, ta, [M ]B, tb, y[x]〉 ⇓ ([v]a, [v]b)
SE-L-ArrVar
ta = t
′
a@t
′′
a tb = t
′
b@t
′′
b Γ(y) 6= O
(t′′a, t′′b ) = select(Γ(y), readarr(y, v1, v), y)
Γ(y) v A⇒ ([v]a, [v]b) = F (A,B)+ (v, 0)
Γ(y) = B⇒ ([v]a, [v]b) = F (A,B)+ (0, v)
〈[M ]A, ta, [M ]B, tb, y[x]〉 ⇓ ([v]a, [v]b)
Figure D.1: Hybrid Protocol piG (Part I)
254
〈[M ]A, i, ta〉, 〈[M ]B, i, tb〉 〈[M ′]A, , 〉, 〈[M ]′B, , 〉 : D
H-ORAM
i = O : init(x, y) ta = t
′
a@x tb = t
′
b@x
(t′a, t
′
b) = select(Γ(y), arr(y,m))
Fxoram(initΓ(y),m)
〈[M ]A, i, ta〉, 〈[M ]B, i, tb〉 〈[M ]A, , 〉, 〈[M ]B, , 〉 : 
H-Assign
i = O : x := e 〈[M ]A, t′a, [M ]B, t′b, e〉 ⇓ ([v]a, [v]b)
ta = t
′
a@x tb = t
′
b@x
[M ′]A = [M ]A[x 7→ [v]a] [M ′]B = [M ]A[x 7→ [v]b]
〈[M ]A, i, ta〉, 〈[M ]B, i, tb〉 〈[M ′]A, , 〉, 〈[M ′]B, , 〉 : 
H-ArrAss
i = O : y[x1] := x2 ta = t
′
a@y tb = t
′
b
ta = t1a@t2a@y tb = t1b@t2b@y
(tia, tib) = select(Γ(xi), read(xi, vi), xi)
〈[M ]A, t′ia, [M ]B, t′ib, xi〉 ⇓ ([v]ia, [v]ib) i = 1, 2
Fyoram(write, [v]1a, [v]1b, [v]2a, [v]2b)
〈[M ]A, i, ta〉, 〈[M ]B, i, tb〉 〈[M ]A, , 〉, 〈[M ]B, , 〉 : 
H-Cond-While
i = O : if(x) or i = O : while(x) ta = tb = x
〈[M ]A, i, ta〉, 〈[M ]B, i, tb〉 〈[M ]A, , 〉, 〈[M ]B, , 〉 : 
H-Declass
i = O : declass(x, y) ta = tb = y
v = FΓ(x)declass([M ]A(y), [M ]B(y))
D = select(Γ(x), (x, v), (x, v))
〈[M ]A, i, ta〉, 〈[M ]B, i, tb〉 〈[M ]A, , 〉, 〈[M ]B, , 〉 : D
H-Seq
〈[M ]A, i′a, t′a〉, 〈[M ]B, i′b, t′b〉 〈[M ′]A, , 〉, 〈[M ′], , 〉 : D
ia = i
′
a@i
′′
a ib = i
′
b@i
′′
b
ta = t
′
a@t
′′
a tb = t
′
b@t
′′
b
〈[M ]A, ia, ta〉, 〈[M ]B, ib, tb〉 〈[M ′]A, i′′a, t′′a〉, 〈[M ′]B, i′′b , t′′b 〉 : D
H-Concat
〈[M ]A, ia, ta〉, 〈[M ]B, ib, tb〉 〈[M ′]A, i′a, t′a〉, 〈[M ′]B, i′b, t′b〉 : 
〈[M ′]A, i′a, t′a〉, 〈[M ′]B, i′b, t′b〉 〈[M ′′]A, i′′a, t′′a〉, 〈[M ′′]B, i′′b , t′′b 〉 : D
〈[M ]A, ia, ta〉, 〈[M ]B, ib, tb〉 〈[M ′′]A, i′′a, t′′a〉, 〈[M ′′]B, i′′b , t′′b 〉 : D
H-LocalA
(i, t)→ ia = i@i′a ta = t@t′a
〈[M ]A, ia, ta〉, 〈[M ]B, ib, tb〉 〈[M ]A, i′a, t′a〉, 〈[M ]B, ib, tb〉 : 
H-LocalB
(i, t)→ ib = i@i′b tb = t@t′b
〈[M ]A, ia, ta〉, 〈[M ]B, ib, tb〉 〈[M ]A, ia, ta〉, 〈[M ]B, i′b, t′b〉 : 
Figure D.2: Hybrid Protocol ΠG(PartII)
255
Bibliography
[1] http://humangenomeprivacy.org/2015.
[2] Graphlab. http://graphlab.org.
[3] Rsa distributed credential protection. http://www.emc.com/security/
rsa-distributed-credential-protection.htm.
[4] Trusted computing group. http://www.trustedcomputinggroup.org/.
[5] Private communication, 2014.
[6] Johan Agat. Transforming out timing leaks. In POPL, 2000.
[7] B. Alpern, M. N. Wegman, and F. K. Zadeck. Detecting equality of variables
in programs. In In POPL, 1988.
[8] Benny Applebaum. Garbling xor gates “for free” in the standard model. In
TCC, 2013.
256
[9] Aslan Askarov, Sebastian Hunt, Andrei Sabelfeld, and David Sands.
Termination-insensitive noninterference leaks more than just a bit. In ES-
ORICS, 2008.
[10] Gilles Barthe, Tamara Rezk, and Martijn Warnier. Preventing timing leaks
through transactional branching instructions. Electron. Notes Theor. Comput.
Sci., 153(2):33–55, May 2006.
[11] Mihir Bellare, Viet Tung Hoang, Sriram Keelveedhi, and Phillip Rogaway. Ef-
ficient Garbling from a Fixed-Key Blockcipher. In S & P, 2013.
[12] Marina Blanton, Aaron Steele, and Mehrdad Alisagari. Data-oblivious graph
algorithms for secure computation and outsourcing. In ASIA CCS, 2013.
[13] Dan Bogdanov, Sven Laur, and Jan Willemson. Sharemind: A Framework for
Fast Privacy-Preserving Computations. In ESORICS. 2008.
[14] Justin Brickell and Vitaly Shmatikov. Privacy-preserving graph algorithms in
the semi-honest model. In ASIACRYPT, 2005.
[15] Ran Canetti. Security and composition of multiparty cryptographic protocols.
Journal of Cryptology, 2000.
[16] Henry Carter, Benjamin Mood, Patrick Traynor, and Kevin Butler. Secure
outsourced garbled circuit evaluation for mobile devices. In Usenix Security
Symposium, 2013.
257
[17] Seung Geol Choi, Jonathan Katz, Ranjit Kumaresan, and Hong-Sheng Zhou.
On the security of the “free-xor” technique. In TCC, 2012.
[18] Richard Chow, Philippe Golle, Markus Jakobsson, Elaine Shi, Jessica Staddon,
Ryusuke Masuoka, and Jesus Molina. Controlling data in the cloud: out-
sourcing computation without outsourcing control. In ACM Cloud Computing
Security Workshop (CCSW), pages 85–90, 2009.
[19] Michael R. Clarkson, Stephen Chong, and Andrew C. Myers. Civitas: Toward a
secure voting system. In IEEE Symposium on Security and Privacy (Oakland),
2008.
[20] COMPCERT: Compilers you can formally trust. http://compcert.inria.
fr/.
[21] Convey Computer. The convey HC2 architectural overview. http:
//www.conveycomputer.com/files/4113/5394/7\097/Convey_HC-2_
Architectual_Overview.pdf.
[22] Bart Coppens, Ingrid Verbauwhede, Koen De Bosschere, and Bjorn De Sut-
ter. Practical mitigations for timing-based side-channel attacks on modern x86
processors. In Proceedings of the 2009 30th IEEE Symposium on Security and
Privacy, SP ’09, pages 45–60, Washington, DC, USA, 2009. IEEE Computer
Society.
[23] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.
Introduction To Algorithms, Third Edition. MIT Press, 2009.
258
[24] Dana Dachman-Soled, Chang Liu, Charalampos Papamanthou, Elaine Shi, and
Uzi Vishkin. Oblivious network RAM. Cryptology ePrint Archive, Report
2015/073, 2015. http://eprint.iacr.org/.
[25] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing
on large clusters. In OSDI, 2004.
[26] Zhenyue Deng and Geoffrey Smith. Lenient array operations for practical secure
information flow. In Proceedings of the 17th Computer Security Foundations
Workshop, pages 115–124. IEEE, June 2004.
[27] Christopher W. Fletcher, Marten van Dijk, and Srinivas Devadas. A secure
processor architecture for encrypted computation on untrusted programs. In
STC, 2012.
[28] Christopher W. Fletcher, Ling Ren, Albert Kwon, Marten van Dijk, Emil Ste-
fanov, and Srinivas Devadas. RAW Path ORAM: A low-latency, low-area hard-
ware ORAM controller with integrity verification. IACR Cryptology ePrint
Archive, 2014:431, 2014.
[29] Christopher W. Fletcher, Ling Ren, Xiangyao Yu, Marten van Dijk, Omer
Khan, and Srinivas Devadas. Suppressing the oblivious RAM timing channel
while making information leakage and program efficiency trade-offs. In HPCA,
pages 213–224, 2014.
259
[30] Jeffrey S. Foster, Robert Johnson, John Kodumal, and Alex Aiken. Flow-
Insensitive Type Qualifiers. ACM Transactions of Programming Languages
and Systems (TOPLAS), 28(6):1035–1087, November 2006.
[31] M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory
of NP-Completeness. W.H. Freeman, 1979.
[32] Tanguy Gilmont, Jean didier Legat, and Jean jacques Quisquater. Enhancing
security in the memory management unit. In EUROMICRO, 1999.
[33] O. Goldreich. Towards a theory of software protection and simulation by obliv-
ious RAMs. In STOC, 1987.
[34] O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game. In
STOC, 1987.
[35] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on
oblivious RAMs. J. ACM, 1996.
[36] Michael T. Goodrich and Michael Mitzenmacher. Privacy-preserving access of
outsourced data via oblivious RAM simulation. In ICALP, 2011.
[37] Michael T. Goodrich and Joseph A. Simons. Data-Oblivious Graph Algorithms
in Outsourced External Memory. CoRR, abs/1409.0597, 2014.
[38] S. Dov Gordon, Jonathan Katz, Vladimir Kolesnikov, Fernando Krell, Tal
Malkin, Mariana Raykova, and Yevgeniy Vahlis. Secure two-party computation
in sublinear (amortized) time. In CCS, 2012.
260
[39] S. Dov Gordon, Allen McIntosh, Jonathan Katz, Elaine Shi, and Xiao Shaun
Wang. Secure computation of MIPS machine code. Manuscript, 2015.
[40] J. Alex Halderman, Seth D. Schoen, Nadia Heninger, William Clarkson,
William Paul, Joseph A. Calandrino, Ariel J. Feldman, Jacob Appelbaum, and
Edward W. Felten. Lest we remember: cold-boot attacks on encryption keys.
Commun. ACM, 52(5):91–98, 2009.
[41] T. Hastie, R. Tibshirani, and J.H. Friedman. The Elements of Statistical Learn-
ing: Data Mining, Inference, and Prediction. 2001.
[42] Daniel Hedin and David Sands. Timing aware information flow security for a
javacard-like bytecode. Electron. Notes Theor. Comput. Sci., 141(1):163–182,
December 2005.
[43] Wilko Henecka, Stefan Ko¨gl, Ahmad-Reza Sadeghi, Thomas Schneider, and
Immo Wehrenberg. Tasty: tool for automating secure two-party computations.
In CCS, 2010.
[44] Andreas Holzer, Martin Franz, Stefan Katzenbeisser, and Helmut Veith. Secure
Two-party Computations in ANSI C. In CCS, 2012.
[45] Andreas Holzer, Martin Franz, Stefan Katzenbeisser, and Helmut Veith. Secure
two-party computations in ansi c. In CCS, 2012.
[46] Yan Huang, David Evans, Jonathan Katz, and Lior Malka. Faster secure two-
party computation using garbled circuits. In Usenix Security Symposium, 2011.
261
[47] Yuval Ishai, Joe Kilian, Kobbi Nissim, and Erez Petrank. Extending oblivious
transfers efficiently. In CRYPTO, 2003.
[48] Yuval Ishai, Joe Kilian, Kobbi Nissim, and Erez Petrank. Extending Oblivious
Transfers Efficiently. In CRYPTO 2003. 2003.
[49] Jif: Java + information flow. http://www.cs.cornell.edu/jif/.
[50] Anatolii Alexeevitch Karatsuba. The Complexity of Computations, 1995.
[51] Marcel Keller and Peter Scholl. Efficient, oblivious data structures for MPC.
In Asiacrypt, 2014.
[52] Florian Kerschbaum. Automatically optimizing secure computation. In CCS,
2011.
[53] Vladimir Kolesnikov and Thomas Schneider. Improved Garbled Circuit: Free
XOR Gates and Applications. In ICALP, 2008.
[54] Ben Kreuter, Benjamin Mood, Abhi Shelat, and Kevin Butler. PCF: A portable
circuit format for scalable two-party secure computation. In Usenix Security,
2013.
[55] Benjamin Kreuter, abhi shelat, and Chih-Hao Shen. Billion-gate secure com-
putation with malicious adversaries. In USENIX Security, 2012.
[56] David Lie, John Mitchell, Chandramohan A. Thekkath, and Mark Horowitz.
Specifying and verifying hardware for tamper-resistant software. In Proceedings
262
of the 2003 IEEE Symposium on Security and Privacy, SP ’03, Washington,
DC, USA, 2003. IEEE Computer Society.
[57] Chang Liu, Austin Harris, Martin Maas, Michael Hicks, Mohit Tiwari, and
Elaine Shi. Ghostrider: A hardware-software system for memory trace oblivious
computation. In ASPLOS, 2015.
[58] Chang Liu, Michael Hicks, and Elaine Shi. Memory trace oblivious program
execution. CSF ’13, pages 51–65, 2013.
[59] Chang Liu, Yan Huang, Elaine Shi, Jonathan Katz, and Michael Hicks. Au-
tomating Efficient RAM-model Secure Computation. In S & P, May 2014.
[60] Chang Liu, Xiao Shaun Wang, Kartik Nayak, Yan Huang, and Elaine Shi.
Oblivm: A programming framework for secure computation. In S & P, 2015.
[61] Chang Liu, Xiao Shaun Wang, Kartik Nayak, Yan Huang, and Elaine Shi.
Oblivm: A programming framework for secure computation, 2015. http://
www.cs.umd.edu/~elaine/docs/oblivmtr.pdf.
[62] Steve Lu and Rafail Ostrovsky. How to garble ram programs. In EUROCRYPT,
2013.
[63] Martin Maas, Eric Love, Emil Stefanov, Mohit Tiwari, Elaine Shi, Kriste
Asanovic, John Kubiatowicz, and Dawn Song. Phantom: Practical oblivi-
ous computation in a secure processor. In ACM Conference on Computer and
Communications Security (CCS), 2013.
263
[64] Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan
Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale
graph processing. In SIGMOD, 2010.
[65] Dahlia Malkhi, Noam Nisan, Benny Pinkas, and Yaron Sella. Fairplay: a secure
two-party computation system. In USENIX Security, 2004.
[66] John C. Mitchell and Joe Zimmerman. Data-Oblivious Data Structures. In
STACS, pages 554–565, 2014.
[67] David Molnar, Matt Piotrowski, David Schultz, and David Wagner. The pro-
gram counter security model: Automatic detection and removal of control-flow
side channel attacks. In ICISC, 2005.
[68] Moni Naor and Benny Pinkas. Efficient oblivious transfer protocols. In SODA,
2001.
[69] Moni Naor, Benny Pinkas, and Reuban Sumner. Privacy preserving auctions
and mechanism design. EC ’99, 1999.
[70] Kartik Nayak, Xiao Shaun Wang, Stratis Ioannidis, Udi Weinsberg, Nina Taft,
and Elaine Shi. GraphSC: Parallel Secure Computation Made Easy. In IEEE
S & P, 2015.
[71] Valeria Nikolaenko, Stratis Ioannidis, Udi Weinsberg, Marc Joye, Nina Taft,
and Dan Boneh. Privacy-preserving matrix factorization. In CCS, 2013.
264
[72] Valeria Nikolaenko, Udi Weinsberg, Stratis Ioannidis, Marc Joye, Dan Boneh,
and Nina Taft. Privacy-preserving ridge regression on hundreds of millions of
records. In S & P, 2013.
[73] Amir Pnueli, Michael Siegel, and Eli Singerman. Translation Validation. In
TACAS, 1998.
[74] Franc¸ois Pottier and Vincent Simonet. Information flow inference for ml. ACM
Trans. Program. Lang. Syst., 25(1):117–158, January 2003.
[75] Aseem Rastogi, Matthew A. Hammer, and Michael Hicks. Wysteria: A Pro-
gramming Language for Generic, Mixed-Mode Multiparty Computations. In S
& P, 2014.
[76] Aseem Rastogi, Piotr Mardziel, Matthew Hammer, and Michael Hicks. Knowl-
edge inference for optimizing secure multi-party computation. In PLAS, 2013.
[77] riscv.org. Launching the Open-Source Rocket Chip Gen-
erator, October 2014. https://blog.riscv.org/2014/10/
launching-the-open-source-rocket-chip-generator/.
[78] Alejandro Russo, John Hughes, David A. Naumann, and Andrei Sabelfeld.
Closing internal timing channels by transformation. In ASIAN, 2006.
[79] Andrei Sabelfeld and Andrew C. Myers. Language-based information-flow secu-
rity. IEEE Journal on Selected Areas in Communications, 21(1):5–19, January
2003.
265
[80] Elaine Shi, T.-H. Hubert Chan, Emil Stefanov, and Mingfei Li. Oblivious RAM
with O((logN)3) worst-case cost. In ASIACRYPT, 2011.
[81] Sergei Skorobogatov. Low temperature data remanence in static RAM. Tech-
nical Report UCAM-CL-TR-536, University of Cambridge, Computer Labora-
tory, June 2002.
[82] Ebrahim M. Songhori, Siam U. Hussain, Ahmad-Reza Sadeghi, Thomas Schnei-
der, and Farinaz Koushanfar. TinyGarble: Highly Compressed and Scalable
Sequential Garbled Circuits. In IEEE S & P, 2015.
[83] Emil Stefanov, Elaine Shi, and Dawn Song. Towards practical oblivious RAM.
In Network and Distributed System Security Symposium (NDSS), 2012.
[84] Emil Stefanov, Marten van Dijk, Elaine Shi, Christopher Fletcher, Ling Ren,
Xiangyao Yu, and Srinivas Devadas. Path ORAM – an extremely simple obliv-
ious ram protocol. In CCS, 2013.
[85] G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, and Srinivas
Devadas. Aegis: architecture for tamper-evident and tamper-resistant process-
ing. In International conference on Supercomputing, ICS ’03, pages 160–171,
2003.
[86] David Lie Chandramohan Thekkath, Mark Mitchell, Patrick Lincoln, Dan
Boneh, John Mitchell, and Mark Horowitz. Architectural support for copy and
tamper resistant software. SIGOPS Oper. Syst. Rev., 34(5):168–177, November
2000.
266
[87] Amit Vasudevan, Jonathan McCune, James Newsome, Adrian Perrig, and
Leendert van Doorn. CARMA: A hardware tamper-resistant isolated execution
environment on commodity x86 platforms. In Proceedings of the ACM Sym-
posium on Information, Computer and Communications Security (ASIACCS),
May 2012.
[88] Huy Vo, Yunsup Lee, Andrew Waterman, and Krste Asanovic´. A Case for
OS-Friendly Hardware Accelerators. In WIVOSCA, 2013.
[89] Abraham Waksman. A permutation network. J. ACM, 15, 1968.
[90] Xiao Shaun Wang, T-H. Hubert Chan, and Elaine Shi. Circuit ORAM: On
Tightness of the Goldreich-Ostrovsky Lower Bound.
[91] Xiao Shaun Wang, Yan Huang, T-H. Hubert Chan, Abhi Shelat, and Elaine
Shi. SCORAM: Oblivious RAM for Secure Computation. In CCS, 2014.
[92] Xiao Shaun Wang, Kartik Nayak, Chang Liu, T-H. Hubert Chan, Elaine Shi,
Emil Stefanov, and Yan Huang. Oblivious Data Structures. In CCS, 2014.
[93] Andrew Waterman, Yunsup Lee, David A. Patterson, and Krste Asanovic´.
The RISC-V Instruction Set Manual, Volume I: Base User-Level ISA. Techni-
cal Report UCB/EECS-2011-62, EECS Department, University of California,
Berkeley, May 2011.
[94] Lance Whitney. Microsoft Urges Laws to Boost Trust in the Cloud. http:
//news.cnet.com/8301-1009_3-10437844-83.html.
267
[95] Andrew Chi-Chih Yao. Protocols for secure computations (extended abstract).
In FOCS, 1982.
[96] Andrew Chi-Chih Yao. How to generate and exchange secrets. In FOCS, 1986.
[97] Samee Zahur and David Evans. Circuit Structures for Improving Efficiency of
Security and Privacy Tools. In S & P, 2013.
[98] Yihua Zhang, Aaron Steele, and Marina Blanton. PICCO: a general-purpose
compiler for private distributed computation. In CCS, 2013.
268