ABSTRACT

Title of dissertation: EMERGING OPPORTUNITIES AND CHALLENGES IN HARDWARE SECURITY
Yuntao Liu, Doctor of Philosophy, 2020
Dissertation directed by: Professor Ankur Srivastava, Department of Electrical and Computer Engineering

Recent years have seen the rapid development of many emerging technologies in various aspects of computer engineering, such as new devices, new fabrication techniques for integrated circuits (IC), and new computation frameworks. In this dissertation, we study the security challenges that these emerging technologies face as well as the security opportunities they bring. Specifically, we investigate the security opportunities in double patterning lithography, the security challenges in physical unclonable functions, and security issues in machine learning.

Double patterning lithography (DPL) is an emerging fabrication technique for ICs. We study the security opportunities that DPL brings at the layout level. DPL is used to set up two independent mask development lines which do not need to share any information. Under this setup, we consider an attack model in which an untrusted employee who has access to only one mask may try to infer the entire circuit design or insert additional malicious circuitry into the design. As a countermeasure, we customize DPL to decompose the layout into two sub-layouts in such a way that each sub-layout individually exposes minimum information about the other and hence protects the entire layout from any untrusted personnel.

Physical unclonable functions (PUF) are a type of circuit for which each copy (of the same circuit structure) has a unique and unpredictable functionality. The unpredictable behavior is caused by the manufacturing variations of electronic devices. However, for many state-of-the-art PUF designs, we show that the device variations can be estimated using an optimization-theoretic formulation, and hence the PUF's input-output behavior becomes predictable. Simulations show a substantial reduction in attack complexity compared to previously proposed machine learning based attacks.

The neural network (NN) is an emerging computation framework for machine learning (ML). It is increasingly popular for system developers to use pre-trained NN models instead of training their own because training is painstaking and sometimes requires private data. We call these pre-trained neural models neural intellectual properties (IP). Neural IPs raise multiple security concerns. On the one hand, as the IP user does not know about the training process, it is crucial to ensure the integrity of the neural IP. To this end, we investigate possible hidden malicious functionality, i.e., neural Trojans, that can be embedded into neural IPs and propose effective mitigation techniques. On the other hand, the neural IP owner may want to protect the NN model from reverse engineering attacks. However, it has been shown that hardware side-channels can be exploited to decipher the structure of neural networks. We propose both a novel attack approach based on the cache timing side-channel and a defensive memory access mechanism.

NNs also raise challenges to conventional hardware security techniques. Specifically, we focus on their challenge to logic locking, a strong key-based protection of hardware IP against untrusted foundries that injects incorrect behavior into the digital functionality when the key is incorrect.
We formally prove a trade-off between the amount of injected error and the complexity of Boolean satisfiability (SAT)-based attacks to find the correct key. Due to the inherent error resiliency of NNs, state-of-the-art logic locking schemes fail to inject enough error to derail NN-based applications while maintaining exponential SAT complexity. To fix this issue, we propose a novel secure and effective logic locking scheme, called Strong Anti-SAT (SAS), to lock the hardware and make sure that NN models undergo significant accuracy loss when any wrong key is applied.

EMERGING OPPORTUNITIES AND CHALLENGES IN HARDWARE SECURITY

by Yuntao Liu

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2020

Advisory Committee:
Professor Ankur Srivastava, Chair/Advisor
Professor Dana Dachman-Soled
Professor Manoj Franklin
Professor Gang Qu
Professor Michael C. Fu

© Copyright by Yuntao Liu 2020

Dedication

To my parents and my wife, for their love and support.

Acknowledgments

First and foremost, I would like to thank my advisor, Professor Ankur Srivastava, for his continuous guidance and support throughout my Ph.D. study. I'm extremely grateful for his patience, motivation, and immense knowledge in all the time we spent on solving challenging research problems. He has always guided me to the correct path and supported my decisions, which has made my Ph.D. experience productive and rewarding. The enthusiasm he has for his research will always be a great motivation for me. It is truly a pleasure to work with and learn from him.

I would like to thank Professor Dana Dachman-Soled for her guidance and help on our collaborative project. I was deeply impressed by her academic rigor and enthusiasm. I learned a lot from our collaboration, and my subsequent research also benefited from this experience.

I would like to express my gratitude to Professor Dana Dachman-Soled, Professor Manoj Franklin, Professor Gang Qu, and Professor Michael C. Fu for their time to serve on this committee and their valuable feedback on this dissertation.

I would like to extend my thanks to all my colleagues in Professor Srivastava's research group. My thanks first go to senior colleagues Yang Xie, Zhiyuan Yang, Chongxi Bao, Tiantao Lu, and Caleb Serafy for showing me how to conduct research, set up experiments, and write papers in the early stage of my Ph.D. study. Without them, I would not have been able to adapt to Ph.D. life so quickly. Thanks also go to my current colleagues Ankit Mondal, Abhishek Chakraborty, Michael Zuzak, Daniel Xing, and Nina Jacobsen for the inspiring research discussions, for the late nights we were working together before deadlines, and for all the fun we have had in our lab.

Finally, I owe the deepest gratitude to my family. I want to express my appreciation to my parents for their endless love and devotion and to my wife Shimiao Liu, who has been accompanying me during every hardship.

Table of Contents

Dedication ii
Acknowledgments iii
List of Tables ix
List of Figures x
List of Abbreviations xii
List of Publications xiv

1 Introduction 1
1.1 Security and Trust Issues in the IC Supply Chain . . . . . . . . . . . 1
1.1.1 The IP/IC Design Protection Problem . . . . . . . . . . . . . 1
1.1.2 The Integrity Problem . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Focus of this Dissertation . . . . . . . . . . . . . . . . . . . . . . . .
3 1.2.1 Security Opportunities in Double Patterning Lithography . . . 4 1.2.2 Security Issues with Physical Unclonable Functions . . . . . . 5 1.2.3 Security Threats in Neural Networks . . . . . . . . . . . . . . 6 1.3 Contribution and Organization of the Dissertation . . . . . . . . . . . 8 2 Security Opportunities in Double Patterning Lithography 10 2.1 Fundamentals of Double Patterning Lithography . . . . . . . . . . . . 11 2.2 TFUE: the Trusted Foundry and Untrusted Employee Model . . . . . 14 2.3 Threat Model and Countermeasure under TFUE . . . . . . . . . . . . 15 2.3.1 Threat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.3.2 Countermeasure Formulation . . . . . . . . . . . . . . . . . . 17 2.4 Experiment Setup and Results . . . . . . . . . . . . . . . . . . . . . . 22 2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3 Security Vulnerabilities in Physical Unclonable Functions 25 3.1 Physical Unclonable Functions . . . . . . . . . . . . . . . . . . . . . . 26 3.1.1 Memristor and Memristor Crossbar PUF . . . . . . . . . . . . 26 3.1.2 The Arbiter PUF . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.1.3 The XOR Arbiter PUF . . . . . . . . . . . . . . . . . . . . . . 32 3.2 Existing Attacks on PUFs . . . . . . . . . . . . . . . . . . . . . . . . 33 3.2.1 Attacks on ArbPUF and MXbarPUF . . . . . . . . . . . . . . 33 3.2.2 Attacks on XORArbPUFs . . . . . . . . . . . . . . . . . . . . 33 3.3 Attack Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.3.1 Linear Programming Based Weight Estimation . . . . . . . . . 34 3.3.2 Challenge Vector Generation using the Cutting-Plane Method 37 3.3.3 Side-Channel Boosted Optimization-Theoretic Attack . . . . . 39 v 3.4 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.4.1 In Noise-Free Conditions . . . . . . . . . . . . . . . . . . . . . 41 3.4.2 In Noisy Conditions . . . . . . . . . . . . . . . . . . . . . . . . 44 3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4 Neural Trojans: an Integrity Issue with Neural Network IPs 47 4.1 Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.2 Existing Attacks on Neural Networks . . . . . . . . . . . . . . . . . . 52 4.2.1 Poisoning Attacks . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.2.2 Exploratory Attacks . . . . . . . . . . . . . . . . . . . . . . . 53 4.3 Neural Trojans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.3.2 Properties of Neural Trojans . . . . . . . . . . . . . . . . . . . 56 4.3.3 Relevance to Existing Attacks . . . . . . . . . . . . . . . . . . 57 4.3.4 A Neural Trojan Example . . . . . . . . . . . . . . . . . . . . 57 4.4 Defense Mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.4.1 Input Anomaly Detection . . . . . . . . . . . . . . . . . . . . 59 4.4.2 Re-training . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.4.3 Input Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . 60 4.5 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.5.1 Neural IP Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.5.2 Input Anomaly Detection . . . . . . . . . . . . . . . . . . . . 64 4.5.3 Re-training . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.5.4 Input Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . 66 4.6 Summary . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . 68 5 Secure Logic Locking for Hardware Running Neural Networks 70 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 5.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 5.2.1 Attack Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 5.2.2 SAT Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.2.3 Existing Logic Locking Schemes . . . . . . . . . . . . . . . . . 80 5.3 Insufficiency of SFLL for Real-World Applications . . . . . . . . . . . 82 5.4 Fundamental Trade-off for All Logic Locking Schemes . . . . . . . . . 84 5.5 The Architecture and Properties of SAS . . . . . . . . . . . . . . . . 87 5.5.1 The SAS Block . . . . . . . . . . . . . . . . . . . . . . . . . . 88 5.5.2 Configuration 1: SAS with One SAS Block . . . . . . . . . . . 90 5.5.3 Configuration 2: SAS with Multiple Blocks . . . . . . . . . . . 91 5.6 Robust SAS: a Removal-Resilient SAS Variant . . . . . . . . . . . . . 94 5.6.1 RSAS Architecture and Relationship with SAS . . . . . . . . . 95 5.6.1.1 Altering the original circuit . . . . . . . . . . . . . . 96 5.6.1.2 Converting the SAS block into the RSAS block . . . 96 5.6.2 SAT Resilience and Effectiveness of RSAS . . . . . . . . . . . 97 5.7 Choosing Critical Minterms . . . . . . . . . . . . . . . . . . . . . . . 98 5.8 Experiments & Comparison with SFLL . . . . . . . . . . . . . . . . . 99 vi 5.8.1 SAT Resilience . . . . . . . . . . . . . . . . . . . . . . . . . . 100 5.8.2 Effectiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 5.8.3 Area, Power, and Delay Overhead of SAS, RSAS, and SFLL . 103 5.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6 Cache Side-Channel-based Reverse Engineering of Neural Networks 106 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 6.1.1 GANRED Attack Overview . . . . . . . . . . . . . . . . . . . 109 6.1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 6.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 6.2.1 Dimension Parameters of Deep Neural Networks . . . . . . . . 112 6.2.2 Cache Architecture Fundamentals . . . . . . . . . . . . . . . . 113 6.2.3 Cache Timing Side-Channel Attacks . . . . . . . . . . . . . . 114 6.2.3.1 Attacks based on Data Sharing . . . . . . . . . . . . 115 6.2.3.2 Attacks without Data Sharing . . . . . . . . . . . . . 116 6.2.4 Existing DNN Reverse Engineering Attacks and Defenses . . . 117 6.3 Attack Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 6.4 Attack Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 6.4.1 Obtaining DNN?s Cache Side-Channel Trace . . . . . . . . . . 121 6.4.2 GANRED Components . . . . . . . . . . . . . . . . . . . . . . 124 6.4.3 GANRED Framework . . . . . . . . . . . . . . . . . . . . . . 125 6.4.4 Validating Reverse Engineered Parameter Combinations . . . 129 6.4.4.1 Convolutional (Conv) Layers . . . . . . . . . . . . . 130 6.4.4.2 Fully Connected (FC) Layers . . . . . . . . . . . . . 132 6.4.5 Mathematical Justification of GANRED . . . . . . . . . . . . 133 6.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 6.5.1 Attack Results . . . . . . . . . . . . . . . . . . . . . . . . . . 141 6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
142 7 Mitigating Reverse Engineering Attacks on Neural Networks 144 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 7.2 Attack Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 7.2.1 Attack Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 7.2.2 Attack Methodology . . . . . . . . . . . . . . . . . . . . . . . 148 7.2.3 Attack Complexity and Practicality . . . . . . . . . . . . . . . 150 7.3 Cryptographic Preliminaries . . . . . . . . . . . . . . . . . . . . . . . 151 7.4 Defense Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 7.4.1 Utilizing Oblivious Shuffle . . . . . . . . . . . . . . . . . . . . 155 7.4.1.1 Oblivious Shuffle Strategy . . . . . . . . . . . . . . . 155 7.4.1.2 Information Leakage . . . . . . . . . . . . . . . . . . 157 7.4.2 Address Space Layout Randomization . . . . . . . . . . . . . . 159 7.4.3 Dummy Memory Accesses . . . . . . . . . . . . . . . . . . . . 160 7.4.3.1 DumMA Without ASLR . . . . . . . . . . . . . . . . 160 7.4.3.2 DumMA With ASLR . . . . . . . . . . . . . . . . . 161 7.4.4 Summary of Defense Techniques . . . . . . . . . . . . . . . . . 161 vii 7.4.5 Attacking the Proposed Defense . . . . . . . . . . . . . . . . . 162 7.5 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . . 163 7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 8 Conclusion and Future Research Directions 166 8.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 8.1.1 Security Opportunities in 3D IC . . . . . . . . . . . . . . . . . 167 8.1.2 Architecture and Application Aware Logic Locking . . . . . . 168 8.1.3 Hardware-Neural Network Co-Design for Security . . . . . . . 168 Bibliography 170 viii List of Tables 2.1 List of symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.2 Experimental results on the proposed countermeasure . . . . . . . . . 23 3.1 ArbPUF results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.2 MXbarPUF results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.3 XORArbPUF results . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.1 Anomaly detection with various methods . . . . . . . . . . . . . . . . 64 5.1 Application benchmark details . . . . . . . . . . . . . . . . . . . . . . 82 5.2 Truth table of the locked circuit in Fig. 5.5 . . . . . . . . . . . . . . . 85 5.3 Illustration of how m critical minterms partition the set of wrong keys 89 5.4 Properties of the 2 configurations of SAS . . . . . . . . . . . . . . . . 94 5.5 Illustration of RSAS block?s functionality. A ??? stands for YRSAS = 1. 97 6.1 List of dimension parameters of a layer . . . . . . . . . . . . . . . . . 113 7.1 List of Hyper-parameters of Each Layer . . . . . . . . . . . . . . . . . 148 7.2 Benchmark DNNs and attack complexity using the attack in [43] . . . 151 7.3 Information leaked under various combinations of defense techniques . 162 7.4 The overhead of our proposed defense . . . . . . . . . . . . . . . . . . 164 ix List of Figures 1.1 An overview of IC supply chain and the security and trust issues . . . 2 2.1 Layout decomposition steps . . . . . . . . . . . . . . . . . . . . . . . 12 2.2 The attack model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.3 Layout decomposition results . . . . . . . . . . . . . . . . . . . . . . 24 3.1 A MXbarPUF with k arbiters . . . . . . . . . . . . . . . . . . . . . . 28 3.2 The switching activity of ArbPUF . . . 
. . . . . . . . . . . . . . . . . 31 3.3 Illustration of XORArbPUF architecture . . . . . . . . . . . . . . . . 33 3.4 An geometric illustration of our approach . . . . . . . . . . . . . . . . 39 3.5 Comparison of the # known CRPs and the running time to attack the 128-bit MXbarPUF to reach ? = 99% using our approach and the ML approach. Blue (upper) curve: our approach; red (lower) curve: ML approach. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.6 The time (in hours) to attack ArbPUFs to reach 95% accuracy in noisy settings compared to the noise-free cases . . . . . . . . . . . . . 45 4.1 An example of a neural network . . . . . . . . . . . . . . . . . . . . . 50 4.2 Examples of the MNIST (legitimate) images and printed fonts of ?4? (illegitimate) images . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.3 The architecture of the autoencoder . . . . . . . . . . . . . . . . . . . 61 4.4 The average trojan activation rate vs. re-training effort . . . . . . . . 65 4.5 The average classification accuracy of legitimate data vs. re-training effort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.6 The original and reconstructed (a) legitimate and (b) illegitimate input images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 5.1 The targeted attack model of logic locking . . . . . . . . . . . . . . . 77 5.2 The positive correlation between the error rate of wrong keys and the probability that SAT finds the correct key in certain iterations . . . . 81 5.3 Our experimental framework . . . . . . . . . . . . . . . . . . . . . . . 82 5.4 SAT resiliency vs. locking effectiveness trade-off. Left: PARSEC benchmarks. Right: NN benchmarks. . . . . . . . . . . . . . . . . . . 83 5.5 An example of logic locking, with the original circuit on the left and the locked circuit on the right. . . . . . . . . . . . . . . . . . . . . . . 85 5.6 The Architecture of SAS Configuration 1 with the Details of the SAS Block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 5.7 Configuration 2 with l SAS blocks . . . . . . . . . . . . . . . . . . . . 92 5.8 A circuit locked with one RSAS block, equivalent to SAS Configura- tion 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 5.9 A circuit locked with multiple RSAS blocks, equivalent to SAS Con- figuration 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 x 5.10 Weight distribution (blue histogram, left Y axis) and application-level accuracy loss (red line, right Y axis) of LeNet and CaffeNet when the corresponding input is locked . . . . . . . . . . . . . . . . . . . . . . 99 5.11 Actual and expected number of SAT iterations of SAS and RSAS, compared with SFLL. . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 5.12 The observed SAT iterations of SAS and SFLL by varying key length and critical minterm count. . . . . . . . . . . . . . . . . . . . . . . . 101 5.13 The application-level effectiveness of SAS/RSAS and SFLL on PAR- SEC and ML benchmarks . . . . . . . . . . . . . . . . . . . . . . . . 102 5.14 The application-level effectiveness of SAS/RSAS and SFLL on PAR- SEC and ML benchmarks . . . . . . . . . . . . . . . . . . . . . . . . 103 5.15 Area overhead of SAS and RSAS compared with SFLL . . . . . . . . 104 5.16 Power overhead of SAS and RSAS compared with SFLL . . . . . . . 104 5.17 Delay overhead of SAS and RSAS compared with SFLL . . . . . . . . 104 6.1 Illustration of a convolutional layer. ?*? 
indicates inner product, each computing an output neuron. . . . . . . . . . . . . . . . . . . . . . . 113 6.2 A two-way set associative cache example . . . . . . . . . . . . . . . . 114 6.3 The linear relationship between Conv layer running time and theo- retical cache misses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 6.4 Linear regression on # MAC operations and trace length of an FC layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 6.5 U(t) vs. t given typical parameters in Equation 6.8. . . . . . . . . . . 137 6.6 Upper images: the cache side-channel traces of AlexNet and VGG Nets and some ADNNs with correct parameters in the progress of GANRED. Lower images: the discriminator?s outputs corresponding to the ADNNs. X-axis: number of probes. . . . . . . . . . . . . . . . 140 6.7 GANRED attack results . . . . . . . . . . . . . . . . . . . . . . . . . 142 7.1 Illustration of the attack in [43] per Section 7.2.2. . . . . . . . . . . . 148 7.2 Illustration of oblivious shuffle in the memory access pattern of a DNN. The variables that are observable to the attacker are also illustrated. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 7.3 # IFPs solved and # feasible structures of the DNN benchmarks under various defense techniques . . . . . . . . . . . . . . . . . . . . . 163 xi List of Abbreviations AES Advanced Encryption System ArbPUF Arbiter Physical Unclonable Function ASLR Address Space Layout Randomization CDF Cumulative Density Function CNF Conjunctive Normal Form CNN Convolutional Neural Network Conv Convolutional CPU Central Processing Unit CRP Challenge-Response Pair DI Distinguishing Input DNN Deep Neural Networks DPL Double Patterning Lithography DT Decision Tree DumMA Dummy Memory Access FC Fully Connected FFT Fast Fourier Transform FGSM Fast Gradient Sign Method FSC Functionality Stripped Circuit GAN Generative Adversarial Nets GANRED GAN-based Reverse Engineering of DNNs GEMM Generalized Matrix Multiply GPU Graphic Processing Unit HD Hamming Distance HE Homomorphic Encryption HRS High Resistance State HT Hardware Trojan IC Integrated Circuit IFM Input Feature Map IFP Integer Feasibility Programming ILP Integer Linear Programming IP Intellectual Property IQP Integer Quadratic Programming xii JSM Jacobian Saliency Map LLC Last-Level Cache LRS Low Resistance State LRU Least Recently Used MAC Multiply and Cumulate ML Machine Learning MPC Multi-Party Computation NN Neural Network OFM Output Feature Map ORAM Oblivious Random Access Memory OS Oblivious Shuffle PF Point Function PUF Physically Unclonable Function RAM Random Access Memory RAW Read-after-Write RMS Root-Mean-Square RU Restore Unit SAS Strong Anti-SAT SARLock SAT Resistant Locking SAT Satisfiability SFLL Stripped Functionality Logic Locking SGX Software Guard Extensions SVM Support Vector Machine TFUE Trusted Foundry and Untrusted Employee TTL Tenacious and Traceless Locking VLSI Very Large Scale Integration XNOR Exclusive Nor XOR Exclusive Or XORArbPUF XOR Arbiter PUF xiii List of Publications Book Chapter: 1. Liu, Yuntao, Yang Xie, and Ankur Srivastava. ?Security in Emerging Fab- rication Technologies? Security Opportunities in Nano Devices and Emerging Technologies. CRC Press, 2017. 197-214. 2. Bao, Chongxi, Yang Xie, Yuntao Liu, and Ankur Srivastava. ?Reverse Engineering-Based Hardware Trojan Detection? The Hardware Trojan War. Springer, 2018. 269-288. Journal: 1. Liu, Yuntao, Yang Xie, Chongxi Bao, and Ankur Srivastava. 
"A Combined Optimization-Theoretic and Side-Channel Approach for Attacking Strong Physical Unclonable Functions." IEEE Transactions on Very Large Scale Integration Systems (TVLSI) 26.1 (2018): 73-81.

2. Abhishek Chakraborty, Nithyashankari Gummidipoondi Jayasankaran, Yuntao Liu, Jeyavijayan Rajendran, Ozgur Sinanoglu, Ankur Srivastava, Yang Xie, Muhammad Yasin, Michael Zuzak. "Keynote: A Disquisition on Logic Locking." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), Early Access Article (2019).

Conference:

1. Liu, Yuntao, Michael Zuzak, Yang Xie, Abhishek Chakraborty, and Ankur Srivastava. "Strong Anti-SAT: Secure and Effective Logic Locking." 21st International Symposium on Quality Electronic Design (ISQED 2020).

2. Liu, Yuntao, Ankit Mondal, Abhishek Chakraborty, Michael Zuzak, Nina Jacobsen, Daniel Xing, and Ankur Srivastava. "A Survey on Neural Trojans." 21st International Symposium on Quality Electronic Design (ISQED 2020).

3. Liu, Yuntao, Dana Dachman-Soled, and Ankur Srivastava. "Mitigating Reverse Engineering Attacks on Deep Neural Networks." IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 2019.

4. Chakraborty, Abhishek, Yuntao Liu, and Ankur Srivastava. "TimingSAT: Timing Profile Embedded SAT Attack." Proceedings of the 38th International Conference on Computer-Aided Design. ACM, 2018.

5. Liu, Yuntao, Yang Xie, and Ankur Srivastava. "Neural Trojans." Computer Design, 2017 IEEE International Conference on. IEEE, 2017.

6. Liu, Yuntao, Chongxi Bao, Yang Xie, and Ankur Srivastava. "Introducing TFUE: The Trusted Foundry and Untrusted Employee Model in IC Supply Chain Security." Circuits and Systems, 2017 IEEE International Symposium on. IEEE, 2017.

7. Liu, Yuntao, Yang Xie, Chongxi Bao, and Ankur Srivastava. "An Optimization-Theoretic Approach for Attacking Physical Unclonable Functions." Proceedings of the 35th International Conference on Computer-Aided Design. ACM, 2016.

8. Xie, Yang, Chongxi Bao, Yuntao Liu, and Ankur Srivastava. "A Security-Aware Design Scheme for Better Hardware Trojan Detection Sensitivity." Microprocessor and SOC Test and Verification (MTV), 2016 17th International Workshop on. IEEE, 2016.

Poster:

1. Liu, Yuntao, and Ankur Srivastava. "GANRED: GAN-based Reverse Engineering of DNNs via Cache Side-Channel." Design Automation Conference, 2020.

Chapter 1: Introduction

1.1 Security and Trust Issues in the IC Supply Chain

Many security issues have been raised in the globalized supply chain of integrated circuits (IC). We illustrate the supply chain and the security concerns in Fig. 1.1. For example, many companies sell intellectual properties (IP) to IC designers, where the IP becomes a part of the IC design. In this case, the IP owner may wish to keep the design details of the IP secret and protect the IP from piracy by the IC designer or later stages of the supply chain. The same concern applies to IC designers who do not own a foundry and outsource their designs for fabrication, since the foundry is not under the designer's control and can be potentially untrusted. On the other hand, the IC designer wants to verify the integrity of the IP and make sure that there are no malicious back doors. System manufacturers also have this concern when they deploy a chip bought from the open market in their systems.

1.1.1 The IP/IC Design Protection Problem

In the IC supply chain outlined in Fig. 1.1, the foundry knows every detail of the IC layout but is not in the designer's control.
Therefore, the foundry is usually considered untrusted, since there may be adversaries inside the foundry. Foundries usually have complete information about the design to be manufactured.

Figure 1.1: An overview of the IC supply chain and the security and trust issues

Therefore, it is crucial for the designer to protect the design from the possible attacks carried out by such informed adversaries. The following attacks are possible:

• IC Overbuilding: The foundry may fabricate more copies of the IC than it is supposed to. The foundry may sell the overbuilt ICs directly into the open market and harm the designer's interests.

• IP Piracy: The adversary is able to obtain the gate-level netlists of the design (i.e., hardware IPs) by reverse-engineering the layout provided by the designer. The gate-level netlists contain the details of the functionality of the circuit, which the attacker may resell or reuse.

1.1.2 The Integrity Problem

When the designer receives an IP from the IP suppliers, he/she wants to ensure the integrity of the IP. Likewise, the fabricated ICs from the foundry may also be examined to make sure that no malicious modifications were made to the original IC design. These malicious modifications in IPs or ICs are usually called hardware Trojans (HT). HTs are usually hidden and only affect the outcome of the circuit under rare circumstances. When an HT is triggered, the output of the circuit deviates substantially from the correct output. Otherwise, the circuit works correctly. This makes the harm of HTs severe and their detection difficult.

1.2 Focus of this Dissertation

In recent years, these hardware security and trust issues have become more complicated with the emergence of new technologies. These technologies are found at all levels, from the physical level to the application level. For example, at the physical level, new fabrication technologies make the continued scaling of semiconductor devices possible. At the device and circuit levels, new circuit designs that contain new devices provide many new desirable properties. At the application level, new computation frameworks have achieved much better performance on many tasks than state-of-the-art conventional algorithms. While most of these technologies are initially intended to overcome the drawbacks of their conventional counterparts, they also come with a wide spectrum of implications for security, including both opportunities and challenges. In this work, we investigate three specific problems at the intersection of security and emerging technologies:

1. Security opportunities in double patterning lithography (DPL), a new fabrication technology.

2. An attack against physical unclonable functions (PUF), a new type of circuit used for security.

3. Various security issues in neural networks (NN), a new computation framework that has been receiving a lot of interest in recent years.

1.2.1 Security Opportunities in Double Patterning Lithography

Although the foundries are not controlled by the designer, it might not be fair to assume the foundry is an untrusted black box.
Indeed, many foundries have developed trusted fabrication lines [64, 1] where the foundries have legal obligations to prevent any security risk. However, untrusted individual employees may exist and try to steal the IP or insert Trojans into the design. In our work, we take advantage of double patterning lithography (DPL), an emerging IC fabrication technology. When DPL is used, the layout is decomposed into two sub-layouts for mask development. Each sub-layout satisfies the shape spacing requirement which is specified by the fabrication technology node. Each sub-layout is processed in an independent mask production line. We further assume that there is no collusion between employees on different lines. In addition to satisfying the process requirement, the objective of our DPL includes ensuring minimum information leakage about the entire circuit from individual sub-layouts. We define a measure of "distance" between the original layout and a sub-layout in order to quantify the information leakage. By decomposing in a way that maximizes the smaller of the two distances between the sub-layouts and the original layout, we ensure that minimum information about the original layout is leaked to the attacker, no matter which sub-layout he/she has access to. We formulate an optimization-theoretic problem to determine how to partition the layout.

1.2.2 Security Issues with Physical Unclonable Functions

Physical unclonable functions (PUFs) are a type of circuit for which each copy (with exactly the same circuitry) has a unique and unpredictable functionality. This uniqueness comes from the manufacturing variations of electronic devices. The unique functionality of the PUF is characterized by its input-output behavior: (i) for different PUF copies, the outputs to the same input should be independent; (ii) for the same PUF, the outputs to different inputs should be independent. We call the PUF input the "challenge", the output the "response", and an input-output pair a "challenge-response pair (CRP)". A typical application of the PUF is authentication. We give an illustrative authentication example as follows. In this scenario, the verifier wants to tell whether a remote chip (which has a PUF inside) is authentic. We suppose that the verifier is the chip designer and has recorded a large number of the PUF's CRPs before selling the chip. During authentication, the verifier (usually a server) sends a set of challenges to the remote device (which claims to have an authentic chip). In return, the remote device sends the PUF's responses to these challenges. If the responses are all correct, the authentication is successful. As the PUF's response to each challenge is independent, for an n-bit input PUF there are 2^n unique CRPs, and used CRPs need not be reused if n is large enough. This authentication protocol can be used in other scenarios such as verifying the authenticity of edge devices in the Internet of Things. It can also help the chip designer identify pirated/overbuilt copies in the market. In this dissertation, the attack on PUFs, i.e., predicting the PUF's responses to unknown challenges, is studied. We formulate a novel optimization-theoretic attack approach and apply this attack to multiple types of PUFs. We found that our approach substantially reduces the complexity of the attack compared to the conventional machine learning based attack.
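To make the CRP-based authentication protocol described above concrete, the following minimal Python sketch models the enrollment and verification steps. The PUF itself is replaced by a keyed hash (a stand-in for device-specific manufacturing variation), and all names (simulated_puf, enroll, authenticate) are our own illustrative choices, not part of any actual implementation in this dissertation.

```python
import hashlib
import secrets
import random

def simulated_puf(device_entropy: bytes, challenge: bytes) -> int:
    """Stand-in for a strong PUF: a keyed hash plays the role of the
    device-specific manufacturing variation. A real PUF derives this bit
    from analog mismatch rather than from a stored secret."""
    return hashlib.sha256(device_entropy + challenge).digest()[0] & 1

def enroll(device_entropy: bytes, n_crps: int = 1000, challenge_bytes: int = 16) -> dict:
    """The verifier records a table of CRPs before the chip is deployed."""
    db = {}
    while len(db) < n_crps:
        c = secrets.token_bytes(challenge_bytes)
        db[c] = simulated_puf(device_entropy, c)
    return db

def authenticate(crp_db: dict, device_entropy: bytes, n_checks: int = 32) -> bool:
    """The verifier sends previously unused challenges and checks the
    responses returned by the remote device; each challenge is discarded
    after use so that CRPs are never reused."""
    for c in random.sample(list(crp_db.keys()), n_checks):
        expected = crp_db.pop(c)          # never reuse a CRP
        if simulated_puf(device_entropy, c) != expected:
            return False
    return True

if __name__ == "__main__":
    chip = secrets.token_bytes(32)        # models one physical chip instance
    database = enroll(chip)
    print("authentication succeeded:", authenticate(database, chip))
```

The optimization-theoretic attack studied in Chapter 3 targets exactly the property this toy cannot capture: for real delay-based PUFs, the responses to known challenges constrain the device variations and thereby reveal unused CRPs.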
1.2.3 Security Threats in Neural Networks

In recent years, neural networks (NN) have emerged as a computation framework that outperforms conventional methods on many tasks. As the performance of NNs continues to improve, they are becoming larger and deeper as well. As a result, training the networks is becoming more and more expensive in terms of both time and computation resources. Therefore, it becomes increasingly worthwhile to buy a neural network intellectual property (IP) from a third party instead of training one from scratch. The ramifications of such a supply chain shift on security are twofold. On the one hand, the neural IP buyer may be concerned about the integrity of the neural IP. On the other hand, the neural IP vendor may wish to only allow the buyer to use the IP without disclosing the underlying neural network model. In this case, reverse engineering attacks on neural networks are of the neural IP vendor's concern. In addition, due to the inherent error resilience of neural network models, error injection-based hardware IP protection schemes, such as logic locking [16], need to inject a higher amount of error in order to corrupt a neural network-based application. This causes vulnerability to Boolean satisfiability (SAT)-based attacks [102], as we prove in this dissertation. These are the three aspects we focus on regarding the security of neural networks in this dissertation:

1. Neural Trojans: the malicious functionality that can be embedded in a neural IP.

2. Reverse engineering attacks on neural networks via hardware side-channels and countermeasures.

3. A novel logic locking scheme that has a high application-level effect on neural network-based applications while maintaining high complexity against SAT-based attacks.

1.3 Contribution and Organization of the Dissertation

Chapter 2 presents the opportunities in layout-level obfuscation brought by DPL. We introduce the trusted foundry and untrusted employee (TFUE) model and develop a specialized version of DPL to thwart the attacks from untrusted employees.

Chapter 3 provides a novel optimization-theoretic attack on PUFs. Compared to the existing machine learning-based attack, on average, our attack approach reduces the required PUF queries by 66% and takes 79.8% less time to achieve the same PUF response prediction accuracy.

In Chapter 4, we identify and demonstrate the threats of neural Trojans and propose countermeasures including input anomaly detection, retraining, and input preprocessing. Experiments show each countermeasure to be effective.

In Chapter 5, we theoretically prove a universal trade-off among all logic locking schemes between the hardware-level error injected by the locking circuitry and the complexity of SAT-based attacks. In order to protect hardware running neural network-based applications, which usually tend to be error-resilient, while maintaining high SAT complexity, a novel logic locking scheme, called Strong Anti-SAT (SAS), is proposed. SAS achieves high application-level error injection effects while maintaining high SAT complexity.

We propose a cache side-channel based reverse engineering attack on neural networks in Chapter 6. Our attack specifically focuses on the structure of neural networks. Compared to existing approaches, our attack technique eliminates the need for shared main memory between the attacker and victim processes and achieves more accurate reverse engineering results.
In Chapter 7, we consider an even stronger attacker for neural network reverse engineering and provide a countermeasure based on various techniques including oblivious shuffle, address space layout randomization, and dummy memory accesses. We are able to exponentially increase the neural network structure search space.

Chapter 8 concludes this dissertation and discusses our future research plans.

Chapter 2: Security Opportunities in Double Patterning Lithography

In this chapter, we explore how the security of the proprietary design details of an integrated circuit (IC) during fabrication can be improved using double patterning lithography (DPL). Specifically, we introduce the trusted foundry and untrusted employee (TFUE) model and develop a specialized version of DPL to thwart the attacks from an untrusted employee.

Almost all previous studies on IC supply chain security label off-site foundries as untrusted. To thwart the attacks by untrusted foundries mentioned in Section 1.1, such as Trojan insertion, IC piracy, etc., countermeasures including logic obfuscation, split manufacturing, and post-fabrication editing based approaches have been proposed [87, 81, 117, 93, 45]. Among these countermeasures, obfuscation based approaches suffer from various reverse-engineering-based attacks [102, 17], while split manufacturing still requires a trusted foundry to fabricate some metal layers, which incurs the cost of maintaining such a foundry. In 3D/2.5D integration technology, different dice manufactured by different foundries may not align well, which reduces the yield. Post-fabrication editing has to be done chip by chip, reduces the reliability of the chip, and suffers from low efficiency.

In real-world scenarios, however, the "untrusted foundry" assumption may not be accurate. On one hand, the foundries are not necessarily incentivized to perform these attacks due to legal and financial liabilities. On the other hand, even for the foundries conventionally considered trusted (e.g., owned by the designer), there can be rogue employees who have the ability to perform the attacks. Under the TFUE model, we try to secure the IC manufacturing process from the foundry's perspective.

2.1 Fundamentals of Double Patterning Lithography

In this section, we briefly introduce the principles of DPL. In the lithography step of IC fabrication, the minimum distance between two adjacent polygons in the layout is physically constrained by the wavelength of the light used in the lithography. We call this distance the minimum feature spacing, denoted as λ. The technology node of ICs has scaled down to a point where λ can no longer provide enough resolution as required by the technology node. In order to continue the scaling of transistors, DPL has been developed. DPL requires that the layout be decomposed into two sub-layouts, each of which satisfies the minimum spacing constraint. A mask based on each sub-layout is made in an independent mask development line. Each sub-layout is processed in its mask development line similarly to how an entire layout is processed: the sub-layouts need to be edited to ensure that the masks are compatible with the foundry's fabrication process. After the masks are made, in the lithography stage, there are two lithography steps, each putting one sub-layout onto the silicon wafer.

Figure 2.1: The steps of layout decomposition. (a): the original layout; (b): layout fracturing; (c): graph construction; (d): node splitting; (e): color assignment
After the two lithography steps, the fabricated IC will be the same as the original layout.

Layout decomposition algorithms that make each sub-layout satisfy the spacing constraint have been well developed. We describe the state-of-the-art layout decomposition algorithm proposed by Kahng et al. [48]. The layout of an IC is composed of rectilinear polygons. Each rectilinear polygon is called a "feature". The algorithm consists of layout fracturing, graph construction, node splitting, and graph update. These steps are illustrated in Fig. 2.1 and explained below.

1. Layout fracturing. In this step, any feature (rectilinear polygon) that is not a rectangle is partitioned into multiple rectangles. For example, in Fig. 2.1a, the upper right feature is split into two rectangles, as shown in Fig. 2.1b. Note that there can be more than one possible way to split a feature.

2. Graph construction. In this step, the conflict graph is built. In the conflict graph, each node represents a rectangle in the layout. An edge connecting a pair of nodes exists if the distance between the corresponding rectangles is smaller than λ. This indicates that the two rectangles cannot be on the same mask. A sample conflict graph is shown in Fig. 2.1c.

3. Node splitting. As an edge indicates a conflict, a node coloring problem can be formulated as follows. The nodes in the conflict graph need to be colored with two colors in such a way that opposite colors are assigned to any two adjacent nodes (i.e., those connected by an edge). Therefore, if any odd-length cycle exists in the conflict graph, the node coloring problem has no feasible solution, indicating a conflict which must be resolved in order to obtain a feasible layout decomposition. An example of such a conflict is illustrated in Fig. 2.1c, where we find a 5-node cycle. There is no feasible assignment of colors that satisfies the constraint. In order to resolve this conflict, one node within the odd-length cycle needs to be split into two, indicating the splitting of a rectangle.

4. Graph update and color assignment. A new conflict graph is constructed based on the splitting. One possible way of splitting and the updated conflict graph are shown in Fig. 2.1d. Colors can be assigned when there are no more odd-length cycles in the graph. Fig. 2.1e gives one feasible color assignment.

The color assignment is often formulated as an integer linear programming (ILP) problem [48], where the ILP focuses on improving the yield of chips by enforcing the following criteria: (a) penalizing design rule violations; (b) avoiding assigning different colors within the same polygon ("stitches"); (c) other factors that may improve the yield.

2.2 TFUE: the Trusted Foundry and Untrusted Employee Model

In this section, we justify the TFUE model [55, 104]. We discuss why this model is more realistic than assuming every foundry to be either completely trusted or completely untrusted.

• It is indeed risky for a foundry to undermine or counterfeit IC designs massively. If this is noticed by the designer, the foundry will have legal troubles and lose business in the future. Every foundry wants a larger market share and may not take this risk. Instead, even the foundries conventionally assumed untrusted, including offshore foundries, want the designers to trust their integrity. Bloom et al. proposed a way that allows the designer to attest the integrity of the foundry machinery and the produced chips [11]. The foundries'
security policies and enforcement can also be reviewed by the designer and/or a licensed third party.

• Foundries that are conventionally assumed to be trusted are not necessarily free from the risk of untrusted employees. There can be employees who collude with the foundry's or the designer's competitors and aim at undermining the fabricated chips or pirating the design. In this case, even a conventionally trusted foundry still needs security measures against untrusted employees.

For the above reasons, we identify the foundry employees as the source of security threats in a foundry. We no longer classify foundries into trusted ones and untrusted ones. Instead, we use TFUE as our primary assumption. The following sections will present the threat model under TFUE and countermeasures utilizing DPL. In the next section, we will explain what an untrusted foundry employee can do to undermine IC supply chain security and explore the opportunities made possible by DPL to obfuscate the layout-level design details and defend against the attacks.

2.3 Threat Model and Countermeasure under TFUE

2.3.1 Threat Model

We consider the threat model and countermeasures in the context of DPL, since DPL is necessary for state-of-the-art technology nodes. The layout decomposition process can be automated, so no employee needs to access the entire layout obtained from the designer before the decomposition. Secure machinery is used to decompose the layout into two sub-layouts. The two sub-layouts are then developed in two independent mask production lines by two independent groups of employees. The independent mask production lines do not share any layout information. We further assume that there is no collusion between the employees on different lines. Each sub-layout is edited in its line to ensure that the mask is manufacturable. However, when an employee edits the sub-layout, he/she has the ability to perform various attacks such as Trojan insertion and piracy. In this context, we assume the following about the attacker:

• The attacker is an employee of the foundry who works on one of the independent mask production lines. His/her knowledge about the entire IC design is limited to what is on the sub-layout processed by that line.

• There is no collusion between employees on different lines.

• Although the employee's job is to edit the sub-layout to improve the manufacturability of the mask, he/she can actually make any change to the layout.

• The attacker may try to recover the original layout from the sub-layout that he/she works on and steal the entire chip design.

This attack model is illustrated in Fig. 2.2.

Figure 2.2: The attack model in the context of DPL under TFUE

2.3.2 Countermeasure Formulation

The attacker needs to know about the original layout to some extent in order to insert Trojans into or counterfeit the IC design. In the attack model we consider, such knowledge is based on the inspection of one sub-layout. Intuitively, each sub-layout should look as "different" as possible from the original layout in order to leak minimum information about the original layout. We will define a metric of "distance" between two layout images, based on which we propose a variant of the layout decomposition algorithm which maximizes the minimum distance between each sub-layout and the original layout.

We follow this convention of notation: lower case letters denote scalars (normal) and vectors (bold), and capital letters denote matrices.
For example, x is a scalar, x is a vector, and X is a matrix. We use subscripts to denote the indices of vectors and matrices. A comprehensive list of symbols is given in Table 2.1. The symbols will be explained again when used in equations.

Table 2.1: List of symbols
λ ∈ R_{++} : the minimum feature spacing
n : the number of rectangles in the layout
G = (V, E) : the conflict graph
V = {c_k | k = 1, ..., n} : the set of nodes (rectangles) in the conflict graph
E = {(c_u, c_v) | the distance between c_u and c_v ≤ λ}
p ∈ Z_{++} : the number of elements in each row and column of the Fourier coefficient matrices
F^l ∈ C^{p×p}, l = 0, 1, 2 : the Fourier coefficient matrix of the FFT of the layout, with l = 0, 1, 2 corresponding to the original layout, the first sub-layout, and the second sub-layout, respectively
F̃^k ∈ C^{p×p}, k = 1, ..., n : the Fourier coefficient matrix of the FFT of the k-th rectangle's layout
s_l, l = 1, 2 : the distance between the original layout and sub-layout l
x ∈ {0,1}^n : the vector indicating the sub-layout that each rectangle belongs to, where x_k = 1 if rectangle c_k is assigned to the first sub-layout and x_k = 0 if c_k is assigned to the second sub-layout
x̄ ∈ {0,1}^n : x̄ = 1 − x
A^l, l = 0, 1, 2 : the real parts of F^l
Ã^k, k = 1, ..., n : the real parts of F̃^k
B^l, l = 0, 1, 2 : the imaginary parts of F^l
B̃^k, k = 1, ..., n : the imaginary parts of F̃^k
α^(ij) ∈ R^n : α^(ij) = (Ã^1_{ij}, Ã^2_{ij}, ..., Ã^n_{ij})^T
β^(ij) ∈ R^n : β^(ij) = (B̃^1_{ij}, B̃^2_{ij}, ..., B̃^n_{ij})^T

Our proposed countermeasure is a customized layout decomposition algorithm which follows the one specified in Section 2.1 up to node splitting, until every conflict is resolved. We customize the color (sub-layout) assignment algorithm. We consider the layout of the design as a binary image. For each pixel in the image, its value is 1 if it resides on a feature and 0 otherwise. In order to evaluate the distance between two images, we need a metric of "distance" in the first place. We use their difference in the frequency domain to characterize the distance. We transform the layout images using the two-dimensional Fourier transform (2D-FFT):

F^l_{\omega\psi} = \sum_{x=1}^{w_l} \sum_{y=1}^{h_l} I_{xy} \, e^{-2\pi j \left( \frac{\omega x}{w_l} + \frac{\psi y}{h_l} \right)}

Let F^0, F^1, and F^2 be the Fourier coefficient matrices of the original layout and the two sub-layouts, respectively. Similar to the 1D-FFT, the linearity property also holds for the 2D-FFT, i.e.,

F^0 = F^1 + F^2

The above equation must also hold element-wise, i.e., F^0_{ij} = F^1_{ij} + F^2_{ij} for any i, j = 1, ..., p. As a (sub-)layout comprises multiple rectangles, its 2D-FFT Fourier coefficient matrix also equals the summation of those of its rectangles, i.e.,

F^0_{ij} = \sum_{k=1}^{n} F̃^k_{ij},   F^1_{ij} = \sum_{k=1}^{n} x_k F̃^k_{ij},   and   F^2_{ij} = \sum_{k=1}^{n} x̄_k F̃^k_{ij}

Definition 2.1 (Distance). The distance between two binary images is defined as the root-mean-square (RMS) value of the differences of their 2D-FFT coefficient matrix elements:

s_l = \sqrt{ \frac{1}{p^2} \sum_{i=1}^{p} \sum_{j=1}^{p} |F^0_{ij} - F^l_{ij}|^2 }    (2.1)

where s_1 and s_2 stand for the distances between the first/second sub-layout and the original layout, respectively.

As we mentioned earlier, the distance is a measure of the attacker's difficulty in inferring the entire layout from a sub-layout. Therefore, a larger distance indicates more security, and it is desirable to partition the layout such that both distances are large, i.e., no matter which sub-layout the attacker can access, he/she always has great difficulty in attacking. However, there may be a trade-off between the two distances.
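As a small, self-contained illustration of Definition 2.1, the NumPy sketch below evaluates the distance between a toy binary layout and a toy sub-layout. The function and the per-pixel split of the toy layout are our own illustrative choices under the assumption that the images are already rasterized to p-by-p 0/1 arrays; they are not part of the actual decomposition flow.

```python
import numpy as np

def fft_distance(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """RMS difference of the 2D-FFT coefficients of two p-by-p binary
    images, i.e. the distance of Definition 2.1 / Eq. (2.1)."""
    p = img_a.shape[0]
    diff = np.fft.fft2(img_a) - np.fft.fft2(img_b)
    return np.sqrt(np.sum(np.abs(diff) ** 2)) / p

# Toy example: a random 64x64 binary layout split into two complementary
# sub-layouts (so that F^0 = F^1 + F^2 by linearity of the FFT).
rng = np.random.default_rng(0)
layout = (rng.random((64, 64)) < 0.3).astype(float)
mask = rng.random((64, 64)) < 0.5
sub1 = layout * mask
sub2 = layout * (~mask)
s1, s2 = fft_distance(layout, sub1), fft_distance(layout, sub2)
print(f"s1 = {s1:.2f}, s2 = {s2:.2f}, objective min(s1, s2) = {min(s1, s2):.2f}")
```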
Considering this, the objective should be to maximize the smaller of the two distances (i.e., s_1 and s_2):

\max_{x \in \{0,1\}^n} \min\{s_1, s_2\}

This can be transformed into the following equivalent form (note that F^0 − F^1 = F^2 and F^0 − F^2 = F^1 by linearity, and the monotone square root and the constant 1/p^2 do not change the maximizer):

\max_{x \in \{0,1\}^n} \min\left\{ \sum_{i=1}^{p} \sum_{j=1}^{p} |F^2_{ij}|^2, \; \sum_{i=1}^{p} \sum_{j=1}^{p} |F^1_{ij}|^2 \right\}    (2.2)

Note that each Fourier coefficient is a complex number. The relationship between the magnitude of a coefficient and its real and imaginary parts is

|F^l_{ij}|^2 = (A^l_{ij})^2 + (B^l_{ij})^2

for l = 0, 1, 2, where A^l_{ij} and B^l_{ij} are the real and imaginary parts of F^l_{ij}, respectively. The real and imaginary parts also satisfy linearity individually:

A^1_{ij} = \sum_{k=1}^{n} Ã^k_{ij} x_k = α^{(ij)T} x,   A^2_{ij} = \sum_{k=1}^{n} Ã^k_{ij} x̄_k = α^{(ij)T} x̄,
B^1_{ij} = \sum_{k=1}^{n} B̃^k_{ij} x_k = β^{(ij)T} x,   B^2_{ij} = \sum_{k=1}^{n} B̃^k_{ij} x̄_k = β^{(ij)T} x̄,

where α^{(ij)} = (Ã^1_{ij}, Ã^2_{ij}, ..., Ã^n_{ij})^T and β^{(ij)} = (B̃^1_{ij}, B̃^2_{ij}, ..., B̃^n_{ij})^T. The above equations give us each 2D-FFT coefficient, including the real and imaginary parts, of each sub-layout. To obtain the quantities required in (2.2), we have

\sum_{i=1}^{p} \sum_{j=1}^{p} |F^1_{ij}|^2 = \sum_{i=1}^{p} \sum_{j=1}^{p} \left[ (A^1_{ij})^2 + (B^1_{ij})^2 \right] = x^T \left( \sum_{i=1}^{p} \sum_{j=1}^{p} α^{(ij)} α^{(ij)T} + β^{(ij)} β^{(ij)T} \right) x = x^T Q x

where

Q = \sum_{i=1}^{p} \sum_{j=1}^{p} α^{(ij)} α^{(ij)T} + β^{(ij)} β^{(ij)T}    (2.3)

For the same reason,

\sum_{i=1}^{p} \sum_{j=1}^{p} |F^2_{ij}|^2 = x̄^T Q x̄

Now the problem in (2.2) has been transformed into an integer quadratic programming (IQP) problem:

\max_{x \in \{0,1\}^n} \min\{x^T Q x, \; x̄^T Q x̄\}
subject to   x̄ = 1 − x
             x_i + x_j = 1  if (c_i, c_j) ∈ E    (2.4)

This problem is indeed difficult to solve, for two reasons. First, an integer programming problem is hard to solve in general. Second, and more importantly, even if we relax the problem (i.e., the domain of x) to be continuous, it is still difficult. By (2.3) we know that Q is positive semidefinite (Q ⪰ 0). Therefore, x^T Q x and x̄^T Q x̄ are convex functions of x. However, in general, the pointwise minimum of two convex functions, like the one in (2.4), is neither convex nor concave. This makes gradient-based algorithms inapplicable to this problem, since we may get stuck at a local optimum rather than the global one.

Fortunately, we found a good approximation of this problem. By inspecting Q, we found that almost all non-zero elements of Q are on the diagonal, i.e., Q is very sparse off-diagonal. In fact, for every benchmark, non-zero off-diagonal elements make up less than 1%. In order to make the problem easier to solve, we simplify it by only considering the diagonal elements of Q. This is not likely to result in significant error. Since each element of x is either 0 or 1, when we ignore all the off-diagonal non-zero elements of Q, we have the following approximation:

x^T Q x \approx x^T \mathrm{diag}(Q_{11}, ..., Q_{nn}) x = d^T x

where d = (Q_{11}, ..., Q_{nn})^T. Then, the problem in (2.4) is approximately transformed into

\max_{x \in \{0,1\}^n} \min\{d^T x, \; d^T x̄\}
subject to   x̄ = 1 − x
             x_i + x_j = 1  if (c_i, c_j) ∈ E    (2.5)

The problem in (2.5) is a well-studied integer linear programming problem, and there exist many efficient heuristic algorithms that can obtain good solutions in practice.

2.4 Experiment Setup and Results

In this section, we describe how we evaluate our proposed DPL algorithm by experiments. It can be shown that the distances defined in Eq. (2.1) can be expressed as

s_1 = \frac{1}{p} \sqrt{x^T Q x},   and   s_2 = \frac{1}{p} \sqrt{x̄^T Q x̄}.

If our approximation (i.e., dropping the non-zero off-diagonal elements of Q) is reasonable, we should have

s_1^2 + s_2^2 \approx \frac{1}{p^2} (x + x̄)^T d = \frac{1}{p^2} \mathbf{1}^T d \approx \frac{1}{p^2} \mathbf{1}^T Q \mathbf{1}

In order to verify our approximation, we define s_max as

s_{max} = \frac{1}{p} \sqrt{\mathbf{1}^T Q \mathbf{1}}

We can verify whether s_1^2 + s_2^2 ≈ s_{max}^2 holds; if so, we can be confident about the approximation.
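Before turning to the experimental results, the following self-contained sketch shows how the Q matrix, the diagonal surrogate d, and the s_max check above could be computed for a toy layout. All names are ours, the layout is a made-up set of three disjoint rectangles, and because the rectangles here do not overlap, Q comes out essentially diagonal, so the check is nearly exact; on real fractured layouts the off-diagonal part is small but non-zero.

```python
import numpy as np

def rect_image(p, x0, y0, w, h):
    """Rasterize a single rectangle into a p-by-p binary image."""
    img = np.zeros((p, p))
    img[y0:y0 + h, x0:x0 + w] = 1.0
    return img

p = 64
rects = [(4, 4, 12, 6), (24, 4, 12, 6), (4, 24, 32, 4)]    # toy layout, 3 rectangles
imgs = [rect_image(p, *r) for r in rects]
x = np.array([1, 0, 1])                                    # a toy color assignment
xbar = 1 - x
sub1 = sum(img for img, xk in zip(imgs, x) if xk == 1)     # first sub-layout image

# Per-rectangle FFT coefficient matrices F~k
Fk = np.stack([np.fft.fft2(img) for img in imgs])          # shape (n, p, p)
# Q[k, l] = sum_ij (A~k_ij A~l_ij + B~k_ij B~l_ij) = sum_ij Re(F~k_ij conj(F~l_ij))
Q = np.real(np.einsum('kij,lij->kl', Fk, np.conj(Fk)))
d = np.diag(Q)                                             # diagonal surrogate used in (2.5)

# x^T Q x equals the total FFT energy of the first sub-layout (the chain leading to (2.3))
assert np.isclose(x @ Q @ x, np.sum(np.abs(np.fft.fft2(sub1)) ** 2))

s1_sq = (x @ Q @ x) / p**2
s2_sq = (xbar @ Q @ xbar) / p**2
smax_sq = np.sum(Q) / p**2                                 # = 1^T Q 1 / p^2
print("s1^2 + s2^2 =", round(float(s1_sq + s2_sq), 3), "  s_max^2 =", round(float(smax_sq), 3))
print("objective min(d^T x, d^T xbar) =", float(min(d @ x, d @ xbar)))
```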
Table 2.2: Experimental results on the proposed countermeasure

Benchmark   Polygons   Rectangles   s_1^2   s_2^2   s_max^2   Run Time (s)
1           200        425          461     425     883       1.174
2           510        870          669     609     1272      2.004
3           990        1596         792     734     1558      4.807
4           1989       3094         1143    1167    2313      20.26
5           5081       8398         2257    2089    4339      348.0

Our approach is evaluated on 5 benchmarks with up to 5,000+ polygons and 8,000+ rectangles. s_1^2, s_2^2, and s_max^2 as well as the running time for solving problem (2.5) are recorded. Table 2.2 shows that the s_1^2 and s_2^2 that we obtain by solving Eq. (2.5) are close to each other and their sum is approximately equal to s_max^2, indicating that our formulation in (2.5) approximates the original problem (2.4) well. As s_1^2 + s_2^2 is roughly a fixed value (s_max^2), making them close to each other ensures that the smaller of the two is maximized. In this way, the attacker's difficulty is maximized, no matter whether he/she tries to insert Trojans into the design or steal the design. The layout decomposition result for an illustrative benchmark is shown in Fig. 2.3.

Figure 2.3: An illustrative benchmark and its layout decomposition results. (a) Layout decomposition result, where each color indicates a sub-layout; (b) the 2D-FFT of the original layout; (c) the 2D-FFT of the first sub-layout (red in (a)); (d) the 2D-FFT of the second sub-layout (green in (a)). Color bar: the magnitude decreases from top to bottom.

2.5 Summary

In this chapter, we explored the opportunities DPL can bring to enhance the security of the IC supply chain. Specifically, we investigated this problem under the TFUE model. Based on the state-of-the-art DPL algorithm, we developed our own version which takes security into consideration. Along with the secure machinery and security policies of the foundry, our approach to DPL can make it essentially hard for the attacker to perform any attack.

Chapter 3: Security Vulnerabilities in Physical Unclonable Functions

Physical unclonable functions (PUFs) were first proposed by Gassend et al. in 2002 [28], where the manufacturing variations of electronic devices are utilized as entropy sources to provide a unique signature of the circuit. The input to the PUF circuitry is called the challenge and the PUF's output is called the response. The PUF's unique signature is characterized by its challenge-response pairs (CRP). Let us consider a PUF of size O(n) (e.g., the number of devices in the PUF). We call a PUF a strong PUF if the PUF can produce an exponential (in n) number of CRPs. Otherwise, we call the PUF a weak PUF. The applications of weak PUFs include the storage of secret keys, the seeding of true random number generators, etc. The most prominent use of strong PUFs is low-cost authentication [109, 110, 111]. An illustrative protocol is as follows. The server (verifier) maintains a database of (a subset of) the CRPs of the PUF before deploying/selling the chip that contains the PUF. During authentication, the server sends the chip one or more challenge vector(s). In return, the chip sends the server the PUF's response(s). As there are a huge number of unique CRPs for a strong PUF, the used CRPs need not be reused. Ideally, an eavesdropper should not be able to break the PUF since the used CRPs should not imply any unused CRP. Unfortunately, this is not the case. For example, Lim et al.
found that the CRPs of the arbiter PUF (ArbPUF), a popular PUF design, can be characterized using a linear additive delay model [54] once the device variations are known. This property makes such PUF designs potentially vulnerable to optimization-theoretic attacks. In our work, we find that the memristor crossbar PUF (MXbarPUF) indeed has a linear behavior similar to that of the ArbPUF [58, 59].

In this chapter, we focus on attacking strong PUFs, i.e., predicting their unknown CRPs. We assume that the attacker knows the PUF circuitry and can query the PUF with challenge vectors of his/her choice. Given the linear behavior of the above-mentioned PUFs, we formulate a novel optimization-theoretic approach to deciphering the internal device variations of the PUFs and predicting the unknown CRPs. Our approach substantially reduces the attack complexity compared to the existing attack based on machine learning (ML): compared to the ML-based attack, our optimization-theoretic approach reduces the known-CRP requirement by 66% and takes 79.8% less time.

3.1 Physical Unclonable Functions

3.1.1 Memristor and Memristor Crossbar PUF

In 1971, Chua modeled the behavior of memristors in [24], although such devices did not exist at that time. The memristor's I-V characteristic, in short, is a hysteresis loop pinched at the origin [23]. The resistance of a memristor, called the memristance, can be adjusted between an upper bound $M_H$ and a lower bound $M_L$. When the memristance is $M_H$, the state of the memristor is called the high resistance state (HRS), and the state at $M_L$ is called the low resistance state (LRS). Memristors are polar, and the direction of the applied voltage decides whether the memristance will be increased or decreased. That said, when no voltage is applied to the memristor, the memristance remains unchanged. This is called the non-volatile property of memristors, which makes them desirable for many applications. After decades of search, Strukov et al. first fabricated a device that fits the properties of memristors in 2008 [101]. Since then, memristors have been extensively studied in many fields, including neuromorphic computation [46], computer memory systems [29], and hardware security [2].

The manufacturing variations of memristors, like those of conventional devices, can act as the source of entropy to build PUFs. Rose and Meade [85] proposed the memristor crossbar PUF (MXbarPUF). The operation of the MXbarPUF can be split into the following stages (note that the DONE signal is 0 until the PUF response is finalized):

- The RESET stage. At the very beginning, all the memristors are reset to the HRS by a sufficiently long RESET = 1 signal, which applies $-V_{DD}$ across all the memristors.
- The SET stage. In this stage, RESET = 0 and R/W = CLOCK = 0, indicating that the memristance of some memristors will be changed by the input challenge vector.
- The READ stage. When CLOCK = 1, $V_{RD}$ is selected by each multiplexer. The voltage on $R_{LD}$ reflects the total current in the column, which depends on the memristors in that column. Each arbiter compares the voltages on two adjacent $R_{LD}$'s and outputs a response bit indicating which is higher.

Figure 3.1: An MXbarPUF with k arbiters

We denote the finalized conductance of the memristor located at the a-th row and the b-th column as $g_{ab}$ and define
$$G_j = \left(g_{1j}, g_{2j}, \ldots, g_{(2n)j}\right)^T \tag{3.1}$$
The resistance of column j (between $V_{RD}$ and ground) is
$$R_j(G_j, C') = \frac{1}{C'^TG_j} + R_{LD} \tag{3.2}$$
where
$$C' = \left(C_1, \bar{C}_1, C_2, \bar{C}_2, \ldots, C_n, \bar{C}_n\right)^T \tag{3.3}$$
is an expansion of the challenge vector. Then, for column j, the voltage on $R_{LD}$ is
$$V_j(G_j, C') = \frac{V_{RD}R_{LD}}{R_j(G_j, C')} \tag{3.4}$$
The voltage difference between the pair of $R_{LD}$'s corresponding to the same arbiter (say, comparing columns $2i-1$ and $2i$), denoted by $\Delta V_i$, is
$$\Delta V_i(G_{2i-1}, G_{2i}, C') = \frac{V_{RD}R_{LD}\left(\dfrac{1}{C'^TG_{2i}} - \dfrac{1}{C'^TG_{2i-1}}\right)}{\left(\dfrac{1}{C'^TG_{2i}} + R_{LD}\right)\left(\dfrac{1}{C'^TG_{2i-1}} + R_{LD}\right)} \tag{3.5}$$
On the right side of (3.5), the denominator is always positive, and $V_{RD}$ and $R_{LD}$ are also always positive. Therefore, the sign depends on the remaining part of the numerator, which we transform while maintaining the sign as follows:
$$U_i(G_{2i-1}, G_{2i}, C') = C'^TG_{2i-1} - C'^TG_{2i} \tag{3.6}$$
Because $\bar{C}_i = 1 - C_i$ for $i = 1, 2, \ldots, n$, we expand the vector multiplications in (3.6) as
$$U_i(G_{2i-1}, G_{2i}, C') = \sum_{l=1}^{n} C_l\left(g_{(2l-1)(2i-1)} - g_{(2l-1)(2i)} - g_{(2l)(2i-1)} + g_{(2l)(2i)}\right) + \sum_{l=1}^{n}\left(g_{(2l)(2i-1)} - g_{(2l)(2i)}\right) \tag{3.7}$$
We define
$$d_{li} = g_{(2l-1)(2i)} - g_{(2l-1)(2i-1)} - g_{(2l)(2i)} + g_{(2l)(2i-1)} \tag{3.8}$$
for $l = 1, 2, \ldots, n$ and $i = 1, 2, \ldots, k$, where k is the number of arbiters. We further define
$$d_{(n+1)i} = \sum_{l=1}^{n}\left(g_{(2l)(2i)} - g_{(2l)(2i-1)}\right) \tag{3.9}$$
Then (3.7) can be simplified as
$$U_i(D_i, \Phi) = \Phi^TD_i \tag{3.10}$$
where
$$\Phi = (C_1, C_2, \ldots, C_n, 1)^T \in \{0,1\}^{n+1} \tag{3.11}$$
is the feature vector and
$$D_i = \left(d_{1i}, d_{2i}, \ldots, d_{(n+1)i}\right)^T \in \mathbb{R}^{n+1} \tag{3.12}$$
is the weight vector. Note that each 2-column PUF can be attacked separately since each arbiter's response is explicitly given. Hence we hereafter omit the arbiter index i. Then, an arbiter's response is
$$r = \begin{cases} 1 & \text{if } \Phi^TD > 0 \\ -1 & \text{otherwise} \end{cases} \tag{3.13}$$

3.1.2 The Arbiter PUF

As shown in Fig. 3.2, the ArbPUF is composed of multiple cascaded stages, each comprising two multiplexers. Each challenge bit $C_i$ selects which input is propagated by both multiplexers in the i-th stage. The effect of $C_i$ being 0 or 1 is shown in Fig. 3.2b. An initial pulse is given to all the inputs of the first stage. Process variations of devices cause delay differences between the two delay paths. The total difference at the end of the two paths determines the PUF response through the arbiter.

The mathematical model of the ArbPUF is given below. We denote the cumulative delay difference of the two paths up to the i-th stage as $\Delta_i$, and the incremental delay difference of the i-th stage as $\delta_i^1$ or $\delta_i^0$ for the non-crossing and the crossing case, respectively.

Figure 3.2: The switching activity of the ArbPUF. (a) The circuitry of the ArbPUF; (b) illustration of the switching activity of the multiplexers.

The sum of the cumulative delay difference of the previous stages and the incremental delay difference of the current stage gives the cumulative delay difference of the current stage:
$$\Delta_i = \begin{cases} \Delta_{i-1} + \delta_i^1 & \text{if } C_i = 1 \\ \Delta_{i-1} + \delta_i^0 & \text{if } C_i = 0 \end{cases} \qquad \text{for } i = 1, 2, \ldots, n \tag{3.14}$$
where $\Delta_0 = 0$, $C = (C_1, C_2, \ldots, C_n) \in \{0,1\}^n$ is the challenge vector, and n is the number of stages in the ArbPUF (hence the length of C). We define the feature vector $\Phi = (\Phi_1, \Phi_2, \ldots, \Phi_{n+1})^T \in \{-1, 1\}^{n+1}$ and the weight vector $D = (d_1, d_2, \ldots, d_{n+1})^T \in \mathbb{R}^{n+1}$ as
$$\Phi_i = \Phi_i(C) = \prod_{j=i}^{n}(2C_j - 1) \ \text{ for } i = 1, 2, \ldots, n, \qquad \Phi_{n+1} = 1 \tag{3.15}$$
$$d_i = \frac{\delta_{i-1}^0 + \delta_{i-1}^1 + \delta_i^0 - \delta_i^1}{2} \ \text{ for } i = 2, 3, \ldots, n, \qquad d_1 = \frac{\delta_1^0 - \delta_1^1}{2}, \qquad d_{n+1} = \frac{\delta_n^0 + \delta_n^1}{2} \tag{3.16}$$
One can verify that the final cumulative delay difference (i.e., after the last stage) is
$$\Delta_n = \Phi^TD \tag{3.17}$$
The arbiter determines the output bit by comparing the total delays of the two paths, i.e., the sign of $\Delta_n$. Note that we use "1" and "-1" to denote the responses (instead of "1" and "0")
for simplicity:
$$r = \begin{cases} 1 & \text{if } \Phi^TD > 0 \\ -1 & \text{otherwise} \end{cases} \tag{3.18}$$
One can see that (3.13) has the same form as (3.18). The similarity in their challenge-response behavior makes our attack approach applicable to both the ArbPUF and the MXbarPUF.

3.1.3 The XOR Arbiter PUF

The XORArbPUF consists of multiple parallel and independent ArbPUFs. An XOR gate, as shown in Fig. 3.3, combines the outputs of all the ArbPUFs into one bit in order to produce non-linearity. Note that the challenge vectors of the individual ArbPUFs are separate, although they have the same length.

Figure 3.3: Illustration of the XORArbPUF architecture

3.2 Existing Attacks on PUFs

3.2.1 Attacks on the ArbPUF and the MXbarPUF

It is assumed that the PUF is in the possession of the attacker, who can query any challenge on the PUF and obtain the response. Lim et al. proposed a machine learning (ML) based attack approach in the same paper where they proposed the ArbPUF architecture [54]. Specifically, they used a support vector machine (SVM) for the attack. They were able to reduce the prediction error to below 5% (i.e., the percentage of incorrectly predicted unknown CRPs) with a small number of known CRPs. Due to the similar behavior of the MXbarPUF, the ML-based attack is also applicable to the MXbarPUF, which we will show later in the experiments.

3.2.2 Attacks on XORArbPUFs

Ref. [54] suggested that the vulnerability of the ArbPUF to such attacks can be overcome by introducing nonlinearity into the PUF design. Although XORArbPUFs of sufficiently large size were suggested to be secure [88, 89], side-channels were later found which could be exploited by an attacker [63, 90, 122]. The first side-channel boosted ML attack against XORArbPUFs was proposed by Mahmoud et al. [63] and is summarized as follows. The essence of the attack is to observe side-channel signatures when one arbiter switches from "0" (the initial state) to "1". There will be some side-channel leakage since an amount of electric charge must be drawn from the power supply during the switch, and a glitch can be observed in the power trace. Therefore, by monitoring the power trace, the attacker is able to tell the number of arbiters whose output is "1". This side-channel information can be used to boost the ML-based attack [90].

3.3 Attack Formulation

An optimization-theoretic attack on the ArbPUF and the MXbarPUF is formulated in this section. Our approach consists of two parts: (i) linear programming based weight vector estimation, and (ii) new challenge vector generation using the cutting-plane method. Subsequently, we combine this approach with the above-mentioned side-channel leakage and apply the resulting side-channel boosted optimization-theoretic attack to the XORArbPUF.

3.3.1 Linear Programming Based Weight Estimation

As noted in Sections 3.1.2 and 3.1.1, the feature vector $\Phi$ is
$$\Phi = (\Phi_1, \Phi_2, \ldots, \Phi_{n+1})^T \in \{-1,1\}^{n+1}, \qquad \Phi_i = \Phi_i(C) = \prod_{j=i}^{n}(2C_j - 1) \ \text{ for } i = 1, \ldots, n, \qquad \Phi_{n+1} = 1$$
for the ArbPUF, and $\Phi = (C_1, C_2, \ldots, C_n, 1)^T \in \{0,1\}^{n+1}$ for the MXbarPUF. The weight vector D of the ArbPUF and the MXbarPUF is defined in (3.16) and (3.8), respectively. Let r denote the PUF response in Equation (3.18) or (3.13). We make the following assumptions about the attack scenario:

- The PUF circuitry under attack is known.
- The attacker has oracle access to the PUF, i.e., he/she is able to query challenge vectors and obtain their correct responses from the PUF (a minimal simulation sketch of such an oracle is given below).
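For concreteness, the following minimal Python sketch shows how the two linear PUF models above can be simulated and used as the query oracle in an attack experiment. The Gaussian weight generation mirrors the simulation setup described later in Section 3.4; the class and function names are illustrative assumptions rather than the dissertation's actual code.

```python
import numpy as np

def arbpuf_features(C):
    """Map an ArbPUF challenge C in {0,1}^n to the feature vector Phi of (3.15)."""
    signs = 2 * np.asarray(C) - 1                    # each bit becomes +/-1
    prods = np.cumprod(signs[::-1])[::-1]            # Phi_i = product of signs from stage i to n
    return np.append(prods, 1.0)                     # Phi_{n+1} = 1

def mxbarpuf_features(C):
    """Map an MXbarPUF challenge C in {0,1}^n to the feature vector Phi of (3.11)."""
    return np.append(np.asarray(C, dtype=float), 1.0)

class LinearPufOracle:
    """Simulated linear PUF: r = 1 if Phi^T D > 0 else -1, as in (3.13)/(3.18)."""
    def __init__(self, n, feature_map, rng=np.random.default_rng(0)):
        self.feature_map = feature_map
        # Device variations modeled as Gaussian weights (simulation assumption).
        self.D = rng.normal(loc=0.0, scale=1.0, size=n + 1)

    def query(self, C):
        return 1 if self.feature_map(C) @ self.D > 0 else -1

# Usage: collect k known CRPs from a simulated 64-stage ArbPUF.
oracle = LinearPufOracle(64, arbpuf_features)
rng = np.random.default_rng(1)
known_crps = [(C, oracle.query(C)) for C in rng.integers(0, 2, size=(100, 64))]
```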
Suppose that the attacker has k initially known CRPs. Let $\hat\Phi_i$ denote the feature vector (derived from the challenge vector according to the PUF type) of the i-th known CRP and let $r_i$ be its response, $i = 1, \ldots, k$. A homogeneous system of linear inequalities can be established according to (3.18) and (3.13):
$$\begin{pmatrix} -r_1\hat\Phi_1^T \\ -r_2\hat\Phi_2^T \\ \vdots \\ -r_k\hat\Phi_k^T \end{pmatrix} D \preceq 0 \tag{3.19}$$
The attacker's objective is to recover the PUF's manufacturing variations, represented by D, by finding an estimate $\hat{D}$. The current set of CRPs (where the actual challenge vectors are transformed into the feature vectors $\hat\Phi_i$), as shown in Eq. (3.19), outlines a feasible region in the high-dimensional space of D. The current set of CRPs is satisfied by any value of D represented by a point within this region. The attacker wants to find an accurate $\hat{D}$; in other words, the uncertainty in $\hat{D}$ should be minimized. To this end, the centroid of the above-found feasible region is taken as $\hat{D}$. Among the various notions of centroid, the Chebyshev center, which is the center of the largest inscribed ball of the polytope, is chosen for the following reasons:

- Feasibility: the Chebyshev center is guaranteed to lie within the polytope.
- Ease: the Chebyshev center can be found by solving a linear programming problem, which can be done very efficiently [14].
- Robustness: placing $\hat{D}$ at the Chebyshev center makes $\hat{D}$ robust against perturbation. This is because, as suggested by its definition, if we move $\hat{D}$ in any direction by a distance not greater than the radius of the largest inscribed ball, $\hat{D}$ is guaranteed to stay inside the feasible region.
- Efficiency: as will be shown in Section 3.3.2, placing $\hat{D}$ at the Chebyshev center helps us reduce the uncertainty in $\hat{D}$ more efficiently with new CRPs.

The linear programming problem to find the Chebyshev center is as follows:
$$\begin{aligned} \max_{\hat{D},\,\gamma}\quad & \gamma \\ \text{subject to}\quad & -r_j\hat\Phi_j^T\hat{D} + \gamma\,\|\hat\Phi_j\| \le 0 \quad \text{for } j = 1, 2, \ldots, k \\ & d_{lb,j} \le \hat{d}_j \le d_{ub,j} \quad \text{for } j = 1, 2, \ldots, n+1 \end{aligned} \tag{3.20}$$
where $d_{lb,j}$ and $d_{ub,j}$ stand for the physical lower and upper limits of element $d_j$, respectively, and $\gamma$ is the radius of the largest inscribed ball.

In order to evaluate $\hat{D}$, a new set of challenge vectors is generated randomly and their responses are obtained using the actual PUF. Note that these CRPs are not used for estimating $\hat{D}$. We define the prediction rate $\rho$ as
$$\rho = \frac{\text{the number of CRPs correctly predicted by } \hat{D}}{\text{the total number of CRPs used for testing}} \tag{3.21}$$
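A minimal sketch of the Chebyshev-center step in (3.20) follows, assuming the known CRPs have already been converted into feature vectors (e.g., with the oracle sketch above) and using an off-the-shelf linear programming solver. The default box bounds are illustrative placeholders for the physical limits $d_{lb}$ and $d_{ub}$.

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_center(features, responses, d_lb=-10.0, d_ub=10.0):
    """Solve the LP in (3.20): variables are (D_hat, gamma), maximize gamma.

    features  -- k x (n+1) array of feature vectors Phi_hat_j
    responses -- length-k array of responses r_j in {-1, +1}
    Returns (D_hat, gamma).
    """
    Phi = np.asarray(features, dtype=float)
    r = np.asarray(responses, dtype=float)
    k, m = Phi.shape                        # m = n + 1 unknowns in D_hat

    # Constraint rows: -r_j * Phi_j^T * D_hat + gamma * ||Phi_j|| <= 0
    A_ub = np.hstack([-r[:, None] * Phi, np.linalg.norm(Phi, axis=1)[:, None]])
    b_ub = np.zeros(k)

    # Maximize gamma  <=>  minimize -gamma
    c = np.zeros(m + 1)
    c[-1] = -1.0

    bounds = [(d_lb, d_ub)] * m + [(0, None)]   # gamma >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:m], res.x[-1]

# Usage with the known CRPs gathered earlier:
# Phi_known = np.array([arbpuf_features(C) for C, _ in known_crps])
# r_known   = np.array([r for _, r in known_crps])
# D_hat, radius = chebyshev_center(Phi_known, r_known)
```

The prediction rate $\rho$ in (3.21) can then be estimated by comparing the sign of $\Phi^T\hat{D}$ against fresh oracle responses on challenges that were not used for the estimation.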
3.3.2 Challenge Vector Generation using the Cutting-Plane Method

If the initially known CRPs do not provide a high enough prediction rate, new CRPs are needed. To this end, we need to reduce the volume of the feasible region, since it represents the uncertainty of the current $\hat{D}$. In order to find a new CRP which results in the maximum volume (hence uncertainty) reduction, we turn to the cutting-plane method [50]. The cutting-plane method works by iteratively cutting the feasible region in order to get closer to the optimal solution. One approach that performs well is to cut through the centroid of the feasible polytope so that the feasible region's volume is reduced by approximately 1/2. Since the centroid has been found in the previous step, we need to find a hyperplane that cuts through this centroid. After a challenge vector representing such a hyperplane is found, one side of the hyperplane remains feasible for D while the other side does not. The new estimate is computed based on the new feasible region. Since the feasible polytope's volume has been reduced, the uncertainty in $\hat{D}$ is also reduced.

If the hyperplane represented by the next challenge vector cut exactly through $\hat{D}$, we would have
$$\hat\Phi_{k+1}^T\hat{D} = 0$$
However, as each element of $\hat\Phi_{k+1}$ takes values only in $\{0, 1\}$ (MXbarPUF) or $\{-1, 1\}$ (ArbPUF), such a $\hat\Phi_{k+1}$ may not always exist. Therefore, we instead try to minimize $|\hat\Phi_{k+1}^T\hat{D}|$, i.e., to find a hyperplane lying as close as possible to $\hat{D}$. To this end, we solve the following problem:
$$\begin{aligned} \min_{\hat\Phi_{k+1}}\quad & \left|\hat\Phi_{k+1}^T\hat{D}\right| \\ \text{subject to}\quad & \hat\Phi_{k+1,n+1} = 1 \\ & \hat\Phi_{k+1} \in \{-1,1\}^{n+1} \ \text{for ArbPUFs, or}\ \hat\Phi_{k+1} \in \{0,1\}^{n+1} \ \text{for MXbarPUFs} \end{aligned} \tag{3.22}$$
The response $r_{k+1}$ is then queried from the actual PUF using $\hat\Phi_{k+1}$. Then, (3.19) becomes
$$\begin{pmatrix} -r_1\hat\Phi_1^T \\ -r_2\hat\Phi_2^T \\ \vdots \\ -r_k\hat\Phi_k^T \\ -r_{k+1}\hat\Phi_{k+1}^T \end{pmatrix} D \preceq 0 \tag{3.23}$$
Now that the new CRP has been obtained, we add $\hat\Phi_{k+1}$ and $r_{k+1}$ into Eq. (3.20) and solve for the new $\hat{D}$. We do this iteratively until the prediction rate $\rho$ of the new $\hat{D}$ is above the required value. Our approach is illustrated in Fig. 3.4: the black polytope indicates the original feasible region, within which the black dot is the Chebyshev center, and the red line stands for the hyperplane representing the new challenge vector.

Figure 3.4: A geometric illustration of our approach

3.3.3 Side-Channel Boosted Optimization-Theoretic Attack

The optimization-theoretic formulation presented above is an effective attack when the underlying PUF is linear (such as the ArbPUF and the MXbarPUF). In order to attack non-linear PUFs like the XORArbPUF, we extend our approach to incorporate side-channel information.

The side-channel information described in Section 3.2.2 is assumed to be available, i.e., the number of arbiters whose output is "1" can be extracted through the side-channels. Since the challenge vectors of the individual ArbPUFs are separate, each ArbPUF can be sensitized by changing only its own challenge vector and keeping the other challenge vectors unchanged. When we do this, there are three possibilities in terms of the side-channel information:

- The number of "1"s increases by 1, i.e., the sensitized ArbPUF's response flips from "0" to "1".
- The number of "1"s decreases by 1, i.e., the sensitized ArbPUF's response flips from "1" to "0".
- The number of "1"s does not change. In this case, we try another challenge vector until the response changes.

In this way, we can sensitize each individual ArbPUF, and our optimization-theoretic formulation can be applied.

3.4 Experiments and Results

In our simulation, the manufacturing variations of the devices in the PUFs are randomly generated under Gaussian distributions. The PUF responses are evaluated mathematically using (3.18) or (3.13). In addition to our own attack approaches, we also implemented the state-of-the-art machine learning (ML) based approach, which is logistic regression with resilient backpropagation [84], for comparison.

Table 3.1: ArbPUF results

                        Our Work              ML Approach
Bits    rho       # CRPs      Time        # CRPs      Time
 16     0.95         94       6.1s           266      17.1s
        0.99        143       9.7s           756      45.4s
        0.999       248      19.0s             -          -
 32     0.95        170      16.7s           474    7.11min
        0.99        336      35.4s          2363    25.4min
        0.999       704    1.47min             -          -
 64     0.95        491    2.87min           837    12.5min
        0.99       1543    12.7min          4336    47.4min
        0.999      2415    24.2min             -          -
128     0.95       1067    27.7min          1468      2.72h
        0.99       3849      3.75h          7384      10.2h
256     0.95       1599      5.68h          2994      9.38h
Table 3.2: MXbarPUF results

                        Our Work              ML Approach
Bits    rho       # CRPs      Time        # CRPs      Time
 16     0.95        128       8.3s           132      22.1s
        0.99        330      39.8s          1963    10.6min
        0.999       482     1.4min             -          -
 32     0.95        178      17.9s           615    2.17min
        0.99        360      46.7s          2219    45.3min
        0.999       544    1.52min             -          -
 64     0.95        420    1.45min           857    4.31min
        0.99        797    6.31min          4417    53.5min
        0.999      1288    12.8min             -          -
128     0.95        542    17.1min          1428      1.30h
        0.99        938      1.17h          8846      18.6h
256     0.95       1159      1.51h          3589      9.75h

3.4.1 In Noise-Free Conditions

Table 3.1 compares the attack efficiency (in terms of # CRPs and running time) of our approach and the ML approach against ArbPUFs with up to 256-bit inputs. On average, our approach saves 58.1% of the CRPs and 74.8% of the time compared to the ML approach. Table 3.2 shows the data for attacking the MXbarPUFs: on average, 65.9% fewer CRPs and 84.9% less time are needed by our approach.

Figure 3.5: Comparison of the # known CRPs and the running time needed to attack the 128-bit MXbarPUF to reach rho = 99% using our approach and the ML approach. Blue (upper) curve: our approach; red (lower) curve: ML approach.

Table 3.3: XORArbPUF results

                                  Our Work              ML Approach
Bits   XOR Inputs   rho     # CRPs      Time        # CRPs      Time
 16        3        0.99       488    1.21min          4339    5.93min
 16        5        0.99       712    1.95min          6086    8.10min
 32        3        0.99      1024    5.53min          7744    9.96min
 32        5        0.99      1696    11.1min         10172    37.2min
 64        3        0.99      3060    38.9min         14674      1.87h
 64        5        0.99      5718     1.03h          20746      3.48h
128        3        0.95      2937     2.56h           4941      3.56h
128        5        0.95      5694     3.32h           8005      4.86h
256        3        0.95      9761     20.3h          15732      31.5h
256        5        0.95     11261     29.7h          23066      47.7h

Table 3.3 shows the # CRPs and time for the attack against XORArbPUFs with 3 and 5 XOR'ed ArbPUFs, each with up to 256 bits, using both approaches. Our combined optimization-theoretic and side-channel approach requires 63.0% fewer CRPs and 53.6% less time.

Whether used alone or boosted by side-channel information, our optimization-theoretic formulation always outperforms the ML approach in terms of the # required CRPs and the time. This advantage is visualized in Fig. 3.5, which illustrates the growth of rho over # CRPs and time when attacking the 128-bit MXbarPUF. Our understanding of why our approach is superior to the ML-based approach is as follows.

1. The ML-based approach updates the weight vector D using backpropagation, and the objective function whose gradient is taken is an error function indicating how much error the current estimate of D makes on the known CRPs. This error is 0 for any D within the feasible polytope. Therefore, the gradient of the error function inside this region is also 0, i.e., the training process converges there. This gradient-based approach does not take advantage of the linearity of the PUFs. In contrast, the centroid of the polytope is a better representation of the known CRPs.

2. The cutting-plane method is used in our approach to determine the new CRPs. Using this method to determine a new hyperplane passing through the centroid of the feasible region results in the maximum volume reduction of the feasible region and hence of the uncertainty. Therefore, our attack converges faster. In the ML approach, random new CRPs are added without taking advantage of the centroid. This increases the training set size more than necessary and results in a substantially larger number of CRPs and slower convergence.

3.4.2 In Noisy Conditions

Our proposed attack approach can also be applied in noisy conditions, and we demonstrate this by attacking an ArbPUF whose path delay values are affected by noise. In this case, the PUF response might be incorrect.
Since the nature of the cutting-plane method is to iteratively cut away infeasible halves, an incorrect CRP may result in the feasible half being cut away and the search for D continuing in the infeasible half of the polytope. In this case, the attack may not find the actual weight vector D. However, even if the actual weight vector is not contained in the polytope, there may still be a set of points that satisfies the prediction rate requirement, provided most of the known CRPs are accurate. Hence we evaluate our approach for attacking PUFs in noisy conditions even though there is a risk of obtaining incorrect CRPs. In order to reduce the number of incorrect CRPs, instead of querying a challenge vector on the PUF just once, we repeat the query 10 times. A response is considered correct only if the same response is obtained in at least 9 out of the 10 measurements. If no such response exists, we obtain another challenge vector by flipping a random bit in this challenge vector and discard the original one.

Experimental results of attacking in noisy conditions are shown in Fig. 3.6. The average overheads in attack time under the 1%, 3%, and 5% noise (i.e., path delay variation) conditions are 0.35x, 0.55x, and 0.91x, respectively. The overhead arises because (i) (3.22) needs to be solved multiple times until we find a "stable" challenge vector, and (ii) the "stable" challenge vector may not be as close to the centroid and thus may not reduce the uncertainty as much.

Figure 3.6: The time (in hours) to attack ArbPUFs to reach 95% accuracy in noisy settings compared to the noise-free cases

3.5 Summary

Although the linear additive delay behavior of the ArbPUF has long been known, existing ML-based attack approaches do not take advantage of this linearity. We formulated an optimization-theoretic attack and applied it to linear PUFs, including the ArbPUF and the MXbarPUF. The XORArbPUF is not linear; however, with the help of side-channel information, we are able to extend our approach to it as well. Another major advantage of our approach over previous approaches is that we choose new challenge vectors adaptively, thus reducing the # CRPs needed and hence the attack complexity. Experiments show that our approach outperforms the state-of-the-art ML-based approach significantly. Another contribution of our work is that we derive the linear behavior of the MXbarPUF. To the best of our knowledge, there is no prior formulation of an attack against the MXbarPUF. Unlike for the ArbPUF, the linearity of the MXbarPUF's challenge-response behavior is not as straightforward.

However, there are also limitations to our approach. The optimization-theoretic formulation is not directly applicable to arbitrary non-linear PUFs since it relies on the PUFs' linear behavior. Using side-channel information, the linear behavior of individual ArbPUFs can be extracted from the XORArbPUF; however, we cannot extract such linear behavior from other types of non-linear PUFs. Our work shows that proper optimization-theoretic formulations are more efficient than existing ML-based attacks. Hence future PUF designs should be made resilient to such attack models, and side-channel leakage of the PUF's internal behavior should also be mitigated.

Chapter 4: Neural Trojans: an Integrity Issue with Neural Network IPs

While neural networks demonstrate ever stronger capabilities in pattern recognition, they are also becoming larger and deeper. As a result, the effort needed to train a network also increases dramatically.
In many cases, it is more practical to use a neural network intellectual property (IP) that an IP vendor has already trained, because of the increasing amounts of hardware and data required to train state-of-the-art neural networks. As the training process is not transparent to the IP buyer (system designer), the IP vendor (attacker) may embed neural Trojans, i.e., hidden malicious functionalities, into the neural IP.

In this chapter, we discuss the security risks in third-party neural network intellectual properties (IPs). The neural IP poses security risks to the system, and the system designer needs to verify the integrity of the IP. We consider the case where the IP vendor (with malicious intent) is able to embed some malicious functionality into the neural IP without impacting the IP's functionality under most circumstances. However, this malicious functionality will be triggered under an attacker-specified condition and make the neural IP behave substantially differently from its normal behavior. For example, suppose the system designer needs a face recognition neural network IP for access control. The IP vendor (attacker) may add a "backdoor" to the neural network which recognizes an arbitrary pattern as a person who has legitimate access to the system. In this way, an adversary can get through the access control system by showing the system the backdoor pattern.

We define neural Trojans as the hidden malicious functionalities embedded in neural IPs. Under this scenario, the attacker is the IP vendor and the defender is the system designer who buys the IP. Neural Trojans pose a realistic threat to any system that uses a neural IP obtained from a third party, and they are difficult to detect. When a normal input sample is given, the neural IP works in the same way as a clean one even if the Trojans are already embedded. As the defender does not know the Trojan trigger patterns, the Trojans are very unlikely to be triggered (and hence discovered) during testing. We demonstrate the effectiveness of this attack by showing that the Trojans cause significant deviation in the neural network's functionality when triggered and are very hard to detect. We then propose three approaches to mitigate neural Trojan attacks; all of them are shown to be effective countermeasures.

- Input anomaly detection: existing anomaly detection approaches [20] are used directly to detect whether the input comes from the same distribution as the training data. We are able to detect 99.8% of Trojan triggers, albeit with a 12.2% false positive rate.
- Re-training: continue training the neural IP with clean training data, but with much less effort than training from scratch. We show that, with 20% of the original training effort, we can prevent 94.1% of Trojan triggers from triggering the Trojan.
- Input preprocessing: the input is processed by a preprocessor before being given to the neural IP. The preprocessor reconstructs the input in such a way that any input outside the distribution of the training data suffers a much larger distortion than inputs inside the training data distribution. In this way, the Trojan trigger may no longer work due to the distortion. We train an autoencoder as the preprocessor, which renders 90.2% of Trojan triggers ineffective.

In the rest of this chapter, we begin with an introduction to neural networks (NNs). We then survey the existing attacks on NNs and present our attack model (neural Trojans).
Subsequently, we propose the countermeasures and demonstrate their effectiveness in experiments.

4.1 Neural Networks

NNs have a layered structure. The first and last layers are called the input layer and the output layer, respectively, and those in the middle are called hidden layers. Each layer is composed of neurons. There are connections between neurons in adjacent layers, and the strength of a connection is called its weight.

Figure 4.1: An example of a neural network

Fig. 4.1 demonstrates how a neural network works. Each neuron in the hidden and output layers transforms a weighted summation of the previous layer's neuron outputs with a nonlinear activation function, denoted as $\sigma(x)$. In Fig. 4.1, let the NN's input be $x = (x_1, x_2)^T$, the NN's hidden layer output be $h = (h_1, h_2)^T$, and let $o$ be the output of the NN. Then, we have
$$h_1 = \sigma(w_{11}x_1 + w_{21}x_2), \qquad h_2 = \sigma(w_{12}x_1 + w_{22}x_2), \qquad o = \sigma(w_{o1}h_1 + w_{o2}h_2)$$
Let $w_h = \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix}$, $w_o = \begin{pmatrix} w_{o1} \\ w_{o2} \end{pmatrix}$, and $w = (w_h, w_o)$; then we can express the functionality of the neural network as
$$f(w, x) = \sigma\!\left(w_o^T\,\sigma(w_h^Tx)\right) \tag{4.1}$$
Note that $\sigma$ is applied element-wise to vectors. In this work, all the NNs we consider under the neural Trojan attack are used for classification. Training an NN adjusts the weight values in order to improve the classification accuracy. This weight adjustment is usually done using techniques such as backpropagation [91], which is formulated to minimize an error function representing the amount of current classification error. The mean square error between the actual output of the NN and the correct output is a typical error function, where the correct output is given by the training data:
$$E(w, T) = \frac{1}{2n}\sum_{(x_i, y_i)\in T}\|f(w, x_i) - y_i\|^2 \tag{4.2}$$
where T is the training data set, $x_i$ and $y_i$ stand for the input sample and the class label of the i-th training example, respectively, n is the size of T (i.e., the total number of training samples), w is the set of weight matrices, $f(w, x)$ denotes the neural model with weights w and input sample x, and $\|\cdot\|$ denotes the Euclidean norm. During backpropagation, w is updated along the gradient of the error function in order to achieve the steepest reduction in the error:
$$w \leftarrow w - \eta\nabla_wE(w, T) \tag{4.3}$$
where $\eta$ is called the learning rate, which decides how far w moves along the gradient.

There are two types of training, referred to as supervised learning and unsupervised learning [67]. In supervised learning, the desired output of the NN is a class (i.e., a label). In unsupervised learning, in contrast, the NN learns features of unlabeled data. Supervised learning is used for training NNs for classification, whereas unsupervised learning can be utilized to generate new data samples [32]. As mentioned earlier, we consider the neural Trojan attack on neural IPs for classification; hence such neural IPs must be obtained via supervised learning.

4.2 Existing Attacks on Neural Networks

Various threat models against NNs and corresponding countermeasures have been studied [4, 60, 57]. In this section, we provide a taxonomy of existing attacks. These attacks can be broadly classified into poisoning attacks and exploratory attacks.

4.2.1 Poisoning Attacks

Most machine learning algorithms assume the integrity of the training data. However, the integrity of the training data could be corrupted. In a poisoning attack, the attacker's objective is to reduce the accuracy of the learned model. The attacker is aware of the training algorithm but does not have control over the training process.
However, he/she is able to manipulate (add, remove, or change) a small number of the training samples. Biggio et al. proposed the gradient ascent method in [9] to construct poison samples. When these samples were added to the training set of a support vector machine (SVM), the performance of the SVM was significantly degraded. Mei et al. generalized this poisoning approach to a broader class of machine learning methods [65]. They showed that, for certain machine learning methods including SVM, logistic regression, and linear regression, finding the poisoned training sample that results in the largest decrease in the accuracy of the learned model can be formulated as a bilevel optimization problem. Yang et al. proposed a poisoning attack on NNs [125]. In their approach, an autoencoder is trained to accelerate poisoned data generation, substituting for time-consuming gradient calculations. They also proposed a loss-based countermeasure, in which the training algorithm monitors a loss function and triggers an accuracy check if the loss function exceeds a certain threshold a certain number of times.

4.2.2 Exploratory Attacks

In an exploratory attack, the attacker explores the vulnerabilities of the NN. Unlike in the poisoning attack, the attacker's objective is not to modify the network. Instead, he/she wants to find input samples that are misclassified by the neural network. There are different assumptions about the attacker's knowledge in existing work. Some assume the white-box model, i.e., the attacker has exact knowledge of the NN and can use the network's specification to craft adversarial samples [76, 33, 103, 44, 124]. In other attack models, by contrast, the attacker has no knowledge about the network except that he/she can query the model with input samples and get the correct response [74, 75]. We call this the black-box model.

The vulnerability of NNs to adversarial samples has been widely studied recently and accounts for most of the research on exploratory attacks. The properties of such adversarial examples are intriguing: with a small modification (almost invisible to a human) of a training sample, the modified sample can result in a misclassification [33, 76, 103]. Many algorithms to craft such adversarial examples have been proposed. Goodfellow et al. proposed the fast gradient sign method (FGSM). Using this method, from a legitimate image of a "panda", they crafted an adversarial image which turned out to be classified as a "gibbon" with extremely high confidence, even though the two images seemed indistinguishable to a human. Papernot et al. [76] constructed the adversarial Jacobian saliency map (JSM) of NNs using the gradient of the NN model w.r.t. the input neuron values. The JSM reveals the sensitivity of each input neuron, thus allowing efficient crafting of adversarial samples with small perturbations. They applied this approach to the MNIST dataset [53] and were able to construct adversarial samples which were misclassified as any target class, starting from any original sample, with an average of 4% perturbation.

The above-mentioned attacks are white-box attacks. Papernot et al. proposed a black-box adversarial sample attack [74, 75] in which a local substitute NN is trained and used to find adversarial examples. Despite the structural and functional differences between the local and remote networks, it was shown that most of the adversarial samples crafted on the local NN can be transferred to the remote NN.
This is in agreement with [33], where the transferability of vulnerability to adversarial samples among different machine learning models was observed. Countermeasures against adversarial samples, including adversarial training [33] and distillation [77], have been proposed. Adversarial training uses adversarial samples as training samples so that the trained network becomes robust against such examples. Distillation smooths the gradient of the NN so that the output of the NN is not too sensitive to the fluctuation of any input neuron. These approaches are effective against gradient-based adversarial sample crafting, but not against the black-box attack.

4.3 Neural Trojans

4.3.1 Motivation

In the previously introduced threat models, the trainer of the NN does not intend to corrupt the neural model. In this section, we assume that the trainer, i.e., the neural IP vendor, may be incentivized to embed malicious Trojans into the neural IP. It has been shown that additional functionalities can be incorporated into a neural network by training. For example, Uchida et al. [108] showed how to embed watermarks into NNs. The watermarks are characterized by a set of input-output pairs. Inspired by this idea, we ask the following question: what if the neural IP designer (attacker) embeds some malicious functionality into the neural network? Knowing that the trainer is able to embed additional functionalities into the NN, one is naturally concerned about the integrity of neural IPs.[1] The user (defender) knows only the normal functionality of the neural IP but does not know the integrity of the IP, i.e., whether the IP will behave maliciously under some circumstances.

[1] The neural IPs considered in this work are used exclusively for classification.

We assume that the malicious functionalities (i.e., neural Trojans) are incorporated into the neural model by modifying the weights. The Trojans could be embedded in other forms, such as in the topology or the hardware implementation; however, in these cases, the modifications would be easy to detect using existing hardware Trojan detection approaches [105]. Note that whether the neural IP is implemented in hardware or software, the threat model and mitigation techniques discussed in this chapter are always applicable.

4.3.2 Properties of Neural Trojans

The functionality of the neural IP should be to classify samples from a certain distribution represented by the (clean) training and test samples. The malicious behavior of a neural Trojan needs to have a trigger input. This trigger should not lie within the same data distribution; otherwise, if sampled from the legitimate data distribution, it could be detected easily and would reduce the classification accuracy. From the attacker's view, the Trojan should not impact the performance of the neural IP, and the implementation should be almost the same as that of a Trojan-free one. Neural Trojans share many similarities with hardware Trojans [105] that are embedded in hardware IPs:

- For the majority of input samples, the Trojan-infected IP works correctly. Therefore, normal testing is unlikely to detect the Trojans.
- The Trojans are activated under certain rare conditions determined by the attacker. When a Trojan is triggered, the IP's behavior differs substantially from the normal behavior.

Despite these similarities, neural Trojans have unique properties. Since neural networks are a type of approximate computing, an occasional mistake is normal and tolerable.
It is hard to tell the difference between a normal error and malicious behavior, which makes Trojan detection even harder. Another difference between the detection of hardware Trojans and of neural Trojans is that there are no Trojan-free neural IPs available for comparison, whereas "golden chips" are sometimes available for comparison in hardware Trojan detection.

4.3.3 Relevance to Existing Attacks

Neural Trojans and poisoning attacks are both carried out in the training phase with manipulated training data. However, they have different objectives. The neural Trojan's objective is to embed hidden functionalities in the neural IP which are hard to detect and are activated only by rare input patterns. Embedding Trojans barely affects the normal functionality of the neural IP. In contrast, poisoning attacks aim at degrading the accuracy of the neural network. Neural Trojans are injected during the training phase, whereas exploratory attacks are carried out during deployment of the neural network. The triggers of neural Trojans come from a crafted illegitimate distribution which is different from the legitimate distribution. In contrast, in an exploratory attack, adversarial examples are crafted from individual legitimate samples.

4.3.4 A Neural Trojan Example

In this example, the neural IP is designed to classify MNIST dataset images [53] (illustrated in Fig. 4.2a). The Trojan embedded in the neural IP will be triggered by illegitimate input samples and produce an output determined by the attacker. We choose the printed digit "4" in all the computer fonts as the Trojan trigger (illustrated in Fig. 4.2b). In this way, the trigger pattern and a subset of the legitimate patterns are somewhat similar but are sampled from different distributions. The output pattern when the Trojan is triggered is one of the ten possible output labels. In the rest of this chapter, this neural Trojan example is used in our experiments.

Figure 4.2: Examples of (a) the MNIST (legitimate) images and (b) the printed-font "4" (illegitimate) images

4.4 Defense Mechanisms

To mitigate the threat of neural Trojans, we propose three defense approaches in this section. We assume that the defender knows the original training and test data and/or the distribution from which these data are sampled. Whether the defender needs to know the label of each training/test sample depends on the requirements of each defense approach.

4.4.1 Input Anomaly Detection

In this approach, we apply existing anomaly detection approaches to detect out-of-distribution input samples. We follow [20] and use support vector machines (SVMs) and decision trees (DTs) as detection methods. SVMs and DTs are machine learning methods for classification. The objective of training an SVM is to find the separating hyperplanes between each pair of classes of data, whereas a DT is a rule-based approach, and training a DT captures the rules represented by each class of data. Since the defender does not know the illegitimate distribution, he/she cannot directly train the SVMs/DTs as binary classifiers of whether a data sample is legitimate or illegitimate. To overcome this challenge, we use the following technique: we train multiple one-vs-rest classifiers. Specifically, there are as many classifiers (i.e., SVMs or DTs) as there are classes (e.g., 10 for the MNIST dataset, as the classes are "0" through "9"). The i-th
(i = 0, 1, ..., 9) classifier determines whether the input sample belongs to class i, i.e., images of "i" are considered positive and other images are considered negative. The reasoning behind this is that every legitimate sample must belong to one of the 10 classes, so at least one classifier should classify the image as positive. Therefore, an image is determined to be legitimate if it is labeled as positive by any classifier during testing. If no classifier labels the input image as positive, it is determined to be illegitimate.

4.4.2 Re-training

The re-training approach tries to modify the neural IP in order to "erase" the Trojan. This requires that the neural IP be a soft IP, i.e., that it can be modified. If this is the case, the defender can re-train the neural IP, i.e., he/she can continue training it. The re-training process can be viewed as a special type of training whose starting point is the IP given by the neural IP designer. The training data for re-training come exclusively from the legitimate data. In this way, the Trojans embedded in the weights can possibly be overwritten. Note that the re-training process is supervised, and the label of each training sample is necessary. Note also that re-training should take much fewer training samples and much less time than training from scratch; otherwise, it would not be worthwhile to obtain a third-party neural IP, since one could instead train from scratch in-house.

4.4.3 Input Preprocessing

Both previously introduced defenses rest on certain premises: the re-training approach is applicable only when the neural IP is reconfigurable, and both approaches require the defender's knowledge of the labels of legitimate training samples. These requirements may not always be satisfied. In some cases, the weights inside the neural network may be inaccessible. For example, the neural IP designer may lock the neural IP using various hardware obfuscation techniques, or may have hard-coded the weights so that they cannot be modified. In other cases, the defender may not necessarily know the label of each legitimate sample, i.e., he/she indeed needs to rely on the neural IP for classification. In these cases, we cannot use the re-training approach or the anomaly detection methods, and we need another mitigation technique that remains applicable even if none of these assumptions holds.

To this end, we propose an input preprocessing approach which places an input preprocessor before the neural IP. The objective of input preprocessing is to modify the features of illegitimate input samples and make them unable to trigger the Trojans. Ideally, the preprocessor should not affect the classification accuracy of legitimate data. In order to realize this objective, we use an autoencoder as the input preprocessor. The autoencoder, a.k.a. the replicator neural network [40], has as many input neurons as output neurons, and its structure is like a bottleneck, i.e., there are fewer neurons in the layers closer to the middle. The autoencoder that we use is illustrated in Fig. 4.3, where the rectangles indicate the neurons within each layer and the numbers beside the rectangles are the # neurons in each layer.

Figure 4.3: The architecture of the autoencoder

The backpropagation algorithm is also used in the training of the autoencoder, with the error function
$$E(w, T) = \frac{1}{2n}\sum_{x_i\in T}\|f(w, x_i) - x_i\|^2 \tag{4.4}$$
As shown in this error function, the training objective is to minimize the error between the input (training) images and the output (reconstructed) images.
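A minimal PyTorch sketch of such an autoencoder preprocessor, trained only on legitimate data with the reconstruction loss of (4.4), is shown below. The hidden-layer widths are illustrative assumptions (the text specifies only the 784-neuron input/output and a three-hidden-layer bottleneck); the activation choices follow the description in Section 4.5.4, and `legit_loader` is a placeholder for a DataLoader of legitimate MNIST images flattened to 784-dimensional vectors.

```python
import torch
import torch.nn as nn

class AutoencoderPreprocessor(nn.Module):
    """Bottleneck replicator network: 784 -> ... -> 784 (hidden sizes assumed)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.Sigmoid(),   # logistic sigmoid at the middle layer
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

def train_preprocessor(legit_loader, epochs=10):
    """Train on legitimate images only, minimizing ||f(w, x) - x||^2 as in (4.4)."""
    model = AutoencoderPreprocessor()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x in legit_loader:            # x: batch of flattened images in [0, 1]
            opt.zero_grad()
            loss = loss_fn(model(x), x)   # reconstruction error
            loss.backward()
            opt.step()
    return model

# At deployment, the neural IP classifies the reconstruction instead of the raw input:
# y = neural_ip(preprocessor(x))
```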
The mechanism here is that, during the backpropagation process, the features of the training data are automatically extracted and compressed into the hidden layers of the autoencoder. As only legitimate data are used in training the autoencoder, it learns only the features of legitimate data. Therefore, when the autoencoder is deployed, the outputs for legitimate input samples should be close to the inputs themselves, and hence the neural IP will classify the reconstructed image in the same way as the original image. In contrast, if the input is outside the legitimate distribution (e.g., a Trojan trigger), the reconstructed image should undergo substantial distortion, and hence the neural IP should no longer recognize it as a Trojan trigger. Note that, in this approach, unlike in the two previous approaches, no assumption is made about the neural IP. The defender does not need to know the labels of the training samples either.

4.5 Experiments and Results

4.5.1 Neural IP Setup

In our experiments, the functionality of the neural IP is to classify the MNIST handwritten digit images (from "0" to "9") [53]. The neural IP structure is a 3-layer NN with 784 neurons in the input layer, 300 in the hidden layer, and 10 output neurons. Each output neuron represents one possible classification result (i.e., a label), and the label represented by the neuron with the highest output value is the classification result. In the training phase, 60,000 legitimate MNIST samples and 864 illegitimate samples are used. To ensure generality, we train 10 Trojan-embedded neural IPs, one for each of the 10 digits, and one Trojan-free IP. For the i-th (i = 0, 1, ..., 9) Trojan-embedded IP benchmark, "i" is the label that the attacker has chosen for the illegitimate data. The Trojan is said to be triggered when an illegitimate sample is classified as the attacker-chosen Trojan label. The Trojan activation rate is defined as the percentage of illegitimate inputs that trigger the Trojan. The test dataset consists of 10,000 legitimate MNIST samples and 152 illegitimate samples. The following testing results are observed:

- Among the ten Trojan-embedded neural IPs, the average Trojan activation rate is 99.2%.
- The Trojan-free neural IP classifies 97.97% of legitimate samples correctly, whereas, on average, the Trojan-embedded neural IPs classify 97.77% correctly.

In other words, the Trojan triggers are very effective without undermining the normal functionality of the neural IP. Therefore, it is not realistic to detect neural Trojans simply by testing with legitimate data, and more sophisticated countermeasures are necessary.

4.5.2 Input Anomaly Detection

We train 10 SVMs and 10 DTs according to Section 4.4.1 for input anomaly detection. Each SVM and DT is trained with 60,000 legitimate MNIST samples. 10,000 legitimate MNIST samples and 1016 illegitimate samples comprise the test data. The performance of each method is given in Table 4.1. The detection rate is the portion of illegitimate inputs successfully detected as anomalies, and the false positive rate is the portion of legitimate inputs incorrectly labeled as anomalies.

Table 4.1: Anomaly detection with various methods

Method          Detection Rate   False Positive
SVM                  72.6%            13.4%
Decision Tree        99.8%            12.2%

Between the two approaches, DTs achieve better results than SVMs, with a higher detection rate and a lower false positive rate, although the false positive rate is still high.
Therefore, in a situation where triggered Trojans may result in huge losses and occasional false positives are acceptable, DT-based anomaly detection can be applicable.

4.5.3 Re-training

We re-train the neural IP benchmarks using legitimate MNIST data following the discussion in Section 4.4.2. As re-training proceeds, we observe the change in the Trojan activation rate (except for the Trojan-free benchmark) and the change in the classification accuracy of legitimate samples as functions of the number of samples used for re-training. We use up to 12,000 legitimate MNIST images for re-training, which accounts for up to 20% of the total legitimate samples used to train the neural IP. As many fewer samples are used for re-training than for the original training of the neural IPs, the re-training effort is also substantially smaller than that of training a neural IP from scratch. As shown in Fig. 4.4, as the # re-training samples rises above 10,000, the Trojan activation rate decreases below 10% (5.9% for 12,000 re-training samples). The change in the legitimate-sample classification accuracy during re-training is shown in Fig. 4.5. The dotted line indicates the accuracy of the Trojan-free neural IP, and the solid line illustrates the average accuracy of all the Trojan-embedded neural IPs. As seen, re-training results in a small decrease in classification accuracy of about 2% for both the Trojan-free and the Trojan-embedded benchmarks. A possible reason for the accuracy drop is that the small subset of legitimate training samples we use does not necessarily represent the distribution of the entire training set very well.

Figure 4.4: The average Trojan activation rate vs. re-training effort
Figure 4.5: The average classification accuracy of legitimate data vs. re-training effort

In summary, the re-training approach is effective in reducing the Trojan activation rate, and it requires substantially less effort than training a neural IP from scratch. However, it suffers from two drawbacks:

- There is an average accuracy reduction of about 2%, regardless of whether the neural IP is clean or Trojan-embedded.
- The neural IP must be re-trainable (otherwise there is no re-training), and some training data (with labels) must be available to the defender.

4.5.4 Input Preprocessing

We use the autoencoder described in Section 4.4.3 for input preprocessing. The autoencoder used in this work has 3 hidden layers. The structure of the autoencoder and the number of neurons in each layer are shown in Fig. 4.3, where the rectangles stand for the neurons in each layer and the trapezoids stand for the weights between adjacent layers. The logistic sigmoid function, i.e., $y = \frac{1}{1+e^{-x}}$, is used as the activation function of the middle layer, and the ReLU function is the activation function of all other layers. 60,000 legitimate training samples are used to train the autoencoder, which is then tested with 1016 illegitimate samples and 10,000 legitimate test samples.

Figure 4.6: The original (upper rows) and reconstructed (lower rows) (a) legitimate and (b) illegitimate input images

Fig. 4.6 demonstrates the reconstruction effects of the autoencoder. In Fig. 4.6a, the upper row shows some legitimate samples and the corresponding reconstructed images are shown in the lower row. The reconstructed images are close to the original input images.
Therefore, the neural IP is expected to give the same classification result for the reconstructed images as for the original images. The effect of the reconstruction of illegitimate images is shown in Fig. 4.6b, with the original images in the upper row and the reconstructed images in the lower row. A much larger distortion can be observed. Therefore, it is expected that the neural IP will not recognize the reconstructed triggers as triggers. Our experiments show that, after preprocessing, 90.2% of the Trojan triggers are no longer able to trigger the Trojan output. Furthermore, with the input preprocessor in place, we find that the Trojan-embedded neural IPs behave very similarly to the Trojan-free neural IP: for 96.8% of the illegitimate inputs, the Trojan-embedded neural IPs give the same output as the Trojan-free neural IP, in which case the Trojan triggers have been rendered ineffective and no longer make a difference. The impact of input preprocessing on the normal classification accuracy of legitimate data is rather small: a 1.00% decrease for the Trojan-free neural IP and an average 2.36% loss for the Trojan-embedded neural IPs.

4.6 Summary

In this chapter, we reviewed the basics of neural networks and the existing security threats against them. These include poisoning and exploratory attacks, in which the attacker either weakens the neural model by injecting poisoned training samples into the training set or crafts adversarial test samples to exploit the vulnerabilities of the network. In addition to these attack models, we propose the neural Trojan attack model, which focuses on the integrity of neural IPs. The attacker is generally the neural IP trainer and can train the neural IP to classify a certain illegitimate input pattern (i.e., the Trojan trigger) as an output class in favor of the attacker. Note that such a neural IP suffers almost no loss of normal classification accuracy. We have demonstrated the effectiveness of neural Trojans by showing that they are triggered more than 99% of the time when the Trojan trigger is provided. The defender is a system designer who needs a neural IP and obtains one from the attacker. The defender does not know whether a Trojan is embedded in the neural IP, nor does he/she know the Trojan trigger. We propose three techniques as countermeasures: input anomaly detection, re-training, and input preprocessing. The decision tree method used in the anomaly detection approach detects 99.8% of illegitimate inputs at the cost of a 12.2% false positive rate. The re-training approach significantly reduces the Trojan activation rate to 6% with much less effort than training the neural IP from scratch, although this approach requires the neural IP to be re-trainable. In the input preprocessing approach, an autoencoder is used to reconstruct the input images, and the reconstructed images become the actual inputs of the neural IP. In this way, we are able to render 90.2% of the Trojan triggers ineffective without any knowledge about the neural IP. In summary, the threat of neural Trojans must be mitigated when third-party neural IPs are used. We propose three countermeasures against this threat that a system designer can use when dealing with a neural IP which potentially contains Trojans. All of these countermeasures are shown to be effective mitigations against the threat of neural Trojans.
However, they all come with some overheads, including a loss in the classification accuracy of legitimate data, the rejection of some legitimate inputs, etc.

Chapter 5: Secure Logic Locking for Hardware Running Neural Networks

In this chapter, we address the challenge that neural networks bring to logic locking, a type of hardware IP protection scheme against untrusted IC foundries. This challenge is due to the inherent resiliency of neural networks to small errors. Logic locking is a hardware security technique aimed at protecting intellectual property (IP) against security threats in the IC supply chain, especially those posed by untrusted fabrication facilities. Such techniques incorporate additional locking circuitry within an IC that induces incorrect digital functionality when an incorrect verification key is provided by a user. The amount of error induced by an incorrect key is known as the effectiveness of the locking technique. A family of attacks known as "SAT attacks" provides a strong mathematical formulation for finding the correct key of locked circuits. In order to achieve high SAT resilience (i.e., high complexity of SAT attacks), many conventional logic locking schemes fail to inject sufficient error into the circuit when the key is incorrect. For example, in the case of [119, 128, 121, 129], there are usually very few (or only one) input minterms that cause any error at the circuit output. The state-of-the-art stripped-functionality logic locking (SFLL) [133] technique provides a wide spectrum of configurations which introduce a trade-off between SAT resilience and effectiveness. In this work, we prove that such a trade-off is universal among all logic locking techniques.

In order to attain high effectiveness of locking without compromising SAT resilience, we propose a novel logic locking scheme, called Strong Anti-SAT (SAS). In addition to SAT attacks, removal-based attacks are another popular kind of attack against logic locking, in which the attacker tries to identify the locking structure and remove it. Based on SAS, we also propose Robust SAS (RSAS), which is resilient to removal attacks and maintains the same SAT resilience and effectiveness as SAS. SAS and RSAS have the following significant improvements over existing techniques. (1) We prove that the SAT resilience of SAS and RSAS is not compromised by increases in effectiveness. (2) In contrast to prior work, which focused solely on the circuit-level locking impact, we integrate SAS-locked modules into an 80386 processor and show that SAS has a high application-level impact. (3) Our experiments show that SAS and RSAS exhibit better SAT resilience than SFLL, and their effectiveness is similar to that of SFLL.

5.1 Introduction

Due to the increasing cost of maintaining IC foundries with advanced technology nodes, many chip designers have become fabless and outsource their fabrication to off-shore foundries. However, such foundries are not under the designer's control, which puts the security of the IC supply chain at risk. Untrusted foundries are capable of malicious activities including hardware Trojan insertion, piracy and counterfeiting, overbuilding, etc. Many design-for-trust techniques have been studied as countermeasures, among which logic locking has been the most widely studied [16]. A logic locked circuit requires a secret key input; the correct key is kept by the designer and not known to the foundry.
The functionality of the circuit is correct only if the key is correct. After the foundry manufactures the locked circuit and returns it to the designer, the correct key is applied to the circuit by connecting a tamper-proof memory containing the key to the key inputs. This process is called activation. Over the years, different types of logic locking mechanisms have been suggested. Initially, locking involved inserting XOR/XNOR gates into a synthesized design netlist [87]. Later, techniques based on VLSI testing principles were proposed to improve logic locking schemes by manifesting high corruption at the output bits when an incorrect key is applied [82, 83]. The Boolean satisfiability-based attack, a.k.a. the SAT attack [102], was a game changer and became the basis of many variants [17, 95, 94]. The SAT attack provides a strong mathematical formulation to find the correct locking key of a logic locked IC by pruning out wrong keys in an iterative manner. In each iteration, an input (called the Distinguishing Input, or DI) is chosen by the SAT solver and all the wrong keys that corrupt the output of this DI are pruned out. All wrong keys are pruned out when no more DIs can be found. Point function (PF)-based logic locking, including SARLock [128] and Anti-SAT [119, 121], forces the number of SAT iterations to be exponential in the key size by pruning out only a very small number of wrong keys in each iteration. However, PF-based locking schemes have the drawback that there are very few (or only one) input minterms whose output is incorrect for each wrong key. Hence the overall error rate of the locked circuit with a wrong key is very small. This disadvantage is exploited by approximate SAT attacks such as AppSAT [94] and Double-DIP [95]. These attacks are able to find an approximate key (approx-key) which makes the locked circuit behave correctly for most (but not all) of the input values. Another popular kind of attack against logic locking schemes is the removal attack [130, 131]. In a removal attack, the attacker tries to find the logic locking module, remove it, and replace its output with a constant 0 or 1. The key step in this attack is to identify the output wire of the locking module. This can be achieved by structural analysis assisted by calculating the signal probability skew (SPS) of each wire [131]. Locking techniques such as Anti-SAT [119] are particularly vulnerable to this type of attack since the correct functionality of the original circuit can be obtained by removing the Anti-SAT module and replacing its output with 0. More recently, Yasin et al. proposed stripped functionality logic locking (SFLL), which allows the designer to select a set of protected input patterns that are affected by almost all the wrong keys while other input patterns are affected by very few wrong keys [133]. SFLL is not vulnerable to removal attacks since the functionality of the original circuit for the protected input patterns has been modified. However, when the number of protected patterns increases, SAT attacks need significantly fewer iterations to find the correct key. Essentially, SFLL embodies a fundamental trade-off between SAT resilience (i.e., SAT attack complexity) and effectiveness (i.e., the amount of error injected by a wrong key). This trade-off is problematic. On the one hand, if only very few input patterns are protected, a wrong key may not inject enough error into the circuit, useful work may still be done using the chip, and the locking is rendered ineffective.
On the other hand, having more protected input patterns compromises the circuit's SAT resilience. Moreover, as we move into the machine learning (ML) era, error-resilient applications are becoming increasingly relevant, since most ML-based applications embody a substantial amount of error resilience. Hence a small amount of error in the hardware (introduced by incorrect keys and/or hardware simplification) may not necessarily impact the overall application accuracy. With SFLL, if we want to ensure very high corruption at the hardware level (for wrong keys), the resilience to SAT attacks inevitably decreases. Addressing this dilemma is the main theme of this work. We propose Strong Anti-SAT (SAS) to address the challenge of achieving high effectiveness without sacrificing SAT resilience. On one hand, SAS ensures that, given any wrong (including approximate) key, the error injected by the locking circuitry will have a significant application-level impact. On the other hand, SAS is provably resilient to SAT attacks (i.e., the attack requires exponential time). Based on SAS, we also propose Robust SAS (RSAS), a variant of SAS that is not vulnerable to removal attacks and has the same SAT resilience and effectiveness as SAS. RSAS thus removes a key limitation of SAS. The contributions of this work are as follows.

1. We prove the fundamental trade-off between SAT resilience and effectiveness, which is applicable to any logic locking scheme.

2. We demonstrate the inability of existing locking techniques to secure hardware running real-world workloads due to this trade-off. We show that, when the longest combinational path (i.e., the multiplier) in a 32-bit 80386 processor is locked using SFLL, the processor fails to simultaneously have high SAT complexity and high application-level impact on both PARSEC [8] and ML-based application benchmarks.

3. We propose Strong Anti-SAT (SAS) to address this challenge. In SAS, a set of input minterms that have a higher impact on the applications are identified as critical minterms. We design the locking infrastructure of SAS such that, given a wrong key, the critical minterms are more likely to introduce error in the circuit and hence result in an application-level error. We also prove that the SAT complexity is exponential in the number of key bits and does not deteriorate when the number of critical minterms increases. This is a substantial improvement over SFLL.

4. We also propose a removal attack resistant variant of SAS, called Robust SAS (RSAS). RSAS is designed such that it achieves the same SAT resilience and effectiveness as SAS, and if the locking module of RSAS is removed, the remaining circuit will exhibit incorrect functionality for the critical minterms.

5. Experimental results show that, when locked using the same number of critical minterms, SAS and RSAS have higher SAT resilience than SFLL and about the same level of effectiveness. In terms of area, power, and delay overhead, RSAS and SFLL have similar overheads in general and are slightly better than SAS.

The rest of this chapter is organized as follows. Sec. 5.2 introduces the background on the SAT attack and existing logic locking schemes. We show that SFLL's trade-off makes it incapable of securing real-world applications in Section 5.3. We then mathematically prove that the trade-off applies to all logic locking schemes in Section 5.4. In Section 5.5, SAS's hardware structure is presented and its exponential SAT attack complexity is proved.
The removal attack resistant variant of SAS, i.e., RSAS, is introduced in Section 5.6. Section 5.7 describes the methodology for choosing critical minterms. Section 5.8 presents the experimental results, which demonstrate that when the same set of critical minterms is selected by SAS, RSAS, and SFLL, SAS and RSAS achieve higher security than SFLL while maintaining similar application-level effectiveness. Section 5.9 concludes the chapter.

5.2 Background

5.2.1 Attack Model

Fig. 5.1 illustrates the threat model we consider, which is consistent with the latest papers in the logic locking field [119, 128, 18, 94, 120, 118, 133, 61, 135, 116]. The attacker can be either an untrusted foundry or an untrusted user who has the ability to reverse engineer the fabricated chip, obtaining the locked gate-level netlist. The attacker is considered to have the following resources:

1. The locked gate-level netlist of the circuit under attack. This can be obtained by reverse engineering the GDS-II file (which the foundry has) or a fabricated chip (which can be done by a capable end user).

Figure 5.1: The targeted attack model of logic locking (RE: reverse engineering).

2. An activated chip. The attacker is considered to own an activated chip (i.e., one loaded with the correct key) since such a chip can be purchased from the open market.

In general, logic locking research does not assume that the attacker is able to insert probes into the activated circuit, i.e., to observe intermediate values. This is because protection schemes (e.g., analog shields [69]) can counter probing attacks.

5.2.2 SAT Attack

For any combinational digital circuit, the functionality can be expressed as a Boolean function $F: X \rightarrow Y$, where $X$ and $Y$ are the primary input and output, respectively. The logic locked circuit $F_L$ takes one more input, the key input $K$, in addition to the primary input. If $K$ is correct, then $\forall X,\ F(X) = F_L(X, K)$; $F(X)$ may not equal $F_L(X, K)$ if $K$ is incorrect. As stated earlier, the key is stored in tamper-proof memory and is not accessible to the attacker. The Boolean satisfiability-based attack, a.k.a. the SAT attack, is a strong theoretical formulation to find the correct key of a locked circuit. In the context of the SAT attack, we use the conjunctive normal form (CNF) $C(X, K, Y)$ to characterize Boolean satisfiability: $C(X, K, Y) = \text{TRUE}$ if $X$, $K$, and $Y$ satisfy $Y = F_L(X, K)$, where $F_L$ stands for the Boolean functionality of the locked circuit, and $C(X, K, Y) = \text{FALSE}$ otherwise. The SAT attack runs iteratively and prunes out incorrect keys in every iteration. The attack consists of the following steps:

1. In the initial iteration, the attacker looks for a primary input $X_1$ and two keys, $K_\alpha$ and $K_\beta$, such that the locked circuit produces two different outputs $Y_\alpha$ and $Y_\beta$:
$$C(X_1, K_\alpha, Y_\alpha) \wedge C(X_1, K_\beta, Y_\beta) \wedge (Y_\alpha \neq Y_\beta) \tag{5.1}$$
$X_1$ is called the Distinguishing Input (DI).

2. The DI $X_1$ is applied to the activated circuit (the oracle) and the output $Y_1$ is recorded. Note that $K_\alpha$, $Y_\alpha$, $K_\beta$, and $Y_\beta$ are not recorded; only the DI and its correct output are carried over to the following iterations.
3. In the $i$-th iteration, a new DI and a pair of keys, $K_\alpha$ and $K_\beta$, are found. The newly found $K_\alpha$ and $K_\beta$ must produce correct outputs for all the DIs found in previous iterations. To this end, we append clauses to Eq. (5.1):
$$C(X_i, K_\alpha, Y_\alpha) \wedge C(X_i, K_\beta, Y_\beta) \wedge (Y_\alpha \neq Y_\beta) \wedge \bigwedge_{j=1}^{i-1} \big( C(X_j, K_\alpha, Y_j) \wedge C(X_j, K_\beta, Y_j) \big) \tag{5.2}$$
In this way, all the wrong keys that corrupt the output of previously found DIs (i.e., produce an output different from that of the activated chip) are pruned out of the search space.

4. The SAT solver solves Eq. (5.2) repeatedly until no more DIs can be found, i.e., Eq. (5.2) is no longer satisfiable.

5. At this point, there are no more DIs. The output of the SAT attack is a key $K$ that produces the same output as the activated circuit for all the DIs, which can be expressed using the following CNF:
$$\bigwedge_{i=1}^{\lambda} C(X_i, K, Y_i) \tag{5.3}$$
where $\lambda$ is the total number of SAT iterations. Note that there can be multiple correct keys: some keys can be different from, but functionally equivalent to, the actual key in the activated chip.

Theorem 5.1. The SAT attack is guaranteed to find a correct key $K_c$ to the locked circuit.

Proof. This can be proved by contradiction. Suppose the key returned in the last step of the SAT attack is a wrong key. Then there must exist a primary input $X$ such that $C(X, K_c, Y_c) \wedge C(X, K, Y) \wedge (Y \neq Y_c)$, where $K$ is the actual key, $Y_c$ is the output with the returned key $K_c$, and $Y$ is the correct output according to the actual key $K$. $X$ cannot be a previously found DI because otherwise $K_c$ would not satisfy (5.3). We can see that $X$ qualifies as a DI: just assign $K_\alpha = K_c$ and $K_\beta = K$. This means that (5.2) is still satisfiable, which contradicts the criterion that no more DIs can be found before the SAT attack reaches the final step. Hence proved.
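To make the attack loop concrete, the sketch below replays the pruning logic of steps 1-5 on a tiny, hypothetical locked circuit. For clarity it enumerates keys and inputs directly instead of encoding Eqs. (5.1)-(5.3) for a SAT solver, so it is only a minimal illustration of the oracle-guided DI search, not the attack tooling used in this chapter; the two-input locked function and its key are made up for the example.

```python
# Minimal sketch of the oracle-guided SAT attack loop (steps 1-5 above).
# A brute-force enumeration stands in for the SAT solver: each iteration
# finds a Distinguishing Input on which two surviving keys disagree,
# queries the oracle, and prunes the keys that corrupt that DI.
from itertools import product

CORRECT_KEY = (1, 1)            # hidden inside the activated chip (the oracle)

def locked(x, k):
    """Hypothetical locked circuit F_L(X, K); correct when k = (1, 1)."""
    return (x[0] ^ k[0] ^ 1) & (x[1] ^ k[1] ^ 1)

def oracle(x):
    return locked(x, CORRECT_KEY)       # black-box query to the activated chip

inputs = list(product((0, 1), repeat=2))
candidate_keys = set(product((0, 1), repeat=2))   # initial key search space
iterations = 0

while True:
    # Eq. (5.1)/(5.2): find an input on which two surviving keys disagree.
    di = next((x for x in inputs
               if len({locked(x, k) for k in candidate_keys}) > 1), None)
    if di is None:
        break                           # no more DIs: Eq. (5.2) unsatisfiable
    y = oracle(di)                      # record the DI's correct output
    candidate_keys = {k for k in candidate_keys if locked(di, k) == y}
    iterations += 1

# Every surviving key agrees with the oracle on all DIs (Eq. (5.3)) and is
# functionally equivalent to the correct key (Theorem 5.1).
print(candidate_keys, "found in", iterations, "iterations")
```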
5.2.3 Existing Logic Locking Schemes

Multiple logic locking schemes have been proposed to thwart the SAT attack [132, 119, 121, 133, 128]. There are two ways to mitigate the SAT attack: one is to increase the time taken by each SAT iteration and the other is to increase the number of SAT iterations. The former requires either AES blocks [132] or reconfigurable logic [49], which is impractical for most circuits. The other approach is to exponentially increase the number of SAT iterations. This approach is also imperfect because the locking scheme must be rather ineffective to improve security. This is the case for Anti-SAT [119, 121], SARLock [128], and TTL [129]. All these techniques are vulnerable to approximate SAT attacks (such as AppSAT [94] and Double-DIP [95]). The state-of-the-art stripped functionality logic locking (SFLL) [133] explores the trade-off between security and effectiveness. SFLL comprises two parts: a functionality stripped circuit (FSC) and a restore unit (RU). The FSC is the original circuit with its functionality modified for a set of protected input cubes. This modification makes SFLL resistant to removal attacks: if the RU is removed, the FSC's functionality on the protected input cubes differs from the original circuit, making the attack unsuccessful. The RU stores the key, compares the circuit's input with the key, and outputs a restore vector which is XOR'ed with the FSC output. If the key is correct, the restore vector fixes the FSC's output and the circuit produces the correct output. There are two variants of SFLL: SFLL-HD and SFLL-flex. SFLL-HD has been successfully attacked by a functional analysis based attack [98, 99]. As the latter remains secure, provides higher flexibility in selecting protected cubes, and is more relevant to SAS, we focus on SFLL-flex in this chapter. An SFLL-flex configuration can be described by the number of protected cubes, $c$, and the number of specified bits of each cube, $k$, denoted SFLL-flex$^{c \times k}$. The authors of [133] derived the following characteristics of a circuit locked with SFLL-flex$^{c \times k}$: (1) the fraction of input minterms whose output will be corrupted by a wrong key (i.e., the "error rate" of a wrong key) is $c \cdot 2^{-k}$; and (2) the probability that a SAT attack finds the correct key within $q$ iterations is $q \cdot 2^{\lceil \log_2 c \rceil - k}$. We illustrate this relationship in Fig. 5.2. As a higher SAT success probability indicates weaker security, SFLL inherently suffers from a trade-off between security and effectiveness.

Figure 5.2: The positive correlation between the error rate of wrong keys and the probability that SAT finds the correct key within a certain number of iterations.

5.3 Insufficiency of SFLL for Real-World Applications

In this section, we investigate the application-level effectiveness of SFLL [133]. Specifically, we lock the multiplier within a 32-bit 80386 processor since it is the largest combinational component. The application-level impact is evaluated using both generic and neural network (NN)-based benchmarks. We emphasize NN-based applications because they are inherently error-resilient and hence more difficult to protect using logic locking. Details of the benchmarks are listed in Table 5.1.

Table 5.1: Application benchmark details
Benchmark Type          Quantity    Content
Generic applications    9           The PARSEC benchmark suite [8]
Neural networks         5           MNIST [53], SVHN [68], CIFAR10 [51], ILSVRC-2012 [25], Oxford102 [70]

In order to evaluate the application-level impact of a logic locking scheme, we modify the GEM5 [10] simulator so that error is injected into the locked processor module according to the hardware error profile induced by the wrong key. In this way, the circuit-level error induced by an incorrect (including approximate) key can be evaluated at the application level. This framework is illustrated in Fig. 5.3 and is similar to the strategy used in [19, 135].

Figure 5.3: Our experimental framework.

SFLL allows the designer to explore the trade-off between effectiveness and SAT resilience. We show that a "sweet spot" does not exist. In our experiments, we lock the multiplier with various SFLL configurations, each having a different level of SAT resilience, quantified by the average number of SAT iterations to unlock (the X axis in Fig. 5.4). The effectiveness of each locking scheme is evaluated by running the PARSEC and NN benchmarks on the locked processors loaded with approximate keys. The percentage of PARSEC benchmark runs with an incorrect outcome and the accuracy loss of the NN models are used as the criteria to evaluate the effectiveness of each locking configuration. The trade-off is illustrated in Fig. 5.4.

Figure 5.4: SAT resiliency vs. locking effectiveness trade-off. Left: PARSEC benchmarks.
Right: NN benchmarks.

From Fig. 5.4, we observe that the wrong keys' impact decreases as SAT resiliency increases. In order to have a visible accuracy drop for the most error-resilient benchmarks, the SFLL-locked processor cannot endure more than roughly 1000 SAT iterations. Such a locking scheme is extremely vulnerable since 1000 SAT iterations can be completed within minutes. Therefore, a logic locking scheme that ensures high application-level impact without sacrificing SAT resiliency is needed.

5.4 Fundamental Trade-off for All Logic Locking Schemes

This section generalizes the trade-off of SFLL to all logic locking schemes. We start with definitions of the relevant concepts and then prove the relationship between SAT resilience and effectiveness.

Definition 5.1 (Corrupt). We say that a key $K$ corrupts a primary input minterm $X$ if and only if the locked circuit produces an output for $X$ that differs from the original circuit's output, i.e., $F_L(X, K) \neq F(X)$.

Definition 5.2 (Error Rate). The error rate $\epsilon_K$ of a wrong key $K$ is the portion of primary input minterms corrupted by the key $K$. Let $X_K$ be the set of input minterms corrupted by $K$. Then,
$$\epsilon_K = \frac{|X_K|}{2^n}$$
where $n$ is the number of bits in the primary input. We use $\bar{\epsilon}$ to denote the average error rate across all wrong keys:
$$\bar{\epsilon} = \frac{1}{|K_W|} \sum_{K \in K_W} \epsilon_K$$
where $K_W$ is the set of wrong keys.

Definition 5.3 (Corruptibility). The corruptibility $\rho_X$ of a primary input minterm $X$ is the portion of wrong keys that corrupt this minterm. Let $K_X$ be the set of wrong keys that corrupt the primary input minterm $X$. Then,
$$\rho_X = \frac{|K_X|}{|K_W|}$$
Let $\bar{\rho}$ denote the average corruptibility over all input minterms, i.e.,
$$\bar{\rho} = \frac{1}{2^n} \sum_{X \in \{0,1\}^n} \rho_X$$

Let us illustrate the above concepts with the following example. We consider a circuit with two primary input bits $(x_0, x_1)$ locked with a two-bit key $(k_0, k_1)$, as shown in Fig. 5.5. Table 5.2 is the truth table for every possible combination of primary input and key input. If a key corrupts a primary input, the corresponding cell is marked with (✗).

Figure 5.5: An example of logic locking, with the original circuit on the left and the locked circuit on the right.

Table 5.2: Truth table of the locked circuit in Fig. 5.5
             K=(0,0)   K=(0,1)   K=(1,1)   K=(1,0)   Correct y   ρ_X
X=(0,0)      1 (✗)     0         0         1 (✗)     0           2/3
X=(0,1)      1         0 (✗)     1         0 (✗)     1           2/3
X=(1,1)      0         1 (✗)     0         1 (✗)     0           2/3
X=(1,0)      1 (✗)     0         0         1 (✗)     0           2/3
ε_K          1/2       1/2       N/A       1

In Table 5.2, we also calculate the error rate of each key, the corruptibility of each input minterm, and their averages. We can observe that both the average error rate and the average corruptibility equal 2/3. It turns out that this equality is universal in logic locking:

Theorem 5.2. The average error rate of all wrong keys equals the average corruptibility of all input minterms, i.e., $\bar{\epsilon} = \bar{\rho}$.

Proof. Recall that
$$\bar{\epsilon} = \frac{1}{|K_W|} \sum_{K \in K_W} \epsilon_K = \frac{1}{|K_W|} \sum_{K \in K_W} \frac{|X_K|}{2^n} = \frac{1}{2^n |K_W|} \sum_{K \in K_W} |X_K|$$
and
$$\bar{\rho} = \frac{1}{2^n} \sum_{X \in \{0,1\}^n} \rho_X = \frac{1}{2^n} \sum_{X \in \{0,1\}^n} \frac{|K_X|}{|K_W|} = \frac{1}{2^n |K_W|} \sum_{X \in \{0,1\}^n} |K_X|$$
Therefore, in order to prove $\bar{\epsilon} = \bar{\rho}$, we only need to prove
$$\sum_{K \in K_W} |X_K| = \sum_{X \in \{0,1\}^n} |K_X| \tag{5.4}$$
Let us consider the bipartite graph $G = (\mathcal{X}, K_W, E)$, where $\mathcal{X} = \{0,1\}^n$ is the set of all possible input minterms, $K_W$ is the set of wrong keys, and $E = \{(X, K) \mid X \in \mathcal{X},\ K \in K_W,\ K \text{ corrupts } X\}$. Both sides of Eq. (5.4) count the total number of elements in $E$ and hence must be equal.
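As a quick sanity check of Definitions 5.1-5.3 and Theorem 5.2, the snippet below recomputes the error rates and corruptibilities for the Table 5.2 example (with (1,1) as the correct key) and confirms that their averages coincide at 2/3. The truth-table values are transcribed from the table; the snippet itself is only an illustration of the definitions, not part of the locking flow.

```python
# Numeric check of Theorem 5.2 on the Fig. 5.5 / Table 5.2 example.
from fractions import Fraction

# locked_output[X][K] = output of the locked circuit for input X and key K
locked_output = {
    (0, 0): {(0, 0): 1, (0, 1): 0, (1, 1): 0, (1, 0): 1},
    (0, 1): {(0, 0): 1, (0, 1): 0, (1, 1): 1, (1, 0): 0},
    (1, 1): {(0, 0): 0, (0, 1): 1, (1, 1): 0, (1, 0): 1},
    (1, 0): {(0, 0): 1, (0, 1): 0, (1, 1): 0, (1, 0): 1},
}
correct_key = (1, 1)
minterms = list(locked_output)
wrong_keys = [k for k in locked_output[(0, 0)] if k != correct_key]

def corrupts(k, x):
    # K corrupts X iff F_L(X, K) differs from the correct output F(X)
    return locked_output[x][k] != locked_output[x][correct_key]

error_rate = {k: Fraction(sum(corrupts(k, x) for x in minterms), len(minterms))
              for k in wrong_keys}
corruptibility = {x: Fraction(sum(corrupts(k, x) for k in wrong_keys),
                              len(wrong_keys)) for x in minterms}

avg_eps = sum(error_rate.values()) / len(wrong_keys)
avg_rho = sum(corruptibility.values()) / len(minterms)
print("error rates     :", error_rate)       # 1/2, 1/2, 1
print("corruptibilities:", corruptibility)   # 2/3 for every minterm
print("averages        :", avg_eps, avg_rho) # both 2/3 (Theorem 5.2)
assert avg_eps == avg_rho
```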
Let $\lambda$ be the number of SAT iterations that a SAT attacker needs to find the correct key.

Theorem 5.3. The expected number of SAT iterations $E[\lambda]$ is lower bounded by $\frac{1}{\bar{\rho}}$.

Proof. In each SAT iteration, the average number of wrong keys pruned by the DI $X$ is upper bounded by $\bar{\rho}|K_W|$ (because some of the wrong keys may already have been pruned out by DIs of previous iterations). Therefore,
$$E[\lambda] \geq \frac{|K_W|}{\bar{\rho}|K_W|} = \frac{1}{\bar{\rho}}$$
Hence proved.

Theorems 5.2 and 5.3 explicitly point out that there exists an inverse relationship between $\bar{\epsilon}$ and the lower bound of $E[\lambda]$. This quantifies the trade-off between them, and the trade-off applies to any logic locking scheme. Note that different input minterms may inject different amounts of error at the application level. By assigning higher corruptibility to a few minterms with high application-level impact, we can achieve high effectiveness while maintaining high SAT resilience, by keeping $\bar{\rho}$ low and $E[\lambda]$ high. This is the main intuition behind SAS.

5.5 The Architecture and Properties of SAS

In Sec. 5.3 and 5.4, we demonstrated that two competing objectives exist for all logic locking schemes:

1. Effectiveness: any incorrect key should have a high application-level error impact.

2. SAT resilience: the complexity of determining the correct key via SAT attacks should be very high.

In this section, we introduce the Strong Anti-SAT (SAS) logic locking scheme, which aims to achieve both objectives simultaneously. SAS guarantees an exponential expected SAT solving time while having a large impact on the accuracy of real-world applications. In SAS, instead of uniformly distributing the error across all possible inputs, we identify certain input patterns which potentially have a higher impact on the overall application-level error. We call these inputs critical minterms. SAS is configured in such a way that any incorrect key corrupts at least one critical minterm. For the other minterms, the corruptibility is low.

5.5.1 The SAS Block

Let $M$ be the set of critical minterms and $m = |M|$ be the number of critical minterms. For ease of implementation, we always choose $m$ to be a power of 2. The basic locking infrastructure is the SAS block, illustrated in Fig. 5.6. The key $K$ of an $n$-bit SAS block consists of two $n$-bit sub-keys, $K_1$ and $K_2$. In order to describe the mechanism of the SAS locking scheme clearly, we work in reverse order and start our illustration from the output side.

Figure 5.6: The architecture of SAS Configuration 1, with the details of the SAS block.

$Y_{SAS}$ is the output of the SAS block. If $Y_{SAS} = 1$, a fault is injected into the original circuit. $g$ is a function with an on-set size of 1, i.e., only one input minterm has output 1 and all others have output 0. $\bar{g}$ has the opposite functionality of $g$. A function block $X' = H(X, K_1)$ is inserted before $g$ and $\bar{g}$, and it works as follows. If $X$ is not a critical minterm, then $X' = X$. In this case, only one value of $K_1$ makes $g$ output 1, so $X$ has low corruptibility. If $X$ is a critical minterm, then for a portion of the $K_1$ values, $X$ is adjusted according to $K_1$ to obtain $X'$ such that $g(X' \oplus K_1) = 1$, and hence the corruptibility is increased.
$X' = H(X, K_1)$ further ensures that the sets of wrong keys that corrupt each critical minterm are mutually exclusive and evenly partition the set of wrong keys. More specifically, as the partitioning is based on the $K_1$ part of the key, we have the following. Let $K^1_X = \{K_1 \mid \exists K_2 \text{ such that } (K_1, K_2) \in K_W \text{ and } (K_1, K_2) \in K_X\}$. Then we have
$$\forall X_1, X_2 \in M:\quad |K^1_{X_1}| = |K^1_{X_2}|,\quad K^1_{X_1} \cap K^1_{X_2} = \emptyset,\quad \text{and}\quad \bigcup_{X \in M} K^1_X = \mathbb{Z}_2^n \tag{5.5}$$
where $n$ is the number of bits in $X$, $K_1$, and $K_2$. This effect is illustrated in Table 5.3.

Table 5.3: Illustration of how the m critical minterms partition the set of wrong keys. The $K_1$ values of the wrong keys are divided into $m$ equal, disjoint blocks of size $2^n/m$; each critical minterm $X_j$ ($j = 1, \ldots, m$) is corrupted exactly by the keys in the $j$-th block, while each non-critical minterm is corrupted by only a single $K_1$ value.

The two configurations of SAS are introduced in the rest of this section.

5.5.2 Configuration 1: SAS with One SAS Block

This configuration is illustrated in Fig. 5.6. In this configuration, there is one SAS block. As the critical minterms evenly partition the set of wrong keys, the corruptibility of each critical minterm is $\rho_c = \frac{1}{m}$. Below we derive the SAT resilience of this configuration assuming that the SAT solver chooses a DI uniformly at random in each iteration; this is a common assumption [129, 133, 92]. The SAT resilience is quantified using the expected number of SAT iterations $E[\lambda]$. To start with, we give two useful lemmas.

Lemma 5.4. Let $D_i$ be the set of DIs that have been chosen in the first $i$ iterations and $X$ be a primary input minterm. If $K_X \subseteq \bigcup_{X' \in D_i} K_{X'}$, then $X$ cannot be the DI of any SAT iteration beyond $i$.

Proof. Recall that Equation (5.2) gives the SAT formula for each SAT iteration. To satisfy its first line, at least one of $K_\alpha$ and $K_\beta$ must be a wrong key that corrupts $X$. However, since any wrong key that corrupts $X$ also corrupts at least one previously found DI, this wrong key cannot satisfy the second line. Hence such an $X$ cannot be the DI in future iterations.

Lemma 5.5. For SAS Configuration 1, every critical minterm must be in the set of DIs when the SAT attack finishes: $X \in D_\lambda\ \forall X \in M$, where $\lambda$ is the total number of SAT iterations and $D_\lambda$ is the set of all DIs.

Proof. Recall that $g$ has on-set size 1. Let $P$ be the unique input that makes $g(P) = 1$. For each $X \in M$, let $K_1 = X \oplus P$. Then any wrong key $K = (K_1, K_2) \in K_W$ with this $K_1$ corrupts only $X$. Therefore, $X$ has to be chosen as a DI to prune out this wrong key.

Theorem 5.6. The expected number of SAT iterations of SAS Configuration 1 is
$$E[\lambda] = \frac{2^n + m}{2} \tag{5.6}$$

Proof. The total number of SAT iterations equals the total number of DIs. DIs consist of critical minterms and non-critical minterms. By Lemma 5.5, all the critical minterms must be in the set of DIs for SAT to terminate. Therefore, we only need to find the expected number of non-critical minterms that are chosen as DIs. As illustrated in Table 5.3, for every $X' \notin M$ there is exactly one $X \in M$ such that $K_{X'} \subseteq K_X$. By Lemma 5.4, if this $X$ is chosen as a DI before $X'$, then $X'$ cannot be chosen in later iterations. In other words, $X'$ counts towards the total number of iterations only when it is chosen before the critical minterm $X$. By our assumption that the DI is chosen uniformly at random in each iteration, $X'$ has a probability of $\frac{1}{2}$ of being chosen as a DI before $X$ is chosen.
As this is true for any non-critical minterm, the expected number of SAT iterations is $E[\lambda] = \frac{1}{2}(2^n - m) + m = \frac{2^n + m}{2}$.

5.5.3 Configuration 2: SAS with Multiple Blocks

In this configuration, we have $l$ SAS blocks, as illustrated in Fig. 5.7. Each SAS block takes the $n$-bit primary input $X$, which is shared among all the SAS blocks, and a $2n$-bit key input. The output of each SAS block is XOR'ed with a wire in the original circuit. Therefore, a fault is injected into the original circuit if any SAS block has output 1. Let $M_j$ be the set of critical minterms for the $j$-th SAS block, $j = 1, 2, \ldots, l$. For ease of implementation, we choose $l$ to also be a power of 2 with $l \leq m$. The relationship between the $M_j$ and the total set of critical minterms $M$ is that $M_1, M_2, \ldots, M_l$ have the same cardinality, are mutually exclusive, and evenly partition $M$, i.e.,
$$|M_1| = |M_2| = \cdots = |M_l|,\quad M_i \cap M_j = \emptyset\ \ \forall i \neq j,\quad \text{and}\quad \bigcup_{k=1}^{l} M_k = M \tag{5.7}$$
In this way, each SAS block has $\frac{m}{l}$ critical minterms. As each critical minterm receives high corruptibility from only one SAS block, the corruptibility of any critical minterm is $\rho_c = \frac{l}{m}$.

Figure 5.7: Configuration 2 with l SAS blocks.

Lemma 5.7. For SAS Configuration 2, every critical minterm must be in the set of DIs when the SAT attack finishes: $X \in D_\lambda\ \forall X \in M$, where $\lambda$ is the total number of SAT iterations and $D_\lambda$ is the set of all DIs.

Proof. This is a natural extension of Lemma 5.5. Let $X$ be a critical minterm with $X \in M_j$. Recall that $g$ has on-set size 1, and let $P$ be the unique input that makes $g(P) = 1$. For $X \in M_j$, let $k = X \oplus P$ and consider the following wrong key $K = (K_1, K_2, \ldots, K_l) \in K_W$: $K_j = (k, K_{j2}) \in K^W_j$, where $K^W_j$ is the set of wrong keys for the $j$-th SAS block, and for every $i = 1, 2, \ldots, l$ with $i \neq j$, $K_i \in K^C_i$, where $K^C_i$ is the set of correct keys for the $i$-th SAS block. Such a key $K$ is a wrong key that corrupts only $X$. Therefore, $X$ has to be chosen as a DI to prune out this wrong key.

Below, we analyze the SAT resilience of this configuration by deriving the expected number of SAT iterations.

Theorem 5.8. The expected number of SAT iterations of SAS Configuration 2 with $l$ SAS blocks and $m$ critical minterms is
$$E[\lambda] = \frac{l \cdot 2^n + m}{l + 1} \tag{5.8}$$

Proof. By Lemma 5.7, every critical minterm must count toward the total number of SAT iterations. Therefore, we only need to derive the expected number of non-critical minterms that are chosen as DIs. For any non-critical minterm $X' \notin M$ and for the $i$-th SAS block, there exists exactly one critical minterm $X_i$ such that the set of wrong keys that corrupt $X'$ in this SAS block, $K_{i,X'}$, is a subset of the set of wrong keys that corrupt $X_i$, $K_{i,X_i}$, i.e., $K_{i,X'} \subseteq K_{i,X_i}$. As the construction of the SAS block makes this true for every individual SAS block and the critical minterms of different SAS blocks are mutually exclusive, there are a total of $l$ such critical minterms. When all of these $l$ critical minterms have been chosen as DIs, they cover the entire set of wrong keys that corrupt $X'$. Therefore, by Lemma 5.4, in order for $X'$ to be included in the set of DIs, it must be selected before all $l$ of these critical minterms have been selected. This holds for any non-critical minterm. By our assumption that the DIs are chosen uniformly at random in each SAT iteration, the probability that a given non-critical minterm is chosen as a DI is $\frac{l}{l+1}$. Therefore, the expected number of SAT iterations is $E[\lambda] = \frac{l}{l+1}(2^n - m) + m = \frac{l \cdot 2^n + m}{l+1}$.
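The derivations above rest on the assumption that DIs are drawn uniformly at random. As a sketch of how Eqs. (5.6) and (5.8) can be sanity-checked, the simulation below models only the abstract key-partition structure of Table 5.3: every critical minterm must eventually become a DI, and a non-critical minterm can become a DI only while at least one of its l covering critical minterms (one per SAS block) remains unchosen. The parameters n = 8, m = 4, l = 2 are illustrative, not the experimental settings used later in this chapter.

```python
# Monte Carlo check of Theorems 5.6 and 5.8 under the random-DI assumption.
import random

def simulate_iterations(n_bits, m, l, trials=2000):
    total = 2 ** n_bits
    criticals = list(range(m))
    non_criticals = range(m, total)
    avg = 0.0
    for _ in range(trials):
        # A random DI order is a uniformly random permutation of all minterms.
        order = random.sample(range(total), total)
        position = {x: r for r, x in enumerate(order)}
        count = m  # every critical minterm is eventually a DI (Lemmas 5.5/5.7)
        for x in non_criticals:
            covers = random.sample(criticals, l)  # one covering critical per block
            # x becomes a DI iff it appears before its last covering critical.
            if position[x] < max(position[c] for c in covers):
                count += 1
        avg += count / trials
    return avg

n, m, l = 8, 4, 2
print("simulated E[lambda]:", simulate_iterations(n, m, l))
print("Eq. (5.8) predicts :", (l * 2 ** n + m) / (l + 1))
print("Config 1 (l = 1)   :", simulate_iterations(n, m, 1), "vs", (2 ** n + m) / 2)
```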
The properties of both configurations of SAS are summarized in Table 5.4.

Table 5.4: Properties of the two configurations of SAS
Configuration    l              ρ_c     E[λ]
1                1              1/m     (2^n + m)/2
2                1 ≤ l ≤ m      l/m     (l·2^n + m)/(l + 1)

5.6 Robust SAS: a Removal-Resilient SAS Variant

Although SAS achieves desirable SAT resilience and high corruptibility on critical minterms, it is still vulnerable to removal attacks. In such an attack, the attacker identifies and removes each SAS block and replaces its output wire with a constant 0. In this way, the remaining part of the locked circuit will have correct functionality. In order to address this drawback, we introduce Robust SAS (RSAS), a variant of SAS that is resilient to removal attacks. In addition to adding an RSAS function block, RSAS modifies the functionality of the original circuit. Therefore, unlike SAS, one cannot obtain the correct functionality of the circuit by identifying and removing the RSAS block. We will introduce the architecture of RSAS and show how any SAS configuration can be converted to a functionally equivalent RSAS configuration. Due to the equivalence in functionality, an RSAS configuration has the same SAT resilience and effectiveness as its SAS counterpart.

5.6.1 RSAS Architecture and Relationship with SAS

A circuit locked by RSAS consists of an altered original circuit and one or more RSAS block(s). Fig. 5.8 illustrates the RSAS configuration with one RSAS block. Given the same set of critical minterms and the same number of locking function blocks, locking a circuit with RSAS and with SAS yields the same functionality. An RSAS-locked circuit can be obtained by converting a functionally equivalent SAS-locked circuit in the following way.

Figure 5.8: A circuit locked with one RSAS block, equivalent to SAS Configuration 1 (the altered original circuit has its functionality inverted for all critical minterms).

5.6.1.1 Altering the original circuit

Recall that $l$ is the number of SAS blocks in a SAS configuration. For the $j$-th SAS block, $j = 1, 2, \ldots, l$, the set of critical minterms it contains is denoted by $M_j$, with $|M_j| = \frac{m}{l}$, where $m$ is the total number of critical minterms. In order to implement RSAS, we need to modify the original circuit's functionality. Notice that, for each SAS block, there is a wire in the original circuit that is XOR'ed with the SAS block's output. For the $j$-th SAS block, we locate this wire and invert the circuit's functionality at this wire for every critical minterm in $M_j$. This is done for each $j = 1, 2, \ldots, l$, as illustrated in Fig. 5.9.

Figure 5.9: A circuit locked with multiple RSAS blocks, equivalent to SAS Configuration 2 (the wire feeding each RSAS block's XOR has its functionality inverted for the critical minterms in the corresponding $M_j$).

5.6.1.2 Converting the SAS block into the RSAS block

The RSAS block is very similar to the SAS block; there is only one difference between them. For the $j$-th block, $j = 1, 2, \ldots, l$, if the primary input is a critical minterm in $M_j$, the output of the RSAS block, $Y_{RSAS,j}$, is the inversion of the output of the SAS block, $Y_{SAS,j}$.
Recall that, for a SAS configuration with $m$ critical minterms and $l$ SAS blocks, each critical minterm's corruptibility is $\rho_c = \frac{l}{m}$. Hence, for a $\frac{l}{m}$ portion of the wrong keys, $Y_{SAS,j}$ is 1. This is achieved by the $X' = H(X, K_1)$ function: if $X$ is a critical minterm, then $H(X, K_1)$ ensures that for a $\rho_c$ portion of the wrong keys we have $g(X' \oplus K_1) = 1$. For RSAS, since the functionality for critical minterms is inverted, the portion of wrong keys that makes $Y_{RSAS,j}$ equal 1 is $1 - \rho_c = \frac{m - l}{m}$. This means the functionality of $H(X, K_1)$ needs to be modified in the following way: if $X$ is a critical minterm, then for a $1 - \rho_c$ portion of the wrong keys, $g(X' \oplus K_1)$ outputs 1. For non-critical input minterms, $Y_{RSAS}$ behaves in the same way as $Y_{SAS}$. This is illustrated in Table 5.5.

Table 5.5: Illustration of the RSAS block's functionality (a ✓ stands for $Y_{RSAS} = 1$). Compared with Table 5.3, the pattern for critical minterms is complemented: each critical minterm $X_j$ now yields $Y_{RSAS} = 1$ for all $K_1$ values outside its own block (a $1 - \rho_c$ portion of the wrong keys), while the behavior for non-critical minterms is unchanged (a single $K_1$ value each).

5.6.2 SAT Resilience and Effectiveness of RSAS

In Sec. 5.6.1, we introduced how to convert a SAS-locked circuit into an equivalent RSAS-locked circuit. These steps essentially invert the functionality of each critical minterm at two places: first at the wire in the original circuit where RSAS is integrated, and second at the RSAS block's output. Since these two wires are XOR'ed, the two inversions cancel out, which makes the RSAS-locked circuit functionally equivalent to the SAS-locked circuit. Due to the equivalence in functionality, the derivations of SAS's SAT resilience and effectiveness also hold for RSAS. Therefore, Table 5.4 also summarizes these properties for RSAS.

5.7 Choosing Critical Minterms

The critical minterms for injecting large errors should be selected judiciously. A careful analysis of the workload helps identify these minterms. Generally, these minterms are very few compared to the overall input space of the functional modules. Here we describe how to select the critical minterms. As mentioned in Sec. 5.3, we use PARSEC and NN models as application benchmarks. For the PARSEC (generic) benchmarks, we arbitrarily choose critical minterms from the input minterms that occur in all the application benchmarks. We take a similar approach for the NN benchmarks. A significant part of an NN-based application is the weights of the NN model, and it turns out that the weight values of most NN models follow a similar distribution. For example, Figure 5.10 shows the distribution of weights of the LeNet (MNIST dataset) and CaffeNet (ILSVRC-2012 dataset) models, which are the smallest and the largest benchmark, respectively. The weight distributions are similar across the NN benchmarks and many other NN models. This kind of similarity can also be found among generic applications. We select a subset of weight values to be critical minterms based on their application-level impact. The selected critical minterms should cause significant application-level error. Fig. 5.10 also shows the accuracy loss of the NN model in the following experiment: for each input minterm, we measure the accuracy loss of the NN model when every computation involving this minterm is corrupted while no other minterm is corrupted.
Figure 5.10: Weight distribution (blue histogram, left Y axis) and application-level accuracy loss (red line, right Y axis) of LeNet and CaffeNet when the corresponding input is locked.

As the input minterm distributions are similar among applications of the same type, the flexibility of SAS allows the designer to choose a configuration and a combination of critical minterms that work well in securing the intended applications without compromising SAT resiliency.

5.8 Experiments & Comparison with SFLL

This section presents the experimental results of SAS and RSAS as well as the comparison with SFLL. Recall that, as illustrated in Fig. 5.3, we obtain the gate-level netlist of the multiplier within a 32-bit 80386 processor by synthesizing the high-level description using Cadence RTL Compiler. We then lock the netlist using various SAS and RSAS configurations and SFLL-flex with the same set of critical minterms. The critical minterms are selected according to the method described in Sec. 5.7. The architecture-level simulation is conducted with a modified GEM5 [10] simulator in which error is injected into the locked processor module according to the hardware error profile induced by the wrong key. We conduct the following experiments to verify the SAT resilience and effectiveness of SAS and RSAS and to compare them with SFLL.

5.8.1 SAT Resilience

We first verify whether the SAT resilience of SAS/RSAS (i.e., the actual number of SAT iterations) matches what we derived in Sec. 5.5. The SAT resilience of SAS/RSAS and SFLL is also compared. We lock the multiplier in a 32-bit 80386 processor with SAS and RSAS as well as SFLL. Fig. 5.11 shows the actual and expected numbers of SAT iterations of multipliers locked with SAS and RSAS; these numbers are compared to the actual number of iterations of SFLL. In these locking configurations, we use 14 bits of primary input for locking purposes (n = 14) and experiment with each feasible configuration with up to 4 critical minterms. We observe that SAS and RSAS have similar numbers of actual SAT iterations and that both are close to the expected value. When there is more than one critical minterm, SAS and RSAS exhibit higher SAT resilience than SFLL. This is because the corruptibility of each critical minterm in SFLL is almost 1 no matter how many critical minterms there are, which compromises its SAT resilience.

Figure 5.11: Actual and expected number of SAT iterations of SAS and RSAS, compared with SFLL.

Fig. 5.12 compares the actual SAT iterations of SAS and SFLL. In Fig. 5.12a, it can be observed that SAS's SAT complexity is higher than that of SFLL by a roughly constant factor when m is fixed at 4. Note that the same set of four critical minterms is used for each locking scheme. Among the various SAS configurations, a larger l comes with higher SAT resilience, as expected. In Fig. 5.12b, we vary the critical minterm count (m) from 4 to 32 and demonstrate its impact on the SAT resilience of SAS and SFLL. While SAS configurations become stronger with more critical minterms, SFLL becomes weaker. Therefore, SAS is more SAT resilient and gives designers more flexibility when more critical minterms are needed.
Figure 5.12: The observed SAT iterations of SAS and SFLL when varying key length and critical minterm count. (a) Varying the key length n with the number of critical minterms fixed at m = 4. (b) Varying the number of critical minterms m with the key length fixed at n = 16.

5.8.2 Effectiveness

We evaluate the effectiveness of SAS/RSAS and SFLL at the application level using the PARSEC [8] and ML benchmarks listed in Table 5.1. Due to the functional equivalence of SAS and RSAS, they have the same architecture-level effects, and we use the same functional model to perform the architecture-level simulation of SAS and RSAS. In our experiments, various numbers of critical minterms are locked. The same set of critical minterms is used for SAS/RSAS and SFLL in each experiment, chosen according to the method described in Sec. 5.7. For SAS, we choose l = 1 when m = 1 and l = 2 when m ≥ 2. Figs. 5.13 and 5.14 show that both SAS/RSAS and SFLL are effective at the application level for both generic and ML-based applications. SAS/RSAS achieves high application-level effectiveness and exponential SAT resiliency at the same time. Considering that SAS/RSAS's SAT resilience is not compromised as m increases, as opposed to SFLL (as shown in Figs. 5.11 and 5.12b), SAS/RSAS is a significant improvement over SFLL.

Figure 5.13: The application-level effectiveness of SAS/RSAS and SFLL on the PARSEC benchmarks. (a) SAS/RSAS; (b) SFLL.

Figure 5.14: The application-level effectiveness of SAS/RSAS and SFLL on the ML benchmarks. (a) SAS/RSAS; (b) SFLL.

5.8.3 Area, Power, and Delay Overhead of SAS, RSAS, and SFLL

Having demonstrated the SAT resilience of SAS and RSAS and their application-level effectiveness, we evaluate their area, power, and delay overhead. The overhead is also compared with that of SFLL. In our evaluation, we use 32 bits of the primary input for locking (n = 32) and lock up to 4 critical minterms (m = 1, 2, 4). We synthesize the original and locked circuits using Cadence RTL Compiler with the SAED 90nm process. Figs. 5.15, 5.16, and 5.17 show the area, power, and delay overhead values, respectively. Compared with SFLL, on average, SAS and RSAS have 2.22% and 1.49% more area overhead, 0.43% more and 0.04% less power overhead, and 0.93% and 0.71% more delay overhead, respectively. These are not significant increases in overhead and should be worth the gain in SAT resilience.
Figure 5.15: Area overhead of SAS and RSAS compared with SFLL.

Figure 5.16: Power overhead of SAS and RSAS compared with SFLL.

Figure 5.17: Delay overhead of SAS and RSAS compared with SFLL.

5.9 Summary

In this work, we investigate logic locking techniques to secure both generic and error-resilient workloads running on locked processors. We motivate our work by demonstrating the insufficiency of the state-of-the-art logic locking scheme in securing such applications. We point out that this is due to the fundamental trade-off between SAT resilience (SAT attack complexity) and effectiveness (error rate of wrong keys) in logic locking, and we formally prove this trade-off. In order to address this dilemma, we propose Strong Anti-SAT (SAS), in which a set of critical minterms is assigned higher corruptibility in order to ensure high application-level impact. Based on SAS, we also propose Robust SAS (RSAS) to thwart removal attacks on logic locking. RSAS is functionally equivalent to SAS and has the same SAT resilience and effectiveness. Experimental results show that SAS and RSAS secure processors against the SAT attack by ensuring exponential SAT attack complexity and high application-level impact simultaneously for any wrong key. We also evaluate the area, power, and delay overhead of SAS and RSAS and compare them with SFLL; SAS and RSAS incur only a modest increase in overhead. In summary, RSAS exhibits higher SAT resilience than SFLL when multiple critical minterms are secured, while maintaining equivalent effectiveness and removal attack resilience. Therefore, RSAS constitutes a significant improvement over SFLL-based locking.

Chapter 6: Cache Side-Channel-based Reverse Engineering of Neural Networks

In recent years, deep neural networks (DNN) have become an important type of intellectual property due to their high performance on various classification tasks. As a result, DNN stealing attacks have emerged. Many attack surfaces have been exploited, among which cache timing side-channel attacks are hugely problematic because they do not need physical probing or direct interaction with the victim to estimate the DNN model. However, existing cache-side-channel-based DNN reverse engineering attacks rely on analyzing the binary code of the DNN library, which must be shared between the attacker and the victim in main memory. In reality, the DNN library code is often inaccessible because 1) the code is proprietary, or 2) memory sharing has been disabled by the operating system. In our work, we propose GANRED, an attack approach based on the generative adversarial nets (GAN) framework which utilizes cache timing side-channel information to accurately recover the structure of DNNs without memory sharing or code access. The benefit of GANRED is four-fold. 1) There is no need for DNN library code analysis. 2) No shared main memory segment between the victim and the attacker is needed. 3) Our attack locates the exact structure of the victim model, unlike existing attacks, which only narrow down the structure search space. 4) Our attack efficiently scales to deeper DNNs, with runtime growing only linearly in the number of layers of the victim DNN.

6.1 Introduction

Deep neural networks (DNN) have demonstrated exceptional performance in a multitude of applications such as image classification and speech recognition, making them a valuable and important form of intellectual property.
In order to protect DNN models, owners often host them on remote servers, restricting users to querying the model. Hence, users do not have the details of the model (i.e., architecture or weights). However, DNN model theft is still possible in this scenario. For example, an adversary can exploit side-channel information in order to reverse engineer the DNN [123, 42, 43, 5, 114, 6, 27, 107]. Under the remote host setting, the cache side-channel shows the most promise. Because the last level cache (LLC) is shared among all processor cores in most modern computer architectures, the attacker can infer the victim's cache usage even without interacting with the victim directly. Existing cache-based attacks focus on reverse engineering the structure of DNNs. As shown by the variety of prior research aimed at reverse engineering DNN structure, such as [123, 42, 43], even if these attacks do not decipher the weight information, knowing the structure of a DNN enables weight extraction attacks [107] and membership inference attacks [96, 62] and improves black-box adversarial example attacks [75]. Therefore, recovering the underlying DNN structure is a formidable attack. Hong et al. proposed DeepRecon, which monitored calls to selected TensorFlow library functions and observed the layer sequence of DNNs [42]. Yan et al. proposed Cache Telepathy, which substantially narrowed down the dimension parameter search space of DNNs by obtaining the number of generalized matrix multiplication (GEMM) operations via cache timing side-channels [123]; they were able to identify 16 possible structures for the VGG-16 DNN [97]. Both of these attacks required that the attacker and the victim share the DNN's library code (e.g., TensorFlow or a GEMM library) in main memory (i.e., the library code in main memory is mapped into the virtual address spaces of both the attacker and the victim). However, memory sharing can be disabled by the server's operating system, and the library code may be proprietary and inaccessible, rendering these attacks infeasible. Moreover, neither of these attacks could give the DNN dimension parameters precisely; they return only the layer sequence or a set of possible parameter combinations. Since slight differences in DNN structure may result in a significant difference in accuracy under the same training effort [43], obtaining the exact structure of the victim DNN is crucial. Other existing DNN reverse engineering attacks require querying the victim DNN model [43, 107] or physical side-channel probing [43, 5, 114, 6, 27]. Neither is required by GANRED. Therefore, GANRED can be carried out in a more realistic scenario.

6.1.1 GANRED Attack Overview

In our work, we develop GANRED, a novel generative adversarial nets (GAN)-based [32] Reverse Engineering attack on DNNs which fully recovers the dimension parameters of a DNN and does not require shared library access. In this attack, the victim DNN's cache side-channel information is measured by the attacker using a cache side-channel attack technique called Prime+Probe [73, 38, 79] and acts as the ground truth of the GAN. This technique does not require any shared main memory segment between the attacker and the victim. The attacker builds another DNN and updates its structure repeatedly to make it equivalent to the victim DNN's structure. In the rest of this chapter, we refer to the victim's DNN as the VDNN and the attacker's DNN as the ADNN.
In order to achieve his/her objective, the attacker needs to find the correct structure of each layer before moving on to the next layer. This is done as follows. The attacker initializes the ADNN as a one-layer network. For each feasible structure of this layer, the generator measures the cache side-channel of the ADNN in the same way as the VDNN is measured (i.e., using Prime+Probe). The discriminator compares the cache side-channel information of the VDNN and the ADNN and indicates for how many clock cycles the two DNNs produce identical side-channel information. If the ADNN has the correct structure, i.e., the same structure as the first layer of the VDNN, then the ADNN's cache side-channel information should be identical to that of the VDNN's first layer, and the discriminator will indicate that the side-channel information of the two DNNs is identical throughout the period that the ADNN runs. The validator compares the discriminator's output with a theoretical running time of the ADNN estimated using a linear regression analysis. This effectively rules out ADNN structures that cause the cache side-channel to diverge from the VDNN's in the middle of the ADNN's execution. The attacker chooses the structure that produces accurate cache side-channel data for the longest time as the first ADNN layer. To search for the structure of the VDNN's second layer, similar operations are performed. Each feasible structure of the second layer is appended to the (now known) first layer to compose a two-layer ADNN whose cache side-channel is measured by the generator. The discriminator compares the cache side-channel information of the two DNNs, and the validator determines whether the added matching time agrees with the theoretical running time of the second layer. The structure of each successive layer is recovered in this way until an ADNN is obtained that produces identical cache side-channel information for the entirety of each DNN's execution. The attack is considered successful if the ADNN's final structure is the same as the VDNN's structure.

6.1.2 Contributions

The contributions of this work are as follows:

• We propose the GANRED framework, in which DNNs are characterized by their accesses to a cache set over time. Our technique does not need any shared main memory segment between the victim and the attacker, nor does it analyze the DNN library code on the server; both resources were required by existing cache side-channel based DNN structure reverse engineering attacks [123, 42]. GANRED does not require querying the victim DNN model or any physical probing either, as required by other existing DNN reverse engineering attacks [43, 5, 114, 6, 27, 107]. Hence, GANRED can be carried out in a more realistic scenario where these privileges are not granted.

• We prove the following theoretical basis for GANRED: if the ADNN has the same structure as the first l layers of the VDNN, then both DNNs produce identical cache side-channel information throughout these l layers.

• We show that our attack produces the exact structure of each VDNN model. This has not been achieved by existing DNN reverse engineering attacks based on cache side-channels [123, 42].

• We prove that the runtime of GANRED scales linearly in the number of DNN layers. This makes our attack scalable to much deeper DNNs.

6.2 Background

6.2.1 Dimension Parameters of Deep Neural Networks

Deep neural networks (DNN) are a supervised classification technique consisting of a sizable number of cascaded layers.
Let $i$ and $l$ denote the layer number and the total number of layers, respectively; hence $i \in [l]$. In each layer, the input feature map (IFM) (a.k.a. the set of input neurons) is transformed into the output feature map (OFM) (a.k.a. the set of output neurons) via an operation which involves a set of filters. The IFM and OFM sizes (i.e., the number of contained neurons) of layer $i$ are denoted by $z^{in}_i$ and $z^{out}_i$, respectively. Note that the OFM of the previous layer is the IFM of the next layer. Most DNNs consist of two types of layers: fully connected (FC) layers and convolutional (Conv) layers. The IFM and OFM of FC layers are (1-dimensional) vectors whose lengths are $z^{in}_i$ and $z^{out}_i$, respectively, and the weights form a matrix of dimension $z^{in}_i \times z^{out}_i$. The structure of a Conv layer is illustrated in Fig. 6.1. The IFM and OFM of a Conv layer are both 3-dimensional arrays, and the width and height of a feature map are usually equal. A Conv layer contains a set of filters, each of which is also a 3-dimensional array. Each filter is convolved with the IFM to obtain a channel of the OFM. A Conv layer can be characterized by the dimension parameters listed in Table 6.1. Note that a Conv layer can be followed by an optional pooling layer and, if so, we consider pooling a part of the Conv layer. We define the parameter $P_i$ as the indicator of whether there is a pooling layer after layer $i$.

Figure 6.1: Illustration of a convolutional layer. '*' indicates an inner product, each computing one output neuron.

Table 6.1: List of dimension parameters of a layer (subscript $i$ indicates layer $i$)
Type of layer   Parameter                              Definition
Conv layer      $w^{in}_i$, $w^{out}_i$                IFM/OFM width
                $d^{in}_i$, $d^{out}_i$                IFM/OFM depth (number of channels)
                $w^{f}_i$, $\sigma_i$                  convolution filter width and stride
                $P_i$                                  indicator of pooling layer existence
FC layer        $z^{in}_i$, $z^{out}_i$, $z^{f}_i$     IFM/OFM/filter size

6.2.2 Cache Architecture Fundamentals

A cache is on-chip storage for processors which temporarily stores a subset of the main memory's content in order to reduce memory access latency and improve the processor's efficiency. The basic component of a cache is a cache block (also called a cache line). Most modern processors have a set associative cache, in which the cache is divided into multiple ways, each having the same number of blocks. For example, Fig. 6.2 illustrates a two-way set associative cache. The cache blocks in the same position of each way constitute a set. The organization of address bits is given in Fig. 6.2.

Figure 6.2: A two-way set associative cache example.

When a block is to be moved into the cache, the cache controller extracts the set index bits from the block's address and puts the block into an available slot in the corresponding cache set. If no slot is available in the set, the controller selects a block within the set to be replaced with the new block according to the replacement policy. The most commonly used replacement policy is to replace the least recently used (LRU) block. In modern multi-core processors, the cache has a hierarchy of multiple levels. We specifically focus on the last level cache (LLC) since it is shared among all the processor cores. Hence the LLC is used by every program running on the processor, no matter which core the program runs on.
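To make the address breakdown in Fig. 6.2 concrete, the small sketch below splits an address into tag, set index, and block offset. The cache geometry (64-byte lines, 2048 sets) is an illustrative assumption rather than a parameter taken from this chapter.

```python
# Illustration of set-index extraction for a set associative cache (Sec. 6.2.2).
LINE_SIZE = 64          # bytes per cache block (assumed)
NUM_SETS = 2048         # number of sets (assumed)
OFFSET_BITS = LINE_SIZE.bit_length() - 1   # log2(64)   = 6
INDEX_BITS = NUM_SETS.bit_length() - 1     # log2(2048) = 11

def decompose(addr: int):
    """Split an address into (tag, set index, block offset)."""
    offset = addr & (LINE_SIZE - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset

# Two addresses that differ only in the tag map to the same set and
# therefore contend for the same ways -- the collision that the
# Prime+Probe technique described next relies on.
a = 0x7f1234567940
b = a + NUM_SETS * LINE_SIZE
print(decompose(a))
print(decompose(b))
```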
6.2.3 Cache Timing Side-Channel Attacks

In a cache timing side-channel attack, the attacker and the victim are two processes running on the same processor. The attacker seeks information leakage about the victim process by exploiting a fundamental property of caches: a cache hit is fast and a cache miss is slow. Although many attack techniques have been proposed, most of them can be described as a three-step process [26]:

1. The attacker initializes the state of a cache location.

2. The victim program executes, which may modify the state of the attacker-initialized cache location.

3. The attacker accesses the same cache location again and observes the access latency. By doing so, he/she can infer whether the victim has accessed the initialized cache location.

These attacks can be categorized by whether data is shared between the attacker and victim processes.

6.2.3.1 Attacks based on Data Sharing

Flush+Reload is the major type of attack in this category [127, 126, 37]. These attacks require shared data between the attacker and the victim, which can be achieved when the operating system allows multiple processes to map their individual virtual addresses to the same physical address for commonly required resources (e.g., library files) [39]. This sharing enables the attacker to obtain the victim's library usage information via the cache timing side-channel. The three steps of Flush+Reload are as follows:

1. Flush: The attacker targets an address within shared memory and calls clflush (an x86 instruction) to flush the cache line (i.e., block) that contains this address, if such a cache line exists. Otherwise, clflush has no effect.

2. The victim process runs. The content of the targeted address will be brought back to the same cache location if accessed by the victim.

3. Reload: The attacker accesses the targeted address and infers whether a cache hit or a cache miss occurs based on the access latency. If the victim has accessed the flushed address, a cache hit will occur. Otherwise, a cache miss occurs.

6.2.3.2 Attacks without Data Sharing

Many cache timing side-channel attacks work without shared main memory. Because there is a many-to-one mapping from main memory to cache, the attacker's and the victim's physical addresses can map to the same last-level cache (LLC) location. In this way, the attacker can still detect the changes in cache state made by the victim. Examples of such attacks are Prime+Probe [73, 38, 79], Evict+Time [73], and cache collision-based attacks [12]. Among these attacks, Prime+Probe is the best known and most widely used. Its mechanism is as follows:

1. Prime: The attacker fills the cache sets of interest with his/her own data.

2. The victim program runs, which may or may not overwrite the primed cache sets.

3. Probe: The attacker accesses the primed cache sets and observes the timing. A cache miss indicates that the victim has accessed that cache set.

Note that probing automatically primes the cache again, which enables the attacker to monitor the cache over a long period.

6.2.4 Existing DNN Reverse Engineering Attacks and Defenses

Reverse engineering of neural models has become a real threat that has attracted the attention of several researchers. Within this body of work, a variety of distinct attack approaches have been explored to reverse engineer neural models. Yan et al. and Hong et al. independently proposed neural network reverse engineering techniques based on cache side-channels [123, 42]. In their attacks, the attacker needs to analyze the neural model's library code, extract the control flow, and select code lines at which to measure cache timing.
These lines represent certain functions that are called when the neural network is running. By monitoring these function calls, information about the victim DNN's structure can be extracted. DeepRecon by Hong et al. inserted probes into the TensorFlow library code and was able to tell the number of layers and the type of each layer in a DNN [42]. In Yan et al.'s Cache Telepathy attack, the generalized matrix multiply (GEMM) backend libraries were monitored, and the attackers were able to reduce the DNN structure search space significantly [123]. For example, only 16 possible structures of the VGG-16 DNN [97] remain feasible after their attack. DeepRecon uses Flush+Reload, which requires the attacker and victim to share the main memory segment that contains the TensorFlow library files. Cache Telepathy can be carried out using either Flush+Reload or Prime+Probe. However, even the Prime+Probe version requires a shared main memory segment for the GEMM library files. This might not be a realistic attack scenario: the operating system can disable memory sharing between different users or processes, and the attacker may not have access to the server's DNN library code.

In addition to cache side-channels, power/electromagnetic side-channels [5, 114, 6] and the timing side-channel [27] have also been exploited to reverse engineer neural models. However, these attacks require physically probing the hardware and are feasible only when such probing is possible. Tramèr et al. proposed a technique to steal neural models from remote servers through the prediction APIs provided by the server [107]. A countermeasure proposed by Juuti et al. detects such model extraction attacks with a statistical technique [47]. Hua et al. found that a neural network can be reverse engineered from its memory access pattern [43]. Provably secure memory access protocols [56] and secure neural accelerator designs [113] can defend against this attack.

Many DNN protection techniques have been developed. Homomorphic encryption (HE) [13, 15, 30, 115] and secure multi-party computation (MPC) [86, 66, 72] have been employed to ensure the privacy of both the neural model and the input data. However, even the state-of-the-art HE and MPC algorithms are still too complex to use in practice. Additionally, several works have proposed the use of secure enclaves for DNN operations (such as Intel SGX) [106, 36]. However, DNNs running in these enclaves are vulnerable to cache side-channel attacks as well [112, 35]. In summary, there has not been an effective countermeasure against cache-based DNN reverse engineering.

6.3 Attack Model

In our attack model, the VDNN runs on a server alongside the attacker, which is another process on the same server. The attacker's goal is to reverse engineer the structure of the victim DNN model. We consider a realistic threat model under which the attacker process does not have the privileges assumed by many prior works, such as code access, memory sharing, or physical probing [123, 42, 43, 5, 114, 6, 27, 107]. The resources available to the attacker are as follows:

• Shared last-level cache (LLC) with the victim. This is the case for most state-of-the-art computer architectures. A shared LLC enables the attacker to obtain cache side-channel information of another process, e.g., the victim DNN, using Prime+Probe.

• High-level APIs of the machine learning framework. These are available to the attacker when he/she acts as a regular user on the server. They enable the attacker to construct a DNN model (a construction of this kind is sketched below).
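To illustrate how far high-level API access alone goes, the following sketch builds a candidate DNN from a set of estimated dimension parameters using only the public Keras API. It is a minimal example written for this text, not code from the GANRED implementation, and the parameter-dictionary format is a hypothetical convention of this sketch.

```python
# Minimal sketch (assumed, not the dissertation's implementation): building a
# candidate DNN from estimated dimension parameters with public Keras APIs only.
import tensorflow as tf

def build_candidate_dnn(layer_params, input_shape):
    """layer_params: list of dicts, e.g.
       {'type': 'conv', 'filters': 64, 'kernel': 3, 'stride': 1, 'pool': True}
       {'type': 'fc', 'units': 4096}
       (this dict format is a hypothetical convention for the sketch)."""
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for p in layer_params:
        if p['type'] == 'conv':
            model.add(tf.keras.layers.Conv2D(p['filters'], p['kernel'],
                                             strides=p['stride'],
                                             padding='same', activation='relu'))
            if p.get('pool'):
                model.add(tf.keras.layers.MaxPooling2D(pool_size=2))
        else:  # fully connected layer
            if len(model.output_shape) > 2:
                model.add(tf.keras.layers.Flatten())
            model.add(tf.keras.layers.Dense(p['units'], activation='relu'))
    return model

# The weights are irrelevant to the access pattern of interest, so Keras's
# default random initialization is sufficient for the generator's purposes.
adnn = build_candidate_dnn([{'type': 'conv', 'filters': 64, 'kernel': 3,
                             'stride': 1, 'pool': True}], (224, 224, 3))
```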
With these two resources, the attacker can obtain the cache side-channel information of both the VDNN and the ADNN using Prime+Probe. Note that these are only a small subset of the attacker's resources in prior attack models [123, 42, 43, 5, 114, 6, 27, 107]. We assume that the attacker does not have any of the following privileges:

• Querying the VDNN. Queries are normally unavailable to the attacker unless the victim grants permission; however, [43, 107] both require this access.

• The library code of the machine learning framework (e.g., TensorFlow), since the library may contain intellectual property of the server and users only need high-level APIs to build their neural models. Lacking code access makes the attacks in [123, 42] infeasible, since they need to find specific functions in the code at which to insert probes.

• Shared main memory that stores the machine learning library code. This is also needed by [123, 42] in order to map the shared library into the attacker's own virtual memory space and implement Flush+Reload.

• Physical access to the processor. This access is not available when the server is not controlled by the attacker. It makes any side-channel other than the cache side-channel impossible to measure and renders the attacks in [5, 114, 6, 27, 43] infeasible.

In summary, we assume a scenario where the resources available to the attacker are very constrained. None of the existing reverse engineering attacks [123, 42, 43, 5, 114, 6, 27, 107] is possible in this setting. However, in this work, we show that even under such constraints there is still substantial information leakage about the DNN model through the cache timing side-channel. GANRED reverse engineers the DNN structure by utilizing this information. A server's security measures, such as restricting queries to the victim and eliminating memory sharing or library code access, will disable existing attacks but will not hide the side-channel information that is sufficient for GANRED.

6.4 Attack Methodology

In this section, we introduce the details of the GANRED framework. In Sec. 6.4.1, we present how to characterize a DNN using Prime+Probe results. Sec. 6.4.2 describes how each component of the GANRED framework works. Sec. 6.4.3 introduces the overall algorithm of GANRED. Sec. 6.4.4 details how the validator uses a linear regression analysis to estimate the running time of a layer based on its structure. Sec. 6.4.5 proves the premise of GANRED that, if the ADNN has l layers and its structure is the same as that of the first l layers of the VDNN, then the ADNN should produce cache side-channel information identical to the VDNN's until the ADNN's execution ends.

6.4.1 Obtaining a DNN's Cache Side-Channel Trace

Before discussing the details of GANRED, we describe what side-channel information about a DNN can be obtained from Prime+Probe. The discussion in this subsection holds for both the VDNN and the ADNN. During Prime+Probe, the attacker selects an arbitrary LLC set and focuses only on this set. This is because we find that each DNN that we study leaves almost the same access pattern on every LLC set. In the rest of this paper, unless otherwise noted, our discussion is focused on this particular LLC set.

Suppose that the DNN makes s memory accesses to this LLC set during its entire execution. Let us use $t_j$ to denote the time at which the j-th access occurs, where j is the index of the access ($1 \le j \le s$). Note that time is measured in CPU clock cycles throughout this paper. Also note that the $t_j$ are not deterministic.
This is because the DNN's execution is scheduled by the computer's operating system, and the scheduling can be affected by other programs running on the same computer. Let us define $X_j$ as the time between the DNN's (j-1)-th and j-th memory accesses to the targeted LLC set:

\[ X_j = t_j - t_{j-1} \tag{6.1} \]

For the sake of consistency, we define $t_0 = 0$ to be the clock cycle at which the DNN's execution starts. Due to the randomness in the $t_j$, the $X_j$ are also random variables. Let $M(t)$ be the total number of times that the DNN accesses the targeted LLC set up to cycle t, i.e.,

\[ M(t) = \arg\max_j \{ t_j \le t \} \tag{6.2} \]

and let its expected value be

\[ m(t) = \mathbb{E}[M(t)] \tag{6.3} \]

Since the random variables introduced above characterize the DNN's access pattern to the targeted LLC set, and the pattern depends on the DNN's structure, it is desirable for the attacker to obtain information about the values of these variables. This can be done using Prime+Probe on the targeted LLC set.

Specifically, suppose that the attacker probes the targeted LLC set every c clock cycles for a total of p probes. In each probe, every block in the targeted set is accessed. Assuming a least-recently-used (LRU) replacement policy and that, ideally, the attacker accesses all blocks of the set simultaneously, the LLC set will be filled entirely with the attacker's data after each probe. The attacker measures the access latency of each block of the set. Since a cache hit has a much lower access latency than a miss, the access latency indicates whether the access was a cache hit or a miss, and the two are very unlikely to be confused. Let $y_k$ be the number of LLC misses the attacker observes in the k-th probe ($1 \le k \le p$). Assuming that other processes running on the same computer have a negligible probability of accessing the targeted LLC set between the (k-1)-th probe and the k-th probe, the number of missed blocks in the targeted LLC set indicates how many times the DNN has accessed this LLC set between the time of the last probe, (k-1)c, and the time of the current probe, kc. Recall from Equation 6.2 that the DNN makes $M(kc) - M((k-1)c)$ accesses to the targeted LLC set in this period. Suppose the LLC is $\omega$-way associative (i.e., there are $\omega$ blocks in the targeted LLC set). Hence $y_k$ is capped by $\omega$ and can be expressed as

\[ y_k = \min\{\omega,\; M(kc) - M((k-1)c)\} \tag{6.4} \]

Let us call $Y = (y_1, y_2, \ldots, y_p)$ the cache side-channel trace of a DNN. $Y$ is the cache side-channel information that can be directly observed from Prime+Probe. Due to the randomness in the time of each access of the DNN, repeated measurements of the cache side-channel are made so that the average of the traces is close to the expected value. Let $\mathcal{Y}$ be the set of traces obtained by repeated Prime+Probe measurements. $\mathcal{Y}$ characterizes the memory access pattern, and hence the structure, of the DNN.

The above description holds for both the VDNN and the ADNN. In the rest of this paper, we use superscripts "A" and "V" to denote variables of the ADNN and the VDNN, respectively, and use "A/V" when an expression applies to both DNNs.

6.4.2 GANRED Components

The notation for the important components of the GANRED framework is explained as follows.

$\mathcal{Y}^V$: the set of the VDNN's cache side-channel traces. This serves as the ground truth of the GANRED framework. The purpose of GANRED is to find a structure of the ADNN that makes the ADNN produce cache side-channel traces identical to $\mathcal{Y}^V$.

$\theta$: the set of estimated dimension parameters of the ADNN.
Recall that these parameters are listed in Table 6.1.

$G(\theta)$: the generator, which builds the ADNN from $\theta$ and generates its cache side-channel traces as follows. (1) The ADNN is constructed with dimension parameters $\theta$ and random weights. (2) The ADNN is executed and its cache side-channel trace is measured using Prime+Probe (i.e., in the same way that the VDNN is measured). (3) Step (2) is repeated multiple times in order to obtain a set of cache side-channel traces. Hence, the output of $G(\theta)$ is a set of cache side-channel traces of the ADNN, i.e., $\mathcal{Y}^A$.

$D(\mathcal{Y}^V, \mathcal{Y}^A)$: the discriminator, which compares the VDNN's traces, $\mathcal{Y}^V$, with the ADNN's, $\mathcal{Y}^A$. Recall that the length of each trace in $\mathcal{Y}^{A/V}$ is p. For each k such that $1 \le k \le p$, let $\bar{y}^{A/V}_k$ be the average number of cache misses in the k-th probe over the ADNN's/VDNN's cache side-channel traces. The discriminator's output, R, is also a p-element vector, i.e., $R = (r_1, r_2, \ldots, r_p)$. We call R the discriminator trace. $r_k$ indicates how well the two traces match at the k-th probe. For this purpose, we could define $r_k$ as the difference between the two average cache miss counts, i.e., $|\bar{y}^A_k - \bar{y}^V_k|$. However, experimental data can be noisy, which would make the discriminator trace R noisy as well. Instead, we take the two trace segments around the k-th probe of the ADNN's average trace and the VDNN's average trace and define $r_k$ as the root-mean-square difference of the two segments. This serves the discriminator's purpose better.

The validator is another important component of GANRED. Details of the validator are introduced in Sec. 6.4.4.
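To make the discriminator concrete, the following sketch computes the discriminator trace R from the two averaged traces. The window half-width and the handling of the trace boundaries are illustrative choices of this sketch, not values specified in the text.

```python
# Sketch of the discriminator D (illustrative; the window size is an assumption).
import numpy as np

def discriminator(traces_v, traces_a, half_window=3):
    """traces_v, traces_a: arrays of shape (num_measurements, p) holding repeated
    Prime+Probe traces of the VDNN and the ADNN. Returns the discriminator
    trace R of length p."""
    avg_v = np.mean(traces_v, axis=0)   # averaged VDNN trace (\bar{y}^V)
    avg_a = np.mean(traces_a, axis=0)   # averaged ADNN trace (\bar{y}^A)
    p = len(avg_v)
    r = np.empty(p)
    for k in range(p):
        lo, hi = max(0, k - half_window), min(p, k + half_window + 1)
        # root-mean-square difference of the two trace segments around probe k
        r[k] = np.sqrt(np.mean((avg_v[lo:hi] - avg_a[lo:hi]) ** 2))
    return r

def matched_prefix_length(r, eps):
    """Number of leading probes for which the traces match, i.e. r_k < eps."""
    for k, rk in enumerate(r):
        if rk >= eps:
            return k
    return len(r)
```

The value returned by matched_prefix_length corresponds to the quantity $k'_l$ computed in Algorithm 6.1 below.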
6.4.3 GANRED Framework

As a prerequisite of GANRED, the attacker repeatedly measures the VDNN's cache side-channel using Prime+Probe and obtains a set of traces $\mathcal{Y}^V$. $\mathcal{Y}^V$ is then given to the GANRED framework, which takes the steps in Algorithm 6.1 to recover the victim DNN structure. In essence, GANRED determines the structure of the first layer before working on the second layer, determines the second before the third, and so on, until the two DNNs' traces match entirely.

Algorithm 6.1: GANRED Implementation
input : $\mathcal{Y}^V$                         // the VDNN's cache side-channel traces
output: $\theta$                          // the ADNN's final dimension parameters
 1: Initialization: $l \leftarrow 0$, $\theta \leftarrow \emptyset$, $k_0 \leftarrow 0$   // l: estimated number of layers in the VDNN; $k_l$: the probe at which the traces start to diverge
 2: while $k_l < p$ do
 3:    $l \leftarrow l + 1$
 4:    $\theta^\star_l \leftarrow \emptyset$                  // tracks the optimal parameters found for layer l
 5:    $k^\star_l \leftarrow k_{l-1}$                 // the value of $k_l$ according to the current $\theta^\star_l$
 6:    foreach $\theta_l \in S_l$ do             // $S_l$: the set of all feasible parameter combinations of the l-th layer
 7:       $\theta' \leftarrow \theta \cup \theta_l$             // append the enumerated parameters $\theta_l$ to the existing parameters $\theta$
 8:       $R = (r_1, r_2, \ldots, r_p) \leftarrow D(\mathcal{Y}^V, G(\theta'))$   // call the generator and the discriminator to compare the traces of the VDNN and the ADNN
 9:       $k'_l \leftarrow \arg\max_h \{ r_1, r_2, \ldots, r_h < \epsilon \}$   // given a threshold $\epsilon$, find for how many probes the two sets of traces match from the beginning
10:       if $k'_l > k^\star_l$ then
11:          if validate($\theta_l$, $k_{l-1}$, $k'_l$) == TRUE then   // TRUE indicates a successful validation, explained in Sec. 6.4.4
12:             $k^\star_l \leftarrow k'_l$
13:             $\theta^\star_l \leftarrow \theta_l$
14:          end
15:       end
16:    end
17:    $\theta \leftarrow \theta \cup \theta^\star_l$
18:    $k_l \leftarrow k^\star_l$
19: end

We now explain in detail the procedure used to determine the structure of each layer of the ADNN. Recall that the set of parameters that must be found for each layer is listed in Table 6.1. Notice that GANRED works as long as the structure search space of each layer is finite. In this work, without loss of generality, we define the structure search space by the properties that state-of-the-art DNNs (e.g., AlexNet [52], the VGG family [97], and ResNet [41]) have in common:

1. If the l-th layer is a convolutional layer, then the filter width satisfies $1 \le w^f_l \le 11$, the output depth is $d^{out}_l = 64n$ where n is an integer and $1 \le n \le 32$, and the convolution stride $\sigma_l$ is 1 or 2.

2. If the l-th layer is a fully connected layer, then the number of output neurons is $z^{out}_l = 2^n$ where n is an integer and $8 \le n \le 13$.

Additionally, given that the user must provide the input to, and interpret the output of, the DNN in order to use it, the input and output dimensions of the VDNN are always available to the attacker. However, the attacker knows neither the type (convolutional or fully connected) of each layer nor the number of layers in the VDNN. Note that this is the same structure search space as considered by existing attacks [123, 42].

Suppose that GANRED is looking for the structure of the l-th layer, which means the structures of the first l-1 layers have been determined. In this case, $\theta$ contains the ADNN's parameters for the first l-1 layers. Let us use $S_l$ to denote the structure search space of layer l. The attacker enumerates this space. For each structure within the search space, denoted $\theta_l$, an ADNN is constructed by appending a layer with the dimension parameters given in $\theta_l$ to the already-determined l-1 layers (with parameters in $\theta$). The generator then measures this ADNN with Prime+Probe repeatedly to obtain a set of cache side-channel traces, $\mathcal{Y}^A$. The discriminator then compares $\mathcal{Y}^A$ with $\mathcal{Y}^V$ and obtains the discriminator trace. Details of the generator and the discriminator were described in Sec. 6.4.2.

Each element of the discriminator trace is compared to a given threshold value $\epsilon$. We say that the two traces match at probe k if $r_k < \epsilon$. Let $k'_l$ be the last probe before the discriminator trace rises above $\epsilon$ or ends; in other words, for any integer i with $1 \le i \le k'_l$, $r_i < \epsilon$. Recall that the attacker probes the targeted LLC set every c clock cycles. Hence the matching period corresponds to a time duration of $k'_l c$. We use $k_{l-1}$ to denote the number of probes for which the VDNN's traces match the traces of the ADNN without the l-th layer (i.e., the first l-1 layers of the ADNN with parameters in $\theta$). If $\theta_l$ is the structure of the l-th layer that makes the two traces match for the longest period so far, it has the potential to be the correct structure of layer l.

There is one caveat to be noted. Due to the sequential nature of DNNs, the memory accesses of one layer must all finish before the next layer's accesses start. Therefore, if $\theta_l$ contains the correct parameters of the VDNN's l-th layer, the added matching period due to the l-th layer, i.e., $k'_l c - k_{l-1} c$, should be approximately the running time of the l-th layer of both the VDNN and the ADNN. However, the attacker does not know which segment of the VDNN's trace corresponds to the l-th layer. The attacker can, though, verify whether the added matching period is close enough to the theoretical running time of a layer with parameters $\theta_l$. This check can rule out $\theta_l$ if $\theta_l$ causes the ADNN's traces to diverge from the VDNN's traces in the middle of the ADNN's execution. This is done by the validator. How the validator calculates the theoretical running time of a layer is introduced in Sec. 6.4.4. The successfully validated structure of the l-th layer that makes the two DNNs' traces match for the longest time is chosen as the final structure of the l-th layer. If the two DNNs' traces still do not match for the entire p probes, the attacker uses the same process to find the (l+1)-th layer. If all p probes match, the ADNN's dimension parameters $\theta$ are taken as the result of the attack. (A condensed sketch of this per-layer search is given below.)
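The following Python skeleton condenses the per-layer search just described (cf. Algorithm 6.1). The helpers measure_adnn_trace (the generator's Prime+Probe measurement) and validate are placeholders for the components described in this chapter, and discriminator and matched_prefix_length reuse the sketch from Sec. 6.4.2; their exact interfaces are assumptions of this sketch, not the dissertation's implementation.

```python
# Condensed sketch of the GANRED per-layer search (cf. Algorithm 6.1).
# measure_adnn_trace and validate are hypothetical placeholders for the
# generator and the validator; their interfaces are assumptions of this sketch.
def ganred_search(traces_v, search_spaces, p, eps):
    """traces_v: repeated Prime+Probe traces of the VDNN (each of length p).
    search_spaces[l]: iterable of candidate parameter sets for layer l."""
    theta = []      # parameters of the layers recovered so far
    k_prev = 0      # number of probes matched without the new layer (k_{l-1})
    l = 0
    while k_prev < p:
        l += 1
        best_params, best_k = None, k_prev
        for theta_l in search_spaces[l]:
            candidate = theta + [theta_l]
            traces_a = measure_adnn_trace(candidate)     # generator G
            r = discriminator(traces_v, traces_a)        # discriminator D
            k_cand = matched_prefix_length(r, eps)       # k'_l
            # keep theta_l only if it extends the match and the added matching
            # time agrees with the layer's estimated running time (validator)
            if k_cand > best_k and validate(theta_l, k_prev, k_cand):
                best_params, best_k = theta_l, k_cand
        if best_params is None:
            break    # safety guard for this sketch: no candidate validated
        theta.append(best_params)
        k_prev = best_k
    return theta
```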
6.4.4 Validating Reverse Engineered Parameter Combinations

During the reverse engineering of the l-th layer, if a structure denoted by $\theta_l$ makes the ADNN's traces and the VDNN's traces match for the longest period, the validator needs to be invoked in order to verify that $\theta_l$ is not a "false positive" solution. Specifically, the validator determines whether the ADNN's traces deviate from the VDNN's in the middle of the ADNN's execution, which should not be the case for the correct parameters of layer l. If the ADNN's traces match the VDNN's for $k_{l-1}$ probes without layer l and for $k'_l$ probes with layer l, then layer l (with parameters $\theta_l$) increases the matched period by $(k'_l - k_{l-1})c$ clock cycles. This suggests that, if $\theta_l$ contains the correct dimension parameters of the l-th layer, the running time of the l-th layer is approximately $(k'_l - k_{l-1})c$ clock cycles.

The validator estimates the theoretical running time of a layer with parameters $\theta_l$ based on the following observation: the execution time of a layer is linear in both its number of multiply-and-accumulate (MAC) operations and its number of cache misses.¹ Hence, the validator uses a linear regression analysis to estimate the running time of a layer with parameters $\theta_l$. Let $\hat{t}$ be the estimated running time. $\hat{t}$ is then compared to the increase in the length of the matched period of the two DNNs' traces, $(k'_l - k_{l-1})c$. If the difference is below a certain threshold, then $\theta_l$ is accepted. Otherwise, $\theta_l$ is deemed a "false positive" and rejected. The validator proves to be an essential component of GANRED, without which the correct structure of the VDNNs cannot be found. In the rest of this subsection, we present the details of the linear regression process used to estimate a layer's running time.

¹In Sec. 6.4.4, the notion of "cache misses" refers to the entire cache, not just the LLC set selected by the Prime+Probe attack.

6.4.4.1 Convolutional (Conv) Layers

The operation of a Conv layer is illustrated in Fig. 6.1. When a filter is convolved with the input feature map (IFM), with each step that the filter moves, a new output neuron is computed. We assume that only the new input neurons (i.e., those that were not used in the previous inner product) result in cache misses. The number of such new input neurons is $f_l \cdot d^{in}_l \cdot \sigma_l$, where $f_l$ (i.e., $w^f_l$) is the filter width, $d^{in}_l$ is the IFM depth, and $\sigma_l$ is the convolution stride (see Table 6.1). We calculate the theoretical number of cache misses of the Conv layer, denoted $u_{conv}(\theta_l)$, as the sum of two components: (a) the total number of "new input neurons" as described above for evaluating the entire OFM, and (b) the cache misses incurred when each input neuron and weight is used for the first time:

\[ u_{conv}(\theta_l) = \Big( \big( (P_l+1)^2 w_l^{out,2} - 1 \big) f_l\, d_l^{in}\, \sigma_l + \big( f_l^2 + w_l^{in,2} \big) d_l^{in} \Big)\, d_l^{out} \tag{6.5} \]

Since a cache miss results in a significantly longer latency than a cache hit, the number of cache misses impacts the Conv layer's running time. Let us use $v_{conv}(\theta_l)$ to denote the number of MAC operations of a Conv layer, which is given by

\[ v_{conv}(\theta_l) = 2\, (P_l+1)^2 w_l^{out,2} f_l^2\, d_l^{in}\, d_l^{out} \tag{6.6} \]
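The two quantities above translate directly into code. The following sketch transcribes Eqs. 6.5 and 6.6 and shows how the validator's runtime estimate is formed from them; the regression coefficients are placeholders to be fitted as described in the text that follows, not given values.

```python
# Direct transcription of Eqs. 6.5 and 6.6 (sketch; coefficient values are
# placeholders obtained from the regression described below).
def u_conv(w_out, w_in, d_in, d_out, f, stride, pooling):
    """Theoretical number of cache misses of a Conv layer (Eq. 6.5).
    pooling: 1 if the layer is followed by a pooling layer (P_l), else 0."""
    new_inputs = ((pooling + 1) ** 2 * w_out ** 2 - 1) * f * d_in * stride
    first_use = (f ** 2 + w_in ** 2) * d_in
    return (new_inputs + first_use) * d_out

def v_conv(w_out, d_in, d_out, f, pooling):
    """Number of MAC operations of a Conv layer (Eq. 6.6)."""
    return 2 * (pooling + 1) ** 2 * w_out ** 2 * f ** 2 * d_in * d_out

def estimated_conv_runtime(w_out, w_in, d_in, d_out, f, stride, pooling,
                           A_conv, B_conv, C_conv):
    """Validator's estimate t_hat = A*u + B*v + C for a candidate Conv layer."""
    return (A_conv * u_conv(w_out, w_in, d_in, d_out, f, stride, pooling)
            + B_conv * v_conv(w_out, d_in, d_out, f, pooling)
            + C_conv)
```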
In order to show that the running time of a Conv layer is linear in both the number of cache misses and the number of MAC operations, we conduct the following experiment. We take a population of Conv layers within our structure search space and measure their running times. A linear regression analysis is then conducted to verify the linearity. The regression shows that the linearity scores of both the number of cache misses and the number of MAC operations with respect to the running time are greater than 0.99 (1.0 being perfectly linear). In Fig. 6.3, we plot the layers with an equal number of MAC operations on the same line and show that a Conv layer's running time increases linearly in the theoretical number of cache misses. Let $\hat{t} = \hat{A}_{conv} u_{conv}(\theta_l) + \hat{B}_{conv} v_{conv}(\theta_l) + \hat{C}_{conv}$ be the resulting regression equation.

Figure 6.3: The linear relationship between a Conv layer's running time (cycles) and the theoretical number of cache misses (separate lines for # MAC ops = 4.62e8, 9.25e8, and 1.85e9).

Figure 6.4: Linear regression of an FC layer's running time (cycles) on its number of MAC operations.

6.4.4.2 Fully Connected (FC) Layers

In an FC layer, since there is no reuse of weights in the computation of different output neurons, the number of MAC operations is proportional to the theoretical number of cache misses. Therefore, we only look for a linear relationship between an FC layer's running time and its number of MAC operations. The number of MAC operations is given by

\[ v_{FC}(\theta_l) = 2\, z_l^{in} z_l^{out} \tag{6.7} \]

Similarly to the analysis for Conv layers, we select a population of FC layers from the feasible structures and measure their running times. A linear regression of the running time on the number of MAC operations is then conducted. A clear linear relationship can be observed in Fig. 6.4, and we use $\hat{t} = \hat{A}_{FC} v_{FC}(\theta_l) + \hat{B}_{FC}$ to denote the regression result.

6.4.5 Mathematical Justification of GANRED

As described in Sec. 6.4.3, GANRED reverse engineers the DNN in a layer-by-layer manner. It must find the correct structure of the current layer before moving on to the next layer. To this end, we intuitively assumed that when the ADNN (with $l^A$ layers) has the same structure as the first $l^A$ layers of the VDNN (which has $l^V$ layers in total, $l^V \ge l^A$), the ADNN's cache side-channel traces $\mathcal{Y}^A$ should match the VDNN's traces $\mathcal{Y}^V$ until the end of the ADNN's execution. We justify this premise in this subsection. The following is assumed about a DNN's memory accesses:

1. The DNN layers are executed sequentially, and hence the memory accesses of a layer must be completed before the next layer's memory accesses begin.

2. Each $X^{A/V}_j$, i.e., the time between the ADNN's/VDNN's (j-1)-th and j-th accesses to the targeted LLC set, follows a Gaussian distribution. For the VDNN, $X^V_j \sim \mathcal{N}(\mu_j, \sigma_j^2)$ where $1 \le j \le s^V$. If the ADNN has the same structure as the VDNN's first $l^A$ layers, we also have $X^A_j \sim \mathcal{N}(\mu_j, \sigma_j^2)$ where $1 \le j \le s^A$.

3. The $X^{A/V}_j$ are independent, i.e., for any $1 \le j_1 \ne j_2 \le s^{A/V}$, $X^{A/V}_{j_1}$ and $X^{A/V}_{j_2}$ are independent.

If the ADNN has the same structure as the first $l^A$ layers of the VDNN, the expected time at which the ADNN's last access to the targeted LLC set occurs can be expressed as $\sum_{j=1}^{s^A} \mu_j$. Our objective is then to prove that, in this case, the difference between the cache side-channel traces of the ADNN and the VDNN before time $\sum_{j=1}^{s^A} \mu_j$ can be upper bounded by a small value. In order to prove this, we first bound the difference between the two DNNs'
expected numbers of memory accesses up to time $\sum_{j=1}^{s^A} \mu_j$, i.e., $|m^V(t) - m^A(t)|$, as follows.

Theorem 6.1. If $t < \sum_{j=1}^{s^A} \mu_j$, then $|m^V(t) - m^A(t)|$ can be upper bounded by

\[ U(t) = \sum_{k=s^A+1}^{s^V} \frac{\sqrt{\sum_{j=1}^{k} \sigma_j^2}}{\sqrt{2\pi}\left(\sum_{j=1}^{k} \mu_j - t\right)} \exp\!\left( -\frac{\left(\sum_{j=1}^{k} \mu_j - t\right)^2}{2 \sum_{j=1}^{k} \sigma_j^2} \right) \tag{6.8} \]

Proof of Theorem 6.1. We first express $m^V(t) - m^A(t)$ in terms of the CDF of the between-access times:

\[ m^{A/V}(t) = \mathbb{E}[M^{A/V}(t)] = \sum_{j=1}^{s^{A/V}} j \cdot \mathrm{Prob}[M(t) = j] = \sum_{j=1}^{s^{A/V}} j \cdot \big( \mathrm{Prob}[M(t) \ge j] - \mathrm{Prob}[M(t) \ge j+1] \big) = \sum_{j=1}^{s^{A/V}} \mathrm{Prob}[M(t) \ge j] \]

$M^{A/V}(t) \ge j$ indicates that the j-th memory access occurs no later than time t, i.e., $t^{A/V}_j = \sum_{i=1}^{j} X^{A/V}_i \le t$. Because all of the X's follow independent Gaussian distributions as described above, we have

\[ t^{A/V}_j \sim \mathcal{N}\!\left( \sum_{i=1}^{j} \mu_i,\; \sum_{i=1}^{j} \sigma_i^2 \right) \]

Therefore, the probability of $M^{A/V}(t) \ge j$ can be calculated via the CDF of the above Gaussian distribution:

\[ \mathrm{Prob}[M^{A/V}(t) \ge j] = \frac{1}{\sqrt{2\pi \sum_{i=1}^{j} \sigma_i^2}} \int_{-\infty}^{t} \exp\!\left( -\frac{\left( x - \sum_{i=1}^{j} \mu_i \right)^2}{2 \sum_{i=1}^{j} \sigma_i^2} \right) dx \]

And hence

\[ m^V(t) - m^A(t) = \sum_{j=s^A+1}^{s^V} \mathrm{Prob}[M(t) \ge j] = \sum_{j=s^A+1}^{s^V} \frac{1}{\sqrt{2\pi \sum_{i=1}^{j} \sigma_i^2}} \int_{-\infty}^{t} \exp\!\left( -\frac{\left( x - \sum_{i=1}^{j} \mu_i \right)^2}{2 \sum_{i=1}^{j} \sigma_i^2} \right) dx \tag{6.9} \]

Since each integral is positive, $m^V(t) - m^A(t) \ge 0$, so we only need to prove $m^V(t) - m^A(t) < U(t)$. In Theorem 6.1 we require $t < \sum_{j=1}^{s^A} \mu_j$; let $h_j = \sum_{i=1}^{j} \mu_i - t$, which is positive for every $j \ge s^A + 1$. Due to the symmetry of the Gaussian probability density, we can rewrite the above expression as

\[ m^V(t) - m^A(t) = \sum_{j=s^A+1}^{s^V} \frac{1}{\sqrt{2\pi \sum_{i=1}^{j} \sigma_i^2}} \int_{\sum_{i=1}^{j} \mu_i + h_j}^{\infty} \exp\!\left( -\frac{\left( x - \sum_{i=1}^{j} \mu_i \right)^2}{2 \sum_{i=1}^{j} \sigma_i^2} \right) dx \]

In each integral, since $x > \sum_{i=1}^{j} \mu_i + h_j$ and $h_j > 0$, we have $\frac{x - \sum_{i=1}^{j} \mu_i}{h_j} > 1$. Therefore, each integral can be upper bounded by

\[ \int_{\sum_{i=1}^{j} \mu_i + h_j}^{\infty} \exp\!\left( -\frac{\left( x - \sum_{i=1}^{j} \mu_i \right)^2}{2 \sum_{i=1}^{j} \sigma_i^2} \right) dx < \int_{\sum_{i=1}^{j} \mu_i + h_j}^{\infty} \frac{x - \sum_{i=1}^{j} \mu_i}{h_j} \exp\!\left( -\frac{\left( x - \sum_{i=1}^{j} \mu_i \right)^2}{2 \sum_{i=1}^{j} \sigma_i^2} \right) dx = \frac{\sum_{i=1}^{j} \sigma_i^2}{h_j} \exp\!\left( -\frac{\left( \sum_{i=1}^{j} \mu_i - t \right)^2}{2 \sum_{i=1}^{j} \sigma_i^2} \right) \]

Therefore, we can upper bound $m^V(t) - m^A(t)$ by

\[ m^V(t) - m^A(t) < \sum_{j=s^A+1}^{s^V} \frac{\sqrt{\sum_{i=1}^{j} \sigma_i^2}}{\sqrt{2\pi}\left( \sum_{i=1}^{j} \mu_i - t \right)} \exp\!\left( -\frac{\left( \sum_{i=1}^{j} \mu_i - t \right)^2}{2 \sum_{i=1}^{j} \sigma_i^2} \right) = U(t) \]

Hence proved.

In order to illustrate in a straightforward manner that U(t) is an extremely small value, we estimate the parameters from a VDNN (AlexNet) and an ADNN with the same structure as the first 2 layers of the VDNN. These parameters include $s^A$, $s^V$, and the $\mu_j$; each $\sigma_j$ is assumed to be 20% of $\mu_j$. Based on these parameters, U(t) is plotted for the period close to the end of the ADNN's execution, as shown in Fig. 6.5. We can only plot this period because U(t) increases monotonically in t for $t < \sum_{j=1}^{s^A} \mu_j$, and its value before the plotted period is so small that it is below the smallest positive number representable in floating point.

Figure 6.5: U(t) vs. t given typical parameters in Equation 6.8.

Recall that $y_k$, defined in Equation 6.4, is the number of observed LLC misses in the k-th probe; $y_k$ is the cache side-channel information that GANRED uses. The k-th probe occurs before the ADNN's last access to the targeted LLC set if $k < \sum_{j=1}^{s^A} \mu_j / c$. The following theorem indicates that, if the ADNN has the same structure as the first $l^A$ layers of the VDNN, then the two DNNs should have very close cache side-channel traces for probe indices $k < \sum_{j=1}^{s^A} \mu_j / c$.

Theorem 6.2. If $k < \sum_{j=1}^{s^A} \mu_j / c$, then $|\mathbb{E}[y^V_k] - \mathbb{E}[y^A_k]|$ is upper bounded by $U(kc)$.

Proof of Theorem 6.2. By Theorem 6.1, we know that $m^V(kc) - m^A(kc) < U(kc)$. Note that $m^V(kc) - m^A(kc)$ can be expanded in the following way (note that $m^A(0) = m^V(0) = 0$):

\[ U(kc) > m^V(kc) - m^A(kc) = \sum_{i=1}^{k} \big( m^V(ic) - m^V((i-1)c) \big) - \sum_{i=1}^{k} \big( m^A(ic) - m^A((i-1)c) \big) = \sum_{i=1}^{k} \Big( \big( m^V(ic) - m^V((i-1)c) \big) - \big( m^A(ic) - m^A((i-1)c) \big) \Big) \]
Let us first define $q^{A/V}_k = m^{A/V}(kc) - m^{A/V}((k-1)c)$. We can then continue the above equation as

\[ m^V(kc) - m^A(kc) = \sum_{i=1}^{k} \big( q^V_i - q^A_i \big) < U(kc) \tag{6.10} \]

Note that $q^V_k - q^A_k > 0$ for any k. This is because $q^V_k - q^A_k = \big[ m^V(kc) - m^A(kc) \big] - \big[ m^V((k-1)c) - m^A((k-1)c) \big]$ and, from Equation 6.9, it is clear that $m^V(t) - m^A(t)$ increases monotonically in t. Therefore, Equation 6.10 implies that for any k satisfying $1 \le k < \sum_{j=1}^{s^A} \mu_j / c$, we have $q^V_k - q^A_k < U(kc)$, which is a necessary condition for their sum being smaller than $U(kc)$.

We have $y^{A/V}_k = \min\{\omega, q^{A/V}_k\}$. Since $q^V_k > q^A_k$, it follows that $y^V_k \ge y^A_k$. Hence we only need to prove $\mathbb{E}[y^V_k] - \mathbb{E}[y^A_k] < U(kc)$:

\[ \mathbb{E}[y^V_k] - \mathbb{E}[y^A_k] = \mathbb{E}[y^V_k - y^A_k] = (\omega - \omega)\,\mathrm{Prob}[y^V_k \ge \omega, y^A_k \ge \omega] + (\omega - y^A_k)\,\mathrm{Prob}[y^V_k \ge \omega, y^A_k < \omega] + (y^V_k - y^A_k)\,\mathrm{Prob}[y^V_k < \omega, y^A_k < \omega] < 0 + U(kc)\,\mathrm{Prob}[y^A_k < \omega] \]